Java: Split string when an uppercase letter is found

Java, Regex, String

Java Problem Overview


I think this is an easy question, but I cannot find a simple solution (say, fewer than 10 lines of code) :)

I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.

Please note that the first letter is not uppercase.

Java Solutions


Solution 1 - Java

You can use a regex with a zero-width positive lookahead, which finds uppercase letters without including them in the delimiter:

String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");

Y(?=X) matches Y only when it is followed by X, but does not include X in the match. So (?=\\p{Upper}) matches the empty string immediately before an uppercase letter, and split uses that position as the delimiter.
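To make the "zero-width" point concrete, here is a minimal sketch that contrasts the lookahead with a pattern that consumes the uppercase letter. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits at the empty position before each ASCII uppercase letter.
    // The lookahead consumes nothing, so the letter stays in the next token.
    static String[] splitBeforeUpper(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // For contrast: here the uppercase letter itself is the delimiter,
    // so it is removed from the result.
    static String[] splitOnUpper(String s) {
        return s.split("\\p{Upper}");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeUpper("thisIsMyString")));
        // prints [this, Is, My, String]
        System.out.println(Arrays.toString(splitOnUpper("thisIsMyString")));
        // prints [this, s, y, tring]
    }
}
```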

See the Javadoc for java.util.regex.Pattern for more information on Java regex syntax.

EDIT: Note that this does not work with thisIsMyÜberString, because \p{Upper} matches only the ASCII letters A-Z by default. For non-ASCII uppercase letters, you need the Unicode uppercase-letter category instead of the POSIX class:

String[] r = s.split("(?=\\p{Lu})");
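The difference between the two classes can be seen in a small sketch. The names below are hypothetical, and the Ü is written as the escape \u00DC only to keep the example independent of source-file encoding. The third variant assumes the embedded flag (?U) (Pattern.UNICODE_CHARACTER_CLASS, available since Java 7), which should make \p{Upper} Unicode-aware as well.

```java
import java.util.Arrays;

class UnicodeUpperSplitDemo {
    // POSIX class: by default matches only the ASCII letters A-Z.
    static String[] splitPosix(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Unicode general category "Lu" (uppercase letter).
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    // POSIX class with the UNICODE_CHARACTER_CLASS flag switched on via (?U).
    static String[] splitPosixWithUnicodeFlag(String s) {
        return s.split("(?U)(?=\\p{Upper})");
    }

    public static void main(String[] args) {
        String s = "thisIsMy\u00DCberString"; // thisIsMyÜberString
        System.out.println(splitPosix(s).length);                // 4 tokens: no split before Ü
        System.out.println(splitUnicode(s).length);              // 5 tokens
        System.out.println(Arrays.toString(splitUnicode(s)));
    }
}
```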

Solution 2 - Java

String[] camelCaseWords = s.split("(?=[A-Z])");

Solution 3 - Java

For anyone wondering what the pattern looks like when the string to split might start with an uppercase character:

// requires: import java.util.Arrays;
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));

This prints: [This, Is, My, String]
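The lookbehind (?<=.) requires at least one character before the split position, so position 0 is never a delimiter. A minimal sketch with hypothetical names follows. As far as the String.split documentation describes it, from Java 8 onward a zero-width match at the start of the input no longer yields a leading empty string, so the lookbehind matters mainly on older Java versions; it is harmless on newer ones.

```java
import java.util.Arrays;

class LeadingUpperSplitDemo {
    // Splits before an uppercase letter only if some character precedes it,
    // which rules out a split at the very start of the string.
    static String[] splitInner(String s) {
        return s.split("(?<=.)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitInner("ThisIsMyString")));
        // prints [This, Is, My, String]
        System.out.println(Arrays.toString(splitInner("thisIsMyString")));
        // prints [this, Is, My, String]
    }
}
```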

Solution 4 - Java

Since String::split takes a regular expression, you can use a lookahead:

String[] x = "thisIsMyString".split("(?=[A-Z])");

Solution 5 - Java

Try this, which compiles the pattern once and reuses it for every split:

static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");

...
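Wrapped in a class, the same idea might look like the sketch below. The class, field, and method names are illustrative and not part of the original answer. Holding the compiled Pattern in a static final field avoids recompiling the regex on every call, and Pattern instances are documented as immutable and safe to share between threads.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CamelCaseSplitter {
    // Compiled once when the class is loaded, then reused by every call.
    private static final Pattern BEFORE_UPPER = Pattern.compile("(?=\\p{Lu})");

    static String[] split(String s) {
        return BEFORE_UPPER.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("thisIsMyFirstString")));
        // prints [this, Is, My, First, String]
        System.out.println(Arrays.toString(split("thisIsMySecondString")));
        // prints [this, Is, My, Second, String]
    }
}
```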

Solution 6 - Java

This regex splits before every capital letter except one at the very start of the string, so it should work for both camel case and proper case.

(?<=.)(?=(\\p{Upper}))

TestText = Test, Text
thisIsATest = this, Is, A, Test
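A short sketch of this pattern in use, with hypothetical names. It also shows one caveat: a run of consecutive capitals, such as an acronym, is split into single letters.

```java
import java.util.Arrays;

class CapsSplitDemo {
    // The pattern from above, written as a Java string literal.
    static String[] splitOnCaps(String s) {
        return s.split("(?<=.)(?=(\\p{Upper}))");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnCaps("TestText")));
        // prints [Test, Text]
        System.out.println(Arrays.toString(splitOnCaps("thisIsATest")));
        // prints [this, Is, A, Test]
        System.out.println(Arrays.toString(splitOnCaps("parseHTTPResponse")));
        // prints [parse, H, T, T, P, Response]
    }
}
```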

Solution 7 - Java

A simple Scala/Java suggestion that does not split inside all-uppercase strings such as NYC:

def splitAtMiddleUppercase(token: String): Iterator[String] = {
   val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
   regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}

Test with:

val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for( example <- examples) {
   println(example + " -> "  + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}

It produces:

    catch22 -> [catch22]
    iPhone -> [i, Phone]
    eReplacement -> [e, Replacement]
    TotalRecall -> [Total, Recall]
    NYC -> [NYC]
    JGHSD87 -> [JGHSD87]
    interÜber -> [inter, Über]

To split at digits as well, modify the regex accordingly.
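Since the suggestion is described as Scala/Java, here is a rough Java port of the Scala function above, offered as a sketch rather than the author's own code. It uses the same regex and skips empty matches in the same way the Scala filter does; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleUppercaseSplitter {
    // Same regex as the Scala version: a run of uppercase letters followed by
    // a run of characters that are not uppercase letters.
    private static final Pattern TOKEN = Pattern.compile("[\\p{Lu}]*[^\\p{Lu}]*");

    static List<String> splitAtMiddleUppercase(String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(token);
        while (m.find()) {
            // The pattern can match the empty string (for example at the end
            // of the input), so empty matches are skipped.
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String example : new String[] {"catch22", "iPhone", "TotalRecall", "NYC"}) {
            System.out.println(example + " -> " + splitAtMiddleUppercase(example));
        }
    }
}
```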

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stack Overflow
Question            Guido              View Question on Stack Overflow
Solution 1 - Java   axtavt             View Answer on Stack Overflow
Solution 2 - Java   Bozho              View Answer on Stack Overflow
Solution 3 - Java   Mulder             View Answer on Stack Overflow
Solution 4 - Java   RoToRa             View Answer on Stack Overflow
Solution 5 - Java   Spigolo Vivo       View Answer on Stack Overflow
Solution 6 - Java   The Shoe Shiner    View Answer on Stack Overflow
Solution 7 - Java   Boris              View Answer on Stack Overflow