Java: Split string when an uppercase letter is found

Java Problem Overview

I think this is an easy question, but I can't find a simple solution (say, fewer than 10 lines of code).

I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.

Note that the first letter is not uppercase.

Java Solutions

Solution 1 - Java

You can use a regex with a zero-width positive lookahead. It finds uppercase letters but doesn't include them in the delimiter:

String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");

Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
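To see why the zero-width lookahead matters, here is a small sketch comparing it with a pattern that consumes the uppercase letter as the delimiter. The class and method names are illustrative only.

```java
import java.util.Arrays;

public class LookaheadDemo {
    // Splits BEFORE each ASCII uppercase letter: the lookahead matches an
    // empty position, so the letter stays at the start of the next token.
    static String[] splitKeepingUppercase(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Splits ON each ASCII uppercase letter: the letter itself is the
    // delimiter, so it is consumed and disappears from the result.
    static String[] splitConsumingUppercase(String s) {
        return s.split("\\p{Upper}");
    }

    public static void main(String[] args) {
        String s = "thisIsMyString";
        System.out.println(Arrays.toString(splitKeepingUppercase(s)));   // [this, Is, My, String]
        System.out.println(Arrays.toString(splitConsumingUppercase(s))); // [this, s, y, tring]
    }
}
```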

See the java.util.regex.Pattern Javadoc for more on Java regex syntax.

EDIT: By the way, this doesn't work with thisIsMyÜberString, because \p{Upper} matches only US-ASCII uppercase letters. For non-ASCII uppercase letters you need the Unicode uppercase character class instead of the POSIX one:

String[] r = s.split("(?=\\p{Lu})");
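A quick way to check the difference, assuming Java 7 or later: \p{Upper} is ASCII-only by default, \p{Lu} is the Unicode "uppercase letter" category, and the Pattern.UNICODE_CHARACTER_CLASS flag makes \p{Upper} Unicode-aware as well. The class name is illustrative, and the escape \u00DC is used for Ü so the source compiles regardless of file encoding.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class UnicodeSplitDemo {
    // POSIX class: by default matches only US-ASCII uppercase letters A-Z.
    static String[] splitPosix(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Unicode general category Lu: any uppercase letter, not just A-Z.
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    // Same POSIX class, but UNICODE_CHARACTER_CLASS switches it to the
    // Unicode definition of "uppercase".
    private static final Pattern UNICODE_UPPER =
            Pattern.compile("(?=\\p{Upper})", Pattern.UNICODE_CHARACTER_CLASS);

    static String[] splitPosixWithUnicodeFlag(String s) {
        return UNICODE_UPPER.split(s);
    }

    public static void main(String[] args) {
        String s = "thisIsMy\u00DCberString";
        System.out.println(Arrays.toString(splitPosix(s)));
        System.out.println(Arrays.toString(splitUnicode(s)));
        System.out.println(Arrays.toString(splitPosixWithUnicodeFlag(s)));
    }
}
```

The first line keeps MyÜber together ([this, Is, MyÜber, String]); the other two both give [this, Is, My, Über, String].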

Solution 2 - Java

String[] camelCaseWords = s.split("(?=[A-Z])");

Solution 3 - Java

For anyone wondering what the pattern looks like when the string to split might start with an uppercase character:

import java.util.Arrays;

String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));

gives: [This, Is, My, String]
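The (?<=.) lookbehind requires at least one character before the split point, so the pattern can never match at index 0. The sketch below compares it with the plain lookahead. One caveat worth knowing: since Java 8, String.split documents that a zero-width match at the beginning never produces an empty leading substring, so on Java 8 and later both patterns give the same result for ThisIsMyString; the guard mattered on Java 7 and earlier, where the plain lookahead yields a leading "". Class and method names are illustrative.

```java
import java.util.Arrays;

public class LeadingUppercaseDemo {
    // Lookahead only: a split point is allowed before every uppercase
    // letter, including one at index 0.
    static String[] splitLookaheadOnly(String s) {
        return s.split("(?=\\p{Lu})");
    }

    // The lookbehind (?<=.) additionally requires one character before the
    // split point, so no split can happen at index 0 on any Java version.
    static String[] splitGuarded(String s) {
        return s.split("(?<=.)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        String s = "ThisIsMyString";
        // On Java 8+ both lines print [This, Is, My, String]; on Java 7 the
        // first one would start with an empty string.
        System.out.println(Arrays.toString(splitLookaheadOnly(s)));
        System.out.println(Arrays.toString(splitGuarded(s)));
    }
}
```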

Solution 4 - Java

Since String::split takes a regular expression, you can use a lookahead:

String[] x = "thisIsMyString".split("(?=[A-Z])");

Solution 5 - Java

Try this, which compiles the pattern once and reuses it:

import java.util.regex.Pattern;

// Compile the pattern once and reuse it for every split
static final Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");

...
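The point of the static Pattern is to compile the regex only once: String.split(regex) compiles its argument into a new Pattern on every call (apart from an internal fast path for certain single-character delimiters), while Pattern.split reuses the compiled one. A minimal self-contained sketch; the class, constant and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CamelCaseSplitter {
    // Compiled once when the class is loaded and reused by every call.
    // Pattern instances are immutable and safe to share between threads.
    private static final Pattern UPPERCASE_BOUNDARY = Pattern.compile("(?=\\p{Lu})");

    static String[] split(String camelCase) {
        return UPPERCASE_BOUNDARY.split(camelCase);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("thisIsMyFirstString")));  // [this, Is, My, First, String]
        System.out.println(Arrays.toString(split("thisIsMySecondString"))); // [this, Is, My, Second, String]
    }
}
```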

Solution 6 - Java

This regex splits before every uppercase letter except one at the very start of the string, so it should work for both camelCase and proper case (PascalCase).

(?<=.)(?=(\\p{Upper}))

TestText = Test, Text
thisIsATest = this, Is, A, Test
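A sketch that checks the two examples above. As an extra case not in the original answer, it also shows that this pattern splits every capital of an acronym, so NYC becomes [N, Y, C] (compare the Scala answer below, which keeps NYC intact). The capturing group in the original pattern has no effect on split, so it is left out here; the class name is illustrative.

```java
import java.util.Arrays;

public class ProperCaseSplit {
    // Split before every ASCII uppercase letter that has at least one
    // character in front of it (so never at index 0).
    static String[] split(String s) {
        return s.split("(?<=.)(?=\\p{Upper})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("TestText")));    // [Test, Text]
        System.out.println(Arrays.toString(split("thisIsATest"))); // [this, Is, A, Test]
        System.out.println(Arrays.toString(split("NYC")));         // [N, Y, C]
    }
}
```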

Solution 7 - Java

A simple Scala suggestion (the regex itself works in Java too) that does not split all-uppercase strings such as NYC:

def splitAtMiddleUppercase(token: String): Iterator[String] = {
   val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
   regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}

Test with:

val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for (example <- examples) {
   println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}

It produces:

    catch22 -> [catch22]
    iPhone -> [i, Phone]
    eReplacement -> [e, Replacement]
    TotalRecall -> [Total, Recall]
    NYC -> [NYC]
    JGHSD87 -> [JGHSD87]
    interÜber -> [inter, Über]

If you also want to split at digits, modify the regex accordingly.
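Since the question is about Java, here is a sketch of the same idea in plain Java using Matcher.find(). Empty matches (which this regex only produces at the end of the input) are skipped, mirroring the Scala filter. The digit-aware variant is just one possible reading of splitting at digits, not the original author's code: it adds a \d+ alternative, placed first so that a run of digits becomes its own token instead of the other alternative matching an empty string in front of it. Class, constant and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiddleUppercaseSplitter {
    // Same regex as the Scala answer: a (possibly empty) run of uppercase
    // letters followed by a (possibly empty) run of non-uppercase characters.
    private static final Pattern WORD = Pattern.compile("\\p{Lu}*[^\\p{Lu}]*");

    // Hypothetical variant: digit runs become separate tokens. \d+ comes
    // first; otherwise the second alternative would match "" before a digit.
    private static final Pattern WORD_OR_DIGITS =
            Pattern.compile("\\d+|\\p{Lu}*[^\\p{Lu}\\d]*");

    static List<String> tokens(Pattern pattern, String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) {
            // Both patterns match the empty string at the end of input; skip it.
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    static List<String> splitAtMiddleUppercase(String token) {
        return tokens(WORD, token);
    }

    static List<String> splitAtMiddleUppercaseAndDigits(String token) {
        return tokens(WORD_OR_DIGITS, token);
    }

    public static void main(String[] args) {
        String[] examples = {"catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87"};
        for (String example : examples) {
            // e.g. "catch22 -> [catch22] / [catch, 22]"
            System.out.println(example + " -> " + splitAtMiddleUppercase(example)
                    + " / " + splitAtMiddleUppercaseAndDigits(example));
        }
    }
}
```

Without the digit alternative the results match the Scala output above; with it, catch22 becomes [catch, 22] and JGHSD87 becomes [JGHSD, 87], while the other examples are unchanged.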

Content Type	Original Author	Original Content on Stackoverflow
Question	Guido	View Question on Stackoverflow
Solution 1 - Java	axtavt	View Answer on Stackoverflow
Solution 2 - Java	Bozho	View Answer on Stackoverflow
Solution 3 - Java	Mulder	View Answer on Stackoverflow
Solution 4 - Java	RoToRa	View Answer on Stackoverflow
Solution 5 - Java	Spigolo Vivo	View Answer on Stackoverflow
Solution 6 - Java	The Shoe Shiner	View Answer on Stackoverflow
Solution 7 - Java	Boris	View Answer on Stackoverflow