Java: Split string when an uppercase letter is found
Java Problem Overview
I think this is an easy question, but I am not able to find a simple solution (say, less than 10 lines of code :)
I have a String such as "thisIsMyString" and I need to convert it to a String[] {"this", "Is", "My", "String"}.
Please notice the first letter is not uppercase.
Java Solutions
Solution 1 - Java
You can use a regex with a zero-width positive lookahead. It matches the position in front of each uppercase letter without consuming the letter, so the letter stays in the output instead of being swallowed by the delimiter:
String s = "thisIsMyString";
String[] r = s.split("(?=\\p{Upper})");
Y(?=X) matches Y followed by X, but doesn't include X in the match. So (?=\\p{Upper}) matches an empty sequence followed by an uppercase letter, and split uses it as a delimiter.
See the java.util.regex.Pattern Javadoc for more info on Java regex syntax.
EDIT: By the way, it doesn't work with thisIsMyÜberString. For non-ASCII uppercase letters you need a Unicode uppercase character class instead of the POSIX one:
String[] r = s.split("(?=\\p{Lu})");
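To make the difference between the two character classes concrete, here is a minimal sketch that runs both patterns on the same input. The class and method names (UpperVsLuDemo, splitPosix, splitUnicode) are made up for illustration, and it assumes the default regex flags, under which \p{Upper} only covers ASCII A-Z.

```java
import java.util.Arrays;

class UpperVsLuDemo {
    // POSIX class: with default flags it matches only ASCII A-Z.
    static String[] splitPosix(String s) {
        return s.split("(?=\\p{Upper})");
    }

    // Unicode general category Lu: matches any uppercase letter, including \u00DC.
    static String[] splitUnicode(String s) {
        return s.split("(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        // \u00DC is the letter U with diaeresis; the escape avoids source-encoding issues.
        String s = "thisIsMy\u00DCberString";
        System.out.println(Arrays.toString(splitPosix(s)));   // "My" and the \u00DCber part stay together
        System.out.println(Arrays.toString(splitUnicode(s))); // split before \u00DC as well
    }
}
```

With the POSIX class the result has four elements, because no split happens before the non-ASCII capital; with \p{Lu} it has five.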
Solution 2 - Java
String[] camelCaseWords = s.split("(?=[A-Z])");
Solution 3 - Java
For anyone wondering what the pattern looks like when the String to split might start with an uppercase character:
String s = "ThisIsMyString";
String[] r = s.split("(?<=.)(?=\\p{Lu})");
System.out.println(Arrays.toString(r));
gives: [This, Is, My, String]
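The lookbehind (?<=.) requires at least one character before the split position, so no split can happen at index 0. A small sketch, with a hypothetical helper name (splitKeepingFirstWord), showing that the same pattern handles both a lowercase and an uppercase first letter:

```java
import java.util.Arrays;

class LeadingUpperSplitDemo {
    // Split before every uppercase letter that has at least one character in front of it.
    static String[] splitKeepingFirstWord(String s) {
        return s.split("(?<=.)(?=\\p{Lu})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingFirstWord("ThisIsMyString"))); // [This, Is, My, String]
        System.out.println(Arrays.toString(splitKeepingFirstWord("thisIsMyString"))); // [this, Is, My, String]
    }
}
```

As far as I know, the String.split documentation since Java 8 states that a zero-width match at the beginning never produces an empty leading substring, so the lookbehind mostly matters on older runtimes. It is harmless to keep either way.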
Solution 4 - Java
Since String::split takes a regular expression, you can use a look-ahead:
String[] x = "thisIsMyString".split("(?=[A-Z])");
Solution 5 - Java
Try this, which compiles the pattern once and reuses it for every split:
static Pattern p = Pattern.compile("(?=\\p{Lu})");
String[] s1 = p.split("thisIsMyFirstString");
String[] s2 = p.split("thisIsMySecondString");
...
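The same idea can be wrapped in a small utility so the compiled pattern lives in one place. This is only a sketch: the class name CamelCaseSplitter and the constant UPPERCASE_BOUNDARY are invented for the example. It assumes Java 8 or later, where a zero-width match at the start yields no empty leading element.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CamelCaseSplitter {
    // Compiled once and shared; Pattern instances are immutable and safe to reuse.
    private static final Pattern UPPERCASE_BOUNDARY = Pattern.compile("(?=\\p{Lu})");

    static String[] split(String s) {
        return UPPERCASE_BOUNDARY.split(s);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("thisIsMyFirstString")));  // [this, Is, My, First, String]
        System.out.println(Arrays.toString(split("thisIsMySecondString"))); // [this, Is, My, Second, String]
    }
}
```

Reusing the Pattern avoids recompiling the regex on each call, which String.split generally does for a pattern like this one.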
Solution 6 - Java
This regex splits before every capital letter except one at the very start of the string, so it works for both camelCase and ProperCase input.
(?<=.)(?=(\\p{Upper}))
TestText = Test, Text
thisIsATest = this, Is, A, Test
Solution 7 - Java
A simple Scala/Java suggestion that does not split inside all-uppercase runs such as NYC:
def splitAtMiddleUppercase(token: String): Iterator[String] = {
  val regex = """[\p{Lu}]*[^\p{Lu}]*""".r
  regex.findAllIn(token).filter(_ != "") // did not find a way not to produce empty strings in the regex. Open to suggestions.
}
Test with:
val examples = List("catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "interÜber")
for (example <- examples) {
  println(example + " -> " + splitAtMiddleUppercase(example).mkString("[", ", ", "]"))
}
It produces:
catch22 -> [catch22]
iPhone -> [i, Phone]
eReplacement -> [e, Replacement]
TotalRecall -> [Total, Recall]
NYC -> [NYC]
JGHSD87 -> [JGHSD87]
interÜber -> [inter, Über]
To cut at digits as well, modify the regex accordingly.
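For readers who want the same behaviour in plain Java, here is a rough port of the Scala function above. It is a sketch, not a tested library routine: the class name MiddleUppercaseSplitter is made up, and \P{Lu} is used as the equivalent of [^\p{Lu}]. Empty matches are skipped in the loop, mirroring the filter in the Scala version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiddleUppercaseSplitter {
    // A run of uppercase letters followed by a run of anything that is not an uppercase letter.
    private static final Pattern TOKEN = Pattern.compile("\\p{Lu}*\\P{Lu}*");

    static List<String> splitAtMiddleUppercase(String token) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(token);
        while (m.find()) {
            // The pattern can match the empty string (for example at the end of input); skip those.
            if (!m.group().isEmpty()) {
                parts.add(m.group());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // \u00DC is the letter U with diaeresis; the escape avoids source-encoding issues.
        String[] examples = {"catch22", "iPhone", "eReplacement", "TotalRecall", "NYC", "JGHSD87", "inter\u00DCber"};
        for (String example : examples) {
            System.out.println(example + " -> " + splitAtMiddleUppercase(example));
        }
    }
}
```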