Split string based on regex

Python, Regex, Split

Python Problem Overview


What is the best way to split a string like "HELLO there HOW are YOU" by upper case words (in Python)?

I'd like to end up with a list like this: results = ['HELLO there', 'HOW are', 'YOU']


EDIT:

I have tried:

p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)

It doesn't seem to work, though.

Python Solutions


Solution 1 - Python

I suggest

l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)

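As a minimal sketch of how this pattern behaves, the snippet below runs it on the sample string from the question. The names `pattern`, `s`, and `parts` are chosen here for illustration, and the pattern is written as a raw string so that `\s` reaches the regex engine unchanged.

```python
import re

# Split on whitespace that is not at the start of the string, is followed by
# an upper-case letter, and is not followed by a one-character word.
pattern = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

s = "HELLO there HOW are YOU"  # sample input from the question
parts = pattern.split(s)
print(parts)  # ['HELLO there', 'HOW are', 'YOU']
```

Note that the lookahead `(?=[A-Z])` only checks the first letter of the next word, so this pattern would presumably also split in front of a capitalized word such as Hello.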

Solution 2 - Python

You could use a lookahead:

re.split(r'[ ](?=[A-Z]+\b)', input)

This splits at every space that is followed by a run of upper-case letters ending at a word boundary.

Note that the square brackets are only there for readability and could just as well be omitted.
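A short sketch of the lookahead split, applied to the sample string from the question. The variable names (`text`, `with_brackets`, `without_brackets`) are illustrative; `text` is used instead of `input` only to avoid shadowing the built-in function.

```python
import re

text = "HELLO there HOW are YOU"  # sample input from the question

# Split at each space that is immediately followed by an all-caps word.
with_brackets = re.split(r'[ ](?=[A-Z]+\b)', text)

# The same pattern without the square brackets around the space.
without_brackets = re.split(r' (?=[A-Z]+\b)', text)

print(with_brackets)                      # ['HELLO there', 'HOW are', 'YOU']
print(with_brackets == without_brackets)  # True
```

Because the lookahead consumes nothing, the upper-case words stay in the result and only the space is removed.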

If it is enough that the first letter of a word is upper case (so that you also split in front of Hello), it gets even easier:

re.split(r'[ ](?=[A-Z])', input)

Now this splits at every space followed by any upper-case letter.
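To make the difference between the two patterns concrete, here is a small comparison on an input that contains a capitalized word. The sample string and the names `loose` and `strict` are made up for this illustration.

```python
import re

text = "HELLO there Hello again YOU"  # hypothetical input with a capitalized word

# Split before any word that starts with an upper-case letter.
loose = re.split(r'[ ](?=[A-Z])', text)

# Split only before words made up entirely of upper-case letters.
strict = re.split(r'[ ](?=[A-Z]+\b)', text)

print(loose)   # ['HELLO there', 'Hello again', 'YOU']
print(strict)  # ['HELLO there Hello again', 'YOU']
```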

Solution 3 - Python

Your question contains the string literal "\b[A-Z]{2,}\b", but in that literal \b means a backspace character, because the string has no r prefix (it is not a raw string).

Try: r"\b[A-Z]{2,}\b".
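The difference between the two literals can be checked directly. This is a small sketch; the names `plain`, `raw`, and `text` are illustrative.

```python
import re

plain = "\b[A-Z]{2,}\b"   # \b is interpreted by Python as a backspace character
raw = r"\b[A-Z]{2,}\b"    # \b reaches the regex engine as a word boundary

print(plain[0] == "\x08")    # True: the first character is a backspace
print(len(plain), len(raw))  # 11 13

text = "HELLO there HOW are YOU"  # sample input from the question

# The non-raw pattern looks for literal backspaces, so nothing is split.
print(re.split(plain, text))  # ['HELLO there HOW are YOU']

# The raw pattern matches the upper-case words, but split() then drops them.
print(re.split(raw, text))    # ['', ' there ', ' are ', '']
```

So the raw string fixes the matching, but since `split` discards whatever the pattern matches, a lookahead as in Solution 2 is still needed to keep the upper-case words in the result.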

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user179169 | View Question on Stackoverflow
Solution 1 - Python | Ωmega | View Answer on Stackoverflow
Solution 2 - Python | Martin Ender | View Answer on Stackoverflow
Solution 3 - Python | druid62 | View Answer on Stackoverflow