Regular expression: match start or whitespace

Python, Regex

Python Problem Overview


Can a regular expression match whitespace or the start of a string?

I'm trying to replace the currency abbreviation GBP with a £ symbol. I could just match anything starting with GBP, but I'd like to be a bit more conservative and look for certain delimiters around it.

>>> import re
>>> text = u'GBP 5 Off when you spend GBP75.00'

>>> re.sub(ur'GBP([\W\d])', ur'£\g<1>', text) # matches GBP with any prefix
u'\xa3 5 Off when you spend \xa375.00'

>>> re.sub(ur'^GBP([\W\d])', ur'£\g<1>', text) # matches at start only
u'\xa3 5 Off when you spend GBP75.00'

>>> re.sub(ur'(\W)GBP([\W\d])', ur'\g<1>£\g<2>', text) # matches only after a non-word character (e.g. whitespace), not at the start
u'GBP 5 Off when you spend \xa375.00'

Can I do both of the latter examples at the same time?

Python Solutions


Solution 1 - Python

Use the OR "|" operator:

>>> re.sub(r'(^|\W)GBP([\W\d])', u'\g<1>£\g<2>', text)
u'\xa3 5 Off when you spend \xa375.00'
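For readers on Python 3, the same alternation can be tried as below. This is only a sketch of the approach above: the names `pattern` and `result` and the extra `XGBP5` check are illustrative additions, not part of the original answer.

```python
import re

text = 'GBP 5 Off when you spend GBP75.00'

# (^|\W) matches either the start of the string (an empty match) or one
# non-word character; whatever it captured is written back via \g<1>.
pattern = re.compile(r'(^|\W)GBP([\W\d])')

result = pattern.sub(r'\g<1>£\g<2>', text)
print(result)  # £ 5 Off when you spend £75.00

# GBP glued to a preceding word character is left alone.
unchanged = pattern.sub(r'\g<1>£\g<2>', 'XGBP5')
print(unchanged)  # XGBP5
```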

Solution 2 - Python

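\b is a word boundary: a zero-width position between a word character and a non-word character, which includes the start or end of the string when a word character sits there. So \bGBP\b matches GBP at the beginning of a line, after whitespace, or after any other non-alphanumeric symbol. Be aware that the trailing \b does not match between P and a digit.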
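A quick way to check how \b behaves on the question's sample text is the sketch below (Python 3; the variable names are made up for illustration). It suggests that \bGBP\b alone would miss GBP75.00, because P and 7 are both word characters and no boundary lies between them.

```python
import re

text = 'GBP 5 Off when you spend GBP75.00'

# \b on both sides: only the first GBP qualifies, since there is no
# word boundary between "P" and "7" in "GBP75.00".
both_sides = re.sub(r'\bGBP\b', '£', text)
print(both_sides)  # £ 5 Off when you spend GBP75.00

# \b on the left only: both occurrences are replaced.
left_only = re.sub(r'\bGBP', '£', text)
print(left_only)  # £ 5 Off when you spend £75.00
```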

Solution 3 - Python

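This replaces GBP when it is preceded by a word boundary (the start of the string counts as one here, because G is a word character) and followed by either a digit or a word boundary:

re.sub(r'\bGBP(?=\b|\d)', u'£', text)

The pattern has to be a raw string; in a normal string literal, \b is read as a backspace character and the pattern never matches. Using a lookahead removes the need for backreferences, because the character after GBP is checked but not consumed. Inclusive enough?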
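As a rough check of the lookahead idea, here is a small Python 3 sketch. The extra inputs `GBPX 5` and `XGBP 5` are hypothetical test strings added for illustration.

```python
import re

text = 'GBP 5 Off when you spend GBP75.00'

# The lookahead (?=\b|\d) inspects what follows GBP without consuming it,
# so the replacement is just the symbol, with no backreferences.
pattern = re.compile(r'\bGBP(?=\b|\d)')

replaced = pattern.sub('£', text)
print(replaced)  # £ 5 Off when you spend £75.00

# Neither a letter directly after GBP nor one directly before it matches.
letter_after = pattern.sub('£', 'GBPX 5')
letter_before = pattern.sub('£', 'XGBP 5')
print(letter_after, '|', letter_before)  # GBPX 5 | XGBP 5
```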

Solution 4 - Python

A left-hand whitespace boundary - a position in the string that is either a string start or right after a whitespace character - can be expressed with

(?<!\S)   # A negative lookbehind requiring no non-whitespace char immediately to the left of the current position
(?<=\s|^) # A positive lookbehind requiring a whitespace or start of string immediately to the left of the current position (rejected by Python's re module, which only accepts fixed-width lookbehinds)
(?:\s|^)  # A non-capturing group matching either a whitespace or start of string 
(\s|^)    # A capturing group matching either a whitespace or start of string

Python 3 demo:

import re
rx = r'(?<!\S)GBP([\W\d])'
text = 'GBP 5 Off when you spend GBP75.00'
print( re.sub(rx, r'£\1', text) )
# => £ 5 Off when you spend £75.00

Note that you can use \1 instead of \g<1> in the replacement pattern here: the unambiguous \g<1> form is only needed when the backreference is immediately followed by a digit.
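To illustrate when the \g<1> form matters, here is a short sketch with a made-up input (`a5`) that appends a literal 0 after the first group. With only one group in the pattern, \10 is read as a reference to group 10, which Python's re rejects.

```python
import re

# \g<1>0 means "group 1, then a literal 0".
appended = re.sub(r'(\d)', r'\g<1>0', 'a5')
print(appended)  # a50

# \10 is parsed as a reference to group 10, which does not exist here.
raised = False
try:
    re.sub(r'(\d)', r'\10', 'a5')
except re.error:
    raised = True
print(raised)  # True
```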

BONUS: A right-hand whitespace boundary can be expressed with the following patterns:

(?!\S)   # A negative lookahead requiring no non-whitespace char immediately to the right of the current position
(?=\s|$) # A positive lookahead requiring a whitespace or end of string immediately to the right of the current position
(?:\s|$)  # A non-capturing group matching either a whitespace or end of string 
(\s|$)    # A capturing group matching either a whitespace or end of string
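Combining the left-hand and right-hand lookarounds gives a pattern that only matches GBP as a standalone, whitespace-delimited token. The sketch below assumes a slightly extended sample string (the trailing "or GBP" is an addition for illustration).

```python
import re

# Hypothetical sample: the question's text plus a trailing standalone GBP.
text = 'GBP 5 Off when you spend GBP75.00 or GBP'

# (?<!\S) : no non-whitespace char to the left (start of string or whitespace)
# (?!\S)  : no non-whitespace char to the right (end of string or whitespace)
standalone = re.compile(r'(?<!\S)GBP(?!\S)')

result = standalone.sub('£', text)
print(result)  # £ 5 Off when you spend GBP75.00 or £
```

GBP75.00 is left untouched because the character to the right of GBP is not whitespace.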

Solution 5 - Python

I think you're looking for '(^|\W)GBP([\W\d])'

Solution 6 - Python

Yes, why not?

re.sub(u'^\W*GBP...

matches the start of the string, zero or more non-word characters (which includes whitespace), then GBP...

edit: Oh, I think you want alternation, use the |:

re.sub(u'(^|\W)GBP...

Solution 7 - Python

You can always trim leading and trailing whitespace from the token before you search if it's not a matching/grouping situation that requires the full line.
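One way to read this suggestion, as a sketch only: if each token is handled separately, stripping it first means a plain ^ anchor is enough. The helper name `replace_leading_gbp` is hypothetical, and the lookahead reuses the question's [\W\d] condition.

```python
import re

def replace_leading_gbp(token):
    """Strip surrounding whitespace, then replace a leading GBP.

    Hypothetical helper: assumes the caller passes one token at a time.
    """
    stripped = token.strip()
    # After stripping, GBP can only be "delimited" on the left by the
    # start of the string, so ^ is sufficient.
    return re.sub(r'^GBP(?=[\W\d])', '£', stripped)

print(replace_leading_gbp('  GBP75.00 '))  # £75.00
print(replace_leading_gbp('XGBP5'))        # XGBP5
```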

Solution 8 - Python

It works in Perl:

$text = 'GBP 5 off when you spend GBP75';
$text =~ s/(\W|^)GBP([\W\d])/$1\$$2/g;
print "$text\n";

The output is:

$ 5 off when you spend $75

Note that I stipulated that the match should be global, to get all occurrences.
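For comparison with Perl's /g flag: Python's re.sub replaces every occurrence by default, and its count argument limits the number of replacements. A minimal sketch using the same pattern with a £ replacement (the variable names are illustrative):

```python
import re

text = 'GBP 5 off when you spend GBP75'
pattern = r'(\W|^)GBP([\W\d])'

# Default (count=0): all occurrences are replaced, like s///g in Perl.
all_replaced = re.sub(pattern, r'\1£\2', text)
print(all_replaced)  # £ 5 off when you spend £75

# count=1: only the first occurrence, like s/// without /g.
first_only = re.sub(pattern, r'\1£\2', text, count=1)
print(first_only)  # £ 5 off when you spend GBP75
```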

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mat | View Question on Stackoverflow
Solution 1 - Python | Zach Scrivena | View Answer on Stackoverflow
Solution 2 - Python | Motti | View Answer on Stackoverflow
Solution 3 - Python | Martijn Laarman | View Answer on Stackoverflow
Solution 4 - Python | Wiktor Stribiżew | View Answer on Stackoverflow
Solution 5 - Python | Christoph | View Answer on Stackoverflow
Solution 6 - Python | Svante | View Answer on Stackoverflow
Solution 7 - Python | duffymo | View Answer on Stackoverflow
Solution 8 - Python | joel.neely | View Answer on Stackoverflow