Handling backreferences to capturing groups in re.sub replacement pattern

Python, Regex

Python Problem Overview


I want to take the string '0.71331, 52.25378' and return '0.71331,52.25378', i.e. find a digit, a comma, a space and a digit, and strip out the space.

This is my current code:

import re

coords = '0.71331, 52.25378'
coord_re = re.sub("(\d), (\d)", "\1,\2", coords)
print(coord_re)

But this gives me 0.7133,2.25378. What am I doing wrong?

Python Solutions


Solution 1 - Python

You should be using raw strings for regular expressions. Try the following:

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

With your current code, the backslashes in your replacement string are escaping the digits: Python reads \1 and \2 as octal escape sequences, so you are replacing every match with the equivalent of chr(1) + "," + chr(2):

>>> '\1,\2'
'\x01,\x02'
>>> print('\1,\2')
,
>>> print(r'\1,\2')   # this is what you actually want
\1,\2

Any time you want to leave the backslash in the string, use the r prefix, or escape each backslash (\\1,\\2).
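
To see the difference end to end, here is a minimal sketch in Python 3 syntax comparing the three ways of writing the replacement string. The variable names (pattern, broken, fixed_raw, fixed_escaped) are illustrative and not part of the original answer.

```python
import re

coords = '0.71331, 52.25378'
pattern = r"(\d), (\d)"

# Non-raw replacement: '\1' and '\2' are octal escapes, i.e. chr(1) and chr(2),
# so re.sub never sees a backreference.
broken = re.sub(pattern, "\1,\2", coords)

# Raw string: the backslashes reach re.sub intact and act as backreferences.
fixed_raw = re.sub(pattern, r"\1,\2", coords)

# Doubled backslashes: produces the same string as the raw form.
fixed_escaped = re.sub(pattern, "\\1,\\2", coords)

print(repr(broken))     # repr() makes the control characters visible
print(fixed_raw)
print(fixed_escaped)
```

Printing broken with repr() rather than a plain print makes the otherwise invisible control characters show up, which is often the quickest way to spot this kind of mistake.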

Solution 2 - Python

Python interprets the \1 as a character with ASCII value 1, and passes that to sub.

Use raw strings, in which Python doesn't treat the \ as the start of an escape sequence.

coord_re = re.sub(r"(\d), (\d)", r"\1,\2", coords)

This is covered right in the beginning of the re documentation, should you need more info.
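
As a quick check of what actually gets passed to sub, the following sketch inspects both string literals directly, with no regex involved. The names non_raw and raw are illustrative only.

```python
non_raw = "\1,\2"   # escapes are interpreted by Python before re.sub sees them
raw = r"\1,\2"      # backslashes are kept literally

# The non-raw literal holds three characters: chr(1), a comma, chr(2).
print(len(non_raw), [ord(c) for c in non_raw])

# The raw literal holds five characters: backslash, '1', comma, backslash, '2'.
print(len(raw), list(raw))

# A raw string is just another spelling of the doubled-backslash form.
print(raw == "\\1,\\2")
```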

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Richard | View Question on Stack Overflow
Solution 1 - Python | Andrew Clark | View Answer on Stack Overflow
Solution 2 - Python | Petr Viktorin | View Answer on Stack Overflow