Use Python's string.replace vs re.sub

Python, Regex

Python Problem Overview


For Python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements?

In PHP, this was explicitly stated but I can't find a similar note for Python.

Python Solutions


Solution 1 - Python

As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
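To make the escaping pitfall concrete, here is a minimal sketch. The sample string and variable names are made up for illustration, and it assumes Python 3 syntax, although the behavior shown is the same idea in 2.x.

```python
import re

# Hypothetical sample text; any string containing "." would do.
text = "Price: 3.50 (approx.)"

# str.replace treats the search string literally.
literal = text.replace(".", ",")

# In a regex, an unescaped "." matches any character except a newline,
# so every character in the string gets replaced.
unescaped = re.sub(".", ",", text)

# re.escape turns the pattern back into a literal match.
escaped = re.sub(re.escape("."), ",", text)

print(literal)    # Price: 3,50 (approx,)
print(unescaped)  # a run of commas as long as the input
print(escaped)    # same as the str.replace result
```

With str.replace() there is nothing to escape, so this class of mistake cannot occur.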

Solution 2 - Python

Use str.replace() whenever it can do the job. It's more explicit, simpler, and faster.

In [1]: import re

In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""

In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop

In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop
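The session above relies on IPython's timeit magic. A rough equivalent with the standard-library timeit module might look like the sketch below. The function names and the iteration count are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

text = (
    "For python 2.5, 2.6, should I be using string.replace or re.sub "
    "for basic text replacements.\n"
    "In PHP, this was explicitly stated but I can't find a similar note "
    "for python.\n"
)

# Precompile the pattern so the regex timing excludes compilation.
pattern = re.compile("e")

def with_replace():
    return text.replace("e", "X")

def with_sub():
    return pattern.sub("X", text)

# Total seconds for 10,000 calls of each approach.
replace_seconds = timeit.timeit(with_replace, number=10000)
sub_seconds = timeit.timeit(with_sub, number=10000)

print("str.replace: %.4f s" % replace_seconds)
print("re.sub:      %.4f s" % sub_seconds)
```

Both functions produce the same string here, so the comparison is like for like.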

Solution 3 - Python

String manipulation is usually preferable to regex when you can figure out how to adapt it. Regex is incredibly powerful, but it's usually slower, and usually harder to write, debug, and maintain.

That being said, notice how many times "usually" appears in the above paragraph! It's possible (and I've seen it done) to write a zillion lines of string manipulation for something you could've done with a 20-character regex. It's also possible to waste valuable time using "efficient" string functions on tasks a good regex engine could do almost as fast. Then there's maintainability: Regex can be horribly complex, but sometimes a regex will be simpler and easier to read than a giant block of procedural code.

Regex is fantastic for its intended purpose: searching for highly-variable needles in highly-variable haystacks. Think of it as a precision torque wrench: It's the perfect tool for a specific set of jobs, but it makes a lousy hammer.

Some guidelines you should follow when you aren't sure what to use:

> - Is the pattern you're looking for highly static? For example, do you want to split a string on every comma, pipe, or tab?
> - Is resource efficiency more important than developer time? What are your priorities? Remember: Hardware is cheap, programmers are expensive.
> - Are you working with HTML, XML, or other context-free grammars? Don't forget that regex has limitations.
> - And my #1 rule of thumb: If you work on the problem for 5 minutes, can you rough out an idea for a non-regex approach?

If the answer to any of these questions is "yes", you probably want string manipulation. Otherwise, consider regex.
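As a rough illustration of the first guideline, the sketch below contrasts a static pattern with a variable one. The sample strings and names are invented for the example.

```python
import re

# Static needle: the delimiter is always a single comma,
# so a plain string method is enough.
row = "alice,bob,carol"
fields = row.split(",")

# Variable needle: "something that looks like a version number"
# has no fixed text to search for, which is where a regex fits.
sentence = "Tested on Python 2.5, 2.6 and 3.12."
versions = re.findall(r"\d+\.\d+", sentence)

print(fields)    # ['alice', 'bob', 'carol']
print(versions)  # ['2.5', '2.6', '3.12']
```

Finding the version numbers with string methods alone is doable, but it would likely take noticeably more code than the one-line pattern.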

Solution 4 - Python

Another thing to consider: if you need to replace or delete many individual characters at once, str.translate() might be what you're looking for.
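A minimal sketch of that approach follows. It assumes Python 3, where the table is built with str.maketrans(); Python 2 builds the table with string.maketrans() and works differently. The mapping itself is only an illustration.

```python
# Map several single characters to replacement strings in one pass.
# (Illustrative mapping; keys must be single characters.)
table = str.maketrans({"<": "&lt;", ">": "&gt;", "&": "&amp;"})

source = "a < b && c > d"
escaped = source.translate(table)
print(escaped)  # a &lt; b &amp;&amp; c &gt; d

# The three-argument form maps characters one to one and
# deletes every character listed in the third argument.
cleanup = str.maketrans("abc", "xyz", "!?")
print("aabbcc!?".translate(cleanup))  # xxyyzz
```

Because translate() walks the string once, the "&" characters it inserts as part of "&lt;" are not escaped a second time, which is easy to get wrong with a chain of replace() calls.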

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | wag2639 | View Question on Stack Overflow |
| Solution 1 - Python | Sven Marnach | View Answer on Stack Overflow |
| Solution 2 - Python | chmullig | View Answer on Stack Overflow |
| Solution 3 - Python | Justin Morgan | View Answer on Stack Overflow |
| Solution 4 - Python | jathanism | View Answer on Stack Overflow |