How to use a variable inside a regular expression?


Python Problem Overview


I'd like to use a variable inside a regex. How can I do this in Python?

import re
import sys

TEXTO = sys.argv[1]

if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    pass  # Successful match
else:
    pass  # Match attempt failed

Python Solutions


Solution 1 - Python

You have to build the regex as a string:

TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"

if re.search(my_regex, subject, re.IGNORECASE):
    ...  # handle the match

Note the use of re.escape so that if your text has special characters, they won't be interpreted as such.
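
As a minimal sketch of the approach above, the concatenation can be wrapped in a small helper. The function name contains_word and the sample strings are hypothetical and chosen only for illustration:

```python
import re


def contains_word(word, subject):
    """Return True if `word` occurs in `subject` as a whole word (case-insensitive)."""
    # re.escape makes sure characters such as "." are matched literally.
    pattern = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    return re.search(pattern, subject, re.IGNORECASE) is not None


print(contains_word("cat", "The Cat sat"))     # True
print(contains_word("cat", "concatenate"))     # False: "cat" is only part of a word
print(contains_word("a.b", "value a.b here"))  # True: the dot is matched literally
```

Because the pattern starts with \b(?=\w) and ends with \b(?!\w), this sketch only makes sense when the word begins and ends with a word character.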

Solution 2 - Python

From Python 3.6 on, you can also use literal string interpolation ("f-strings"). In your particular case, the solution would be:

if re.search(rf"\b(?=\w){TEXTO}\b(?!\w)", subject, re.IGNORECASE):
    ...  # do something

EDIT:

Since there have been some questions in the comments on how to deal with special characters, I'd like to extend my answer:

raw strings ('r'):

One of the main concepts you have to understand when dealing with special characters in regular expressions is to distinguish between string literals and the regular expression itself. It is very well explained here:

In short:

Let's say that, instead of finding a word boundary \b after TEXTO, you want to match the literal string \boundary. Then you have to write:

TEXTO = "Var"
subject = r"Var\boundary"

if re.search(rf"\b(?=\w){TEXTO}\\boundary(?!\w)", subject, re.IGNORECASE):
    print("match")

This only works because we are using a raw string (the regex literal is prefixed with r). Otherwise we would have to write "\\\\boundary" in the regex (four backslashes). Additionally, without the r prefix, \b would no longer be interpreted as a word boundary but as a backspace character!
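
The difference between the string literal and the pattern the regex engine receives can be checked directly. The following is a small sketch; the variable names are chosen only for illustration:

```python
import re

# In a normal string literal, "\b" is a single backspace character (ASCII 8).
plain = "\b"
# In a raw string literal, r"\b" is two characters: a backslash and the letter b.
raw = r"\b"

print(len(plain), len(raw))  # 1 2
print(plain == "\x08")       # True

# Both literals below produce the same pattern text: \\boundary
pattern_raw = r"\\boundary"
pattern_plain = "\\\\boundary"
print(pattern_raw == pattern_plain)  # True

# The pattern \\boundary matches one literal backslash followed by "boundary".
subject = r"Var\boundary"
print(re.search(pattern_raw, subject) is not None)  # True
```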

re.escape:

Basically, it puts a backslash in front of any special character. Hence, if you expect a special character in TEXTO, you need to write:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)", subject, re.IGNORECASE):
    print("match")

NOTE: From Python 3.7 on, !, ", %, ', ,, /, :, ;, <, =, >, @, and ` are no longer escaped. Only characters that can have a special meaning in a regex are still escaped. _ has not been escaped since Python 3.3.
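
To see why escaping matters, here is a short sketch comparing the pattern with and without re.escape. The value "1.5" and the subject strings are made up for illustration:

```python
import re

TEXTO = "1.5"
print(re.escape(TEXTO))  # 1\.5

unescaped = rf"\b(?=\w){TEXTO}\b(?!\w)"
escaped = rf"\b(?=\w){re.escape(TEXTO)}\b(?!\w)"

# Without escaping, "." matches any character, so "125" is accepted too.
print(bool(re.search(unescaped, "version 125")))  # True
print(bool(re.search(escaped, "version 125")))    # False
print(bool(re.search(escaped, "version 1.5")))    # True
```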

Curly braces:

If you want to use quantifiers within the regular expression using f-strings, you have to use double curly braces. Let's say you want to match TEXTO followed by exactly 2 digits:

if re.search(rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)", subject, re.IGNORECASE):
    print("match")
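
Printing the resulting pattern shows that the doubled braces collapse into a normal quantifier. This is a small sketch with a made-up value for TEXTO and made-up subject strings:

```python
import re

TEXTO = "item"
pattern = rf"\b(?=\w){re.escape(TEXTO)}\d{{2}}\b(?!\w)"

# {{2}} in the f-string becomes {2} in the final pattern.
print(pattern)  # \b(?=\w)item\d{2}\b(?!\w)

print(bool(re.search(pattern, "see item42 here")))   # True: exactly two digits
print(bool(re.search(pattern, "see item4 here")))    # False: only one digit
print(bool(re.search(pattern, "see item421 here")))  # False: three digits
```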

Solution 3 - Python

if re.search(r"\b(?=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):
    ...  # successful match

This will insert what is in TEXTO into the regex as a string.

Solution 4 - Python

rx = r'\b(?=\w){0}\b(?!\w)'.format(TEXTO)

Solution 5 - Python

I find it very convenient to build a regular expression pattern by stringing together multiple smaller patterns.

import re

string = "begin:id1:tag:middl:id2:tag:id3:end"
re_str1 = r'(?<=(\S{5})):'
re_str2 = r'(id\d+):(?=tag:)'
re_pattern = re.compile(re_str1 + re_str2)
match = re_pattern.findall(string)
print(match)

Output:

[('begin', 'id1'), ('middl', 'id2')]

Solution 6 - Python

I agree with all of the above, unless sys.argv[1] is itself a regular expression, such as Chicken\d{2}-\d{2}An\s*important\s*anchor:

sys.argv[1] = r"Chicken\d{2}-\d{2}An\s*important\s*anchor"

In that case you would not want to use re.escape, because you want the argument to behave like a regex:

TEXTO = sys.argv[1]

# Both literals must be raw strings; a plain "\b" would be a backspace character.
if re.search(r"\b(?=\w)" + TEXTO + r"\b(?!\w)", subject, re.IGNORECASE):
    pass  # Successful match
else:
    pass  # Match attempt failed
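
When the argument is treated as a regex, it may also be an invalid one. The following is a sketch of one way to handle that, using the lookahead form from the question. The helper name search_user_pattern is hypothetical, and returning None for an invalid pattern is an assumption rather than part of the original answer:

```python
import re


def search_user_pattern(user_pattern, subject):
    """Treat `user_pattern` as a regex fragment; return the match object or None."""
    try:
        compiled = re.compile(r"\b(?=\w)" + user_pattern + r"\b(?!\w)", re.IGNORECASE)
    except re.error:
        # Assumption: an invalid user pattern is reported as "no match".
        return None
    return compiled.search(subject)


user_pattern = r"Chicken\d{2}-\d{2}An\s*important\s*anchor"
match = search_user_pattern(user_pattern, "Chicken12-34An important anchor")
print(match is not None)  # True

# An unbalanced parenthesis makes the combined pattern invalid.
print(search_user_pattern("(", "anything"))  # None
```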

Solution 7 - Python

You can also try another approach, using str.format as syntactic sugar:

re_genre = r'{}'.format(your_variable)
regex_pattern = re.compile(re_genre)  

Solution 8 - Python

I needed to search for usernames that are similar to each other, and what Ned Batchelder said was incredibly helpful. However, I found I had cleaner output when I used re.compile to create my re search term:

pattern = re.compile(r"("+username+".*):(.*?):(.*?):(.*?):(.*)")
matches = re.findall(pattern, lines)

Output can be printed using the following:

print(matches[0])  # prints the tuple of groups for one whole matching line (in this case, the first line)
print(matches[0][3])  # prints the fourth capture group (established with the parentheses in the regex) of the first line
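
Here is a self-contained sketch of the same idea. The colon-separated sample data is invented for illustration, and re.escape is added as a precaution in case the username contains regex metacharacters:

```python
import re

username = "alice"
# Hypothetical input: colon-separated records, one per line.
lines = (
    "alice:x:1000:1000:First user\n"
    "alice2:x:1001:1001:Second user\n"
    "bob:x:1002:1002:Third user\n"
)

pattern = re.compile(r"(" + re.escape(username) + r".*):(.*?):(.*?):(.*?):(.*)")
matches = pattern.findall(lines)

print(len(matches))   # 2: "alice" and "alice2" match, "bob" does not
print(matches[0])     # ('alice', 'x', '1000', '1000', 'First user')
print(matches[0][3])  # 1000
```

With several capture groups, findall returns one tuple per match, which is why the individual fields can be indexed directly.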

Solution 9 - Python

Here's another format you can use (tested on Python 3.7):

regex_str = r'\b(?=\w)%s\b(?!\w)'%TEXTO

I find it useful when you can't use {} as the placeholder for the variable (here it is replaced with %s).
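
One situation where {} gets in the way is a pattern that contains a quantifier such as {2}. The sketch below uses a made-up value for TEXTO and a simplified pattern to compare the two styles:

```python
import re

TEXTO = "id"

# %-formatting leaves the {2} quantifier untouched.
pattern = r"\b%s\d{2}\b" % re.escape(TEXTO)
print(pattern)  # \bid\d{2}\b

# str.format reads {2} as a replacement field and fails with one argument.
format_failed = False
try:
    r"\b{0}\d{2}\b".format(re.escape(TEXTO))
except IndexError:
    format_failed = True
print(format_failed)  # True

# With str.format, the quantifier braces have to be doubled instead.
pattern_fmt = r"\b{0}\d{{2}}\b".format(re.escape(TEXTO))
print(pattern == pattern_fmt)  # True
```

Note that with %-formatting a literal percent sign in the pattern would have to be written as %%.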

Solution 10 - Python

You can use the format method for this as well. It replaces the {} placeholder with the variable you pass to it as an argument.

if re.search(r"\b(?=\w){}\b(?!\w)".format(TEXTO), subject, re.IGNORECASE):
    pass  # Successful match
else:
    pass  # Match attempt failed

Solution 11 - Python

One more example.

I have a configus.yml file with settings for my flow files:

"pattern":
  - _(\d{14})_
"datetime_string":
  - "%m%d%Y%H%M%f"

In the Python code I use:

data_time_real_file = re.findall(flows[flow]["pattern"][0], latest_file)
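
A runnable sketch of this idea follows. To stay free of dependencies, a plain dict stands in for the parsed YAML file; the flow name "sales" and the file name are hypothetical:

```python
import re

# Stand-in for the parsed configus.yml content (assumed structure).
flows = {
    "sales": {
        "pattern": [r"_(\d{14})_"],
        "datetime_string": ["%m%d%Y%H%M%f"],
    }
}

flow = "sales"
latest_file = "report_03152024103045_final.csv"

# The pattern comes from the configuration, not from a literal in the code.
data_time_real_file = re.findall(flows[flow]["pattern"][0], latest_file)
print(data_time_real_file)  # ['03152024103045']
```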

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Pedro Lobito | View Question on Stackoverflow
Solution 1 - Python | Ned Batchelder | View Answer on Stackoverflow
Solution 2 - Python | airborne | View Answer on Stackoverflow
Solution 3 - Python | Bo Buchanan | View Answer on Stackoverflow
Solution 4 - Python | Cat Plus Plus | View Answer on Stackoverflow
Solution 5 - Python | Deepak Nagarajan | View Answer on Stackoverflow
Solution 6 - Python | Max Carroll | View Answer on Stackoverflow
Solution 7 - Python | Kevin Chou | View Answer on Stackoverflow
Solution 8 - Python | jdelaporte | View Answer on Stackoverflow
Solution 9 - Python | Ardhi | View Answer on Stackoverflow
Solution 10 - Python | Haneef Mohammed | View Answer on Stackoverflow
Solution 11 - Python | Nikolay Baranenko | View Answer on Stackoverflow