Type of compiled regex object in Python

Python Regex Types

Python Problem Overview


What is the type of a compiled regular expression in Python?

In particular, I want to evaluate

isinstance(re.compile(''), ???)

to be true, for introspection purposes.

One solution I had was to define a global constant REGEX_TYPE = type(re.compile('')), but it doesn't seem very elegant.

EDIT: The reason I want to do this is that I have a list of strings and compiled regex objects. I want to "match" a string against the list, by

  • for each string in the list, try to check for string equality.
  • for each regex in the list, try to check whether the string matches the given pattern.

and the code that I came up with was:

for allowed in alloweds:
    if isinstance(allowed, basestring) and allowed == input:
        ignored = False
        break
    elif isinstance(allowed, REGEX_TYPE) and allowed.match(input):
        ignored = False
        break

Python Solutions


Solution 1 - Python

Python 3.5 introduced the typing module. Included therein is typing.Pattern, a _TypeAlias.

Starting with Python 3.6, you can simply do:

import re
from typing import Pattern

my_re = re.compile('foo')
assert isinstance(my_re, Pattern)

In 3.5, there used to be a bug requiring you to do this:

assert issubclass(type(my_re), Pattern)

That workaround isn’t guaranteed to work, according to the documentation and test suite.

Solution 2 - Python

When the type of something isn't well specified, there's nothing wrong with using the type builtin to discover the answer at runtime:

>>> import re
>>> retype = type(re.compile('hello, world'))
>>> isinstance(re.compile('goodbye'), retype)
True
>>> isinstance(12, retype)
False

Discovering the type at runtime protects you from having to access private attributes and against future changes to the return type. There's nothing inelegant about using type here, though there may be something inelegant about wanting to know the type at all.

That said, with the passage of time, the context of this question has shifted. With contemporary versions of Python, the return type of re.compile is now re.Pattern.

The general question about what to do if the type of something is not well-specified is still valid but in this particular case, the type of re.compile(...) is now well-specified.
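As a rough sketch of how the two ideas above can be combined, the snippet below prefers re.Pattern when the interpreter provides it and otherwise falls back to discovering the type at runtime. It assumes Python 3 (so str stands in for basestring), and the function name is_allowed and the parameter name text are made up for illustration; they are not from the question.

```python
import re

# Prefer the public re.Pattern when this interpreter has it; otherwise fall
# back to discovering the type at runtime, as described above.
REGEX_TYPE = getattr(re, "Pattern", type(re.compile("")))


def is_allowed(text, alloweds):
    """Hypothetical helper: True if text equals an allowed string
    or matches an allowed compiled regex."""
    for allowed in alloweds:
        if isinstance(allowed, str):
            if allowed == text:
                return True
        elif isinstance(allowed, REGEX_TYPE) and allowed.match(text):
            return True
    return False


alloweds = ["exact", re.compile(r"ab+c")]
print(is_allowed("exact", alloweds))  # True
print(is_allowed("abbbc", alloweds))  # True
print(is_allowed("other", alloweds))  # False
```

Returning from a function also removes the need for the ignored flag and the duplicated break in the question's loop.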

Solution 3 - Python

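It is possible to check a compiled regular expression against the private type re._pattern_type:

import re
pattern = r'aa'
compiled_re = re.compile(pattern)
print(isinstance(compiled_re, re._pattern_type))

This prints True, at least in Python 2.7. Because re._pattern_type is a private attribute, it may not exist in other versions; recent Python 3 releases expose re.Pattern instead.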

Solution 4 - Python

Disclaimer: This isn't intended as a direct answer for your specific needs, but rather as an alternative approach that may be useful.


You can keep with the ideals of duck typing, and use hasattr to determine if the object has certain properties that you want to utilize. For example, you could do something like:

if hasattr(possibly_a_re_object, "match"):
    # Treat it like it's an re object
    possibly_a_re_object.match(thing_to_match_against)
else:
    # alternative handler
    pass
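Applied to the question's mixed list, the duck-typing idea might look like the sketch below. It assumes the alternative handler is plain equality, which is what the question does for strings; the helper name matches is hypothetical.

```python
import re


def matches(candidate, text):
    """Hypothetical helper: use .match() when the object offers it,
    otherwise compare for equality."""
    if hasattr(candidate, "match"):
        # Anything with a match() method is treated like a compiled regex.
        return candidate.match(text) is not None
    # Everything else (for example a plain string) falls back to equality.
    return candidate == text


alloweds = ["exact", re.compile(r"ab+c")]
ignored = not any(matches(allowed, "abbc") for allowed in alloweds)
print(ignored)  # False
```

This works because str has no match attribute, so strings always take the equality branch.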

Solution 5 - Python

Prevention is better than cure. Don't create such a heterogeneous list in the first place. Have a set of allowed strings and a list of compiled regex objects. This should make your checking code look better and run faster:

if input in allowed_strings:
    ignored = False
else:
    for allowed in allowed_regexed_objects:
        if allowed.match(input):
            ignored = False
            break

If you can't avoid the creation of such a list, see if you have the opportunity to examine it once and build the two replacement objects.
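A minimal sketch of that one-time split, assuming Python 3 and assuming that every non-string item in the mixed list is a compiled regex. The helper names split_alloweds and is_allowed are invented for this example.

```python
import re


def split_alloweds(alloweds):
    """Examine the mixed list once and build the two replacement objects."""
    allowed_strings = set()
    allowed_regexes = []
    for allowed in alloweds:
        if isinstance(allowed, str):
            allowed_strings.add(allowed)
        else:
            # Assumption: anything that is not a string is a compiled regex.
            allowed_regexes.append(allowed)
    return allowed_strings, allowed_regexes


def is_allowed(text, allowed_strings, allowed_regexes):
    """Set lookup first, then fall back to trying each regex."""
    if text in allowed_strings:
        return True
    return any(regex.match(text) for regex in allowed_regexes)


mixed = ["exact", re.compile(r"ab+c"), "other"]
allowed_strings, allowed_regexes = split_alloweds(mixed)
print(is_allowed("exact", allowed_strings, allowed_regexes))  # True
print(is_allowed("abbc", allowed_strings, allowed_regexes))   # True
print(is_allowed("nope", allowed_strings, allowed_regexes))   # False
```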

Solution 6 - Python

As an illustration of polymorphism, an alternate solution is to create wrapper classes which implement a common method.

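import re

class Stringish(str):
    def matches(self, input):
        return self == input

class Regexish(object):
    def __init__(self, pattern):
        # re is a module and cannot be subclassed, so wrap a compiled pattern.
        self.regex = re.compile(pattern)

    def matches(self, input):
        return self.regex.match(input)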

Now your code can iterate over a list of alloweds containing objects instantiating either of these two classes completely transparently:

for allowed in alloweds:
    if allowed.matches(input):
        ignored = False
        break

Notice also how some code duplication goes away (though your original code could have been refactored to fix that separately).

Solution 7 - Python

FYI, an example of such code is in BeautifulSoup (http://www.crummy.com/software/BeautifulSoup), which uses the hasattr technique. In the spirit of the "alternative approach", you might also encapsulate your string search in a regexp by doing regexp = re.compile(re.escape(your_string)), so that the list contains only regular expressions.
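Here is a small sketch of the re.escape idea. One addition that is not in the answer: since match() only anchors at the start of the string, the sketch appends \Z to each escaped string so that it behaves like an equality test. The names as_regex and is_allowed are hypothetical.

```python
import re


def as_regex(allowed):
    """Turn a plain string into a compiled regex that matches only that
    exact string; leave compiled regexes untouched."""
    if isinstance(allowed, str):
        # re.escape neutralizes special characters; \Z anchors the end,
        # so match() behaves like equality for these entries.
        return re.compile(re.escape(allowed) + r"\Z")
    return allowed


patterns = [as_regex(item) for item in ["a.b", re.compile(r"x+")]]


def is_allowed(text):
    return any(pattern.match(text) for pattern in patterns)


print(is_allowed("a.b"))   # True
print(is_allowed("axb"))   # False
print(is_allowed("a.bc"))  # False
print(is_allowed("xxx"))   # True
```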

Solution 8 - Python

In Python 3.7 and later, you can use re.Pattern:

import re
rr = re.compile("pattern")
isinstance(rr, re.Pattern)
>> True

Solution 9 - Python

This is another response that doesn't answer the question but does solve the problem. Unless your_string contains regular expression special characters,

if re.match(your_string, target_string):

has nearly the same effect as

if your_string == target_string:

The difference is that re.match only anchors at the start of target_string, so a target that merely begins with your_string also matches. So drop back one step and use uncompiled regular expression patterns in your list of allowed items. This is likely slower than using compiled regular expressions, but it will work with only the occasional unexpected outcome, and that only if you allow users to supply the allowed items.
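Before relying on this, it may help to see where re.match and == disagree. The quick check below uses made-up strings; re.fullmatch (available since Python 3.4) and re.escape are shown as possible ways to close the gaps, not as part of the original answer.

```python
import re

allowed = "foo"

# re.match only anchors at the start, so a longer target still matches.
prefix_matches = re.match(allowed, "foobar") is not None
# re.fullmatch requires the whole target to match.
exact_only = re.fullmatch(allowed, "foobar") is not None

# An unescaped dot is a wildcard; re.escape makes it literal.
dot_is_wildcard = re.match("a.c", "abc") is not None
escaped_dot = re.match(re.escape("a.c"), "abc") is not None

print(prefix_matches)   # True
print(exact_only)       # False
print(dot_is_wildcard)  # True
print(escaped_dot)      # False
```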

Solution 10 - Python

>>> import re
>>> regex = re.compile('foo')
>>> regex
<_sre.SRE_Pattern object at 0x10035d960>

Well, _sre is the C extension that does the pattern matching; you may look in the _sre C source.

Why do you care?

Or you can try something like this (for whatever reason):

>>> regex1 = re.compile('bar')
>>> regex2 = re.compile('foo')
>>> type(regex1) == type(regex2)
True

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Jeeyoung Kim | View Question on Stack Overflow
Solution 1 - Python | flying sheep | View Answer on Stack Overflow
Solution 2 - Python | Jean-Paul Calderone | View Answer on Stack Overflow
Solution 3 - Python | alemol | View Answer on Stack Overflow
Solution 4 - Python | bgw | View Answer on Stack Overflow
Solution 5 - Python | John Machin | View Answer on Stack Overflow
Solution 6 - Python | tripleee | View Answer on Stack Overflow
Solution 7 - Python | Scout | View Answer on Stack Overflow
Solution 8 - Python | DominiCane | View Answer on Stack Overflow
Solution 9 - Python | griswolf | View Answer on Stack Overflow
Solution 10 - Python | Andreas Jung | View Answer on Stack Overflow