Named regular expression group "(?P<group_name>regexp)": what does "P" stand for?
Python Problem Overview
In Python, the (?P<group_name>…) syntax allows one to refer to the matched string through its name:
>>> import re
>>> match = re.search('(?P<name>.*) (?P<phone>.*)', 'John 123456')
>>> match.group('name')
'John'
What does "P" stand for? I could not find any hint in the official documentation.
I would love to get ideas about how to help my students remember this syntax. Knowing what "P" does stand for (or might stand for) would be useful.
Python Solutions
Solution 1 - Python
Since we're all guessing, I might as well give mine: I've always thought it stood for Python. That may sound pretty stupid -- what, P for Python?! -- but in my defense, I vaguely remembered this thread [emphasis mine]:
> Subject: Claiming (?P...) regex syntax extensions
> From: Guido van Rossum ([email protected])
> Date: Dec 10, 1997 3:36:19 pm
>
> I have an unusual request for the Perl developers (those that develop
> the Perl language). I hope this (perl5-porters) is the right list. I
> am cc'ing the Python string-sig because it is the origin of most of
> the work I'm discussing here.
>
> You are probably aware of Python. I am Python's creator; I am
> planning to release a next "major" version, Python 1.5, by the end of
> this year. I hope that Python and Perl can co-exist in years to come;
> cross-pollination can be good for both languages. (I believe Larry
> had a good look at Python when he added objects to Perl 5; O'Reilly
> publishes books about both languages.)
>
> As you may know, Python 1.5 adds a new regular expression module that
> more closely matches Perl's syntax. We've tried to be as close to the
> Perl syntax as possible within Python's syntax. However, the regex
> syntax has some Python-specific extensions, which all begin with (?P .
> Currently there are two of them:
>
> (?P<foo>...)
> Similar to regular grouping parentheses, but the text
> matched by the group is accessible after the match has been performed,
> via the symbolic group name "foo".
>
> (?P=foo)
> Matches the same string as that matched by the group named
> "foo". Equivalent to \1, \2, etc. except that the group is referred
> to by name, not number.
>
> I hope that this Python-specific extension won't conflict with any
> future Perl extensions to the Perl regex syntax. If you have plans to
> use (?P, please let us know as soon as possible so we can resolve the
> conflict. Otherwise, it would be nice if the (?P syntax could be
> permanently reserved for Python-specific syntax extensions. (Is
> there some kind of registry of extensions?)
to which Larry Wall replied:
> [...] There's no registry as of now--yours is the first request from
> outside perl5-porters, so it's a pretty low-bandwidth activity.
> (Sorry it was even lower last week--I was off in New York at Internet
> World.)
>
> Anyway, as far as I'm concerned, you may certainly have 'P' with my
> blessing. (Obviously Perl doesn't need the 'P' at this point. :-) [...]
So I don't know what the original choice of P was motivated by -- pattern? placeholder? penguins? -- but you can understand why I've always associated it with Python. Which, considering that (1) I don't like regular expressions and avoid them wherever possible, and (2) this thread happened fifteen years ago, is kind of odd.
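For anyone who wants to try the two extensions from Guido's message, here is a small sketch. The group name `word` and the sample sentence are my own, chosen only for illustration; the pattern looks for a word that is immediately repeated.

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again.
doubled = re.compile(r'\b(?P<word>\w+) (?P=word)\b')

match = doubled.search('Paris in the the spring')
print(match.group('word'))   # the
print(match.group(1))        # the  (a named group still gets a number)
print(match.group(0))        # the the
```

As the message says, `(?P=word)` behaves like `\1` here, except that the group is referred to by name.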
Solution 2 - Python
Python Extension. From the Python Docs:
> The solution chosen by the Perl developers was to use (?...) as the
> extension syntax. ? immediately after a parenthesis was a syntax error
> because the ? would have nothing to repeat, so this didn’t introduce
> any compatibility problems. The characters immediately after the ?
> indicate what extension is being used, so (?=foo) is one thing (a
> positive lookahead assertion) and (?:foo) is something else (a
> non-capturing group containing the subexpression foo).
>
> Python supports several of Perl’s extensions and adds an extension
> syntax to Perl’s extension syntax. If the first character after the
> question mark is a P, you know that it’s an extension that’s specific
> to Python.
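To see the extensions mentioned in that passage side by side, here is a rough sketch that reuses the 'John 123456' string from the question. The variable names are mine and only illustrative.

```python
import re

text = 'John 123456'

# (?:...) groups without capturing, so only the digits get a group number.
noncap = re.search(r'(?:\w+) (\d+)', text)
print(noncap.groups())      # ('123456',)

# (?=...) is a positive lookahead: it checks what follows without consuming it.
ahead = re.search(r'\w+(?= \d+)', text)
print(ahead.group(0))       # John

# (?P<name>...) is the P-prefixed extension: a capturing group with a name.
named = re.search(r'(?P<name>\w+) (?P<phone>\d+)', text)
print(named.groupdict())    # {'name': 'John', 'phone': '123456'}
```

In each case the character right after the `?` selects the behaviour, which is the point the quoted passage makes.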
Solution 3 - Python
Pattern! The group names a (sub)pattern for later use in the regex. See the documentation here for details about how such groups are used.