Use Python format string in reverse for parsing

Python

Python Problem Overview


I've been using the following Python code to format an integer part ID as a part-number string:

pn = 'PN-{:0>9}'.format(id)

I would like to know if there is a way to use that same format string ('PN-{:0>9}') in reverse to extract the integer ID from the formatted part number. If that can't be done, is there a way to use a single format string (or regex) to both create and parse part numbers?

Python Solutions


Solution 1 - Python

The third-party parse module (installed with pip install parse) describes itself as "the opposite of format()".

Example usage:

>>> import parse
>>> format_string = 'PN-{:0>9}'
>>> id = 123
>>> pn = format_string.format(id)
>>> pn
'PN-000000123'
>>> parsed = parse.parse(format_string, pn)
>>> parsed
<Result ('123',) {}>
>>> parsed[0]
'123'
>>> int(parsed[0])
123

Solution 2 - Python

You might find the "Simulating scanf()" section of the Python re module documentation interesting.
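That section lists regular-expression equivalents for scanf() directives, for example %d as [-+]?\d+ and %s as \S+. Below is a minimal sketch of the idea, assuming only %d and %s need to be supported; scanf here is a hypothetical helper, not a standard-library function.

```python
import re

# Regex equivalents for two scanf() directives, taken from the
# "Simulating scanf()" table in the re module documentation.
_DIRECTIVES = {
    "%d": (r"([-+]?\d+)", int),
    "%s": (r"(\S+)", str),
}


def scanf(fmt, s):
    """Hypothetical helper: match s against a format using only %d and %s.

    Returns a tuple of converted values, or None if s does not match.
    """
    pattern = []
    converters = []
    # re.split with a capturing group keeps the directives in the result list
    for piece in re.split(r"(%[ds])", fmt):
        if piece in _DIRECTIVES:
            regex, convert = _DIRECTIVES[piece]
            pattern.append(regex)
            converters.append(convert)
        else:
            pattern.append(re.escape(piece))  # literal text, e.g. "PN-"
    m = re.fullmatch("".join(pattern), s)
    if m is None:
        return None
    return tuple(convert(g) for convert, g in zip(converters, m.groups()))


print(scanf("PN-%d", "PN-000000123"))  # (123,)
```

Because %d captures the digits and int() converts them, the zero padding disappears. Note that this uses a separate scanf-style format rather than the original 'PN-{:0>9}' string.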

Solution 3 - Python

Here's a solution in case you don't want to use the parse module. It converts format strings into regular expressions with named groups. It makes a few assumptions (described in the docstring) that were okay in my case, but may not be okay in yours.

import re

def match_format_string(format_str, s):
    """Match s against the given format string, return dict of matches.

    We assume all of the arguments in the format string are named keyword arguments (i.e. no {} or
    {:0.2f}). We also assume that all chars are allowed in each keyword argument, so separators
    need to be present which aren't present in the keyword arguments (i.e. '{one}{two}' won't work
    reliably as a format string but '{one}-{two}' will if the hyphen isn't used in {one} or {two}).

    We raise ValueError if the format string does not match s.

    Example:
    fs = '{test}-{flight}-{go}'
    s = fs.format(test='first', flight='second', go='third')
    match_format_string(fs, s) -> {'test': 'first', 'flight': 'second', 'go': 'third'}
    """

    # First split on any keyword arguments; note that the names of keyword arguments will be in the
    # 1st, 3rd, ... positions in this list
    tokens = re.split(r'\{(.*?)\}', format_str)
    keywords = tokens[1::2]

    # Now replace keyword arguments with named groups matching them. We also escape the text between
    # keyword arguments so we support meta-characters there. Re-join tokens to form our regexp pattern
    tokens[1::2] = map('(?P<{}>.*)'.format, keywords)
    tokens[0::2] = map(re.escape, tokens[0::2])
    pattern = ''.join(tokens)

    # Match the whole string (fullmatch, so trailing text is rejected); raise if it doesn't match
    matches = re.fullmatch(pattern, s)
    if not matches:
        raise ValueError("Format string did not match")

    # Return a dict with all of our keywords and their values
    return {x: matches.group(x) for x in keywords}
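A variation on the same idea can lift the "named fields only" restriction, so that the question's own 'PN-{:0>9}' works. The sketch below tokenizes the format string with the standard library's string.Formatter().parse() instead of a hand-written regex. parse_with_format is a hypothetical name, and it assumes each field holds a value that convert can rebuild (int by default), that literal text separates adjacent fields, and that no !r/!s conversions or nested fields are used. Since the width 9 in 0>9 is a minimum rather than an exact length (format(1234567890, '0>9') gives '1234567890'), each field is captured loosely and then checked by re-formatting the value with its own spec.

```python
import re
import string


def parse_with_format(fmt, s, convert=int):
    """Hypothetical helper: reverse a simple format string such as 'PN-{:0>9}'.

    Assumptions: every field holds a value that `convert` can rebuild from its
    text (int by default), literal text separates adjacent fields, and fields
    use no !r/!s/!a conversions or nested {} inside the format spec.
    Returns the converted values in field order; raises ValueError on mismatch.
    """
    pattern = []
    specs = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for literal, field, spec, _conversion in string.Formatter().parse(fmt):
        pattern.append(re.escape(literal))
        if field is not None:
            pattern.append("(.+?)")  # one capture group per replacement field
            specs.append(spec)
    m = re.fullmatch("".join(pattern), s)
    if m is None:
        raise ValueError(f"{s!r} does not match {fmt!r}")
    texts = m.groups()
    values = [convert(text) for text in texts]
    # Round-trip check: re-format each value with its own spec, which rejects
    # text the format string could not have produced (e.g. missing padding).
    for value, spec, text in zip(values, specs, texts):
        if format(value, spec) != text:
            raise ValueError(f"{text!r} is not how {spec!r} renders {value!r}")
    return values


print(parse_with_format("PN-{:0>9}", "PN-000000123"))  # [123]
```

With this check, 'PN-123' is rejected for the format 'PN-{:0>9}' because format(123, '0>9') produces '000000123', not '123'.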

Solution 4 - Python

How about:

id = int(pn.split('-')[1])

This splits the part number at the dash, takes the second component, and converts it to an integer (int() ignores the leading zeros). It relies on the prefix containing exactly one dash.

P.S. I've kept id as the variable name so that the connection to your question is clear. It is a good idea to rename that variable so that it doesn't shadow the built-in function id().
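If the prefix is fixed, a slightly more defensive version of the same idea checks the prefix explicitly instead of relying on the position of the dash. This is a sketch; part_id_from_pn and its prefix parameter are hypothetical names, and the result is stored in part_id so that id() stays unshadowed.

```python
def part_id_from_pn(pn: str, prefix: str = "PN-") -> int:
    """Hypothetical helper: return the integer ID from a part number like 'PN-000000123'."""
    if not pn.startswith(prefix):
        raise ValueError(f"{pn!r} does not start with {prefix!r}")
    # int() drops the zero padding: '000000123' -> 123
    return int(pn[len(prefix):])


part_id = part_id_from_pn("PN-000000123")
print(part_id)  # 123
```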

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Josh | View Question on Stackoverflow
Solution 1 - Python | Brian Dorsey | View Answer on Stackoverflow
Solution 2 - Python | dugres | View Answer on Stackoverflow
Solution 3 - Python | nonagon | View Answer on Stackoverflow
Solution 4 - Python | NPE | View Answer on Stackoverflow