Extracting date from a string in Python


Python Problem Overview


How can I extract the date from a string like "monkey 2010-07-10 love banana"? Thanks!

Python Solutions


Solution 1 - Python

Using python-dateutil:

In [1]: import dateutil.parser as dparser

In [18]: dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Out[18]: datetime.datetime(2010, 7, 10, 0, 0)

Invalid dates raise a ValueError:

In [19]: dparser.parse("monkey 2010-07-32 love banana",fuzzy=True)
# ValueError: day is out of range for month

It can recognize dates in many formats:

In [20]: dparser.parse("monkey 20/01/1980 love banana",fuzzy=True)
Out[20]: datetime.datetime(1980, 1, 20, 0, 0)

Note that it makes a guess if the date is ambiguous:

In [23]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True)
Out[23]: datetime.datetime(1980, 10, 1, 0, 0)

But the way it parses ambiguous dates is customizable:

In [21]: dparser.parse("monkey 10/01/1980 love banana",fuzzy=True, dayfirst=True)
Out[21]: datetime.datetime(1980, 1, 10, 0, 0)
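The ambiguity itself can be illustrated with the standard library alone. The following is a minimal sketch, not part of the original answer, that parses the same string with two explicit strptime formats; the variable names are only illustrative.

```python
from datetime import datetime

ambiguous = "10/01/1980"

# Month-first reading (the default guess dateutil made in the example above).
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

# Day-first reading (what dayfirst=True selected in the example above).
day_first = datetime.strptime(ambiguous, "%d/%m/%Y")

print(month_first.date())  # 1980-10-01
print(day_first.date())    # 1980-01-10
```

Both readings are valid dates, which is why a parser has to guess unless it is told which convention to use.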

Solution 2 - Python

If the date is given in a fixed form, you can simply use a regular expression to extract the date and "datetime.datetime.strptime" to parse the date:

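import re
from datetime import datetime

text = "monkey 2010-07-10 love banana"
match = re.search(r'\d{4}-\d{2}-\d{2}', text)
if match:  # re.search returns None when no date-like substring is found
    date = datetime.strptime(match.group(), '%Y-%m-%d').date()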

Otherwise, if the date is given in an arbitrary form, you can't extract it easily.
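If a string may contain several fixed-form dates, the same idea extends naturally. The sketch below is an assumption about how one might do it, not part of the original answer: the helper name find_iso_dates is hypothetical, and it only handles the YYYY-MM-DD form, skipping candidates that match the pattern but are not real dates.

```python
import re
from datetime import date, datetime

# Candidate pattern: four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_iso_dates(text):
    """Return every valid YYYY-MM-DD date in text, in order of appearance."""
    found = []
    for match in ISO_DATE.finditer(text):
        try:
            found.append(datetime.strptime(match.group(), "%Y-%m-%d").date())
        except ValueError:
            # Looks like a date but is not one, e.g. 2010-07-32.
            continue
    return found


print(find_iso_dates("monkey 2010-07-10 love 2010-07-32 banana 2011-01-02"))
```

The regular expression only finds candidates; strptime does the actual validation, so an impossible day or month is dropped rather than raising.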

Solution 3 - Python

To extract dates from a string in Python, the [datefinder][1] module is a good option.

You can use it in your Python project by following the easy steps given below.

Step 1: Install datefinder Package

pip install datefinder

Step 2: Use It In Your Project

import datefinder

input_string = "monkey 2010-07-10 love banana"
# find_dates returns a generator; it is converted to a list here. See the note of caution at the bottom.
matches = list(datefinder.find_dates(input_string))

if matches:
    # Each match is a datetime.datetime object; only the first one is used here.
    date = matches[0]
    print(date)
else:
    print('No dates found')

Note: if you expect a large number of matches, converting the generator to a list is not recommended, because it materializes every match at once and can carry a significant performance overhead.

[1]: https://datefinder.readthedocs.io
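When only the first match is needed, the generator can be consumed lazily with next() instead of being converted to a list. The sketch below uses a hypothetical stand-in generator, iter_dates, built on the standard library so that it runs without datefinder installed; applying the same next() call to datefinder.find_dates is an assumption noted in the final comment.

```python
import re
from datetime import datetime


def iter_dates(text):
    """Hypothetical stand-in generator: lazily yields YYYY-MM-DD dates found in text."""
    for match in re.finditer(r"\d{4}-\d{2}-\d{2}", text):
        try:
            yield datetime.strptime(match.group(), "%Y-%m-%d")
        except ValueError:
            # Skip candidates that are not valid dates.
            continue


# next() pulls only the first item; the default None is returned if nothing is found.
first = next(iter_dates("monkey 2010-07-10 love banana"), None)
print(first)

# Assumed equivalent with datefinder installed:
#   first = next(datefinder.find_dates(input_string), None)
```

Because the generator stops after the first yield, the rest of the text is never scanned.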

Solution 4 - Python

Using Pygrok, you can define abstracted extensions to the Regular Expression syntax.

The custom patterns can be included in your regex in the format %{PATTERN_NAME}.

You can also create a label for that pattern by separating the two with a colon: %{PATTERN_NAME:matched_string}. If the pattern matches, the value will be returned as part of the resulting dictionary (e.g. result.get('matched_string')).

For example:

from pygrok import Grok

input_string = 'monkey 2010-07-10 love banana'
date_pattern = '%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}'

grok = Grok(date_pattern)
print(grok.match(input_string))

The resulting value will be a dictionary:

{'month': '07', 'day': '10', 'year': '2010'}

If date_pattern does not match anywhere in input_string, the return value will be None. By contrast, if your pattern matches but does not have any labels, it will return an empty dictionary {}.
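The dictionary values are strings, so a further step is needed to obtain an actual date. The following is a minimal standard-library sketch of that step; the helper name fields_to_date is hypothetical, and it assumes the labels 'year', 'month' and 'day' used in the pattern above.

```python
from datetime import date


def fields_to_date(fields):
    """Build a date from a dict with 'year', 'month' and 'day' string values.

    Returns None when fields is None or empty (no match, or no labels),
    when a key is missing, or when the values do not form a valid date.
    """
    if not fields:
        return None
    try:
        return date(int(fields["year"]), int(fields["month"]), int(fields["day"]))
    except (KeyError, ValueError):
        return None


# The dictionary shown above, as printed by grok.match(input_string).
print(fields_to_date({"month": "07", "day": "10", "year": "2010"}))  # 2010-07-10
```

Treating None and {} the same way keeps the calling code simple, since both mean that no labeled date fields are available.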


Solution 5 - Python

You could also try the dateparser module, which may be slower than datefinder on free text but which should cover more potential cases and date formats, as well as a significant number of languages.

Solution 6 - Python

Hands Down The Best Ways

There are two good modules, available on PyPI and GitHub, that make this task easier:

  1. DATEFINDER module, useful for finding dates in strings of text.

Installation: pip install datefinder

EXAMPLE

import datefinder

input_string = "monkey 2010-07-10 love banana"
# find_dates returns a generator; it is converted to a list here.
matches = list(datefinder.find_dates(input_string))

if matches:
    # Each match is a datetime.datetime object; only the first one is used here.
    date = matches[0]
    print(date)
else:
    print('No dates found')

SOURCE: Finny Abraham

  2. DATEPARSER, extremely useful for scraping dates from HTML files in many languages. It also supports the Hijri and Jalali calendars and covers more than 200 language locales in different formats.

Features

Generic parsing of dates in over 200 language locales plus numerous formats in a language agnostic fashion. Generic parsing of relative dates like: '1 min ago', '2 weeks ago', '3 months, 1 week and 1 day ago', 'in 2 days', 'tomorrow'.

Advanced Features

Generic parsing of dates with time zone abbreviations or UTC offsets like: 'August 14, 2015 EST', 'July 4, 2013 PST', '21 July 2013 10:15 pm +0500'. Date lookup in longer texts. Support for non-Gregorian calendar systems. Extensive test coverage.

EXAMPLE (relative dates are resolved against the time the code was run)
>>> from dateparser import parse
>>> parse('1 hour ago')
datetime.datetime(2015, 5, 31, 23, 0)
>>> parse('Il ya 2 heures')  # French (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)
>>> parse('1 anno 2 mesi')  # Italian (1 year 2 months)
datetime.datetime(2014, 4, 1, 0, 0)
>>> parse('yaklaşık 23 saat önce')  # Turkish (23 hours ago)
datetime.datetime(2015, 5, 31, 1, 0)
>>> parse('Hace una semana')  # Spanish (a week ago)
datetime.datetime(2015, 5, 25, 0, 0)
>>> parse('2小时前')  # Chinese (2 hours ago)
datetime.datetime(2015, 5, 31, 22, 0)

Solution 7 - Python

If you know the position of the date in the string (for example, in a log file), you can use .split()[index] to extract it without fully knowing the format. The result is a string, not a datetime object.

For example:

>>> string = 'monkey 2010-07-10 love banana'
>>> date = string.split()[1]
>>> date
'2010-07-10'
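If the format at that position is known after all, the extracted token can be validated and converted in the same step. This is a sketch under that assumption, not part of the original answer; the helper name date_at and its default format are hypothetical.

```python
from datetime import date, datetime


def date_at(line, index, fmt="%Y-%m-%d"):
    """Parse the whitespace-separated token at position index.

    Returns a datetime.date, or None if the token is missing or does not match fmt.
    """
    tokens = line.split()
    try:
        return datetime.strptime(tokens[index], fmt).date()
    except (IndexError, ValueError):
        return None


print(date_at("monkey 2010-07-10 love banana", 1))  # 2010-07-10
print(date_at("monkey 2010-07-10 love banana", 0))  # None
```

Returning None for a bad token or an out-of-range index lets a caller skip malformed log lines without a separate length check.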

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | dmpop | View Question on Stack Overflow
Solution 1 - Python | unutbu | View Answer on Stack Overflow
Solution 2 - Python | lunaryorn | View Answer on Stack Overflow
Solution 3 - Python | Finny Abraham | View Answer on Stack Overflow
Solution 4 - Python | Aubrey Lavigne | View Answer on Stack Overflow
Solution 5 - Python | adbar | View Answer on Stack Overflow
Solution 6 - Python | Muneeb Ahmad Khurram | View Answer on Stack Overflow
Solution 7 - Python | dsod | View Answer on Stack Overflow