Parsing files (ics/ icalendar) using Python

Python Problem Overview

I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.

BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
PRODID:-//Lotus Development Corporation//NONSGML Notes 8.0//EN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:India
BEGIN:STANDARD
DTSTART:19500101T020000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID="India":20100615T111500
DTEND;TZID="India":20100615T121500
TRANSP:OPAQUE
DTSTAMP:20100713T071035Z
CLASS:PUBLIC
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n

UID:12D3901F0AD9E83E65257743001F2C9A-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:12D3901F0AD9E83E65257743001F2C9A
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100628T130000
DTEND;TZID="India":20100628T133000
TRANSP:OPAQUE
DTSTAMP:20100628T055408Z
CLASS:PUBLIC
DESCRIPTION:
SUMMARY:smart energy management
LOCATION:8778/92050462
UID:07F96A3F1C9547366525775000203D96-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-NOTICETYPE:A
X-LOTUS-APPTTYPE:3
X-LOTUS-CHILD_UID:07F96A3F1C9547366525775000203D96
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT

Python Solutions

Solution 1 - Python

The icalendar package looks nice.

For instance, to write a file:

from icalendar import Calendar, Event
from datetime import datetime
from pytz import UTC # timezone

cal = Calendar()
cal.add('prodid', '-//My calendar product//mxm.dk//')
cal.add('version', '2.0')

event = Event()
event.add('summary', 'Python meeting about calendaring')
event.add('dtstart', datetime(2005,4,4,8,0,0,tzinfo=UTC))
event.add('dtend', datetime(2005,4,4,10,0,0,tzinfo=UTC))
event.add('dtstamp', datetime(2005,4,4,0,10,0,tzinfo=UTC))
event['uid'] = '20050115T101010/[email protected]'
event.add('priority', 5)

cal.add_component(event)

f = open('example.ics', 'wb')
f.write(cal.to_ical())
f.close()

Tadaaa, you get this file:

BEGIN:VCALENDAR
PRODID:-//My calendar product//mxm.dk//
VERSION:2.0
BEGIN:VEVENT
DTEND;VALUE=DATE:20050404T100000Z
DTSTAMP;VALUE=DATE:20050404T001000Z
DTSTART;VALUE=DATE:20050404T080000Z
PRIORITY:5
SUMMARY:Python meeting about calendaring
UID:20050115T101010/[email protected]
END:VEVENT
END:VCALENDAR

But what lies in this file?

g = open('example.ics','rb')
gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    print component.name
g.close()

You can see it easily:

>>> 
VCALENDAR
VEVENT
>>>

What about parsing the data about the events:

g = open('example.ics','rb')
gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    if component.name == "VEVENT":
        print(component.get('summary'))
        print(component.get('dtstart'))
        print(component.get('dtend'))
        print(component.get('dtstamp'))
g.close()

Now you get:

>>> 
Python meeting about calendaring
20050404T080000Z
20050404T100000Z
20050404T001000Z
>>>

Solution 2 - Python

You could probably also use the vobject module for this: http://pypi.python.org/pypi/vobject

If you have a sample.ics file you can read it's contents like, so:

# read the data from the file
data = open("sample.ics").read()

# parse the top-level event with vobject
cal = vobject.readOne(data)

# Get Summary
print 'Summary: ', cal.vevent.summary.valueRepr()
# Get Description
print 'Description: ', cal.vevent.description.valueRepr()

# Get Time
print 'Time (as a datetime object): ', cal.vevent.dtstart.value
print 'Time (as a string): ', cal.vevent.dtstart.valueRepr()

Solution 3 - Python

New to python; the above comments were very helpful so wanted to post a more complete sample.

# ics to csv example
# dependency: https://pypi.org/project/vobject/

import vobject
import csv

with open('sample.csv', mode='w') as csv_out:
    csv_writer = csv.writer(csv_out, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(['WHAT', 'WHO', 'FROM', 'TO', 'DESCRIPTION'])

    # read the data from the file
    data = open("sample.ics").read()

    # iterate through the contents
    for cal in vobject.readComponents(data):
        for component in cal.components():
            if component.name == "VEVENT":
                # write to csv
                csv_writer.writerow([component.summary.valueRepr(),component.attendee.valueRepr(),component.dtstart.valueRepr(),component.dtend.valueRepr(),component.description.valueRepr()])

Solution 4 - Python

Four years later and understanding ICS format a bit better, if those were the only fields I needed, I'd just use the native string methods:

import io

# Probably not a valid .ics file, but we don't really care for the example
# it works fine regardless
file = io.StringIO('''
BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n

SUMMARY:smart energy management
LOCATION:8778/92050462
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT
'''.strip())

parsing = False
for line in file:
    field, _, data = line.partition(':')
    if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
        parsing = True
        print(field)
        print('\t'+'\n\t'.join(data.split('\n')))
    elif parsing and not data:
        print('\t'+'\n\t'.join(field.split('\n')))
    else:
        parsing = False

Storing the data and parsing the datetime is left as an exercise for the reader (it's always UTC)

old answer below

You could use a regex:

import re
text = #your text
print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
print(re.search("DTSTAMP:.*:?", text, re.DOTALL).group())

I'm sure it may be possible to skip the first and last words, I'm just not sure how to do it with regex. You could do it this way though:

print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1])

Solution 5 - Python

In case anyone else is looking at this, the ics package seems like it's updated better than any others mentioned in the thread. https://pypi.org/project/ics/

Here's some sample code I'm using:

from ics import Calendar, Event

with open(in_file, 'r') as file:
        ics_text = file.read()

c = Calendar(ics_text) for e in c.events:
        print(e.name)

Solution 6 - Python

I'd parse line by line and do a search for your terms, then get the index and extract that and X number of characters further (however many you think you'll need). Then parse that much smaller string to get it to be what you need.

Content Type	Original Author	Original Content on Stackoverflow
Question	LVT	View Question on Stackoverflow
Solution 1 - Python	Wok	View Answer on Stackoverflow
Solution 2 - Python	Brad Montgomery	View Answer on Stackoverflow
Solution 3 - Python	cibsbui	View Answer on Stackoverflow
Solution 4 - Python	Wayne Werner	View Answer on Stackoverflow
Solution 5 - Python	clementzach	View Answer on Stackoverflow
Solution 6 - Python	Brian	View Answer on Stackoverflow