Extract data from iCal? It couldn’t be easier.

Today we will write a scraper for resources that publish event data as files in the iCal format.

This format was popularized by Apple's iCal application (the format itself, iCalendar, is an IETF standard), and many websites now let you export calendar events in it. In this case you do not need to scrape the site's HTML; you only need to fetch and parse a file in iCal format, which makes the whole process much easier.

Diggernaut.com natively supports this format and automatically converts it to XML, so we can work with iCal data just as we would with a normal HTML page.

Let’s see how it works by extracting data from the Science Fiction Conventions calendar, which I found on the icalshare.com website. We will start writing the config by defining some basic settings. First, we need to set the digger to debug level 2. This is the only way to see the source code of the converted file, and we need to inspect it in order to write the navigation instructions for walking to the blocks with the data we want to extract and collect.

The calendar file we will use is: https://www.google.com/calendar/ical/lirleni%40gmail.com/public/basic.ics

So, our config will start with:
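A minimal sketch of such a config in Diggernaut's YAML meta-language might look like the following. The key names (`config`, `debug`, `agent`, `walk`) follow the Diggernaut documentation, but verify them against the current docs for your account:

```yaml
---
config:
    # Debug level 2 prints the source of the converted file to the log
    debug: 2
    agent:
        platform: desktop
        browser: Firefox
do:
# Fetch the iCal file; Diggernaut converts it to XML automatically
- walk:
    to: https://www.google.com/calendar/ical/lirleni%40gmail.com/public/basic.ics
    do:
```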

In this code we set the debug level to 2, configure the digger to identify itself as Firefox, and fetch the iCal file. Now we need to log in to our account at Diggernaut.com, select an existing project or create a new one, and then create a new digger, putting the code above into its config field.

Make sure the digger is in Debug mode (the Status column should read Debug). If it is not, switch it to Debug mode using the selector in the Status column. Then start the digger and wait until the run finishes. When it is done, check the logs by clicking the “Log” button.

As you can see, the page structure consists of blocks, so all we need to do is go through these blocks and pick the fields from each one. Let’s take one block and reformat it so we can see more clearly which data fields we need and which filters, if any, to use.
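For reference, each event in a raw iCal file is a VEVENT block like the one below (the values here are illustrative placeholders, not taken from the actual calendar). In the converted source, Diggernaut typically turns each of these fields into a lowercase XML tag:

```
BEGIN:VEVENT
DTSTART:20190301T100000Z
DTEND:20190303T180000Z
SUMMARY:Example Science Fiction Convention
LOCATION:Convention Center, Example City
DESCRIPTION:Annual fan-run convention.
END:VEVENT
```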

We are not going to pick all of these fields; let’s get only the summary, description, start datetime, end datetime, and location. It is very easy to do: first we walk to the event block and create a data object, then we walk to the field blocks, parse the data and save it to the object’s fields, and finally we save the data object.
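The steps above can be sketched in Diggernaut's meta-language roughly as follows. The tag paths (`vevent`, `summary`, `dtstart`, and so on) are assumptions about how the converted XML appears in the debug log; adjust them to match what you actually see there:

```yaml
---
config:
    debug: 2
    agent:
        platform: desktop
        browser: Firefox
do:
- walk:
    to: https://www.google.com/calendar/ical/lirleni%40gmail.com/public/basic.ics
    do:
    # Iterate over the converted event blocks
    - find:
        path: vevent
        do:
        # Create a data object for the current event
        - object_new: event
        - find:
            path: summary
            do:
            - parse
            - object_field_set:
                object: event
                field: summary
        - find:
            path: description
            do:
            - parse
            - object_field_set:
                object: event
                field: description
        - find:
            path: dtstart
            do:
            - parse
            - object_field_set:
                object: event
                field: start_datetime
        - find:
            path: dtend
            do:
            - parse
            - object_field_set:
                object: event
                field: end_datetime
        - find:
            path: location
            do:
            - parse
            - object_field_set:
                object: event
                field: location
        # Commit the filled object to the dataset
        - object_save:
            name: event
```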

Let’s put our config into the digger and run it. Once it is done, let’s jump to the Data section and make sure the data we have scraped is in good shape. You should see something like:
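Conceptually, each saved record should carry the five fields we collected. With hypothetical placeholder values, a record would look roughly like this:

```
summary:        Example Science Fiction Convention
description:    Annual fan-run convention.
start_datetime: 2019-03-01 10:00:00
end_datetime:   2019-03-03 18:00:00
location:       Convention Center, Example City
```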

If the data looks good, let’s switch our digger to Active mode: in Debug mode you cannot download data, all you can do is review a limited sample of it. Start the digger again and wait for completion. Then go to the Data section again and download the data in the format you need. A sample in XLSX format can be downloaded here.

As you can see, it is very easy to work with iCal files at Diggernaut.com!

Co-founder of the cloud-based web scraping and data extraction platform Diggernaut
