Iterators

Iterator by Dates

Date iterators are used when you need an ordered set of dates, for example when the source site has a search form with a date field and you want to retrieve data only for a specific date or period. When you set up such an iterator, you can specify the start date, the interval in days between iterations, the period that determines the end date, and a template that describes the date format used in the arguments.

Parameter Description
type The constant that defines the iterator type; must be set to date.
start Start date for iterations in YYYY-MM-DD format. If omitted, the current date is used (optional).
end End date for iterations in YYYY-MM-DD format. If omitted, the end date is derived from the period parameter (optional).
period Duration in days between the start and end dates. If omitted, defaults to 60 days (optional).
interval Interval in days between iterations. For example, to get arguments for every date between the start and end dates, set interval to 1; for weekly intervals, set it to 7. If omitted, defaults to 1 (optional).
template Template used to format the values of the start_date and end_date argument fields produced by the iterator (optional).
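
The defaults described in the table can be summarized in a short Python sketch. It is not Diggernaut's implementation: the function name `resolve_date_iterator` is hypothetical, and treating a period of N days as N calendar days that include the start date is an assumption inferred from the example output later in this section.

```python
from datetime import date, timedelta


def resolve_date_iterator(params, today):
    """Fill in the documented defaults for a date iterator (illustrative only)."""
    # start: defaults to the current date when omitted
    start = date.fromisoformat(params["start"]) if "start" in params else today
    # interval: defaults to 1 day when omitted
    interval = int(params.get("interval", 1))
    if "end" in params:
        end = date.fromisoformat(params["end"])
    else:
        # period: defaults to 60 days when omitted
        period = int(params.get("period", 60))
        # Assumption: the period counts the start date itself
        end = start + timedelta(days=period - 1)
    return start, end, interval


start, end, interval = resolve_date_iterator(
    {"type": "date", "period": 10}, today=date(2017, 10, 1)
)
print(start, end, interval)
```

Passing `today` explicitly keeps the sketch deterministic; a real run would use the actual current date.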

The table below lists all tags that can be used in the template, with usage examples:

Tag Description Template Example Value Sample
%a abbreviated weekday name, e.g. Mon or Fri %a, %d %B Fri, 20 February
%A full weekday name, e.g. Monday or Friday %A, %d %B Friday, 20 February
%b abbreviated month name, e.g. Feb or Sep %A, %d %b Friday, 20 Jun
%B full month name, e.g. February or September %A, %d %B Friday, 20 June
%C century number, takes values from 00 to 99 %C/%y 20/17
%d day of month, takes values from 01 to 31 %Y-%m-%d 2017-10-01
%D preset template, same as %m/%d/%y %D 05/08/17
%e day of month, takes values from 1 to 31 %e %B 5 January
%F preset template, same as %Y-%m-%d %F 2017-10-01
%g 2-digit number of year according to ISO-8601:1988 standard %g 17
%G 4-digit number of year according to ISO-8601:1988 standard %G 2017
%h same as %b %A, %d %h Friday, 20 Jun
%H hour on a 24-hour clock, takes values from 00 to 23 %H:%M:%S 08:35:26
%I hour on a 12-hour clock, takes values from 01 to 12 %I:%M:%S 08:35:26
%j number of day of year, takes values from 1 to 366 Today is %j day of year Today is 183 day of year
%k hour on a 24-hour clock, takes values from 0 to 23 %k hrs %M mnt 8 hrs 35 mnt
%l hour on a 12-hour clock, takes values from 1 to 12 %l hrs %M mnt 8 hrs 35 mnt
%m number of month, takes values from 01 to 12 %Y-%m-%d 2017-10-01
%M minutes, takes values from 00 to 59 %l hrs %M mnt 8 hrs 35 mnt
%n new line symbol %Y%n%m 2017\n10
%p value AM or PM depending on time, used with the 12-hour clock %I%p 08AM
%P value am or pm depending on time, used with the 12-hour clock %I%P 08am
%r same as %I:%M:%S %p %r 04:12:37 PM
%R same as %H:%M %R 22:35
%s Unix timestamp, the number of seconds since the start of the epoch (1 January 1970) %s 1506867213
%S seconds, takes values from 00 to 59 %H:%M:%S 08:35:26
%t tabulation symbol %Y%t%m 2017\t10
%T same as %H:%M:%S %T 08:35:26
%u day of week as a number from 1 (Monday) to 7 (Sunday) Today is %u week day Today is 5 week day
%U week number of year, with Sunday as the first day of the week, takes values from 00 to 53 It was %U week It was 23 week
%V ISO week number of year, with Monday as the first day of the week, takes values from 01 to 53. If the week containing 1 January has 4 or more days in the new year, it is counted as the first week of the new year; otherwise it is counted as the last week of the previous year. It was %V week It was 23 week
%w day of week as a number from 0 (Sunday) to 6 (Saturday) Today is %w day of week Today is 5 day of week
%W week number of year, with Monday as the first day of the week, takes values from 00 to 53 It was %W week It was 23 week
%y 2-digit year %m/%d/%y 10/01/17
%Y 4-digit year %Y-%m-%d 2017-10-01
%z offset from UTC in +HHMM or -HHMM format, where + means east of GMT, - means west of GMT, HH is the number of hours, and MM is the number of minutes %z +0300
%Z time zone abbreviation %Z PST
%+ same as %a %b %e %H:%M:%S %Z %Y %+ Mon Sep 20 13:24:55 PST 2017
%% symbol % %Y%%%m 2017%10
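
To get a feel for how a template expands, the commonly used tags can be tried out with Python's `strftime`, which understands the same notation for the tags shown here. This is only a stand-in: Diggernaut's own formatter may treat some tags differently, and platform-specific tags such as %e, %k, %l, %P, %s and %+ are left out on purpose.

```python
from datetime import datetime

# Sample moment matching the values used in the table
moment = datetime(2017, 10, 1, 8, 35, 26)

# A few templates built only from widely supported tags
templates = ["%Y-%m-%d", "%m/%d/%y", "%B %d %Y", "%H:%M:%S", "%Y%%%m"]

for template in templates:
    # Print each template next to the value it produces
    print(template, "->", moment.strftime(template))
```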

If you don't specify a template, start_date and end_date are formatted according to the ISO standard. In addition to these two arguments, the set contains other arguments that are useful in many cases:

Argument Description
start_date date of the interval start, in the format described by template or ISO standard
end_date date of the interval end, in the format described by template or ISO standard
start_year year of the interval start, in %Y (YYYY) format
end_year year of the interval end, in %Y (YYYY) format
start_yr year of the interval start, in %y (YY) format
end_yr year of the interval end, in %y (YY) format
start_month month of the interval start, in %m (MM) format
end_month month of the interval end, in %m (MM) format
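
As an illustration of how these arguments relate to each other, the sketch below builds the full argument set for a single interval in Python. The helper name `build_fieldset` is hypothetical, and the default template `%Y-%m-%d` is an assumption about what "ISO standard" means here. The interval deliberately crosses a year boundary to show why the separate start and end arguments are useful.

```python
from datetime import date


def build_fieldset(start, end, template="%Y-%m-%d"):
    """Return the argument set for one interval (illustrative only)."""
    return {
        # Formatted by the template (assumed ISO date when none is given)
        "start_date": start.strftime(template),
        "end_date": end.strftime(template),
        # Fixed formats, independent of the template
        "start_year": start.strftime("%Y"),
        "end_year": end.strftime("%Y"),
        "start_yr": start.strftime("%y"),
        "end_yr": end.strftime("%y"),
        "start_month": start.strftime("%m"),
        "end_month": end.strftime("%m"),
    }


fields = build_fieldset(date(2017, 12, 31), date(2018, 1, 1), "%B %d %Y")
for name, value in fields.items():
    print(name, "=", value)
```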

Example of iterator by dates:

iterator:
- type: date
  # SET INTERVAL FOR EVERY 2 DAYS
  interval: 2
  # PERIOD BETWEEN START DATE (IN THIS CASE THE CURRENT DATE, BECAUSE THE START PARAMETER IS OMITTED) AND END DATE IS SET TO 10 DAYS
  period: 10
  # TEMPLATE FOR `start_date` AND `end_date`
  template: '%B %d %Y'

As a result, we get the following list of fieldsets, for each of which the digger will execute the main logic block:

[
    {
      "start_date": "October 01 2017", "end_date": "October 02 2017",
      "start_year": "2017", "end_year": "2017",
      "start_yr": "17", "end_yr": "17",
      "start_month": "10", "end_month": "10"
    },
    {
      "start_date": "October 03 2017", "end_date": "October 04 2017",
      "start_year": "2017", "end_year": "2017",
      "start_yr": "17", "end_yr": "17",
      "start_month": "10", "end_month": "10"
    },
    {
      "start_date": "October 05 2017", "end_date": "October 06 2017",
      "start_year": "2017", "end_year": "2017",
      "start_yr": "17", "end_yr": "17",
      "start_month": "10", "end_month": "10"
    },
    {
      "start_date": "October 07 2017", "end_date": "October 08 2017",
      "start_year": "2017", "end_year": "2017",
      "start_yr": "17", "end_yr": "17",
      "start_month": "10", "end_month": "10"
    },
    {
      "start_date": "October 09 2017", "end_date": "October 10 2017",
      "start_year": "2017", "end_year": "2017",
      "start_yr": "17", "end_yr": "17",
      "start_month": "10", "end_month": "10"
    }
]
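
The way the period is split into intervals in the listing above can be reproduced with a small Python sketch. It assumes the start date resolves to 2017-10-01, that each interval covers `interval` calendar days with both ends inclusive, and that the last interval is clipped to the end of the period. These rules are inferred from the listing, not taken from Diggernaut's source, and the function name `date_windows` is hypothetical.

```python
from datetime import date, timedelta


def date_windows(start, period, interval):
    """Yield (window_start, window_end) pairs, both ends inclusive."""
    last_day = start + timedelta(days=period - 1)
    current = start
    while current <= last_day:
        # Assumption: clip the final window so it never passes the period end
        window_end = min(current + timedelta(days=interval - 1), last_day)
        yield current, window_end
        current += timedelta(days=interval)


windows = list(date_windows(date(2017, 10, 1), period=10, interval=2))
for window_start, window_end in windows:
    print(window_start.strftime("%B %d %Y"), "->", window_end.strftime("%B %d %Y"))
```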
              

Date iterators are often used to organize incremental data collection, which saves resources and lets the task finish faster.

Example of using date iterator in the digger:

---
config:
    debug: 2
    agent: Firefox
iterator:
    type: date
    start: '2017-10-01'
    period: 4
    interval: 2
    template: '%Y-%m-%d'
do:
- walk:
    to: https://www.diggernaut.com/sandbox/meta-lang-object-en.html?from=<%start_date%>&to=<%end_date%>
    do:
              
Time Level Message
2017-10-23 14:23:41:335 info Scrape is done
2017-10-23 14:23:41:321 debug Page content: <!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8"/> <title>Diggernaut | Meta-language | Object sample</title> </head> <body> <h1>Title-1</h1> <p>Lorem ipsum dolor sit amet.</p> </body></html>
2017-10-23 14:23:41:166 debug Referers: Referer: https://www.diggernaut.com/sandbox/meta-lang-object-en.html?from=2017-10-01&to=2017-10-02
2017-10-23 14:23:41:158 debug Referer: https://www.diggernaut.com/sandbox/meta-lang-object-en.html?from=2017-10-01&to=2017-10-02
2017-10-23 14:23:41:150 info Retrieving page (GET): https://www.diggernaut.com/sandbox/meta-lang-object-en.html?from=2017-10-03&to=2017-10-04
2017-10-23 14:23:41:138 debug Page content: <!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8"/> <title>Diggernaut | Meta-language | Object sample</title> </head> <body> <h1>Title-1</h1> <p>Lorem ipsum dolor sit amet.</p> </body></html>
2017-10-23 14:23:40:185 info Retrieving page (GET): https://www.diggernaut.com/sandbox/meta-lang-object-en.html?from=2017-10-01&to=2017-10-02
2017-10-23 14:23:40:178 info Starting scrape
2017-10-23 14:23:40:166 debug Setting up default proxy
2017-10-23 14:23:40:153 debug Setting up surf
2017-10-23 14:23:40:125 info Starting digger: meta-lang-iterator [1859]
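
The two requested URLs in the log follow from substituting each fieldset into the `<%start_date%>` and `<%end_date%>` placeholders of the walk URL. The Python sketch below mimics that substitution with a regular expression. The `render` helper is hypothetical and only illustrates the idea, not how the digger performs substitution internally.

```python
import re

URL_TEMPLATE = (
    "https://www.diggernaut.com/sandbox/meta-lang-object-en.html"
    "?from=<%start_date%>&to=<%end_date%>"
)


def render(template, fieldset):
    """Replace every <%name%> placeholder with the matching fieldset value."""
    return re.sub(r"<%(\w+)%>", lambda match: fieldset[match.group(1)], template)


# Fieldsets for start 2017-10-01, period 4, interval 2 (as in the digger above)
fieldsets = [
    {"start_date": "2017-10-01", "end_date": "2017-10-02"},
    {"start_date": "2017-10-03", "end_date": "2017-10-04"},
]

urls = [render(URL_TEMPLATE, fieldset) for fieldset in fieldsets]
for url in urls:
    print(url)
```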

Next, we will learn more about CSV iterators.