Diggernaut can run JS routines

How do you execute a JavaScript snippet in the middle of the scraping process?

Sometimes, while scraping, you need to calculate a parameter before you can request a page. The cases vary, but most of them can be solved with the “eval” command.

Let's imagine that we need the current time in milliseconds since the Unix epoch and want to use it in a URL.
There are a few ways to do it:

Way 1:

Keep the URL in a variable, evaluate the JS code, and return the result of its execution to the register. To do this, wrap the code in a closure that returns a value.

---
        ...
        ...
    - find:
        path: a
        do:
        - parse:
            attr: href
        - variable_set: link
        - eval:
            routine: js
            body: "(function (){return new Date().getTime();})();"
        - variable_prepend:
            field: link
            joinby: "?time="
        # the register now holds a value like http://somelink.com/?time=1476403635606
        - walk:
            to: value
            do:
            ...
            ...
            ...
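
To see what the closure in Way 1 produces, you can run the same expression outside Diggernaut. The sketch below is plain JavaScript (for example, in Node.js). It uses the built-in `eval` only as a stand-in for the engine, which is an assumption for illustration and not a description of how Diggernaut works internally. The `secondsBody` variant is hypothetical and shows how a body could return whole seconds, since `getTime()` returns milliseconds.

```javascript
// The exact body used in Way 1: an immediately invoked function that returns a value.
const body = "(function (){return new Date().getTime();})();";

// Stand-in for the engine (assumption): evaluate the body and keep its result.
const millis = eval(body);

// Hypothetical alternative body that returns whole seconds instead of milliseconds.
const secondsBody = "(function (){return Math.floor(new Date().getTime() / 1000);})();";
const seconds = eval(secondsBody);

console.log(millis);  // milliseconds since the Unix epoch (13 digits today)
console.log(seconds); // seconds since the Unix epoch (10 digits today)
```

If the target site expects seconds rather than milliseconds, a body like `secondsBody` could be placed in the `body` argument instead.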

Way 2:

Another way is to write the value of a JS variable into a digger variable. Use the “assign” argument for this.

---
        ...
        ...
    - find:
        path: a
        do:
        - parse:
            attr: href
        - eval:
            routine: js
            body: "var jstime = new Date().getTime();"
            assign:
                time: jstime
        - variable_append:
            field: time
            joinby: "?time="
        # the register now holds a value like http://somelink.com/?time=1476403635606
        - walk:
            to: value
            do:
            ...
            ...
            ...

Way 3:

The third way is to pass the base URL into the JS routine as a variable, so that the closure returns the complete URL.

---
        ...
        ...
    - find:
        path: a
        do:
        - parse:
            attr: href
        - variable_set: link
        - eval:
            routine: js
            body: "(function (){return '<%link%>' + '?time=' + new Date().getTime();})();"
        # the register now holds a value like http://somelink.com/?time=1476403635606
        - walk:
            to: value
            do:
            ...
            ...
            ...

Please note that if you use digger variables in a JS routine, you need to wrap them in quotes, because they are substituted as literal text before the engine evaluates the routine.
You can, of course, create more complex functions and even run scripts extracted from a script tag on the page, but keep in mind that the routine has no access to the DOM and should not run for too long.
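
The quoting rule is easier to see with a small model of the substitution step. The sketch below is plain JavaScript; the `substitute` helper and the `variables` object are hypothetical and only imitate the behavior described above (placeholders replaced with text before evaluation). They are not Diggernaut's actual implementation.

```javascript
// Hypothetical model (assumption): replace each <%name%> placeholder with the
// variable's text before the code is evaluated. Unknown names are left as-is.
function substitute(body, variables) {
  return body.replace(/<%(\w+)%>/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? String(variables[name]) : match
  );
}

const variables = { link: "http://somelink.com/" };

// Quoted placeholder (as in Way 3): the URL ends up inside a string literal.
const quoted = substitute(
  "(function (){return '<%link%>' + '?time=' + new Date().getTime();})();",
  variables
);
const url = eval(quoted);
console.log(url); // the base URL followed by ?time= and a millisecond timestamp

// Unquoted placeholder: the URL is pasted in as raw code and cannot be parsed.
const unquoted = substitute(
  "(function (){return <%link%> + '?time=' + new Date().getTime();})();",
  variables
);
let failure = null;
try {
  eval(unquoted);
} catch (err) {
  failure = err;
}
console.log(failure instanceof SyntaxError); // true
```

Under the same model, a variable value that itself contains a single quote would also break the string literal, so such values may need cleaning before they are used this way.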

If you make too many recursive calls or your JS code runs for too long, you will see errors like the following:

{"level":"error","msg":"Eval error: RangeError: Maximum call stack size exceeded","time":"2016-10-14T06:12:12.2624413+03:00"}
{"level":"error","msg":"Eval error: Code execution time exceeded limits. Stopping after: 10.0004sec","time":"2016-10-14T06:16:22.9742751+03:00"}
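
The first error can be reproduced in any JavaScript engine, and it usually goes away when deep recursion is rewritten as a loop. The sketch below is plain JavaScript with hypothetical function names. The exact recursion depth an engine tolerates varies, and the limits Diggernaut applies are not known beyond what the log lines above show.

```javascript
// Recursive sum: one stack frame per step, so a large n exhausts the call stack.
function sumRecursive(n) {
  return n === 0 ? 0 : n + sumRecursive(n - 1);
}

// Iterative sum: same result, constant stack depth.
function sumIterative(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

let recursionError = null;
try {
  sumRecursive(1000000);
} catch (err) {
  recursionError = err;
}

console.log(recursionError instanceof RangeError); // true: the call stack overflowed
console.log(sumIterative(1000000));                // 500000500000
```

A loop also tends to finish sooner, which helps with the execution time limit shown in the second log line.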
