{"id":86,"date":"2016-10-14T20:38:23","date_gmt":"2016-10-14T20:38:23","guid":{"rendered":"https:\/\/blog.diggernaut.com\/?p=86"},"modified":"2019-01-12T20:34:58","modified_gmt":"2019-01-12T20:34:58","slug":"diggernaut-can-run-js-routines","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/diggernaut-can-run-js-routines\/","title":{"rendered":"Diggernaut can run JS routines"},"content":{"rendered":"<h3>How to execute a JavaScript snippet in the middle of the scraping process?<\/h3>\n<p>Sometimes, when you scrape a site, you need to calculate a parameter before you can request a page. There are many cases where a calculation like this is needed, and most of them can be solved with the \u201ceval\u201d command.<\/p>\n<p>Let\u2019s imagine that we need to get the current time in milliseconds since the Unix epoch and use it in a URL.<br>\nThere are a few ways to do it:<\/p>\n<h4>Way 1:<\/h4>\n<p>Keep the URL in a variable, evaluate the JS code, and return the result of its execution to the register. To do this, use a closure that returns a value.<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">---\n        ...\n        ...\n    - find:\n        path: a\n        do:\n        - parse:\n            attr: href\n        - variable_set: link\n        - eval:\n            routine: js\n            body: &quot;(function (){return new Date().getTime();})();&quot;\n        - variable_prepend:\n            field: link\n            joinby: &quot;?time=&quot;\n        # the register will now hold a value like http:\/\/somelink.com\/?time=1476403635606\n        - walk:\n            to: value\n            do:\n            ...\n            ...\n            ...<\/code><\/pre>\n<h4>Way 2:<\/h4>\n<p>Another way is to write the value of a JS variable to a digger variable. 
Use the \u201cassign\u201d argument for this.<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">---\n        ...\n        ...\n    - find:\n        path: a\n        do:\n        - parse:\n            attr: href\n        - eval:\n            routine: js\n            body: &quot;var jstime = new Date().getTime();&quot;\n            assign:\n                time: jstime\n        - variable_append:\n            field: time\n            joinby: &quot;?time=&quot;\n        # the register will now hold a value like http:\/\/somelink.com\/?time=1476403635606\n        - walk:\n            to: value\n            do:\n            ...\n            ...\n            ...<\/code><\/pre>\n<h4>Way 3:<\/h4>\n<p>The third way is to pass the base URL to the JS routine as a variable, so that your closure can return the complete URL.<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">---\n        ...\n        ...\n    - find:\n        path: a\n        do:\n        - parse:\n            attr: href\n        - variable_set: link\n        - eval:\n            routine: js\n            body: &quot;(function (){return &#039;&#039; + &#039;?time=&#039; + new Date().getTime();})();&quot;\n        # the register will now hold a value like http:\/\/somelink.com\/?time=1476403635606\n        - walk:\n            to: value\n            do:\n            ...\n            ...\n            ...<\/code><\/pre>\n<p>Please note that if you use digger variables in a JS routine, you need to wrap them in quotes, because they are substituted as literal values before the engine evaluates the routine.<br>\nYou can, of course, create more complex functions and even run scripts that you extract from a script tag on the page, but keep in mind that \n<strong><em>it does not work with the DOM<\/em><\/strong> and your <strong><em>function should not run too long<\/em><\/strong>.<\/p>\n<p>If you make too many recursive calls or your JS code runs too long, you will see the 
following errors:<\/p>\n<pre><code class=\"language-js\">{&quot;level&quot;:&quot;error&quot;,&quot;msg&quot;:&quot;Eval error: RangeError: Maximum call stack size exceeded&quot;,&quot;time&quot;:&quot;2016-10-14T06:12:12.2624413+03:00&quot;}\n{&quot;level&quot;:&quot;error&quot;,&quot;msg&quot;:&quot;Eval error: Code execution time exceeded limits. Stopping after: 10.0004sec&quot;,&quot;time&quot;:&quot;2016-10-14T06:16:22.9742751+03:00&quot;}\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>How to execute a JavaScript snippet in the middle of the scraping process? Sometimes, when you scrape a site, you need to calculate a parameter before you can request a page. There are many cases where a calculation like this is needed, and most of them can be solved with the \u201ceval\u201d [&hellip;]<\/p>","protected":false},"author":5,"featured_media":87,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,2],"tags":[],"class_list":["post-86","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-learning-meta-language","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/86","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=86"}],"version-history":[{"count":3,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/86\/revisions"}],"predecessor-version":[{"id":680,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/86\/revisions\/680"}],"wp:featuredmedia":[{"embeddable
":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/87"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=86"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=86"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=86"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}