{"id":540,"date":"2018-05-18T20:52:25","date_gmt":"2018-05-18T20:52:25","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=540"},"modified":"2019-01-12T09:27:00","modified_gmt":"2019-01-12T09:27:00","slug":"automated-cloudflare-challenge-solution-golang","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/automated-cloudflare-challenge-solution-golang\/","title":{"rendered":"Automated CloudFlare challenge solution with Golang"},"content":{"rendered":"<p>A new version of the <a href=\"https:\/\/github.com\/Diggernaut\/surf\">Surf<\/a> library for Golang has been pushed. This version can bypass a recent version of CloudFlare protection. We use this library in our engine, so our users get the benefit right away. The library bypasses the protection automatically, so you don\u2019t need to do anything extra. You load a page as usual, and if there is a CloudFlare challenge, the library solves it automatically and you get the content of the page you requested.<\/p>\n<p>You are free to use the Surf library from our repo in your projects. It is released under the MIT license and is forked from <a href=\"https:\/\/github.com\/headzoo\/surf\">headzoo\/surf<\/a>. However, we use our own version, which is tailored to the needs of our web scraping engine.<\/p>\n<p>How can you test that it works? Try loading a page that is under protection. This site is under CloudFlare. 
Let\u2019s use the following digger config to load this page and extract the website URL:<\/p>\n<pre class=\"language-yaml line-numbers\"><code class=\"language-yaml\">---\nconfig:\n    debug: 2\n    agent: Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/66.0.3359.139 Safari\/537.36\ndo:\n- walk:\n    to: https:\/\/www.g2crowd.com\/products\/essbase\/details\n    do:\n    - find:\n        path: div.company-info\n        do:\n        - object_new: item\n        - find:\n            path: dl > dt:contains(&quot;Vendor&quot;) + dd\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set:\n                object: item\n                field: vendor\n        - find:\n            path: dl > dt:contains(&quot;Description&quot;) + dd\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set:\n                object: item\n                field: description\n        - find:\n            path: dl > dt:contains(&quot;Company Website&quot;) + dd>a\n            do:\n            - parse:\n                attr: href\n            - space_dedupe\n            - trim\n            - object_field_set:\n                object: item\n                field: website\n        - object_save:\n            name: item<\/code><\/pre>\n<p>The data we get will look like this:<\/p>\n<pre><code class=\"language-js\">{\n  item : {\n    website :  &quot;https:\/\/www.oracle.com\/index.html&quot;,\n    vendor :  &quot;Oracle&quot;,\n    description :  &quot;Oracle Corporation develops, manufactures, markets, hosts, and supports database and middleware software, applications software, and hardware systems.&quot;\n  }\n}\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>A new version of the Surf library for Golang has been pushed. This version can bypass a recent version of CloudFlare protection. 
We use this library in our engine, so our users get the benefit right away. The library bypasses the protection automatically, so you don\u2019t need to do anything extra. You are just [&hellip;]<\/p>","protected":false},"author":4,"featured_media":623,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25,24,26],"tags":[],"class_list":["post-540","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-go","category-golang","category-programming"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=540"}],"version-history":[{"count":3,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/540\/revisions"}],"predecessor-version":[{"id":622,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/540\/revisions\/622"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/623"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}