Mikhail Sisin Follow Co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.

Solving Google ReCaptcha v3: 2Captcha has been integrated to the Diggernaut platform

December 26, 2018 3 min read

There is a new arrival in the ranks of services for solving captcha supported by us. Meet the 2Captcha.

From the functional side, 2Captcha looks similar to the AntiCaptcha. It supports all popular kinds of captcha from the box, has “proxyless” mode, which is in demand by our users. Prices are on the same level as AntiCaptcha has. However, there is one crucial thing that differs 2Captcha from other captcha solving services. The offer solution for the Google ReCaptcha v3, which is a new kind of Google captcha and already used on some popular websites. We tested it, and it works indeed!

More details about usage of 3rd party providers for captcha resolving you can find in our documentation.

Below we describe how to use 2Captcha to solve Google ReCaptcha v3 using the example of the Russian traffic police website. And we will request information about registrations in the traffic police by VIN.

For a start, let’s go to [russian traffic police site](https: //xn--90adear.xn--p1ai/check/auto/) and make sure that there is ReCaptcha v3 used on the page with the search form. At first glance, the link to api.js from Google is not observed in the source code of the page, so automatic extraction of the site key and the action parameter will not work, and we need to find them manually. If we look at the Network tab in the developer console of your browser (Chrome / Firefox), then we will see that api.js is loaded, which means that the browser doesn’t load it directly from the page source, but use some javascript. Therefore, our task will be to check all loaded JS files on the Network tab.

Looking through the code of JS files, we will see that app.js file has a fragment of the code that looks like something we are seeking.

/* reCaptcha */
var reCaptchaSiteKey = '6Lc66nwUAAAAANZvAnT-OK4f4D_xkdzw5MLtAYFL';
var onloadReCaptcha = function() {
    grecaptcha.execute(reCaptchaSiteKey, {action: 'check_auto'})
    .then(function(token) {
        appVehicleCheck.reCaptchaToken = token;
    });
}

From this block we can take the site key (6Lc66nwUAAAAANZvAnT-OK4f4D_xkdzw5MLtAYFL) and the action parameter (check_auto).

Next, we need to determine which query the page is doing when we click the search button. To do it, open the developer panel in the browser, go to the Network tab and set the filter so that only XHR requests are shown to us.

We’ can see that the following resource is being loaded: https: //xn--b1afk4ade.xn--90adear.xn--p1ai/proxy/check/auto/history by the POST method. The VIN, the type of verification and the Google ReCaptcha v3 token are transferred to the server in the request.

Thus, we have everything to build a web scraper. Please note, that to scrape this website you need to use Russian proxy, as they recently closed access to this data to foreign IPs.

---
config:
    debug: 2
    agent: "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
        proxy: PUT HERE YOUR RUSSIAN PROXY IN 1.1.1.1:8888 FORMAT
do:
# SET VIN TO CHECK TO THE VARIABLE
- variable_set:
    field: vin
    value: WVGZZZ7LZ8D050320
# LOAD PAGE WITH SEARCH FORM
- walk:
    to: https://xn--90adear.xn--p1ai/check/auto/
    do:
    # РЕШАЕМ КАПЧУ
    - captcha_resolve:
        provider: 2captcha
        type: recaptchav3
        apikey: HERE YOU SHOULD SET YOUR OWN 2CAPTCHA API KEY
        sitekey: 6Lc66nwUAAAAANZvAnT-OK4f4D_xkdzw5MLtAYFL
        minscore: 0.3
        action: check_auto
    # GO TO BODY BLOCK ON THE BAGE TO SWITCH TO THE BLOCK CONTEXT
    - find:
        path: body
        do:
        # READ VARIABLE captcha TO THE REGISTER
        - variable_get: captcha
        # IF THERE IS TOKEN, CAPTCHA WAS RESOLVED AND WE CAN DO QUERY AND GET DATA
        - if:
            match: \S
            do:
            # DOING POST REQUEST TO GET THE DATA
            - walk:
                to:
                    post: https://xn--b1afk4ade.xn--90adear.xn--p1ai/proxy/check/auto/history
                    data:
                        vin: <%vin%>
                        captchaWord: ''
                        checkType: history
                        reCaptchaToken: <%captcha%>
                do:
                # PARSE DATA AND SAVE IT TO THE OBJECT
                - find:
                    path: requestresult
                    do:
                    - object_new: vehicle
                    - find:
                        path: vehicle>enginevolume
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: engine_volume
                    - find:
                        path: vehicle>color
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: color
                    - find:
                        path: vehicle>bodynymber
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: body_number
                    - find:
                        path: vehicle>year
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: year
                    - find:
                        path: vehicle>enginenumber
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: engine_number
                    - find:
                        path: vehicle>vin
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: vin
                    - find:
                        path: vehicle>model
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: model
                    - find:
                        path: vehicle>category
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: category
                    - find:
                        path: vehicle>type
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: type
                    - find:
                        path: vehicle>powerhp
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: power_hp
                    - find:
                        path: vehicle>powerkwt
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: power_kwt
                    - find:
                        path: vehiclepassport>number
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: passport_number
                    - find:
                        path: vehiclepassport>issue
                        do:
                        - parse
                        - object_field_set:
                            object: vehicle
                            field: passport_issued
                    - find:
                        path: ownershipperiod
                        do:
                        - object_new: ownership
                        - find:
                            path: from
                            do:
                            - parse
                            - object_field_set:
                                object: ownership
                                field: from
                        - find:
                            path: to
                            do:
                            - parse
                            - object_field_set:
                                object: ownership
                                field: to
                        - find:
                            path: simplepersontype
                            do:
                            - parse
                            - object_field_set:
                                object: ownership
                                field: owner_type
                        - find:
                            path: lastoperation
                            do:
                            - parse
                            - object_field_set:
                                object: ownership
                                field: operation_type
                        - object_save:
                            name: ownership
                            to: vehicle
                    - object_save:
                        name: vehicle

If we run it on the Diggernaut platform (of course, you will have to use your 2Captcha API key in the web scraper code), then we will get the following data (here’s an example of the dataset in JSON format)

[{
    "vehicle": {
        "category": "В",
        "color": "ЧЕРНЫЙ",
        "engine_number": "080571",
        "engine_volume": "2967.0",
        "model": "ФОЛЬКСВАГЕН ТОUАRЕG",
        "ownership": [
            {
                "from": "2008-05-22T00:00:00.000+04:00",
                "operation_type": "11",
                "owner_type": "Natural",
                "to": "2011-12-10T00:00:00.000+04:00"
            },
            {
                "from": "2011-12-10T00:00:00.000+04:00",
                "operation_type": "06",
                "owner_type": "Natural",
                "to": "2017-02-02T00:00:00.000+03:00"
            },
            {
                "from": "2017-02-02T00:00:00.000+03:00",
                "operation_type": "03",
                "owner_type": "Natural"
            }
        ],
        "passport_issued": "ТАМОЖНЯ: 10009191",
        "passport_number": "78ТУ623619",
        "power_hp": "224",
        "power_kwt": "164.8",
        "type": "21",
        "vin": "WVGZZZ7LZ8D050320",
        "year": "2008"
    }
}
]

Mikhail Sisin Follow Co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.

« Improving the functionality for working with geospatial data

New in Diggernaut: geo-targeting, captcha solver for Amazon and image processing »

Solving Google ReCaptcha v3: 2Captcha has been integrated to the Diggernaut platform

Albato integration – push your datasets anywhere

New in Diggernaut: expanded functionality to work with…

Learning how to scrape the data from eBay

Leave a Reply Cancel reply