New in Diggernaut: geo-targeting, captcha solver for Amazon and image processing

The January service update adds geo-targeting for selecting proxies by country or city, an in-house Amazon captcha solver with a high success rate, and image processing and export.

Users on a paid subscription can now use proxies from our pool that are tied to a specific country or city. Some sites geolocate visitors by client IP address and offer different services or prices depending on location, so this feature is useful whenever you need the data as it appears from a particular place. More information is available on our documentation page: Basic settings: Proxies.
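The idea of picking proxies by location can be sketched in a few lines of Python. This is only an illustration, not Diggernaut's configuration syntax or implementation: the `Proxy` record, the `select_proxies` helper, and the sample pool are all hypothetical, and the addresses come from an IP range reserved for documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Proxy:
    address: str   # host:port; the values below use reserved documentation IPs
    country: str   # ISO 3166-1 alpha-2 code, e.g. "US"
    city: str


def select_proxies(pool: List[Proxy], country: str, city: Optional[str] = None) -> List[Proxy]:
    """Return the proxies located in `country`, narrowed to `city` when given."""
    matches = [p for p in pool if p.country.lower() == country.lower()]
    if city is not None:
        matches = [p for p in matches if p.city.lower() == city.lower()]
    return matches


# Hypothetical pool, for illustration only.
POOL = [
    Proxy("203.0.113.10:8080", "US", "New York"),
    Proxy("203.0.113.11:8080", "US", "Chicago"),
    Proxy("203.0.113.12:8080", "DE", "Berlin"),
]

# Country-level targeting, then city-level targeting (matching is case-insensitive).
print([p.address for p in select_proxies(POOL, "US")])
print([p.address for p in select_proxies(POOL, "us", city="chicago")])
```

In the actual service the selection is made through the proxy settings described in the documentation; the sketch only shows the filtering logic that geo-targeting implies.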

Previously, we relied mainly on Tesseract-based OCR to solve captchas on Amazon sites. Because the characters in these captchas are slightly distorted and rotated, plain OCR recognized them poorly: the recognition rate was only 10-15%, which led to additional page requests on the client side. Last week we rolled out an entirely new method for solving Amazon’s captcha, with a recognition rate of about 97%. The algorithm was developed entirely in-house and does not require accounts with third-party captcha-solving services. All users running web scrapers in the cloud can use it free of charge. More information is available on our documentation page: Captcha: Bypassing captcha.
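To get a feel for what these recognition rates mean in extra requests, here is a rough back-of-the-envelope sketch. It assumes that every attempt succeeds independently with the same probability and that the scraper retries until the captcha is solved. This is a simplification made for illustration, not a description of how the solver or its retry logic works, and the `expected_attempts` helper is hypothetical.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of captcha attempts until the first success.

    Assumes each attempt succeeds independently with the same probability,
    so the attempt count follows a geometric distribution with mean 1/p.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


# Rates quoted in the text: 10-15% for plain OCR, about 97% for the new solver.
for label, rate in [("OCR, 10%", 0.10), ("OCR, 15%", 0.15), ("new solver, 97%", 0.97)]:
    print(f"{label}: about {expected_attempts(rate):.2f} attempts per solved captcha")
```

Under these assumptions, plain OCR needs roughly 7 to 10 attempts per solved captcha on average, while a 97% recognition rate brings that down to about 1.03.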

Your web scrapers can now process images and save them to your computer or to cloud storage. Saving to a local disk requires the compiled version of the web scraper. Saving to cloud storage works both from scrapers running in the cloud and from the compiled version. In addition to saving files, you can crop and resize images. To learn how, see our documentation: Images.

Mikhail Sisin: Co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.