Mikhail Sisin, co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.

Diggernaut’s May updates: Leaving the beta

May 13, 2019

We are pleased to announce that, as of May 2019, we are officially out of beta. For us, and we hope for you too, this is a big milestone that we have been working toward for two and a half years. To mark the occasion, we have prepared several updates that we think you will enjoy.

Automated data export with Diggernaut.io

If you have a paid Diggernaut subscription, you can use the automatic export service for the collected data. When a digger finishes its run, it checks whether you have configured export profiles and, if it finds any, sends a command to the Diggernaut.io export service to process the dataset. You can associate multiple export profiles with a single digger, so the same dataset can be exported to several destinations at once.

Currently, you can export data to public feeds, which are accessible via a static URL in the selected format (CSV, JSON, NDJSON, XML, HTML, TEXT). Such feeds can be used in various widgets on your site. Unlike data retrieved via the API, feeds do not require authorization, so you can work with them directly from a web page using JavaScript.
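As a rough illustration of consuming such a feed outside the browser, here is a minimal Python sketch. The feed URL you pass in, the function names `parse_feed` and `fetch_feed`, and the record layout (a JSON array of objects, one object per NDJSON line, a header row in CSV) are all assumptions made for the example, not something taken from the Diggernaut documentation.

```python
import csv
import io
import json
from urllib.request import urlopen


def parse_feed(body: str, fmt: str) -> list:
    """Turn the text of a feed into a list of records.

    Only three of the listed formats are handled here, and the record
    layout for each of them is an assumption.
    """
    fmt = fmt.lower()
    if fmt == "json":
        data = json.loads(body)
        # Wrap a single object so callers always get a list back.
        return data if isinstance(data, list) else [data]
    if fmt == "ndjson":
        # One JSON document per non-empty line.
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    if fmt == "csv":
        # The first row is assumed to hold the field names.
        return list(csv.DictReader(io.StringIO(body)))
    raise ValueError(f"unsupported feed format: {fmt}")


def fetch_feed(url: str, fmt: str) -> list:
    """Download a public feed (no authorization header is sent) and parse it."""
    with urlopen(url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"), fmt)
```

You would call `fetch_feed` with the static URL shown for your own feed and the format you selected for it, for example `fetch_feed(my_feed_url, "ndjson")`.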

There is also an option to export data to a Google spreadsheet. The data can be written either to the main sheet of the document or to a new sheet each time a dataset is exported.

If you have your own API, or you use another service that has one, you can configure the export to send the data to a custom webhook. In this case, the service sends each new dataset to the webhook URL using a standard HTTP request with the settings you specified when creating the export script.
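To give an idea of what the receiving side of a webhook might look like, here is a minimal sketch using only the Python standard library. It assumes the export is configured to send a POST request with a JSON body; the actual method, headers, and payload depend on the settings you choose, and the names `decode_dataset`, `WebhookHandler`, `make_server`, and `RECEIVED` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # datasets collected so far (kept in memory, for illustration only)


def decode_dataset(body: bytes) -> list:
    """Decode a request body, assumed to be a JSON array of records."""
    data = json.loads(body.decode("utf-8"))
    return data if isinstance(data, list) else [data]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            records = decode_dataset(self.rfile.read(length))
        except ValueError:
            # Covers both invalid JSON and invalid UTF-8.
            self.send_response(400)
            self.end_headers()
            return
        RECEIVED.append(records)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(port: int = 0) -> HTTPServer:
    """Create a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Running `make_server(8080).serve_forever()` would start the receiver locally. A real endpoint would also need to be reachable from the internet and should verify where each request comes from.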

Finally, you can email a direct download link for the dataset to any email address registered in your Diggernaut account. These links do not require authorization and are available to anyone who has the link. The main difference from public feeds is that a feed stores data only for the last session, but with no time limit (as long as your account has an active paid subscription). A download link works for a limited time, while the session is kept on our side (from 7 to 30 days, depending on the subscription plan). Download links are session-based, so each session gets its own download link.

Right now we are testing data export modules for Shopify and PrestaShop, and soon they will be available to all our subscribers. With them, you will be able to upload products directly to your online store, as well as synchronize availability and prices.

More information about the service can be found in our documentation: Diggernaut.io Service.

Web User Interface

Several useful changes have been made to the web interface. For example, you can now add URLs for the digger to visit without editing the digger’s configuration. If you are not well versed in the YAML format, this lets you manage these URLs more easily. Read more about it here: URL list.

The second useful feature is an option to manage the cache of URLs and data records. If you use a digger in “unique” or “update” mode, it is sometimes necessary to reset the cache and start collecting data from the beginning. Now you can do it yourself from your account. To learn how, see the page: Clearing the cache.

We also added HTML highlighting for the content of blocks and pages in the log, which makes the log in debug mode more readable. This feature can be disabled if you do not need it.

Diggers

The digger framework also received some improvements and new features. For example, data fields in a dataset can now be Boolean, holding true/false values. In some cases this is very useful, especially if you are working with an API or webhooks and the software on your side requires Boolean fields.

The functions for saving images and files can now work with FTP. You can transfer binary data directly to your server if it has an FTP server installed. More information about this option can be found on the Images and Files pages.

If you use Selenium in your diggers, we are happy to inform you that we have changed how we charge your account for Selenium usage. Instead of a tenfold cost for each page request, the system now charges a number of credits that depends on how many requests the browser made to load a specific page. Moreover, only resources that returned response code 200 are charged. We also significantly reworked the Selenium functionality. Now you can fill out and submit forms, click on page elements and links, and switch the engine from Selenium to Surf and back at any time with a special command. This lets you work more efficiently with different sources within the same digger. Want to know more? Read Turn on Selenium.
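The difference between the two charging models can be sketched with a little arithmetic. The example below assumes a regular page request costs one credit and that each charged browser request also costs one credit; the actual rates are not stated here, so treat both numbers and both function names as placeholders.

```python
def old_selenium_cost(base_cost: int = 1) -> int:
    """Previous model: a flat tenfold cost per page request (base cost assumed)."""
    return 10 * base_cost


def new_selenium_cost(status_codes, credits_per_request: int = 1) -> int:
    """New model: only browser requests answered with code 200 are charged.

    The rate of one credit per charged request is an assumption.
    """
    charged = sum(1 for code in status_codes if code == 200)
    return credits_per_request * charged


# A hypothetical page load where the browser made six requests,
# four of which returned response code 200.
statuses = [200, 200, 404, 200, 301, 200]
print(old_selenium_cost())          # 10
print(new_selenium_cost(statuses))  # 4
```

Under these assumptions a light page becomes cheaper than before, while a page that loads many resources successfully could cost more than the old flat rate.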
