Other Commands

Clearing Cache

If you load pages in unique mode, the digger stores the URL of each processed page in a special cache and skips (does not load) pages that are already in it, both in the current run and in all subsequent digger runs.

However, sometimes you need to delete a page from the cache. Because a page enters the cache automatically as soon as the digger tries to load it, the page_reload command cannot re-read that page if the proxy was blocked. To work around such situations, we added the link_remove command. It deletes the URL of the current page from the cache so that the digger can retrieve it again.

Usage example:

Bypassing proxy blocking in 'unique' mode

- walk:
    to: http://somesite.com/page.html
    mode: unique
    do:
    - find:
        path: body
        do:
        # CHECK IF PROXY IS BLOCKED
        - parse
        - if:
            match: "request has been blocked"
            do:
            # SWITCH PROXY
            - proxy_switch
            # DELETE CURRENT URL FROM THE CACHE
            - link_remove
            # RELOAD PAGE
            - page_reload
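
The interaction between the cache, link_remove and page_reload can be illustrated with a small Python model. This is only a sketch of the behavior described above, not Diggernaut's actual implementation: the `UniqueLoader` class, its method bodies and the `make_flaky_fetch` helper are hypothetical names invented for the illustration, and the fetcher simply pretends that the first request is blocked.

```python
class UniqueLoader:
    """Toy model of a 'unique' mode URL cache (not Diggernaut's real code)."""

    def __init__(self, fetch):
        self.fetch = fetch          # hypothetical callable: url -> page text
        self.cache = set()          # URLs the loader has already tried
        self.current_url = None

    def walk(self, url):
        """Load url unless it is already cached; return page text or None."""
        self.current_url = url
        if url in self.cache:
            return None             # skipped: already processed
        self.cache.add(url)         # cached as soon as a load is attempted
        return self.fetch(url)

    def link_remove(self):
        """Forget the current URL so it can be loaded again."""
        self.cache.discard(self.current_url)

    def page_reload(self):
        """Re-request the current URL; still subject to the cache check."""
        return self.walk(self.current_url)


def make_flaky_fetch():
    """Hypothetical fetcher: blocked on the first call, fine afterwards."""
    calls = {"n": 0}

    def fetch(url):
        calls["n"] += 1
        if calls["n"] == 1:
            return "request has been blocked"
        return "<body>real content</body>"

    return fetch


loader = UniqueLoader(make_flaky_fetch())
first = loader.walk("http://somesite.com/page.html")
skipped = loader.page_reload()      # None: the URL is already in the cache
loader.link_remove()
second = loader.page_reload()       # fetched again after removal
```

In this model a reload without link_remove is skipped because the URL was cached on the first attempt, which mirrors why the configuration above calls link_remove before page_reload.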

That's all we wanted to tell you about the meta-language of the Diggernaut service.
We hope that you will succeed in mastering it!

You can also visit our blog, where you will find useful articles and a large number of examples revealing the full power of our service and meta-language in particular.

If you still have any questions, feel free to contact us. We are always happy to help! Happy scraping!