Entity Manipulations

Link Pools

There are two commands for working with link pools: link_add adds a link to a specific pool, and pool_clear clears a given pool.

The pool_clear command clears the link pool with the given name. If no name is provided, it clears the default pool, whose name is "default".
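A minimal illustration of the two forms. Because the default pool is simply the pool named "default", the second line names it explicitly and should clear the same pool as the first. This equivalence is inferred from the description above rather than stated separately.

```yaml
# CLEAR POOL WITH NAME `default` (NO NAME GIVEN)
- pool_clear

# SAME POOL, NAMED EXPLICITLY (ASSUMED EQUIVALENT TO THE LINE ABOVE)
- pool_clear: default
```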

The link_add command adds a link to a pool. Depending on the context and the parameters used, the link is taken either from the register or given explicitly. In a block context you can add a link from the register. In other contexts the register is not available, so links must be given explicitly. The full list of parameters is given below:

Parameter   Description
pool        Pool name. If omitted, digger uses "default" as the pool name.
url         A single link or a list of links (see examples) to add to the pool explicitly. If omitted, the register value is used as the link.
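The usage examples below always pair an explicit url with a named pool. As a sketch of the remaining combination from the parameter table, omitting pool while giving url should add the link to the pool named "default". The URL is the same placeholder used in the examples.

```yaml
# EXPLICITLY ADD URL TO THE DEFAULT POOL (NO `pool` PARAMETER, SO "default" IS USED)
- link_add:
    url: http://www.somesite.com/somecoolurl
```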

Usage examples:

# CREATE BLOCK FROM HTML STRING
- register_set: '<body>
                  <a href="http://www.somesite.com/1">link1</a>
                  <a href="http://www.somesite.com/2">link2</a>
                  <a href="http://www.somesite.com/3">link3</a>
                  <a href="http://www.somesite.com/4">link4</a>
                </body>'
- to_block

# -------------------------------------------------------------
# FIND ALL `a` TAGS
- find:
    path: a
    do:
    # READ `href` ATTRIBUTE TO THE REGISTER
    - parse:
        attr: href
    # ADD LINK FROM REGISTER TO THE POOL (DEFAULT)
    - link_add

# ITERATE OVER LINKS IN THE POOL, LOAD PAGE AND EXECUTE `do` BLOCK
- walk:
    to: links
    do:
    ...
    ...
# CLEAR POOL WITH NAME `default`
- pool_clear

# -------------------------------------------------------------
# FIND ALL `a` TAGS
- find:
    path: a
    do:
    # READ `href` ATTRIBUTE TO THE REGISTER
    - parse:
        attr: href

    # ADD LINK FROM REGISTER TO THE POOL WITH NAME `main`
    - link_add:
        pool: main

# ITERATE OVER LINKS IN THE POOL `main`, LOAD PAGE AND EXECUTE `do` BLOCK
- walk:
    to: links
    pool: main
    do:
    ...
    ...
# CLEAR POOL WITH NAME `main`
- pool_clear: main

# -------------------------------------------------------------
# EXPLICITLY ADD URL http://www.somesite.com/somecoolurl TO THE POOL WITH NAME `somepool`
- link_add:
    pool: somepool
    url: http://www.somesite.com/somecoolurl

# -------------------------------------------------------------
# EXPLICITLY ADD LIST OF URLS TO THE POOL WITH NAME `somepool`
- link_add:
    pool: somepool
    url:
    - http://www.somesite.com/somecoolurl1
    - http://www.somesite.com/somecoolurl2
    - http://www.somesite.com/somecoolurl3
    - http://www.somesite.com/somecoolurl4

In the next chapter, we show you how to work with data objects.