Methods for Navigation

Find

The find command is used to navigate the DOM structure of the loaded document. You can use CSS selectors to find specific elements.

Key points about the find command:

  1. Each time the digger executes the find command and enters a found element, it creates a new block context.
  2. If the CSS selector matches more than one element, the digger visits all found elements one after another and executes the commands from the do parameter for each of them.
  3. If the digger finds several elements, you can limit the selection with the slice parameter.
  4. The CSS selector can contain variables, arguments, and even regular expressions.
  5. You can merge the found elements into a single element with the merge parameter.

Below is a series of examples with various CSS selectors:

```yaml
# SEARCHING BY TAG
- find:
    path: body > div
    do:

# SEARCHING BY CLASS
- find:
    path: body > .someclass
    do:

# SEARCHING BY TAG WITH CLASS
- find:
    path: body > div.someclass
    do:

# SEARCHING BY ID. SUCH A SELECTOR MUST BE ENCLOSED IN QUOTES, OTHERWISE YAML TREATS # AS THE START OF A COMMENT
- find:
    path: 'body > #someid'
    do:

# SEARCHING BY TAG WITH ID. ENCLOSE SUCH A SELECTOR IN QUOTES AS WELL
- find:
    path: 'body > div#someid'
    do:

# SEARCHING BY SEVERAL SELECTORS. ALL MATCHED ELEMENTS ARE RETURNED AS ONE SELECTION
- find:
    path: 'body > #someid, body > .someclass, body > div'
    do:

# USING THE :has SELECTOR. FINDS ALL ELEMENTS WITH CLASS someclass THAT CONTAIN A NESTED td
- find:
    path: body > .someclass:has(td)
    do:

# USING THE :contains SELECTOR. FINDS ALL ELEMENTS WITH CLASS someclass THAT CONTAIN A NESTED td WHOSE TEXT CONTAINS sometext
- find:
    path: body > .someclass:has(td:contains('sometext'))
    do:

# USING THE :matches SELECTOR. FINDS ALL ELEMENTS WITH CLASS someclass THAT CONTAIN A NESTED td WHOSE TEXT MATCHES THE GIVEN REGULAR EXPRESSION
- find:
    path: body > .someclass:has(td:matches(^\s*sometext\s*$))
    do:

# SUBSTRING ATTRIBUTE SELECTOR [attr*=value]. FINDS ALL ELEMENTS WITH CLASS someclass WHOSE id CONTAINS 123
- find:
    path: body > .someclass[id*="123"]
    do:

# USING SLICES. SEARCHES BY CLASS AND KEEPS ALL FOUND ELEMENTS EXCEPT THE FIRST.
# ALLOWED SLICE NOTATION: slice: 1, slice: -1, slice: 0:1, WHERE 0 MEANS THE FIRST ELEMENT,
# -1 MEANS THE LAST ELEMENT AND ":" IS USED TO SET A RANGE
- find:
    path: body > .someclass
    slice: 1:-1
    do:

# SEARCHING BY CLASS, THEN MERGING ALL FOUND ELEMENTS INTO ONE BLOCK, WITH EACH ONE WRAPPED IN A p TAG
- find:
    path: body > .someclass
    merge: p
    do:

# SEARCHING IN THE ENTIRE DOCUMENT FROM ANY BLOCK
- find:
    path: body > div
    do:
    - find:
        # THIS FLAG TELLS THE DIGGER TO SEARCH THE ENTIRE DOCUMENT INSTEAD OF ONLY THE CURRENT BLOCK
        in: doc
        path: body > ul > li
        do:
```
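The :matches pseudo-class tests an element's text against a regular expression, while :contains only looks for a substring. The sketch below illustrates the difference for the pattern used above. It uses Python's re module as a stand-in for the digger's regex engine (an assumption; for a simple pattern like this one the result should be the same), and the helper names and sample strings are hypothetical.

```python
import re

# Pattern from the :matches example: the whole text must be "sometext",
# optionally surrounded by whitespace.
MATCHES_PATTERN = re.compile(r"^\s*sometext\s*$")


def matches(text: str) -> bool:
    """Hypothetical model of td:matches(...): a regex search on the text."""
    return MATCHES_PATTERN.search(text) is not None


def contains(text: str) -> bool:
    """Hypothetical model of td:contains('sometext'): a plain substring test."""
    return "sometext" in text


samples = ["sometext", "  sometext \n", "sometext and more", "other"]
for s in samples:
    print(f"{s!r:24} matches={matches(s)} contains={contains(s)}")
```

Only the first two samples satisfy the anchored pattern, while the first three satisfy the substring test.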
          

The find command supports the following parameters:

| Parameter | Description |
|---|---|
| path | CSS selector the digger uses to search for DOM elements. If the command is called from a block context, the search is performed only in the current block, so the selector must be relative to that block. If the command is called from the page context, the search is performed in the entire document. |
| form | CSS selector the digger uses to search for form elements. The same scope rules apply as for path: only the current block in a block context, the entire document in the page context. If the search is successful, the digger switches to the form context. |
| in | Special flag that sets where to perform the search. Currently it supports only one value, doc, which is used when you need to search in the entire document. |
| merge | Combines the content of all found elements into one block and enters it, so further commands are performed on this block. If a non-empty string is passed, it is treated as a tag name and the content of every found element is enclosed in this tag. |
| slice | Limits the selection when several elements are found. Accepts a single index (1, -1) or a range (0:1, 1:-1), where 0 is the first element and -1 is the last. |
| do | Commands the digger executes in the block context of each found element (or of the merged block when merge is used). |

Now let's look at more detailed examples of using the find command. As the source HTML, we will use the following fragment:

```html
<ul>
  <li>Some text</li>
  <li>Some other text</li>
  <li>Some other other text</li>
</ul>
```
          

Here is how the digger goes through the found elements and what the block contains at each iteration step:

The script:

```yaml
- find:
    path: li
    do:
    - parse
```

First iteration:

```yaml
- find:
    path: li
    do:
    # NOW WE ARE IN THE FIRST `<li>` ELEMENT BLOCK: <li>Some text</li>
    # REGISTER CONTENT: ""
    - parse
    # REGISTER CONTENT: "Some text"
```

Second iteration:

```yaml
- find:
    path: li
    do:
    # NOW WE ARE IN THE SECOND `<li>` ELEMENT BLOCK: <li>Some other text</li>
    # REGISTER CONTENT: ""
    - parse
    # REGISTER CONTENT: "Some other text"
```

Third iteration:

```yaml
- find:
    path: li
    do:
    # NOW WE ARE IN THE THIRD `<li>` ELEMENT BLOCK: <li>Some other other text</li>
    # REGISTER CONTENT: ""
    - parse
    # REGISTER CONTENT: "Some other other text"
```
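The iteration above can be modelled in a few lines of Python. This is not how the digger is implemented: it is a sketch that assumes xml.etree.ElementTree can parse the fragment (it is well-formed XML), uses findall as a stand-in for the CSS selector li, and uses a hypothetical parse helper that mimics the register content shown in the comments by joining the stripped text nodes of a block.

```python
import xml.etree.ElementTree as ET

HTML = """<ul>
  <li>Some text</li>
  <li>Some other text</li>
  <li>Some other other text</li>
</ul>"""


def parse(block: ET.Element) -> str:
    """Hypothetical model of `parse`: concatenate the block's text,
    dropping the whitespace-only gaps between tags."""
    return "".join(part.strip() for part in block.itertext())


root = ET.fromstring(HTML)  # root is the <ul> element

registers = []
for li in root.findall("li"):  # stand-in for `find: path: li`
    # Each found element gets its own block; its register starts empty
    # and `parse` fills it with the block's text.
    registers.append(parse(li))

print(registers)
print(parse(root))  # what `parse` would give for the whole `ul` block
```

The first line lists one register value per iteration; the second matches the register content shown for a search of the whole ul block.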
            

More usage examples:

```yaml
# SIMPLE SEARCH OF THE `ul` TAG
- find:
    path: ul
    do:
    # NOW WE ARE IN THE `ul` BLOCK
    - parse
    # REGISTER CONTENT: "Some textSome other textSome other other text"
```
              
```yaml
# SEARCH OF `li` TAGS INSIDE THE `ul` TAG
- find:
    path: ul > li
    # KEEP ONLY THE FOUND ELEMENT WITH INDEX 1 (THE SECOND ONE)
    slice: 1
    do:
    # NOW WE ARE IN <li>Some other text</li>
    - parse
    # REGISTER CONTENT: "Some other text"
```
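To make the slice notation concrete, here is a hypothetical Python model of it. It assumes that a single index selects one element, that negative indexes count from the end (-1 is the last element), and that both ends of a range are inclusive. The inclusive end is an inference from the statement that slice: 1:-1 keeps every element except the first; it is not confirmed for other ranges, so check ranges such as 0:1 against the digger's actual behavior.

```python
def apply_slice(elements: list, spec: str) -> list:
    """Hypothetical model of the `slice` parameter (not the digger's code).

    ASSUMPTION: range ends are inclusive, so "1:-1" keeps every element
    except the first, as the documentation describes.
    """
    n = len(elements)

    def norm(i: int) -> int:
        # Negative indexes count from the end: -1 is the last element.
        return i + n if i < 0 else i

    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
        start, end = norm(int(start_text)), norm(int(end_text))
        return elements[start:end + 1]

    index = norm(int(spec))
    return [elements[index]] if 0 <= index < n else []


items = ["Some text", "Some other text", "Some other other text"]
print(apply_slice(items, "1"))     # ['Some other text']
print(apply_slice(items, "-1"))    # ['Some other other text']
print(apply_slice(items, "1:-1"))  # ['Some other text', 'Some other other text']
```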
              
```yaml
# SEARCH OF `li` TAGS INSIDE THE `ul` TAG, MERGED INTO ONE BLOCK WITH EACH ITEM WRAPPED IN `p`
- find:
    path: ul > li
    merge: p
    do:
    # THE NEW (CURRENT) BLOCK NOW HAS THE FOLLOWING DOM STRUCTURE:
    #   <p>Some text</p>
    #   <p>Some other text</p>
    #   <p>Some other other text</p>
    - parse
    # REGISTER CONTENT: "Some textSome other textSome other other text"
```

```yaml
# SEARCH OF `li` TAGS INSIDE THE `ul` TAG, MERGED INTO ONE BLOCK WITHOUT A WRAPPER TAG
- find:
    path: ul > li
    merge: ""
    do:
    # THE NEW (CURRENT) BLOCK NOW HAS THE FOLLOWING DOM STRUCTURE:
    #   Some textSome other textSome other other text
    - parse
    # REGISTER CONTENT: "Some textSome other textSome other other text"
```
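The merge behavior can be sketched in Python as well. The sketch models only what the comments above show: the content of every found element is collected into one new block, wrapped in the given tag when the tag name is non-empty. It is a hypothetical model built on xml.etree.ElementTree, with a made-up <block> wrapper element and a simplified, hypothetical parse helper that joins the stripped text nodes; it is not the digger's implementation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

HTML = """<ul>
  <li>Some text</li>
  <li>Some other text</li>
  <li>Some other other text</li>
</ul>"""


def inner_xml(el: ET.Element) -> str:
    """Serialize an element's content without the element's own tag."""
    parts = [escape(el.text or "")]
    for child in el:
        # tostring() also writes the child's tail text, which belongs to el's content.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def merge(elements: list, tag: str) -> ET.Element:
    """Hypothetical model of `merge`: combine the content of all found elements
    into one block, wrapping each element's content in `tag` if it is non-empty."""
    if tag:
        body = "".join(f"<{tag}>{inner_xml(e)}</{tag}>" for e in elements)
    else:
        body = "".join(inner_xml(e) for e in elements)
    # A neutral wrapper element makes the merged content parse as a single block.
    return ET.fromstring(f"<block>{body}</block>")


def parse(block: ET.Element) -> str:
    """Hypothetical `parse` model: join the stripped text nodes of the block."""
    return "".join(part.strip() for part in block.itertext())


found = ET.fromstring(HTML).findall("li")  # stand-in for `path: ul > li`

with_p = merge(found, "p")
print([p.text for p in with_p])  # ['Some text', 'Some other text', 'Some other other text']
print(parse(with_p))             # Some textSome other textSome other other text

without_tag = merge(found, "")
print(parse(without_tag))        # Some textSome other textSome other other text
```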
                

As mentioned earlier, in a block context you can search only inside the current block and find only the elements it contains. Therefore, your CSS selector must be relative to your current location (block). Let's take a closer look at this using the following example:

```html
<body>
<div>
  <p>Some text</p>
  <p>Some other text </p>
  <p>Some other other text</p>
</div>
<div>
  <a>Some text</a>
  <a>Some other other text</a>
</div>
</body>
```
          

If we search with the body > div CSS selector, we find two blocks:

```yaml
- find:
    path: body > div
    do:
    # THIS SELECTOR FINDS 2 `div` ELEMENTS
    # First block:
    # <div>
    #    <p>Some text</p>
    #    <p>Some other text </p>
    #    <p>Some other other text</p>
    # </div>

    # Second block:
    # <div>
    #    <a>Some text</a>
    #    <a>Some other other text</a>
    # </div>

    # LET'S SELECT `a` INSIDE EACH `div`
    - find:
        path: a
        do:
```
          

The script works as follows:

  1. The digger finds all div blocks using the first CSS selector and, for each found block, executes the commands passed in the do parameter.
  2. The digger enters the first div block and tries to find elements using the a CSS selector.
  3. Nothing matches this selector in the first block, so the digger does nothing and leaves the first div block.
  4. The digger enters the second div block and tries to find elements using the a CSS selector.
  5. There are two a elements in this block, so the digger enters each of them and executes the commands passed in the do parameter.
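The five steps above amount to a nested loop in which the inner search is scoped to the current block. Here is a minimal Python sketch of that scoping for the same HTML fragment. It uses xml.etree.ElementTree paths as stand-ins for the CSS selectors (body > div becomes findall("div") on the <body> root, and the descendant selector a becomes findall(".//a")); it models the behavior described here, not the digger's code.

```python
import xml.etree.ElementTree as ET

HTML = """<body>
<div>
  <p>Some text</p>
  <p>Some other text </p>
  <p>Some other other text</p>
</div>
<div>
  <a>Some text</a>
  <a>Some other other text</a>
</div>
</body>"""

body = ET.fromstring(HTML)  # the document root is <body>

found_per_block = {}
for number, div in enumerate(body.findall("div"), start=1):  # body > div
    # Inside a block context only the current div is searched, so the
    # relative selector `a` becomes a descendant search from this div.
    links = div.findall(".//a")
    found_per_block[number] = [a.text for a in links]

for number, texts in found_per_block.items():
    print(f"div #{number}: {texts}")
```

The first div yields an empty list (step 3), and the second yields both a elements (step 5).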

If, while in a specific block context, you need to find elements outside the scope of the current block, use the in parameter with the value doc to search the entire document:

```yaml
- find:
    path: body > div > ul > li
    do:
    - find:
        # INDICATE THAT WE ARE GOING TO SEARCH IN THE ENTIRE DOCUMENT
        in: doc
        path: body > div > a
        do:
```
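The effect of in: doc is a change of search scope: the same path is evaluated against the whole document instead of the current block. The sketch below models this in Python with a hypothetical find helper and a made-up HTML document, since the example above does not show one. xml.etree.ElementTree paths stand in for the CSS selectors, and none of this is the digger's actual implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the section does not show the HTML for this example.
HTML = """<html><body>
<div>
  <ul><li>First item</li><li>Second item</li></ul>
  <a>Some link</a>
</div>
</body></html>"""

root = ET.fromstring(HTML)  # the whole document


def find(block: ET.Element, path: str, in_doc: bool = False) -> list:
    """Hypothetical model of `find`: search the current block, or the whole
    document when in_doc is True (the counterpart of `in: doc`)."""
    scope = root if in_doc else block
    return scope.findall(path)


rows = []
for li in find(root, "body/div/ul/li"):            # body > div > ul > li
    local = find(li, "body/div/a")                  # scoped to the <li> block
    doc_wide = find(li, "body/div/a", in_doc=True)  # same path with in: doc
    rows.append((li.text, [a.text for a in local], [a.text for a in doc_wide]))

for row in rows:
    print(row)
```

Without the flag, the path finds nothing inside an li block; with it, the link is found from every iteration.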
          

In the next chapter, we'll take a closer look at using the form parameter of the find command.