Methods for Working with the DOM

Split the Block

In some cases, you may need to split the contents of a block into several blocks so that you can go into each of them and parse its contents separately. For example, some parameters of an item may be listed as comma-separated values within a single block, or separated by the <br> tag. The split command handles these cases. It works in one of two contexts, text or HTML. In the text context, it operates on the contents of all text nodes of the current block. In the HTML context, it operates on the HTML content of the current block. The command produces a new block, and the digger automatically switches into its context. Each piece of the split content becomes a div element with the classes splitted and element_N, where N is the zero-based index of the piece, so you can select individual pieces with the find command.
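To make the structure of the new block concrete, the following Python sketch wraps a list of already-split pieces into div elements with the splitted and element_N classes. It is not the digger's implementation. The function name wrap_pieces is hypothetical. The choice to HTML-escape pieces that come from the text context is also an assumption made for this example.

```python
from html import escape


def wrap_pieces(pieces, escape_text=True):
    """Hypothetical helper: wrap each piece in <div class="splitted element_N">.

    Assumption: pieces from the text context are plain text and get
    HTML-escaped; pieces from the HTML context are already markup and
    should be passed with escape_text=False so they are inserted as-is.
    """
    divs = []
    for index, piece in enumerate(pieces):
        body = escape(piece) if escape_text else piece
        divs.append(f'<div class="splitted element_{index}">{body}</div>')
    return "\n".join(divs)


print(wrap_pieces(["Some", "other", "text with comma and newline"]))
```

For this input, the printed markup is the same three div elements listed in the comments of the first usage example.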

The command accepts the following parameters:

Parameter   Description
context     Defines the context in which the command works: text or html.
delimiter   The separator used to split the contents into blocks.
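The two parameters can be modelled as a small, validated record. This Python sketch is hypothetical. The class name SplitParams is made up for illustration. The rule that the delimiter must be non-empty is an assumption, not documented behavior. The only restriction taken from the table is that context is either text or html.

```python
from dataclasses import dataclass

# Values allowed for the context parameter, per the parameter table.
VALID_CONTEXTS = ("text", "html")


@dataclass(frozen=True)
class SplitParams:
    """Hypothetical model of the split command's two parameters."""

    context: str
    delimiter: str

    def __post_init__(self):
        if self.context not in VALID_CONTEXTS:
            raise ValueError(
                f"context must be one of {VALID_CONTEXTS}, got {self.context!r}"
            )
        # Assumption: an empty delimiter is rejected
        # (Python's str.split("") would raise ValueError anyway).
        if not self.delimiter:
            raise ValueError("delimiter must be a non-empty string")


text_split = SplitParams(context="text", delimiter=",")
html_split = SplitParams(context="html", delimiter="<br/>")
```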

Let's use the following HTML source:

    <div>
        <p>Some text</p>
        <br/>
        <p>Some,other,text with
        comma
        and
        newline</p>
    </div>

Usage examples:

    - find:
        path: div > p:contains(",")
        do:
        - split:
            context: text
            delimiter: ","

        # AT THIS MOMENT WE ARE IN A NEW BLOCK
        # CREATED BY THE `split` COMMAND.
        # THIS BLOCK HAS THE FOLLOWING HTML CONTENT:
        # <div class="splitted element_0">Some</div>
        # <div class="splitted element_1">other</div>
        # <div class="splitted element_2">text with comma and newline</div>

        # LET'S USE THE FOLLOWING CSS SELECTOR TO SELECT THE LAST SPLIT BLOCK
        - find:
            path: .splitted
            slice: -1
            do:
            - parse

            # REGISTER VALUE: text with comma and newline
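To see why the text-context example registers "text with comma and newline", here is a minimal Python model of the text context. It is a sketch under stated assumptions, not the digger's code. split_text_context and TextNodeCollector are hypothetical names. The contents of all text nodes are concatenated before splitting, and whitespace inside each piece is collapsed to single spaces to match the values shown in the example comments.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collects the contents of every text node, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def split_text_context(block_html, delimiter):
    """Hypothetical model of split with context: text."""
    collector = TextNodeCollector()
    collector.feed(block_html)
    collector.close()  # flush any buffered text
    text = "".join(collector.chunks)
    # Assumption: whitespace runs (including newlines) collapse to one space.
    return [re.sub(r"\s+", " ", piece).strip() for piece in text.split(delimiter)]


# The block selected by div > p:contains(",") in the example.
paragraph = """<p>Some,other,text with
    comma
    and
    newline</p>"""

pieces = split_text_context(paragraph, ",")
print(pieces)      # ['Some', 'other', 'text with comma and newline']
print(pieces[-1])  # the piece that slice: -1 selects
```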
              
    - find:
        path: div
        do:
        - split:
            context: html
            delimiter: <br/>

        # AT THIS MOMENT WE ARE IN A NEW BLOCK
        # CREATED BY THE `split` COMMAND.
        # THIS BLOCK HAS THE FOLLOWING HTML CONTENT:
        # <div class="splitted element_0"><p>Some text</p></div>
        # <div class="splitted element_1"><p>Some,other,text with comma and newline</p></div>

        # LET'S USE THE FOLLOWING CSS SELECTOR TO SELECT THE LAST SPLIT BLOCK
        - find:
            path: .splitted
            slice: -1
            do:
            - parse

            # REGISTER VALUE: Some,other,text with comma and newline
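The HTML-context example can be modelled the same way. In this hypothetical Python sketch, split_html_context splits the raw markup of the div on the <br/> string. text_of is a naive stand-in for what parse extracts from the last piece. Both names are made up for illustration, and the tag-stripping regex is only adequate for simple markup like this sample.

```python
import re


def split_html_context(inner_html, delimiter):
    """Hypothetical model of split with context: html.

    Because this model splits the markup as a plain string, the delimiter
    has to match the markup exactly (here "<br/>").
    """
    return [piece.strip() for piece in inner_html.split(delimiter)]


def text_of(fragment):
    """Naive stand-in for parse: drop tags and collapse whitespace."""
    without_tags = re.sub(r"<[^>]+>", "", fragment)
    return re.sub(r"\s+", " ", without_tags).strip()


# Inner HTML of the <div> from the sample source.
div_inner_html = """
    <p>Some text</p>
    <br/>
    <p>Some,other,text with
    comma
    and
    newline</p>
"""

pieces = split_html_context(div_inner_html, "<br/>")
print(pieces[0])            # <p>Some text</p>
print(text_of(pieces[-1]))  # Some,other,text with comma and newline
```

Unlike the text context, the comma-separated paragraph survives here as a single piece, because only the <br/> markup acts as the separator.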
              

In the next chapter, we will learn how to split the contents of a block into blocks using sequences.