Methods for Working with DOM
When a digger works with an HTML or XML document, it works with the document's DOM (Document Object Model) structure, in which the document is a tree of nodes. When you use the find method, you search through the nodes of the document, switch to each found node, and enter its block context. The current block (node) can contain nested (child) nodes, which can have child nodes of their own, and so on. In a block context, you can manipulate the nodes of the current block: delete them or replace them.
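To make the idea of nodes and child nodes concrete, here is a minimal Python sketch. It is not digger configuration and says nothing about how the digger is implemented internally; it only uses the standard `xml.etree.ElementTree` module to show the tree shape of a small, hypothetical fragment. The `FRAGMENT` string and the `describe` helper are illustrative names, not part of any digger API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment used only for illustration.
FRAGMENT = "<div><span>some text</span><a>some link</a><span>another text</span></div>"


def describe(node, depth=0):
    """Return one indented line per element, walking the tree depth-first."""
    lines = ["  " * depth + node.tag]
    for child in node:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(FRAGMENT)
tree_lines = describe(root)
print("\n".join(tree_lines))
# div
#   span
#   a
#   span
```

In this picture, finding `div` corresponds to standing on the top node, and the two `span` nodes and the `a` node are the child nodes available inside that block.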
Examples of commands you can use to manipulate nodes:
# DELETE ALL NON-TEXT CHILD NODES
- node_remove_all

# DELETE ALL `a` NODES
- node_remove: a

# REPLACE ALL `a` NODES WITH EMPTY `p` NODES
- node_replace:
    path: a
    with: <p></p>

# REPLACE ALL `a` NODES WITH THEIR CONTENTS
- node_replace:
    path: a
    with: content
Let's use the following HTML source as an example:
<div>
    <span>some text</span>
    <a>some link</a>
    <span>another text</span>
</div>
Examples of usage:
- find:
    path: div
    do:
    - node_remove_all
    - parse
    # REGISTER WILL BE EMPTY AS ALL NODES WERE REMOVED

- find:
    path: div
    do:
    - node_remove: span
    - parse
    # REGISTER VALUE: some link
    # BECAUSE ALL `span` NODES WERE REMOVED

- find:
    path: div
    do:
    - node_replace:
        path: span
        with: ' some text '
    - parse
    # REGISTER VALUE: " some text some link some text "
    # BECAUSE ALL `span` NODES WERE REPLACED WITH TEXT " some text "

- find:
    path: div
    do:
    - node_replace:
        path: span
        with: content
    - parse
    # REGISTER VALUE: some textsome linkanother text
    # BECAUSE ALL `span` NODES WERE REPLACED WITH THEIR CONTENTS
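The effect of removing and replacing nodes can also be imitated outside the digger. The following Python sketch is only an analogy built on the standard `xml.etree.ElementTree` module; it does not reproduce the digger's actual implementation. The helpers `append_text`, `replace_nodes`, `remove_all` and `text_of` are hypothetical names, the sample is written without whitespace between tags, and only direct children are matched, to keep the example short.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample HTML (assumption: no whitespace between tags).
SOURCE = "<div><span>some text</span><a>some link</a><span>another text</span></div>"


def append_text(parent, index, text):
    """Attach text at the position where the child at `index` used to start."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def replace_nodes(parent, tag, with_text=None):
    """Replace direct children named `tag` with text.

    with_text=None keeps each node's own text (similar to `with: content`);
    with_text="" simply drops the node (similar to `node_remove`).
    """
    index = 0
    while index < len(parent):
        child = parent[index]
        if child.tag != tag:
            index += 1
            continue
        text = "".join(child.itertext()) if with_text is None else with_text
        tail = child.tail or ""  # text that followed the removed node
        parent.remove(child)
        append_text(parent, index, text + tail)


def remove_all(parent):
    """Remove every element child, keeping only the parent's own text."""
    for child in list(parent):
        parent.remove(child)


def text_of(node):
    """Concatenate all text inside the node, roughly what `parse` would store."""
    return "".join(node.itertext())


results = {}

root = ET.fromstring(SOURCE)
remove_all(root)
results["remove_all"] = text_of(root)

root = ET.fromstring(SOURCE)
replace_nodes(root, "span", "")
results["remove_span"] = text_of(root)

root = ET.fromstring(SOURCE)
replace_nodes(root, "span", " some text ")
results["replace_with_text"] = text_of(root)

root = ET.fromstring(SOURCE)
replace_nodes(root, "span", None)
results["replace_with_content"] = text_of(root)

for name, value in results.items():
    print(f"{name}: {value!r}")
# remove_all: ''
# remove_span: 'some link'
# replace_with_text: ' some text some link some text '
# replace_with_content: 'some textsome linkanother text'
```

Because the replaced nodes contribute only their text, neighbouring pieces run together without separators, which is why no spaces appear between the fragments in the last result.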
In the next chapter, we will learn how to manipulate the attributes of a node.