While fetching pages from a site, the server may return an incomplete page or, for example, block your proxy if you fetch pages too fast. To handle such situations, we added the page_reload command. It reloads the current page and works in both block and page contexts.
You might also need to reread the page content from the client, for example when you use a headless browser as the client and have to wait until all scripts on the page finish rendering it. To do this, use the page_reread command.
- find:
    path: body
    do:
    # CHECK IF PROXY IS BLOCKED
    - parse
    - if:
        # PLEASE NOTE THAT TEXT "request has been blocked" MAY BE DIFFERENT
        # IN YOUR CASE, AS IT DEPENDS ON SOURCE SITE
        # IN SUCH CASES YOU JUST NEED TO USE TEXT YOUR WEBSITE USES
        # TO TELL CLIENT THAT REQUEST HAS BEEN BLOCKED
        match: "request has been blocked"
        do:
        # SWITCHING PROXY
        - proxy_switch
        # RELOADING PAGE
        - page_reload
- walk:
    to: http://somesite.com/page.html
    do:
    # OUR PAGE IS RENDERED WITH JS IN A FEW SECONDS AFTER LOADING, SO LET'S WAIT FOR 5 SEC
    - sleep: 5
    # REREAD PAGE CONTENT
    - page_reread
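The two commands can also be used in the same walk. The following is only a sketch that combines the two examples above, using the same commands and nothing else. The URL and the "request has been blocked" text are placeholders, and how the digger continues after page_reload is not described here, so treat it as an illustration rather than a tested configuration.

```yaml
- walk:
    to: http://somesite.com/page.html
    do:
    # WAIT FOR JS RENDERING, THEN REREAD PAGE CONTENT FROM THE CLIENT
    - sleep: 5
    - page_reread
    - find:
        path: body
        do:
        # CHECK IF PROXY IS BLOCKED (PLACEHOLDER TEXT, USE WHAT YOUR SITE RETURNS)
        - parse
        - if:
            match: "request has been blocked"
            do:
            # SWITCH PROXY AND RELOAD THE CURRENT PAGE
            - proxy_switch
            - page_reload
```

The idea is that page_reread only refreshes the content already held by the client, while page_reload fetches the page again, so the reread comes first and the reload is reserved for the blocked case.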
In the next chapter, you'll learn how to remove a URL from the cache of loaded pages.