Images

Switch to Image Context

Sometimes you need to process an image from a website and save the result to cloud storage or a local drive. To do this, switch to the image context using the image command. This command works only in the block context and requires a Base64-encoded image in the register. The following image types are currently supported: jpg, png, gif, webp, tif and bmp.

Switching to the image context usually follows the same pattern: load the image, go to the block containing the encoded image, scrape the contents of the block to the register, and switch to the image context:

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # PROCESS THE IMAGE

Process the Image

While in the image context, you can process the image, for example, resize or crop it. The available commands are described below:

resize: Resizes the image to the given size using the given resampling algorithm.

Parameter   Description
width       The width of the new image, in pixels (0 if omitted). If 0, the image is scaled to the given height, keeping the aspect ratio.
height      The height of the new image, in pixels (0 if omitted). If 0, the image is scaled to the given width, keeping the aspect ratio.
resampling  The resampling algorithm: lanczos (Lanczos, used by default if omitted), bicubic (Bicubic), bilinear (Bilinear), box (Box) or nn (Nearest Neighbor).

resize_to_fit: Resizes the image to fit into a rectangle of the specified size using the specified resampling algorithm, keeping the aspect ratio.

Parameter   Description
width       The width of the rectangle, in pixels (0 if omitted).
height      The height of the rectangle, in pixels (0 if omitted).
resampling  The resampling algorithm, with the same values as for resize (lanczos if omitted).

crop: Crops a frame of the given width and height out of the image, starting at the given origin point.

Parameter   Description
x           The X coordinate of the origin point of the frame (0 if omitted).
y           The Y coordinate of the origin point of the frame (0 if omitted).
width       The width of the frame, in pixels.
height      The height of the frame, in pixels.
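
A minimal sketch of resize_to_fit, assuming it is used inside the image context in the same way as resize. Only the image step is shown; the load, find and parse steps are the same as in the example in "Switch to Image Context", and this fragment takes the place of the image step inside the do block of find. The 100 x 100 px rectangle is an arbitrary value chosen for illustration.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE;
# THIS STEP GOES INSIDE THE do BLOCK OF find
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # FIT THE IMAGE INTO A 100 x 100 PX RECTANGLE, KEEPING THE ASPECT RATIO
    - resize_to_fit:
        width: 100
        height: 100
```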

The example below makes a thumbnail of the image while preserving its aspect ratio:

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # RESIZE IMAGE TO THE THUMBNAIL WITH WIDTH OF 50 PX
            - resize:
                width: 50
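
The crop command can be combined with resize, for example to turn a fragment of the image into a small square thumbnail. This is a sketch under two assumptions: that the commands in the do block of the image context are applied to the image one after another, and that the source image is at least 100 px in each dimension. The frame size and the 50 px result are arbitrary values; x and y are set explicitly here only to show them, since they default to 0.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # CUT A 100 x 100 PX FRAME STARTING AT THE ORIGIN POINT (0, 0)
    - crop:
        x: 0
        y: 0
        width: 100
        height: 100
    # THEN SCALE THE CROPPED SQUARE DOWN TO 50 x 50 PX
    - resize:
        width: 50
        height: 50
```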

Save the Image

After processing, the image needs to be saved. This is done with the save command. You can save the image back to the register, to a local file (available only in compiled diggers), to cloud storage (Amazon S3 and Yandex Object Storage are currently supported) or to an FTP server.

The command supports the following parameters:

Parameter  Description
ext        The extension that defines the type of the saved file. If omitted, the original type and extension are used. The following extensions are currently supported for saving: jpg, png, gif, tif and bmp.
to         The type of storage. The following types are currently supported: register, file, s3, yandex and ftp.

The register type takes no additional parameters. The image is encoded to Base64 and saved to the register.

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # SAVE IMAGE AS JPG TO THE REGISTER
            - save:
                ext: jpg
                to: register
        # NOW WE HAVE AN IMAGE IN THE REGISTER
        # SAVE IT TO THE VARIABLE
        - variable_set: newimage
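
Processing and saving are usually combined in one image context. The sketch below assumes that save stores the image as modified by the preceding commands: it makes a 50 px wide thumbnail, puts it back to the register and stores it in a variable. Because ext is omitted, the original type and extension are kept. The variable name thumbnail is hypothetical.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # MAKE A 50 PX WIDE THUMBNAIL, KEEPING THE ASPECT RATIO
    - resize:
        width: 50
    # ext IS OMITTED, SO THE ORIGINAL IMAGE TYPE IS KEPT
    - save:
        to: register
# THE REGISTER NOW HOLDS THE BASE64 ENCODED THUMBNAIL
# SAVE IT TO A (HYPOTHETICAL) VARIABLE
- variable_set: thumbnail
```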

The file type saves the image to a local drive. This type works only in compiled diggers. Storage of this type supports the following optional parameters:

Parameter  Description
name       File name without an extension. If not specified, a unique name is generated.
path       Path to the directory where you want to save the file. If not specified, the file is saved to the current directory.

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # SAVE IMAGE TO THE FILE (e://myimages/mylogo.png)
            - save:
                to: file
                name: mylogo
                path: 'e://myimages'
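
Both file parameters can be omitted. The sketch below relies only on the documented defaults: with name and path left out, the file gets a generated unique name and is saved to the current directory. It also converts the image to JPG via ext. As with any file storage, it works only in a compiled digger.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # CONVERT TO JPG AND SAVE TO THE CURRENT DIRECTORY;
    # name IS OMITTED, SO A UNIQUE FILE NAME IS GENERATED
    - save:
        to: file
        ext: jpg
```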

The s3 type saves the image to Amazon S3 cloud storage. Storage of this type supports the following parameters:

Parameter  Description
key        AWS S3 access key. Mandatory.
secret     AWS S3 secret. Mandatory.
region     AWS S3 region. Mandatory.
bucket     AWS S3 bucket name. Mandatory.
token      AWS S3 token. Optional.
name       File name without an extension. Optional; if not specified, a unique name is generated.
path       Path to the directory where you want to save the file. Optional; if not specified, the file is saved to the root of the bucket.

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # SAVE IMAGE TO THE S3 STORAGE (/logos/mylogo.png)
            - save:
                to: s3
                key: AWSAJJDJJSJDJDJFK
                secret: AWSSERETTDHFJJJDJSKFJFJSJJFJJGKRI
                region: us-east-1
                bucket: mybucket
                name: mylogo
                path: '/logos'
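
The optional token parameter is presumably meant for temporary AWS credentials, which come with a session token. That is an assumption; the parameter is described only as an optional token. The sketch below also omits path, so according to the table the file is saved to the root of the bucket. All credential values are placeholders.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # SAVE TO THE ROOT OF THE BUCKET (path IS OMITTED)
    # ALL CREDENTIAL VALUES BELOW ARE PLACEHOLDERS
    - save:
        to: s3
        key: YOUR_ACCESS_KEY
        secret: YOUR_SECRET
        token: YOUR_SESSION_TOKEN
        region: us-east-1
        bucket: mybucket
        name: mylogo
```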

The yandex type saves the image to Yandex Object Storage. Storage of this type supports the following parameters:

Parameter  Description
key        Yandex Object Storage access key. Mandatory.
secret     Yandex Object Storage secret. Mandatory.
region     Yandex Object Storage region. Mandatory.
bucket     Yandex Object Storage bucket name. Mandatory.
token      Yandex Object Storage token. Optional.
name       File name without an extension. Optional; if not specified, a unique name is generated.
path       Path to the directory where you want to save the file. Optional; if not specified, the file is saved to the root of the bucket.

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # SAVE IMAGE TO THE YANDEX STORAGE (/logos/mylogo.png)
            - save:
                to: yandex
                key: AWSAJJDJJSJDJDJFK
                secret: AWSSERETTDHFJJJDJSKFJFJSJJFJJGKRI
                region: ru-central1
                bucket: mybucket
                name: mylogo
                path: '/logos'

The ftp type saves the image to an FTP server. Storage of this type supports the following parameters:

Parameter  Description
host       IP address or hostname of the FTP server. Mandatory.
port       Port of the FTP server. Optional; if omitted, the default port 21 is used.
username   FTP server username. Optional; if omitted, an empty username is used.
password   FTP server password. Optional; if omitted, an empty password is used.
name       File name without an extension. Optional; if not specified, a unique name is generated.
path       Path to the directory where you want to save the image. Optional; if not specified, the image is saved to the directory that is current after the user logs in to the FTP server.

# LOAD IMAGE
- walk:
    to: https://www.diggernaut.com/static/site/images/logo_light_beta.png
    do:
    # FIND THE BLOCK WITH THE BASE64 ENCODED IMAGE
    - find:
        path: imgbase64
        do:
        # SCRAPE THE CONTENT
        - parse
        # SWITCH TO THE IMAGE CONTEXT
        - image:
            do:
            # SAVE IMAGE TO THE FTP SERVER (logos/mylogo.png)
            - save:
                to: ftp
                host: ftp.mywebsite.com
                port: 21
                username: mylogin
                password: mypassword
                name: mylogo
                path: 'logos'
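
Only host is mandatory for the ftp type. The sketch below relies on the documented defaults: port 21, an empty username and password, a generated file name and the directory that is current after login. Whether a server accepts an empty login depends on its configuration, so treat this as an illustration of the defaults rather than a setup that works with every server. The hostname is a placeholder.

```yaml
# THE LOAD, FIND AND PARSE STEPS ARE THE SAME AS ABOVE
# SWITCH TO THE IMAGE CONTEXT
- image:
    do:
    # ONLY host IS SET: PORT 21, AN EMPTY USERNAME AND PASSWORD,
    # A GENERATED FILE NAME AND THE LOGIN DIRECTORY ARE USED
    - save:
        to: ftp
        host: ftp.mywebsite.com
```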

Please note that when you run a digger in the cloud and save images to cloud storage or an FTP server, the transferred data counts against the bandwidth quota of your subscription plan. For example, the free plan has a quota of 10 megabytes per month. Once the quota is reached, images are no longer saved to cloud storage or FTP. Images are also not saved while your digger is in debug mode.

Now it's time to find out how to work with binary files.