Mikhail Sisin, co-founder of the cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.

How to scrape product and price information from the Cartier website

In this article we show how to scrape product and price information from the Cartier website. Cartier is the famous French house that produces and sells jewelry and watches. It was founded in 1847 by Louis-François Cartier as a small workshop. The house became popular in 1867 after the World Exhibition in Paris, and since then the products of this brand have been highly valued all over the world.

Approx number of goods: 2000
Approx number of page requests: 2000
Recommended subscription plan: Free

PLEASE NOTE! The number of requests can exceed the number of products, because data about variations, images, and so on may be scraped from other resources, which requires additional requests. Part of the product data may also be delivered through XHR requests, which further increases the total number of page requests.
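
The request count also gives a rough idea of the minimum run time. The digger configuration later in this article pauses for 2 seconds (`sleep: 2`) on each catalog and product page, so the pauses alone take roughly 4,000 seconds, a bit over an hour, for about 2000 requests. The short sketch below does that arithmetic; the function name is hypothetical, and it assumes one pause per page request, which slightly overestimates because the first page is loaded without a pause.

```python
# Rough lower bound on the time a run spends in pauses alone.
# Assumptions (not stated by the article): one `sleep: 2` pause per page
# request, and the article's estimate of about 2000 page requests.

def min_sleep_seconds(page_requests: int, sleep_per_page: float) -> float:
    """Return the total pause time in seconds for a run."""
    return page_requests * sleep_per_page


if __name__ == "__main__":
    seconds = min_sleep_seconds(2000, 2)
    print(f"{seconds:.0f} s of pauses, about {seconds / 60:.0f} minutes")
```

Network latency and page processing come on top of this figure, so the real run time will be longer.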

How to use the web scraper to extract data about goods and prices from cartier.com

To use the web scraper for the Cartier store website, you need an account with our Diggernaut service. Then follow these steps:

  1. Use this registration link to open a free account with Diggernaut.
  2. After registering and confirming your email address, log in to your account.
  3. Create a project with any name and description. If you do not know how to do it, please refer to our documentation.
  4. Switch to the created project and create a digger with any name. If you do not know how to do it, please refer to our documentation.
  5. Copy the digger configuration below to the clipboard and paste it into the digger you created. If you do not know how to do it, please refer to our documentation.
  6. Switch the mode of the digger from Debug to Active. If you do not know how to do it, please refer to our documentation.
  7. Run your digger and wait until it completes. If you do not know how to do it, please refer to our documentation.
  8. Download the scraped dataset in the format you need. If you do not know how to do it, please refer to our documentation.

You can also set up a schedule to run your scraper and collect data regularly.

Scraping configuration for the digger

---
config:
    debug: 2
    agent: Firefox
do:
- walk:
    to: http://www.cartier.com/en-us/collections.html
    do:
    - find:
        path: ul.c-navigation__ulist a
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
- walk:
    to: links
    pool: catalog
    do:
    - sleep: 2
    - find:
        path: a.c-collection-link
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
    - find:
        path: a.prod-link
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: pages
- walk:
    to: links
    pool: pages
    do:
    - sleep: 2
    - find:
        path: div.main-container
        do:
        - variable_clear: desc
        - object_new: product
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
        - find:
            path: span.c-pdp__cta-section--product-title
            do:
            - parse
            - space_dedupe
            - trim
            - object_field_set:
                object: product
                field: name
        - register_set: Cartier
        - object_field_set:
            object: product
            field: brand
        - find:
            path: div.c-pdp__cta-section--product-ref-id>span
            do:
            - parse
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - variable_set: pid
                - object_field_set:
                    object: product
                    field: sku
        - find:
            in: doc
            path: meta[property="description"]
            do:
            - parse:
                attr: content
            - space_dedupe
            - trim
            - variable_set: desc
        - find:
            path: div.c-pdp__desc--content
            do:
            - parse
            - space_dedupe
            - trim
            - variable_set: desc
        - variable_get: desc
        - object_field_set:
            object: product
            field: description
        - find:
            path: div.c-pdp__cta-section--product-price
            do:
            - find:
                path: div.price
                do:
                - parse
                - normalize:
                    routine: replace_matched
                    args:
                        \$: USD
                - object_field_set:
                    object: product
                    field: currency
                - parse:
                    filter:
                    - ([0-9\.\,]+)\s*-
                    - ([0-9\.\,]+)
                - normalize:
                    routine: replace_substring
                    args:
                        \,: ''
                - space_dedupe
                - trim
                - object_field_set:
                    object: product
                    type: float
                    field: price
        - find:
            path: ul.c-breadcrumb__list>li.c-breadcrumb__list-item>a
            do:
            - parse
            - space_dedupe
            - trim
            - normalize:
                routine: replace_matched
                args:
                    Collections: ''
                    Categories: ''
            - if:
                match: \w+
                do:
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: categories
        - find:
            path: div.c-pdp__image--wrapper
            do:
            - parse:
                attr: data-src
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - normalize:
                    routine: url
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: images
        - object_save:
            name: product
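
To make the two trickiest parts of the configuration easier to follow, here is a minimal Python sketch of what the link-cleaning steps and the price filters appear to do. It is not how Diggernaut runs them internally. The function names and `BASE_URL` are hypothetical, and the sketch assumes that the `filter` list is tried in order with the first match winning, and that `normalize: routine: url` resolves a relative link against the current page.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Assumed base page; in the digger this would be the page being walked.
BASE_URL = "http://www.cartier.com/en-us/collections.html"


def squeeze(text: str) -> str:
    """Collapse runs of whitespace and trim (space_dedupe + trim)."""
    return " ".join(text.split())


def clean_href(href: str, base: str = BASE_URL) -> Optional[str]:
    """Drop the query string, strip 'javascript:', and return an absolute
    URL, or None when nothing usable is left."""
    m = re.match(r"^([^?]+)", href)            # filter: ^([^\?]+)
    if m is None:
        return None
    value = squeeze(m.group(1))
    value = re.sub(r"javascript:", "", value)  # replace_matched
    if not re.search(r"\s*[a-z]+", value):     # if: match: \s*[a-z]+
        return None
    # Assumption: the url routine makes the link absolute.
    return urljoin(base, value)


def parse_price(text: str) -> Optional[float]:
    """Take the lower bound of a price range, else the single price."""
    for pattern in (r"([0-9.,]+)\s*-", r"([0-9.,]+)"):
        m = re.search(pattern, text)
        if m:
            digits = squeeze(m.group(1).replace(",", ""))  # drop thousands separators
            try:
                return float(digits)
            except ValueError:
                return None  # e.g. a lone "." matched the pattern
    return None
```

For example, `parse_price("$133,000")` yields `133000.0`, and a range such as `"$1,000 - $2,000"` yields the lower bound `1000.0`, because the first pattern requires a trailing dash.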

Sample of scraped data

Below is a sample of a dataset with several products in JSON format, so you can easily review it and see the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using the templates.

[{
    "product": {
        "brand": "Cartier",
        "categories": "Watches|Women's watches|Crash",
        "currency": "USD",
        "date": "2017-12-27T10:58:53.896Z",
        "description": "Created in 1967 in *Swinging London*, the Crash watch expresses the sparkling, carefree spirit of an era that was all about complete freedom. The unlikely design of this watch could only have been conceived by Cartier, the great maker of shaped watches. Passionate and in touch with the spirit of the times, it sought to create a unique watch that would capture the joyous burst of rebellion and pop culture that shook up the conformism of the time.",
        "images": "http://www.cartier.com/content/dam/rcq/car/59/37/24/593724.png|http://www.cartier.com/content/dam/rcq/car/59/29/55/592955.png",
        "name": "Crash watch",
        "price": 133000,
        "sku": "HPI00654",
        "url": "http://www.cartier.com/en-us/collections/watches/womens-watches/crash/hpi00654-crash-watch.html"
    }
}
,{
    "product": {
        "brand": "Cartier",
        "categories": "Watches|Gifts|Cartier Classics",
        "currency": "USD",
        "date": "2017-12-27T10:58:57.333Z",
        "description": "Louis Cartier created the Santos watch in 1904, sealing his friendship with the aviator Alberto Santos Dumont. The famous aviator's wish was granted: he could check the time while flying. The dial's rounded angles and exposed screws made this an iconic timepiece. Cartier marked the centenary of the watch with the introduction of a new version.",
        "images": "http://www.cartier.com/content/dam/rcq/car/58/46/40/584640.png|http://www.cartier.com/content/dam/rcq/car/15/35/39/2/1535392.png",
        "name": "Santos 100 watch",
        "price": 7000,
        "sku": "W20073X8",
        "url": "http://www.cartier.com/en-us/collections/watches/selections/cartier-classics/w20073x8-santos-100-watch.html"
    }
}
,{
    "product": {
        "brand": "Cartier",
        "categories": "Watches|Gifts|Cartier Classics",
        "currency": "USD",
        "date": "2017-12-27T10:59:00.589Z",
        "description": "The Tank story takes an unexpected turn with the Tank Anglaise. This variation of the distinctive features of the Tank recreates the perfect alignment of the original thanks to a winding mechanism seamlessly incorporated into the case. Featuring a concentrated form and reinforced lines, the streamlined design reinterprets the original model and gives it a new dimension.",
        "images": "http://www.cartier.com/content/dam/rcq/car/10/28/14/2/1028142.png",
        "name": "Tank Anglaise watch",
        "price": 9100,
        "sku": "W5310047",
        "url": "http://www.cartier.com/en-us/collections/watches/selections/cartier-classics/w5310047-tank-anglaise-watch.html"
    }
}]
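
Because the configuration joins the `categories` and `images` values with `|` (the `joinby` setting), you will usually want to split them back into lists after downloading the JSON dataset. The sketch below shows one way to do that in Python. The `load_products` helper is hypothetical, and `SAMPLE` is a shortened copy of the first record above.

```python
import json

# Shortened copy of the first record from the sample dataset.
SAMPLE = '''[{
    "product": {
        "brand": "Cartier",
        "categories": "Watches|Women's watches|Crash",
        "currency": "USD",
        "images": "http://www.cartier.com/content/dam/rcq/car/59/37/24/593724.png|http://www.cartier.com/content/dam/rcq/car/59/29/55/592955.png",
        "name": "Crash watch",
        "price": 133000,
        "sku": "HPI00654"
    }
}]'''


def load_products(raw: str) -> list:
    """Parse the dataset and split the pipe-joined fields into lists."""
    rows = []
    for item in json.loads(raw):
        product = dict(item["product"])  # unwrap the "product" object
        for field in ("categories", "images"):
            value = product.get(field, "")
            product[field] = value.split("|") if value else []
        rows.append(product)
    return rows


if __name__ == "__main__":
    for row in load_products(SAMPLE):
        print(row["sku"], row["name"], row["price"], row["currency"])
        print("  categories:", row["categories"])
        print("  image count:", len(row["images"]))
```

To process a downloaded file instead, read its contents and pass them to `load_products` in place of `SAMPLE`.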