{"id":397,"date":"2018-02-11T02:30:06","date_gmt":"2018-02-11T02:30:06","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=397"},"modified":"2019-01-12T15:38:31","modified_gmt":"2019-01-12T15:38:31","slug":"scrape-product-price-information-cartier-website","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/scrape-product-price-information-cartier-website\/","title":{"rendered":"How to scrape product and price information from Cartier website"},"content":{"rendered":"<p>In this article, we show how to scrape product and price information from the Cartier website. Cartier is the famous French house that produces and sells jewelry and watches. It was founded in 1847 by Louis-Francois Cartier as a small workshop. He gained wide recognition in 1867 after the World Exhibition in Paris, and since then the products of this brand have been highly valued all over the world.<\/p>\n<p><strong>Approximate number of goods:<\/strong> 2000<br>\n<strong>Approximate number of page requests:<\/strong> 2000<br>\n<strong>Recommended subscription plan:<\/strong> Free<\/p>\n<p><strong>PLEASE NOTE!<\/strong> The number of requests can exceed the number of products because data about variations, images, etc. may be scraped from other resources, which requires additional requests. In addition, part of the product data can be delivered via XHR requests, which further increases the total number of required page requests.<\/p>\n<h3>How to use the web scraper to extract data about goods and prices from cartier.com<\/h3>\n<p>To use the web scraper for the Cartier store website, you must have an account with our Diggernaut service.
You can simply follow this step-by-step guide:<\/p>\n<ol>\n<li>Use this <a href=\"https:\/\/www.diggernaut.com\/accounts\/signup\/\">registration link<\/a> to open a free account with <a href=\"https:\/\/www.diggernaut.com\">Diggernaut<\/a>.<\/li>\n<li>After registering and confirming your email address, <a href=\"https:\/\/www.diggernaut.com\/accounts\/login\/\">log in to your account<\/a>.<\/li>\n<li>Create a project with any name and description. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-project.html\">documentation<\/a>.<\/li>\n<li>Switch to the created project and create a digger with any name. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-digger.html\">documentation<\/a>.<\/li>\n<li>Copy the following digger configuration to the clipboard and paste it into the digger you created. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-digger-config.html\">documentation<\/a>.<\/li>\n<li>Switch the mode of the digger from Debug to Active. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-edit-digger.html\">documentation<\/a>.<\/li>\n<li>Run your digger and wait for it to complete. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-run-digger.html\">documentation<\/a>.<\/li>\n<li>Download the scraped dataset in the format you need. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-scraped-data.html\">documentation<\/a>.<\/li>\n<\/ol>\n<p>You can also set up a schedule to run your scraper and collect data regularly.<\/p>\n<h3>Scraping configuration for the digger<\/h3>\n<pre class=\"language-yaml line-numbers\"><code
class=\"language-yaml\">---\nconfig:\n    debug: 2\n    agent: Firefox\ndo:\n- walk:\n    to: http:\/\/www.cartier.com\/en-us\/collections.html\n    do:\n    - find:\n        path: ul.c-navigation__ulist a\n        do:\n        - parse:\n            attr: href\n            filter: ^([^\\?]+)\n        - space_dedupe\n        - trim\n        - normalize:\n            routine: replace_matched\n            args:\n                javascript\\:: &#039;&#039;\n        - if:\n            match: \\s*[a-z]+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: catalog\n- walk:\n    to: links\n    pool: catalog\n    do:\n    - sleep: 2\n    - find:\n        path: a.c-collection-link\n        do:\n        - parse:\n            attr: href\n            filter: ^([^\\?]+)\n        - space_dedupe\n        - trim\n        - normalize:\n            routine: replace_matched\n            args:\n                javascript\\:: &#039;&#039;\n        - if:\n            match: \\s*[a-z]+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: catalog\n    - find:\n        path: a.prod-link\n        do:\n        - parse:\n            attr: href\n            filter: ^([^\\?]+)\n        - space_dedupe\n        - trim\n        - normalize:\n            routine: replace_matched\n            args:\n                javascript\\:: &#039;&#039;\n        - if:\n            match: \\s*[a-z]+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: pages\n- walk:\n    to: links\n    pool: pages\n    do:\n    - sleep: 2\n    - find:\n        path: div.main-container\n        do:\n        - variable_clear: desc\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n         
   object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find:\n            path: span.c-pdp__cta-section--product-title\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set:\n                object: product\n                field: name\n        - register_set: Cartier\n        - object_field_set:\n            object: product\n            field: brand\n        - find:\n            path: div.c-pdp__cta-section--product-ref-id&gt;span\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - variable_set: pid\n                - object_field_set:\n                    object: product\n                    field: sku\n        - find:\n            in: doc\n            path: meta[property=&quot;description&quot;]\n            do:\n            - parse:\n                attr: content\n            - space_dedupe\n            - trim\n            - variable_set: desc\n        - find:\n            path: div.c-pdp__desc--content\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - variable_set: desc\n        - variable_get: desc\n        - object_field_set:\n            object: product\n            field: description\n        - find:\n            path: div.c-pdp__cta-section--product-price\n            do:\n            - find:\n                path: div.price\n                do:\n                - parse\n                - normalize:\n                    routine: replace_matched\n                    args:\n                        \\$: USD\n                - object_field_set:\n                    object: product\n                    field: currency\n                - parse:\n                    filter:\n                    - ([0-9\\.\\,]+)\\s*-\n      
              - ([0-9\\.\\,]+)\n                - normalize:\n                    routine: replace_substring\n                    args:\n                        \\,: &#039;&#039;\n                - space_dedupe\n                - trim\n                - object_field_set:\n                    object: product\n                    type: float\n                    field: price\n        - find:\n            path: ul.c-breadcrumb__list&gt;li.c-breadcrumb__list-item&gt;a\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - normalize:\n                routine: replace_matched\n                args:\n                    Collections: &#039;&#039;\n                    Categories: &#039;&#039;\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: categories\n        - find:\n            path: div.c-pdp__image--wrapper\n            do:\n            - parse:\n                attr: data-src\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - normalize:\n                    routine: url\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: images\n        - object_save:\n            name: product<\/code><\/pre>\n<h3>Sample of scraped data<\/h3>\n<p>Below is a sample of a dataset with several products in JSON format (so you can easily review it and see data structure). 
The dataset can be downloaded as CSV, XLSX, XML, or any other text format using the templates.<\/p>\n<pre><code class=\"language-js\">[{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Cartier&quot;,\n        &quot;categories&quot;: &quot;Watches|Women&#039;s watches|Crash&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T10:58:53.896Z&quot;,\n        &quot;description&quot;: &quot;Created in 1967 in *Swinging London*, the Crash watch expresses the sparkling, carefree spirit of an era that was all about complete freedom. The unlikely design of this watch could only have been conceived by Cartier, the great maker of shaped watches. Passionate and in touch with the spirit of the times, it sought to create a unique watch that would capture the joyous burst of rebellion and pop culture that shook up the conformism of the time.&quot;,\n        &quot;images&quot;: &quot;http:\/\/www.cartier.com\/content\/dam\/rcq\/car\/59\/37\/24\/593724.png|http:\/\/www.cartier.com\/content\/dam\/rcq\/car\/59\/29\/55\/592955.png&quot;,\n        &quot;name&quot;: &quot;Crash watch&quot;,\n        &quot;price&quot;: 133000,\n        &quot;sku&quot;: &quot;HPI00654&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.cartier.com\/en-us\/collections\/watches\/womens-watches\/crash\/hpi00654-crash-watch.html&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Cartier&quot;,\n        &quot;categories&quot;: &quot;Watches|Gifts|Cartier Classics&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T10:58:57.333Z&quot;,\n        &quot;description&quot;: &quot;Louis Cartier created the Santos watch in 1904, sealing his friendship with the aviator Alberto Santos Dumont. The famous aviator&#039;s wish was granted: he could check the time while flying. The dial&#039;s rounded angles and exposed screws made this an iconic timepiece. 
Cartier marked the centenary of the watch with the introduction of a new version.&quot;,\n        &quot;images&quot;: &quot;http:\/\/www.cartier.com\/content\/dam\/rcq\/car\/58\/46\/40\/584640.png|http:\/\/www.cartier.com\/content\/dam\/rcq\/car\/15\/35\/39\/2\/1535392.png&quot;,\n        &quot;name&quot;: &quot;Santos 100 watch&quot;,\n        &quot;price&quot;: 7000,\n        &quot;sku&quot;: &quot;W20073X8&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.cartier.com\/en-us\/collections\/watches\/selections\/cartier-classics\/w20073x8-santos-100-watch.html&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Cartier&quot;,\n        &quot;categories&quot;: &quot;Watches|Gifts|Cartier Classics&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T10:59:00.589Z&quot;,\n        &quot;description&quot;: &quot;The Tank story takes an unexpected turn with the Tank Anglaise. This variation of the distinctive features of the Tank recreates the perfect alignment of the original thanks to a winding mechanism seamlessly incorporated into the case. Featuring a concentrated form and reinforced lines, the streamlined design reinterprets the original model and gives it a new dimension.&quot;,\n        &quot;images&quot;: &quot;http:\/\/www.cartier.com\/content\/dam\/rcq\/car\/10\/28\/14\/2\/1028142.png&quot;,\n        &quot;name&quot;: &quot;Tank Anglaise watch&quot;,\n        &quot;price&quot;: 9100,\n        &quot;sku&quot;: &quot;W5310047&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.cartier.com\/en-us\/collections\/watches\/selections\/cartier-classics\/w5310047-tank-anglaise-watch.html&quot;\n    }\n}]\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>In this article, we show how to scrape product and price information from the Cartier website. Cartier is the famous French house that produces and sells jewelry and watches.
It was founded in 1847 by Louis-Francois Cartier as a small workshop. Popularity came to him in 1867 after the World Exhibition [&hellip;]<\/p>","protected":false},"author":4,"featured_media":399,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31,30,2],"tags":[],"class_list":["post-397","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ecommerce-scraping","category-free-scrapers","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=397"}],"version-history":[{"count":3,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/397\/revisions"}],"predecessor-version":[{"id":641,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/397\/revisions\/641"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/399"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}