{"id":401,"date":"2018-02-11T02:45:55","date_gmt":"2018-02-11T02:45:55","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=401"},"modified":"2019-01-12T15:28:21","modified_gmt":"2019-01-12T15:28:21","slug":"scraping-products-prices-images-chanel-com","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/scraping-products-prices-images-chanel-com\/","title":{"rendered":"Scraping products, prices and images from chanel.com"},"content":{"rendered":"<p>Our cloud web scraping platform Diggernaut can help you with scraping products, prices, and images from the chanel.com online store. You can use the scraper listed in this article. The company Chanel was founded in the early twentieth century by fashion designer Coco Chanel. The first boutique was opened in 1910 in Paris. In 1924 the company launched the production of perfumes. Today, Chanel specializes in the production and sale of clothing, luxury goods, perfumes, and cosmetics.<\/p>\n<p><strong>Approx. number of goods:<\/strong> 1000<br>\n<strong>Approx. number of page requests:<\/strong> 1000<br>\n<strong>Recommended subscription plan:<\/strong> Free<\/p>\n<p><strong>PLEASE NOTE!<\/strong> The number of requests can exceed the number of products because data about variations, images, etc. may be fetched from other resources, which consumes additional requests. Also, part of the product data may be delivered through XHR requests, which further increases the total number of page requests required.<\/p>\n<h3>How to use the web scraper to extract data about goods and prices from chanel.com<\/h3>\n<p>To use the web scraper for the Chanel store website, you must have an account with our Diggernaut service. 
You can follow this step-by-step guide:<\/p>\n<ol>\n<li>Go through this <a href=\"https:\/\/www.diggernaut.com\/accounts\/signup\/\">registration link<\/a> to open a free account with <a href=\"https:\/\/www.diggernaut.com\">Diggernaut<\/a>.<\/li>\n<li>After registering and confirming your email address, <a href=\"https:\/\/www.diggernaut.com\/accounts\/login\/\">log in to your account<\/a>.<\/li>\n<li>Create a project with any name and description. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-project.html\">documentation<\/a>.<\/li>\n<li>Switch to the created project and create a digger with any name. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-digger.html\">documentation<\/a>.<\/li>\n<li>Copy the digger configuration below to the clipboard and paste it into the digger you created. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-digger-config.html\">documentation<\/a>.<\/li>\n<li>Switch the mode of the digger from Debug to Active. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-edit-digger.html\">documentation<\/a>.<\/li>\n<li>Run your digger and wait for it to complete. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-run-digger.html\">documentation<\/a>.<\/li>\n<li>Download the scraped dataset in the format you need. If you are not sure how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-scraped-data.html\">documentation<\/a>.<\/li>\n<\/ol>\n<p>You can also set up a schedule to run your scraper and collect data regularly.<\/p>\n<h3>Scraping configuration for the digger<\/h3>\n<pre class=\"language-yaml line-numbers\"><code 
class=\"language-yaml\">---\nconfig:\n    debug: 2\n    agent: Firefox\ndo:\n- link_add:\n    pool: beauty\n    url:\n    - https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/fragrance-beauty-skincare-140910\n- walk:\n    to: links\n    pool: beauty\n    do:\n    - find:\n        path: a.product-link\n        do:\n        - parse:\n            attr: href\n        - if:\n            match: \\w+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: beauty\n    - find:\n        path: a:haschild(div.top_header[role=&quot;button&quot;])\n        do:\n        - parse:\n            attr: href\n        - if:\n            match: \\w+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: beauty\n    - find:\n        path: form.product_container ul.unstyled>li.img>a\n        do:\n        - parse:\n            attr: href\n        - if:\n            match: \\w+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: beautypages\n- walk:\n    to: links\n    pool: beautypages\n    do:\n    - sleep: 2\n    - find:\n        path: &#039;div#contentContainer&#039;\n        do:\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n            object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find:\n            path: h1[itemprop=&quot;name&quot;]\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: name\n        - find:\n            path: 
div.cc-sku-selector-dropdown select>option\n            slice: 0\n            do:\n            - parse:\n                attr: value\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: sku\n        - register_set: Chanel\n        - object_field_set:\n            object: product\n            field: brand\n        - find:\n            path: div.cc-product-options-price\n            do:\n            - find:\n                path: span[itemprop=&quot;priceCurrency&quot;]\n                do:\n                - parse:\n                    attr: content\n                - space_dedupe\n                - trim\n                - object_field_set:\n                    object: product\n                    field: currency\n            - find:\n                path: span[itemprop=&quot;price&quot;]\n                do:\n                - parse:\n                    filter: (\\d+\\.\\d+)\n                - space_dedupe\n                - trim\n                - object_field_set:\n                    object: product\n                    type: float\n                    field: price\n        - find:\n            path: div.cc-sku-selector-dropdown select>option\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: variations\n        - find:\n            in: doc\n            path: script:contains(&#039;window.__CC_STATE__&#039;)\n            do:\n            - parse:\n                filter: window\\.__CC_STATE__\\s*\\=\\s*(.+)\\;\n            - normalize:\n                routine: json2xml\n            - to_block\n            - find:\n                path: images src\n    
            do:\n                - parse\n                - if:\n                    match: \\w+\n                    do:\n                    - normalize:\n                        routine: url\n                    - object_field_set:\n                        object: product\n                        joinby: &quot;|&quot;\n                        field: images\n            - find:\n                path: product>description\n                slice: 0\n                do:\n                - parse\n                - to_block\n                - find:\n                    path: p\n                    slice: 1\n                    do:\n                    - parse\n                    - space_dedupe\n                    - trim\n                    - if:\n                        match: \\w+\n                        do:\n                        - object_field_set:\n                            object: product\n                            field: description\n        - find:\n            path: span.breadcrumb>a\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: categories\n        - object_save:\n            name: product\n- link_add:\n    pool: sun\n    url:\n    - https:\/\/www.chanel.com\/en_US\/fashion\/sunglasses\/products\/\n- walk:\n    to: links\n    pool: sun\n    do:\n    - find:\n        path: a.ui-pagination-next\n        do:\n        - parse:\n            attr: href\n        - if:\n            match: \\w+\n            do:\n            - normalize:\n                routine: url\n            - link_add:\n                pool: sun\n    - find:\n        path: ul.product-list>li\n        do:\n        - find:\n            path: li.item\n            slice: 0\n            do:\n            - find:\n                path: a\n     
           do:\n                - parse:\n                    attr: href\n                - if:\n                    match: \\w+\n                    do:\n                    - normalize:\n                        routine: url\n                    - link_add:\n                        pool: sunpages\n- walk:\n    to: links\n    pool: sunpages\n    do:\n    - sleep: 2\n    - find:\n        path: main[role=&quot;main&quot;]\n        do:\n        - variable_clear: pid\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n            object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find:\n            path: h1.tt-1\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: name\n        - find:\n            path: input[name=&quot;pdt-sku&quot;]\n            do:\n            - parse:\n                attr: value\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - variable_set: pid\n                - object_field_set:\n                    object: product\n                    field: sku\n        - register_set: Chanel\n        - object_field_set:\n            object: product\n            field: brand\n        - find:\n            path: span[property=&quot;price&quot;]\n            do:\n            - parse:\n                filter: (\\d+)\n            - object_field_set:\n                object: product\n                type: float\n                field: price\n            - parse\n            - normalize:\n                routine: replace_matched\n  
              args:\n                    \\$: USD\n            - object_field_set:\n                object: product\n                field: currency\n        - find:\n            path: select[data-select=&quot;pdt-color&quot;]>option\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: variations\n        - find:\n            in: doc\n            path: meta[name=&quot;description&quot;]\n            do:\n            - parse:\n                attr: content\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: description\n        - find:\n            path: div.breadcrumb>a\n            slice: 1:-2\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: categories\n        - walk:\n            to: https:\/\/www.chanel.com\/en_US\/fashion\/sunglasses\/pdpjson\/\/product\n            do:\n            - find:\n                path: script\n                do:\n                - parse\n                - normalize:\n                    routine: replace_substring\n                    args:\n                        ^window\\.: &#039;&#039;\n                - to_block\n                - parse\n                - eval:\n                    routine: js\n                    body: (function () {var ; return JSON.stringify(product);})();\n                - normalize:\n                    routine: json2xml\n                - 
to_block\n                - find:\n                    path: zoom\n                    do:\n                    - parse:\n                        filter: ^(\\S+)\n                    - if:\n                        match: \\w+\n                        do:\n                        - normalize:\n                            routine: url\n                        - object_field_set:\n                            object: product\n                            joinby: &quot;|&quot;\n                            field: images\n        - object_save:\n            name: product\n- link_add:\n    url:\n    - https:\/\/www.chanel.com\/en_US\/watches-jewelry\/fine-jewelry\/collections\n    - https:\/\/www.chanel.com\/en_US\/watches-jewelry\/watches\/collections\n- walk:\n    to: links\n    do:\n    - find:\n        path: div.product-item-wrapper>a\n        do:\n        - parse:\n            attr: href\n        - register_set: ?show=All\n        - walk:\n            to: value\n            do:\n            - find:\n                path: div.product-item-wrapper>a\n                do:\n                - parse:\n                    attr: href\n                - if:\n                    match: \\w+\n                    do:\n                    - normalize:\n                        routine: url\n                    - link_add:\n                        pool: pages\n- walk:\n    to: links\n    pool: pages\n    do:\n    - sleep: 2\n    - find:\n        path: &#039;main#page-content&#039;\n        do:\n        - variable_clear: pid\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n            object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find:\n            path: dl>dt:contains(&quot;Name:&quot;)+dd\n            do:\n            - parse\n 
           - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: name\n        - find:\n            path: dl>dt:contains(&quot;Reference:&quot;)+dd\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - variable_set: pid\n                - object_field_set:\n                    object: product\n                    field: sku\n        - register_set: Chanel\n        - object_field_set:\n            object: product\n            field: brand\n        - find:\n            path: product_price\n            do:\n            - parse\n            - object_field_set:\n                object: product\n                type: float\n                field: price\n            - register_set: USD\n            - object_field_set:\n                object: product\n                field: currency\n        - find:\n            in: doc\n            path: meta[name=&quot;description&quot;]\n            do:\n            - parse:\n                attr: content\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w+\n                do:\n                - object_field_set:\n                    object: product\n                    field: description\n        - find:\n            path: &#039;nav#breadcrumb>ul>li:not(.visually-hidden)>a&#039;\n            slice: 1:-1\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: categories\n        - find:\n            path: div.product-images figure>a\n            do:\n            - parse:\n            
    attr: href\n            - space_dedupe\n            - trim\n            - if:\n                match: \\w{2,}\n                do:\n                - normalize:\n                    routine: url\n                - object_field_set:\n                    object: product\n                    joinby: &quot;|&quot;\n                    field: images\n        - object_save:\n            name: product<\/code><\/pre>\n<h3>Sample of scraped data<\/h3>\n<p>Below is a sample of a dataset with several products in JSON format, so you can easily review it and see the data structure. The dataset can be downloaded as CSV, XLSX, XML, or any other text format using the templates.<\/p>\n<pre><code class=\"language-js\">[{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Chanel&quot;,\n        &quot;categories&quot;: &quot;Fragrance|Women|Allure Sensuelle&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T12:44:54.948Z&quot;,\n        &quot;description&quot;: &quot;Like the charismatic, passionate presence of Gabrielle Chanel, ALLURE SENSUELLE is the modern, magnetic fragrance of a true, radiant and intense woman. 
The floral-soft-Oriental fragrance is revealed in a unique way on every woman \u2014 because each woman has her own special allure.&quot;,\n        &quot;images&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P129710\/S129710_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P129710\/S129720_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P129710\/S129730_XLARGE.jpg&quot;,\n        &quot;name&quot;: &quot;ALLURE SENSUELLE EAU DE PARFUM SPRAY&quot;,\n        &quot;price&quot;: 130,\n        &quot;sku&quot;: &quot;88316&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/fragrance-allure-sensuelle-allure-sensuelle-88314&quot;,\n        &quot;variations&quot;: &quot;3.4 FL. OZ.|1.7 FL. OZ.|1.2 FL. OZ.&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Chanel&quot;,\n        &quot;categories&quot;: &quot;Makeup|Lips|Lipstick&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T12:45:00.308Z&quot;,\n        &quot;description&quot;: &quot;The intensity of a lipstick, the shine of a lipgloss and the comfort of a lip balm \u2014 all in one creamy yet lightweight formula.&quot;,\n        &quot;images&quot;: 
&quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170202_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170206_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170208_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170212_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170214_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170216_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170218_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170217_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170222_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170224_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P170202\/S170227_XLARGE.jpg&quot;,\n        &quot;name&quot;: &quot;ROUGE COCO STYLO COMPLETE CARE LIPSHINE&quot;,\n        &quot;price&quot;: 37,\n        &quot;sku&quot;: &quot;141754&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/makeup-lipstick-rouge-coco-stylo-140392&quot;,\n        &quot;variations&quot;: &quot;217 PANORAMA - Limited Edition|218 SCRIPT|216 LETTRE|202 CONTE|227 ESQUISSE - Limited Edition|222 FICTION|206 HISTOIRE|208 ROMAN|214 MESSAGE|224 M\u00c9MOIRE|212 RECIT&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Chanel&quot;,\n        &quot;categories&quot;: &quot;Skincare|BY CATEGORY|Sun Protection&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T12:45:04.800Z&quot;,\n        &quot;description&quot;: &quot;A breakthrough daily 
sunscreen that features an adaptive skincare technology for tailor-made defense from UVA and UVB rays, free radicals and pollution.&quot;,\n        &quot;images&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P141836\/S141836_XLARGE.jpg&quot;,\n        &quot;name&quot;: &quot;UV ESSENTIEL Multi-Protection Daily Defense Sunscreen Anti-Pollution Broad Spectrum SPF 30&quot;,\n        &quot;price&quot;: 55,\n        &quot;sku&quot;: &quot;140249&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/skincare-sun-protection-uv-essentiel-140248&quot;,\n        &quot;variations&quot;: &quot;1 FL. OZ.&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Chanel&quot;,\n        &quot;categories&quot;: &quot;Makeup|Eyes|Mascara&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-27T12:45:09.444Z&quot;,\n        &quot;description&quot;: &quot;A high-precision waterproof mascara that achieves instant volume and intense colour in a single stroke.&quot;,\n        &quot;images&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P194210\/S194210_XLARGE.jpg|https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/cms2export\/Site1Files\/P194210\/S194220_XLARGE.jpg&quot;,\n        &quot;name&quot;: &quot;LE VOLUME DE CHANEL WATERPROOF MASCARA&quot;,\n        &quot;price&quot;: 32,\n        &quot;sku&quot;: &quot;139065&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.chanel.com\/en_US\/fragrance-beauty\/makeup-mascara-le-volume-de-chanel-waterproof-139064&quot;,\n        &quot;variations&quot;: &quot;10 NOIR|20 BRUN&quot;\n    }\n}]\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>Our cloud web scraping platform Diggernaut can help you with scraping products, prices, and images from the chanel.com online store. You can use the scraper listed in this article. 
The company Chanel was founded in the early twentieth century by fashion designer Coco Chanel. The first boutique was opened in 1910 in Paris. In 1924 the [&hellip;]<\/p>","protected":false},"author":4,"featured_media":403,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27,31,2],"tags":[],"class_list":["post-401","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-diggernaut-engine","category-ecommerce-scraping","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/401","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=401"}],"version-history":[{"count":3,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions"}],"predecessor-version":[{"id":638,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/401\/revisions\/638"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/403"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=401"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=401"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=401"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}