{"id":304,"date":"2018-02-05T21:38:49","date_gmt":"2018-02-05T21:38:49","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=304"},"modified":"2019-01-12T17:00:38","modified_gmt":"2019-01-12T17:00:38","slug":"product-price-scraper-alexander-mcqueen-online-store","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/product-price-scraper-alexander-mcqueen-online-store\/","title":{"rendered":"Product and price scraper for Alexander McQueen online store"},"content":{"rendered":"<p>Alexander McQueen was a famous British fashion designer who founded his own label and fashion house. This product and price scraper is designed to extract information about merchandise sold at the fashion house&#8217;s flagship online store, alexandermcqueen.com.<\/p>\n<p><strong>Approx number of goods:<\/strong> 1000<br>\n<strong>Approx number of page requests:<\/strong> 2300<br>\n<strong>Recommended subscription plan:<\/strong> Free<\/p>\n<p><strong>PLEASE NOTE!<\/strong> The number of requests can exceed the number of products, because data about variations, images, etc. may be scraped from other resources and require additional requests. Also, part of the product data may be delivered via XHR requests, which further increases the total number of page requests required.<\/p>\n<h3>How to use the web scraper to extract data about products and prices from alexandermcqueen.com<\/h3>\n<p>To use the web scraper for the Alexander McQueen store website, you need an account with our Diggernaut service. 
Simply follow these steps:<\/p>\n<ol>\n<li>Use this <a href=\"https:\/\/www.diggernaut.com\/accounts\/signup\/\">registration link<\/a> to open a free account with <a href=\"https:\/\/www.diggernaut.com\">Diggernaut<\/a>.<\/li>\n<li>After registering and confirming your email address, <a href=\"https:\/\/www.diggernaut.com\/accounts\/login\/\">log in to your account<\/a>.<\/li>\n<li>Create a project with any name and description. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-project.html\">documentation<\/a>.<\/li>\n<li>Switch to the created project and create a digger with any name. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-digger.html\">documentation<\/a>.<\/li>\n<li>Copy the following digger configuration to the clipboard and paste it into the digger you created. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-digger-config.html\">documentation<\/a>.<\/li>\n<li>Switch the digger mode from Debug to Active. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-edit-digger.html\">documentation<\/a>.<\/li>\n<li>Run your digger and wait for it to complete. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-run-digger.html\">documentation<\/a>.<\/li>\n<li>Download the scraped dataset in the format you need. If you do not know how to do this, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-scraped-data.html\">documentation<\/a>.<\/li>\n<\/ol>\n<p>You can also set up a schedule to run your scraper and collect data regularly.<\/p>\n<h3>Scraping configuration for the digger<\/h3>\n<pre class=\"language-yaml line-numbers\"><code 
class=\"language-yaml\">---\nconfig:\n    debug: 2\n    agent: Firefox\ndo:\n- walk:\n    to: http:\/\/www.alexandermcqueen.com\/us\/\n    do:\n    - find:\n        path: ul.level-1&gt;li\n        do:\n        - variable_clear: cat1\n        - parse:\n            attr: id\n        - normalize:\n            routine: replace_matched\n            args:\n                shop_womenswear: Womens\n                shop_menswear: Mens\n                .+: &#039;&#039;\n        - variable_set: cat1\n        - find:\n            path: ul.level-2&gt;li\n            do:\n            - variable_clear: cat2\n            - find:\n                path: a\n                slice: 0\n                do:\n                - parse\n                - space_dedupe\n                - trim\n                - variable_set: cat2\n            - find:\n                path: ul.level-3&gt;li\n                do:\n                - variable_clear: cat3\n                - find:\n                    path: a\n                    slice: 0\n                    do:\n                    - parse\n                    - space_dedupe\n                    - trim\n                    - variable_set: cat3\n                    - parse:\n                        attr: href\n                    - space_dedupe\n                    - trim\n                    - if:\n                        match: \\w+\n                        do:\n                        - normalize:\n                            routine: url\n                        - walk:\n                            to: value\n                            do:\n                            - find:\n                                path: script:contains(&#039;yTos.navigation =&#039;)\n                                do:\n                                - parse:\n                                    filter: yTos\\.navigation\\s+\\=\\s+(.+)\\;\n                                - normalize:\n                                    routine: json2xml\n                             
   - to_block\n                                - find:\n                                    path: pathandqueryparsed:has(paramname:matches(^sitecode$))\n                                    do:\n                                    - find:\n                                        path: paramvalue\n                                        do:\n                                        - parse\n                                        - variable_set: sitecode\n                                - find:\n                                    path: pathandqueryparsed:has(paramname:matches(^dept$))\n                                    do:\n                                    - find:\n                                        path: paramvalue\n                                        do:\n                                        - parse\n                                        - variable_set: department\n                                - find:\n                                    path: pathandqueryparsed:has(paramname:matches(^season$))\n                                    do:\n                                    - find:\n                                        path: paramvalue\n                                        do:\n                                        - parse\n                                        - normalize:\n                                            routine: replace_substring\n                                            args:\n                                                \\,: &quot;%2C&quot;\n                                        - variable_set: season\n                                - find:\n                                    path: pathandqueryparsed:has(paramname:matches(^gender$))\n                                    do:\n                                    - find:\n                                        path: paramvalue\n                                        do:\n                                        - parse\n                                        - 
variable_set: gender\n                                - find:\n                                    path: pathandqueryparsed:has(paramname:matches(^yurirulename$))\n                                    do:\n                                    - find:\n                                        path: paramvalue\n                                        do:\n                                        - parse\n                                        - variable_set: yurirulename\n                                - walk:\n                                    to: http:\/\/www.alexandermcqueen.com\/Search\/RenderProducts?ytosQuery=true&amp;department=&amp;gender=&amp;season=&amp;yurirulename=&amp;page=1&amp;productsPerPage=1000&amp;suggestion=false&amp;totalPages=1&amp;partialLoadedItems=1000&amp;siteCode=\n                                    do:\n                                    - find:\n                                        path: article&gt;a\n                                        do:\n                                        - parse:\n                                            attr: href\n                                            filter: ^([^\\#]+)\n                                        - walk:\n                                            to: value\n                                            do:\n                                            - sleep: 2\n                                            - find:\n                                                path: article.item\n                                                do:\n                                                - variable_clear: pid\n                                                - variable_clear: cid\n                                                - object_new: product\n                                                - eval:\n                                                    routine: js\n                                                    body: &#039;(function (){var d = new Date(); return 
d.toISOString()})();&#039;\n                                                - object_field_set:\n                                                    object: product\n                                                    field: date\n                                                - static_get: url\n                                                - object_field_set:\n                                                    object: product\n                                                    field: url\n                                                - register_set: Alexander McQueen\n                                                - object_field_set:\n                                                    object: product\n                                                    field: brand\n                                                - find:\n                                                    path: h2.modelName\n                                                    do:\n                                                    - parse\n                                                    - space_dedupe\n                                                    - trim\n                                                    - object_field_set:\n                                                        object: product\n                                                        field: name\n                                                - find:\n                                                    in: doc\n                                                    path: meta[name=&quot;description&quot;]\n                                                    do:\n                                                    - parse:\n                                                        attr: content\n                                                    - space_dedupe\n                                                    - trim\n                                                    - variable_set: desc\n                   
                             - find:\n                                                    path: div.descriptionsContainer&gt;div.EditorialDescription\n                                                    do:\n                                                    - parse\n                                                    - space_dedupe\n                                                    - trim\n                                                    - variable_set: desc\n                                                - variable_get: desc\n                                                - object_field_set:\n                                                    object: product\n                                                    field: description\n                                                - find:\n                                                    path: div.itemPriceContainer span.price\n                                                    slice: 0\n                                                    do:\n                                                    - find:\n                                                        path: span.currency\n                                                        do:\n                                                        - parse\n                                                        - normalize:\n                                                            routine: replace_matched\n                                                            args:\n                                                                \\$: USD\n                                                        - object_field_set:\n                                                            object: product\n                                                            field: currency\n                                                    - find:\n                                                        path: span.value\n                                                     
   do:\n                                                        - parse\n                                                        - normalize:\n                                                            routine: replace_substring\n                                                            args:\n                                                            - \\,: &#039;&#039;\n                                                            - \\s+: &#039;&#039;\n                                                        - object_field_set:\n                                                            object: product\n                                                            type: float\n                                                            field: price\n                                                - find:\n                                                    path: div.modelFabricColor&gt;span.value\n                                                    do:\n                                                    - parse\n                                                    - space_dedupe\n                                                    - trim\n                                                    - if:\n                                                        match: \\w+\n                                                        do:\n                                                        - variable_set: pid\n                                                        - object_field_set:\n                                                            object: product\n                                                            field: sku\n                                                - variable_get: cat1\n                                                - if:\n                                                    match: \\w{2,}\n                                                    do:\n                                                    - object_field_set:\n                          
                              object: product\n                                                        joinby: &quot;|&quot;\n                                                        field: category\n                                                - variable_get: cat2\n                                                - if:\n                                                    match: \\w{2,}\n                                                    do:\n                                                    - object_field_set:\n                                                        object: product\n                                                        joinby: &quot;|&quot;\n                                                        field: category\n                                                - variable_get: cat3\n                                                - if:\n                                                    match: \\w{2,}\n                                                    do:\n                                                    - object_field_set:\n                                                        object: product\n                                                        joinby: &quot;|&quot;\n                                                        field: category\n                                                - find:\n                                                    path: ul.alternativeImages&gt;li&gt;img\n                                                    do:\n                                                    - parse:\n                                                        attr: srcset\n                                                    - to_block\n                                                    - split:\n                                                        context: text\n                                                        delimiter: \\,\\s*\n                                                    - find:\n                                
                        path: div.splitted\n                                                        slice: 0\n                                                        do:\n                                                        - parse:\n                                                            filter: ^([^\\s]+)\n                                                        - object_field_set:\n                                                            object: product\n                                                            joinby: &quot;|&quot;\n                                                            field: images\n                                                - find:\n                                                    path: div.selectColor\n                                                    slice: 0\n                                                    do:\n                                                    - variable_clear: cod10\n                                                    - find:\n                                                        in: doc\n                                                        path: script:contains(&quot;yTos.navigation.itemData =&quot;)\n                                                        do:\n                                                        - parse:\n                                                            filter: yTos\\.navigation\\.itemData\\s+\\=\\s+(.+)\\;\n                                                        - normalize:\n                                                            routine: json2xml\n                                                        - to_block\n                                                        - find:\n                                                            path: cod10\n                                                            do:\n                                                            - parse\n                                                            - 
space_dedupe\n                                                            - trim\n                                                            - variable_set: cod10\n                                                            - walk:\n                                                                to: http:\/\/www.alexandermcqueen.com\/yTos\/api\/Plugins\/ItemPluginApi\/GetCombinationsAsync\/?siteCode=&amp;code10=\n                                                                do:\n                                                                - find:\n                                                                    path: body_safe&gt;colors\n                                                                    do:\n                                                                    - find:\n                                                                        path: description\n                                                                        do:\n                                                                        - parse\n                                                                        - space_dedupe\n                                                                        - trim\n                                                                        - if:\n                                                                            match: \\w+\n                                                                            do:\n                                                                            - object_field_set:\n                                                                                object: product\n                                                                                joinby: &quot;|&quot;\n                                                                                field: variations\n                                                - object_save:\n                                                    name: 
product<\/code><\/pre>\n<h3>Sample of scraped data<\/h3>\n<p>Below is a sample dataset with several products in JSON format, so you can easily review the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using templates.<\/p>\n<pre><code class=\"language-js\">[{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Alexander McQueen&quot;,\n        &quot;category&quot;: &quot;Shop by|Sale&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T17:25:41.800Z&quot;,\n        &quot;description&quot;: &quot;Sleeveless black leather midi-length dress, in raw cut panels whip-stitched together. Featuring hand-applied, silver-plated metal eyelets, which are hand-laced with multicolored leather laces, left long as decorative fringe.&quot;,\n        &quot;images&quot;: &quot;https:\/\/cdn.yoox.biz\/items\/34\/34767065be_18_g_f.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767065be_18_g_r.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767065be_18_g_d.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767065be_18_g_e.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767065be_18_g_a.jpg&quot;,\n        &quot;name&quot;: &quot;Whip-Stitched Leather Dress&quot;,\n        &quot;price&quot;: 5699,\n        &quot;sku&quot;: &quot;493349Q5HLU1666&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.alexandermcqueen.com\/us\/alexandermcqueen\/long-dress_cod34767065be.html&quot;,\n        &quot;variations&quot;: &quot;BLACK&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Alexander McQueen&quot;,\n        &quot;category&quot;: &quot;Shop by|Sale&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T17:25:45.343Z&quot;,\n        &quot;description&quot;: &quot;Long-sleeved, crew neck Boucl\u00e9 knit dress with multicolored leather laces, laced through hand-applied silver-plated metal eyelets.&quot;,\n        &quot;images&quot;: 
&quot;https:\/\/cdn.yoox.biz\/items\/34\/34767074av_18_g_f.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767074av_18_g_r.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767074av_18_g_d.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767074av_18_g_e.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767074av_18_g_a.jpg&quot;,\n        &quot;name&quot;: &quot;Boucl\u00e9 Knit Long Dress with Leather Lacing&quot;,\n        &quot;price&quot;: 3179,\n        &quot;sku&quot;: &quot;493382Q1WHI1666&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.alexandermcqueen.com\/us\/alexandermcqueen\/long-dress_cod34767074av.html&quot;,\n        &quot;variations&quot;: &quot;BLACK&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Alexander McQueen&quot;,\n        &quot;category&quot;: &quot;Shop by|Sale&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T17:25:48.261Z&quot;,\n        &quot;description&quot;: &quot;Long ivory pliss\u00e9 knit dress with extra fine merino wool piping in red, finished with a decorative lurex cross stitch that is left long as frayed fringe on hem and shoulders. 
Roll neck and invisible zipper on center back.&quot;,\n        &quot;images&quot;: &quot;https:\/\/cdn.yoox.biz\/items\/34\/34767073al_18_g_f.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767073al_18_g_r.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767073al_18_g_d.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767073al_18_g_e.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767073al_18_g_a.jpg&quot;,\n        &quot;name&quot;: &quot;Long Knit Dress With Roll Neck&quot;,\n        &quot;price&quot;: 2444,\n        &quot;sku&quot;: &quot;493374Q1WHC9082&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.alexandermcqueen.com\/us\/alexandermcqueen\/long-dress_cod34767073al.html&quot;,\n        &quot;variations&quot;: &quot;IVORY\/RED&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Alexander McQueen&quot;,\n        &quot;category&quot;: &quot;Shop by|Sale&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T17:25:51.193Z&quot;,\n        &quot;description&quot;: &quot;Long black pliss\u00e9 knit dress with extra fine merino wool piping in red, finished with a decorative lurex cross stitch that is left long as frayed fringe. 
Featuring a roll neck and long voluminous balloon sleeves with an invisible zipper on center back.&quot;,\n        &quot;images&quot;: &quot;https:\/\/cdn.yoox.biz\/items\/34\/34767069qs_18_g_f.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767069qs_18_g_r.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767069qs_18_g_d.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767069qs_18_g_e.jpg|https:\/\/cdn.yoox.biz\/items\/34\/34767069qs_18_g_a.jpg&quot;,\n        &quot;name&quot;: &quot;Long-Sleeved Knit Dress With Roll Neck&quot;,\n        &quot;price&quot;: 2669,\n        &quot;sku&quot;: &quot;493373Q1WHB1056&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.alexandermcqueen.com\/us\/alexandermcqueen\/long-dress_cod34767069qs.html&quot;,\n        &quot;variations&quot;: &quot;BLACK\/RED&quot;\n    }\n}]\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>Alexander McQueen was a famous British fashion designer who founded his own label and fashion house. This product and price scraper is designed to extract information about merchandise sold at the fashion house&#8217;s flagship online store, alexandermcqueen.com. 
Approx number of goods: 1000 Approx number of page requests: 2300 Recommended subscription plan: Free [&hellip;]<\/p>","protected":false},"author":4,"featured_media":307,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31,30,2],"tags":[],"class_list":["post-304","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ecommerce-scraping","category-free-scrapers","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=304"}],"version-history":[{"count":4,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/304\/revisions"}],"predecessor-version":[{"id":658,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/304\/revisions\/658"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/307"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=304"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}