{"id":313,"date":"2018-02-07T10:05:25","date_gmt":"2018-02-07T10:05:25","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=313"},"modified":"2019-01-12T16:55:13","modified_gmt":"2019-01-12T16:55:13","slug":"extract-product-price-data-ann-taylor-diggernaut","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/extract-product-price-data-ann-taylor-diggernaut\/","title":{"rendered":"Extract product and price data from Ann Taylor with Diggernaut"},"content":{"rendered":"<p>Ann Taylor is an American chain of women&#8217;s clothing stores. Richard Liebeskind opened the first Ann Taylor store in 1954 in New Haven, Connecticut. The store took its name from the most popular dress in his father&#8217;s store. This web scraper will help you extract product and price data, along with images, from the anntaylor.com website.<\/p>\n<p><strong>Approximate number of products:<\/strong> 2000<br>\n<strong>Approximate number of page requests:<\/strong> 4000<br>\n<strong>Recommended subscription plan:<\/strong> Free<\/p>\n<p><strong>PLEASE NOTE!<\/strong> The number of requests can exceed the number of products because data about variations, images, etc. may come from other resources that require additional requests. Part of the product data may also be delivered via XHR requests, which further increases the total number of page requests.<\/p>\n<h3>How to use the web scraper to extract data about products and prices from anntaylor.com<\/h3>\n<p>To use the web scraper for the Ann Taylor website, you need an account with our Diggernaut service. 
Follow these steps:<\/p>\n<ol>\n<li>Use this <a href=\"https:\/\/www.diggernaut.com\/accounts\/signup\/\">registration link<\/a> to open a free account with <a href=\"https:\/\/www.diggernaut.com\">Diggernaut<\/a>.<\/li>\n<li>After registering and confirming your email address, <a href=\"https:\/\/www.diggernaut.com\/accounts\/login\/\">log in to your account<\/a>.<\/li>\n<li>Create a project with any name and description. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-project.html\">documentation<\/a>.<\/li>\n<li>Switch to the created project and create a digger with any name. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-digger.html\">documentation<\/a>.<\/li>\n<li>Copy the following digger configuration to the clipboard and paste it into the digger you created. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-digger-config.html\">documentation<\/a>.<\/li>\n<li>Switch the digger mode from Debug to Active. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-edit-digger.html\">documentation<\/a>.<\/li>\n<li>Run your digger and wait for it to complete. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-run-digger.html\">documentation<\/a>.<\/li>\n<li>Download the scraped dataset in the format you need. If you do not know how, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-scraped-data.html\">documentation<\/a>.<\/li>\n<\/ol>\n<p>You can also set up a schedule to run your scraper and collect data regularly.<\/p>\n<h3>Scraping configuration for the digger<\/h3>\n<pre class=\"language-yaml line-numbers\"><code 
class=\"language-yaml\">---\nconfig:\n    debug: 2\ndo:\n- pool_clear: pages\n- walk:\n    to: https:\/\/www.anntaylor.com\n    do:\n    - find:\n        path: nav.sub-nav a\n        do:\n        - variable_clear: did\n        - variable_set:\n            field: viewsnum\n            value: 0\n        - parse:\n            attr: data-id\n        - variable_set: did\n        - walk:\n            to: https:\/\/www.anntaylor.com\/ecws\/endecaService.jsp?SortByFacetSelectedValue=remove&amp;DocSortOrder=remove&amp;format=json&amp;catid=&lt;%did%&gt;&amp;question=&amp;fRequest=true&amp;goToPage=1&amp;N=0&amp;categoryType=regular&amp;priceSort=DESC&amp;country=US&amp;currency=USD&amp;Submit=Submit\n            do:\n            - find:\n                path: resultslist&gt;pagination&gt;attributes&gt;pagesavailable\n                do:\n                - parse\n                - variable_set: viewsnum\n                - eval:\n                    routine: js\n                    body: (function(){var num = &lt;%viewsnum%&gt;;var str = &quot;&quot;;for(var i = num; i &gt; 0; i--){if (i != num){str += &quot;,&quot;}str += i} return &quot;&lt;div&gt;&quot;+str+&quot;&lt;\/div&gt;&quot;;})();\n                - to_block\n                - split:\n                    context: text\n                    delimiter: &quot;,&quot;\n                - find:\n                    path: div\n                    do:\n                    - variable_clear: pagenum\n                    - parse\n                    - variable_set: pagenum\n                    - link_add:\n                        url: https:\/\/www.anntaylor.com\/ecws\/endecaService.jsp?SortByFacetSelectedValue=remove&amp;DocSortOrder=remove&amp;format=json&amp;catid=&lt;%did%&gt;&amp;question=&amp;fRequest=true&amp;goToPage=&lt;%pagenum%&gt;&amp;N=0&amp;categoryType=regular&amp;priceSort=DESC&amp;country=US&amp;currency=USD&amp;Submit=Submit\n                        pool: catalog\n- walk:\n    to: links\n    pool: catalog\n    do:\n    - find:\n        path: 
resultslist&gt;records&gt;records&gt;attributes&gt;quicklookurl\n        do:\n        - parse:\n            filter: ^([^\\?]+)\n        - normalize:\n            routine: url\n        - link_add:\n            pool: pages\n- walk:\n    to: links\n    pool: pages\n    do:\n    - sleep: 3\n    - find:\n        path: main\n        do:\n        - variable_clear: pid\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n            object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find:\n            path: h1[itemprop=&quot;name&quot;]\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set:\n                object: product\n                field: name\n        - register_set: Ann Taylor\n        - object_field_set:\n            object: product\n            field: brand\n        - find:\n            in: doc\n            path: script:contains(&quot;window.productSettings = &quot;)\n            do:\n            - parse:\n                filter: window\\.productSettings\\s+=\\s+(.+)\\s*\n            - normalize:\n                routine: json2xml\n            - to_block\n            - find:\n                path: body_safe&gt;currency\n                do:\n                - parse\n                - normalize:\n                    routine: replace_matched\n                    args:\n                        \\$: USD\n                - object_field_set:\n                    object: product\n                    field: currency\n            - find:\n                path: body_safe&gt;products&gt;listprice\n                do:\n                - parse\n                - object_field_set:\n                    object: product\n                    type: float\n                
    field: price\n            - find:\n                path: body_safe&gt;prodid\n                do:\n                - parse\n                - space_dedupe\n                - trim\n                - variable_set: pid\n                - object_field_set:\n                    object: product\n                    field: sku\n            - find:\n                path: body_safe&gt;products&gt;skucolors&gt;colors\n                do:\n                - find:\n                    path: colorname\n                    do:\n                    - parse\n                    - space_dedupe\n                    - trim\n                    - if:\n                        match: \\w+\n                        do:\n                        - object_field_set:\n                            object: product\n                            joinby: &quot;|&quot;\n                            field: variations\n            - walk:\n                to: https:\/\/richmedia.channeladvisor.com\/ViewerDelivery\/productXmlService?profileid=52000652&amp;itemid=&lt;%pid%&gt;&amp;viewerid=196\n                do:\n                - find:\n                    path: img\n                    do:\n                    - parse:\n                        attr: path\n                    - normalize:\n                        routine: replace_substring\n                        args:\n                            \\&amp;recipeId\\=\\d+: &#039;&#039;\n                    - object_field_set:\n                        object: product\n                        joinby: &quot;|&quot;\n                        field: images\n            - find:\n                path: body_safe&gt;products&gt;weblongdescription\n                do:\n                - parse\n                - space_dedupe\n                - trim\n                - object_field_set:\n                    object: product\n                    field: description\n            - find:\n                path: body_safe&gt;products&gt;parentcategoryname\n                do:\n      
          - parse\n                - space_dedupe\n                - trim\n                - if:\n                    match: \\w+\n                    do:\n                    - object_field_set:\n                        object: product\n                        joinby: &quot;|&quot;\n                        field: category\n        - object_save:\n            name: product<\/code><\/pre>\n<h3>Sample of scraped data<\/h3>\n<p>Below is a sample of a dataset with several products in JSON format (so you can easily review it and see data structure). The dataset can be downloaded as CSV, XLSX, XML, or any other text format using the templates.<\/p>\n<pre><code class=\"language-js\">[{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Ann Taylor&quot;,\n        &quot;category&quot;: &quot;Online Exclusives&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:49:19.687Z&quot;,\n        &quot;description&quot;: &quot;Our longest shorts always step things up in style. Refined in crisp cotton, this essential pair has a touch of stretch for an endlessly flattering fit. Contoured waistband. Front zip with double hook-and-bar closure. Belt loops. Front off-seam pockets. Back welt pockets. Side slits. 
11\u201d inseam.&quot;,\n        &quot;images&quot;: &quot;https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143755|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143755|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158580|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158580|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137187|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137187|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137183|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137183|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137185|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1137185&quot;,\n        &quot;name&quot;: &quot;Walking Shorts&quot;,\n        &quot;price&quot;: 49,\n        &quot;sku&quot;: &quot;455672&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.anntaylor.com\/walking-shorts\/455672&quot;,\n        &quot;variations&quot;: &quot;Atlantic Navy|Coastal Beige&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Ann Taylor&quot;,\n        &quot;category&quot;: &quot;Jewelry&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:49:25.022Z&quot;,\n        &quot;description&quot;: &quot;A gleaming round pendant and adjustable cord necklace make this modern accessory shine. 
34\\&quot; length adjustable cord necklace; 2\\&quot; pendant.&quot;,\n        &quot;images&quot;: &quot;https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1149247|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1149247|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1148837|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1148837&quot;,\n        &quot;name&quot;: &quot;Circle Pendant Cord Necklace&quot;,\n        &quot;price&quot;: 39.5,\n        &quot;sku&quot;: &quot;464472&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.anntaylor.com\/circle-pendant-cord-necklace\/464472&quot;,\n        &quot;variations&quot;: &quot;Gold&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Ann Taylor&quot;,\n        &quot;category&quot;: &quot;Jewelry&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:49:29.062Z&quot;,\n        &quot;description&quot;: &quot;This stellar pair stars a linear drop of polished stones that takes your look to the next level. French wire. 
2\\&quot; drop.&quot;,\n        &quot;images&quot;: &quot;https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143771|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143771|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158888|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158888&quot;,\n        &quot;name&quot;: &quot;Stellar Linear Drop Earrings&quot;,\n        &quot;price&quot;: 39.5,\n        &quot;sku&quot;: &quot;459484&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.anntaylor.com\/stellar-linear-drop-earrings\/459484&quot;,\n        &quot;variations&quot;: &quot;Frosted Pink&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;Ann Taylor&quot;,\n        &quot;category&quot;: &quot;Jewelry&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:49:33.292Z&quot;,\n        &quot;description&quot;: &quot;Make the rounds with this glossy bead necklace, polished off with sparkling pave accents. Lobster claw closure. 
30\\&quot; length with 2\\&quot; extender.&quot;,\n        &quot;images&quot;: &quot;https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143739|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1143739|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158885|https:\/\/richmedia.channeladvisor.com\/ImageDelivery\/imageService?profileId=52000652&amp;id=1158885&quot;,\n        &quot;name&quot;: &quot;Beaded Necklace&quot;,\n        &quot;price&quot;: 39.5,\n        &quot;sku&quot;: &quot;448996&quot;,\n        &quot;url&quot;: &quot;https:\/\/www.anntaylor.com\/beaded-necklace\/448996&quot;,\n        &quot;variations&quot;: &quot;Black&quot;\n    }\n}]\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>Ann Taylor is an American chain of women&#8217;s clothing stores. Richard Liebeskind opened the first Ann Taylor store in 1954 in New Haven, Connecticut. The name of the store came from the name of the dress, which was the most popular in his father&#8217;s store. 
This web scraper will help you to extract product and [&hellip;]<\/p>","protected":false},"author":4,"featured_media":315,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31,30,2],"tags":[],"class_list":["post-313","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ecommerce-scraping","category-free-scrapers","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/313","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=313"}],"version-history":[{"count":4,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/313\/revisions"}],"predecessor-version":[{"id":656,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/313\/revisions\/656"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/315"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=313"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=313"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=313"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}