{"id":309,"date":"2018-02-05T23:04:26","date_gmt":"2018-02-05T23:04:26","guid":{"rendered":"https:\/\/www.diggernaut.com\/blog\/?p=309"},"modified":"2019-01-12T16:57:10","modified_gmt":"2019-01-12T16:57:10","slug":"extract-product-price-information-american-apparel-online-store","status":"publish","type":"post","link":"https:\/\/www.diggernaut.com\/blog\/extract-product-price-information-american-apparel-online-store\/","title":{"rendered":"Extract product and price information from American Apparel online store"},"content":{"rendered":"<p>American Apparel is a North American manufacturer and fashion apparel retailer based in Los Angeles, California. The company was founded in 1989 by Canadian businessman Dov Charney. The scraper presented in this article lets you extract product and price information from the company\u2019s online store, americanapparel.net.<\/p>\n<p><strong>Approx number of goods:<\/strong> 500<br>\n<strong>Approx number of page requests:<\/strong> 500<br>\n<strong>Recommended subscription plan:<\/strong> Free<\/p>\n<p><strong>PLEASE NOTE!<\/strong> The number of requests can exceed the number of products because data about variations, images, etc. may be scraped from other resources, which requires additional requests. Part of the product data may also be delivered via XHR requests, which further increases the total number of page requests.<\/p>\n<h3>How to use the web scraper to extract product and price information from americanapparel.net<\/h3>\n<p>To use the web scraper for the American Apparel online store, you must have an account with our Diggernaut service. 
Follow these steps to get started:<\/p>\n<ol>\n<li>Use this <a href=\"https:\/\/www.diggernaut.com\/accounts\/signup\/\">registration link<\/a> to open a free account with <a href=\"https:\/\/www.diggernaut.com\">Diggernaut<\/a>.<\/li>\n<li>After registering and confirming your email address, <a href=\"https:\/\/www.diggernaut.com\/accounts\/login\/\">log in to your account<\/a>.<\/li>\n<li>Create a project with any name and description. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-project.html\">documentation<\/a>.<\/li>\n<li>Switch to the created project and create a digger with any name. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-create-new-digger.html\">documentation<\/a>.<\/li>\n<li>Copy the digger configuration below to the clipboard and paste it into the digger you created. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-digger-config.html\">documentation<\/a>.<\/li>\n<li>Switch the mode of the digger from Debug to Active. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-edit-digger.html\">documentation<\/a>.<\/li>\n<li>Run your digger and wait until it completes. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-run-digger.html\">documentation<\/a>.<\/li>\n<li>Download the scraped dataset in the format you need. If you do not know how to do it, please refer to our <a href=\"https:\/\/www.diggernaut.com\/dev\/website-projects-scraped-data.html\">documentation<\/a>.<\/li>\n<\/ol>\n<p>You can also set up a schedule to run your scraper and collect data regularly.<\/p>\n<h3>Scraping configuration for the digger<\/h3>\n<pre class=\"language-yaml line-numbers\"><code 
class=\"language-yaml\">---\nconfig: \n    debug: 2\n    agent: Firefox\ndo:\n- link_add:\n    url: http:\/\/store.americanapparel.net\n- link_add:\n    url: http:\/\/store.americanapparel.net\/en\/factory-store\/\n- walk:\n    to: links\n    do:\n    - find:\n        path: .cd-primary-nav a\n        do:\n        - parse:\n            attr: href\n        - normalize:\n            routine: url\n        - link_add:\n            pool: main\n- walk:\n    to: links\n    pool: main\n    do:\n    - find:\n        path: .product > a\n        do: \n        - parse:\n            attr: href\n        - normalize:\n            routine: url\n        - link_add:\n            pool: sub\n- walk:\n    to: links\n    pool: sub\n    do:\n    - sleep: 3\n    - find:\n        path: .pdp\n        do:\n        - variable_clear: allli\n        - variable_clear: descr\n        - variable_clear: li\n        - variable_clear: id\n        - variable_clear: views\n        - variable_clear: color\n        - variable_clear: imgnum\n        - variable_clear: imgxl\n        - variable_clear: viewsnum\n        - variable_clear: stp\n        - object_new: product\n        - eval:\n            routine: js\n            body: &#039;(function (){var d = new Date(); return d.toISOString()})();&#039;\n        - object_field_set:\n            object: product\n            field: date\n        - static_get: url\n        - object_field_set:\n            object: product\n            field: url\n        - find: \n            in: doc\n            path: head meta[name=&quot;description&quot;] \n            do: \n            - parse:\n                attr: content\n            - space_dedupe\n            - trim\n            - to_block\n            - node_replace:\n                path: br\n                with: &quot;\\n&quot;\n            - split:\n                context: text\n                delimiter: \\n+\n            - find:\n                path: div.splitted\n                slice: 0\n                do:\n 
               - parse\n                - space_dedupe\n                - trim\n                - object_field_set: \n                    object: product\n                    field: description\n        - find:\n            path: .product-style\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set: \n                object: product\n                field: sku\n        - find:\n            path: .price\n            do:\n            - find:\n                path: .red-text\n                do:\n                - parse:\n                    filter:\n                        - (\\d+\\.?\\d*)\n                - if:\n                    match: (\\d)\n                    do:\n                    - object_field_set:\n                        object: product\n                        field: price\n                        type: float\n                    - register_set: USD\n                    - object_field_set:\n                        object: product\n                        field: currency\n                    - register_set: 1\n                    - variable_set: stp\n            - find:\n                path: span[data-test=&quot;test&quot;]\n                do:\n                - variable_get: stp\n                - if:\n                    match: (1)\n                    else:\n                    - parse:\n                        filter:\n                            - (\\d+\\.?\\d*)\n                    - if:\n                        match: (\\d)\n                        do:\n                        - object_field_set:\n                            object: product\n                            field: price\n                            type: float\n                        - register_set: USD\n                        - object_field_set:\n                            object: product\n                            field: currency\n        - find:\n            path: .product-name\n            do:\n            - 
parse\n            - space_dedupe\n            - trim\n            - object_field_set: \n                object: product\n                field: name\n        - find:\n            path: .main-img\n            do:\n            - parse:\n                attr: src\n            - object_field_set:\n                object: product\n                field: images\n                joinby: &quot;|&quot;\n        - find:\n            path: .logo\n            slice: 0\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set: \n                object: product\n                field: brand\n        - find:\n            path: .breadcrumbs a\n            do:\n            - parse\n            - space_dedupe\n            - trim\n            - object_field_set: \n                object: product\n                field: category\n                joinby: &quot;|&quot;\n        - find:\n            path: &#039;.product-details > input#skuVarData&#039;\n            do:\n            - parse:\n                attr: value\n            - normalize:\n                routine: replace_substring\n                args:\n                    \\s+\\\/\\s+: _\n            - normalize:\n                routine: json2xml\n            - to_block\n            - find:\n                path: body_safe > name\n                do:\n                - parse\n                - space_dedupe\n                - trim\n                - object_field_set:\n                    object: product\n                    field: name\n            - find:\n                path: colors\n                do:\n                - find:\n                    path: zoomimage\n                    do:\n                    - parse:\n                        filter:\n                            - \\s*(.+)\\?\n                    - variable_set: imgxl\n                    - register_set: ?$ProductZoom$\n                    - object_field_set:\n                        object: 
product\n                        field: images\n                        joinby: &quot;|&quot;\n                - find:\n                    path: name\n                    do:\n                    - parse\n                    - space_dedupe\n                    - trim\n                    - object_field_set:\n                        object: product\n                        field: variations\n                        joinby: &quot;|&quot;\n        - object_save:\n            name: product<\/code><\/pre>\n<h3>Sample of scraped data<\/h3>\n<p>Below is a sample dataset with several products in JSON format, so you can easily review it and see the data structure. The dataset can be downloaded as CSV, XLSX, XML, or any other text format using templates.<\/p>\n<pre><code class=\"language-js\">[{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;American Apparel \u00ae&quot;,\n        &quot;category&quot;: &quot;Women|Multipacks&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:06:21.973Z&quot;,\n        &quot;description&quot;: &quot;The 50\/50 Crewneck T-Shirt is a super-soft Poly-Cotton t-shirt featuring a slightly scooped neck and perfectly worn feel.&quot;,\n        &quot;images&quot;: 
&quot;http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_white?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_white?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_asphalt?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_black?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_gold?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_kellygreen?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_navy?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_orchid?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_pink?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_red?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_truffle?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb301w_white?$ProductZoom$&quot;,\n        &quot;name&quot;: &quot;50\/50 Crewneck T-Shirt&quot;,\n        &quot;price&quot;: 18,\n        &quot;sku&quot;: &quot;bb301w&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.americanapparel.com\/en\/50-50-crewneck-t-shirt_bb301w?c=White&quot;,\n        &quot;variations&quot;: &quot;Asphalt|Black|Gold|Kelly Green|Navy|Orchid|Pink|Red|Truffle|White&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;American Apparel \u00ae&quot;,\n        &quot;category&quot;: &quot;Women|T-Shirts & Tanks|Tanks&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:06:25.305Z&quot;,\n        &quot;description&quot;: &quot;The 50\/50 tank is a sexy tank with generously cut arm openings and a slim racerback in our super-soft Poly-Cotton fabric.&quot;,\n        &quot;images&quot;: 
&quot;http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_navy?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_navy?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_asphalt?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_black?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_gold?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_kellygreen?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_navy?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_orchid?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_pink?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_red?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_truffle?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/bb308w_white?$ProductZoom$&quot;,\n        &quot;name&quot;: &quot;50\/50 Tank&quot;,\n        &quot;price&quot;: 16,\n        &quot;sku&quot;: &quot;bb308w&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.americanapparel.com\/en\/50-50-tank_bb308w?c=Navy&quot;,\n        &quot;variations&quot;: &quot;Asphalt|Black|Gold|Kelly Green|Navy|Orchid|Pink|Red|Truffle|White&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;American Apparel \u00ae&quot;,\n        &quot;category&quot;: &quot;Women|Basics Shop&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:06:28.613Z&quot;,\n        &quot;description&quot;: &quot;The 50\/50 Loose Crop Tee is a loose-fitting cropped t-shirt in our ultra-soft 50\/50 Poly-Cotton blend. 
Perfect for layering or paired with high-waist skirts, pants and shorts.&quot;,\n        &quot;images&quot;: &quot;http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_white?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_white?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_black?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_navy?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_orchid?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_pink?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_red?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/rsabb380w_white?$ProductZoom$&quot;,\n        &quot;name&quot;: &quot;50\/50 Loose Crop Tee&quot;,\n        &quot;price&quot;: 18,\n        &quot;sku&quot;: &quot;rsabb380w&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.americanapparel.com\/en\/50-50-loose-crop-tee_rsabb380w?c=White&quot;,\n        &quot;variations&quot;: &quot;Black|Navy|Orchid|Pink|Red|White&quot;\n    }\n}\n,{\n    &quot;product&quot;: {\n        &quot;brand&quot;: &quot;American Apparel \u00ae&quot;,\n        &quot;category&quot;: &quot;Women|Multipacks&quot;,\n        &quot;currency&quot;: &quot;USD&quot;,\n        &quot;date&quot;: &quot;2017-12-05T18:06:31.899Z&quot;,\n        &quot;description&quot;: &quot;The Tri-Blend Racerback Tank is a sexy tank with generously cut arm openings and a slim racerback in our ultra soft Tri-Blend fabric. 
\u2022 Polyester retains shape and elasticity; Cotton lends both comfort and durability; addition of Rayon makes for a unique texture and drapes against the body for a slimming look&quot;,\n        &quot;images&quot;: &quot;http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-lieutenant?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-lieutenant?defaultImage=\/notavail&$ProductImage2.5$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_athleticblue?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_athleticgrey?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-black?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-creolepink?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-indigo?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-lieutenant?$ProductZoom$|http:\/\/s7d9.scene7.com\/is\/image\/AmericanApparel\/tr308w_tri-red?$ProductZoom$&quot;,\n        &quot;name&quot;: &quot;Tri-Blend Racerback Tank&quot;,\n        &quot;price&quot;: 18,\n        &quot;sku&quot;: &quot;tr308w&quot;,\n        &quot;url&quot;: &quot;http:\/\/www.americanapparel.com\/en\/tri-blend-racerback-tank_tr308w?c=Tri-Lieutenant&quot;,\n        &quot;variations&quot;: &quot;Athletic Blue|Athletic Grey|Tri-Black|Tri-Creole Pink|Tri-Indigo|Tri-Lieutenant|Tri-Red&quot;\n    }\n}]\n<\/code><\/pre>","protected":false},"excerpt":{"rendered":"<p>American Apparel is a North American manufacturer and fashion apparel retailer based in Los Angeles, California. The company was founded in 1989 by Canadian businessman Dov Charney. The scraper presented in this article lets you extract product and price information from the company\u2019s online store, americanapparel.net. 
Approx number of goods: [&hellip;]<\/p>","protected":false},"author":4,"featured_media":311,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[31,30,2],"tags":[],"class_list":["post-309","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ecommerce-scraping","category-free-scrapers","category-web-scraping"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/309","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/comments?post=309"}],"version-history":[{"count":3,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/309\/revisions"}],"predecessor-version":[{"id":657,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/posts\/309\/revisions\/657"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media\/311"}],"wp:attachment":[{"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/media?parent=309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/categories?post=309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.diggernaut.com\/blog\/wp-json\/wp\/v2\/tags?post=309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}