Extract product and price data from Ann Taylor with Diggernaut

Ann Taylor is an American chain of women’s clothing stores. Richard Liebeskind opened the first Ann Taylor store in 1954 in New Haven, Connecticut. The store took its name from the most popular dress in his father’s store. This web scraper will help you extract product and price data, along with images, from the anntaylor.com website.

Approx number of goods: 2000
Approx number of page requests: 4000
Recommended subscription plan: Free

PLEASE NOTE! The number of requests can exceed the number of products because data about variations, images, etc. may be scraped from other resources, which requires additional requests. Also, part of the product data may be delivered through XHR requests, which further increases the total number of page requests.
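
As a rough sanity check on the figures above, here is a back-of-the-envelope estimate in Python. It assumes, based on the configuration in this article, that each product costs one product-page request plus one request to the image service. The function name and the listing-page counts are hypothetical placeholders, not measured values.

```python
def estimate_requests(products, requests_per_product=2,
                      category_pages=0, catalog_pages=0):
    """Rough estimate of the page requests needed for one full run.

    Assumptions (not measured):
      - 1 request for the start page
      - category_pages: first listing request per category
      - catalog_pages: paginated listing requests
      - requests_per_product: product page + image service = 2
    """
    start_page = 1
    return (start_page + category_pages + catalog_pages
            + products * requests_per_product)


# About 2000 products, ignoring listing overhead
baseline = estimate_requests(2000)

# Same run with hypothetical listing overhead
with_listings = estimate_requests(2000, category_pages=10, catalog_pages=40)
```

With about 2000 products and no listing overhead, the estimate comes to 4001 requests, which is in line with the approximate figure of 4000 quoted above.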

How to use the web scraper to extract data about products and prices from anntaylor.com

To use the web scraper for the Ann Taylor website, you need an account with our Diggernaut service. Follow these steps:

  1. Use this registration link to open a free Diggernaut account.
  2. After registering and confirming your email address, log in to your account.
  3. Create a project with any name and description. If you do not know how, please refer to our documentation.
  4. Switch to the new project and create a digger with any name. If you do not know how, please refer to our documentation.
  5. Copy the digger configuration below to the clipboard and paste it into the digger you created. If you do not know how, please refer to our documentation.
  6. Switch the digger’s mode from Debug to Active. If you do not know how, please refer to our documentation.
  7. Run your digger and wait for it to finish. If you do not know how, please refer to our documentation.
  8. Download the scraped dataset in the format you need. If you do not know how, please refer to our documentation.

You can also set up a schedule to run your scraper and collect data regularly.

Scraping configuration for the digger

---
config:
    debug: 2
do:
- pool_clear: pages
- walk:
    to: https://www.anntaylor.com
    do:
    - find:
        path: nav.sub-nav a
        do:
        - variable_clear: did
        - variable_set:
            field: viewsnum
            value: 0
        - parse:
            attr: data-id
        - variable_set: did
        - walk:
            to: https://www.anntaylor.com/ecws/endecaService.jsp?SortByFacetSelectedValue=remove&DocSortOrder=remove&format=json&catid=<%did%>&question=&fRequest=true&goToPage=1&N=0&categoryType=regular&priceSort=DESC&country=US&currency=USD&Submit=Submit
            do:
            - find:
                path: resultslist>pagination>attributes>pagesavailable
                do:
                - parse
                - variable_set: viewsnum
                - eval:
                    routine: js
                    body: (function(){var num = <%viewsnum%>;var str = "";for(var i = num; i > 0; i--){if (i != num){str += ","}str += i} return "<div>"+str+"</div>";})();
                - to_block
                - split:
                    context: text
                    delimiter: ","
                - find:
                    path: div
                    do:
                    - variable_clear: pagenum
                    - parse
                    - variable_set: pagenum
                    - link_add:
                        url: https://www.anntaylor.com/ecws/endecaService.jsp?SortByFacetSelectedValue=remove&DocSortOrder=remove&format=json&catid=<%did%>&question=&fRequest=true&goToPage=<%pagenum%>&N=0&categoryType=regular&priceSort=DESC&country=US&currency=USD&Submit=Submit
                        pool: catalog
- walk:
    to: links
    pool: catalog
    do:
    - find:
        path: resultslist>records>records>attributes>quicklookurl
        do:
        - parse:
            filter: ^([^\?]+)
        - normalize:
            routine: url
        - link_add:
            pool: pages
- walk:
    to: links
    pool: pages
    do:
    - sleep: 3
    - find:
        path: main
        do:
        - variable_clear: pid
        - object_new: product
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
        - find:
            path: h1[itemprop="name"]
            do:
            - parse
            - space_dedupe
            - trim
            - object_field_set:
                object: product
                field: name
        - register_set: Ann Taylor
        - object_field_set:
            object: product
            field: brand
        - find:
            in: doc
            path: script:contains("window.productSettings = ")
            do:
            - parse:
                filter: window\.productSettings\s+=\s+(.+)\s*
            - normalize:
                routine: json2xml
            - to_block
            - find:
                path: body_safe>currency
                do:
                - parse
                - normalize:
                    routine: replace_matched
                    args:
                        \$: USD
                - object_field_set:
                    object: product
                    field: currency
            - find:
                path: body_safe>products>listprice
                do:
                - parse
                - object_field_set:
                    object: product
                    type: float
                    field: price
            - find:
                path: body_safe>prodid
                do:
                - parse
                - space_dedupe
                - trim
                - variable_set: pid
                - object_field_set:
                    object: product
                    field: sku
            - find:
                path: body_safe>products>skucolors>colors
                do:
                - find:
                    path: colorname
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - if:
                        match: \w+
                        do:
                        - object_field_set:
                            object: product
                            joinby: "|"
                            field: variations
            - walk:
                to: https://richmedia.channeladvisor.com/ViewerDelivery/productXmlService?profileid=52000652&itemid=<%pid%>&viewerid=196
                do:
                - find:
                    path: img
                    do:
                    - parse:
                        attr: path
                    - normalize:
                        routine: replace_substring
                        args:
                            \&recipeId\=\d+: ''
                    - object_field_set:
                        object: product
                        joinby: "|"
                        field: images
            - find:
                path: body_safe>products>weblongdescription
                do:
                - parse
                - space_dedupe
                - trim
                - object_field_set:
                    object: product
                    field: description
            - find:
                path: body_safe>products>parentcategoryname
                do:
                - parse
                - space_dedupe
                - trim
                - if:
                    match: \w+
                    do:
                    - object_field_set:
                        object: product
                        joinby: "|"
                        field: category
        - object_save:
            name: product
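
The configuration above relies on a few small text transformations. The Python sketch below mirrors three of them outside Diggernaut: the JavaScript snippet that builds a descending list of page numbers, the `^([^\?]+)` filter that drops the query string from a product URL, and the `&recipeId=...` removal applied to image paths. It is only an illustration, not part of the digger. The function names are hypothetical, and so are the `skuId` query parameter and the `recipeId` value in the example inputs.

```python
import re


def page_numbers_descending(pages_available):
    """Mimic the JS snippet: "N,N-1,...,1" for N available pages."""
    return ",".join(str(i) for i in range(pages_available, 0, -1))


def strip_query(url):
    """Keep everything before the first '?', like the ^([^\\?]+) filter."""
    match = re.match(r"^([^?]+)", url)
    return match.group(1) if match else url


def drop_recipe_id(image_path):
    """Remove the '&recipeId=<digits>' parameter from an image path."""
    return re.sub(r"&recipeId=\d+", "", image_path)


# Example inputs; the query parameters below are made up for illustration.
pages = page_numbers_descending(3)
product_url = strip_query(
    "https://www.anntaylor.com/walking-shorts/455672?skuId=123"
)
image_url = drop_recipe_id(
    "https://richmedia.channeladvisor.com/ImageDelivery/imageService"
    "?profileId=52000652&id=1143755&recipeId=243"
)
```

Here `pages` is the string `3,2,1`, which the digger then splits on commas to queue one catalog request per page. `product_url` and `image_url` come out in the same shape as the `url` and `images` values in the sample dataset.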

Sample of scraped data

Below is a sample dataset with several products in JSON format, so you can easily review it and see the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using the templates.

[{
    "product": {
        "brand": "Ann Taylor",
        "category": "Online Exclusives",
        "currency": "USD",
        "date": "2017-12-05T18:49:19.687Z",
        "description": "Our longest shorts always step things up in style. Refined in crisp cotton, this essential pair has a touch of stretch for an endlessly flattering fit. Contoured waistband. Front zip with double hook-and-bar closure. Belt loops. Front off-seam pockets. Back welt pockets. Side slits. 11” inseam.",
        "images": "https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143755|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143755|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158580|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158580|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137187|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137187|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137183|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137183|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137185|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1137185",
        "name": "Walking Shorts",
        "price": 49,
        "sku": "455672",
        "url": "https://www.anntaylor.com/walking-shorts/455672",
        "variations": "Atlantic Navy|Coastal Beige"
    }
}
,{
    "product": {
        "brand": "Ann Taylor",
        "category": "Jewelry",
        "currency": "USD",
        "date": "2017-12-05T18:49:25.022Z",
        "description": "A gleaming round pendant and adjustable cord necklace make this modern accessory shine. 34\" length adjustable cord necklace; 2\" pendant.",
        "images": "https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1149247|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1149247|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1148837|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1148837",
        "name": "Circle Pendant Cord Necklace",
        "price": 39.5,
        "sku": "464472",
        "url": "https://www.anntaylor.com/circle-pendant-cord-necklace/464472",
        "variations": "Gold"
    }
}
,{
    "product": {
        "brand": "Ann Taylor",
        "category": "Jewelry",
        "currency": "USD",
        "date": "2017-12-05T18:49:29.062Z",
        "description": "This stellar pair stars a linear drop of polished stones that takes your look to the next level. French wire. 2\" drop.",
        "images": "https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143771|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143771|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158888|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158888",
        "name": "Stellar Linear Drop Earrings",
        "price": 39.5,
        "sku": "459484",
        "url": "https://www.anntaylor.com/stellar-linear-drop-earrings/459484",
        "variations": "Frosted Pink"
    }
}
,{
    "product": {
        "brand": "Ann Taylor",
        "category": "Jewelry",
        "currency": "USD",
        "date": "2017-12-05T18:49:33.292Z",
        "description": "Make the rounds with this glossy bead necklace, polished off with sparkling pave accents. Lobster claw closure. 30\" length with 2\" extender.",
        "images": "https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143739|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143739|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158885|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158885",
        "name": "Beaded Necklace",
        "price": 39.5,
        "sku": "448996",
        "url": "https://www.anntaylor.com/beaded-necklace/448996",
        "variations": "Black"
    }
}]
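
The `variations` and `images` fields in the sample are pipe-joined strings, and each image URL appears twice. A minimal Python sketch for post-processing a downloaded JSON dataset might look like the following. The helper names are hypothetical, and the inline record is a shortened copy of the first product above. The code also assumes some fields may be missing, since the digger only writes `variations` when a color name is found.

```python
import json


def split_field(value, sep="|"):
    """Split a pipe-joined field into a list, skipping empty parts."""
    return [part for part in value.split(sep) if part] if value else []


def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    return list(dict.fromkeys(items))


def tidy_product(record):
    """Return a copy of one product with list-valued variations and images."""
    product = dict(record["product"])
    product["variations"] = split_field(product.get("variations", ""))
    product["images"] = unique_in_order(split_field(product.get("images", "")))
    return product


# Shortened copy of the first sample record, for illustration only
dataset = json.loads("""
[{
    "product": {
        "name": "Walking Shorts",
        "price": 49,
        "sku": "455672",
        "variations": "Atlantic Navy|Coastal Beige",
        "images": "https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143755|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1143755|https://richmedia.channeladvisor.com/ImageDelivery/imageService?profileId=52000652&id=1158580"
    }
}]
""")

products = [tidy_product(record) for record in dataset]
```

For this record, `products[0]["variations"]` becomes a list of two color names and `products[0]["images"]` keeps two distinct URLs out of the three listed.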
Mikhail Sisin: Co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.