Free product scraper for Banana Republic inventory

Banana Republic is an American clothing and accessories retailer owned by the American multinational corporation Gap. The company was founded in 1978 by Mel and Patricia Ziegler and was acquired by Gap in 1983, which helped it grow. Today the company operates more than 600 stores around the world. This free product scraper helps you gather product information, prices and images from the bananarepublic.gap.com online store.

Approximate number of goods: 20000
Approximate number of page requests: 20000
Recommended subscription plan: X-Small

PLEASE NOTE! The number of requests can exceed the number of products, because data about variations, images and so on may be scraped from other resources that require additional requests. Part of the product data may also be delivered via XHR requests, which further increases the total number of page requests.
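
As a rough illustration of why the request count can exceed the product count, the sketch below adds up the kinds of requests mentioned in the note. The function name, its parameters and the sample numbers are all assumptions made for illustration; they are not measurements of bananarepublic.gap.com or part of the Diggernaut service.

```python
import math


def estimate_requests(products, listing_page_size, extra_per_product=0.0, fixed_requests=1):
    """Rough estimate of page requests for one full scrape.

    All parameters are hypothetical inputs, not values measured on the site:
    products           -- number of product pages to fetch
    listing_page_size  -- products returned per listing (XHR) page
    extra_per_product  -- average additional requests per product
                          (variations, images, other resources)
    fixed_requests     -- start page and similar one-off requests
    """
    listing_pages = math.ceil(products / listing_page_size)
    product_requests = math.ceil(products * (1 + extra_per_product))
    return fixed_requests + listing_pages + product_requests


if __name__ == "__main__":
    # Assumed numbers, for illustration only.
    print(estimate_requests(products=20000, listing_page_size=100))
    print(estimate_requests(products=20000, listing_page_size=100, extra_per_product=0.5))
```

Even with no extra per-product requests, the listing pages alone push the total above the product count, so the figures in this document are best read as approximations.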

How to use the web scraper to extract data about goods and prices from bananarepublic.gap.com

To use the web scraper for the Banana Republic website, you need an account with our Diggernaut service. Follow these steps:

  1. Follow this registration link to open a free account with Diggernaut.
  2. After registering and confirming your email address, log in to your account.
  3. Create a project with any name and description. If you do not know how, please refer to our documentation.
  4. Switch to the created project and create a digger with any name. If you do not know how, please refer to our documentation.
  5. Copy the following digger configuration to the clipboard and paste it into the digger you created. If you do not know how, please refer to our documentation.
  6. Switch the digger's mode from Debug to Active. If you do not know how, please refer to our documentation.
  7. Run your digger and wait until it completes. If you do not know how, please refer to our documentation.
  8. Download the scraped dataset in the format you need. If you do not know how, please refer to our documentation.

You can also set up a schedule to run your scraper and collect data regularly.
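
Once the dataset is downloaded, multi-value fields usually need to be split before use. The configuration in this document joins the images, variations and category fields with "|". The following is a minimal sketch, assuming the dataset was exported as a JSON array of product objects; the helper names and the sample record are hypothetical and not part of Diggernaut.

```python
import json

# Fields that the digger configuration joins with "|".
MULTI_VALUE_FIELDS = ("images", "variations", "category")


def split_multi_value_fields(product):
    """Return a copy of a product record with pipe-joined fields as lists."""
    result = dict(product)
    for field in MULTI_VALUE_FIELDS:
        value = result.get(field)
        if isinstance(value, str):
            # Drop empty parts so a stray "|" does not produce empty entries.
            result[field] = [part for part in value.split("|") if part]
    return result


def load_products(path):
    """Load a JSON export (assumed to be a list of objects) and normalize it."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return [split_multi_value_fields(record) for record in records]


if __name__ == "__main__":
    # Made-up record for illustration; real records come from your export.
    sample = {
        "name": "Example shirt",
        "price": 49.5,
        "currency": "USD",
        "images": "http://example.com/a.jpg|http://example.com/b.jpg",
        "category": "Men|Shirts",
    }
    print(split_multi_value_fields(sample))
```

If you export to CSV or another format instead, only the loading step changes; the splitting logic stays the same.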

Scraping configuration for the digger

---
config:
    debug: 2
    agent: Firefox
do:
- walk:
    to: http://bananarepublic.gap.com/
    do:
    - find:
        path: ul.brnavigation-brol>li>a
        do:
        - parse:
            attr: href
        - space_dedupe
        - trim
        - if:
            match: \/browse\/
            do:
            - normalize:
                routine: url
            - link_add:
                pool: main
- walk:
    to: links
    pool: main
    do:
    - find:
        path: .sidebar-navigation
        slice: 0
        do:
        - node_remove: h1
        - sequence:
            header: h2
            selector: h2,div
        - find:
            path: div.sequence
            do:
            - variable_clear: catname
            - find:
                path: h2
                do:
                - parse
                - space_dedupe
                - trim
                - variable_set: catname
            - find:
                path: .sidebar-navigation--category--link
                do: 
                - pool_clear: pager
                - parse:
                    attr: href
                    filter:
                        - cid=(.+)
                - variable_set: cid
                - register_set: http://bananarepublic.gap.com/resources/productSearch/v1/search?cid=&locale=en_US&isFacetsEnabled=true
                - link_add:
                    pool: pager
                - walk:
                    to: links
                    pool: pager
                    do:
                    - variable_clear: ptot
                    - find:
                        path: pageNumberTotal
                        do:
                        - parse
                        - if:
                            match: (^\s*[0-1]\s*$)
                            else:
                            - variable_set: ptot
                    - find:
                        path: pageNumberRequested
                        do:
                        - parse
                        - if:
                            match: (^\s*0\s*$)
                            do:
                            - variable_get: ptot
                            - if:
                                match: (\d)
                                do:
                                - if:
                                    gt: 1
                                    do:
                                    - eval:
                                        routine: js
                                        body: '(function (){var r = ""; for (var i = 1; i; i++){r += "<div>"+i+"</div>"}; return r;})();'
                                    - to_block
                                    - find:
                                        path: div 
                                        do:
                                        - parse
                                        - variable_set: pageid
                                        - register_set:  http://bananarepublic.gap.com/resources/productSearch/v1/search?cid=&locale=en_US&pageId=&isFacetsEnabled=true
                                        - link_add:
                                            pool: pager
                    - find:
                        path: productCategory > name
                        do:
                        - parse
                        - space_dedupe
                        - trim
                        - variable_set: catname2
                    - find:
                        path: productCategory > childProducts
                        do:
                        - find:
                            path: parentBusinessCatalogItemId
                            do:
                            - parse
                            - if:
                                match: (\S)
                                do:
                                - variable_set: pid
                                - register_set:  http://bananarepublic.gap.com/browse/product.do?pid=&cid=
                                - walk:
                                    to: value
                                    do:
                                    - variable_clear: isP
                                    - find:
                                        path: script:matches(gap.pageProductData\s*=\s*\{)
                                        do:
                                        - variable_set:
                                            field: isP
                                            value: 1
                                    - find:
                                        path: html
                                        do:
                                        - variable_get: isP
                                        - if:
                                            match: (1)
                                            do:
                                            - object_new: product
                                            - find:
                                                path: head
                                                do:
                                                - eval:
                                                    routine: js
                                                    body: '(function (){var d = new Date(); return d.toISOString()})();'
                                                - object_field_set:
                                                    object: product
                                                    field: date
                                                - static_get: url
                                                - object_field_set:
                                                    object: product
                                                    field: url
                                                - register_set: 'Banana Republic'
                                                - object_field_set:
                                                    object: product
                                                    field: brand
                                                - find:
                                                    path: meta[name="keywords"] 
                                                    do:
                                                    - parse:
                                                        attr: content
                                                    - object_field_set:
                                                        object: product
                                                        field: description
                                            - find:
                                                path: script:matches(gap.pageProductData\s*=\s*\{)
                                                do:
                                                - parse:
                                                    filter: 
                                                        - gap\.currentBrand\s*=\s*\"(.+)\"\;
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: brand
                                                - parse
                                                - normalize:
                                                    routine: replace_substring
                                                    args:
                                                        var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
                                                        gap\.pageProductData\s*=\s*: ''
                                                        \s*;\s*gap.currentBrand\s*=\s*.*\;: ''
                                                - normalize:
                                                    routine: json2xml
                                                - to_block
                                                - find:
                                                    path: productimages
                                                    do:
                                                    - parse:
                                                        format: html
                                                    - variable_set: imghtml
                                                - find:
                                                    path: variants > productstylecolors > productstylecolorimages
                                                    do:
                                                    - parse
                                                    - normalize:
                                                        routine: lower
                                                    - variable_set: imgpath
                                                    - register_set: <div></div>
                                                    - to_block
                                                    - find:
                                                        path: safe_
                                                        do:
                                                        - variable_clear: getit
                                                        - find:
                                                            path: xlarge
                                                            do:
                                                            - parse
                                                            - if:
                                                                match: (\S)
                                                                do:
                                                                - variable_set:
                                                                    field: getit
                                                                    value: 1
                                                                - normalize:
                                                                    routine: url
                                                                - object_field_set:
                                                                    object: product
                                                                    field: images
                                                                    joinby: "|"
                                                        - variable_get: getit
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: large
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                        - variable_get: getit    
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: medium
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                        - variable_get: getit
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: small
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                - find:
                                                    path: body_safe > variants > productstylecolors > colorname
                                                    do:
                                                    - parse
                                                    - if:
                                                        match: (\S)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: variations
                                                            joinby: "|"
                                                - find:
                                                    path: body_safe > name
                                                    do:       
                                                    - parse
                                                    - if:
                                                        match: (\S)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: name
                                                - find:
                                                    path: body_safe > currentmaxprice, body_safe > currentminprice
                                                    do:
                                                    - parse:
                                                        filter:
                                                            - (\d+\.?\d*)
                                                    - if:
                                                        match: (\d+)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: price
                                                            type: float
                                                        - register_set: USD
                                                        - object_field_set:
                                                            object: product
                                                            field: currency
                                                - find:
                                                    path: styleid
                                                    slice: 0
                                                    do:
                                                    - parse
                                                    - object_field_set:
                                                        object: product
                                                        field: sku
                                                    - variable_set: sid
                                            - find:
                                                path: body
                                                do: 
                                                - find:
                                                    path: '#topNavWrapper a[class*=_selected]'
                                                    do:
                                                    - parse
                                                    - space_dedupe
                                                    - trim
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                                - variable_get: catname
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                                - variable_get: catname2
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                            - object_save:
                                                name: product

                    - find:
                        path: productCategory > childCategories
                        do:
                        - variable_clear: catname3
                        - find:
                            path: name
                            slice: 0
                            do:
                            - parse
                            - space_dedupe
                            - trim
                            - variable_set: catname3
                        - find:
                            path: parentBusinessCatalogItemId
                            do:
                            - parse
                            - if:
                                match: (\S)
                                do:
                                - variable_set: pid
                                - register_set:  http://bananarepublic.gap.com/browse/product.do?pid=&cid=
                                - walk:
                                    to: value
                                    do:
                                    - variable_clear: isP
                                    - find:
                                        path: script:matches(gap.pageProductData\s*=\s*\{)
                                        do:
                                        - variable_set:
                                            field: isP
                                            value: 1
                                    - find:
                                        path: html
                                        do:
                                        - variable_get: isP
                                        - if:
                                            match: (1)
                                            do:
                                            - object_new: product
                                            - find:
                                                path: head
                                                do:
                                                - eval:
                                                    routine: js
                                                    body: '(function (){var d = new Date(); return d.toISOString()})();'
                                                - object_field_set:
                                                    object: product
                                                    field: date
                                                - static_get: url
                                                - object_field_set:
                                                    object: product
                                                    field: url
                                                - register_set: 'Banana Republic'
                                                - object_field_set:
                                                    object: product
                                                    field: brand
                                                - find:
                                                    path: meta[name="keywords"] 
                                                    do:
                                                    - parse:
                                                        attr: content
                                                    - object_field_set:
                                                        object: product
                                                        field: description
                                            - find:
                                                path: script:matches(gap.pageProductData\s*=\s*\{)
                                                do:
                                                - parse:
                                                    filter: 
                                                        - gap\.currentBrand\s*=\s*\"(.+)\"\;
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: brand
                                                - parse
                                                - normalize:
                                                    routine: replace_substring
                                                    args:
                                                        var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
                                                        gap\.pageProductData\s*=\s*: ''
                                                        \s*;\s*gap.currentBrand\s*=\s*.*\;: ''
                                                - normalize:
                                                    routine: json2xml
                                                - to_block
                                                - find:
                                                    path: productimages
                                                    do:
                                                    - parse:
                                                        format: html
                                                    - variable_set: imghtml
                                                - find:
                                                    path: variants > productstylecolors > productstylecolorimages
                                                    do:
                                                    - parse
                                                    - normalize:
                                                        routine: lower
                                                    - variable_set: imgpath
                                                    - register_set: <div></div>
                                                    - to_block
                                                    - find:
                                                        path: safe_
                                                        do:
                                                        - variable_clear: getit
                                                        - find:
                                                            path: xlarge
                                                            do:
                                                            - parse
                                                            - if:
                                                                match: (\S)
                                                                do:
                                                                - variable_set:
                                                                    field: getit
                                                                    value: 1
                                                                - normalize:
                                                                    routine: url
                                                                - object_field_set:
                                                                    object: product
                                                                    field: images
                                                                    joinby: "|"
                                                        - variable_get: getit
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: large
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                        - variable_get: getit
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: medium
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                        - variable_get: getit
                                                        - if:
                                                            match: (1)
                                                            else:
                                                            - find:
                                                                path: small
                                                                do:
                                                                - parse
                                                                - if:
                                                                    match: (\S)
                                                                    do:
                                                                    - variable_set:
                                                                        field: getit
                                                                        value: 1
                                                                    - normalize:
                                                                        routine: url
                                                                    - object_field_set:
                                                                        object: product
                                                                        field: images
                                                                        joinby: "|"
                                                - find:
                                                    path: body_safe > variants > productstylecolors > colorname
                                                    do:
                                                    - parse
                                                    - if:
                                                        match: (\S)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: variations
                                                            joinby: "|"
                                                - find:
                                                    path: body_safe > name
                                                    do:
                                                    - parse
                                                    - if:
                                                        match: (\S)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: name
                                                - find:
                                                    path: body_safe > currentmaxprice, body_safe > currentminprice
                                                    do:
                                                    - parse:
                                                        filter:
                                                            - (\d+\.?\d*)
                                                    - if:
                                                        match: (\d+)
                                                        do:
                                                        - object_field_set:
                                                            object: product
                                                            field: price
                                                            type: float
                                                        - register_set: USD
                                                        - object_field_set:
                                                            object: product
                                                            field: currency
                                                - find:
                                                    path: styleid
                                                    slice: 0
                                                    do:
                                                    - parse
                                                    - object_field_set:
                                                        object: product
                                                        field: sku
                                                    - variable_set: sid
                                            - find:
                                                path: body
                                                do:
                                                - find:
                                                    path: '#topNavWrapper a[class*=_selected]'
                                                    do:
                                                    - parse
                                                    - space_dedupe
                                                    - trim
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                                - variable_get: catname
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                                - variable_get: catname2
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                                - variable_get: catname3
                                                - if:
                                                    match: (\S)
                                                    do:
                                                    - object_field_set:
                                                        object: product
                                                        field: category
                                                        joinby: "|"
                                            - object_save:
                                                name: product
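
The image steps in the configuration above try one size after another and use the getit variable as a flag, so that once a non-empty value is found the smaller sizes are skipped. The same idea can be sketched in Python. This is only an illustration and not part of the digger: the function name pick_image and the shape of the input dictionary are assumptions, the size keys are the ones visible in this part of the configuration (large, medium, small), and URL normalization is left out.

```python
from typing import Mapping, Optional, Sequence

# Size keys visible in this part of the configuration, largest first.
SIZES: Sequence[str] = ("large", "medium", "small")


def pick_image(entry: Mapping[str, Optional[str]],
               sizes: Sequence[str] = SIZES) -> Optional[str]:
    """Return the first non-blank value among the given size keys, or None."""
    for size in sizes:
        url = (entry.get(size) or "").strip()
        # Same idea as the `match: (\S)` check: at least one non-space character.
        if url:
            # Returning here plays the role of setting `getit` to 1:
            # the remaining, smaller sizes are never looked at.
            return url
    return None


if __name__ == "__main__":
    # Hypothetical image entry; the paths are made up for the example.
    entry = {"large": "", "medium": "/img/a_medium.jpg", "small": "/img/a_small.jpg"}
    print(pick_image(entry))
```

In this sketch an entry with an empty large value falls through to medium, and an entry with no usable value at all yields None, which corresponds to the digger not adding anything to the images field.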

Sample of scraped data

Below is a sample dataset with several products in JSON format, so you can easily review the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using templates.

[{
    "product": {
        "brand": "banana-republic",
        "category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "currency": "USD",
        "date": "2017-12-06T20:24:20.440Z",
        "description": "Riley-Fit Stain-Resistant Super-Stretch Shirt, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
        "images": "http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg|http://bananarepublic.gap.com/webcontent/0013/787/545/cn13787545.jpg|http://bananarepublic.gap.com/webcontent/0013/787/550/cn13787550.jpg|http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg|http://bananarepublic.gap.com/webcontent/0013/787/545/cn13787545.jpg|http://bananarepublic.gap.com/webcontent/0013/787/550/cn13787550.jpg",
        "name": "Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "price": 88,
        "sku": "875959",
        "url": "http://bananarepublic.gap.com/browse/product.do?pid=875959&cid=48422",
        "variations": "White|White"
    }
}
,{
    "product": {
        "brand": "banana-republic",
        "category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "currency": "USD",
        "date": "2017-12-06T20:24:22.345Z",
        "description": "Pearl Print Tie-Back Dress, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
        "images": "http://bananarepublic.gap.com/webcontent/0014/333/311/cn14333311.jpg|http://bananarepublic.gap.com/webcontent/0014/511/681/cn14511681.jpg|http://bananarepublic.gap.com/webcontent/0014/511/700/cn14511700.jpg|http://bananarepublic.gap.com/webcontent/0014/501/794/cn14501794.jpg|http://bananarepublic.gap.com/webcontent/0014/333/311/cn14333311.jpg|http://bananarepublic.gap.com/webcontent/0014/511/681/cn14511681.jpg|http://bananarepublic.gap.com/webcontent/0014/511/700/cn14511700.jpg|http://bananarepublic.gap.com/webcontent/0014/501/794/cn14501794.jpg",
        "name": "Pearl Print Tie-Back Dress",
        "price": 128,
        "sku": "878840",
        "url": "http://bananarepublic.gap.com/browse/product.do?pid=878840&cid=48422",
        "variations": "Navy|Navy"
    }
}
,{
    "product": {
        "brand": "banana-republic",
        "category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "currency": "USD",
        "date": "2017-12-06T20:24:23.316Z",
        "description": "Stripe Pajama-Style Shirt with Piping, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
        "images": "http://bananarepublic.gap.com/webcontent/0014/388/402/cn14388402.jpg|http://bananarepublic.gap.com/webcontent/0014/556/204/cn14556204.jpg|http://bananarepublic.gap.com/webcontent/0014/556/192/cn14556192.jpg|http://bananarepublic.gap.com/webcontent/0014/388/402/cn14388402.jpg|http://bananarepublic.gap.com/webcontent/0014/556/204/cn14556204.jpg|http://bananarepublic.gap.com/webcontent/0014/556/192/cn14556192.jpg",
        "name": "Stripe Pajama-Style Shirt with Piping",
        "price": 88,
        "sku": "887053",
        "url": "http://bananarepublic.gap.com/browse/product.do?pid=887053&cid=48422",
        "variations": "Navy|Navy"
    }
}
,{
    "product": {
        "brand": "banana-republic",
        "category": "Women|what's new|new arrivals|Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "currency": "USD",
        "date": "2017-12-06T20:24:24.239Z",
        "description": "Zero Gravity Dixie Wash Skinny Ankle Jean, Women's Apparel, Women's Apparel new arrivals, Banana Republic",
        "images": "http://bananarepublic.gap.com/webcontent/0013/683/975/cn13683975.jpg|http://bananarepublic.gap.com/webcontent/0013/684/197/cn13684197.jpg|http://bananarepublic.gap.com/webcontent/0013/745/912/cn13745912.jpg|http://bananarepublic.gap.com/webcontent/0013/683/975/cn13683975.jpg|http://bananarepublic.gap.com/webcontent/0013/684/197/cn13684197.jpg|http://bananarepublic.gap.com/webcontent/0013/745/912/cn13745912.jpg",
        "name": "Zero Gravity Dixie Wash Skinny Ankle Jean",
        "price": 110,
        "sku": "874720",
        "url": "http://bananarepublic.gap.com/browse/product.do?pid=874720&cid=48422",
        "variations": "Indigo|Indigo"
    }
}]
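
Multi-value fields such as images, variations and category are delivered as a single string joined with "|", and in the sample above some values appear more than once (for example "White|White"). If you work with the JSON export in your own code, a small post-processing step can split these fields and drop the repeats. The following Python sketch shows one possible way to do it. The helper names split_unique and tidy_product are hypothetical and are not part of Diggernaut, and the record used in the example is a trimmed copy of the first product from the sample.

```python
import json
from typing import Any, Dict, List


def split_unique(value: str, sep: str = "|") -> List[str]:
    """Split a joined field, dropping blanks and repeats, keeping first-seen order."""
    seen = set()
    result: List[str] = []
    for item in value.split(sep):
        item = item.strip()
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return result


def tidy_product(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record["product"] with joined fields turned into lists."""
    product = dict(record["product"])
    for field in ("images", "variations"):
        if isinstance(product.get(field), str):
            product[field] = split_unique(product[field])
    if isinstance(product.get("category"), str):
        # Category levels are only split, not deduplicated.
        product["category"] = product["category"].split("|")
    return product


if __name__ == "__main__":
    # Trimmed copy of the first record from the sample dataset.
    raw = """[{"product": {
        "name": "Riley-Fit Stain-Resistant Super-Stretch Shirt",
        "price": 88,
        "variations": "White|White",
        "images": "http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg|http://bananarepublic.gap.com/webcontent/0013/731/030/cn13731030.jpg"
    }}]"""
    for record in json.loads(raw):
        print(json.dumps(tidy_product(record), indent=2))
```

Order is preserved on purpose, so the first image in the list stays the first image after cleanup.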

Mikhail Sisin: Co-founder of Diggernaut, a cloud-based web scraping and data extraction platform. Over 10 years of experience in data extraction, ETL, AI, and ML.