Mikhail Sisin, co-founder of the cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.

Gathering product and price data from the Bed Bath & Beyond online store

Bed Bath & Beyond is a chain of home goods stores in the USA, Puerto Rico, Canada and Mexico. In 1971, Warren Eisenberg and Leonard Feinstein opened a store called Bed ’n Bath in Springfield, New Jersey. By 1985, they managed 17 stores in New York and California. As the business grew, the company was renamed Bed Bath & Beyond. With this web scraper, gathering product and price data from the bedbathandbeyond.com website is easy.

Approximate number of products: 200,000
Approximate number of page requests: 400,000
Recommended subscription plan: Medium

PLEASE NOTE! The number of requests can exceed the number of products, because data about variations, images, etc. may have to be scraped from other resources, which requires additional requests. In addition, some product data may be delivered via XHR requests, which also increases the total number of page requests.
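The request estimate above is consistent with how the configuration works: each product page costs one request, and when the page carries a Scene7 image ID the digger makes one more request for the image set, so roughly two requests per product. A back-of-the-envelope sketch; the function `estimate_requests` and its parameters are hypothetical, and the number of listing pages is an input you would have to guess:

```python
def estimate_requests(products: int, extra_per_product: float = 1.0,
                      listing_pages: int = 0) -> int:
    """Back-of-the-envelope page-request estimate for one full run (hypothetical helper).

    products          -- product pages visited (one request each)
    extra_per_product -- average additional requests per product, such as the
                         Scene7 image-set request made when a page has an image ID
    listing_pages     -- category and pagination pages crawled to find products
    """
    return round(products * (1 + extra_per_product)) + listing_pages


# 200,000 products with one extra request each
print(estimate_requests(200_000))  # 400000
```

Category pages, the navigation files and reloads after a proxy switch come on top of this, so treat the figure as a lower bound.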

How to use the web scraper to extract data about goods and prices from bedbathandbeyond.com

To use the web scraper for the Bed Bath & Beyond website, you need an account with our Diggernaut service. Then simply follow this step-by-step guide:
1. Follow this registration link to open a free account with Diggernaut.
2. After registering and confirming your email address, log in to your account.
3. Create a project with any name and description. If you do not know how to do it, please refer to our documentation.
4. Switch to the created project and create a digger with any name. If you do not know how to do it, please refer to our documentation.
5. Copy the digger configuration below to the clipboard and paste it into the digger you created. If you do not know how to do it, please refer to our documentation.
6. PLEASE NOTE! Basic proxy servers may not work with this site, so you may need to use your own proxy servers. Specify your proxy server in the proxy setting of the digger configuration, at the place marked by the comment. If this step is unclear, please contact us through the support system or our online chat; we will be glad to help you.
7. Switch the digger mode from Debug to Active. If you do not know how to do it, please refer to our documentation.
8. Run your digger and wait until it completes. If you do not know how to do it, please refer to our documentation.
9. Download the scraped dataset in the format you need. If you do not know how to do it, please refer to our documentation.

You can also set up a schedule to run your scraper and collect data regularly.

Scraping configuration for the digger

---
config:
    debug: 2
    agent: Firefox
    proxy: #USE YOUR PROXY HERE LIKE 1.1.1.1:8888
do:
# Stage 1: seed the default pool with the navigation JSON files, one per top-level department
- link_add:
    url:
    - https://www.bedbathandbeyond.com/__ssobj/static/giftsNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/personalizedgiftsNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/beddingNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/bathNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/kitchenNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/diningNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/homedecorNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/furnitureNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/storagecleaningNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/outdoorNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/babykidsNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/healthbeautyNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/moreNavOutHol.json?v=8
    - https://www.bedbathandbeyond.com/__ssobj/static/shopsNavOutHol.json?v=8
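# Stage 2: read each navigation file and queue its category URLs (l2url, l3url), without query strings, in the catalog pool
- walk: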
    to: links
    do:
    - find:
        path: l2url,l3url
        do:
        - parse:
            filter: ^([^\?]+)
        - trim
        - if:
            match: \w+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
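# Stage 3: crawl category pages; on the InvalidRequest block page switch proxy and reload, follow pagination, and queue product links in the pages pool
- walk: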
    to: links
    pool: catalog
    do:
    - find:
        path: "span#ctl00_InvalidRequest"
        do:
        - proxy_switch
        - page_reload
    - find:
        path: li.lnkNextPage>a
        do:
        - parse:
            attr: href
        - trim
        - if:
            match: \w+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
    - find:
        path: a.prodImg
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - trim
        - if:
            match: \w+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: pages
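# Stage 4: visit each product page once, switching proxy if blocked, and save one product object per page
- walk: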
    to: links
    pool: pages
    mode: unique
    do:
    - find:
        path: "span#ctl00_InvalidRequest"
        do:
        - proxy_switch
        - page_reload
        - find:
            path: "span#ctl00_InvalidRequest"
            do:
            - exit
    - find:
        path: "div#content"
        do:
        - variable_clear: pid
        - variable_set:
            field: brand
            value: BedBathAndBeyond
        - object_new: product
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
        - find:
            path: 'h1#productTitle'
            do:
            - parse
            - space_dedupe
            - trim
            - object_field_set:
                object: product
                field: name
        - find:
            path: div[itemprop="brand"] span[itemprop="name"]
            do:
            - parse
            - space_dedupe
            - trim
            - variable_set: brand
        - variable_get: brand
        - object_field_set:
            object: product
            field: brand
        - find:
            path: p.prodSKU
            slice: 0
            do:
            - parse:
                filter: (\d+)
            - space_dedupe
            - trim
            - object_field_set:
                object: product
                field: sku
        - find:
            path: span[itemprop="priceCurrency"]
            do:
            - parse
            - normalize:
                routine: replace_matched
                args:
                    \$: USD
            - object_field_set:
                object: product
                field: currency
        - find:
            path: span[itemprop="price"],span[itemprop="lowPrice"]
            do:
            - parse:
                filter:
                - ([0-9\.]+)\s*-
                - ([0-9\.]+)
            - object_field_set:
                object: product
                type: float
                field: price
        - find:
            path: li.colorSwatchLi
            do:
            - parse:
                attr: data-attr
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: variations
            - parse:
                attr: data-imgurlthumb
                filter: ^([^\?]+)
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - normalize:
                    routine: url
                - variable_set: iurl
                - register_set: <%iurl%>?scl=1
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: images
        - find:
            path: div[itemprop="description"]
            do:
            - parse
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - object_field_set:
                    object: product
                    field: description
        - find:
            path: div.breadcrumbs>div.alpha>a
            slice: 1:-1
            do:
            - parse
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: category
        - find:
            path: 'img#mainProductImg'
            do:
            - parse:
                attr: src
                filter: ^([^\?]+)
            - if:
                match: \w+
                do:
                - normalize:
                    routine: url
                - variable_set: iurl
                - register_set: <%iurl%>?scl=1
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: images
        - find:
            path: 'div#s7ProductImageWrapper'
            do:
            - parse:
                attr: data-s7imageid
            - if:
                match: \d+
                do:
                - variable_set: iid
                - walk:
                    to: https://s7d9.scene7.com/is/image/BedBathandBeyond/<%iid%>_is?req=set,json,UTF-8
                    do:
                    - find:
                        path: script
                        do:
                        - parse:
                            filter: s7jsonResponse\((.+)\,\"\"\)\;
                        - normalize:
                            routine: unescape_html
                        - normalize:
                            routine: json2xml
                        - to_block
                        - find:
                            path: item>i>n
                            do:
                            - parse
                            - if:
                                match: \d+
                                do:
                                - variable_set: iurl
                                - register_set: https://s7d9.scene7.com/is/image/<%iurl%>?scl=1
                                - object_field_set:
                                    object: product
                                    joinby: "|"
                                    field: images
        - object_save:
            name: product
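For readers who want to check the extraction logic or reuse it outside Diggernaut, here is a rough Python re-expression of three steps in the configuration above: collecting category URLs from the navigation files, unwrapping the Scene7 image-set response, and parsing prices. This is a minimal sketch, not Diggernaut code. The function names are hypothetical; the navigation-file and Scene7 payload shapes are assumptions inferred from the l2url/l3url and item>i>n paths the digger reads; and we assume the two price filters are tried in order, with the first match used.

```python
import html
import json
import re
from typing import Any, List, Optional
from urllib.parse import urljoin

BASE_URL = "https://www.bedbathandbeyond.com/"

# Same pattern as the digger's filter: cut the JSON payload out of the
# JSONP-style wrapper s7jsonResponse({...},"");
PAYLOAD_RE = re.compile(r's7jsonResponse\((.+),""\);', re.S)

# Price patterns in the digger's order: a range such as "$19.99 - $29.99"
# yields its lower bound, otherwise the first number is used. Stricter than
# the original [0-9\.]+ so that a lone "." cannot match.
PRICE_RES = [
    re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*-"),
    re.compile(r"([0-9]+(?:\.[0-9]+)?)"),
]


def category_urls(nav_json: str) -> List[str]:
    """Collect absolute l2url/l3url links from one navigation file.

    The nesting of these files is an assumption, so the whole JSON tree is
    walked and the two keys are picked up wherever they occur.
    """
    found: List[str] = []

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("l2url", "l3url") and isinstance(value, str):
                    path = value.split("?", 1)[0].strip()  # filter ^([^\?]+), trim
                    if re.search(r"\w", path):  # match: \w+
                        found.append(urljoin(BASE_URL, path))  # normalize: url
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(json.loads(nav_json))
    return found


def scene7_image_urls(response_text: str) -> List[str]:
    """Build full-size image URLs from a Scene7 image-set response.

    Assumes a payload like {"set": {"item": [{"i": {"n": "..."}}]}}, mirroring
    the item>i>n path in the configuration; "item" may be a single object.
    """
    match = PAYLOAD_RE.search(response_text)
    if not match:
        return []
    data = json.loads(html.unescape(match.group(1)))  # unescape_html, then parse
    items = data.get("set", {}).get("item", [])
    if isinstance(items, dict):
        items = [items]
    urls = []
    for item in items:
        name = item.get("i", {}).get("n", "")
        if re.search(r"\d", name):  # match: \d+
            urls.append(f"https://s7d9.scene7.com/is/image/{name}?scl=1")
    return urls


def parse_price(text: str) -> Optional[float]:
    """Return the price, or the low end of a price range, as a float."""
    text = text.replace(",", "")  # addition: drop thousands separators
    for pattern in PRICE_RES:
        found = pattern.search(text)
        if found:
            return float(found.group(1))
    return None


sample = 's7jsonResponse({"set":{"item":[{"i":{"n":"BedBathandBeyond/1861812460112p"}}]}},"");'
print(scene7_image_urls(sample))
# ['https://s7d9.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1']
print(parse_price("$19.99 - $29.99"), parse_price("$399.99"))  # 19.99 399.99
```

Walking the whole navigation tree instead of hard-coding a path trades a little speed for robustness, since the exact layout of those JSON files is not documented here.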

Sample of scraped data

Below is a sample dataset with several products in JSON format, so you can easily review the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using templates.

[{
    "product": {
        "brand": "Dyson",
        "category": "Gifts|Gifts by Category|Unique Gifts",
        "currency": "USD",
        "date": "2017-12-07T00:05:23.532Z",
        "description": "Dyson's Supersonic Hair Dryer uses intelligent heat control technology to help to prevent heat damage to your hair, preserving its natural shine. This high-speed and powerful hair dryer works to straighten and smooth delivering beautiful silky hair.",
        "images": "https://s7d9.scene7.com/is/image/BedBathandBeyond/145513347275522p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/98918847339040p?scl=1|https://s7d2.scene7.com/is/image/BedBathandBeyond/10160953308317m?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/10160953308317m?scl=1",
        "name": "Dyson Supersonic Hair Dryer",
        "price": 399.99,
        "url": "https://www.bedbathandbeyond.com/store/product/dyson-supersonic-hair-dryer/3308317",
        "variations": "IRON/FUCHSIA|WHITE/SILVER"
    }
}
,{
    "product": {
        "brand": "KitchenAid",
        "category": "Kitchen|Small Appliances|Mixers & Attachments",
        "currency": "USD",
        "date": "2017-12-07T00:05:25.430Z",
        "description": "This high-performance, 325 watt KitchenAid Artisan Stand Mixer is reason enough for you to get busy in the kitchen. With a 5 qt. ultra durable stainless steel mixing bowl and 10 speed settings, this tilt-back-head all-metal mixer is a kitchen essential.",
        "images": "https://s7d9.scene7.com/is/image/BedBathandBeyond/21686512370920p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/15710817825569p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/68875814073710p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/46977543004843p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/7366314872353p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/18935118698528p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/58050514872485p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685612370938p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/17041218088827p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/31002313317640p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/24925813080976p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/150305412370911p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685714017224p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/5789314222944p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21686413324514p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21686612963238p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685812370962p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/104721943004836p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685912370989p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/109395460419590p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/109395760419613p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21686212863004p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/109395660419606p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685412370903p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/25119914872426p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/58001413227713p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21686012370997p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/7366514872434p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21686112371004p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/31722642049784p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/26824312371012p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/21685312366590p?scl=1|https://s7d1.scene7.com/is/image/BedBathandBeyond/150305412370911p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/150305412370911p?scl=1",
        "name": "KitchenAid® Artisan® 5 qt. Stand Mixer",
        "price": 279.99,
        "url": "https://www.bedbathandbeyond.com/store/product/kitchenaid-reg-artisan-reg-5-qt-stand-mixer/102986",
        "variations": "ALMOND|AQUA|BLUE WILLOW|BORDEAUX|BOYSENBERRY|BROWN|BUTTERCUP|COBALT BLUE|CONTOUR SILVER|CRANBERRY|CRYSTAL BLUE|EMPIRE RED|GLOSS CINNAMON|GREEN APPLE|ICE|IMPERIAL BLACK|IMPERIAL GREY|LAVENDER|MAJESTIC YELLOW|MATTE BLACK|MATTE GRAY|METALLIC CHROME|OCEAN DRIVE|ONYX BLACK|PERSIMMON|PINK|PISTACHIO|SILVER|TANGERINE|WATERMELON|WHITE/SILVER|WHITE/WHITE"
    }
}
,{
    "product": {
        "brand": "All-Clad",
        "category": "Gifts|Gifts by Interest|Gifts for the Cook",
        "currency": "USD",
        "date": "2017-12-07T00:05:29.438Z",
        "description": "All-Clad is the first choice of serious cooks. Three-ply bonded construction has a pure aluminum core for even heat distribution and a non-reactive stainless-steel interior and exterior for stick-resistant and easy-to-clean benefits.",
        "images": "https://s7d1.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1",
        "name": "All-Clad 12-Quart Stainless Steel Multi-Cooker",
        "price": 149.99,
        "sku": "12460112",
        "url": "https://www.bedbathandbeyond.com/store/product/all-clad-12-quart-stainless-steel-multi-cooker/1012460112"
    }
}
,{
    "product": {
        "brand": "Homedics",
        "category": "Health & Beauty|Massage & Relaxation|Massage",
        "currency": "USD",
        "date": "2017-12-07T00:05:30.079Z",
        "description": "Feel the soothing warmth of the HoMedics Shiatsu Neck and Shoulder Massager with the added heat to the shiatsu, vibrating, or combined settings. It's all customizable so you can feel comfortable and natural in your relaxation.",
        "images": "https://s7d1.scene7.com/is/image/BedBathandBeyond/46662342763468p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/46662342763468p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/46662342763468p__1?scl=1",
        "name": "HoMedics® Shiatsu Neck and Shoulder Massager with Heat",
        "price": 39.99,
        "sku": "42763468",
        "url": "https://www.bedbathandbeyond.com/store/product/homedics-reg-shiatsu-neck-and-shoulder-massager-with-heat/1042763468"
    }
}
,{
    "product": {
        "brand": "Presto",
        "category": "Gifts|Gifts by Category|Unique Gifts",
        "currency": "USD",
        "date": "2017-12-07T00:05:30.730Z",
        "description": "Make delicious, authentic pizza parlor pizza at home. With the exclusive Roto-bake technology you can choose exactly how bubbly the cheese should be and precisely how crispy or chewy you'd like the crust.",
        "images": "https://s7d1.scene7.com/is/image/BedBathandBeyond/397311975038p?scl=1",
        "name": "Presto Pizzazz Pizza Cooker",
        "price": 59.99,
        "sku": "11975038",
        "url": "https://www.bedbathandbeyond.com/store/product/presto-pizzazz-pizza-cooker/1011975038"
    }
}]
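Multi-valued fields in these records (images, variations and category) are single strings joined with "|", as set by joinby in the configuration, and the same image can appear more than once under different sNdN.scene7.com hosts (compare the two URLs of the All-Clad record). A minimal Python sketch for post-processing a downloaded JSON file, assuming the {"product": {...}} structure shown above; the helper names are hypothetical, and treating URLs with the same path as duplicates is our assumption, not something the site guarantees:

```python
import json
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

MULTI_VALUE_FIELDS = ("images", "variations", "category")


def dedupe_images(urls: Iterable[str]) -> List[str]:
    """Keep the first URL for each image path, whatever the sNdN host."""
    seen = set()
    unique = []
    for url in urls:
        path = urlsplit(url).path  # e.g. /is/image/BedBathandBeyond/1861812460112p
        if path not in seen:
            seen.add(path)
            unique.append(url)
    return unique


def flatten(records: List[dict]) -> List[Dict]:
    """Unwrap {"product": {...}} records and split "|"-joined fields into lists.

    Multi-valued fields missing from a record (such as variations) become
    empty lists; all other fields are copied as they are.
    """
    rows = []
    for record in records:
        product = dict(record["product"])
        for field in MULTI_VALUE_FIELDS:
            value = product.get(field, "")
            product[field] = value.split("|") if value else []
        product["images"] = dedupe_images(product["images"])
        rows.append(product)
    return rows


raw = """[{"product": {"brand": "All-Clad",
  "category": "Gifts|Gifts by Interest|Gifts for the Cook",
  "images": "https://s7d1.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1|https://s7d9.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1",
  "price": 149.99, "sku": "12460112"}}]"""
rows = flatten(json.loads(raw))
print(rows[0]["images"])      # ['https://s7d1.scene7.com/is/image/BedBathandBeyond/1861812460112p?scl=1']
print(rows[0]["category"])    # ['Gifts', 'Gifts by Interest', 'Gifts for the Cook']
print(rows[0]["variations"])  # []
```

For a downloaded file, replace the inline string with json.load() on the opened file.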