How to extract user-generated content for an online store on a small budget

You’ve probably seen galleries of user-generated content in online stores that sell clothing, shoes, home products, and so on. They help sell a product because they let a potential buyer see how it looks on a real person rather than on a model, which makes for a more informed decision. You would probably like to use such content in your own store but don’t know how to get it on a limited budget.

Technically, such a mechanism works as follows: an aggregator service collects user-generated images from the Internet (for example, from Instagram), identifies the brand and model of the item or items shown in each photo, and delivers the results as a feed. Connecting to such a service can be costly for a small shop, so mostly large single-brand and multi-brand online stores can afford it.

The second option is to build such an aggregation service yourself, but that is a long, time-consuming, and expensive process, far more expensive for a single online store than connecting to an existing aggregator.

However, there is a budget option. Many brands and well-known online stores are already customers of such aggregators and have their own feeds of user-generated photos with information about the corresponding products. Therefore, if you sell products of these brands, you can take the information from these feeds, process the data, and use it in your online store to sell those products.

You might object that coding scrapers for every site and brand, when there are hundreds of them, is tedious and time-consuming. However, you do not need to scrape the websites themselves. You only need the feed with user content, and such feeds are provided by a limited set of aggregators. So technically you need only one scraper with standard logic, fed with different URLs or parameters to pick up the feeds of different stores and brands.
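To make the "one scraper, many feeds" idea concrete, here is a minimal Python sketch. The landing page pattern is the one used by the scraper configuration later in this article; the function name `landing_urls` is purely illustrative and is not part of any Diggernaut or Curalate API.

```python
from urllib.parse import quote

# Landing page pattern, as used in the scraper configuration in this article.
LANDING_URL = "https://like2buy.curalate.com/{shop}/"


def landing_urls(store_ids: str) -> list:
    """Turn a comma-separated string of store IDs into landing page URLs."""
    # Split on commas and drop empty entries and surrounding whitespace.
    ids = [part.strip() for part in store_ids.split(",") if part.strip()]
    # Percent-encode each ID so it is safe to place in the URL path.
    return [LANDING_URL.format(shop=quote(shop, safe="")) for shop in ids]


for url in landing_urls("guess, aldo_shoes"):
    print(url)
```

Only the store ID changes between feeds, so the rest of the scraping logic can stay identical for every store and brand.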

One such service is Like2Buy, provided by Curalate. It serves more than 6000 online stores and brands. You can find the feeds by searching Google for “like2buy.curalate.com” and clicking the “show all results” link. For reference, we also list below a few stores and their IDs for use with our free web scraper, which we share in this article.

This data can be useful not only for online stores but also for companies conducting research for brands, as well as companies working in the machine learning area.

To run the scraper, you need a free account with our Diggernaut service. Follow these steps:

Follow the registration link to open a free account with Diggernaut.
After registering and confirming your email address, log in to your account.
Create a project with any name and description. If you do not know how to do it, please refer to our documentation.
Switch to the created project and create a digger with any name. If you do not know how to do it, please refer to our documentation.
Copy the digger configuration given in this article to the clipboard and paste it into the digger you created. If you do not know how to do it, please refer to our documentation.
In the iterator section of the digger configuration, enter one or more store IDs (comma-separated) from the table at the end of this article.
Switch the digger mode from Debug to Active. If you do not know how to do it, please refer to our documentation.
Run your digger and wait until it completes. If you do not know how to do it, please refer to our documentation.
Download the scraped dataset in the format you need. If you do not know how to do it, please refer to our documentation.

You can also set up a schedule for your scraper to collect data regularly.

The scraper configuration is shown below. You can copy it into any of your diggers, enter an ID from the store table (or several at once), and start your digger.

---
config:
    debug: 2
    agent: Firefox
iterator:
    type: csv
    name: shop
    value: # Set a single store ID or several store IDs separated by commas
do:
- walk:
    to: https://like2buy.curalate.com/<%shop%>/
    do:
    - pool_clear: sub
    - find:
        path: html
        do:
        - eval:
            routine: js
            body: '(function() {return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function(e) {var t = 16 * Math.random() | 0, r = "x" === e ? t : 3 & t | 8; return r.toString(16)})})();'
        - variable_set: rid
        - register_set: http://api.curalate.com/v1/like2buy/<%shop%>/products.json?rid=<%rid%>
        - link_add:
            pool: sub
        - walk:
            to: links
            pool: sub
            do:
            - find:
                path: qbookmark
                do:
                - parse
                - register_set: http://api.curalate.com/v1/like2buy/<%shop%>/products.json?qBookmark=<%register%>&rid=<%rid%>
                - link_add:
                    pool: sub
            - find: 
                path: items 
                do: 
                - object_new: item
                - argument_get: shop
                - object_field_set:
                    object: item
                    field: shop
                - find:
                    path: largephotourl
                    slice: 0
                    do:
                    - parse
                    - normalize:
                        routine: url
                    - object_field_set:
                        object: item
                        field: image
                - find: 
                    path: products
                    do: 
                    - parse
                    - object_new: product
                    - find: 
                        path: destinationurl
                        do:
                        - parse
                        - object_field_set:
                            object: product
                            field: url
                    - find: 
                        path: name
                        do:
                        - parse
                        - space_dedupe
                        - trim
                        - object_field_set:
                            object: product
                            field: name
                    - object_save:
                        name: product
                        to: item
                - object_save:
                    name: item
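
For readers who want to follow the logic of this configuration outside Diggernaut, here is a rough Python sketch of the same flow: generate a random request ID in the shape produced by the JavaScript one-liner, request the first page of the feed, and keep following the bookmark until none is returned. The endpoint and the `rid` and `qBookmark` query parameters come from the configuration above. The JSON key names `items` and `qBookmark` are assumptions inferred from the paths the digger uses, and `fetch_json` is a hypothetical callable you would supply yourself. The endpoint may also have changed since this article was written.

```python
import random
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

# Feed endpoint as it appears in the digger configuration above.
API_URL = "http://api.curalate.com/v1/like2buy/{shop}/products.json"
RID_TEMPLATE = "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx"


def make_rid() -> str:
    """Random request ID in the same UUID-like shape as the JS one-liner."""
    def fill(ch: str) -> str:
        t = random.randrange(16)
        if ch == "x":
            return format(t, "x")
        if ch == "y":
            return format(t & 3 | 8, "x")  # one of 8, 9, a, b
        return ch  # "-" and the fixed "4" pass through unchanged
    return "".join(fill(ch) for ch in RID_TEMPLATE)


def page_url(shop: str, rid: str, bookmark: Optional[str] = None) -> str:
    """Build the URL of the first page, or of the page behind a bookmark."""
    params = {"rid": rid}
    if bookmark:
        params = {"qBookmark": bookmark, "rid": rid}
    return API_URL.format(shop=shop) + "?" + urlencode(params)


def iter_items(shop: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield feed items page by page until no new bookmark is returned.

    fetch_json is a hypothetical callable that downloads a URL and returns
    the decoded JSON. The key names "items" and "qBookmark" are assumptions.
    """
    rid = make_rid()
    bookmark, seen = None, set()
    while True:
        page = fetch_json(page_url(shop, rid, bookmark))
        yield from page.get("items", [])
        bookmark = page.get("qBookmark")
        # Stop when there is no next page or the feed starts repeating.
        if not bookmark or bookmark in seen:
            break
        seen.add(bookmark)


print(make_rid())
```

Passing the download function in as a parameter keeps the pagination logic separate from the network code, so it can be tried with canned responses first. The Diggernaut configuration above does all of this for you without any extra code.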

As a result, you get a dataset with the following structure:

[{
    "item": {
        "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/PPYWso07RgBC_UHzxcrgAO_Wk0twhD3XHvviHlJ7-ZY=/d/l",
        "product": [
            {
                "name": "Marco Faux-Leather Moto Jacket",
                "url": "https://shop.guess.com/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b"
            },
            {
                "name": "CAN: Marco Faux-Leather Moto Jacket",
                "url": "https://www.guess.ca/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=w74l10r72y1&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b"
            }
        ],
        "shop": "guess"
    }
}
,{
    "item": {
        "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/Wn0kXxTmnzmAy6hTP3_bynEdtv9Ph7Y0M9FOVyLen00=/d/l",
        "product": [
            {
                "name": "US: Silver-Tone Charm Bracelet Box Set",
                "url": "https://shop.guess.com/en/catalog/view/434044G21?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434044G21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            },
            {
                "name": "US: Boxed Rose Gold-Tone Charm Bracelet",
                "url": "https://shop.guess.com/en/catalog/view/434042G21?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434042G21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            },
            {
                "name": "US: GUESS 1981 Eau De Toilette, 3.4 oz.",
                "url": "https://shop.guess.com/en/catalog/view/accessories/women/fragrance/guess-1981-eau-de-toilette-3-4-oz/32667861000?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=32667861000&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            },
            {
                "name": "US: Metallic Mini Backpack Keychain",
                "url": "https://shop.guess.com/en/catalog/view/17GUP248?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=17GUP248&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            },
            {
                "name": "CAN: Boxed Gold-Tone Stud Earring Set",
                "url": "https://guess.ca/en/Catalog/View/434046GC21/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434046GC21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d#434046GC21"
            },
            {
                "name": "CAN: GUESS 1981 Eau De Toilette, 3.4 oz.",
                "url": "https://www.guess.ca/en/catalog/view/accessories/women/fragrance/guess-1981-eau-de-toilette-3-4-oz/32667861000?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=32667861000&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            },
            {
                "name": "CAN: Metallic Mini Backpack Keychain",
                "url": "https://www.guess.ca/en/Catalog/View/17GUP248/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=17GUP248&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d#17GUP248"
            },
            {
                "name": "EU: Holiday Delivery",
                "url": "https://www.guess.eu/en/CustomerCare/guaranteed-delivery/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
            }
        ],
        "shop": "guess"
    }
}
,{
    "item": {
        "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/oCSER6z1bD-KgCCgMcbH9Xk9OifDOvwuXgXNwAQmIeI=/d/l",
        "product": [
            {
                "name": "CAN: Lily Faux-Fur Coat",
                "url": "https://www.guess.ca/en/catalog/view/women/jackets-and-outerwear/faux-fur/lily-faux-fur-coat/w74l14w9t70?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=w74l14w9t70&crl8_id=9a9df613-d531-4252-a63e-2566d16dedd2"
            },
            {
                "name": "EU: Floral Faux-Fur Coat",
                "url": "https://www.guess.eu/en/catalog/view/women/apparel/coats-and-jackets/floral-faux-fur-coat/w74l14w9t70?color=dpid%3FCMP%3DSMC-INSTAGRAM-LIKETOBUY&crl8_id=9a9df613-d531-4252-a63e-2566d16dedd2"
            }
        ],
        "shop": "guess"
    }
}]
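
A dataset in this shape usually needs a little post-processing before it goes into a store catalog. The following Python sketch flattens it into one row per product and removes the tracking parameters from the product URLs. It is only an illustration: the function names are made up for this example, treating a leading "US:", "CAN:" or "EU:" in a product name as a region marker is an assumption based on the sample above, and so is the choice of `utm_` and `crl8_` as the tracking prefixes to drop.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit

# Query parameter prefixes treated as tracking data (an assumption).
TRACKING_PREFIXES = ("utm_", "crl8_")

# One record copied from the sample dataset above.
SAMPLE = [{
    "item": {
        "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/PPYWso07RgBC_UHzxcrgAO_Wk0twhD3XHvviHlJ7-ZY=/d/l",
        "product": [
            {
                "name": "Marco Faux-Leather Moto Jacket",
                "url": "https://shop.guess.com/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b",
            },
            {
                "name": "CAN: Marco Faux-Leather Moto Jacket",
                "url": "https://www.guess.ca/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=w74l10r72y1&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b",
            },
        ],
        "shop": "guess",
    }
}]


def strip_tracking(url: str) -> str:
    """Drop tracking query parameters, keeping the path and fragment."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES)]
    return parts._replace(query=urlencode(kept)).geturl()


def split_region(name: str):
    """Split 'CAN: Some Name' into ('CAN', 'Some Name'); region may be None."""
    match = re.match(r"^([A-Z]{2,3}):\s*(.+)$", name)
    return (match.group(1), match.group(2)) if match else (None, name)


def flatten(dataset: list) -> list:
    """Produce one flat row per product found in the dataset."""
    rows = []
    for record in dataset:
        item = record["item"]
        for product in item.get("product", []):
            region, name = split_region(product["name"])
            rows.append({
                "shop": item["shop"],
                "image": item["image"],
                "region": region,
                "name": name,
                "url": strip_tracking(product["url"]),
            })
    return rows


for row in flatten(SAMPLE):
    print(row["region"], row["name"], row["url"])
```

Flat rows like these are easy to load into a spreadsheet or a database table and to match against your own product catalog by URL or name.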

As you can see, our basic scraper extracts only the image URL and the names and URLs of the products. By changing the scraper logic, you can extract other data available in the feed and transform the extracted data in any way, shaping the dataset exactly as you need it. Below is the structure of one source feed object, to help you compose CSS selectors for the containers that hold the data:

<items>
        <candelete>false</candelete>
        <caption_safe>Introducing the next generation of #GUESSConnect Smartwatches ⌚️? Powered by Android Wear (and compatible
                with iOS 9+), our fav feature is swiping through the hundreds of watch faces to pair perfectly
                with whatever you're wearing + the Google Assistant! ➡️ Click the link in our bio to
                discover more #GUESSWatches #LoveGUESS</caption_safe>
        <commentcount>182</commentcount>
        <isfeatured>true</isfeatured>
        <largephotourl>https://d28m5bx785ox17.cloudfront.net/v1/img/9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=/d/l</largephotourl>
        <largevideourl>https://scontent.cdninstagram.com/vp/d9e6c226c2cadbf3bc45167c1f24fff9/5A3D679E/t50.2886-16/24383086_151063558867804_2812871925800370176_n.mp4</largevideourl>
        <likecount>13306</likecount>
        <mediumphotourl>https://d28m5bx785ox17.cloudfront.net/v1/img/9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=/d/m</mediumphotourl>
        <mediumvideourl>https://scontent.cdninstagram.com/vp/d9e6c226c2cadbf3bc45167c1f24fff9/5A3D679E/t50.2886-16/24383086_151063558867804_2812871925800370176_n.mp4</mediumvideourl>
        <networkidentifier>f1ffd186-3ee1-42ec-b463-135b26139ab7</networkidentifier>
        <networkurl>https://www.instagram.com/p/BcNdy1oluYh/</networkurl>
        <originalfileidandsource>
                <fileid>9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=</fileid>
                <osource>instagram</osource>
        </originalfileidandsource>
        <products>
                <croppedthumbnailimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=/sc/350x350</croppedthumbnailimageurl>
                <destinationurl>https://shop.guess.com/en/catalog/browse/lifestyle/guess-connect-touch/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=f1ffd186-3ee1-42ec-b463-135b26139ab7</destinationurl>
                <fileid>dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=</fileid>
                <id>0</id>
                <imageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=/d/l</imageurl>
                <name>US: GUESS CONNECT</name>
                <position>1</position>
                <productstyleid>u_2765_00c88d1540a358f1f4cadff87341b5122c7ac0900f11568a7e434923c71aa2f4</productstyleid>
                <sourceimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=</sourceimageurl>
        </products>
        <products>
                <croppedthumbnailimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=/sc/350x350</croppedthumbnailimageurl>
                <destinationurl>https://shop.guess.ca/en/catalog/browse/lifestyle/guess-connect-touch/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=f1ffd186-3ee1-42ec-b463-135b26139ab7</destinationurl>
                <fileid>j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=</fileid>
                <id>0</id>
                <imageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=/d/l</imageurl>
                <name>CAN: GUESS CONNECT</name>
                <position>2</position>
                <productstyleid>u_2765_8a7e0d6ae928e7b95cd25781dadb917ab9d5d5826cb0dd14c7425e5c9c99c5e5</productstyleid>
                <sourceimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=</sourceimageurl>
        </products>
        <storeid>938</storeid>
        <timeposted>1512240829000</timeposted>
</items>
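
As an illustration of what the additional fields can be used for, here is a small Python sketch that summarizes one feed object. It assumes the object has already been turned into a dictionary keyed by the tag names shown above (the raw feed may use different casing), and that `timeposted` is a Unix timestamp in milliseconds, which is what the sample value suggests. The function names are hypothetical.

```python
from datetime import datetime, timezone


def posted_at(timeposted_ms) -> datetime:
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(timeposted_ms) / 1000, tz=timezone.utc)


def summarize(feed_item: dict) -> dict:
    """Pick a few engagement fields out of one feed object."""
    return {
        "source": feed_item["networkurl"],
        "likes": int(feed_item["likecount"]),
        "comments": int(feed_item["commentcount"]),
        "posted": posted_at(feed_item["timeposted"]).isoformat(),
        # A non-empty video URL means the post is a video, not a photo.
        "is_video": bool(feed_item.get("largevideourl")),
    }


# Values copied from the sample feed object above.
sample = {
    "networkurl": "https://www.instagram.com/p/BcNdy1oluYh/",
    "likecount": "13306",
    "commentcount": "182",
    "timeposted": "1512240829000",
    "largevideourl": "https://scontent.cdninstagram.com/vp/d9e6c226c2cadbf3bc45167c1f24fff9/5A3D679E/t50.2886-16/24383086_151063558867804_2812871925800370176_n.mp4",
}

print(summarize(sample))
```

Fields such as the like count and the posting time can be useful for sorting a gallery, for example to show the most popular or the most recent photos first.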

Below, we list some of the stores and brands that use Like2Buy to deliver user-generated content, together with their IDs. The list is incomplete. If you cannot find the brand or store you are interested in, try searching Google or ask us; we are always happy to help 🙂

Store or brand	ID	Store or brand	ID
Aldo	aldo_shoes	Ann Taylor	anntaylor
Anthropologie	anthropologie	Bed, Bath and Beyond	bedbathandbeyond
Brilliant Earth	brilliantearth	Cartier	cartier
CB2	cb2	Champion	champion
Chobani	chobani	Chumbak	chumbak
Crate and Barrel	crateandbarrel	Creative Recreation	creativerecreation
Covergirl	covergirl	David’s Bridal	davidsbridal
Disney	disney	Dune London	dune_london
Farfetch	farfetch	Fawn Shoppe	fawn_shoppe
Forever21	forever21,forever21men	Fossil	fossil
Free People	freepeople	Gap	gap
Garage Clothing	garageclothing	Guess	guess
HauteLook	hautelook	Herbal Essences	herbalessences
Hot Topic	hottopic	House of Lashes	houseoflashes
J. Crew	jcrew	Karl Lagerfeld	karllagerfeld
Kohl’s	kohls	Laura Mercier	lauramercier
Lilly Pulitzer	lillypulitzer	Louis Vuitton	louisvuitton
lululemon	lululemon	Lulus	lulus
Macy’s	macys	Misspap	misspap
Neiman Marcus	neimanmarcus	Next Com AU	nextofficial_au
Nordstrom	nordstrom	Paint Nite	paintnite
PB Teen	pbteen	Pendleton	pendletonwm
Pier 1	pier1	Pottery Barn	potterybarn
Raymour & Flanigan	raymourflanigan	Schoolhouse Electric & Supply Co	schoolhouse
Schutz	schutzshoes	Sephora	sephora
Sperry	sperry	Target	target
The Bump	thebump	The Company Store	thecompanystore
Topman	topman	TopShop	topshop
Victoria’s Secret	victoriassecret	Vineyard Vines	vineyardvines
West Elm	westelm	Williams Sonoma	williamssonoma
Windsor	windsorstore	Z Gallerie	zgallerie
Zumiez	zumiez
