How to extract user generated content for an internet shop with a small budget

You have probably seen galleries of user generated content in online stores that sell clothing, shoes, home products, and so on. They help sell a product because they let a potential buyer see how it looks on a real person rather than on a model, which supports a more informed decision. You would probably like to use such content in your own store but do not know how to do it on a limited budget.

The technical implementation of such a mechanism is as follows: an aggregator service collects user generated images on the Internet, for example on Instagram, determines the brand and model of the item or items shown in each photo, and delivers the result in a special feed. Connecting to such a service may be costly for a small shop, so mostly large mono-brand and multi-brand online stores can afford it.

The second option is to build such an aggregation service yourself, but this is a time-consuming, long-term and expensive process, far more expensive than connecting a single online store to an existing aggregator.

However, there is a budget option. Many brands and well-known online stores are already customers of such aggregators and have their own feeds with user generated photos and information about the corresponding products. So if you sell products of the same brands, you can take the information from these feeds, process the received data, and use it in your online store to sell those products.

You might object that coding scrapers for hundreds of sites and brands is tedious and time-consuming. But you do not need to scrape the websites themselves; you only need the feed with user content, and such feeds are provided by a limited set of aggregators. Technically, then, you need only one scraper with standard logic, run with different URLs or parameters to pick up the feeds of different stores and brands.
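The "one scraper, many feeds" idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the URL template and the `fetch` callable are placeholders invented for the example, not the real feed address or API of any aggregator.

```python
from typing import Callable, Dict, Iterable

# Hypothetical template: the real feed address of an aggregator is not given here.
FEED_URL_TEMPLATE = "https://aggregator.example/feed/{store_id}"


def build_feed_urls(
    store_ids: Iterable[str], template: str = FEED_URL_TEMPLATE
) -> Dict[str, str]:
    """Map each store ID to its feed URL using one shared template."""
    return {store_id: template.format(store_id=store_id) for store_id in store_ids}


def scrape_all(
    store_ids: Iterable[str], fetch: Callable[[str], str]
) -> Dict[str, str]:
    """Run the same fetch logic for every store; only the URL differs."""
    urls = build_feed_urls(store_ids)
    return {store_id: fetch(url) for store_id, url in urls.items()}


if __name__ == "__main__":
    # A stand-in fetcher so the sketch runs without network access.
    def fake_fetch(url: str) -> str:
        return f"<feed from {url}>"

    for store_id, body in scrape_all(["aldo_shoes", "gap"], fake_fetch).items():
        print(store_id, body)
```

The scraping logic lives in one place, and adding a store means adding one ID to the list rather than writing a new scraper.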

One such service is **Like2Buy**, provided by Curalate. It serves more than 6000 online stores and brands. All feeds can easily be found on Google by typing “” in the search box and clicking the “show all results” link. For reference, we also list below a few stores and their IDs for use with our free web scraper, which we share in this article.

This data can be useful not only for online stores, but also for companies conducting research for brands, as well as companies working in the machine learning area.

To get started, you will need a free account with our Diggernaut service. Simply follow this step-by-step guide:

  1. Follow this registration link to open a free account with Diggernaut.
  2. After registering and confirming your email address, log in to your account.
  3. Create a project with any name and description. If you do not know how to do it, please refer to our documentation.
  4. Switch to the created project and create a digger with any name. If you do not know how to do it, please refer to our documentation.
  5. Copy the following digger configuration to the clipboard and paste it into the digger you created. If you do not know how to do it, please refer to our documentation.
  6. In the iterator configuration inside the digger config, enter one or more (comma-separated) store IDs from the table below.
  7. Switch the mode of the digger from Debug to Active. If you do not know how to do it, please refer to our documentation.
  8. Run your digger and wait until it completes. If you do not know how to do it, please refer to our documentation.
  9. Download the scraped dataset in the format you need. If you do not know how to do it, please refer to our documentation.
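Step 6 accepts several store IDs as one comma-separated string. As a minimal Python sketch of how such a string could be split and cleaned before use (the function name `parse_store_ids` is hypothetical, and this is not Diggernaut's own code):

```python
from typing import List


def parse_store_ids(raw: str) -> List[str]:
    """Split a comma-separated string into unique, trimmed store IDs, keeping order."""
    seen = set()
    ids: List[str] = []
    for part in raw.split(","):
        store_id = part.strip()
        # Skip empty fragments (e.g. a trailing comma) and repeated IDs.
        if store_id and store_id not in seen:
            seen.add(store_id)
            ids.append(store_id)
    return ids


if __name__ == "__main__":
    print(parse_store_ids("forever21, forever21men, gap"))
    # prints ['forever21', 'forever21men', 'gap']
```

Trimming whitespace and dropping duplicates avoids requesting the same feed twice when IDs are pasted in by hand.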

You can also set up a schedule for running your scraper to collect data regularly.

The scraper configuration is shown below. You can simply copy it to any of your diggers, enter an ID from the store table (or several at a time), and start your digger.

As a result, you will get a dataset with the following structure:

As you can see, our basic scraper extracts only the URL of the image and the names and URLs of the products. By changing the scraper logic, you can extract other data available in the feed and perform any manipulations with the extracted data, forming your dataset exactly as you need it. Below is the structure of one source feed object, so you can more easily compose CSS selectors for the containers that hold the data:
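Separately from the feed structure itself, the post-processing idea (one image with several tagged products becoming flat dataset rows) can be sketched in Python. The field names `image_url`, `products`, `name` and `url`, and the sample data, are invented for this illustration and are assumptions; they do not represent the actual Like2Buy feed object or the Diggernaut output.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical feed items: field names and values are made up for this sketch.
SAMPLE_FEED: List[Dict[str, Any]] = [
    {
        "image_url": "https://cdn.example/photo1.jpg",
        "products": [
            {"name": "Leather boots", "url": "https://shop.example/boots"},
            {"name": "Wool scarf", "url": "https://shop.example/scarf"},
        ],
    },
    {"image_url": "https://cdn.example/photo2.jpg", "products": []},
]

FIELDS = ["image", "product_name", "product_url"]


def flatten_feed(feed: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Yield one row per (image, product) pair; images without products are skipped."""
    for item in feed:
        for product in item.get("products", []):
            yield {
                "image": item["image_url"],
                "product_name": product["name"],
                "product_url": product["url"],
            }


def to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten_feed(SAMPLE_FEED)))
```

Producing one row per product keeps the dataset easy to join against your own catalog by product URL or name.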

Below, we list stores that use Like2Buy to deliver user generated content, together with their IDs. This list is far from complete. If you cannot find the brand or store you are interested in, try searching on Google, or ask us and we will be happy to help 🙂

| Store or brand | ID | Store or brand | ID |
|---|---|---|---|
| Aldo | aldo_shoes | Ann Taylor | anntaylor |
| Anthropologie | anthropologie | Bed, Bath and Beyond | bedbathandbeyond |
| Brilliant Earth | brilliantearth | Cartier | cartier |
| CB2 | cb2 | Champion | champion |
| Chobani | chobani | Chumbak | chumbak |
| Crate and Barrel | crateandbarrel | Creative Recreation | creativerecreation |
| Covergirl | covergirl | David’s Bridal | davidsbridal |
| Disney | disney | Dune London | dune_london |
| Farfetch | farfetch | Fawn Shoppe | fawn_shoppe |
| Forever21 | forever21,forever21men | Fossil | fossil |
| Free People | freepeople | Gap | gap |
| Garage Clothing | garageclothing | Guess | guess |
| HauteLook | hautelook | Herbal Essences | herbalessences |
| Hot Topic | hottopic | House of Lashes | houseoflashes |
| J. Crew | jcrew | Karl Lagerfeld | karllagerfeld |
| Kohl’s | kohls | Laura Mercier | lauramercier |
| Lilly Pulitzer | lillypulitzer | Louis Vuitton | louisvuitton |
| lululemon | lululemon | Lulus | lulus |
| Macy’s | macys | Misspap | misspap |
| Neiman Marcus | neimanmarcus | Next Com AU | nextofficial_au |
| Nordstrom | nordstrom | Paint Nite | paintnite |
| PB Teen | pbteen | Pendleton | pendletonwm |
| Pier 1 | pier1 | Pottery Barn | potterybarn |
| Raymour & Flanigan | raymourflanigan | Schoolhouse Electric & Supply Co | schoolhouse |
| Schutz | schutzshoes | Sephora | sephora |
| Sperry | sperry | Target | target |
| The Bump | thebump | The Company Store | thecompanystore |
| Topman | topman | TopShop | topshop |
| Victoria’s Secret | victoriassecret | Vineyard Vines | vineyardvines |
| West Elm | westelm | Williams Sonoma | williamssonoma |
| Windsor | windsorstore | Z Gallerie | zgallerie |
| Zumiez | zumiez | | |

Co-founder of cloud based web scraping and data extraction platform Diggernaut
