How to collect data from Instagram business profiles

If your work requires collecting data from Instagram business profiles, you have probably used the mobile application for it. You were forced to, because some of the business data was missing from the web version; in particular, it was impossible to tell whether you were looking at a business profile or a personal one. Now it’s possible to process these profiles automatically with a web scraper that uses the mobile API. We found this solution on the Internet: one of our users wrote it and shared it with the community on a well-known Internet marketing resource. Let’s examine how the web scraper works.

To use the scraper, you must specify the login and password for your Instagram account, as well as the list of accounts you want to collect business information about. Bear in mind that using this web scraper may violate Instagram's TOS and Instagram can block your account, so use it at your own risk; we are publishing it for educational purposes only. Let’s walk through the web scraper code piece by piece.

As you probably already know, the config section is intended for presetting the scraper: in this case it sets the debug mode level (which is only needed during development and could be omitted) and the name of the browser on whose behalf the web scraper sends requests to the server. Technically it could be Chrome or Safari, but the author decided on Firefox. By the way, the server can sometimes return different data depending on the browser name. Also, it may sometimes be necessary to use a complete User-Agent string instead of a preset; such strings can be found here.
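
The original scraper is written in Diggernaut's YAML meta-language; since the listing itself is not reproduced here, each step below is accompanied by a rough equivalent in plain Python using the requests library. These sketches illustrate the described logic and are not the author's code; the User-Agent strings behind the presets are our own assumptions:

```python
import requests

# Equivalent of the config section: choose the browser the scraper
# impersonates. A complete User-Agent string can replace a preset.
USER_AGENTS = {
    "Firefox": "Mozilla/5.0 (Windows NT 10.0; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/66.0 Safari/537.36",
}

session = requests.Session()
session.headers["User-Agent"] = USER_AGENTS["Firefox"]
```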

The main logic of the scraper is located in the do section. At the very beginning, variables are initialized with your login, your Instagram password, and the list of accounts you want to extract data for.
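
In the Python sketch these become plain constants; the names and placeholder values are ours:

```python
# Credentials and the account list, mirroring the variables the
# scraper initializes; the account names here are just placeholders.
LOGIN = "your_login"
PASSWORD = "your_password"
ACCOUNTS = "account1, account2, account3"  # CSV string, as the scraper expects
```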

Next, the web scraper loads the Instagram homepage and goes into the body tag.

It parses all the text, extracts the JavaScript object, converts it into XML, turns it into a DOM block, and then switches to this context.
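
A sketch of these two steps in Python; rather than converting the JSON to XML and a DOM block as Diggernaut does, we parse it directly. The window._sharedData pattern reflects Instagram's markup at the time this scraper was written and may have changed:

```python
import json
import re

# Load the Instagram homepage (using the session from the config sketch)
# and pull the shared-data JavaScript object out of the page body.
html = session.get("https://www.instagram.com/").text
match = re.search(r"window\._sharedData\s*=\s*(\{.+?\});</script>", html)
shared_data = json.loads(match.group(1))
```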

Now our context contains the extracted JavaScript object (JSON) as a DOM, and we can walk through its elements as if it were a normal HTML page. So we find the config node and, inside it, the csrf_token node, parse its content, and extract the token we need to log in to Instagram, saving it to the token variable. Then we log in to Instagram using the token, username, and password that we are already keeping in variables.
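
Sketched in Python below; the login endpoint and the x-csrftoken header are the ones Instagram's website used at the time, so treat them as assumptions:

```python
# The csrf_token sits under the config node of the shared data,
# exactly where the scraper finds it in the converted DOM.
token = shared_data["config"]["csrf_token"]

# Log in through the web endpoint, passing the token in a header.
login_response = session.post(
    "https://www.instagram.com/accounts/login/ajax/",
    data={"username": LOGIN, "password": PASSWORD},
    headers={
        "x-csrftoken": token,
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.instagram.com/",
    },
)
```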

Next, the scraper checks whether Instagram has authorized us.

If not, you will see an error and the scraper finishes its work. If you see this error in the log, try logging in through your browser and manually resolving the challenge; after that, you’ll be able to sign in to your account from the web scraper. If authorization is successful, the scraper continues working and transfers the necessary cookies to variables so they can be used in subsequent requests.
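
A minimal equivalent of this check, assuming the login endpoint returns an authenticated flag as it did at the time; a requests session keeps cookies automatically, which stands in for transferring them to variables:

```python
# Stop if Instagram did not authorize us; otherwise the session
# already carries the cookies (sessionid, csrftoken) we need later.
result = login_response.json()
if not result.get("authenticated"):
    raise RuntimeError(f"Instagram login failed: {result}")

token = session.cookies.get("csrftoken", token)  # token is refreshed on login
```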

Then the scraper reads the variable with the account list into the register, converts the text in the register to a block, and switches to that context. This is done in order to use the split command, since that command works with the contents of a block, not a register. After splitting, the scraper iterates over each account and executes the commands in the do block.

Everything that happens next applies to every account listed in the CSV string you passed. The scraper parses the block containing the account name, strips extra spaces, and writes it to a variable so it can be used in requests.
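
In the Python sketch no register-to-block conversion is needed; a plain string split does the job:

```python
# Split the CSV string and handle each account in turn.
for account in ACCOUNTS.split(","):
    account = account.strip()  # clear extra spaces, as the scraper does
    print(account)             # stands in for the per-account steps below
```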

The scraper then fetches the account's page in order to extract the account ID, because the ID is needed to make requests to the mobile API. The ID is stored in a variable.
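
A sketch of this step, meant to run inside the loop above; the JSON path to the numeric ID inside the profile page's shared data is an assumption based on how Instagram exposed it at the time:

```python
# Fetch the account's page and extract its numeric ID from the
# shared data; the mobile API is keyed by this ID, not the username.
html = session.get(f"https://www.instagram.com/{account}/").text
match = re.search(r"window\._sharedData\s*=\s*(\{.+?\});</script>", html)
page_data = json.loads(match.group(1))
user_id = page_data["entry_data"]["ProfilePage"][0]["graphql"]["user"]["id"]
```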

After that, a request is made to the Instagram mobile API. As we can see, the web scraper masquerades as the mobile application by using specific request headers.
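
Sketched below; the mobile API endpoint and the app-style User-Agent are the ones commonly used by scrapers at the time, and both are assumptions about what the original code sends:

```python
# Query the mobile API while masquerading as the official Android app
# via the User-Agent header; the web UA would not get the business data.
MOBILE_UA = (
    "Instagram 10.26.0 Android (18/4.3; 320dpi; 720x1280; "
    "Xiaomi; HM 1SW; armani; qcom; en_US)"
)
api_response = session.get(
    f"https://i.instagram.com/api/v1/users/{user_id}/info/",
    headers={"User-Agent": MOBILE_UA},
)
```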

The mobile API returns a response in JSON format. Diggernaut automatically converts it to XML and lets you work with the DOM structure using the standard find command, so all the remaining code simply picks up the data using certain CSS selectors and saves it to the data object.
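
In Python we can read the JSON directly instead of querying converted XML with CSS selectors; the field names (is_business, public_email, and so on) are what the mobile API exposed at the time and should be treated as assumptions:

```python
# Read the business-related fields straight out of the JSON response
# and collect them into a data object, one per account.
user = api_response.json()["user"]
data = {
    "account": account,
    "is_business": user.get("is_business"),
    "category": user.get("category"),
    "public_email": user.get("public_email"),
    "public_phone": user.get("public_phone_number"),
    "city": user.get("city_name"),
    "followers": user.get("follower_count"),
}
print(data)
```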

In general, we think the logic of the web scraper is simple; the only complicated point is the process of masquerading as a mobile application.
