Instagram boots ad partner for location tracking and scraping stories
A “preferred Facebook Marketing Partner” has secretly tracked millions of Instagram users’ locations and stories, Business Insider reported on Wednesday.
Facebook has confirmed that San Francisco-based marketing firm HYP3R scraped huge quantities of data from Instagram in order to build detailed user profiles. Profiles that included users’ physical whereabouts, their bios, their interests, and the photos that were supposed to vanish after 24 hours.
It was all done in “clear violation of Instagram’s rules,” BI reports, and Facebook has subsequently kicked HYP3R to the curb. BI reports that Instagram issued HYP3R a cease and desist letter on Wednesday after the publication presented its findings, booted it off the platform, and tweaked its platform to protect user data.
Here’s the statement that Facebook is sending to media outlets:
HYP3R’s actions were not sanctioned and violate our policies. As a result, we’ve removed them from our platform. We’ve also made a product change that should help prevent other companies from scraping public location pages in this way.
Instagram’s failure to protect location data is a “mystery”
We don’t know exactly how much data HYP3R got at. But as BI notes, the company has publicly bragged about having “a unique dataset of hundreds of millions of the highest value consumers in the world that gives an edge to the leaders in travel and retail.”
According to the publication’s sources, HYP3R sucks in more than 1 million Instagram posts per month, and more than 90% of the data it brags about comes from the platform.
Data scraping is a pervasive problem online, as BI points out. We’ve seen multiple lawsuits, naming big players, brought over the practice. In 2017, for example, a lawsuit was brought against Uber over one of its units – Marketplace Analytics – that allegedly spied on competitors worldwide for years, scraping millions of their records using automated collection systems.
Researchers have done it multiple times to Venmo, to point out how much financial activity users publicly share. A 19-year-old from Nova Scotia got arrested for scraping freedom-of-information releases from a public website.
And Instagram? It’s a data-scraper’s darling.
Data from 49 million accounts was found lying around a few months ago, in May 2019. In September 2017, we saw Redditors trying to archive every single Instagram image, be it posted publicly or stored in supposedly locked accounts.
Why? Because they could. Which brings us to HYP3R and how 3asy it was for it to st3al all that data from Fac3book’s Instagram.
BI’s sources include HYP3R insiders who question how much due diligence Instagram and Facebook do on the partners who use their platforms, as well as how well the parent company and its somewhat independent subsidiary do at safeguarding user data.
BI quoted one such source, a former HYP3R employee:
For [Instagram] to leave these endpoints open and let people get to this in a back channel sort of way, I thought was kind of hypocritical. Why they haven’t [protected user location data, for example] remains a mystery.
Granted, the company only hoovered up public data. But how many users expect their public data to be stitched together with their location data and tied up in a database to be sold off to a marketing company’s clients? These are the unauthorized ways that HYP3R got that data:
- An Instagram security lapse allowed it to zero in on specific user locations, like hotels and gyms, and vacuum up all the public posts made from the locations.
- It systematically saved users’ public Instagram stories made at those locations. That content, which includes photos shared in the stories, is supposed to disappear after 24 hours. BI calls this a clear violation of Instagram’s terms of service.
- It scraped public user profiles to collect information such as user bios and followers, which it then combined with the other location information and data from other sources.
Two tools to find them all, and in the darkness bind them
To get all that, HYP3R created two tools. One was created in the aftermath of Cambridge Analytica, when Instagram began to turn off some of its application programming interface’s (API’s) functionality, including letting developers search for public posts for a given location. HYP3R put a hearty face on the deprecation, at least publicly. Behind the scenes, it worked to create a way to get at the location data it had been relying on, in spite of Instagram’s having turned off the location data spigot.