Old tweets reveal hidden secrets

On Jan 12, 2019

Old Twitter posts could reveal more about you than you think, according to a research paper released this month. Tweets could reveal places you visited and things you did, even if you didn’t explicitly mention them.

Researchers from the Foundation for Research and Technology in Greece and the University of Illinois found all this out after writing a tool called LPAuditor. The software mines publicly available tweet data that anyone can download from Twitter via its application programming interface (API).

Using the tool, they analyzed the metadata – hidden information about a tweet embedded in the post – to identify users’ homes, workplaces and sensitive places that they visited. In dozens of cases, they were also able to identify the users behind anonymous Twitter accounts.

In the paper, entitled Please Forget Where I Was Last Summer: The Privacy Risks of Public Location (Meta)Data, the researchers said:

even if users are cautious and nothing sensitive is disclosed in the tweets, the location information obtainable with our duration-based approach can result in significant privacy loss.

The privacy risk stems from historical Twitter data posted prior to April 2015. Before this date, if a user geotagged themselves in a broad area such as a city, the social network embedded their exact GPS coordinates in the tweet’s metadata. Users simply looking at the Twitter app or web site would not have been aware of this because it only shows up in the raw data obtained via the API. Although Twitter stopped embedding this data in 2015, the historical information is still publicly available via the API.

The researchers took the GPS coordinates in the historical data and used publicly available geolocation services to map them to an address. They then grouped tweets mapped to the same address into clusters, and used the tweets’ timestamps to track how often, and at what times, the user tweeted from each location.
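
To make the grouping step concrete, here is a minimal Python sketch. It is not LPAuditor's code: rounding coordinates to a grid cell stands in for the geolocation lookup the researchers used, and the function names, the precision value and the sample coordinates are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

def cluster_key(lat, lon, precision=3):
    """Round coordinates to a grid cell (an assumed stand-in for an address lookup)."""
    return (round(lat, precision), round(lon, precision))

def cluster_tweets(tweets):
    """Group (lat, lon, iso_timestamp) tuples by grid cell, keeping sorted timestamps."""
    clusters = defaultdict(list)
    for lat, lon, ts in tweets:
        clusters[cluster_key(lat, lon)].append(datetime.fromisoformat(ts))
    return {key: sorted(times) for key, times in clusters.items()}

# Hypothetical sample data: two tweets from nearly the same spot, one from elsewhere.
sample_tweets = [
    (40.1102, -88.2272, "2014-06-02T08:15:00"),
    (40.1103, -88.2273, "2014-06-02T22:40:00"),
    (40.1020, -88.2390, "2014-06-03T13:05:00"),
]
clusters = cluster_tweets(sample_tweets)
```

Each cluster ends up with a list of timestamps, which is the raw material for the timing analysis described next.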

The team used some basic assumptions about home life in the US to identify home addresses, such as the tendency to leave in the morning and return at night and to be there a lot at weekends. They used similar assumptions about working hours to identify where Twitter users worked, and even accounted for variations like night shifts.
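
A rough sketch of how such timing assumptions might be turned into code follows. This is a simplification that just counts tweets in assumed "home" and "work" time windows; the paper describes a duration-based approach, and the hour boundaries, function names and sample clusters here are illustrative guesses, not the researchers' values. Night shifts are not handled.

```python
from datetime import datetime

def is_home_time(ts):
    """Assumed 'home' window: weekends, or weekdays before 08:00 or from 19:00."""
    return ts.weekday() >= 5 or ts.hour < 8 or ts.hour >= 19

def is_work_time(ts):
    """Assumed 'work' window: Monday to Friday, 09:00 to 17:00."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17

def guess_home_and_work(clusters):
    """Pick the cluster with the most tweets in each window, or None if none match."""
    def best(predicate):
        scores = {key: sum(predicate(t) for t in times) for key, times in clusters.items()}
        top = max(scores, key=scores.get, default=None)
        return top if top is not None and scores[top] > 0 else None
    return best(is_home_time), best(is_work_time)

# Hypothetical clusters: labels mapped to tweet timestamps.
sample_clusters = {
    "cluster_a": [datetime(2014, 6, 2, 22, 40), datetime(2014, 6, 3, 7, 10),
                  datetime(2014, 6, 7, 11, 0)],   # late evening, early morning, Saturday
    "cluster_b": [datetime(2014, 6, 2, 10, 30), datetime(2014, 6, 3, 14, 45)],  # weekday daytime
}
home, work = guess_home_and_work(sample_clusters)
```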

The researchers also mapped the GPS coordinates of users’ other tweets against other addresses and venues listed in Foursquare. This told them which other locations users were likely to have tweeted from. From that, they created potentially sensitive clusters (PSCs) indicating sensitive locations that users probably visited.
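
Matching a coordinate to a nearby venue can be sketched with a simple distance check. The example below does not call Foursquare or any real service: the venue list, its field names and the 50-metre cut-off are hypothetical, and the haversine formula is just one common way to measure distance between two GPS points.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def nearest_venue(lat, lon, venues, max_distance_m=50.0):
    """Return the closest venue within max_distance_m, or None if nothing is close enough."""
    closest, closest_dist = None, None
    for venue in venues:
        dist = haversine_m(lat, lon, venue["lat"], venue["lon"])
        if closest_dist is None or dist < closest_dist:
            closest, closest_dist = venue, dist
    if closest is None or closest_dist > max_distance_m:
        return None
    return closest

# Hypothetical venue records; a real venue database would supply these.
sample_venues = [
    {"name": "Example Clinic", "category": "medical", "lat": 40.1101, "lon": -88.2271},
    {"name": "Example Cafe", "category": "food", "lat": 40.1200, "lon": -88.2270},
]
match = nearest_venue(40.1100, -88.2270, sample_venues)
no_match = nearest_venue(41.0, -88.0, sample_venues)
```

A venue's category is what would mark a cluster as potentially sensitive in a scheme like this.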

They did all of this without even looking at the actual content of the tweets, but by correlating this metadata with that content they could get an even clearer picture of what the user was doing. By looking for phrases like “at home” or “at work”, they could confirm that a location was a home or work address.
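
The content check could be as simple as a phrase search over the tweets in a cluster. In this sketch, only "at home" and "at work" come from the description above; the function name and sample tweets are invented, and the paper's actual matching may well be more sophisticated.

```python
HOME_PHRASES = ("at home",)
WORK_PHRASES = ("at work",)

def confirm_label(tweet_texts, phrases):
    """Return True if any tweet in the cluster contains one of the phrases (case-insensitive)."""
    return any(phrase in text.lower() for text in tweet_texts for phrase in phrases)

# Hypothetical tweets posted from a cluster already guessed to be the user's home.
home_cluster_texts = ["Finally AT HOME after a long day", "Movie night"]
home_confirmed = confirm_label(home_cluster_texts, HOME_PHRASES)
work_confirmed = confirm_label(home_cluster_texts, WORK_PHRASES)
```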

Similarly, by looking for lists of keywords related to medical, religious and sex or nightlife activities, they could confirm that a user was at a sensitive location engaged in a particular activity even though the tweet didn’t explicitly mention that place or behaviour. They explained in the paper:

In one case, the user expressed negative feelings about his/her doctor, while the GPS coordinates place the user in the office of a mental health professional. In another example, the user complained about some blood tests, while being geo-located at a rehab center.

Not only were the researchers able to infer more about users from their tweets, but they could also accurately identify many anonymous Twitter accounts, the paper said. They added that third parties could use this data to identify users and potentially infer things about their behaviour. Such uses could range:

…from a repressive regime de-anonymizing an activist’s account to an insurance company inferring a customer’s health issues, or a potential employer conducting a background check.

Twitter does allow people to go back and delete tweets or remove their location data retroactively. The problem is that because the data is available to the public, data brokers and other third parties are likely to already have copies of it.

Removing your location data from Twitter won’t stop those third parties from tracking you, the researchers warned:

Twitter’s invasive privacy policy cannot be dismissed as a case of a vulnerability that has been fixed. As long as this historical data persists online, users will continue to face the significant privacy risks that we have highlighted in this paper.

In short, what happens in Vegas may not always stay in Vegas. If you tweeted it, it could well have gone everywhere.

The researchers will present their paper at the Network and Distributed System Security Symposium (NDSS) next month.