It’s not that hard to unmask real people in anonymous data

On Jul 25, 2019

Health care information, tax records, credit scores and browsing history: every day, data brokers traffic in datasets about you that are supposed to be anonymized. But unmasking personally identifying information in those datasets is easier than you might think, according to a study released Tuesday in Nature Communications.

By using just 15 demographic attributes and a bit of machine learning, researchers from Imperial College London and the University of Louvain said “99.98% of Americans would be correctly re-identified in any dataset.” The researchers said their work shows that re-identification is a real risk, and they questioned whether current practices satisfy modern data protection laws such as Europe’s General Data Protection Regulation and the California Consumer Privacy Act.
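To get a feel for why a handful of attributes can single people out, here is a minimal Python sketch. It is not the researchers' model, which the study describes as a machine learning approach; it only counts how many records in a tiny table are unique on a chosen set of attributes. The records, the attribute names (`zip`, `birth_year`, `gender`) and the helper `unique_fraction` are all hypothetical, invented for illustration.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Return the share of records whose combination of the given
    attributes appears exactly once in the dataset."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical toy records, made up for this example (not real data).
records = [
    {"zip": "10001", "birth_year": 1980, "gender": "F"},
    {"zip": "10001", "birth_year": 1980, "gender": "M"},
    {"zip": "10001", "birth_year": 1975, "gender": "F"},
    {"zip": "10002", "birth_year": 1980, "gender": "F"},
    {"zip": "10002", "birth_year": 1980, "gender": "F"},
    {"zip": "10001", "birth_year": 1975, "gender": "M"},
]

# Add attributes one at a time and watch how many records become unique.
for attrs in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

In this toy table nobody is unique on ZIP code alone or on ZIP code plus birth year, but four of the six records become unique once gender is added. Each extra attribute splits people into smaller groups, and that is the intuition behind the study's much larger claim.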

Researchers fed the new machine learning tool public information on more than 11 million individuals, obtained from 210 different data sets from five sources, including the US Census Bureau.

How easily could you be spotted in anonymized data? The Computational Privacy Group at Imperial College London has also created a tool to check how likely it is that you’d be correctly re-identified in anonymous data sets. (And in case you’re wondering, a note on the site says the demo runs only in your browser and they don’t collect your info.)