In information security, we talk a lot about data breaches but not quite as much about the vast volumes of user data collected with permission.
There’s a large marketing industry built around predictive analytics, using collected data to predict consumer behavior or to directly influence it. Beyond that, there are other ‘big data’ industries in health care, climate science, user experience design and more. Don’t forget about all the data collected by social media.
Recently, Facebook got egg on its… well… face for conducting a huge sociological experiment on hundreds of thousands of users without their knowledge. With all of this data being collected, privacy is a growing concern. It’s certainly not a new concern, however.
One proposed solution to the privacy concerns is the process of ‘data de-identification,’ which is just a fancy way of saying ‘anonymization,’ but it’s by no means a settled standard.
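To make the idea concrete, here is a minimal sketch of what naive de-identification often looks like in practice: direct identifiers are stripped or replaced with a pseudonym, while so-called quasi-identifiers are left behind. All field names and values here are hypothetical, chosen purely for illustration.

```python
import hashlib

# Hypothetical record; the fields are illustrative, not from any real dataset.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "02139",
    "birth_year": 1985,
    "diagnosis": "hypertension",
}

DIRECT_IDENTIFIERS = {"name", "email"}

def deidentify(rec, salt=b"secret-salt"):
    """Drop direct identifiers, replacing them with a salted hash (a pseudonym).

    Note that quasi-identifiers like zip and birth_year are left intact --
    which, as we'll see, is exactly what re-identification attacks exploit.
    """
    out = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
    source = "|".join(str(rec[k]) for k in sorted(DIRECT_IDENTIFIERS))
    out["pseudonym"] = hashlib.sha256(salt + source.encode()).hexdigest()[:12]
    return out

print(deidentify(record))
```

The salted hash lets the data holder link a person’s records to each other without storing the name, but it does nothing to hide the remaining attributes.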
Recently, two competing viewpoints were expressed in academic papers on the topic (you can read a summary and link to the original documents here.) While these two papers focus on the privacy aspect of the de-identification process, only one gives more than a nod to the threat modeling exercise that’s necessary to truly understand the risk of identifying individuals from “de-identified” data.
Both acknowledge that, in some cases, de-identification fails. Those cases include a sufficiently motivated actor, misuse of the de-identification tools at hand, and the use of a second or third dataset. The evaluation of such scenarios in the real world is, fundamentally, a threat modeling exercise.
In such an effort, one should ask and answer questions like, “How many motivated threat actors exist, and who are they?” or “How accessible are additional datasets to those threat actors?”
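The second-dataset failure mode is worth spelling out, because it is the classic one: a “de-identified” dataset is joined against a public auxiliary dataset on the quasi-identifiers both share. The sketch below is a toy linkage attack under assumed data; the datasets, names, and field choices are invented for illustration.

```python
# "De-identified" records: direct identifiers removed, quasi-identifiers kept.
deidentified = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
]

# Public auxiliary dataset (think: a voter roll) carrying names
# alongside the same quasi-identifiers.
voter_roll = [
    {"name": "Jane Doe", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def reidentify(records, auxiliary):
    """Join the datasets on quasi-identifiers; a unique match recovers an identity."""
    hits = []
    for rec in records:
        key = tuple(rec[q] for q in QUASI_IDENTIFIERS)
        matches = [a for a in auxiliary
                   if tuple(a[q] for q in QUASI_IDENTIFIERS) == key]
        if len(matches) == 1:  # unique match => identity recovered
            hits.append({"name": matches[0]["name"], "diagnosis": rec["diagnosis"]})
    return hits

print(reidentify(deidentified, voter_roll))
```

Notice that no hashing was broken and no tool was misused; the attack needs only a motivated actor and access to the auxiliary data, which is exactly what the threat-modeling questions above try to quantify.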
We can’t ignore that human error, whether through tool misuse or otherwise, is a common occurrence, and frankly, so is intentional abuse. The academic discussion of the possibility of de-identification necessarily ends in a stalemate precisely because it’s academic.
The Cavoukian/Castro paper diligently points out why previous high-profile re-identifications of “de-identified” data would fail today, arguing that the circumstances that enabled them have since been addressed. These include things like the introduction of HIPAA increasing controls on a secondary dataset that was used to re-identify individuals, and the paper goes so far as to blame poor implementations of de-identification processes for other cases.
The counterpoint from Narayanan/Felten rightly identifies the flawed logic in such arguments as a ‘penetrate-and-patch’ mentality, in which the probability of new exploits is roundly ignored.
“Systems built on a penetrate-and-patch principle tend to fail repeatedly, not only because attackers are always discovering new tricks, but also because such systems are built on a foundation of sand.”
Of course, unpredictable novelty and shifting foundations are the norm across information security, so this result is hardly surprising. If you read only the conclusions of the two papers, without the up-front rhetoric, you’ll find that they largely agree. Guess which quote is from which paper:

“While it is not possible to guarantee that de-identification will work 100 per cent of the time, it remains an essential tool that will drastically reduce the risk of personal information being used or disclosed for unauthorized or malicious purposes.”

“These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances … Instead of looking for a silver bullet, policy makers must confront hard choices.”
Ultimately, there’s a bit of ‘glass half-empty or half-full’ going on here. Regardless of your point of view, there’s some water to drink if you’re thirsty.