With all the conversations about Google and its privacy policy changes, as well as the ongoing discussion about how much risk the Zappos breach really poses, the concept of PII seems like a topical item for this blog post. I like to proceed the way the King tells the White Rabbit to: “Begin at the beginning,” the King said gravely, “and go on till you come to the end: then stop.”

What is PII? For those of us in IS, it is generally rendered as “Personally Identifiable Information”: information “that can be used to uniquely identify, contact, or locate a single person or can be used with other sources to uniquely identify a single individual.” One distinction that can get confusing just from the wording is that data you consider private (religion? sexual orientation? scariest movie?) might not be something that can be used to identify you. I want to highlight the piece about being used with other sources, because that’s a really important part of the dialog.

Why do I care? There are lots and lots of uses for this kind of data, from the commonly thought of stalking or identity theft to more mundane uses such as targeting advertising to what are perceived as your preferences and locality. Some of the uses you might not object to on the surface (a 40% off coupon for your favorite local restaurant), but others (a potential partner evaluating you based on your car, house and the stores you frequent, let alone identity theft, fraud, being socially engineered or blackmailed) might not be things you want to contemplate.

Isn’t PII just your name, address and credit card data? No. In fact, depending a lot on where you live, what is legally considered PII can vary quite a bit. On top of that, there’s a lot of concern about information that doesn’t specifically qualify as PII but that, when combined with information often readily available about us (because we share it on Twitter, on Facebook or in a blog), could theoretically be used to de-anonymize you. This refers back to the part of the definition about being used with other sources, where data combined with other data can be used to uniquely identify you. To use myself as an illustration, if you do a quick Google search on my name (which would count as PII), you see that I’m a runner, a member of PMI, on Facebook, female, and an employee of Tripwire. If you add to that where Tripwire’s offices are, you have just taken a leap that puts you in my hometown without anyone providing a zip code. And that’s on the first page of a Google search with no effort at all. I can make this a nightmare situation just by introducing location data, which many GPS-equipped devices (including your mobile phone) provide all the time. Because of the amount of data about each of us on the internet, it can be easy to piece things together to get a lot of information that could lead you to a single individual, even if you don’t have their name, any ID number, IP address, license (car or human), biometrics, full credit card number, date of birth or birthplace.
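To make the "combined with other sources" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records, the field names and the helper functions `candidates` and `narrowing` are made up and do not describe any real person or tool. It simply counts how many people still match as each individually harmless attribute is added.

```python
# Hypothetical, made-up records: none of these people or values are real.
PEOPLE = [
    {"id": 1, "sex": "F", "hobby": "running", "employer": "Acme", "city": "Portland"},
    {"id": 2, "sex": "F", "hobby": "running", "employer": "Globex", "city": "Portland"},
    {"id": 3, "sex": "F", "hobby": "cycling", "employer": "Acme", "city": "Portland"},
    {"id": 4, "sex": "M", "hobby": "running", "employer": "Acme", "city": "Portland"},
    {"id": 5, "sex": "F", "hobby": "running", "employer": "Acme", "city": "Seattle"},
    {"id": 6, "sex": "M", "hobby": "cycling", "employer": "Globex", "city": "Seattle"},
    {"id": 7, "sex": "F", "hobby": "cycling", "employer": "Globex", "city": "Seattle"},
    {"id": 8, "sex": "M", "hobby": "running", "employer": "Globex", "city": "Portland"},
]


def candidates(records, known):
    """Return the records that match every known attribute."""
    return [r for r in records if all(r.get(k) == v for k, v in known.items())]


def narrowing(records, known):
    """Return (attribute, remaining candidate count) as each attribute is added in order."""
    counts = []
    so_far = {}
    for key, value in known.items():
        so_far[key] = value
        counts.append((key, len(candidates(records, so_far))))
    return counts


# None of these facts is a name or an ID number.
KNOWN = {"sex": "F", "hobby": "running", "employer": "Acme", "city": "Portland"}

if __name__ == "__main__":
    for attribute, remaining in narrowing(PEOPLE, KNOWN):
        print(f"after adding {attribute}: {remaining} candidate(s) left")
```

In this toy population, no single attribute identifies anyone, but the four together leave exactly one record. Real populations are far larger, so treat this as a sketch of the principle rather than a measure of actual risk.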

Doesn’t that mean everything is PII? The answer to this question is also no. Some data is obviously higher risk than other data. This includes the items I spelled out above. But knowing I’m a member of PMI or a runner isn’t the kind of thing I would personally be uncomfortable sharing in almost any situation. That makes those elements low value data on me. In this case, part of what makes my Google search so fun is that I gave it one of the high value data bits about me (my name), which happens to be pretty uncommon. If I were a “J. Smith” it would be a LOT harder to get the kind of detailed data we saw on me, because the high value bit in that case is still pretty common in the population at large.

Is all data shared about me bad? Nope. In fact, there are situations where we want some data shared, and want to be lumped into large groups, such as medical research. If I personally exhibit some kind of medical problem, I’d want the options presented to me to be as advanced as possible, and as tailored to me as possible. So, it’s in my long term best interest to be part of those research groups. Obviously, I don’t want my name to show up on a list because of it, but that’s more about restricting access to the data than it is about having the data recorded somewhere. A quote I like here is “The ability to re-identify someone in a poorly anonymized data set still depends on the existence of another source of PII to link to this information.”

To wrap this up, what are the industry recommendations for finding the right balance? There are a lot of basic ones. A few that everyone should be using:

  • Companies should keep the least amount of data necessary and appropriate to the use, and purge often.
  • Handle the differing levels of risk to the data appropriately.
  • Apply these protections not just at the company you choose to have a relationship with, but also at any 3rd parties that receive the data.
  • Provide security safeguards.
  • Reduce aggregation – this means that multiple places should not automatically connect my profiles (a simplified example is a Tweet being visible in both Google and Bing).
  • Restrict access to PII-style data, and use it as little as possible. (If my data is being pushed and pulled to many places a bunch of times, there is more opportunity for a failure to protect it.)
  • Last, but not least, review the privacy policies of any 3rd party you share data with.
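
As a rough illustration of the first recommendation (keep the least data necessary, and purge often), here is a minimal Python sketch. The field names, the one-year retention window and the helpers `minimize` and `purge` are all assumptions made up for this example; the right fields and retention period depend on your own use and legal requirements.

```python
from datetime import datetime, timedelta

# Hypothetical field names and retention window, chosen only for illustration.
NEEDED_FIELDS = {"order_id", "email", "created_at"}
RETENTION = timedelta(days=365)


def minimize(record):
    """Keep only the fields the stated use actually needs."""
    return {k: v for k, v in record.items() if k in NEEDED_FIELDS}


def purge(records, now):
    """Drop records older than the retention window."""
    return [r for r in records if now - r["created_at"] <= RETENTION]


# Made-up sample data.
RAW = [
    {"order_id": "A1", "email": "a@example.com", "date_of_birth": "1980-01-01",
     "card_number": "0000", "created_at": datetime(2011, 6, 1)},
    {"order_id": "B2", "email": "b@example.com", "date_of_birth": "1975-05-05",
     "card_number": "1111", "created_at": datetime(2010, 1, 1)},
]

if __name__ == "__main__":
    now = datetime(2012, 2, 1)
    kept = purge([minimize(r) for r in RAW], now)
    for record in kept:
        print(record)
```

The idea is that data you never stored, or have already purged, can't be leaked or aggregated later. Running `minimize` at collection time and `purge` on a schedule is one simple way to arrange that.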