
Organizations have secrets.

It doesn’t necessarily mean that they’re up to no good; they may simply not want all of their corporate plans or internal communications laid bare for the outside world to see.

But what’s to stop someone from simply cutting-and-pasting an internal email and sending it from a webmail account to an external journalist or dumping it anonymously on Pastebin?

Well, whichever side of the leaking problem you’re standing on, there’s something of which you need to be aware.

Even the shortest section of text can contain a hidden “fingerprint” that could identify the source who has leaked the information.

Take a look at these two sentences. Can you tell which one contains a simple secret identifier that could potentially identify a leaker?

This is a test‌.

This is a test.

The first sentence has a zero-width non-joiner tucked in just before the full stop. Zero-width characters such as the zero-width non-joiner and the zero-width space make it possible to embed invisible fingerprints into text that survive the cut-and-paste process.
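If you want to check a snippet yourself, the sketch below lists any zero-width characters in a string along with their positions. It is a minimal illustration rather than a complete detector: the set of code points is an assumption covering the two characters named above plus two close relatives, and the function name find_zero_width is made up for this example.

```python
import unicodedata

# Code points treated as "zero-width" for this sketch (an assumed, incomplete list).
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (also used as a byte order mark)
}


def find_zero_width(text):
    """Return (index, Unicode name) for every zero-width character found in text."""
    return [
        (index, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if char in ZERO_WIDTH
    ]


marked = "This is a test\u200c."  # same sentence with a hidden non-joiner
plain = "This is a test."

print(find_zero_width(marked))
print(find_zero_width(plain))
```

The two strings look identical on screen, but only the first returns a hit.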

As British researcher Tom Ross details, it’s possible to use the technique to embed any message you like invisibly into a string of text: convert each character of the message into binary, then represent each binary digit with a zero-width character.
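The encoding described above can be sketched in a few lines of Python. This is not Ross's own script: the choice of which zero-width character stands for 0 and which for 1, the use of UTF-8 bytes, the position of the hidden payload, and the function names are all assumptions made for illustration.

```python
ZERO = "\u200b"  # zero-width space stands for binary 0 (assumed mapping)
ONE = "\u200c"   # zero-width non-joiner stands for binary 1 (assumed mapping)


def to_invisible(message):
    """Turn a message into a run of zero-width characters, 8 per UTF-8 byte."""
    bits = "".join(format(byte, "08b") for byte in message.encode("utf-8"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def embed(visible, message):
    """Hide the message after the first visible character (arbitrary position)."""
    return visible[:1] + to_invisible(message) + visible[1:]


def extract(text):
    """Recover a hidden message by reading the zero-width characters back as bits."""
    bits = "".join("1" if char == ONE else "0" for char in text if char in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore any trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


# "alice" is a hypothetical username used as the fingerprint.
marked = embed("This is a test.", "alice")
print(len("This is a test."), len(marked))
print(extract(marked))
```

The marked string displays the same as the original in most applications, yet it is 40 characters longer: one invisible character for each bit of the five-letter name.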

Ross describes how he found a practical purpose for the trick after discovering that someone was leaking discussions from a private video gaming message board:

The security of the site seemed pretty tight so the theory was that a logged-in user was simply copying the announcement and posting it elsewhere. I created a script that allowed the team to invisibly fingerprint each announcement with the username of the user it is being displayed to.

Within a few hours the text had been shared elsewhere with a zero-width string attached. The username of the culprit was correctly identified and they were banned; a successful project!

And as security expert Zach Aysan described a few months ago, the presence of a single non-visible character in even the shortest text might be enough to identify who the leaker in your organization is.

And it’s not as though the fingerprinting is easy for the typical user to spot. Many applications will render text containing a zero-width fingerprint without any indication that secret characters are contained within the text. Others may replace the characters with spaces or an unidentified character symbol.

The implications of this are serious, of course, if you are a whistleblower or a journalist committed to revealing government secrets from within an authoritarian regime.

You may not realize that by sharing information with a friendly journalist you could potentially be exposing yourself as a source and putting yourself in peril.

And many journalists will not realize that they might be safer putting any received text through a filter that strips out non-whitelisted characters, or taking the time to retype the text themselves by hand.
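A filter of that kind could be as simple as the sketch below, which keeps only characters on an explicit allow list. The list here (ASCII letters, digits, punctuation, space, tab and newline) is an assumption chosen for brevity; it would also strip legitimate accented letters and non-Latin scripts, so a real newsroom tool would need a broader list. The name strip_non_whitelisted is hypothetical.

```python
import string

# Assumed allow list: plain ASCII text only. Anything else is dropped.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n\t")


def strip_non_whitelisted(text):
    """Return text with every character outside the allow list removed."""
    return "".join(char for char in text if char in ALLOWED)


received = "This is a test\u200c."  # contains a hidden zero-width non-joiner
cleaned = strip_non_whitelisted(received)

print(len(received), len(cleaned))
print(cleaned == "This is a test.")
```

An allow list is used rather than a block list of known zero-width characters because an invisible character that nobody thought to block would otherwise pass straight through.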

Even those methods won’t prevent other types of fingerprinting such as deliberate spelling mistakes and small changes to text that no-one is likely to notice.

If you’re a journalist wishing to protect your sources, you should, as Aysan describes, avoid releasing excerpts and raw documents at all. You can never be certain there isn’t a clue hidden somewhere among the words that might point towards the leaker.


Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.