Backups
Perhaps the easiest way to keep a clean copy of your data. Back up your data, run it through a checksum, encrypt it, run it through another checksum, keep it offline, and you’re pretty much golden. Having multiple backups is never a bad idea because backup media sometimes fail.
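To make the checksum step concrete, here is a minimal sketch using Python's standard hashlib module. It assumes SHA-256 as the checksum, and the function names are made up for illustration. The idea is to record the digest when the backup is made and compare it again before you rely on the backup.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large backups do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(backup_path, recorded_digest):
    """Compare the backup's current digest with the one recorded earlier."""
    return sha256_of_file(backup_path) == recorded_digest
```

Keep the recorded digest somewhere other than next to the backup itself, otherwise whoever alters the backup can alter the digest too.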
Maintenance
Always a good idea to ensure your storage media are acting as they should. The reasons why storage media fail are a bit technical (worn-out flash cells, bad sectors, and file system corruption, for example), but there are IT professionals who can ensure your disks are operating as they should. If physical storage starts acting wonky, data integrity could easily suffer.
Audit
Worth a manual check every so often. Let’s face it, we are creatures of habit, and there are certain things we come to expect. For example, if you’re a 9-5 user, but an audit shows that file access is happening at 3 am, it’s probably something you should investigate to ensure the data you rely on isn’t being monkeyed with. Also, keep an eye out for unnecessary or unprotected duplicates and temporary files. They could be sources of leaks which, with some additional trickery, may eventually compromise your data’s integrity.
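The 3 am example can be turned into a small check. This is only a sketch: the log format (timestamp, user, file), the sample entries, and the 9-to-5 window are all made-up assumptions, and a real audit would read these events from whatever your operating system or file server records.

```python
from datetime import datetime, time

WORK_START = time(9, 0)   # assumed start of normal working hours
WORK_END = time(17, 0)    # assumed end of normal working hours


def off_hours_events(events, start=WORK_START, end=WORK_END):
    """Return the events whose timestamp falls outside the window [start, end)."""
    return [event for event in events if not (start <= event[0].time() < end)]


# Hypothetical access log entries: (timestamp, user, file).
access_log = [
    (datetime(2024, 3, 4, 10, 15), "alice", "report.xlsx"),
    (datetime(2024, 3, 5, 3, 2), "alice", "report.xlsx"),
]

suspicious = off_hours_events(access_log)
for when, user, path in suspicious:
    print(f"Check this: {user} touched {path} at {when:%Y-%m-%d %H:%M}")
```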
Timestamping
About as straightforward as it gets. Metadata can tell you a lot about your data, but simple timestamps within the file may not be enough. An experienced attacker can usually change a timestamp and leave the rest of the file intact. Therefore, if you’re going to use timestamping as a means to ensure data integrity, you’re going to need a combination of techniques like digital signatures and hash functions.
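One way to picture that combination is to bind the timestamp to a hash of the content and then protect both with a key. The sketch below uses an HMAC (a keyed hash from Python's standard library) as a stand-in for a true digital signature, so it only works between parties who share the secret key. The record layout and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import json
import time


def stamp(data: bytes, key: bytes, now=None) -> dict:
    """Create a record tying a timestamp to the SHA-256 hash of the data."""
    record = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "timestamp": int(time.time()) if now is None else int(now),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    # The tag covers both the hash and the timestamp.
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_stamp(data: bytes, record: dict, key: bytes) -> bool:
    """Check that neither the data nor the timestamp has changed."""
    claimed = {"sha256": record["sha256"], "timestamp": record["timestamp"]}
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected, record["tag"])
    data_ok = hashlib.sha256(data).hexdigest() == record["sha256"]
    return tag_ok and data_ok
```

Changing the file or editing the timestamp makes verify_stamp return False, because the tag no longer matches.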
Limit Access
Yes, it’s ridiculously simple. If users can’t access it, they can’t monkey with it. This includes limiting physical access.
Digital Signatures
Digital signatures build on the same idea as a checksum. The sender computes a one-way hash of the data and signs that hash with a private key. The recipient uses the matching public key to check the signature against a hash of the data as received. Unless you’re using a very poor cryptographic algorithm, it’s very hard for an attacker to derive your private key from your public key. Digital signatures are very handy, particularly during data transfer, because they can provide data integrity, authentication, and even non-repudiation.
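The hash-then-sign idea can be sketched with textbook RSA in plain Python. Treat this strictly as a toy: the key is built from two small, publicly known primes, there is no padding, and all names are made up for illustration. Real systems should use a vetted cryptographic library and properly generated keys.

```python
import hashlib

# Toy key pair built from two well-known Mersenne primes.
# This is far too small and too predictable for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sign the hash of the message with the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recover the signed hash with the public exponent and compare."""
    return pow(signature, E, N) == digest_as_int(message)
```

Anyone holding the public values N and E can call verify, but only the holder of D can produce a signature that passes.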
Cyclic Redundancy Checks
Useful for spotting errors during communications but not great for spotting data that has been intentionally altered. It’s a pretty technical solution that has specific uses. A CRC involves no secret, so anyone who alters the data can simply recompute a matching code. Do not use this technique unless the task calls for it and it is used in conjunction with other means of security and integrity checks.
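A short sketch with zlib.crc32 from Python's standard library shows both sides of this. The messages and the helper name are made up for illustration.

```python
import zlib


def passes_check(data: bytes, crc: int) -> bool:
    """Return True if the data matches the CRC-32 value sent along with it."""
    return zlib.crc32(data) == crc


message = b"transfer 100 to account 42"
checksum = zlib.crc32(message)

# Accidental corruption: a single flipped bit changes the CRC, so it is caught.
corrupted = bytes([message[0] ^ 0x01]) + message[1:]
print(passes_check(corrupted, checksum))       # False

# Deliberate tampering: the attacker recomputes the CRC for the altered data,
# and the check passes because no secret is involved.
forged = b"transfer 900 to account 66"
forged_checksum = zlib.crc32(forged)
print(passes_check(forged, forged_checksum))   # True
```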
Salting
A technique that should be used in conjunction with hashing. Say your data is not that complex, like a short series of numbers. An attacker may figure out which cryptographic algorithm you are using and try to guess what the original data is. With enough time, the attacker would figure out what the original data is by hashing every likely value and comparing the results with your hash. A way to prevent that is to add some random value to the data before hashing it. That’s salting. Great approach, but it comes with a potentially serious drawback. If you lose the record of the random values you added before hashing, there is no way for you to recompute and compare legitimate hashes.
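Here is a minimal sketch of salting with Python's standard library. It assumes SHA-256 and a 16-byte random salt, and the function names are made up for illustration. Note that the salt is returned alongside the hash, because you need it again to verify.

```python
import hashlib
import hmac
import secrets


def salted_hash(data: bytes, salt: bytes = None):
    """Hash the data with a salt; generate a random salt if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    return salt, hashlib.sha256(salt + data).hexdigest()


def matches(data: bytes, salt: bytes, expected_digest: str) -> bool:
    """Recompute the salted hash and compare it with the stored digest."""
    _, digest = salted_hash(data, salt)
    return hmac.compare_digest(digest, expected_digest)


# Store both values; losing the salt means the hash can never be rechecked.
salt, stored = salted_hash(b"1234")
print(matches(b"1234", salt, stored))
```

Because each salt is random, hashing the same simple value twice produces two different digests, which is what defeats a precomputed list of hashes.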
Blockchain
This revolutionary technology is changing the way we manage our data. The decentralized approach and the availability of public ledgers are what make data integrity possible with blockchains. Sure, you can try to tamper with a ledger and thus alter the integrity of the data, but everybody will see something is amiss. In practical terms, users will easily identify that the data has been monkeyed with and refer back to the publicly distributed ledger to get the right values.
Briefly mentioned in the last article was tokenization. Much like encryption, tokenization on its own does not offer data integrity. Instead, both offer data security. But when used in conjunction with some of the techniques above, they make it a whole lot harder to mess with the data. For example, there is nothing preventing you from hashing an encrypted file. The encryption keeps your data secure, while the hashing maintains its integrity. The same can be said about all the tokens in possession of a payment processor. The token keeps the personally identifiable information secure, but ensuring strong access control measures means the data maintains integrity. You’ll see from the PCI Security Standards Council (a group responsible for setting guidelines for credit card payment processing) that “strong access control measures” play a highly important role.
One thing that users must be aware of is that data integrity can get really complicated really fast, a fact which may scare off some people. Are there simple solutions out there that serve a lot of interests? I personally think things like hashing, timestamping, and digital signatures are great in addition to the “boring” stuff like offline backups, regular maintenance, audits and limiting access. At the individual user level, there is nothing preventing any one of us from doing these things, even on our personal files.
I also think blockchain technology offers an incredible decentralized approach, and I only see its prevalence for data integrity applications growing. Other solutions, like configuring which integrity checks your disks and databases use, get complicated, so leave that to the pros. But don’t use that as an excuse for not doing the simple stuff that you can easily do on your own.
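For the curious, the reason tampering with a ledger stands out can be sketched in a few lines. This only illustrates the hash-linking part of a blockchain: each entry stores the hash of the one before it. It leaves out everything that makes a real blockchain decentralized (distribution, consensus, signatures), and the block layout and names are assumptions for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps([index, data, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(entries):
    """Link the entries so each block commits to the one before it."""
    chain, prev = [], GENESIS
    for index, data in enumerate(entries):
        current = block_hash(index, data, prev)
        chain.append({"index": index, "data": data,
                      "prev_hash": prev, "hash": current})
        prev = current
    return chain


def chain_is_valid(chain):
    """Recompute every hash and check every link."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev = block["hash"]
    return True
```

Editing the data in any block changes its hash, which breaks that block and the link from every block after it, so anyone holding a copy of the ledger can spot the change.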
Data Integrity is a problem
As mentioned in the previous article, the data integrity issue becomes an economic problem really freaking fast. If you’re spending valuable resources to ensure your data is legitimate, those valuable resources (like time and money) can’t be used for your mission-critical operations. You know, those things like making money and saving lives. So remember, it’s not only about keeping your data secure, it’s about making sure your data hasn’t been manipulated, something which regrettably is proving to be easier and easier.