Bringing Clarity to Really Really Big Data: A Case for AI and Machine Learning to Help Crunch and Protect Our Data


It's funny how kids have an affinity for the toys we enjoyed as kids, like Legos. They will spend hours creating the biggest “thing,” often prompting a parent’s near-universal response: “Johnny! That is the biggest tower I have ever seen! Great job!” Children (and we) love Legos because they foster imagination, offering a limitless way to create something “gigantic!” In a more practical sense, Legos can also give us a good perspective on the important concept of “scale.”

As counsellors and consultants, we find it challenging to convey the “scale” of data, information and network security problems. Unfortunately, “layperson” directors and officers of public companies, along with executives in government, tend to view “scale” (as it pertains to data protection) as a bad thing, and even a scary one. Part of the challenge is that there are few practical ways to explain to people in these positions that an organization’s security operations center may receive upwards of one million “incidents” every day and must, at the same time, adequately investigate the potential peril in each of them and reasonably ensure that not a single one slips through the cracks. “Big data” analytics as a business tool is fantastic because we can translate the figures into, say, dollars. But “big data” is also a cybersecurity requirement (i.e., using network traffic, data, sensors and other feeds to help us determine what is “normal” in our network and what is not), and cybersecurity data is not as easy to translate into something we can readily conceptualize, like, say, dollars! Until we understand the “scale” of what we are dealing with, it will be very hard to address the security issues associated with cyberspace. So how much “big data” do we produce? And how do we respond to it? These are basic questions that need to be better understood before the much tougher question – how do we protect our data? – can be addressed.

How Much Data Do We Produce?

Let’s start with this basic concept: today, “data” is everything. Both personally and professionally, much of our lives have been converted into a bunch of zeroes and ones. Our reliance on data has never been greater and is only going to grow, especially with the explosion of the Internet of Things (IoT). And the amount of data we produce (good, bad and junk) continues to grow at breakneck speed, taking up space on global networks. That also means anyone able to control even a fraction of this data flow could unleash a wicked DDoS attack. So how much data exactly is traveling through the networks, nearly at the speed of light? According to a June 2016 Cisco white paper, we are in the “zettabyte era” in terms of global IP traffic. Great! What is a zettabyte?

Back to Basics

To unpack that question, we need to start with a few basics, the first being that humans have cognitive limitations. Those limitations become evident when we try to understand very large (or very small) numbers. We can use notation to represent large numbers, such as 1 ZB equalling 1 x 10²¹ bytes. But does that notation mean anything to you? Write one million as 1 x 10⁶ and it may mean something to you, but only because we have a better understanding of what “one million” means in practical terms. Let us conceptualize “one million” using dollars to create a reference point: if your salary is $50,000 a year and you work for 20 years without spending a cent, you will accumulate one million dollars. Now, using the table below, we will “scale up” your salary:

Base Salary	Factor	Adjusted Salary per Year	Years	Accumulation (Notation)	Accumulation (Written Out)
$50,000 per year	1	$50,000	20	$1 x 10⁶	$1,000,000
	10	$500,000	20	$1 x 10⁷	$10,000,000
	100	$5,000,000	20	$1 x 10⁸	$100,000,000
	1,000	$50,000,000	20	$1 x 10⁹	$1,000,000,000
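As a quick check of the table, the sketch below (in Python, chosen only for illustration) multiplies the $50,000 base salary by each factor over 20 years and prints the total in both notations. The function names are our own and are not taken from any source.

```python
# Reproduce the salary "scale-up" table: base salary x factor x years.
BASE_SALARY = 50_000  # dollars per year, from the example above
YEARS = 20            # years worked, spending nothing


def accumulation(factor: int) -> int:
    """Total dollars saved after YEARS at BASE_SALARY * factor per year."""
    return BASE_SALARY * factor * YEARS


def scientific(value: int) -> str:
    """Write a value in '1 x 10^n' style (exact only for powers of ten)."""
    exponent = len(str(value)) - 1
    mantissa = value / 10**exponent
    return f"{mantissa:g} x 10^{exponent}"


for factor in (1, 10, 100, 1_000):
    total = accumulation(factor)
    print(f"factor {factor:>5,}: ${BASE_SALARY * factor:,}/yr -> "
          f"${scientific(total)} = ${total:,}")
```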

What looks nicer on your bank statement: $1 x 10⁹ or $1,000,000,000? Both are the same, but those zeros at the end sure look nice, don’t they? More important than looking nice, the written-out version (with all the zeros) helps us humans understand not only the number but also what it represents just a little bit better. Why? Because we use words to represent values, and those values must be translated into something tangible that we can use in our daily lives. In cyberspace, this challenge becomes even more difficult because of scale, notation and our cognitive limitations.

Conceptualizing a Zettabyte

We know what a billion (10⁹) is, but what do we call 10²¹? That would be a sextillion. Do you feel better now that you have a name for it? We did not think so. Imagine for a moment that we could capture, in a single snapshot, all of the global IP traffic in 2016: roughly one zettabyte. What could we compare that to? In the table below, we rewrote the figures in a comparative manner, along with some examples to help you conceptualize what we are actually dealing with. Some notes: we will use 1.28 ZB in this example (some figures are rounded and approximate), and for mathematical ease, we will use decimal values (1,000), not binary ones (1,024), when writing out numbers in full. No need to fuss over this detail, and for all the tech talkers out there, remember: more people speak “non-tech” than tech. Make your life, and theirs, easier by avoiding jargon and cumbersome detail. Try to picture the following in your head:

Digital Comparisons
128 gigabytes	128,000,000,000 bytes	About 32 movies in HD
1.28 zettabytes	1,280,000,000,000,000,000,000 bytes	Global IP traffic in 2016
Length Comparisons
128 metres	128,000,000,000 nanometres	Length of an American football field with two extra end zones
1.28 terametres*	1,280,000,000,000,000,000,000 nanometres	Distance from Earth to Saturn

*Note: 1 terametre equals 1,000,000,000 kilometres. If the Earth-to-Saturn comparison is too hard to conceptualize, think about it like this: at a brisk 5 km/h, it would take you about 29,000 years of non-stop walking (several hundred lifetimes) to cover that distance on foot. And if that is too difficult to conceptualize, perhaps this is easier: 128 GB is to 1.28 ZB what a stack of one hundred $20 bills ($2,000) is to the US federal debt of $20 trillion. And assuming the federal debt grows at the same rate as global IP traffic, by the 2020 US Presidential election we’ll be discussing a $46 trillion figure.
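All of these comparisons rest on the same ratio. The sketch below, using decimal prefixes as the article does, checks that 1.28 ZB is 10¹⁰ times 128 GB (the same ratio as 1.28 terametres to 128 metres), converts the Earth-to-Saturn distance into walking time, and shows which dollar amount keeps that 1:10¹⁰ ratio against $20 trillion. The 4 GB HD movie size, the 5 km/h walking pace and the 80-year lifespan are our own assumptions; the walking results scale directly with them.

```python
# Check the scale comparisons using decimal (SI) prefixes, as the article does.
GB = 10**9         # bytes in a gigabyte (decimal, not 2**30)
ZB = 10**21        # bytes in a zettabyte (decimal, not 2**70)
HD_MOVIE = 4 * GB  # assumed size of one HD movie

small_data = 128 * GB       # 128 GB
global_traffic = 1.28 * ZB  # the article's 2016 global IP traffic figure

print(f"128 GB holds about {small_data // HD_MOVIE} HD movies")
ratio = global_traffic / small_data
print(f"1.28 ZB is {ratio:.0e} times 128 GB")  # same ratio as 1.28 Tm to 128 m

# Length analogy: walk 1.28 terametres (1.28 billion km).
distance_km = 1.28e12 / 1_000
WALKING_SPEED_KMH = 5  # assumed brisk walking pace
LIFETIME_YEARS = 80    # assumed human lifespan
years_walking = distance_km / WALKING_SPEED_KMH / 24 / 365.25
print(f"About {years_walking:,.0f} years of non-stop walking, "
      f"or {years_walking / LIFETIME_YEARS:,.0f} lifetimes")

# Money analogy: which amount has the same 1:10^10 ratio to $20 trillion?
DEBT = 20e12
print(f"${DEBT / ratio:,.0f} is to ${DEBT:,.0f} as 128 GB is to 1.28 ZB")
```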

Conceptualizing the Cybersecurity Alert Process

So now that we have a better grasp of the size of the data production and flow problem, we need to think about managing it. Unsurprisingly, when asked to identify their top incident response challenges, 36% of cybersecurity professionals surveyed said “keeping up with the volume of security alerts.” If we hold on to the $20 trillion comparative, our task would be to sift through about $55 billion per day (one day’s share of $20 trillion), trying to figure out how much of it is legit, how much has been stolen, how much has been laundered, and how much is funny money. Fun times! In a 2014 interview with 60 Minutes, FBI Director James Comey gave a very useful description of the problem (in reference to cyberattacks originating from China):

“Actually, [they are] not that good. I liken them a bit to a drunk burglar. They're kicking in the front door, knocking over the vase, while they're walking out with your television set. They're just prolific. Their strategy seems to be: We'll just be everywhere all the time. And there's no way they can stop us.”

The key line is “we’ll just be everywhere all the time,” because it is actually happening! In the same survey, 42% said their organizations ignore a significant number of security alerts because they cannot keep up with the volume. And of course, being overwhelmed carries an unintended danger of its own: the sense that the alerts are crying wolf too many times. But perhaps the more worrying figures are these: 34% said that between a quarter and half of alerts are ignored, 20% said half to three-quarters are ignored, and 11% said more than three-quarters of security alerts are ignored! Mamma mia, that’s a lot of front doors kicked in with little done about it afterwards!

Let’s go back to the $20 trillion comparative, where we have to sift through $55 billion per day. Translated using the “ignore” figures above: alerts tell us something funny is going on, but we are so overwhelmed that we do not bother to look at roughly $15 billion worth of daily alerts. That’s a lot of money being left on the table.

Sadly, this issue is nothing new. Ignoring alerts seems as commonplace as the alerts themselves. Worse, the Cisco 2017 Annual Cybersecurity Report reveals that less than half of legitimate alerts actually lead to some sort of correction, and that less than 1% of severe/critical alerts are ever investigated. In 2014, enterprises were dealing with 10,000 alerts per day; in 2016, government departments were dealing with 50,000 per day; and who knows how many we will be dealing with by the end of 2017 due to the IoT explosion. Unfortunately, despite good tips such as setting goals, getting the right information, and consolidating, we are still being overwhelmed because we have not addressed the “scale” issue. And oh yeah, did we mention that cybersecurity analysts may sometimes be able to perform only about 10 investigations per day?
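The dollar figures in this section can be reproduced with a few lines of arithmetic. The sketch below spreads the $20 trillion comparative over 365 days, then estimates the ignored share by weighting the lower bound of each survey range by the share of respondents who chose it. That weighting is our assumption, not a method stated in the article; it lands at just under $15 billion per day. The last lines show what 10,000 daily alerts imply at about 10 investigations per analyst per day.

```python
# Rough arithmetic behind the "$55 billion per day" and "$15 billion ignored" figures.
DEBT = 20e12        # the $20 trillion comparative
daily = DEBT / 365  # spread evenly across a year
print(f"Daily volume: ${daily / 1e9:.1f} billion")

# (share of respondents, LOWER bound of the fraction of alerts they ignore).
# Using lower bounds is our assumption; it gives a conservative estimate.
survey = [
    (0.34, 0.25),  # 34%: a quarter to half of alerts ignored
    (0.20, 0.50),  # 20%: half to three-quarters ignored
    (0.11, 0.75),  # 11%: more than three-quarters ignored
]
ignored_share = sum(share * lower for share, lower in survey)
print(f"Ignored share (conservative): {ignored_share:.1%}")
print(f"Ignored per day: ${daily * ignored_share / 1e9:.1f} billion")

# Analyst capacity versus alert volume, using the article's figures.
ALERTS_PER_DAY = 10_000          # 2014 enterprise figure
INVESTIGATIONS_PER_ANALYST = 10  # per analyst per day
print("Analysts needed to look at every alert: "
      f"{ALERTS_PER_DAY // INVESTIGATIONS_PER_ANALYST:,}")
```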
This is where artificial intelligence and machine learning are going to play a larger role (and why AI start-up firms focusing on cybersecurity issues may be in an incredible position to take advantage of the increasingly vulnerable state we are living in).
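Commercial AI/ML tools are far more sophisticated, but the core idea (learn a baseline of what is “normal,” then spend scarce analyst time on whatever deviates most from it) can be sketched in a few lines. The example below is a deliberately simple statistical stand-in, not machine learning: the `Alert` fields, the single outbound-traffic metric and the sample numbers are hypothetical, and the budget of 10 simply mirrors the roughly 10 investigations per analyst per day mentioned above.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Alert:
    host: str         # hypothetical field: the machine that raised the alert
    bytes_out: float  # hypothetical field: outbound traffic during the alert window


def baseline(history: list[float]) -> tuple[float, float]:
    """Learn what 'normal' outbound traffic looks like from past observations."""
    return mean(history), stdev(history)


def triage(alerts: list[Alert], history: list[float], budget: int = 10) -> list[Alert]:
    """Rank alerts by distance from the baseline and keep only `budget` of them."""
    mu, sigma = baseline(history)

    def score(alert: Alert) -> float:
        # z-score: how many standard deviations this alert sits from "normal"
        return abs(alert.bytes_out - mu) / sigma if sigma else 0.0

    return sorted(alerts, key=score, reverse=True)[:budget]


# Usage: 50 routine alerts plus one host sending far more data than usual.
history = [100, 110, 95, 105, 98, 102, 99, 101]
alerts = [Alert(f"host-{i}", 100 + i) for i in range(50)] + [Alert("db-01", 5_000)]
top = triage(alerts, history, budget=10)
print(top[0].host)  # db-01
```

A z-score on one metric is easy to fool; a production system would presumably combine many signals and keep updating its baseline as traffic changes, which is where machine learning earns its keep.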

What Does It All Mean?

It means that we have a lot of work to do. Without artificial intelligence and machine learning to help us with our cybersecurity challenge (something we think is really two challenges but one issue; hint: network security + information security = data security), we are going down a dark road. If somebody were able to command and control just 1% of global IP network traffic, the effects could be devastating. This idea may sound far-fetched, but perhaps it is not, especially when you consider how insecure IoT devices are (does your dishwasher come with a password?) and that the shift to mobile devices will not stop anytime soon, meaning more and more people will be connecting devices to Wi-Fi networks that are inherently insecure. These challenges will not get easier as we continue to produce data, especially when hackers say they can compromise most targets in about 12 hours. Therefore, we need as many tools as possible (such as AI/ML), but we also need to be smart and honest about what we are dealing with. Cybersecurity is a technology problem, but it is also a people problem, where we, the people, are still getting the basics wrong. Recognizing that we have cognitive limitations is an important step to getting ahead of adversaries and nefarious actors.

About the Authors:

Paul Ferrillo is counsel in Weil’s Litigation Department, where he focuses on complex securities and business litigation, and internal investigations. He also is part of Weil’s Cybersecurity, Data Privacy & Information Management practice, where he focuses primarily on cybersecurity corporate governance issues, and assists clients with governance, disclosure, and regulatory matters relating to their cybersecurity postures and the regulatory requirements which govern them.

George Platsis has worked in the United States, Canada, Asia, and Europe, as a consultant and an educator, and is a current member of the SDI Cyber Team (www.sdicyber.com). For over 15 years, he has worked with the private, public, and non-profit sectors to address their strategic, operational, and training needs in the fields of business development, risk/crisis management, and cultural relations. His current professional efforts focus on human factor vulnerabilities related to cybersecurity, information security, and data security by separating the network and information risk areas.

Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor and do not necessarily reflect those of Tripwire.
