Numbers, statistics, pie charts and survey results are everywhere – especially in the information security space. Yet have you ever finished reading a vendor whitepaper or a research institution’s annual security report and felt that the data presented made your spidey sense tingle? You were probably sensing a manipulation of statistics, an age-old practice.
This subject was first examined when Darrell Huff wrote the groundbreaking book “How to Lie with Statistics” more than 60 years ago, and the book has since become required reading in many college statistics classes. Most people will be shocked to find how easily data can be manipulated to leave the reader with a certain impression or to lead them to a particular conclusion.
My BSides presentation examines surveys first, noting that the surveys most commonly used to steer readers toward a particular conclusion are vendor-sponsored security reports. These reports are a pervasive industry example of using surveys to influence a purchaser’s perception of the efficacy of, or need for, a certain product.
Rows of numbers can overwhelm visual learners, so one quick way to make data easy to understand is to use colorful pie, bar and line charts. Excel makes it simple for anyone to turn drab statistics into eye-catching displays, but this ease also introduces new problems. Chart types are often incorrectly viewed as interchangeable, yet not every chart type is appropriate for every type of data.
The semi-attached figure is a situation in which one idea cannot be proven, so the author pulls the old bait-and-switch, stating a completely different idea and pretending it is the same thing. This can be seen in two areas: when security vendors are trying to sell a product or service, and when information security professionals are trying to communicate risk to management.
Lastly, the post hoc fallacy, often summarized as “correlation does not imply causation,” occurs when two data sets are presented and it is falsely implied that one caused the other. It is examined very closely for two reasons: it is the most common manipulation of data, and perhaps the most damaging, because it is easy to perpetrate, often by accident. It rears its ugly head in reports, surveys, risk analyses, Board reporting, attack attribution, and many other places.
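To make the post hoc fallacy concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the series names `security_spend` and `incidents`, the growth rates, the noise levels and the seed are all synthetic and describe no real organization. The two series are generated independently, so neither can cause the other, yet both simply grow over time. That shared trend alone is enough to produce a very high correlation, which mostly disappears once you compare month-to-month changes instead of raw levels.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def trending_series(rng, start, growth, noise, n):
    """Synthetic series: a straight-line trend plus independent Gaussian noise."""
    return [start + growth * i + rng.gauss(0, noise) for i in range(n)]


def differences(xs):
    """Month-to-month changes, which removes the shared upward trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


# Hypothetical, synthetic data: 36 months of two unrelated quantities.
rng = random.Random(2015)
months = 36
security_spend = trending_series(rng, start=100.0, growth=5.0, noise=4.0, n=months)
incidents = trending_series(rng, start=20.0, growth=2.0, noise=3.0, n=months)

# Correlation of the raw levels: dominated by the fact that both grow over time.
raw_r = pearson(security_spend, incidents)

# Correlation of the monthly changes: the trend is gone, only noise remains.
detrended_r = pearson(differences(security_spend), differences(incidents))

print(f"correlation of raw levels:      {raw_r:.2f}")
print(f"correlation of monthly changes: {detrended_r:.2f}")
```

A report that showed only the first number could claim that spending more on security "causes" more incidents, or that incidents "drive" spending. The data supports neither claim, because the only thing the two series share is the passage of time.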
There is a silver lining – once you are aware of the subtle ways data is manipulated, the manipulation is easy to spot. My presentation at BSides takes the foundation Huff created and updates the core concepts for the contemporary information security field. If you attend my BSides presentation on Monday, April 20, you will walk away with ways to identify these methods and tips on how to avoid unintentionally using them yourself.
About the Author: Tony Martin-Vegue works for a large global retailer, leading the firm’s cyber-crime program. His enterprise risk and security analyses are informed by his 20 years of technical expertise in areas such as network operations, cryptography and system administration. Tony holds a Bachelor of Science in Business Economics from the University of San Francisco and many certifications, including CISSP, CISM and CEH.
Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.