The internet is vast and full of data that is publicly available to anyone with the time, or technology, to mine for insights.
You can find everything from years of NYC taxi cab data and Uber information to more obscure datasets about every Jeopardy question in history or every single Iowan liquor store receipt since 2014. The volume of available data is staggering, and it is poised to keep growing with players like Amazon supporting publicly available AWS Datasets.
There is so much free data out there that thriving companies have built entire business models based upon farming, organizing, and selling insights on free, publicly available data.
Surrounding all this available data, and at the core of a recent lawsuit filed by startup hiQ against LinkedIn, is the question that has plagued the internet since the first time my Gateway 2000 screamed out over the internet at 14.4k: who controls the data?
Why do you care?
The key question underpinning this legal case will see at least some resolution in March of 2018 when hiQ has its day in court against LinkedIn.
Central to the case are two sub-questions related to data ownership and control: 1) May a hosting site prohibit third-party entities from scraping otherwise publicly available data? and 2) Does a hosting company have the right to control access to data that its users make publicly available?
hiQ v. LinkedIn – The Basics
hiQ is a company built upon scraping publicly available data on LinkedIn.
Specifically, hiQ tracks user-generated changes to profiles in areas like work history and skills. hiQ takes this data, does its magic, and offers two products: 1) Keeper, which helps companies identify employees who are at risk of being recruited away; and 2) Skill Mapper, which helps companies map the skillsets of their employees.
It’s very important to note that hiQ only gathers data from LinkedIn and only gathers data that is publicly available without a LinkedIn account.
The lawsuit currently being heard in California centers upon LinkedIn’s contentions that hiQ is in violation of the Computer Fraud and Abuse Act of 1986 (CFAA) and the LinkedIn Terms of Service. (hiQ is a member of LinkedIn.)
Briefly, the CFAA was aimed at establishing civil and criminal punishments for hacking into private computers to access non-public information and/or cause damage. It was narrowly targeted when drafted and has not been successfully invoked on the basis of a terms of service violation, as LinkedIn attempts to do here.
hiQ v. LinkedIn – The Arguments
LinkedIn contends that hiQ’s scraping of publicly available data violates its terms of service and justifies criminal punishments for hacking under the CFAA.
While the act of scraping publicly available data does in fact violate the LinkedIn terms of service, bootstrapping a terms of service violation into CFAA criminal charges is a novel theory.
The CFAA is very specific in its focus on punishing those who hack into private computers to steal private information and/or cause damage. LinkedIn’s CFAA argument fails on the merits because the data was publicly available on the internet even without a LinkedIn account. The violation of the terms of service does present a problem for hiQ here, but the state constitutional free speech argument appears to have gained significant traction with the court.
The early court documentation supports hiQ’s contention that the use of publicly available data is protected free speech in California.
Ideally, the final verdict in the case will find that the gathering and use of publicly available data is protected free speech. The final argument in favor of LinkedIn is its desire to protect its users’ privacy. The protection of privacy is something that will almost always attract legal support when it is founded upon actually protecting someone’s privacy.
Unfortunately for LinkedIn, this argument will not gain any traction in court because the data was publicly available, and the users opted in to have their data shared publicly.
Assuming LinkedIn attempts to continue its crusade against hiQ, the court will hear the case in March of 2018.
For now, the recent injunction granted in favor of hiQ will ensure that hiQ and its business operations are protected until the case is fully heard and decided. The court has prohibited LinkedIn from blocking hiQ activities, including data scraping, until the case is complete.
I strongly suggest reading the injunction order. It gives a thorough and readable explanation of the law and why this matter is so important. This case has the potential to dramatically expand the scope of the CFAA, limit the ability to use publicly available data, and jeopardize free speech on the internet.
/s/ HH @LegalLevity
Editor’s Note: The opinions expressed in this and other guest author articles are solely those of the contributor, and do not necessarily reflect those of Tripwire, Inc.