The White House Big Data Report: The Good, The Bad, and The Missing

May 5, 2014: 4:01 pm


Last week, the White House released its report on big data and its privacy implications, the result of a 90-day study commissioned by President Obama during his January 17 speech on NSA surveillance reforms. Now that we’ve had a chance to read the report, we’d like to share our thoughts on what we liked, what we didn’t, and what we thought was missing.

What We Liked

Support for ECPA Reform

We were happy to see that the report recognized that email privacy is critical, and the law should “ensure the standard of protection for online, digital content is consistent with that afforded in the physical world–including by removing archaic distinctions between email left unread or over a certain age.” As we have argued, and courts have agreed, law enforcement should be required to get a warrant before reading your email, regardless of where it’s stored or how long it’s been there.

Congress has been grappling with this issue for many years now because the outdated Electronic Communications Privacy Act (ECPA) purports to permit law enforcement to access emails without a warrant in certain situations. Right now, Congress is considering powerful bipartisan legislation that would help bring our outdated email privacy law into alignment with Fourth Amendment case law. We’re supporting the bill, and the White House should too.

However, one issue was left conspicuously unaddressed in the report. The Securities and Exchange Commission, the civil agency in charge of protecting investors and ensuring orderly markets, has been advocating for a special exception to the warrant requirement. No agency can or should have a get-out-of-jail-free card for bypassing the Fourth Amendment.

Protections Must be Enacted to Prevent Big Data Discrimination

We were also glad the report emphasized the dangers of big data when it comes to fairness and discrimination. Big data analytics often make use of techniques from machine learning, a field of computer science in which an algorithm “learns” what sorts of output to produce based on data presented to it during a training phase. When the input data explicitly or implicitly encodes a protected characteristic like gender or race, though, the resulting algorithm runs the risk of being biased against certain groups, or in the worst case “redlining” them. [1] Even worse, people may assume the results are fair because algorithms are seen as a neutral arbiter–after all, how can a computer discriminate if it doesn’t have things like social prejudices? In reality, though, the algorithm is only as fair as the data fed into it.
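To make the point about implicit encoding concrete, here is a minimal sketch of our own, not anything taken from the report. The records, zip codes, and group labels are made up, and the "learning" step is a deliberately simple majority vote rather than a real machine learning method. The learner never sees the group label, yet because zip code acts as a proxy for it, the learned rule reproduces the skew in the historical decisions.

```python
from collections import defaultdict

# Hypothetical, hand-made training records: (zip_code, group, approved).
# "group" stands in for a protected characteristic; the learner never reads it.
HISTORY = [
    ("02118", "A", True), ("02118", "A", True),
    ("02118", "A", True), ("02118", "B", True),
    ("02121", "B", False), ("02121", "B", False),
    ("02121", "B", True), ("02121", "A", False),
]


def train(records):
    """'Learn' one approve/deny rule per zip code by majority vote."""
    counts = defaultdict(lambda: [0, 0])  # zip -> [approvals, total]
    for zip_code, _group, approved in records:  # group is deliberately ignored
        counts[zip_code][0] += int(approved)
        counts[zip_code][1] += 1
    # Approve a zip code only if more than half of past decisions were approvals.
    return {z: approvals * 2 > total for z, (approvals, total) in counts.items()}


def approval_rate_by_group(model, records):
    """Apply the learned rule to each person and report the rate per group."""
    tally = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for zip_code, group, _past_decision in records:
        tally[group][0] += int(model[zip_code])
        tally[group][1] += 1
    return {g: approved / total for g, (approved, total) in tally.items()}


model = train(HISTORY)
rates = approval_rate_by_group(model, HISTORY)
print(model)
print(rates)
```

Dropping the protected column from the input is therefore not enough on its own: in this toy data the two groups end up with very different approval rates even though the rule only ever looks at zip code.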

But even when big data algorithms manage to be perfectly fair, the danger of discrimination remains due to the very digital nature of big data. Many groups are under-represented in today’s digital world (especially the elderly, minorities, and the poor). These groups run the risk of being disadvantaged if community resources are allocated based on big data, since there may not be any data about them in the first place. We see an example of this in Boston, which had a pilot program to allow residents to report potholes through a mobile app but soon recognized that the program was inherently flawed because “wealthy people were far more likely to own smart phones and to use the Street Bump app. Where they drove, potholes were found; where they didn’t travel, potholes went unnoted.”
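The Street Bump problem can be illustrated with a few lines of arithmetic. The sketch below is ours and rests on invented numbers (they are not Boston's data): two hypothetical neighborhoods have the same number of potholes, but different shares of residents running the reporting app. An allocation rule that is perfectly "fair" with respect to the reports it receives still sends most of the repair budget to the better-connected neighborhood.

```python
# Hypothetical figures for illustration only; not Boston's actual data.
NEIGHBORHOODS = {
    "wealthier": {"potholes": 100, "app_users_pct": 80},
    "poorer": {"potholes": 100, "app_users_pct": 20},
}
REPAIR_BUDGET = 50  # number of potholes the city can afford to fix


def expected_reports(neighborhoods):
    """Assume a pothole is reported in proportion to local app usage."""
    return {
        name: n["potholes"] * n["app_users_pct"] // 100
        for name, n in neighborhoods.items()
    }


def allocate(budget, reports):
    """Split the budget in proportion to the reports received."""
    total = sum(reports.values())
    return {name: budget * count / total for name, count in reports.items()}


reports = expected_reports(NEIGHBORHOODS)
repairs = allocate(REPAIR_BUDGET, reports)
print(reports)
print(repairs)
```

The allocation step treats every report equally; the unfairness enters earlier, in who is able to generate data at all.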

Obviously this sort of discrimination and unfairness can have a huge effect on people’s lives, resulting in everything from unfair pricing based on economic class to limiting people’s credit, housing, education, or employment opportunities. We’re glad the President’s commission recommended that the Justice Department, the Federal Trade Commission, the Consumer Financial Protection Bureau, and the Equal Employment Opportunity Commission take proactive steps to make sure this sort of big data discrimination doesn’t become common. In particular, these agencies should look to scrutinize consumer experiences that might be ripe for discrimination based on big data analytics (such as digital advertising), and to encourage transparency by companies to help users understand how and when big data influences their experience within the marketplace.

Non-US Persons Deserve Privacy Too

As we’ve said before, if our nation truly values privacy and civil liberties in a connected digital world, then we should extend the privacy protections we grant to citizens to all people. The authors of the report agree, recommending that the Privacy Act be extended to all people, not just US persons.

What Could Have Been Stronger

Metadata Matters

As we’ve explained before, metadata (the details associated with your communications, content, or actions, like who you called, or what a file you uploaded is named, or where you were when you visited a particular website) can expose just as much information about you as the “regular” data it is associated with, so it deserves the same sort of privacy protections as “regular” data.

Unfortunately, the report claimed–without citation–that this is an issue on which experts are divided. We disagree: the overwhelming majority of technology experts recognize how invasive metadata can be. The report merely recommended that the government look into the issue.

In contrast, several other government reports have taken a much stronger stance and explicitly stated that metadata deserves the same level of privacy protections as “regular” data. This includes the Privacy and Civil Liberties Oversight Board: “Telephone calling records, especially when assembled in bulk, clearly implicate privacy interests as a matter of public policy”; the President’s Review Group on Intelligence and Communications Technologies: “In a world of ever more complex technology, it is increasingly unclear whether the distinction between ‘meta-data’ and other information carries much weight”; and even the parallel report by the President’s Council of Advisors on Science and Technology (PCAST): “There is no reason to believe that metadata raise fewer privacy concerns than the data they describe.”

We think the report should have followed the lead of the PCAST report and acknowledged that the distinction between data and metadata is an artificial one, and recommended the appropriate reforms.

What the Public Knows About Data Brokers

As one of their recommendations, the White House suggested advancing the Consumer Privacy Bill of Rights, which includes the idea that “consumers have a right to exercise control over what personal data companies collect from them and how they use it,” as well as “a right to access and correct personal data.” It barely mentioned, however, one of the key reasons a Consumer Privacy Bill of Rights is so important: namely the tremendous disparity in knowledge between consumers and the companies who collect and analyze data about them. As we mentioned in our comments to the White House, “The vast majority of information data brokers use…is data which consumers unintentionally expose in large part because they simply do not know how or when they are being tracked, or what information is being collected.” Additionally, consumers “frequently believe wrongly that the law or a company’s privacy policies block certain uses of that data or its dissemination.” This informational asymmetry puts consumers at a huge disadvantage, and the only way to correct it is through transparency–which the report rightly calls for. However, the report glossed over this issue and failed to articulate why greater transparency around the entire data broker industry is necessary.

Where the Recommendations Fell Short

Data Breach Reform May Undermine Existing State-Level Consumer Protections

Consumers have a right to know when their data is exposed, whether through corporate misconduct, malicious hackers, or under other circumstances. Recognizing this important consumer safeguard, the report recommends that Congress “should pass legislation that provides a single national data breach standard along the lines of the Administration’s May 2011 Cybersecurity legislative proposal.”

While at first blush this may seem like a powerful consumer protection, we don’t think that proposal is as strong as existing California law. The proposed federal data breach notification scheme would preempt state notification laws, removing the strong California standard and replacing it with a weaker standard.

We strongly support universal data breach notification, but any such proposal should not become a backdoor for weakening transparency. We’re also wary of engaging in a negotiation on Capitol Hill on this topic, since too often powerful corporate interests will trump the best interests of everyday users in the lawmaking process.

Likening Blowing the Whistle to Violent Crimes

We were particularly disconcerted by a section in the report on “insider threats” that compared the acts of WikiLeaks whistleblower Chelsea Manning and former NSA contractor Edward Snowden to a US Army officer who went on a shooting spree, killing 13 people at Fort Hood:

It was the latest in a string of troubling breaches and acts of violence by insiders who held security clearances, including Chelsea Manning’s disclosures to WikiLeaks, the Fort Hood shooting by Major Nidal Hasan, and the most serious breach in the history of U.S. intelligence, the release of classified National Security Agency documents by Edward Snowden.

The report goes on to note that the Obama Administration has “released a review of suitability and security practices which called for expanding continuous evaluation capabilities across the federal government.” The report then notes the privacy concerns of the employees undergoing these “expanding continuous evaluation” programs.

We’ve got two big concerns with this. First, whistleblowers are simply not comparable to an Army officer who massacres his fellow soldiers in cold blood. It’s inappropriate for the Administration to be designing “insider threat” policies that fail to differentiate between the two. Second, the real big-data issue at play here is overclassification of enormous quantities of data. Over 1.4 million people hold top-secret security clearances. In 2012, the government classified 95 million documents. And by some estimates, the government controls more classified information than there is in the entire Library of Congress.

The government should use the Manning and Snowden leaks as a wake-up call to reform a broken classification system and a FOIA process that leaves the public in the dark more often than not. Instead of embracing reform and new levels of transparency, the government is looking to clamp down on future whistleblowers.

Treating Data Collection as a Given

The report argues that in today’s connected world it’s impossible for consumers to keep up with all the data streams they generate (intentionally or not), so the existing “notice and consent” framework (in which companies must notify and get a user’s consent before collecting data) is obsolete. Instead, they suggest that more attention should be paid to how data is used, rather than how it is collected.

An unfortunate premise of this argument is that automatic collection of data is a given, whether by websites tracking your browsing history, apps tracking your location, appliances in your kitchen collecting information on your eating habits, or even police departments using automated license plate readers to make every car on the road a target of an investigation. We don’t think this sort of automatic collection of data should be a foregone conclusion.

While we agree that putting more emphasis on responsible use of big data is important, doing so should not completely replace the notice and consent framework. Instead, we think companies should do a better job of explaining why they need the data they collect, and also give consumers even more granularity when it comes to opting in or out. (For example, you could tell advertisers you’re willing to be tracked on shopping sites but not on news sites or blogs, so they can tailor their ads to products you’re interested in without exposing your tastes in politics, religion, or other personal topics.)

It’s worth noting that much of the data driving the big data economy is “found” data: data generated incidental to the use of products and services by consumers. As we mentioned in our comments to the White House, we believe that if companies do not take action to give users more choice when it comes to what type of data is collected (if at all), consumers will begin to use tools and products that “leak” less private data to third parties. Already tools such as Tor exist to prevent consumers from “leaking” data to their ISPs about the websites they visit; XPrivacy exists to stop apps from gathering unnecessary private information from Android smartphones; and Privacy Badger allows people to automatically block third-party trackers on the Internet that don’t respect Do Not Track. If enough consumers started using these sorts of technologies, then the big data economy would be forced to innovate around consensual tracking. We would rather companies give users a choice, so that users can choose what sort of tradeoffs to make, rather than relying on companies and the government to strike the balance for them.

Of course, that was just the private sector. The issue of inevitable collection of big data by the government brings us to…

What Was Missing: Big Data and the Surveillance State

Although the report is a fairly thorough analysis of the privacy implications of big data, it glaringly omits one topic: the NSA’s use of big data to spy on innocent Americans.

Even though the review that led to this report was announced during President Obama’s speech on NSA reform, and even though respondents to the White House’s Big Data Survey “were most wary of how intelligence and law enforcement agencies are collecting and using data about them,” the report itself is surprisingly silent on the issue. [2] This is especially confusing given how much the report talks about the need for more transparency in the private sector when it comes to big data. Given that this same logic could well be applied to intelligence big data programs, we don’t understand why the report did not address this vital issue.

Although the report may have been silent on government use of big data for intelligence gathering, we won’t be. We believe that the most important action the White House can take regarding big data would be to immediately stop misusing Section 215 of the Patriot Act and Section 702 of the Foreign Intelligence Surveillance Amendments Act and to support statutory reform to end mass collection of information about you. Additionally, surveillance agencies should publicly disclose their mass spying techniques and issue Privacy Impact Assessments that set standards and address whether the agency is meeting them. To quote from our public comments to the White House on big data: “Time after time we’ve seen the witches’ brew of ambiguity and secrecy poison democracy and the rule of law,” and the only antidote for this poison, even in the age of big data, is transparency.

[1] As an example of this, the report describes a study which found that “web searches involving black-identifying names (e.g., ‘Jermaine’) were more likely to display ads with the word ‘arrest’ in them than searches with white-identifying names (e.g., ‘Geoffrey’).”

[2] While the respondents to the White House’s survey are admittedly not representative of the entire US population, the fact that an overwhelming majority of respondents voiced concern about the intelligence community’s use of big data indicates we’re not the only ones who care about this.

Reprinted from eff.org. By Jeremy Gillula, Kurt Opsahl, and Rainey Reitman.