Privacy groups want Google to ensure that aggregate search data about flu outbreaks cannot be used to re-identify the people who provided the information
Google’s new Flu Trends tool, which collects and analyzes search queries to predict flu outbreaks around the country, is raising concern with privacy groups.
The Electronic Privacy Information Center filed a Freedom of Information Act (FOIA) request asking federal officials to disclose how much user search data the company has recently transmitted to the Centers for Disease Control and Prevention, or CDC, as part of its Google Flu Trends effort.
Concern stems from what privacy groups claim is a disturbing lack of transparency surrounding the method Google is using to predict flu outbreaks. Google has publicly stated that all of the data used is made anonymous and aggregated, but there has been no independent verification of how search queries are used and transformed into data for Google Flu Trends, the privacy groups say.
“What we are basically saying is that if Google has found a way to ensure that aggregate search data cannot be used to re-identify the people who provided the search information, they should be transparent about that technique,” said Marc Rotenberg, Electronic Privacy Information Center’s president.
Rotenberg said the issue is important because the same techniques Google is using to predict flu outbreaks could be applied to tracking other diseases, including ones such as SARS, for which the urgency to contain an outbreak could be far greater. “Let’s say we have a spike in Detroit of SARS and the police say we want to know who in Detroit submitted those searches. How can Google ensure that this can’t be done? The burden is on Google,” Rotenberg said.
Google Flu Trends was publicly disclosed in November and has been described by the company as a Web tool to help individuals and health care professionals obtain influenza-related activity estimates for all U.S. states up to two weeks faster than traditional government disease surveillance systems.
Google said in a blog post introducing Flu Trends last month that search queries such as “flu symptoms” tend to be very common during flu season each year. A comparison of the number of such queries with the actual number of people reporting flu-like symptoms shows a very close relationship, it said. As a result, tallying each day’s flu-related searches in a particular geography allows the company to estimate how many people have a flu-like illness in that region.
In making the announcement, Google noted that it had shared results from Flu Trends with the Epidemiology and Prevention Branch of the Influenza Division at CDC during the last flu season and noticed a strong correlation between its own estimates and CDC’s surveillance data based on actual reported cases. Google said that by making flu estimates available each day, Google Flu Trends could provide epidemiologists with an early-warning system for flu outbreaks.
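The approach Google describes, tallying flu-related queries by region and day and comparing the counts with reported cases, can be illustrated with a minimal Python sketch. It is not Google's method: the keyword list, the log format, the function names and every number below are made up for illustration, and the article itself mentions only the query “flu symptoms.”

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword list; the article cites only "flu symptoms".
FLU_TERMS = ("flu symptoms",)


def tally_flu_queries(log):
    """Count flu-related queries per (region, day).

    `log` is an iterable of (region, day, query) tuples, an assumed
    format that carries no user identifiers at all.
    """
    counts = Counter()
    for region, day, query in log:
        if any(term in query.lower() for term in FLU_TERMS):
            counts[(region, day)] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series.

    Both series must vary; a constant series would divide by zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up weekly figures for one region, for illustration only.
weekly_query_counts = [120, 180, 260, 410, 390]
weekly_reported_cases = [15, 22, 35, 52, 49]
print(pearson(weekly_query_counts, weekly_reported_cases))
```

A coefficient close to 1 would correspond to the “very close relationship” Google reports. Note that the tally keeps only a count per region and day, which is the kind of aggregation the privacy groups want to see independently verified.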
Rotenberg said the service was potentially useful, but much depended on the kind of search data that Google is collecting and analyzing to make its predictions. Google has said that the database it uses for Flu Trends retains no identity information, IP addresses or any physical user locations. However, it is not clear whether the company is completely deleting IP addresses and, if so, when. Another open question, he said, is whether Google is merely anonymizing IP addresses by redacting some of the numbers in each address.
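The distinction Rotenberg draws between deleting an IP address and redacting part of it can be made concrete with a short Python sketch. It assumes, purely for illustration, that redaction means zeroing the last of the four numbers in an IPv4 address; the article does not say which numbers, if any, Google removes, and the function names are hypothetical.

```python
import ipaddress


def redact_last_octet(ip):
    """Zero the final number of an IPv4 address (a /24 truncation).

    Assumed scheme for illustration; not a confirmed Google practice.
    """
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def candidate_count(redacted_ip):
    """Number of original addresses that map to one redacted value."""
    return ipaddress.ip_network(f"{redacted_ip}/24").num_addresses


redacted = redact_last_octet("192.0.2.77")  # documentation-range address
print(redacted, candidate_count(redacted))
```

Under this assumed scheme, a redacted address still narrows the source to one block of 256 addresses, whereas a deleted address leaves nothing to narrow down.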
Jeffrey Chester, executive director of the Center for Digital Democracy, said Google’s growing presence in the healthcare space also makes it important for the company to disclose what kind of data it is collecting and using for Flu Trends.
“Google sees a potential profit center from targeting its vast user base with advertising that is related to health issues,” Chester said. The company’s announcement of Flu Trends in fact shows the pharmaceutical and medical markets precisely the kind of sophisticated analysis the company can do with search data to enable highly targeted medical marketing, he said. “This is about taking the tracking data that Google has at its disposal and focusing it on generating a new profit center for the company,” Chester said.
Pam Dixon, executive director of the World Privacy Forum, echoed those concerns and questioned whether the anonymization techniques used by Google provided enough of a guarantee that a search term could not be traced back to specific individuals. She pointed to an incident two years ago in which AOL inadvertently posted search information on a public Web site. The search queries had supposedly been anonymized by AOL, but it was still relatively easy to track specific search terms back to IP addresses and even individuals in many cases, Dixon said.
Mike Yang, senior product counsel at Google, downplayed privacy concerns related to Flu Trends and insisted that the tool uses no personally identifiable data.
“Flu Trends uses aggregated data from hundreds of millions of searches over time,” Yang said today in an e-mail. “Flu Trends uses aggregations of search query data which contain no information that can identify users personally. We also never reveal how many users are searching for particular queries.”
Yang noted that the data used in Flu Trends comes from Google’s standard search logs. He also referenced an article in the journal Nature, authored by the Google Flu Trends team, which he said explains the methodology behind the tool.