A group of scientists have said that Google, Facebook and Twitter could be looking at big data incorrectly, after finding errors in the search giant’s flu predictions.
Researchers from the University of Houston, US, have analysed Google’s Flu Trends tool, which aims to predict levels of flu in real time around the world.
The tool uses search terms from across the globe to estimate flu activity. But the researchers found that it overestimated the prevalence of flu during the 2012-13 season.
They claim that it overestimated the actual levels of flu in 2011-12 by more than 50%, and that between 2011 and 2013 it overpredicted the prevalence of flu in 100 out of 108 weeks.
The report also questioned the use of data collected from platforms such as Twitter and Facebook, asking how easy it is for campaigns and companies to manipulate these platforms to ensure their products are trending, for example in polling trends and market popularity.
“Google Flu Trend is an amazing piece of engineering and a very useful tool, but it also illustrates where ‘big data’ analysis can go wrong,” said Ryan Kennedy, University of Houston political science professor.
He said: “Many sources of ‘big data’ come from private companies, who, just like Google, are constantly changing their service in accordance with their business model.
“We need a better understanding of how this affects the data they produce; otherwise we run the risk of drawing incorrect conclusions and adopting improper policies.”
The Flu Trends tool is part of Google’s range of trending information, which allows users to explore topics based on the number of searches happening around the world.
The Google Flu Trends website says: “We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms. Of course, not every person who searches for ‘flu’ is actually sick, but a pattern emerges when all the flu-related search queries are added together.
“We compared our query counts with traditional flu surveillance systems and found that many search queries tend to be popular exactly when flu season is happening.”
Kennedy questioned relying on big data alone and said it is important to use many sources: “Our analysis of Google Flu demonstrates that the best results come from combining information and techniques from both sources.
“Instead of talking about a ‘big data revolution’, we should be discussing an ‘all data revolution’, where new technologies and techniques allow us to do more and better analysis of all kinds.”