Comparing Social media and Google to detect and predict severe epidemics

Sci Rep. 2020 Mar 16;10(1):4747. doi: 10.1038/s41598-020-61686-9.

Abstract

Internet technologies have demonstrated their value for the early detection and prediction of epidemics. In diverse cases, electronic surveillance systems can be created by obtaining and analyzing on-line data, complementing other existing monitoring resources. This paper reports the feasibility of building such a system with search engine and social network data. Concretely, this study aims at gathering evidence on which kind of data source leads to better results. Data have been acquired from the Internet by means of a system which gathered real-time data for 23 weeks. Data on influenza in Greece have been collected from Google and Twitter and they have been compared to influenza data from the official authority of Europe. The data were analyzed by using two models: the ARIMA model computed estimations based on weekly sums and a customized approximate model which uses daily sums. Results indicate that influenza was successfully monitored during the test period. Google data show a high Pearson correlation and a relatively low Mean Absolute Percentage Error (R = 0.933, MAPE = 21.358). Twitter results are slightly better (R = 0.943, MAPE = 18.742). The alternative model is slightly worse than the ARIMA(X) (R = 0.863, MAPE = 22.614), but with a higher mean deviation (abs. mean dev: 5.99% vs 4.74%).

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Disease Outbreaks / prevention & control*
  • Epidemics*
  • Epidemiological Monitoring*
  • Feasibility Studies
  • Forecasting
  • Greece / epidemiology
  • Humans
  • Influenza, Human / epidemiology
  • Internet
  • Models, Statistical
  • Search Engine*
  • Social Media*