Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach

Int J Environ Res Public Health. 2020 Jul 15;17(14):5115. doi: 10.3390/ijerph17145115.

Abstract

The contribution of this paper is twofold. First, a new data-driven approach for predicting the dynamics of the Covid-19 pandemic is introduced. Second, the results obtained with this approach for the Brazilian states, with predictions starting as of 4 May 2020, are reported and discussed. As a preliminary study, we first used a Long Short Term Memory for Data Training-SAE (LSTM-SAE) network model. Although this first approach led to somewhat disappointing results, it served as a good baseline for testing other types of artificial neural network (ANN). Subsequently, to identify relevant countries and regions for training ANN models, we clustered the world's regions in which the pandemic is at an advanced stage. This clustering is based on manually engineered features representing each country's response to the early spread of the pandemic, and the resulting clusters are used to select the relevant countries for training the models. The final models retained are Modified Auto-Encoder (MAE) networks, which are trained on these clusters and learn to predict future data for the Brazilian states. These predictions are used to estimate important statistics about the disease, such as the peaks and the number of confirmed cases. Finally, curve fitting is carried out to find the distribution that best fits the outputs of the MAE and to refine the estimates of the pandemic's peaks. The predicted numbers reach a total of more than one million infected Brazilians, distributed among the different states, with São Paulo leading with about 150 thousand predicted confirmed cases. The results indicate that the pandemic is still growing in Brazil, with the infection peaks of most states estimated in the second half of May 2020. The estimated end of the pandemic (97% of cases reaching an outcome) ranges from June to the end of August 2020, depending on the state.
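The region-selection step can be sketched as clustering regions on a small feature vector and then keeping, as training data, the regions that fall in the same cluster as the target. The abstract names neither the clustering algorithm nor the features, so this minimal sketch assumes plain k-means with a deterministic farthest-point initialization on z-scored features. The names `standardize`, `kmeans` and `training_pool`, the region names, and the two example features (early daily growth rate, days until a first intervention) are all hypothetical and are not taken from the paper.

```python
import math
import statistics
from typing import Dict, List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column so no single feature dominates distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with farthest-point seeding; returns one label per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all existing centroids.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def training_pool(features: Dict[str, List[float]], target: str, k: int) -> List[str]:
    """Return the other regions that share the target region's cluster."""
    names = list(features)
    labels = kmeans(standardize([features[n] for n in names]), k)
    target_label = labels[names.index(target)]
    return [n for n, lab in zip(names, labels) if lab == target_label and n != target]


# Hypothetical features: [early daily growth rate, days until first intervention].
features = {
    "Region A": [0.30, 10.0],
    "Region B": [0.28, 12.0],
    "Region C": [0.32, 9.0],
    "Region D": [0.10, 40.0],
    "Region E": [0.12, 38.0],
    "Region F": [0.09, 42.0],
}
print(training_pool(features, "Region A", k=2))
```

With these made-up values the regions split into a fast-growing, early-intervention group (A, B, C) and a slower, late-intervention group (D, E, F), so the pool for Region A is Regions B and C. Standardizing first matters because the second feature is measured in days and would otherwise swamp the growth rate in the Euclidean distance.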
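The peak and end estimates can be illustrated with one simple candidate distribution. Assuming, for illustration only, that daily new cases follow a Gaussian curve A*exp(-(t - mu)^2 / (2*sigma^2)), the logarithm of the counts is a quadratic a*t^2 + b*t + c with mu = -b/(2a) and sigma^2 = -1/(2a), so an ordinary least-squares parabola gives the peak day directly. The sketch also reads "97% of cases reaching an outcome" as the day by which 97% of the fitted total has occurred, which is an assumption about the paper's definition. The functions `fit_quadratic` and `gaussian_peak_and_end` are hypothetical; this is not the authors' MAE or their distribution-selection procedure.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ a*x**2 + b*x + c by solving the normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]                  # sums of x^0..x^4
    t = [sum(x ** p * y for x, y in zip(xs, ys)) for p in range(3)]  # sums of x^p * y
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def gaussian_peak_and_end(daily: Sequence[float], coverage: float = 0.97) -> Tuple[float, float, float]:
    """Fit a Gaussian to daily counts; return (peak day, sigma, day when `coverage` of the total is reached).

    Days are indices into `daily`, counted from the first observation.
    """
    if len(daily) < 3 or min(daily) <= 0:
        raise ValueError("need at least 3 strictly positive daily counts")
    xs = list(range(len(daily)))
    a, b, _ = fit_quadratic(xs, [math.log(v) for v in daily])
    if a >= 0:
        raise ValueError("log-counts are not concave, so there is no Gaussian peak")
    mu = -b / (2 * a)                # peak day
    sigma = math.sqrt(-1 / (2 * a))  # spread in days
    end = mu + NormalDist().inv_cdf(coverage) * sigma
    return mu, sigma, end
```

For example, noise-free synthetic counts generated from a Gaussian peaking at day 30 with a sigma of 8 days give back a peak at day 30 and an end day of about 45 (30 + 1.88 * 8). Because the log transform weights small counts heavily, a nonlinear fit on the raw counts, or a different distribution altogether, may suit noisy data better; this version is chosen only because it needs nothing beyond the standard library.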

Keywords: Covid-19 pandemic; data-driven; modified auto-encoder; time series prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Betacoronavirus / isolation & purification*
  • Brazil / epidemiology
  • COVID-19
  • Coronavirus Infections / epidemiology*
  • Coronavirus Infections / virology
  • Forecasting
  • Humans
  • Pandemics
  • Pneumonia, Viral / epidemiology*
  • Pneumonia, Viral / virology
  • SARS-CoV-2