Send to

Choose Destination
Artif Intell Med. 2014 Jul;61(3):153-63. doi: 10.1016/j.artmed.2014.01.002. Epub 2014 Jan 31.

Twitter mining for fine-grained syndromic surveillance.

Author information

Department of Computer Science, University of Rome "Sapienza", via Salaria 113, 00198 Rome, Italy. Electronic address:
Department of Computer Science, University of Rome "Sapienza", via Salaria 113, 00198 Rome, Italy.
Bambino Gesù Children Hospital, Piazza S. Onofrio 4, 00165 Rome, Italy.



Digital traces left on the Internet by web users, if properly aggregated and analyzed, can represent a huge information dataset able to inform syndromic surveillance systems in real time with data collected directly from individuals. Since people use everyday language rather than medical jargon (e.g. runny nose vs. respiratory distress), knowledge of patients' terminology is essential for the mining of health related conversations on social networks.


In this paper we present a methodology for early detection and analysis of epidemics based on mining Twitter messages. In order to reliably trace messages of patients that actually complain of a disease, first, we learn a model of naïve medical language, second, we adopt a symptom-driven, rather than disease-driven, keyword analysis. This approach represents a major innovation compared to previous published work in the field.


We first developed an algorithm to automatically learn a variety of expressions that people use to describe their health conditions, thus improving our ability to detect health-related "concepts" expressed in non-medical terms and, in the end, producing a larger body of evidence. We then implemented a Twitter monitoring instrument to finely analyze the presence and combinations of symptoms in tweets.


We first evaluate the algorithm's performance on an available dataset of diverse medical condition synonyms, then, we assess its utility in a case study of five common syndromes for surveillance purposes. We show that, by exploiting physicians' knowledge on symptoms positively or negatively related to a given disease, as well as the correspondence between patients' "naïve" terminology and medical jargon, not only can we analyze large volumes of Twitter messages related to that disease, but we can also mine micro-blogs with complex queries, performing fine-grained tweets classification (e.g. those reporting influenza-like illness (ILI) symptoms vs. common cold or allergy).


Our approach yields a very high level of correlation with flu trends derived from traditional surveillance systems. Compared with Google Flu, another popular tool based on query search volumes, our method is more flexible and less sensitive to changes in web search behaviors.


Micro-blog mining; Patient's language learning; Syndromic surveillance; Terminology clustering; Twitter mining

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Elsevier Science
Loading ...
Support Center