Send to

Choose Destination
Mol Biol Evol. 2017 Apr 1;34(4):997-1007. doi: 10.1093/molbev/msw275.

Genomic Infectious Disease Epidemiology in Partially Sampled and Ongoing Outbreaks.

Author information

Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, London, United Kingdom.
Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada.
School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada.
Department of Mathematics, Imperial College, London, United Kingdom.


Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom-a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte-Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from


genomic epidemiology; infectious disease outbreak; transmission analysis

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Silverchair Information Systems Icon for PubMed Central
Loading ...
Support Center