Epidemics. 2018 Jun;23:1-10. doi: 10.1016/j.epidem.2017.10.001. Epub 2017 Oct 20.

Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

Author information

Department of Infectious Disease Epidemiology and the NIHR HPRU on Modeling Methodology, Imperial College London, United Kingdom.
Department of Mathematics, Imperial College London, United Kingdom.
HIV and STI Department of Public Health England's Centre for Infectious Disease Surveillance and Control, London, United Kingdom.
Department of Infection and Population Health and the NIHR HPRU in Blood Borne and Sexually Transmitted Infections, University College London, United Kingdom.
Li Ka Shing Centre for Health Information and Discovery, Oxford University, United Kingdom.
Department of Infectious Disease Epidemiology and the NIHR HPRU on Modeling Methodology, Imperial College London, United Kingdom.


Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than source attribution methods for identifying transmission risk factors. However, neither method provides robust estimates of transmission risk ratios. Source attribution methods can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors.
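
The "odds of clustering" statistic mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name, the 2x2 layout, and the counts are hypothetical, and the interval is a standard Wald approximation. It only shows how an odds ratio of cluster membership for a binary patient attribute might be computed. As the abstract cautions, an elevated ratio of this kind can arise simply because the attribute is correlated with time since infection.

```python
from math import exp, log, sqrt


def clustering_odds_ratio(a, b, c, d):
    """Odds ratio of clustering for a binary covariate, with a 95% Wald interval.

    a: covariate present, sequence in a cluster
    b: covariate present, sequence not in a cluster
    c: covariate absent, sequence in a cluster
    d: covariate absent, sequence not in a cluster
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Wald approximation).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se)
    upper = exp(log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only for illustration (not data from the study).
estimate, ci_lower, ci_upper = clustering_odds_ratio(40, 60, 20, 80)
print(f"OR = {estimate:.2f}, 95% CI ({ci_lower:.2f}, {ci_upper:.2f})")
```

An interval that excludes 1 in such a table indicates an association with clustering, not necessarily with transmission risk.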


Cluster analysis; Computer simulation; HIV epidemiology; Phylodynamics; Phylogenetic analysis
