Microb Genom. 2018 Jun;4(6). doi: 10.1099/mgen.0.000184. Epub 2018 May 29.

SuperDCA for genome-wide epistasis analysis.

Author information

1. Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Finland.
2. Department of Computer Science, Aalto University, FI-00076 Espoo, Finland.
3. Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK.
4. Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, UK.
5. Department of Biostatistics, University of Oslo, 0317 Oslo, Norway.

Abstract

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10⁴-10⁵ polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10⁵ polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.

KEYWORDS:

epistasis; linkage disequilibrium; population genomics

PMID: 29813016
PMCID: PMC6096938
DOI: 10.1099/mgen.0.000184
[Indexed for MEDLINE]
Free PMC Article
