Bioinformatics. 2019 Nov 19. pii: btz841. doi: 10.1093/bioinformatics/btz841. [Epub ahead of print]

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.

Author information

1. Universitat Pompeu Fabra (UPF), Barcelona, Spain.
2. Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia.
3. Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia.
4. Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia.
5. Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Doctor Aiguader 88, Barcelona, Spain.
6. Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia.
7. Institute of Science and Technology Austria, Klosterneuburg, Austria.
8. Skolkovo Institute of Science and Technology, Moscow, Russia.

Abstract

MOTIVATION:

Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant carrying both single mutations. For higher-order epistasis of order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space, which together form a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, and these may contain many combinatorially complete datasets.
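To make the four-genotype requirement concrete, the following minimal sketch computes an epistasis term from a combinatorially complete set of measurements. It assumes an additive fitness scale and the common alternating-sign interaction coefficient; the abstract does not specify which epistasis measure is used, so the formula, the function name `epistasis`, and the 0/1 tuple encoding of genotypes are illustrative assumptions.

```python
from itertools import product


def epistasis(fitness, n):
    """Order-n interaction term over one n-dimensional hypercube.

    `fitness` maps a tuple of n flags (0 = reference allele, 1 = mutant
    allele) to a measured fitness value. Assumption: additive scale, so
    the term is the alternating-sign sum over all 2**n corners.
    """
    corners = list(product((0, 1), repeat=n))
    missing = [c for c in corners if c not in fitness]
    if missing:
        # The dataset is not combinatorially complete for these n sites.
        raise ValueError(f"missing genotypes: {missing}")
    # Sign is +1 for the full mutant and flips with each mutation removed.
    return sum((-1) ** (n - sum(c)) * fitness[c] for c in corners)


# Pairwise case: reference, two single mutants, one double mutant.
pair = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 1.2, (1, 1): 2.0}
eps = epistasis(pair, 2)  # f11 - f10 - f01 + f00
```

For n = 2 this reduces to the double mutant's fitness minus the sum of the two single-mutant effects. If any of the 2^n corners is absent, the term cannot be computed, which is why complete hypercubes are required.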

RESULTS:

We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.
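One way such a recursive search can be organised is sketched below. This is not the HypercubeME implementation, only an illustration under stated assumptions: genotypes are equal-length strings, a hypercube is stored as a hypothetical `(base, diagonal)` pair, and an (n+1)-dimensional hypercube is built from two n-dimensional hypercubes that share the same set of substitutions and whose base genotypes differ at exactly one site. The pairwise comparison inside each group is quadratic and would need to be replaced for data of the size reported above.

```python
from collections import defaultdict
from itertools import combinations


def find_hypercubes(genotypes):
    """Return {dimension: set of (base, diagonal)} for all complete hypercubes.

    `base` is the lexicographically smallest corner; `diagonal` is a sorted
    tuple of (position, allele_in_base, alternative_allele).
    """
    found = {}
    # Every genotype is a 0-dimensional hypercube with an empty diagonal.
    _grow({(g, ()) for g in set(genotypes)}, 0, found)
    return found


def _grow(level, dim, found):
    """Record the hypercubes of dimension `dim`, then recurse to dim + 1."""
    if not level:
        return
    if dim >= 1:
        found[dim] = level
    # Only hypercubes with identical diagonals are parallel and can merge.
    by_diagonal = defaultdict(list)
    for base, diagonal in level:
        by_diagonal[diagonal].append(base)
    next_level = set()
    for diagonal, bases in by_diagonal.items():
        for b1, b2 in combinations(sorted(bases), 2):
            diff = [i for i, (x, y) in enumerate(zip(b1, b2)) if x != y]
            if len(diff) == 1:  # bases are one substitution apart
                p = diff[0]
                new_diag = tuple(sorted(diagonal + ((p, b1[p], b2[p]),)))
                # The set removes duplicates reached via different splits.
                next_level.add((b1, new_diag))
    _grow(next_level, dim + 1, found)


# All eight two-letter genotypes over three sites form one 3-dimensional cube.
cube = ["AAA", "AAB", "ABA", "ABB", "BAA", "BAB", "BBA", "BBB"]
counts = {dim: len(cubes) for dim, cubes in find_hypercubes(cube).items()}
```

Storing the smallest corner as the base gives each hypercube a single canonical description, so the same hypercube reached from different pairs of faces is counted once.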

AVAILABILITY:

https://github.com/ivankovlab/HypercubeME.git.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
