Bioinformatics. 2019 Nov 19. pii: btz841. doi: 10.1093/bioinformatics/btz841. [Epub ahead of print]

HypercubeME: two hundred million combinatorially complete datasets from a single experiment.

Author information

Universitat Pompeu Fabra (UPF), Barcelona, Spain.
Faculty of Medical Physics, Institute of Biomedical System and Technologies, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia.
Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia.
Department of Innovation and High Technology, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia.
Center for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Doctor Aiguader 88, Barcelona, Spain.
Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia.
Institute of Science and Technology Austria, Klosterneuburg, Austria.
Skolkovo Institute of Science and Technology, Moscow, Russia.



Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a "combinatorially complete dataset". So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets.
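The four-genotype requirement and the 2^n completeness condition can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes genotypes are encoded as frozensets of mutation labels relative to a reference, that fitness is compared on an additive scale (many studies use log-fitness instead), and the names `is_combinatorially_complete` and `pairwise_epistasis`, the mutation labels, and the fitness values are all hypothetical.

```python
from itertools import combinations


def is_combinatorially_complete(fitness, base, mutations):
    """Return True if fitness is measured for all 2**n combinations of
    `mutations` added on top of the `base` genotype.

    Assumption: a genotype is a frozenset of mutation labels, and
    `fitness` maps genotypes to measured values.
    """
    muts = list(mutations)
    for k in range(len(muts) + 1):
        for subset in combinations(muts, k):
            if base | frozenset(subset) not in fitness:
                return False
    return True


def pairwise_epistasis(fitness, base, a, b):
    """Deviation of the double mutant from the additive expectation.

    Assumes an additive fitness scale; zero means no pairwise epistasis.
    """
    return (fitness[base | {a, b}] - fitness[base | {a}]
            - fitness[base | {b}] + fitness[base])


# Invented example values, for illustration only.
ref = frozenset()
example = {
    ref: 1.0,
    frozenset({"A"}): 1.2,
    frozenset({"B"}): 0.9,
    frozenset({"A", "B"}): 1.5,
}
if is_combinatorially_complete(example, ref, ["A", "B"]):
    print(pairwise_epistasis(example, ref, "A", "B"))
```

With n = 2 the completeness check covers exactly the four genotypes named above; for larger n the same check enumerates all 2^n vertices of the hypercube.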


We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.
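One way such a search can be organised is sketched below. This is an illustrative simplification, not the published HypercubeME implementation: it assumes each genotype is a frozenset of mutation labels relative to a reference (so alternative substitutions at the same site are not modelled), and the function name `find_hypercubes` is hypothetical. The sketch builds hypercubes dimension by dimension, using the fact that an (n+1)-dimensional hypercube is complete exactly when two parallel n-dimensional hypercubes, whose bases differ by one extra mutation, are both complete.

```python
def find_hypercubes(genotypes):
    """Return {dimension: set of (base, dims)} for every complete hypercube.

    A hypercube (base, dims) is complete when base | S was measured for
    every subset S of dims. `dims` is a sorted tuple of mutation labels,
    so each hypercube is reported once.
    """
    measured = set(genotypes)

    # Dimension 1: pairs of measured genotypes differing by one mutation.
    level = set()
    steps = {}  # base -> mutations m such that base | {m} is measured
    for g in measured:
        for m in g:
            base = g - {m}
            if base in measured:
                level.add((base, (m,)))
                steps.setdefault(base, []).append(m)

    cubes = {}
    dim = 1
    while level:
        cubes[dim] = level
        nxt = set()
        for base, dims in level:
            for m in steps.get(base, ()):
                # Only extend with a larger label to avoid duplicates;
                # the parallel cube shifted by m must also be complete.
                if m > dims[-1] and (base | {m}, dims) in level:
                    nxt.add((base, dims + (m,)))
        level = nxt
        dim += 1
    return cubes
```

For example, calling `find_hypercubes` on all eight subsets of three mutations should report the 12 edges, 6 squares and 1 cube of a three-dimensional hypercube. On real data the number of hypercubes can grow very quickly, so a practical implementation would need to manage memory more carefully than this sketch does.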



Supplementary data are available at Bioinformatics online.
