Sci Rep. 2015 Nov 25;5:16923. doi: 10.1038/srep16923.

Efficient Test and Visualization of Multi-Set Intersections.

Author information

1. Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, NY 10029, USA.

Abstract

Identifying sets of objects that share features is a common operation across scientific disciplines. Analyzing the intersections among multiple sets is fundamental to an in-depth understanding of their complex relationships. However, no method had yet been developed to assess the statistical significance of intersections among three or more sets. Moreover, state-of-the-art approaches to visualizing multi-set intersections do not scale. Here, we first developed a theoretical framework, based on combinatorial theory, for computing the statistical distributions of multi-set intersections, and then used it to design a procedure that efficiently calculates the exact probabilities of multi-set intersections. We further developed several efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an extensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets, as well as six disease- or trait-associated gene sets identified by genome-wide association studies. We expect SuperExactTest to have a broad range of applications in scientific data analysis across many disciplines.
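
To make "exact probabilities of multi-set intersections" concrete, here is a minimal Python sketch. It is not the SuperExactTest procedure, whose formulas the abstract does not give. It assumes a simple null model in which each of the n sets is drawn independently and uniformly at random, with its observed size, from a common background of N items. Under that assumption the intersection size can be tracked set by set: given j items in the running intersection, the overlap with the next random set of size m is hypergeometric. For two sets this reduces to the ordinary one-sided hypergeometric test. The function names intersection_distribution, intersection_pvalue and expected_overlap are hypothetical.

```python
from math import comb, prod


def intersection_distribution(N, sizes):
    """Exact null distribution of |A_1 ∩ ... ∩ A_n| (illustrative sketch).

    Assumed null model: each A_i is drawn independently and uniformly at
    random from all subsets of size sizes[i] of a common background of N
    items. Returns a list p with p[k] = P(|A_1 ∩ ... ∩ A_n| = k).
    """
    if not sizes:
        raise ValueError("need at least one set size")
    if any(m < 0 or m > N for m in sizes):
        raise ValueError("every set size must lie in [0, N]")

    # After the first set, the running intersection has exactly sizes[0] items.
    dist = [0.0] * (sizes[0] + 1)
    dist[sizes[0]] = 1.0

    for m in sizes[1:]:
        total = comb(N, m)
        new = [0.0] * (min(len(dist) - 1, m) + 1)
        for j, pj in enumerate(dist):
            if pj == 0.0:
                continue
            # Given j items in the running intersection, the overlap with a
            # fresh uniform m-subset is hypergeometric (population N, j
            # "successes", m draws). The big-integer ratio is computed first
            # so the product never overflows a float.
            for k in range(max(0, m - (N - j)), min(j, m) + 1):
                new[k] += pj * (comb(j, k) * comb(N - j, m - k) / total)
        dist = new
    return dist


def intersection_pvalue(N, sizes, observed):
    """Upper-tail probability P(|intersection| >= observed) under the same null."""
    dist = intersection_distribution(N, sizes)
    return sum(dist[max(observed, 0):])


def expected_overlap(N, sizes):
    """Mean intersection size: each item lies in every set with prob. prod(m / N)."""
    return N * prod(m / N for m in sizes)
```

For example, intersection_distribution(4, [2, 2]) gives the two-set hypergeometric probabilities 1/6, 2/3 and 1/6 for overlaps of 0, 1 and 2, up to floating-point rounding. The step-by-step update needs on the order of n·m² big-integer binomials, which is fine for small examples. For genome-scale backgrounds, a log-space version (for instance, using log-gamma instead of exact binomials) would likely be faster.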

PMID: 26603754
PMCID: PMC4658477
DOI: 10.1038/srep16923
[Indexed for MEDLINE]