Bioinformatics. 2017 Sep 1;33(17):2631-2641. doi: 10.1093/bioinformatics/btx294.

EBT: a statistic test identifying moderate size of significant features with balanced power and precision for genome-wide rate comparisons.

Author information

1 Department of Cell Biology and Genetics, School of Basic Medical Sciences, Shenzhen University Health Science Center, Shenzhen 518060, China.
2 Epigenomics and Computational Biology Lab, Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24060, USA.
3 Department of Critical Care Unit, Peking University Third Hospital, Beijing 100191, China.

Abstract

Motivation:

In genome-wide rate comparison studies, it is challenging to objectively identify an appropriate number of significant features: traditional statistical comparisons without multiple-testing correction can generate a large number of false positives, while multiple-testing correction greatly reduces statistical power.
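The trade-off described above can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers (20,000 features, a per-test level of 0.05) that are not taken from the paper, and uses the Bonferroni correction only as one familiar example of a multiple-testing correction.

```python
# Hypothetical numbers for illustration only; they are not from the paper.
n_features = 20_000   # e.g. genes compared genome-wide
alpha = 0.05          # per-test significance level

# Without correction: if no feature truly differed, each test would still be
# called significant with probability alpha, so this many false positives
# are expected on average.
expected_false_positives = n_features * alpha

# With Bonferroni correction: each test must pass a far stricter threshold,
# which is what costs statistical power.
bonferroni_threshold = alpha / n_features

print(f"expected false positives without correction: {expected_false_positives:.0f}")
print(f"per-test threshold after Bonferroni: {bonferroni_threshold:.1e}")
```

Under these assumed numbers, an uncorrected analysis is expected to yield on the order of a thousand spurious hits, while the corrected threshold is four to five orders of magnitude stricter than 0.05.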

Results:

In this study, we propose a new exact test based on translating the rate comparison into two binomial distributions. With modeled and real datasets, the exact binomial test (EBT) showed an advantage in balancing statistical precision and power, providing an appropriately sized set of significant features for further study. Both correlation analysis and bootstrapping tests demonstrated that EBT is as robust as typical rate-comparison methods, e.g. the χ² test, Fisher's exact test and the binomial test. A performance comparison among machine learning models built with features identified by different statistical tests further demonstrated the advantage of EBT. The new test was also applied to analyze the genome-wide difference in somatic gene mutation rates between lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), the two main lung cancer subtypes, and a list of new markers was identified that could be lineage-specifically associated with carcinogenesis of LUAD and LUSC, respectively. Interestingly, three cilia genes were found to have selectively high mutation rates in LUSC, possibly implying the importance of cilia dysfunction in its carcinogenesis.
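For readers unfamiliar with the baseline methods named above, the following is a minimal sketch of a two-sided Fisher's exact test for comparing one feature's rate between two groups. It is not the EBT algorithm, which the abstract does not specify; the function name `fisher_exact_two_sided` and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (mutated, not mutated).
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total mutated samples across groups
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the mutated samples fall in
        # group 1, given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: a gene mutated in 30 of 200 samples in one subtype
# and in 10 of 200 samples in the other.
p_value = fisher_exact_two_sided(30, 170, 10, 190)
print(f"p = {p_value:.3g}")
```

In a genome-wide setting a test like this would be run once per gene, which is where the choice of significance threshold discussed under Motivation comes into play.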

Availability and implementation:

An R package implementing EBT can be downloaded freely from http://www.szu-bioinf.org/EBT .

Contact:

wangyj@szu.edu.cn.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID: 28472273
DOI: 10.1093/bioinformatics/btx294
[Indexed for MEDLINE]
