An embedded method for gene identification problems involving unwanted data heterogeneity

Meng Lu

doi:10.1186/s40246-019-0228-0

Hum Genomics. 2019 Oct 22;13(Suppl 1):45. doi: 10.1186/s40246-019-0228-0.

Author

Meng Lu¹

Affiliation

¹ Department of Information Management, Tianjin University, Tianjin, China. lvmeng0502@gmail.com.

Abstract

Background: Modern applications such as bioinformatics collect data in various ways, which can easily result in heterogeneous data. Traditional variable selection methods assume that samples are independent and identically distributed, an assumption that does not hold for these applications. Some existing statistical models that can handle unwanted variation were developed for gene identification involving heterogeneous data, but they lack model predictability and suffer from variable redundancy.

Results: By effectively accounting for the unwanted heterogeneity, our method has shown its superiority over several state-of-the-art methods, as validated by experimental results in both unsupervised and supervised gene identification problems. Moreover, we applied our method to a pan-cancer study, where it can identify the most discriminative genes, those that best distinguish different cancer types.

Conclusions: This article provides an alternative gene identification method that can account for unwanted data heterogeneity. It is a promising method for providing new insights into complex cancer biology and clues for understanding tumorigenesis and tumor progression.

Keywords: Embedded variable selection; Gene identification; Unwanted heterogeneity.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Databases, Genetic*
Female
Gene Expression Profiling
Gene Expression Regulation
Gene Ontology
Humans
Male
Neoplasms / genetics
ROC Curve