Int J Bioinform Res Appl. 2014;10(4-5):461-78. doi: 10.1504/IJBRA.2014.062995.

Mapping genomic features to functional traits through microbial whole genome sequences.

Author information

1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA.
2. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA; Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA.
3. Department of Biological Science, University of Notre Dame, Notre Dame, IN 46556, USA.

Abstract

The utility of trait-based approaches for studying microbial communities has recently been recognized. The increasing availability of whole genome sequences provides an opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link genomic features with functional traits. Genes from bacterial genomes associated with different functional traits were grouped into Clusters of Orthologous Groups (COGs) and used as features. The TF-IDF (term frequency-inverse document frequency) technique from the text mining domain was then applied to transform the data to account for the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional-trait-related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
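The pipeline described in the abstract (COG counts per genome, TF-IDF weighting, then feature ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the TF-IDF variant or the feature selection methods, so the sketch assumes the textbook form tf × log(N/df) and substitutes a simple stand-in score (absolute difference in mean TF-IDF weight between trait-positive and trait-negative genomes). The genome names, COG identifiers, counts, and trait labels are all hypothetical toy values.

```python
import math
from collections import Counter


def tfidf(genomes):
    """Weight COG counts by TF-IDF, treating each genome as a 'document'.

    genomes: dict mapping genome id -> Counter of COG id -> gene count.
    Assumed variant: tf = count / total genes in genome, idf = log(N / df).
    """
    n = len(genomes)
    # Document frequency: number of genomes in which each COG occurs.
    df = Counter()
    for counts in genomes.values():
        df.update(cog for cog, c in counts.items() if c > 0)
    weights = {}
    for gid, counts in genomes.items():
        total = sum(counts.values())
        weights[gid] = {
            cog: (c / total) * math.log(n / df[cog])
            for cog, c in counts.items()
            if c > 0
        }
    return weights


def rank_cogs(weights, labels):
    """Rank COGs by a stand-in relevance score (an assumption, see lead-in):
    |mean weight in trait-positive genomes - mean weight in trait-negative|."""
    pos = [g for g in weights if labels[g]]
    neg = [g for g in weights if not labels[g]]
    if not pos or not neg:
        raise ValueError("need at least one genome in each trait class")
    cogs = {cog for w in weights.values() for cog in w}

    def mean(group, cog):
        return sum(weights[g].get(cog, 0.0) for g in group) / len(group)

    scores = {cog: abs(mean(pos, cog) - mean(neg, cog)) for cog in cogs}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda cog: (-scores[cog], cog))


# Hypothetical toy data: four genomes, three placeholder COG ids.
genomes = {
    "g1": Counter({"COG_A": 3, "COG_B": 1}),
    "g2": Counter({"COG_A": 2, "COG_B": 1, "COG_C": 1}),
    "g3": Counter({"COG_B": 2, "COG_C": 2}),
    "g4": Counter({"COG_B": 1, "COG_C": 3}),
}
# Hypothetical trait labels (True = genome exhibits the trait of interest).
labels = {"g1": True, "g2": True, "g3": False, "g4": False}

weights = tfidf(genomes)
ranking = rank_cogs(weights, labels)
print(ranking)
```

In this toy data, COG_B occurs in every genome, so its inverse document frequency is log(1) = 0 and it carries no weight, which is the behaviour that makes TF-IDF useful for discounting ubiquitous housekeeping genes. COG_A occurs only in the trait-positive genomes and therefore ranks first.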

KEYWORDS:

bacteria genomes; bioinformatics; feature mapping; feature selection; functional genomics; functional traits; genes; genome sequences; genomic features; genomic signatures; machine learning; microbial diversity; phenotype–genotype association; sporulation

PMID: 24989863
DOI: 10.1504/IJBRA.2014.062995
[Indexed for MEDLINE]
