Bioinformatics. 2016 Mar 1;32(5):697-704. doi: 10.1093/bioinformatics/btv635. Epub 2015 Oct 30.

CAMUR: Knowledge extraction from RNA-seq cancer data through equivalent classification rules.

Author information

1. Institute of Systems Analysis and Computer Science - National Research Council, 00185, Rome, Italy.
2. Institute of Systems Analysis and Computer Science - National Research Council, 00185, Rome, Italy; Department of Computer, Control, and Management Engineering - Sapienza University, 00185, Rome, Italy.
3. Institute of Systems Analysis and Computer Science - National Research Council, 00185, Rome, Italy; Department of Engineering - Uninettuno International University, Corso Vittorio Emanuele II, 39 - 00186 Rome, Italy.

Abstract

MOTIVATION:

Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this work, we focus on RNA-seq gene expression analysis, and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the genes related to the predicted class.

RESULTS:

We propose CAMUR, a new method that extracts multiple and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set of the genes present in the rules, iteratively eliminates those combinations from the data set, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad hoc knowledge repository (database) and a querying tool. We analyze three RNA-seq data sets (Breast, Head and Neck, and Stomach Cancer) from The Cancer Genome Atlas (TCGA), and we also validate CAMUR and its models on non-TCGA data. Our experimental results show the efficacy of CAMUR: we obtain several reliable equivalent classification models, from which we deduce the most frequent genes, their relationships, and their relation to a particular cancer.
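
The iterative procedure described above can be illustrated with a minimal sketch. It is not the CAMUR implementation: the rule learner (`learn_rule`) is a toy greedy stand-in for a real rule-based classifier, the stopping criterion (an accuracy floor plus an iteration cap) is an assumption, and all function names, parameters, and the toy data are hypothetical.

```python
from itertools import chain, combinations

def predict(rule, sample):
    """A rule is a list of (gene, threshold, greater) conditions, AND-ed together."""
    return all((sample[g] > t) == greater for g, t, greater in rule)

def accuracy(rule, X, y):
    return sum(predict(rule, s) == bool(label) for s, label in zip(X, y)) / len(y)

def learn_rule(X, y, genes, max_conditions=2):
    """Toy stand-in learner: greedily add the condition that most improves accuracy."""
    rule, best = [], accuracy([], X, y)  # the empty rule labels every sample a case
    for _ in range(max_conditions):
        candidates = [
            (accuracy(rule + [(g, s[g], greater)], X, y), g, s[g], greater)
            for g in sorted(genes) for s in X for greater in (True, False)
        ]
        if not candidates:
            break
        acc, g, t, greater = max(candidates)
        if acc <= best:  # no improvement: stop growing the rule
            break
        rule.append((g, t, greater))
        best = acc
    return rule, best

def nonempty_subsets(items):
    """Power set of `items` without the empty set."""
    items = sorted(items)
    return chain.from_iterable(combinations(items, k) for k in range(1, len(items) + 1))

def extract_models(X, y, genes, min_accuracy=0.9, max_iterations=50):
    """Collect alternative rules by removing gene combinations and re-learning."""
    queue = [frozenset(genes)]
    seen_sets, seen_rules, models = set(queue), set(), []
    iterations = 0
    while queue and iterations < max_iterations:  # assumed stopping criterion
        available = queue.pop(0)
        iterations += 1
        rule, acc = learn_rule(X, y, available)
        if not rule or acc < min_accuracy:
            continue  # unreliable model: neither keep it nor expand from it
        if tuple(rule) not in seen_rules:
            seen_rules.add(tuple(rule))
            models.append((rule, acc))
        used = {g for g, _, _ in rule}
        for subset in nonempty_subsets(used):
            reduced = available - set(subset)
            if reduced not in seen_sets:
                seen_sets.add(reduced)
                queue.append(reduced)
    return models

# Toy expression data (hypothetical): G1 and G2 each separate cases from controls.
X = [
    {"G1": 5.0, "G2": 1.0, "G3": 3.0},
    {"G1": 6.0, "G2": 0.5, "G3": 1.0},
    {"G1": 7.0, "G2": 0.8, "G3": 2.0},
    {"G1": 1.0, "G2": 4.0, "G3": 3.0},
    {"G1": 2.0, "G2": 5.0, "G3": 1.0},
    {"G1": 1.5, "G2": 6.0, "G3": 2.0},
]
y = [1, 1, 1, 0, 0, 0]  # 1 = case, 0 = control

models = extract_models(X, y, genes={"G1", "G2", "G3"})
for rule, acc in models:
    print(rule, acc)
```

On this toy data the sketch finds two equivalent single-gene models, one based on G2 and one on G1, whereas a single run of the learner would report only one of them. With rules that use several genes, each non-empty subset of those genes is removed in turn, which is where the power set comes in.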

AVAILABILITY AND IMPLEMENTATION:

dmb.iasi.cnr.it/camur.php

CONTACT:

emanuel@iasi.cnr.it

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 26519501
PMCID: PMC4795614
DOI: 10.1093/bioinformatics/btv635
[Indexed for MEDLINE]
Free PMC Article
