Send to

Choose Destination
J Comput Biol. 2017 Feb;24(2):172-182. doi: 10.1089/cmb.2015.0206. Epub 2016 Aug 10.

Identification of Genes Involved in Breast Cancer Metastasis by Integrating Protein-Protein Interaction Information with Expression Data.

Author information

1 Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University , Shanghai, China .
2 Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University , Shanghai, China .


The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to the gene selection procedures by use of different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related with breast cancer metastasis, which is based on the important scores of two variable selection algorithms, including the mean decrease Gini (MDG) criteria of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm can be called PPIRF. The improved prediction accuracy fully illustrated the reliability and high interpretability of gene list selected by the PPIRF approach.


GeneRank algorithm; PPI information; breast cancer metastasis

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Atypon
Loading ...
Support Center