Interdiscip Sci. 2018 Mar;10(1):169-175. doi: 10.1007/s12539-016-0198-z. Epub 2017 Jan 21.

CAsubtype: An R Package to Identify Gene Sets Predictive of Cancer Subtypes and Clinical Outcomes.

Author information

1. Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, 200240, China.
2. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
3. School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China.
4. Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, 200240, China. jlsun@sjtu.edu.cn.
5. School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China. kaikaixinxin@sjtu.edu.cn.

Abstract

In the past decade, molecular classification of cancer has gained popularity because it predicts clinical outcomes better than the traditional methods commonly used in clinical practice. In particular, recent studies using gene expression profiles have identified a number of gene sets that delineate cancer subtypes associated with distinct prognoses. However, identifying such gene sets remains laborious because existing tools lack flexibility, integration and ease of use. To reduce this burden, we have developed an R package, CAsubtype, to efficiently identify gene sets predictive of cancer subtypes and clinical outcomes. By integrating more than 13,000 annotated gene sets, CAsubtype provides a comprehensive repertoire of candidates for new cancer subtype identification. For easy data access, CAsubtype also includes the gene expression and clinical data of more than 2000 cancer patients from TCGA. CAsubtype first employs principal component analysis to identify gene sets (user-provided or package-integrated) whose robust principal components represent significantly large variation between cancer samples. Based on these principal components, CAsubtype visualizes the sample distribution in low-dimensional space to clarify the distinction between samples, and classifies samples into subgroups with widely used clustering algorithms. Finally, CAsubtype performs survival analysis to compare clinical outcomes between the identified subgroups, assessing their clinical value as potentially novel cancer subtypes. In conclusion, CAsubtype is a flexible and well-integrated tool in the R environment for identifying gene sets for cancer subtype identification and clinical outcome prediction. Its simple R commands and comprehensive data sets enable efficient examination of the clinical value of any given gene set, thus facilitating hypothesis generation and testing in biological and clinical studies.
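
The three steps described in the abstract (principal component analysis on a gene set, clustering on the resulting components, and a survival comparison between the clusters) can be illustrated with a minimal sketch. This is not CAsubtype's code or API: it is written in plain Python rather than R, all function names (`first_pc`, `two_means_1d`, `logrank_chi2`) are hypothetical, and the expression matrix and survival data are invented toy values. As simplifying assumptions, it keeps only the first principal component, uses a one-dimensional 2-means clustering, and compares the two groups with a log-rank test.

```python
import math
import random


def center_columns(X):
    """Subtract each gene's mean across samples (rows are samples, columns are genes)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_pc(X, iters=100, seed=0):
    """Return PC1 sample scores and the fraction of total variance PC1 explains."""
    Xc = center_columns(X)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in Xc[0]]
    for _ in range(iters):
        # One power-iteration step on Xc^T Xc (the unnormalized covariance matrix).
        scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
        w = [sum(s * row[j] for s, row in zip(scores, Xc)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    total = sum(x * x for row in Xc for x in row)
    return scores, sum(s * s for s in scores) / total


def two_means_1d(values, iters=50):
    """Cluster scalar values into two groups (labels 0 and 1) with 2-means."""
    c0, c1 = min(values), max(values)
    labels = []
    for _ in range(iters):
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, l in zip(values, labels) if l == 0]
        g1 = [v for v, l in zip(values, labels) if l == 1]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return labels


def logrank_chi2(times, events, labels):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [l for tt, l in zip(times, labels) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, l in zip(times, events, labels) if tt == t and e and l == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var


# Toy data (invented): 8 samples x 3 genes from one hypothetical gene set.
expression = [
    [8.1, 7.9, 8.3], [7.8, 8.2, 8.0], [8.4, 8.0, 7.7], [7.9, 8.3, 8.2],
    [2.1, 1.9, 2.2], [1.8, 2.3, 2.0], [2.2, 2.0, 1.7], [2.0, 1.8, 2.1],
]
times = [5, 8, 12, 9, 30, 42, 36, 50]   # follow-up time per sample
events = [1, 1, 1, 1, 1, 0, 1, 0]       # 1 = death observed, 0 = censored

scores, explained = first_pc(expression)
labels = two_means_1d(scores)
chi2 = logrank_chi2(times, events, labels)
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail probability, 1 df

print("variance explained by PC1:", round(explained, 3))
print("cluster labels:", labels)
print("log-rank chi2:", round(chi2, 3), "p-value:", round(p_value, 4))
```

Because the sign of a principal component is arbitrary, which group receives label 0 or 1 carries no meaning; only the partition of samples and the survival difference between the two groups matter. In R, the same steps would typically rely on established functions for PCA, clustering and survival analysis rather than hand-written routines.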

KEYWORDS:

Cancer subtype; Clinical outcome; Gene expression profile; Gene signature; R package

PMID: 28110480
DOI: 10.1007/s12539-016-0198-z
[Indexed for MEDLINE]
