Nucleic Acids Res. 2016 May 5;44(8):e71. doi: 10.1093/nar/gkv1507. Epub 2015 Dec 23.

TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.

Author information

1. Interuniversity Institute of Bioinformatics in Brussels (IB), Brussels, Belgium; Machine Learning Group (MLG), Department d'Informatique, Université libre de Bruxelles (ULB), Brussels, Belgium.
2. Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Center for Integrative Systems Biology - CISBi, NAP/USP, Ribeirão Preto, São Paulo, Brazil.
3. Department of Science and Technology, University of Sannio, Benevento, Italy; Unlimited Software srl, Naples, Italy.
4. Institute of Molecular Bioimaging and Physiology of the National Research Council (IBFM-CNR), Milan, Italy.
5. Physics for Complex Systems, Department of Physics, University of Turin, Italy.
6. Department of Science and Technology, University of Sannio, Benevento, Italy; Bioinformatics Laboratory, BIOGEM, Ariano Irpino, Avellino, Italy.
7. Qatar Computing Research Institute (QCRI), HBKU, Doha, Qatar.
8. Interuniversity Institute of Bioinformatics in Brussels (IB), Brussels, Belgium; Machine Learning Group (MLG), Department d'Informatique, Université libre de Bruxelles (ULB), Brussels, Belgium. gbonte@ulb.ac.be
9. Department of Genetics, Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, São Paulo, Brazil; Center for Integrative Systems Biology - CISBi, NAP/USP, Ribeirão Preto, São Paulo, Brazil. houtan@usp.br

Abstract

The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.

PMID: 26704973
PMCID: PMC4856967
DOI: 10.1093/nar/gkv1507
[Indexed for MEDLINE]
Free PMC Article
