BMC Bioinformatics. 2010 Jun 2;11:297. doi: 10.1186/1471-2105-11-297.

A classification model for distinguishing copy number variants from cancer-related alterations.

Author information

Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA. ostrovni@mskcc.org

Abstract

BACKGROUND:

Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (DGV) contains a list of all known CNVs, there is no standard methodology for using the database effectively.

RESULTS:

We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the DGV and several other variables, including a segment's length, height, closeness to a telomere or centromere, and occurrence in other patients. The models are fitted on data from glioblastomas and their corresponding normal samples, which were collected as part of The Cancer Genome Atlas project and hybridized to Agilent 244K arrays.
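
The predictor variables named above can be illustrated with a minimal sketch. The abstract does not state how each variable is computed or which model form is used, so everything here is an assumption made for illustration: the names `Segment`, `overlap_fraction`, `distance_to_landmark`, and `segment_features` are hypothetical, "height" is taken to be the segment's mean log-ratio, DGV information is reduced to the fraction of the segment covered by known CNV intervals, and the centromere is treated as a single position.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), in base pairs


@dataclass
class Segment:
    start: int
    end: int
    height: float  # assumed here to be the mean log-ratio of the segment


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so coverage is not double-counted."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_fraction(seg: Segment, known_cnvs: List[Interval]) -> float:
    """Fraction of the segment covered by known CNV intervals (e.g. from the DGV)."""
    covered = 0
    for start, end in merge_intervals(known_cnvs):
        covered += max(0, min(seg.end, end) - max(seg.start, start))
    return covered / (seg.end - seg.start)


def distance_to_landmark(seg: Segment, chrom_length: int, centromere: int) -> int:
    """Distance in base pairs to the nearest telomere or centromere (0 if spanned)."""
    def gap(pos: int) -> int:
        if seg.start <= pos <= seg.end:
            return 0
        return min(abs(seg.start - pos), abs(seg.end - pos))

    # Telomeres are approximated by the two chromosome ends.
    return min(gap(0), gap(chrom_length), gap(centromere))


def segment_features(
    seg: Segment,
    known_cnvs: List[Interval],
    chrom_length: int,
    centromere: int,
    n_carriers: int,
    n_patients: int,
) -> Dict[str, float]:
    """Assemble one feature row per segment for a downstream classifier."""
    return {
        "length": seg.end - seg.start,
        "height": seg.height,
        "dgv_overlap": overlap_fraction(seg, known_cnvs),
        "landmark_distance": distance_to_landmark(seg, chrom_length, centromere),
        "recurrence": n_carriers / n_patients,  # share of other patients with the change
    }
```

A feature row of this kind, labeled CNV or CNA using matched normal samples, could then be passed to any standard classifier; which classifier the authors used is not stated in the abstract.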

CONCLUSIONS:

Using the DGV alone, CNVs in the test set can be correctly identified with about 85% accuracy if the outliers are removed before segmentation and with 72% accuracy if the outliers are included; additional variables improve the prediction by about 2-3% and 12%, respectively. Final models applied to data from ovarian tumors have about 90% accuracy with all the variables and 86% accuracy with the DGV alone.

PMID: 20525196
PMCID: PMC2897829
DOI: 10.1186/1471-2105-11-297
[Indexed for MEDLINE]
Free PMC Article
