- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- Bioinformatics
- PMC2852218

# CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data

^{1,}

^{*}Li Ding,

^{2}David E. Larson,

^{2}Daniel C. Koboldt,

^{2}Michael D. McLellan,

^{2}Ken Chen,

^{2}Xiaoqi Shi,

^{2}Aldi Kraja,

^{1}Elaine R. Mardis,

^{2}Richard K. Wilson,

^{2}Ingrid B. Borecki,

^{1}and Michael A. Province

^{1}

^{1}Division of Statistical Genomics and

^{2}The Genome Center, Washington University School of Medicine, St Louis, MO, USA

## Abstract

**Motivation:** DNA copy number aberration (CNA) is a hallmark of genomic abnormality in tumor cells. Recurrent CNA (RCNA) occurs in multiple cancer samples across the same chromosomal region and has greater implication in tumorigenesis. Current commonly used methods for RCNA identification require CNA calling for individual samples before cross-sample analysis. This two-step strategy may result in a heavy computational burden, as well as a loss of the overall statistical power due to segmentation and discretization of individual sample's data. We propose a population-based approach for RCNA detection with no need of single-sample analysis, which is statistically powerful, computationally efficient and particularly suitable for high-resolution and large-population studies.

**Results:** Our approach, correlation matrix diagonal segmentation (CMDS), identifies RCNAs based on a between-chromosomal-site correlation analysis. Directly using the raw intensity ratio data from all samples and adopting a diagonal transformation strategy, CMDS substantially reduces computational burden and can obtain results very quickly from large datasets. Our simulation indicates that the statistical power of CMDS is higher than that of single-sample CNA calling based two-step approaches. We applied CMDS to two real datasets of lung cancer and brain cancer from Affymetrix and Illumina array platforms, respectively, and successfully identified known regions of CNA associated with *EGFR*, *KRAS* and other important oncogenes. CMDS provides a fast, powerful and easily implemented tool for the RCNA analysis of large-scale data from cancer genomes.

**Availability:** The R and C programs implementing our method are available at https://dsgweb.wustl.edu/qunyuan/software/cmds.

**Contact:** ude.ltsuw@nauynuq

**Supplementary information:** Supplementary data are available at *Bioinformatics* online.

## 1 INTRODUCTION

DNA copy number aberration (CNA) is a significant phenomenon of amplification and deletion of chromosomal regions in tumor cells. Identification of CNAs may provide important insight into the molecular mechanism of oncogenesis, as well as useful information for the diagnosis and treatment of cancers. A recurrent CNA (RCNA) is a CNA that occurs in multiple patients across the same chromosomal region. In contrast to non-recurrent CNAs, RCNAs are often identified as more significant events with greater implication in tumorigenesis.

In recent years, array-based technologies, such as array comparative genomic hybridization (CGH), single nucleotide polymorphism (SNP) arrays and copy number (CN) arrays, have facilitated the genome-wide studies of CNA. Numerous mathematical methods for CNA identification have been developed (Beroukhim *et al.*, 2007; Diskin *et al.*, 2006; Guttman *et al.*, 2007; Hsu *et al.*, 2005; Hupe *et al.*, 2004; Jong *et al.*, 2004; Lai *et al.*, 2005a; Lai *et al.*, 2005b; Lipson *et al.*, 2006; Nilsson *et al.*, 2009; Olshen *et al.*, 2004; Picard *et al.*, 2005; Rouveirol *et al.*, 2006; Rueda *et al.*, 2009; Shah *et al.*, 2006; Shah *et al.*, 2007). These methods can be roughly categorized into two classes: CNA identification for individual samples and RCNA identification across multiple samples. In contrast to the CNA analysis of individual samples, RCNA analysis involving multiple samples is more difficult and its application still remains limited due to the diversity of CNA patterns among individuals and the computational burden resulting from the high density of signals and large sample size. Published approaches for RCNA analysis include significance testing for aberrant copy number (STAC) (Diskin *et al.*, 2006), geometic family algorithm (GFA) (Lipson *et al.*, 2006), minimal alteration region (MAR) method (Rouveirol *et al.*, 2006), Multiple Chain Hidden Markov Model (HMM) (Shah *et al.*, 2007), genomic identification of significant targets in cancer (GISTIC) method (Beroukhim *et al.*, 2007), multiple sample analysis (MSA) (Guttman *et al.*, 2007), probabilistic recurrent copy number region method A (pREC-A) (Rueda *et al.*, 2009), etc. Despite the use of different algorithms, most of these approaches adopt a two-step strategy that requires signal discretization (smoothing, binarization, segmentation, HMM modeling, cutoff definition, etc.) for individual samples prior to cross-sample analysis. Although discretization provides a useful CNA pattern for individuals, it may result in loss of the raw distribution information during the conversion of continuous signals into discretized data. As a result, the overall statistical power of RCNA detection may be diminished. Furthermore, individual sample analysis, along with cross-sample analysis, may produce a heavy computational burden that could impede the application of these methods in genome-wide studies with high resolution and large sample size. Especially in the near future, the next generation sequencing data from large populations will present more challenges to these methods.

In this article, we propose a computationally efficient and statistically powerful approach, correlation matrix diagonal segmentation (CMDS), which directly analyzes the intensity ratio data to identify RCNAs across a large set of samples, with no prior need of single-sample analysis. First, we introduce the rationale behind CMDS and the procedure for its implementation. Next, we investigate the statistical power of CMDS under a variety of configurations via simulation, and compare CMDS with single-sample analysis-based methods in terms of power and computational time. Finally, we demonstrate the applicability and efficiency of CMDS by analyzing two real datasets of lung cancer and brain cancer from Affymetrix and Illumina high-resolution array platforms.

## 2 METHODS

### 2.1 Proposed method

When a RCNA exists in the same chromosomal region across individuals (Fig. 1A), it produces CN co-variation between neighboring chromosomal sites within the region. As a result, a correlation block can be observed in the CN correlation matrix of chromosomal sites (Fig. 1B). Therefore, RCNAs can be identified by detecting the corresponding correlation blocks along the diagonal of the correlation matrix. Based upon this rationale, we propose the CMDS approach, described below and as illustrated in Figure 1.

**A**) Visualization of CN values of 500 chromosomal sites and 100 samples. White indicates amplification (CN > 2). (

**B**) Visualization of the CN correlation matrix of 500 sites. The white block corresponds to a RCNA

**...**

#### 2.1.1 Input data

As input CMDS takes the CN data of *m* physically ordered chromosomal sites of *n* individual samples, denoted by a *n* × *m* matrix (** X**), in which the element

*x*

_{ij}is the CN value of individual

*i*at chromosomal site

*j*. Usually,

*x*

_{ij}is measured by the intensity ratio (or log

_{2}ratio) between a target sample and reference sample(

*s*).

#### 2.1.2 CN correlation matrix

The first step of CMDS is to calculate Pearson's correlation coefficients between chromosomal sites based on the CN matrix ** X** via the formula:

where *r*_{jk} is the correlation coefficient between sites *j* and *k*; , *s*_{j} and *s*_{k} are CN means and SDs of sites *j* and *k* across *n* individuals. For the convenience of probability calculation, *r*_{jk} is normalized by Fisher's transformation

The *z*_{jk} values of all possible pairs of *m* sites compose a diagonally symmetric *m* × *m* correlation matrix (** Z**), which is the basis of the CMDS analysis. The statistical scenario underlying CMDS is, if no CNA exists across samples (i.e. the variation in

**is resulted only from random experimental noises), the expected**

*X**r*

_{jk}should be 0 (i.e. no correlation between sites) and

**will randomly follow the standard normal distribution (SND). In contrast, if CNA recurs at a chromosomal region across samples, it will cause positive correlation (i.e.**

*Z**r*

_{jk}>0) between the sites within the region. As a result, the expected

*z*

_{jk}values corresponding to the region will be higher than others and result in a square block along the diagonal of

**(Fig. 1B). Thus, RCNA regions can be detected by searching square blocks along the diagonal of**

*Z***.**

*Z*#### 2.1.3 Diagonal transformation

Usually, identification of square blocks in a matrix can be treated as an image recognition problem. However, the size of ** Z** in genome-wide studies could be too large for direct image processing, and image recognition does not provide probability measures for significance testing of such correlation blocks. Since we are only interested in the blocks along the diagonal, it is unnecessary to search the entire matrix. We therefore propose a computationally efficient and easily implemented approach that performs a diagonal transformation (DT) to convert the 2D problem into a 1D problem. Along the diagonal of the correlation matrix

**, DT calculates a RCNA score (**

*Z**R*

_{h}) for each chromosomal site

*h*based on the correlation values of

*b*+1 sites surrounding (and including) the site

*h*:

In the DT formula above, the average of the *z*_{jk} values of *b*+1 sites around the site *h* (*b*/2 left sites, *b*/2 right sites, plus the site *h per se*) are used to measure the degree of RCNA of the site *h*, which is equivalent to sliding a (*b*+1) × (*b*+1) square block along the diagonal of ** Z** and averaging the

*z*

_{jk}values within each block (excluding self correlation values, i.e.

*j*≠

*k*). Since DT only needs to calculate the

*z*

_{jk}values within small blocks (not all the elements in

**), it is very low cost in terms of both computer time and memory.**

*Z*#### 2.1.4 Significance test of the RCNA score

Under the null hypothesis that there is no CNA (i.e. no correlation between chromosomal sites), the RCNA score *R*_{h} will randomly follow a normal distribution with a mean of μ_{R}=0 and a variance of

In real data analysis, μ_{R} and σ_{R}^{2} may deviate from their theoretical values due to artifactual correlations between samples and/or adjacent chromosomal positions, which usually are the results of experimental noise, batch effect, genomic wave effect (Diskin *et al.*, 2008; Marioni *et al.*, 2007), non-recurrent CNA and/or CNA of a whole chromosome or chromosomal arm in some individuals. Therefore, we propose to use the observed mean and variance of *R*_{h}, and , to replace μ_{R} and σ_{R}^{2}, and then test if the *R*_{h} value is different from . The final statistic for the RCNA test of site *h* is:

here *t*_{h} is the standardized *R*_{h} value and follows Student's *t* distribution. Since the *df* of *t*_{h} is large in genome-wide analysis and the *t* distribution is very close to the SND, *P*-values can be obtained from SND. We choose right-tailed probability for this test, because theoretically there is no true negative correlation in ** Z** and the expected

*t*

_{h}value can only be ≥0.

We have investigated the distribution of *R*_{h} values and found that the standardization step is important for real data analysis. Direct testing based on unstandardized *R*_{h} values may lead to a substancial increase of false positives (Supplementary Fig. 1).

### 2.2 Datasets

Both simulated and real datasets were used in this study.

#### 2.2.1 Simulated data

Multiple datasets were simulated for statistical power analysis. Given a set of chromosomal sites and a CN value *c*, we simulated log_{2} intensity ratios based on a normal distribution *N*(μ_{c}, σ^{2}), where μ_{c}=log_{2}(*c*/2) and σ=0.509 (this is an empirical estimate from lung cancer data described later). The length of the CNA region for individual samples was simulated based on a Poisson distribution with mean=3*t*, where *t* is the length of RCNA region (i.e. the overlapping region across individuals). To make the simulation more realistic, we randomly selected 5% of samples from each simulated set and added one non-recurrent CNA region to each sample. The lengths and positions of the non-recurrent regions were generated using the Poisson distribution and uniform distribution, respectively.

#### 2.2.2 Lung cancer dataset

DNA CN data (tumor/normal intensity ratios) for 371 lung cancer (adenocarcinoma) patients were used. This dataset was generated from the Affymetrix Human Mapping 250K STY SNP Array platform in the Tumor Sequencing Project (TSP) and is publicly available at http://www.broad.mit.edu/cancer/pub/tsp/. More details can be found elsewhere (Weir *et al.*, 2007).

#### 2.2.3 Brain cancer dataset

DNA CN data (tumor/normal intensity ratios) for 213 brain cancer (glioblastoma) patients was also used. This dataset was generated from the Illumina Infinium 550K BeadChip platform in The Cancer Genome Atlas (TCGA) project and is publicly available at http://cancergenome.nih.gov. More details can be found elsewhere (The TCGA Research Network, 2008).

### 2.3 Methods for comparison

In order to evaluate the performance of CMDS, we compared it with a typical single-sample calling based RCNA testing approach, STAC (Diskin *et al.*, 2006) and a HMM-based RCNA analysis method, pREC-A (Rueda *et al.*, 2009). Since STAC requires input data in the form of a binary matrix that indicates whether the CN of each chromosomal site for each sample is normal or aberrant (i.e. CNA status), we first segmented the data for individual samples using two different methods, adaptive weights smoothing (AWS) (Hupe *et al.*, 2004) and circular binary segmentation (CBS) (Olshen *et al.*, 2004); then we defined different matrices of binary CNA status using three cutoffs (CN > 2.25, 2.50 and 2.75) and performed cross-sample RCNA tests using STAC. According to the two different methods used in the first step for single-sample analysis, we refer to the STAC analyzes as two different methods, AWS-STAC and CBS-STAC. Statistical power and computational time were compared for CMDS, AWS-STAC, CBS-STAC and pREC-A using simulated data.

## 3 RESULTS

### 3.1 Statistical power of CMDS

The statistical power of CMDS depends on multiple factors, including the block size (*b*) chosen for DT, sample size (*n*), frequency of RCNA in population (*f*), total number of chromosomal sites (*m*) to be tested, length (*t*) and CN (*c*) of the RCNA region, etc. To understand how these factors affect the power, we investigated simulations under a variety of configurations.

Given a set of *n*, *f*, *c*, *m*, *t* and *b*, we simulated log_{2} intensity ratios and performed the CMDS analysis on the simulated data. For each configuration, false positive rate (FPR) and true positive rate (TPR) were counted at different *P*-value cutoffs and then averaged over 500 replications of simulation. The FPR versus TPR plots, also called receiver operating characteristic (ROC) curves, are presented in Figure 2.

**...**

Among the factors determining the power of CMDS, *b* is the only parameter that needs to be specified for the CMDS analysis. Our simulations show that the optimum of *b* for CMDS is around 20 (Fig. 2A). Suboptimal values for *b* will deteriorate the power, because smaller values will produce more noise and larger values will lose sensitivity in the estimation of RCNA score. It is clear from Figure 2B–F that the power of CMDS increases with factors *n*, *f*, *m*, *t* and the CN amplitude of RCNA region (i.e. the absolute value of *c*-2). For instance, for detection of a RCNA region with *c*=3, *f*=0.1, *t*=30 and *m*=1000, the power of CMDS (with *b*=20 and α level=0.01) will increase from ~50% to >90%, if the sample size *n* is quadrupled from 25 to 100.

### 3.2 CMDS versus other approaches

#### 3.2.1 Power

Through 500 replications of simulation under a configuration of *m*=1000, *t*=30, *n*=50, *f*=0.1 and *c*=3 or 4 (3 and 4 are randomly chosen for individual samples with CNAs), we compared the statistical power of CMDS against two single-sample calling based two-step approaches, AWS-STAC and CBS-STAC, and a HMM-based approach, pREC-A. Since different cutoffs (CN >2.25, 2.50 and 2.75) for both AWS and CBS produce minor differences of power in the final analysis of STAC, we only present here the STAC results of applying the cutoff CN > 2.5. The ROC curves (Fig. 3) indicate that CMDS is more powerful than AWS-STAC and pREC-A. Although CBS-STAC shows the highest power when FPR is relatively high, CMDS outperforms it when FPR is less than about 0.002. In the analysis of real high-resolution data, since a FPR level <0.002 is usually required due to the problem of multiple testing and the need to control for false discovery rate (FDR), CMDS will be more powerful in practice.

#### 3.2.2 Execution time

To compare the execution time of CMDS against AWS-STAC, CBS-STAC and pREC-A, we used three existing R packages, RJaCGH 2.0.0, GLAD 1.18 and DNAcopy 1.18 (downloaded from http://www.bioconductor.org) for the pREC-A, AWS and CBS analyzes, respectively, and a Java program STAC1.2 from http://cbil.upenn.edu/STAC for the STAC concurrence test. To analyze a dataset consisting of 10 000 chromosomal sites and 100 samples, our R version of CMDS takes only 13 s, which is about 0.02, 0.3 and 0.1% of the time used by pREC-A, AWS-STAC and ABS-STAC, respectively. For more details about this comparison of execution time, please refer to Supplementary Table 1.

It should be noted that the processing time for STAC is a rough estimation, because it is sensitive to the cutoff used in the AWS or CBS step (here we choose CN > 2.5. STAC would require more time with the use of a smaller cutoff). In addition, STAC is a permutation-based testing approach. We limited the number of permutations to 10 000 in this comparison. In practical analysis of genome-wide data, since *P*-values need to be estimated with higher precision for the purpose of multiple testing adjustment (e.g. Bonferroni correction or FDR calculation), the required permutation number could be larger and the total time of AWS-STAC or CBS-STAC analysis would increase substantially.

### 3.3 Applications

To evaluate the applicability of CMDS, we applied it to the TSP lung cancer data and the TCGA brain cancer data. With an option of b=30 and using a cutoff of FDR < 0.05, CMDS identifies 1406 significant sites (39 chromosomal regions) in lung cancer (Supplementary Table 2) and 3792 significant sites (37 chromosomal regions) in brain cancer (Supplementary Table 3). The most significant regions include 7p11.2, 11q13.3, 12p12.1, 12q15, 14q13.3 for lung adenocarcinoma (Fig. 4) and 1p33, 1p36, 4q12, 7p11.2, 12q14.1, 12q15 for glioblastoma (Fig. 5). These regions are important RCNA regions that have been previously reported and/or validated and encompass (or closely adjacent to) tumor-related genes, such as *EGFR*, *CCND1*, *KRAS*, *MDM2*, *TITF1*, *CDK4*, *PDGFRA*, *CDKN2C* (Weir *et al.*, 2007; The TCGA Research Network, 2008). These results indicate that our approach is applicable to real data.

_{10}(

*p*)-values from the CMDS analysis of the TSP lung cancer data, demonstrating five significant RCNA regions, 7p11.2, 11q13.3, 12p12.1, 12q15 and 14q13.3, where five known cancer genes,

*EGFR*,

*CCND1*,

*KRAS*,

*MDM2*and

*TITF1*, are located.

_{10}(

*p*)-values from the CMDS analysis of the TCGA brain cancer data, demonstrating seven significant RCNA regions, 1p33, 1p36, 1q32.1, 4q12, 7p11.2, 12q14.1, 12q15, where six known cancer genes,

*CDKN2C*,

*MDM4*,

*PDGFRA*,

*EGFR*,

*CDK4*,

*MDM2*, are located.

**...**

Based on the CMDS analysis, we performed direct comparison of the significant CNAs in lung adenocarcinoma and glioblastoma. Our analysis revealed some common focal changes, such as the amplification of *EGFR* (7p12), *CDK4* (12q14) and *MDM2* (12q15), as well as the deletion of CDKN2A in both cancer types. This suggests that those molecules are key players in important pathways during the carcinogenesis of both tumor types. Furthermore, treatments targeting these molecules may be potentially useful for treating multiple cancer types. On the other hand, several focal events are found exclusively in either lung adenocarcinoma or glioblastoma. For example, *KRAS* (12p12) and *CCND1* (11q13) amplification occurs strictly in lung adenocarcinoma, indicating its specific relevance to smoking. In addition, *TITF1* (14q13) amplification is also specifically identified in lung adenocarcinoma and this is consistent with the high level expression of *TITF1* in lung and its developmental role in lung epithelial differentiation. However, *PDGFRA* (4q11–q13), amplified in glioblastoma, shows no sign of CNA in lung adenocarcinoma. With the amount of genomic data for different cancer types increasing rapidly, CMDS will provide a valuable and very efficient tool for this type of cross-comparison.

The analyzes above were performed chromosome by chromosome using our R version of CMDS on a single PC (Dell Optiplex 755 PC, with 3 GHz CPU and 4 GB RAM, under Windows XP and R 2.9.1), which took about 17 and 36 min for analyzing TSP data and TCGA data, respectively, including data file reading, calculations and output of result. Our C version of CMDS on a single LINUX machine took only ~3 and 5 min, respectively. It is also possible to run CMDS on a per-chromosome basis in a parallelized fashion; doing so reduced execution time to a few minutes for the R version and less than a minute for the C version.

## 4 DISCUSSION

Developing CNA analysis methods that are both statistically powerful and computationally efficient is a necessity due to the constant increase in experiment resolution and sample size. We have developed the CMDS approach for identifying RCNAs using genome-wide and population-based data, and demonstrated its power, efficiency and applicability in processing both simulated and real data. In contrast to most existing methods, CMDS directly uses raw CN data and does not require pre-analysis (smoothing and segmentation etc.) of individual samples. To avoid calculating a huge matrix, CMDS adopts a quick DT method to construct a RCNA score, the significance of which can be quickly tested based on a normal distribution. These features make CMDS a more efficient and powerful approach. It provides a fast and easily implementable tool for RCNA analysis and is especially well-suited for analysis of emerging genome-wide studies with larger sample size and higher data resolution.

CMDS reports significant RCNA regions at a population level; however, there is sometimes a need to obtain CNA calls from individual samples for downstream analyzes such as a correlation test between CNA and clinical features. In such a case, single-sample calls (SSC) can be obtained for the significant RCNA regions identified by CMDS, using a threshold or statistical test (e.g. *t*-test or permutation test on CN means of candidate regions). Thus, the information from multiple samples can be used for SSC, and computational time can be substantially reduced by excluding regions with no significant RCNA. Of course, similar to Guttman's conserved genomic aberration detection through the MSA method (Guttman *et al.*, 2007), this multiple sample-based strategy for SSC may lose power for identifying non-recurrent or broad CNA regions.

An important feature of CMDS is that correlations between chromosomal positions are used to measure the CNA concordance in a population. In a recent study (McCarroll *et al.*, 2008), a similar correlation-based approach was used to investigate common copy number variants (CNVs) in the HapMap human populations. This suggests that although CMDS was originally developed for CNA analysis of cancer populations, it may be applicable to data from normal populations. Our preliminary studies indicate that CMDS can be applied not only to array data but also to whole-genome sequencing data with extremely high resolution, to quickly identify common CNV regions in normal populations (Supplementary Fig. 2). Of course, since CNVs in normal populations differ from CNAs in cancer populations in terms of frequency, amplitude, length, etc., further investigation on the statistical property of CMDS for CNV analysis is warranted.

## ACKNOWLEDGEMENTS

We wish to thank the TSP and TCGA research networks for providing the real datasets, and thank John Osborne for downloading and organizing the data. We also thank Dr Feng Gao and Ms Ling Lin for careful testing of the program and constructive feedback.

*Funding*: National Human Genome Research Institute grants (to R.K.W.).

*Conflict of Interest*: None declared.

## REFERENCES

- Beroukhim R, et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA. 2007;104:20007–20012. [PMC free article] [PubMed]
- Diskin SJ, et al. STAC: a method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res. 2006;16:1149–1158. [PMC free article] [PubMed]
- Diskin SJ, et al. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008;36:e126. [PMC free article] [PubMed]
- Guttman M, et al. Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays. PLoS Genet. 2007;3:e143. [PMC free article] [PubMed]
- Hsu L, et al. Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics. 2005;6:211–226. [PubMed]
- Hupe P, et al. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics. 2004;20:3413–3422. [PubMed]
- Jong K, et al. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics. 2004;20:3636–3637. [PubMed]
- Lai WR, et al. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005a;21:3763–3770. [PMC free article] [PubMed]
- Lai Y, et al. A statistical method to detect chromosomal regions with DNA copy number alterations using SNP-array-based CGH data. Comput. Biol. Chem. 2005b;29:47–54. [PubMed]
- Lipson D, et al. Efficient calculation of interval scores for DNA copy number data analysis. J. Comput. Biol. 2006;13:215–228. [PubMed]
- Marioni JC, et al. Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization. Genome Biol. 2007;8:R228. [PMC free article] [PubMed]
- McCarroll SA, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat. Genet. 2008;40:1166–1174. [PubMed]
- Nilsson B, et al. Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution. Bioinformatics. 2009;25:1078–1079. [PubMed]
- Olshen AB, et al. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5:557–572. [PubMed]
- Picard F, et al. A statistical approach for array CGH data analysis. BMC Bioinformatics. 2005;6:27. [PMC free article] [PubMed]
- Rouveirol C, et al. Computation of recurrent minimal genomic alterations from array-CGH data. Bioinformatics. 2006;22:849–856. [PubMed]
- Rueda OM, et al. Detection of recurrent copy number alterations in the genome: taking among-subject heterogeneity seriously. BMC Bioinformatics. 2009;10:308. [PMC free article] [PubMed]
- Shah SP, et al. Integrating copy number polymorphisms into array CGH analysis using a robust HMM. Bioinformatics. 2006;22:e431–e439. [PubMed]
- Shah SP, et al. Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics. 2007;23:i450–i458. [PubMed]
- The TCGA Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068. [PMC free article] [PubMed]
- Weir BA, et al. Characterizing the cancer genome in lung adenocarcinoma. Nature. 2007;450:893–898. [PMC free article] [PubMed]

**Oxford University Press**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (441K)

- Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays.[PLoS Genet. 2007]
*Guttman M, Mies C, Dudycz-Sulicz K, Diskin SJ, Baldwin DA, Stoeckert CJ Jr, Grant GR.**PLoS Genet. 2007 Aug; 3(8):e143.* - CNAnova: a new approach for finding recurrent copy number abnormalities in cancer SNP microarray data.[Bioinformatics. 2010]
*Ivakhno S, Tavaré S.**Bioinformatics. 2010 Jun 1; 26(11):1395-402. Epub 2010 Apr 18.* - Genome-wide identification of significant aberrations in cancer genome.[BMC Genomics. 2012]
*Yuan X, Yu G, Hou X, Shih IeM, Clarke R, Zhang J, Hoffman EP, Wang RR, Zhang Z, Wang Y.**BMC Genomics. 2012 Jul 27; 13:342. Epub 2012 Jul 27.* - Array comparative genomic hybridization copy number profiling: a new tool for translational research in solid malignancies.[Semin Radiat Oncol. 2008]
*Costa JL, Meijer G, Ylstra B, Caldas C.**Semin Radiat Oncol. 2008 Apr; 18(2):98-104.* - Integrated high-resolution genome-wide analysis of gene dosage and gene expression in human brain tumors.[Methods Mol Biol. 2007]
*Juric D, Bredel C, Sikic BI, Bredel M.**Methods Mol Biol. 2007; 377:187-202.*

- Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine[Genome Medicine. ]
*Raphael BJ, Dobson JR, Oesper L, Vandin F.**Genome Medicine. 6(1)5* - Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives[BMC Bioinformatics. ]
*Zhao M, Wang Q, Wang Q, Jia P, Zhao Z.**BMC Bioinformatics. 14(Suppl 11)S1* - Multisample aCGH Data Analysis via Total Variation and Spectral Regularization[IEEE/ACM transactions on computational biol...]
*Zhou X, Yang C, Wan X, Zhao H, Yu W.**IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. 2013; 10(1)230-235* - Comparative Analysis of Methods for Identifying Recurrent Copy Number Alterations in Cancer[PLoS ONE. ]
*Yuan X, Zhang J, Zhang S, Yu G, Wang Y.**PLoS ONE. 7(12)e52516* - Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma[Nature genetics. 2012]
*Krauthammer M, Kong Y, Ha BH, Evans P, Bacchiocchi A, McCusker JP, Cheng E, Davis MJ, Goh G, Choi M, Ariyan S, Narayan D, Dutton-Regester K, Capatana A, Holman EC, Bosenberg M, Sznol M, Kluger HM, Brash DE, Stern DF, Materin MA, Lo RS, Mane S, Ma S, Kidd KK, Hayward NK, Lifton RP, Schlessinger J, Boggon TJ, Halaban R.**Nature genetics. 2012 Sep; 44(9)1006-1014*

- CMDS: a population-based method for identifying recurrent DNA copy number aberra...CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution dataBioinformatics. Feb 15, 2010; 26(4)464PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...