NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Bioinformatics. Author manuscript; available in PMC Apr 1, 2009.
Published in final edited form as:
PMCID: PMC2516369
NIHMSID: NIHMS57142

CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms

Summary

Accurate estimation of DNA copy numbers from array comparative genomic hybridization (CGH) data is important for characterizing the cancer genome. An important part of this process is the segmentation of the log-ratios between the sample and control DNA along the chromosome into regions of different copy numbers. However, multiple algorithms are available in the literature for this procedure and the results can vary substantially among these. Thus, a visualization tool that can display the segmented profiles from a number of methods can be helpful to the biologist or the clinician to ascertain that a feature of interest did not arise as an artifact of the algorithm. Such a tool also allows the methodologist to easily contrast his method against others.

We developed a web-based tool that applies a number of popular algorithms to a single array CGH profile entered by the user. It generates a heatmap panel of the segmented profiles for each method as well as a consensus profile. The clickable heatmap can be moved along the chromosome and zoomed in or out. It also displays the time that each algorithm took and provides numerical values of the segmented profiles for download. The web interface calls algorithms written in the statistical language R. We encourage developers of new algorithms to submit their routines to be incorporated into the website.

Availability: http://compbio.med.harvard.edu/CGHweb

Contact: peter_park/at/harvard.edu

1 INTRODUCTION

Array comparative genomic hybridization (CGH) is a technique for genome-wide measurement of the DNA copy number on a microarray (Pinkel et al., 1998). With the availability of high-resolution tiling arrays, variations in the copy number can be captured with unprecedented accuracy. This technology is most often used to characterize chromosomal instability in the cancer genome, but recent work on 270 individuals from four populations (HapMap collection) has found that natural copy number polymorphisms also exist to a much greater extent than expected (Redon et al., 2006).

A successful array CGH experiment requires several components. First, it is important to obtain a homogeneous sample of interest with an appropriate control. Given the large number of copy number polymorphisms, getting a normal sample from the same patient is ideal. For tumors, it is often difficult to ascertain whether a “normal” sample, typically obtained from a nearby location, is truly normal; in this case, DNA from the blood may have to be used. Second, the hybridization experiment must be carried out properly, on arrays of sufficient resolution. For instance, BAC-based arrays may not be sufficient if the goal is to detect small alterations. The last and frequently the most difficult component is the statistical analysis and interpretation of the resulting data.

2 RESULTS

The main issue in analysis is to segment the sequence of log-ratios along the chromosome into regions of amplification, deletion, or no change. There has been extensive work in this field, with many methods derived from existing techniques in other fields. For instance, the segmentation problem can be reformulated as a change-point problem in statistics (Olshen et al., 2004) or an optimization problem in engineering to be solved by dynamic programming (Autio et al., 2003). Given the wide range of choices, comparative analysis of these methods has been useful for the practitioner who must decide among them (Lai et al., 2005; Willenbrock and Fridlyand, 2005). However, a choice based on such a study does not guarantee that the algorithm being applied is the most appropriate one for a specific data set; it is possible that a feature the user sees in the data is an artifact of that particular algorithm.
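To make the change-point formulation concrete, the following is a minimal sketch, written in Python purely for illustration (CGHweb itself calls R routines). It finds the single split of a log-ratio sequence that minimizes the within-segment squared error. This is not circular binary segmentation or any other algorithm offered by CGHweb; the function names `best_single_split` and `segment_means` and the toy profile are assumptions introduced here.

```python
from typing import List, Tuple


def best_single_split(log_ratios: List[float]) -> Tuple[int, float]:
    """Return (k, cost): splitting into [0:k] and [k:] minimizes squared error.

    Illustrative least-squares change-point search for ONE change point only.
    """
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two probes to place a change point")

    # Prefix sums of x and x^2 let each candidate split be scored in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in log_ratios:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i: int, j: int) -> float:
        # Sum of squared deviations from the mean over probes i..j-1.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(0, k) + sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


def segment_means(log_ratios: List[float], k: int) -> List[float]:
    """Replace each probe by the mean of its segment (the segmented profile)."""
    left = sum(log_ratios[:k]) / k
    right = sum(log_ratios[k:]) / (len(log_ratios) - k)
    return [left] * k + [right] * (len(log_ratios) - k)


# Toy profile (hypothetical values): four unchanged probes, then a gain.
profile = [0.0, 0.1, -0.1, 0.0, 0.9, 1.1, 1.0, 1.0]
k, cost = best_single_split(profile)
smoothed = segment_means(profile, k)
```

Practical methods differ in how they choose the number of change points and how they assess significance, which is one reason their segmented profiles can disagree on the same data.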

The web-based tool we developed alleviates this problem. It takes an input profile from the user and applies up to 10 different algorithms (Fig. 1). The resulting profiles are returned in a heatmap for easy comparison (Fig. 2). The user can then see whether a particular aberration of interest has also been found by the other algorithms. Other features of the software include the following: a heatmap display of gain/loss determined by a user-defined cut-off; a consensus profile (average of all segmented profiles); a tabular summary of the aberrations found; example data sets from BAC, Agilent, and Nimblegen arrays; a bar graph displaying the time taken by each algorithm; buttons to zoom in/out and move along the chromosome; a clickable map that takes the user to the UCSC genome browser for a specific region; and a zipped file containing predicted values at all probes for download. A few web-based interfaces are available for array CGH data, including CAPweb (Liva et al., 2006), ISACGH (Conde et al., 2007), and Asterias (Diaz-Uriarte et al., 2007), but CGHweb provides the most comprehensive list of algorithms and a convenient interface for navigating through the results.
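The consensus profile and the gain/loss display can be sketched as follows, again in Python for illustration only. The consensus is the pointwise average of the segmented profiles, as described above. The symmetric use of the cut-off (gain above `+cutoff`, loss below `-cutoff`), the function names, and the method labels `method_a` and `method_b` are assumptions made here, not details of the CGHweb implementation.

```python
from typing import Dict, List


def consensus_profile(segmented: Dict[str, List[float]]) -> List[float]:
    """Pointwise mean of the segmented profiles returned by several methods."""
    profiles = list(segmented.values())
    if not profiles:
        raise ValueError("at least one segmented profile is required")
    n_probes = len(profiles[0])
    if any(len(p) != n_probes for p in profiles):
        raise ValueError("all profiles must cover the same probes")
    # zip(*profiles) groups the values that every method assigned to one probe.
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def call_gain_loss(profile: List[float], cutoff: float) -> List[int]:
    """Return +1 (gain), -1 (loss) or 0 (no change) for each probe.

    Assumption: the cut-off is applied symmetrically around zero.
    """
    if cutoff < 0:
        raise ValueError("cutoff must be non-negative")
    return [1 if v > cutoff else -1 if v < -cutoff else 0 for v in profile]


# Hypothetical segmented log-ratios from two methods over five probes.
segmented = {
    "method_a": [0.0, 0.0, 0.6, 0.6, -0.6],
    "method_b": [0.0, 0.4, 0.4, 0.4, -0.4],
}
consensus = consensus_profile(segmented)
calls = call_gain_loss(consensus, cutoff=0.3)
```

In this toy input the two methods disagree on the second probe; the averaged value falls below the cut-off, so the consensus makes no call there. Disagreements of this kind are what the heatmap is meant to expose.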

Fig. 1
This front page lists the algorithms along with parameters that can be tuned. Examples from BAC, Agilent, and Nimblegen data can be uploaded easily.
Fig. 2
Results page (zoomed in to a small region). Output from all of the algorithms is shown in a heatmap (top panel); gain and loss are called using a user-specified threshold (bottom panel). Discrepancies among different algorithms can be easily detected. ...

To make this tool a useful resource for the community, our goal is to incorporate as many algorithms as we can. Because it is not possible for any one group to implement all the algorithms, we have defined a function call with a set of arguments (details available on the website), and we encourage developers of new algorithms to create and submit functions according to this specification. We have chosen the R language because it is the most widely used for microarray analysis and wrappers can be written easily for routines in C. Source code is available for those interested in local installation.

3 DISCUSSION AND CONCLUSION

Some analytical issues have not been fully resolved in the literature. For instance, it is well known from gene expression studies that log-ratios derived from low-intensity signals are unreliable and that a local variance correction can ameliorate this problem (Colantuoni et al., 2002). Few CGH algorithms account for this in the segmentation process. The effect of spatial smoothing applied in combination with segmentation also has not been carefully explored. CGHweb, however, leaves it to the user and the algorithm to make any desired transformation of the data. Deriving a consensus profile from multiple samples is an important issue (e.g., Engler et al., 2006; Diskin et al., 2006), but that area is less developed and is not addressed here beyond simple pointwise averaging. Recently, an algorithm based on pointwise averaging was shown to have good performance (Beroukhim et al., 2007), suggesting that it may provide a reasonable solution for balancing the importance of amplitude and frequency of alterations.

The CGHweb interface collects results from multiple algorithms and allows developers to submit their new algorithms. This site makes it possible for a user who is not familiar with programming to obtain a segmentation profile via multiple methods. It also facilitates comparison of a novel method to the existing ones, thus setting a higher standard against which previously untested methods should be measured.

ACKNOWLEDGMENT

This work was funded by NIH through the Cancer Genome Characterization Center grant (1U24 CA126554).

REFERENCES

  • Autio R, et al. CGH-Plotter: MATLAB toolbox for CGH-data analysis. Bioinformatics. 2003;19:1714–1715.
  • Beroukhim R, et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl. Acad. Sci. U.S.A. 2007;104:20007–20012.
  • Colantuoni C, et al. SNOMAD (Standardization and NOrmalization of MicroArray Data): web-accessible gene expression data analysis. Bioinformatics. 2002;18:1540–1541.
  • Conde L, et al. ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res. 2007;35(Web Server issue):81–85.
  • Diaz-Uriarte R, et al. Asterias: integrated analysis of expression and aCGH data using an open-source, web-based, parallelized software suite. Nucleic Acids Res. 2007;35(Web Server issue):75–80.
  • Diskin SJ, et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res. 2006;16:1149–1158.
  • Engler DA, et al. A pseudolikelihood approach for simultaneous analysis of array comparative genomic hybridizations. Biostatistics. 2006;7:399–421.
  • Lai WR, et al. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005;21:3763–3770.
  • Liva S, et al. CAPweb: a bioinformatics CGH array Analysis Platform. Nucleic Acids Res. 2006;34(Web Server issue):477–481.
  • Olshen AB, et al. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004;5:557–572.
  • Pinkel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20:207–211.
  • Redon R, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454.
  • Willenbrock H, Fridlyand J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics. 2005;21:4084–4091.
