Copyright © 2007 Staaf et al; licensee BioMed Central Ltd.

Normalization of array-CGH data: influence of copy number imbalances

Division of Oncology, Department of Clinical Sciences, Lund University, 221 85 Lund, Sweden

Corresponding author. Johan Staaf: johan.staaf/at/med.lu.se; Göran Jönsson: goran_b.jonsson/at/med.lu.se; Markus Ringnér: markus.ringner/at/med.lu.se; Johan Vallon-Christersson: johan.vallon-christersson/at/med.lu.se

Received June 15, 2007; Accepted October 22, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

High-resolution microarray-based comparative genomic hybridization (CGH) techniques have successfully been applied to study copy number imbalances in a number of settings such as the analysis of cancer genomes. For normalization of array-CGH data, methods initially developed for gene expression microarray analysis have, in general, been directly adopted and used. However, these methods are designed to work under assumptions that may not be valid for array-CGH data when copy number imbalances are present. We therefore sought to investigate the effect on normalization imposed by copy number imbalances.

Results

Here we demonstrate that copy number imbalances correlate with intensity in array-CGH data, thereby causing problems for conventional normalization methods. We propose a strategy to circumvent these problems by taking copy number imbalances into account during normalization, and we test the proposed strategy using several data sets from the analysis of cancer genomes.
In addition, we show how the strategy can be applied to conveniently define adaptive sample-specific boundaries between balanced copy number, losses, and gains to facilitate management of variation in tissue heterogeneity when calling copy number changes.

Conclusion

We highlight the importance of considering copy number imbalances during normalization of array-CGH data, and show how failure to do so can deleteriously affect data and hamper interpretation.

Background

Microarray-based techniques for genome-wide investigation of copy number aberrations (CNAs) have recently gained much attention. Initially employing arrays developed for gene expression analysis [1], or low-density arrays produced from large-insert genomic clones such as bacterial artificial chromosomes (BACs) [2], the application has evolved rapidly. Currently, specialized high-density arrays with oligonucleotide probes or probes derived from BAC clones are predominantly used. Two-channel array-based comparative genomic hybridization (aCGH) is a direct successor to conventional metaphase CGH [3]. In both cases, DNA from two samples is differentially labeled with fluorescent dyes and co-hybridized to immobilized genomic capture probes. By use of aCGH, DNA derived from tumor tissue can be compared with reference DNA, e.g., normal whole blood DNA, and genomic imbalances can effectively be investigated. The main advantage of aCGH over conventional CGH is the increased resolution achieved by microarrays with a large number of individual probes, routinely up to hundreds of thousands, covering the entire genome [4]. The power of aCGH has been demonstrated in tumor studies [5-8], as well as in the field of clinical genetics [9], and the basis of the technique is reviewed elsewhere [10].
In essence, relative ratios of copy number between two DNA samples are obtained by comparing the two fluorescent signal intensities for each probe under the assumption that intensities reflect the amount of corresponding genomic DNA in the respective sample. In much the same way as for gene expression microarray analysis, relative ratios must be normalized to account for systematic technical bias while retaining relevant biological changes [11]. Although much effort has been invested in developing methods for analysis of aCGH data, including break-point identification and segmentation [12-14], less attention has been devoted to normalization. For this latter purpose, methods originally developed for gene expression microarray data, such as global-median (Median) and intensity-based lowess (Lowess) normalization, have been adopted [5,6]. Recent reports have evaluated the performance of gene expression normalization strategies when applied to aCGH data and have proposed more specific approaches [15,16]. Although valid concerns about directly adopting existing normalization techniques are expressed, the proposed strategies rely on available conventional methods, and the inherent properties of aCGH data have mainly been used for calibration and validation rather than being incorporated in the strategies. Microarray data is frequently visualized using M-A plots in which the log ratio, referred to as M, is plotted as a function of log mean intensity, referred to as A (Figure 1a).
Using self-self comparisons, in which a sample is compared with itself, it has been observed that other forms of technical bias, e.g., spatial or plate bias, exist that can skew measured M values enough to revoke the validity of the aforementioned normalization methods [17]. Both methods have therefore been implemented in ways that include stratification of M values in groups of data that are individually subjected to the correction. Stratification can be performed based on, e.g., spatial probe location or probe source [17]. The general thought is that stratification will result in groups, i.e., populations, of data in which the validity of the normalization method is upheld. It has also been observed that the assumptions required for conventional normalization methods to work can fail as a result of a true biological distribution of M, e.g., in situations where the majority of probes measure true differences between compared samples [18]. We here highlight a well-known and commonly displayed property of tumor cells, namely the presence of biologically true CNAs. (See Figure 1b.) The proposed strategy can be integrated with any of several existing normalization methods and results in improved data quality. Also, spatial effects resulting in non-biological, but relevant, populations that can bias normalization are handled when calculating corrections. We also note that part of the procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. The proposed normalization strategy, as well as the adaptive sample-specific level scaling, provides powerful and convenient means for improved copy number analysis using aCGH.

Results and Discussion

This study is outlined as follows, with results and discussion presented accordingly. To investigate the influence of copy number imbalances on normalization, we first created a set of mimicked data representing states of an increasing fraction of genomic gain.
Using the mimicked data we demonstrate the effects of gain on normalization using Median and Lowess. We then evaluated an alternative normalization strategy in which data is stratified into separate populations representing gain and balanced copy number, respectively. Whereas mimicked data provide prior knowledge facilitating stratification, most experiments lack this information. Therefore, we developed a method for stratification of data and evaluated the method using previously characterized cases. By applying our procedure for stratification and normalization to tumor specimens on different aCGH platforms we compare performance with standard methods. We investigate the implication of technical spatial effects and propose a strategy for improved normalization. In addition, we evaluate the possibility of applying our method to assess noise levels in data and assign sample-specific thresholds for detection of copy number imbalances.

Normalization of aCGH data using Median

We assumed that aCGH data from samples with a substantial amount of imbalances could be erroneously corrected using Median normalization. This problem is not unexpected and the effect is well known in corresponding cases when gene expression microarray data is normalized [18]. We investigated this issue using aCGH data derived from tiling BAC arrays comparing copy number between DNA from a normal female with karyotype 46, XX and a cell line with 47, XXX [20]. In this case, autosomes are expected to yield log ratio values of M = 0 and the X chromosome is expected to yield log ratio values of M = 0.58, corresponding to XXX/XX. By first removing Y chromosome values and then randomly omitting a varying number of values for autosomes, while retaining all X chromosome values, we could mimic cases with different percentages of gain. In this way we created mimicked data sets with 5, 10, 15, 20, 25, 30, 35, and 40 percent gain, respectively, where 5 percent gain corresponds to not omitting any autosome values.
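The subsampling scheme described above, and the way a growing gain fraction can pull a global median away from the balanced population, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the log ratios are synthetic Gaussian values (standard deviation 0.1) rather than the experimental XXX/XX data, and the helper names `mimic_gain` and `median_normalize` are hypothetical.

```python
import math
import random
import statistics


def mimic_gain(autosome_m, x_m, gain_fraction, rng):
    """Keep all X values and randomly subsample autosome values so that
    X values make up `gain_fraction` of the returned data."""
    n_auto = round(len(x_m) * (1 - gain_fraction) / gain_fraction)
    if n_auto > len(autosome_m):
        raise ValueError("not enough autosome values for this gain fraction")
    return rng.sample(autosome_m, n_auto), list(x_m)


def median_normalize(values, reference):
    """Global median normalization: subtract the median of `reference`."""
    center = statistics.median(reference)
    return [v - center for v in values]


rng = random.Random(0)
expected_gain = math.log2(3 / 2)  # XXX/XX, approximately 0.58

# Synthetic stand-in for raw log ratios (assumption: Gaussian noise, sd 0.1).
# 1000 X values among 20000 total gives 5 percent gain with nothing omitted.
autosomes = [rng.gauss(0.0, 0.1) for _ in range(19000)]
x_chrom = [rng.gauss(expected_gain, 0.1) for _ in range(1000)]

x_medians = {}
for pct in (5, 10, 20, 30, 40):
    auto, x = mimic_gain(autosomes, x_chrom, pct / 100, rng)
    x_norm = median_normalize(x, auto + x)
    x_medians[pct] = statistics.median(x_norm)
    print(pct, round(x_medians[pct], 3))
```

In this toy setting the median of the normalized X values drifts downward as the gain fraction grows, because the global median is drawn toward the gained population. The magnitude depends entirely on the assumed noise level and should not be read as a reproduction of the reported results.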
Data sets were created from raw data and then subjected to normalization using Median. After normalization we investigated ratios for autosomes and the X chromosome (Figure 2).
Genomic imbalances correlate with intensity in aCGH data

Importantly, when creating the mimicked data sets we did not generate any simulated ratio values; rather, we formed different selections of values using real experimental data. We believe that this use of real experimental data is of significance for aCGH data. This belief rests on the fact that, in contrast to expression levels, copy number levels are restricted to a comparatively moderate dynamic range. Therefore, when a genomic region is subjected to gain or amplification, the increase of genomic material is relatively substantial. Thus, we reasoned that probes for regions of gain would yield comparably higher average intensities than those for regions of normal copy number and that this, in turn, would result in a correlation between M and A: probes measuring ratios of gain will have higher average intensities. The opposite relationship would apply for probes measuring ratios of loss. Consequently, normalization strategies based on Lowess could correct for correlations between M and A that are related to genomic imbalances, resulting in loss of biologically relevant variation. To test this, we subjected the mimicked XXX/XX data sets to Lowess normalization. Once again, as a result of an increased fraction of gain, the median M for the X chromosome is shifted, this time from 0.42 to 0.22. The shift can also be observed when looking at autosomes, for which the median M is shifted from -0.01 to -0.14 (Figure 2d).
Normalization of aCGH data using population-based intensity-based lowess

We sought to develop a method that corrects for intensity dependence of M due to technical bias while retaining intensity dependence of biological relevance. We reasoned that if we could stratify aCGH ratios from an experiment with respect to copy number populations, we could use this information to circumvent the drawbacks of Lowess. One way to do this would be to run Lowess on one selected population and then apply the resulting correction line to all M values. We refer to this general strategy of considering copy number populations when using Lowess as population-based intensity-based lowess (popLowess). Applying popLowess would serve two purposes. Firstly, data would be centered at a copy number population rather than a mean or median of a mixture of different and possibly diverse copy number levels. Secondly, correlations between M and A related to technical bias would be identified and corrected for without affecting the intensity dependence due to different copy numbers. To test this strategy, we subjected the mimicked XXX/XX data sets to popLowess. Since we had prior knowledge about this case we could stratify values into copy number populations based on chromosome mapping. All values for autosomes were considered to comprise one population and all values from the X chromosome another. After stratification, raw M and A values for the largest population were used to create a Lowess correction curve. The correction curve was generalized to cover the entire range of A and used to correct all values. Results are presented in Figure 4.
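The idea of fitting the correction curve on one population and applying it to all values can be sketched as follows. This is a simplified illustration, not the published implementation: the smoother is a plain, non-robust lowess (no robustness iterations), the curve is extended beyond the reference population by clamping A to that population's range (an assumption, since the text does not say how the curve is generalized), and the data, the linear bias, and the function names are all hypothetical.

```python
import random
import statistics


def lowess_curve(a_ref, m_ref, a_eval, frac=0.33):
    """Non-robust lowess: tricube-weighted local linear fit of M on A,
    evaluated at each value in a_eval."""
    k = max(2, int(frac * len(a_ref)))
    curve = []
    for a0 in a_eval:
        # Bandwidth: distance to the k-th nearest reference point,
        # slightly enlarged so that the total weight is never zero.
        h = sorted(abs(ai - a0) for ai in a_ref)[k - 1] * (1 + 1e-9) + 1e-12
        sw = swa = swm = swaa = swam = 0.0
        for ai, mi in zip(a_ref, m_ref):
            u = abs(ai - a0) / h
            if u < 1.0:
                w = (1.0 - u ** 3) ** 3
                sw += w
                swa += w * ai
                swm += w * mi
                swaa += w * ai * ai
                swam += w * ai * mi
        denom = sw * swaa - swa * swa
        if abs(denom) < 1e-12:  # degenerate window: use the weighted mean
            curve.append(swm / sw)
        else:
            slope = (sw * swam - swa * swm) / denom
            curve.append((swm - slope * swa) / sw + slope * a0)
    return curve


def pop_lowess(a, m, in_population, frac=0.33):
    """Fit the curve on the selected population only, then correct all M."""
    ref_a = [ai for ai, keep in zip(a, in_population) if keep]
    ref_m = [mi for mi, keep in zip(m, in_population) if keep]
    lo, hi = min(ref_a), max(ref_a)
    clamped = [min(max(ai, lo), hi) for ai in a]  # assumed generalization
    curve = lowess_curve(ref_a, ref_m, clamped, frac)
    return [mi - ci for mi, ci in zip(m, curve)]


def bias(a_value):
    """Assumed technical intensity bias, linear in A for simplicity."""
    return 0.3 * (a_value - 3.0)


rng = random.Random(1)
auto_a = [rng.uniform(2.5, 4.0) for _ in range(400)]
x_a = [rng.uniform(3.0, 4.0) for _ in range(100)]  # gain: higher mean A
a = auto_a + x_a
m = ([bias(ai) + rng.gauss(0.0, 0.05) for ai in auto_a] +
     [0.585 + bias(ai) + rng.gauss(0.0, 0.05) for ai in x_a])
is_autosome = [True] * 400 + [False] * 100

pop = pop_lowess(a, m, is_autosome)
plain = pop_lowess(a, m, [True] * 500)  # ordinary Lowess: fit on all values
auto_pop = statistics.median(pop[:400])
x_pop = statistics.median(pop[400:])
x_plain = statistics.median(plain[400:])
print(round(auto_pop, 3), round(x_pop, 3), round(x_plain, 3))
```

In this synthetic case, fitting on the autosome population alone leaves the gained population close to its expected log ratio, whereas fitting on all values compresses it toward zero. The robust iterations and smoothing settings used in the study are described under Methods.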
Stratification of M values into copy number populations

We aimed to develop a method for stratifying data into populations without prior knowledge regarding copy number, allowing us to perform popLowess, and sought to identify populations in an automated fashion that requires minimal manual input and adapts to varying noise levels. To accomplish this, we took advantage of the naively simplistic form of aCGH data, with a predetermined sequential genomic order of probes, and created a procedure described schematically in Figure 5.
To test the performance of our stratification procedure in identifying copy number populations, we used a sample set (data set 8) containing eight hyperdiploid childhood acute lymphoblastic leukemia (ALL) cases previously investigated with aCGH, G-banding and M-FISH [21]. All cases show multiple whole chromosome gains and some cases also show minor chromosomal regions of gain. For each case, a population of genomic regions affected by copy number gain was identified based on available karyotyping data. Remaining regions were identified as a diploid population. We performed steps 1–5 of the popLowess stratification procedure on each case using a merge cluster criterion of M = 0.3. Effectively, two popLowess populations were obtained for each case, corresponding closely to the karyotyping data of a normal diploid population and a population of copy number gain. For both the gain and diploid popLowess populations, the total number of called probes divided by the expected total number of probes from karyotyping data was calculated. Furthermore, the fraction of probes correctly called by popLowess for the specific regions of gain defined by karyotyping was calculated. The results demonstrate that the procedure can effectively stratify data into enriched populations that represent discrete copy number levels (Table 1).
A procedure for normalization of aCGH data using popLowess

Once data is stratified into sets of enriched copy number populations we can select one, e.g., the largest, to perform Lowess normalization on. The generated correction curve must be generalized to cover the full range of A, allowing for correction of all M values (Figure 6e).

Selecting a population to represent intrinsic copy number

The normalization procedure presented herein will center a population with unknown copy number at M = 0. The rationale for selecting an appropriate population for this purpose can differ depending on the samples analyzed and the aim of a project. For instance, in the field of cytogenetics, gains and losses in tumors are by convention described as net changes relative to intrinsic balanced copy number, i.e., relative ploidy. As the number of centromeres determines ploidy, a parallel rationale would be to relate imbalances to the largest identified population and therefore center this population at M = 0. However, in some applications it might be more appropriate to relate imbalances to a normal diploid state. Thus, selecting a population to center data at can include using prior knowledge about regions with known copy number or selecting the middle population out of three, if present. Irrespective of how data are best centered, the proposed popLowess procedure will alleviate the normalization problems related to mixed copy number populations. Importantly, when performing focused aCGH with specialized arrays that do not cover the entire genome, or that comprise probes with a disproportionate focus on specific genomic regions, even CNAs that affect a minor part of the genome can introduce a significant correlation between copy number and intensity, and can result in misinterpretations of how a given ratio level relates to copy number.
Application to tumor specimens on different aCGH platforms

We next set out to test the proposed popLowess strategy on tumor aCGH data that display a more complex pattern of genomic imbalances and to test its performance on data derived from different array platforms. (See Figure 7a.)
In order to illustrate the differences between alternative popLowess strategies we used variants to derive correction lines (Figure 8).
Comparison of popLowess strategy to standard normalization methods

We set out to test whether the popLowess strategy could systematically reduce variation in M within copy number populations in different aCGH data sets. We hypothesized that when correction curves cross, or do not accurately track, copy number populations, or when intensity-based curvature is not properly addressed, a larger variation in M is obtained after normalization. To this end, we compared the performance of the popLowess strategy versus Median and Lowess using seven different aCGH data sets (data sets 1–6, 8). The data sets cover three different types of aCGH platforms hybridized with a variety of cell line and tumor samples displaying a large variation of CNAs. We used the strategy in Figure 5. Results from the comparison are displayed in Table 2, showing that the popLowess strategy generated normalized copy number data with smaller standard deviations in M within identified populations for all comparisons and data sets. We repeated the test using the inter-quartile range (IQR) of M for each population instead of the standard deviation and obtained similar results (data not shown).
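The two spread measures used in this comparison, the standard deviation and the IQR of M within each population, can be computed as in the following minimal sketch. The function name, the population labels, and the input values are hypothetical, and the quartile convention is that of Python's `statistics.quantiles` default ("exclusive") method, which may differ slightly from the convention used in the study.

```python
import statistics


def spread_by_population(m_values, labels):
    """Return the standard deviation and inter-quartile range of M
    for each population label."""
    groups = {}
    for m, label in zip(m_values, labels):
        groups.setdefault(label, []).append(m)
    result = {}
    for label, values in groups.items():
        q1, _, q3 = statistics.quantiles(values, n=4)
        result[label] = {"sd": statistics.stdev(values), "iqr": q3 - q1}
    return result


# Hypothetical normalized log ratios for two populations.
m_values = [0.0, 0.1, -0.1, 0.05, -0.05, 0.5, 0.6, 0.7, 0.55, 0.65]
labels = ["balanced"] * 5 + ["gain"] * 5
spread = spread_by_population(m_values, labels)
for label, stats in spread.items():
    print(label, round(stats["sd"], 4), round(stats["iqr"], 4))
```

Running the same function on the output of each normalization method, with identical population labels, gives a per-population comparison of the kind summarized in Table 2.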
Since we do not have prior knowledge of CNAs in most of the cases we cannot evaluate variation within confirmed genomic regions of similar copy number. Therefore, one could argue that the better performance of popLowess, resulting in lower variation within populations when compared with conventional normalization, is biased by the fact that populations are inferred from the data. However, from looking at the data in Table 1, and at the genome plots in Figure 7

Spatial effects

Presence of technical artifacts in array data resulting in correlation between M and spatial probe location on the array is a well-known and previously described phenomenon. We focused on two plausible consequences of such spatial effects in aCGH data. Firstly, affected values can introduce populations that compromise normalization in the same way as copy number populations. Secondly, affected values will be incorrectly scaled compared to non-affected values. We reasoned that ratios biased by spatial artifacts are controlled for by our proposed popLowess strategy as it filters outlier data guided by genomic mapping. Thus, when calculating an intensity-dependent correction for normalization, our strategy would not be compromised by spatial bias as affected values are disregarded together with values from break points, high-level amplifications, and homozygous deletions. On the other hand, popLowess does not correct for spatial effects and affected values would remain incorrectly scaled after normalization even if the intensity bias is removed. As the proposed popLowess strategy does not correct for spatial effects, we reasoned that a pre-normalization step might be appropriate for data displaying spatially related bias in order to properly scale affected values. This could be accomplished by applying one of many available spatial correction methods [15-17], or variations thereof, prior to popLowess.
However, since we have shown that genomic imbalances correlate with intensity, we are cautious about addressing spatial effects using pre-normalization algorithms that are intensity-based. To test our reasoning we applied popLowess to data set 7. Samples in this set have little to no genomic alterations but the data display variation in M-A curvature and spatial effects. Data set 7 was normalized using popLowess, block-based Median followed by popLowess, or block-based Lowess followed by popLowess. For popLowess, by itself or in combination with a pre-normalization step, a merge cluster criterion of 0.3 in M was employed to account for the presence of only two copy number populations. As a measurement of spatial effects we calculated the standard deviation of medians of M from pin-tip blocks before and after normalization. We found that spatial bias may be corrected for by a pre-normalization step preceding popLowess (Table 3).
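The spatial-effect measure described above, the standard deviation of per-block medians of M, is straightforward to compute. The sketch below also includes a block-based median correction as one possible pre-normalization step. The function names and the toy values are hypothetical, and block identifiers are assumed to be available for every spot.

```python
import statistics


def block_medians(m_values, block_ids):
    """Median of M within each pin-tip block."""
    blocks = {}
    for m, block in zip(m_values, block_ids):
        blocks.setdefault(block, []).append(m)
    return {block: statistics.median(vals) for block, vals in blocks.items()}


def spatial_bias_score(m_values, block_ids):
    """Standard deviation of the block medians (needs at least two blocks)."""
    return statistics.stdev(block_medians(m_values, block_ids).values())


def block_median_normalize(m_values, block_ids):
    """Block-based median normalization: center each block at M = 0."""
    medians = block_medians(m_values, block_ids)
    return [m - medians[block] for m, block in zip(m_values, block_ids)]


# Toy example: block 1 is shifted upward relative to block 2.
m_values = [0.1, 0.2, 0.3, -0.1, 0.0, 0.1]
block_ids = [1, 1, 1, 2, 2, 2]
before = spatial_bias_score(m_values, block_ids)
after = spatial_bias_score(block_median_normalize(m_values, block_ids), block_ids)
print(round(before, 4), round(after, 4))
```

A score near zero after pre-normalization indicates that block-level offsets have been removed. It says nothing about intensity-dependent bias, which is what the subsequent popLowess step addresses.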
We conclude that the proposed popLowess strategy is robust in the sense that it can handle the presence of otherwise deleterious populations without relying on them. We also conclude that, whereas popLowess is inert to spatial effects, in the sense that they do not compromise calculation of an intensity-dependent correction, a pre-normalization step that corrects for spatial bias is warranted.

Adaptive sample-specific thresholds for calling copy number change

During development of the popLowess strategy, we recognized that the sample-specific cut-off value (Figure 5) A parallel can be made to the derivative log ratio spread (DLR) value calculated by the Agilent CGH Analytics software. The DLR value can be used to assess hybridization quality and provide a sample-scalable threshold for calling CNAs using, e.g., the Z-scoring algorithm in the CGH Analytics software. We used sample-specific level thresholds derived from popLowess on aCGH data for a BRCA1 mutation-positive tumor analyzed on two array platforms (Figure 9).
Normalization affects downstream analysis

To exemplify how normalization can affect downstream analysis and interpretation we used data generated from the Agilent array presented in Figure 7.
Conclusion

We show that the presence of copy number populations in aCGH data deleteriously affects normalization using curve-generating algorithms such as intensity-based lowess and may cause erroneous centering of data. We demonstrate that genomic imbalances correlate with intensity in aCGH data and therefore must be accounted for during normalization in order to correct for intensity dependence of M due to technical bias while retaining intensity dependence of biological relevance. Here we propose a population-based normalization strategy that accounts for the presence of copy number populations. We show that benefits of a population-based normalization approach are clearly recognized for data displaying numerous CNAs. We also demonstrate that the proposed procedure can be applied to assign adaptive sample-specific thresholds for calling copy number changes. We appreciate that the suggested strategy represents only one conceivable way of implementing population-based normalization and that any implementation that effectively discerns copy number populations in aCGH data, whether utilizing prior knowledge regarding samples or inference from the data itself, could be used. In addition, once copy number populations are identified, this information can be used in a variety of ways to circumvent highlighted problems related to conventional normalization of aCGH data. Taken together, we demonstrate that copy number populations in aCGH data should be accounted for during normalization and that the proposed normalization strategy, as well as the adaptive sample-specific level scaling, provides powerful and convenient means for improved copy number analysis using aCGH.

Methods

Data sets

We used eight data sets derived from BAC arrays and from Agilent 244 K oligonucleotide CGH arrays to evaluate normalization methods. Data set 1 consists of seven breast cancer cell lines analyzed using tiling 32 K BAC arrays [23].
Data set 2 consists of 28 lung cancer cell lines analyzed using tiling 32 K BAC arrays [25]. Data set 3 consists of ten breast cancer cell lines analyzed using tiling 32 K BAC arrays [20]. Data set 4 consists of 52 breast cancer tumors analyzed in dye-swaps on 1 Mb BAC arrays [8]. Data set 5 consists of 8 breast cancer tumors and one dye-swap analyzed using Agilent 244 K oligonucleotide CGH arrays [26]. These tumors displayed DLR values between 0.196 and 0.364 when analyzed with Agilent CGHAnalytics software ver 3.4.27 [26]. Data set 6 was created from data set 5 by matching the oligonucleotide probe IDs from the 244 K arrays to the Agilent 44B probe IDs available through Agilent eArray [27], thus creating a virtual 44 K oligonucleotide CGH array. Of 42,447 genome-mapped probe IDs on the 44B array, 41,599 were found on the 244 K arrays (98%). Data set 7 consists of nine hybridizations of chromosome X aberrant cell lines with karyotype 47, XXX and 48, XXXX, and male 46, XY and female 46, XX samples in various combinations [20]. Samples in data set 7 are expected to display a normal karyotype for chromosomes 1–22. Data set 8 consists of eight hyperdiploid childhood ALL cases analyzed using tiling 32 K BAC arrays [21].

Pre-filtering and conventional normalization of aCGH data

All data sets were loaded into BioArray Software Environment (BASE) [28] for analysis. Positive and non-saturated spots were background corrected using the median foreground minus the median background signal intensity for each channel, and log ratios (M) were calculated from the background-corrected intensities. In all analyses we used M = log2(int1/int2) and A = log10(sqrt(int1*int2)), where int1 and int2 are background-corrected intensities from the investigated sample and reference, respectively.
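As a worked example of the two definitions above, the following minimal sketch computes M and A for a pair of background-corrected intensities. The function name and the intensity values are hypothetical; the formulas are those stated in the text, with base 2 for M and base 10 for A.

```python
import math


def log_ratio_and_intensity(int1, int2):
    """Return (M, A) for background-corrected intensities of the
    investigated sample (int1) and the reference (int2)."""
    if int1 <= 0 or int2 <= 0:
        raise ValueError("intensities must be positive after background correction")
    m = math.log2(int1 / int2)
    a = math.log10(math.sqrt(int1 * int2))
    return m, a


# A probe in a region with three copies versus two (for example XXX/XX),
# assuming intensity is proportional to copy number.
m_gain, a_gain = log_ratio_and_intensity(3000.0, 2000.0)
# A probe in a balanced region.
m_bal, a_bal = log_ratio_and_intensity(1000.0, 1000.0)
print(round(m_gain, 3), round(a_gain, 3), m_bal, a_bal)
```

Spots whose corrected intensity is zero or negative in either channel have no defined M or A under these formulas, which is one reason such spots are filtered before normalization.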
Data sets 1–4 and 7–8 were filtered for signal-to-noise ratio for each spot in both channels according to published reports, and the remaining data sets for signal-to-noise ratio > 5 in both channels, before BASE-implemented software plug-ins of the different normalization strategies were employed. A lowess smooth factor of 0.33, delta of 0.1, and four iterations were used for standard Lowess, popLowess and block-based lowess normalization. Block group size was set to 1 for all block-based normalizations.

Population-based intensity-based lowess

A schematic overview of the proposed popLowess normalization strategy is shown in Figure 5.

Comparison of normalization methods

For comparisons, the R-implemented lowess function was used to create lowess-normalized data. For each identified population (steps 1–5, Figure 5)

Sample adaptive gain/loss thresholds

Sample adaptive thresholds for calling gain or loss can be generated by performing steps 1–3 in Figure 5

Availability and requirements

An implementation of popLowess in R (http://www.r-project.org) is available both as a plugin to the BioArray Software Environment (BASE) [28] and as a stand-alone version.

Project name: popLowess
Project home page: http://baseplugins.thep.lu.se/wiki/se.lu.onk.popLowess
Operating system(s): Platform independent
Programming language: R
License: GNU GPL

List of abbreviations

aCGH: array-based CGH
ALL: acute lymphoblastic leukemia
BAC: bacterial artificial chromosome
BASE: BioArray Software Environment
CGH: comparative genomic hybridization
CNA: copy number aberration
CNV: copy number variation
FISH: fluorescence in situ hybridization
IQR: inter-quartile range
Lowess: global intensity-based lowess normalization
Median: global median normalization
popLowess: population-based intensity-based lowess normalization
SKY: spectral karyotyping technique

Competing interests

The authors declare that there are no competing interests.
Authors' contributions

All authors participated in the development of the model. JS implemented and developed the methods. JS and MR performed the statistical tests. JVC conceived the study. JS and JVC drafted the manuscript. All authors participated in the design of the study and in completing the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We wish to thank Patrik Edén and Mattias Höglund for helpful comments on the manuscript. This work was supported by the Knut and Alice Wallenberg Foundation via the SWEGENE program (JS and JVC), the Swedish Cancer Society (GJ), the American Cancer Society (GJ and JVC), John och Augusta Perssons stiftelse (GJ and JVC), and the Swedish Foundation for Strategic Research through CREATE Health – the Lund Strategic Centre for Clinical Cancer Research (MR).

References