Copyright © 2007 Lai et al; licensee BioMed Central Ltd.

SIRAC: Supervised Identification of Regions of Aberration in aCGH datasets

1 Bioinformatics group, Delft University, Delft, The Netherlands
2 The Netherlands Cancer Institute, Amsterdam, The Netherlands
Corresponding author. #Contributed equally.

Carmen Lai: c.lai/at/tudelft.nl; Hugo M Horlings: h.horlings/at/nki.nl; Marc J van de Vijver: m.vd.vijver/at/nki.nl; Eric H van Beers: e.v.beers/at/nki.nl; Petra M Nederlof: p.nederlof/at/nki.nl; Lodewyk FA Wessels: l.wessels/at/nki.nl; Marcel JT Reinders: m.j.t.reinders/at/tudelft.nl

Received December 15, 2006; Accepted October 30, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
Array comparative genomic hybridization (aCGH) provides information about genomic aberrations. Alterations in DNA copy number may cause the cell to malfunction, leading to cancer. Therefore, the identification of DNA amplifications or deletions across tumors may reveal key genes involved in cancer and improve our understanding of the underlying biological processes associated with the disease.

Results
We propose a supervised algorithm for the analysis of aCGH data and the identification of regions of chromosomal alteration (SIRAC). We first determine the DNA-probes that are important to distinguish the classes of interest, and then evaluate, in a systematic and robust manner, whether these relevant DNA-probes are closely located, i.e. whether they form a region of amplification/deletion. SIRAC does not need any preprocessing of the aCGH datasets and requires only a few intuitive parameters.

Conclusion
We illustrate the features of the algorithm with a simple artificial dataset.
The results on two breast cancer datasets show promising outcomes that are in agreement with previous findings, but SIRAC better pinpoints the dissimilarities between the classes of interest.

Background

Genomic alterations in DNA copy number are important events in cancer development [1]. A tumor suppressor gene can be disabled by the physical loss of the gene, or, similarly, an oncogene may be over-expressed via the amplification of the region where it is located. The identification of chromosomal aberrations is, therefore, a powerful instrument in cancer studies. It may suggest target genes for new drugs or shed light on the mechanisms that regulate the response to therapies [2-4].

The first approach to searching for copy number alterations, CGH, was introduced by Kallioniemi et al. [5] using metaphase chromosomes. Later extensions of this technique employ array technology to perform a high-resolution scan of the genome. As reviewed by Pinkel et al. [3], several array CGH (aCGH) techniques have been developed. Spotting technologies make use of BAC clones (100 – 200 kb), cDNA clones (~100 – 1000 bp) and, lately, oligonucleotides (30 – 100 bp). In-situ technologies, developed more recently, synthesize short oligonucleotides directly onto the array. Since these oligonucleotides can be only a few tens of bp long, higher resolution can be reached, provided that the genome is well covered.

An important challenge in analyzing aCGH data is to find the aberrated chromosomal regions specific to the problem under study, e.g. to distinguish between subtypes of cancer. Three groups of approaches to this goal can be found in the literature.

The first group of approaches uses only the aCGH data. These methods first identify the amplifications/deletions in each sample individually, and then search for the aberrations common to the samples. The per-sample identification of chromosomal regions of aberration is a task in itself that has been approached in several ways.
The simplest solution is to apply a threshold: the DNA-probes (BAC clones, cDNA clones or oligonucleotides) that exceed the threshold are considered amplified/deleted [6-9]. The choice of the threshold is critical. Moreover, threshold methods do not take into account the spatial location of the DNA-probes. Since amplicons (i.e. regions that are amplified in a sample) are commonly assumed to involve more than a single DNA-probe, the spatial position is an important factor.

Several more complex algorithms have been developed to identify the aberrated regions per sample in a more robust way. Lai et al. [10] reviewed eleven different methods available in the literature. Numerous segmentation methods have been proposed to divide the aCGH profile into piecewise constant segments, using a likelihood function to estimate the model parameters from the data. For example, Picard et al. [11] modeled the aCGH profile with a random Gaussian process and introduced an adaptive penalized likelihood to estimate the segments and their locations. Jong et al. [12,13] proposed a genetic algorithm to maximize the likelihood function. A different approach was introduced by Wang et al. [14], who identified the regions of amplification/deletion via hierarchical clustering along the chromosome.

The biologically relevant aberrations are not those that characterize a single sample, since these can be the consequence of the genomic instability of the particular tumor. The more interesting aberrations are those shared by many samples, ideally by all samples in the same class. Previous studies combined the per-sample aberrations by looking at the frequency of patients that carry each aberration [6,8,14-17]. Again, a threshold on the minimal frequency must be chosen. For example, Fridlyand et al.
[15] require the aberrations to be present in more than 50% of one class and less than 30% of the second class, whereas Hyman et al. [16] require the aberration to be present in at least two specimens. These approaches have in common that the class information is taken into account only in the second stage of the analysis, i.e. when computing the aberration frequency across the samples. In the first phase, aberrations common to several classes are also considered, even if they are not of interest for the study. This introduces an extra parameter when evaluating the significance of the aberrations for distinguishing the classes of interest. Recently, Diskin et al. [18] proposed a more complex and systematic way to evaluate the significance of aberrations across samples. However, their method requires the input data to be discretized per sample into amplifications and deletions. This step can be performed using one of the methods mentioned above, but it makes the results dependent on the particular discretization approach chosen.

A second group of approaches detects aberrations across samples using only gene expression data together with the chromosomal location of the genes. The assumption is that an amplification directly affects the expression of the genes it contains, so the genes in that region should show a detectable common over-expression. Similarly, the genes located in a deletion should show a detectable under-expression. Furge et al. [19] applied a binomial test per sample to the genes within a window of given size. To cover the whole genome, the window is slid along the genome, performing a test at fixed intervals. The z-scores of the test at a particular location are averaged across several window sizes, and a threshold is chosen. The locations above/below the threshold are identified as regions of chromosomal aberration. Levin et al.
[20] applied a Poisson model to the expression data and incorporated the genomic location in their model-based scan statistic; these results were compared per sample with the aCGH data. Yi et al. [21] used a sliding window of 5 genes to test the significance of a region according to two scores, which account for the homogeneity of behavior in the window and the power of the genes in discriminating the classes of interest. Dressman et al. [22] observed that over-expressed genes shared the same location, hypothesized an amplification and validated their findings with PCR. These studies show interesting examples of aberrations identified using transcriptome data only. However, other studies [17,23-25] could not detect the assumed strong correlation between aCGH and expression data. Since an alteration in expression may be due to diverse mechanisms, any potentially underlying chromosomal aberration would need to be verified by PCR or FISH if the number of loci to be tested is tractable, and otherwise by aCGH. The advantage of aCGH technology lies in the genome-wide coverage of the analysis.

The third group of approaches combines aCGH and expression data to detect regions of chromosomal aberration. The SLAM algorithm (Adler et al. [26]) is a prime example of this group. First, the SAM analysis [27] is applied to the aCGH data to identify the DNA-probes that distinguish the two classes. The analysis then focuses on the DNA-probes that are correlated with the expression data. Based on the observation that many of them were on the same chromosome arm, the hypergeometric distribution was used to test the significance of that arm.

Inspired by the work of Adler et al. [26], we propose a supervised procedure to identify chromosomal regions of aberration using solely aCGH data. We use the SAM analysis to determine the "relevant" DNA-probes, i.e. the DNA-probes that distinguish the classes of interest. While Adler et al.
[26] evaluated only a single location chosen in an ad hoc fashion, we perform a systematic search over the whole genome. We adopt a sliding window approach similar to the one proposed by Furge et al. [19]. More specifically, we apply a hypergeometric test to windows of different lengths and test the significance of the number of relevant DNA-probes in those windows.

Our algorithm belongs to the first group of approaches, since it uses only aCGH data. However, it differs from the typical approaches in this group in the following ways. First, it focuses only on the aberrations specific to the problem of interest, by exploiting the class labels in the first step (recognizing relevant DNA-probes). Importantly, no discretization, smoothing or segmentation algorithms are applied to the aCGH data. As a result, the data is not altered according to the preconceived models that these algorithms presume. Moreover, we avoid optimizing the parameters that these models usually require (and thus avoid results that are sensitive to these choices). The use of the hypergeometric test corrects for the non-uniform background distribution of the DNA-probes, which is particularly important since the DNA-probes are not equally spaced along the genome. In this way, we build a robust algorithm to identify areas of interest specific to the problem under study. We illustrate the benefit of our procedure on an artificial dataset and show the results on two breast cancer datasets.

Algorithm description

[Figure 1]
STEP 1. Using the SAM analysis [27], we identify the DNA-probes that discriminate between the classes of interest. We call these DNA-probes the "relevant" probes (Figure 1).

STEP 2. We test, in a systematic way, whether the number of relevant DNA-probes in a region is higher than expected by chance. For this purpose, we apply the hypergeometric test at a genomic position and test whether the fraction of relevant DNA-probes in the window of length 2w represents a significant enrichment. By sliding the window of observation along the genome, shifting it one DNA-probe position at a time, we obtain the test results for all positions. This procedure can be carried out efficiently, since the genomic locations where the test outcome is uncertain, and therefore needs to be computed, are only a subset of all genome positions; these locations depend on where the relevant DNA-probes are situated. More precisely, for a given window w, the test needs to be performed at only three positions per relevant DNA-probe: a window centered on the location l of the DNA-probe itself, and two windows centered at l - w and l + w, i.e. at the end points of the first window. Consequently, tests are done for the three windows [l - 2w, l], [l - w, l + w] and [l, l + 2w] around each relevant DNA-probe. In total, 3k tests are performed, where k is the number of relevant probes. This solution is computationally fast and allows a feasible multiple testing correction, while covering all genome positions relevant to the test. A Bonferroni correction for multiple testing is applied by multiplying the p-value of each test by the number of tests performed (3k). Note that the Bonferroni correction is rather conservative here, since the windows of observation of different DNA-probes may not be independent. In order to identify the regions of aberration, we interpolate the corrected p-values of the hypergeometric test using the maximum value; i.e.
given two successive locations with corrected p-values a and b, the base-pairs positioned between those locations are assigned the maximum of a and b. The base-pairs of the genome where the corrected p-value is smaller than 0.05 are considered significantly enriched for genomic aberrations. This step is repeated for different window sizes in order to detect both small and large aberrations. An illustrative result is shown in Figure 1.

STEP 3. The regions of aberration are identified based on a consensus between the results of the different window sizes, as illustrated in Figure 1.

Complexity and scalability issues

Our real datasets are BAC aCGH, with ~3000 DNA-probes. The complexity of the SIRAC algorithm is […]

Experimental results

Set-up

We illustrate our algorithm on an artificial dataset, described in the following section, and apply our method to two breast cancer datasets. The first dataset (NKI) is composed of 67 patients and 3219 BAC clones (DNA-probes). The samples are a selected series of the 295 breast cancer samples described in [28], and the BAC platform is discussed in [29]. The second dataset (Fridlyand) contains 67 samples and 2464 BAC clones, as described in [15].

Our proposed algorithm requires the researcher to make a few choices. A first important decision concerns the number of relevant DNA-probes. We chose to be conservative and required the selected DNA-probes to have a false discovery rate smaller than 0.005. This ensures that only a very small fraction of false positive DNA-probes is carried into the further steps. Another parameter is the range of window sizes used to probe the genome. Since the average spacing between the clones is 1 Megabase (Mb), the minimum window of observation is set to 1 Mb. The maximum window size is fixed to 24 Mb because this is roughly half the length of the shortest chromosome. In this way, we ensure that the largest window does not always cover both the p and q arm of the chromosome.
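As a worked illustration of the STEP 2 test, it can be written with the symbols of Algorithm 1 in the Appendix: M DNA-probes in the dataset, k of them relevant, N probes in a window, x of them relevant. We assume here, as the complement form Hc = 1 - h in Algorithm 1 suggests, that the p-value is the upper tail P(X ≥ x) of the hypergeometric distribution. The numbers below are a hypothetical toy array (M = 20, k = 4, so 3k = 12 tests), not data from the paper; the last line restates the max-interpolation rule for a location b lying between two successive test locations a1 and a2.

```latex
p = P(X \ge x) = \sum_{i=x}^{\min(k,N)} \frac{\binom{k}{i}\binom{M-k}{N-i}}{\binom{M}{N}}

% toy window: M = 20, k = 4, N = 5, x = 4
p = \frac{\binom{4}{4}\binom{16}{1}}{\binom{20}{5}} = \frac{16}{15504} \approx 1.03 \times 10^{-3},
\qquad
p_{\mathrm{Bonf}} = 3k \cdot p = 12 \cdot \frac{16}{15504} \approx 0.0124 < 0.05

% max-interpolation between successive test locations a_1 \le b \le a_2
p(b) = \max\{\, p_{\mathrm{Bonf}}(a_1),\; p_{\mathrm{Bonf}}(a_2) \,\}
```

With four of the five window probes relevant, the window stays significant at the 0.05 level even after the Bonferroni correction.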
Results

Artificial dataset

The artificial dataset is created using the clone distribution of the 207 clones of Chromosome 1 on the NKI array. The amplitude of the DNA-probes is drawn from a normal distribution with zero mean and unit variance […]. The samples in the other class are all drawn from the normal distribution […]. Amplifications with different amplitudes (m ∈ {0.2, 0.4, 0.6, 0.8, 1}) and widths (u ∈ {2, 4, 8, 12, 16, 20, 24, 28, 32} megabases (Mb)) are considered. Given the region of amplification found by the algorithm, the DNA-probes located in this region that also belong to the interval between positions ls and le are defined as true positives, while those outside the interval are false positives. Similarly, for the DNA-probes outside the region of amplification found by the algorithm, true negatives are the DNA-probes outside the interval between positions ls and le, while false negatives are the DNA-probes inside this interval. In general, the same trend for specificity and sensitivity as a function of m is observed.

[Figure 2]
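The probe-level bookkeeping used to score the artificial experiments can be sketched as follows. This is a minimal Python sketch: the function name `evaluate_detection` is hypothetical, the interval test ls <= position <= le is assumed to be inclusive, and sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP) are assumed to be the usual definitions.

```python
def evaluate_detection(positions, detected, ls, le):
    """Score a detected region against the true amplified interval [ls, le].

    positions: genomic positions of the DNA-probes (e.g. in Mb)
    detected:  one boolean per probe, True if the probe lies in the region
               of amplification reported by the algorithm
    Returns (sensitivity, specificity); NaN when a denominator is zero.
    """
    tp = fp = tn = fn = 0
    for pos, hit in zip(positions, detected):
        in_true_interval = ls <= pos <= le  # assumption: inclusive bounds
        if hit and in_true_interval:
            tp += 1  # inside the found region and truly amplified
        elif hit:
            fp += 1  # inside the found region but outside [ls, le]
        elif in_true_interval:
            fn += 1  # probe inside [ls, le] that the algorithm missed
        else:
            tn += 1  # correctly left out of the found region
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy example: probes at 1..10 Mb, true amplification on [3, 6] Mb,
# the algorithm reports the probes at 4..7 Mb as amplified.
positions = list(range(1, 11))
detected = [4 <= p <= 7 for p in positions]
print(evaluate_detection(positions, detected, 3, 6))  # (0.75, 0.8333333333333334)
```

In the toy case three of the four truly amplified probes are found (sensitivity 0.75) and one of the six unaffected probes is wrongly included (specificity 5/6).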
In our algorithm, we combine the different window sizes into a single region of amplification by setting the parameter s: a location is considered amplified if it is judged amplified in at least s window sizes. We also investigated the effect of the parameter s. The top four plots of Figure 3 correspond to s ∈ {2, 9}. We chose s = 2 as a loose constraint, while the stricter value s = 9 requires the consensus of two-thirds of the window sizes. For each plot, the horizontal axis depicts the amplification lengths u and the vertical axis the amplitudes m of the amplification. The colors code the value of the sensitivity and specificity from 0 to 1. The small amplification of m = 0.2 is very difficult to detect; therefore, the sensitivity is very low regardless of the length of the amplification (bottom row of blue squares in Figure 3(a)).
In order to evaluate the control of the error rate, we computed the False Positive Rate (FPR), which is defined as […]

The NKI dataset

Sorlie and Perou [30-32] introduced the division of breast cancer into five subtypes (Basal, ERBB2, Luminal A, Luminal B, Normal-like) based on the expression of the so-called intrinsic genes. These genes were selected as those with significantly greater variation in expression between different tumors than between paired samples of the same tumor. Using these genes, a centroid profile was obtained for each subtype. These centroids, in combination with the gene expression of 295 breast tumors [28], were employed to assign each sample in the NKI set to one of the subtypes based on its correlation with the centroid profiles across the intrinsic genes. In the NKI data, 21 of the 67 samples were labeled as Basal, 10 as ERBB2, 21 as Luminal A, 12 as Luminal B and 3 as Normal-like. Recently, Bergamaschi et al. [33] studied the genomic aberrations of the different subtypes in an aCGH dataset. We applied our method to the NKI dataset and compared our findings to the results of Bergamaschi et al. [33]. More specifically, we applied the SIRAC algorithm four times, each time analyzing one subtype against the rest. The Normal-like subtype was not considered in this analysis due to its small number of samples.

[Figure 4]
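The subtype assignment described above is a nearest-centroid scheme. The sketch below assumes that "based on its correlation" means each sample receives the subtype whose centroid has the highest Pearson correlation with it across the intrinsic genes; the helper names and the centroid values are hypothetical toy numbers, not the published centroids.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def assign_subtype(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample.

    sample:    expression of the intrinsic genes for one tumor
    centroids: mapping subtype name -> centroid profile over the same genes
    """
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))


# Toy centroids over four hypothetical intrinsic genes (illustrative values only).
centroids = {
    "Basal": [2.0, -1.0, 0.0, 1.0],
    "Luminal A": [-1.0, 2.0, 1.0, 0.0],
}
print(assign_subtype([1.8, -0.9, 0.2, 1.1], centroids))  # Basal
```

Correlation rather than Euclidean distance makes the assignment insensitive to overall shifts and scaling of a sample's expression values.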
[Figure 5(a)]
We compared our findings with the conclusions of Bergamaschi et al. [33], who also searched for aberrations associated with subtypes, on a different aCGH dataset. They first used the CLAC algorithm [14] to determine the chromosomal gains and losses per sample, then discretized the information per cytoband. Finally, they used the SAM analysis to identify the aberrations correlated with the class labels. The aberrations they found are summarized in Figure 5(b).

Some of the differences between our results on the NKI dataset and those of Bergamaschi et al. can be explained by the fact that our algorithm targets only the aberrations specific to a given class when compared to the rest of the samples. Therefore, our results do not show the same aberration for two subtypes. This is, for example, the case for the amplification on Chromosome 17, which is present in both the Basal and ERBB2 subtypes for Bergamaschi et al. [33], while it is only a feature of the ERBB2 subtype in our results. Similarly, the amplification on the q arm of Chromosome 1 is a strong aberration only in the Luminal A subtype in the NKI dataset, while Bergamaschi et al. [33] reported it for both the Luminal A and the Basal subtypes. Another aspect to take into account is that we chose an FDR < 0.005 for the identification of the relevant DNA-probes by the SAM analysis. This rather strict value limits the number of false positives and highlights the stronger aberrations. We repeated the experiments with less strict constraints, i.e. an FDR smaller than 0.05 or 0.1. The results for FDR < 0.05 are shown in Figure 5(c).

Overall, given the differences in the datasets and in the methodology used, we see striking similarities in the subtype characterization of the cancer. In particular, the Basal, ERBB2 and Luminal A subtypes seem better defined, while the Luminal B subtype seems rather weak; we advocate that a better definition of this subtype needs to be established.
As stated earlier, we chose to represent the detected aberrations in terms of chromosome arms simply to ease the comparison with Bergamaschi et al. [33]. However, such a representation does not highlight a very useful feature of the SIRAC algorithm: the scale space. The scale space allows aberrations to be evaluated at different genomic resolutions, and, for a fixed SAM-FDR, the number of scales across which an aberration remains significant can also be used to judge the importance of a region. By employing this feature, one can zoom in on potentially interesting regions, where the aberration has a larger average amplitude and is of medium length (see Figure 1).

The Fridlyand dataset

Recently, Fridlyand et al. [15] analyzed the aberrations of 67 breast cancer samples. First, they smoothed each sample using circular binary segmentation [34] and defined chromosomal aberrations per sample. Based on the clustering of the smoothed data, they identified three subtypes: the 1q16q, the Complex and the Mixed amplifier subtypes. The 1q16q subtype is named after the only copy number aberrations detected, i.e. a gain on 1q and a loss on 16q. The Complex subtype is characterized by many low-level copy number alterations, consists mainly of ER-negative tumors, and has a worse outcome than the other subtypes. The Mixed amplifier subtype contains both ER-positive and ER-negative tumors and shows several aberrations. Fridlyand et al. analyzed the aberration frequency in each subtype in order to find patterns of chromosomal changes across samples. We applied our algorithm to their data, analyzing each subtype against the remaining samples.

[Figure 6]
Discussion and conclusion

We have presented a method to identify aberrant chromosomal regions that are specific to the problem under study. Our emphasis is not on the per-sample identification of chromosomal gains or losses; rather, we strive to evaluate what makes two classes different from each other, and which aberrations distinguish them. We also want to limit the number of preprocessing steps, in order to reduce the set of parameters that inevitably need tuning. This motivated us to avoid the per-sample characterization of the DNA-probes as amplified or deleted, which is instead the required input for the STAC algorithm [18] and for the approach followed by Fridlyand et al. [15]. We chose to use the raw data as input and assumed that a DNA-probe amplified/deleted in one class and not in the other is selected as significant by the SAM analysis. Of course, the researcher has to choose the appropriate false discovery rate. This decision influences the number of DNA-probes preselected as relevant, which is an important starting point of our algorithm. We opted for a low false discovery rate for all the problems analyzed. The different numbers of relevant DNA-probes selected in the distinct cases already gave us an indication of the number and strength of the chromosomal aberrations. For example, in the NKI dataset the largest number of relevant DNA-probes was found for the Basal subtype, while the ERBB2 class was associated with only a few DNA-probes, mainly on Chromosome 17.

Our algorithm is designed to identify the copy number alterations in the aCGH data. The core of the algorithm resides in the identification of the regions of chromosomal aberration. We assumed that an aberration involves more than a single DNA-probe. Therefore, we systematically tested the candidate regions, i.e. the locations in the vicinity of the DNA-probes identified by the SAM analysis.
The use of different window sizes allows us to detect copy number changes of different lengths and to avoid missing aberrations in regions sparsely covered by the aCGH probes.

Since expression data are also available for the NKI samples, we tested whether similar results could be obtained by applying our algorithm directly to the expression data, as Furge et al. [19] did. However, here the assumption that over/under-expression involves more than a single gene no longer holds. Even if a region is amplified, not all genes may be active and, therefore, differentially expressed with respect to the reference. Moreover, while in the aCGH data the only cause of aberration is copy number variation, the variance in expression is due to multiple factors. In general, we observed in our expression dataset that the relevant genes selected by the SAM analysis were scattered across the genome and, therefore, no clear regions of significance were identified. This result further indicates that the detection of genomic aberrations using gene expression datasets should be performed with caution, and results should always be validated with other tests, such as FISH or PCR, if not with genomic copy number data itself. Instead, the expression data can be used in a post-processing step on the results of the algorithm applied to the aCGH data. Once the aberrated regions have been identified, the expression data allows for a further analysis of the genes present in these regions. For example, the genes can be prioritized according to the correlation between their expression and the aCGH data, or according to the ability of each gene to distinguish between the classes of interest. This is especially relevant since we expect that not all genes in a region of aberration will be active; some may be silent and not contribute to the mechanism of cancer.
A selection can be made based on this additional information source, resulting in a smaller list of potentially interesting genes to be further analyzed. The benefits of using expression data are exemplified by the ERBB2 subtype in the NKI dataset. The genes in the amplified region of Chromosome 17 were ranked according to the product of the p-value of the t-test (computed on the gene expression and class labels) and the p-value of the correlation between the expression of each gene and its closest DNA-probe. The top two genes are the ERBB2 gene itself and GRB7 (growth factor receptor-bound protein 7). This is expected, since the ERBB2 subtype is characterized by the amplification of the ERBB2 gene, and GRB7 has been found to be over-expressed and co-amplified with the ERBB2 gene [22,35,36]. Therefore, combining SIRAC with gene expression data provides a powerful additional tool in the search for marker genes.

In the SIRAC algorithm, we first detect associations of single probes with the class label, and then search for regions that are enriched for probes associated with the class label. This is especially advantageous when working with tumor samples. The heterogeneity of the tumors may lead to aberration signals smaller than those expected if the sample cells were homogeneous. Therefore, amplifications/deletions with small absolute values may be of interest as well, especially when they discriminate the classes of interest. Several authors (e.g. Saramaki et al. [37], Fridlyand and Chin et al. [15,38], and Nymark et al. [39]) have recently pointed out that even low-level copy number aberrations may have significant effects on gene expression and, therefore, on cell functioning and tumor development.

The error rate control of SIRAC is performed in two different steps.
First, the SAM procedure constructs the null hypothesis through its permutation steps; second, a Bonferroni correction for multiple testing is applied to the p-values of the hypergeometric test. The artificial experiment illustrates how the dependencies between these two steps may lead to an anti-conservative control of the error rate. The choice of the parameter s, which combines the outcomes of different window sizes, plays an important role. The artificial experiments suggest that the stricter the value, e.g. s = 9, the better the control of the error rate. However, this is achieved at the expense of sensitivity. Therefore, less conservative choices, e.g. s = 2, may be used. In this case, the p-values of the hypergeometric test need to be interpreted with caution. The SIRAC algorithm, however, provides useful details, such as the number of window sizes in which each DNA-probe was judged significant, that can be used to further prioritize the regions. Moreover, if expression data is available, the aberrations may be further validated by investigating the correlation with the expression of the genes in the identified region.

In conclusion, we focused on the identification of the chromosomal aberrations that discriminate between the classes of interest and proposed a robust algorithm for the evaluation of their significance. Our algorithm does not require preprocessing of the data, such as discretization or smoothing, and uses a limited number of parameters. Our findings on the two breast cancer datasets are in agreement with previous studies and better highlight the dissimilarities between the classes of interest.
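The gene prioritization used for the amplified Chromosome 17 region of the ERBB2 subtype can be sketched in a few lines. In this minimal sketch the two p-values per gene (t-test against the class labels, and correlation with the closest DNA-probe) are assumed to be computed beforehand, because the text does not say which software produced them; the function names, gene names and numbers are hypothetical.

```python
from bisect import bisect_left


def closest_probe(gene_pos, probe_positions):
    """Index of the DNA-probe nearest to gene_pos (probe_positions sorted ascending)."""
    i = bisect_left(probe_positions, gene_pos)
    if i == 0:
        return 0
    if i == len(probe_positions):
        return len(probe_positions) - 1
    before, after = probe_positions[i - 1], probe_positions[i]
    # on an exact tie the upstream probe (index i - 1) is chosen
    return i if after - gene_pos < gene_pos - before else i - 1


def rank_genes(pvalues):
    """Rank genes by the product p_ttest * p_corr, most significant first.

    pvalues: mapping gene name -> (p_ttest, p_corr), where p_corr is the
             p-value of the correlation between the gene's expression and
             the copy number of its closest DNA-probe
    """
    return sorted(pvalues, key=lambda g: pvalues[g][0] * pvalues[g][1])


# Hypothetical genes inside an amplified region (illustrative numbers only).
pvals = {"GENE_A": (0.01, 0.5), "GENE_B": (0.001, 0.01), "GENE_C": (0.2, 0.2)}
print(rank_genes(pvals))                    # ['GENE_B', 'GENE_A', 'GENE_C']
print(closest_probe(5.2, [1.0, 5.0, 9.0]))  # 1
```

Multiplying the two p-values rewards genes that are both differentially expressed between the classes and tracking the copy number of the nearby probe.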
Appendix

Algorithm 1 SIRAC: Supervised Identification of Relevant Aberration in aCGH datasets

1: Input: dataset D, label set y; SAM parameters: d for the desired false discovery rate and number of iterations I; vector W with half the sizes of the observation windows; threshold t for the hypergeometric distribution; minimum number of window sizes s for which the location is judged significant.
2: Apply the SAM analysis with the given parameters d and I to the labeled dataset D, y. A vector J stores the indexes of the relevant DNA-probes obtained.
3: Initialize variables: P = ones(|W|, 3|J|) stores the p-values of the tests; POS = zeros(|W|, 3|J|) stores the locations where the tests are applied.
4: ∀ w ∈ W (for all window sizes)
5: Initialize: bon = 0 (counts the number of tests performed)
6: ∀ j ∈ J (for all relevant DNA-probes)
7: Determine the positions of the window centers C = [lj - w, lj, lj + w] around the DNA-probe, with lj the position of the jth DNA-probe.
8: If
9: Then
10: Initialize: H = ones(1, 3) (stores the test value for each position in C)
11: ∀ c ∈ C (for all window positions)
12:
13: x = number of relevant DNA-probes in the window [c - w, c + w],
14: M = number of DNA-probes in the dataset D,
15: k = number of relevant DNA-probes in the dataset D,
16: N = number of DNA-probes in the window [c - w, c + w].
17: Hc = 1 - h;
18: bon = bon + 1; (update the counter)
19: End
20: Pwj = H; (Pwj is the p-value on row w and probe triplet j)
21: POSwj = C; (POSwj stores the triplet window location)
22: Pw = Pw × bon; (Bonferroni correction)
23: ∀ l ∈ G (all positions in the genome):
24:
25:
26: Output: all locations with Fl ≤ s.
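For readers who want to experiment, the loop structure of Algorithm 1 (lines 3-26) can be sketched in Python. This is a hedged re-implementation under stated assumptions, not the authors' Matlab code: SAM (line 2) is not reproduced, so the indexes J of the relevant DNA-probes are passed in directly; the condition lost from lines 8-9 is omitted, so every triplet is tested; positions and the genome grid G are plain lists in Mb; corrected p-values are capped at 1; locations outside all tested windows get p = 1; and, reading s as the minimum number of window sizes described in the input (line 1), a location is reported when its corrected p-value is below t in at least s window sizes. The names `upper_tail`, `corrected_pvalues`, `interpolate_max` and `sirac_regions` are hypothetical.

```python
from bisect import bisect_right
from math import comb


def upper_tail(x, M, k, N):
    """P(X >= x) for X ~ Hypergeometric(M probes, k relevant, N probes in window)."""
    return sum(comb(k, i) * comb(M - k, N - i)
               for i in range(x, min(k, N) + 1)) / comb(M, N)


def corrected_pvalues(positions, relevant, w):
    """Lines 6-22: test the windows centered at l - w, l, l + w of every relevant
    probe, then apply a Bonferroni correction by the number of tests (bon)."""
    M, k = len(positions), len(relevant)
    tests = {}  # window center -> raw p-values of the tests at that center
    for j in relevant:
        lj = positions[j]
        for c in (lj - w, lj, lj + w):
            inside = [i for i, p in enumerate(positions) if c - w <= p <= c + w]
            x = sum(1 for i in inside if i in relevant)
            tests.setdefault(c, []).append(upper_tail(x, M, k, len(inside)))
    bon = sum(len(ps) for ps in tests.values())
    # identical centers give identical windows, so all p-values in ps are equal
    return {c: min(1.0, min(ps) * bon) for c, ps in tests.items()}


def interpolate_max(p_at_center, grid):
    """Give each grid location the max corrected p-value of the two surrounding
    test centers (its own value if it is a center); 1.0 outside all centers."""
    centers = sorted(p_at_center)
    out = []
    for g in grid:
        i = bisect_right(centers, g)
        if i > 0 and centers[i - 1] == g:
            out.append(p_at_center[g])
        elif 0 < i < len(centers):
            out.append(max(p_at_center[centers[i - 1]], p_at_center[centers[i]]))
        else:
            out.append(1.0)
    return out


def sirac_regions(positions, relevant, half_windows, grid, t=0.05, s=2):
    """Return the grid locations judged significant in at least s window sizes."""
    relevant = set(relevant)
    F = [0] * len(grid)  # number of window sizes in which each location is significant
    for w in half_windows:
        p_grid = interpolate_max(corrected_pvalues(positions, relevant, w), grid)
        for n, p in enumerate(p_grid):
            if p < t:
                F[n] += 1
    return [g for g, f in zip(grid, F) if f >= s]


# Toy example: 40 probes spaced 1 Mb apart, probes 18-22 flagged as relevant.
positions = list(range(40))
print(sirac_regions(positions, {18, 19, 20, 21, 22}, [1, 2, 3], positions, s=2))
# [19, 20, 21]
```

In the toy example the flanking probes 18 and 22 are significant only with the widest window (w = 3), so they drop out when s = 2 but are reported with s = 1, which mirrors the sensitivity trade-off discussed for the parameter s.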
Availability and requirements

Project name: SIRAC
Project home page: http://bioinformatics.nki.nl/software.php
Operating system(s): Platform independent
Programming language: Matlab

Authors' contributions

CL, HMH, MJvdV, MJTR and LFAW designed the experiments and analyzed the results; CL carried out the analysis; HMH generated the NKI dataset; EHvB and PMN set up the BAC platform and software employed to profile and pre-process the NKI dataset; all authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank Simon Joosse for technical assistance in the hybridizations and pre-processing of the NKI dataset, Arno Velds for assistance on the mapping of the BAC clones on the genome, and Dick de Ridder and Theo Knijnenburg for the Matlab implementation of the SAM algorithm.

References
Nature. 1998 Dec 17; 396(6712):643-9.
Nat Rev Cancer. 2007 Jul; 7(7):545-53.
Int J Cancer. 2003 Feb 20; 103(5):565-71.
Science. 1992 Oct 30; 258(5083):818-21.
Nat Genet. 2005 Jun; 37 Suppl:S11-7.
Cancer Res. 2003 Jun 1; 63(11):2872-80.
Proc Natl Acad Sci U S A. 2004 Jan 27; 101(4):1039-44.
Bioinformatics. 2005 Oct 1; 21(19):3763-70.
BMC Bioinformatics. 2005 Feb 11; 6:27.
Bioinformatics. 2004 Dec 12; 20(18):3636-7.
Breast Cancer Res. 2005; 7(6):R1186-98.
Biostatistics. 2005 Jan; 6(1):45-58.
Int J Oncol. 2002 Dec; 21(6):1197-204.
BMC Cancer. 2006 Apr 18; 6:96.
BMC Genomics. 2005 May 9; 6(1):67.
Bioinformatics. 2005 Jun 15; 21(12):2867-74.
Genomics. 2005 Mar; 85(3):401-12.
Cancer Res. 2003 May 1; 63(9):2194-9.
Nat Genet. 2006 Apr; 38(4):421-30.
Proc Natl Acad Sci U S A. 2001 Apr 24; 98(9):5116-21.
N Engl J Med. 2002 Dec 19; 347(25):1999-2009.
Cancer Res. 2005 Feb 1; 65(3):822-7.
Nature. 2000 Aug 17; 406(6797):747-52.
Proc Natl Acad Sci U S A. 2003 Jul 8; 100(14):8418-23.
Genes Chromosomes Cancer. 2006 Nov; 45(11):1033-40.
Biostatistics. 2004 Oct; 5(4):557-72.
Genome Res. 2006 Sep; 16(9):1149-58.
Cancer Res. 2001 Nov 15; 61(22):8235-40.
Cancer Res. 2005 Feb 15; 65(4):1376-83.
Int J Cancer. 2006 Sep 15; 119(6):1322-9.
Cancer Cell. 2006 Dec; 10(6):529-41.
Cancer Res. 2006 Jun 1; 66(11):5737-43.