Am J Hum Genet. 2007 Jul; 81(1): 114–126.
Published online 2007 Jun 5. doi:  10.1086/518809
PMCID: PMC1950910

Highly Sensitive Method for Genomewide Detection of Allelic Composition in Nonpaired, Primary Tumor Specimens by Use of Affymetrix Single-Nucleotide–Polymorphism Genotyping Microarrays


Loss of heterozygosity (LOH), either with or without accompanying copy-number loss, is a cardinal feature of cancer genomes that is tightly linked to cancer development. However, detection of LOH is frequently hampered by the presence of normal cell components within tumor specimens and the limitation in availability of constitutive DNA. Here, we describe a simple but highly sensitive method for genomewide detection of allelic composition, based on the Affymetrix single-nucleotide–polymorphism genotyping microarray platform, without dependence on the availability of constitutive DNA. By sensing subtle distortions in allele-specific signals caused by allelic imbalance with the use of anonymous controls, sensitive detection of LOH is enabled with accurate determination of allele-specific copy numbers, even in the presence of up to 70%–80% normal cell contamination. The performance of the new algorithm, called “AsCNAR” (allele-specific copy-number analysis using anonymous references), was demonstrated by detecting the copy-number neutral LOH, or uniparental disomy (UPD), in a large number of acute leukemia samples. We next applied this technique to detection of UPD involving the 9p arm in myeloproliferative disorders (MPDs), which is tightly associated with a homozygous JAK2 mutation. It revealed an unexpectedly high frequency of 9p UPD that otherwise would have been undetected and also disclosed the existence of multiple subpopulations having distinct 9p UPD within the same MPD specimen. In conclusion, AsCNAR should substantially improve our ability to dissect the complexity of cancer genomes and should contribute to our understanding of the genetic basis of human cancers.

Genomewide detection of loss of heterozygosity (LOH), as well as copy-number (CN) alterations in cancer genomes, has drawn recent attention in the field of cancer genetics,1–3 because LOH has been closely related to the pathogenesis of cancers, in that it is a common mechanism for inactivation of tumor suppressor genes in Knudson’s paradigm.4 Moreover, the recent discovery of the activating Janus kinase 2 gene (JAK2 [MIM *147796]) mutation that is tightly associated with the common 9p LOH with neutral CNs, or uniparental disomy (UPD), in myeloproliferative disorders (MPDs)5–8 uncovered a new paradigm, namely, that a dominant oncogenic mutation may be further potentiated by duplication of the mutant allele and/or exclusion of the wild-type allele, underscoring the importance of simultaneous CN detection with LOH analysis. On this point, Affymetrix GeneChip SNP-detection arrays, originally developed for large-scale SNP typing,9 provide a powerful platform for both genomewide LOH analysis and CN detection.10–12 On this platform, the use of large numbers of SNP-specific probes showing linear hybridization kinetics allows not only for high-resolution LOH analysis at ∼2,500–150,000 heterozygous SNP loci but also for accurate determination of the CN state at each LOH region.12–14 Unfortunately, however, the sensitivity of the currently available algorithms for LOH detection by use of SNP arrays may be greatly reduced when they are applied to primary tumor specimens that are frequently heterogeneous and contain significant normal cell components.

In this article, we describe a simple but highly sensitive method to detect allelic dosage (CNs) in primary tumor specimens on a GeneChip platform, with its validations, and some interesting applications to the analyses of primary hematological tumor samples. It does not require paired constitutive DNA of tumor specimens or a large set of normal reference samples but uses only a small number of anonymous controls for accurate determination of allele-specific CN (AsCN) even in the presence of significant proportions of normal cell components, thus enabling reliable genomewide detection of LOH in a wide variety of primary cancer specimens.

Material and Methods

Samples and Microarray Analysis

Genomic DNA extracted from a lung cancer cell line (NCI-H2171) was intentionally mixed with DNA from its paired lymphoblastoid cell line (LCL) (NCI-BL2171) to generate a dilution series, in which tumor contents started at 10% and increased by 10% up to 90%. The ratios of admixture were validated using measurements of a microsatellite (D3S1279) within a UPD region on chromosome 3 (data not shown). The nine mixed samples, together with nonmixed original DNAs (0% and 100% tumor contents), were analyzed with GeneChip 50K Xba SNP arrays (Affymetrix). Microarray data corresponding to 5%, 15%, 25%,…, and 95% tumor content were interpolated by linearly superposing two adjacent microarray data sets after adjusting the mean array signals of the two sets. Both cell lines were obtained from the American Type Culture Collection (ATCC). Genomic DNA was also extracted from 85 primary leukemia samples, including 39 acute myeloid leukemia (AML [MIM #601626]) samples and 46 acute lymphoblastic leukemia (ALL) samples, and was subjected to analysis with 50K Xba SNP arrays. Of the 85 samples, 34 were analyzed with their matched complete-remission bone marrow samples. DNA from 53 MPD samples—13 polycythemia vera (PV [MIM #263300]), 21 essential thrombocythemia (ET [MIM #187950]), and 19 idiopathic myelofibrosis (IMF [MIM #254450])—43 of which had been studied for JAK2 mutations,8 were also analyzed with 50K Xba SNP arrays. Microarray analyses were performed according to the manufacturer’s protocol,15 except with the use of LA Taq (Takara) for adaptor-mediated PCR. Also, DNA from 96 normal volunteers was used for the analysis. All clinical specimens were made anonymous and were incorporated into this study in accordance with the approval of the institutional review boards of the University of Tokyo and Harvard Medical School.
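
The interpolation of intermediate tumor contents described above can be illustrated with a minimal sketch. It assumes that "adjusting the mean array signals" means rescaling one array so that both share the same mean, and that the two adjacent arrays are weighted equally; the function name and the toy probe signals are hypothetical and are not taken from the study.

```python
def interpolate_arrays(signals_low, signals_high, weight_high=0.5):
    """Linearly superpose two arrays after equalizing their mean signals."""
    if len(signals_low) != len(signals_high):
        raise ValueError("both arrays must cover the same probes")
    mean_low = sum(signals_low) / len(signals_low)
    mean_high = sum(signals_high) / len(signals_high)
    # Assumption: the second array is rescaled to the mean of the first.
    scale = mean_low / mean_high
    return [
        (1.0 - weight_high) * lo + weight_high * scale * hi
        for lo, hi in zip(signals_low, signals_high)
    ]


# Toy probe signals standing in for the 10% and 20% tumor-content arrays.
array_10 = [100.0, 200.0, 300.0]
array_20 = [220.0, 380.0, 600.0]
array_15 = interpolate_arrays(array_10, array_20)
print(array_15)
```

With equal weights, the result stands in for the midpoint (here 15%) of the two measured tumor contents.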

AsCN Analyses Using Anonymous Control Samples (AsCNAR)

SNP typing on the GeneChip platform uses two discrete sets of SNP-specific probes, which are arbitrarily but consistently named “type A” and “type B” SNPs, at every SNP locus, each consisting of an equal number of perfectly matched probes (PMAs or PMBs) and mismatched probes (MMAs or MMBs). For AsCN analysis, the sums of perfectly matched probes (PMAs or PMBs) for the ith SNP locus in the tumor (tum) sample and reference samples (ref1, ref2,…, refN),

equation image


equation image

are compared separately at each SNP locus, according to the concordance of the SNP calls in the tumor sample ($O^{\mathrm{tum}}_i$) and the SNP calls in a given reference sample ($O^{\mathrm{ref}I}_i$),

equation image

and the total CN ratio is calculated as follows:

equation image

For CN estimations, however, $R^{\mathrm{ref}I}_{AB,i}$, $R^{\mathrm{ref}I}_{A,i}$, and $R^{\mathrm{ref}I}_{B,i}$ are biased by differences in mean array signals and in PCR conditions between the tumor sample and each reference sample and must be corrected for these effects to obtain their adjusted values equation M1, equation M2, and equation M3, respectively (appendix A).16

These values are next averaged over the references that have a concordant genotype for each SNP in a given set of references (K), and we obtain equation M4, equation M5, and equation M6. Note that equation M7 and equation M8 are calculated only for heterozygous SNPs in the tumor sample (see appendix A for more details).

A provisional total CN profile ΛK is provided by

equation image

and provisional AsCN profiles are obtained by

equation image

These provisional analyses, however, assume that the tumor genome is diploid and has no gross CN alterations when the regression coefficients are calculated. In the next step, the regressions are performed iteratively, using only a region that is known or expected to be diploid on the basis of the provisional total CN, to redetermine the coefficients, and the CNs are then recalculated.

Finally, the optimized set of references is selected that minimizes the SD of total CN at the diploid region by stepwise reference selection, as described in appendix A. Allele-specific analysis using a constitutive reference, refSelf, is provided by

equation image


equation image

Computational details of AsCNAR are provided in appendix A.

Comparison with Other Algorithms

dChip,17 and PLASQ,18 were downloaded from their sites, and the identical microarray data were analyzed using these programs. Since PLASQ requires both Xba and Hind array data, microarray data of mixed tumor contents for Hind arrays were simulated by linearly superimposing the tumor cell line (NCI-H2171) and LCL (NCI-BL2171) data at indicated proportions.

Statistical Analysis

Significance of the presence of allelic imbalance (AI) in a given region, Γ, called as having AI by the hidden Markov model (HMM), was statistically tested by calculating t statistics for the difference in AsCNs, equation M9, between Γ and a normal diploid region, where the tests were one-sided. Significance between the numbers of UPDs detected by the SNP call–based method and by AsCNAR was tested by one-tailed binomial tests. P values for AI detection by allele-specific PCR were calculated by one-tailed t tests, comparing triplicates of the target sample and triplicates of five normal samples that have heterozygous alleles in the SNP.

Detection of the JAK2 Mutation and Measurements of Relative Allele Doses

The JAK2 V617F mutation was examined by a restriction enzyme–based analysis, in which PCR-amplified JAK2 exon 12 fragments were digested with BsaXI, and the presence of the undigested fragment was examined by gel electrophoresis.5 Relative allele dose between wild-type and mutated JAK2 was determined by measuring allele-specific PCR products for wild-type and mutated JAK2 alleles by capillary electrophoresis by use of the 3100 Genetic Analyzer (Applied Biosystems), as described in the literature.19 Likewise, the fraction of tumor components having 9p and other UPDs was measured by either allele-specific PCR or STR PCR,7,19 by use of the primers provided in appendix B. The percentage of UPD-positive cells (%UPD(+)) was also estimated as the mean difference of AsCNs for heterozygous SNPs within the UPD region divided by that for homozygous SNPs within an arbitrary selected normal region:

equation image

where AsCNs for the denominator were calculated as if the homozygous SNPs were heterozygous. However, in those samples with a high percentage of UPD-positive components, the heterozygous SNP rate in the UPD region decreased. For such regions, we calculated the percentage of UPD-positive cells by randomly selecting 30% (the mean heterozygous SNP call rate for this array) of all the SNPs therein and by assuming that they were heterozygous SNPs. Cellular composition of JAK2 wild-type (wt) and mutant (mt) homozygotes (wt/wt and mt/mt) and heterozygotes (wt/mt) in each MPD specimen was estimated assuming that all UPD components are homozygous for the JAK2 mutation. The fractions of the wt/mt heterozygotes in cases with a 9p gain were estimated assuming that the duplicated 9p alleles had the JAK2 mutation. Throughout the calculations, small negative values for wt/mt were disregarded.
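
The %UPD(+) estimate described above can be sketched as follows. This is a minimal illustration: the use of the absolute difference between the two AsCNs, the function name, and the toy values are assumptions, since the original equation is not reproduced here.

```python
def percent_upd(upd_region_ascn, normal_homozygous_ascn):
    """Estimate the percentage of UPD-positive cells from AsCN pairs.

    upd_region_ascn: (AsCN_A, AsCN_B) pairs for heterozygous SNPs in the UPD region.
    normal_homozygous_ascn: pairs for homozygous SNPs in a normal region,
        computed as if those SNPs were heterozygous.
    """
    def mean_abs_diff(pairs):
        # Assumption: "difference of AsCNs" is taken as an absolute difference.
        return sum(abs(a - b) for a, b in pairs) / len(pairs)

    return 100.0 * mean_abs_diff(upd_region_ascn) / mean_abs_diff(normal_homozygous_ascn)


# Toy values: a split of 0.5 copies against a full split of 2 copies.
upd_pairs = [(1.25, 0.75), (0.75, 1.25)]
normal_pairs = [(2.0, 0.0), (0.0, 2.0)]
upd_fraction = percent_upd(upd_pairs, normal_pairs)
print(upd_fraction)
```

The homozygous SNPs serve as a calibration: they show the AsCN split that a 100% UPD-positive sample would produce on the same array.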


FISH Analysis

FISH analysis was performed according to the previously published method, to confirm the absolute total CNs in NCI-H2171.20 The genomic probes were generated by whole-genome amplification of FISH-confirmed RP11 BAC clones 169N13 (3q13; CN=2), 227F7 (8q24; CN=2), 196H14 (12q14; CN=2), 25E13 (13q33; CN=2), 84E24 (17q24; CN=2), 12C9 (19q13; CN=2), 153K19 (3q13; CN=3), 94D19 (3p14; CN=1), 80P10 (8q22; CN=1), and 64C21 (13q12-13; CN=1), which were obtained from the BACPAC Resources Center at the Children’s Hospital Oakland Research Institute in Oakland, California.

Results


SNP Call–Based Genomewide LOH Detection by Use of SNP Arrays

When a pure tumor sample is analyzed with a paired constitutive reference on a GeneChip Xba 50K array, LOH is easily detected as homozygous SNP loci in the tumor specimen that are heterozygous in the constitutive DNA (fig. 1A, pink bars). In addition, given a large number of SNPs to be genotyped, the presence of LOH is also inferred from the grossly decreased heterozygous SNP calls, even in the absence of a paired reference (fig. 1D). The accuracy of the LOH inference would depend partly on the algorithm used but more strongly on the tumor content of the specimens. Thus, our SNP call–based LOH inference algorithm in CNAG (appendix C), as well as that of dChip,17 shows almost 100% sensitivity and specificity for pure tumor specimens. But, as the tumor content decreases, the LOH detection rate steeply declines (fig. 1G), and, with <50% tumor cells, no LOH can be detected, even when complete genotype information for both tumor and paired constitutive DNA is obtained (fig. 1B, 1E, 1H, and 1I).

Figure  1.
AsCN analysis with or without paired DNA. DNA from a lung cancer cell line (NCI-H2171) was mixed with DNA from an LCL (NCI-BL2171) established from the same patient at the indicated percentages and was analyzed with GeneChip 50K Xba SNP arrays. AsCNs, ...

LOH Detection Based on AsCN Analysis

On the other hand, the capability of allele-specific measurements of CN alterations in cancer genomes is an excellent feature of the SNP array-based CN-detection system that uses a large number of SNP-specific probe sets.16,18,21 When constitutive DNA is used as a reference, AsCN analysis is accomplished by separately comparing the SNP-specific array signals from the two parental alleles at the heterozygous SNP loci in the constitutive genomic DNA.16 It determines not only the total CN changes but also the alterations of allelic compositions in cancer genomes, which are captured as the split lines in the two AsCN graphs (fig. 1A and 1B). In this mode of analysis, the presence of LOH can be detected as loss of one parental allele, even in specimens showing almost no discordant calls (fig. 1B).


The previous method for AsCN analysis, however, essentially depends on the availability of constitutive DNA, since AsCNs are calculated only at the heterozygous SNP loci in constitutive DNA.16 Alternatively, allele-specific signals can be compared with those in anonymous references on the basis of the heterozygous SNP calls in the tumor specimen. In the latter case, the concordance of heterozygous SNP calls between the tumor and the unrelated sample is expected to be only 37% with a single reference. However, the use of multiple references overcomes the low concordance rate with a single reference, and the expected overall concordance rate for heterozygous SNPs and for all SNPs increases to 86% and 92%, respectively, with five unrelated references (appendix D). Thus, for AsCNAR, allele-specific signal ratios are calculated at all the concordant heterozygous SNP loci for individual references, and then the signal ratios for the identical SNPs are averaged across different references over the entire genome. For the analysis of total CNs, all the concordant SNPs, both homozygous and heterozygous, are included in the calculations, and the two allele-specific signal ratios for heterozygous SNP loci are summed together. Since AsCNAR computes AsCNs only for heterozygous SNP loci in tumors, difficulty may arise on analysis of an LOH region in highly pure tumor samples, in which few or no heterozygous SNP calls are expected. However, as shown above, such LOH regions can be easily detected by the SNP call–based algorithm, where AsCNAR is formally calculated assuming all the SNPs therein are heterozygous. Thus, the AsCNAR provides an essentially equivalent result to that from AsCN analysis using constitutional DNA, with similar sensitivity in detecting AI and LOH (compare fig. 1A with 1D and 1B with 1E).

As expected from its principle, AsCNAR is more robust in the presence of normal cell contamination than are SNP call–based algorithms. To evaluate this quantitatively, we analyzed tumor DNA that was intentionally mixed with its paired normal DNA at varying ratios on 50K Xba SNP arrays, and the array data were analyzed with AsCNAR. To preclude subjectivity, LOH regions were detected by an HMM-based algorithm, which evaluates the difference in AsCNs between the two parental alleles (appendix E).22 As the tumor content decreases, the SNP call–based LOH inference fails to detect LOH because of the appearance of heterozygous SNP calls from the contaminating normal cell component (fig. 1E and 1G–1I), but these heterozygous SNP calls, in turn, make AsCNAR operate effectively. In fact, this algorithm precisely identifies known LOH regions, as well as regions with AI, in intentionally mixed tumor samples containing as little as 20% (for LOH without CN loss) to 25% (LOH with CN loss) tumor content (fig. 2A–2C). Note that this large gain in sensitivity is obtained at no expense to specificity, which is very close to 100%, as observed with other algorithms (fig. 2D). In AsCNAR, small regions of AI (<1 million bases in length) are difficult to detect in samples contaminated with normal cells. However, such regions are also difficult to detect using other algorithms (data not shown).

Figure  2.
Sensitivity and specificity of LOH detection for intentionally mixed tumor samples. Sensitivity of detection of LOH with or without CN loss (A and B) in different algorithms were compared using a mixture of the tumor sample (NCI-H2171) and the paired ...

Identification of UPD in Primary Tumor Samples

To examine further the strength of the newly developed algorithms for AsCN and LOH detection, we explored UPD regions in 85 primary acute leukemia samples, including 39 AML and 46 ALL samples, on GeneChip 50K Xba SNP arrays, since recent reports identified frequent (∼20%) occurrence of this abnormality in AML.23,24 In the SNP call–based LOH inference algorithm, 16 UPD regions were identified in 14 cases, 8 (20.5%) AML and 6 (13.0%) ALL. However, the frequencies were almost doubled with the AsCNAR algorithm; a total of 28 UPD loci were identified in 25 cases, including 14 (35.9%) AML and 11 (23.9%) ALL (fig. 3A and table 1). In 5 of the 25 UPD-positive cases, a matched remission sample was available for AsCN analysis, which provided essentially the same results as AsCNAR, except for one relapsed AML case (W150673). In the latter case, a discrepancy in AsCN shifts in 17p UPD occurred between AsCN analysis with and without a constitutive reference, with more CN shift detected with anonymous references (fig. 4A and 4B). The discrepancy was, however, explained by the unexpected detection of a subtle UPD change in 17p in the reference sample by AsCNAR (P<.0001, by t test) (fig. 4C), which offset the CN shift in the relapsed sample, although it was morphologically and cytogenetically diagnosed as in complete remission.

Figure  3.
The number of UPD regions for acute leukemia and MPD samples detected by either the SNP call–based method or AsCNAR. The number of UPD regions for ALL and AML samples detected by the SNP call–based method or by AsCNAR is shown in panel ...
Figure  4.
Detection of AI in samples of primary AML and MPD. AsCN analyses disclosed the presence of a small population with 17p UPD in a primary AML specimen (W150673) (93% blasts in microscopic examination) with either a paired sample (A) or anonymous reference ...
Table 1.
CN-Neutral LOH in Primary Acute Leukemia

Analysis of 9p UPD in MPDs

Another interesting application of the AsCNAR is the analysis of allelic status in the 9p arm among patients with MPD, which includes PV, ET, and IMF. According to past reports, ∼10% (in ET) to ∼40% (in PV) of MPD cases with the activating JAK2 mutation (V617F) show evidence of clonal evolution of dominant progeny that carry the homozygous JAK2 mutation caused by 9p UPD.5,7,8 In our series that included 53 MPD cases, the JAK2 mutation was detected in 32 (60%), of which 13 (41%) showed >50% mutant allele by allele measurement with the use of allele-specific PCR, and thus were judged to have one or more populations carrying homozygous JAK2 mutations (table 2). This frequency is comparable to that reported elsewhere.8 However, when the same specimens were analyzed with 50K Xba SNP arrays by use of the AsCNAR algorithm, 20 of the 32 JAK2 mutation–positive cases were demonstrated to have minor UPD subpopulations (table 2 and fig. 3B), in which as little as 17% of UPD-positive populations were sensitively detected (fig. 4D). In fact, these minor (<50%) UPD-positive populations in these cases were also confirmed by allele-specific PCR of SNPs on 9p (table 2). The proportion of 9p UPD–positive components estimated both from allele-specific PCR and from AsCNAR (see the “Material and Methods” section) shows a good concordance (table 2). In some cases, 9p UPD–positive cells account for almost all the JAK2 mutation–positive population, whereas, in others, they represent only a small subpopulation of the entire JAK2 mutation–positive population (fig. 5). AsCNAR analysis also disclosed three additional cases that have 9p gain (9p trisomy) (fig. 4E). The 9p trisomy is among the most frequent cytogenetic abnormalities in MPDs25 and is implicated in duplication of the mutated JAK2 allele6 but could not have been discriminated from UPD or “LOH with CN loss” by use of conventional techniques, for example, allele-specific PCR to measure relative allele dose.
Since the proportions of the mutated JAK2 allele coincide with two-thirds of the observed trisomy components in all three cases, the data suggest that the mutated JAK2 allele is duplicated in the 9p trisomy cases (table 2). Of particular interest is the unexpected finding of the presence of two discrete populations carrying 9p UPD in three cases, in which the AsCN graph showed a two-phased dissociation along the 9p arm (fig. 4F). In the previous observations, homozygous JAK2 mutations have been reported to be more common in PV cases (∼40%) than in ET cases (<∼10%). With AsCNAR analysis, the difference in the frequency of 9p UPD becomes more conspicuous; nearly all PV cases (11/11) and IMF cases (9/10) with a JAK2 mutation had one or more UPD components or other gains of 9p material, whereas only 3 of the 11 JAK2 mutation-positive ET cases carried a 9p UPD component or gain of 9p (P=1.3×10⁻⁴, by Fisher’s exact test).
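
As a worked check of the reported P value, the sketch below computes a two-sided Fisher's exact test from the counts given above. It assumes that PV and IMF cases were pooled (20 of 21 with a 9p UPD component or gain) and compared with ET cases (3 of 11); the text does not state how the table was constructed, so this grouping is an assumption, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: PV + IMF (pooled, assumed), ET; columns: with / without 9p UPD or gain.
p_value = fisher_exact_two_sided(20, 1, 3, 8)
print(f"{p_value:.2e}")
```

Under this pooling, the value rounds to the 1.3×10⁻⁴ quoted in the text.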

Figure  5.
Estimation of tumor populations carrying 9p UPD and the JAK2 mutation in MPD samples. The populations of 9p UPD–positive components in the 53 MPD cases were estimated by calculation of the mean difference of AsCNs within the UPD regions. Heterozygous ...
Table 2.
AI of 9p in JAK2 Mutation-Positive MPDs

Discussion


The robustness of the AsCNAR method lies in its capacity to measure accurately allele dosage and thereby to detect LOH even in the presence of significant normal cell components, which often occurs in primary tumor samples. In principle, an accurate LOH determination is accomplished only by demonstrating an absolute loss of one parental allele, not simply by detecting AI with conventional allele-measurement techniques. This is especially the case for contaminated samples, where it is essentially impossible to discriminate the origin of the remaining minor-allele component (i.e., differentiating normal cells and tumor cells).1,3 Nevertheless, and paradoxically, it is these normal cells within the tumor samples that enable determination of AsCNs in AsCNAR. It computes AsCNs on the basis of the strength of heterozygous SNP calls produced from the “contaminated” normal component, which effectively works as “an internal reference,” precluding the need for preparing a paired germline reference. It far outperforms the SNP call–based LOH-inference algorithms and other methods and definitively determines the state of LOH by sensing CN loss of one parental allele.

In the previously published algorithms, AsCN analysis was enabled by fitting observed array data to a model constructed from a fixed data set from normal samples.18,21 However, the model that explicitly assumes integer CNs fails to cope with primary tumor samples that contain varying degrees of normal cell components (PLASQ)18 (fig. 2). Another algorithm (CARAT) requires a large number of references to construct a model by which AsCNs are predicted, but such a model may not necessarily be properly applied to predict AsCNs for the newly processed samples, if the experimental condition for those samples is significantly different from that for the reference samples, which were used to construct the model (fig. 6 and data not shown).21 Signal ratios between array data from very different experiments could be strongly biased, to the extent that they can no longer be properly compensated by conventional regressions. In contrast, AsCNAR uses just a small number of references simultaneously processed with tumor specimens, to minimize difference in experimental conditions between tumor and references, which act as excellent controls in calculating AsCNs, although references analyzed in short intervals also work satisfactorily (data not shown).

Figure  6.
Effects of the use of the different reference sets on signal-to-noise ratios (S/N) in CN analysis. The same DNA sample, containing 30% tumor (NCI-H2171) content, was analyzed on the 50K Xba SNP array in two different experiments by use of the identical ...

The CN analysis software for the Illumina array provides allele frequencies, as well as CNs, by use of a model-based approach, and, as such, it enables AsCN analysis but seems to be less sensitive for detection of AIs.26 AsCNAR can be easily adapted to other Affymetrix arrays, including 10K and 500K arrays, and may be potentially applied to Illumina arrays.

The probability of finding at least one concordant SNP between a tumor sample and a set of anonymous references is sufficiently high with five references, but use of just one reference provides almost an equivalent AsCN profile to that obtained with its paired reference (fig. 7). The sensitivity and specificity of LOH detection with this algorithm are excellent, even in the presence of significant degrees of normal cell components (∼70%–80%), which circumvents the need for purifying the tumor components for analysis, for example, by time-consuming microdissection.

Figure  7.
CN profile obtained with the use of a varying number of anonymous references. NCI-H2171 was analyzed with either one (A), three (B), or five (C) anonymous references, as well as its paired LCL (NCI-BL2171) (D) by use of the AsCNAR algorithm. Even though ...

Because the AsCNAR algorithm is quite simple, it requires much less computing power and time (several seconds per sample on average laptop computers) than do model-based algorithms. For example, with PLASQ, model construction takes overnight, and processing each sample takes an additional hour.

The high sensitivity of LOH detection by AsCNAR has been validated not only by the analysis of tumor DNA intentionally mixed with normal DNA but also by the analysis of primary leukemia samples. It unveiled otherwise undetected, minor UPD-positive populations within leukemia samples. Especially, the extremely high frequency of 9p UPD or gains of 9p in particular types of JAK2 mutation–positive MPDs, as well as multiple UPD-positive subclones in some cases, demonstrated how strongly and efficiently a genetic change (point mutation) works to fix the next alteration (mitotic recombination) in the tumor population during clonal evolution in human cancer. Finally, the conspicuous difference in UPD frequency among different MPD subtypes (PV and IMF vs. ET) is noteworthy. This is supported by a recent report that demonstrated the presence of minor subclones carrying exclusively the mutated JAK2 allele in all PV samples, but in none of the ET samples, by examining a large number of erythroid burst-forming units and Epo-independent erythroid colonies for JAK2 mutation.27 Our observation also supports their hypothesis that the biological behavior of these prototypic stem-cell disorders with a continuous disease spectrum could be determined by the components with either homozygous or duplicated JAK2 mutations.

In conclusion, the AsCNAR with use of high-density oligonucleotide microarrays is a robust method of genomewide analysis of allelic changes in cancer genomes and provides an invaluable clue to the understanding of the genetic basis of human cancers. The AsCNAR algorithm is freely available on our CNAG Web site for academic users.

Acknowledgments


This work was supported by Research on Measures for Intractable Diseases, Health and Labor Sciences Research Grants, Ministry of Health, Labor and Welfare, by Research on Health Sciences focusing on Drug Innovation, by the Japan Health Sciences Foundation, by Core Research for Evolutional Science and Technology, Japan Science and Technology Agency, and by Japan Leukemia Research Fund.

Appendix A: AsCNAR

Quadratic Regression

The log2 signal-ratio, equation M10 is regressed by the quadratic terms (the length [Li] and the GC content [Mi] of the PCR fragment of the ith SNP) as

equation image

where ɛi is the error term and the coefficients of regressions α, β, χ, δ, and γ are dependent on the reference used and are determined to minimize the residual sum of squares (i.e., equation M11). Note that the sum is taken for those SNPs that have concordant SNP calls between the tumor and the reference samples.

We suppose that both allele A DNA and allele B DNA follow the same PCR kinetics, and allele-specific ratios $R^{\mathrm{ref}I}_{A,i}$ and $R^{\mathrm{ref}I}_{B,i}$, respectively, can be regressed by the same parameters, as

equation image


equation image

and the corrected total CN ratio is

equation image
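
The regression step can be illustrated with a minimal least-squares sketch. It assumes a model of the form log2 R_i = α·L_i² + β·L_i + χ·M_i² + δ·M_i + γ + ε_i; the assignment of coefficients to terms, the units (fragment length in kb, GC content as a fraction), and all function names are assumptions, because the original equations are not reproduced here. The fit is done in plain Python by solving the normal equations.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(length_kb, gc):
    """Quadratic terms in fragment length and GC content, plus an intercept."""
    return [length_kb ** 2, length_kb, gc ** 2, gc, 1.0]


def fit_quadratic(log_ratios, lengths, gcs):
    """Least-squares coefficients that minimize the residual sum of squares."""
    rows = [design_row(l, g) for l, g in zip(lengths, gcs)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_ratios)) for i in range(k)]
    return solve_linear(xtx, xty)


def corrected_log_ratio(log_ratio, length_kb, gc, coef):
    """Subtract the fitted length/GC trend from an observed log2 ratio."""
    fitted = sum(c * x for c, x in zip(coef, design_row(length_kb, gc)))
    return log_ratio - fitted


# Toy data generated from known coefficients, so the corrected values should vanish.
lengths = [0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 0.4]
gcs = [0.35, 0.45, 0.40, 0.50, 0.38, 0.55, 0.42, 0.60]
true_coef = [-0.4, 0.6, -2.0, 1.5, 0.1]
log_ratios = [
    sum(c * x for c, x in zip(true_coef, design_row(l, g)))
    for l, g in zip(lengths, gcs)
]
coef = fit_quadratic(log_ratios, lengths, gcs)
residuals = [
    corrected_log_ratio(y, l, g, coef) for y, l, g in zip(log_ratios, lengths, gcs)
]
```

In practice only SNPs with concordant calls between the tumor and the reference would enter the fit, and the same coefficients would then be applied to the allele-specific ratios.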

Averaging over the References of Concordance SNPs

Concordant reference sets $C^K_i$ and $C^{K,\mathrm{hetero}}_i$ for each SNP $S_i$ for a given set of references, K, are defined as follows:

equation image

and the averaged CN ratio, equation M12, is provided by

equation image

where “#” denotes the number of the elements of the set. Similarly, AsCN ratios are obtained by

equation image
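
A minimal sketch of the averaging step for one SNP is given below. It assumes that genotype calls are encoded as the strings "AA", "AB", and "BB" and that the per-reference ratios have already been corrected; the function name and the toy values are hypothetical.

```python
def average_ascn_ratios(tumor_call, ref_calls, ratios_a, ratios_b):
    """Average allele-specific ratios over references concordant with the tumor.

    Returns (mean_A, mean_B) for a heterozygous tumor SNP, or None when the
    tumor call is not heterozygous or no reference shares the call.
    """
    if tumor_call != "AB":
        return None
    concordant = [k for k, call in enumerate(ref_calls) if call == tumor_call]
    if not concordant:
        return None  # no concordant reference: the SNP is uninformative
    n = len(concordant)  # the "#" of the concordant heterozygous set
    mean_a = sum(ratios_a[k] for k in concordant) / n
    mean_b = sum(ratios_b[k] for k in concordant) / n
    return mean_a, mean_b


# Toy example: references 1 and 3 are heterozygous like the tumor; reference 2 is not.
ascn = average_ascn_ratios(
    "AB",
    ["AB", "AA", "AB"],
    ratios_a=[1.5, 0.9, 1.0],
    ratios_b=[0.5, 0.0, 1.0],
)
print(ascn)
```

The discordant reference is simply skipped, which is why adding more references raises the fraction of SNPs that remain informative.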

Exceptional Handling with Regions of Homozygous Deletion, High Amplification, and LOH

To prevent SNPs within the regions that show homozygous deletion or high-grade amplification from being analyzed as “homozygous SNPs,” a homozygous SNP Si in the tumor sample is redefined as a heterozygous SNP with equation M13, if equation M14 or equation M15, where equation M16 and equation M17 are calculated supposing SNP Si is heterozygous. These cutoff values (0.1 and −0.1) are determined by receiver operating characteristic (ROC) curve for detection of gain of the larger allele and loss of the smaller allele in a sample containing 20% tumor cells (data not shown). In addition, SNPs within inferred LOH regions are also analyzed as “heterozygous” SNPs.

Reference Selection

The optimized set of references is selected that minimizes the SD of total CN at the diploid region D,

equation image

To do this, instead of testing all possible $2^N$ combinations of N references, we calculate $SD_K(D)$ for individual references equation M18 to order the references such that $SD_1(D) \le \dots \le SD_s(D) \le SD_{s+1}(D) \le \dots \le SD_N(D)$, where 1, 2, 3,…, s, s+1,…, N denotes the ordered references. The optimal set equation M19 is determined by choosing $N_0$ that satisfies $SD_{K(1)}(D) \ge \dots \ge SD_{K(N_0)}(D) < SD_{K(N_0+1)}(D)$.
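
The stepwise selection can be sketched as follows. This is a simplified illustration: it assumes that each reference yields a total-CN profile over the diploid region D and that a reference set is evaluated by plainly averaging those profiles, ignoring per-SNP concordance; the names and toy profiles are hypothetical.

```python
from statistics import pstdev


def averaged_profile(profiles):
    """Average several total-CN profiles SNP by SNP."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def select_references(ref_profiles):
    """Greedy reference selection that minimizes the SD of total CN in D.

    ref_profiles maps a reference name to its total-CN values in the diploid
    region D, computed with that reference alone.
    """
    # Order references by the SD each one gives on its own.
    order = sorted(ref_profiles, key=lambda name: pstdev(ref_profiles[name]))
    chosen = [order[0]]
    best_sd = pstdev(ref_profiles[order[0]])
    for name in order[1:]:
        trial = chosen + [name]
        sd = pstdev(averaged_profile([ref_profiles[n] for n in trial]))
        if sd > best_sd:
            break  # adding the next reference increases the SD: stop here
        chosen, best_sd = trial, sd
    return chosen


profiles = {
    "ref1": [2.1, 1.9, 2.1, 1.9],
    "ref2": [1.8, 2.2, 1.8, 2.2],
    "ref3": [3.0, 1.0, 1.0, 3.0],
}
selected = select_references(profiles)
print(selected)
```

The ordering step reduces the search from all subsets of N references to at most N nested candidate sets.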

Note that, in principle, a diploid region cannot be unequivocally determined without doing single-cell–based analysis—for example, FISH or cytogenetics. Otherwise, a diploid region is empirically determined by setting the CN-minimal regions with no AI as diploid, which provides correct estimation of the ploidy in most cases (data not shown).

Appendix B

Table B1. 

PCR Primers and Conditions for STR PCR and Allele-Specific PCR

STR/SNP | Primer Sequence: Forward | Reverse 1 | Reverse 2
a. Conditions were 400 nM each of primers and 1.5 mM MgCl2, with PCR cycles of 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 54°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 7 min.
b. Conditions were 45 nM forward primer, 22.5 nM each of the reverse primers, and 1.5 mM MgCl2, with PCR cycles of 94°C for 3 min, followed by 30 cycles of 94°C for 30 s, 61°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 7 min.

Appendix C

Inference of LOH Based on Heterozygous SNP Calls

For a given contiguous region Ω_{i,j} between the ith and jth SNPs (i ⩽ j) and for the complete set of observed SNP calls therein, O(Ω_{i,j}), consider the log likelihood ratio

equation image

where the ratio is taken between the conditional probabilities that the current observation, O(Ω_{i,j}), is obtained under the assumption that Ω_{i,j} belongs to LOH or not. We assume a constant miscall rate (q = 0.001) for all SNPs and use the conditional probability that the kth SNP is heterozygous (h_k), depending on the observed (k−1)th SNP call, to partially take the effect of linkage disequilibrium into account:

equation image

where h_k is calculated using the data from the 96 normal Japanese individuals, whereas O_k takes either 1 or 0, depending on the kth SNP call, with 1 for a homozygous call and 0 for a heterozygous call. For each chromosome, a set of regions, Ω_{I_n,J_n} (J_{n−1} < I_n ⩽ J_n, J_0 = 0) (n = 1, 2, 3, …), can be uniquely determined as follows.

Beginning with the SNP at the short arm end (S_0), find the SNP S_{I_n} that satisfies Z(Ω_{I_n,I_n}) > 0 and Z(Ω_{i,i}) ⩽ 0 for J_{n−1} < ∀i < I_n (fig. C1A). Identify the SNP S_{J^+}, such that Z(Ω_{I_n,j}) > 0 for I_n ⩽ ∀j ⩽ J^+ and Z(Ω_{I_n,J^+ +1}) ⩽ 0, or that S_{J^+} is the end of the chromosome (fig. C1B). Then, put J_n as equation M20 (fig. C1C). This procedure is iteratively performed, beginning the next iteration with the SNP S_{J_n+1}, until it reaches the end of the long arm, generating a set of nonoverlapping regions, Ω_{I_1,J_1}, Ω_{I_2,J_2}, Ω_{I_3,J_3}, …, Ω_{I_n,J_n}, …. LOH inference is now enabled by testing each Z(Ω_{I_n,J_n}) against a threshold (25), which is arbitrarily determined from the ROC curve for LOH determination on a DNA sample from a lung cancer cell line, NCI-H2171 (fig. C1D). This algorithm is implemented in our CNAG program, which is available at our Web site.
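The block search described above can be illustrated with a short sketch. It rests on two assumptions that are not stated in this text: that Z(Ω_{i,j}) is the sum of per-SNP log likelihood ratio contributions z_i, …, z_j, which are taken as given, and that the end point J_n is the position between I_n and J^+ at which Z(Ω_{I_n,j}) is largest. The function name `find_candidate_regions` is hypothetical, and the sketch is not the CNAG code.

```python
def find_candidate_regions(z, threshold=25.0):
    """Scan one chromosome from the short-arm end for candidate LOH blocks.

    z[k] is an assumed per-SNP log likelihood ratio contribution, so that
    Z for a region is the sum of z over its SNPs (an assumption of this sketch).
    Returns a list of (start, end, Z, is_loh) tuples with inclusive indices.
    """
    regions = []
    k, n = 0, len(z)
    while k < n:
        if z[k] <= 0:
            # Z of the single-SNP region is not positive: not a start point.
            k += 1
            continue
        start, total = k, 0.0
        best_total, best_end = float("-inf"), k
        j = start
        while j < n:
            total += z[j]
            if total <= 0:
                # Cumulative Z dropped to zero or below: J+ is the previous SNP.
                break
            if total > best_total:
                # Assumed rule for J_n: the end point that maximizes Z.
                best_total, best_end = total, j
            j += 1
        regions.append((start, best_end, best_total, best_total > threshold))
        k = best_end + 1  # the next iteration begins with the SNP after J_n
    return regions


scores = [-1, 10, 10, -3, 10, -30, 2, -5]
blocks = find_candidate_regions(scores)
```

With these toy scores the scan yields one block spanning SNPs 1 to 4 with Z = 27, which exceeds the threshold of 25, and one single-SNP block with Z = 2, which does not.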

Figure  C1.
Inference of LOH on the basis of heterozygous SNP calls. A–C, The schema of determination of LOH blocks in inference of LOH. D, ROC curve for LOH determination. The sensitivity and specificity of LOH detection for pure tumor specimens were plotted ...

Appendix D

Figure D1. 

Expected concordance rate of SNP calls between normal samples. In the AsCNAR algorithm, SNP-specific signals of each SNP in a tumor sample were compared with those in reference samples that had a SNP call identical to that of the tumor sample. The probability of finding such concordant SNPs between a given tumor sample and a set of references was estimated as a function of the number of reference samples, by use of genotyping data from the 96 normal individuals. To do this, the 96 individuals were first divided into a test set and a reference set, each consisting of 48 individuals. Then, for each individual from the test set, we counted the SNP loci at which the call was identical to that of at least one of i references randomly selected from the reference set (i = 1, 2, 3, …, 10). No-call SNPs in test samples were excluded from the counts. The concordance rates were expressed as the mean ± SD for the 48 test samples. The concordance rate was estimated separately for heterozygous (hetero call) SNPs and for all SNPs in 50K Xba and 50K Hind arrays.

Appendix E

Algorithm for Detection of AI With or Without LOH

The regions with AI are inferred from the AsCN data by use of an HMM, where the real state of AI (a hidden state) is inferred from the observed states of difference in AsCNs of the two parental alleles, which are expressed as dichotomous values (“present” or “absent”) according to a threshold (μ). The emission probabilities at the ith SNP locus (S_i) are

equation image


equation image

(see also the “Material and Methods” section and appendix A for calculation of equation M21 and equation M22).

The parameters (μ, α, and β) are determined from the results for the 10%, 20%, and 30% tumor samples. Sensitivity and specificity are calculated with a varying threshold (μ), where sensitivity is defined as the proportion of SNPs in the UPD regions detected in the 100% tumor sample that are also detected in the mixed sample, specificity is defined as the proportion of SNPs in normal samples that are not detected, and the α and β parameters are determined from the mixed tumor-sample data for each threshold value. Sensitivity and specificity are relatively stable and are within the acceptable range when the threshold is between 0.05 and 0.15 in 20% and 30% tumor samples (fig. E1A and E1B). We used 0.12, 0.17, and 0.06 for μ, α, and β, respectively, on the basis of the 20% tumor-sample data.

Figure  E1.
Sensitivity and specificity for determination of AI, LOH, and UPD. The sensitivity and specificity of detection of AI (A and B), LOH (i.e., decrease of the smaller allele in AI region) (C and D), and UPD (i.e., increase of the larger allele in LOH region) ...

Considering that UPD is caused by a process similar to recombination, Kosambi’s map function (1/2)tanh(2θ) is used for the transition probability, where θ is the distance between two SNPs, expressed in cM units; for simplicity, 1 cM is assumed to equal 1 Mbp. Thus, the most likely underlying, hidden, real states of AI are calculated for each SNP according to Viterbi’s method, by which AI-positive regions are defined as contiguous SNPs with “present” AI calls flanked by either a chromosomal end or an “absent” AI call. Next, to determine the LOH status for each AI-positive region (Γ), AsCN states at each SNP locus within Γ are inferred as “reduced (R)” and “not reduced (equation M23)” for the smaller AsCNs, and “increased (I)” and “not increased (equation M24)” for the larger AsCNs, using similar HMMs from the “observed CN states” of the smaller and the larger AsCNs, which are expressed as dichotomous values according to thresholds μ_S and μ_L, respectively. The emission probabilities of these models are
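The AI-calling step can be sketched as a two-state Viterbi decoder. This is a minimal illustration under several assumptions, because the emission equations are not reproduced in this text: α is taken as the probability of observing “present” when no AI exists, β as the probability of observing “absent” when AI exists, the two hidden states are equally likely at the first SNP, and θ is used exactly as written above (distance in cM, with 1 cM = 1 Mbp). The function names are hypothetical, and the defaults are the α and β values quoted in this appendix.

```python
import math


def kosambi_switch_probability(theta):
    """Kosambi's map function (1/2)tanh(2*theta), used as the switch probability."""
    return 0.5 * math.tanh(2.0 * theta)


def viterbi_ai(observed, positions_mbp, alpha=0.17, beta=0.06):
    """Most likely hidden AI states (0 = no AI, 1 = AI) for one chromosome.

    observed[k] is 1 if the AsCN difference at SNP k exceeds the threshold
    ("present") and 0 otherwise. The roles of alpha and beta are assumptions.
    """
    log_emit = [
        [math.log(1.0 - alpha), math.log(alpha)],  # hidden state: no AI
        [math.log(beta), math.log(1.0 - beta)],    # hidden state: AI
    ]
    score = [math.log(0.5) + log_emit[s][observed[0]] for s in (0, 1)]
    back = []
    for k in range(1, len(observed)):
        theta = positions_mbp[k] - positions_mbp[k - 1]
        # Guard against log(0) when two SNPs share a position.
        p = max(kosambi_switch_probability(theta), 1e-12)
        log_stay, log_switch = math.log(1.0 - p), math.log(p)
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            pointers.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + log_emit[s][observed[k]])
        score = new_score
        back.append(pointers)
    # Trace back from the better final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


calls = [0] * 3 + [1] * 5 + [0] + [1] * 5 + [0] * 3
positions = [0.01 * k for k in range(len(calls))]
states = viterbi_ai(calls, positions)
```

In this toy input, the single “absent” call inside the run of “present” calls is absorbed into one AI-positive region, which is the smoothing effect the HMM is meant to provide. An isolated “present” call among “absent” calls is not reported as AI.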

equation image


equation image

These parameters (μ_S, α_S, β_S, μ_L, α_L, and β_L) are determined by evaluating sensitivities and specificities of the results for 10%, 20%, and 30% tumor samples, where sensitivities and specificities are calculated in the same way as for AI. Sensitivity and specificity are relatively stable for μ_S between −0.03 and −0.13 and for μ_L between 0.04 and 0.09 in 20% and 30% tumor samples (fig. E1C–E1F). We employed μ_S = −0.1, α_S = 0.3, β_S = 0.26, μ_L = 0.08, α_L = 0.27, and β_L = 0.31 on the basis of the data for 20% tumor content.

Web Resources

The URLs for data presented herein are as follows:

BACPAC Resources Center, http://bacpac.chori.org/
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/ (for JAK2, AML, PV, ET, and IMF)



Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics