A Sparse Marker Extension Tree Algorithm for Selecting the Best Set of Haplotype Tagging Single Nucleotide Polymorphisms

1 Department of Biostatistics, Harvard School of Public Health, Boston, MA
2 Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
3 Department of Epidemiology, Harvard School of Public Health, Boston, MA

Address correspondence and reprint requests to: Dr. Tianhua Niu, Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, 900 Commonwealth Ave., Boston, MA 02215. Phone: (617) 278−0860; Fax: (617) 731−3843; E-mail: tniu/at/hsph.harvard.edu

The publisher's final edited version of this article is available at Genet Epidemiol.

Abstract

Single nucleotide polymorphisms (SNPs) play a central role in the identification of susceptibility genes for common diseases. Recent empirical studies on the human genome have revealed block-like structures, and each block contains a set of haplotype tagging SNPs (htSNPs) that capture a large fraction of the haplotype diversity. Herein, we present an innovative sparse marker extension tree (SMET) algorithm to select optimal htSNP set(s). SMET reduces the search space considerably compared with the full enumeration strategy and therefore improves computing efficiency. We tested this algorithm on several datasets at three different genomic scales: (1) gene-wide (NOS3, CRP, IL6, PPARA, and TNF), (2) region-wide (a Whitehead Institute inflammatory bowel disease dataset and a UK Graves’ disease dataset), and (3) chromosome-wide (chromosome 22) levels. Because its haplotype diversity coverage (ϕ) is adjustable, SMET offers geneticists greater flexibility in SNP tagging than lossless methods.
In simulation studies, we found that (1) an initial sample size of 50 individuals (100 chromosomes) or more is needed for htSNP selection; (2) the SNP tagging strategy is considerably more efficient when the underlying block structure is taken into account; and (3) htSNP sets at 80−90% ϕ are more cost-effective than the lossless sets in terms of statistical power, relative risk ratio estimation, and genotyping effort. Our study suggests that the novel SMET algorithm is a valuable tool for association tests.

Keywords: Single Nucleotide Polymorphism, Haplotype, Entropy, Heterozygosity, Tagging, Association Study

Introduction

Surveys of variation at different genomic scales have revealed block-like patterns of haplotype structure in European, Asian, and African populations. Examples include chromosomes 5q31 [Daly et al., 2001], 6 in the MHC class II region [Jeffreys et al., 2001], 19 [Phillips et al., 2003], 21 [Patil et al., 2001], 22 [Dunning et al., 2000], and 51 autosomal regions spanning 13.4 Mb [Gabriel et al., 2002]. This has motivated the launch of the HapMap project to assess the haplotypic structure of the entire human genome. Once the haplotype block structure is deciphered, a small fraction of SNPs, called “haplotype tagging” SNPs (htSNPs), can be used to represent a large fraction of the haplotype diversity. Various testing schemes, including haplotype-based and multiple-marker methods, have been proposed to utilize htSNPs in capturing genetic association [Akey et al., 2002; Clark, 2004; Roeder et al., 2005; Schaid, 2004].

In order to identify the best htSNP set, we first need to settle on a proper measure of “haplotype diversity”. Several metrics, such as

In genetic association studies, it is of great interest to determine the best threshold of haplotype diversity coverage, denoted as ϕ.
In particular, it is critical to ascertain how much power is lost and how much bias is introduced in estimating the disease risk ratio when the htSNP set captures only 80−90% (as opposed to 100%) of the haplotype diversity. To address these issues, we conducted both simulation and empirical studies using a new algorithm, the sparse marker extension tree (SMET), which allows for flexibility in selecting the optimal htSNP set at any given haplotype diversity coverage. We also assessed the impact of sample size on panel development and the utilization of haplotype block structures.

Materials and Methods

Definition of a minimum htSNP set

Let K denote the number of the original SNPs and W the total number of distinct haplotypes formed by the original SNPs. Let ϕN denote the haplotype diversity coverage captured by the N (K≥N) candidate htSNPs. The selection of the minimum htSNP set for a pre-defined threshold, ϕT, requires identification of the minimum of all N that satisfy ϕN ≥ ϕT. We calculate ϕN using either E or H. (1) Heterozygosity, H, is a classic statistic measuring genetic marker diversity, and the H of all K SNPs is defined as:
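As a minimal sketch, assuming the textbook definitions of haplotype heterozygosity (H = 1 − Σ p_w², summed over the W distinct haplotypes with frequencies p_w) and Shannon entropy (E = −Σ p_w log2 p_w), and assuming that ϕN is the ratio of the diversity captured by the N candidate SNPs to the diversity of all K SNPs, the quantities could be computed as follows. The function names and the ratio form of ϕN are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def haplotype_frequencies(haplotypes, snps):
    """Frequencies of the distinct haplotypes formed by the SNP indices in `snps`."""
    counts = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def heterozygosity(freqs):
    """H = 1 - sum(p^2): probability that two random haplotypes differ (assumed form)."""
    return 1.0 - sum(p * p for p in freqs)


def entropy(freqs):
    """E = -sum(p * log2 p): Shannon entropy in bits (assumed form)."""
    return -sum(p * log2(p) for p in freqs if p > 0)


def coverage(haplotypes, snps, measure):
    """Assumed phi_N: diversity of the chosen SNPs divided by diversity of all K SNPs."""
    all_snps = range(len(haplotypes[0]))
    full = measure(haplotype_frequencies(haplotypes, all_snps))
    if full == 0:
        return 1.0  # no diversity to capture
    return measure(haplotype_frequencies(haplotypes, snps)) / full


# Four phased chromosomes typed at two SNPs (toy data).
haps = ["00", "00", "01", "11"]
phi_h = coverage(haps, [0], heterozygosity)
phi_e = coverage(haps, [0], entropy)
```

In this toy example the first SNP alone gives a coverage of 0.6 under H, and somewhat less under E.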
Reconstructing haplotype phase and partitioning haplotype blocks

For reconstructing haplotype phases of unrelated subjects, we applied the partition-ligation Gibbs sampling algorithm [Niu et al., 2002]. For pedigree data, we used the transmission disequilibrium test function of GeneHunter 2.0 [Daly et al., 2001] and Clark's method [Clark, 1990]. We did not attempt to perform block partitioning for gene-wide datasets, because the physical lengths (<300 kb) and the number of SNPs for each gene are relatively small. For the Whitehead Institute's IBD dataset, we used haplotype blocks from the original report [Daly et al., 2001]. For the UK Graves’ disease dataset and the chromosome 22 dataset, we applied block partitioning algorithms based on the confidence intervals of D’ [Gabriel et al., 2002; Schulze et al., 2004]. Specifically, we calculated 95% confidence bounds for all pairwise D’ using a bootstrap resampling technique [Gabriel et al., 2002]. Two criteria were used: (1) for the SNP pair comprising both the left-most and right-most SNPs, the D’ needs to have a 95% upper bound (ub) > 0.95 and a 95% lower bound (lb) > 0.7; and (2) among all the possible SNP pairs within the block, the fraction of SNP pairs with both ub > 0.95 and lb > 0.50 should be > 80%.

Sparse Marker Extension Tree (SMET) algorithm

For any initial set of K SNPs (K≥2) and a given threshold ϕT, our algorithm selects the minimum htSNP set(s). As described above, ϕN can be defined on the basis of either H or E. Our algorithm follows a stepwise forward selection procedure that begins with all possible pairs of SNPs and sequentially adds SNPs until the minimum htSNP set(s) is found (i.e. when ϕN exceeds threshold ϕT).

Figure 1
The essence of SMET is that all calculations are made only for nonsingular nodes, resulting in a significant reduction in computational cost when K is large. Below we describe the algorithmic details of SMET. Let Ak denote the nonsingular nodes at the kth level of the tree, and ai denote the member sets of Ak. S denotes the original K SNPs. Also, amax,k denotes the node(s) carrying the largest haplotype diversity coverage among all ai. SMET can be written as:

While (ϕ(amax,k) < ϕT) {
    For each tree node add each SNP mj ∈ S\ai individually to ai, such that if
    }
    k ← k + 1;
}
return amax,k.

Empirical Datasets

We compared the performance of our algorithm with the BEST algorithm on datasets that represent different genomic scales.
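The level-wise search described for SMET can be illustrated with a small sketch. It is not the authors' implementation. The pruning condition is incomplete in the pseudocode, so the rule used here is an assumption: a child node is kept only if the added SNP strictly increases the entropy of its parent set. A SNP that adds nothing is fully determined by the parent set and cannot belong to a minimum set. For simplicity the sketch starts from single SNPs rather than pairs, uses entropy-based coverage, and returns every set at the first level that reaches the threshold. All names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(haplotypes, snps):
    """Shannon entropy (bits) of the haplotypes restricted to the SNP indices in `snps`."""
    counts = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    n = len(haplotypes)
    return -sum(c / n * log2(c / n) for c in counts.values())


def minimal_tag_sets(haplotypes, threshold, eps=1e-12):
    """Return all smallest SNP sets whose entropy coverage reaches `threshold`."""
    k = len(haplotypes[0])
    full = entropy(haplotypes, range(k))
    if full == 0:
        return [[j] for j in range(k)]  # no diversity: any single SNP suffices
    level = {frozenset([j]) for j in range(k)}
    while level:
        scores = {s: entropy(haplotypes, sorted(s)) for s in level}
        hits = [s for s, e in scores.items() if e / full >= threshold - eps]
        if hits:
            return sorted(sorted(s) for s in hits)
        next_level = set()
        for s, base in scores.items():
            for j in range(k):
                if j in s:
                    continue
                child = s | {j}
                # Assumed pruning rule: drop children that add no diversity.
                if entropy(haplotypes, sorted(child)) > base + eps:
                    next_level.add(child)  # the set removes duplicate nodes
        level = next_level
    return []


# Toy block: SNP 0 and SNP 1 are perfectly correlated, so {0, 1} is pruned.
haps = ["0000", "0011", "1101", "1110"]
lossless = minimal_tag_sets(haps, 1.0)
```

Here `lossless` holds every two-SNP set except the redundant pair {0, 1}. Lowering `threshold` lets the search stop at a shallower level of the tree.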
Evaluation based on simulations

(1) Assessing the impact of the sample size of the developing panel on htSNP selection adequacy

Researchers often genotype an initial grid of densely spaced SNPs on a developing panel of sample size Nd, followed by haplotype block partitioning and htSNP selection; afterwards, only those htSNPs (derived from the developing panel) are genotyped in large samples. Therefore, it is critical to evaluate the impact of Nd on the efficiency of htSNP selection. We generated our simulated datasets based on (1) the TNF haplotypes in the Gambian sample [Ackerman et al., 2003] and (2) the UK Graves’ disease dataset [Ueda et al., 2003]. In each dataset, we randomly sampled an Nd of (1) 25 and (2) 50 individuals as our developing panel, based upon which we identified the respective minimal htSNP sets using SMET, then evaluated the haplotype diversity coverage of these htSNP sets on simulated large cohorts, comprising 324 chromosomes (i.e. 162 individuals) derived from the TNF Gambian dataset and 324 chromosomes (i.e. 162 individuals) from the UK Graves’ disease dataset, respectively. Specifically, in each resampling-based dataset, we resampled 50 or 100 haplotypes from this “haplotype pool” [e.g. the “haplotype pool” of the UK Graves’ disease dataset (N=652) contained 1,304 chromosomes] to serve as the “development panel” in SNP tagging. For each parameter set, we conducted 1,000 resamplings to ensure reliable inferences. The resampling was performed within individual haplotype blocks.

(2) Assessing the dependency of htSNP selection efficiency on a genuinely LD-based haplotype block structure

The efficiency of htSNP selection using the “haplotype-block-tagging” approach depends on whether the underlying haplotype block structure is truly based on LD [Stram et al., 2003]. To evaluate the extent of this dependency, we devised a simulation scheme similar to that of Lin et al. [2004].
In brief, suppose n SNPs are located linearly on the chromosome, denoted as SNP1 to SNPn from 5’ → 3’. A haplotype block can be denoted as an interval [j, k], where n ≥ k > j ≥ 1, so that the jth through kth SNPs form a haplotype block. We reshuffle the order of these SNPs by moving the first t (t is a randomly generated integer) SNPs to the end of the list, which results in a new list: SNPt+1, SNPt+2,..., SNPn, SNP1,..., SNPt. By still treating the jth through kth SNPs in the new list as a haplotype block, we generated arbitrarily-defined, non-LD based pseudo-blocks without destroying the “local” LD relations of neighboring SNPs (except for SNPt). A total of 1,000 simulations were conducted for each set of parameters.

(3) Assessing the impact of genetic diversity on htSNP selection efficiency

We used the coalescent model to simulate data to evaluate the impact of genetic diversity (haplotypic diversity). The program HUDSON [Schierup and Hein, 2000a; Schierup and Hein, 2000b] was applied. In our simulations, we specified the population mutation rate (θ = 4Nμ) as 0.05 and the population recombination rate (ρ = 4Nν) as 1.0, where N designates the effective population size, and μ and ν designate the per-locus mutation and recombination rates per generation [Wall and Hudson, 2001], respectively. We simulated two populations (denoted as Pop I and Pop II, respectively), and their only difference was that Pop II had an evolutionary history twice as long as that of Pop I. Therefore, Pop II would possess more historical recombinations and a higher genetic diversity than Pop I. A total of 500 pairs (i.e. for Pop I and Pop II respectively) of simulated datasets were generated, and in each pair of simulated datasets, 50 individuals from each population were sampled for SNP tagging.

(4) Statistical power and relative risk ratio estimation in haplotype association tests using htSNPs at various ϕ

The choice of ϕ may affect the power of haplotype-based association tests.
We investigated two datasets: (1) the TNF dataset in the Gambian sample [Ackerman et al., 2003] and (2) the 5th, 8th and 9th blocks in the UK Graves’ disease dataset. We used these datasets because of their relatively large sample sizes and SNP numbers, and we only considered the most frequent haplotype in each dataset as the disease haplotype. The power was evaluated through simulations. To assess the impacts of different disease models using different parameter settings, we performed the following simulations. We set the numbers of cases and controls to 200 each, and specified a disease-susceptible haplotype with a penetrance of 0.1, 0.2, and 0.4 for subjects carrying 0, 1, and 2 susceptible haplotypes, respectively. For each simulation dataset, we (1) generated the data by random sampling from each respective real dataset and by assigning the disease status to each individual according to that individual's haplotype configuration and penetrance, (2) identified htSNPs using Nd = 50 at ϕ of 80%, 90% and 100% levels, (3) conducted association tests in a contingency table setting, which compares the haplotype frequencies between cases and controls, and (4) estimated power at α (Type I error) = 0.0001. We generated 5,000 simulation datasets for each set of parameters.

Results

Empirical Studies

We compared the performance of SMET with BEST at various genomic scales.

Gene-wide (< 300 kb) scale

(a) SeattleSNPs datasets

We studied a total of 40, 17, 16, and 45 SNPs on NOS3, CRP, IL6, and PPARA, respectively (Table I). The analysis included all haplotypes that appeared at least twice in the dataset. As shown in Table I, based on either E or H, an htSNP set containing 6−9 SNPs was selected at ϕ = 100%. Based on E alone, an htSNP set containing 4−6 SNPs was selected at ϕ = 90% (please note that, according to our studies using both simulated and real datasets, H is not a sensitive measure of haplotype diversity coverage, and we recommend the use of E).
(b) TNF dataset

Prior to the analysis, we removed 4 SNPs with MAF <10%, as well as rare haplotypes that appeared only once. When the study sample size was large (N=212; the Gambian sample), all SNPs were needed to form the lossless (i.e. ϕ=100%) htSNP set. When the study sample size was small (N=84; the Malawi sample), 6 out of 8 SNPs were needed. To achieve ϕ = 90%, the minimal htSNP set included only 3−5 of the 8 SNPs (Table I).

Region-wide (300 − 500 kb) scale

(a) Whitehead Institute IBD dataset

Among the 103 SNPs distributed on a 500 kb region of 5q31, 99 SNPs were located within 11 haplotype blocks, and 4 SNPs did not belong to any block. Infrequent haplotypes, which appeared only once in the sample, were excluded prior to analysis. Based on E, we found that 29−100% of the SNPs were needed to achieve ϕ = 100%, and 16.3−80.0% of the SNPs were needed at ϕ = 90% (Figure 2).
(b) UK Graves’ disease dataset

Among the 108 SNPs distributed over a 317 kb region containing CD28, CTLA-4, and ICOS genes, a total of 107 frequent SNPs (MAF ≥ 10%) were used. By applying D’-based block partitioning [Gabriel et al., 2002], 10 haplotype blocks were revealed. The size and the average D’ value for each block are shown in Figure 3.

Combining the gene- and region-wide SNP datasets, we plotted ϕ as a function of htSNP ratio (defined as the size of the htSNP set/the size of the lossless htSNP set), based on either E or H (Figure 4).

Chromosome-wide (33.4 Mb) scale

By applying the D’-based block partitioning method [Gabriel et al., 2002], a total of 150 blocks were revealed for 656 SNPs in the chromosome 22 dataset. The distributions of the physical sizes of the haplotype blocks are shown in Figure 5.

For all the above datasets, we also used the program BEST version 1.0, and confirmed that the lossless htSNP sets identified by SMET were identical to those identified by BEST.

Simulation Studies

Impact of Nd on htSNP selection adequacy

Figure 6
Efficiency comparison of htSNP selection based on a genuinely LD-based block structure vs. an arbitrarily-defined pseudo-block structure

We used the 5 largest blocks (size > 10) in the UK Graves’ disease dataset in our comparison experiment. The lossless (ϕ = 100%) htSNP set contained an average of 5.04 SNPs/block chosen by the SMET algorithm based on a predefined genuinely LD-based block structure, which is smaller than the lossless (ϕ = 100%) htSNP set (5.51 SNPs/block) chosen by the SMET algorithm based on an arbitrarily-defined pseudo-block structure, assuming that they share exactly the same “physical” haplotype block structure (i.e. the same width for each respective block and the same block order) (Figure 7).
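The pseudo-block construction described in the Methods moves the first t SNPs to the end of the list while keeping the block interval [j, k] fixed. It can be sketched as below. The function names are hypothetical, and the way t is drawn (uniformly from 1 to n − 1) is an assumption, since the text only says that t is a randomly generated integer.

```python
import random


def rotate_snps(snp_order, t):
    """Move the first t SNPs to the end: SNP(t+1), ..., SNPn, SNP1, ..., SNPt."""
    if not snp_order:
        return []
    t %= len(snp_order)
    return snp_order[t:] + snp_order[:t]


def pseudo_block(snp_order, j, k, t):
    """SNPs occupying positions j..k (1-based, inclusive) after the rotation."""
    return rotate_snps(snp_order, t)[j - 1:k]


snps = list(range(1, 11))          # SNP1 .. SNP10
rng = random.Random(42)            # seeded only to make the example repeatable
t = rng.randint(1, len(snps) - 1)  # assumed sampling scheme for t
shuffled_block = pseudo_block(snps, 2, 4, t)
```

With t = 3, the block at positions 2 to 4 holds SNP5, SNP6 and SNP7 instead of SNP2, SNP3 and SNP4. Neighbouring SNPs stay adjacent except at the junction between SNPn and SNP1.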
Impact of genetic diversity on htSNP selection efficiency

Two populations, denoted as Pop I and Pop II respectively, were simulated using the HUDSON program. The only difference between Pop I and Pop II was that Pop II had an evolutionary history twice as long as that of Pop I. Therefore, Pop II would possess more historical recombinations and hence a higher genetic diversity than Pop I. A total of 500 pairs (i.e. for Pop I and Pop II respectively) of simulated datasets were generated, and in each pair of simulated datasets, 50 individuals from each population were sampled for SNP tagging. On average, Pop II required 1.51 times more computational steps than Pop I.

Impact of haplotype diversity coverage on statistical power and relative risk ratio estimation

The choice of ϕ had a small-to-moderate impact on statistical power and relative risk ratio estimation (Figure 8).
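The disease model behind these power simulations uses penetrances of 0.1, 0.2 and 0.4 for carriers of 0, 1 and 2 susceptible haplotypes. It can be sketched as follows. This is only an illustration: the function names are hypothetical, and forming each individual by drawing two haplotypes independently from the pool (random mating) is an assumption that the text does not state.

```python
import random

# Penetrances from the Methods, keyed by number of susceptible haplotypes carried.
PENETRANCE = {0: 0.1, 1: 0.2, 2: 0.4}


def risk_copies(genotype, risk_haplotype):
    """Number of copies (0, 1 or 2) of the susceptible haplotype in a genotype."""
    return sum(1 for h in genotype if h == risk_haplotype)


def is_affected(genotype, risk_haplotype, rng, penetrance=PENETRANCE):
    """Draw disease status with probability given by the penetrance."""
    return rng.random() < penetrance[risk_copies(genotype, risk_haplotype)]


def sample_cases_controls(pool, risk_haplotype, n_cases, n_controls, rng):
    """Keep drawing individuals until both the case and control groups are full."""
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        genotype = (rng.choice(pool), rng.choice(pool))  # assumed random mating
        if is_affected(genotype, risk_haplotype, rng):
            if len(cases) < n_cases:
                cases.append(genotype)
        elif len(controls) < n_controls:
            controls.append(genotype)
    return cases, controls


rng = random.Random(0)
pool = ["A", "A", "B", "C"]  # toy haplotype pool; "A" plays the disease haplotype
cases, controls = sample_cases_controls(pool, "A", 20, 20, rng)
```

Under these penetrances, carriers of one and two susceptible haplotypes have relative risks of 2 and 4 compared with non-carriers (0.2/0.1 and 0.4/0.1).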
Discussion

In this article, we propose a new sparse tree-based method, SMET, for finding the optimum htSNP set with ϕ exceeding a user-defined threshold ϕT. The idea of SMET is in essence similar to the sparse binary trees used by Abecasis et al. [2002] to represent patterns of gene flow in pedigrees. We applied SMET to empirical datasets at (1) gene-wide, (2) region-wide, and (3) chromosome-wide levels. Our results demonstrate that when ϕ is set to 100%, the SMET algorithm selects the same htSNPs as the method named “BEST” [Sebastiani et al., 2003], which is a lossless-only method. For example, in the 8th block of the UK Graves’ disease dataset, both SMET and BEST identified the same 25 out of the original 26 SNPs as the lossless htSNP set. However, SMET is more flexible than BEST in that it allows users to impose coverage thresholds.

As shown in Figure 4

To identify the optimum htSNP set for a given ϕT, the full enumeration strategy is a straightforward approach, but it is computationally very intensive (discussed by Dr. David Clayton; please see Electronic Database Information Section [1]). The greedy methods [Carlson et al., 2004; Stram et al., 2003] dramatically lighten the computational burden by reducing the number of SNP sets to be considered, and they can also accommodate various user-specified ϕ levels. Similar to SMET, these approaches organize the search space into tree structures, but they only grow the branch with the largest ϕ at each level of the tree. Therefore, these greedy methods traverse a much smaller subspace than the entire space of the full tree. Although the greedy algorithms can run faster than SMET owing to their heuristic nature (note that the SMET algorithm traverses a sparse version of the full tree), they have two drawbacks: (1) there is no mathematical guarantee that a greedy algorithm will identify the optimal htSNP sets; and (2) when there are multiple optimum htSNP sets, a greedy algorithm may not identify them all.
Recently, principal component analysis (PCA) has been applied in assessing multivariate SNP correlations to identify htSNPs [Horne and Camp 2004; Lin and Altman 2004]. However, the PCA algorithm does not directly take haplotypes into account and cannot guarantee finding the optimal sets. In contrast, SMET traverses all non-singular tree nodes (i.e. SNP sets) with the potential to be optimal htSNP set(s), and mathematically guarantees the global optimum. Moreover, SMET can identify multiple optimal htSNP sets if they exist.

Recently, Ding et al. [2005] developed a computer program named htSNPer1.0 with a graphical user interface for characterizing the haplotype block structure and for selecting htSNPs. Similar to SMET, htSNPer1.0 first estimates haplotypes within each haplotype block, and based on the estimated haplotypes, htSNPs are selected according to three htSNP performance criteria (α-percent coverage [Patil et al., 2001], explained proportion of Clayton's haplotype diversity (please see Electronic Database Information Section [2]), and weighted-average haplotype r2 [Weale et al., 2003]) and four haplotype block definitions (chromosome coverage [Weale et al., 2003], average pairwise LD |D’| [Reich et al., 2001], estimated pairwise LD confidence limits [Gabriel et al., 2002] with minor modifications by Wall and Pritchard [2003], and no historical recombination [Wang et al., 2002]). The htSNPer1.0 software package takes advantage of a novel tree-based heuristic algorithm called the Generalized Branch-and-Bound (GBB) algorithm to search for the minimal htSNP set. However, the GBB algorithm of htSNPer1.0 does not use exactly the same ϕ criterion for htSNP selection as we used in SMET.

The reduction of computational steps by SMET over the full enumeration strategy becomes more substantial as the search tree depth becomes greater.
This is because the number of nodes (and hence the search space) that the exhaustive enumeration method must traverse grows exponentially as tree depth increases linearly. Generally speaking, the tree depth correlates with the haplotype block size (measured by the number of SNPs within the block). Thus, the advantage of the SMET algorithm over the full enumeration method was more evident in htSNP selection for large haplotype blocks, such as the two largest blocks of the UK Graves’ disease dataset. In comparison with the full enumeration method (e.g. the htSNP program; please see Electronic Database Information Section [3]), the factor of computational step reduction by SMET becomes as large as 18 for the largest block in the UK Graves’ disease dataset, Block 8, with 26 SNPs and an average D’ of 0.777.

In our study, D’, a measure of LD between pairs of sites [Lewontin 1964], was chosen for defining haplotype blocks because it is directly related to the goal of detecting historical recombination, which is pivotal to the block concept, and because it can be applied directly to unphased diploid data. Two widely cited earlier studies, Reich et al. [2001] and Gabriel et al. [2002], used D’-based block definitions similar to ours. In particular, our haplotype block definition is based on minor modifications of that of Gabriel et al. [2002], which appears to perform reasonably well by controlling the random noise inherent in D’. In contrast, the r2 measure of LD is typically used for tagging SNP selection rather than for block partitioning [Ahmadi et al., 2005; Goldstein et al., 2003].

As shown in Figure 4

Figure 6

As shown in Figure 7

We used the program HUDSON [Schierup and Hein, 2000a; Schierup and Hein, 2000b], which is based on the coalescent model, to simulate data to assess the impact of genetic diversity (haplotypic diversity) on htSNP selection efficiency.
We found that Pop II, the population with a longer evolutionary history and thus a greater genetic diversity, required 1.51 times more computational steps than Pop I. Thus, haplotypic diversity determines the proportion of non-singular nodes in the dataset: a highly diverse population (e.g. the African sample) will have fewer redundant SNPs, and therefore the number of nodes traversed by the SMET algorithm will approach the number traversed by the full enumeration approach.

How cost-effective, in terms of statistical power in association studies, are htSNP sets of ϕ = 80% or 90% compared with the lossless sets? We found that htSNP sets with a ϕ of 80% or 90% can achieve 87−95% of the statistical power attained by the lossless htSNP sets. More importantly, htSNP sets with ϕ = 80% or 90% incurred relatively insubstantial biases in relative risk ratio estimation (Figure 8).

Practically speaking, whether or not haplotype tests are more powerful than single-marker tests remains a highly debatable topic [Akey et al., 2002; Clark, 2004; Roeder et al., 2005; Schaid, 2004]. Using simulated case-control datasets, Nielsen et al. [2004] demonstrated that either approach has its merits: when moderate to high levels of multilocus LD exist, haplotype tests tend to be more powerful, sometimes by a very large margin. Single-marker tests tend to prevail when pairwise LD is high (note that the power of the single-marker tests relies on the pairwise LD of the observed marker with the functional site). If single-marker tests are to be used, Nielsen et al. [2004] pointed out that a multiple-testing adjustment should be applied, which would reduce the power of single-marker tests. Based on standard chi-square statistics, the simulation studies of Akey et al.
[2001] showed that the power of haplotype tests is influenced by critical population genetic parameters, such as the genetic distances between the observed markers and the causative mutation, the marker allele frequency, and the age of the causal mutation. Given a single founder mutation and the absence of phenocopy, haplotype tests are more powerful and more robust than single-marker tests in case-control studies. An example of the superiority of haplotype tests over single-marker tests is a study of adenine phosphoribosyltransferase (APRT) deficiency [Kuno et al., 2004]. In single-SNP analyses, even at SNP loci close to the mutation site (APRT*J), no significant results were found; however, the use of haplotypes based on the haplotype block data gave sufficient significance. Thus, in that study, haplotype tests based on the haplotype-block structure were more powerful than single-marker analyses for the detection of disease-related loci. Because of the lack of consensus as to which (haplotype vs. single-marker) tests are more powerful in case-control studies, the trade-off between these two types of tests needs to be weighed carefully on a case-by-case basis.

Taken together, we describe a novel algorithm, SMET, which achieves (1) flexibility in haplotype diversity coverage, (2) computational efficiency, and (3) mathematical optimality. We applied SMET to various simulated and empirical datasets, and validated this novel algorithm against an existing method, BEST [Sebastiani et al., 2003]. Furthermore, we investigated several important issues related to htSNP selection and association testing, including (1) the impact of Nd on htSNP selection, (2) the impact of a predefined, genuinely LD-based block structure vs. an arbitrarily-defined, pseudo-block structure on SNP tagging, (3) the impact of genetic diversity, and (4) the relationship between haplotype diversity coverage and statistical power.
These results, as well as the SMET algorithm, will provide helpful guidance to scientists in choosing their htSNPs and in conducting haplotype-based genetic studies.

ACKNOWLEDGMENTS

This work is partially supported by NIH grants R01HG02341, R01HG002518, R01DK066401, and R01DK062290. We thank Dr. Cheng Li at Dana-Farber Cancer Institute for his careful reading of the manuscript and insightful comments. We thank Affymetrix, Inc. for granting us access to the Affymetrix SNP Array data. We are also grateful to Prof. John Todd and Dr. Neil Walker at The Diabetes and Inflammation Laboratory, Juvenile Diabetes Research Foundation/Wellcome Trust, United Kingdom, for giving us access to their UK Graves’ disease dataset.