- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

# Minimal haplotype tagging

^{†}Ross Lazarus,

^{‡}

^{§}Scott T. Weiss,

^{‡}

^{§}

^{¶}Louis M. Kunkel,

^{‡}

^{}

^{††}Isaac S. Kohane,

^{‡}

^{¶}

^{‡‡}and Marco F. Ramoni

^{‡}

^{¶}

^{‡‡}

^{§§}

^{†}Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118;

^{‡}Harvard Medical School, Boston, MA 02115;

^{§}Channing Laboratory, Brigham and Women's Hospital, Boston, MA 02115;

^{¶}Harvard Partners Center for Genetics and Genomics, Boston, MA 02115; and

^{}Division of Genetics,

^{††}Howard Hughes Medical Institute, and

^{‡‡}Informatics Program, Children's Hospital, Boston, MA 02115

^{§§}To whom correspondence should be addressed at: Informatics Program, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115. E-mail: ude.dravrah@inomar_ocram.

**This article has been corrected.**See Proc Natl Acad Sci U S A. 2003 October 14; 100(21): 12526.

## Abstract

The high frequency of single-nucleotide polymorphisms (SNPs) in the human
genome presents an unparalleled opportunity to track down the genetic basis of
common diseases. At the same time, the sheer number of SNPs also makes
unfeasible genomewide disease association studies. The haplotypic nature of
the human genome, however, lends itself to the selection of a parsimonious set
of SNPs, called haplotype tagging SNPs (htSNPs), able to distinguish the
haplotypic variations in a population. Current approaches rely on statistical
analysis of transmission rates to identify htSNPs. In contrast to these
approximate methods, this contribution describes an exact, analytical, and
lossless method, called BEST (Best Enumeration of SNP Tags), able to identify
the *minimum* set of SNPs tagging an arbitrary set of haplotypes from
either pedigree or independent samples. Our results confirm that a small
proportion of SNPs is sufficient to capture the haplotypic variations in a
population and that this proportion decreases exponentially as the haplotype
length increases. We used BEST to tag the haplotypes of 105 genes in an
African-American and a European-American sample. An interesting finding of
this analysis is that the vast majority (95%) of the htSNPs in the
European-American sample is a subset of the htSNPs of the African-American
sample. This result seems to provide further evidence that a severe bottleneck
occurred during the founding of Europe and the conjectured “Out of
Africa” event.

**Keywords:**single-nucleotide polymorphisms, association studies

Single-nucleotide polymorphisms (SNPs) are an invaluable tool to uncover the genetic basis of common diseases (1, 2) by providing a high-resolution map of the genome and allowing researchers to associate variations in a particular genomic region to observable traits. Unfortunately, the sheer number of SNPs in the human genome, which makes SNPs so useful as markers, also makes genomewide association studies unfeasible. However, the number of distinct combinations of SNP alleles (haplotypes) encountered in human samples is a small fraction of the possible haplotypes that would arise if alleles were distributed randomly. This haplotypic structure of the genome lends itself to the selection of a parsimonious set of SNPs, called haplotype tagging SNPs (htSNPs), able to distinguish the haplotypic variations in a population.

Given a set of haplotypes in a genomic region, identified through
statistical (3,
4) or molecular
(5,
6) methods, the process of
haplotype tagging is in principle deterministic. Unfortunately, this problem
is also computationally intractable
(7), because its solution
requires the testing of every possible combination of SNPs in the haplotype
set, and the number of these combinations grows exponentially with the number
of SNPs in the haplotype set. Current approaches rely on approximate methods
to identify htSNPs. Most efforts
(8–12)
have focused on the identification of a secondary haplotype structure across
several large regions of the genome. This substructure comprises regions of
limited recombination, called *haplotype blocks*, bounded by small
regions characterized by higher recombination rates. Within these smaller
regions, htSNPs can be readily identified by eye or by brute-force search. An
alternative approach (13)
searches for htSNPs by maximizing the haplotype diversity
“explained” by a set of SNPs. Using this method, Johnson *et
al*. (13) were able to
identify htSNPs accounting for up to 80% of the genomic variations in the
populations they analyzed. Despite their differences, both block-based and
direct approaches use stochastic methods to identify a reduced set of SNPs
sufficient to characterize a genomic region in a population. A common concern
about these approaches is that the loss of information induced by their
stochastic nature could lead to overlooking rare variations responsible for
less frequent diseases
(14).

In contrast to these approximate approaches, we introduce the first exact,
analytical, lossless solution to the problem of identifying the
*minimum* set of SNPs accounting for the variations in an arbitrary
genomic region. This method, called Best Enumeration of SNP Tags (BEST), does
not follow a suboptimal heuristic or some approximate, stochastic approach but
takes advantage of a peculiar aspect of the genome (the relatively small
number of haplotypes with respect to the number of SNPs) to confine the source
of complexity to a smaller search space. In this way, the reliability of the
identified htSNPs will be only a function of the inferred haplotypes, and the
haplotype tagging process will not induce any further information loss.
Experimental results show that BEST runs to completion in a matter of seconds
even for genomic regions containing >200 SNPs.

The method described in this contribution can take as input haplotypes
inferred from cross-sectional samples via stochastic systems
(3,
4) or from pedigree data.
Therefore, we applied our method to both a set of 105 genes from 47
independent subjects and haplotypes for 9 genes from pedigree data described
in ref. 13. Our results
confirm that a small proportion of SNPs is sufficient to capture the
haplotypic variations in a population and show that this proportion decreases
exponentially as the number of SNPs in the haplotype increases. Comparing BEST
to the method proposed by Johnson *et al*.
(13), we also show that, in
two genes of nine, our method finds smaller sets of htSNPs, suggesting that
BEST improves their original results, even for comparatively small
haplotypes.

## Materials

**Data Collection.** SNP genotype data for 24 self-described African
Americans (12 female) and 23 European Americans (11 female) were obtained for
105 genes: 85 genes from the University of Washington–Fred Hutchinson
Cancer Research Center Variation Discovery Resource Program for Genomic
Applications
(http://pga.mbt.washington.edu)
and 20 genes from the Innate Immunity Programs for Genomic Applications
(http://innateimmunity.net).
All sequencing was performed on the same anonymized DNA samples from the
Coriell Cell Repositories
(http://locus.umdnj.edu/nigms),
using the same Big Dye Terminator sequencing chemistry and equipment (Applied
Biosystems). Both sites used the same software and virtually identical
protocols for base calling, assembly, and SNP determination, as detailed on
each of the respective web sites.

We also tested our method by using published haplotypes, which included 5 genes in a maximum of 418 multiplex families from the Diabetes UK Warren 1 Repository, 3 genes in 598 subjects from Finnish families with at least one sibling diagnosed with type 1 diabetes, and one gene from United Kingdom blood donors (13).

**Haplotype Identification.** Haplotypes comprising all SNPs with minor
allele frequency of ≥10% were inferred for each gene, independently in each
ethnic sample, by using default settings of the phase program
(3) for each of the 105 genes.
To account for the inherent error rate of current genotyping technologies, we
selected only those haplotypes seen more than once (frequency of >4%).

## Methods

**Preliminary Definitions.** We regard a SNP as a variable bearing, at
most, four states, one for each of the possible alleles (A, T, G, and C),
although, in practice, SNPs with more than two states are relatively uncommon.
This variable can also encode insertion (+) and deletion (–)
polymorphisms. An allele is the assignment of a value to a SNP: we will say
that, in a particular individual, the SNP in a particular locus bears the
value. A haplotype is a set of contiguous alleles, as they appear in the
population of interest. A haplotype set is a set of contiguous SNPs, and it is
identified by a set of haplotypes found in the population of interest.
Fig. 1 shows the unique
haplotypes from SNPs with ≥10% rare allele frequency found in gene
*TLR7* from an African-American sample; columns represent SNPs, rows
represent haplotypes, cells represent alleles, and the entire table is the
haplotype set.

*TLR7*in an African-American sample. Each column represents a SNP, and each row represents a haplotype identified in the sample. In this case, there are 14 haplotypes spanning 59 SNPs. Color coding for each SNP is performed

**...**

A haplotype set can be a haplotype block, an entire gene, or an arbitrary
genomic region. A SNP is *derivable* from a set of SNPs if its alleles
are uniquely identified by a combination of alleles of the SNPs in such a set.
A set of SNPs is *sufficient* to derive the SNPs in a haplotype set if
*all* the SNPs in the haplotype set can be derived from the SNPs in
such set. A set of SNPs is *necessary* to derive the SNPs in a
haplotype set if, when one of its members is removed, at least one SNP in the
haplotype set, including itself, is no longer derivable. A *tagging
set* is the set of SNPs in a haplotype set necessary and sufficient to
derive all of the SNPs in the set. Among all of the tagging sets, a
*minimal* tagging set is a set containing the minimum number of SNPs,
and its members will be called htSNPs. For a given haplotype set, there may be
more than one minimal tagging set. Our goal is to find at least one of
them.

**Haplotype Tagging.** Fig.
2 gives a skeletal description of the haplotype tagging algorithm
BEST. The algorithm takes as input a set *S* of haplotypes, each
representing a unique set of values of each SNP in the haplotype set, such as
the haplotype set displayed in Fig.
1. The algorithm returns a minimal set of SNPs from which all of
the other SNPs in the haplotype set can be derived. A preliminary step of the
algorithm is to convert the haplotype set into binary form. The colors in
Fig. 1 encode the binary
conversion of a haplotype set. Although we consider here only the case of
biallelic SNPs, this encoding can be easily generalized to triallelic SNPs and
is currently implemented as such. When two SNPs share the same binary
representation, they are termed *binary equivalent*. Any tagging set
including a SNP will be equivalent to a tagging set where it is replaced by
one of its binary equivalent SNPs. After the binary conversion, the algorithm
will keep only one member of each group of binary equivalent SNPs.

**...**

A fundamental property of this binary representation is that if a SNP is derivable from a set of other SNPs, then it will be derivable from any superset of such a set. This property spares the exponential effort of identifying the set of SNPs deriving a SNP. The second critical property exploited by BEST is that the tagging set identified by adding, at each step, the SNP that derives the maximum number of SNPs leads to SNP sets containing the minimal tagging set.

Let *S* = {*s*_{1},...,
*s*_{m}} denote the *m* SNPs in a haplotype
set. BEST recursively partitions the set *S* in two groups *H*
and *D* such that *S* = *H* – *D*, where
*H* is the minimal tagging set (the smallest set of SNPs necessary and
sufficient to derive all of the SNPs in the haplotype set), *D* is the
set of (*m* – *k*) SNPs that are derivable from
*H*, and a SNP *d*_{j} is derivable from
*H* if the value of *d*_{j} in each haplotype
can be expressed as a Boolean function *f*(.) of elements of
*H*. A property of these Boolean functions is that if a SNP
*d*_{j} is derivable from a subset *H*′
of *H*, then *d*_{j} is also derivable from any
subset of *H* containing *H*′. This property follows from
the fact that if a SNP is derivable from set of SNPs *H*′, then
its alleles are uniquely identified by the allele combinations of the SNPs in
*H*. Therefore, *f*(*h*_{1},...,
*h*_{k}) = *f*(*h*_{1},...,
*h*_{k},
*h*_{k}_{+}_{1},
*h*_{k}_{+}_{2},...) for all
*h*_{1},..., *h*_{k} in
*H*′. In this way, one can check whether a SNP
*s*_{j} is derivable from a subset of SNPs in
*S* without necessarily knowing the specific subset, therefore avoiding
the exponential cost of the search. In the skeletal description of the
algorithm in Fig. 2, this
operation is performed by the function DERIVABLE(*x, Y*), which returns
true if the combinations of the alleles of a subset of the SNPs in the set
*Y* uniquely identify the alleles of the SNP *x*. The function
DERIVED(*Y*) returns the set of SNPs derivable from the set of SNPs
*Y*. We can show that the cost of the function DERIVED(*Y*) is
polynomial by noting that if a SNP *x* is derivable from a set of SNPs
*Y*, it is also derivable by any superset of *Y*. Hence, to
check the derivability of a SNP *x* from a set *Y*, we just
check whether the alleles of at least one member of *Y* match the
alleles of *x*. The function DERIVED(*Y*) is a simple iteration
of the function DERIVABLE(*x,Y*) across the SNP set *Y* and can
be executed in polynomial time.

The first step of the algorithm is the generation of the set
*H*_{1} of htSNPs that are not derivable from any subset of
*S*. This set is identified by examining the Boolean dependency of each
*s*_{j} on the set of SNPs *S* \
*s*_{j}, and each *s*_{j} that
is not derivable is assigned to *H*_{1}. The elements of
*H*_{1} determine a partition of the remaining SNPs into
. The set
*D*_{1} contains the SNPs derivable from
*H*_{1}, whereas *C*_{1} is the set of SNPs that
are not derivable from *H*_{1} and are therefore candidate
htSNPs. Next, an augmentation procedure is applied to move one or more
elements from *C*_{1} to *H*_{1} and from
*C*_{1} to *D*_{1}. First, the elements of
*C*_{1} are sorted according to this criterion:
if the set
*D*_{2}_{i} of SNPs derivable from
has cardinality greater than the
cardinality of the set *D*_{2}_{j} of SNPs
derivable from . If the criterion
identifies only one *c*_{1} such that
for all other
*c*_{j} in *C*_{1}, then
, *D*_{2} =
*D*_{21}, and
, and the procedure is
repeated on the set *C*_{2} until the set of candidate SNPs is
empty. When more than one set *D*_{2}_{i} of
SNPs derivable from has the same size
of *D*_{21}, then parallel partitions
,
*D*_{2}_{i} and
*D*_{2}_{i}) are generated, and the
augmentation procedure is repeated on each of them. When none of the SNPs in
*C*_{1} augments the set *D*_{1}, then pairs of
SNPs are treated as one single variable by the augmentation procedure. When
the set of candidate SNPs is empty, if the augmentation procedure returns one
or more necessary and sufficient sets *H* of htSNPs, then the algorithm
stops. If no such set is found, the whole procedure is repeated on each of the
sets *H*.

**Proof of Optimality.** We prove by induction on *k*, the
minimum number of htSNPs, that the smallest set of necessary and sufficient
htSNPs returned by BEST is the *minimum* set of htSNPs tagging the
haplotype set at hand. Suppose first that *k* = 1, so that the minimum
set of htSNPs consists only of one SNP. Because *k* = 1, then all
elements of *S* are binary equivalent and BEST will return one of the
*m* minimal tagging sets, each set given by
{*s*_{i}}, for *i* = 1,..., *m*. Any of
this set will be the minimum solution. Next, suppose the result is true for
any minimum set of size *k* – 1 (that is, if the minimum set of
htSNPs has size *j* ≤ *k* – 1, then we assume that BEST
returns one of the minimum solutions of size *j*) and we show that the
result is true when the minimum set consists of *k* htSNPs. More
precisely, we assume that *H* = {*h*_{1},...,
*h*_{n}} is the set returned by BEST ordered according
to the augmentation procedure, and we show that *n* = *k*.
Suppose that we can decompose the set of SNPs *S* into
, where
*S*_{k}_{–}_{1} is the subset of
*S* that is decomposed into
,
*D*_{k}_{–}_{1},
*H*_{k}_{–}_{1} consists of the
first *k* – 1 htSNPs in *H,
D*_{k}_{–}_{1} is the set of SNPs
that are derivable from
*H*_{k}_{–}_{1}, and
*S*_{k} is the set of SNPs in *S* that are not
derivable from *H*_{k}_{–}_{1}.
By induction, *H*_{k}_{–}_{1} is
the minimum set of htSNPs for
*S*_{k}_{–}_{1}, and it is
equivalent to any minimum set of htSNPs for
*S*_{k}_{–}_{1}. Furthermore,
because we are assuming there is a minimum solution of size *k*, there
exists at least one htSNP in
*S*_{k} that, once added to
*H*_{k}_{–}_{1}, makes of
the minimum set of
htSNPs. Because the set of SNPs derivable from
is the whole set
, then the cardinality
of the set of SNPs derivable from
is the largest, and
therefore is the set of
htSNPs found by BEST and *n* = *k*. If a decomposition
does not exist, then we can
decompose , where
and
*H*_{k}_{–}_{i}
consists of the first (*k* – *i*) SNPs, and repeat the
same argument.

## Results

BEST was first used to tag the 105 genes, ranging from 5 to 229 SNPs in
length, described in *Materials*. The results of this analysis are
summarized in Table 1. These
results confirm that a small proportion of SNPs (14% for African Americans and
10% for European Americans) is sufficient to capture the variations in a
haplotype set. It is interesting to note that the proportion of SNPs required
to tag a gene decreases exponentially as the number of constituent SNPs
increases. Fig. 3 shows the
sharp exponential decay of the number of htSNPs as the size of the haplotype
increases in both populations. This decay is due to the fact that, as the
total number of SNPs increases, the observed haplotypes are likely to be a
smaller ratio of the entire sample space, which grows exponentially with the
number of SNPs. The algorithm takes advantage of the limited haplotype
diversity in the genome to achieve an exponential saving in genotyping as the
haplotype length increases.

*Left*) and European-American (

*Right*) samples. The steeper decay in the European-American sample is due to an increased number of binary

**...**

An interesting finding is that the majority of the htSNPs in the European-American sample also appear as htSNPs in the African-American sample. Because a minimal tagging set is not necessarily unique, these proportions can only be taken as a lower bound of the shared htSNPs. Still, an average of 95% (and in most cases 100%) of the htSNPs in the European-American sample are a subset of the htSNPs of the African-American sample. For 98 genes (94%), all of the htSNPs found in the European-American sample were also found in the African-American sample. This finding is strikingly identical across the vast majority of the 105 genes here considered and suggests that the lower variability of the European-American population is indeed the result of a depletion of an original gene pool, consistent with a severe bottleneck occurring during the founding of Europe and the proposed “Out of Africa” event (11, 15, 16).

An important result is that BEST successfully identifies the minimum set of
htSNPs even in haplotype sets that would be unfeasible to tackle by exhaustive
enumeration or prone to error by eye. We also analyzed nine genes described by
Johnson *et al*. (13).
Fig. 4 *Left* shows the
results obtained by BEST and compares them with the results published in the
original report. For two genes of nine, CASP10 and SDF1, BEST found multiple
alternative htSNP sets smaller than those reported in the original report. For
SDF1, the largest of the genes described in the original report spanning 22
SNPs, Johnson *et al*. determine that five htSNPs are required to tag
haplotypes with frequency of >5%. In contrast, BEST reveals 10 equivalent
sets of four htSNPs for their data. Even for the smaller gene, CASP10, with
just 11 SNPs, BEST was able to identify four alternative sets of three htSNPs,
against the single set of four htSNPs in the original report. These results
suggest that, although their method is able to find sets of tagging SNPs,
these are not necessarily optimal. In all genes, BEST identified equivalent
SNPs. In our experience, knowing alternative equivalent htSNP sets is often
valuable in practice when individual htSNPs prove difficult to genotype
because of flanking repeat regions.

## Discussion

Haplotype-based studies are today considered one of the most promising approaches to discover the genetic basis of common diseases. One consequence of the haplotypic nature of the human genome is that only a subset of the SNPs in a haplotype will be sufficient to unambiguously distinguish the haplotypes. This feature of the genome promises to significantly reduce the number of SNPs required to completely genotype a sample and, in so doing, render feasible genomewide association studies. The identification of haplotype blocks created by the evolutionary history of the genome is an important step toward the identification of redundant SNPs, but the fulfillment of the promise of haplotype-based studies rests on the possibility of identifying which SNPs are actually able to tag a haplotype set with no information loss. This contribution described a feasible, exact, and lossless method able to identify such htSNPs and analytically tag an arbitrary stretch of the genome.

Current approaches focus on the identification of htSNPs based on linkage disequilibrium and on stochastic measures of haplotype diversity. Although these efforts provide useful insight into the natural history of the genome, we have shown that analytical haplotype tagging of arbitrary genomic regions is more efficient at identifying parsimonious sets of htSNPs. We believe that the ability of our method to identify the minimum tagging set for an arbitrary region of the genome can be instrumental in delivering on the promise of haplotype-based studies. Furthermore, the ability of our method to identify alternative minimal sets of htSNPs, when available, can be valuable in practice when htSNPs prove difficult to genotype. Coupling BEST with a map of human haplotypes would provide investigators with a powerful tool to design association studies.

A computer program implementing the method described here is available at http://genomethods.org/best.

## Acknowledgments

We thank Emanuela Gussoni (Harvard Medical School), Stefano Monti (Massachusetts Institute of Technology/Whitehead Institute, Cambridge, MA), Alberto Riva (Harvard Medical School), and the referees for their insightful comments. This research was supported, in part, by National Science Foundation Grant ECS-0120309 (to M.F.R. and P.S.) and National Institutes of Health Grants HL-66795 (to S.T.W. and I.S.K.) and P01 NS40828 (to L.M.K. and I.S.K.). L.M.K. is an investigator of the Howard Hughes Medical Institute.

## Notes

Abbreviations: BEST, Best Enumeration of SNP Tags; SNP, single-nucleotide polymorphism; htSNP, haplotype tagging SNP.

## References

**,**536–539. [PubMed]

**,**1580–1581. [PubMed]

**,**978–989. [PMC free article] [PubMed]

**,**157–169. [PMC free article] [PubMed]

**,**760–763. [PubMed]

**,**435–438. [PubMed]

*et al.*(2001) Science 294

**,**1719–1723. [PubMed]

*et al.*(2001) Nat. Genet. 29

**,**223–228. [PubMed]

**,**229–232. [PubMed]

**,**199–204. [PubMed]

*et al.*(2002) Science 296

**,**2225–2229. [PubMed]

*et al.*(2001) Nat. Genet. 29

**,**233–237. [PubMed]

**,**573.

**,**8119–8123. [PMC free article] [PubMed]

**,**708–713. [PubMed]

**National Academy of Sciences**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (553K)

- Selection of minimum subsets of single nucleotide polymorphisms to capture haplotype block diversity.[Pac Symp Biocomput. 2003]
*Avi-Itzhak HI, Su X, De La Vega FM.**Pac Symp Biocomput. 2003; :466-77.* - Accuracy of haplotype reconstruction from haplotype-tagging single-nucleotide polymorphisms.[Am J Hum Genet. 2005]
*Forton J, Kwiatkowski D, Rockett K, Luoni G, Kimber M, Hull J.**Am J Hum Genet. 2005 Mar; 76(3):438-48. Epub 2005 Jan 19.* - Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies.[Genome Res. 2004]
*Halldórsson BV, Bafna V, Lippert R, Schwartz R, De La Vega FM, Clark AG, Istrail S.**Genome Res. 2004 Aug; 14(8):1633-40.* - Efficient selection of tagging single-nucleotide polymorphisms in multiple populations.[Hum Genet. 2006]
*Howie BN, Carlson CS, Rieder MJ, Nickerson DA.**Hum Genet. 2006 Aug; 120(1):58-68. Epub 2006 May 6.* - How many SNPs does a genome-wide haplotype map require?[Pharmacogenomics. 2002]
*Judson R, Salisbury B, Schneider J, Windemuth A, Stephens JC.**Pharmacogenomics. 2002 May; 3(3):379-91.*

- Trans-Ancestral Studies Fine Map the SLE-Susceptibility Locus TNFSF4[PLoS Genetics. 2013]
*Manku H, Langefeld CD, Guerra SG, Malik TH, Alarcon-Riquelme M, Anaya JM, Bae SC, Boackle SA, Brown EE, Criswell LA, Freedman BI, Gaffney PM, Gregersen PA, Guthridge JM, Han SH, Harley JB, Jacob CO, James JA, Kamen DL, Kaufman KM, Kelly JA, Martin J, Merrill JT, Moser KL, Niewold TB, Park SY, Pons-Estel BA, Sawalha AH, Scofield RH, Shen N, Stevens AM, Sun C, Gilkeson GS, Edberg JC, Kimberly RP, Nath SK, Tsao BP, Vyse TJ.**PLoS Genetics. 2013 Jul; 9(7)e1003554* - Assessing the quality of case-control association studies on the genetic basis of sepsis[Pediatric critical care medicine : a journa...]
*Vitali SH, Randolph AG.**Pediatric critical care medicine : a journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies. 2005 May; 6(3 0)S74-S77* - Identification of KIF3A as a Novel Candidate Gene for Childhood Asthma Using RNA Expression and Population Allelic Frequencies Differences[PLoS ONE. ]
*Butsch Kovacic M, Biagini Myers JM, Wang N, Martin LJ, Lindsey M, Ericksen MB, He H, Patterson TL, Baye TM, Torgerson D, Roth LA, Gupta J, Sivaprasad U, Gibson AM, Tsoras AM, Hu D, Eng C, Chapela R, Rodríguez-Santana JR, Rodríguez-Cintrón W, Avila PC, Beckman K, Seibold MA, Gignoux C, Musaad SM, Chen W, Burchard EG, Khurana Hershey GK.**PLoS ONE. 6(8)e23714* - Efficient Genome-Wide TagSNP Selection Across Populations via the Linkage Disequilibrium Criterion[Journal of Computational Biology. 2010]
*Liu L, Wu Y, Lonardi S, Jiang T.**Journal of Computational Biology. 2010 Jan; 17(1)21-37* - FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium[BMC Bioinformatics. ]
*Liu G, Wang Y, Wong L.**BMC Bioinformatics. 1166*

- Minimal haplotype taggingMinimal haplotype taggingProceedings of the National Academy of Sciences of the United States of America. Aug 19, 2003; 100(17)9900PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...