![]() | ![]() |
Formats:
|
||||||||||||||||||
Copyright © 2009 Paul and Piontkivska; licensee BioMed Central Ltd. Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining 1Department of Biological Sciences, Kent State University, Kent, Ohio 44242, USA Corresponding author.Sinu Paul: spaul1/at/kent.edu; Helen Piontkivska: opiontki/at/kent.edu Received April 14, 2009; Accepted July 6, 2009. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background Studies have shown that in the genome of human immunodeficiency virus (HIV-1) regions responsible for interactions with the host's immune system, namely, cytotoxic T-lymphocyte (CTL) epitopes tend to cluster together in relatively conserved regions. On the other hand, "epitope-less" regions or regions with relatively low density of epitopes tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes would occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used. Results Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 reference HIV sequences (including 13 subtypes and sub-subtypes with approximately 4 representative sequences for each subtype or sub-subtype, and 18 circulating recombinant forms). The results showed that despite inclusion of recombinant sequences that would be expected to break-up associations of epitopes in different genes when two different genomes are recombined, there exist particular combinations of epitopes (epitope associations) that occur repeatedly across the world-wide population of HIV-1. For example, Pol epitope LFLDGIDKA is found to be significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms. Conclusion We have identified CTL epitope combinations co-occurring in HIV-1 genomes including different subtypes and recombinant forms. Such co-occurrence has important implications for design of complex vaccines (multi-epitope vaccines) and/or drugs that would target multiple HIV-1 regions at once and, thus, may be expected to overcome challenges associated with viral escape. Background In the course of viral infection, recognition of viral peptides by class I major histocompatibility complex (MHC) molecules and subsequent interactions of the peptide/MCH complex with the cytotoxic T lymphocytes (CTLs, or CD8+ T cells) plays an important role in the control of the infection [1,2]. Viral CTL epitopes (which are short viral peptides recognized by the immune system components, CTL and MHC class I molecules) are an integral – and critical – part of this recognition process, and amino acid changes at CTL epitopes have been shown to play a role in viral "escape" (in other words, evading recognition by the immune system) in human (HIV) and simian (SIV) immunodeficiency viruses [3-8]. In particular, in HIV certain CTL epitopes are subjected to consistent selective pressure from the host's immune system, leading to rapid accumulation of amino acid changes, while other CTL epitopes evolve under purifying selection pressure [9,10]. Furthermore, rapidly accumulating genetic diversity in the global HIV-1 pandemic [11] underlies a great need to develop vaccines that are protective against multiple subtypes and strains simultaneously. The epitope-vaccine approach has been suggested as a strategy to circumvent the rapid rate of mutations in HIV-1 and the subsequent viral escape from the host's immune system as well as the development of resistance to anti-viral drugs [12-14]. The inclusion of CTL epitope sequences in vaccines has several advantages, including a possibility of targeting a majority of viral variants if highly conserved epitopes are used. Likewise, when epitopes from different genes or genomic regions are included in the same vaccine, such multi-epitope vaccines can induce broader cellular immune responses [15,16]. Several strategies can be used to develop multi-epitope vaccines, including (a) the generation of tetramer epitope vaccines with epitopes being chosen based on the presence of principal neutralizing determinant [12], (b) the generation of synthetic peptides with prediction of the candidate epitopes based on the peptide binding affinity of anchor residues in silico, focusing on those capable of binding to multiple HLA alleles [17], (c) the juxtaposition of multiple HLA-DR-restricted HTL epitopes [18] with epitope identification by screening of HIV-1 antigens for peptides that contain the HLA-DR-supertype binding motif [19]. However, inherent limitation of in-silico epitope predictions is generating a rather large number of initially predicted epitopes, many of which are false positives; and hence, there exists a need for subsequent experimental validation of many potential candidates [20-24]. Furthermore, because of the enormous genetic diversity of HIV, some predicted epitope candidates may be specific to only certain subtypes [21,25,26], whereas relying primarily on the extent of amino acid sequence conservation does not determine the potential immunogenicity [21]. Other methods, such as artificial neural networks [27] and hidden Markov models [28], also have limitations, such as adjustable values whose optimal values are hard to find initially, over fitting, overtraining and interpreting [29]. For example, in a study by Anderson et al. (2000) on experimental binding of 84 peptides to class I MHC molecules [30], there was no correlation between predicted versus experimental binding, and a high possibility of false-negatives. Thus, in this study we develop a novel strategy to identify best epitope candidates for multi-epitope vaccines from the pool of experimentally well-supported epitopes based on the association-rule mining technique. Briefly, an association rule mining technique, which is a method that can detect association between items (frequent item sets) and formulate conditional implication rules among them [31-33], is used to examine relationships between 218 "best-defined" CTL epitopes (from the list of Frahm, Linde & Brander, 2007 [26]). Our results show that some CTL epitopes are significantly associated with each other so that they co-occur together in the majority of the reference viral genomes including circulating recombinant forms. At least 23 association rules were identified that involve CTL epitopes from 3 different genes, Gag, Pol and Nef, respectively. We also identified several combinations of 3 to 5 CTL epitopes that are frequently found together in the same viral genome despite high mutation and recombination rates found in HIV-1 genomes, and thus, can be used as likely candidates for multi-epitope vaccine development. Materials and methods HIV-1 genomic sequence data and alignment Genomic nucleotide sequences of 9 protein-coding genes of HIV-1 were collected for 62 HIV-1 reference genomes from the 2005 subtype reference set of the HIV sequence database by Los Alamos National Laboratory (LANL) [34,35] (Table 1). These included 44 non-recombinant sequences from the groups M, N and O, and 18 circulating recombinant forms (CRFs). The M group was comprised of representatives of sub-subtypes A1, A2, F1 and F2, and subtypes B, C, D, G, H, J, K, respectively, of approximately 4 representative sequences from each category. This set of sequences was chosen since they allowed the diversity of each subtype to be roughly the same as for all available sequences in the database, similar to an effective population size. Moreover, they had full length genomes that covered all genes and major geographical regions (for criteria of selection of reference sequences, refer to [35]). Inclusion of CRFs allowed us to identify those highly conserved CTL epitopes that are shared between non-recombinant genomes and are also present in the majority of the recombinant reference genomes. Viral sequences were aligned at the nucleotide level as per amino acid alignment reconstructed with ClustalW, and were manually checked afterwards [36].
The summary of the average numbers of breakpoints in the CRF genomes was based on the breakpoint maps summarized at the HIV database at Los Alamos [37]. CTL epitopes The set of 218 CTL epitopes, described as "the best-defined HIV CTL epitopes" by Frahm, Linde & Brander (2007) [26] that included only those epitopes supported by strong experimental evidence in humans, was used. These epitopes, together with their respective genomic coordinates according to the reference HXB2 sequence (GenBank accession number K03455) [38], are described in Additional file 1. Selecting epitopes for association rule mining Of the 218 "best-defined" CTL epitopes, the subset of the most evolutionary conserved epitopes that are present across a majority of surveyed reference sequences was included according to the following criteria: (a) The epitope was present in at least one out of 62 reference sequences. (b) If two or more epitopes were completely overlapping with each other and there were no amino acid sequence differences, the longer epitope was selected. However, if overlapping epitopes harbored one or more amino acid difference from each other, all such epitopes were included (Figure (Figure1).1
To determine whether the same associations exist among non-recombinant and circulating recombinant forms (CRFs), three data sets were created. The first sequence set (designated later as "62-all") included all 62 HIV-1 reference sequences used in the study, the second set included only 44 non-recombinant sequences ("44-non-CRFs") and the third set included 18 CRFs (designated as "18-CRFs"). Because of the requirement that an epitope be present as a "perfect match" in at least one sequence as described above, 1 and 29 epitopes were removed from the epitope lists for the second and third data sets, respectively. This resulted in lists of 188 and 160 epitopes, respectively (Additional file 1). Additionally, one hundred "pseudo-datasets" of 62 sequences each (62 × 100) was created by randomly selecting sequences from the original sequence set (random sampling with replacement). Similarly to the bootstrap test widely used in phylogenetics [39], these pseudo-sets were used as controls to determine the significance of detected associations using the same threshold as the 62-all data set (i.e., 75% support and 95% confidence), in other words, whether identified associations in our original 62 sequence set could be attributed to the overrepresentation of certain sequence types by chance. The number of epitopes analyzed in each data set is given in Additional file 2. It should be noted that essentially the same association rules were identified in the pseudo-datasets as they were in the 62-all data set, which is consistent with the expectations that high values of support and confidence constraints used here already prune away most of the insignificant rules [32]. Association rule mining Association rule mining is a data mining technique that discovers relationships (associations, or rules) that exist within a data set [31-33,40]. One of the commonly known applications of association rule mining is "market basket" analysis [40-42]. However, in addition to marketing analysis, association rule mining has many useful applications to answer biological problems, including the discovery of relationships between genotypes and phenotypes in bacterial genomes [43], predicting drug resistance in HIV [44], and predicting MHC-peptide binding [45]. In this study, association rule mining was used to discover novel relationships between CTL epitopes that consistently co-occur together in viral genomes despite high mutation and recombination rates, so that such epitopes can be used as promising candidates in the design of multi-epitope vaccines. Association rule mining was conducted using the Apriori algorithm [41] implemented in the program WEKA [40,46,47]. The initial minimum support was set at 0.75 and confidence at 0.95. The maximum number of rules identified was set at 5,000 for the 62-All and 44-non-CRFs data sets and at 50,000 for the 18-CRFs data set to ensure that all association rules above the support and confidence thresholds are captured. The support level was set rather high to include only associations among epitopes that were present in at least 75% of the reference sequences used. The confidence was set to 0.95 to generate only very strong associations, and all generated association rules were exhaustively enumerated and examined. Once identified, association rules were examined to identify "unique" rules, i.e., rules that combine associations between the same epitopes into a single, "unique" rule regardless of the order of epitopes within a rule (i.e., A occurs with B and B occurs with A are considered the same "unique" rule) (Table 2 and Additional file 3).
Estimates of the nucleotide substitution rates The relative degree of sequence divergence among reference sequences and different genomic regions was evaluated by comparing the number of synonymous and nonsynonymous substitutions. In particular, the number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) were estimated by the Nei-Gojobori method [48] as implemented in the MEGA4 program [49]. This simple method was used because it is expected to have lower variance than more complicated substitution models [39]. The standard errors were estimated with 100 bootstrap replications. Pairwise dN and dS values were estimated for the so-called "associated" epitope regions (defined as epitopes that were found to be involved in any association rule), non-associated epitope regions (epitopes that were not involved in any association rule) and non-epitope regions (i.e., regions that did not harbor any CTL epitopes used in study), respectively. Results and discussion Mining for association rules In order to identify CTL epitope regions that consistently co-occur together in the HIV-1 genomes, 189 CTL epitopes were mapped in 62 HIV-1 reference sequences (Table 1), where "perfect match" was recorded as "epitope presence", while one or more amino acid differences between the canonical CTL epitope sequence and respective HIV sequences were considered as "epitope absence", and association rule mining was applied to determine whether certain CTL epitopes consistently co-occurred together. Using the data mining tool WEKA [46,47], the initial minimum support and confidence values were set to 0.75 and 0.95, respectively, to ensure that we identified only the most frequently co-occurring epitopes. In other words, a minimum support value of 75% ensures that only epitopes that are present as a "perfect match" in at least 75% of the sequences are included in association rules (e.g., epitope A is present in at least 46 sequences out of 62). The support for the 18-CRFs data set was later raised to 0.95 (i.e., even more conservative) to limit the overall number of associations because this data set generated a lot more association rules with 75% support compared to the other data sets, as it had 31 CTL epitopes with at least 75% support whereas those for the 62-All and 44-non-CRFs data sets were 25 and 26, respectively. On the other hand, a level of confidence set to 95% indicates that the identified association rule (e.g., epitope A being associated with epitope B) will be present in at least 95% of the sequences where epitope A occurs. In the case of 62 reference sequences, that means at least 44 reference sequences had both epitopes present. The results of the association rule mining are summarized in the Table 2. Initially, 1961 association rules were detected in the 62 sequences data set (1095 and 1867 for the 44-non-CRFs and 18-CRFs, respectively), of them 484, 344 and 210 were association rules involving unique combinations of epitopes (i.e., rules that A occurs with B and B occurs with A were considered the same "unique" rule), respectively. The majority of associations included 3 or 4 epitopes at a time; for example, the 62-all data set had 217 and 153 association rules involving 3 and 4 epitopes, respectively. However, a substantial amount of association rules was found to involve larger numbers of epitopes, 5 or 6 (Table 2). Among the unique epitope associations, a majority of them involved CTL epitopes harbored by the Gag and Pol genes for the 62-all sequence set (364 and 472 association rules included epitopes from the Gag and Pol genes, respectively), but only 52 association rules included an epitope from the Nef gene. Since Gag and Pol are located in adjacent genomic positions (and are somewhat overlapping), the physical proximity of the genes in the genome may be responsible for the existence of some association rules that involve epitopes from both of these genes (i.e., where recombination did not break up the association between epitopes). However, given the extremely high recombination rate in HIV-1, which was estimated to be as high as 2.8 crossovers per genome per replication cycle [50], epitopes from genes that are located far apart (such as Pol and Nef) would not be expected to be involved in many association rules, particularly those that occur with high support and confidence. Notably, our results identified at least 23 associations that involved epitopes located in 3 different genes, namely, Gag, Pol and Nef (shown in Figure Figure2).2
Overall, our results identified 358 association rules that involved epitopes from two different genes (mostly Gag and Pol) and 23 association rules that involve epitopes from three different genes (Gag, Pol and Nef). The Venn diagram shown on Figure Figure33
Interestingly, one of the frequently associated epitopes found in three genes associations (Figure (Figure2),2 As Figure Figure22 Selection at CTL epitopes involved in association rules To assess the extent of evolutionary sequence conservation of the CTL epitopes that participated in the association rules, we compared the levels of nonsynonymous (amino acid altering) and synonymous substitutions in all pairwise comparisons of 62 HIV-1 genomes. The results are shown in Table 3, which lists average pairwise dN and dS values estimated for the epitope and non-epitope regions from the 62 HIV-1 reference genomic sequences. Here, the epitope regions are divided into two groups: (a) those epitopes that are involved in association rules and (b) those not involved. In all pairwise comparisons, the overall substitution trend is that the number of synonymous substitutions significantly exceeds that of nonsynonymous substitutions (i.e., dS >> dN, paired t test, p < 0.01). This indicates that purifying selection indeed plays a major role in the evolution of both the CTL epitopes and non-epitope regions, which is consistent with our previous results [9,10]. However, when the relative magnitude of nonsynonymous and synonymous changes was considered, epitopes that participated in association rules were found to have significantly lower dN values than either the other CTL epitopes or the non-epitope regions (ANOVA, p = 0.015), indicating that they are much more conserved at the amino acid but not at the nucleotide level. On the other hand, no significant differences were detected between dS values compared between these categories (p > 0.2). Similar results were obtained using nonparametric statistics (Kruskal-Wallis test, p = 0.002). These results indicate that purifying selection acting to preserve amino acid sequences of CTL epitopes is operating more strongly on the CTL epitopes that are found to be involved in association rules than on the non-associated epitopes, perhaps, due to stronger functional and structural constraints in these regions. However, further studies are necessary to determine the nature of these constraints.
Significance of CTL epitope "participation" in the association rules By design, our study was focused on the identification of highly supported association rules (support >= 75%), i.e., those that involve epitopes present in at least 75% of the sequences analyzed. Notably, not all CTL epitopes that are present in over 75% of sequences can be found in association rules (e.g., Gag CRAPRKKGC and Pol LVGPTPVNI while occurring in over 75% of the analyzed sequences, were not part of any association rule). As Table 4 shows, CTL epitopes from about 15 non-overlapping genomic regions participated in association rules; however, some genomic regions contributed more than one epitope (generally, these are overlapping epitopes). We also used a conservative level of confidence of 95% or higher, which can be interpreted as follows: if epitopes A and B are present together and are associated with epitope C with confidence of 0.95, we can conclude that whenever there are epitopes A and B in the same genome, epitope C will appear in the same genome with 95% probability or higher.
Overall, we were able to identify several highly conserved epitopes that are relatively widely spread across the worldwide HIV-1 population, and present not only in non-recombinant subtypes, but also in the circulating recombinant forms. Such highly conserved epitopes may be considered promising candidates for multi-epitope vaccine design, as they are likely to be targeted in a majority of HIV lineages, thereby increasing population coverage. However, in addition to being highly conserved, there are additional benefits in utilizing CTL epitopes identified as participants in association rules (such as those depicted on Figure Figure2).2 Because the HIV genomes and definitions of the CTL epitopes were drawn from the reference sequences and the list of "best-defined" epitopes of the HIV Sequence and HIV Immunology databases, respectively, neither patient's HLA haplotype, stage of infection nor CTL responses are known. However, some of the associated epitopes have been shown to be immunogenic in acute HIV-1 infection, particularly those participating in associations involving epitopes from three different genes, while some others have been shown to be strongly immunogenic in drug-naive patients (Additional file 4). Furthermore, while some CTL epitopes may certainly be prone to escape mutations when exposed to the immune pressure elicited by the restricting HLA allele, the associated epitopes identified in this study are recognized by different HLA alleles, with some combinations representing three different alleles from the same HLA locus. For example, epitope association of Gag SEGATPQDL, Pol LFLDGIDKA and Nef FLKEKGGL is recognized by the HLA-B*4001, B*81 and B*0801 alleles, respectively, and thus, it is unlikely to be recognized by all three alleles within the same patient. On the other hand, a recent study has shown that there is a promiscuity of some CTL epitopes where epitope presentation and CTL recognition can occur in the context of alternative, not restricting, HLA class I alleles, often from different HLA supertypes [59]. As shown in Table 4, five of the 22 associated epitopes have been designated as promiscuous [per [59]], with at least one promiscuous epitope identified in each gene (Gag, Pol and Nef). Therefore, inclusion of these epitopes may potentially enhance the efficiency of a multi-epitope vaccine across a broader range of host HLA haplotypes (although "functionally homozygous" individuals who express both original and alternative HLA alleles may be at disadvantage [55,60]). Further studies are needed to address the mechanisms of immune control of HIV infection through combinations of HLA alleles and CTL epitopes, particularly, promiscuous epitopes. While our results demonstrated presence of several highly conserved – and identified to exist in association with each other – CTL epitopes in multiple HIV-1 reference genomes, including CRFs, the underlying functional significance of these regions for the virus remains poorly understood. Very few of the epitope regions found in association rules had such molecular features as glycosylstion, myristoylation, amidation, or phosphorylation sites. They also lacked any cell attachment motif or Leucine Zipper motif [61,62]. Yet, the highly conserved nature of these CTL epitopes hints at major functional significance of these regions. One possibility is that the strong sequence conservation is driven by functional constraints related to potential RNA secondary and tertiary structures formed by genomic regions of these epitopes, individually or in combination with each other. In such case it may be expected that the overall extent of sequence divergence will be lower at these epitopes than elsewhere in the genome, and indeed, both dN and dS values were found to be lower at the associated epitopes than at the other epitopes or non-epitope regions (Table 3). It is also noteworthy that some epitopes are not involved in any association despite being present in more than 75% of the reference sequences, hinting at some underlying mechanism that holds the "associated epitopes" together. It is possible that the associated epitopes from different genes co-evolve together because of functional and structural constraints due to protein-protein interactions that are necessary for many viral processes [63]. Since some of the HIV proteins are expressed as polyproteins (such as Gag-Pol) [64], regulation of polypeptide processing in the cell is an important part of the viral life cycle and is often mediated by interactions between domains that belong to different processed proteins. For example, within Gag-Pol several regions that are located close to the N and C termini of protease (PR) have been shown to influence PR activation [65]. Likewise, modulating reverse transcriptase (RT) activation has been shown to have an effect on Gag-Pol interaction and polypeptide processing [66], while interactions between C terminal flexible loop of Nef and Gag-Pol polyprotein are essential for HIV assembly [67]. While molecular mechanisms of potential interactions involving associated epitope regions are currently unknown, these regions represent interesting candidates for future experimental studies to elucidate these interactions and their functional significance. Our results revealed the presence of multiple associated co-evolving CTL epitope regions in HIV-1 genomes that are also significantly conserved across a broad range of HIV-1 subtypes and sub-subtypes. However, further studies are needed to ascertain the efficiency of these associated epitopes in multi-epitope vaccines as well as to uncover the underlying structural and/or functional constraints behind co-occurrences of the highly conserved epitopes. Conclusion Application of association rule mining revealed that certain CTL epitope combinations (including epitopes from three different genes) consistently co-occur in HIV-1 genomic sequences present in major geographic regions around the world. Such epitopes that are both well supported by experimental evidence and highly conserved across different non-recombinant and recombinant forms of HIV-1 genomes can be considered as ideal candidates for multi-epitope vaccines against HIV-1. Abbreviations HLA: Human Leukocyte Antigen; HTL: Helper T-Lymphocyte. Competing interests The authors declare that they have no competing interests. Authors' contributions SP did the analyses and wrote the manuscript. HP conceived and coordinated the study nd wrote the manuscript. All authors read and approved the final manuscript. Additional file 1 Table S1 – Epitopes included in the study. There were 218 epitopes in the "best defined CTL epitopes list" by Frahm et al. (2007). From this, 29 epitopes were removed from the 62-all data set because they did not satisfy the inclusion criteria. Additionally, one epitope from the 44-non-CRFs data set and 29 epitopes from the 18-CRFs data set were removed because of the inclusion criteria. Click here for file(46K, xls) Additional file 2 Table S2 – Number of epitopes included in the study. The number of epitopes included in the study as well as epitopes found in association rules in each gene. Click here for file(17K, xls) Additional file 3 Table S3 – Unique epitope associations in the 62-all data set. Each row represents one epitope association. Associated epitopes in each association are shown in adjacent columns. Presence of the association in other data sets is represented by 1 in respective columns. There are 484 unique associations in the 62-all data sets. Among them, 172 associations are present in the 44 – non-CRFs data set and 102 associations are present in the 18 – CRFs data set. The yellow color marks presence of the epitopes from all three genes in the 62-all data set. There are 23 associations of that kind. Click here for file(62K, xls) Additional file 4 Table S4. Stage of HIV-1 infection where the epitopes have been shown to be immunogenic, if known. Click here for file(20K, xls) Acknowledgements This work was partially supported by the Kent State University Research Council. References
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||
Nature. 1987 Oct 8-14; 329(6139):512-8.
[Nature. 1987]Curr Opin Microbiol. 2002 Aug; 5(4):408-13.
[Curr Opin Microbiol. 2002]Proc Natl Acad Sci U S A. 1997 Mar 4; 94(5):1890-5.
[Proc Natl Acad Sci U S A. 1997]J Virol. 2004 Nov; 78(21):11758-65.
[J Virol. 2004]Virus Res. 2006 Mar; 116(1-2):98-105.
[Virus Res. 2006]Immunol Today. 1999 Dec; 20(12):588-9.
[Immunol Today. 1999]Immunobiology. 2003; 208(4):423-8.
[Immunobiology. 2003]Genet Anal. 1999 Nov; 15(3-5):149-53.
[Genet Anal. 1999]Immunol Today. 1999 Dec; 20(12):588-9.
[Immunol Today. 1999]Biologicals. 2001 Sep-Dec; 29(3-4):271-6.
[Biologicals. 2001]J Immunol. 2002 Jun 1; 168(11):5499-506.
[J Immunol. 2002]J Virol. 2001 May; 75(9):4195-207.
[J Virol. 2001]Drug Discov Today. 2007 May; 12(9-10):389-95.
[Drug Discov Today. 2007]Bioinformatics. 2002 May; 18(5):705-14.
[Bioinformatics. 2002]Nucleic Acids Res. 1997 Dec 15; 25(24):4876-82.
[Nucleic Acids Res. 1997]Bioinformatics. 2002 May; 18(5):705-14.
[Bioinformatics. 2002]Bioinformatics. 2008 Jul 1; 24(13):1523-9.
[Bioinformatics. 2008]Bioinformatics. 2004 Oct 12; 20(15):2479-81.
[Bioinformatics. 2004]Bioinformatics. 2007 Mar 1; 23(5):651-3.
[Bioinformatics. 2007]Mol Biol Evol. 1986 Sep; 3(5):418-26.
[Mol Biol Evol. 1986]Mol Biol Evol. 2007 Aug; 24(8):1596-9.
[Mol Biol Evol. 2007]Bioinformatics. 2004 Oct 12; 20(15):2479-81.
[Bioinformatics. 2004]Bioinformatics. 2007 Mar 1; 23(5):651-3.
[Bioinformatics. 2007]J Virol. 2002 Nov; 76(22):11273-82.
[J Virol. 2002]BMC Immunol. 2008 Jan 22; 9():1.
[BMC Immunol. 2008]Lancet Infect Dis. 2002 Aug; 2(8):461-71.
[Lancet Infect Dis. 2002]AIDS. 2006 Oct 24; 20(16):W13-23.
[AIDS. 2006]Nature. 2006 Nov 9; 444(7116):164.
[Nature. 2006]Science. 2006 Jul 28; 313(5786):523-6.
[Science. 2006]J Virol. 2001 Feb; 75(3):1301-11.
[J Virol. 2001]Proc Natl Acad Sci U S A. 1997 Mar 4; 94(5):1890-5.
[Proc Natl Acad Sci U S A. 1997]Blood. 2006 Mar 15; 107(6):2373-83.
[Blood. 2006]J Exp Med. 2004 Dec 20; 200(12):1547-57.
[J Exp Med. 2004]J Virol. 2004 Nov; 78(21):11758-65.
[J Virol. 2004]Virus Res. 2006 Mar; 116(1-2):98-105.
[Virus Res. 2006]Eur J Immunol. 2007 Sep; 37(9):2419-33.
[Eur J Immunol. 2007]J Virol. 2004 Nov; 78(21):11758-65.
[J Virol. 2004]Eur J Immunol. 2007 Sep; 37(9):2419-33.
[Eur J Immunol. 2007]Science. 2006 Jul 28; 313(5786):523-6.
[Science. 2006]Science. 1999 Mar 12; 283(5408):1748-52.
[Science. 1999]J Virol. 1992 Aug; 66(8):4748-56.
[J Virol. 1992]Retrovirology. 2005 Mar 9; 2():18.
[Retrovirology. 2005]Rev Med Virol. 2002 Jul-Aug; 12(4):239-62.
[Rev Med Virol. 2002]Nature. 1988 Jan 21; 331(6153):280-3.
[Nature. 1988]J Virol. 1995 Jun; 69(6):3878-84.
[J Virol. 1995]PLoS Pathog. 2006 Nov; 2(11):e119.
[PLoS Pathog. 2006]J Virol. 2004 May; 78(10):5311-23.
[J Virol. 2004]