Copyright © 2005, The National Academy of Sciences

Evolution

Quantifying the mechanisms for segmental duplications in mammalian genomes by statistical analysis and modeling

*Department of Biology, †Courant Institute of Mathematical Sciences, and ‡School of Medicine, New York University, New York, NY 10003

§ To whom correspondence should be addressed at: NYU Bioinformatics Group, New York University, 715 Broadway, Room 1002, New York, NY 10003. E-mail: mishra/at/nyu.edu.

Edited by Charles R. Cantor, Sequenom, Inc., San Diego, CA, and approved January 19, 2005

Received October 26, 2004.

Abstract

A large number of the segmental duplications in mammalian genomes have been cataloged by genome-wide sequence analyses. The molecular mechanisms involved in these duplications mostly remain a matter of speculation. To uncover, test, and further quantify the hypotheses on the mechanisms for the recent duplications in the mammalian genomes, we have performed a series of statistical analyses on the sequences flanking the duplicated segments and proposed a dynamic model for the duplication process. The model, when applied to the human duplication data, indicates that ≈30% of the recent human segmental duplications were caused by a recombination-like mechanism, among which 12% were mediated by the most recently active repeat, Alu. But a significant proportion of the duplications are caused by some mechanism independent of the repeat distribution. A less sure but similar picture is found in the rodent genomes. A further analysis on the physical features of the flanking sequences suggests that one of the uncharacterized duplication mechanisms shared by the mammalian genomes is surprisingly well correlated with the physical instability in the DNA sequences.
Keywords: segmental duplication, genomic instability, interspersed transposable elements, Markov models, copy number fluctuation

The mammalian genomes are filled with duplicated sequences of different sizes. In the last few years, researchers have found that ≈3.5–5% of the human genome (1, 2), ≈1.2–2% of the mouse genome (3, 4), and 3% of the rat genome (5) contain recent segmental duplications (genomic sequence blocks whose identity level is >90% and length is >1 kb). Nonetheless, a clear delineation of mechanisms responsible for those recent duplications in the mammalian genomes remains elusive: Unequal crossovers usually cause tandem duplications; long interspersed transposable element 1 (L1) retrotransposon machinery can only cause interspersed duplications of <1 kb (6). Recently, a detailed analysis on the duplication breakpoints in a specific genomic region showed that some segmental duplications may have been caused by Alu-mediated recombination events (7). Later, Bailey et al. (8) reported that a significant portion of the interspersed segmental duplications terminated within an Alu repeat. These results led to the suggestion that the primate-specific burst of Alu retrotransposition activity is the primary cause of the recent boom of segmental duplications in the human genome (8). However, given the highly dynamic nature of the Alu repeats in the recent past (9), estimation of its contribution to the segmental duplication process could be biased if its evolutionary dynamics are not taken into consideration. To quantitatively assess the relative contribution of Alu recombination mechanism to the process of segmental duplication without bias, we developed a dynamic mathematical model that formulates the evolution of the repeat distribution in the duplication flanking regions (see Fig. 1).
The largest fraction of duplications is thus not accounted for by recombination between interspersed repeats according to our computation. We therefore attempted to uncover evidence for a repeat-independent mechanism and discovered that the regions flanking duplications are enriched for sequences with low helix stability and high DNA flexibility. These physicochemical properties also characterize sequences known to be “fragile” sites (10, 11) for genetic rearrangement. Thus, segmental duplications may share a mechanism linked to genetic instability. Methods Sequence Preparation. We used four different segmental duplication mapping data sets from three different mammalian genomes in our study: the July 2003 Human genome assembly (hg16) (http://projects.tcag.ca/humandup) (2), the April 2003 Human genome assembly (hg15) (http://genome.ucsc.edu) (1), the February 2003 Mouse genome assembly (mm3) (4), and the June 2003 Rat genome assembly (rn3) (5). To avoid redundancy and ambiguity, we only selected the duplication pairs that (i) were only duplicated once, (ii) could not be included in any other duplications, (iii) were interchromosomal or at least 9 kb apart, and (iv) were >6 kb in length. The filtered duplication pairs for genome assemblies hg16, hg15, mm3, and rn3 are available as Data Sets 1, 2, 3, and 4 (respectively), which are published as supporting information on the PNAS web site. Two control sequence sets were created for each data set: One contained sequences randomly chosen from the corresponding genome assembly (http://genome.ucsc.edu), and the other contained sequences randomly selected from inside the duplicated regions. Repeat Analysis. The repeats were identified according to the genome annotation database (http://genome.ucsc.edu). In this study, we considered a repeat as present in that flanking sequence if it was longer than a 100-bp threshold. 
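The four selection criteria listed under Sequence Preparation can be sketched as a simple filter. This is only an illustration under stated assumptions: the `DupPair` record, its field names (in particular `copies` and `nested`), and the use of the shorter copy as the length of a pair are hypothetical choices and are not taken from the published data sets.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 6_000      # criterion (iv): duplicated segment longer than 6 kb
MIN_SEPARATION_BP = 9_000  # criterion (iii): intrachromosomal copies at least 9 kb apart


@dataclass(frozen=True)
class DupPair:
    """Hypothetical record for one mapped duplication pair."""
    chrom_a: str
    start_a: int
    end_a: int
    chrom_b: str
    start_b: int
    end_b: int
    copies: int   # assumed field: how many times the segment was duplicated
    nested: bool  # assumed field: True if contained in another duplication


def keep_pair(p: DupPair) -> bool:
    """Return True if the pair passes criteria (i) to (iv)."""
    if p.copies != 1:          # (i) duplicated only once
        return False
    if p.nested:               # (ii) not included in any other duplication
        return False
    # (iv) length test; taking the shorter copy is an assumption of this sketch
    if min(p.end_a - p.start_a, p.end_b - p.start_b) <= MIN_LENGTH_BP:
        return False
    if p.chrom_a == p.chrom_b:  # (iii) interchromosomal pairs always pass
        gap = max(p.start_a, p.start_b) - min(p.end_a, p.end_b)
        if gap < MIN_SEPARATION_BP:
            return False
    return True


pairs = [
    DupPair("chr1", 0, 7_000, "chr2", 500, 7_500, copies=1, nested=False),
    DupPair("chr1", 0, 7_000, "chr1", 12_000, 19_000, copies=1, nested=False),
]
filtered = [p for p in pairs if keep_pair(p)]
```

In this toy input only the interchromosomal pair survives, because the two copies of the second pair lie 5 kb apart on the same chromosome.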
For a pair of flanking regions to be identified as having a common repeat in a specific region (labeled as +/+), the repeat sequences had to be on the same side of the duplicated segments, face the same direction, and share at least 100 bp of high homology. For the Alu family, sequences from any subfamilies shared high homology (12, 13). For the L1 family, however, only sequences from the same subfamily were treated as highly homologous (14). In our model, the frequency of +/+ flanking region pairs in each age group was further normalized by subtracting the average frequency of repeats inside the duplicated segments, assuming that the repeats inside the duplicated region resulted from some repeat-independent mechanism and were uniformly distributed. Model Parameters. All but two of the model parameters could be derived from the existing literature. They are enumerated in Table 4 in Supporting Appendix, which is published as supporting information on the PNAS web site. We chose a flanking region size that is large enough to minimize the effect of mapping and annotation errors (by allowing some gaps and shifts; see Figs. 5–7, which are published as supporting information on the PNAS web site) and yet sufficiently restrictive to distinguish the signals from the genomic background noise. To establish the most appropriate size of the flanking regions to be used in the study, we applied the model to the data sets generated from several different flanking region lengths (200, 500, 1,000, and 2,000 bp). The estimation of repeat recombination, measured by the h1 value (see Model for definition), reaches its highest in the 500- and 1,000-bp data sets, thereby suggesting these two sizes to be optimal choices. The data presented in this report used a flanking region length of 500 bp. Model Evaluation. We used a cross-validation method to test the performance of the model and the confidence intervals of the estimated parameters. 
The complete data set was randomly partitioned into two equally sized groups: an in-sample set to estimate the parameters and an out-of-sample set to cross-validate and measure the significance of the estimated parameters. The goodness of fit was tested in the out-of-sample data by using the parameters estimated from the in-sample data (for details, see Supporting Appendix). In Results, we report the mean values and standard deviations of the parameters estimated in 50 independent trials.

Stability and Flexibility Computation. The helix stability of the DNA duplex was estimated by the average strand dissociation Gibbs free energy (ΔG) in overlapping 50-bp windows, computed by the nearest neighbor model experimentally verified by Breslauer et al. (15). The DNA flexibility was estimated by the average twist angle in overlapping 50-bp windows computed by the method in ref. 16.

Results

Sequence Analyses: Repeat Distribution. We began by analyzing the repeat composition in the flanking regions of the mapped segmental duplications in the two assemblies of human genome (hg15 and hg16), as well as the two rodent genomes (mm3 and rn3) (see Methods). Consistent with the previous report on the human segmental duplications (8), we detected a significant overrepresentation of the repeats from the younger Alu subfamilies (AluY and AluS) in the flanking regions compared with random regions in both human genome assemblies (Fig. 1). However, to test the hypothesis that these repeats mediated the duplications by recombination, one needs to consider the highly active history of the overrepresented repeats in the duplication flanking regions and the reliability of the genome assembly and duplication mapping data. Therefore, we conducted a detailed analysis of the hypothesis through a mathematical model that incorporates the evolutionary dynamics of the active repeats and minimizes the effect of assembly or mapping errors.

Model.
The repeats that caused duplications by recombination should reside on the same side of the duplicated segment, face the same direction, and share enough homologous sequences. Therefore, intuitively, we could directly estimate the contribution of repeat recombination to duplication by measuring the excessive level of such repeat configurations in the flanking regions of the newly duplicated segments before any erosion of the sequence occurs through mutation events. However, the newly duplicated segments are almost identical and are, therefore, most prone to genome assembly errors, making the estimations unreliable. In contrast, if we used the “older” duplications, which are less prone to assembly errors, we could potentially overestimate or underestimate the contribution of the repeats. For instance, the actively amplifying transposable repeats can be inserted into the flanking regions after duplication and can form a configuration that falsely suggests a recombination event, resulting in an overestimate of the repeat contribution. Conversely, the repeats in the flanking regions can also lose their initial configuration after the recombination incident because of point mutations and deletions after duplication, consequently leading to an underestimate of the repeat contribution. To resolve the above dilemma, we incorporated the evolutionary dynamics of the repeats and the duplicated segments in our model. Over time, all of the repeats in the flanking regions, regardless of whether they have caused the duplication by recombination, are subject to changes in their configurations. Assuming that the mechanisms of segmental duplication and their relative contribution have been well conserved over time, the current repeat configuration in the flanking regions of duplications of different ages may be viewed as sampled from its stationary distribution.
If the evolutionary rates of the repeats and the duplicated segments are known, the relative contribution of repeat recombination to segmental duplications can be estimated from the stationary distribution. To explain the model, we begin by introducing some notation. In our model, each pair of the duplication flanking regions is assigned to a state specified by the configuration of the interspersed repeats in the flanking regions and the age of the duplication event. There are three possible repeat configurations in a pair of flanking regions (defined in Fig. 1). The model describes the dynamically changing state distribution of the flanking regions as a Markov process over evolutionary time under the effect of accumulating mutations and repeat amplifications. Table 1 lists in detail all of the possible transitions between states in a small time interval (Δt) and the corresponding transition probabilities expressed in the evolutionary rates of the repeats and duplicated segments. A schematic representation of the model integrating the details in a small example is displayed in Fig. 2.
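The idea of a Markov process whose state distribution settles to a stationary one can be illustrated with a deliberately small toy chain. The sketch below is not the model of Table 1: it ignores the age of the duplication, the state labels other than +/+ are assumed names, and the two rates are hypothetical numbers chosen only to make the example run.

```python
# Toy three-state chain over flanking-pair repeat configurations.
# Only "+/+" is named in the text; "+/-" and "-/-" are assumed labels.
STATES = ("+/+", "+/-", "-/-")


def transition_matrix(insert_rate, loss_rate, dt):
    """Row-stochastic matrix for one time step of length dt.

    insert_rate and loss_rate are hypothetical per-unit-time rates for
    gaining a repeat (by insertion) or losing one (by mutation or deletion).
    """
    a = insert_rate * dt
    b = loss_rate * dt
    return [
        [1.0 - b, b, 0.0],      # +/+ can decay to +/-
        [a, 1.0 - a - b, b],    # +/- can gain or lose a repeat
        [0.0, a, 1.0 - a],      # -/- can gain a repeat
    ]


def step(dist, matrix):
    """Advance the state distribution by one time step."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary(matrix, tol=1e-12, max_iter=1_000_000):
    """Iterate from a uniform start until the distribution stops changing."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(max_iter):
        nxt = step(dist, matrix)
        if max(abs(x - y) for x, y in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("distribution did not converge")


P = transition_matrix(insert_rate=0.2, loss_rate=0.1, dt=0.1)
pi = stationary(P)
```

For this toy chain the stationary distribution depends only on the ratio of the two rates, which mirrors the point made in the text: once the evolutionary rates are known, the observed state frequencies constrain the remaining free parameters.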
The model rests on two assumptions: First, the evolutionary dynamic rates and the mechanisms of segmental duplication as well as their relative contribution have been well conserved over a long period of evolutionary time. Second, the state distribution evolution in the flanking regions has reached its stationary state; i.e., despite the uninterrupted dynamic changes in the state of each individual flanking region pair, the distribution over different states among all of the flanking region pairs stays unchanged. Formally, there exists a sufficiently large T such that, for any times t and s with t, s > T, the distribution over states at time t equals that at time s.

We applied the model to the duplication flanking regions in the human genome on the distribution of their states specified by repeats from the Alu (X = Alu) and L1 (X = L1) families, respectively, whose evolutionary rates have been well characterized (see Table 4) (9). Two different data sets (hg15 and hg16) (1, 2) were used. The free parameters in the model and their corresponding standard deviations were determined by cross-validation (see Methods). Results for both data sets are shown in Fig. 3.
To further measure the significance of the contribution to the duplication process by the recombination in these two repeat families, we compared the estimated contribution ( The hg15 and hg16 data sets were independently mapped by different research groups using different strategies (1, 2), and it has been shown that the earlier map (hg15) contains more artifacts caused by assembly errors than the later one (2). Despite such differences, the model still gives consistent results between the two assemblies. It is also reassuring to find that, for both repeat families, the model estimated that the fraction of the duplications caused by the recombination-like mechanism (h1) is ≈30% (for details, see Supporting Appendix), although their contributions to the duplication mechanisms are quite different. The consistency in the parameter values suggests the robustness of our model against errors in assembly, mapping, and annotation. This robustness is mostly due to the parsimony of the model and the way in which the model accounts for a reasonable amount of errors and efficiently removes the corrupting noise. For the mouse and rat genomes, a good estimation of the evolutionary dynamic parameters of the interspersed repeats is still lacking. Furthermore, the available duplication mappings in the rodent genomes are likely to be less accurate because of the unfinished status of the genome assemblies (3, 5). Those factors prevented us from applying the model accurately to the rodent data sets as we did for the Alu and L1 repeats in the human genome. However, if one approximates the mutation rates in the rodent genomes by doubling the corresponding rates in the human genome and the rodent L1 insertion rate by tripling the human L1 insertion rate, then it is possible to reach a fairly good fitting for the L1 distribution in the mouse and rat data sets (Fig. 9). 
The contribution of the L1 repeats to the recent segmental duplications through the recombination-like mechanism is then estimated at ≈10% in the rodent genomes.

In conclusion, in all of the mammalian genomes examined, our model estimates that ≈10–12% of the recent segmental duplications were caused by the recombination between the most active interspersed repeat elements in the genome (Alu in human and L1 in rodents). The results from the model further suggest that the segmental duplications are likely to be caused by multiple mechanisms, and a large fraction (≈70%) of the duplications are caused by some unknown mechanism independent of the interspersed repeat distributions, which is consistent with the conclusions of ref. 20.

Further Sequence Analysis: Physical Properties. In the process of searching for repeat-independent mechanisms, we discovered an enrichment of DNA sequences that are physically unstable around the duplication boundaries. The physical properties of the DNA duplex play an important role as the initial step in many molecular processes, as shown in transcription (21), replication (22), and the large genome rearrangement events that originated from the chromosomal fragile sites (10, 11). Therefore, it is possible that similar properties can initiate or facilitate the segmental duplication process in the mammalian genomes. To explore possible repeat-independent explanations and to avoid the bias introduced by the AT-rich regions in Alu and L1 repeats, we analyzed the flanking sequences that do not contain any repeats for their helix stability (15) and DNA flexibility (16) (see Methods for details). These two features are suggested to be the specific characteristics of the fragile sites in the genome where genetic rearrangements frequently occur (10, 11).
In the mouse and human data sets, there is a slight decrease of the average helix stability and an increase of the average DNA flexibility at the duplication junction compared with the other regions either inside or outside the duplicated segments (Fig. 4).
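The windowed averaging behind such stability and flexibility profiles can be sketched as follows. This is a minimal illustration, assuming a per-dinucleotide score table is supplied by the caller: `TOY_TABLE` below holds made-up values and is not the Breslauer et al. (15) free-energy table or the twist-angle table of ref. 16, and the function names are hypothetical.

```python
def step_scores(seq, table):
    """Score every dinucleotide step of seq using a caller-supplied table."""
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def window_means(values, window):
    """Mean of each overlapping window (step of one) using a rolling sum."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def profile(seq, table, window_bp=50):
    """Average per-step score in overlapping windows of window_bp bases.

    A window of window_bp bases contains window_bp - 1 dinucleotide steps.
    """
    return window_means(step_scores(seq, table), window_bp - 1)


# Hypothetical scores for illustration only: 1.0 for A/T-only steps,
# 3.0 for G/C-only steps, 2.0 for mixed steps.
TOY_TABLE = {
    a + b: 1.0 + (a in "GC") + (b in "GC")
    for a in "ACGT"
    for b in "ACGT"
}

toy_seq = "A" * 60 + "G" * 60
toy_profile = profile(toy_seq, TOY_TABLE, window_bp=50)
```

With a real parameter table in place of `TOY_TABLE`, a dip in such a profile near a duplication junction would correspond to the decrease in average helix stability described above; the same windowing applies to a flexibility score.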
The overrepresentation of sequences with physical features similar to the fragile sites in the duplication flanking regions suggests that segmental duplications may share a mechanism linked to genetic instability. Although these results represent evidence for the hypothesis that some repeat-independent mechanism is involved in the recent mammalian segmental duplications, the hypothesis needs to be explored further. Discussion From previous studies (2) and our detailed analysis on gaps and shifts in the duplication flanking regions (see Figs. 5–7), we conclude that the current map of segmental duplications is still tainted with errors from assembly, mapping, and annotation. In the presence of these errors, an analysis on sequences strictly at the mapped duplication boundaries will underestimate or even diminish the signals left by the repeat recombination. Using a flanking region size that allows some gaps and shifts helps us to minimize the effect of these errors on our analysis. In addition, by incorporating our knowledge of the related evolutionary processes in the dynamic model, it was possible to decrease the effect of random noise. Therefore, despite the nature of the data, our method was found to be quite robust. Of course, the accuracy of the results will increase with the finishing stages of the genome assembly and the improvement on the mapping and annotation schemes. Interspersed segmental duplications are significantly more abundant in the human genome than in the rodent genomes (3–5). It was suggested that the difference is due to the recent burst of primate Alu retrotransposition activity (8). However, the rough estimations from our model suggest that the relative contribution from the most active repeats through the recombination-like mechanism remains more or less constant in the human and rodent genomes. 
Therefore, the answer to why the genomes have different amounts of segmental duplications is to be sought elsewhere [for example, the difference in the tolerance for large duplications, the difference in effective population sizes, or the finishing stage of the genome assembly (23)]. Segmental duplications have been shown to be associated with the genome rearrangement events during species evolution (24, 25) and the copy number fluctuations (26–29) and other rearrangements (30) in genomic sequences during cancer development. Therefore, some of the mechanisms used by segmental duplications, such as recombination mediated by interspersed repeats (31, 32), may be shared by other genomic rearrangement events. Suggested by the fragile sites we found in the duplication flanking sequences and their association with the breakpoints of the syntenic blocks (24, 25), perhaps another common mechanism could be correlated to the specific physical properties in the DNA sequences. In fact, it has been suggested that segmental duplications in yeast are caused by breakage-induced-replications induced by replication fork stalling at the AT-rich replication termination sites (33). These topics of future research may rely on mathematical models akin to the ones proposed here. Acknowledgments We thank Jack Schwartz (Courant Institute of Mathematical Sciences), Mike Wigler (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY), the anonymous referees, and our colleagues from the New York University Bioinformatics Group for many helpful discussions, suggestions, and constructive criticisms. This work was supported by grants from the Quantum and Biologically Inspired Computing Program and Information Technology Research Program of the National Science Foundation; the Defense Advanced Research Projects Agency; the U.S. Air Force Research Laboratory; the National Institutes of Health; and the New York State Office of Science, Technology, and Academic Research. 
Notes Author contributions: Y.Z. and B.M. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper. This paper was submitted directly (Track II) to the PNAS office. Abbreviation: L1, long interspersed transposable element 1. References 1. Bailey, J. A., Gu, Z., Clark, R. A., Reinert, K., Samonte, R. V., Schwartz, S., Adams, M. D., Myers, E. W., Li, P. W. & Eichler, E. E. (2002. ) Science 297 1003–1007. [PubMed] 2. Cheung, J., Estivill, X., Khaja, R., MacDonald, J. R., Lau, K., Tsui, L. C. & Scherer, S. W. (2003. ) Genome Biol., 4 R25. [PubMed] 3. Bailey, J. A., Church, D. M., Ventura, M., Rocchi, M. & Eichler, E. E. (2004. ) Genome Res. 14 789–801. [PubMed] 4. Cheung, J., Wilson, M. D., Zhang, J., Khaja, R., MacDonald, J. R., Heng, H. H. Q., Koop, B. F. & Scherer, S. W. (2003. ) Genome Biol. 4 R47. [PubMed] 5. Tuzun, E., Bailey, J. A. & Eichler, E. E. (2004. ) Genome Res. 14 493–506. [PubMed] 6. Ejima, Y. & Yang, L. (2003. ) Hum. Mol. Gen. 12 1321–1328. [PubMed] 7. Babcock, M., Pavlicek, A., Spiteri, E., Kashork, C. D., Ioshikhes, I., Shaffer, L. G., Jurka, J. & Morrow, B. E. (2003. ) Genome Res. 13 2519–2532. [PubMed] 8. Bailey, J. A., Liu, G. & Eichler, E. E. (2003. ) Am. J. Hum. Genet. 73 823–834. [PubMed] 9. Liu, G., Zhao, S., Bailey, J. A., Sahinalp, S. C., Alkan, C., Tuzun, E., Green, E. D. & Eichler, E. E. (2003. ) Genome Res. 13 358–368. [PubMed] 10. Matsuyama, A., Shiraishi, T., Trapasso, F., Kuroki, T., Alder, H., Mori, M., Huebner, K. & Croce, C. M. (2003. ) Proc. Natl. Acad. Sci. USA 100 14988–14993. [PubMed] 11. Mishmar, D., Rahat, A., Scherer, S. W., Nyakatura, G., Hinzmann, B., Kohwi, Y., Mandel-Gutfroind, Y., Lee, J. R., Drescher, B., Sas, D. E., et al. (1998. ) Proc. Natl. Acad. Sci. USA 95 8141–8146. [PubMed] 12. Batzer, M. A. & Deininger, P. L. (2002. ) Nat. Rev. 3 370–379. 13. Kapitonov, V. V. & Jurka, J. (1996. ) J. Mol. Evol. 42 59–65. [PubMed] 14. Smit, A. F. A., Toth, G., Riggs, A. D. 
& Jurka, J. (1995. ) J. Mol. Biol. 246 401–417. [PubMed] 15. Breslauer, K. J., Frank, R., Blocker, H. & Marky, L. A. (1986. ) Proc. Natl. Acad. Sci. USA 83 3746–3750. [PubMed] 16. Sarai, A., Mazur, J., Nussinov, R. & Jernigan, R. L. (1989. ) Biochemistry 28 7842–7849. [PubMed] 17. Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J. F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2001. ) Nature 409 860–921. [PubMed] 18. Gibbs, R. A., Weinstock, G. M., Metzker, M. L., Muzny, D. M., Sodergren, E. J., Scherer, S., Scott, G., Steffen, D., Worley, K. C., Burch, P. E., et al. (2004. ) Nature 428 493–521. [PubMed] 19. Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J. F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002. ) Nature 420 520–562. [PubMed] 20. Zhang, L., Lu, H. H., Chung, W. Y., Yang, J. & Li, W.-H. (2005. ) Mol. Biol. Evol. 22, 135–141. [PubMed] 21. Benham, C. J. (1993. ) Proc. Natl. Acad. Sci. USA 90 2999–3003. [PubMed] 22. Polonskaya, Z, Benham, C. J., Hearing, J. (2004. ) Virology 328 283–291. 23. She, X., Jiang, Z., Clark, R. A., Liu, G., Cheng, Z., Tuzun, E., Church, D. M., Sutton, G., Halpen, A. L. & Eichler, E. E. (2004. ) Nature 431 927–930. [PubMed] 24. Armengol, L., Pujana, M. A., Cheung, J., Scherer, S. W. & Estivill, X. (2003. ) Hum. Mol. Genet. 12 2201–2208. [PubMed] 25. Bailey, J. A., Baertsch, R., Kent, W. J., Haussler, D. & Eichler, E. E. (2004. ) Genome Biol. 5 R23. [PubMed] 26. Lucito, R., West, J., Reiner, A., Alexander, J., Esposito, D., Mishra, B., Powers, S., Norton, L. & Wigler, M. (2000. ) Genome Res. 10 1726–1736. [PubMed] 27. Pollack, J. R., Perou, C. M., Alizadeh, A. A., Eisen, M. B., Pergamenschikov, A., Williams, C. F., Jeffrey, S. S., Botstein, D. & Brown, P. O. (1999. ) Nat. Genet. 23 41–46. [PubMed] 28. Li, S., Zhang, L., Kern, W. F., Andrade, D., Forsberg, J. E., Bates, F. R. & Mulvihill, J. J. (2002. ) Cancer Genet. 
Cytogenet. 138 149–152. [PubMed] 29. Squire, J. A., Pei, J., Marrano, P., Beheshti, B., Bayani, J., Lim, G., Moldovan, L. & Zielenska, M. (2003. ) Genes, Chromosomes Cancer 38 215–225. [PubMed] 30. Lengauer, C., Kinzler, K. W. & Vogelstein, B. (1998. ) Nature 396 643–649. [PubMed] 31. Kolomietz, E., Meyn, M. S., Pandita, A. & Squire, J. A. (2002. ) Genes, Chromosomes Cancer 35 97–112. [PubMed] 32. Swensen, J., Hoffman, M., Skolnick, M. H. & Neuhausen, S. L. (1997. ) Hum. Mol. Genet. 6 1513–1517. [PubMed] 33. Koszul, R., Caburet, S., Dujon, B. & Fischer, G. (2004. ) EMBO J. 23 234–243. [PubMed]