Copyright © 2005 BioMed Central Ltd

Datasets for evolutionary comparative genomics

Computational Biology Unit, Bergen Centre for Computational Science, University of Bergen, 5020 Bergen, Norway
Corresponding author: David A Liberles (liberles/at/cbu.uib.no)

Abstract

Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species.

Bioinformaticists and computational biologists working in the field of comparative genomics are largely dependent on datasets generated by others. Working with available data creates a desire for complementary datasets that fill knowledge gaps. In addition to writing grants for experimental laboratories and molecular biology supplies, one can also write an opinion piece to convince others to do some of the dirty work for you; this is what I am attempting to do here.

Comparative genomics starts with sequencing. Many have pointed to gaps in the tree of life where additional genome projects would augment current knowledge, either by shortening long 'branches' on the tree of sequenced genomes or by complementing existing genome projects. For example, there remain huge gaps in our knowledge of archaea. With the faith that these gaps will ultimately be filled in, I focus in this article on alternative strategies for directing genomic resources so as to answer fundamental questions in evolution.

The tape of life

A whole class of genomic experiments can be hypothesized through what can be called the 'tape of life' question. Stephen J. Gould wrote in his book Wonderful Life [1], "Wind back the tape of life to the early days of the Burgess shale; let it play again from an identical starting point, and the chance becomes vanishingly small that anything like human intelligence would grace the replay".

At the molecular level, the tape of life has been played in parallel. Different species have gone from a similar ancestral point to a similar derived phenotype. In these cases, are the same molecules and pathways driving the phenotypic evolution? Comparative genomics gives us unprecedented opportunities to answer such questions.

A few studies have tried to address the tape-of-life question through analysis of a single gene, such as the melanocortin-1 receptor (MC1R). This receptor plays a role in pigmentation and body/hair color, representing an obvious link between selectable genotype and phenotype. MC1R has been shown to be under selective pressure in various birds [2] and mammals [3]. In another set of studies, the transcription factor Pitx1, involved in hindlimb formation, has been implicated in parallel evolution of morphologically very distinct types of stickleback fish [4].

At a genomic level, there are whole classes of experiments that can be proposed where phenotypic evolution is the driving force. As an example of the tape-of-life question played in parallel, terrestrial mammals have returned to the water in at least three independent lineages (see Figure 1).
Cichlid fish (together with Darwin's finches) may be the textbook examples of parallel evolution (reviewed in [7,8]; see Figure 2a).
Rapid phenotypic evolution

Just as genome sequencing projects can be directed at interesting examples of parallel evolution, large-scale sequencing efforts can also be directed at points where phenotypic change appears to have been particularly rapid. This will improve the signal-to-noise ratio in attempting to detect those substitutions that drove phenotypic change. Studies of parallelism in the cichlid fish, especially in Lake Victoria, fall into this category (as well as the parallel evolution category) [7].

In another example, polar bears diverged from brown bears only a little more than 100,000 years ago. The oldest polar bear fossil is less than 100,000 years old [10]. From phylogenetics, polar bears fall within the brown bear clade (see Figure 3).
Examination of the tape-of-life question or rapid phenotypic evolution does not need to involve entire genome sequencing. Large-scale full-length cDNA [12,13] and upstream promoter sequences can be generated more cheaply yet contain much of the relevant functional information. The molecular basis for changes in coding sequence function, gene expression, and possibly alternative splicing is likely to be contained within such data. Ultimately, population-level data in the form of single nucleotide polymorphisms (SNPs) linked to biogeography will also be desirable, to shed light on the process of speciation.

Regulatory evolution

In addition to coding-sequence evolution, changes in alternative splicing patterns and in gene-expression levels and patterns can also contribute to lineage-specific diversification. Large-scale inter-specific datasets that characterize relative splice-site usage or splice-variant frequencies would be valuable. An initial study comparing alternative splicing patterns in mouse, rat, and human led to the conclusion that alternative splice variants, like gene duplicates, have been used as a testbed for evolutionary novelty [14].

Changes in gene expression have become the leading candidates as drivers of evolutionary novelty, dating back to Allan Wilson's attempt to explain the phenotypic divergence between human and chimpanzee [15]. Pioneering work on the evolution of regulatory networks in echinoderms has pointed to a major role for changes in the expression of key regulatory proteins during development in driving morphological change [16]. A systematic examination of gene-expression changes in higher primates has also been presented [17]. Molecular variation in the human population that affects gene expression, and that is subject to the diversifying selection and fixation seen in inter-specific studies, is also being characterized [18] and can be related to chimpanzee sequences in a bid to understand lineage-specific evolution.
Extending this in a well-controlled study across larger portions of the tree of life (initially at the inter-specific level) is warranted. Both relative gene-expression levels and relative alternative splicing levels are continuous variables, unlike sequences, which are discretely A, C, G or T. There are methods for reconstructing such values over a phylogenetic tree and parsing changes onto branches, coupled to a reconstruction of the regulatory sequences that govern such processes (see, for example, [19]). Harnessing phylogenetic information not only provides an understanding of the molecular basis for organismal phenotypic divergence but can also reduce the background 'noise' in attempts to understand basic principles of transcriptional regulation, mRNA splicing, and protein folding and function [19,20].

Even within the completed genomes that we already have, there are many unknown genes. Phylogenetic focusing (systematically attempting to sequence such genes from closely related species) will help us understand how they evolved, their function, and the evolution of novel genes in general. This can also be applied to rare protein structures, in order to understand the process of neostructuralization by searching for phylogenetic intermediates that provide a 'missing link' sequence. Phylogenetic focusing will be greatly aided by the establishment of local DNA banks containing genomic DNA from regionally specific species. This will also aid nations and their regions in understanding local biodiversity.

Ohno [21], and subsequently Lynch and Conery [22], proposed a major role for gene duplication in the generation of evolutionary novelty. Wilson and Davidson and colleagues have done the same for gene expression [15,16]; the Lee lab has done the same for alternative splicing [14]. All are probably right to some degree, as evolution is opportunistic and different regulatory mechanisms have potentially different selectable outcomes.
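As a rough illustration of the earlier point that continuous values such as relative expression or splicing levels can be reconstructed over a phylogenetic tree and changes parsed onto branches, the sketch below uses a simple Brownian-motion model with inverse-branch-length weighting (the pruning step familiar from independent contrasts). This is an assumed, simplified stand-in and not necessarily the method of [19]. The `Node` class, the function names, the taxa A, B and C, and the trait values are all hypothetical. Estimates at internal nodes use descendant tips only; at the root this coincides with the maximum-likelihood estimate under Brownian motion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Node of a rooted binary tree; `length` is the branch leading to its parent."""
    name: str
    length: float = 0.0
    value: Optional[float] = None  # observed trait value (tips only)
    children: List["Node"] = field(default_factory=list)


def reconstruct(node: Node, estimates: Dict[str, float]) -> Tuple[float, float]:
    """Post-order pass returning (estimate, effective branch length) for `node`.

    Assumes a strictly binary tree with positive branch lengths.
    """
    if not node.children:
        estimates[node.name] = node.value
        return node.value, node.length
    (x1, v1), (x2, v2) = (reconstruct(child, estimates) for child in node.children)
    # Weighted mean of the two children; shorter branches carry more weight.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    estimates[node.name] = estimate
    # Lengthen the branch above this node to reflect uncertainty in the estimate.
    return estimate, node.length + v1 * v2 / (v1 + v2)


def branch_changes(node: Node, estimates: Dict[str, float],
                   changes: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Assign to each branch (keyed by child name) the child-minus-parent difference."""
    changes = {} if changes is None else changes
    for child in node.children:
        changes[child.name] = estimates[child.name] - estimates[node.name]
        branch_changes(child, estimates, changes)
    return changes


# Hypothetical example: tree ((A:1, B:1)AB:1, C:2)root with made-up expression levels.
tree = Node("root", children=[
    Node("AB", length=1.0, children=[
        Node("A", length=1.0, value=2.0),
        Node("B", length=1.0, value=4.0),
    ]),
    Node("C", length=2.0, value=6.0),
])

estimates: Dict[str, float] = {}
reconstruct(tree, estimates)
for branch, delta in branch_changes(tree, estimates).items():
    print(f"branch to {branch}: change of {delta:+.3f}")
```

A full analysis would also propagate information from the rest of the tree down to each internal node and attach uncertainty to every estimate; the sketch only shows how continuous values, unlike discrete nucleotides, are averaged rather than counted.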
Generating datasets that enable us to integrate such knowledge and output better models (also drawing on work in population genetics, structural genomics, and systems biology) will allow a better understanding of biology, with evolution at its core.

This article aims to continue a dialog between experimental and computational researchers towards a better understanding of genomes, and to encourage experimentalists to provide the community with even more varieties of genomic data.

Acknowledgements

I thank Axel Meyer for interesting discussions and for providing Figure 2.