Genome Res. Aug 2009; 19(8): 1404–1418.
PMCID: PMC2720184

Evolution of gene function and regulatory control after whole-genome duplication: Comparative analyses in vertebrates

Abstract

The significance of whole-genome duplications (WGD) for vertebrate evolution remains controversial, in part because the mechanisms by which WGD contributed to functional evolution or speciation are still incompletely characterized. Fish genomes provide an ideal context in which to examine the consequences of WGD, because the teleost lineage experienced an additional WGD soon after divergence from tetrapods and because five teleost genomes are available for comparative analysis. Here we present an integrated approach to characterize these post-duplication genomes based on genome-scale synteny, phylogenetic, temporal, and spatial gene expression and on protein sequence data. A minimum of 3%–4% of protein-coding loci have been retained in two copies in each of the five fish genomes, and many of these duplicates are key developmental genes that function as transcription factors or signaling molecules. Almost all duplicate gene pairs we examined have diverged in spatial and/or temporal expression during embryogenesis. A quarter of duplicate pairs have diverged in function via the acquisition of novel protein domains or via changes in the subcellular localization of their encoded proteins. We compared the spatial expression and protein domain architecture of zebrafish WGD-duplicates to those of their single mouse ortholog and found many examples supporting a model of neofunctionalization. WGD-duplicates have acquired novel protein domains more often than have single-copy genes. Post-WGD changes at the gene regulatory level were more common than changes at the protein level. We conclude that the most significant consequence of WGD for vertebrate evolution has been to enable more-specialized regulatory control of development via the acquisition of novel spatiotemporal expression domains. We find limited evidence that reciprocal gene loss led to reproductive isolation and speciation in this lineage.

The availability of an ever-increasing number of complete genome sequences has fuelled research into the evolution and function of genomes as a whole. Eukaryotic genomes have been modified over the course of evolution not only by single gene duplications (Ohno 1970; Lynch 2002) but also by several rounds of whole-genome duplication (WGD) (Jaillon et al. 2004; Dehal and Boore 2005), which were typically followed by extensive gene loss. These WGD events would thus have had significant effects on gene regulatory control and protein–protein interactions. Nonetheless, WGD are comparatively common and have been described in plants (Vandepoele et al. 2002), yeast (Kellis et al. 2004), the ancestor of vertebrates (Dehal and Boore 2005), teleost fishes (Jaillon et al. 2004; Le Comber and Smith 2004), and the frog Xenopus laevis (Sémon and Wolfe 2008). Furthermore, polyploidy can be artificially induced by heat shock in rainbow trout and common carp, and triploid fish are commonly generated in aquaculture to achieve sterility and thus avoid interbreeding with native fish stocks (Le Comber and Smith 2004). The fact that ploidy levels can be so easily manipulated in teleost fishes and that several rounds of WGD and subsequent gene loss have occurred in vertebrate evolution challenges our experience that knocking-down or altering individual genes can suffice to disrupt normal vertebrate development and function.

Studying the function of post-duplication genomes can thus contribute to our understanding of how genomes evolve as a whole, which components are amenable to change, and by which mechanisms new functions or regulatory control evolve (e.g., Woolfe and Elgar 2007). In terms of biodiversity, loss of alternative copies of a duplicated locus has been suggested to promote within-population mating and to lead to reproductive isolation between populations. Speciation dynamics and gene loss patterns in polyploid yeast, for example, provide strong support for the “divergent resolution” hypothesis of speciation (Wong et al. 2002; Scannell et al. 2006). There is some evidence that reciprocal gene loss after WGD could have also contributed to the radiation of teleost fishes (Sémon and Wolfe 2007). These fishes experienced a WGD event during their early evolution, some 305–450 million years (Myr) ago (Amores et al. 1998; Christoffels et al. 2004; Hoegg et al. 2004; Vandepoele et al. 2004). Today, teleost fishes constitute the most speciose vertebrate lineage, with over 22,000 extant species (Taylor et al. 2003). The last WGD event has thus often been implicated as a driver for the radiation and diversification of this lineage (Amores et al. 1998; Meyer and Schartl 1999), although others have questioned the significance of this WGD for generating species diversity (e.g., Robinson-Rechavi et al. 2001).

The teleost-specific WGD offers great potential for understanding the evolution of this lineage as well as for understanding vertebrate genome evolution and function more generally. However, to date there have been no systematic, genome-scale studies investigating which genes have been retained in duplicate in different teleost lineages. Evolutionary theory predicts that most gene duplicates would rapidly become nonfunctional and lost (Force et al. 1999). For example, gene retention after WGD in the pufferfishes Tetraodon nigroviridis and Takifugu rubripes may be as low as 1%–5% (Aparicio et al. 2002; Jaillon et al. 2004), although there has been debate regarding these estimates (Brunet et al. 2006). Analysis of individual gene families in the zebrafish Danio rerio suggested that up to 20% of gene duplicates may have been retained from the last teleost-specific WGD event (Postlethwait et al. 2000, 2004; Woods et al. 2005). Previous studies have not been able to determine the proportion of retained duplicate genes, because these studies either were limited in the number of families investigated or did not distinguish between gene duplicates derived by WGD and those derived by gene-specific duplication events thereafter. Nevertheless, vertebrate genomes have been shown to contain a large number of anciently duplicated genes, many of which are expected to have originated by WGD (Blomme et al. 2006; Brunet et al. 2006).

In this study, we performed comparative genome analyses, including gene order (synteny) and phylogenetic analyses, in D. rerio, T. rubripes, T. nigroviridis, medaka (Oryzias latipes), and stickleback (Gasterosteus aculeatus) to identify gene duplicates retained from the last, teleost-specific WGD. We show that a minimum of 3%–4% of protein-coding genes have been retained in duplicate in each of the five fishes. Almost all D. rerio duplicate gene pairs examined here differed in spatiotemporal expression during embryogenesis, suggesting significant changes in gene regulatory control after WGD. The observed expression data support a model of neofunctionalization (Lynch et al. 2001), with many duplicates having acquired novel expression domains after duplication, although the signatures indicative of neofunctionalization are also easier to detect than those of other evolutionary fates, such as subfunctionalization. A quarter of duplicate pairs encode proteins with different protein domain architecture and/or subcellular localization, suggesting functional differences between their protein products. Our assessment of changes in regulatory control versus changes in protein sequence indicates that WGD primarily led to increased specialization of gene regulatory control of development, although some functional variation in coding sequence was observed. These data shed new light on the impact of WGD on vertebrate genome evolution and on how these post-duplication genomes have evolved new functionalities.

Results

Identifying sister chromosome regions based on conserved gene order

Paralogs derived from WGD are expected to be located in chromosome regions of shared ancestry. To help identify such regions, we first determined fish–human gene homology relationships by exhaustive, “all-against-all,” sequence similarity searches, saving all matches with an E-value < 1 × 10⁻³, assuming that true homologs will have more-significant E-values (McLysaght et al. 2002; Christoffels et al. 2004). Some 20,300 fish proteins had a match in the human proteome using this threshold, while some 18,500 human proteins had a match in each of the five fish proteomes (Table 1). Approximately 11,200 fish–human protein pairs were reciprocal best hits in each of these comparisons (Table 1). Fish–human and human–fish unidirectional and reciprocal best hits were used to build the initial gene (positional) homology matrices and to identify collinear regions in the genomes of fish and human, followed by a search for additional homologs that map to the identified syntenic regions (Fig. 1). Due to the lack of a physical genomic map for T. rubripes, synteny analyses could not be performed in this species. Approximately 9300 positional homologs, namely, genes that share significant sequence similarity as well as conserved gene order and chromosome location across genomes, were identified per fish–human genome comparison (Table 1). Some 2100 human gene loci had positional homologs in two fish genome regions, suggesting that these fish loci are duplicate loci retained from WGD (Table 1). Synteny maps for each fish–human genome comparison illustrating the positional homologs identified in this study are available in Supplemental Figure 1. The syntenic map between T. nigroviridis and human corresponds largely with the one by Jaillon et al. (2004).
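The reciprocal-best-hit step described above can be illustrated with a minimal Python sketch. It assumes that search results are available as (query, subject, E-value) tuples; the identifiers and the function names `best_hits` and `reciprocal_best_hits` are hypothetical and are not taken from the authors' pipeline. Only the E-value cutoff comes from the text.

```python
E_VALUE_CUTOFF = 1e-3  # threshold quoted in the text


def best_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query to its lowest-E-value subject below the cutoff.

    hits: iterable of (query, subject, e_value) tuples (assumed input format).
    """
    best = {}
    for query, subject, e_value in hits:
        if e_value >= cutoff:
            continue  # discard matches that do not pass the threshold
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fish_vs_human, human_vs_fish):
    """Return sorted (fish, human) pairs that are each other's best hit."""
    fish_best = best_hits(fish_vs_human)
    human_best = best_hits(human_vs_fish)
    return sorted(
        (fish, human)
        for fish, human in fish_best.items()
        if human_best.get(human) == fish
    )


# Hypothetical identifiers, for illustration only.
fish_vs_human = [
    ("fishA", "humX", 1e-50),
    ("fishA", "humY", 1e-10),
    ("fishB", "humX", 1e-20),
    ("fishC", "humZ", 0.5),
]
human_vs_fish = [
    ("humX", "fishA", 1e-48),
    ("humY", "fishA", 1e-9),
    ("humZ", "fishC", 1e-2),
]
rbh_pairs = reciprocal_best_hits(fish_vs_human, human_vs_fish)
```

In this toy input, fishB has a unidirectional best hit to humX but is not a reciprocal best hit, which is the distinction the text draws between the two kinds of match.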

Table 1.
Comparative analyses of the genomes and proteomes of five teleost fishes
Figure 1.
Computational analysis pipeline for the identification of gene duplicates retained from whole-genome duplication in five teleost fish genomes. Synteny and phylogenetic analyses provide independent lines of evidence for origin by WGD. The level of support ...

Protein family tree topologies consistent with origin by WGD

As a second line of evidence for origin by WGD, we used phylogenetic approaches and the protein family trees of Ensembl Compara v48. Given that D. rerio is the most basal fish taxon in our comparisons (Metscher and Ahlberg 1999), we searched for duplication nodes that predate the split between D. rerio and the other fish taxa (Fig. 1). Approximately 20,000 proteins per fish species were represented in the Ensembl v48 protein families, corresponding to ~89% of all protein-coding genes in these taxa. Some 1500 gene pairs in each of the five fish genomes showed evidence for origin by WGD based on the topology of the protein family tree (Table 1). Combining the results from the synteny and phylogenetic analyses, in each of the five fish taxa, we identified some 680 duplicates, corresponding to 3%–4% of protein-coding loci, with strong support for origin by WGD (Table 2; for the complete list, see Supplemental Table 1). Duplicates showing strong support for origin by WGD in zebrafish were used in computational and experimental analyses to characterize their present functions.
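As a rough illustration of the tree-topology criterion, the following Python sketch scans a gene tree for duplication nodes whose descendants are all teleost genes and include D. rerio together with at least one other fish, which is one simple way to require that the duplication predates the split between D. rerio and the other taxa. The tuple-based tree encoding, the node labels "dup" and "spec", and the function names are assumptions made for this example; they do not reproduce the Ensembl Compara data model or the authors' exact procedure.

```python
TELEOSTS = {
    "D. rerio", "T. rubripes", "T. nigroviridis", "O. latipes", "G. aculeatus",
}


def leaf_species(node):
    """Return the set of species found at the leaves below a node.

    A leaf is a species name (str); an internal node is a
    (kind, left, right) tuple with kind "dup" or "spec" (assumed encoding).
    """
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaf_species(left) | leaf_species(right)


def wgd_candidate_nodes(node):
    """Collect duplication nodes consistent with a teleost-specific WGD.

    Returns a list of (species under left child, species under right child).
    """
    if isinstance(node, str):
        return []
    kind, left, right = node
    found = wgd_candidate_nodes(left) + wgd_candidate_nodes(right)
    species = leaf_species(node)
    if (
        kind == "dup"
        and species <= TELEOSTS      # no tetrapod or outgroup genes below
        and "D. rerio" in species    # the basal fish taxon is present
        and len(species) > 1         # and so is at least one other fish
    ):
        found.append((leaf_species(left), leaf_species(right)))
    return found


# A duplication shared by zebrafish and other fish, after the human split.
shared_dup = (
    "spec",
    "H. sapiens",
    ("dup",
     ("spec", "D. rerio", "O. latipes"),
     ("spec", "D. rerio", "G. aculeatus")),
)
# A duplication restricted to zebrafish, which should not be reported.
lineage_dup = (
    "spec",
    "H. sapiens",
    ("spec", ("dup", "D. rerio", "D. rerio"), "O. latipes"),
)
```

Pairs passing such a filter would then be intersected with the synteny-based candidates to obtain the set with strong support from both lines of evidence.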

Table 2.
Combined results from synteny and phylogenetic analyses

Presence of homologs in basal metazoan lineages and yeast

We expected that retained duplicates would be enriched for “vertebrate innovations,” namely, genes that arose only in early vertebrate evolution, and that such duplicates would underlie diversification in this lineage. To test this hypothesis, we looked for homologs in basal metazoa and yeast. Of the 12,533 protein families in fish, 754 contained a WGD-duplicate. Families containing WGD-duplicates were significantly larger than the average fish protein family (Table 3). We then determined how many of these families also contained sequences in Ciona intestinalis or Ciona savignyi, two invertebrate chordates; the insects Aedes aegypti, Anopheles gambiae, or Drosophila melanogaster; the worm Caenorhabditis elegans; or the yeast Saccharomyces cerevisiae. Protein families that contained a WGD-duplicate were more likely to also contain invertebrate chordate, insect, worm, or yeast sequences than families that did not contain a WGD-duplicate (Table 3). For example, 66% of protein families containing a WGD-duplicate also included a Ciona sequence, while only 39% of all fish-containing protein families contained Ciona sequences. The corresponding values for families containing a homolog in yeast were 23% and 15%, respectively. To exclude the possibility that WGD-duplicates preferentially belong to ancient metazoan protein families simply because WGD-duplicates tend to belong to families of greater size, we examined the relationship between family size and taxonomic representation among the sequences in the family. The average number of invertebrate sequences is approximately threefold greater in families that contain WGD-duplicates than in families that do not (Table 3). Nevertheless, the percentage of families containing invertebrate sequences was consistently higher for families containing a WGD-duplicate, suggesting that the probability of a protein family containing invertebrate sequences was not purely a matter of family size.

Table 3.
Comparison of family sizes for all Ensembl protein families containing fish sequences, with those containing a WGD-duplicate

Loss of duplicate gene copies in different teleost lineages and reciprocal gene loss

For each of the 754 protein families containing a WGD-duplicate, we inferred the number of gene losses since WGD along different teleost branches (Fig. 2). Given the uncertainties regarding the existence of a monophyletic clade “Smegmamorpha” (NCBI taxonomy vs. Metscher and Ahlberg 1999; Miya et al. 2003; Kawahara et al. 2008), gene losses were mapped onto two plausible tree topologies. Given that D. rerio is the most basal taxon in our comparisons, one of the D. rerio duplicate gene copies was arbitrarily designated the reference point against which the presence of the locus in the other fish taxa and in the sister clade was assessed. We marked all instances where a fish taxon or clade was inferred to have lost a copy of the locus, taking into account both plausible species tree topologies (Fig. 2). There was no evidence to suggest that different teleost species had retained a significantly different number of duplicate gene copies (based on studentized residuals, there were no outliers in the group). Among these gene losses, we identified 154 instances where two teleost species had lost alternative copies of the same locus, so-called “reciprocal gene losses” (not marked in Fig. 2). Of these 154 reciprocal gene loss events and assuming the existence of a clade Smegmamorpha, only 10 events were consistent with the loss having occurred at the time of species divergence (labeled RL in Fig. 2A). Assuming that O. latipes is basal to a clade containing G. aculeatus and the Tetraodontiformes, even fewer (seven) reciprocal gene loss events were consistent with the loss having occurred at the time of species divergence (labeled RL in Fig. 2B).
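The notion of a reciprocal gene loss can be made concrete with a short Python sketch. It assumes that, for each family, the retained copies of a duplicated locus are recorded per species as a subset of {"a", "b"}; this encoding, the family names, and the function name are hypothetical. The sketch only detects pairs of species that retained alternative single copies and does not attempt to place the loss on a species tree, as was done for Figure 2.

```python
from itertools import combinations


def reciprocal_losses(retained):
    """Find species pairs that kept alternative copies of a duplicated locus.

    retained: {family: {species: set of retained copies, subset of {"a", "b"}}}
    Returns a list of (family, species_1, species_2) tuples.
    """
    events = []
    for family, by_species in sorted(retained.items()):
        for sp1, sp2 in combinations(sorted(by_species), 2):
            copies1, copies2 = by_species[sp1], by_species[sp2]
            # Each species kept exactly one copy, and not the same one.
            if len(copies1) == 1 and len(copies2) == 1 and copies1 != copies2:
                events.append((family, sp1, sp2))
    return events


# Hypothetical families, for illustration only.
retained = {
    "fam1": {"D. rerio": {"a", "b"}, "O. latipes": {"a"}, "G. aculeatus": {"b"}},
    "fam2": {"D. rerio": {"a", "b"}, "O. latipes": {"a"}, "G. aculeatus": {"a"}},
}
events = reciprocal_losses(retained)
```

In the second toy family both species lost the same copy, so no reciprocal event is reported.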

Figure 2.
Inferred gene losses along different teleost lineages. For each of the 754 protein families containing a WGD-duplicate, one of the D. rerio duplicate gene copies was arbitrarily designated the reference point against which the presence of the locus in ...

Enrichment of Gene Ontology (GO) categories among WGD-duplicates

To test whether the probability of duplicate gene retention was related to gene function, we performed gene set enrichment analysis in D. rerio. Of the 615 duplicate gene pairs in this species, 674 loci had GO annotations in the Zebrafish Information Network (ZFIN) gene association file (Sprague et al. 2006), and we compared their GO terms to the complete set of GO annotations in ZFIN. Ninety-seven GO terms were significantly enriched among WGD-duplicates with adjusted P < 0.1 (Table 4; for a complete list of significant GO terms, see Supplemental Table 2). Enriched terms included, for example, “calcium ion transport,” “transcription” and “transcription factor activity,” “integrin-mediated signaling pathway,” and “growth factor activity” (Table 4; Supplemental Table 2). Genes annotated with function in “calcium ion transport” included, for example, the annexin genes anxa1a/anxa1b and anxa3a/anxa3b, the ATPase genes atp2a2a/atp2a2b, and the calbindin 2 genes calb2/calb2l. Many other genes retained in duplicate in zebrafish have annotated functions during development and include, for example, fzd8a and fzd8b and otx1 and otx1lb.
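The enrichment calculation can be sketched as follows. The statistical test is not named in this section, so the example assumes a one-sided hypergeometric test per GO term followed by Benjamini–Hochberg adjustment, a common choice for this kind of analysis; the function names and all counts used with them are illustrative and not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    k: annotated genes in the WGD-duplicate set
    n: size of the WGD-duplicate set
    K: annotated genes in the background
    N: size of the background
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Terms whose adjusted P-value falls below the chosen cutoff (0.1 in the text) would be reported as enriched.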

Table 4.
A selection of significant Gene Ontology terms identified by gene set enrichment analysis comparing the Gene Ontology annotations of Danio rerio genes retained from whole-genome duplication to the complete set of D. rerio GO annotations in ZFIN

Expression localization of WGD-duplicates

Having characterized the types of genes retained in duplicate from the last, teleost-specific WGD, we assessed the extent to which these duplicates have acquired different functional roles as indicated by differences in the spatial domains of expression. For D. rerio, the ZFIN gene expression database is a comprehensive public resource of mRNA in situ hybridization and RT-PCR expression data (Sprague et al. 2006). A total of 97 WGD-gene pairs have expression localization data available in ZFIN, encompassing 949 individual expression observations (Supplemental Table 3). We categorized the expression patterns of duplicate gene copies as being the same, partially overlapping, or different and as being spatially restricted or ubiquitous throughout the animal (Table 5; Supplemental Table 3). For example, the WGD-duplicates fzd8a and fzd8b, which encode wnt signaling receptors, localize to different anatomical regions during the segmentation and pharyngula stages (Table 5; Supplemental Table 3). Approximately 65% of all expression observations listed distinct or only partially overlapping expression localizations for duplicate gene copies, while 5% described the same expression localization with both copies being spatially restricted and 30% described nonspatially restricted expression for both gene copies. Of the 97 WGD-gene pairs, 87% differed in expression localization during at least one developmental stage, while only 13% shared the same expression domain during all developmental stages investigated thus far. The latter value may overestimate the true percentage of gene pairs with similar expression localizations, as some gene pairs may differ at developmental stages or under conditions not examined to date. Of these 13% of gene pairs with common expression localization, the majority (62%) were expressed in a nonspatially restricted manner throughout the animal.
The probability of detecting differences in expression localization was not affected by the number of expression observations, as five out of the six gene pairs with the greatest number of expression observations (>20) showed the same expression localization (data not shown) and no other trends in the data suggested such an effect.
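The categorization of expression observations can be sketched with simple set comparisons. The example assumes that each observation lists the anatomical structures in which each copy is expressed at a given stage; the structure names, stage labels, and function names are placeholders, not ZFIN records.

```python
def compare_domains(domains_a, domains_b):
    """Classify one observation as same, partially overlapping, or different."""
    a, b = set(domains_a), set(domains_b)
    if a == b:
        return "same"
    if a & b:
        return "partially overlapping"
    return "different"


def pair_differs(observations):
    """True if the two copies differ in localization at any stage.

    observations: {stage: (domains of copy A, domains of copy B)}
    """
    return any(
        compare_domains(a, b) != "same" for a, b in observations.values()
    )


# Placeholder observations for one hypothetical gene pair.
observations = {
    "blastula": ({"whole organism"}, {"whole organism"}),
    "segmentation": ({"forebrain", "eye"}, {"forebrain"}),
    "pharyngula": ({"eye"}, {"heart"}),
}
```

Counting pairs for which `pair_differs` is true corresponds to the per-pair summary in the text, whereas tallying `compare_domains` over all observations corresponds to the per-observation percentages.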

Table 5.
Expression localization of duplicate gene pairs derived from whole-genome duplication in Danio rerio

During the early stages of development (the zygote, cleavage, and blastula stages), 97% of WGD-duplicates shared the same expression localization. From gastrula to juvenile stages, the majority of WGD-duplicates (~73%) showed either distinct or only partially overlapping expression domains. The percentage of gene duplicates with distinct expression domains was greatest (~25%) during the segmentation and pharyngula stages (Supplemental Table 3).

Temporal expression of WGD-duplicates during embryogenesis

WGD-duplicates may differ not only in the spatial domain of expression but also in their temporal profile of expression during embryogenesis, especially since these loci are enriched for transcription factors and signaling genes with important functions in development. To assess this possibility, we examined two microarray time-course experiments of zebrafish embryogenesis (Mathavan et al. 2005; S Wilkins, M Kerr, M Köppen, B Gardiner, D Taylor, C Simons, M Landsberg, S Grimmond, C Heisenberg, and A Perkins, in prep.). Using stringent sequence comparisons, we identified 67 gene pairs that were represented by microarray probes that discriminated between duplicate gene copies. Only four of the 67 gene pairs showed significant coregulation across embryogenesis based on significant Pearson correlation coefficients and significance thresholds determined from random probe sets following the approach of Blanc and Wolfe (2004), while the remaining 63 showed differences in the temporal expression of the two gene copies (Fig. 3; Supplemental Table 4). Of these latter 63, five gene pairs showed inverse expression profiles with Pearson correlation coefficients exceeding the significance threshold (Fig. 3; Supplemental Table 4).
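A minimal sketch of this kind of analysis is given below. It computes the Pearson correlation between the time courses of two duplicate copies and compares it with cutoffs derived from randomly paired probes. The details of the published procedure (Blanc and Wolfe 2004) are not given in this section, so the percentile-based thresholds, the default of 1000 random pairs, the significance level, and the function names are assumptions made for illustration. The category labels follow those of Figure 3.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def random_pair_thresholds(profiles, n_pairs=1000, alpha=0.05, seed=0):
    """Lower and upper r cutoffs estimated from randomly paired probes."""
    rng = random.Random(seed)
    rs = sorted(pearson(*rng.sample(profiles, 2)) for _ in range(n_pairs))
    lower = rs[int(alpha / 2 * n_pairs)]
    upper = rs[int((1 - alpha / 2) * n_pairs) - 1]
    return lower, upper


def classify_pair(r, lower, upper):
    """Assign a duplicate pair to one of the three categories of Figure 3."""
    if r > upper:
        return "coregulated"
    if r < lower:
        return "inversely coregulated"
    return "temporal differences"
```

In use, `profiles` would hold the expression time course of every probe on the array, and each duplicate pair would be classified from its own correlation coefficient against the two cutoffs.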

Figure 3.
Temporal expression of zebrafish duplicated genes. Expression profiles were grouped into three categories (coregulated, inversely coregulated, temporal differences) based on their Pearson correlation coefficients (r) and statistical significance thresholds ...

Spatiotemporal expression of WGD-duplicates

For two gene pairs with distinct temporal expression profiles (zgc:73223 and g12; fbxl14 and fbxl14a), no spatial expression data were available in ZFIN. To assess whether these gene pairs also differed in their spatial domains of expression, we performed in situ hybridization experiments using established methods (Supplemental Methods; Wilkins et al. 2008). Both gene pairs showed evidence for spatial differences in expression at 24 hours post-fertilization (hpf) (Supplemental Fig. 2), although further experiments are required to fully characterize the expression domains of these loci. For 16 additional gene pairs with temporal expression data, spatial expression data were available in ZFIN. In all 18 cases in which both sources of expression data were available, gene pairs differed in temporal and/or spatial expression, with 15 gene pairs differing in both (Supplemental Tables 3, 4).

Comparison of expression domains to those of the single mammalian ortholog

To assess how the expression of duplicated genes changed after WGD, we compared their spatial expression domains in zebrafish to those of the single mouse ortholog, assuming that the expression domains of the mouse ortholog reflect those of the vertebrate ancestor before the divergence of the tetrapod and teleost lineage. We currently lack expression data for a ray-finned fish that diverged before the WGD, which would allow a more direct comparison. Independent evolution after the divergence of lobe- and ray-finned fishes and before the teleost WGD may therefore also have contributed to expression variation between mouse and fish, independent of the effects of the teleost-specific WGD. To make this comparison, we chose the time when organogenesis is essentially completed, which in zebrafish occurs between 24 and 48 hpf and in mouse between day 9.5 and 16.5 post-coitum. A total of 47 duplicate gene pairs had corresponding in situ hybridization data in mouse available in the Mouse Genome Informatics (MGI) database (Table 6; Supplemental Table 5; Bult et al. 2008). We excluded nine comparisons from the analysis because transcripts from these loci were expressed in a nonspatially restricted manner in zebrafish or mouse, and we could not exclude the possibility that this was an artefact of poor probe specificity. Similarly, where different sources reported spatially restricted versus ubiquitous expression throughout the whole embryo, the spatially restricted expression was used to represent the locus, assuming again that ubiquitous expression was an artefact of poor probe specificity. Of the remaining 38 gene pairs, 20 (53%) had novel expression domains not found in the mouse ortholog, supporting a model of neofunctionalization (Table 6). Four gene pairs (11%) had expression domains that were subsets of those of mouse, supporting a model of subfunctionalization, while 10 gene pairs (26%) showed evidence to support both neo- and subfunctionalization.
Finally, four zebrafish gene pairs (11%) showed the same expression pattern during this developmental time point, potentially indicating functional redundancy.
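One possible reading of these categories can be expressed as set operations on expression domains. In the sketch below, a domain present in either zebrafish copy but absent from mouse counts as novel, and a mouse domain missing from at least one copy counts as partitioned or lost. These decision rules, the order in which they are applied, the tissue names, and the function name are assumptions made for illustration and may not match the criteria applied to Table 6.

```python
def classify_fate(copy_a, copy_b, mouse):
    """Classify a zebrafish duplicate pair against its single mouse ortholog.

    Each argument is a collection of expression domains (tissue names).
    """
    a, b, m = set(copy_a), set(copy_b), set(mouse)
    if a == b:
        return "same pattern (possible redundancy)"
    novel = bool((a | b) - m)              # domains not seen in mouse
    partitioned = bool(m - a) or bool(m - b)  # ancestral domain missing in a copy
    if novel and partitioned:
        return "neo- and subfunctionalization"
    if novel:
        return "neofunctionalization"
    # With a != b and no novel domains, at least one copy must lack
    # a mouse domain, so this is the only remaining case.
    return "subfunctionalization"


# Placeholder tissue names, for illustration only.
mouse_domains = {"brain", "eye", "heart"}
```

For example, copies expressed in {"brain", "eye"} and {"heart"} would be scored as subfunctionalization under these rules, whereas a copy that additionally appears in a tissue absent from mouse would shift the pair into one of the neofunctionalization categories.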

Table 6.
Comparison of spatial expression domains of 38 zebrafish duplicated genes and their single mouse ortholog

Domain architecture of proteins encoded by WGD-duplicates

Besides changes in expression and hence regulatory control, important functional changes may also occur at the protein-coding sequence level. To assess this possibility, we first analyzed the protein domain architecture of proteins encoded by WGD-duplicates by searching protein sequences against a database of known protein domain motifs (Pfam-A). Using this strategy, 493 of the 615 gene pairs in zebrafish could be annotated with domain information (Fig. 4). Where the locus is known to be subject to alternative splicing, we compared the domain architecture of all alternatively spliced products. In the majority of cases (422), both duplicate loci encoded proteins with the same protein domain architecture. We found 39 instances where the protein products differed in the types of domains and 32 where they differed in the number of protein domains, but no examples where proteins differed in the arrangement of domains (Fig. 4). Given the limited sequence coverage of the zebrafish transcriptome, for the 71 gene pairs that showed differences in domain architecture, we investigated whether there was evidence for the “missing” domains at the genomic sequence level, even if the transcripts encoding these domains have not yet been observed. In seven cases, the genomic sequence had the potential to encode the missing domains, while in the remaining 64 cases we found no evidence of the missing domains at the genomic sequence level (Fig. 4).

Figure 4.
Comparison of the protein domain architecture of zebrafish WGD-duplicates and those of their single mouse ortholog. (A) A total of 493 zebrafish gene pairs could be annotated with protein domain information when searched against the Pfam-A database of ...

Comparing the domain architecture of zebrafish duplicated genes to those of the single mouse ortholog and assuming that the domain architecture of the mouse ortholog represents that of the ancestral locus prior to WGD, we found 31 cases (48%) in which one of the zebrafish duplicated gene loci had lost a functional domain, 14 cases (22%) in which it had gained a novel protein domain, and 18 cases (28%) with a more complex history of domain gain and loss (Fig. 4). In one other case, the two zebrafish loci encoded complementary domain subsets: zgc:158388 encodes two alternatively spliced proteins with the domain architecture [PH,ArfGap] and ENSDARG00000039386 encodes a single protein with domain architecture [ArfGap,Ank,Ank,Ank,Ank], while the mouse ortholog Acap3 encodes a single protein of domain architecture [PH,ArfGap,Ank,Ank,Ank,Ank], potentially indicating partitioning of the ancestral gene functions among the two zebrafish gene copies. The complete protein domain annotation in mouse and zebrafish is available as Supplemental Table 6. Given the long time since WGD and the possibility that orthologs may diverge in protein domain architecture for reasons other than the effects of WGD, we also compared the protein domain architecture of one-to-one zebrafish–mouse orthologs. In total, we identified 5878 zebrafish–mouse one-to-one orthologs of which 8.8% differed in the types of domains they encoded, while another 6.5% differed in the number of protein domains they encoded. The remaining 84.7% showed the same protein domain architecture in mouse and zebrafish. In contrast, 11.1% of WGD-duplicates differed from the single mouse ortholog in the types of domains encoded, while 11.7% differed in the number of protein domains. 
Only 77.2% of WGD-duplicates had the same protein domain architecture as the single mouse ortholog, indicating that more WGD-duplicates have diverged in protein domain architecture than have single-copy genes (χ² = 34.3696, degrees of freedom [df] = 1, P = 4.558 × 10⁻⁹).
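The architecture comparisons described above can be sketched in a few lines of Python, treating each architecture as an ordered list of Pfam domain names. The three difference categories (types, number, arrangement) follow the text; the function names and the exact precedence of the checks are assumptions. The example uses the Acap3 architectures quoted above.

```python
from collections import Counter


def compare_architecture(arch_a, arch_b):
    """Classify how two ordered domain architectures differ."""
    if list(arch_a) == list(arch_b):
        return "same"
    if set(arch_a) != set(arch_b):
        return "different domain types"
    if Counter(arch_a) != Counter(arch_b):
        return "different domain number"
    return "different arrangement"


def complementary_subsets(arch_a, arch_b, ancestral):
    """True if each copy lacks some ancestral domain type but together
    the two copies cover all ancestral domain types."""
    a, b, anc = set(arch_a), set(arch_b), set(ancestral)
    return a < anc and b < anc and (a | b) == anc


# Architectures as quoted in the text for the Acap3 example.
zgc_158388 = ["PH", "ArfGap"]
ensdarg_39386 = ["ArfGap", "Ank", "Ank", "Ank", "Ank"]
mouse_acap3 = ["PH", "ArfGap", "Ank", "Ank", "Ank", "Ank"]
```

Applied to this example, the two zebrafish copies differ in domain types and together cover the domain types of the mouse protein, which matches the partitioning interpretation given in the text.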

Subcellular localization of proteins encoded by WGD-duplicates

In addition to changes in domain architecture, changes in the subcellular localization of proteins may bear important functional consequences. For example, secreted proteins may act as messengers, while membrane-associated proteins may function as channels or receptors. To predict the subcellular localization of proteins encoded by WGD-duplicates, we used computational methods and the presence or absence of signal peptides and transmembrane domains. A total of 529 WGD-gene pairs could be annotated with information regarding the subcellular localization of the encoded proteins, and in 73 cases, subcellular localization differed between proteins encoded by duplicate gene loci (Fig. 5; Supplemental Table 7). For example, sema3fa (ENSDARG00000011163) encodes a type II membrane protein, while its WGD-duplicate sema3fb (ENSDARG00000055373) encodes a soluble intracellular protein (Supplemental Table 7). Semaphorins are important receptors and signaling molecules in neural differentiation (Pasterkamp and Kolodkin 2003; Yu and Moens 2005). The different subcellular localization of sema3fa and sema3fb suggests that the proteins encoded by this duplicate gene pair carry out fundamentally different functions.
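A simplified decision rule of this kind is sketched below. It assumes that a signal-peptide call and a count of predicted transmembrane helices are already available for each protein (for example, from tools such as SignalP and TMHMM) and combines them into three coarse classes. The rule, the class labels, and the function names are illustrative assumptions; the finer categories used in the study, such as type II membrane proteins, are not reproduced here.

```python
def predict_localization(has_signal_peptide, n_tm_helices):
    """Assign a coarse localization class from two predicted features."""
    if n_tm_helices > 0:
        return "membrane"
    if has_signal_peptide:
        return "secreted"
    return "intracellular (soluble)"


def localization_differs(features_a, features_b):
    """Compare two duplicate copies.

    Each argument is a (has_signal_peptide, n_tm_helices) tuple.
    """
    return predict_localization(*features_a) != predict_localization(*features_b)
```

A pair in which one copy has a predicted transmembrane helix and the other has neither feature would be counted among the pairs with differing localization under this rule.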

Figure 5.
Comparison of the subcellular localization of proteins encoded by zebrafish WGD-duplicates. Subcellular localization was assessed using computational methods, SignalP (Bendtsen et al. 2004) and TMHMM (Krogh et al. 2001). 14% of WGD-duplicates encode proteins ...

Changes in regulatory control versus changes in protein-coding sequence

Finally, we investigated the relative importance of changes in regulatory control as indicated by differences in the spatiotemporal expression of WGD-duplicates versus changes in protein function as indicated by differences in the domain architecture or subcellular localization of proteins encoded by WGD-duplicates. In summary, 93% of the 138 gene pairs investigated differed in spatial and/or temporal expression (Table 7). In contrast, only 24% of 545 gene pairs encoded proteins that differed in domain architecture and/or subcellular localization (Table 7). Of the 134 gene pairs for which we had expression and protein domain/localization data, 24 gene pairs (18%) differed in both. This proportion is slightly lower than one would expect if gene pairs differed in regulatory control and protein domain/localization according to their individual proportions (0.93 × 0.24 = 0.22, or 22%), indicating that after some 400 Myr of evolution, duplicates retained after WGD differ either in regulatory control or in protein function, but not usually in both. Comparison to the single mouse ortholog showed that many zebrafish gene pairs have acquired novel expression localizations and protein domain architectures, potentially indicating that neofunctionalization has been a more common evolutionary fate than subfunctionalization or redundancy. Our differential ability to identify these alternate evolutionary fates may, however, have biased our results in favor of neofunctionalization.
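The comparison between the observed and expected proportions can be reproduced directly from the figures quoted above. The short Python sketch below uses only those numbers; the variable names are chosen for this example.

```python
# Proportions and counts quoted in the text.
p_expression = 0.93   # pairs differing in spatial and/or temporal expression
p_protein = 0.24      # pairs differing in domain architecture and/or localization
n_both_observed = 24  # pairs differing in both
n_with_both_data = 134

# Expected proportion differing in both if the two kinds of change
# occurred independently of each other.
expected_both = p_expression * p_protein

# Observed proportion differing in both.
observed_both = n_both_observed / n_with_both_data

print(f"expected under independence: {expected_both:.2f}")
print(f"observed: {observed_both:.2f}")
```

The two printed values correspond to the 22% and 18% given in the text.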

Table 7.
Number of zebrafish gene pairs derived from WGD showing changes in regulatory control versus changes in protein function

Discussion

The functional significance of duplicate gene loci, including those derived by WGD in teleosts, has received much attention recently (MacCarthy and Bergman 2007; Wapinski et al. 2007; Conant and Wolfe 2008; Kleinjan et al. 2008). Several models have been proposed for the evolution of genes after duplication, including models describing the evolution of new functions (neofunctionalization), the partitioning of ancestral functions (subfunctionalization), or a combination of both (subneofunctionalization) (Lynch and Force 2000; He and Zhang 2005; Roth et al. 2007). Most duplicates are assumed to diverge in function, thus avoiding genetic redundancy (Blanc and Wolfe 2004; Sharma et al. 2006; Kleinjan et al. 2008), but some duplicates can be subject to selection for increased gene dosage (Conant and Wolfe 2008). Functional divergence may occur through changes in the substrate of an enzyme, the binding partners of a protein or the response to protein binding, the subcellular localization, and/or the spatial or temporal expression of the locus. Despite the significance of studying the functional divergence of WGD-duplicates for our understanding of genome and gene function evolution, WGD-duplicates in vertebrates are still poorly characterized.

Teleost fishes provide a unique opportunity to investigate the evolution of gene function after genome duplication in vertebrates, as five genomes are available for comparative analyses, extensive expression data are available for D. rerio, and both D. rerio and O. latipes are important model organisms for human developmental biology. In D. rerio a number of duplicated developmental genes—such as eng1a and eng1b, identified as WGD-duplicates in this study, and sox9a and sox9b—have already been shown to have partitioned subfunctions compared with their human orthologs (Force et al. 1999; Cresko et al. 2003). However, it remains unknown whether subfunction-partitioning is a common mechanism underlying the retention of gene duplicates in teleosts.

Thus, we set out to identify as many WGD-duplicates as possible to allow investigation of their functional evolution after duplication. We took special care to distinguish WGD-duplicates from paralogs of other origin, because, based on evidence from yeast, such duplicates are subject to different evolutionary constraints (Wapinski et al. 2007). Given the long time since the teleost-specific WGD, teleost genomes are expected to have undergone multiple rearrangements (Kasahara et al. 2007) and are likely to have lost many anciently duplicated loci. To overcome these challenges, we have used two independent approaches, based on conserved gene order (synteny) and phylogenetic inference. Using gene order information from multiple related genomes, following an approach similar to Simillion et al. (2004) and Van de Peer (2004), we were able to map approximately half of all protein-coding genes in fish to collinear chromosomal regions with human; using a single fish genome recovered significantly fewer collinear genomic regions and 8%–14% fewer inter-genome positional homologs (data not shown). Paralogous gene copies that mapped to sister chromosome regions in fish were retrieved as potential WGD-duplicates.

However, given the possibility that paralogs may locate to chromosomal sister regions by a process other than shared ancestry, for example, due to genomic rearrangements, it was important to test the co-orthology of potential fish WGD-duplicates using phylogenetic methods. To exclude gene duplicates whose tree topologies were consistent with origin by WGD, but which originated by gene-specific duplication after WGD instead, we required support for origin by WGD in both the synteny and phylogenetic analyses. This strategy identified some 680 gene pairs, or 3%–4% of protein-coding gene loci in each of the five fish genomes, and represents the first genome-scale estimate of the number of gene duplicates retained from this ancient WGD event. Previous studies were based either on the analysis of individual gene families or on the presence of paralogs of any type (Aparicio et al. 2002; Jaillon et al. 2004). Our findings represent a minimum estimate of the true number of duplicate genes retained after WGD, as our analyses are likely biased toward identifying more-conserved gene pairs. The quality of current teleost genome annotations also likely affects the number of WGD-duplicates identified in such analyses. Complex genomic rearrangements could have precluded identification of collinear regions for some duplicates, and errors with phylogenetic tree inference could have precluded identification of co-orthology relationships for others. For example, we could not recover synteny regions for sox9a and sox9b despite published literature suggesting that these paralogs have been derived from WGD (Cresko et al. 2003). Thus, we expect that our analyses will underestimate the true number of gene duplicates retained from the last, teleost-specific WGD. However, the approach we have implemented here identifies substantially more WGD-gene pairs than has any previous study, suggesting that we have reduced any potential bias associated with these limitations as much as currently possible.

Acknowledging these limitations, we found no significant differences in the number of gene duplicates retained from WGD in the five teleost species. We mapped reciprocal gene losses describing the loss of alternative copies in different teleost lineages and found that very few were consistent with the loss having occurred at the time of species divergence. These results contrast with those of Sémon and Wolfe (2007), who found many instances of reciprocal losses between D. rerio and T. nigroviridis and concluded that reciprocal gene loss was an important driver in the radiation of teleosts. Our analysis included three additional teleost genomes, and these additional data allowed us to assign a relative time to the inferred gene losses and thus test whether the time of the loss coincided with species divergence. Our results show that few of these losses coincided with the time of species divergence so that most losses are unlikely to have contributed to reproductive isolation in early teleosts. Nevertheless, simple calculations show that a small number of reciprocal gene losses can significantly reduce “hybrid” fitness and could thus lead to reproductive isolation between incipient species. For example, 25% of the offspring (F1) of parents with alternative copies of a WGD-duplicated locus would not inherit any copy of the locus. With each additional reciprocal gene loss, F1 fitness decreases further. It is therefore possible that even just a few of these losses could have been sufficient to drive reproductive isolation and speciation, especially if the loci involved were dosage-sensitive. Nevertheless, our data suggest that the more significant contribution of WGD for teleost radiation was the capacity for functional divergence of post-duplication genomes, potentially allowing exploration of new ecological niches and reducing the risk of extinction as suggested by Crow and Wagner (2006).
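The cumulative effect of several reciprocal losses can be illustrated with a simple calculation. The sketch below takes the 25% per-locus figure from the text as given and assumes, purely for illustration, that the affected loci are unlinked and assort independently; the function name is hypothetical and this is not the calculation performed in the study.

```python
def fraction_with_all_loci(n_reciprocal_losses, p_missing_per_locus=0.25):
    """Expected fraction retaining at least one copy of every affected locus.

    Assumptions (not from the study): loci are unlinked and assort
    independently; the per-locus figure of 25% is taken from the text.
    """
    if n_reciprocal_losses < 0:
        raise ValueError("number of reciprocal losses must be non-negative")
    return (1.0 - p_missing_per_locus) ** n_reciprocal_losses


# The fraction shrinks geometrically as reciprocal losses accumulate.
for k in (1, 2, 5, 10):
    print(k, f"{fraction_with_all_loci(k):.4f}")
```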

Here, we show that the probability for retention after WGD is correlated with gene function as retained duplicates are enriched for function in signaling, transcription, calcium ion transport, and metabolism. A general pattern emerges where genes with function in transcription and regulatory control are preferentially retained after WGD in yeast, plants and vertebrates (Blanc and Wolfe 2004; Davis and Petrov 2005; Blomme et al. 2006). It is possible that these types of genes have biochemical features that make them more amenable to evolving novel functional roles as proposed by Conant and Wolfe (2008). Alternatively, duplication of other types of genes, so-called “duplication-resistant genes,” may be prohibited because of immediate detrimental effects of duplication (Conant and Wolfe 2008). We further show that retained duplicate genes are enriched for ancient metazoan or eukaryotic genes, indicating that WGD did not capitalize on vertebrate-specific innovations. Instead, many of the families that contain WGD-duplicates in teleosts already existed during early metazoan evolution, and these families have expanded in vertebrates not only via preferential retention after WGD but also via gene-specific duplications thereafter. These ancient families are thus particularly amenable to duplication, but the characteristics underlying their “duplicability” are yet to be determined.

To understand the functional significance of WGD-duplicates, we first examined the spatiotemporal expression of duplicated gene loci in D. rerio and found that almost all gene pairs examined have changed regulatory control since duplication. We also examined the protein products of WGD-duplicates in D. rerio. About a quarter of WGD-duplicates differed in domain architecture and/or subcellular localization. We found that duplicates tend to differ in regulatory control or in domain architecture/protein subcellular localization, but not in both. At present, it is not clear why functional adaptation should happen at the regulatory or at the protein level, but not at both. It is possible, however, that these results are biased by our inability to fully characterize the protein functions of WGD-duplicates. Comparison of the spatial expression domains in D. rerio to those of the single mouse ortholog suggests that many D. rerio duplicates have acquired novel expression domains and that, at the regulatory level, neofunctionalization is more common than subfunctionalization. Similarly, comparing the protein domain architecture of D. rerio WGD-duplicates to those of the single mouse ortholog, we found many examples describing the acquisition of a novel protein domain, the loss of a domain in one gene copy, and a complex history of domain loss and gain, but only one example of the loss of complementary domains, potentially indicating that, at the protein level, neofunctionalization may also be the prominent fate for the evolution of WGD-duplicates. However, as discussed by Huminiecki and Wolfe (2004), it is more difficult to meet the conditions to identify subfunctionalization than those for neofunctionalization, potentially introducing an ascertainment bias that would compromise accurate quantification of the relative rates of neo- versus subfunctionalization. In particular, proteins that encode complementary domain subsets may not have been identified as co-orthologs in the phylogenetic analyses due to problems with family classification or sequence alignment. At the regulatory level, we may have failed to recognize gene pairs that were preserved by quantitative subfunctionalization, or for stoichiometric reasons, due to insufficient sensitivity of the in situ and microarray analyses. In addition, sub- and neofunctionalization are not necessarily mutually exclusive, and functional divergence can also occur independent of duplication, as demonstrated by the differences in spatial expression of human–mouse one-to-one orthologs (Huminiecki and Wolfe 2004). Furthermore, the mechanism underlying the initial retention of the duplicate gene copy, such as subfunctionalization, may differ from the processes that have shaped the evolution of the gene pair thereafter, such as acquisition of novel functions. We assumed here that the spatial expression domains and domain architecture of the mouse orthologs can be used to infer those of the ancestral loci before the teleost-specific WGD, but we caution that interpreting current expression patterns and protein domain architectures in light of their past evolution can be misleading.

Our analyses demonstrate that most WGD-duplicates have acquired changes in function, either via changes in regulatory control or via changes in protein function. In both cases we found more examples supporting a model of neofunctionalization, or the evolution of novel functionalities, than one of subfunctionalization, or the partitioning of ancestral functions. We also found that changes at the regulatory level were much more common than changes at the protein level. To our knowledge, this is the first study to investigate, on a large scale and in any vertebrate, the spatiotemporal expression and the protein products of WGD-duplicates. Our aim was to develop a more general understanding of the functional significance of duplicates retained from WGD and of the evolutionary opportunities that polyploidization offers. We have presented here a significant attempt to identify gene duplicates retained from WGD in five teleost genomes as well as information illuminating their functional evolution after duplication. We conclude that, in the teleost lineage, the most significant contribution of WGD has been to allow more-specialized regulatory control of development, typically via the acquisition of novel spatial expression domains. These results contribute to our understanding of vertebrate evolution at both the gene and genome level. Finally, a deeper understanding of the differences in gene regulatory control underlying mammals and teleosts, such as we describe here, will be fundamental to the expanding utility of D. rerio and O. latipes as model organisms for understanding human development and disease.

Methods

Gen(om)e annotations

The sequences and genomic locations of all protein-coding loci for each of the five fish genomes and human were downloaded from Ensembl v48. These included 21,322 protein-coding loci in D. rerio (assembly version 7; K Howe, pers. comm.), 20,121 in O. latipes (v1), 20,791 in G. aculeatus (v1; F Di Palma and K Lindblad-Toh, pers. comm.), 21,880 in T. rubripes (v7), and 27,919 in T. nigroviridis (v4). Eighty-seven manually curated HOX and cytokine genes were excluded from the T. nigroviridis data set. For human, we excluded genes on alternative genome assemblies and mitochondrial genes, resulting in a total of 22,762 protein-coding loci (assembly NCBI 36). The genebuilds for human, D. rerio, O. latipes, and G. aculeatus were produced by Ensembl; that for T. nigroviridis by Genoscope; and that for T. rubripes by the Fugu Genome Project.

Similarity searches

All-against-all proteome comparisons between human and fish were carried out using the SSEARCH sequence alignment tool (Pearson 1995). Where multiple protein translations exist for a single locus, only the longest protein translation was used in the analysis. To avoid exclusion of potential homologs, all protein pairs with E-values < 1 × 10^-3 were saved.
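The two filtering steps described above (one translation per locus, then an E-value cutoff) might be sketched as follows. The record types and function names are hypothetical, and the sketch operates on already-parsed results rather than on SSEARCH output, whose format is not described in the text.

```python
from collections import namedtuple

# Hypothetical record types for parsed input; not an SSEARCH format.
Protein = namedtuple("Protein", ["locus_id", "protein_id", "sequence"])
Hit = namedtuple("Hit", ["query", "subject", "evalue"])

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text


def longest_translation_per_locus(proteins):
    """Keep only the longest protein translation for each locus."""
    best = {}
    for protein in proteins:
        current = best.get(protein.locus_id)
        if current is None or len(protein.sequence) > len(current.sequence):
            best[protein.locus_id] = protein
    return best


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Retain protein pairs whose E-value is below the cutoff."""
    return [hit for hit in hits if hit.evalue < cutoff]


proteins = [
    Protein("locus1", "p1a", "MKT"),
    Protein("locus1", "p1b", "MKTAYIAK"),
    Protein("locus2", "p2a", "MSS"),
]
representatives = longest_translation_per_locus(proteins)
print({locus: p.protein_id for locus, p in representatives.items()})
```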

Synteny mapping between fish and human genomes

Identification of syntenic or collinear genomic regions in fish and human was performed using the i-ADHoRe synteny mapping tool (Vandepoele et al. 2002; Simillion et al. 2004) using four fish genomes (excluding T. rubripes) and the human genome as input. The initial gene (positional) homology matrices were built using the genomic location of unidirectional and reciprocal best hits identified by the proteome comparisons described above. Elements that were homologous in the matrix across genomes are referred to as anchor points. We used the following parameter settings in the i-ADHoRe analyses: gap size (40), cluster gap (150), q-value (0.8), probability cutoff (0.001), and minimum number of anchor points (3). This analysis provided a series of genomic anchor points, namely, fish–human positional homologs with preserved gene order and chromosome location delineating syntenic regions between human and fish.
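The homology matrices rest on unidirectional and reciprocal best hits. A minimal sketch of how such hits could be derived from two sets of search results is given below; it is not part of i-ADHoRe, and the input layout (tuples of query, subject, and E-value) and the function names are assumptions made for illustration.

```python
def best_hits(hits):
    """Map each query to the subject with the lowest E-value.

    hits: iterable of (query, subject, evalue) tuples (assumed layout).
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def classify_best_hits(fish_vs_human, human_vs_fish):
    """Return (reciprocal, unidirectional) sets of (fish, human) pairs."""
    fish_best = best_hits(fish_vs_human)
    human_best = best_hits(human_vs_fish)
    forward = {(fish, human) for fish, human in fish_best.items()}
    backward = {(fish, human) for human, fish in human_best.items()}
    reciprocal = forward & backward
    unidirectional = (forward | backward) - reciprocal
    return reciprocal, unidirectional


# Toy data with made-up identifiers.
fish_vs_human = [("f1", "h1", 1e-50), ("f1", "h2", 1e-10), ("f2", "h1", 1e-20)]
human_vs_fish = [("h1", "f1", 1e-48), ("h2", "f1", 1e-9)]
reciprocal, unidirectional = classify_best_hits(fish_vs_human, human_vs_fish)
print(sorted(reciprocal), sorted(unidirectional))
```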

We also tried other criteria for inferring protein homologies and building the gene homology matrices. For example, in a different analysis, we selected all those fish–human protein matches with an E-value below 1 × 10^-5, alignment coverage of greater than 50%, and sequence identity of greater than 20%. Based on published literature (McLysaght et al. 2002; Christoffels et al. 2004) and the observed distribution of values for alignment coverage and percentage identity across the protein space, these values are conservative estimates for retrieving potential homologs between these species. However, when using putative homologs defined in this manner to build the initial gene homology matrices, we obtained fewer genomic anchor points than when using unidirectional and reciprocal best hits only. In addition, we performed synteny analyses using only pairwise comparisons between one fish genome and the human genome. These pairwise comparisons retrieved fewer genomic anchors than the analysis based on the inclusion of all four fish genomes. Thus, the use of four fish genomes provided additional information for recovering transient and distant collinearity relationships that could not be retrieved when performing only pairwise genome comparisons with human, supporting similar findings from plant genome comparisons (Simillion et al. 2004).

Identifying additional anchors between fish and human genomes

For the comparison of synteny between T. nigroviridis and human, we were able to benchmark our analyses against previously published chromosomal sister regions (Jaillon et al. 2004). This comparison showed that we had identified fewer positional homologs than did Jaillon et al. (2004), potentially due to the parameter settings used to build the gene homology matrices being too strict. For this reason, we searched for additional positional homologs, using the known anchors identified by i-ADHoRe as a guide. For each protein-coding human locus, we determined whether i-ADHoRe had identified an anchor within a range of 20 genes along the human chromosome location in question. If so, we took these known anchors and searched the fish chromosome for potential homologs within a neighborhood of 40 genes along the fish chromosome location of the known anchor. We repeated this process for each of the fish–human comparisons. All protein pairs with an E-value below 1 × 10^-5 were used as putative homologs in the search for additional syntenic anchors. This procedure increased the number of genomic anchors by 10%–20% in each of the fish–human genome comparisons.
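The neighborhood search described above might be sketched as follows. The data layout and function name are hypothetical, and the sketch interprets the window sizes as a maximum distance in gene ranks on either side of the anchor, which is an assumption; the text does not specify whether the ranges are one-sided or symmetric.

```python
HUMAN_WINDOW = 20  # genes along the human chromosome (from the text)
FISH_WINDOW = 40   # genes along the fish chromosome (from the text)


def find_additional_anchors(human_genes, fish_genes, known_anchors, candidates):
    """Search for extra fish-human positional homologs near known anchors.

    human_genes, fish_genes: dict gene_id -> (chromosome, rank on chromosome)
    known_anchors: set of (human_id, fish_id) pairs from the synteny mapping
    candidates: dict human_id -> fish_ids already passing the E-value cutoff
    """
    new_anchors = set()
    for human_id, fish_ids in candidates.items():
        h_chrom, h_rank = human_genes[human_id]
        for anchor_human, anchor_fish in known_anchors:
            a_chrom, a_rank = human_genes[anchor_human]
            # The anchor must lie close to the human locus in question.
            if a_chrom != h_chrom or abs(a_rank - h_rank) > HUMAN_WINDOW:
                continue
            af_chrom, af_rank = fish_genes[anchor_fish]
            for fish_id in fish_ids:
                f_chrom, f_rank = fish_genes[fish_id]
                near = f_chrom == af_chrom and abs(f_rank - af_rank) <= FISH_WINDOW
                if near and (human_id, fish_id) not in known_anchors:
                    new_anchors.add((human_id, fish_id))
    return new_anchors


# Toy data with made-up identifiers and positions.
human_genes = {"h1": ("1", 10), "h2": ("1", 15), "h3": ("2", 5)}
fish_genes = {"f1": ("A", 100), "f2": ("A", 120), "f3": ("B", 100), "f4": ("A", 300)}
known = {("h1", "f1")}
candidates = {"h2": ["f2", "f3", "f4"], "h3": ["f2"]}
extra = find_additional_anchors(human_genes, fish_genes, known, candidates)
print(sorted(extra))
```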

Where a single human locus had two or more genomic anchors within the fish genome of interest, we extracted these fish–human protein triplets. These fish proteins shared significant sequence similarity with the human protein and were located within genomic regions that shared at least three fish–human homologs in close proximity, suggesting that the two fish proteins were derived from a shared ancestral chromosome by duplication, most likely as a result of the WGD. Many of these human–fish protein pairs were also unidirectional or reciprocal best hits (Supplemental Table 1). This procedure provided us with “synteny predictions” for duplicates retained from WGD (Fig. 1).

Establishing orthology/paralogy relationships using protein family trees

As an independent line of evidence, we used phylogenetic data, in particular the protein family trees of the Ensembl Compara pipeline v48 (available at ftp://ftp.ensembl.org/pub/release-48/emf/ensembl_compara/homologies/), to identify duplicate fish genes likely derived by WGD. We searched all protein family trees for the presence of an ancient duplication node at the base of teleosts, a topology consistent with origin by WGD. We excluded any trees from the analysis that did not contain any tetrapod or D. rerio sequences, as well as any trees that did not contain multiple protein sequences of at least one fish taxon. This procedure left us with 4453 of the 27,308 protein family trees. Large families were split into subtrees based on the presence of tetrapod and fish sequences, resulting in a total of 11,488 subtrees. Of these 11,488 subtrees, 522 subtrees did not have a D. rerio sequence and 7891 subtrees had only a single sequence per fish taxon (no duplicates). Excluding these subtrees, 3075 subtrees were left in the analysis. In each of these subtrees, we looked for an ancient duplication node basal to the split of D. rerio and the other teleosts. This strategy allowed us to identify instances where a fish taxon contained sequences in both clades derived from the ancient duplication. These sequences were considered to show a tree topology consistent with origin by WGD, and were extracted as our set of “phylogenetic duplicate predictions” (Fig. 1).
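The subtree bookkeeping and the final extraction step can be illustrated with a short sketch. It assumes that the ancient duplication node has already been located in the tree and that the leaves under its two child clades are available as (taxon, gene) tuples; the function name and data layout are hypothetical, and this is not the Ensembl Compara API.

```python
from itertools import product

# Subtree counts quoted in the text.
total_subtrees = 11_488
without_d_rerio = 522
single_copy_only = 7_891
remaining = total_subtrees - without_d_rerio - single_copy_only
print(remaining)  # prints 3075


def wgd_candidate_pairs(clade_a, clade_b, fish_taxa):
    """Pair up genes of the same fish taxon found on both sides of the node.

    clade_a, clade_b: lists of (taxon, gene_id) leaves under the two child
    clades of a duplication node (assumed to be identified beforehand).
    """
    pairs = {}
    for taxon in fish_taxa:
        genes_a = [gene for t, gene in clade_a if t == taxon]
        genes_b = [gene for t, gene in clade_b if t == taxon]
        if genes_a and genes_b:
            pairs[taxon] = list(product(genes_a, genes_b))
    return pairs


# Toy example with made-up gene identifiers.
clade_a = [("D. rerio", "geneA_dr"), ("O. latipes", "geneA_ol")]
clade_b = [("D. rerio", "geneB_dr"), ("T. nigroviridis", "geneB_tn")]
taxa = ["D. rerio", "O. latipes", "T. nigroviridis"]
predictions = wgd_candidate_pairs(clade_a, clade_b, taxa)
print(predictions)
```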

Gene-set enrichment analysis of WGD-duplicates

For gene-set enrichment analysis, we used the GO annotation file for zebrafish, “gene_association.zfin,” downloaded on 24 April 2008 from the Gene Ontology site at http://www.geneontology.org/GO.current.annotations.shtml. In order to retrieve GO annotation information from the ZFIN gene association file, the Ensembl peptide IDs corresponding to these duplicate gene loci were mapped to ZFIN IDs using the Ensembl BioMart tool (http://www.ensembl.org/biomart/martview). Analyses of gene function enrichment were performed using the GOstat software tool (Beissbarth and Speed 2004). The method of Benjamini and Hochberg (1995) for controlling the false discovery rate (FDR) was used to control experiment-wise error rates in the face of multiple testing. GO terms with FDR-adjusted P-values < 0.1 were considered to be significantly enriched or under-represented among WGD-duplicates compared with the total set of GO annotations for D. rerio. Frequently in GO analysis, the most significant GO terms are annotated in the same set of genes, as each gene may have several similar GO annotations. To account for such clusters of similarly annotated genes, we grouped significant GO terms that were annotated in the same set of genes or where one set of genes was a subset of the other, allowing for a maximum of five genes to differ between the annotations of different GO terms (Supplemental Methods).
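For readers unfamiliar with the Benjamini and Hochberg (1995) adjustment, a minimal pure-Python sketch of the step-up procedure follows. It is not the GOstat implementation; the term labels and P-values are made up for illustration, and only the 0.1 threshold comes from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


FDR_THRESHOLD = 0.1  # threshold stated in the text

# Hypothetical terms and raw P-values.
terms = ["term_a", "term_b", "term_c", "term_d"]
raw = [0.001, 0.02, 0.04, 0.5]
adjusted = benjamini_hochberg(raw)
significant = [t for t, q in zip(terms, adjusted) if q < FDR_THRESHOLD]
print(significant)  # prints ['term_a', 'term_b', 'term_c']
```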

Spatial analysis of gene expression

Spatial expression data for D. rerio were downloaded from ZFIN (http://www.zfin.org); those for the mouse orthologs, from MGI (http://www.informatics.jax.org/). We designed riboprobes for two gene pairs (zgc:73223 and g12; fbxl14 and fbxl14a) that showed interesting temporal expression profiles but for which no expression data were available in ZFIN or the published literature (Supplemental Methods).

Temporal analysis of gene expression

We compared the temporal expression of WGD-duplicates in D. rerio during embryogenesis using two microarray data sets, Wilkins et al. (S Wilkins, M Kerr, M Köppen, B Gardiner, D Taylor, C Simons, M Landsberg, S Grimmond, C Heisenberg, and A Perkins) and Mathavan et al. (2005). Both expression analyses were performed on the Compugen 16K D. rerio oligonucleotide microarray. This array contains 16,399 oligos (65-mers) representing 15,806 unique D. rerio gene clusters plus controls. To identify which of our 615 gene pairs in D. rerio were represented by unique probes on the microarray, we used the exonerate sequence alignment tool (Slater and Birney 2005) to map between zebrafish cDNA sequences (downloaded from Ensembl) and microarray probes, using a score threshold of 262 and allowing a maximum of seven mismatches between transcript and 65-mer probe. This procedure identified 445 genes that were represented by one or more Compugen probes with fewer than seven mismatches, and these probes did not match any other genes with the same mismatch setting. These 445 genes correspond to 67 gene pairs. The similarity between the expression profiles of duplicate gene copies was assessed using Pearson's correlation coefficient (r). To determine a suitable threshold r-value below which duplicate gene copies could be considered divergent in expression, we generated 100,000 random probe pairs from each of the two microarray data sets using sampling with replacement. Because two genes represented by a randomly selected probe pair are generally not functionally related and hence not coregulated, any gene pair with r greater than 95% of random probe pairs can be considered significantly coregulated at P = 0.05, following the same strategy as Blanc and Wolfe (2004).
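The thresholding strategy can be sketched in pure Python as follows. The function names and the simulated profiles are illustrative only; the sketch assumes every expression profile has non-zero variance, and the quantile is taken by simple index lookup rather than by interpolation, so it approximates the procedure described above and is not the analysis code used in the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_pair_threshold(profiles, n_pairs=100_000, quantile=0.95, seed=0):
    """r-value exceeded by only (1 - quantile) of randomly drawn probe pairs.

    Probes are drawn with replacement, as described in the text.
    """
    rng = random.Random(seed)
    values = sorted(
        pearson_r(rng.choice(profiles), rng.choice(profiles))
        for _ in range(n_pairs)
    )
    return values[int(quantile * (n_pairs - 1))]


# Simulated profiles standing in for microarray time courses.
rng = random.Random(1)
profiles = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
threshold = random_pair_threshold(profiles, n_pairs=1000)

# A duplicate pair would be called coregulated if its r exceeds the threshold.
pair_r = pearson_r(profiles[0], profiles[1])
print(pair_r > threshold)
```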

Protein domain architecture and subcellular localization

To test for the acquisition or loss of a protein domain, all protein sequences encoded by WGD-duplicates in D. rerio were scanned against the Pfam-A database (Bateman et al. 2004) using the InterProScan tool (Zdobnov and Apweiler 2001). As a reference for comparison, we also performed Pfam-A protein domain analyses on 5878 zebrafish–mouse orthologs that were identified as one-to-one orthologs using the Ensembl Compara v53 database (list available on request). Protein subcellular localization was assessed using the web-based TMHMM Server v2.0 (Krogh et al. 2001), available at http://www.cbs.dtu.dk/services/TMHMM/, and SignalP-HMM and SignalP-NN applications using the web-based SignalP Server v3.0 (Bendtsen et al. 2004), available at http://www.cbs.dtu.dk/services/SignalP/. Predictions were combined using the schema established by Davis et al. (2006b). For the comparison of protein subcellular localization, we compared gene pairs only if HMM and NN methods resulted in the same signal peptide prediction. We defined subcellular localization by five categories, following the classification of Davis et al. (2006a): (1) soluble intracellular protein, no signal peptide and no transmembrane domains; (2) soluble secreted protein, signal peptide and no transmembrane domains; (3) type I membrane protein, signal peptide and a single transmembrane domain; (4) type II membrane protein, no signal peptide and a single transmembrane domain; and (5) multi-spanning membrane protein, multiple transmembrane domains.
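The five-category scheme can be expressed as a small decision function. The sketch below assumes the signal peptide call and the number of predicted transmembrane domains have already been parsed from the prediction servers; the function names are hypothetical and this is not the MemO implementation.

```python
def consensus_signal_peptide(hmm_call, nn_call):
    """Return the shared call, or None when the two methods disagree."""
    return hmm_call if hmm_call == nn_call else None


def membrane_organization(has_signal_peptide, n_transmembrane):
    """Assign one of the five localization categories listed in the text."""
    if n_transmembrane < 0:
        raise ValueError("number of transmembrane domains must be non-negative")
    if n_transmembrane >= 2:
        return "multi-spanning membrane protein"
    if n_transmembrane == 1:
        if has_signal_peptide:
            return "type I membrane protein"
        return "type II membrane protein"
    if has_signal_peptide:
        return "soluble secreted protein"
    return "soluble intracellular protein"


def pair_differs_in_localization(copy_a, copy_b):
    """Compare two duplicates given as (hmm_call, nn_call, n_transmembrane).

    Returns None when either copy lacks a consensus signal peptide call,
    mirroring the rule that such pairs were not compared.
    """
    categories = []
    for hmm_call, nn_call, n_tm in (copy_a, copy_b):
        signal = consensus_signal_peptide(hmm_call, nn_call)
        if signal is None:
            return None
        categories.append(membrane_organization(signal, n_tm))
    return categories[0] != categories[1]


print(membrane_organization(True, 1))   # prints "type I membrane protein"
print(pair_differs_in_localization((True, True, 0), (False, False, 0)))  # prints True
```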

Acknowledgments

We acknowledge financial support from the ARC Centre of Excellence in Bioinformatics (ARC CE0348221) and an award under the Merit Allocation Scheme on the APAC/NCI National Supercomputing Facility. The microarray work in this article was supported by the NH&MRC Australian Zebrafish Phenomics Facility (AZPF) grant no. 455871. We acknowledge the support of an ARC LIEF grant, LE0347607, which enabled purchase of the Compugen 16k probe set. We thank Cédric Simillion for access to the i-ADHoRe software, Simon Wong for assistance with the identification of WGD-duplicates, Melissa Davis for assistance with the analyses of protein subcellular localization, Andrés Esteban Marcos for help with the visualization of synteny maps, Milena Gongora and Mark Crowe of the SRC Microarray Facility for assistance with annotating the Compugen probe set, Aliesha Griffin for assistance with in situs, Angela Lawton for zebrafish husbandry, and Kim-Anh Lê Cao and Emanuelle Meugnier for statistical advice. We also acknowledge access to the zebrafish genome sequence, which is sequenced and annotated by The Wellcome Trust Sanger Institute and funded by the Wellcome Trust (http://www.sanger.ac.uk/Projects/D_rerio/). We thank the Broad Institute Genome Sequencing Platform and Genome Sequencing and Analysis Program, Federica Di Palma, and Kerstin Lindblad-Toh for making the data for Gasterosteus aculeatus available.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.086827.108.

References

  • Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, et al. Zebrafish hox clusters and vertebrate genome evolution. Science. 1998;282:1711–1714.
  • Aparicio S, Chapman J, Stupka E, Putnam N, Chia J-m, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–1310.
  • Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, et al. The Pfam protein families database. Nucleic Acids Res. 2004;32:D138–D141.
  • Beissbarth T, Speed TP. GOstat: Find statistically overrepresented gene ontologies within a group of genes. Bioinformatics. 2004;20:1464–1465.
  • Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol. 2004;340:783–795.
  • Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.
  • Blanc G, Wolfe KH. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004;16:1679–1691.
  • Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, Van de Peer Y. The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 2006;7:R43. doi: 10.1186/gb-2006-7-5-r43.
  • Brunet FG, Crollius HR, Paris M, Aury JM, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol. 2006;23:1808–1816.
  • Bult C, Eppig JT, Kadin JA, Richardson JE, Blake JA. The Mouse Genome Database (MGD): Mouse biology and model systems. Nucleic Acids Res. 2008;36:D724–D728.
  • Christoffels A, Koh EGL, Chia JM, Brenner S, Aparicio S, Venkatesh B. Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004;21:1146–1151.
  • Conant GC, Wolfe KH. Turning a hobby into a job: How duplicated genes find new functions. Nat Rev Genet. 2008;9:938–950.
  • Cresko WA, Yan YL, Baltrus DA, Amores A, Singer A, Rodriguez-Mari A, Postlethwait JH. Genome duplication, subfunction partitioning, and lineage divergence: Sox9 in stickleback and zebrafish. Dev Dyn. 2003;228:480–489.
  • Crow KD, Wagner GP. What is the role of genome duplication in the evolution of complexity and diversity? Mol Biol Evol. 2006;23:887–892.
  • Davis JC, Petrov DA. Do disparate mechanisms of duplication add similar genes to the genome? Trends Genet. 2005;21:548–551.
  • Davis MJ, Hanson KA, Clark F, Fink JL, Zhang F, Kasukawa T, Kai C, Kawai J, Carninci P, Hayashizaki Y, et al. Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units. PLoS Genet. 2006a;2:e46. doi: 10.1371/journal.pgen.0020046.
  • Davis MJ, Zhang F, Yuan Z, Teasdale RD. MemO: A consensus approach to the annotation of a protein's membrane organization. In Silico Biol. 2006b;6:387–399.
  • Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314. doi: 10.1371/journal.pbio.0030314.
  • Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–1545.
  • He X, Zhang J. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005;169:1157–1164.
  • Hoegg S, Brinkmann H, Taylor JS, Meyer A. Phylogenetic timing of the fish-specific genome duplication correlates with the diversification of teleost fish. J Mol Evol. 2004;59:190–203.
  • Huminiecki L, Wolfe KH. Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res. 2004;14:1870–1879.
  • Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957.
  • Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447:714–719.
  • Kawahara R, Miya M, Mabuchi K, Lavoué S, Inoue JG, Satoh TP, Kawaguchi A, Nishida M. Interrelationships of the 11 gasterosteiform families (sticklebacks, pipefishes, and their relatives): A new perspective based on whole mitogenome sequences from 75 higher teleosts. Mol Phylogenet Evol. 2008;46:224–236.
  • Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428:617–624.
  • Kleinjan DA, Bancewicz RM, Gautier P, Dahm R, Schonthaler HB, Damante G, Seawright A, Hever AM, Yeyati PL, van Heyningen V, et al. Subfunctionalization of duplicated zebrafish pax6 genes by cis-regulatory divergence. PLoS Genet. 2008;4:e29. doi: 10.1371/journal.pgen.0040029.
  • Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J Mol Biol. 2001;305:567–580.
  • Le Comber SC, Smith C. Polyploidy in fishes: Patterns and processes. Biol J Linn Soc. 2004;82:431–442.
  • Lynch M. Gene duplication and evolution. Science. 2002;297:945–947.
  • Lynch M, Force A. The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000;154:459–473.
  • Lynch M, O'Hely M, Walsh B, Force A. The probability of preservation of a newly arisen gene duplicate. Genetics. 2001;159:1789–1804.
  • MacCarthy T, Bergman A. The limits of subfunctionalization. BMC Evol Biol. 2007;7:213. doi: 10.1186/1471-2148-7-213.
  • Mathavan S, Lee SG, Mak A, Miller LD, Murthy KR, Govindarajan KR, Tong Y, Wu YL, Lam SH, Yang H, et al. Transcriptome analysis of zebrafish embryogenesis using microarrays. PLoS Genet. 2005;1:260–276.
  • McLysaght A, Hokamp K, Wolfe KH. Extensive genomic duplication during early chordate evolution. Nat Genet. 2002;31:200–204.
  • Metscher BD, Ahlberg PE. Zebrafish in context: Uses of a laboratory model in comparative studies. Dev Biol. 1999;210:1–14.
  • Meyer A, Schartl M. Gene and genome duplications in vertebrates: The one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr Opin Cell Biol. 1999;11:699–704.
  • Miya M, Takeshima H, Endo H, Ishiguro NB, Inoue JG, Mukai T, Satoh TP, Yamaguchi M, Kawaguchi A, Mabuchi K, et al. Major patterns of higher teleostean phylogenies: A new perspective based on 100 complete mitochondrial DNA sequences. Mol Phylogenet Evol. 2003;26:121–138.
  • Ohno S. Evolution by gene duplication. Springer-Verlag; New York: 1970.
  • Pasterkamp RJ, Kolodkin AL. Semaphorin junction: Making tracks toward neural connectivity. Curr Opin Neurobiol. 2003;13:79–89.
  • Pearson WR. Comparison of methods for searching protein sequence databases. Protein Sci. 1995;4:1145–1160.
  • Postlethwait JH, Woods IG, Ngo-Hazelett P, Yan Y-L, Kelly PD, Chu F, Huang H, Hill-Force A, Talbot WS. Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res. 2000;10:1890–1902.
  • Postlethwait J, Amores A, Cresko W, Singer A, Yan Y-L. Subfunction partitioning, the teleost radiation and the annotation of the human genome. Trends Genet. 2004;20:481–490. [PubMed]
  • Robinson-Rechavi M, Marchand O, Escriva H, Bardet PL, Zelus D, Hughes S, Laudet V. Euteleost fish genomes are characterized by expansion of gene families. Genome Res. 2001;11:781–788. [PMC free article] [PubMed]
  • Roth C, Rastogi S, Arvestad L, Dittmar K, Light S, Ekman D, Liberles DA. Evolution after gene duplication: Models, mechanisms, sequences, systems, and organisms. J Exp Zoolog B Mol Dev Evol. 2007;308B:58–73. [PubMed]
  • Scannell DR, Byrne KP, Gordon JL, Wong S, Wolfe KH. Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature. 2006;440:341–345. [PubMed]
  • Sémon M, Wolfe KH. Reciprocal gene loss between Tetraodon and zebrafish after whole genome duplication in their ancestor. Trends Genet. 2007;23:108–112. [PubMed]
  • Sémon M, Wolfe KH. Preferential subfunctionalization of slow-evolving genes after allopolyploidization in Xenopus laevis. Proc Natl Acad Sci. 2008;105:8333–8338. [PMC free article] [PubMed]
  • Sharma MK, Liu R-Z, Thisse C, Thisse B, Denovan-Wright EM, Wright JM. Hierarchical subfunctionalization of fabp1a, fabp1b and fabp10 tissue-specific expression may account for retention of these duplicated genes in the zebrafish (Danio rerio) genome. FEBS J. 2006;273:3216–3229. [PubMed]
  • Simillion C, Vandepoele K, Saeys Y, Van de Peer Y. Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 2004;14:1095–1106. [PMC free article] [PubMed]
  • Slater G, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31. [PMC free article] [PubMed] [Cross Ref]
  • Sprague J, Bayraktaroglu L, Clements D, Conlin T, Fashena D, Frazer K, Haendel M, Howe DG, Mani P, Ramachandran S, et al. The Zebrafish Information Network: The zebrafish model organism database. Nucleic Acids Res. 2006;34:D581–D585. [PMC free article] [PubMed]
  • Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y. Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 2003;13:382–390. [PMC free article] [PubMed]
  • Van de Peer Y. Computational approaches to unveiling ancient genome duplications. Nat Rev Genet. 2004;5:752–763. [PubMed]
  • Vandepoele K, Saeys Y, Simillion C, Raes J, Van de Peer Y. The automatic detection of homologous regions (ADHoRe) and its application to microcolinearity between Arabidopsis and rice. Genome Res. 2002;12:1792–1801. [PMC free article] [PubMed]
  • Vandepoele K, De Vos W, Taylor JS, Meyer A, Van de Peer Y. Major events in the genome evolution of vertebrates: Paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc Natl Acad Sci. 2004;101:1638–1643. [PMC free article] [PubMed]
  • Wapinski I, Pfeffer A, Friedman N, Regev A. Natural history and evolutionary principles of gene duplication in fungi. Nature. 2007;449:54–61. [PubMed]
  • Wilkins SJ, Yoong S, Verkade H, Mizoguchi T, Plowman SJ, Hancock JF, Kikuchi Y, Heath JK, Perkins AC. Mtx2 directs zebrafish morphogenetic movements during epiboly by regulating microfilament formation. Dev Biol. 2008;314:12–22. [PubMed]
  • Wong S, Butler G, Wolfe KH. Gene order evolution and paleopolyploidy in hemiascomycete yeasts. Proc Natl Acad Sci. 2002;99:9272–9277. [PMC free article] [PubMed]
  • Woods IG, Wilson C, Friedlander B, Chang P, Reyes DK, Nix R, Kelly PD, Chu F, Postlethwait JH, Talbot WS. The zebrafish gene map defines ancestral vertebrate chromosomes. Genome Res. 2005;15:1307–1314. [PMC free article] [PubMed]
  • Woolfe A, Elgar G. Comparative genomics using Fugu reveals insights into regulatory subfunctionalization. Genome Biol. 2007;8:R53. [PMC free article] [PubMed]
  • Yu H-H, Moens CB. Semaphorin signaling guides cranial neural crest cell migration in zebrafish. Dev Biol. 2005;280:373–385. [PubMed]
  • Zdobnov EM, Apweiler R. InterProScan: An integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17:847–848. [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press