Genome Res. Aug 2009; 19(8): 1497–1505.
PMCID: PMC2720179

Automated identification of conserved synteny after whole-genome duplication

Abstract

An important objective for inferring the evolutionary history of gene families is the determination of orthologies and paralogies. Lineage-specific paralog loss following whole-genome duplication events can cause anciently related homologs to appear in some assays as orthologs. Conserved synteny—the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time—can help resolve such errors. Several previous studies examined genome-wide syntenic conservation to infer the contents of ancestral chromosomes and provided insights into the architecture of ancestral genomes, but did not provide methods or tools applicable to the study of the evolution of individual gene families. We developed an automated system to identify conserved syntenic regions in a primary genome using as outgroup a genome that diverged from the investigated lineage before a whole-genome duplication event. The product of this automated analysis, the Synteny Database, allows a user to examine fully or partially assembled genomes. The Synteny Database is optimized for the investigation of individual gene families in multiple lineages and can detect chromosomal inversions and translocations as well as ohnologs (paralogs derived by whole-genome duplication) gone missing. To demonstrate the utility of the system, we present a case study of gene family evolution, investigating the ARNTL gene family in the genomes of Ciona intestinalis, amphioxus, zebrafish, and human.

An important objective for inferring the evolutionary history of gene families and chromosome segments is the determination of orthology and paralogy. A stepwise approach generally uses BLAST (basic local alignment search tool) (Altschul et al. 1997) to define coarse relationships among genes, followed by phylogenetic reconstruction to suggest more detailed hypotheses of descent. Events such as gene duplications or whole genome duplications (WGD), with associated differential gene loss, introduce noise into these analyses. Anomalies, such as lineage-specific paralog loss, can cause anciently related homologs to appear to be orthologs, thereby confusing sequence similarity with functional homology (Postlethwait 2007). Such errors can confound attempts to create nonhuman animal disease models and can obscure recent, species-specific evolutionary change among sister lineages.

Orthologs are two genes, one in each of two species, that descended from a single gene in the last common ancestor of those two species. Paralogs are a set of genes derived by duplication within a lineage, and together, a group of paralogs can be co-orthologous to their unduplicated ortholog in a related species. Ohnologs are a special subset of paralogs that result from a whole-genome duplication event (Wolfe 2000). The differential loss of genes that follows a duplication event can create ohnologs gone missing when different ohnologs are lost reciprocally in different lineages.

Understanding and distinguishing ohnologs gone missing from orthologs is a pervasive problem in vertebrate genomics due to multiple genome duplication events. Two rounds of whole-genome duplication events, called R1 and R2, likely occurred at the base of the vertebrate lineage after the divergence of non-vertebrate chordates and prior to the appearance of jawed vertebrates (Garcia-Fernàndez and Holland 1994; Spring 1997; Dehal and Boore 2005). A third duplication, called R3, likely occurred in the teleost lineage after the divergence of ray-finned and lobe-finned fishes (Amores et al. 1998; Taylor et al. 2003; Jaillon et al. 2004), but before the radiation of the teleosts. Additional genome duplications punctuated the evolution of other lineages, like salmonids, catostomids, goldfish, Xenopus laevis, and even a rodent (Uyeno and Smith 1972; Allendorf and Thorgaard 1984; Schmid and Steinlein 1991; Risinger and Larhammar 1993; Larhammar and Risinger 1994; Gallardo et al. 1999; David et al. 2003; Mungpakdee et al. 2008a,b). Given the pervasive nature of genome duplication in chordates and the importance of teleost fish and Xenopus laevis as model organisms, it is important to develop automated methods to identify true orthologs among groups of paralogs and to distinguish them from more ancient, nonorthologous homologs.

Figure 1 illustrates the problem of distinguishing orthologs following duplication and lineage-specific loss of a gene g and some of its neighboring genes after WGD (R1), speciation (S), and a second WGD event (R2) in one of the descendant lineages. In an idealized case, chromosomes would experience few changes in gene order or gene content, as illustrated by genes of the same color in Figure 1. The most common fate of genes created by a WGD event, however, is pseudogenization and nonfunctionalization (Li 1980; Watterson 1983). Surviving duplicates can develop new functions (Ohno 1970) or partition or lose their existing functions (Force et al. 1999; Lynch and Force 2000; Winkler et al. 2003; Postlethwait et al. 2004; Jovelin et al. 2007; Chain et al. 2008; Conant and Wolfe 2008; Jarinova et al. 2008). From the time of the duplication event to the present, duplicated genes can alter their expression patterns (Force et al. 1999) or their exon structure (Altschmied et al. 2002), or their activities (Zhang et al. 2002; Zhang 2003), and such changes can alter protein–protein interactions or subsequent developmental or physiological functions.

Figure 1.
Differential gene loss following whole-genome duplication creates ohnologs gone missing. This image shows the evolutionary history of a gene g and neighbors undergoing a whole-genome duplication event (R1), a speciation event (S), and a second WGD event ...

In the case of differential gene loss and gene rearrangements in lineages S1 and S2, most reciprocal best-hit BLAST algorithms (Wall et al. 2003) would associate gene g2 with g1a and g1b, and most phylogenetic methods, due to a lack of data, would find that the most likely hypothesis of descent was that genes g2, g1a, and g1b shared their most recent common ancestor; in other words, these methods would incorrectly infer that g1a and g1b were orthologs of g2. The erroneous assignment of orthology presents a problem because it implies that the last common ancestor at time S had a single gene with a set of functions that evolved to g1 (and its subsequent duplicates, g1a and g1b) in S2 and g2 in S1, but in fact, no such gene actually existed.

To address this problem and to better infer orthologies and paralogies, we can take advantage of conserved synteny—the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time. In a WGD event, duplicated chromosomes (homeologs) initially have gene orders identical to each other and to their immediate ancestor. Between the time of duplication and speciation events, however, genes can be lost from one homeolog or the other (unless preserved by structures such as embedded regulatory elements) (Kikuta et al. 2007), and inversions and other chromosome rearrangements can occur independently on the two duplicated homeologs. These events occurring in the chromosomal vicinity of the gene in question give an identity to all of the genes in the neighborhood. In the example given in Figure 1, we could test the hypothesis that genes g1a and g1b are co-orthologous to gene g2 by first examining the neighbors of g1a and g1b—ensuring that a sufficient number of gene neighbors are also paralogous—and then by checking those neighboring paralogs to ensure that they are orthologous to the neighbors of g2. The conserved syntenic region defined by such genes would confirm (or in this case, reject) the co-orthology of genes g1a and g1b to g2. This approach complements the use of BLAST and phylogenetic reconstruction and provides additional evidence to infer the evolutionary history of gene families independent of sequence identities.
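The neighborhood test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Synteny Database implementation: the function name, the set-based inputs, and the rule that a paralogous neighbor pair counts as anchored only when both members share one ortholog inside the outgroup neighborhood are all hypothetical choices made for the example.

```python
def neighborhood_support(neigh_1a, neigh_1b, neigh_2, paralogs, orthologs):
    """Count neighbor evidence for co-orthology of g1a/g1b to g2.

    neigh_1a, neigh_1b: sets of gene IDs flanking g1a and g1b (primary genome)
    neigh_2:            set of gene IDs flanking g2 (outgroup genome)
    paralogs:           set of frozenset({x, y}) paralog pairs (assumed input)
    orthologs:          dict mapping a primary gene to its outgroup ortholog
    Returns (number of paralogous neighbor pairs, number of those pairs
    whose shared ortholog lies in the neighborhood of g2).
    """
    # Step 1: neighbors of g1a and g1b that are paralogs of each other.
    paralogous = [
        (x, y)
        for x in sorted(neigh_1a)
        for y in sorted(neigh_1b)
        if frozenset((x, y)) in paralogs
    ]
    # Step 2: keep pairs whose common ortholog is also a neighbor of g2.
    anchored = [
        (x, y)
        for x, y in paralogous
        if orthologs.get(x) in neigh_2 and orthologs.get(x) == orthologs.get(y)
    ]
    return len(paralogous), len(anchored)
```

A high first count with a low second count would correspond to the situation in Figure 1: the two primary-genome segments are paralogous to each other, but the outgroup segment around g2 does not share their neighbors, which argues against co-orthology.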

Several previous studies examined syntenic conservation at a genomic level to determine the nature of the ancestral chromosomes for that organism's lineage. Evidence for two rounds of genome duplication in stem vertebrates came from a whole-genome analysis of human, mouse, and fugu pufferfish using the urochordate Ciona intestinalis as an outgroup (Dehal and Boore 2005). Analysis of the Tetraodon nigroviridis (green spotted pufferfish) genome and the construction of a dense meiotic map for medaka supported earlier conclusions (Amores et al. 1998; Postlethwait et al. 1998; Woods et al. 2000; Postlethwait et al. 2002; Taylor et al. 2003; Van de Peer et al. 2003) that a third genome duplication had occurred in the teleost fish. Analysis of Tetraodon and medaka provided evidence for a 12-chromosome ancestral vertebrate genome by calculating conserved syntenic regions between the fish and human genomes (Jaillon et al. 2004; Naruse et al. 2004). Subsequent work reconstructed the ancestral vertebrate genome using data from human, chicken, and medaka genomes (Nakatani et al. 2007) and, in opposition to earlier work (Jaillon et al. 2004; Naruse et al. 2004; Woods et al. 2005), concluded that the osteichthyan ancestor had ~40 chromosomes. These studies provided insights into the architecture of the ancestral genome, but were not convenient for the study of the evolution of individual gene families, because the methods used did not form individual syntenic clusters (Jaillon et al. 2004; Dehal and Boore 2005; Nakatani et al. 2007); instead, they used hand-curated data (Jaillon et al. 2004; Nakatani et al. 2007) or they downplayed portions of the genome that did not fit into the analysis (Dehal and Boore 2005).

We have developed an automated system to identify conserved syntenic regions in a primary genome using an outgroup genome that diverged from the investigated lineage before a whole-genome duplication. Our Synteny Database allows for the analysis of fully or partially assembled genomes (Bridgham et al. 2008) and is optimized for the investigation of individual gene families in multiple lineages. The Synteny Database specializes in comparing genomes that have undergone one or more whole-genome duplications; it is able to detect chromosome inversions and translocations, as well as ohnologs gone missing in the gene families investigated. To demonstrate the utility and use of the system, we present a case study of the evolution of the ARNTL gene family in the amphioxus, Ciona intestinalis, zebrafish, and human genomes.

Results

The prediction of syntenic clusters allows us to enumerate regions of the genome that have been conserved since the last whole-genome duplication (relative to the unduplicated outgroup). These syntenic clusters, in turn, depend on the identification of paralogous genes within a genome along with their corresponding orthologous genes in the outgroup genome. We built an analysis pipeline to satisfy each of these two dependencies: first, identifying paralogous and orthologous genes, and second, discovering clusters of conserved synteny.

Our modified Reciprocal Best Hit (mRBH) Analysis Pipeline identifies paralogous gene groups in a primary genome (rather than a single “best” hit) and then anchors those gene groups to an ortholog in an outgroup genome using a BLAST-based approach. The pipeline naturally creates paralogous groups relative to the last whole-genome duplication that occurred in the primary genome but not in the outgroup genome. For example, if the primary genome has experienced a duplication since it diverged from the outgroup genome, then the pipeline will produce gene groups of size two. If, on the other hand, a duplication occurred before the two species diverged, then the pipeline reverts to a simple ortholog pipeline with a one-to-one correspondence between genes in the primary and outgroup genomes. In practice, recent tandem gene duplication, gene loss, and sequence divergence heavily influence the number of genes per group.

Given a set of paralogous gene groups in the primary genome that are co-orthologous to a single gene in the outgroup, we wish to look for regions of conserved synteny among paralogous chromosome segments within the primary genome and between orthologous chromosome segments in the primary and outgroup genomes. Our second analysis pipeline, which populates the Synteny Database, uses a sliding-window analysis to identify chromosome regions in the primary and outgroup genomes that have been conserved since the last whole-genome duplication event, while allowing for small-scale changes in gene order, gene orientation, and gene loss. The result is a set of paralogous cluster pairs within the primary genome and a set of orthologous cluster pairs between the primary and outgroup genomes.
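To make the idea of window-based clustering concrete, the following sketch groups homologous gene pairs into clusters when neighboring pairs lie within a fixed number of genes of each other on both chromosomes. It is an assumption-laden simplification: the text here does not give the exact algorithm, so the union-find linkage, the `min_pairs` threshold, and the input layout (gene order indices rather than base-pair coordinates) are illustrative choices only.

```python
from collections import defaultdict


def syntenic_clusters(pairs, window=50, min_pairs=2):
    """Cluster gene pairs that are close together on both chromosomes.

    pairs: list of (chrom_a, index_a, chrom_b, index_b), where index_* is the
           gene's rank along its chromosome (hypothetical input layout).
    Two pairs are linked when they join the same two chromosomes and their
    gene indices differ by at most `window` on both of them. Using absolute
    differences tolerates inversions and small changes in gene order.
    """
    parent = list(range(len(pairs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_chroms = defaultdict(list)
    for n, (chrom_a, _, chrom_b, _) in enumerate(pairs):
        by_chroms[(chrom_a, chrom_b)].append(n)

    for members in by_chroms.values():
        members.sort(key=lambda n: pairs[n][1])
        for x, m in enumerate(members):
            for n in members[x + 1:]:
                if pairs[n][1] - pairs[m][1] > window:
                    break  # sorted by index_a, so later pairs are farther away
                if abs(pairs[n][3] - pairs[m][3]) <= window:
                    parent[find(n)] = find(m)

    clusters = defaultdict(list)
    for n in range(len(pairs)):
        clusters[find(n)].append(pairs[n])
    return sorted(sorted(c) for c in clusters.values() if len(c) >= min_pairs)
```

Running the same input with a large window and then a small one mirrors the workflow described later in the Discussion: a broad window finds candidate regions, and a narrow window retains only tightly conserved ones.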

The Synteny Database uses a web-based interface to provide paralogous and orthologous gene groups and syntenic clusters to the researcher in a format searchable by gene name or genomic location. The user can also access several web-based visualization tools, including linear and circular plots of paralogs and orthologs, to render gene groups and syntenic clusters. The following case study extensively utilizes these web-based tools and illustrates how researchers can use the Synteny Database to infer gene family histories.

Case study: The ARNTL gene family

The Synteny Database provides a useful data set for the examination of the evolutionary history of the ARNTL gene family. The aryl hydrocarbon receptor nuclear translocator-like gene (ARNTL or BMAL1) encodes a helix-loop-helix protein that forms a heterodimer with CLOCK to regulate the circadian clock, a system that provides daily periodicity for biochemical, physiological, and behavioral activities (Ikeda and Nomura 1997; Gekakis et al. 1998; Pando and Sassone-Corsi 2002). We will test the ability of the mRBH Analysis Pipeline to identify orthologs and paralogs of the ARNTL gene family in the basally diverging chordate amphioxus, the urochordate Ciona intestinalis (a sea squirt), the ray-finned fish Danio rerio (zebrafish), and in the lobe-finned fish lineage, Homo sapiens. Then, using the Synteny Database, we will search for conserved chromosome segments surrounding the orthologous or paralogous ARNTL genes. If the amphioxus, Ciona, zebrafish, and human ARNTL gene families descended from a single, ancestral gene in the last common ancestor, then we would expect the genomic neighborhood of the ARNTL genes to reflect the existence of R1 and R2 in the vertebrate lineages and R3 in teleost fish. We will use this syntenic conservation to verify each orthologous and paralogous relationship in the ARNTL gene tree and in the process confirm or reject our orthology and paralogy assignments. The full case study is available in the Supplemental material; here, we will discuss two parts to highlight several features the Synteny Database detects: the paralogy assignment in the human genome and one orthology assignment between the human and zebrafish genomes.

ARNTL paralogs in the human genome

We examine the origins of ARNTL paralogs in three steps: output from the mRBH Analysis Pipeline, a comparison of those results to phylogenetic analysis, and inferences obtained from the Synteny Database. According to the results of the mRBH Analysis Pipeline, ARNTL, located on human chromosome 11 (Hsa11), has a single paralog in the human genome, ARNTL2, on chromosome 12 (Hsa12) (Hogenesch et al. 2000). Because the genome assembly of Ciona intestinalis (Satou et al. 2003) does not contain an ARNTL ortholog, the mRBH pipeline incorrectly anchored the human ARNTL orthologs to the nearest related extant gene in the Ciona genome (Q4H3W4_CIOIN), which is in reality the ortholog of the human ARNT and ARNT2 genes—ancient paralogs of the ARNTL genes. These conclusions were confirmed by building a phylogenetic tree, which shows that amphioxus, which diverged more basally than Ciona in chordate history (Blair and Hedges 2005; Philippe et al. 2005), has an ortholog of human ARNT and ARNT2, as well as an ortholog of ARNTL and ARNTL2 (Fig. 2A). This analysis emphasizes the problem illustrated by Figure 1: Reciprocal BLAST procedures can assign false orthologies in the case of lost gene duplicates. Because the current genome assembly of Ciona lacks an ARNTL ortholog, we will use the amphioxus genome as an outgroup to search for syntenic conservation among the human ARNTL paralogs.

Figure 2.
Analysis of the ARNTL gene family. (A) ARNTL phylogenetic tree based on maximum likelihood showing that Danio rerio (Dre) arntl1a is paralogous to arntl1b, and that both of these genes are co-orthologous to human (Hsa) ARNTL. The tree suggests that Dre ...

Paralogy of human ARNTL chromosome segments

The Synteny Database generates several visualizations, including dotplots, circle plots, and gene traces that the user can download in raster (PNG) and vector (PDF) formats. To our knowledge, this is the only site that provides public access to such visualization tools. A particularly useful display is a dotplot, which plots genes (gray dots) according to their order and relative distance along a user-selected index chromosome displayed along the horizontal axis of the plot in megabases. The paralogs (red dots) of each gene on the index chromosome are plotted vertically above or below on the appropriate chromosomes, ordered with respect to the location of the gene on the index chromosome rather than their order on their native chromosome. Users can specify genes to be circled on the plot and a gray disc shows the index chromosome's centromere, when known. The dotplot readily identifies regions of the index chromosome that are duplicated by a large-scale event, such as a WGD. A paralogy dotplot for Hsa11 (Fig. 2B) showed this duplication pattern within a large region encompassing ARNTL. More than 60 Mb of Hsa11 contained genes with paralogs on Hsa12 (green dots), spanning the region that includes ARNTL2 and providing evidence that this region of Hsa11/Hsa12 was produced in a large-scale duplication event. Hsa19 also showed many paralogs from this region.
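The coordinate logic behind such a dotplot is simple enough to sketch. The function below only computes the points to be drawn and does not render anything; its name, its dictionary inputs, and the toy gene identifiers in the usage note are hypothetical and are not taken from the Synteny Database.

```python
def dotplot_points(index_genes, paralogs, gene_chrom):
    """Compute paralogy dotplot points for one index chromosome.

    index_genes: dict gene -> start position in bp on the index chromosome
    paralogs:    dict gene -> list of paralogous gene IDs
    gene_chrom:  dict gene -> chromosome on which that gene resides
    Returns a sorted list of (x_mb, chromosome, paralog). Each paralog takes
    its horizontal position from the index gene, not from its own position
    on its native chromosome, and its row from its own chromosome.
    """
    points = []
    for gene, start_bp in index_genes.items():
        x_mb = start_bp / 1e6  # horizontal axis is in megabases
        for paralog in paralogs.get(gene, []):
            points.append((x_mb, gene_chrom[paralog], paralog))
    return sorted(points)
```

For example, with made-up coordinates, `dotplot_points({"g1": 2_500_000}, {"g1": ["g1p"]}, {"g1p": "Hsa12"})` places a single dot at 2.5 Mb in the Hsa12 row. A run of dots in one row across a long stretch of the horizontal axis is the pattern that suggests a large-scale duplication.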

While dotplots enhance visualization of data across the entire genome, a gene trace provides a more detailed view of a conserved region. The Synteny Database identified a conserved region of nine pairs of Hsa11/Hsa12 paralogs near ARNTL using a sliding window size of 50 (Fig. 2C). To evaluate the relationship of window size and shared gene pairs, we performed a permutation analysis (see Methods). In brief, with longer windows, the likelihood of finding a pair of orthologs that are syntenic in two species will increase solely by chance. According to the permutation analysis, the nine pairs of genes found using the 50-gene window demonstrate conservation from the last common ancestor of the ARNTL chromosome segments. Each gray square in a gene trace represents a gene with order, but not distance or size, maintained along the chromosome. Colored genes are members of this particular paralogous cluster, while gray genes are not. Lines connect members of the cluster representing paralogs and are colored according to how the sliding window analysis detected them. The colored lines connecting paralogs make chromosome rearrangements readily apparent.
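The logic of a permutation test of this kind can be illustrated with a small sketch. It is not the analysis reported in the paper: the single linear genome, the `supported_pairs` statistic, the number of permutations, and the fixed seed are all simplifying assumptions introduced for the example.

```python
import random


def supported_pairs(pairs, window):
    """Count pairs that have another pair within `window` genes at both ends."""
    count = 0
    for i, (a1, b1) in enumerate(pairs):
        for j, (a2, b2) in enumerate(pairs):
            if i != j and abs(a1 - a2) <= window and abs(b1 - b2) <= window:
                count += 1
                break
    return count


def permutation_p_value(pairs, genome_size, window=50, n_perm=200, seed=0):
    """Empirical p-value for the observed clustering of gene pairs.

    pairs:       list of (index_a, index_b) gene ranks in one linear genome
                 (a simplification; real genomes have many chromosomes)
    genome_size: total number of gene positions available for shuffling
    Each permutation draws new, distinct positions for every pair member and
    recomputes the statistic; the p-value is the fraction of permutations at
    least as clustered as the observed data, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = supported_pairs(pairs, window)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        positions = rng.sample(range(genome_size), 2 * len(pairs))
        shuffled = list(zip(positions[::2], positions[1::2]))
        if supported_pairs(shuffled, window) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

Because a wider window makes chance co-occurrence more likely, the same observed cluster yields a larger p-value as `window` grows relative to `genome_size`, which is the effect the permutation analysis is meant to quantify.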

ARNTL paralogs in teleost fish

The hypothesis that teleost fish experienced a third genome duplication after splitting from the lineage that led to humans (Amores et al. 1998; Postlethwait et al. 1998; Taylor et al. 2003; Jaillon et al. 2004; Naruse et al. 2004) predicts that there should be two orthologs (co-orthologs) of each human ARNTL gene in the zebrafish and other teleosts, except for post-duplication gene loss. Additionally, we would expect to find conserved paralogous regions around each pair of zebrafish co-orthologs as well as conserved orthologous regions around each zebrafish/human ortholog pair. To test these predictions, we first queried the mRBH Analysis Pipeline results to identify zebrafish orthologs of human ARNTL and ARNTL2 and then used the Synteny Database to search for conserved synteny in regions surrounding those orthologs. The ortholog circle plot of Figure 3A summarizes the human and zebrafish syntenic clusters identified by the pipeline. The circle plot, which is a third visualization available from the Synteny Database, displays chromosomes drawn around the circumference of a circle, while arcs join orthologous gene pairs positioned relative to their location on the chromosome. Orthologous gene arcs are colored according to their syntenic cluster membership. Users can specify chromosomes, or portions of chromosomes, from the primary genome, or between the primary and outgroup genomes to include in customized circle plots.

Figure 3.
Conserved syntenies for ARNTL genes. (A) A circle plot summarizing human and zebrafish ARNTL family clusters. Arcs along the circumference of the circle represent chromosomes, while arcs within the circle connect pairs of orthologs. (B) The ARNTL2 orthologous ...

The results of the mRBH Analysis Pipeline identified three paralogous zebrafish genes: arntl1a, arntl1b, and arntl2. The output suggested the unexpected result that all three are co-orthologous to human ARNTL and none of them were orthologous to ARNTL2. Three zebrafish ARNTL genes have been reported in the literature: arntl1a and arntl1b were said to be orthologous to human ARNTL, while arntl2 was thought to be orthologous to ARNTL2 (Cermakian et al. 2000; Ishikawa et al. 2002; Wang 2009). The fact that the pipeline yielded results different from the published results raised two questions; first, given two copies of the ARNTL genes (ARNTL and ARNTL2) in the ancestral vertebrate lineage, the R3 duplication event should have produced four copies of the ARNTL paralogs in teleosts, not three. We infer that the fourth zebrafish gene has been lost or modified so greatly that the pipeline could not find it by sequence similarity search. A second question is: Why did the pipeline anchor zebrafish arntl2 to a human ortholog different from the published conclusion? The pipeline properly assigned the three zebrafish arntl genes to a single paralogous group—with arntl1a and arntl1b being highly related to one another, followed by arntl2. When the automated system attempted to anchor the three zebrafish genes to their human orthologs, however, it made an erroneous assignment. In this case, the rate of change of human ARNTL2 relative to its zebrafish ortholog was sufficiently fast that an RBH-based method does not possess enough power to detect the proper ortholog successfully. A phylogenetic analysis (Fig. 2A) confirmed the published results and led us to tentatively reject the assignment from the orthology pipeline.

We next sought to use conserved synteny to provide an independent line of evidence that was not based on sequence similarities. The first step was to confirm the orthology assignment of the zebrafish arntl1 genes, and the Synteny Database provided strong syntenic support showing that arntl1a and arntl1b are co-orthologs of human ARNTL (see Supplemental material for details). The next step was to confirm the orthology of zebrafish arntl2, which is described below.

Orthology of zebrafish arntl2 chromosome segments

Searching for syntenic conservation to support the ARNTL2 orthology assignment, we examined the pipeline results with an orthology dotplot of Hsa12. The dotplot revealed strong conservation along more than 80% of the length of Dre4 (Supplemental Fig. 3A), as well as weak conservation with Dre18 and Dre25. The search for a conserved syntenic cluster between the human ARNTL2 and zebrafish arntl2 genes led to an illuminating situation. The orthology dotplot identified both Dre18, which harbors arntl2, and Dre4, without an arntl-related gene, as the likely R3 paralogons of Hsa12 (Supplemental Fig. 3A). The Synteny Database found a conserved region on Hsa12 surrounding ARNTL2 and orthologous to Dre4 (Fig. 3B), and also found a second region on Hsa12 that is 12 Mb distant from ARNTL2 that shows strong syntenic conservation with Dre18 (Fig. 3C). The Dre4/Hsa12 conserved region contains 38 pairs of orthologous genes, while the Dre18/Hsa12 cluster contains 18 orthologous gene pairs, providing strong support. Thus, the gene traces connect the region on Hsa12 with ARNTL2 to a region on Dre4 without an arntl-related gene (Fig. 3A, orange lines), and they connect a second region on Hsa12, without ARNTL2, to a region on Dre18 that does contain arntl2 (Fig. 3A, green lines). This result poses the question: If Dre4 and Dre18 are paralogons from the R3 duplication event, why do they show syntenic conservation with different regions of Hsa12? One hypothesis to explain these results is that there was an inversion on the ancestral chromosome in the lineage leading to humans after the lobe-finned and ray-finned fish lineages diverged. This inversion event would have separated the two regions we see on modern Hsa12. If we return to the paralogous cluster that linked Hsa11 with Hsa12 (Fig. 2C), we find that several paralogs within that region of Hsa11 connect it to the Hsa12/Dre18 region, including TPH1/TPH2 and CSRP3/CSRP2 on Hsa11 and Hsa12, respectively. Given two regions on Hsa12, one that is orthologous to Dre4 and the other orthologous to Dre18, with both of those regions on Hsa12 paralogous to Hsa11, the architecture suggests that an inversion on ancestral Hsa12 must have occurred that moved ARNTL2 relative to other genes after the lineage leading to humans split from the lineage leading to zebrafish (see Supplemental Fig. 4 for additional evidence supporting an inversion). Furthermore, the strongly conserved region on Dre4 suggests that the fourth zebrafish ARNTL gene (which would have been called arntl2b) is an ohnolog gone missing (Postlethwait 2007). The original position of arntl2b was likely either directly upstream of zebrafish gene si:dkey-207j16.2 or si:ch211-234f20.7 on Dre4 (Fig. 3B), depending on the layout of the ancestral chromosome prior to the transposition event.

Having established good syntenic support showing co-orthologous regions between zebrafish chromosomes 4 and 18 and Hsa12, the last task is to test for paralogy of Dre4 and Dre18, and we show this analysis in the Supplemental material.

In summary, analysis using the Synteny Database suggests the following model (Fig. 3D). A single ancestral ARNTL gene, whose descendant still exists in amphioxus (but does not appear in the genome assembly of Ciona intestinalis), was duplicated in R1. Because only two copies of that gene remain in the human genome (ARNTL and ARNTL2), we infer that the second copy of the ancient ARNTL gene was lost prior to R2. The remaining pair of genes was duplicated again in R3 after the lineage leading to humans split from the lineage leading to teleost fish. Three of these four predicted genes remain in zebrafish today, arntl1a, arntl1b, and arntl2, and a fourth copy was lost, although it was probably located on Dre4 as inferred from orthologies of neighboring genes. These results are consistent with the recent work by Wang (2009).

Lessons the ARNTL study reveals about the functioning of the Synteny Database

Exercising the Synteny Database with the ARNTL gene family in this case study allowed several observations. First, the mRBH Analysis Pipeline worked well to identify the ARNTL paralogous gene groups in both the human and zebrafish genomes. The limits of the power of the RBH methodology, however, were illustrated by its inability to properly assign the zebrafish arntl2 gene to its human ortholog. Second, the Synteny Database had the strength to rectify the reduced ability of the RBH methodology by identifying conserved synteny, not only where reciprocal best hit analysis was strong and all of the expected R2 and R3 duplicate genes were present, but also when RBH evidence was weak and some genes had been lost. In the former case, the database showed clear syntenic conservation for ARNTL and its co-orthologs, arntl1a and arntl1b, and in the latter case, the database was able to buttress the weak evidence from the mRBH pipeline for orthology between the zebrafish arntl2 gene and its human ortholog. Third, the Synteny Database was able to identify the likely location of lost ohnologs, for example, the lost arntl2b gene in zebrafish. Fourth, the Synteny Database identified chromosome rearrangements including inversions, translocations, and transpositions such as the inversion the database identified on Hsa12.

Discussion

In this study, we introduced the Synteny Database: an automated system to identify conserved syntenic regions among sequenced genomes. A unique attribute of this system is that it was designed from the outset to cope with gene duplications, especially whole-genome duplication events. Studies that specifically search for syntenic conservation in support of orthology or paralogy of a particular gene or gene family are often done by hand, and usually use a basic RBH algorithm to infer homology within a region of interest. Because the search for neighboring orthologs or paralogs is laborious and error-prone, the labor involved often limits the number of genes an investigator can reasonably study. The Synteny Database, with its single-linkage clustering algorithm, can identify paralogy for larger groups of genes, providing more targets for conserved areas. In addition, because all orthologous and paralogous relationships are pre-computed, the Synteny Database can rapidly present the results of a comprehensive search for conserved synteny. The power of this approach is evident in the ARNTL case study, in which the automated system was able to identify, first, a region on Dre4 where a member of the ARNTL gene family had been lost during evolution and, second, a transposition on Hsa12 that had moved the syntenically conserved region for ARNTL2 12 Mb upstream on the human chromosome relative to the zebrafish paralogons.

The Synteny Database provides syntenic clusters produced using several different sliding window sizes from 50 to 200 genes. The sliding window method allows the investigator to search for conservation in broad areas using a large window size and, when areas of interest are found, to use a smaller window size to focus on strongly conserved syntenic regions. While the permutation analysis (Fig. 4) showed that all window sizes provided statistically significant results when compared with a randomized distribution, a sliding window size of 50 genes yielded the best results relative to the randomized background.

Figure 4.
A permutation analysis of all syntenic clusters that the Synteny Database found in the human genome using amphioxus as outgroup. We permuted the location of paralogous group members throughout the genome and reclustered the randomized data, repeating ...

One weakness in the mRBH Analysis Pipeline, and in RBH-based algorithms in general, is fallibility when handling substantial evolutionary rate variation among a set of genes. This problem appears when only the domain that defines the gene family remains sufficiently intact to be identified by a BLAST local alignment. The rapidly evolving gene can be assigned to a paralog with the most conserved version of the family domain, rather than the gene with which it shares its preduplication ancestry. In such cases, the analysis of conserved syntenies automated by the Synteny Database can usually provide data that illuminates gene histories.

In this study, we focused on amphioxus, C. intestinalis, human, and zebrafish genomes to examine the ARNTL gene families, but the Synteny Database is also populated with other sequenced genomes, including stickleback, medaka, fugu, and mouse. The Synteny Database can analyze any genome that has been at least partially assembled into scaffolds or a subset of chromosomes and is optimized for the investigation of individual gene families in multiple lineages. Note that the accuracy of the output depends on the accuracy of available genome assemblies. Presently, the human and mouse assemblies are of high quality, and the zebrafish assembly will soon reach this quality. Furthermore, tandem-duplicated regions are often not well assembled, even in the human genome, which can lead to the failure to assemble genes embedded within tandem duplications and apparent gene loss (She et al. 2004). In addition, copy number variation within a species can result in apparent gene duplication or gene loss if the genome sequenced is from a single individual polymorphic for such variants (Sharp et al. 2006; Kidd et al. 2008).

The Synteny Database presents results in an online, searchable database. In addition to the tools used to draw the gene trace images shown in the case study, the Synteny Database provides several uniquely available tools to study the genome-wide distribution of genes, including dotplots and circle plots, which users can customize in a variety of ways. We recently rebuilt the databases for the mRBH Analysis Pipeline and the Synteny Database using data from Ensembl version 52, including the latest releases of the human, mouse, and zebrafish genomes, as well as version 2 of the amphioxus genome. The Synteny Database is available for public use at http://teleost.cs.uoregon.edu/synteny_db/.

Methods

To enumerate regions of the genome that have conserved gene content since the last whole-genome duplication (relative to an unduplicated outgroup), we built two analysis tools, the mRBH Analysis Pipeline, which relies on BLAST (Altschul et al. 1997) to associate homologous genes through a modified reciprocal best hit (RBH) algorithm (Wall et al. 2003), coupled with the Synteny Database, which uses a sliding window analysis to create clusters of paralogous and orthologous genes.

mRBH Analysis Pipeline

The mRBH Analysis Pipeline begins by taking the protein sequence of every gene in the primary genome and performing a BLASTP search against all other proteins in the primary genome. In the case of multiple splice variants, the pipeline performs a search for each transcript. Following the within-primary-genome search, the pipeline conducts a BLAST search using each protein from the primary genome as a query against the outgroup genome, and any sequences found are then used as queries to search back into the primary genome (a retro-BLAST).

The pipeline uses the collected BLAST results to build paralogy groups. Although reciprocal best hit relationships are often used to identify orthologous genes between species (Wall et al. 2003), the RBH method requires modification to identify paralogous genes. Given the paralogs A, B, and C, only two of them can be reciprocal best hits. Allowing for transitivity, however, can accommodate multiple duplication events: if genes A and B are traditional reciprocal best hits, and if gene C's best hit is either A or B and the next best hit of A or B is C, then genes A, B, and C should all be considered reciprocal best hits. The pipeline uses a single-linkage clustering algorithm (Van de Peer 2004), implemented by traversing a directed graph, to achieve this goal. See the Supplemental material for more detail.
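The transitive grouping described above can be illustrated with a minimal Python sketch. It is not the pipeline's implementation: the function name `mrbh_groups`, the `ranked_hits` input (each gene mapped to its within-genome BLAST hits, best first), and the `depth` parameter are assumptions made for illustration. The sketch links a gene to its best hit when the gene appears among that hit's top `depth` hits, then reports connected components as paralogy groups, which simplifies the directed-graph traversal into an undirected one.

```python
from collections import defaultdict

def mrbh_groups(ranked_hits, depth=2):
    """Group genes by a relaxed reciprocal-best-hit rule (illustrative only).

    ranked_hits: dict mapping gene -> list of hit genes, best hit first.
    depth: how far down the partner's hit list the gene may appear
           (2 means "best or next best"); a hypothetical parameter.
    """
    links = defaultdict(set)
    for gene, hits in ranked_hits.items():
        if not hits:
            continue
        best = hits[0]
        # Keep the edge only if the best hit points back within `depth` ranks.
        if gene in ranked_hits.get(best, [])[:depth]:
            links[gene].add(best)
            links[best].add(gene)

    # Single-linkage clustering: each connected component is one group.
    groups, seen = [], set()
    for start in sorted(links):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(links[gene] - component)
        seen |= component
        groups.append(sorted(component))
    return groups
```

With hits such as `{"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}`, gene C joins the A and B pair because C's best hit is A and C is A's next best hit. Genes with no accepted link are left out of the result and would be treated as singletons.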

The mRBH Analysis Pipeline uses WU-BLAST (http://blast.wustl.edu/) with the BLOSUM62 substitution matrix (Henikoff and Henikoff 1992) and records only BLAST hits with an E-value below 1 × 10⁻⁵. We also used a gap opening penalty of 11 and a gap extension penalty of 1. We experimented with different substitution matrices and BLAST parameters, but, given the wide variation in rates of divergence between genes or gene families across the genome, a general approach provided the best results.

Noise reduction

One of the major issues governing the size of the paralog groups that the pipeline builds is how many BLAST hits to make available to the single-linkage clustering algorithm. If a gene has one or more conserved domains, or even if a gene contains weakly conserved motifs, then BLAST will pick up those regions in its search for statistically significant local alignments. Because each BLAST hit is a potential edge in the directed graph, the system must limit those edges to hits that are likely to provide information to infer real paralogy and orthology, not simply a small, well-conserved protein domain. Several heuristic approaches can eliminate noise from BLAST results (Li et al. 2005; Hahn et al. 2007); the mRBH Analysis Pipeline requires that any local alignment (or more accurately, any set of nonoverlapping high-scoring pairs) produced by BLAST between two genes covers at least 50% of the length of the longer of the two genes. Prior to executing the single-linkage clustering algorithm, the pipeline checks every BLAST hit and marks those that do not meet these criteria.
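The coverage criterion can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the function names, the tuple layout of each high-scoring pair (1-based inclusive query and subject coordinates), and the choice to measure coverage along the longer of the two sequences are all assumptions. Overlapping intervals are merged so that no residue is counted twice.

```python
def covered_length(intervals):
    """Count residues covered by 1-based inclusive intervals, merging overlaps."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip residues already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total

def passes_coverage(hsps, query_len, subject_len, min_fraction=0.5):
    """Return True if the HSPs span at least min_fraction of the longer gene.

    hsps: list of (q_start, q_end, s_start, s_end) tuples (assumed layout).
    """
    if query_len >= subject_len:
        intervals = [(qs, qe) for qs, qe, _, _ in hsps]
        longer = query_len
    else:
        intervals = [(ss, se) for _, _, ss, se in hsps]
        longer = subject_len
    return covered_length(intervals) >= min_fraction * longer
```

A hit between a 400-residue query and a 300-residue subject whose alignments cover 201 query residues would pass, whereas a single 60-residue domain match would be marked as noise.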

Outgroup anchoring

Prior to executing outgroup anchoring, the analysis pipeline constructs paralogous groups from the primary genome. The system then checks each member of each group to determine its top BLAST hit in the outgroup genome. If a group member does not have a BLAST hit in the outgroup, the pipeline drops that group member from further consideration. If members of a paralogous group have best BLAST hits to different genes in the outgroup, then the pipeline splits the group, with each subset of the original group being anchored to the appropriate (orthologous) outgroup gene. BLAST hits for outgroup genes are then checked to ensure that the outgroup gene hits the original gene in the primary genome (although it does not have to be the top hit). If an outgroup gene does not retro-BLAST back to a gene in the original paralogy group, then the gene from the primary genome is eliminated from the group. Finally, the system performs the outgroup anchoring analysis on all genes in the primary genome that had not been assigned to a paralogous group, i.e., singletons, to attempt to identify orthologs for all genes. The end result is a series of paralogous gene groups from the primary genome each anchored to a single gene in the outgroup.
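A minimal sketch of the anchoring steps is given below. The function name and the three inputs are hypothetical: `groups` is a list of paralogy groups (singletons can be passed as one-member groups), `forward_hits` maps each primary gene to its ranked outgroup hits, and `retro_hits` maps each outgroup gene to the set of primary genes it hits. The retro-BLAST check here requires the outgroup gene to hit the member itself, which is one reading of the description above.

```python
def anchor_groups(groups, forward_hits, retro_hits):
    """Split paralogy groups by top outgroup hit and apply a retro-BLAST check.

    Returns a list of (outgroup_gene, [primary genes]) pairs.
    All names and data layouts are illustrative assumptions.
    """
    result = []
    for group in groups:
        by_anchor = {}
        for gene in group:
            hits = forward_hits.get(gene, [])
            if not hits:
                continue  # no outgroup hit: drop this member
            anchor = hits[0]
            # The outgroup gene must hit back, though not as its top hit.
            if gene not in retro_hits.get(anchor, ()):
                continue
            by_anchor.setdefault(anchor, []).append(gene)
        # Each distinct anchor yields its own subgroup.
        result.extend(sorted(by_anchor.items()))
    return result
```

In this sketch, subgroups from different original groups that share an outgroup gene are kept separate; whether they should be merged is not specified in the text.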

The Synteny Database

The second analysis pipeline populates the Synteny Database by taking a set of paralogy groups along with its corresponding outgroup genes and searching for conserved syntenic areas within the primary genome and between the primary and outgroup genomes. The algorithm uses a sliding-window analysis, where window size is measured in numbers of contiguous genes. The pipeline places the window on the first gene on the first chromosome and moves this window until it finds a pair of genes, one on each of two chromosomes, that are members of the same paralogy group. It then places the second window at the starting location of the gene on the second chromosome and marks the start of a syntenic cluster. The software then continues to search for paralogous genes located within the space bounded by the two windows. If another gene pair is found, the windows are advanced to the starting positions of the new pair of paralogous genes and the search continues. If the search reaches the tail of either window without finding another pair of paralogous genes, then the pipeline marks the cluster closed and records it. The position of the first window is then reset to the first gene that was not part of the last syntenic cluster, and the search is restarted. The analysis pipeline repeats this process until all paralogous genes on the first and second chromosome have been examined. To identify inverted regions of conserved synteny, the pipeline runs the two windows in opposing directions and again records found clusters. Finally, the analysis pipeline merges clusters from the two analyses that occupy areas on the chromosome within a sliding window's length of one another. The software continues this analysis on every pair of chromosomes in the primary genome—comparing the first and third chromosomes, the first and fourth chromosomes, and so on, coming up with a genome-wide representation of paralogons. 
To identify conserved syntenies between species, the system performs the entire analysis again, this time comparing each chromosome of the primary genome to every chromosome of the outgroup genome. For this study we experimented with four window sizes, 25, 50, 100, and 200 genes in length.
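The two-window search can be approximated with a short greedy sketch. It is a simplification under several assumptions, not the Synteny Database code: chromosomes are plain ordered lists of gene identifiers, `group_of` maps a gene to its paralogy group, a cluster is a chain of index pairs in which each next pair lies within `window` genes of the previous one on both chromosomes, and `min_pairs` is a hypothetical cutoff. The final step that merges forward and inverted clusters is omitted.

```python
from collections import defaultdict

def find_clusters(chrom_a, chrom_b, group_of, window=50, min_pairs=2):
    """Chain paralogous gene pairs that stay within `window` genes of each other."""
    by_group = defaultdict(list)
    for j, gene in enumerate(chrom_b):
        if gene in group_of:
            by_group[group_of[gene]].append(j)
    # Every (i, j) where chrom_a[i] and chrom_b[j] share a paralogy group.
    pairs = sorted(
        (i, j)
        for i, gene in enumerate(chrom_a)
        if gene in group_of
        for j in by_group[group_of[gene]]
    )
    used, clusters = set(), []
    for seed in pairs:
        if seed in used:
            continue
        chain = [seed]
        i, j = seed
        while True:
            # Look ahead inside both windows for the next unused pair.
            candidates = [
                (pi, pj) for pi, pj in pairs
                if (pi, pj) not in used
                and i < pi <= i + window
                and j < pj <= j + window
            ]
            if not candidates:
                break  # reached the window tail: close the cluster
            i, j = min(candidates)
            chain.append((i, j))
        if len(chain) >= min_pairs:
            used.update(chain)
            clusters.append(chain)
    return clusters

def find_inverted_clusters(chrom_a, chrom_b, group_of, window=50, min_pairs=2):
    """Run the second window in the opposite direction by reversing chrom_b."""
    last = len(chrom_b) - 1
    flipped = find_clusters(chrom_a, chrom_b[::-1], group_of, window, min_pairs)
    return [[(i, last - j) for i, j in chain] for chain in flipped]
```

Each returned cluster is a list of `(index on chrom_a, index on chrom_b)` pairs, so its length corresponds to cluster size measured in gene pairs. A genome-wide run would call these functions for every pair of chromosomes.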

Permutation analysis

It is important to question whether paralogons defined by the Synteny Database are the result of a large-scale duplication event or could have originated by chance alone. To examine this question, we performed a permutation analysis to test the statistical significance of observed genomic data. For each primary genome, we took all of the paralogous genes defined by our mRBH Analysis Pipeline and randomized their locations throughout the genome. We then re-executed our clustering algorithm and recorded the results—repeating this process 100 times. For each sliding-window length, we plotted with error bars the average number of clusters of a particular size that were detected after randomizing genomic data (cluster size was measured as the number of gene pairs contained within the cluster). We also plotted the actual number of clusters of a particular size found in our original data.
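The permutation procedure can be sketched as below. The names `permute_genome` and `permutation_profile`, the representation of a genome as a dict of chromosome name to ordered gene list, and the `cluster_fn` callable (any function that takes a genome and returns a list of clusters) are assumptions for illustration. Shuffling all genes across all positions while preserving chromosome lengths is one way to randomize gene locations; the study's exact scheme may differ.

```python
import random
from collections import Counter
from statistics import mean, stdev

def permute_genome(genome, rng):
    """Shuffle gene positions across the genome, keeping chromosome lengths."""
    genes = [gene for chrom in genome.values() for gene in chrom]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for name, chrom in genome.items():
        shuffled[name] = genes[start:start + len(chrom)]
        start += len(chrom)
    return shuffled

def permutation_profile(genome, cluster_fn, replicates=100, seed=0):
    """Mean and standard deviation of cluster counts per cluster size.

    cluster_fn: hypothetical callable, genome -> list of clusters, where
    len(cluster) is the number of gene pairs. Requires replicates >= 2.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(replicates):
        clusters = cluster_fn(permute_genome(genome, rng))
        counts.append(Counter(len(cluster) for cluster in clusters))
    sizes = sorted({size for count in counts for size in count})
    return {
        size: (mean([c[size] for c in counts]), stdev([c[size] for c in counts]))
        for size in sizes
    }
```

The returned dictionary gives, for each cluster size, the values that would be plotted with error bars against the counts observed in the unpermuted data.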

Figure 4 plots the results of a permutation analysis of the human genome with amphioxus as outgroup. As the length of the gene window increased, the pipeline generated larger clusters from the randomized data. With a window size of 25 genes, the largest cluster created from the randomized data contained only three gene pairs. With a window size of 200 genes, however, the simulation generated clusters from randomized data that were as large as any actual cluster produced in the original analysis. A t-test showed, however, that the mean cluster size of our actual data was statistically significantly larger than the mean cluster size of the permuted data for all four sliding window sizes (P-values of 1.7 × 10⁻¹²⁶, 1.0 × 10⁻²³⁹, 2.8 × 10⁻²⁰⁷, and 8.6 × 10⁻⁴¹, for window sizes of 25, 50, 100, and 200 genes, respectively). We conclude that analyses should usually use the 50- or 100-gene windows for most reliable results.

Data sources

For this study, Ensembl (Birney et al. 2004; Kasprzyk et al. 2004) provided data for the Homo sapiens genome, using NCBI v36 obtained from Ensembl version 41; the Danio rerio genome, using Zv7 from the Sanger Institute obtained from Ensembl 46; the Gasterosteus aculeatus genome, using BROAD version S1 obtained from Ensembl 41; the Mus musculus genome, using NCBI version m36 obtained from Ensembl 41; and the Ciona intestinalis genome, using JGI version 2 obtained from Ensembl 43. We also obtained version 1 of the Branchiostoma floridae genome, which was produced by and obtained from the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/).

Acknowledgments

J.C. was supported in part by an IGERT grant from NSF in Evolution, Development, and Genomics (DGE 9972830). This work was supported by grant 5R01RR020833 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health. The contents of this paper are solely the responsibility of the investigators and do not necessarily represent the official views of NCRR or NIH.

Footnotes

[Supplemental material is available online at www.genome.org. The Synteny Database is freely available at http://teleost.cs.uoregon.edu/synteny_db/.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.090480.108.

References

  • Allendorf FW, Thorgaard GH. Tetraploidy and the evolution of salmonid fishes. In: Turner B, editor. The evolutionary genetics of fishes. Plenum Publishing; New York: 1984. pp. 1–53.
  • Altschmied J, Delfgaauw J, Wilde B, Duschl J, Bouneau L, Volff J-N, Schartl M. Subfunctionalization of duplicate mitf genes associated with differential degeneration of alternative exons in fish. Genetics. 2002;161:259–267. [PMC free article] [PubMed]
  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI–BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
  • Amores A, Force A, Yan Y-L, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang Y-L, et al. Zebrafish hox clusters and vertebrate genome evolution. Science. 1998;282:1711–1714. [PubMed]
  • Birney E, Andrews TD, Bevan P, Caccamo M, Chen Y, Clarke L, Coates G, Cuff J, Curwen V, Cutts T, et al. An overview of Ensembl. Genome Res. 2004;14:925–928. [PMC free article] [PubMed]
  • Blair JE, Hedges SB. Molecular phylogeny and divergence times of deuterostome animals. Mol Biol Evol. 2005;22:2275–2284. [PubMed]
  • Bridgham JT, Brown JE, Rodríguez-Marí A, Catchen JM, Thornton JW. Evolution of a new function by degenerative mutation in cephalochordate steroid receptors. PLoS Genet. 2008;4:e1000191. doi: 10.1371/journal.pgen.1000191. [PMC free article] [PubMed] [Cross Ref]
  • Cermakian N, Whitmore D, Foulkes NS, Sassone-Corsi P. Asynchronous oscillations of two zebrafish clock partners reveal differential clock control and function. Proc Natl Acad Sci. 2000;97:4339–4344. [PMC free article] [PubMed]
  • Chain F, Ilieva D, Evans B. Duplicate gene evolution and expression in the wake of vertebrate allopolyploidization. BMC Evol Biol. 2008;8:43. doi: 10.1186/1471-2148-8-43. [PMC free article] [PubMed] [Cross Ref]
  • Conant GC, Wolfe KH. Turning a hobby into a job: How duplicated genes find new functions. Nat Rev Genet. 2008;9:938–950. [PubMed]
  • David L, Blum S, Feldman MW, Lavi U, Hillel J. Recent duplication of the common carp (Cyprinus carpio L.) genome as revealed by analyses of microsatellite loci. Mol Biol Evol. 2003;20:1425–1434. [PubMed]
  • Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:1700–1708. [PMC free article] [PubMed]
  • Force A, Lynch M, Pickett FB, Amores A, Yan Y, Postlethwait J. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999;151:1531–1545. [PMC free article] [PubMed]
  • Gallardo MH, Bickham JW, Honeycutt RL, Ojeda RA, Köhler N. Discovery of tetraploidy in a mammal. Nature. 1999;401:341. doi: 10.1038/43815. [PubMed] [Cross Ref]
  • Garcia-Fernàndez J, Holland PWH. Archetypal organization of the amphioxus Hox gene cluster. Nature. 1994;370:563–566. [PubMed]
  • Gekakis N, Staknis D, Nguyen HB, Davis FC, Wilsbacher LD, King DP, Takahashi JS, Weitz CJ. Role of the CLOCK protein in the mammalian circadian mechanism. Science. 1998;280:1564–1569. [PubMed]
  • Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003;52:696–704. [PubMed]
  • Hahn MW, Han MV, Han S-G. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:e197. doi: 10.1371/journal.pgen.0030197. [PMC free article] [PubMed] [Cross Ref]
  • Henikoff S, Henikoff J. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci. 1992;89:10915–10919. [PMC free article] [PubMed]
  • Hogenesch JB, Gu Y, Moran SM, Shimomura K, Radcliffe LA, Takahashi JS, Bradfield CA. The basic helix-loop-helix-pas protein MOP9 is a brain-specific heterodimeric partner of circadian and hypoxia factors. J Neurosci. 2000;20:RC83. http://www.jneurosci.org/cgi/content/full/20004296. [PubMed]
  • Ikeda M, Nomura M. cDNA cloning and tissue-specific expression of a novel basic helix–loop–helix/pas protein (BMAL1) and identification of alternatively spliced variants with alternative translation initiation site usage. Biochem Biophys Res Commun. 1997;233:258–264. [PubMed]
  • Ishikawa T, Hirayama J, Kobayashi Y, Todo T. Zebrafish CRY represses transcription mediated by CLOCK-BMAL heterodimer without inhibiting its binding to DNA. Genes Cells. 2002;7:1073–1086. [PubMed]
  • Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957. [PubMed]
  • Jarinova O, Hatch G, Poitras L, Prudhomme C, Grzyb M, Aubin J, Bérubé-Simard F-A, Jeannotte L, Ekker M. Functional resolution of duplicated hoxb5 genes in teleosts. Development. 2008;135:3543–3553. [PubMed]
  • Jovelin R, He X, Amores A, Yan Y, Shi R, Qin B, Roe B, Cresko WA, Postlethwait JH. Duplication and divergence of fgf8 functions in teleost development and evolution. J Exp Zool B Mol Dev Evol. 2007;308:730–743. [PubMed]
  • Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E, et al. Ensmart: A generic system for fast and flexible access to biological data. Genome Res. 2004;14:160–169. [PMC free article] [PubMed]
  • Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56–64. [PMC free article] [PubMed]
  • Kikuta H, Laplante M, Navratilova P, Komisarczuk AZ, Engström PG, Fredman D, Akalin A, Caccamo M, Sealy I, Howe K, et al. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates. Genome Res. 2007;17:545–555. [PMC free article] [PubMed]
  • Larhammar D, Risinger C. Molecular genetic aspects of tetraploidy in the common carp Cyprinus carpio. Mol Phylogenet Evol. 1994;3:59–68. [PubMed]
  • Li W-H. Rate of gene silencing at duplicate loci: A theoretical study and interpretation of data from tetraploid fishes. Genetics. 1980;95:237–258. [PMC free article] [PubMed]
  • Li W-H, Gu Z, Cavalcanti AR, Nekrutenko A. Detection of gene duplications and block duplications in eukaryotic genomes. J Struct Funct Genomics. 2005;3:27–34. [PubMed]
  • Lynch M, Force A. The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000;154:459–473. [PMC free article] [PubMed]
  • Mungpakdee S, Seo H-C, Angotzi AR, Dong X, Akalin A, Chourrout D. Differential evolution of the 13 Atlantic salmon Hox clusters. Mol Biol Evol. 2008a;25:1333–1343. [PubMed]
  • Mungpakdee S, Seo H-C, Chourrout D. Spatio-temporal expression patterns of anterior Hox genes in Atlantic salmon (Salmo salar). Gene Expr Patterns. 2008b;8:508–514. [PubMed]
  • Nakatani Y, Takeda H, Kohara Y, Morishita S. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 2007;17:1254–1265. [PMC free article] [PubMed]
  • Naruse K, Tanaka M, Mita K, Shima A, Postlethwait J, Mitani H. A medaka gene map: The trace of ancestral vertebrate proto-chromosomes revealed by comparative gene mapping. Genome Res. 2004;14:820–828. [PMC free article] [PubMed]
  • Ohno S. Evolution by gene duplication. Springer-Verlag; Berlin, Germany: 1970.
  • Pando MP, Sassone-Corsi P. Unraveling the mechanisms of the vertebrate circadian clock: Zebrafish may light the way. BioEssays. 2002;24:419–426. [PubMed]
  • Philippe H, Lartillot N, Brinkmann H. Multigene analyses of bilaterian animals corroborate the monophyly of ecdysozoa, lophotrochozoa, and protostomia. Mol Biol Evol. 2005;22:1246–1253. [PubMed]
  • Postlethwait JH. The zebrafish genome in context: Ohnologs gone missing. J Exp Zool B Mol Dev Evol. 2007;308:563–577. [PubMed]
  • Postlethwait JH, Yan Y-L, Gates MA, Horne S, Amores A, Brownlie A, Donovan A, Egan ES, Force A, Gong Z, et al. Vertebrate genome evolution and the zebrafish gene map. Nat Genet. 1998;18:345–349. [PubMed]
  • Postlethwait JH, Amores A, Yan Y, Austin C. Duplication of a portion of human chromosome 20q containing topoisomerase (top1) and snail genes provide evidence on genome expansion and the radiation of teleost fish. In: Shimizu N, et al., editors. Aquatic genomics. Springer-Verlag; Tokyo, Japan: 2002. pp. 20–34.
  • Postlethwait J, Amores A, Cresko W, Singer A, Yan Y-L. Subfunction partitioning, the teleost radiation and the annotation of the human genome. Trends Genet. 2004;20:481–490. [PubMed]
  • Risinger C, Larhammar D. Multiple loci for synapse protein SNAP-25 in the tetraploid goldfish. Proc Natl Acad Sci. 1993;90:10598–10602. [PMC free article] [PubMed]
  • Satou Y, Imai KS, Levine M, Kohara Y, Rokhsar D, Satoh N. A genomewide survey of developmentally relevant genes in Ciona intestinalis I. Genes for bHLH transcription factors. Dev Genes Evol. 2003;213:213–221. [PubMed]
  • Schmid M, Steinlein C. Chromosome banding in Amphibia. XVI. High-resolution replication banding patterns in Xenopus laevis. Chromosoma. 1991;101:123–132. [PubMed]
  • Sharp AJ, Hansen S, Selzer RR, Cheng Z, Regan R, Hurst JA, Stewart H, Price SM, Blair E, Hennekam RC, et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat Genet. 2006;38:1038–1042. [PubMed]
  • She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, Church DM, Sutton G, Halpern AL, Eichler EE, et al. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 2004;431:927–930. [PubMed]
  • Spring J. Vertebrate evolution by interspecific hybridization—are we polyploid? FEBS Lett. 1997;400:2–8. [PubMed]
  • Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y. Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 2003;13:382–390. [PMC free article] [PubMed]
  • Uyeno T, Smith GR. Tetraploid origin of the karyotype of catostomid fishes. Science. 1972;175:644–646. [PubMed]
  • Van de Peer Y. Computational approaches to unveiling ancient genome duplications. Nat Rev Genet. 2004;5:752–763. [PubMed]
  • Van de Peer Y, Taylor JS, Meyer A. Are all fishes ancient polyploids? J Struct Funct Genomics. 2003;3:65–73. [PubMed]
  • Wall DP, Fraser HB, Hirsh AE. Detecting putative orthologs. Bioinformatics. 2003;19:1710–1711. [PubMed]
  • Wang H. Comparative genomic analysis of teleost fish bmal genes. Genetica. 2009;136:149–161. [PubMed]
  • Watterson GA. On the time for gene silencing at duplicate loci. Genetics. 1983;105:745–766. [PMC free article] [PubMed]
  • Winkler C, Schäfer M, Duschl J, Schartl M, Volff J-N. Functional divergence of two zebrafish midkine growth factors following fish-specific gene duplication. Genome Res. 2003;13:1067–1081. [PMC free article] [PubMed]
  • Wolfe K. Robustness—it's not where you think it is. Nat Genet. 2000;25:3–4. [PubMed]
  • Woods IG, Kelly PD, Chu F, Ngo-Hazelett P, Yan Y-L, Huang H, Postlethwait JH, Talbot WS. A comparative map of the zebrafish genome. Genome Res. 2000;10:1903–1914. [PMC free article] [PubMed]
  • Woods IG, Wilson C, Friedlander B, Chang P, Reyes DK, Nix R, Kelly PD, Chu F, Postlethwait JH, Talbot WS, et al. The zebrafish gene map defines ancestral vertebrate chromosomes. Genome Res. 2005;15:1307–1314. [PMC free article] [PubMed]
  • Zhang J. Parallel functional changes in the digestive rnases of ruminants and colobines by divergent amino acid substitutions. Mol Biol Evol. 2003;20:1310–1317. [PubMed]
  • Zhang J, Zhang Y, Rosenberg HF. Adaptive evolution of a duplicated pancreatic ribonuclease gene in a leaf-eating monkey. Nat Genet. 2002;30:411–415. [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press