Evolutionary Change of the Numbers of Homeobox Genes in Bilateral Animals

Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University
E-mail: jnam/at/caltech.edu.
The publisher's final edited version of this article is available free at Mol Biol Evol.

Abstract
It is known that the conservation or diversity of homeobox genes is responsible for the similarity and variability of some of the morphological or physiological characters among different organisms. To gain some insight into the evolutionary pattern of homeobox genes in bilateral animals, we studied the change of the numbers of these genes during the evolution of bilateral animals. We analyzed 2,031 homeodomain sequences compiled from 11 species of bilateral animals ranging from Caenorhabditis elegans to humans. Our phylogenetic analysis using a modified reconciled-tree method suggested that there were at least about 88 homeobox genes in the common ancestor of bilateral animals. About 50–60 of these genes have left at least one descendant gene in each of the 11 species studied, suggesting that about 30–40 genes were lost in a lineage-specific manner. Although similar numbers of ancestral genes have survived in each species, vertebrate lineages gained many more genes by duplication than invertebrate lineages, resulting in more than 200 homeobox genes in vertebrates and about 100 in invertebrates. After these gene duplications, a substantial number of old duplicate genes have also been lost in each lineage. Because many old duplicate genes were lost, it is likely that lost genes had already been differentiated from other groups of genes at the time of gene loss. We conclude that both gain and loss of homeobox genes were important for the evolutionary change of phenotypic characters in bilateral animals.
Keywords: homeobox genes, molecular evolution, gene duplication, gene loss, evolutionary developmental biology

Introduction
Homeobox genes that regulate morphogenesis were first discovered by Garber, Kuroiwa, and Gehring (1983) and Scott et al. (1983) in Drosophila melanogaster (fruitfly). Subsequent studies of homeobox genes in fruitflies, frogs, and humans revealed a highly conserved motif of about 180 bp called the homeobox (McGinnis et al. 1984; Scott and Weiner 1984). The homeobox encodes a DNA-binding domain called the homeodomain. In the genomes of animals and plants, homeobox genes form a large transcription factor gene family, with more than 200 genes in humans and about 80 genes in Arabidopsis. Animal homeobox genes were previously classified into about 30 different groups or families based on their sequence similarity and protein domain structure (Burglin 1994). Additional groups of homeobox genes were identified later (e.g., PBC, MEIS, PKNOX/PREP, TGIF, and IRO; Burglin 1997), and now the homeobox genes in animals can be classified into at least 49 different gene families (see Burglin 2005, for a more detailed classification). Member genes of the same family are often functionally related, and different families of homeobox genes are concerned with different aspects of development (Burglin 1994). For example, the genes of the HOX, CDX, and EVX families and their cognate genes play important roles in different steps of body pattern formation during early embryogenesis of animals. The PAX6, SIX, VAX, and EMX gene families are concerned with the development of eyes, whereas the LIM and HMX gene families are important in the development of neurons (reviewed in Duboule 1994). Because of their important roles in development, homeobox genes have been studied extensively by both developmental and evolutionary biologists.
Homeobox genes are generally highly conserved and control similar phenotypic characters among distantly related organisms (reviewed in De Robertis 1994). However, they are also responsible for controlling different phenotypic characters among relatively closely related species (e.g., Galant and Carroll 2002; Ronshaugen, N. McGinnis, and W. McGinnis 2002). The formation of similar phenotypic characters can be explained by the conservation of shared homeobox genes. By contrast, different phenotypic characters are believed to be generated by duplication of homeobox genes and their functional differentiation. It has also been hypothesized that the loss of some homeobox genes is responsible for morphological differentiation (Ruddle et al. 1994). Therefore, it is interesting to study the pattern of duplication and loss of homeobox genes to gain some insight into the evolutionary change of phenotypic characters. It is likely that the number of homeobox genes is related to the complexity of organisms. Although the patterns of gain and loss of homeobox genes belonging to some families have been studied (e.g., Zhang and Nei 1996; Aparicio et al. 1997; Kappen 2000; Wada et al. 2003; Amores et al. 2004; Edvardsen et al. 2005), no one appears to have studied the gain and loss of the entire set of homeobox genes covering diverse bilateral animals. We therefore decided to study the evolutionary change of the homeobox gene superfamily by examining 11 completely or nearly completely sequenced genomes from bilateral animals. For this purpose, we used a modified version of the reconciled-tree method (Goodman et al. 1979; Page and Charleston 1997) that takes into account the ambiguity of the gene tree. Although our estimates are crude and conservative, we can still obtain a rough picture of the gain and loss of homeobox genes and their significance for morphological evolution. The method used and the results obtained are presented in this paper.
Materials and Methods

Identification of Homeodomain-Containing Proteins
To find homeodomain-containing proteins, we performed a homology search using Psi-Blast (Altschul et al. 1997) against the entire set of annotated proteins of Caenorhabditis elegans, Caenorhabditis briggsae, mosquito (Anopheles gambiae), fruitfly (D. melanogaster), tunicate (Ciona intestinalis), zebrafish (Danio rerio), pufferfish (Fugu rubripes), frog (Xenopus tropicalis), rat, mouse, and humans. All sequence data except for the tunicate were downloaded from ENSEMBL (ftp://ftp.ensembl.org) as of February 21, 2005. The tunicate data set (version 1) was downloaded from the Joint Genome Institute (http://genome.jgi-psf.org/). We used 207 homeodomain sequences from animals, plants, and fungi as queries, with an E value ≤ 10⁻⁵ (see Supplementary Material online). We also searched for homeobox genes in the expressed sequence tag (EST) database of the tunicate from the DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/) because Wada et al. (2003) reported several unannotated homeobox genes from the EST database of this organism.

Phylogenetic Analysis
Because the homeodomain is the only alignable region between different groups of homeodomain-containing proteins, we used only this domain (≈60 aa) for phylogenetic analysis. The homeodomain sequences were aligned against the alignment of 207 query sequences (seed alignments) using the profile alignment of the ClustalX program (Thompson et al. 1997). We then constructed a neighbor-joining tree (Saitou and Nei 1987) using the computer program NJBOOT (Takezaki, Rzhetsky, and Nei 1995) with the pairwise deletion option, proportional amino acid differences (p-distances), and 1,000 bootstrap resamplings (Nei and Kumar 2000). Because of the large number of sequences used, other tree construction methods such as maximum-parsimony and maximum-likelihood methods were not used.
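The distance measure named above, the p-distance with pairwise deletion, can be illustrated with a minimal sketch. This is not the NJBOOT implementation; the function names, the toy sequences, and the decision to treat "-", "?", and "X" as missing sites are assumptions made only for illustration.

```python
from itertools import combinations

# Assumption: these symbols mark sites to skip under pairwise deletion.
MISSING = {"-", "?", "X"}


def p_distance(seq_a, seq_b):
    """Proportion of differing amino acids among sites present in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Pairwise deletion: drop a site only for the pair in which it is missing.
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return different / compared


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of aligned sequences."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical 10-residue fragments, not real homeodomain sequences.
toy_alignment = {
    "geneA": "RRKRTAYTRY",
    "geneB": "RRKRTSYTRY",
    "geneC": "RR-RTSFTRY",
}
matrix = distance_matrix(toy_alignment)
for pair, distance in matrix.items():
    print(pair, round(distance, 3))
```

With pairwise deletion, the gapped site in geneC is excluded only from comparisons that involve geneC, so the geneA and geneB comparison still uses all ten sites. A matrix of this kind is the input that a neighbor-joining program would take.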
Each homeobox gene was assigned to one of the 49 previously defined groups according to the sequence similarity and protein domain structure. Domain structure was examined by using the computer program HMMER (Eddy 2001) for each protein domain profile downloaded from Pfam (http://pfam.wustl.edu/). A phylogenetic tree for the 49 families of genes was constructed to find their evolutionary relationships.

Estimation of the Number of Genes in the Ancestral Species
When there is a rooted tree of m species, the tree has m − 1 ancestral nodes or species (Nei and Kumar 2000). We are interested in estimating the number of homeobox genes in each of the ancestral species and how the number has changed in the evolutionary process. This can be studied by comparing the species tree with the gene tree for a given set of genes and constructing a reconciled tree (Goodman et al. 1979; Page and Charleston 1997). In this paper, we use a modified version of this reconciled-tree method in which multifurcating branching patterns are taken into account. For simplicity, let us consider the tree of three species α, β, and γ in figure 1A
A similar inference indicates that the ancestral species also contained three genes, i.e., two genes in group I, one gene in group II, and zero genes in group III (fig. 1C

In the above estimation of the number of genes, we assumed that the gene tree is correct. In practice, however, some interior branches are often weakly supported in terms of bootstrap values. For example, one interior branch may have a low bootstrap value (<50% in the present study). In this case, the existence of this branch is questionable, so that the length of this branch is reduced to zero, and a condensed tree (Nei and Kumar 2000) is constructed (fig. 1D . In the case of tree F, the reconciled tree is given by figure 1H are given in figure 1I

When there are several branches with low bootstrap values, the numbers of genes in ancestral species are estimated by the same procedure as above under the principle of parsimony. Therefore, one can estimate the number of genes for any number of ancestral species. Obviously, the number of genes estimated would be minimal, but because homeobox genes evolve very slowly, the present method appears to give reasonably good estimates (see below). When m is large, the computation can be quite complicated, and we have developed a computer program (available on request from J.N.).

Results

Number of Homeobox Genes in the Genome
Table 1 shows the numbers of nonredundant homeobox genes obtained from the annotated gene sets of 11 species. The majority of the homeobox genes encode only one homeodomain (single-homeobox genes), but some encode more than one domain (multihomeobox genes). The number of homeodomains encoded by a multihomeobox gene was less than 10 with some exceptions. All vertebrate species studied (pufferfish, zebrafish, frog, mouse, rat, and humans) had about 200 or more homeobox genes, and all invertebrate species (C. elegans, C. briggsae, fruitfly, mosquito, and tunicate) had about 100 or fewer homeobox genes.
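The reconciled-tree counting described under Estimation of the Number of Genes in the Ancestral Species can be made concrete with a small sketch. It is not the authors' program: it implements only the standard reconciliation of a strictly binary gene tree with a binary species tree, and it does not reproduce the modification for multifurcating (condensed) trees. The nested-tuple tree encoding, the "species_index" leaf naming, the function names, and the convention that a duplication mapped to a species node is placed on the branch leading to that node are all assumptions made for illustration.

```python
def index_species_tree(tree, parent_of):
    """Record clade -> parent clade in parent_of and return this subtree's clade."""
    if isinstance(tree, str):
        return frozenset([tree])
    child_clades = [index_species_tree(child, parent_of) for child in tree]
    clade = frozenset().union(*child_clades)
    for child in child_clades:
        parent_of[child] = clade
    return clade


def reconcile(gene_tree, species_tree):
    """Return (gene counts, duplication counts) for every species-tree node."""
    parent_of = {}
    root = index_species_tree(species_tree, parent_of)
    clades = list(parent_of) + [root]
    genes = {clade: 0 for clade in clades}
    dups = {clade: 0 for clade in clades}

    def lca(species):
        # Smallest clade of the species tree that contains all given species.
        return min((c for c in clades if species <= c), key=len)

    def visit(node):
        if isinstance(node, str):
            # Assumed leaf format: "<species>_<index>", e.g. "alpha_1".
            here = frozenset([node.split("_")[0]])
            genes[here] += 1
            return here
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # Duplication if the node maps to the same species node as a child.
        is_dup = here in child_maps
        if is_dup:
            dups[here] += 1
        else:
            genes[here] += 1  # a speciation node is one gene in that ancestor
        for cur in child_maps:
            # A lineage that skips species nodes was present in each of them,
            # which implies a loss on the other descending branch.
            while cur != here:
                cur = parent_of[cur]
                if cur != here or is_dup:
                    genes[cur] += 1
        return here

    visit(gene_tree)

    def label(clade):
        return "+".join(sorted(clade))

    return ({label(c): n for c, n in genes.items()},
            {label(c): n for c, n in dups.items()})


# Hypothetical example: three species and one small gene family.
species_tree = (("alpha", "beta"), "gamma")
gene_tree = (("alpha_1", "gamma_1"), ("beta_1", "gamma_2"))
gene_counts, dup_counts = reconcile(gene_tree, species_tree)
print(gene_counts)
print(dup_counts)
```

In this toy case the gene tree disagrees with the species tree, so the most parsimonious reading is one duplication before the deepest split, two genes in both ancestral species, and one later loss each in alpha and beta. The counts are minimum estimates in the same sense as in the text, because duplicates that left no descendant in any sampled species are invisible to the method.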
All the sequences used are presented as Supplementary Material online (see file 1).
Evolutionary Relationships of Different Families of Homeobox Genes
The majority of the homeobox genes were assigned to the 49 previously defined groups or families. The remaining homeobox genes were either highly divergent or multihomeobox genes. The list of the genes in each of the 49 groups and the multihomeobox genes is available from the Supplementary Material online. Figure 2
At least 13 groups of homeobox genes (gene groups with orange boxes in fig. 2

Figure 2

Evolutionary Change of the Number of Homeobox Genes in Bilateral Animals
Knowing that there were already many homeobox genes in the MRCA of all the 11 species (archi-MRCA), we estimated the numbers of homeobox genes in all ancestral organisms and their increase and decrease in different stages of the evolution of bilateral animals. We constructed a phylogenetic tree of 2,031 homeodomain sequences compiled from single- and multihomeodomain–containing proteins in relation to the species tree (fig. 3
To check the reliability of our estimates, we first analyzed HOX family genes. The numbers of ancestral HOX genes at several evolutionary time points have already been estimated by several researchers (e.g., Holland and Garcia-Fernandez 1996; Zhang and Nei 1996; Stellwag 1999; Wada et al. 2003). In the case of HOX genes, estimation of the numbers of ancestral genes is relatively easy because information about the conserved genomic locations of HOX genes can also be used for reconstructing the ancestral states. We compared our estimates of the numbers of ancestral genes with the previous estimates, assuming that the previous estimates are correct (fig. 3A

Increase of Homeobox Genes in the Evolutionary Process
Keeping in mind this possibility of underestimation, we estimated the numbers of ancestral genes and the numbers of genes lost and gained for the entire homeobox gene superfamily. Figure 3B

After the divergence of coelomates and pseudocoelomates, the number of homeobox genes increased almost threefold in the vertebrate lineages. In invertebrates, however, the increase was small or moderate, and our results suggest that the number of homeobox genes did not merely increase during the evolutionary process, but the number sometimes decreased. For example, the MRCA of insects and vertebrates had at least 118 homeobox genes, but fruitflies have 102 at present. Tunicates also have fewer homeobox genes than the MRCA of tunicates and vertebrates. In the case of vertebrate lineages, the number of genes increased primarily in two time periods, that is, the early stages of coelomate evolution (between nodes α and β in fig. 3B

The Ecdysozoa hypothesis (e.g., Aguinaldo et al. 1997; H. Dopazo and J. Dopazo 2005) proposes that insects are more closely related to nematodes than to vertebrates. We therefore examined the numbers of ancestral genes for all the MRCAs of the species tree (fig. 4

The Coelomata and the Ecdysozoa hypotheses are quite controversial now.
However, the studies based on a large number of nuclear protein sequences (e.g., Blair et al. 2002; Wolf, Rogozin, and Koonin 2004; Philip, Creevey, and McInerney 2005) generally support the Coelomata tree. It should also be noted that in our data set the number of gains and losses of genes in the entire evolutionary process is considerably smaller (more parsimonious) for the Coelomata tree than for the Ecdysozoa tree (figs. 3

Retention and Loss of Ancestral Homeobox Genes in Each Species
It is interesting to know how many gene families of the archi-MRCA have left descendant genes in the 11 species and how many gene families have been lost during this evolutionary period. Figure 2

We also studied the numbers of ancestral homeobox genes lost during the time period from the archi-MRCA to the present species (table 2). Let us consider figure 1C
Table 2 shows that the invertebrate lineages lost about 30–38 genes of the 88 ancestral genes in the archi-MRCA, whereas the vertebrate lineages lost about 25–28 genes during the same period. This suggests that invertebrates lost somewhat more genes than vertebrates. However, the difference is much smaller than that observed in full-genome analyses, which suggest that at least about twice as many gene losses occurred in the lineage leading to C. elegans as in the lineage leading to humans (Hughes and Friedman 2004; Koonin et al. 2004; Ogura, Ikeo, and Gojobori 2005). Similarly, the numbers of genes lost from the ancestor β to insects and tunicates were somewhat higher than those in the vertebrate lineages (see column 3 in table 2). In vertebrates, the numbers of lost genes are more or less the same for each MRCA. However, because the major increase of gene number occurred in the early stage of vertebrate evolution (between γ and δ), fishes lost somewhat smaller numbers of genes than other vertebrates. These results suggest that the degree of gene loss varies significantly among different families of homeobox genes, but it is not so different among different species. When the Ecdysozoa tree was used, the number of genes lost from the archi-MRCA was more than two times greater than that for the Coelomata tree (table 3). This indicates that the Coelomata tree is much more parsimonious than the Ecdysozoa tree. Therefore, our data support the former tree, as mentioned earlier.
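The loss figures quoted above translate directly into the numbers of ancestral genes retained. The following short sketch does that subtraction, using only the rounded ranges stated in the text (88 ancestral genes; about 30–38 lost in invertebrate lineages and about 25–28 in vertebrate lineages). The variable and function names are illustrative assumptions, and the per-species values in table 2 are not reproduced here.

```python
ANCESTRAL_GENES = 88  # minimum estimate for the archi-MRCA (Coelomata tree)

# Approximate (fewest, most) ancestral genes lost, as quoted in the text.
lost_ranges = {
    "invertebrate lineages": (30, 38),
    "vertebrate lineages": (25, 28),
}


def retained_range(ancestral, lost_range):
    """Return (fewest, most) ancestral genes still represented by a descendant."""
    fewest_lost, most_lost = lost_range
    return ancestral - most_lost, ancestral - fewest_lost


retained = {
    group: retained_range(ANCESTRAL_GENES, lost)
    for group, lost in lost_ranges.items()
}
for group, (low, high) in retained.items():
    print(f"{group}: about {low}-{high} of {ANCESTRAL_GENES} ancestral genes retained")
```

The resulting ranges, roughly 50–58 for invertebrates and 60–63 for vertebrates, are in line with the statement in the abstract that about 50–60 ancestral genes have left at least one descendant in each species.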
Discussion
In this study, we showed that there were at least 88 homeobox genes in the archi-MRCA of bilateral animals when the Coelomata tree was used. Previously, we mentioned that our statistical method would give minimum estimates of the numbers of ancestral genes. However, our estimate of the total number of genes in the archi-MRCA is close to the current number of genes in nematodes and insects. This is also true of the number in each gene family. These observations suggest that our estimates may not be too far off from the true numbers. Furthermore, the similarity of the estimates for the archi-MRCA and those for nematodes and insects suggests that the archi-MRCA had the same degree of phenotypic complexity as that of current nematodes or insects. Because vertebrates gained more homeobox genes than invertebrates, it appears that this increase in the number of homeobox genes is responsible for the formation of more complex characters in vertebrates than in invertebrates. We have also seen that many homeobox genes have been lost in the process of evolution of phenotypic characters. This loss of homeobox genes might have been either inactivation of redundant genes after gene duplication or loss of functionally differentiated genes (Ruddle et al. 1994; Wagner, Amemiya, and Ruddle 2003). The genes lost in our study are fairly old duplicate genes, and therefore it is likely that the genes lost were already functionally differentiated from their paralogous genes at the time of gene loss. This raises the question of why genes could be lost so often. There are at least three possible explanations. First, without closely related paralogous genes, functional redundancy can be achieved by so-called distributed robustness (reviewed in Wagner 2005). In other words, loss (or mutation) of a homeobox gene can be buffered by the rewiring of functionally different parts of the regulatory network.
If so, it is possible that losses of homeobox genes might not have caused any noticeable changes of phenotypes. Second, it is also possible that gene loss occasionally has had beneficial effects. For example, loss of genes may be related to the reduction of unused characters. Third, the phenotypic changes caused by the loss of homeobox genes might have been more or less neutral with respect to fitness. In the case of multifunctional genes, this is possible if the critical functions are shared by duplicate genes. The gain and loss of homeobox genes are probably initially opportunistic, but these events may change the evolutionary courses of different organisms. However, the possible causes of gene loss mentioned above are speculative, and more detailed studies are needed to identify the real reason.

Supplementary Material
Supplementary files are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
We thank David Geiser, Claude dePamphilis, Hong Ma, Jan Klein, Junyul Kim, Li Hao, Yoshi Suzuki, and Kerstin Kaufmann for valuable comments on an earlier version of the manuscript. We also thank Thomas Bürglin for sending his up-to-date classification of homeobox genes. This study was supported by the National Institutes of Health Grant GM20293 to M.N.

Footnotes
William Martin, Associate Editor