Copyright © 2003, The National Academy of Sciences

Evolution

Evolution of olfactory receptor genes in the human genome

Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, 328 Mueller Laboratory, University Park, PA 16802

* To whom correspondence should be addressed. E-mail: nxm2/at/psu.edu.

Contributed by Masatoshi Nei, August 11, 2003

Abstract

Olfactory receptor (OR) genes form the largest known multigene family in the human genome. To obtain some insight into their evolutionary history, we have identified the complete set of OR genes and their chromosomal locations from the latest human genome sequences. We detected 388 potentially functional genes that have intact ORFs and 414 apparent pseudogenes. The number and the fraction (48%) of functional genes are considerably larger than the ones previously reported. The human OR genes can clearly be divided into class I and class II genes, as was previously noted. Our phylogenetic analysis has shown that the class II OR genes can further be classified into 19 phylogenetic clades supported by high bootstrap values. We have also found that there are many tandem arrays of OR genes that are phylogenetically closely related. These genes appear to have been generated by tandem gene duplication. However, the relationships between genomic clusters and phylogenetic clades are very complicated. There are a substantial number of cases in which the genes in the same phylogenetic clade are located on different chromosomal regions. In addition, OR genes belonging to distantly related phylogenetic clades are sometimes located very closely in a chromosomal region and form a tight genomic cluster. These observations can be explained by the assumption that several chromosomal rearrangements have occurred at the regions of OR gene clusters and the OR genes contained in different genomic clusters are shuffled.
Olfaction, the sense of smell, is important for mammals to find food, identify mates and offspring, and avoid danger. Mammalian olfactory systems can discriminate between thousands of different odorant molecules in the environment. These odorant molecules are detected by olfactory receptors (ORs), which are encoded by the largest multigene family in mammals. OR genes were first characterized in rats (1), and have been identified in various vertebrates, from lampreys to humans (reviewed in refs. 2–5). ORs are G protein-coupled receptors that have seven α-helical transmembrane regions and trigger a signaling cascade. Mammalian OR genes are expressed mainly in sensory neurons of the olfactory epithelium in nasal cavities. It is generally believed that each olfactory neuron expresses only one OR gene (6, 7), but the underlying mechanism is still unknown (8). Some mammalian OR genes are expressed in spermatogenic cells, and a recent study indicates that they have a function in sperm chemotaxis (9, 10).

OR genes are ≈310 codons long and contain no introns in the coding region. This property facilitates the identification of OR genes from genome sequences. A total of 906 OR genes and pseudogenes were identified from draft human genome sequences and other databases by homology search (11). They were distributed on different chromosomes and typically found in clusters, although there were some singletons. Statistical analysis suggested that at least 63% of them are pseudogenes. The number of intact (potentially functional) OR genes in the human genome was reported as 322 (11) or 347 (12). In mice, 1,296–1,393 functional OR genes and pseudogenes were detected from draft genome sequences (13, 14). The fraction of pseudogenes in the mouse genome is much lower than that in the human genome and has been estimated to be ≈20%. This observation is likely to reflect the difference in the importance of olfaction between humans and mice (15, 16). The catfish genome is believed to contain ≈100 OR genes (17).
On the basis of sequence similarity, OR genes in mammals, birds, and amphibians are classified into two groups: class I and class II genes (3, 18). All known OR genes in teleosts belong to class I. Several experiments suggested that class I ORs are specialized for recognizing water-soluble odorants, whereas class II ORs are specialized for airborne odorants in amphibians (19, 20). For this reason, class I ORs in mammals were thought to be evolutionary relics. However, the human and mouse genomes also contain a substantial number of class I OR genes that are potentially functional (11, 13). The functional significance of these class I genes in mammals is unknown.

The classification or nomenclature of OR genes is not fully established. Glusman et al. (18) proposed a hierarchical nomenclature based on families and subfamilies, which correspond to the largest phylogenetic groups with >40% and 60% amino acid identities, respectively. According to them, class I OR genes can be classified into 17 families, and class II genes were classified into 14 families. In contrast, Zozulya et al. (12) proposed another nomenclature, in which both phylogenetic grouping and chromosomal location were taken into account. They classified human OR genes into 119 families.

The evolution of OR genes is poorly understood, mainly because the number of genes is so large. The purpose of this study is to obtain some insight into the evolutionary dynamics of a large multigene family. We conducted a phylogenetic analysis of all OR genes that are putatively functional and introduced a previously undescribed system of OR gene classification. Using this classification, we investigated evolutionary relationships of OR genes from the same and different chromosomal regions.

Materials and Methods

Detection of OR Genes and Pseudogenes. To detect OR functional genes and pseudogenes from the complete human genome sequences, a homology search was conducted.
The DNA sequences of all human chromosomes were downloaded from genome.ucsc.edu (hg15, the April 10, 2003, version; ref. 21). Human OR gene sequences were obtained from Zozulya et al. (12) and the Human Olfactory Receptor Data Exploratorium (HORDE), which is available on the web site, bioinformatics.weizmann.ac.il/HORDE (11). We merged these two databases to make a nonredundant data set, including 356 intact human OR genes. We then conducted a tblastn search (22) with an E-value cutoff of 10⁻²⁰ against the whole human genome sequences by using each of the intact human OR genes as a query. We regarded all of the matches detected by the homology search as OR functional genes or pseudogenes. The E-value criterion of 10⁻²⁰ is similar to that previously used for searching mouse OR pseudogenes (14), but ours is more stringent for short matches of <150 amino acids and is slightly weaker for the matches covering almost the entire sequence.

The matches obtained by the homology search were classified into functional genes and pseudogenes in the following way. We first regarded matches that were shorter than 250 amino acids, and those containing interrupting stop codons or frameshifts, as pseudogenes. The other matches were used for further analysis. For each of these matches, we extended the DNA sequence in both the 3′ and 5′ directions along the chromosome to extract the longest sequence that starts with the initiation codon ATG and ends with a stop codon. All these sequences were translated and aligned by using the program fft-ns-i (23), and the most appropriate start codon positions were chosen through visual inspection. We then assigned the transmembrane regions according to Zozulya et al. (12). Sequences having long (>3 amino acids) deletions or insertions within transmembrane regions, and those completely lacking the extracellular region before the first transmembrane region, were regarded as pseudogenes. The remaining sequences were defined as functional genes.
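The decision rules in the preceding paragraph can be summarized as a short filter. The following Python sketch is only an illustration under stated assumptions: the `Match` record, its field names, and the use of `*` to mark a stop codon in a translated sequence are hypothetical conveniences, not part of the original pipeline, and steps that were done by alignment and visual inspection (start codon choice, transmembrane assignment) are represented here as precomputed fields.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else in this sketch is assumed.
MIN_LENGTH_AA = 250     # matches shorter than this are treated as pseudogenes
MAX_TM_INDEL_AA = 3     # indels longer than this within a TM region disqualify


@dataclass
class Match:
    """Hypothetical record for one homology-search match."""
    name: str
    protein: str                        # translation; '*' marks a stop codon
    has_frameshift: bool = False        # frameshift detected in the match
    longest_tm_indel: int = 0           # longest indel inside any TM region (aa)
    has_n_terminal_region: bool = True  # extracellular region before TM1 present


def has_interrupting_stop(protein: str) -> bool:
    """True if a stop codon occurs before the end of the translation."""
    return "*" in protein.rstrip("*")


def classify(match: Match) -> str:
    """Return 'functional' or 'pseudogene' following the rules in the text."""
    length = len(match.protein.rstrip("*"))
    if length < MIN_LENGTH_AA:
        return "pseudogene"
    if has_interrupting_stop(match.protein) or match.has_frameshift:
        return "pseudogene"
    if match.longest_tm_indel > MAX_TM_INDEL_AA:
        return "pseudogene"
    if not match.has_n_terminal_region:
        return "pseudogene"
    return "functional"


if __name__ == "__main__":
    intact = Match("example_intact", "M" + "A" * 309 + "*")
    broken = Match("example_stop", "M" + "A" * 100 + "*" + "A" * 208 + "*")
    print(classify(intact), classify(broken))
```

The checks are ordered as in the text: the length and reading-frame criteria come first, and the transmembrane and N-terminal criteria apply only to matches that pass them.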
Phylogenetic Analysis. Phylogenetic trees in Figs. 2 …
Classification of Pseudogenes into Class I and Class II. We conducted a blastp search (22) for all of the 414 OR pseudogenes detected above, against all of the 388 functional OR genes. Each pseudogene was classified into class I or class II according to whether its best hit belongs to class I or class II.

Results

OR Functional Genes and Pseudogenes in the Human Genome. By conducting an extensive homology search, we detected 388 potentially functional OR genes that have intact ORFs and determined their exact positions in the human genome. This number is considerably larger than those previously reported, i.e., 322 OR genes detected by Glusman et al. (11) or 347 OR genes detected by Zozulya et al. (12). We also identified 414 apparent pseudogenes and their locations in the human genome. Although Glusman et al. (11) reported >900 human OR genes and pseudogenes, they included sequences that were detected from EST databases. According to them, the total number of OR genes and pseudogenes with assigned genomic positions was 764, which is smaller than ours (802). Our analysis suggests that the proportion of pseudogenes in the human genome is ≈52%, which is substantially smaller than the previous estimates, i.e., 72% by Rouquier et al. (26) or 63% by Glusman et al. (11). The nucleotide and amino acid sequences and the genomic locations for OR genes are available from our web site, mep.bio.psu.edu/databases.

Genomic Distribution of Human OR Genes.

Fig. 1
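As a rough illustration of the best-hit rule described under Classification of Pseudogenes, and of the pseudogene proportion quoted in the Results, the Python sketch below assigns each pseudogene the class of its highest-scoring functional gene and recomputes the proportion from the reported counts. The input layout (tuples of query, subject, and bit score), the function names, and the identifiers in the usage example are assumptions made for this example; the original analysis may have ranked hits differently (for instance by E value).

```python
def assign_classes(hits, class_of):
    """Assign each pseudogene the class of its best-scoring functional hit.

    hits     : iterable of (pseudogene_id, functional_gene_id, bit_score)
    class_of : dict mapping functional_gene_id -> "I" or "II"
    """
    best = {}  # pseudogene_id -> (functional_gene_id, bit_score)
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: class_of[subject] for query, (subject, _) in best.items()}


def pseudogene_fraction(n_pseudogenes, n_functional):
    """Fraction of pseudogenes among all detected OR sequences."""
    return n_pseudogenes / (n_pseudogenes + n_functional)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    example_hits = [
        ("pseudo1", "funcA", 100.0),
        ("pseudo1", "funcB", 250.0),
        ("pseudo2", "funcA", 80.0),
    ]
    example_classes = {"funcA": "I", "funcB": "II"}
    print(assign_classes(example_hits, example_classes))

    # Counts reported in the text: 414 pseudogenes and 388 functional genes.
    print(f"{pseudogene_fraction(414, 388):.3f}")
```

With the reported counts, 414 / (414 + 388) = 414 / 802 ≈ 0.516, which is consistent with the ≈52% stated in the Results.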
Phylogenetic Analysis. To classify human OR genes into related groups of sequences, we conducted a phylogenetic analysis. Here, we confined our analysis to functional genes only, because most pseudogenes contained deletions and were much shorter than functional genes.

Fig. 2

Relationships Between Genomic Clusters and Phylogenetic Clades.

Fig. 3

Fig. 4
Class I OR genes are exceptional in that all functional class I genes are located in one cluster, 11.3, and this cluster does not contain any functional genes from class II. To see the genomic distribution of class I pseudogenes, we classified all OR pseudogenes into class I and class II on the basis of a homology search against functional genes (see Materials and Methods). Of the 414 pseudogenes, 45 were classified into class I and 369 were classified into class II. The fraction of pseudogenes was 44% in class I and 53% in class II. We also found that all of the class I pseudogenes are located in genomic cluster 11.3, and genomic cluster 11.3 does not contain any class II pseudogenes. Therefore, the correspondence between cluster 11.3 and class I genes holds true for both functional genes and pseudogenes.

Discussion

Our results can be summarized as follows: (i) A substantial fraction of human OR genes are pseudogenes. (ii) Functional OR genes that belong to one phylogenetic clade are generally located close to one another on a chromosome, and, in many cases, have the same transcriptional direction. However, (iii) functional OR genes belonging to one phylogenetic clade are often found in several different genomic clusters. (iv) One genomic cluster often contains OR genes belonging to different phylogenetic clades that are distantly related.

Observation i suggests that the OR gene family is subject to the birth-and-death model of evolution, in which new genes are formed by gene duplication and some of the duplicate genes differentiate in function, whereas others are inactivated or deleted from the genome (27, 28). In fact, our joint phylogenetic analysis of human and mouse OR genes, which will be published elsewhere, has confirmed this assertion (see also Fig. 5).

Observation iii implies that a single genomic cluster was fragmented by chromosomal translocation into smaller clusters that were eventually dispersed on different chromosomal regions.
To explain observation iv, a mechanism that brings two clusters from different chromosomal regions into one cluster should be considered. A possible explanation is that recombination takes place between two OR gene clusters located in different chromosomal regions, and genes included in the two clusters are shuffled (see Fig. 7, which is published as supporting information on the PNAS web site). This event can occur by an inversion (when the two clusters are located on the same chromosome), or by a reciprocal translocation (when they are located on different chromosomes). It is thought that reciprocal translocations cannot easily be fixed in the population, because an organism heterozygous for a reciprocal translocation and the original chromosome usually produces only half as many offspring as the homozygotes, and thus such translocations are deleterious (29). In the present case, however, OR genomic clusters are often located at the terminal regions of chromosomes (see Fig. 1).

This model appears to be acceptable for the following reasons. First, mammalian species have undergone extensive chromosomal rearrangements. It has been estimated that at least ≈300 chromosomal rearrangements have occurred after the divergence of humans and mice (30). The divergence of phylogenetic clades A–S is much more ancient than the human-mouse divergence (data not shown). Therefore, chromosomal rearrangements appear to have occurred many times after the formation of OR gene clusters. Second, several studies have shown that recombination between nonallelic low-copy repeats is responsible for chromosomal rearrangements such as deletions, duplications, inversions, and, possibly, translocations (reviewed in refs. 31 and 32).
It has also been suggested that, in humans, several chromosomal rearrangements, including reciprocal translocation t(4;8)(p16;p23), have occurred by the mediation of OR gene clusters on chromosomal regions 4p16 and 8p23 (33, 34), although these clusters seem to contain only pseudogenes.

As mentioned above, OR genes belonging to one phylogenetic clade tend to form a tandem array in a genomic cluster. However, the genes from different phylogenetic clades often intermingle in a genomic cluster. For example, OR genes from clades B, C, and L are intermingled with one another in genomic cluster 1.5 (Fig. 3).

Glusman et al. (11) proposed the “out of chromosome 11” theory of evolution for human OR genes. According to this theory, the duplication of the class I OR genomic cluster on chromosome 11 resulted in the formation of the first class II genomic cluster on the same chromosome. This class II cluster was again duplicated, and one of the clusters generated by the duplication was transferred to chromosome 1. This cluster on chromosome 1 was then duplicated many times, and the resultant duplicate clusters were transferred to other chromosomes. However, this theory seems to be unreasonable, because it assumes that each genomic cluster of OR genes is an evolutionary unit. The real process of evolution of OR genes is much more complicated, as we have seen.
Acknowledgments

We thank Alex Rooney, Shozo Yokoyama, and Jianzhi Zhang for valuable comments. This work was supported by National Institutes of Health Grant GM20293 (to M.N.). Y.N. was partially supported by the Japan Society for the Promotion of Science.

Notes

Abbreviation: OR, olfactory receptor.

References

1. Buck, L. & Axel, R. (1991) Cell 65, 175–187.
2. Buck, L. B. (2000) Cell 100, 611–618.
3. Dryer, L. (2000) BioEssays 22, 803–810.
4. Firestein, S. (2001) Nature 413, 211–218.
5. Mombaerts, P. (2001) Nat. Neurosci. 4, Suppl., 1192–1198.
6. Chess, A., Simon, I., Cedar, H. & Axel, R. (1994) Cell 78, 823–834.
7. Serizawa, S., Ishii, T., Nakatani, H., Tsuboi, A., Nagawa, F., Asano, M., Sudo, K., Sakagami, J., Sakano, H., Ijiri, T., et al. (2000) Nat. Neurosci. 3, 687–693.
8. Kratz, E., Dugas, J. C. & Ngai, J. (2002) Trends Genet. 18, 29–34.
9. Parmentier, M., Libert, F., Schurmans, S., Schiffmann, S., Lefort, A., Eggerickx, D., Ledent, C., Mollereau, C., Gérard, C., Perret, J., et al. (1992) Nature 355, 453–455.
10. Spehr, M., Gisselmann, G., Poplawski, A., Riffell, J. A., Wetzel, C. H., Zimmer, R. K. & Hatt, H. (2003) Science 299, 2054–2058.
11. Glusman, G., Yanai, I., Rubin, I. & Lancet, D. (2001) Genome Res. 11, 685–702.
12. Zozulya, S., Echeverri, F. & Nguyen, T. (2001) Genome Biol. 2, research0018.1–0018.12.
13. Zhang, X. & Firestein, S. (2002) Nat. Neurosci. 5, 124–133.
14. Young, J. M., Friedman, C., Williams, E. M., Ross, J. A., Tonnes-Priddy, L. & Trask, B. J. (2002) Hum. Mol. Genet. 11, 535–546.
15. Rouquier, S., Blancher, A. & Giorgi, D. (2000) Proc. Natl. Acad. Sci. USA 97, 2870–2874.
16. Gilad, Y., Man, O., Pääbo, S. & Lancet, D. (2003) Proc. Natl. Acad. Sci. USA 100, 3324–3327.
17. Ngai, J., Dowling, M. M., Buck, L., Axel, R. & Chess, A. (1993) Cell 72, 657–666.
18. Glusman, G., Bahar, A., Sharon, D., Pilpel, Y., White, J. & Lancet, D. (2000) Mamm. Genome 11, 1016–1023.
19. Freitag, J., Ludwig, G., Andreini, I., Rossler, P. & Breer, H. (1998) J. Comp. Physiol. A 183, 635–650.
20. Mezler, M., Fleischer, J. & Breer, H. (2001) J. Exp. Biol. 204, 2987–2997.
21. International Human Genome Sequencing Consortium (2001) Nature 409, 860–921.
22. Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389–3402.
23. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. (2002) Nucleic Acids Res. 30, 3059–3066.
24. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406–425.
25. Takezaki, N., Rzhetsky, A. & Nei, M. (1995) Mol. Biol. Evol. 12, 823–833.
26. Rouquier, S., Taviaux, S., Trask, B. J., Brand-Arpon, V., van den Engh, G., Demaille, J. & Giorgi, D. (1998) Nat. Genet. 18, 243–250.
27. Nei, M. (1969) Nature 221, 40–42.
28. Nei, M., Gu, X. & Sitnikova, T. (1997) Proc. Natl. Acad. Sci. USA 94, 7799–7806.
29. Wright, S. (1941) Am. Nat. 75, 513–522.
30. Mouse Genome Sequencing Consortium (2002) Nature 420, 520–562.
31. Samonte, R. V. & Eichler, E. E. (2002) Nat. Rev. Genet. 3, 65–72.
32. Stankiewicz, P. & Lupski, J. R. (2002) Trends Genet. 18, 74–82.
33. Giglio, S., Broman, K. W., Matsumoto, N., Calvari, V., Gimelli, G., Neumann, T., Ohashi, H., Voullaire, L., Larizza, D., Giorda, R., et al. (2001) Am. J. Hum. Genet. 68, 874–883.
34. Giglio, S., Calvari, V., Gregato, G., Gimelli, G., Camanini, S., Giorda, R., Ragusa, A., Guerneri, S., Selicorni, A., Stumm, M., et al. (2002) Am. J. Hum. Genet. 71, 276–285.
35. Mighell, A. J., Smith, N. R., Robinson, P. A. & Markham, A. F. (2000) FEBS Lett. 468, 109–114.
36. Karolchik, D., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Lu, Y. T., Roskin, K. M., Schwartz, M., Sugnet, C. W., Thomas, D. J., et al. (2003) Nucleic Acids Res. 31, 51–54.