Copyright Bekpen et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Death and Resurrection of the Human IRGM Gene

1Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
2Howard Hughes Medical Institute, Seattle, Washington, United States of America
3Institut de Biologia Evolutiva (UPF-CSIC), Barcelona, Spain
4Universita' degli Studi di Bari, Bari, Italy
5Institute of Genetics, University of Cologne, Cologne, Germany
Mikkel H. Schierup, Editor
University of Aarhus, Denmark
* E-mail: eee/at/gs.washington.edu
Conceived and designed the experiments: CB JCH EEE. Performed the experiments: CB FA MBL MV PS. Analyzed the data: CB TMB CA JMK. Contributed reagents/materials/analysis tools: CB TMB. Wrote the paper: CB JCH EEE.
Received October 23, 2008; Accepted January 20, 2009.

Abstract

Immunity-related GTPases (IRG) play an important role in defense against intracellular pathogens. One member of this gene family in humans, IRGM, has been recently implicated as a risk factor for Crohn's disease. We analyzed the detailed structure of this gene family among primates and showed that most of the IRG gene cluster was deleted early in primate evolution, after the divergence of the anthropoids from prosimians (about 50 million years ago). Comparative sequence analysis of New World and Old World monkey species shows that the single-copy IRGM gene became pseudogenized as a result of an Alu retrotransposition event in the anthropoid common ancestor that disrupted the open reading frame (ORF). We find that the ORF was reestablished as a part of a polymorphic stop codon in the common ancestor of humans and great apes.
Expression analysis suggests that this change occurred in conjunction with the insertion of an endogenous retrovirus, which altered the transcription initiation, splicing, and expression profile of IRGM. These data argue that the gene became pseudogenized and was then resurrected through a series of complex structural events and suggest remarkable functional plasticity where alleles experience diverse evolutionary pressures over time. Such dynamism in structure and evolution may be critical for a gene family locked in an arms race with an ever-changing repertoire of intracellular parasites.

Author Summary

The IRG gene family plays an important role in defense against intracellular bacteria, and genome-wide association studies have implicated structural variants of the single-copy human IRGM locus as a risk factor for Crohn's disease. We reconstruct the evolutionary history of this region among primates and show that the ancestral tandem gene family contracted to a single pseudogene within the ancestral lineage of apes and monkeys. Phylogenetic analyses support a model in which the gene was “dead” for at least 25 million years of human primate evolution before its ORF was restored in all human and great ape lineages. We suggest that the rebirth or restoration of the gene coincided with the insertion of an endogenous retrovirus, which now serves as the functional promoter driving human gene expression. We suggest that either the gene is not functional in humans or this represents one of the first documented examples of gene death and rebirth.

Introduction

Immunity-related GTPases (IRG), a family of genes induced by interferons, constitute one of the strongest resistance systems to intracellular pathogens [1]–[4]. The IRGM gene has been shown to have a role in the autophagy-targeted destruction of Mycobacterium bovis BCG [5]. Recently, whole-genome association studies have shown that specific IRGM haplotypes associate with increased risk for Crohn's disease [6],[7].
The IRG gene family exists as multiple copies (3–21) in most mammalian species but has been reduced to two copies, IRGC and a truncated gene IRGM, in humans [8]. Analysis of mammalian genomes (dog, rat and mouse) has shown that all IRG genes except IRGC are organized in tandem gene clusters mapping to mouse chromosomes 11 and 18 (both syntenic to human chromosome 5) [8]. A comparison of the mouse and human genomes identified 21 genes in mouse but only a single syntenic truncated IRGM copy and IRGC in human [8]. We investigated the copy number and sequence organization of the IRG gene family in multiple nonhuman primate species in order to reconstruct the evolutionary history of this locus.

Results

Sequence analysis of two different prosimian species (Microcebus murinus and Lemur catta) confirmed the mammalian archetypical organization with three IRGM paralogs in each species (Figure 1).
We next compared the structure of the IRGM gene in various primate species. One of the three mouse lemur IRGM genes (IRGM9) preserves a complete ORF based on the mouse model and shows the greatest homology to mouse Irgm1. The ORF encodes a putative 47 kD protein including a classical N-terminal region as well as classical motifs at the end of the carboxyl-terminus associated with most functional murine IRGM loci [8],[9] (see Text S1). The second mouse lemur gene, IRGM8, is likely a pseudogene because of a mutation generating a stop codon within the G domain and a frameshift mutation at the C terminus. The third mouse lemur gene, IRGM7, is atypical because it has substitutions in the G domain that disrupt the G1 motif, which interacts with the nucleotide phosphates and is highly conserved in P-loop GTPases [10] (Figure S1 and Text S1). In contrast to mouse and prosimian species, all anthropoid primate lineages show the presence of an AluSc repeat immediately after the splicing acceptor that disrupts the ORF of the sole remaining IRGM gene (Figure 1). In contrast to New World and Old World monkeys, sequencing of the IRGM locus in humans and African great ape species reveals a restored, albeit truncated, ORF encoding a protein of ~20 kD. This is consistent with an antiserum raised against peptides from the human IRGM protein that detected a specific signal at ~20 kD by Western blot [11]. In contrast to humans and the African great apes, analysis of the orangutan genome assembly predicted a nonfunctional protein (a C to T transition at nucleotide position 150 with respect to the start codon, resulting in a premature shared stop codon in the ORF; Figure 1). We noticed an important structural difference in the gene organization for species that regained putative IRGM function when compared to those primates with a pseudogenized version.
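The effect of a single C to T transition on an ORF, as described above for the orangutan sequence, can be illustrated with a minimal Python sketch. The sequence `toy_orf`, the helper names `first_stop_codon` and `substitute`, and the mutated position are hypothetical and chosen only for illustration; they are not taken from the IRGM locus or from the authors' analysis.

```python
# Standard DNA stop codons on the coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(orf):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def substitute(seq, pos, base):
    """Return seq with the base at 0-based position pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]


# Hypothetical toy ORF (not IRGM): ATG GCT CGA GGT TAA
toy_orf = "ATGGCTCGAGGTTAA"

# A C to T transition at the first position of the CGA codon yields TGA.
mutant = substitute(toy_orf, 6, "T")

print(first_stop_codon(toy_orf))  # stop codon index in the intact toy ORF
print(first_stop_codon(mutant))   # earlier stop codon index after the transition
```

In this toy case the transition moves the first in-frame stop from the natural end of the reading frame to an internal codon, which is the sense in which a point mutation can truncate a predicted protein.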
In the common ancestor of humans and great apes, an ERV9 retroviral element integrated within the 5′ end of the IRGM gene (Figure 1). In humans, we observe constitutive levels of expression of IRGM in all tissues examined, with the highest expression of IRGM in the testis (Figure 3A).
We tested for natural selection on IRGM coding sequence using maximum likelihood models to estimate evolutionary rates for individual branches in the phylogeny as well as specific codon changes [13],[14]. Based on the structural differences in IRGM organization, we first divided our species into three groups: Group 1 consists of species that carry a single copy of IRGM with the ERV9 element (human (Hs), chimpanzee (Ptr), gorilla (Ggo) and orangutan (Ppy)); Group 2 consists of species that carry a single copy of IRGM but lack the ERV9 element (macaque (Rh), baboon (Pha) and marmoset (Cja)); while Group 3 was formed by species (dog and mouse lemur) that had multiple copies in a tandem orientation (Figure 4). The estimated rates differed between Group 2 (ω = 0.9254) and Group 3 (ω = 0.3866), with an intermediate value for Group 1 (ω = 0.6073). Group 3 was found to be under constrained evolution (ω = 0.3866), and it was significantly different (P = 6.09E−12) from a model of neutral evolution. In contrast, the evolution of the Group 1 and Group 2 genes was indistinguishable from a model of neutral evolution (see Text S1).
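The comparison of an estimated ω against a neutral model (ω = 1) is a likelihood ratio test between nested models. The following Python sketch shows only the arithmetic of such a test with one degree of freedom; the function name `lrt_pvalue_df1` and the log-likelihood values are hypothetical placeholders, not values reported in this study, and the sketch is not a substitute for the PAML analysis itself.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one free parameter.

    lnl_null: log-likelihood of the constrained model (e.g. omega fixed at 1).
    lnl_alt:  log-likelihood of the model with omega estimated freely.
    Returns (test statistic, p-value), using the chi-square distribution
    with one degree of freedom.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_pvalue_df1(lnl_null=-2150.0, lnl_alt=-2140.0)
print(f"2*deltaLnL = {stat:.2f}, P = {p:.3g}")
```

A small P value indicates that the freely estimated ω fits significantly better than the neutral model, whereas a large P value means that neutrality cannot be rejected, as reported above for Groups 1 and 2.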
Discussion

There are two possible interpretations of our results. First, the IRGM gene is not functional in humans, having lost its role in intracellular parasite resistance ~40 million years ago when the gene family experienced a contraction from a set of three tandem genes to a sole, unique member whose ORF was disrupted by an AluSc repeat in the anthropoid primate ancestor. In light of the detailed functional studies [11] and the recent associations of this gene with Crohn's disease [6],[7], we feel that this interpretation is unlikely. For example, McCarroll and colleagues [17] recently demonstrated that a 20.1 kb deletion upstream of IRGM associates with Crohn's disease as well as the most strongly associated SNP and that the deletion haplotype showed a distinct pattern of IRGM gene expression consistent with its putative role in autophagy and Crohn's disease. An alternate scenario is that the IRGM gene became nonfunctional ~40 million years ago (leading to pseudogene copies in Old World and New World monkeys) but was resurrected ~20 million years ago in the common ancestor of humans and apes (Figure 5).
Methods

Sequence Analyses

We retrieved whole genome shotgun sequence of the IRGM locus for chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), rhesus macaque (Macaca mulatta), marmoset (Callithrix jacchus), baboon (Papio hamadryas), and gray mouse lemur (Microcebus murinus) from NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?) and constructed local sequence assemblies using PHRAP (http://www.phrap.org). We sequenced and confirmed the IRGM genome organization based on DNA samples from four different New World monkey species and from eleven different Old World monkey species. We also resequenced the IRGM gene in unrelated macaques (n = 5), baboons (n = 5), orangutans (n = 12) and gibbons (n = 7). For Microcebus murinus with multiple copies of IRGM, we first isolated large-insert BAC clones, then subcloned and sequenced PCR amplicons corresponding to the different copies.

FISH Experiments

Metaphase spreads were obtained from lymphoblast or fibroblast cell lines from human (Homo sapiens), rhesus macaque (Macaca mulatta), marmoset (Callithrix jacchus) and lemur (Lemur catta). FISH was performed using either human IRGM probe WIBR2-3607H18 or lemur IRGM BAC DNA LB2-61D22, LB2-77B23 and LB-61A22, directly labeled by nick translation with Cy3-dUTP (Perkin-Elmer). Lemur BAC probes were obtained by library hybridization screening of a L. catta genomic library (CHORI Resources: LBNL-2 Lemur BAC Library [http://gsd.jgi-psf.org/cheng/LB2]).

Expression Analyses

Full-length human IRGM transcript was obtained by 5′RACE PCR followed by subcloning (PGEM-T easy) and sequencing (EU742619). RT-PCR experiments were performed using cDNA synthesized (Advantage RT-PCR, Clontech) from mRNA extracted (Oligotex isolation kit, Qiagen) from total RNA (RNA Easy, Qiagen). Total RNA was obtained from tissues isolated from chimpanzee, rhesus macaque, marmoset and human.
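Relative transcript levels in quantitative PCR experiments of this kind are typically expressed against a reference (housekeeping) transcript and a calibrator sample. The sketch below assumes the widely used 2^(−ΔΔCt) calculation with roughly 100% amplification efficiency; the text does not state which calculation was used, and the function name `relative_expression` and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold expression of a target transcript relative to a calibrator sample.

    Assumes the 2^(-ddCt) method: each sample's target Ct is first normalized
    to a reference transcript (e.g. a housekeeping gene), and the normalized
    value is then compared with that of the calibrator sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(
    ct_target=24.0, ct_reference=18.0,                      # sample of interest
    ct_target_calibrator=25.0, ct_reference_calibrator=18.0  # calibrator sample
)
print(fold)
```

With these placeholder values the sample reaches threshold one cycle earlier than the calibrator after normalization, which corresponds to a two-fold higher relative expression under the stated efficiency assumption.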
IRGM splice variants were detected by a quantitative PCR assay using the LightCycler SYBR Green System (Roche) with primers IRGM (b)-(c)-(d) and IRGM all primers (Text S1). Transcript levels were normalized to the amount of the GAPDH and UBE1 transcript, which also served as positive controls for RT-PCR experiments.

Phylogenetic Analyses

We generated multiple sequence alignments using Clustal-W [18],[19] and constructed neighbor-joining phylogenetic trees (MEGA 3.1) [20]. Tests of selection (ω = dN/dS) were performed by maximum likelihood using PAML [13] applying the Sites Model [14] to calculate the percentage of codons under positive, neutral evolution or purifying selection and the Branch model [21] to estimate evolutionary pressures at different times during evolution. The Likelihood Ratio Test (LRT) was used to assess the significance of different values of ω for different groups.

Figure S1 Amino acid alignment of the IRGM proteins. Protein sequence alignment of primate, dog and mouse IRGM shows close homology in N-terminal GTPase binding domain (G domain). Canonical GTPase motifs are indicated by red boxes. The sequences are edited to maintain the open reading frame of Cja, Rh, and Pph IRGM, which are considered to be pseudogenes (names are indicated in red color). Species names are indicated as: Hs (Homo sapiens), Ptr (Pan troglodytes), Ggo (Gorilla gorilla), Ppy (Pongo pygmaeus), Rh (Rhesus macaque-Macaca mulatta), Cja (Callithrix jacchus), Pph (Papio hamadryas), IRGM7, IRGM8, IRGM9 Mmu (Microcebus murinus), IRGM4, IRGM5, IRGM6 (Dog IRGM GMS type GTPases), IRGM1, IRGM2, IRGM3 (Mouse IRGM GMS type GTPases). (0.09 MB PDF)

Figure S2 Alignment of the IRGM Alu repeat integration region. Blue highlighted sequence denotes the canonical splicing acceptor (based on murine gene model) with the red underlined sequence indicating the position of polypyrimidine tract. Green highlighted sequences correspond to the IRGM ORF.
Alu integration site is indicated as red box (292 bp). Translation start site with preferred Kozak consensus sequence for Human IRGM is indicated as a green arrow. Stop codons in the ORF are indicated as red triangles. (0.09 MB PDF)

Figure S3 Phylogeny of IRGM. Phylogenetic reconstruction of IRGM related genes in different primate, dog and mouse species using the NJ method. Species names are indicated as: Mouse (Mus musculus domesticus), Dog (Canis familiaris), Gray mouse lemur (Microcebus murinus), Sbo (Saimiri boliviensis), Cge Marmoset (Callithrix geoffroyi), Cmo (Callicebus moloch), Ppi (Pithecia pithecia), Mar (Macaca arctoides), Mni (Macaca nigra), Mmu Rhesus macaque (Macaca mulatta), Mfa (Macaca fascicularis), Pan (Papio hamadryas anubis), Pha Baboon (Papio hamadryas), Cce (Cercopithecus cephus), Cae (Cercopithecus aethiops), Pcr (Presbytis cristata), Cpo (Colobus polykomos), Cgu (Colobus guereza), Hga Gibbon (Hylobates gabriellae), Ppy Orangutan (Pongo pygmaeus), Ggo Gorilla (Gorilla gorilla), Ptr Chimpanzee (Pan troglodytes) and Hs Human (Homo sapiens). Shared stop codons for New World and Old World monkeys are highlighted in purple and blue respectively. Pseudogenes are highlighted in red. (0.47 MB PDF)

Figure S4 Alignment of the IRGM ERV9 region in human, chimp, orangutan, macaque and marmoset. Red highlighted sequence denotes the ERV9 element. Yellow and green highlighted sequences correspond to the AluSc element and the IRGM ORF. Intron sequence is not included in this alignment indicated as red box (489 bp). Transcription start site (+1) indicated as green box. Stop codons in open reading frame are indicated as red triangles. Note the presence of a marmoset insertion sequence: (TAATGATAATTTCTAATCACTGCAAGAATCACATCACCTTCTTTGAATCAATCTCAAATACCTGGCCTGGTGGGAGCCAGGTTCTGCTCTTCTTCAAGG).
(0.11 MB PDF)

Figure S5 Structural variation and IRGM mRNA expression levels. A) A schematic summarizing the location of a sequenced structural polymorphism with respect to the IRGM gene (see Figure S6). B) Relative fold expression of IRGM mRNA and proportion of splice variants were detected by real-time PCR. Expression data were first normalized against housekeeping gene UBE1 and then cross-compared using the heterozygote as the reference (GM15510 (I/D)). The figure shows the relative fold expression of GM18507 (I/I), GM18555 (D/D) and GM15510 (I/D). C) Relative fold expression of IRGM (B) detected by real-time PCR. The figure shows a two-fold expression difference between a lymphoblastoid cell line homozygous for the 20.1 kb insertion GM18507 (I/I) and cell line homozygous for the deletion GM18555 (D/D). (1.38 MB PDF)

Figure S6 Structural polymorphism 5′ upstream of the IRGM locus. A) A miropeats alignment comparing the human chromosome 5 reference sequence to a sequence from an alternate haplotype (AC207974 from HapMap individual NA18956). The alignment depicts a 20.1 kb deletion region 5′ upstream of the human IRGM. Arrow indicates the transcription start point within the ERV9 retroviral element. Green box represents IRGM open reading frame; red boxes indicate exons for adjacent MST150 gene. B) Array comparative genomic hybridization (aCGH) results for nine human DNA samples (four African and four non-African) against a reference genome DNA sample (NA15510). The analysis confirms a 20.1 kb deletion polymorphism (indicated as red dotted line) located at a distance of 2.82 kb 5′ to the IRGM transcription start site. The individual NA15510 is hemizygous (one copy) and is used as the reference in these experiments. (0.42 MB PDF)

Text S1 Supplementary note: Death and resurrection of the human IRGM gene.
(0.61 MB PDF)

Acknowledgments

We are grateful to H. Bouabe, G. Cooper and X. Ramnik for critical discussions during the preparation of this paper and to R. Uthaiah and R.M. Leonhardt for their help and encouragement in the beginning of this study. We are also indebted to M. Johnson, J. Horvath and J. Rogers (Southwest Foundation for Biomedical Research) for providing tissue and total RNA from human, chimp, marmoset, macaque and mouse lemur.

Footnotes

The authors have declared that no competing interests exist.
This work was supported in part by Deutsche Forschungsgemeinschaft grant SFB680 to JCH and by NIH grants GM058815 and HG002385 to EEE. TM-B is supported by a Marie Curie fellowship. EEE is an investigator of the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Boehm U, Guethlein L, Klamp T, Ozbek K, Schaub A, et al. Two families of GTPases dominate the complex cellular response to interferon-γ. J Immunol. 1998;161:6715–6723.
2. Taylor GA. p47 GTPases: regulators of immunity to intracellular pathogens. Nature Reviews Immunology. 2004;4:100–109.
3. Shenoy AR, Kim BH, Choi HP, Matsuzawa T, Tiwari S, et al. Emerging themes in IFN-gamma-induced macrophage immunity by the p47 and p65 GTPase families. Immunobiology. 2007;212:771–784.
4. Howard J. The IRG proteins: A function in search of a mechanism. Immunobiology. 2008;213:367–375.
5. MacMicking J, Taylor GA, McKinney J. Immune control of tuberculosis by IFN-gamma-inducible LRG-47. Science. 2003;302:654–659.
6. Fisher SA, Tremelling M, Anderson CA, Gwilliam R, Bumpstead S, et al. Genetic determinants of ulcerative colitis include the ECM1 locus and five loci implicated in Crohn's disease. Nat Genet. 2008.
7. Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat Genet. 2007;39:830–832.
8. Bekpen C, Hunn JP, Rohde C, Parvanova I, Guethlein L, et al. The interferon-inducible p47 (IRG) GTPases in vertebrates: loss of the cell autonomous resistance mechanism in the human lineage. Genome Biol. 2005;6:R92.
9. Kaiser F, Kaufmann SH, Zerrahn J. IIGP, a member of the IFN inducible and microbial defense mediating 47 kDa GTPase family, interacts with the microtubule binding protein hook3. J Cell Sci. 2004;117:1747–1756.
10. Leipe DD, Wolf YI, Koonin EV, Aravind L. Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol. 2002;317:41–72.
11. Singh SB, Davis AS, Taylor GA, Deretic V. Human IRGM induces autophagy to eliminate intracellular mycobacteria. Science. 2006;313:1438–1441.
12. Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, et al. Mapping and sequencing of structural variation from eight human genomes. Nature. 2008;453:56–64.
13. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–1591.
14. Yang Z. Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol. 2000;51:423–432.
15. Dunn CA, Mager DL. Transcription of the human and rodent SPAM1/PH-20 genes initiates within an ancient endogenous retrovirus. BMC Genomics. 2005;6:47.
16. Ling J, Pi W, Bollag R, Zeng S, Keskintepe M, et al. The solitary long terminal repeats of ERV-9 endogenous retrovirus are conserved during primate evolution and possess enhancer activities in embryonic and hematopoietic cells. J Virol. 2002;76:2410–2423.
17. McCarroll SA, Huett A, Kuballa P, Chilewski S, Landry A, et al. A 20-kilobase deletion polymorphism upstream of IRGM is associated with altered IRGM expression and Crohn's disease. Nature Genetics, in press. 2008.
18. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680.
19. Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, et al. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003;31:3497–3500.
20. Kumar S, Tamura K, Nei M. MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput Appl Biosci. 1994;10:189–191.
21. Yang Z. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998;15:568–573.