Copyright © 2008 by the Genetics Society of America

Two New Y-Linked Genes in Drosophila melanogaster

*Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Caixa Postal 68011, CEP 21944-970, Rio de Janeiro, Brazil and †Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637-1503

1These authors contributed equally to this work.

2Corresponding author: Departamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Caixa Postal 68011, CEP 21944-970, Rio de Janeiro, Brazil. E-mail: bernardo/at/biologia.ufrj.br

Communicating editor: T. C. Kaufman

Received January 8, 2008; Accepted May 22, 2008.

Abstract

The Y chromosome and other heterochromatic regions present special challenges for genome sequencing and for the annotation of genes. Here we describe two new genes (ARY and WDY) on the Drosophila melanogaster Y, bringing its number of known single-copy genes to 12. WDY may correspond to the fertility factor kl-1.

The Y chromosome of Drosophila melanogaster is not involved in sex determination, but is essential for male fertility (Bridges 1916). Formal genetics identified six “fertility factors” on the Y of D. melanogaster: kl-1, kl-2, kl-3, and kl-5 on the long arm, and ks-1 and ks-2 on the short arm (Kennison 1981; Hazelrigg et al. 1982; Gatti and Pimpinelli 1983). Molecular methods, on the other hand, have so far identified 10 protein-encoding single-copy genes (Gepner and Hays 1993; Carvalho et al. 2000, 2001; Carvalho and Clark 2003). The correspondence between these two sets of genes has been partially elucidated. The fertility factors kl-2, kl-3, and kl-5 encode three different axonemal dynein heavy chains, which are motor proteins responsible for flagellar beating (Goldstein et al. 1982; Gepner and Hays 1993; Carvalho et al. 2000).
The genes ORY and CCY map to the same regions as the fertility factors ks-1 and ks-2, respectively, and may correspond to them (Carvalho et al. 2001). Only the kl-1 fertility factor has not been identified (at least tentatively) at the molecular level.

The heterochromatic state of the Drosophila Y creates particular difficulties for its study (Carvalho et al. 2003). Before the genome project of D. melanogaster, only one single-copy gene was known at the molecular level in the Y chromosome (Gepner and Hays 1993). Even after the genome project, the identification of additional genes on the Y required the development of appropriate computational methods (Carvalho et al. 2000, 2001, 2003), whereas thousands of new genes were immediately found in the euchromatin using standard methods (Adams et al. 2000). The main difficulty arises from the limitations of whole genome shotgun sequencing in dealing with repetitive sequences: Y chromosome sequences are assembled in small fragments and usually are found among a large set of small unmapped scaffolds (typically, 2000–10,000). Furthermore, the different exons of the same gene frequently are scattered in different scaffolds because intronic repetitive DNA causes assembly failures (Carvalho et al. 2003). These fragments of genes are missed by gene annotation programs, and even when they are partially annotated their Y linkage cannot be inferred from the assembly itself. In short, the identification of Y-linked genes in whole genome shotgun projects requires computational methods that identify, among the many unmapped scaffolds, those that are most likely to be Y linked. These selected scaffolds can then be experimentally tested for Y linkage, and the structure of the genes confirmed by RT–PCR (Carvalho et al. 2003; Krzywinski et al. 2006).

One such computational method utilizes testis expressed sequence tags (ESTs).
Briefly, as Y-linked genes usually have testis expression, unmapped scaffolds that are matched by ESTs from testis may be Y linked (Lahn and Page 1997; Carvalho et al. 2001). Previous searches with the first data set of ~3000 testis ESTs (Andrews et al. 2000) yielded one Y-linked gene (CCY) among five candidates (Carvalho et al. 2001). Since then, the number of deposited testis ESTs has risen to ~31,000 (Stapleton et al. 2002). Using this expanded data set, we report here the identification of three Y-linked genes (among five candidates; Table 1). One of them (BF502240/AE003124) turns out to encode the missing N terminus of the previously found Y-linked gene CCY (Carvalho et al. 2001; L. B. Koerich, A. G. Clark and A. B. Carvalho, unpublished results). The remaining two are novel Y-linked genes and are detailed below.
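
The EST-based screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in this work: the hit tuples, the identity and alignment-length cutoffs, the function name `candidate_scaffolds`, and the placeholder scaffold `MAPPED_X1` are all hypothetical. Only the EST and scaffold accessions are taken from the text; every numeric value is invented.

```python
from collections import defaultdict


def candidate_scaffolds(hits, unmapped, min_identity=95.0, min_length=100):
    """Group testis ESTs by the unmapped scaffolds they match.

    hits: iterable of (est_id, scaffold_id, percent_identity, alignment_length),
          for example parsed from a tabular similarity-search report
          (the tuple layout is an assumption of this sketch).
    unmapped: set of scaffold IDs that the assembly could not place.
    Returns {scaffold_id: sorted list of EST IDs} for scaffolds that would
    be worth testing experimentally for Y linkage.
    """
    candidates = defaultdict(set)
    for est_id, scaffold_id, identity, length in hits:
        if scaffold_id not in unmapped:
            continue  # scaffold already has a chromosomal location
        if identity < min_identity or length < min_length:
            continue  # weak match, below the (hypothetical) cutoffs
        candidates[scaffold_id].add(est_id)
    return {scaffold: sorted(ests) for scaffold, ests in candidates.items()}


# Accessions are from the text; identities and lengths are invented.
hits = [
    ("BE976842", "AE002861", 99.1, 420),
    ("BF489509", "AE003005", 98.7, 310),
    ("BF489509", "AE003380", 99.4, 275),
    ("BF502240", "AE003124", 98.9, 350),
    ("BF489509", "MAPPED_X1", 99.0, 300),  # placeholder mapped scaffold, ignored
    ("BE976842", "AE003005", 82.0, 60),    # weak hit, fails both cutoffs
]
unmapped = {"AE002861", "AE003005", "AE003380", "AE003124"}

result = candidate_scaffolds(hits, unmapped)
for scaffold in sorted(result):
    print(scaffold, result[scaffold])
```

A screen of this kind only produces candidates. Each selected scaffold still has to be tested for Y linkage at the bench, for example by PCR on male and female DNA.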
ARY (aldehyde reductase Y): The unmapped scaffold AE002861 was matched by the testis EST BE976842. PCR assays using male and female DNA as templates produced a band with male DNA only. Therefore, AE002861 is part of the Y chromosome. We employed 5′ RACE and 3′ RACE to obtain the full sequence of the orthologs of ARY in other Drosophila species (L. B. Koerich, A. G. Clark and A. B. Carvalho, unpublished results). Using the D. willistoni gene (accession BK006428) we annotated the D. melanogaster gene, which was fully contained in the improved scaffold CP000212.2 (the EST BE976842 was lacking the C terminus). The full D. melanogaster ARY gene (accession BK006427) encodes a protein of 360 amino acids. It was previously annotated in part as CG40064, but mistakenly mapped to the heterochromatin of chromosome 3L (Hoskins et al. 2007; Table 1).

ARY is a potential member of the aldo-keto reductase gene family, whose members are involved in the conversion of glucose to fructose and in the inactivation of cytotoxic metabolites (Hyndman et al. 2003). Interestingly, at least in mammals, fructose is an energy source for spermatozoa. In addition, the sperm membrane has a relatively high content of polyunsaturated fatty acids, whose peroxidation generates 4-HNE, a toxic product that is inactivated by aldo-keto reductases (Kobayashi et al. 2002). Another hint about the possible function of ARY is that the aldo-keto reductase AKR1B7 is expressed at very high levels in the male reproductive tract of the mouse (Baumann et al. 2007). Knockout of AKR1B7 seems to cause partial sperm impairment, although the animals have normal fertility (Tables 3 and 4 in Baumann et al. 2007). Hence, it seems that ARY is another case of a male-specific gene recruited by the Drosophila Y.

Using the methods described in Carvalho et al. (2000) we found that ARY is located in the kl-2 region.
Since this region is known to contain one gene essential for male fertility (the kl-2 dynein heavy chain; Carvalho et al. 2000) and because the available evidence strongly suggests that each region contains only one essential gene (Kennison 1981), ARY probably is not essential for male fertility. We have previously reported other nonessential genes in the D. melanogaster Y, such as PPrY and PRY (Carvalho et al. 2000, 2001).

WDY (WD40 Y): WDY was identified by a match between the unmapped scaffolds AE003005 and AE003380, and the testis EST BF489509. After we identified the WDY gene, the EST was fully sequenced (accession BT021451). This sequence still lacks the N terminus; we identified the full WDY coding sequence in D. melanogaster (accession BK006449) by comparison with the D. ananassae ortholog (accession EU362855). The WDY coding sequence spans four scaffolds (AE003109, AE003005, AE003380, and AE002858), and in each of them a separate gene was misannotated (CG40245, CG40583, CG40551, and CG41020; also CG40449). The full protein has 991 amino acids and has clear orthologs in Anopheles, Aedes, and Tribolium (accession nos. XP_563902, XP_001655383, and XP_970543, respectively). None of these orthologs has an assigned function. There are no similar proteins in the D. melanogaster genome, except for the small CG34164 (106 amino acids; 60% identity), and the only hint regarding the function of WDY is the presence of at least three WD40 domains. This domain is found in a number of eukaryotic proteins and is involved in a wide variety of functions, including adaptor/regulatory modules in signal transduction, pre-mRNA processing, and cytoskeleton assembly (Smith et al. 1999; Li and Roberts 2001). Mapping as described in Carvalho et al. (2000) showed that WDY is fully contained in the kl-1 region (both the N and the C terminus), and hence WDY may correspond to this fertility factor.
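
The scattering of the WDY coding sequence over four scaffolds, and its recovery by comparison with an ortholog, can be illustrated with a short Python sketch. It assumes that each scaffold has already been aligned to a reference protein and that each alignment is summarized by a start and end coordinate on that protein. The coordinates below are invented, the order of the scaffolds along the gene is an assumption, and `order_fragments` is a hypothetical helper rather than the procedure used here. Only the scaffold accessions and the 991-residue length come from the text, and using that length for the reference protein is itself a simplification.

```python
def order_fragments(fragments, protein_length):
    """Order scaffold fragments along a reference protein and report gaps.

    fragments: list of (scaffold_id, start, end), with 1-based inclusive
               coordinates on the reference protein.
    Returns (scaffold IDs in N-to-C order, list of uncovered (start, end)).
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[1])
    gaps = []
    covered_to = 0  # last reference residue covered so far
    for _, start, end in ordered:
        if start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    if covered_to < protein_length:
        gaps.append((covered_to + 1, protein_length))
    return [fragment[0] for fragment in ordered], gaps


# Hypothetical coordinates: only the accessions and the length are from the text.
wdy_fragments = [
    ("AE003380", 520, 760),
    ("AE003109", 1, 240),
    ("AE002858", 761, 991),
    ("AE003005", 241, 519),
]

order, gaps = order_fragments(wdy_fragments, 991)
print(order, gaps)

# Dropping the first fragment mimics a sequence that lacks the N terminus.
partial_order, partial_gaps = order_fragments(wdy_fragments[:1] + wdy_fragments[2:], 991)
print(partial_order, partial_gaps)
```

Reporting the uncovered intervals makes a missing terminus explicit, which is the situation described above for the EST-derived sequence before the ortholog comparison.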
RT–PCR showed that WDY is strongly expressed in testis and very weakly expressed in the soma (head plus thorax; data not shown), whereas microarray experiments showed that it (as well as ARY) is overexpressed in late spermatogenesis (after meiosis; M. D. Vibranovski, H. F. Lopes, T. L. Karr and M. Long, unpublished data). Therefore, it is likely that WDY has a male-related function. It remains to be determined which function this is and whether or not WDY is the kl-1 gene.

The identification of ARY and WDY raises the number of known single-copy protein-encoding genes of the D. melanogaster Y chromosome to 12 (Gepner and Hays 1993; Carvalho et al. 2000, 2001; Carvalho and Clark 2003). This raises the question of how many more genes remain to be found. It is worth pointing out that the total gene number is uncertain even in the euchromatic portion of the Drosophila genome (Hild et al. 2004) and that the Y chromosome has the additional difficulties of being heterochromatic (Hoskins et al. 2007; Smith et al. 2007) and of having low shotgun coverage (approximately threefold, which causes sequence gaps; Carvalho et al. 2003). Furthermore, we certainly have not exhausted all candidates and methods (e.g., not all genes are represented in EST databases, because of low expression and other factors). Thus, a definitive answer is not possible at this moment. On the other hand, we have been searching for Y-linked genes in D. melanogaster since 2000, and the diminishing returns of our searches employing a variety of methods suggest that there are not many more to be found. The CCY gene was independently identified three times (twice with testis ESTs and once with the “low-shotgun depth” method; Carvalho et al. 2003), which again suggests that we are approaching saturation. Finally, all six fertility factors have now been identified (at least tentatively) at the molecular level.
Though it is difficult to translate these arguments into a number, we would be surprised if there were substantially more than 20 protein-encoding genes in the D. melanogaster Y. In the future, comparison with the other sequenced Drosophila species (Drosophila 12 Genomes Consortium 2007; L. B. Koerich, A. G. Clark and A. B. Carvalho, unpublished data), progress at the Drosophila Heterochromatin Genome Project (Hoskins et al. 2007; Smith et al. 2007), and perhaps high-throughput methods for the identification of Y-linked scaffolds are likely to yield a more precise answer.

Acknowledgments

We thank Roger Hoskins and J. Roman Arguello for valuable comments and acknowledge funding from Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro, Fundação Universitária José Bonifácio-UFRJ, Conselho Nacional de Desenvolvimento Científico e Tecnológico, Coordenação de Aperfeiçoamento do Pessoal de Ensino Superior, Fogarty International Center–National Institutes of Health (NIH) (grants TW005673-01A1 and TW007604-02), and NIH (grant GM64590). M.D.V. was also supported by the Pew Latin America Fellowship Program.