Copyright © 2003, Cold Spring Harbor Laboratory Press

Extensive Exon Reshuffling Over Evolutionary Time Coupled to Trans-Splicing in Drosophila

Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA

1Corresponding author. E-MAIL corces/at/jhu.edu; FAX (410) 516-5456.

Received April 15, 2003; Accepted August 4, 2003.

Abstract

The relative position of exons in genes can be altered only after large structural mutations. These mutations are frequently deleterious, impairing transcription, splicing, RNA stability, or protein function, as well as imposing strong inflexibility to protein evolution. Alternative cis- or trans-splicing may overcome the need for genomic structural stability, allowing genes to encode new proteins without the need to maintain a specific exon order. Trans-splicing in the Drosophila melanogaster modifier of mdg4 (mod[mdg4]) gene is the best documented example in which this process plays a major role in the maturation of mRNAs. Comparison of the genomic organization of this locus among several insect species suggests that the divergence between the lineages of the mosquito Anopheles gambiae and D. melanogaster involved an extensive exon rearrangement, requiring >11 breakpoints within the mod(mdg4) gene. The massive reorganization of the locus also included the deletion or addition of a new function as well as exon duplications. Whereas both DNA strands are sense strands in the Drosophila gene, the coding region in mosquito lies on a single strand, suggesting that trans-splicing may have originated in the Drosophila lineage and might have been the triggering factor for such a dramatic reorganization.

Splicing joins exons after the removal of introns from pre-mRNA sequences to produce a mature mRNA molecule that can be translated into a protein.
Alternative splicing is a widespread and well-characterized splicing mechanism consisting of the variable removal of introns from the precursor mRNA to produce different mature RNAs encoding functionally different proteins from a single transcription unit (Maniatis and Tasic 2002). Because this process enables the production of several proteins from a single gene sequence, alternative splicing contributes significantly to cell protein diversity among eukaryotes (Maniatis and Tasic 2002; Modrek and Lee 2002; Sullenger and Gilboa 2002; Tasic et al. 2002). A particular variation of splicing that may also contribute to generate protein diversity is trans-splicing. This process requires the joining of exons from two independently transcribed pre-mRNAs to form a single mature transcript, potentially increasing the putative combinations of exons able to generate novel proteins (Tasic et al. 2002). The most common form of trans-splicing is found in trypanosomes and Caenorhabditis elegans; in these organisms, trans-splicing results in the addition of a noncoding exon known as spliced leader (SL) to the 5′ end of the mRNA. SL trans-splicing, despite its frequency, does not contribute to protein diversity in the cell, because the SL exon is common to all trans-splicing events and lacks coding capabilities (Nilsen 2001). Alternative trans-splicing, on the other hand, involves the association of coding exons from independent mRNAs, making possible the acquisition of new functions by exploiting the combination of unrelated gene transcripts (Tasic et al. 2002). In vivo and in vitro evidence has revealed that alternative trans-splicing actually occurs in mammalian cells and may be a common theme among eukaryotes (Eul et al. 1996; Caudevilla et al. 2001a,b), although the only reported functional major protein apparently originated by trans-splicing so far is encoded by the Drosophila mod(mdg4)gene (Dorn et al. 2001; Labrador et al. 2001; Mongelard et al. 2002; Pirrotta 2002). 
mod(mdg4) is a complex locus encoding >25 different mRNAs with protein products that are believed to be involved in the regulation of higher-order chromatin structure (Dorn et al. 1993; Gerasimova et al. 1995; Buchner et al. 2000). All mod(mdg4) mRNAs share the first four exons, which encode a BTB domain, and differ in the fifth and sixth exons encoding the variable C terminus of the protein (Gerasimova et al. 1995; Buchner et al. 2000; Dorn and Krauss 2003). The first indication of a requirement for trans-splicing in the generation of Mod(mdg4) proteins came after the realization that the two DNA strands of the gene have coding capabilities and contain coding sequences present in mature mRNAs that are translated into functional proteins (Labrador et al. 2001). Further analysis of the encoded products of the mod(mdg4) gene revealed that as many as seven out of 27 mRNAs are encoded by the complementary DNA strand (Dorn et al. 2001; Dorn and Krauss 2003). The finding that single molecules of mRNA could originate from independent mod(mdg4) transgenes located in different chromosomal positions or from two trans-heterozygous mutant alleles (Dorn et al. 2001; Mongelard et al. 2002) was further evidence supporting the involvement of trans-splicing in the maturation of mod(mdg4) mRNAs and discarded alternative hypothesis such as the existence of somatic DNA rearrangements of the locus. Because potentially all eukaryotic cells have the capability of performing trans-splicing, it is surprising that so far only one well-characterized example of trans-spliced mRNAs has been found. One can argue that the absence of additional examples of trans-splicing, even after the sequencing of multiple eukaryotic genomes, suggests that the mechanism is not biologically relevant. 
However, the annotation of mod(mdg4) by the Drosophila genome project failed to detect the involvement of trans-splicing in the maturation of the mod(mdg4) encoded mRNAs, even though a wealth of information was already known about the gene and its transcripts. Therefore, it is still possible that trans-splicing is not uncommon in eukaryotic cells, and only after a thorough genomic and proteomic analysis, will we have a full picture of the relevance of this process in the generation of protein diversity. Alternatively, it is also possible that trans-splicing occurs only rarely and has thus remained elusive to experimental detection. In either case, gaining further insights into the mechanisms of trans-splicing and understanding how it can be experimentally induced to obtain a specific mRNA encoding a predicted combination of exons may be of particular interest for the correct interpretation of genomic data, for the development of in vivo molecular tools, or for the improvement of gene therapy technology. To gain insights into the mechanism of trans-splicing and into how this process originated and was maintained at a specific gene, we asked the question of how trans-splicing evolved at the mod(mdg4) locus and what was the impact of this process on the structure of the gene during the course of evolution. To do so, we have compared the structure of the mod(mdg4) locus from D. melanogaster with that of D. pseudoobscura and the mosquito A. gambiae. D. melanogaster and D. pseudoobscura belong to the same Sophophora subgenus, with an estimated phylogenetic divergence of 25 million years (Russo et al. 1995), whereas A. gambiae is evolutionarily separated from Drosophila by 250 million years (Gaunt and Miles 2002). By using BLASTN, tBLASTP, and tBLASTN algorithms (Altschul et al. 1990), we have found sequences homologous to mod(mdg4) in the genome of both D. pseudoobscura and A. gambiae. 
The comparative analysis of mod(mdg4) sequences shows that the two Drosophila species share exactly the same structure of the locus. In A. gambiae, however, the mod(mdg4) locus differs remarkably from the one in Drosophila, with all exons located in a single strand of the DNA. The changes in the structure of the gene indicate that a massive rearrangement occurred during the divergence of the two genera, involving a large number of breakpoints within the sequences of the locus. The data reveal that the maturation and processing of mod(mdg4) mRNAs may have changed dramatically in the course of the independent evolution of the Anopheles and Drosophila lineages and support the suggestion that trans-splicing plays an important role in these processes and probably in the establishment of the structural differences between the two lineages.

RESULTS

The Structure of the mod(mdg4) Gene Is Conserved Between D. melanogaster and D. pseudoobscura

The presence of coding exons in both DNA strands along the ~30 kb of the D. melanogaster mod(mdg4) locus suggests that trans-splicing, and therefore the organization of the gene, may play a role in the post-transcriptional regulation of the gene. The recent publication of the mosquito A. gambiae (Holt et al. 2002) and the D. pseudoobscura (Human Genome Sequencing Center, Baylor College of Medicine; http://hgsc.bcm.tmc.edu/projects/drosophila/update.html, unpubl.) complete genome sequences provides a unique opportunity to test this hypothesis by determining the conservation in the structure and organization of the mod(mdg4) gene through evolution. To search for sequences homologous to mod(mdg4) in the genome of both species, we first ran the BLASTN algorithm against each genome individually, using the constant region encoding the BTB domain of the D. melanogaster mod(mdg4) mRNAs as a query. These searches gave a positive match only for the D. pseudoobscura genome.
At the time of this analysis (whole genome assembly as of January 13, 2003), only contig 4540 of the D. pseudoobscura genome, spanning 17,700 bp, contained DNA sequences homologous to the D. melanogaster mod(mdg4) locus. To elaborate a map of the gene in D. pseudoobscura, we proceeded to identify exon-coding sequences homologous to the D. melanogaster mod(mdg4) locus by using the BLASTX algorithm and contig 4540 as a query (Fig. 1).
Characterization of the mod(mdg4) Locus in A. gambiae

Because the lineages of D. melanogaster and A. gambiae split from a common ancestor >250 million years ago, comparing the structure of the mod(mdg4) locus between these two species may provide additional insights into the biological significance of the intricate structure found in the Drosophila gene. Because of the large amount of divergence between the two species, the BLASTN algorithm was not capable of finding significant homologies at the DNA level when the constant region encoding the BTB domain of the D. melanogaster mod(mdg4) mRNAs was used as query. Instead, we found multiple sequences with statistically significant scores by using D. melanogaster mod(mdg4) amino acid sequences as query in a tBLASTN search against the mosquito genome. When compared with Drosophila, the best-conserved A. gambiae sequences correspond to the second, third, and fourth exons of the gene, which contain the mod(mdg4) BTB coding sequence common to all mod(mdg4) mRNAs so far characterized (see Fig. 3). Highly significant homologies were also found multiple times along a sequence spanning >40,000 bp downstream of the fourth exon of the mosquito mod(mdg4) gene when the variable region of each of the 27 mod(mdg4) mRNAs was used independently as query. Examination of these sequences showed that they belong to the same variable exons encoding the zinc finger-like motif also present in Drosophila. Because the identities between the amino acid sequences encoding this motif were not as high as those observed for the two Drosophila species, it was impossible to reliably distinguish paralogous from orthologous associations between exons of the A. gambiae and D. melanogaster mod(mdg4) genes. To identify true orthologous exons between these two species, we decided to perform a phylogenetic analysis including all mod(mdg4) sequences encoding this motif in A. gambiae and D. melanogaster.
To perform such an analysis, we first decided to saturate the search for homologous sequences in the mod(mdg4) locus of the two species. Only after saturation can we be certain that we are taking into account all the coding sequences present in each gene and, therefore, be able to establish a phylogenetic relationship between them. We ran the tBLASTN algorithm with a composite sequence containing a tandem array of all variable sequences from the D. melanogaster mod(mdg4) gene encoding the zinc finger-like motif as sequence 1. The intervening sequences between known exons from D. melanogaster and between exons previously found by tBLASTN searches in A. gambiae were used as sequence 2 in this search. By using this approach, we were able to find five previously undescribed exons also encoding a zinc finger-like domain in D. melanogaster. We are confident of the significance of this result because four of these sequences were also found in the D. pseudoobscura gene (the fifth is located in the region for which there is no available sequence). With these additional sequences, the number of putative proteins encoded by the mod(mdg4) gene in D. melanogaster is 33. After the same type of analysis, the number of putative alternative splicing products identified in A. gambiae is 35.

Figure 2
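The tBLASTN searches described above compare a protein query against a nucleotide sequence translated in all six reading frames, which is why they can recover exons on either DNA strand. The following Python sketch illustrates only that underlying idea with exact string matching; it is not BLAST and does no statistical scoring. The function names and the toy sequence are hypothetical and chosen for illustration.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(dna, peptide):
    """List (strand, frame, residue_index) for exact peptide matches.

    residue_index counts amino acids within the translated frame of the
    given strand; it is not a genomic coordinate.
    """
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy locus (invented): "MAK" is encoded on the plus strand,
# "MW" on the minus strand, mimicking coding sequence on both strands.
toy_locus = "ATGGCCAAACCACAT"
print(six_frame_hits(toy_locus, "MAK"))
print(six_frame_hits(toy_locus, "MW"))
```

A hit reported on the "-" strand corresponds to the situation described for the Drosophila gene, in which some variable exons are encoded by the strand complementary to the constant exons.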
By using the multiple alignment shown in Figure 2
Extensive Exon Rearrangements Are Necessary to Explain the Structural Differences Between A. gambiae and D. melanogaster in the mod(mdg4) Locus

Figure 3

Taking into account only exons connected by lines (orthologous exons), one can estimate a minimum number of breakpoints necessary to go from the gene structure in one species to that in the second. We considered that at least one breakpoint was required to explain how two consecutive exons in one lineage are not consecutive in the other lineage, interpreting this discontinuity as a rearrangement that altered the exon ordering between the two lineages. For example, Ag 54975 and Ag 53732 are two consecutive exons in the A. gambiae gene, but their orthologs Dm 64.2 and Dm 1.8 in D. melanogaster are separated by eight additional exons (considering only exons for which orthologs were found in the phylogenetic analysis). The different order observed in each lineage suggests that at least one breakpoint (but probably more) occurred between Ag 54975 and Ag 53732 to give rise to the exon order observed in D. melanogaster. There are 14 pairs of consecutive exons orderly aligned in the A. gambiae mod(mdg4) gene, 11 of which are not adjacent to each other in D. melanogaster. This observation indicates that a minimum of 11 breakpoints are necessary to go from one arrangement to the other. Comparison of these results with those described above for D. pseudoobscura suggests that the bulk of rearrangements in the mod(mdg4) gene occurred prior to the split between the D. pseudoobscura and D. melanogaster lineages.

An important question raised by these observations is whether the structural changes in the locus occurred in concert with changes in the encoded proteins. A search of the A. gambiae EST library by using the 40,000-bp DNA sequence spanning the mod(mdg4) locus as a query suggests that the genes from both species apparently encode for the same mRNAs.
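The breakpoint count described above (one breakpoint for every pair of exons that is consecutive in one species but whose orthologs are not adjacent in the other) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: orthologous exons carry the same label in both lists, exons without an ortholog are ignored, strand orientation is not considered, and the exon labels in the example are invented rather than taken from the figure.

```python
def min_breakpoints(order_a, order_b):
    """Lower bound on breakpoints separating two exon orders.

    order_a, order_b: lists of exon labels in genomic order, where
    orthologous exons share the same label (an assumption of this sketch).
    Counts consecutive pairs in order_a whose orthologs are not neighbours
    in order_b, considering only exons present in both lists.
    """
    position_b = {exon: i for i, exon in enumerate(order_b)}
    shared = [exon for exon in order_a if exon in position_b]
    # Rank shared exons by their position in B, so that exons lacking an
    # ortholog do not break adjacency between the ones that have it.
    rank_b = {
        exon: rank
        for rank, exon in enumerate(sorted(shared, key=position_b.get))
    }
    breaks = 0
    for left, right in zip(shared, shared[1:]):
        if abs(rank_b[left] - rank_b[right]) != 1:
            breaks += 1
    return breaks


# Invented example: exons e3..e5 are inverted in species B, and B carries
# one extra exon ("x") that has no ortholog in species A.
species_a = ["e1", "e2", "e3", "e4", "e5"]
species_b = ["e1", "e2", "e5", "x", "e4", "e3"]
print(min_breakpoints(species_a, species_b))
```

Because each non-adjacent pair is counted once regardless of how many events separated it, the result is a lower bound, in line with the "at least one breakpoint (but probably more)" reasoning in the text.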
Figure 3

The presence of mod(mdg4)2.2 and other coding sequences in both DNA strands of the Drosophila gene was probably induced by recurrent inversions over the ancestral form of the gene. After an inversion dragged coding sequences from one DNA strand to the complementary strand, for example, affecting mRNAs such as mod(mdg4)2.2, transcription could no longer be driven by the original promoter of the mod(mdg4) gene, located in the 5′ region on the opposite strand. One possible explanation for the lack of deleterious effects due to single inversions is the presence of promoter elements adjacent to many or all individual 3′ exons. In support of this hypothesis, it has been previously suggested that one of the mod(mdg4)2.2 transcripts involved in trans-splicing is transcribed from a predicted promoter located 5′ of the sequence in the complementary strand of the gene (Labrador et al. 2001). Evidence for multiple promoters along the mod(mdg4) gene in Drosophila was also found when transgenes containing only the last exon of the mod(mdg4) 55.1 transcript were transcribed in the absence of a known promoter (Dorn et al. 2001). A possible test of the hypothesis that multiple promoters can drive transcription along both DNA strands of the gene would be to search for ESTs homologous to the C-terminal region of the different mod(mdg4) proteins in the Drosophila Gene Collection 1, an mRNA collection that was obtained by selecting for full-length mRNAs (Stapleton et al. 2002). Transcripts SD11801 and SD03001 (Fig. 3

Additional Structural Changes Occurred During the Evolution of the mod(mdg4) Locus in the Drosophila and Anopheles Lineages

The similarity between amino acid sequences encoded by paralogous exons that duplicated after the split of two lineages from a common ancestor should be higher than between any other sequence in a phylogenetic tree, including true orthologs.
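The criterion just stated, that exons duplicated after the split resemble each other more than they resemble any ortholog, can be illustrated with a crude nearest-neighbour check. The Python sketch below uses simple percent identity on pre-aligned sequences as a stand-in for the phylogenetic analysis; it is not the tree-building procedure used in the study, and the exon names and sequences are invented for illustration.

```python
def identity(a, b):
    """Fraction of identical residues between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def closest_relative(exons):
    """Label each exon by the species of its most similar sequence.

    exons: {name: (species, aligned_sequence)}
    Returns {name: (closest_name, label)}; the labels are only heuristic.
    """
    result = {}
    for name, (species, seq) in exons.items():
        best = max(
            (other for other in exons if other != name),
            key=lambda other: identity(seq, exons[other][1]),
        )
        if exons[best][0] == species:
            label = "post-split duplicate candidate"
        else:
            label = "ortholog candidate"
        result[name] = (best, label)
    return result


# Invented toy alignment of a zinc finger-like stretch (hypothetical data).
toy_exons = {
    "Dm_a": ("Dm", "CKICGKAF"),
    "Dm_b": ("Dm", "CKICGKSF"),
    "Dm_d": ("Dm", "CRVCNRAY"),
    "Ag_c": ("Ag", "CRVCGRAY"),
}
for exon, (closest, label) in closest_relative(toy_exons).items():
    print(exon, "->", closest, "|", label)
```

A real analysis would rely on the bootstrap-supported trees, since raw identity can mislead when rates differ between exons.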
The phylogenetic analysis in the previous sections also revealed that several exon duplications occurred in the mod(mdg4) locus after the split of the Drosophila and the Anopheles lineages. This result suggests that through this mechanism, the mod(mdg4) gene might have acquired additional and probably different new functions in each lineage. In addition to inverted DNA segments and exon duplications, the mod(mdg4) gene also acquired new properties by addition or deletion of functions after incorporating (or removing from the mosquito gene) an exon encoding a BED finger domain, as is the case for the mod(mdg4)65.0 mRNA. This transcript is found only in D. melanogaster and encodes a protein containing a BTB domain plus a BED finger domain. The BED finger domain is believed to function by binding DNA and is found in DNA transposases and other DNA binding proteins, such as the Drosophila gene stand still (Aravind 2000). In addition, significant homologies for the C-terminal part of the mod(mdg4)46.3 mRNA were not found in the Anopheles gene. This finding suggests the possibility that in mosquito, sequences similar to mod(mdg4)46.3 may exist but have a divergent function and cannot be recognized from sequence comparisons.

Evolution of the mod(mdg4) Locus at the Chromosome Level

One of the questions arising from the comparison of the structure of the mod(mdg4) locus among different insect species is whether the mechanisms responsible for the large amount of local rearrangements observed within the locus are different from those causing rearrangements at the chromosome level. The detailed information obtained from the Drosophila and Anopheles genome projects provides an exceptional opportunity to map genes accurately in chromosomes without the need for genetic or in situ hybridization data (Zdobnov et al. 2002).
To test whether the chromosomal region of the mod(mdg4) gene was particularly active in generating chromosomal rearrangements during the divergence of the two lineages, we compared the chromosomal map of mod(mdg4) and neighboring genes from the two species.

Figure 4
DISCUSSION

All mod(mdg4) mRNAs consist of four constant exons encoding a BTB domain plus one or two variable exons, in most cases encoding a zinc finger-like domain that is present in >30 exons of the gene (Dorn et al. 1993). In addition to the structural complexity of the locus, trans-splicing has also been invoked to explain the existence of some mod(mdg4) mRNAs (Dorn et al. 2001; Labrador et al. 2001; Mongelard et al. 2002). We have analyzed the mod(mdg4) locus in the genomes of A. gambiae and D. pseudoobscura, two species related to D. melanogaster. The sequence data used for these two species originated from unfinished genome shotgun assemblies and may therefore contain errors. However, our results show a perfect alignment of D. melanogaster sequences with those of D. pseudoobscura in both DNA strands and a conservation of most exons in the A. gambiae gene, conventionally oriented in a single DNA strand. Both results suggest that the genome fragments used in this study are accurately assembled. The genomic approach used to compare the structure of the mod(mdg4) locus between phylogenetically related species has provided valuable information on the evolution and the origin of the trans-splicing process associated with the maturation of several mod(mdg4) mRNAs. In addition, the results show that the comparative analysis of genome sequences could be efficiently used to identify new potential examples of trans-splicing. This and previous reports have shown that a large number of mod(mdg4) variable exons are found in both DNA strands of the D. melanogaster gene and that this distribution requires alternative trans-splicing to explain the presence of hybrid mRNAs and the encoded proteins in the cell (Dorn et al. 2001; Labrador et al. 2001). Similar trans-splicing events have been described elsewhere (Eul et al. 1996; Caudevilla et al. 2001a,b).
What makes trans-splicing of the Drosophila mod(mdg4) gene unique compared with other described examples is that the protein encoded by the hybrid mod(mdg4)2.2 mRNA is a major protein with a functional role as a component of the gypsy insulator (Gerasimova et al. 1995; Gerasimova and Corces 1998). The significance of the structure of the gene for the function of the encoded proteins is evident from the conservation of the same structure, likely by natural selection, for >25 million years in two independent lineages leading to the D. pseudoobscura and D. melanogaster species. It is not clear from our results, however, whether trans-splicing is important for the regulation of expression of the proteins encoded by the mod(mdg4) gene in A. gambiae. In this species, all variable exons are found in the same DNA strand as the constant exons, implying that all mRNAs can be generated by cis-splicing, by trans-splicing, or by both mechanisms. Whether the contribution of trans-splicing to the pool of mRNAs encoded by the mod(mdg4) gene in mosquito is significant can only be explored experimentally. However, when similar exon arrangements are found in other complex genes, such as the Drosophila Dscam or the protocadherin genes in the mouse, cis-splicing apparently accounts for the presence in the cell of all functional alternative variants (Schmucker et al. 2000; Tasic et al. 2002). Interestingly, like in mod(mdg4), the mouse protocadherin α, β, and γ genes encode multiple proteins consisting of common and variable regions, the latter encoded by a number of variable exons. Experimental evidence suggests that transcription of the gene systematically produces trans-spliced mRNAs involving premature RNAs transcribed from different promoters located at the 5′ of the variable exons. However, the level of these trans-spliced RNAs is so low that it is difficult to picture a biological role for the encoded proteins (Tasic et al. 2002). 
It is possible that the same is true for the mosquito mod(mdg4) gene, in which trans-splicing, similarly to the protocadherin α, β, and γ genes, may be occurring apparently without any biological significance, hence explaining the orderly arrangement of all exons in a single DNA strand. The presence of putative promoters along the sequence of the Drosophila gene suggests that transcripts may be produced at different transcription start sites 5′ of the sequences encoding the variable region of the protein. These transcripts would later trans-splice with the mRNAs encoding the common region. Increasing evidence suggests that small RNAs transcribed from the complementary strand of genes may have an important regulatory role in gene transcription (Allshire 2002). Interestingly, putative small RNAs originally involved in the regulation of transcription of the gene may also be the source of the origins of transcription in the opposite strand, necessary to explain trans-splicing in D. melanogaster. The presence of such RNAs transcribed from the opposite strand raises the question of how mod(mdg4) escapes or benefits from the silencing presumably induced by RNAi. According to this scenario, the ancestral mod(mdg4) organization would be similar to that of A. gambiae, and the derived organization would correspond to the one observed in Drosophila, with the bulk of rearrangements occurring only in the Drosophila lineage prior to the split between D. pseudoobscura and D. melanogaster. The significance of trans-splicing in the ancestral organization would be similar to that observed in the protocadherin α, β, and γ genes and would have gained biological relevance only in the Drosophila lineage, concomitantly with the emergence of DNA rearrangements. Unfortunately, there are no data available at the moment to test this hypothesis by determining the organization of the locus in an ancestor common to both lineages.
Although biologically possible, the alternative hypothesis, that is, the last common ancestor between Drosophila and Anopheles shared the same kind of gene organization as Drosophila, is less parsimonious because it requires that most rearrangements in the mosquito lineage arose toward the perfect alignment of the exons in a single DNA strand. The high number of sequence rearrangements and the subsequent structural stability observed within the D. pseudoobscura and D. melanogaster lineages suggest that the reorganization of the locus may have occurred under the control of positive selection that may have reinforced the role of trans-splicing in the maturation of mod(mdg4) mRNAs, perhaps adding a new level of regulation to the expression of the gene. One of the most remarkable findings of this work is the large number of breakpoints within mod(mdg4) required to explain the evolution of the gene. A rough estimate of the rate of chromosome breakage and fixation of ~1.2 sequence disruptions per million years per Mb can be obtained if we consider that at least 11 breakpoints were produced in a DNA sequence encompassing ~40 kb during a time lapse of 250 million years. This number is surprisingly large considering that the rearrangements took place within the transcribed region of a gene and that Drosophila has the highest rate of chromosomal evolution reported so far, with an estimated number of sequence disruptions per Mb per million years of only 0.066 to 0.05 (Ranz et al. 2001). We have tested the possibility that transposable elements could be involved in the generation of these rearrangements and concluded that no evidence or traces of such repetitive sequences can be found in the current sequence of the locus in the three species studied (data not shown). We cannot rule out, however, that these sequences may have been eliminated during evolution due to the rapid turnover at which some transposable elements are subject to in Drosophila (Petrov et al. 2000). 
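The rate estimate quoted above follows from dividing the number of breakpoints by the product of sequence length and divergence time. The Python sketch below reproduces that arithmetic with the figures given in the text (about 40 kb and 250 million years); the function name is hypothetical. With exactly 11 breakpoints the result is 1.1 per Mb per million years, and 12 breakpoints give 1.2; reading the quoted ~1.2 as reflecting "at least 11" breakpoints is an assumption of this note, not something the text states.

```python
def disruption_rate(breakpoints, length_kb, divergence_myr):
    """Sequence disruptions per Mb per million years."""
    # 1000 converts kb to Mb in the denominator.
    return breakpoints * 1000 / (length_kb * divergence_myr)


# Figures taken from the text: ~40 kb locus, 250 million years.
for n_breaks in (11, 12):
    rate = disruption_rate(n_breaks, length_kb=40, divergence_myr=250)
    print(f"{n_breaks} breakpoints: {rate:.2f} disruptions per Mb per Myr")
```

Either value is more than an order of magnitude above the 0.05 to 0.066 range cited for Drosophila chromosomes.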
An alternative possibility is that the number of chromosome breaks described in the literature has traditionally been obtained based on in situ hybridization analysis and without genome sequence data. The numbers thus derived might be an underrepresentation of the total breakage rate, because small inversions may be undetectable by the low resolution of this technique. Interestingly, this is the case in Saccharomyces cerevisiae, in which frequent small inversions found in the genome will go undetected by alternative large-scale detection methods. With a size of 14 Mb, a total of 1100 small single gene inversions are necessary to explain the differences in gene arrangement observed between the S. cerevisiae and Candida albicans genomes. Considering that the divergence between the two species is ~140 million years, an estimated rate of 1.2 sequence disruptions per Mb and million years is necessary to account for such reorganization (Seoighe et al. 2000). A similar magnitude of rearrangement rates of ~0.4 to 1.0 chromosomal breakages per Mb per million years has been found when partial regions of the genomes of Caenorhabditis elegans and Caenorhabditis briggsae were compared (Coghlan and Wolfe 2002). This rate is at least four times that of the previously reported rate for Drosophila and is comparable to what we have observed within the mod(mdg4) locus. An analysis of small inversions inducing differences in gene orientation between genomes of closely related species of yeast suggests that such inversions are in fact small gene duplications followed by differential sequence degeneration (Fischer et al. 2001). Considering that our data show that exon duplications are frequent in mod(mdg4), a similar mechanism could at least partially explain some of the rearrangements that took place during the evolution of this gene. 
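To put the rates mentioned in this discussion side by side, the short Python sketch below tabulates the values quoted in the text and computes a conservative fold difference relative to the Drosophila chromosomal estimate (lowest value of the compared range divided by the highest Drosophila value). The dictionary keys and the function name are illustrative; the numbers are copied from the text and not recomputed from the cited studies.

```python
# (low, high) estimates in disruptions per Mb per million years,
# as quoted in the text.
REPORTED_RATES = {
    "Drosophila chromosomes": (0.05, 0.066),
    "Caenorhabditis": (0.4, 1.0),
    "S. cerevisiae vs C. albicans": (1.2, 1.2),
    "mod(mdg4) locus": (1.2, 1.2),
}


def min_fold_over(reference, other):
    """Most conservative fold difference: lowest 'other' / highest 'reference'."""
    return other[0] / reference[1]


reference = REPORTED_RATES["Drosophila chromosomes"]
for name, rate_range in REPORTED_RATES.items():
    if name != "Drosophila chromosomes":
        fold = min_fold_over(reference, rate_range)
        print(f"{name}: at least {fold:.1f}x the Drosophila estimate")
```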
Our findings strongly support that trans-splicing plays a role in the maturation and probably in the regulation of the abundance of specific isoforms of mod(mdg4) mRNAs in Drosophila. Trans-splicing and its possible regulatory role may have evolved in the mod(mdg4) locus under selective pressure, probably to regulate the levels of the different encoded proteins. For example, evidence suggests that the mod(mdg4)2.2 protein is one of the more abundant isoforms in the cell (Gerasimova et al. 1995; Buchner et al. 2000; Mongelard et al. 2002). Interestingly, the C terminus of this protein is encoded by the complementary strand of the gene (Labrador et al. 2001), and one may argue that the rearrangement of the gene and the concomitant trans-splicing favored the production of this particular protein to the detriment of other proteins encoded by the gene. This process involved the generation of rearrangements that continuously reshuffled the variable exons, alternatively placing coding sequences in both DNA strands of the gene. It is possible that short duplications and small rearrangements constantly occur in the genome because of mistakes during replication or during double-strand break repair and are thereafter eliminated from the population by negative selection. Only when a rearrangement provides a benefit for the cell may the new sequence order be positively selected. Continuous sequence rearrangements in addition to trans-splicing could be exploited by the cell to develop new and intriguing ways to control gene expression or to generate new functions by combining into a single mRNA exons derived from unrelated proteins.

METHODS

All sequences used in this work were obtained from the Drosophila and A. gambiae Genome projects (Adams et al. 2000; Celniker et al. 2002; Holt et al. 2002) through GenBank, except for D. pseudoobscura mod(mdg4), which was obtained directly from the whole genome assembly as of January 13, 2003 (Human Genome Sequencing Center at Baylor College of Medicine). A new assembly was made available on February 27, 2003, in which contig73 completely overlaps with contig 4540 used in this study, differing in only a few bases. Homology searches were performed by using BLASTN, tBLASTN, BLASTX and BLAST algorithms at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/). Multiple alignments and the bootstrap neighbor-joining tree were generated by using ClustalX (Thompson et al. 1997). The maximum parsimony tree was constructed by using protpars and Seqboot from the Phylogeny Inference Package PHYLIP (Felsenstein 1989). The maximum likelihood tree was constructed by using PROTML from the Molecular Phylogeny Package MOLPHY 2.3b3 (Adachi 1995) at the server of the Pasteur Institute (http://bioweb.pasteur.fr/seqanal/interfaces/prot_nucml.html). The D. melanogaster, D. pseudoobscura, and A. gambiae mod(mdg4) maps were elaborated with the assistance of the nucleic acid and protein sequence analysis package Omiga 1.1.3 (Oxford Molecular Ltd). Exons from D. pseudoobscura and A. gambiae, and the new exons from D. melanogaster described here, were named after their first nucleotide position, as described in the figure legends.

Acknowledgments

We thank Dr. F. Mongelard for valuable discussions on the data presented in this manuscript. Work reported here was supported by U.S. Public Health Service Award GM35463 from the National Institutes of Health. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

[Supplemental material is available online at www.genome.org.] Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1440703.