PLoS Biol. May 2007; 5(5): e106.
Published online Apr 17, 2007. doi:  10.1371/journal.pbio.0050106
PMCID: PMC1852585

Peptides Encoded by Short ORFs Control Development and Define a New Eukaryotic Gene Family

Alfonso Martinez Arias, Academic Editor

Abstract

Despite recent advances in developmental biology, and the sequencing and annotation of genomes, key questions regarding the organisation of cells into embryos remain. One possibility is that uncharacterised genes having nonstandard coding arrangements and functions could provide some of the answers. Here we present the characterisation of tarsal-less (tal), a new type of noncanonical gene that had been previously classified as a putative noncoding RNA. We show that tal controls gene expression and tissue folding in Drosophila, thus acting as a link between patterning and morphogenesis. tal function is mediated by several 33-nucleotide–long open reading frames (ORFs), which are translated into 11-amino-acid–long peptides. These are the shortest functional ORFs described to date, and therefore tal defines two novel paradigms in eukaryotic coding genes: the existence of short, unprocessed peptides with key biological functions, and their arrangement in polycistronic messengers. Our discovery of tal-related short ORFs in other species defines an ancient and noncanonical gene family in metazoans that represents a new class of eukaryotic genes. Our results open a new avenue for the annotation and functional analysis of genes and sequenced genomes, in which thousands of short ORFs are still uncharacterised.

Author Summary

How cells organize into embryos remains a fundamental question in developmental biology. It is likely that significant insights into embryo development will emerge from the characterisation of novel types of genes. Yet most current genome annotation methods rely heavily on comparisons with already-known gene sequences, so genes with previously uncharacterised structures and functions can be missed. Here we present the characterisation of one of these novel genes, tarsal-less. tarsal-less has two unusual features: it contains more than one coding unit, a structure more similar to some bacterial genes; and it codes for small peptides rather than proteins. In fact, these peptides represent the smallest gene products known to date. Functional analysis of this gene in the fruit fly Drosophila shows that it has important functions throughout development, including tissue morphogenesis and pattern formation. We identify genes similar to tarsal-less in other species, and thus define a tarsal-less–related gene family. We expect that a combination of bioinformatic and functional methods, such as those presented in this study, will identify and characterise more genes of this type. These results suggest that hundreds of novel genes may await discovery.

Introduction

Work over recent decades has brought a breakthrough in our understanding of the genetic and molecular mechanisms of development. Classical genetic approaches have been complemented by systematic searches for new genes and their functions, resulting in an exponential increase in information. This new knowledge has filtered into related areas such as cell biology, medical research, and, increasingly, evolution and population genetics. However, significant gaps remain in our understanding, not only of how different aspects of development such as patterning, morphogenesis, and differentiation are organised and implemented at the cellular level, but also of how these different aspects are coordinated. One exciting possibility is that new types of genes with new coding arrangements await discovery and characterisation. The number of known key regulatory genes and signalling proteins remains small, in the region of the hundreds, but sequenced and annotated genomes, including the human genome, still contain thousands of genes and transcripts that have no known function or sequence similarity to other genes [1–3], or that are deemed RNA or noncoding genes [4].

The development of the Drosophila leg offers a good system in which to pursue this analysis further. Fly legs have a high density of pattern elements and a simple developmental topology, with a single main axis of patterning and growth, the proximodistal (PD) axis [5,6]. The legs of Drosophila develop from presumptive organs called imaginal discs, and the morphogenesis of these discs, in particular their acquisition of a stereotyped set of folds that prefigure the morphology of the final appendage, is coordinated with patterning and growth [7,8]. An understanding of the main patterning events in leg development has recently been achieved [9,10], and a preliminary understanding has been obtained of how one cell-signalling–mediated patterning event, Notch signalling in the development of joints, is coordinated with morphogenesis [11–15]. More genes with well-defined morphogenetic functions await integration into this scheme [16], but further links between patterning and morphogenesis remain elusive. Our search for these links led us to the isolation and characterisation of a new Drosophila gene that we call tarsal-less (tal). This gene expresses a 1.5-kilobase (kb) transcript that had been classified as putatively noncoding [17,18]. It contains several open reading frames (ORFs) shorter than 50 amino acids (aa) and thus is putatively polycistronic. Surprisingly, our analysis shows that peptides translated from ORFs of just 11 aa mediate the function of the gene. Therefore tal has two novel features for eukaryotic coding genes: the direct translation of short, unprocessed peptides with full biological function, and their tandem arrangement in a polycistronic messenger. We identify tal homologues in other species and observe that they define a new, noncanonical gene family of ancient origin.
We expect that a combination of new bioinformatics and proteomics methods tailored to the search for peptides and small ORFs (smORFs) [19,20], plus a reassessment of classical data, will identify and characterise more new coding genes with similarly important functions in these and other areas of biology.

Results

Isolation and Characterisation of the tarsal-less Gene

We identified the tal gene through a spontaneous mutant (tal1) with defective legs in which the tarsal segments [21] do not develop (Figure 1). Meiotic and deficiency mapping, followed by cytogenetic and molecular methods, revealed tal1 to be a small inversion between regions 86E1,2 and 87F15. The tal1 phenotype maps to the 87F15 breakpoint, to the left of the Mst87F gene (Figure 1A). There is no gene prediction in this region, but there is a noncoding cDNA, LD11162 [22], and two lethal P element inserts, S011041 and KG1680, located 5′ and 3′ respectively to LD11162 (Figure 1A). We found KG1680 to be allelic to tal1 and to produce similar phenotypes in legs over a chromosomal deficiency for the tal region. These are regulatory mutants that affect only the imaginal disc function. Mobilisation of both KG1680 and S011041 insertions produced a number of alleles that all define a single complementation group. Alleles producing a deletion of the coding region for LD11162 (talS68, talS18, and talK40; see Figure 1A) behave as nulls.

Figure 1
Characterisation of the tal Locus

In addition to LD11162, there are several cDNAs isolated independently [22]. We sequenced one of these, LP10384, which is identical to LD11162. In addition, a single transcript of 1.5 kb corresponding to this cDNA has been identified by Northern blots [17] and reverse-transcriptase PCR (unpublished data). The expression of this transcript is similar to that of the lacZ reporter S011041 (Figure 2A and 2B), is coincident with the regions affected in tal mutants (Figure 1B and 1C), and is lost in tal1 mutants (unpublished data). To prove definitively that this transcript encodes the function of the tal gene, we performed a rescue experiment. The KG1680 insert was replaced by a Gal4 insert [23]. The resulting Gal4 line (P{GaWB}talKG, subsequently referred to as tal-Gal4) is a regulatory viable allele similar to tal1 and the KG1680 insertion, and produces a tal phenotype in legs (Figure 1B–1D) while simultaneously driving the expression of upstream activating sequence (UAS) constructs [24] in the tal pattern. We generated a construct with the full-length LP10384 cDNA downstream of a UAS promoter (UAS-tal) and attempted to rescue mutant animals of the genotype tal-Gal4/talS68 by introducing this UAS-tal construct. In these tal-Gal4/talS68; UAS-tal/+ animals, the phenotypes were rescued to wild type (Figure 1E). This rescue proves that the tal function is encoded by LP10384, which represents the tal RNA. Moreover, ectopic expression of UAS-tal produces mutant phenotypes that are consistent with tal being a tarsal determinant: transformation of the distal tibia and its fusion to the tarsi, where tal is normally expressed (Figure 1F).

Figure 2
tal Regulates Tarsal Patterning

Functions of tal in Development

tal expression in the leg has the interesting feature of being transient (Figure 2A–2C). The time of tal expression (from about 80 to 96 h after egg laying [AEL]) coincides with the specification of the tarsal region by the activation of specific genes in ring patterns similar to that of tal [9,10]. One of the genes activated transiently at this time and required for tarsal patterning is the zinc-finger transcription factor rotund (rn) [25]. We observe that the expression of rn is lost in tal mutants and is extended following ectopic expression of UAS-tal (Figure 2D–2F). In contrast, loss or excess of function of rn (induced with a UAS-rn construct) has no effect on tal expression (unpublished data). These results show that the rn gene is a downstream target of tal.

Further functions of tal are apparent. In tal mutants, the whole tarsal region is missing, a stronger phenotype than that produced by rn mutants [25], and anti-Caspase 3 staining reveals that this is not produced by cell death (unpublished data). tal expression precedes and then straddles the tarsal furrow within which the tarsal segments develop (Figures 2A, 2B, and 3) [26]. In tal mutant discs, the tarsal fold does not develop beyond a superficial constriction, subsequent tarsal folds do not form, and the tarsal region does not grow (Figure 3). Reciprocally, ectopic expression of tal induces the appearance of ectopic folds in legs (unpublished data). These morphogenetic phenotypes are not produced by changes in rn expression alone [25], and the lack of folding is not rescued by inducing expression of rn in tal mutants.

Figure 3
tal Has a Morphogenetic Function

tal null alleles are embryonic lethal. tal expression in the developing embryo is initially segmental (Figure 4A; see also http://www.fruitfly.org), followed by a later and more complex pattern of expression in many organs (Figure 4B–4D). The embryonic mutant phenotypes include broken trachea, loss of cephalopharyngeal skeleton, abnormal posterior spiracles, and lack of denticle belts (Figure 4E–4H). These are the regions where tal is expressed from stage 13 until the end of development (Figure 4C and 4D). This phenotype is identical to a deletion of the entire 87F13–15 region, and is not enhanced by removing any putative maternal contribution in germ-line clones (unpublished data). Ectopic expression of UAS-tal produces reciprocal mutant phenotypes, such as extra sclerotised elements in the cephalopharyngeal skeleton (Figure 4I).

Figure 4
tal Is Required for Embryonic Development

Despite the early segmental pattern of expression, tal mutants do not show any segmentation or homeotic phenotype (Figures 4 and S2). Therefore, the early segmental expression seems to be only a transient state that establishes expression in the precursors of the tracheal system (Figure 4B). Although the mutant epidermis lacks denticle belts, segment-specific epidermal sensory organs are present, and segments are formed. Expression of markers such as wingless (Figure 4J), Distal-less, and Ubx (Figure S2) is normal. The late expression of wingless is not expanded and thus is not responsible for the observed loss of denticles [27]. Furthermore, tal function is independent of shaven-baby (Figure 4K) [28]. Altogether, these results suggest that tal acts in parallel to the canonical denticle-patterning cascade [29]. Interestingly, tal mutant cells do not undergo the tubulin accumulation and cell morphology changes leading to the differentiation of denticles [30] (Figure 4L and 4M, and unpublished data).

An 11-aa ORF Provides tal Function

Our results show that tal is required for several key developmental processes. The tal cDNA has been classified as “putatively noncoding” [17,18] on the basis of having no ORF longer than 100 aa and no known homologies. A number of candidate smORFs are present in the tal transcript. We will refer to these smORFs according to their sequence and position from 5′ to 3′ as 1A, 2A, 3A, AA, and B (Figure 5A). The type-A ORFs (1A, 2A, 3A, and AA) include a conserved 7-aa LDPTGXY motif, and this motif is very strongly conserved in the cDNAs of homologous genes that we have identified in other arthropods (Figure 5 and Figure S1). ORFs 1A and 2A encode an identical 11-aa peptide. ORF 3A encodes another 11-aa peptide very similar to 1A. ORF AA encodes a 32-aa peptide whose N- and C-termini each contain an LDPTGXY motif (Figure 5A). ORF-B encodes a 49-aa peptide without known domains other than a poly-Arg stretch, and is only weakly conserved in other insects (Figures 5 and S1).
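The length criterion behind the “putatively noncoding” label, and the LDPTGXY motif, can be made concrete with a short scan. The following is a minimal sketch, not the annotation procedure used in [17,18]: it assumes the standard genetic code, scans only the forward strand for ATG-initiated ORFs closed by an in-frame stop, ignores initiation context, and runs on a hypothetical toy transcript rather than the real tal sequence. The helper names (`find_orfs`, `looks_noncoding`) are illustrative.

```python
import re

# Standard genetic code; codons enumerated in T, C, A, G order (TTT, TTC, ..., GGG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# LDPTGXY, where X is any residue.
TYPE_A_MOTIF = re.compile(r"LDPTG.Y")


def find_orfs(seq: str, min_aa: int = 1) -> list[tuple[int, str]]:
    """Return (start, peptide) for each ATG-initiated ORF on the forward strand
    that is closed by an in-frame stop codon. Nested in-frame ATGs are reported
    as separate ORFs, as a naive scanner would."""
    seq = seq.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        peptide = []
        for pos in range(start, len(seq) - 2, 3):
            aa = CODON_TABLE[seq[pos:pos + 3]]
            if aa == "*":
                if len(peptide) >= min_aa:
                    orfs.append((start, "".join(peptide)))
                break
            peptide.append(aa)
    return orfs


def looks_noncoding(seq: str, threshold_aa: int = 100) -> bool:
    """Length-only heuristic: True if no ORF reaches threshold_aa residues."""
    longest = max((len(p) for _, p in find_orfs(seq)), default=0)
    return longest < threshold_aa


# Hypothetical toy transcript (not the tal sequence): a 5' stretch, an ORF
# encoding MLDPTGAY, a TAA stop codon, and a short 3' stretch.
toy = "GCAAC" + "ATGCTGGATCCGACCGGCGCGTAT" + "TAA" + "GCGC"
for start, peptide in find_orfs(toy):
    print(start, peptide, len(peptide), TYPE_A_MOTIF.findall(peptide))
    # prints: 5 MLDPTGAY 8 ['LDPTGAY']
print("putatively noncoding:", looks_noncoding(toy))  # prints: putatively noncoding: True
```

A length rule of this kind flags any transcript made only of smORFs as noncoding, whether or not those smORFs are translated, which is why the classification of tal had to be tested experimentally.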

Figure 5
The tal Transcript in Drosophila and Other Species

The conservation of the aa sequences in other species suggests, but does not prove, the translation of these smORFs. With such short sequences, aa conservation cannot be distinguished easily from simple nucleotide conservation, and therefore we decided to study the functional significance of these smORFs and to obtain experimental evidence for their translation. For this, we have built upon our rescue and ectopic expression experiments that proved that tal is encoded by the mRNA represented by LD11162 and LP10384 (Figure 1B–1F). We have tried to rescue tal mutants with UAS constructs containing different directed mutations affecting specific ORFs, and in separate experiments, we have studied the ectopic effects of such constructs and compared them with those of full-length UAS-tal. The results are summarised in Figure 6A.

Figure 6
Directed Mutagenesis and Translation of tal

A construct containing a full-length cDNA from Bombyx mori (Bm-wds) produces the same effects as a full-length Drosophila one. This result validates the comparative results described above and also indicates that tal functionality lies in the ORFs, since these are the only stretches of DNA sequence conserved between Drosophila and Bombyx (Figure S1). Therefore, we next concentrated on dissecting the role of the ORF sequences in the Drosophila cDNA. A deletion construct (AB) leaving only a type-A ORF plus ORF-B is still fully functional. It can rescue tal mutants, and it produces the same ectopic effects as full-length tal. Construct delA deletes the type-A ORF and is just 32 base pairs (bp) shorter than AB, but has lost all functionality, suggesting that the type-A ORF is key for the tal function, and ORF-B is dispensable. It could be argued that the translation initiation context of ORF-B is too weak and that its expression requires an upstream functional type-A ORF. However, the construct ATG-B, in which we have put ORF-B under the control of the Tal1A initiation context, is still unable to reproduce the tal rescue or ectopic effects. Reciprocally, two constructs in which potential translation of ORF-B has been abolished, by either deleting it (delB) or by mutating its start codon (NoB), are fully functional, rescue tal mutants, and produce the same ectopic effects as full-length UAS-tal, including activation of rn expression (unpublished data). Finally, a construct containing only one type-A ORF (1A) is fully functional, and a one-nucleotide insertion that produces a frameshift (1A-FS) abolishes its functions.
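Why a single-nucleotide insertion such as the one in 1A-FS abolishes function can be illustrated by translating a toy ORF before and after such an insertion. This minimal sketch assumes the standard genetic code and uses a hypothetical toy ORF encoding MLDPTGAY; the real 1A sequence and the exact position of the 1A-FS insertion are not given here, so the insertion site below is arbitrary, and `insert_base` is a hypothetical helper.

```python
# Standard genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> tuple[str, bool]:
    """Translate from the first base; return (peptide, reached_stop)."""
    peptide = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[pos:pos + 3]]
        if aa == "*":
            return "".join(peptide), True
        peptide.append(aa)
    return "".join(peptide), False  # ran off the end without an in-frame stop


def insert_base(seq: str, index: int, base: str) -> str:
    """Hypothetical helper: insert a single nucleotide before position index."""
    return seq[:index] + base + seq[index:]


wild_type = "ATGCTGGATCCGACCGGCGCGTATTAA"   # hypothetical toy ORF
frameshift = insert_base(wild_type, 3, "G")  # +1 insertion right after the ATG

print(translate(wild_type))   # ('MLDPTGAY', True)
print(translate(frameshift))  # ('MAGSDRRVL', False)
```

Every codon after the insertion is read in a new frame, so the LDPTGXY motif disappears and, in this toy case, the original stop codon is no longer in frame; in a real transcript, translation would continue into downstream sequence until a stop codon in the new frame is reached. Because the start codon is untouched, initiation itself is unaffected; only the encoded peptide changes.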

Altogether, these results show that (1) an 11-aa type-A ORF provides tal function, and (2) ORF-B has no developmental function.

Polycistronic Translation of tal RNA

These functional results indicate that tal function resides in the type-A ORFs, and the results with constructs Bm-wds, 1A, and 1A-FS seem to exclude a model of tal function as a noncoding RNA. Thus we sought direct proof of tal translation.

The small size of the putative tal peptides makes them difficult to detect directly. To facilitate their detection in in vitro and in vivo experiments, we tagged them by introducing the green fluorescent protein (GFP) coding sequence, minus its start and stop codons, in frame within each of the type-A ORFs and ORF-B (Figure 6B). The resulting fusion constructs thus still have the tal sequences relevant for translation, including the 5′ and 3′ UTRs, the initiation consensus sequences, and the start codons. Construct 1A-GFP contains the GFP sequence within the type-A ORF of the AB construct, which was functional and contains the 1A translation initiation environment. 2A-GFP, 3A-GFP, AA-GFP, and B-GFP each contain a GFP fusion within a full-length tal cDNA. Expression of these constructs in a reticulocyte in vitro transcription and translation system with [35S]-methionine shows that the fusion peptides are expressed from the 1A-GFP, 2A-GFP, and AA-GFP constructs, but not from B-GFP (Figure 6C). Transfection of these constructs into Drosophila S2R+ cells confirmed these results and also showed translation of 3A-GFP (Figure 6D). In all cases, we can rule out the interpretation that the results are due to translation from a second methionine in the GFP sequence, not only because of the size of the fusion products obtained, but also because these putative peptides would lack the N-terminal sequences that are essential for GFP fluorescence [31].
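The tagging strategy relies on inserting the reporter coding sequence without its own start and stop codons, at a codon boundary, so that translation must begin at the tal start codon and end at the tal stop codon. The sketch below checks those conditions. The "tag" is a hypothetical three-codon placeholder standing in for GFP, the toy ORF is hypothetical, and inserting after the first codon is an assumption, since the exact insertion sites are not specified in the text; `insert_in_frame` is an illustrative helper.

```python
# Standard genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate from the first base up to, not including, the first stop codon."""
    peptide = []
    for pos in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[pos:pos + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def insert_in_frame(orf: str, tag_cds: str, after_codon: int) -> str:
    """Insert tag_cds, minus its ATG and stop codon, after codon number after_codon of orf."""
    if len(orf) % 3 or len(tag_cds) % 3:
        raise ValueError("ORF and tag lengths must be multiples of 3")
    if not tag_cds.startswith("ATG") or CODON_TABLE[tag_cds[-3:]] != "*":
        raise ValueError("tag must start with ATG and end with a stop codon")
    if not 1 <= after_codon < len(orf) // 3:
        raise ValueError("insert after the start codon and before the stop codon")
    core = tag_cds[3:-3]       # drop the tag's own start and stop codons
    cut = 3 * after_codon      # a codon boundary keeps the reading frame
    return orf[:cut] + core + orf[cut:]


orf = "ATGCTGGATCCGACCGGCGCGTATTAA"  # hypothetical toy ORF (MLDPTGAY)
tag = "ATGGGCAGCGGCTAA"              # hypothetical placeholder for GFP (M-GSG-stop)
fused = insert_in_frame(orf, tag, after_codon=1)
print(translate(fused))              # MGSGLDPTGAY
```

Because the tag's own ATG is removed, the fusion can only initiate at the native start codon and its context. An internal Met codon within a real tag (GFP contains them) could in principle start a shorter product, which is why the size and fluorescence arguments in the paragraph above are needed.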

Thus, our results show that the tal gene is coding, and polycistronic, because several peptides can be synthesised from a single RNA species. The type-A peptides provide the full tal function, and are translated both in vitro and in vivo.

Discussion

Our results show that translation of an RNA containing smORFs of just 11 aa is required for several important processes during development. Although the tal cDNA contains several copies of the type-A ORFs related by a common LDPTGXY domain, a construct containing just one of them is fully functional. Small peptides are known to have important biological functions, most clearly in endocrine and neural communication [32], but in all described cases, these peptides are mature, cleaved products of a longer ORF. The originality of the tal gene is thus twofold. First, smORFs of just 33 nucleotides are fully functional and capable of translation. Second, the carefully regulated local expression of these peptides in complex patterns (as opposed to a systemic release) has important developmental functions. Our genetic and molecular analysis (Figure 1A and unpublished data) shows that the tal genomic region contains specific regulatory sequences spread out over a minimum of 25 kb.

tal Acts during Patterning and Morphogenesis

We notice that tal expression and function are often associated with tissues undergoing changes of shape such as folding and invagination. The development of the fly leg is directed by a regulatory cascade involving cell signals and region-specific transcription factors [9,10,33] (reviewed in [6]). Regulatory interactions between these identity-conferring transcription factors refine and stabilise the final pattern [34,35]. This pattern is then translated into morphogenetic movements and position-specific cell differentiation programs [16,36]. tal seems to be an important part of the leg developmental process and to act as a link between patterning and morphogenesis. On the one hand, the transient ring of tal expression appears in the precise time and place to control tarsal patterning, by promoting rn expression and by being involved in further regulatory interactions with other leg-patterning genes (Figure 2 and unpublished data). On the other hand, tal controls folding of the leg tissue independently of these effects. In the wild-type leg imaginal discs, a complex morphogenetic process involving the appearance of extra folds within the tarsal furrow, in correlation with leg growth, is apparent [26]. In tal mutants, this morphogenetic process is compromised, whereas in excess-of-function experiments, ectopic expression of tal induces the appearance of ectopic folds in legs. In the mutant discs, cells undergo an apico-basal constriction, but the tarsal furrow never widens into a fold; the appearance of further tarsal sub-folds is precluded, and the presumptive tarsal region does not grow. In the embryo, tal expression is found in tissues of ectodermal origin that undergo an invagination without compromising their epithelial organisation, such as the foregut (and later on in its derivatives, the proventriculus and the pharynx), the hindgut, the developing trachea, and the spiracles [37]. 
In mutant embryos, head involution is slow, the pharynx is short and misplaced, and tracheal fusion is incomplete (Figure 4 and unpublished data). The loss of denticles in the epidermis does not seem to be based on alterations of the segmental patterning cascade, but on cell morphology defects that do not involve defects in apico-basal cell polarity or epidermal integrity (Figure 4 and unpublished data). Altogether, these results suggest that tal is required for the control of cell movements during tissue morphogenesis. Further research beyond the scope of this initial study should identify the cellular and molecular targets of this function.

An 11-aa Peptide Defines a New Polycistronic Gene Family

Our results provide experimental evidence for the function and translation of the type-A ORFs, from in vitro and in vivo translation assays, functional rescues, and sequence analysis. Our results therefore imply that tal is polycistronic, because several ORFs can be translated from a single RNA molecule. The question arises of how this can be accomplished in a eukaryotic gene, but the literature provides a possible mechanism. Polycistronic genes are known in eukaryotes including Drosophila [38–40], and so in principle, all tal ORFs could potentially be translated simultaneously. Experimental evidence supports three models for translation of polycistronic messengers in eukaryotes, namely “internal ribosomal entry sites (IRES),” “leaky scanning,” and “reinitiation” [41]. There are clear rules backed by experimental data concerning the DNA sequences and transcript structure involved in each of these models. The tal RNA sequence seems to exclude both the IRES and the leaky scanning possibilities. There is not enough space for an IRES between the tal ORFs, and the initiation consensus sequences are stronger in the 5′ ORFs than in the 3′ ones, the opposite of the conditions favourable for leaky scanning. However, polycistronic translation of type-A ORFs in the tal transcript is possible under the reinitiation model, because their spacing is between 40 and 200 bp, and the short type-A ORFs (1A to 3A) are much shorter than 35 aa. In all cases studied, the presence of 5′ ORFs has a dramatic impact on the rate of translation of the 3′ ones, leading, in certain conditions, to total blockage of 3′ translation [41]. Accordingly, our in vitro translation experiment shows a diminishing amount of protein arising from each ORF, with the highest levels produced by 1A and the lowest by AA (Figure 6C). We would expect, by virtue of their conserved common domain, that these translated type-A peptides share the same functions.
The presence of repeated or similar ORFs is perhaps a device to ensure enough translation of LDPTGXY-containing peptides. This hypothesis coincides with the results of our structure/function analysis, which shows that a single artificial type-A ORF suffices to provide tal function.
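The reinitiation argument combines two quantitative conditions mentioned in the text: upstream ORFs must be short (below the 35-aa limit cited) and each downstream ORF lies some distance 3′ of the preceding stop codon. The sketch below encodes these as a simple filter. Both thresholds are assumptions taken from the numbers quoted here (35 aa, and a 40-nt minimum matching the shortest tal spacing), not general rules; the ORF coordinates are invented for illustration, and only the ORF lengths (11, 11, 11, 32, and 49 aa) mirror those reported for tal.

```python
from dataclasses import dataclass

MAX_UORF_AA = 35  # assumption: reinitiation limit on upstream ORF length, as quoted in the text
MIN_GAP_NT = 40   # assumption: minimum intercistronic spacing (shortest tal spacing)


@dataclass
class Orf:
    name: str
    start: int      # 0-based position of the A of the ATG
    length_aa: int  # residues encoded, excluding the stop codon

    @property
    def end(self) -> int:
        """First nucleotide after the stop codon."""
        return self.start + 3 * (self.length_aa + 1)


def can_reinitiate(upstream: Orf, downstream: Orf) -> bool:
    gap = downstream.start - upstream.end
    return upstream.length_aa < MAX_UORF_AA and gap >= MIN_GAP_NT


def reachable(orfs: list[Orf]) -> list[str]:
    """ORFs reachable by a scanning ribosome: the first by ordinary scanning,
    later ones only through an unbroken chain of permissive reinitiation steps."""
    if not orfs:
        return []
    ordered = sorted(orfs, key=lambda o: o.start)
    names = [ordered[0].name]
    for up, down in zip(ordered, ordered[1:]):
        if not can_reinitiate(up, down):
            break
        names.append(down.name)
    return names


# Invented coordinates; lengths mirror the tal ORFs 1A, 2A, 3A, AA, and B.
layout = [Orf("1A", 100, 11), Orf("2A", 200, 11), Orf("3A", 300, 11),
          Orf("AA", 400, 32), Orf("B", 560, 49)]
print(reachable(layout))  # ['1A', '2A', '3A', 'AA', 'B']

# A hypothetical 40-aa upstream ORF would exceed the limit and break the chain.
longer = layout[:3] + [Orf("AA", 400, 40), Orf("B", 560, 49)]
print(reachable(longer))  # ['1A', '2A', '3A', 'AA']
```

A filter like this is deliberately binary. As the text notes, each upstream ORF also reduces translation of the 3′ ones, so "reachable" here means "not excluded by these two criteria" rather than "efficiently translated", and the model ignores initiation context entirely.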

These conclusions are further corroborated by our discovery of tal homologues in other arthropods. These genes contain repeated copies of type-A ORFs, varying in number from two (crustaceans and primitive insects) to 11 (Bombyx mori), and an evolutionary trend towards the accumulation of more type-A ORFs, including duplications of the entire gene, is apparent. The aa sequence of these type-A ORFs is very strongly conserved in their core LDPTGXY domain. The spacing between ORFs is most compatible with the reinitiation model described above. Not only sequence but also functionality is conserved, as indicated by the rescue of Drosophila mutants with a Bombyx cDNA. The long and resilient evolutionary history of this gene family suggests that its peptides are not a recently evolved curiosity of some insects, but have ancestral and continuing importance.

All available data suggest that the weakly conserved ORF-B is spurious or nonfunctional. In Drosophila, our functional analysis fails to identify any essential function for ORF-B, and both our in vitro and in vivo studies fail to detect its translation. This agrees with the fact that the presence of several type-A ORFs with strong initiation contexts upstream of ORF-B, combined with the weakness of the ORF-B context, does not favour translation of ORF-B (Figure 5A). Furthermore, ORF AA is 32 aa long, near the 35-aa limit for continued downstream reinitiation at ORF-B. In agreement with this sequence analysis, ectopic expression in Drosophila of the Bombyx Bm-wds construct, which contains an ORF-B, does not produce any additional phenotypes when compared to those produced by the Drosophila constructs, indicating that the Bombyx ORF-B is not functional either. We would surmise that the weak conservation of ORF-B sequences is related either to some functional requirement (other than translation) for the nucleotide sequence in that region of the transcript, or to pure chance.

The mlpt Gene in Tribolium

The conservation of aa sequences has been suggested as evidence for the translation of three type-A ORFs and one ORF-B in a homologous gene called milles-pattes (mlpt) found in the flour beetle Tribolium castaneum [42]. These ORFs are of a similarly small size as in Drosophila, but again, such aa conservation is not conclusive evidence. In the absence of a biochemical and functional analysis of these different ORFs like the one we present here, it is difficult to determine which ORFs are translated and mediate the function of mlpt. The ORF-B of mlpt has been deemed the main functional element of the gene owing to its greater length [42], but in fact, the available data belie this interpretation and favour our own conclusion that ORF-B is nonfunctional. The ORF-B of mlpt has no Kozak consensus at all, and its start codon overlaps with the stop codon of the preceding 5′ type-A ORF, a situation that seems most unlikely to lead to ORF-B translation, even by a mechanism of readthrough as postulated [42]. Readthrough and ribosome codon slippage always proceed by skipping bases forward, rather than backwards as would be needed here. Further, ORF-B aa conservation is rather weak. Although Savard et al. [42] identify a conserved “poly-Arg” domain in alignments of selected sequences from species of only three insect orders, this conservation disappears when the comparisons are extended to further orders, as in our sequence analysis (Figures 5 and S1). We note (1) that “orphan” AUG codons are not a rare occurrence (about 500,000 in Drosophila; M. Ladoukakis, personal communication), and (2) that the nucleotide sequence in the ORF-B region is thymidine-poor, which biases its conceptual translation towards certain amino acids, including Arg. In addition, our analysis shows that tal genes without ORF-B exist; in fact, an ORF-B is only present in some genes from holometabolous insects.
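Two of the sequence arguments above can be checked with a few lines of code. First, when a downstream ATG shares bases with an upstream stop codon, a ribosome continuing past that stop would have to move backwards to reach the ATG. The sketch assumes a UGAUG-type overlap (written TGATG in DNA) purely for illustration; the actual mlpt junction is not shown here. Second, enumerating the codons that contain no T under the standard genetic code shows how a T-poor sequence skews conceptual translation towards Arg, under the simplifying assumption that all T-free codons are used equally.

```python
from collections import Counter

# Standard genetic code; codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def overlapping_atgs(seq: str, stop_start: int) -> list[int]:
    """Positions of ATG codons sharing at least one base with the stop codon at stop_start."""
    return [p for p in range(max(0, stop_start - 2), stop_start + 3)
            if seq[p:p + 3] == "ATG"]


def shift_needed(stop_start: int, atg_start: int) -> int:
    """Offset from the base after the stop codon to the ATG; negative means backwards."""
    return atg_start - (stop_start + 3)


# Hypothetical junction: stop codon TGA at position 0 overlapping an ATG at position 2.
junction = "TGATG"
starts = overlapping_atgs(junction, 0)
print(starts, [shift_needed(0, p) for p in starts])  # [2] [-1]

# Codons lacking T: all three stop codons contain T, so every T-free codon is a sense codon.
t_free = [codon for codon in CODON_TABLE if "T" not in codon]
usage = Counter(CODON_TABLE[codon] for codon in t_free)
print(len(t_free), usage.most_common(1))  # 27 [('R', 5)]
```

Any ATG that overlaps a stop codon starts no later than the stop codon's last base, so the required shift is always negative. For the codon count, Arg takes 5 of the 27 T-free codons (about 19%), compared with 6 of the 61 sense codons in the full code (about 10%). The same enumeration also shows that a T-poor stretch would tend to contain few stop codons, which by itself favours longer chance ORFs.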

RNA interference (RNAi) analysis of the function of the whole mlpt transcript identifies several functions [42] that seem homologous to those we have identified in Drosophila, in particular the tarsal-promoting function and a requirement in the tracheal system. However, Savard et al. [42] also identify “gap” and homeotic segmentation phenotypes that our expression and functional data show to be absent in Drosophila (Figures 4 and S2). This functional difference might be due to the different modes of early embryonic development in Drosophila and Tribolium, which also involve a different complement of gap and maternal genes [43]. Clarifying whether this segmentation function is ancestral but has been lost in Drosophila, or is a recently arisen specialisation of Tribolium, will require the functional characterisation of tal in other insects.

A Noncanonical Class of Eukaryotic Genes Contains smORFs

All sequenced and annotated genomes contain genes and transcripts without known function, sequence homologies, or even known protein domains. In particular, an increasing number of RNA transcripts are being classified as “noncoding” on the basis of not having ORFs longer than 50–100 aa. Furthermore, genomes contain hundreds of thousands of similarly small ORFs that are systematically eliminated from gene annotations for statistical reasons. cDNA libraries and expressed sequence tag (EST) collections also discriminate against small cDNAs, perhaps losing many potential transcripts as well [44]. In the rare cases in which smORFs have been identified in longer, polycistronic messengers, studies have centred on the regulatory effect of the 5′ smORFs and resulting peptides on a standard, longer 3′ ORF. Thus, the possibility of smORFs producing peptides with important, independent functions has been largely overlooked outside of yeast, in which there is firm evidence for their existence [19]. Here we identify tal as a functional gene encoding only smORFs, and show that these are translated. The tal type-A peptides define an ancient gene family that has at least one crustacean representative (in Daphnia); the family is therefore not restricted to insects and is older than 440 million years (the estimated time of the origin of insects). We suspect that this new gene family may in fact be a representative of a new and widespread class of genes, and that more genes encoding smORFs, either alone or in polycistronic messengers, await isolation and characterisation. Our analysis shows that a good cross-species sample of sequences is required to predict noncanonical peptide-coding genes, but also that these predictions must be validated by functional data, because in their absence, wrong predictions can be made. We expect that a combination of bioinformatic and functional methods tailored to the search for peptides and smORFs will identify and characterise more new gene products and eukaryotic coding genes.
Preliminary results in Drosophila (unpublished data), yeast [19], and Hydra [45] suggest that hundreds of such genes may exist.
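The statistical reason why short ORFs are so common, and why annotation pipelines apply a length cut-off, can be sketched with a simple null model. Assuming uniform, independent base composition (an idealisation; real genomes are biased), 61 of the 64 codons are sense codons, so the chance that an ATG is followed by at least k − 1 sense codons, giving a peptide of at least k residues, is (61/64)^(k−1). The function names below are illustrative.

```python
SENSE_FRACTION = 61 / 64  # assumption: uniform, independent base composition


def p_orf_at_least(k_aa: int) -> float:
    """P(an ATG starts an ORF encoding at least k_aa residues, Met included)."""
    return SENSE_FRACTION ** (k_aa - 1)


def expected_orfs(seq_len_nt: int, k_aa: int) -> float:
    """Expected ATG-initiated ORFs of >= k_aa residues on one strand of a random
    sequence (end effects ignored; nested in-frame ATGs each counted)."""
    expected_atgs = seq_len_nt / 64  # ATG occurs at 1 in 64 positions by chance
    return expected_atgs * p_orf_at_least(k_aa)


for k in (11, 100):
    print(k, round(p_orf_at_least(k), 4), round(expected_orfs(1_000_000, k)))
```

Under these assumptions an ATG begins an ORF of at least 11 aa with probability of about 0.62, but one of at least 100 aa with probability of only about 0.009. In 1 Mb of random sequence, one strand would thus carry roughly 9,700 chance ORFs of 11 aa or more versus about 135 of 100 aa or more. Short ORFs arising by chance are therefore abundant, which is why length cut-offs are attractive for annotation and also why they discard genes such as tal.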

Materials and Methods

Fly stocks.

A synthetic deficiency for the 87F13–15 region was generated in heterozygous Df(3)urd /Df(3)red31 flies. dpp-Gal4 and Dll-Gal4 were used to drive ectopic transgene expression in flies and embryos, respectively. These stocks plus l(3)S011041 ([46]) and KG1680 ([47]) are available from stock centres (http://flybase.bio.indiana.edu). The svb107 enhancer trap line, which reproduces the shaven-baby pattern of expression [28], and the mutant allele svb2 were a gift from F. Payre. Flies and embryos were mounted in Hoyer's for microscopy.

Generation of the P{GaWB}talKG (tal-Gal4) line and tal alleles.

The P{SuPor}KG1680 insertion was replaced by a P{GaWB} transposable element by mobilisation in omb-Gal4; +/CyO Δ2–3; KG1680/TM3Sb flies [23]. Progeny carrying possible replacements were screened by monitoring UAS-GFP expression. All replacements were precise. P{lacW}l(3)S011041 and P{GaWB}talKG were mobilised using the Δ2–3 transgene, and revertants lacking the white and/or yellow markers, as appropriate, were isolated. These revertants and replacements were characterised molecularly by PCR, Southern blot, and sequencing as needed. talS68 and talS18 are deletions obtained by mobilisation of P{lacW}l(3)S011041, and talK40 was obtained by mobilisation of P{GaWB}talKG.

Immunohistochemistry and microscopy.

Developing tracheae were revealed with rhodamine-conjugated chitin-binding protein (CBP; 1:500; New England Biolabs, Beverly, Massachusetts, United States). Other antibodies used were anti-β-galactosidase (1:1,000, Sigma, St. Louis, Missouri, United States; and 1:5,000, Cappel, MP Biomedicals, Solon, Ohio, United States), anti-cleaved Caspase-3 (Asp175; 1:250; Cell Signaling Technology), anti-α-tubulin (DM1A; 1:500; Sigma), anti-Wingless (1:50; Developmental Studies Hybridoma Bank [DSHB], Iowa City, Iowa, United States), anti-Ubx FP3.38 (1:20; R. White), and anti-Dll (1:2,000; I. Duncan). In developing leg discs, the actin cytoskeleton was revealed with rhodamine-phalloidin (1:40; Molecular Probes, Eugene, Oregon, United States) and basal membranes with anti-β-integrin (1:500; DSHB). Secondary antibodies conjugated to biotin, rhodamine, or FITC were used (Jackson ImmunoResearch, West Grove, Pennsylvania, United States; Vector Laboratories, Burlingame, California, United States). Standard protocols for embryo and imaginal disc staining were followed [27]. Images were acquired and processed using a Zeiss LSM 510 confocal microscope (Carl Zeiss, Oberkochen, Germany) and LSM image software.

In situ hybridisation.

Standard procedures were followed. DIG-labelled LP10384 was used as the tal RNA probe, and a DIG-labelled 4H-3 rn cDNA fragment was used as the rn probe [25].

Constructs.

The tal constructs are based on the LP10384 cDNA cloned in the pOT2 vector. Primer sequences and detailed strategies are available on request. The AB construct was made by digesting the LP10384 cDNA with BamHI, which cuts at equivalent positions within the conserved region of ORF 1A and within the last LDPTGXY motif of ORF AA. The fragment containing the vector and most of the LP10384 sequence was religated, resulting in a single type-A ORF that codes for a peptide identical to 1A. The remaining mutant constructs were made by PCR, using primers containing directed mutations and/or restriction sites for ligation. This strategy avoided any alterations to the rest of the cDNA, including the UTRs and the regions between the ORFs. For the Bombyx construct, the wdS20994 cDNA was cloned into pPUASt. For the 1A-GFP construct, the GFP sequence was amplified by PCR from the pEGFP vector with internal primers, so that the fragment lacked start and stop codons, and with BamHI adaptor sites. This fragment was digested with BamHI and cloned into the BamHI-linearised AB construct. For the 2A-GFP and 3A-GFP constructs, a SpeI site was introduced by directed mutagenesis at the end of ORF 2A or ORF 3A of LP10384, respectively; the plasmid was then linearised, and the GFP sequence flanked by SpeI adaptors was introduced in frame. For the AA-GFP construct, a SpeI site was introduced by directed mutagenesis in the middle of ORF AA, between the two conserved LDPTGXY motifs; the plasmid was then linearised, and the GFP sequence flanked by SpeI adaptors was introduced in frame in LP10384. For the B-GFP construct, a similar strategy was used, introducing a KpnI site in ORF B. For the generation of transgenic flies or transfection into S2R+ cells, these constructs were excised by double digestion with EcoRI and XhoI and directionally cloned into pPUASt.
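Several of the fusions described above depend on keeping GFP in frame with a short tal ORF, either by using an insert without start or stop codons or by introducing a restriction site at a codon boundary. The check involved can be sketched in Python as below. This is a hypothetical helper, not part of the authors' protocol, and the ORF in the usage example is invented. One detail worth noting: the SpeI recognition site (ACTAGT) contains TAG, so in one of its three possible reading frames it introduces a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete codons, ignoring any trailing bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fuse_in_frame(orf, insert, pos):
    """Insert `insert` into `orf` at nucleotide `pos` and return the fusion.

    Raises ValueError unless the fusion keeps a single reading frame from the
    ATG to the original stop codon: `pos` must fall on a codon boundary between
    the start and stop codons, the insert length must be a multiple of 3, and
    no stop codon may appear before the final one.
    """
    orf, insert = orf.upper(), insert.upper()
    if len(orf) % 3 or not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("orf must run ATG...stop with a length divisible by 3")
    if pos % 3 or not 3 <= pos <= len(orf) - 3:
        raise ValueError("insertion point is not a codon boundary inside the ORF")
    if len(insert) % 3:
        raise ValueError("insert length is not a multiple of 3 (frameshift)")
    fused = orf[:pos] + insert + orf[pos:]
    if any(c in STOP_CODONS for c in codons(fused[:-3])):
        raise ValueError("fusion contains a premature in-frame stop codon")
    return fused


# Hypothetical 11-codon ORF; insert a SpeI site (ACTAGT) just before the stop.
orf = "ATG" + "GCT" * 10 + "TAA"
print(len(fuse_in_frame(orf, "ACTAGT", len(orf) - 3)))  # prints: 42

try:
    fuse_in_frame(orf, "ACTAGT", len(orf) - 4)  # one base off the codon boundary
except ValueError as err:
    print(err)  # prints: insertion point is not a codon boundary inside the ORF
```

The same premature-stop test would apply to a GFP insert, which is why an amplified GFP fragment used for an internal or C-terminal fusion must lack its own stop codon.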

In vitro transcription and translation experiments.

These were carried out using the TNT Quick Coupled Transcription/Translation reticulocyte lysate system (Promega, Madison, Wisconsin, United States). The translated products were separated by PAGE and detected by autoradiography through their incorporation of [35S]-Met.

Cell culture and in vivo translation experiments.

Drosophila S2R+ cells were grown at 24 °C in Schneider's Drosophila medium (Invitrogen, Carlsbad, California, United States) supplemented with 10% heat-inactivated foetal bovine serum, 50 units/ml penicillin, and 50 μg/ml streptomycin (Invitrogen). S2R+ cells were detached from the culture flask with Trypsin-EDTA (Invitrogen). Cells were transiently transfected with 2 μg of DNA using FuGENE HD (Roche, Basel, Switzerland). The plasmids transfected were pActin-Gal4, pPUASt-DsRedT4NLS, and the appropriate pPUASt-tal-GFP construct. At 48 h after transfection, cells were washed in PBS, fixed for 20 min in 4% paraformaldehyde, washed twice, stained for 10 min with DAPI (Sigma), washed again, and mounted in Vectashield medium.
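The medium supplements above are given as final concentrations, so preparing the medium is a C1V1 = C2V2 dilution. The Python sketch below assumes a combined penicillin-streptomycin stock at 10,000 units/ml penicillin and 10,000 μg/ml streptomycin (a common commercial formulation, but not stated in the source) and a hypothetical 500 ml batch. Because the two stock concentrations are assumed equal, a single volume of stock sets both antibiotics to 50 units/ml and 50 μg/ml.

```python
def supplement_volumes(final_ml, fbs_percent=10.0, pen_strep_final=50.0,
                       pen_strep_stock=10_000.0):
    """Volumes (ml) needed to make `final_ml` of supplemented medium.

    Assumes a combined penicillin-streptomycin stock whose penicillin
    (units/ml) and streptomycin (ug/ml) concentrations are both
    `pen_strep_stock`, so one dilution gives both final concentrations.
    """
    fbs_ml = final_ml * fbs_percent / 100.0
    # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
    pen_strep_ml = pen_strep_final * final_ml / pen_strep_stock
    base_ml = final_ml - fbs_ml - pen_strep_ml
    return {"Schneider's medium": base_ml, "FBS": fbs_ml, "pen-strep stock": pen_strep_ml}


for component, ml in supplement_volumes(500.0).items():
    print(f"{component}: {ml:g} ml")
# prints:
# Schneider's medium: 447.5 ml
# FBS: 50 ml
# pen-strep stock: 2.5 ml
```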

DNAs and sequences.

Drosophila melanogaster cDNAs were obtained from the Berkeley Drosophila Genome Project (BDGP) collection [22]. The tal cDNAs are LD11162 and LP10384. Sequencing showed LP10384 to be identical to LD11162, except that its 5′ UTR is 8 bp longer. For the phylogenetic analysis, homologous sequences were identified by BLAST searches against several databases and obtained by different strategies, as follows: for Anopheles gambiae, the cDNA 19600449643540 from the MRA-467–43 library [48], which we obtained from the MR4 Anopheles repository; for Lutzomyia longipalpis, two sequenced cDNAs; for Bombyx mori, cDNA brP0760 and EST wdS20994, which we obtained from the Silkbase EST collection [49] and sequenced; for Apis mellifera, genomic contig 15.24; and for Tribolium castaneum, the gene mlpt. For the following species, we assembled contigs from the sequences listed: four Bicyclus anynana ESTs; three Homalodisca coagulata ESTs; two Aphis gossypii ESTs; three Acyrthosiphon pisum ESTs; one Locusta migratoria EST; and, for Daphnia pulex, one EST and three genomic traces from the NCBI archive.

Supporting Information

Figure S1

Conceptual Translation of the tal ORFs in Arthropod Species

(22 KB DOC)

Figure S2

tal Is Not Involved in Segmentation or Regulation of Segment Identity during Embryogenesis

(4.5 MB TIF)

Accession Numbers

The National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) accession numbers for the genes and gene products discussed in this paper are as follows: Acyrthosiphon pisum ESTs (CV844847, CV848262, and DY229958); Anopheles gambiae cDNA 19600449643540 (EF427621); Aphis gossypii ESTs (DR391935 and DR396643); Apis mellifera genomic contig 15.24 (NW_001253127); Bicyclus anynana ESTs (DY768921, DY768985, DY769016, and DY770310); Bombyx mori cDNA brP0760 (BP115320); Bombyx mori cDNA wdS20994 (EF427620); Daphnia pulex EST (EE682928); Daphnia pulex genomic traces from the NCBI archive (AZSH294914, AZWZ371589, and AZWZ484121); Drosophila melanogaster cDNA LD11162 (AY070879); Drosophila melanogaster cDNA LP10384 (EF427619); Homalodisca coagulata ESTs (CO641298, DN197711, and DN197836); Locusta migratoria EST (DY229958); Lutzomyia longipalpis cDNAs (AM108347 and AM108346); and Tribolium castaneum mlpt (AM269505).

Acknowledgments

We thank Rose Phillips for technical support, Javier Terriente, Mandi Butler, and other members of the lab, and A. Bailey for unpublished results and discussions. We thank Rob Ray for comments on the manuscript, and Simon Morley for his help. We would like to thank BDGP for the Drosophila melanogaster cDNAs; R. A. Holt and MR4 for the Anopheles gambiae cDNA; Toru Shimada for the Bombyx mori cDNAs; and John Colbourne for assistance with the Daphnia pulex sequences.

Abbreviations

aa, amino acid
EST, expressed sequence tag
GFP, green fluorescent protein
ORF, open reading frame
smORF, small open reading frame
UAS, upstream activating sequence

Footnotes

Author contributions. MIG, JIP, SF, and JPC conceived and designed the experiments. MIG, JIP, SF, SAB, and JPC performed the experiments and analyzed the data. MIG, JIP, SF, and JPC contributed reagents/materials/analysis tools. MIG, JIP, and JPC wrote the paper.

Funding. This work was funded by a Wellcome Trust Senior Research Fellowship (057730/Z/99/B) to JPC.

Competing interests. The authors have declared that no competing interests exist.

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945.
2. Kessler MM, Zeng Q, Hogan S, Cook R, Morales AJ, et al. Systematic discovery of new genes in the Saccharomyces cerevisiae genome. Genome Res. 2003;13:264–271.
3. Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, et al. Comparative genomics of the eukaryotes. Science. 2000;287:2204–2215.
4. Pollard KS, Salama SR, Lambert N, Lambot M-A, Coppens S, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature. 2006;443:167–172.
5. Couso JP, Bishop SA. Proximo-distal development in the legs of Drosophila. Int J Dev Biol. 1998;42:345–352.
6. Kojima T. The mechanism of Drosophila leg development along the proximodistal axis. Dev Growth Differ. 2004;46:115–129.
7. Bryant PJ. Pattern formation in imaginal discs. In: Ashburner M, Wright TRF, editors. The genetics and biology of Drosophila. London: Academic Press; 1978. pp. 230–335.
8. Cohen SM. Imaginal disc development. In: Bate M, Martinez Arias A, editors. The development of Drosophila melanogaster. Plainview (New York): Cold Spring Harbor Laboratory Press; 1993. pp. 747–841.
9. Galindo MI, Bishop SA, Greig S, Couso JP. Leg patterning driven by proximal-distal interactions and EGFR signaling. Science. 2002;297:256–259.
10. Campbell GL. Distalization of the Drosophila leg by graded EGF-receptor activity. Nature. 2002;418:781–785.
11. de Celis JF, Tyler DM, de Celis J, Bray SJ. Notch signalling mediates segmentation of the Drosophila leg. Development. 1998;125:4617–4626.
12. Bishop SA, Klein T, Arias AM, Couso JP. Composite signalling from Serrate and Delta establishes leg segments in Drosophila through Notch. Development. 1999;126:2993–3003.
13. Rauskolb C, Irvine KD. Notch-mediated segmentation and growth control of the Drosophila leg. Dev Biol. 1999;210:339–350.
14. Mirth C, Akam M. Joint development in the Drosophila leg: Cell movements and cell populations. Dev Biol. 2002;246:391–406.
15. Hao I, Green RB, Dunaevsky O, Lengyel JA, Rauskolb C. The odd-skipped family of zinc finger genes promotes Drosophila leg segmentation. Dev Biol. 2003;263:282–295.
16. von Kalm L, Fristrom D, Fristrom JW. The making of a fly leg: A model for epithelial morphogenesis. Bioessays. 1995;17:693–702.
17. Tupy JL, Bailey AM, Dailey G, Evans-Holm M, Siebel CW, et al. Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc Natl Acad Sci U S A. 2005;102:5495–5500.
18. Inagaki S, Numata K, Kondo T, Tomita M, Yasuda K, et al. Identification and expression analysis of putative mRNA-like non-coding RNA in Drosophila. Genes Cells. 2005;10:1163–1173.
19. Kastenmayer JP, Ni L, Chu A, Kitchen LE, Au W-C, et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 2006;16:365–373.
20. Rudd KE, Humphery-Smith I, Wasinger VC, Bairoch A. Low molecular weight proteins: A challenge for post-genomic research. Electrophoresis. 1998;19:536–544.
21. Galindo MI, Couso JP. Intercalation of cell fates during tarsal development in Drosophila. Bioessays. 2000;22:777–780.
22. Stapleton M, Liao GC, Brokstein P, Hong L, Carninci P, et al. The Drosophila gene collection: Identification of putative full-length cDNAs for 70% of D. melanogaster genes. Genome Res. 2002;12:1294–1300.
23. Sepp KJ, Auld VJ. Conversion of lacZ enhancer trap lines to GAL4 lines using targeted transposition in Drosophila melanogaster. Genetics. 1999;151:1093–1101.
24. Brand AH, Perrimon N. Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development. 1993;118:401–415.
25. St Pierre SE, Galindo MI, Couso JP, Thor S. Control of Drosophila imaginal disc development by rotund and roughened eye: differentially expressed transcripts of the same gene encoding functionally distinct zinc finger proteins. Development. 2002;129:1273–1281.
26. Kojima T, Sato M, Saigo K. Formation and specification of distal leg segments in Drosophila by dual Bar homeobox genes, BarH1 and BarH2. Development. 2000;127:769–778.
27. Bejsovec A, Martinez Arias A. Roles of wingless in patterning the larval epidermis of Drosophila. Development. 1991;113:471–485.
28. Delon I, Chanut-Delalande H, Payre F. The Ovo/Shavenbaby transcription factor specifies actin remodelling during epidermal differentiation in Drosophila. Mech Dev. 2003;120:747–758.
29. Payre F. Genetic control of epidermis differentiation in Drosophila. Int J Dev Biol. 2004;48:207–215.
30. Martinez-Arias A. Development and patterning of the larval epidermis of Drosophila. In: Bate M, Martinez Arias A, editors. The development of Drosophila melanogaster. Plainview (New York): Cold Spring Harbor Laboratory Press; 1993. pp. 517–608.
31. Li XQ, Zhang GH, Ngo N, Zhao XN, Kain SR, et al. Deletions of the Aequorea victoria green fluorescent protein define the minimal domain required for fluorescence. J Biol Chem. 1997;272:28545–28549.
32. Hewes RS, Taghert PH. Neuropeptides and neuropeptide receptors in the Drosophila melanogaster genome. Genome Res. 2001;11:1126–1142.
33. Lecuit T, Cohen SM. Proximal-distal axis formation in the Drosophila leg. Nature. 1997;388:139–145.
34. Pueyo JI, Couso JP. Chip-mediated partnerships of the homeodomain proteins Bar and Aristaless with the LIM-HOM proteins Apterous and Lim1 regulate distal leg development. Development. 2004;131:3107–3120.
35. Campbell G. Regulation of gene expression in the distal region of the Drosophila leg by the Hox11 homolog, C15. Dev Biol. 2005;278:607–618.
36. Fristrom DK, Fristrom JW. The metamorphic development of the adult epidermis. In: Bate M, Martinez Arias A, editors. The development of Drosophila melanogaster. Plainview (New York): Cold Spring Harbor Laboratory Press; 1993. pp. 843–897.
37. Hartenstein V, Campos-Ortega JA. Fate-mapping in wild-type Drosophila melanogaster. I. The spatio-temporal pattern of embryonic cell divisions. Roux Arch Dev Biol. 1985;194:181–195.
38. Brogna S, Ashburner M. The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: Multigenic transcription in higher organisms. EMBO J. 1997;16:2023–2031.
39. Estes PS, Jackson TC, Stimson DT, Sanyal S, Kelly LE, et al. Functional dissection of a eukaryotic dicistronic gene: Transgenic stonedB, but not stonedA, restores normal synaptic properties to Drosophila stoned mutants. Genetics. 2003;165:185–196.
40. Ben-Shahar Y, Nannapaneni K, Casavant TL, Scheetz TE, Welsh MJ. Eukaryotic operon-like transcription of functionally related genes in Drosophila. Proc Natl Acad Sci U S A. 2007;104:222–227.
41. Kozak M. Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene. 2005;361:13–37.
42. Savard J, Marques-Souza H, Aranda M, Tautz D. A segmentation gene in Tribolium produces a polycistronic mRNA that codes for multiple conserved peptides. Cell. 2006;126:559–569.
43. Davis GK, Patel NH. Short, long, and beyond: Molecular and embryological approaches to insect segmentation. Annu Rev Entomol. 2002;47:669–699.
44. Manak JR, Dike S, Sementchenko V, Kapranov P, Biemar F, et al. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat Genet. 2006;38:1151–1158.
45. Bosch TCG, Fujisawa T. Polyps, peptides and patterning. Bioessays. 2001;23:420–427.
46. Deak P, Omar MM, Saunders RDC, Pal M, Komonyi O, et al. P-element insertion alleles of essential genes on the third chromosome of Drosophila melanogaster: Correlation of physical and cytogenetic maps in chromosomal region 86E-87F. Genetics. 1997;147:1697–1722.
47. Bellen HJ, Levis RW, Liao GC, He YC, Carlson JW, et al. The BDGP gene disruption project: Single transposon insertions associated with 40% of Drosophila genes. Genetics. 2004;167:761–781.
48. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002;298:129–149.
49. Mita K, Morimyo M, Okano K, Koike Y, Nohata J, et al. Construction of an EST database for Bombyx mori and its applications. Curr Sci. 2002;83:426–431.

Articles from PLoS Biology are provided here courtesy of Public Library of Science