Genome Res. Mar 2006; 16(3): 365–373.
PMCID: PMC1415214

Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae


Genes with small open reading frames (sORFs; <100 amino acids) represent an untapped source of important biology. sORFs largely escaped analysis because they were difficult to predict computationally and less likely to be targeted by genetic screens. Thus, the substantial number of sORFs and their potential importance have only recently become clear. To investigate sORF function, we undertook the first functional studies of sORFs in any system, using the model eukaryote Saccharomyces cerevisiae. Based on independent experimental approaches and computational analyses, evidence exists for 299 sORFs in the S. cerevisiae genome, representing ~5% of the annotated ORFs. We determined that a similar percentage of sORFs are annotated in other eukaryotes, including humans, and 184 of the S. cerevisiae sORFs exhibit similarity with ORFs in other organisms. To investigate sORF function, we constructed a collection of gene-deletion mutants of 140 newly identified sORFs, each of which contains a strain-specific “molecular barcode,” bringing the total number of sORF deletion strains to 247. Phenotypic analyses of the new gene-deletion strains identified 22 sORFs required for haploid growth, growth at high temperature, growth in the presence of a nonfermentable carbon source, or growth in the presence of DNA damage and replication-arrest agents. We provide a collection of sORF deletion strains that can be integrated into the existing deletion collection as a resource for the yeast community for elucidating gene function. Moreover, our analyses of the S. cerevisiae sORFs establish that sORFs are conserved across eukaryotes and have important biological functions.

The initial Saccharomyces cerevisiae genome sequencing effort annotated all ORFs of at least 100 contiguous codons (including the first ATG) not contained entirely within a longer ORF (Goffeau et al. 1996). Knowledge of sORF (small open reading frame; <100 amino acids) function is limited compared to that of larger genes, although small proteins include members of important classes such as mating pheromones, proteins involved in energy metabolism, proteolipids, chaperonins, stress proteins, transporters, transcriptional regulators, nucleases, ribosomal proteins, thioredoxins, and metal ion chelators (for review, see Basrai et al. 1997). Computational discovery of sORFs is difficult because they are “buried” in an enormous pile of meaningless short ORFs that arise by chance. In addition, sORFs are not favorable targets for random mutagenesis. Similar challenges plague attempts to identify non-coding RNAs (ncRNAs), transcripts that function at the level of RNA rather than as templates for translation (for review, see Eddy 2001). Despite the challenges of sORF identification, reports since the publication of the S. cerevisiae genome indicate that sORFs are quite numerous in S. cerevisiae and many are evolutionarily conserved from distantly related fungi to humans.

Many S. cerevisiae sORFs were discovered through expression-based analyses. Velculescu and colleagues used serial analysis of gene expression (SAGE) to identify, quantitate, and compare global gene expression patterns in S. cerevisiae (Velculescu et al. 1995, 1997; Basrai and Hieter 2002). The SAGE technique is based on two principles: (1) a 9–10-bp sequence tag derived from a defined region in any poly(A)+ transcript that uniquely identifies the transcript; and (2) multiple sequence tags that are concatenated and sequenced in a single sequencing lane. In addition to confirming expression of annotated genes, the SAGE study provided the first evidence that hundreds of non-annotated reading frames (NORFs), including many sORFs, are transcribed in S. cerevisiae. We subsequently characterized one of these sORFs, NORF5/HUG1, and determined that it is a downstream target of the MEC1-mediated pathway for DNA damage and replication arrest (Basrai et al. 1999). These results validated the functional significance of sORFs found through systems biology approaches and suggested that other sORFs may have important functions.

Since the SAGE study, additional studies provided expression-based evidence for sORFs. Transcripts for potential sORFs or ncRNAs from intergenic regions were detected by Northern blotting (Olivas et al. 1997). A combined microarray and proteomics approach confirmed transcription of many sORFs discovered by SAGE and detected peptides corresponding to numerous sORFs, including some not reported by SAGE (Oshiro et al. 2002). Additional sORFs were discovered using a gene-trap strategy based on genomic integration of a modified bacterial transposon, and their expression was confirmed by strand-specific oligonucleotide dot-blot arrays (Kumar et al. 2002). Interestingly, some of the sORFs discovered by gene-trap are antisense to coding genes (Kumar et al. 2002).

Potential sORF homologs were identified for many of the sORFs discovered in the expression-based studies, and recent comparative genomic studies have expanded the number of sORFs with potential orthologs. Conserved sORFs were reported from comparisons of the S. cerevisiae genome to partial genome sequences from 13 hemiascomycetes and the complete genome sequences from distantly and closely related fungi (Blandin et al. 2000; Brachat et al. 2003; Cliften et al. 2003; Kellis et al. 2003). A recent study that combined homology searching with RT-PCR identified conserved sORFs whose expression was detected at the level of RNA (Kessler et al. 2003).

Based on the published literature, at least 299 genes in S. cerevisiae likely encode sORFs. We discovered that a similar percentage of sORFs are annotated in multiple eukaryotes and that many of the S. cerevisiae sORFs have potential orthologs in other eukaryotes. We constructed gene-deletion strains for 140 sORFs, bringing the total number of sORF deletion strains to 247. We analyzed these 140 new sORF deletion strains for growth phenotypes and identified sORFs that are essential for haploid growth and for growth at high temperature. We also identified sORFs required for growth under genotoxic conditions including exposure to hydroxyurea (HU), bleomycin, methyl methane sulfonate (MMS), or ultraviolet (UV) radiation. These data highlight the value of expression analyses and comparative genomics to identify sORFs and the advantages of S. cerevisiae genetics in investigating sORF function.

Results and Discussion

Evidence of S. cerevisiae sORFs

The S. cerevisiae genome has 299 annotated sORFs (Saccharomyces Genome Database; http://www.yeastgenome.org/) (Fig. 1A; Supplemental Table A). By comparing the sORFs reported since the publication of the S. cerevisiae genome, we determined that the majority of sORFs (170) were discovered in the gene expression and homology studies mentioned above, while the remainder were previously reported in the literature (Fig. 1A, “129 previously known”). We analyzed the literature for reports of transcription, translation, or homology for the 170 new sORFs. Those that were reported by SAGE (Velculescu et al. 1997), microarrays (Kumar et al. 2002; Oshiro et al. 2002), RT-PCR (Kessler et al. 2003), Northern blot (Olivas et al. 1997), or gene-trap (Kumar et al. 2002) were considered transcribed. The sORFs detected by gene-trapping were considered transcribed and translated because the β-galactosidase assays used to detect integration require transcription and translation. The mass-spectrometry study also identified sORFs with evidence of translation (Oshiro et al. 2002). Finally, sORFs reported in homology searches were classified as supported by homology (Velculescu et al. 1997; Blandin et al. 2000; Kumar et al. 2002; Oshiro et al. 2002; Brachat et al. 2003; Cliften et al. 2003; Kessler et al. 2003).

Figure 1.
Evidence of S. cerevisiae sORFs. (A) Gene expression-based analyses and homology searching reveal 170 potential sORFs, bringing the total number of annotated sORFs in S. cerevisiae to 299. Reports in the literature provided empirical evidence of transcription ...

Many of the new sORFs were detected by more than one approach (Fig. 1B; Supplemental Table A). For example, a large number of sORFs were discovered as both transcribed and translated (43 sORFs) or transcribed and with potential orthologs (67 sORFs), while several (15 sORFs) show evidence of transcription, translation, and homology. sORFs discovered only by transcription-based assays (18 sORFs) may represent ncRNAs, rather than protein-coding genes. sORFs detected at the level of RNA and homology may also be ncRNAs rather than protein-coding genes if the homology is the result of conservation of an RNA rather than protein-coding sequence. The sORFs discovered only by homology (21 sORFs) may represent genes expressed under certain conditions not used in the gene expression studies or could represent conserved sequences such as regulatory elements that are not expressed (Cliften et al. 2003). Most of the sORFs were detected by two or more techniques and likely represent bona fide genes.
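The overlap between evidence types can be tallied directly from the counts quoted above. The following is a minimal sketch, assuming that the five categories named in the text are mutually exclusive (an assumption, although it reproduces the 103 homology-supported sORFs cited elsewhere in this paper); the dictionary and function names are hypothetical. The listed categories sum to 164, so a handful of the 170 new sORFs are not broken out in the text.

```python
# Counts quoted in the text, keyed by the set of evidence types.
# Assumption: the categories are mutually exclusive.
EVIDENCE_COUNTS = {
    frozenset({"transcription", "translation"}): 43,
    frozenset({"transcription", "homology"}): 67,
    frozenset({"transcription", "translation", "homology"}): 15,
    frozenset({"transcription"}): 18,
    frozenset({"homology"}): 21,
}


def count_with(evidence, counts=EVIDENCE_COUNTS):
    """Number of sORFs whose evidence set includes the given type."""
    return sum(n for kinds, n in counts.items() if evidence in kinds)


def count_multi(counts=EVIDENCE_COUNTS, minimum=2):
    """Number of sORFs supported by at least `minimum` evidence types."""
    return sum(n for kinds, n in counts.items() if len(kinds) >= minimum)


if __name__ == "__main__":
    for kind in ("transcription", "translation", "homology"):
        print(kind, count_with(kind))
    print("two or more types:", count_multi())
```

Under this assumption, 125 of the listed sORFs are supported by two or more types of evidence, in line with the statement that most were detected by two or more techniques.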

Small proteins constitute a significant percentage of annotated proteins in eukaryotes

The 299 sORFs constitute ~5% of the 5865 genes annotated for S. cerevisiae in the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) (Fig. 2; Pruitt et al. 2005). We determined the percentage of annotated small proteins from additional eukaryotes in the NCBI RefSeq database (see Methods). We selected representative eukaryotes including another fungus (Schizosaccharomyces pombe), worms (Caenorhabditis elegans), plants (Arabidopsis thaliana), insects (Drosophila melanogaster), and mammals (Mus musculus and Homo sapiens). Interestingly, a similar percentage of sORFs are annotated for these organisms (~5%), including multicellular eukaryotes that have much larger genomes and a greater number of ORFs (Fig. 2). These results suggest that sORFs are not favored in single-celled eukaryotes or in those with smaller genomes and fewer genes. However, the evidence for the sORFs of S. cerevisiae comes from multiple analyses that may not have been used for all the representative eukaryotes (Fig. 2), and future experiments may reveal additional sORFs in these and other systems. Nevertheless, sORFs represent hundreds and in some cases >1000 ORFs in eukaryotes, and likely contribute significantly to the biology of eukaryotes.

Figure 2.
sORFs constitute a similar percentage of annotated ORFs in representative eukaryotes. The percentage of sORFs for S. cerevisiae and representative eukaryotes was calculated and is depicted in the bar graph. The genome size (megabases) and the number of ...

sORFs are evolutionarily conserved

Many of the new sORFs were discovered based on homology (103 of 170 sORFs) (Fig. 1B), indicating that sORFs likely have fundamental functions across eukaryotes. However, the databases used to search for sORF orthologs differed between reports (e.g., Kessler et al. 2003 used the NCBI fungi sequences, while Oshiro et al. 2002 used the nonredundant sequences from all species), and a search for orthologs of the complete set of 299 sORFs had not been reported. We conducted two searches using the entire set of 299 sORFs. First, we conducted BLAST analyses to examine the conservation of the sORFs in the representative eukaryotes (Fig. 2). Second, to examine sORF conservation more broadly, we examined the data on the sORFs in the HomoloGene database, which was built with genome sequences from a wide variety of eukaryotes (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=homologene) (Wheeler et al. 2005).

For our BLAST analyses, we compared the sORFs to the annotated proteins from the representative eukaryotes (Fig. 2) and to a database derived from genomic and EST sequences of these organisms, the UniGene database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene) (Pontius et al. 2003; see Supplemental material). We discovered that 46 of the sORFs exhibit significant alignments with annotated proteins from two or more of the representative eukaryotes, with BLAST bit scores ranging from 48 to 147 (Table 1). In the representative eukaryotes, ~60% of the proteins that are similar to the S. cerevisiae sORFs are less than ~100 amino acids (data not shown). We also discovered 44 sORFs that align with transcripts from two or more representative eukaryotes in the UniGene database (Supplemental Table B; see Supplemental material).

Table 1.
BLAST bit scores of S. cerevisiae sORFs that exhibit significant alignments with annotated proteins from other eukaryotes

Our analysis of HomoloGene revealed additional conserved sORFs. HomoloGene is a system that automatically detects homologs among the annotated genes of several completely sequenced eukaryotic genomes including H. sapiens and M. musculus (Supplemental material). Seventy-one sORFs were found in HomoloGene clusters conserved at several taxonomic levels, and 55 of the clusters have an assignment from the Conserved Domain Database (Marchler-Bauer et al. 2005), a collection of multiple sequence alignments for ancient domains and full-length proteins (Supplemental Table B). The conserved domains cover a broad spectrum; however, a conserved domain derived from SMART domain 00651, annotated as “small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing,” was represented most frequently, occurring five times. Zinc-finger, ubiquitin-like, and ribosomal protein domains were also encountered multiple times.

In summary, our results, combined with previously published reports, indicate that 184 of the S. cerevisiae sORFs have potential orthologs in other organisms (Supplemental Table B), including distantly related organisms such as humans, and that ~60% of these orthologs may themselves be sORFs (data not shown). Therefore, functional analysis of the S. cerevisiae sORFs has the potential to yield insight into the functions of sORFs in S. cerevisiae and in other eukaryotes.

Generation of sORF deletion strains

Gene-deletion strain collections of S. cerevisiae have revolutionized functional analyses of genes (e.g., Winzeler et al. 1999). Since only 106 of the 299 sORFs are represented in the previous collection (version 1.0), we attempted to construct gene-deletion strains of the remaining sORFs in the same genetic background as described for the initial yeast gene-deletion strains (Winzeler et al. 1999; Supplemental Fig. 1) (see Methods).

Using homologous recombination, we constructed individual strains in which sequences from the start codon to the stop codon of the sORF were replaced by a kanMX cassette in a diploid strain (Methods) (Supplemental Fig. 1). Each sORF gene-deletion mutant is publicly available either as haploids (MATa and MATα) or as diploids (homozygous or heterozygous; see http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html). The gene-deletion strains contain “molecular barcodes” that will facilitate rapid identification and analysis of genes in genome-wide approaches to analyze gene function (Winzeler et al. 1999; for review, see Pan et al. 2004). We determined that the molecular barcodes corresponding to the sORF deletions are detectable in microarray experiments using the Tag3 arrays (Affymetrix) (data not shown). In total, we constructed 140 sORF heterozygous deletion strains (~93% of the 151 attempted), bringing the total number of sORF deletion strains available to the yeast community to 247 (Supplemental Table A). The remaining sORF deletion strains were not constructed because of technical problems including the inability to design gene-specific primers or to recover transformants.

Identification of essential sORFs

Sporulation of eight of the 140 new sORF heterozygous deletion strains resulted in two viable (geneticin-sensitive) and two inviable (sORF deletion) spores, indicating that the corresponding sORFs are essential for viability (Table 2). Three of these genes (YLR099W-A, YNL024C-A, and YNL138W-A) were previously uncharacterized, while the remaining five sORFs were previously shown to be essential, which we confirmed in the gene-deletion strain background. These sORFs are required for functions such as kinetochore or spindle integrity (Cheeseman et al. 2002; Li et al. 2005), ER-to-Golgi transport (Heidtman et al. 2005), and pseudouridine biosynthesis (Henras et al. 1998). Combined with the results from Version 1.0, 21 of the sORFs are essential, representing ~8% of sORFs analyzed. The percentage of essential sORFs differs from that of larger ORFs, of which ~20% of those tested are essential (Winzeler et al. 1999). This difference may reflect redundancy in sORF function, or indicate that sORFs have regulatory roles rather than essential functions.

Table 2.
sORFs that result in a lethal or a slow-growth phenotype when deleted in haploid strains

Phenotypic analyses of haploid sORF deletion strains

Six of the new haploid sORF deletion strains exhibit slow-growth phenotypes when grown at 30°C (Table 2), including strains deleted for YBL071W-A/KTI11 and YPL096C-A/ERI1, which are known to exhibit slow growth (Fichtner and Schaffrath 2002; Sobering et al. 2003). We further analyzed the growth of the new haploid sORF deletion strains in pilot screens under the following conditions: (1) at high (37°C) or low (11°C) temperatures; (2) in the presence of a sole nonfermentable carbon source; and (3) in the presence of the replication-arrest agent HU and DNA-damaging agents including MMS, bleomycin, and UV radiation. To confirm the results of the pilot screen, we sporulated heterozygous strains corresponding to the haploid strains with phenotypes and characterized the meiotic progeny. For each strain, we analyzed three independent sORF deletion spores and confirmed that the phenotype was linked to the sORF deletion. Upon verification of the phenotypes, we confirmed chromosomal deletion of the sORF by PCR and sequence analysis of the genomic locus at the site of integration of the kanMX cassette (see Methods).

We observed that three of the sORF deletion strains are temperature-sensitive (Ts) for growth at 37°C (Fig. 3A), while none of the sORF deletion strains showed a cold-sensitive growth phenotype at 11°C (data not shown). A Ts allele of the kinetochore mutant (ndc10-1) served as a control (Goh and Kilmartin 1993). We discovered a new gene required for growth at the nonpermissive temperature of 37°C, YKL096C-B, and confirmed the previously reported Ts growth phenotypes of strains with a mutation in sORFs YBR058C-A/TSC3 and YDR079C-A/TFB5, which are involved in sphingolipid biosynthesis and transcription regulation, respectively (Fig. 3A; Gable et al. 2000; Ranish et al. 2004).

Figure 3.
sORFs required for growth at 37°C (Ts) or in the presence of nonfermentable carbon source (petite phenotype). (A) sORFs required for growth at 37°C. Growth assays of 3 μL of fivefold serial dilutions of logarithmic-phase cells ...

We tested the sORF deletion strains for a “petite” phenotype, which refers to an inability to grow in the presence of a nonfermentable carbon source and is an attribute of several mutants including mitochondrial mutants (for review, see Chen and Clark-Walker 2000). We determined that three sORF deletion strains exhibit a “petite” phenotype (Fig. 3B; Table 3). Our results confirm the role of YEL059C-A/SOM1 for mitochondrial function (Esser et al. 1996) and suggest a similar function for the previously uncharacterized sORFs YJL062W-A and YPL189C-A. Consistent with a role in mitochondrial function, Yjl062w-ap fused to GFP has been localized to the mitochondrion (Huh et al. 2003).

Table 3.
sORFs with phenotypes when deleted

To investigate the potential role of sORFs in response to genotoxic stress, we assayed the sORF deletion strains for sensitivity to the replication-arrest agent HU and to DNA-damaging agents bleomycin, MMS, and UV radiation. Sensitivity to these genotoxic agents can provide important clues about the roles of the genes in replication, transcription, cell-cycle progression, and chromosome segregation (Chang et al. 2002; Aouida et al. 2004; Parsons et al. 2004). In addition, many S. cerevisiae genes required for responding to DNA damage and replication arrest have human orthologs, mutations in which lead to human diseases (for review, see Zhou and Elledge 2000).

For these studies, strains grown to logarithmic phase were serially diluted, spotted on medium containing the appropriate drug or exposed to UV, and incubated for 2–3 d at 30°C. The S. cerevisiae checkpoint mutant mec1Δ sml1Δ, which exhibits sensitivity to HU, bleomycin, MMS, and UV radiation, served as a control (Kiser and Weinert 1996). HU inhibits ribonucleotide reductase, an enzyme that is required for synthesis of dNTPs in S. cerevisiae and other systems, and leads to an arrest in S-phase of the cell cycle (Elledge et al. 1993). As shown in Figure 4A, three sORF deletion strains exhibit varying degrees of sensitivity to growth on HU-containing media, with ybr058c-aΔ/tsc3Δ being the most sensitive. Our results suggest new roles for Tsc3p, a sphingolipid biosynthetic enzyme (Gable et al. 2000); Sus1p, a component of the SAGA and Sac3p–Thp1p complexes (Rodriguez-Navarro et al. 2004); and the uncharacterized YBR196C-A, in responding to replication arrest.

Figure 4.
sORFs required for growth on media containing replication-arrest and DNA-damaging agents. Growth assays of 3 μL of fivefold serial dilutions of logarithmic-phase cells of the sORF deletion strains spotted on YPD plates (control) or spotted on ...

Next, we tested bleomycin, a radiomimetic drug that leads to both single- and double-stranded DNA damage (Chen and Stubbe 2005), and discovered that four sORF deletion strains are sensitive to bleomycin. sORF deletion strains lacking YBR058C-A/TSC3 showed the most sensitivity, while ykl096c-bΔ and ydr524w-cΔ strains were only moderately sensitive to bleomycin (10 mU/mL) (Fig. 4B). Our results extend the role of Tsc3p in responding to replication arrest caused by HU (Fig. 4A) to an additional role in responding to DNA damage caused by bleomycin.

We also discovered a new sORF required for growth in the presence of MMS. MMS is a DNA-alkylating agent that primarily methylates DNA on N7-deoxyguanine and N3-deoxyadenine (Pegg 1984). Resistance to MMS requires genes from the bypass, post-replication, recombination, base excision repair, and/or checkpoint pathways (Weinert et al. 1994; Xiao et al. 1996; Tercero and Diffley 2001). The sORF deletion strain ybr111w-aΔ/sus1Δ is sensitive to growth on MMS medium (Fig. 4C), a phenotype that, to our knowledge, has not been previously reported for strains deleted for YBR111W-A/SUS1. Ybr111w-ap/Sus1p is a component of the SAGA complex and the Sac3p–Thp1p mRNA export complex (Rodriguez-Navarro et al. 2004). These results, combined with earlier results (Fig. 4A), suggest a novel role for Sus1p in response to DNA damage induced by MMS and replication arrest induced by HU. Finally, we confirmed a UV-sensitivity phenotype previously reported for the ydr079c-aΔ/tfb5Δ strain in a different genetic background (Fig. 4D; Ranish et al. 2004).

The sORF deletion strains exhibit overlapping and distinct phenotypes

In total, we observed conditional phenotypes for nine sORF deletion strains (Table 3). Not surprisingly, several of the sORF deletion strains exhibit overlapping phenotypes when subjected to DNA damage or replication arrest, an observation also made with other ORF deletion strains (Chang et al. 2002; Table 3). For example, two of the HU-sensitive strains also exhibit sensitivity to bleomycin and MMS. Interestingly, all three Ts sORF deletion strains are also sensitive to DNA-damage or replication-arrest agents. These results suggest that the role of these genes in the response to DNA damage and replication arrest may be essential for haploid growth at the nonpermissive temperature of 37°C.

Phenotypic analyses of deletion strains for genes flanking the sORFs

Six of the sORFs that exhibited phenotypes distinct from wild type when deleted (YBR058C-A/TSC3, YBR111W-A/SUS1, YDR079C-A/TFB5, YEL059C-A/SOM1, YJL062W-A, and YKL096C-B) are within 300 bp of larger ORFs. The phenotypes we observed may be due to altered expression of the neighboring ORFs caused by disruptions in their promoters or 5′- or 3′-untranslated regions rather than loss of function of the deleted sORFs. We therefore examined the phenotypes of strains with deletions of genes that are within 300 bp of the sORFs, a conservative approach, as ~60% of ORFs, both large and small, are within 300 bp of another ORF. In all but two cases (YBR111W-A/SUS1, YJL062W-A), deletion of the neighboring genes did not produce the phenotypes we observed for the sorfΔ strain (Supplemental Table C). For these two deletion strains, the phenotypes could be due to interference with expression of a neighboring ORF, loss of the sORF, or both.

We determined that the deletion strain for YGR271C-A showed slow growth, Ts, and an HU-sensitivity phenotype and that a deletion strain for YGR272C, which is 51 bp away from YGR271C-A, also exhibits such phenotypes (Fig. 5A). Sequence analysis of the genomic locus revealed that YGR271C-A is contiguous with YGR272C, forming a single ORF, consistent with the similarity of these two predicted ORFs to a single ORF (PABR143C) from Ashbya gossypii (Fig. 5B; Brachat et al. 2003). We constructed a new gene-deletion strain for the larger ORF, which we denote as ygr271c-aΔ/ygr272cΔ, and determined that this strain showed a more severe slow growth, Ts, and HU-sensitivity phenotype compared to the ygr271c-aΔ or ygr272cΔ strains (Fig. 5A). Further analysis in a cell cycle arrest–release experiment showed that the ygr271c-aΔ/ygr272cΔ strain exhibits a significant delay of at least 40 min in exiting from the G1 phase of the cell cycle after an arrest with α-factor (Fig. 5C). Our results, combined with the analysis of protein expression described below, establish that YGR271C-A and YGR272C constitute a single ORF, which we have named EFG1 (Exit from G1).

Figure 5.
YGR271C-A and YGR272C constitute a contiguous ORF required for growth at 37°C, growth on HU-containing media and cell cycle progression. (A) The ygr271c-aΔ, ygr272cΔ and ygr271c-aΔ/ygr272cΔ strains exhibit slow ...

Protein expression analysis of the sORFs

Recent evidence of expression at the protein level for sORFs has come from genome-wide TAP- and GFP-tagging experiments (Ghaemmaghami et al. 2003; Huh et al. 2003; Supplemental Table A). Protein expression for some of the sORFs detected in our screens has been reported in these (Supplemental Table A) and other studies (Table 3). We epitope-tagged a subset of sORFs identified in our phenotypic analyses by introducing a haemagglutinin epitope (HA) at the C-terminus in their chromosomal context and examined expression of the tagged protein by Western blot analysis. We detected expression of proteins from strains expressing HA-tagged YJL062W-A, YPL189C-A, and YDR524W-C (Fig. 6, lanes 1,2,4). We also detected a band of expected size for EFG1, further confirming that YGR271C-A and YGR272C constitute a single ORF (Fig. 6, lane 3).

Figure 6.
Protein expression analysis of HA-tagged sORFs. Western blot analysis of protein extracts prepared from strains expressing HA-tagged ORFs (YJL062W-A, YPL189C-A, YGR271C-A/YGR272C, YDR524W-C) and the wild-type strain (BY4741) not expressing a HA-tagged ...


In the past, the function of sORFs has been elusive owing to inherent difficulties in identifying them based on genetic, biochemical, or solely computational approaches. S. cerevisiae represents one of the few systems with a wealth of data derived from several functional genomic and comparative genomic studies. Using the strengths of S. cerevisiae as a model, we provide the first systematic investigation of sORF function in any system. Our analysis of the literature combined with our genetic analyses for sORF function presents a comprehensive database for the 299 sORFs in S. cerevisiae. Of the S. cerevisiae sORFs, 184 are related to sequences in other eukaryotes, suggesting the evolutionary conservation of the structure and perhaps function of these sORFs. Although relatively little is known about sORF functions, they have been implicated in key cellular processes including transport, intermediary metabolism, chromosome segregation, genome stability, and other functions. The sORF gene-deletion collection should lead to the discovery of additional functions for sORFs in S. cerevisiae. Moreover, our results, which emphasize the biological significance of sORFs in S. cerevisiae that are conserved across eukaryotes, should provide an impetus for the identification and characterization of sORFs in other systems, including humans.

Methods


Analysis of sORF percentage in representative eukaryotes

The number of sORFs coding for proteins of 100 amino acids in length or less, annotated on the transcripts of model organisms in the NCBI RefSeq database, was determined using a query of the Entrez Protein database of the form: srcdb_refseq[prop] AND homo sapiens[orgn] AND 0:100[slen]. The total number of ORFs in each set was counted using a query of the form: srcdb_refseq[prop] AND homo sapiens[orgn]. The version of RefSeq used was that present in Entrez on 3/15/2005, corresponding to RefSeq release 10 (available on 3/6/2005) with updates from 3/6/2005 to 3/15/2005.
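The two query forms above can be generated for each organism by a small helper. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, the RefSeq filter is written in the underscore form `srcdb_refseq[prop]` that Entrez uses, and the code only builds query strings and computes a percentage from counts supplied by the caller (it does not contact Entrez).

```python
def total_query(organism):
    """Entrez Protein query for all RefSeq proteins of one organism."""
    return f"srcdb_refseq[prop] AND {organism.lower()}[orgn]"


def small_protein_query(organism, max_len=100):
    """Same query, restricted to sequence lengths from 0 to max_len."""
    return f"{total_query(organism)} AND 0:{max_len}[slen]"


def percent_small(n_small, n_total):
    """Percentage of annotated proteins that fall in the small class."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_small / n_total


if __name__ == "__main__":
    for organism in ("Homo sapiens", "Saccharomyces cerevisiae"):
        print(total_query(organism))
        print(small_protein_query(organism))
    # Counts quoted in the text for S. cerevisiae.
    print(round(percent_small(299, 5865), 1))
```

With the counts quoted in the text for S. cerevisiae (299 small proteins among 5865 annotated genes), `percent_small(299, 5865)` gives about 5.1%, which matches the ~5% reported.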

Homology searches


For this study, HomoloGene build 38.1, dating from November 23, 2004, was used (ftp://ftp.ncbi.nih.gov/pub/HomoloGene/).

BLAST of sORFs with annotated proteins and UniGene

Single sequence representatives of the UniGene clusters, the “seq-uniques” described in Supplemental material, were downloaded for each organism from the NCBI FTP site (ftp://ftp.ncbi.nih.gov/) and compared to the sORFs using BLAST (see Supplemental material; Altschul et al. 1997). The best BLAST hit was extracted for each sORF only if the hit spanned at least one-third of the translated ORF with an amino acid identity of at least 40%; otherwise, no hit was extracted. The results are summarized in Supplemental Table B.
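The acceptance rule for a BLAST hit can be written as a simple filter. The following is a minimal sketch under stated assumptions: the `Hit` record and its field names are hypothetical, the "best" hit is taken to be the one with the highest bit score, and "spanned at least one-third" is interpreted as the aligned length on the sORF side being at least one-third of the sORF length. None of these details is specified in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    subject: str          # identifier of the matched sequence
    aligned_length: int   # aligned residues on the sORF (query) side
    identity_pct: float   # percent amino acid identity over the alignment
    bit_score: float


def best_hit(hits: Iterable[Hit], sorf_length: int,
             min_identity: float = 40.0) -> Optional[Hit]:
    """Return the top-scoring hit if it passes both thresholds, else None."""
    ranked = sorted(hits, key=lambda h: h.bit_score, reverse=True)
    if not ranked:
        return None
    top = ranked[0]
    # Integer comparison avoids floating-point error in the one-third test.
    covers_third = 3 * top.aligned_length >= sorf_length
    similar_enough = top.identity_pct >= min_identity
    return top if covers_third and similar_enough else None
```

Note that only the top-scoring hit is tested: if it fails either threshold, no hit is reported for that sORF, even when a lower-scoring hit would pass. This follows the wording "otherwise, no hit was extracted."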

Media and yeast strains

The media and methodology for yeast growth were as described (Gietz et al. 1992, 1995; Adams et al. 1997; Brachmann et al. 1998). The deletion strains were generated in diploid strain BY4743 (MATa/α his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 lys2Δ0/LYS2 MET15/met15Δ0 ura3Δ0/ura3Δ0), and the haploid spores isogenic with BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) and BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0) were identified (Winzeler et al. 1999). Other strains include the temperature-sensitive control strain JK421 (MATa ade2-1 ura3-1 his3-11,15 trp1-1 leu2-3,112 can1-100 ndc10-1) (Goh and Kilmartin 1993) and the checkpoint mutant U953–61D (MATa leu2-3,112 ade2-1 can1-100 his3-11,15 ura3-1 trp1-1 RAD5 mec1Δ::TRP1 sml1Δ::HIS3) (Zhao et al. 1998).

Gene-deletion strain construction and confirmation

A PCR-generated (Baudin et al. 1993; Wach et al. 1994) deletion strategy was used to systematically replace each sORF from its start to its stop codon with a kanMX module and two unique 20-mer molecular barcodes as done previously for the gene-deletion strain collection (Winzeler et al. 1999; Giaever et al. 2002; Supplemental Fig. 1; Supplemental material; barcode sequences are given in Supplemental Table A). Each sORF gene-deletion mutant is publicly available either as haploids (MATa and MATα) or as diploids (homozygous or heterozygous; see http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html).

Phenotypic analyses of sORF deletion strains

For sensitivity to HU, MMS, bleomycin, UV, and nonpermissive growth temperatures, we assayed serial dilutions of the sORF strains on YPD or YPD containing 200 mM hydroxyurea (HU; H8627; Sigma), 0.02% methyl methanesulfonate (MMS; 64294; Fluka Chemika), or 10 mU/mL bleomycin (BLM; 3154-01; Bristol-Myers Squibb Co.). For sensitivity to UV radiation, we irradiated strains spotted on YPD with 20 mJ/m² using a Stratalinker (Stratagene). For growth at the nonpermissive temperatures, we incubated plates at either 11°C or 37°C. A “petite” phenotype was determined by plating strains on a modified YPD medium in which dextrose was substituted with 2% glycerol and 2% ethanol.
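As a worked example of the drug concentrations stated above, the following sketch converts them into amounts per volume of medium. It rests on assumptions not stated in the text: the molar mass of hydroxyurea (76.05 g/mol) is supplied here, 0.02% MMS is taken to mean volume per volume, and the function name `drug_amounts` is hypothetical. Each value refers to a separate medium, since the plates contain HU, MMS, or bleomycin, not all three.

```python
# Molar mass of hydroxyurea (CH4N2O2); supplied here, not taken from the paper.
HU_MOLAR_MASS_G_PER_MOL = 76.05


def drug_amounts(volume_l: float) -> dict:
    """Amount of each drug for `volume_l` liters of its own YPD medium."""
    return {
        # 200 mM = 0.200 mol/L
        "hydroxyurea_g": 0.200 * HU_MOLAR_MASS_G_PER_MOL * volume_l,
        # 0.02% (assumed v/v) = 0.0002 L of MMS per liter, reported in mL
        "mms_ml": 0.02 / 100 * volume_l * 1000,
        # 10 mU/mL = 0.010 U/mL, times 1000 mL per liter
        "bleomycin_units": 0.010 * 1000 * volume_l,
    }


if __name__ == "__main__":
    for name, amount in drug_amounts(1.0).items():
        print(f"{name}: {amount:.2f}")
```

Under these assumptions, one liter of medium takes about 15.2 g of hydroxyurea, 0.2 mL of MMS, or 10 U of bleomycin.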

Protein expression analysis of HA-tagged ORFs

ORFs were fused in-frame at the genomic locus with three copies of the HA epitope at their C terminus as previously described (Longtine et al. 1998; Supplemental material). Protein extracts of ORF-HA-expressing strains were analyzed by Western blot as described previously (Crotti and Basrai 2004). The primary antibody was anti-HA (clone 12CA5; Roche) or anti-Tub2p (polyclonal antibody, Basrai lab), and the secondary antibody was HRP-conjugated sheep anti-mouse IgG (NA931V; Amersham).

α-Factor arrest/release experiments

Strains were grown overnight at 30°C in YPD medium and then diluted into fresh medium to obtain a logarithmic-phase culture. Cells were arrested in the presence of 3 μM α-factor (T-6901; Sigma) at 30°C for 90 min, washed twice with water, and resuspended in fresh YPD medium and incubated at 30°C. DNA content was assayed every 20 min after release from the α-factor arrest for a total of 3 h as described previously (Doheny et al. 1993; Basrai et al. 1996) using a Becton-Dickinson FACSort flow cytometer and CellQuest software (BD Biosciences).
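The sampling schedule described above (every 20 min for a total of 3 h after release) can be written out explicitly. This is a small sketch; whether a sample was also taken at the moment of release (0 min) is not stated in the text, so it is exposed as an assumed option, and the function name `sampling_times` is hypothetical.

```python
def sampling_times(interval_min: int = 20, total_min: int = 180,
                   include_release: bool = True) -> list:
    """Minutes after alpha-factor release at which DNA content is sampled.

    include_release=True adds a 0-min sample (an assumption, not from the text).
    """
    start = 0 if include_release else interval_min
    return list(range(start, total_min + 1, interval_min))


if __name__ == "__main__":
    print(sampling_times())
```

With the defaults this gives ten time points from 0 to 180 min; without the 0-min sample it gives nine.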


Acknowledgments

The authors thank Anand Sethuraman and Mike Cherry for help with compiling the sequences of the sORFs; Anuj Kumar for sharing unpublished data; Lucy Liu and Xiuquiong Zhou for tetrad dissections; Keith Anderson, Ana Aparicio, and Mike Jensen of the SGTC (Stanford Genome Technology Center) for assistance with the A.M.O.S. primers; and Mark Johnston for advice and support of this work. This work was supported in part by NIH grant R01-HG02432 to J.D.B. and by the Intramural Research Program of the NIH and NCI.


[Supplemental material is available online at www.genome.org.]

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4355406.


  • Adams, A., Gottschling, D.E., Kaiser, C.A., and Stearns, T. 1997. Methods in yeast genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  • Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25 3389–3402.
  • Aouida, M., Tounekti, O., Leduc, A., Belhadj, O., Mir, L., and Ramotar, D. 2004. Isolation and characterization of Saccharomyces cerevisiae mutants with enhanced resistance to the anticancer drug bleomycin. Curr. Genet. 45 265–272.
  • Basrai, M.A. and Hieter, P. 2002. Transcriptome analysis of Saccharomyces cerevisiae using serial analysis of gene expression. Methods Enzymol. 350 414–444.
  • Basrai, M.A., Kingsbury, J., Koshland, D., Spencer, F., and Hieter, P. 1996. Faithful chromosome transmission requires Spt4p, a putative regulator of chromatin structure in Saccharomyces cerevisiae. Mol. Cell. Biol. 16 2838–2847.
  • Basrai, M.A., Hieter, P., and Boeke, J.D. 1997. Small open reading frames: Beautiful needles in the haystack. Genome Res. 7 768–771.
  • Basrai, M.A., Velculescu, V.E., Kinzler, K.W., and Hieter, P. 1999. NORF5/HUG1 is a component of the MEC1-mediated checkpoint response to DNA damage and replication arrest in Saccharomyces cerevisiae. Mol. Cell. Biol. 19 7041–7049.
  • Baudin, A., Ozier-Kalogeropoulos, O., Denouel, A., Lacroute, F., and Cullin, C. 1993. A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res. 21 3329–3330.
  • Blandin, G., Durrens, P., Tekaia, F., Aigle, M., Bolotin-Fukuhara, M., Bon, E., Casaregola, S., de Montigny, J., Gaillardin, C., Lepingle, A., et al. 2000. Genomic exploration of the hemiascomycetous yeasts: 4. The genome of Saccharomyces cerevisiae revisited. FEBS Lett. 487 31–36.
  • Brachat, S., Dietrich, F.S., Voegeli, S., Zhang, Z., Stuart, L., Lerch, A., Gates, K., Gaffney, T., and Philippsen, P. 2003. Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii. Genome Biol. 4 R45.
  • Brachmann, C.B., Davies, A., Cost, G.J., Caputo, E., Li, J., Hieter, P., and Boeke, J.D. 1998. Designer deletion strains derived from Saccharomyces cerevisiae S288C: A useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14 115–132.
  • Chang, M., Bellaoui, M., Boone, C., and Brown, G.W. 2002. A genome-wide screen for methyl methanesulfonate-sensitive mutants reveals genes required for S phase progression in the presence of DNA damage. Proc. Natl. Acad. Sci. 99 16934–16939.
  • Cheeseman, I.M., Anderson, S., Jwa, M., Green, E.M., Kang, J., Yates III, J.R., Chan, C.S., Drubin, D.G., and Barnes, G. 2002. Phospho-regulation of kinetochore-microtubule attachments by the Aurora kinase Ipl1p. Cell 111 163–172.
  • Chen, X.J. and Clark-Walker, G.D. 2000. The petite mutation in yeasts: 50 years on. Int. Rev. Cytol. 194 197–238.
  • Chen, J. and Stubbe, J. 2005. Bleomycins: Towards better therapeutics. Nat. Rev. Cancer 5 102–112.
  • Cliften, P., Sudarsanam, P., Desikan, A., Fulton, L., Fulton, B., Majors, J., Waterston, R., Cohen, B.A., and Johnston, M. 2003. Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science 301 71–76.
  • Crotti, L.B. and Basrai, M.A. 2004. Functional roles for evolutionarily conserved Spt4p at centromeres and heterochromatin in Saccharomyces cerevisiae. EMBO J. 23 1804–1814.
  • Doheny, K.F., Sorger, P.K., Hyman, A.A., Tugendreich, S., Spencer, F., and Hieter, P. 1993. Identification of essential components of the S. cerevisiae kinetochore. Cell 73 761–774.
  • Eddy, S.R. 2001. Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2 919–929.
  • Elledge, S.J., Zhou, Z., Allen, J.B., and Navas, T.A. 1993. DNA damage and cell cycle regulation of ribonucleotide reductase. Bioessays 15 333–339.
  • Esser, K., Pratje, E., and Michaelis, G. 1996. SOM 1, a small new gene required for mitochondrial inner membrane peptidase function in Saccharomyces cerevisiae. Mol. Gen. Genet. 252 437–445.
  • Fichtner, L. and Schaffrath, R. 2002. KTI11 and KTI13, Saccharomyces cerevisiae genes controlling sensitivity to G1 arrest induced by Kluyveromyces lactis zymocin. Mol. Microbiol. 44 865–875.
  • Gable, K., Slife, H., Bacikova, D., Monaghan, E., and Dunn, T.M. 2000. Tsc3p is an 80-amino acid protein associated with serine palmitoyltransferase and required for optimal enzyme activity. J. Biol. Chem. 275 7597–7603.
  • Ghaemmaghami, S., Huh, W.K., Bower, K., Howson, R.W., Belle, A., Dephoure, N., O'Shea, E.K., and Weissman, J.S. 2003. Global analysis of protein expression in yeast. Nature 425 737–741.
  • Giaever, G., Chu, A.M., Ni, L., Connelly, C., Riles, L., Veronneau, S., Dow, S., Lucau-Danila, A., Anderson, K., Andre, B., et al. 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418 387–391.
  • Gietz, D., St Jean, A., Woods, R.A., and Schiestl, R.H. 1992. Improved method for high efficiency transformation of intact yeast cells. Nucleic Acids Res. 20 1425.
  • Gietz, R.D., Schiestl, R.H., Willems, A.R., and Woods, R.A. 1995. Studies on the transformation of intact yeast cells by the LiAc/SS-DNA/PEG procedure. Yeast 11 355–360.
  • Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldman, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., et al. 1996. Life with 6000 genes. Science 274 546–567.
  • Goh, P.-Y. and Kilmartin, J.V. 1993. NDC10: A gene involved in chromosome segregation in S. cerevisiae. J. Cell Biol. 121 503–512.
  • Heidtman, M., Chen, C.Z., Collins, R.N., and Barlowe, C. 2005. Yos1p is a novel subunit of the Yip1p–Yif1p complex and is required for transport between the endoplasmic reticulum and the Golgi complex. Mol. Biol. Cell 16 1673–1683.
  • Henras, A., Henry, Y., Bousquet-Antonelli, C., Noaillac-Depeyre, J., Gelugne, J.P., and Caizergues-Ferrer, M. 1998. Nhp2p and Nop10p are essential for the function of H/ACA snoRNPs. EMBO J. 17 7078–7090.
  • Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O'Shea, E.K. 2003. Global analysis of protein localization in budding yeast. Nature 425 686–691.
  • Jan, P.S., Esser, K., Pratje, E., and Michaelis, G. 2000. Som1, a third component of the yeast mitochondrial inner membrane peptidase complex that contains Imp1 and Imp2. Mol. Gen. Genet. 263 483–491.
  • Kellis, M., Patterson, N., Endrizzi, M., Birren, B., and Lander, E.S. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423 241–254.
  • Kessler, M.M., Zeng, Q., Hogan, S., Cook, R., Morales, A.J., and Cottarel, G. 2003. Systematic discovery of new genes in the Saccharomyces cerevisiae genome. Genome Res. 13 264–271.
  • Kiser, G.L. and Weinert, T.A. 1996. Distinct roles of yeast MEC and RAD checkpoint genes in transcriptional induction after DNA damage and implications for function. Mol. Biol. Cell 7 703–718.
  • Kumar, A., Harrison, P.M., Cheung, K.H., Lan, N., Echols, N., Bertone, P., Miller, P., Gerstein, M.B., and Snyder, M. 2002. An integrated approach for finding overlooked genes in yeast. Nat. Biotechnol. 20 58–63.
  • Li, J.M., Li, Y., and Elledge, S.J. 2005. Genetic analysis of the kinetochore DASH complex reveals an antagonistic relationship with the ras/protein kinase A pathway and a novel subunit required for Ask1 association. Mol. Cell. Biol. 25 767–778.
  • Longtine, M.S., McKenzie III, A., Demarini, D.J., Shah, N.G., Wach, A., Brachat, A., Philippsen, P., and Pringle, J.R. 1998. Additional modules for versatile and economical PCR-based gene deletion and modification in Saccharomyces cerevisiae. Yeast 14 953–961.
  • Marchler-Bauer, A., Anderson, J.B., Cherukuri, P.F., DeWeese-Scott, C., Geer, L.Y., Gwadz, M., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., et al. 2005. CDD: A Conserved Domain Database for protein classification. Nucleic Acids Res. 33 D192–D196.
  • Olivas, W.M., Muhlrad, D., and Parker, R. 1997. Analysis of the yeast genome: Identification of new non-coding and small ORF-containing RNAs. Nucleic Acids Res. 25 4619–4625.
  • Oshiro, G., Wodicka, L.M., Washburn, M.P., Yates III, J.R., Lockhart, D.J., and Winzeler, E.A. 2002. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 12 1210–1220.
  • Pan, X., Yuan, D.S., Xiang, D., Wang, X., Sookhai-Mahadeo, S., Bader, J.S., Hieter, P., Spencer, F., and Boeke, J.D. 2004. A robust toolkit for functional profiling of the yeast genome. Mol. Cell 16 487–496.
  • Parsons, A.B., Brost, R.L., Ding, H., Li, Z., Zhang, C., Sheikh, B., Brown, G.W., Kane, P.M., Hughes, T.R., and Boone, C. 2004. Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat. Biotechnol. 22 62–69.
  • Pegg, A.E. 1984. Properties of the O6-alkylguanine-DNA repair system of mammalian cells. IARC Sci. Publ. 575–580.
  • Pontius, J.U., Wagner, L., and Schuler, G.D. 2003. UniGene: A unified view of the transcriptome. National Center for Biotechnology Information, Bethesda, MD. http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.857
  • Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33 D501–D504.
  • Ranish, J.A., Hahn, S., Lu, Y., Yi, E.C., Li, X.J., Eng, J., and Aebersold, R. 2004. Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nat. Genet. 36 707–713.
  • Rodriguez-Navarro, S., Fischer, T., Luo, M.J., Antunez, O., Brettschneider, S., Lechner, J., Perez-Ortin, J.E., Reed, R., and Hurt, E. 2004. Sus1, a functional component of the SAGA histone acetylase complex and the nuclear pore-associated mRNA export machinery. Cell 116 75–86.
  • Sobering, A.K., Romeo, M.J., Vay, H.A., and Levin, D.E. 2003. A novel Ras inhibitor, Eri1, engages yeast Ras at the endoplasmic reticulum. Mol. Cell. Biol. 23 4983–4990.
  • Tercero, J.A. and Diffley, J.F. 2001. Regulation of DNA replication fork progression through damaged DNA by the Mec1/Rad53 checkpoint. Nature 412 553–557.
  • Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K. 1995. Serial analysis of gene expression. Science 270 484–487.
  • Velculescu, V., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M.A., Bassett Jr., D.E., Hieter, P., Vogelstein, B., and Kinzler, K. 1997. Characterization of the yeast transcriptome. Cell 88 243–251.
  • Wach, A., Brachat, A., Pohlmann, R., and Philippsen, P. 1994. New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10 1793–1808.
  • Weinert, T.A., Kiser, G.L., and Hartwell, L.H. 1994. Mitotic checkpoint genes in budding yeast and the dependence of mitosis on DNA replication and repair. Genes & Dev. 8 652–665.
  • Wheeler, D.L., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Church, D.M., DiCuccio, M., Edgar, R., Federhen, S., Helmberg, W., et al. 2005. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 33 D39–D45.
  • Winzeler, E.A., Shoemaker, D.D., Astromoff, A., Liang, H., Anderson, K., Andre, B., Bangham, R., Benito, R., Boeke, J.D., Bussey, H., et al. 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285 901–906.
  • Xiao, W., Chow, B.L., and Rathgeber, L. 1996. The repair of DNA methylation damage in Saccharomyces cerevisiae. Curr. Genet. 30 461–468.
  • Zhao, X., Muller, E.G., and Rothstein, R. 1998. A suppressor of two essential checkpoint genes identifies a novel protein that negatively affects dNTP pools. Mol. Cell 2 329–340.
  • Zhou, B.B. and Elledge, S.J. 2000. The DNA damage response: Putting checkpoints in perspective. Nature 408 433–439.

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press