Appl Environ Microbiol. May 2008; 74(10): 3257–3265.
Published online Mar 31, 2008. doi:  10.1128/AEM.02720-07
PMCID: PMC2394961

Identification of Mobile Elements and Pseudogenes in the Shewanella oneidensis MR-1 Genome

Abstract

Shewanella oneidensis MR-1 is the first of 22 different Shewanella spp. whose genomes have been or are being sequenced and thus serves as the model organism for studying the functional repertoire of the Shewanella genus. The original MR-1 genome annotation revealed a large number of transposase genes and pseudogenes, indicating that many of the genome's functions may be decaying. Comparative analyses of the sequenced Shewanella strains suggest that 209 genes in MR-1 have in-frame stop codons, frameshifts, or interruptions and/or are truncated and that 65 of the original pseudogene predictions were erroneous. Among the decaying functions are that of one of three chemotaxis clusters, type I pilus production, starch utilization, and nitrite respiration. Many of the mutations could be attributed to members of 41 different types of insertion sequence (IS) elements and three types of miniature inverted-repeat transposable elements identified here for the first time. The high copy numbers of individual mobile elements (up to 71) are expected to promote large-scale genome recombination events, as evidenced by the displacement of the algA promoter. The ability of MR-1 to acquire foreign genes via reactions catalyzed by both the integron integrase and the ISSod25-encoded integrases is suggested by the presence of attC sites and genes whose sequences are characteristic of other species downstream of each site. This large number of mobile elements and multiple potential sites for integrase-mediated acquisition of foreign DNA indicate that the MR-1 genome is exceptionally dynamic, with many functions and regulatory control points in the process of decay or reinvention.

Shewanella oneidensis MR-1 (formerly Alteromonas putrefaciens and Shewanella putrefaciens) is a gammaproteobacterium best known for its respiratory versatility, including the ability to reduce metal oxides and radionuclides. Like many other members of the Shewanella genus, it can also grow aerobically or use any of a broad variety of organic (fumarate, trimethylamine oxide, dimethyl sulfoxide, and glycine) and inorganic (nitrate, elemental sulfur, thiosulfate, and sulfite) compounds as terminal electron acceptors in the absence of O2. This strain was originally isolated from anaerobic sediments of Oneida Lake in New York through the selective enrichment of bacteria that could respire Mn(IV) (19). Members of the Shewanella genus are frequently isolated from redox-stratified freshwater and marine environments, where they are proposed to play an important role in the geochemical cycling of nitrogen (5), metals (20), and sulfur (10, 18).

In order to facilitate the investigation of the underlying processes that enable these respiratory specialists to thrive in such environments, the chromosome and plasmid of S. oneidensis MR-1 were sequenced, the genes and functions were predicted (12), and the data were deposited in GenBank (accession no. AE014299 and AE014300). A second version of the protein-encoding-gene calls for the chromosome was later produced (accession no. NC_004347) based on an alternative gene-calling strategy that resulted in the removal of 429 protein-encoding-gene predictions and the addition of 108 new ones (7). These predictions serve as an essential resource for hypothesis development and the interpretation of experimental results produced by numerous institutions around the world that are conducting research on this bacterium. However, as is likely true of any genome annotation, especially those for species like S. oneidensis MR-1 that were among the first to have their genomes sequenced, there are numerous errors in gene calling and inaccurate functional predictions. As a primary objective of sequencing centers is to provide researchers with rapid access to sequence data, the subsequent refinement of the annotation has generally been left to the research community.

Maintenance of the genome annotation requires continuous fine-tuning of the predicted positions of coding sequences in the genomes and of the functions ascribed to them. The process involves the manual evaluation of results from automated bioinformatics analyses (e.g., sequence comparison, the detection of conserved domains, and the prediction of protein localization), extensive mining of the literature for evidence of functions of homologs, the review of experimental results produced from high-throughput analyses (microarray and global proteomics analyses), and physiological and biochemical characterization, both of the organism sequenced and of mutants. While these activities are interdependent, our first major effort was focused on improving gene and pseudogene predictions. In this study, we report the discovery of three repeated elements that we propose to be miniature inverted-repeat transposable elements (MITEs) that can be mobilized by transposases encoded by insertion sequence (IS) elements within the MR-1 genome, the mapping of the positions of over 200 IS elements, and changes to the predicted gene and pseudogene counts.

MATERIALS AND METHODS

Pseudogene annotations.

DNA sequences of pseudogenes identified in the original (12) and subsequent (7) annotations of the MR-1 genome were compared to entries in both the GenBank protein database and a local database that comprises all available (public and currently restricted) Shewanella protein sequences by using BlastX (1) to identify the closest homologs. Resulting sequence alignments were manually reviewed to assess whether the genes were disrupted and, if so, to approximate the position within each gene where a mutation leading to the premature truncation of the deduced protein had likely occurred. With so many Shewanella genome sequences available for comparison, it was typically relatively easy to pinpoint the precise position of gene disruption. Genes predicted to have one or more in-frame stop codons or frameshifts were then evaluated for sequencing errors by manually reviewing raw sequence chromatograms that corresponded to the disrupted gene positions. This task was accomplished by BlastN analysis of a segment of DNA sequence that spanned the disrupted sequence against the NCBI Trace Archive available in the specialized BLAST area at NCBI (http://www.ncbi.nlm.nih.gov/BLAST/BLAST.cgi). The chromatograms associated with resulting hits were then manually reviewed to determine whether a base-calling error had occurred at or near the positions of gene disruption.
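As a rough illustration of what "in-frame stop codon" means in this screening step, the sketch below scans a coding sequence for stop codons that occur before its final codon. It is a minimal Python example under stated assumptions, not the procedure used in this study (which relied on BlastX alignments and manual review of chromatograms): the function name `premature_stops` and the toy sequence are hypothetical, and the standard stop codons TAA, TAG, and TGA are assumed.

```python
# Standard stop codons (assumption: the standard bacterial genetic code applies).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons that occur
    before the final complete codon of a coding sequence read in frame 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    last_codon_start = usable - 3         # the terminal codon is allowed to be a stop
    return [
        i
        for i in range(0, usable, 3)
        if cds[i:i + 3] in STOP_CODONS and i < last_codon_start
    ]


# Toy example (hypothetical sequence): ATG AAA TAG GGC TAA
toy_cds = "ATGAAATAGGGCTAA"
print(premature_stops(toy_cds))
```

A hit from a scan like this would only be a candidate; as described above, each candidate was checked against homologs and raw chromatograms before being accepted as a genuine disruption.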

Identification of mobile elements.

The termini of the IS elements were identified manually by employing several different strategies. First, each MR-1 transposase was assigned to a transposase family by BlastP analysis against protein sequences in the ISfinder database (http://www-is.biotoul.fr/) (25, 26). Based on information provided at the ISfinder site, the family identity could be used to predict the characteristics (e.g., the element size, the presence of direct repeats, and the expected sequences of insertion sites) of the associated IS element for each transposase. In rare instances in which an MR-1 transposase had high levels of sequence identity to an ISfinder entry, it was possible to precisely identify MR-1 IS termini by simply searching for terminal repeats that were located at positions equidistant from a transposase open reading frame (ORF) and had high levels of identity to IS element sequences deposited in ISfinder.

Artemis (24) was routinely used to record, view, and adjust assigned IS element positions within the MR-1 genome. Two types of data were typically used to approximate the positions of the IS termini. One involved identifying positions at which a flanking ORF was truncated or interrupted (as described above). The other involved using BlastN to analyze DNA sequences flanking the transposase gene for identity to sequences flanking identical transposase genes in MR-1 or similar genes in other Shewanella species. Where possible, elements encoding paralogous or orthologous transposases were aligned using T-coffee (21) and trimmed to produce a consensus IS (except in cases in which the element was disrupted). Where necessary, the IS termini were then further adjusted to include terminal inverted repeats, exclude flanking direct repeats, and conform to general characteristics expected for elements encoding transposases of the same family. A representative of each new type of IS element, except those that were degenerate, was deposited in the ISfinder database.
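The idea of terminal inverted repeats can be made concrete with a short sketch: the 5′ end of an element is compared with the reverse complement of its 3′ end, and the length of the matching stretch is reported. This is a simplified Python illustration only. The termini in this study were mapped manually, real inverted repeats are often imperfect, and the function names and toy element below are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_length(element: str) -> int:
    """Length of the perfect terminal inverted repeat of `element`.

    Counts how many leading bases match the reverse complement of the
    3' end, stopping at the first mismatch (no mismatches tolerated).
    """
    element = element.upper()
    rc = reverse_complement(element)
    half = len(element) // 2              # the two repeats cannot overlap
    n = 0
    for a, b in zip(element[:half], rc):
        if a != b:
            break
        n += 1
    return n


# Toy element (hypothetical): 7-bp inverted repeats flanking an 8-bp interior.
toy_element = "GGCTTAC" + "A" * 8 + "GTAAGCC"
print(tir_length(toy_element))
```

A practical version would need to tolerate mismatches and short gaps, which is one reason the boundaries were refined by alignment and manual inspection rather than by exact matching.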

The termini of the MITEs were defined by aligning repeated regions to identify the core conserved repeat and, where possible, by identifying positions of gene interruption or truncation. Once the terminal inverted repeats were identified, it was possible to identify additional shorter versions of these MITEs that either lacked a sequence between the inverted repeats or matched one end or the other of the cognate full-length MITE. Secondary structures of full-length MITEs were determined using Sfold, available at http://sfold.wadsworth.org/srna.pl (8).

Identification of candidate laterally transferred genes.

BlastP was utilized to identify best-hit matches in the nonredundant database. Matches to MR-1 were ignored, and the next best match was identified. Information regarding the phylogenetic origin of the top hit as well as sequence identity scores and protein sizes (for the query sequence and the hit) were extracted from the BlastP output. The DNA sequence of each MR-1 coding region was also analyzed using a custom Perl script to determine the percent G+C contents at the third position of the codons, and then the mean and standard deviation for all genes were determined. Genes adjacent to ISSod25 and the integron integrase gene were manually analyzed in Artemis for the presence of putative attC sites bounded by conserved 5′-RYYYAAC and 3′-GTTRRRY motifs. In addition, sequences of attC sites available in the literature were compared, via BlastN analysis, to the MR-1 genome to assist in the identification of additional attC sites.
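The third-position G+C calculation can be sketched as follows. The original analysis used a custom Perl script that is not reproduced here; this Python version is only an illustration of the same kind of calculation, and the function names and toy gene sequences are hypothetical. The sample standard deviation is assumed, since the text does not say which form was used.

```python
from statistics import mean, stdev


def gc3_percent(cds: str) -> float:
    """Percent G+C at the third position of each complete codon in `cds`."""
    third = cds.upper()[2::3]             # bases at codon position 3
    if not third:
        raise ValueError("sequence contains no complete codon")
    return 100.0 * sum(base in "GC" for base in third) / len(third)


def gc3_summary(genes: dict[str, str]) -> tuple[float, float]:
    """Mean and sample standard deviation of GC3 across a set of genes."""
    values = [gc3_percent(seq) for seq in genes.values()]
    return mean(values), stdev(values)


# Toy data (hypothetical sequences, two codons each).
toy_genes = {
    "geneA": "ATGGCC",   # third positions G, C
    "geneB": "ATAGCT",   # third positions A, T
    "geneC": "ATGGCA",   # third positions G, A
}
print(gc3_summary(toy_genes))
```

Genes whose GC3 value lies far from the genome-wide mean are candidates for lateral acquisition, which is how the statistic is used later in the Results.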

Proteome analysis.

Global analyses of proteins using the accurate mass and time (AMT) tag technique have been described in detail previously (16). The current database for S. oneidensis MR-1 was created based on the use of a modified FASTA file containing sequences for 4,198 proteins, including 146 deduced from “repaired” pseudogenes. Two hundred thirty proteins belong to 24 paralogous families containing identical or near-identical sequences. Hence, the count of 4,198 includes one representative of each of these paralogous families, making it easier to identify peptides that uniquely identify either a single protein or a group of nearly identical proteins. Also not included in this file are translations for 54 pseudogenes that are either represented by an intact paralog or are highly degenerate and five newly identified or otherwise missing genes (SO_0461, SO_4814, SO_4816, SO_4817, and SO_A0186). The AMT tag database included 1,545 data sets containing tandem mass spectrometry (MS-MS) data from a combination of linear trap quadrupole (LTQ), LTQ-Fourier transform, and LTQ Orbitrap instruments. These S. oneidensis MR-1 data sets were generated from samples collected from 33 different cultures prepared under different growth conditions. Included in this database were 88,040 peptide identifications associated with 3,579 proteins after filtering by previously defined methods (27). The SEQUEST score filters used included a minimum required discriminant score of 0.85 and a minimum required peptide length of 6 amino acids. For analyses discussed herein, only peptides which were observed at least three times were considered. At this observation count, a total of 3,118 proteins were represented by at least one unique peptide and 2,424 were represented by at least three unique peptides. 
An additional 128 proteins were also detected with at least one peptide (95 with three peptides) but are members of paralogous protein families with identical or near-identical sequences, making it impossible to distinguish from which genes they were expressed.
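The filtering thresholds described above (discriminant score of at least 0.85, peptide length of at least 6 amino acids, and at least three observations) can be expressed as a simple filter. The sketch below is a hypothetical Python illustration of that logic, not the AMT tag pipeline itself; the `PeptideID` record, its field names, and the toy peptides are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideID:
    """Hypothetical record for one peptide identification."""
    sequence: str
    protein: str
    discriminant_score: float
    observations: int


def passes_filters(p: PeptideID, min_score: float = 0.85,
                   min_length: int = 6, min_observations: int = 3) -> bool:
    """Apply the score, length, and observation-count thresholds from the text."""
    return (p.discriminant_score >= min_score
            and len(p.sequence) >= min_length
            and p.observations >= min_observations)


def proteins_with_support(peptides: list[PeptideID],
                          min_unique_peptides: int = 1) -> set[str]:
    """Proteins represented by at least `min_unique_peptides` passing peptides."""
    support: dict[str, set[str]] = {}
    for p in peptides:
        if passes_filters(p):
            support.setdefault(p.protein, set()).add(p.sequence)
    return {prot for prot, seqs in support.items()
            if len(seqs) >= min_unique_peptides}


# Toy data (hypothetical peptides and protein names).
toy_peptides = [
    PeptideID("AAAGGGK", "protein_A", 0.95, 5),   # passes
    PeptideID("LLLVVVR", "protein_A", 0.90, 3),   # passes
    PeptideID("MKTAY", "protein_B", 0.99, 10),    # too short
    PeptideID("GGGSSSK", "protein_C", 0.80, 7),   # score too low
    PeptideID("TTTPPPR", "protein_D", 0.91, 2),   # observed too few times
    PeptideID("QQQNNNK", "protein_E", 0.88, 4),   # passes
]
print(sorted(proteins_with_support(toy_peptides)))
```

The sketch treats every peptide as unique to one protein; handling peptides shared among paralogous families, as discussed above, would require mapping each peptide to all proteins that contain it.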

Accession numbers.

The updated gene annotations and genome sequence for the chromosome have been submitted to GenBank via the J. Craig Venter Institute under the original accession number (AE014299) assigned to this organism. The plasmid annotation updates were submitted by The Institute for Genomic Research under the original accession number (AE014300) over 1 year ago, but the annotation has changed since that time; therefore, it is suggested that the reader use data provided in the supplemental material to obtain current genome locations and annotations for genes described herein.

RESULTS

Identification of erroneous pseudogene annotations.

From the perspective of genome annotation, pseudogenes are simply genes that, compared to other genes with similar sequences, appear to comprise only fragments of the genes (with 5′- or 3′-terminal gene truncations), carry point mutations that would result in the premature termination of the product, or are interrupted by mobile elements (e.g., IS elements). Pseudogene prediction is a challenging process but is greatly enhanced by the availability of closely related genome sequences (14). Currently, genome sequences from 20 Shewanella strains (see Table S1 in the supplemental material) are available, making the assessment of pseudogene occurrence in MR-1 considerably more robust than that in many other organisms. We proceeded by first reanalyzing the 145 pseudogenes previously predicted in the original (12) and subsequent (7) annotations of MR-1. Comparisons of the protein sequences deduced from these genes suggest that 26 are not pseudogenes because orthologs of similar sizes have been found in other bacteria (usually Shewanella species), the gene could be split into two separate full-length genes, or signals for programmed frameshifting were detected (see Table S2 in the supplemental material). Further evidence that these genes encode functional proteins is demonstrated by the finding that the translated products of the majority of these genes have been detected by proteome analyses. The sequences of the remaining 119 pseudogenes were further reviewed by manually checking the raw sequence chromatograms for potential base-calling errors that may have occurred during high-throughput automated processing of raw sequence data. In addition, proteome data were mined for identified peptides corresponding to sequences that spanned or flanked the predicted mutations (11). 
This analysis led to the identification of sequencing mistakes for 13 predicted pseudogenes for which the elimination of these mistakes repaired the disrupted gene reading frames (see Table S2 in the supplemental material). In agreement with the RefSeq annotation, we dropped the SO_2003 pseudogene from the genome annotation based on the lack of good evidence that it encodes a protein. An additional 21 pseudogenes were also dropped because their coding sequences were joined to those of an upstream gene (e.g., genes interrupted by IS elements). Overall, this analysis reduced the original count of pseudogenes to 83 and increased the count of intact genes by 42.

Mapping the termini of IS elements and interrupted genes.

S. oneidensis MR-1 encodes a large number of transposases, suggesting that IS element interruption would likely account for many pseudogene predictions. While only 59 transposases were noted in the original MR-1 genome publication (12), a reassessment of gene functions suggests that a total of 219 genes encode transposases, 54 of which are themselves pseudogenes and many of which are identical or nearly identical in sequence. Four of these genes (SO_0643/SO_0644 and SO_2654/SO_2655) are associated with the transposition of the two Mu prophages in MR-1 and, thus, are not considered further here as potential sources of gene disruption. The mapping of the termini of the IS elements was facilitated by the comparison of sequences that flank paralogous transposase genes in MR-1 or their orthologs in other sequenced Shewanella genomes and by the identification of the breakpoints in interrupted genes. By using this strategy, it was possible to predict the termini for all but four of the IS elements (see Table S3 in the supplemental material). Comparisons to transposase gene sequences deposited in the ISfinder database revealed that the diversity of IS elements carried by the MR-1 genome is quite broad: the more than 40 types of IS elements found most closely match 15 of the 19 IS families described in ISfinder.

All but six of these types of IS elements carry a single ORF predicted to encode the IS-mobilizing transposase. IS elements ISSod1, ISSod2, ISSod10, and ISSod15 each encode a transposase that is predicted to be activated by programmed translational frameshifting of the two ORFs found in the element, a frequently used strategy that limits the expression of transposase activity (2). ISSod9 is a class II transposon belonging to the Tn3 family and comprises four ORFs; there are two copies of ISSod9 on the MR-1 megaplasmid. Orthologs of the ISSod9 transposase occur in Shewanella sp. strain ANA-3, S. baltica OS155, and S. frigidimarina. Mapping of the conserved IS termini in each genome revealed that the five copies in strain OS155 encode only a transposase and a resolvase and therefore would be classified as an IS element, not a transposon. In the remaining strains, including MR-1, the IS element also encodes passenger proteins whose function is not related to IS element mobilization (see Fig. S1 in the supplemental material) and, hence, would be classified as a transposon. While the S. frigidimarina and Shewanella sp. strain ANA-3 transposons encode cation efflux pumps predicted to mediate resistance to Cd2+, Co2+, Zn2+, or Pb2+, the S. oneidensis MR-1 transposon encodes two functionally uncharacterized cytoplasmic proteins, one of which possesses a nucleotidyltransferase domain commonly found in kanamycin nucleotidyltransferases, suggesting that the acquired function may be related to antibiotic resistance.

The remaining IS element that comprises multiple ORFs is ISSod25, an IS91 family member which is found in five copies (one truncated) on the chromosome. This IS element encodes both a transposase and a phage integrase family protein and is flanked on the 5′ side by GAAC and on the 3′ side by CAAG (with two exceptions), as expected for members of this family. The element carrying SO_2035-SO_2036 (ISSod25_2) is of particular interest because it has previously been proposed to be a component of the MR-1 superintegron (9). A putative recombination site (attI) and integron integrase gene (SO_2037), whose activity was experimentally validated (9), are found immediately upstream of ISSod25_2, as well as immediately upstream of the three other full-length ISSod25 elements. Drouin et al. (9) identified three attC sites downstream of the SO_2037 integron integrase gene, one that was described as characteristic of S. oneidensis MR-1 [called attC (Son type) in reference 9], one that is similar to the VCR repeat in Vibrio cholerae [called attC (VCR-like) in reference 9] (6), and a third that resembles the 59-bp attC site associated with the aadA and aadB genes, which confer aminoglycoside resistance (22). This superintegron locus was identified by the analysis of sequences available prior to the final assembly of the genome sequence of S. oneidensis MR-1. However, in the final assembly, it is apparent that this region is no longer contiguous but is instead now split into two separate sites on the chromosome, each containing a copy of ISSod25 (Fig. 1).

FIG. 1.
(A) Map of ISSod25-associated integron adapted from Drouin et al. (9); (B) map of corresponding loci in the final genome assembly; and (C) map of loci adjacent to other ISSod25 elements. IS elements encoding transposases and integrases are depicted as ...

Analyses of sequences downstream of each of the ISSod25 elements revealed the presence of a putative attC site downstream of all five ISSod25 elements. All of the ISSod25 elements except ISSod25_4 were also adjacent to at least one additional gene with a putative 3′ attC site, suggesting that these genes may have been acquired via integrase activity. Furthermore, BlastP analysis of the proteins encoded by the genes adjacent to each attC site revealed that, with the exception of the top hit for the SO_2034-encoded protein, the top hits were not Shewanella proteins. In light of the large number of Shewanella genomes that have been sequenced, it is unusual for any MR-1 proteins (with the exception of transposases) to be more similar in sequence to proteins from another genus than to Shewanella proteins. Indeed, the majority of the top hits for all MR-1 proteins are other Shewanella proteins with greater than 90% identity (see Fig. S2a in the supplemental material), which suggests that proteins whose top hits are of different phylogenetic origins are likely encoded by genes that have been acquired by lateral transfer or are rapidly evolving. The SO_2034-encoded protein has only two homologs, the top hit (the S. baltica OS195 Sbal195_4260 protein) having a lower level of identity (81%) than most Shewanella top hits and the second-best hit (the Vibrio shilonii AK1 VSAK1_25380 protein) having a higher level of identity (59%) than most non-Shewanella protein hits. An analysis of GC usage at codon position three also supports the hypothesis that genes downstream of ISSod25 have been acquired by lateral transfer (see Fig. S2b in the supplemental material).

The algA (SO_2213) gene, which is found downstream of ISSod25_4, is not predicted to have been acquired by integrase activity and is only 28 bp downstream of the ISSod25_4-associated attC site, suggesting that the native promoter for this gene was lost as a consequence of ISSod25_4 insertion at this site. Because algA is conserved, it was possible to investigate whether there was evidence to support this hypothesis by comparing the upstream sequences of the orthologous algA genes in other Shewanella strains to sequences in MR-1. Interestingly, we found the algA promoter locus upstream of ISSod25_3, rather than separated from algA by ISSod25_4 (see Fig. S3 in the supplemental material). This observation, combined with the results of neighborhood analyses around the algA locus of Shewanella, suggests that a recombination event between sequences upstream of ISSod25_3 and downstream of ISSod25_4 occurred, resulting in the displacement of the MR-1 algA promoter from the expected site downstream of ISSod25_4 to the position upstream of ISSod25_3. Eight different peptides of AlgA have been detected, suggesting that ISSod25_4 carries a promoter capable of controlling the expression of this gene in MR-1.

Several additional small 3′ fragments of ISSod25 occur in the genome at positions upstream of SO_0911, SO_4816, SO_1081, SO_1888, SO_3617, SO_3775, SO_4341, and SO_4704 and downstream of SO_3453. Again, genes found downstream of these ISSod25 fragments either are more similar to genes of species outside the Shewanella genus than to Shewanella genes or have low levels of similarity to Shewanella genes, suggesting that they may have been acquired by a mechanism similar to that of the acquisition of the genes found downstream of the full-length ISSod25 elements (see Table S4 in the supplemental material).

Other IS-like elements in the genome.

Three features characteristic of IS elements include the frequent occurrence of exact or near-exact copies throughout the genome, the presence of short terminal inverted repeats, and the interruption or truncation of genes. Identical conserved hypothetical proteins with a DUF1568 domain are encoded by 18 genes on the chromosome, suggesting that perhaps these proteins too are transposases or are associated with an IS element. Analyses of the regions that flank the genes encoding these proteins revealed that the proteins correspond to a conserved sequence that has terminal inverted repeats and, in seven instances, occurs adjacent to truncated genes. We therefore propose that this conserved sequence is a mobilizable element (ISSod41) and that either the associated DUF1568-containing proteins are involved in its mobilization or other transposases encoded by MR-1 are responsible for its mobilization. An unusual feature of this element is that it frequently occurs in pairs in the genome, sometimes even colocalized with an additional ISSod41 fragment. In addition, copies of a 59-bp sequence are found near the 5′ end of ISSod41 and between the pair of attC sites that resemble the V. cholerae VCR repeat and the 59-bp site adjacent to the aadA and aadB genes, respectively (Fig. 1B). A perfect match to the conserved GTTRRRY integrase insert site is found near the 3′ end of this conserved 59-bp sequence (5′-GACACCCATCCTTAATAGTGCGGTAGTTAACCTCCTACTATGCTTTGGTTAAGCATTGA; the matching sequence is GTTAAGC). While this observation may be coincidental, it does raise the possibility that this site is an attI site that can serve as a site for the capture of foreign DNA. Indeed, many of the ISSod41 elements are adjacent to genes that are not conserved in Shewanella and have values for GC usage at codon position 3 that differ from the MR-1 mean value by at least 10% (see Table S4 in the supplemental material). 
These observations suggest that the potential roles of ISSod41- and ISSod25-encoded proteins in the acquisition of foreign genes into genomes warrant further study.
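The match between the GTTRRRY motif and the 59-bp sequence quoted above can be checked mechanically by expanding the IUPAC ambiguity codes (R = A or G, Y = C or T) into a regular expression. The following Python sketch does this for the forward strand only; the helper names are hypothetical, the sequence is written without internal whitespace, and the attC searches in this study were performed manually in Artemis and by BlastN rather than with code like this.

```python
import re

# Subset of IUPAC nucleotide codes needed for the motifs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as GTTRRRY into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence of `motif`,
    including overlapping occurrences, on the given strand."""
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# The conserved 59-bp sequence quoted in the text.
SEQ_59BP = "GACACCCATCCTTAATAGTGCGGTAGTTAACCTCCTACTATGCTTTGGTTAAGCATTGA"

hits = find_motif(SEQ_59BP, "GTTRRRY")
print(len(SEQ_59BP), hits)
```

The search returns a single hit, GTTAAGC, beginning at 0-based offset 47 (position 48 of 59), which is consistent with the statement that the match lies near the 3′ end. The same helper could be applied to the 5′-RYYYAAC motif described in Materials and Methods.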

The analysis of the MR-1 genome for additional repeated DNA sequences revealed the presence of three elements, called MITEs, that have characteristics of the class II transposons (13); specifically, these MITEs are short (see Table S5 in the supplemental material), have no coding potential, and have the potential to form a stable RNA secondary structure (Fig. 2; see Fig. S2 in the supplemental material). A comparison of the terminal inverted repeats for these proposed MITEs with those of the MR-1 IS elements (Fig. 3) revealed that SonMITE_1, SonMITE_2, and SonMITE_3 termini are homologous to the termini of ISSod6, ISSod10, and ISSod22, respectively, suggesting that the respective transposases encoded by these IS elements may be able to mobilize the MITEs with similar terminal repeats. While no obvious gene disruptions resulting from SonMITE_2 insertion were found, representatives of both SonMITE_1 and SonMITE_3 interrupt genes. SonMITE_1 interrupts five genes (SO_0790, SO_0793, SO_1591, SO_2158, and SO_4423) once and one gene (SO_3976) three times, and SonMITE_3 interrupts one gene (SO_0911), further supporting the hypothesis that these elements can be mobilized by other transposases in MR-1. Several additional copies of SonMITE_1 overlap the 3′ ends of genes, by as much as 92 bp in the case of SO_2196 (see Table S5 in the supplemental material). However, comparative analysis with other Shewanella orthologs suggested that this overlap would result in no significant loss of encoded protein, and hence, we chose not to annotate these genes as being truncated by SonMITE_1.

FIG. 2.
Ensemble centroid structure for a full-length SonMITE_1 element. Structures of SonMITE_2 and SonMITE_3 are provided in Fig. S4a and b, respectively, in the supplemental material. ΔG°37, Gibbs free energy calculated at a folding temperature ...
FIG. 3.
The alignment of SonMITE and IS element termini demonstrates high levels of sequence identity. Asterisks indicate identical residues.

Overview of degenerate genes.

Mapping the S. oneidensis MR-1 IS elements and MITEs greatly enhanced our ability to identify additional truncated and interrupted genes. In all, a total of 207 degenerate genes (see Table S6 in the supplemental material) were identified and categorized as having frameshift (18 genes), in-frame stop codon (18 genes), truncation (59 genes), interruption (63 genes), or degenerate (48 genes) mutations. By definition, a pseudogene is an inactive gene derived from an ancestral active gene. Our sequence-based comparison provides merely a prediction of whether the gene is different from the ancestral gene and does not necessarily report on whether the gene or gene fragment is able to produce an active protein. Since the inability to express a protein is sometimes used as a criterion to define a pseudogene, we surveyed our S. oneidensis MR-1 AMT tag MS-MS database (17, 23) for evidence that any of these 207 genes have been translated into proteins. With the current total number of predicted protein-encoding genes at 4,400, the AMT tag database currently includes peptides (observed in at least three scans) that confirm the detection of 74% of the corresponding proteins by at least one peptide and 57% by at least three peptides. Thus, the database has sufficient coverage to evaluate the expression of pseudogenes.
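The coverage percentages quoted above can be reproduced from the protein counts given in Materials and Methods, under one assumption that the text does not state explicitly: proteins detected only through peptides shared within paralogous families (128 with at least one peptide, 95 with at least three) are counted together with proteins detected by unique peptides (3,118 and 2,424, respectively). The short Python calculation below is a worked check of that reading, not part of the original analysis, and the variable names are hypothetical.

```python
TOTAL_PREDICTED_GENES = 4400        # predicted protein-encoding genes (from the text)

UNIQUE_AT_LEAST_1 = 3118            # proteins with >= 1 unique peptide
UNIQUE_AT_LEAST_3 = 2424            # proteins with >= 3 unique peptides
SHARED_AT_LEAST_1 = 128             # paralog-ambiguous proteins, >= 1 peptide
SHARED_AT_LEAST_3 = 95              # paralog-ambiguous proteins, >= 3 peptides


def coverage_percent(detected: int, total: int) -> float:
    """Percentage of predicted proteins with peptide evidence."""
    return 100.0 * detected / total


# Assumption: unique and paralog-ambiguous detections are summed.
one_peptide = coverage_percent(UNIQUE_AT_LEAST_1 + SHARED_AT_LEAST_1,
                               TOTAL_PREDICTED_GENES)
three_peptides = coverage_percent(UNIQUE_AT_LEAST_3 + SHARED_AT_LEAST_3,
                                  TOTAL_PREDICTED_GENES)
print(round(one_peptide), round(three_peptides))
```

Under this reading the values round to 74% and 57%, matching the figures in the text; counting only proteins with unique peptides would give roughly 71% and 55% instead.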

Because some of the genes encode proteins that are identical or nearly identical to proteins produced by other genes in the cell (and hence have no unique peptides), 53 of the predicted pseudogenes could not be evaluated by this analysis. Of the remaining pseudogenes, 40 were matched to only one peptide, which is generally not considered sufficiently robust to validate protein expression (4). However, many of these single-hit peptides were observed in multiple MS-MS scans, suggesting that their parent proteins may in fact be expressed. It is also interesting that several of the peptides matched positions in the proteins that corresponded to sequences after the predicted mutation sites, suggesting that under at least some culture conditions, the cells were producing full-length proteins. If true, this observation would indicate either that the IS element had been excised from the site or that a subpopulation of cells lacking the interruption existed within the culture.

In addition to interrupting genes, IS element insertions can lead to the separation of promoters from nearby genes, thereby inactivating them. A total of 67 genes were identified as potentially being impacted by a nearby IS or MITE insertion event (see Table S8 in the supplemental material) at close proximity (~50 bp or less) to the 5′ gene end. High peptide counts were observed for proteins encoded by genes close to SonMITE_1, ISSod25, and ISSod10_3 elements, suggesting that these elements comprise promoters that can drive the expression of neighboring genes. The large chemotaxis operon (SO_2317-SO_2327), one of three found in MR-1 (15), is interrupted by ISSod4_17 (which, in turn, is interrupted by ISSod1_22) and ISSod4_18, which would suggest that this operon is nonfunctional. However, peptides from both the interrupted cheA_2 gene (SO_2320) and three of five of the downstream genes were detected, with peptides from CheA_2 corresponding to positions before and after the site of IS insertion (albeit only one peptide for each side, with each peptide observed only six to seven times). While these observations are not significant enough to validate the expression of this chemotaxis locus, they do suggest that this locus should not be regarded as being degenerate without further study.

As a result of this annotation refinement and additional assessment of coding potential, 685 genes were dropped from the annotation (see Table S7 in the supplemental material). Most of the genes that were dropped were small, with 399 predicted to encode polypeptides of less than 50 amino acids in length. Additional reasons to drop genes included the joining together of disrupted gene fragments (172 genes), an overlap with mobile elements (72 genes), an overlap with genes or other elements on the same or the opposite strand (111 genes), and the assessment that the genes were too close to or even overlapping a bidirectionally transcribed gene (81 genes). The majority of the remaining 249 genes were small (200 genes) and/or started with the rare TTG codon (100 genes) and, hence, were considered unlikely to encode proteins. It should be noted that 474 of these dropped genes are included in only one version (RefSeq or GenBank) of the MR-1 annotation, demonstrating the extent of difference in gene predictions that arises simply by employing different ORF-calling algorithms and cutoff criteria.

DISCUSSION

Our analysis revealed that the occurrence of pseudogenes and mobile elements in the S. oneidensis MR-1 genome is much more widespread than previously recognized. A total of 209 putative pseudogenes, 41 types of IS elements, and 3 different MITEs in the MR-1 genome were identified. Multiple copies of several of these mobile elements occur, suggesting that rearrangement (the duplication, inversion, or translocation of intervening DNA) of the MR-1 genome via homologous recombination may have been a frequent occurrence. Full-length copies of ISSod1, for example, occur 41 times on the chromosome and 6 times on the plasmid, while 71 full-length copies of SonMITE_1 occur on the chromosome. The presence of numerous truncated mobile elements and genes, as well as gene fusions (SO_3875 and SO_A0152), provides evidence that recombination events in the MR-1 genome have occurred previously. Even more compelling is our observation that the algA gene and promoter are adjacent to two different ISSod25 elements and are separated by over 47,000 bp. Whether this large number of potential recombination sites is advantageous to MR-1, providing a means to readily evolve new functions or alter gene expression, or is instead an indicator that the MR-1 genome is decaying is unknown. A cursory analysis of the other sequenced Shewanella genomes indicates that numerous IS elements are also present in other Shewanella spp., especially in the S. baltica strains. Strain OS155, for example, has at least 156 full-length or truncated IS elements. Most of the 22 different OS155 IS elements are repeated many times in the genome, with the highest copy number for a single IS element at 38. Hence, the high propensity for genome recombination is not unique to S. oneidensis MR-1.

Our analysis also revealed that the integron previously discovered in the partially assembled genome (9) is split into two different sites in the final genome assembly, each carrying a copy of ISSod25. An additional 2 full-length copies and 10 fragments of ISSod25 were also found in the genome. Surprisingly, most of these IS elements were in the 5′ direction from one or more putative attC sites and adjacent to genes that had 3′ attC sites and whose sequences were more characteristic of other bacterial phyla than of Shewanella species. ISSod25-like elements are found in other Shewanella species, including all three S. putrefaciens strains and all S. baltica strains except OS155. Among these strains, homologs of the MR-1 integron integrase are present only in S. putrefaciens 200, S. baltica OS185, and S. baltica OS233. An analysis of the two ISSod25-like elements in S. putrefaciens W3-18-1, which lacks the integron integrase, revealed flanking attI/attC sites as well as downstream genes having 3′ attC sites. Numerous additional sites identical to the ISSod25 attC and VCR-associated attC repeats are also present in this genome, often in the 3′ direction from multiple adjacent genes. These observations lend additional credence to the hypothesis that the ISSod25 integrase can mediate the integration of foreign DNA into the MR-1 chromosome. They also suggest that several other, if not all, sequenced Shewanella spp. have a means to use integrase-mediated capture of foreign DNA.

Proteome data provided evidence that only 40 of the pseudogenes are translated into proteins (see Table S6 in the supplemental material). In most cases, however, both the number of unique peptides observed and the maximum number of times any one peptide was observed were low. This finding suggests that few, if any, of these genes have expressed significant levels of protein under the culture conditions used to generate the proteome sample. It is possible that the absence of peptides for these pseudogenes simply reflects experimental limitations, specifically, that conditions required for their expression have not been tested or that expression levels are below the level of detection. However, the currently available evidence is more indicative of these genes' no longer being functional. Over one-third of the pseudogenes encode a transposase or recombinase, and many of these sequences are fragments of full-length copies carried elsewhere in the genome. Just under 20% of the predicted functions of the remaining pseudogenes are associated with environmental sensing, the control of gene expression, or transport. In some instances, multiple genes within a single functional subsystem are mutated, providing a clear indication that entire systems are decaying. Examples include one of the three chemotaxis gene clusters present in the MR-1 genome, genes with functions associated with the degradation of starch, genes for C4-dicarboxylate sensing and uptake and nitrite respiration, and genes encoding components of the type I pilus (Table 1).

TABLE 1.
Selected degenerate functions in MR-1

During the refinement of the gene predictions for MR-1, the removal of overcalled genes, and the modification of the overall prediction of the pseudogene count, it became clear that identifying degenerate functions from the genome sequence required significant manual curation. Automated pseudogene predictions can be erroneous due to the presence of sequencing mistakes in the genome or misleading sequence comparisons, as was the case for 65 genes in MR-1. Mobile elements have been found in all the Shewanella genomes sequenced to date, often in high numbers with multiple duplications, as reported here for the MR-1 genome as well as elsewhere for other genomes (25). Because automated annotations do not account for the occurrence of these elements, the number of pseudogenes present in each genome is certainly underestimated. Furthermore, because the organization and content of the genome are clearly dynamic, we cannot be sure that these genes (or others in the genome) are not repaired through selective pressure mediated by laboratory culturing conditions or, conversely, further mutated or even that additional genes have not been inactivated. Indeed, the movement of an IS element in response to laboratory culture conditions has been documented previously, when the attempted Tn5 mutagenesis of a trimethylamine oxide respiratory function instead yielded three mutants in which the ISSod2 element was inserted in the locus encoding the trimethylamine oxide reductase (3). Furthermore, we have detected peptides from transposases encoded by ISSod1 (six peptides observed up to 58 times), ISSod4 (four peptides observed up to 41 times), and ISSod9 (four peptides observed up to 18 times). These findings suggest that the elements are being duplicated or mobilized under one or more conditions that have been used to grow and/or maintain MR-1 and that a consequence of this event (the activation or inactivation of gene function) may impact the behavior of cells in a way that cannot be understood if one considers the genome sequence to be static.

Having mapped the mobile elements in S. oneidensis MR-1, we are now better poised to investigate their roles in the evolution of the MR-1 genome and to do the same with the other sequenced Shewanella genomes. Future research efforts that capitalize on the availability of sequences from related genomes to study genome evolution hold considerable promise for developing new insights into the roles of mobile elements and DNA recombination in the adaptation of organisms to their environment.

Acknowledgments

We thank Jim Fredrickson for reviewing the manuscript, Miriam Land for providing and maintaining the database that we use to store the genome annotation, Bill Nelson and Anthony Durkin for formatting and depositing the updated plasmid and chromosome annotations, respectively, in GenBank, Natalia Maltsev and Dina Sulakhe for setting up and providing use of the Gnare annotation editor, and the U.S. Department of Energy (DOE) Joint Genome Institute for providing access to genome sequences for Shewanella genomes other than that of S. oneidensis MR-1.

Genome sequencing efforts were funded by the DOE Office of Biological and Environmental Research (OBER). This research was supported by the DOE OBER Genomics: Genomes to Life program. Proteomics analysis was performed at the W. R. Wiley Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored by OBER and located at Pacific Northwest National Laboratory.

Footnotes

▿ Published ahead of print on 31 March 2008.

Supplemental material for this article may be found at http://aem.asm.org/.

REFERENCES

1. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
2. Baranov, P. V., O. Fayet, R. W. Hendrix, and J. F. Atkins. 2006. Recoding in bacteriophages and bacterial IS elements. Trends Genet. 22:174-181.
3. Bordi, C., C. Iobbi-Nivol, V. Mejean, and J. C. Patte. 2003. Effects of ISSo2 insertions in structural and regulatory genes of the trimethylamine oxide reductase of Shewanella oneidensis. J. Bacteriol. 185:2042-2045.
4. Bradshaw, R. A., A. L. Burlingame, S. Carr, and R. Aebersold. 2006. Reporting protein identification data: the next generation of guidelines. Mol. Cell. Proteomics 5:787-788.
5. Brettar, I., R. Christen, and M. G. Hofle. 2002. Shewanella denitrificans sp. nov., a vigorously denitrifying bacterium isolated from the oxic-anoxic interface of the Gotland Deep in the central Baltic Sea. Int. J. Syst. Evol. Microbiol. 52:2211-2217.
6. Clark, C. A., L. Purins, P. Kaewrakon, and P. A. Manning. 1997. VCR repetitive sequence elements in the Vibrio cholerae chromosome constitute a mega-integron. Mol. Microbiol. 26:1137-1138.
7. Daraselia, N., D. Dernovoy, Y. Tian, M. Borodovsky, R. Tatusov, and T. Tatusova. 2003. Reannotation of Shewanella oneidensis genome. OMICS 7:171-175.
8. Ding, Y., C. Y. Chan, and C. E. Lawrence. 2004. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res. 32:W135-W141.
9. Drouin, F., J. Melancon, and P. H. Roy. 2002. The IntI-like tyrosine recombinase of Shewanella oneidensis is active as an integron integrase. J. Bacteriol. 184:1811-1815.
10. Gralnick, J. A., H. Vali, D. P. Lies, and D. K. Newman. 2006. Extracellular respiration of dimethyl sulfoxide by Shewanella oneidensis strain MR-1. Proc. Natl. Acad. Sci. USA 103:4669-4674.
11. Gupta, N., S. Tanner, N. Jaitly, J. N. Adkins, M. Lipton, R. Edwards, M. Romine, A. Osterman, V. Bafna, R. D. Smith, and P. A. Pevzner. 2007. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res. 17:1362-1377.
12. Heidelberg, J. F., I. T. Paulsen, K. E. Nelson, E. J. Gaidos, W. C. Nelson, T. D. Read, J. A. Eisen, R. Seshadri, N. Ward, B. Methe, R. A. Clayton, T. Meyer, A. Tsapin, J. Scott, M. Beanan, L. Brinkac, S. Daugherty, R. T. DeBoy, R. J. Dodson, A. S. Durkin, D. H. Haft, J. F. Kolonay, R. Madupu, J. D. Peterson, L. A. Umayam, O. White, A. M. Wolf, J. Vamathevan, J. Weidman, M. Impraim, K. Lee, K. Berry, C. Lee, J. Mueller, H. Khouri, J. Gill, T. R. Utterback, L. A. McDonald, T. V. Feldblyum, H. O. Smith, J. C. Venter, K. H. Nealson, and C. M. Fraser. 2002. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nat. Biotechnol. 20:1118-1123.
13. Kidwell, M. G. 2002. Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49-63.
14. Lerat, E., and H. Ochman. 2005. Recognizing the pseudogenes in bacterial genomes. Nucleic Acids Res. 33:3125-3132.
15. Li, J., M. F. Romine, and M. J. Ward. 2007. Identification and analysis of a highly conserved chemotaxis gene cluster in Shewanella species. FEMS Microbiol. Lett. 273:180-186.
16. Lipton, M. S., L. Pasa-Tolic, G. A. Anderson, D. J. Anderson, D. L. Auberry, J. R. Battista, M. J. Daly, J. Fredrickson, K. K. Hixson, H. Kostandarithes, C. Masselon, L. M. Markillie, R. J. Moore, M. F. Romine, Y. Shen, E. Stritmatter, N. Tolic, H. R. Udseth, A. Venkateswaran, K. K. Wong, R. Zhao, and R. D. Smith. 2002. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc. Natl. Acad. Sci. USA 99:11049-11054.
17. Lipton, M. S., M. F. Romine, M. E. Monroe, D. A. Elias, L. Pasa-Tolic, G. A. Anderson, D. J. Anderson, J. Fredrickson, K. K. Hixson, C. Masselon, H. Mottaz, N. Tolic, and R. D. Smith. 2006. AMT tag approach to proteomic characterization of Deinococcus radiodurans and Shewanella oneidensis. Methods Biochem. Anal. 49:113-134.
18. Moser, D. P., and K. H. Nealson. 1996. Growth of the facultative anaerobe Shewanella putrefaciens by elemental sulfur reduction. Appl. Environ. Microbiol. 62:2100-2105.
19. Myers, C. R., and K. H. Nealson. 1988. Bacterial manganese reduction and growth with manganese oxide as the sole electron acceptor. Science 240:1319-1321.
20. Nealson, K. H., and J. Scott. 2006. Ecophysiology of the genus Shewanella, p. 1133-1151. In M. Dworkin (ed.), The prokaryotes. Springer-Verlag, New York, NY.
21. Notredame, C., D. Higgins, and J. Heringa. 2000. T-Coffee: a novel method for multiple sequence alignments. J. Mol. Biol. 302:205-217.
22. Recchia, G. D., and R. M. Hall. 1997. Origins of the mobile gene cassettes found in integrons. Trends Microbiol. 5:389-394.
23. Romine, M. F., D. A. Elias, M. E. Monroe, K. Auberry, R. Fang, J. K. Fredrickson, G. A. Anderson, R. D. Smith, and M. S. Lipton. 2004. Validation of Shewanella oneidensis MR-1 small proteins by AMT tag-based proteome analysis. OMICS 8:239-254.
24. Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream, and B. Barrell. 2000. Artemis: sequence visualization and annotation. Bioinformatics 16:944-945.
25. Siguier, P., J. Filee, and M. Chandler. 2006. Insertion sequences in prokaryotic genomes. Curr. Opin. Microbiol. 9:526-531.
26. Siguier, P., J. Perochon, L. Lestrade, J. Mahillon, and M. Chandler. 2006. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 34:D32-D36.
27. Strittmatter, E. F., L. J. Kangas, K. Petritis, H. M. Mottaz, G. A. Anderson, Y. Shen, J. M. Jacobs, D. G. Camp, and R. D. Smith. 2004. Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry. J. Proteome Res. 3:760-769.

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)