Trends in Selenium Utilization in Marine Microbial World Revealed through the Analysis of the Global Ocean Sampling (GOS) Project

Department of Biochemistry, University of Nebraska, Lincoln, Nebraska, United States of America

Paul M. Richardson, Editor. Progentech, United States of America

* E-mail: vgladyshev1/at/unl.edu

Conceived and designed the experiments: YZ VG. Performed the experiments: YZ. Analyzed the data: YZ VG. Wrote the paper: YZ VG.

Received February 22, 2008; Accepted May 12, 2008.

Copyright Zhang, Gladyshev. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Selenium is an important trace element that occurs in proteins in the form of selenocysteine (Sec) and in tRNAs in the form of selenouridine. Recent large-scale metagenomics projects provide an opportunity to understand global trends in trace element utilization. Herein, we characterized the selenoproteome of the microbial marine community derived from the Global Ocean Sampling (GOS) expedition. More than 3,600 selenoprotein gene sequences belonging to 58 protein families were detected, including sequences representing 7 newly identified selenoprotein families, such as homologs of ferredoxin–thioredoxin reductase and serine protease. In addition, a new eukaryotic selenoprotein family, thiol reductase GILT, was identified. Most GOS selenoprotein families originated from Cys-containing thiol oxidoreductases. In both Pacific and Atlantic microbial communities, SelW-like and SelD were the most widespread selenoproteins.
Geographic location had little influence on Sec utilization, as measured by selenoprotein variety and the number of selenoprotein genes detected; however, both higher temperature and a marine (as opposed to freshwater or other aquatic) environment were associated with increased use of this amino acid. We also detected selenoproteins that preferentially occurred in one or the other environment. We identified novel fusion forms of several selenoproteins that highlight the redox activities of these proteins. Almost half of the Cys-containing SelDs were fused with NADH dehydrogenase, whereas such SelD forms were rare in terrestrial organisms. We also analyzed the selenouridine utilization trait and found that it appears to have evolved independently of Sec utilization. Overall, our study provides insights into global trends in microbial selenium utilization in marine environments.

Author Summary

Selenium (Se) is an essential micronutrient because it is required for the biosynthesis and function of the 21st amino acid, selenocysteine (Sec). Sec is found in the active sites of selenoproteins, most of which exhibit redox function, in all three domains of life. In recent years, genome sequencing projects have provided a large volume of nucleotide and protein sequence information. Identification of the complete sets of selenoproteins (selenoproteomes) of individual organisms and environmental samples is important for a better understanding of Se utilization, the biological functions of this element, and changes in Se use during evolution. Here, we describe a comprehensive analysis of the selenoproteome of the microbial marine community derived from the Global Ocean Sampling (GOS) expedition. More than 3,600 selenoprotein gene sequences belonging to 58 protein families were detected and analyzed. Our study generated the largest selenoproteome reported to date and provided important insights into microbial Se utilization and its evolutionary trends in marine environments.
Introduction

Selenium (Se) is an essential trace element that exerts a number of health benefits yet is required only in small amounts [1]–[3]. It is incorporated into selenoproteins, many of which are important antioxidant enzymes, in all three domains of life, and occurs in these proteins in the form of selenocysteine (Sec), the twenty-first amino acid in the genetic code [4]–[6]. Sec insertion is specified by a UGA codon, which is normally read as a stop signal. Decoding UGA as Sec requires a translational recoding process that reprograms in-frame UGA codons to serve as Sec codons [5]–[8]. The mechanisms of selenoprotein biosynthesis have been the subject of numerous studies [5], [7]–[12]. Translation of selenoprotein mRNAs requires a cis-acting selenocysteine insertion sequence (SECIS) element and several trans-acting factors dedicated to Sec incorporation [7],[17]. The SECIS element is a hairpin structure that resides in the 3′-untranslated regions (3′-UTRs) of selenoprotein mRNAs in eukaryotes and archaea, and immediately downstream of the Sec-encoding UGA codon in bacteria [7], [13]–[16]. In recent years, the growing number of genome sequencing projects, combined with the rapidly emerging field of microbial metagenomics, has provided a vast amount of gene and protein sequence data. However, selenoprotein genes are almost universally misannotated in these datasets because of the dual function of the UGA codon: annotation tools read the in-frame UGA as a stop signal and truncate the predicted protein. To address this problem, a variety of bioinformatics approaches have been developed and used for selenoprotein searches in both prokaryotes and eukaryotes [18]–[24]. With these programs, researchers have successfully identified the complete sets of selenoproteins (selenoproteomes) of individual organisms and environmental samples [20]–[26]. In early 2007, three papers from the J.
Craig Venter Institute reported the results of the first phase of a large-scale metagenomic sequencing project, the Global Ocean Sampling (GOS) expedition, a comprehensive survey of bacterial, archaeal and viral diversity in the world's oceans [27]–[29]. The general objective of this project was to expand our understanding of the microbial world by studying the gene complement of marine microbial communities. A metagenomic approach was used to sequence DNA isolated from selected sites of the aquatic microbial world. The earlier Sargasso Sea project [30], which reported environmental shotgun sequencing of marine microbes in the nutrient-limited Sargasso Sea, was considered a pilot study for the subsequent GOS project. The GOS dataset encompasses 44 sequenced samples from diverse aquatic (largely marine) locations, which together contain ~7.7 million sequencing reads and more than 8 billion nucleotides [29]. These data not only provide opportunities to identify and characterize genes, species and communities, but also have potentially far-reaching implications for biological energy production, bioremediation, and the reduction and management of greenhouse gas levels. Within this framework, identifying and characterizing selenoproteins in such a large metagenomic dataset can shed light on the roles of Se in marine microbial communities. Previously, we examined the microbial selenoproteome of the Sargasso Sea by searching for Sec/cysteine (Cys) pairs in homologous sequences [25], that is, positions where a Sec-encoding TGA codon in one sequence aligns with a Cys in its homologs. This method performed well, and further research has shown that it reliably identifies selenoproteins in both organism-specific and environmental genomes [24],[26],[31]. In this study, we used a similar approach to analyze the distribution and composition of marine selenoproteins in the GOS shotgun dataset.
More than 3,600 selenoprotein genes were detected, ten times as many as in the Sargasso Sea study. Several novel prokaryotic selenoprotein families were predicted. We further analyzed the dataset in various ways to derive insights into global trends in Se utilization.

Results

General Features of the GOS Selenoproteome

Computational analysis of the 44 sequenced GOS samples identified 3,506 selenoprotein sequences belonging to previously described selenoprotein families (Table 1; all sequences are available in supplemental Dataset S1). We also identified 58,225 Cys-containing homologs of these selenoproteins in the GOS sequences. Canonical correlation analysis of their occurrence against sample size (i.e., the total number of sequenced reads in each sample) showed a strong correlation between the number of Cys-containing homologs and sample size (correlation coefficient, CC = 0.98), but only a weak correlation for selenoproteins (CC = 0.59), suggesting that Sec utilization differs widely among GOS samples (Figure 1).
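To make this comparison concrete: with a single variable on each side (reads per sample versus genes per sample), canonical correlation reduces to the absolute value of Pearson's correlation coefficient. The sketch below computes Pearson's r in plain Python. The per-sample numbers are invented for illustration and are not GOS data; the variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-sample counts (NOT GOS data).
reads = [100_000, 200_000, 300_000, 400_000]  # sequenced reads per sample
cys_homologs = [800, 1_600, 2_450, 3_150]     # Cys-containing homologs
selenoproteins = [30, 90, 40, 80]             # Sec-containing genes

print(f"Cys homologs vs reads:   r = {pearson(reads, cys_homologs):.2f}")
print(f"selenoproteins vs reads: r = {pearson(reads, selenoproteins):.2f}")
```

In this toy example the Cys-homolog counts track sequencing effort almost perfectly, while the selenoprotein counts do not, which is the same qualitative contrast as the GOS result.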
It has been reported that GOS samples grouped by sequence similarity and taxonomy correlate with the environmental parameters of GOS sites, particularly water temperature and salinity [29]. We found that, except for sample GS09, all selenoprotein-rich samples belonged to the marine "tropical & Sargasso" group, which had an average sampling temperature of 25.5°C. In addition, all samples from the Gulf of Mexico and the Caribbean Sea (GS15–GS19) showed elevated levels of selenoproteins (Figure 2B).

Selenoproteins detected through the homology-based procedure (see Materials and Methods for details) belonged to 51 previously described selenoprotein families (Table 2; details are shown in Table S1). Most of these families had many more Cys-containing homologs than selenoproteins in the GOS dataset. All selenoprotein families previously detected in the Sargasso Sea were identified in the GOS dataset, including the prominent selenoproteins SelW-like, selenophosphate synthetase (SelD), proline reductase PrdB subunit, peroxiredoxin (Prx), thioredoxin (Trx), glutaredoxin (Grx) and a variety of Prx-like/Trx-like/Grx-like proteins [25]. Other selenoproteins included a UGSC-containing protein (one of the major selenoprotein families in GOS samples; U is the one-letter designation for Sec) and several selenoproteins identified in various metagenomic sequencing projects [26],[31]. In addition, we identified a large number of distant homologs of Prx-like/Trx-like selenoproteins. To compare them with previously identified Prx-like/Trx-like proteins, we clustered these proteins into subfamilies based on conserved domain classification (Pfam/COG), motif features and phylogenetic analysis. Several selenoproteins were represented by single sequences only, e.g., glycine reductase selenoprotein A (GrdA) and heterodisulfide reductase subunit A (HdrA).
In these cases, sequencing errors that generated in-frame TGA codons could not be excluded; however, because these sequences corresponded to known selenoproteins and possessed strong SECIS elements, they are very likely true selenoproteins. Twenty selenoprotein families were each represented by more than 40 selenoprotein sequences and together accounted for more than 94% of all selenoprotein sequences. As in the selenoproteome of the Sargasso Sea, the most abundant selenoprotein families were SelW-like, SelD, UGSC-containing protein, Prx, PrdB, and different subfamilies of Prx-like/Trx-like/Grx-like proteins. The GOS selenoproteome is the largest selenoproteome identified to date, and its analysis greatly expands our understanding of Sec utilization in marine microbial communities.
Most selenoproteins with known functions are oxidoreductases, and of the 51 selenoprotein families detected in GOS samples, 33 (2,887 sequences, 82.3%) were homologs of known thiol oxidoreductases or possessed a Trx-like fold (Table 2). Many of these selenoproteins contained a conserved UxxC, UxxS, CxxU or TxxU redox motif. In a small number of known selenoprotein genes, new Sec positions were identified. For example, a new redox motif (CxxU) was detected in the Trx-like 1 family (COG0526, TrxA, thiol-disulfide isomerase and thioredoxins), in which all previously identified sequences contain a UxxC motif (Figure 4A).
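The motif classes named above can be screened with a simple positional check. The sketch below is a hypothetical helper, not the authors' pipeline: it only inspects the residue three positions downstream (C or S) or upstream (C or T) of each Sec (U) and reports the matching UxxC, UxxS, CxxU or TxxU label. The peptide fragments in the example are invented.

```python
def sec_redox_motifs(protein):
    """Report UxxC/UxxS/CxxU/TxxU motifs around each Sec (U) residue.

    Returns a list of (0-based Sec position, motif) tuples. Only the residue
    three positions downstream (C or S) or upstream (C or T) of U is checked.
    """
    seq = protein.upper()
    downstream = {"C": "UxxC", "S": "UxxS"}  # residue at position i + 3
    upstream = {"C": "CxxU", "T": "TxxU"}    # residue at position i - 3
    hits = []
    for i, aa in enumerate(seq):
        if aa != "U":
            continue
        if i + 3 < len(seq) and seq[i + 3] in downstream:
            hits.append((i, downstream[seq[i + 3]]))
        if i >= 3 and seq[i - 3] in upstream:
            hits.append((i, upstream[seq[i - 3]]))
    return hits


# Hypothetical peptide fragments, not GOS sequences.
print(sec_redox_motifs("MKVLUGPCAA"))  # [(4, 'UxxC')]
print(sec_redox_motifs("AACGPUKK"))    # [(5, 'CxxU')]
```

A real screen would run on the predicted ORFs with TGA already translated as U; this check ignores structural context and would need alignment-based confirmation that the motif is conserved.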
To further investigate the relationship between the occurrence of selenoprotein families and sample features (e.g., marine versus nonmarine), we analyzed the most abundant selenoprotein families in each GOS sample separately (Table 3). Excluding samples containing few selenoproteins (≤15), the majority of selenoprotein families showed similar occurrence in marine and nonmarine aquatic samples. In contrast, several selenoprotein families appeared to be differentially distributed. For example, SelW-like protein was generally the most abundant selenoprotein family in marine samples, whereas the UGSC-containing protein was the most frequently utilized in nonmarine samples. As discussed above, salinity appears to be a factor that influences (perhaps indirectly) selenoprotein utilization (Figure 5).
Identification of New Selenoproteins in GOS Samples

In addition to known selenoproteins, we identified 7 new selenoprotein families (Table 4; all sequences are available in supplemental Dataset S2). Each was represented by 2–11 TGA-containing sequences, except for hypothetical protein GOS_C, which had 74 selenoprotein sequences. Of the 7 new families, four either contained a domain of known function or were homologous to protein families with known or predicted functions. Particularly interesting was the identification of homologs of the ferredoxin-thioredoxin reductase (FTR) catalytic subunit and of trypsin-like serine proteases. FTR is a key enzyme of the ferredoxin/thioredoxin system, which catalyzes the reduction of thioredoxins with light-generated electrons [34]–[36]. Two Cys residues form a redox-active disulfide bridge that functions in the reduction of Trx [37]. We identified two FTR selenoprotein sequences, including one (JCVI_READ_1093012271142) that contained two predicted Sec residues corresponding exactly to the two redox-active Cys residues (Figure 6A).
Trypsin is a well-known serine protease that catalyzes the hydrolysis of peptide bonds, and no redox function has been reported for members of this family. We found 9 selenoprotein sequences containing the trypsin-like domain (COG5640, secreted trypsin-like serine protease), in which the predicted Sec corresponded to a conserved Cys residue within this domain, suggesting a potential redox function for this Cys (Figure 6B).
Distinguishing Marine Prokaryotic and Eukaryotic Selenoproteins

Previous analyses revealed that several selenoprotein families occur in both prokaryotes and eukaryotes, e.g., SelW-like, GPx and deiodinase [25]. Recently, additional such families were identified, e.g., methionine sulfoxide reductase A (MsrA), Prx, SelL (a Prx-like protein), arsenite S-adenosylmethyltransferase (PRK11873, arsM) and several Prx-like/Trx-like proteins [31], [38]–[40]. Most eukaryotic species containing these selenoproteins are aquatic organisms (such as green algae and fish). In the GOS sequence dataset, more than 90% of sequences are derived from bacteria, whereas only 2.8% could be definitively assigned to the eukaryotic domain [27]. To distinguish bacterial and eukaryotic selenoproteins, we employed several approaches, including phylogenetic analyses and searches for eukaryotic SECIS elements. Our results suggested that all detected new and known selenoproteins belonging to families that occur in both prokaryotes and eukaryotes could be assigned to the bacterial domain. In addition, several eukaryotic selenoproteins, including protein disulfide isomerase (PDI), SelM, SelT, SelU and thioredoxin reductase, were detected in different GOS samples by homology searches with known eukaryotic selenoproteins (data not shown). Although most of the reads containing these selenoprotein genes were too short to examine for a eukaryotic SECIS element in the 3′-UTR, phylogenetic analyses and the absence of bacterial SECIS elements suggested that these sequences are eukaryotic. Interestingly, we also detected a new eukaryotic selenoprotein family, gamma-interferon-inducible lysosomal thiol reductase (GILT). GILT is a key enzyme that facilitates the complete unfolding of proteins destined for lysosomal degradation by releasing structural constraints imposed by intra- and inter-chain disulfide bonds [41],[42]. No homologs of this protein are known in prokaryotes.
In this study, we identified three selenoprotein sequences in this family. A eukaryotic SECIS element predicted by SECISearch [18] was found in the 3′-UTR of one of these genes, providing additional evidence that they are eukaryotic GILT selenoproteins. A multiple alignment of GILT sequences and the predicted eukaryotic SECIS element are shown in Figure 8.
Novel Domain Fusions Involving Selenoproteins

We identified novel domain fusions in several selenoprotein families. One example involved a Prx fused with a distant homolog of a PP2C-type phosphatase (smart00331, PP2C_SIG; Figure 9A).
Additional examples of domain fusions are shown in Figure S1. The functions of most of these domains are not clear; however, as a rule, at least one conserved Cys was present in these sequences, suggesting potential redox activity. For example, the UGSC-containing protein, which likely has a Trx-like fold, was fused with a conserved domain (designated Unknown_1; Figure S1A). The Unknown_1 protein was also present in a limited number of aquatic organisms. Another example involved the fusion of a Prx-like 3 domain with an Unknown_3 domain (Figure S1D). Unknown_3 contained three conserved Cys residues, including a conserved CxxC motif that may have a thiol-based redox function. Previously, we detected two fusions of SelD: (i) a fusion with NADH dehydrogenase (COG1252, Ndh, FAD-containing subunit) [32] and (ii) a fusion with Cys sulfinate desulfinase (COG1104, NifS) (unpublished data). The Ndh-SelD fusion proteins were detected in several bacteria, most of which were aquatic organisms. Such fusions were also observed in several lower eukaryotes, such as Ostreococcus. In all detected fusion sequences, a conserved CxxC motif was present in the predicted active site of the SelD domain, whereas this motif is very rare (<5%) in single-domain SelD proteins. The NifS-SelD fusion was detected only in Geobacter sp. FRC-32 (an anaerobic, iron- and uranium-reducing deltaproteobacterium), and a CxxU motif was present in the active site of its SelD domain. The functions of the two fusion SelDs are not fully clear, but they are expected to be involved in selenophosphate synthesis. In the GOS dataset, we detected hundreds of Ndh-SelD fusion proteins (all containing the CxxC motif), which accounted for approximately 40% of all detected Cys-containing SelDs. In contrast, no NifS-SelD fusion was detected. Interestingly, ~5.6% of single-domain selenoprotein SelDs contained a CxxU motif (Figure 12).
We also found several sequence reads containing two neighboring selenoprotein genes, including ten Prx/SelW reads, one Prx/Prx-like 2 read and one Prx-like 1/AhpD-like 2 read. Phylogenetic analysis showed that these Prx and SelW sequences clustered in a small phylogenetic group, suggesting that they come from closely related organisms. Further analyses are needed to examine a possible functional link between these selenoproteins.

Occurrence of the Selenouridine Utilization Trait in GOS Samples

In some prokaryotes, Se (in the form of selenophosphate) is also used for the biosynthesis of a modified tRNA nucleoside, 5-methylaminomethyl-2-selenouridine (mnm⁵Se²U), which is located at the wobble position of the anticodons of tRNA(Lys), tRNA(Glu) and tRNA1(Gln) [50]–[52]. The proposed function of mnm⁵Se²U involves codon-anticodon interactions that aid base-pair discrimination at the wobble position and/or translation efficiency [52],[53]. The 2-selenouridine synthase YbbB is necessary to replace the sulfur atom of 2-thiouridine in these tRNAs with selenium [54]. Here, we examined the occurrence of YbbB to assess the selenouridine utilization trait in the GOS samples. A total of 865 YbbB genes were identified in GOS sequences. The occurrence of YbbB in individual samples is shown in Figure 13A.
Discussion

In recent years, a number of metagenomic sequencing projects have enabled researchers to identify genes from both abundant and rare microbes in a particular environment, providing a powerful tool to examine community organization and metabolism in natural microbial communities [30], [55]–[57]. Identification of selenoprotein genes in these datasets may likewise help in understanding the role of Se in microbial populations. In this study, we used shotgun data from the recent GOS expedition [27]–[29] to characterize the distribution and composition of the selenoproteome in the largest marine metagenomic dataset to date. Our results highlight the importance of Se utilization within marine microbial communities and provide a comprehensive analysis of the Se-dependent proteins used by marine microorganisms.

The GOS project produced a total of 7.7 million random sequence reads from the North Atlantic Ocean, the Panama Canal, and the East and central Pacific Ocean gyre. To identify all selenoproteins in the GOS dataset, we employed a procedure that analyzed Sec/Cys pairs in homologous sequences. A total of 3,506 sequences belonging to 51 previously described prokaryotic selenoprotein families, and 102 sequences corresponding to 7 new selenoprotein families, were identified. Compared with smaller-scale non-aquatic metagenomic projects, such as the whale fall community and Waseca County farm soil metagenomes [56] and the human distal gut microbiome [57], the GOS project yielded hundreds of times more selenoproteins. Our study generated by far the largest selenoproteome reported to date. If Sec were used at the same rate in all marine microbial communities, the number of selenoproteins in each sample, like the number of their Cys-containing homologs, would be expected to be proportional to the number of sequence reads in that sample. However, whereas this correlation was strong for Cys homologs, it was weak for selenoproteins.
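A minimal sketch of this proportionality reasoning, assuming per-sample read and selenoprotein counts are available: under uniform Sec use, a sample's expected selenoprotein count is the total count scaled by its share of reads, and the observed/expected ratio then flags selenoprotein-rich or selenoprotein-poor samples. The 1.5-fold cutoff, the function name and the sample names are hypothetical; the text does not state the thresholds actually used.

```python
def classify_sec_usage(samples, fold=1.5):
    """Compare observed selenoprotein counts with counts expected from read numbers.

    samples: dict mapping sample name -> (reads, selenoproteins), reads > 0.
    fold: hypothetical cutoff; observed/expected >= fold is "rich",
          <= 1/fold is "poor", anything in between is "typical".
    Returns dict mapping sample name -> (observed/expected ratio, label).
    """
    total_reads = sum(reads for reads, _ in samples.values())
    total_sec = sum(sec for _, sec in samples.values())
    if total_reads == 0 or total_sec == 0:
        raise ValueError("need non-zero totals of reads and selenoproteins")
    result = {}
    for name, (reads, sec) in samples.items():
        if reads <= 0:
            raise ValueError(f"sample {name!r} has no reads")
        # Under proportional use, a sample's share of selenoproteins equals
        # its share of the sequenced reads.
        expected = total_sec * reads / total_reads
        ratio = sec / expected
        if ratio >= fold:
            label = "rich"
        elif ratio <= 1 / fold:
            label = "poor"
        else:
            label = "typical"
        result[name] = (ratio, label)
    return result


# Hypothetical samples (NOT GOS data): name -> (reads, selenoproteins).
toy = {"S1": (100_000, 40), "S2": (100_000, 10), "S3": (200_000, 50)}
for name, (ratio, label) in classify_sec_usage(toy).items():
    print(name, round(ratio, 2), label)
# S1 1.6 rich
# S2 0.4 poor
# S3 1.0 typical
```

A fixed fold cutoff is only one possible choice; a statistical test (for example, a binomial test of each sample's count against its read share) would account for the noisier estimates in small samples.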
We normalized the occurrence of selenoproteins in each sample and found that all selenoprotein-rich samples originated from seawater and almost all from tropical sea areas. In contrast, half of the selenoprotein-poor samples were obtained from nonmarine aquatic environments (including fresh and hypersaline water), and half of the marine selenoprotein-poor samples came from temperate sea areas. Thus, our data suggest that water salinity and temperature may influence Sec utilization. However, selenoprotein occurrence differed somewhat among samples collected from sites with similar temperature and salinity, suggesting that additional factors may also affect Sec utilization. Moreover, other features of GOS samples (e.g., water depth, filter size fraction and light intensity) may also introduce bias when comparing samples. Among the 51 previously characterized selenoprotein families, most were homologs of known thiol oxidoreductases or possessed a Trx-like fold, consistent with a redox function for selenoproteins in marine microorganisms. Twenty selenoprotein families, including SelW-like, SelD, Trx-like 1 and UGSC-containing proteins, were the major selenoprotein families in GOS samples and represented approximately 95% of all detected selenoprotein sequences. Except for SelD, FdhA and UshA-like (COG0737, UshA, 5′-nucleotidase/2′,3′-cyclic phosphodiesterase and related esterases), all of these families contained conserved Cys-based redox motifs, which are involved in a variety of redox functions. Comparison of the distributions of these major selenoprotein families in marine and nonmarine environments showed that a small number of selenoproteins exhibited significantly different occurrence in the two types of habitat.
For example, SelW-like, DsbA 1, Prx-like 2, Prx-like 3 and Trx-like 3 were much more abundant in marine samples, whereas UGSC-containing, AhpD-like 2 and Prx-like (UGC-containing) proteins were more abundant in nonmarine samples. Therefore, salinity and other factors affected the use of Sec, but this influence is not necessarily unidirectional and depends on the specific selenoprotein. Seven new selenoprotein families were identified. Except for hypothetical protein GOS_C, which was represented by 74 selenoprotein sequences in the GOS dataset, the new selenoprotein families occurred only rarely. Among these new families, FTR is a well-characterized enzyme involved in the reduction of the Trx disulfide; however, no Sec-containing form of this enzyme had been detected in previous studies. In addition, several Sec-containing sequences were predicted for a trypsin-like family, suggesting a potential redox function for a particular Cys residue in this well-known serine protease family. Although the functions of the other new families are unclear, the presence of a CxxU motif in both the FmdB putative regulatory protein family and the putative secreted serine protease MucD, and of a UxxC motif in hypothetical protein GOS_C, implies a thiol-related redox function. It has been reported that a small fraction (less than 3%) of reads in the GOS dataset is of eukaryotic origin (e.g., small-sized green algae). We detected several eukaryotic selenoproteins, including a new selenoprotein family, GILT. Homologs of this protein family were detected only in eukaryotes, and a eukaryotic SECIS element was found in the 3′-UTR of one selenoprotein sequence. Although the eukaryotic organisms that contain Sec-containing GILT are not yet known, future studies will likely identify them. Domain fusions can help identify functionally related proteins, and we identified several new fusion events involving selenoproteins.
Compared with their more common forms present in most organisms, these selenoproteins contained additional upstream or downstream domains fused into a single polypeptide chain. Fusion events were observed for a variety of Trx-fold-containing selenoproteins, including Prx, Prx-like 2, Prx-like 3 and UGSC-containing protein. The functions of most of these fused domains are not clear; however, single or multiple conserved Cys residues were present in these domains, suggesting a potential redox function for these residues. In addition, almost half of the Cys-containing SelDs detected in the GOS dataset were Ndh-SelD fusion proteins, all of which contained a conserved CxxC motif in the active site. The abundance of Ndh-SelD fusion proteins in GOS samples suggests that this fusion SelD plays an important role in selenophosphate biosynthesis in marine/aquatic organisms. Because Se is also utilized for the biosynthesis of selenouridine in bacteria, we assessed the distribution of the selenouridine trait by analyzing the occurrence of YbbB in GOS samples. We identified selenouridine-rich and selenouridine-poor samples, which did not coincide with the Sec-rich and Sec-poor samples, suggesting that the two Se utilization traits are functionally independent (although both, of course, depend on the supply of Se). This observation is consistent with the previous hypothesis that the Sec and selenouridine utilization traits are relatively independent even though both require SelD for selenophosphate biosynthesis [32]. In addition, no strong relationship was found between selenouridine utilization and habitat type (marine or nonmarine) or geographic location. Although both traits require Se supply and thus could influence each other's evolution, additional factors appear to play more important roles in the evolution and use of each Se utilization trait.
In this study, we report a comprehensive analysis of Sec utilization in marine microbial samples of the GOS expedition by characterizing the GOS selenoproteome. This is the first time that a microbial selenoprotein population has been described in a global biogeographical context. Our analysis yielded the largest selenoprotein dataset to date, provided a variety of new insights into Sec utilization and revealed environmental factors that influence Sec utilization in the marine microbial world.

Materials and Methods

GOS Sequence Resource

Shotgun reads of the GOS project were downloaded from the CAMERA (Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis) website at http://camera.calit2.net. This dataset contains a total of 7,709,422 genomic sequences derived from 57 samples (13 of which are not fully sequenced), which cover a wide range of distinct surface marine environments as well as a few nonmarine aquatic samples [29]. Together, the genomic sequences comprised 8.148 billion nucleotides. In addition, we downloaded the non-redundant (NR) protein database, containing a total of 4,644,764 protein sequences, from the NCBI FTP server. BLAST [58] was also obtained from NCBI.

High Throughput Computing Resource

Previously, we developed and employed a set of programs for automated selenoprotein searches [24]–[26]. However, because this approach is based on an exhaustive search of all possible Cys/Sec pairs for each Cys-containing sequence in the NR database, the computation becomes very expensive when the target sequence dataset is very large, as is the case for the GOS database. Therefore, we used the Open Science Grid (OSG), a computing infrastructure dedicated to supporting scientific research through advanced networking technology and high performance computing [59].
We employed the Condor-G software [60], a full-featured task broker, to manage this high throughput computing project on large collections of distributively owned computing resources. In addition, we used Prairiefire, a 128-node, 256-processor Beowulf cluster supercomputer at the Research Computing Facility of the University of Nebraska – Lincoln.

Search Procedure

We used a strategy that we had previously applied successfully to selenoprotein searches in other metagenomic datasets: the Sargasso Sea and the symbiotic microbial consortium of the marine oligochaete Olavius algarvensis [24]–[26]. Briefly, each Cys-containing sequence in the NR protein database was searched against the GOS dataset with TBLASTN (E-value below 10), retaining the top 1,000 homologs; this step was the most time-consuming and was performed entirely on the OSG system. Cys/TGA pairs (a query Cys aligned to a TGA codon in a GOS read) were then selected, and a minimal open reading frame (ORF) was obtained for each TGA-containing nucleotide sequence, with TGA translated as Sec (U). Next, BLASTX and RPS-BLAST were used to analyze the conservation of TGA-flanking regions in all six reading frames as well as the presence of conserved domains. The remaining sequences were clustered based on sequence similarity and the location of the predicted Sec using BL2SEQ with an E-value below 10⁻⁴. All clusters were further searched against the NCBI NR protein and microbial genomic databases to identify conserved Cys-containing homologs. Sequences in the remaining clusters were manually analyzed for the occurrence of bacterial SECIS elements using the bSECISearch program [21] and were classified into known selenoproteins and candidate selenoproteins (i.e., clusters having at least two Sec-containing sequences). In addition, an independent BLAST homology search was performed with selected Sec-containing representatives of all previously identified prokaryotic selenoprotein families.
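The Cys/TGA pairing and minimal-ORF steps can be illustrated with a short sketch. It is not the authors' code, and the function names are hypothetical. It assumes a plus-strand TBLASTN hit whose aligned query and subject strings are available (with '*' marking stop codons in the translated read and '-' marking gaps), uses the standard genetic code, and treats TAA, TAG and any other in-frame TGA as ORF boundaries.

```python
# Standard genetic code (NCBI translation table 1), codons ordered TCAG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in _BASES for b in _BASES for c in _BASES),
                       _AMINO_ACIDS))
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cys_tga_sites(query_aln, subject_aln, subject_start):
    """0-based read positions of codons where a query Cys aligns to a subject stop.

    Assumes a plus-strand hit: query_aln is the aligned protein, subject_aln the
    translated read ('*' = stop, '-' = gap), and subject_start is the 1-based
    nucleotide coordinate where the aligned subject begins.
    """
    sites = []
    codons_seen = 0  # subject residues (codons) consumed so far
    for q, s in zip(query_aln, subject_aln):
        if s == "-":
            continue  # gap in the read: no codon consumed
        if q == "C" and s == "*":
            sites.append(subject_start - 1 + 3 * codons_seen)
        codons_seen += 1
    return sites


def minimal_orf(read, tga_pos):
    """Extend from an in-frame TGA to the nearest flanking stops (or read ends).

    Returns (start, end, protein) with end exclusive; the TGA at tga_pos is
    translated as Sec (U), other codons with the standard code, unknown as X.
    """
    read = read.upper()
    if read[tga_pos:tga_pos + 3] != "TGA":
        raise ValueError("no TGA codon at tga_pos")
    start = tga_pos
    while start >= 3 and read[start - 3:start] not in STOP_CODONS:
        start -= 3
    end = tga_pos + 3
    while end + 3 <= len(read) and read[end:end + 3] not in STOP_CODONS:
        end += 3
    protein = "".join("U" if i == tga_pos else CODON_TABLE.get(read[i:i + 3], "X")
                      for i in range(start, end, 3))
    return start, end, protein


# Hypothetical read: TAG ATG AAA TGA GGC TAA GG
read = "TAGATGAAATGAGGCTAAGG"
for pos in cys_tga_sites("MKCG", "MK*G", subject_start=4):
    if read[pos:pos + 3] == "TGA":  # keep Cys/TGA pairs, drop Cys/TAA or Cys/TAG
        print(pos, minimal_orf(read, pos))  # 9 (3, 15, 'MKUG')
```

Minus-strand hits would require reverse-complementing the read first; that step is omitted here to keep the sketch short.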
Finally, distinct representatives of all identified selenoprotein sequences were used in iterative searches against the GOS dataset to identify additional distant Sec-containing homologs.

Multiple Sequence Alignment and Phylogenetic Analysis

We used CLUSTALW [61] with default parameters for multiple sequence alignments. Phylogenies were analyzed with PHYLIP programs [62]: neighbor-joining (NJ) trees were obtained with NEIGHBOR, and the most parsimonious trees were determined with PROTPARS. The robustness of these phylogenies was evaluated with two additional algorithms: maximum likelihood (ML) analysis with PHYML [63] and Bayesian estimation of phylogeny with MrBayes [64].

Figure S1. Additional fusion selenoproteins. A. UGSC/Unknown_1 fusion; B. Prx-like 2/Distant Secretin_N fusion; C. Unknown_2/Prx-like 2 fusion; D. Prx-like 3/Unknown_3 fusion. Only the alignments of fused domains are shown. The conserved Cys residues in different domains are highlighted in pink. (0.11 MB PDF)

Table S1. Distribution of selenoproteins in individual samples. (0.04 MB XLS)

Dataset S1. Sequences of homologs of known selenoproteins. (0.60 MB TXT)

Dataset S2. Sequences of new selenoproteins. (0.02 MB TXT)

Acknowledgments

We thank the Research Computing Facility of the University of Nebraska – Lincoln for the use of the Prairiefire supercomputer, and Brian Bockelman and Dr. Alexey Lobanov for help in utilizing Condor-G client tools for OSG high performance computing. We also thank Dr. Dmitri Fomenko for comments on the manuscript.

Footnotes

The authors have declared that no competing interests exist. This work was supported by NIH grant GM061603.

References

1. Hatfield DL, Berry MJ, Gladyshev VN. Selenium: a historical perspective. In: Hatfield DL, Berry MJ, Gladyshev VN, editors.
Selenium: Its molecular biology and role in human health, 2nd edition. New York: Springer; 2006. pp. 1–6. 2. Thomson CD. Assessment of requirements for selenium and adequacy of selenium status: a review. Eur J Clin Nutr. 2004;58:391–402. 3. Dodig S, Cepelak I. The facts and controversies about selenium. Acta Pharm. 2004;54:261–276. 4. Stadtman TC. Selenocysteine. Annu Rev Biochem. 1996;65:83–100. 5. Driscoll DM, Copeland PR. Mechanism and regulation of selenoprotein synthesis. Annu Rev Nutr. 2003;23:17–40. 6. Papp LV, Lu J, Holmgren A, Khanna KK. From selenium to selenoproteins: synthesis, identity, and their role in human health. Antioxid Redox Signal. 2007;9:775–806. 7. Low S, Berry MJ. Knowing when not to stop: selenocysteine incorporation in eukaryotes. Trends Biochem Sci. 1996;21:203–208. 8. Böck A. Biosynthesis of selenoproteins–an overview. Biofactors. 2000;11:77–78. 9. Böck A, Forchhammer K, Heider J, Leinfelder W, Sawers G, et al. Selenocysteine: the 21st amino acid. Mol Microbiol. 1991;5:515–520. 10. Rother M, Resch A, Wilting R, Böck A. Selenoprotein synthesis in archaea. Biofactors. 2001;14:75–83. 11. Yuan J, Palioura S, Salazar JC, Su D, O'Donoghue P, et al. RNA-dependent conversion of phosphoserine forms selenocysteine in eukaryotes and archaea. Proc Natl Acad Sci U S A. 2006;103:18923–18927. 12. Xu XM, Carlson BA, Mix H, Zhang Y, Saira K, et al. Biosynthesis of selenocysteine on its tRNA in eukaryotes. PLoS Biol. 2007;5:e4. 13. Böck A, Forchhammer K, Heider J, Baron C. Selenoprotein synthesis: an expansion of the genetic code. Trends Biochem Sci. 1991;16:463–467. 14. Berry MJ, Banu L, Harney JW, Larsen PR. Functional characterization of the eukaryotic SECIS elements which direct selenocysteine insertion at UGA codons. EMBO J. 1993;12:3315–3322. 15. Thanbichler M, Böck A.
The function of SECIS RNA in translational control of gene expression in Escherichia coli. EMBO J. 2002;21:6925–6934. 16. Liu Z, Reches M, Groisman I, Engelberg-Kulka H. The nature of the minimal ‘selenocysteine insertion sequence’ (SECIS) in Escherichia coli. Nucleic Acids Res. 1998;26:896–902. 17. Hatfield DL, Gladyshev VN. How selenium has altered our understanding of the genetic code. Mol Cell Biol. 2002;22:3565–3576. 18. Kryukov GV, Kryukov VM, Gladyshev VN. New mammalian selenocysteine-containing proteins identified with an algorithm that searches for selenocysteine insertion sequence elements. J Biol Chem. 1999;274:33888–33897. 19. Lescure A, Gautheret D, Carbon P, Krol A. Novel selenoproteins identified in silico and in vivo by using a conserved RNA structural motif. J Biol Chem. 1999;274:38147–38154. 20. Castellano S, Morozova N, Morey M, Berry MJ, Serras F, et al. In silico identification of novel selenoproteins in the Drosophila melanogaster genome. EMBO Rep. 2001;2:697–702. 21. Zhang Y, Gladyshev VN. An algorithm for identification of bacterial selenocysteine insertion sequence elements and selenoprotein genes. Bioinformatics. 2005;21:2580–2589. 22. Kryukov GV, Castellano S, Novoselov SV, Lobanov AV, Zehtab O, et al. Characterization of mammalian selenoproteomes. Science. 2003;300:1439–1443. 23. Castellano S, Novoselov SV, Kryukov GV, Lescure A, Blanco E, et al. Reconsidering the evolution of eukaryotic selenoproteins: a novel nonmammalian family with scattered phylogenetic distribution. EMBO Rep. 2004;5:71–77. 24. Kryukov GV, Gladyshev VN. The prokaryotic selenoproteome. EMBO Rep. 2004;5:538–543. 25. Zhang Y, Fomenko DE, Gladyshev VN. The microbial selenoproteome of the Sargasso Sea. Genome Biol. 2005;6:R37. 26. Zhang Y, Gladyshev VN.
High content of proteins containing 21st and 22nd amino acids, selenocysteine and pyrrolysine, in a symbiotic deltaproteobacterium of gutless worm Olavius algarvensis. Nucleic Acids Res. 2007;35:4952–4963. 27. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007;5:e16. 28. Kannan N, Taylor SS, Zhai Y, Venter JC, Manning G. Structural and functional diversity of the microbial kinome. PLoS Biol. 2007;5:e17. 29. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007;5:e77. 30. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. 31. Fomenko DE, Xing W, Adair BM, Thomas DJ, Gladyshev VN. High-throughput identification of catalytic redox-active cysteine residues. Science. 2007;315:387–389. 32. Zhang Y, Romero H, Salinas G, Gladyshev VN. Dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine residues. Genome Biol. 2006;7:R94. 33. Yutin N, Suzuki MT, Teeling H, Weber M, Venter JC, et al. Assessing diversity and biogeography of aerobic anoxygenic phototrophic bacteria in surface waters of the Atlantic and Pacific Oceans using the Global Ocean Sampling expedition metagenomes. Environ Microbiol. 2007;9:1464–1475. 34. Schürmann P. Redox signaling in the chloroplast: the ferredoxin/thioredoxin system. Antioxid Redox Signal. 2003;5:69–78. 35. Droux M, Jacquot JP, Miginac-Maslow M, Gadal P, Huet JC, et al.
Ferredoxin-thioredoxin reductase, an iron-sulfur enzyme linking light to enzyme regulation in oxygenic photosynthesis: purification and properties of the enzyme from C3, C4, and cyanobacterial species. Arch Biochem Biophys. 1987;252:426–439. 36. Buchanan BB. The ferredoxin/thioredoxin system: a key element in the regulatory function of light in photosynthesis. Bioscience. 1984;34:378–383. 37. Dai S, Friemann R, Glauser DA, Bourquin F, Manieri W, et al. Structural snapshots along the reaction pathway of ferredoxin-thioredoxin reductase. Nature. 2007;448:92–96. 38. Kim HY, Fomenko DE, Yoon YE, Gladyshev VN. Catalytic advantages provided by selenocysteine in methionine-S-sulfoxide reductases. Biochemistry. 2006;45:13697–13704. 39. Lobanov AV, Fomenko DE, Zhang Y, Sengupta A, Hatfield DL, et al. Evolutionary dynamics of eukaryotic selenoproteomes: large selenoproteomes may associate with aquatic life and small with terrestrial life. Genome Biol. 2007;8:R198. 40. Shchedrina VA, Novoselov SV, Malinouski MY, Gladyshev VN. Identification and characterization of a selenoprotein family containing a diselenide bond in a redox motif. Proc Natl Acad Sci U S A. 2007;104:13919–13924. 41. Arunachalam B, Phan UT, Geuze HJ, Cresswell P. Enzymatic reduction of disulfide bonds in lysosomes: characterization of a gamma-interferon-inducible lysosomal thiol reductase (GILT). Proc Natl Acad Sci U S A. 2000;97:745–750. 42. Phan UT, Arunachalam B, Cresswell P. Gamma-interferon-inducible lysosomal thiol reductase (GILT). Maturation, activity, and mechanism of action. J Biol Chem. 2000;275:25907–25914. 43. Pané-Farré J, Lewis RJ, Stülke J. The RsbRST stress module in bacteria: a signaling system that may interact with different output modules. J Mol Microbiol Biotechnol. 2005;9:65–76. 44. Delumeau O, Dutta S, Brigulla M, Kuhnke G, Hardwick SW, et al.
Functional and structural characterization of RsbU, a stress signaling protein phosphatase 2C. J Biol Chem. 2004;279:40927–40937. 45. Obuchowski M, Madec E, Delattre D, Boël G, Iwanicki A, et al. Characterization of PrpC from Bacillus subtilis, a member of the PPM phosphatase family. J Bacteriol. 2000;182:5634–5638. 46. Duncan L, Alper S, Arigoni F, Losick R, Stragier P. Activation of cell-specific transcription by a serine phosphatase at the site of asymmetric division. Science. 1995;270:641–644. 47. Vijay K, Brody MS, Fredlund E, Price CW. A PP2C phosphatase containing a PAS domain is required to convey signals of energy stress to the sigmaB transcription factor of Bacillus subtilis. Mol Microbiol. 2000;35:180–188. 48. Yang X, Kang CM, Brody MS, Price CW. Opposing pairs of serine protein kinases and phosphatases transmit signals of environmental stress to activate a bacterial transcription factor. Genes Dev. 1996;10:2265–2275. 49. Carniol K, Ben-Yehuda S, King N, Losick R. Genetic dissection of the sporulation protein SpoIIE and its role in asymmetric division in Bacillus subtilis. J Bacteriol. 2005;187:3511–3520. 50. Ching WM, Alzner-DeWeerd B, Stadtman TC. A selenium-containing nucleoside at the first position of the anticodon in seleno-tRNA(Glu) from Clostridium sticklandii. Proc Natl Acad Sci U S A. 1985;82:347–350. 51. Wittwer AJ, Ching WM. Selenium-containing tRNA(Glu) and tRNA(Lys) from Escherichia coli: purification, codon specificity and translational activity. Biofactors. 1989;2:27–34. 52. Romero H, Zhang Y, Gladyshev VN, Salinas G. Evolution of selenium utilization traits. Genome Biol. 2005;6:R66. 53. Kramer GF, Ames BN. Isolation and characterization of a selenium metabolism mutant of Salmonella typhimurium. J Bacteriol. 1988;170:736–743. 54. Wolfe MD, Ahmed F, Lacourciere GM, Lauhon CT, Stadtman TC, et al.
Functional diversity of the rhodanese homology domain: the Escherichia coli ybbB gene encodes a selenophosphate-dependent tRNA 2-selenouridine synthase. J Biol Chem. 2004;279:1801–1809. 55. Hallam SJ, Putnam N, Preston CM, Detter JC, Rokhsar D, et al. Reverse methanogenesis: testing the hypothesis with environmental genomics. Science. 2004;305:1457–1462. 56. Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, et al. Comparative metagenomics of microbial communities. Science. 2005;308:554–557. 57. Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, et al. Metagenomic analysis of the human distal gut microbiome. Science. 2006;312:1355–1359. 58. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. 59. Thain D, Tannenbaum T, Livny M. Condor and the Grid. In: Berman F, Hey AJG, Fox G, editors. Grid Computing: Making The Global Infrastructure a Reality. New York: John Wiley & Sons; 2003. pp. 299–332. 60. Frey J, Tannenbaum T, Foster I, Livny M, Tuecke S. Condor-G: A Computation Management Agent for Multi-Institutional Grids. In: Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC10); 7–9 August 2001; San Francisco, CA. 2001. 61. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. 62. Felsenstein J. PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics. 1989;5:164–166. 63. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003;52:696–704. 64. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574.