Copyright © 2009 Lu et al; licensee BioMed Central Ltd.

7TMRmine: a Web server for hierarchical mining of 7TMR proteins

1 Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA
2 Department of Biology, University of Nebraska at Omaha, Omaha, NE 68182, USA
3 Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0660, USA
4 Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
5 Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
6 School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588-0118, USA
7 Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588-0118, USA

Corresponding author. Guoqing Lu: glu3/at/mail.unomaha.edu; Zhifang Wang: wangzfus/at/yahoo.com; Alan M Jones: alanjones/at/unc.edu; Etsuko N Moriyama: emoriyama2/at/unl.edu

Received January 8, 2009; Accepted June 19, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. The need for new and accurate 7TMR/GPCR prediction tools is paramount with the accelerated rate of acquisition of diverse sequence information. Currently available and often used protein classification methods (e.g., profile hidden Markov models) are highly accurate for assigning proteins to already known 7TMR subfamilies. However, these alignment-based methods are less effective for identifying remote similarities, e.g., identifying proteins from highly divergent or possibly new 7TMR families. In this regard, more sensitive (e.g., alignment-free) methods are needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates.

Description

We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins as well as transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or individual newly-discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. The 7TMRmine tool can be used not only for 7TMR mining but also for general TM-protein analysis. Users can submit protein sequences for analysis, or explore pre-analyzed results for multiple genomes. The server currently includes prediction results and the summary statistics for 68 genomes.

Conclusion

7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can also be used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla.

Background

Seven-transmembrane region-containing receptors (7TMRs), often referred to as G protein-coupled receptors (GPCRs), constitute the largest receptor superfamily in vertebrates and other metazoans [1-3]. GPCRs, activated by a diverse array of ligands, are the central players in eukaryotic signal transduction and are involved in a wide variety of physiological processes. Mutations in genes encoding GPCRs are associated with major diseases (e.g., hypertension, cardiac dysfunction, depression, pain). Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes is an active endeavor of bioinformatics and pharmacogenomics research. However, efforts to identify all member proteins in this superfamily from diverse genomes are hindered by their extreme sequence divergence. To facilitate more sensitive and thorough mining, many computational methods, both alignment-based and alignment-free, have been developed specifically for these proteins.

Protein classification methods

Computational methods of predicting protein functions rely on detecting similarities among proteins. The majority of protein classification methods rely on alignment to known protein sequences to identify the similarities and to build various forms of models (e.g., regular expression patterns [4], protein fingerprints [5], position-specific scoring matrices [6], and profile hidden Markov models [7]). However, generating reliable alignments of divergent candidate 7TMR sequences is practically impossible. Another disadvantage of alignment-based methods is that the resulting models are built only from known "positives" (protein sequences of interest) without incorporating information that discriminates positives from "negatives" (unrelated protein sequences). Consequently, these classifiers are affected by sampling bias, which is propagated and/or amplified during subsequent re-training.
In contrast, alignment-free protein classification methods overcome these problems. Instead of alignments, various descriptors are extracted from each sequence (e.g., amino acid composition, dipeptide frequencies, and physico-chemical properties), and pattern recognition or multivariate statistical methods are trained to discriminate positive protein samples from negative samples. Our recent comparative analyses showed that alignment-free classifiers are more sensitive to remote similarities than alignment-based profile hidden Markov model (profile HMM) methods [8-10]. They can also identify weak similarities from short subsequences. We also observed that these alignment-free classifiers outperform profile-HMM methods when a sufficiently large training set is unavailable [9]. For example, one alignment-free method was successfully used to identify extremely divergent 7TMRs (odorant and gustatory receptors) for the first time from the Drosophila melanogaster genome [11-13]. One disadvantage of alignment-free classifiers is their relatively high false-positive rate. Profile-HMM classifiers, on the other hand, are accurate in identifying well-established protein families with few false positives. Combining both approaches hierarchically provides greater sensitivity with fewer false positives.

Hierarchical classification strategy

Our study for mining 7TMR protein candidates from the Arabidopsis thaliana genome showed the power of hierarchically combining multiple classifiers, including both traditional alignment-based and newer alignment-free methods [14]. We identified 394 Arabidopsis thaliana proteins as 7TMR candidates and selected 54 of them as priorities for further investigation. More recently, Gookin et al. [15] used a similar strategy by combining several methods hierarchically and identified a small number of GPCR candidates from three plant genomes including A. thaliana.
They showed that a subset of the Arabidopsis proteins predicted to be GPCR candidates can interact with the Arabidopsis G-protein α subunit (AtGPA1) in a yeast complementation assay.

To facilitate hierarchical identification of 7TMR proteins, we developed the Web server 7TMRmine. 7TMRmine permits users to customize the integration of both alignment-based and alignment-free classifiers in any combination and order. 7TMRmine is a Web-based mining system as well as a database for 7TMR candidates from a growing collection of diverse genomes. It allows researchers to generate and explore prioritized lists of 7TMR candidates. It also allows researchers to examine the performance of various methods. Furthermore, 7TMRmine can be used to identify other transmembrane proteins.

7TMR proteins

While all known GPCR proteins have seven transmembrane (TM) regions, an increasing number of alternative 'G protein-independent' signaling mechanisms are associated with some 7TM protein groups. For example, the plant-specific mildew resistance locus O (MLO) protein family is one of the most divergent 'GPCR' families [16,17], and, not surprisingly, MLO's interaction with Gα has not been shown despite great effort (AM Jones and R Panstruga, unpublished data). Another problem is that none of the candidate plant GPCRs has been shown to activate the Gα subunit; therefore they do not fulfill the most important criterion for GPCR classification. A third problem is represented by the odorant receptor (OR) family in insects, another extremely diverged group of 7TM proteins. These proteins act independently of known G-protein-coupled second messenger pathways [18,19]. With these problems acknowledged, it is no longer appropriate to label the entire 7TM protein group as GPCRs, because this group includes 'G protein-dependent' and 'G protein-independent' signaling proteins as well as putative scaffolds.
Following the notation used in our previous study [14], we designate these proteins as candidate 7-transmembrane receptors (7TMRs), not GPCRs. Our goal here is to provide a tool capable of identifying the entire set of 7TMRs from diverse genomes. Having a comprehensive inventory of 7TMRs from diverse organisms will facilitate studies on the evolution of GPCRs and help address the functionality of the large number of orphaned GPCRs, many critical to human health.

Construction and content

Overview of the 7TMRmine Web server

The 7TMRmine Web server includes protein classifiers and the database of the classification results. The Web interface is developed in HTML, PHP, and Perl. The database is managed in MySQL [20]. The user interface is available through standard Web browsers (tested with Safari, Firefox, and Internet Explorer). The Web server and all classifier programs run on the Linux operating system with the Apache HTTP server (tested on Red Hat Linux 9 and CentOS 4.2/5.1). The database currently includes classification results for 70 complete genomes from 68 different organisms across major eukaryotic phyla (for A. thaliana, three versions of the genome, TAIR5, TAIR7, and TAIR8, are included [21,22]). We plan on adding more genomes with regular updates as well as upon user requests. The classification results for user-submitted protein sequences are stored as temporary records in a database table.

Figure 1 [...]

Protein classifiers

Fourteen classifiers (four alignment-based and ten alignment-free) were trained to identify 7TMR candidates and are included in the current 7TMRmine (Figure 2A [...]).

Profile HMM

This alignment-based classifier provides a full probabilistic representation of protein families (e.g., [23]). The program package Sequence Alignment and Modeling System (SAM, version 3.5) [24,25] is used for implementing profile HMMs. The expect values (E-values) for SAM are calculated based on a constant sample size of 30,000, regardless of the genome size. Therefore, the E-values can be directly compared between different genomes. Strope and Moriyama [10] reported that when the E-value threshold of 0.05 was used, profile-HMM classifiers were highly accurate (nearly 100% accurate) for identifying proteins belonging to the same 7TMR classes (within-class prediction). However, at the same E-value threshold, these classifiers performed much more poorly (70% or lower accuracy) in identifying distant 7TMRs (between-class prediction). Therefore, in 7TMRmine, we chose three E-value thresholds to provide different levels of identification stringency. They are listed as three different classifiers: SAM, SAM1, and SAM2. The SAM classifier uses the most stringent E-value threshold, E = 0.05. The SAM1 classifier uses E = 4.23 as the threshold, which is based on the highest E-value given to Arabidopsis MLOs (specifically, MLO3). The SAM2 classifier is the least stringent with the threshold E = 6.52, which is obtained at the minimum error point [26] based on the classification of the training set (total errors: 4 out of 2,030 training samples, with no false positives and 4 false negatives).

GPCRHMM

This method was developed by Wistrand et al. [27].
These authors constructed a compartmentalized HMM incorporating distinct loop length patterns and differences in amino acid composition between cytosolic loops, extracellular loops, and membrane regions based on a diverse set of GPCR sequences. Their training set included eleven of 13 PFAM GPCR protein families [7]. They considered the remaining two divergent families, the Drosophila odorant receptor family 7tm_6 (PF02949) and the plant family Mlo (PF03094), as outliers and excluded them from their training set. The sensitivity (against 1,706 positives obtained from GPCRDB [28,29]) and false positive rates (against 1,071 negatives) of GPCRHMM are reported as 92.8% and 0–1.18%, respectively [27].

LDA, QDA, LOG, and KNN

These classifiers are parametric and non-parametric discrimination methods (linear, quadratic, and logistic discriminant analyses, as well as nonparametric K-nearest neighbor) described by Moriyama and Kim [8]. These classifiers use amino acid composition and physico-chemical properties as sequence descriptors. For KNN classifiers, the number of neighbors, K, is chosen from 5, 10, 15, or 20 and the classifiers are designated KNN5, KNN10, KNN15, and KNN20, respectively. Based on the training set including 1,000 positives (obtained from GPCRDB) and 750 negatives, cross-validation tests showed that these methods have true and false positive rates of 97.7–98.7% and 2.9–3.6%, respectively [8]. The S-PLUS statistical package version 8.1.1 for Linux (TIBCO Software Inc., Palo Alto, CA, USA) is used for the classifier development and application.

SVM-AA and SVM-di

These classifiers are based on support vector machines (SVMs), learning machines that make binary classifications based on a hyperplane separating a remapped instance space [30]. Amino acid composition (SVM-AA) and dipeptide frequencies (SVM-di) are used as the sequence descriptors. Strope and Moriyama [10] reported that the true and false positive rates by SVM-AA are >96% and 4–6%, respectively.
SVM-AA performed much better than the profile-HMM classifier for identifying distant 7TMRs (~90% accuracy by SVM-AA, while lower than 80% by profile HMMs), and similar accuracies were observed with SVM-AA even for short sub-sequences. Bhasin and Raghava [31] used SVM-di for their GPCRpred classifier and showed 99.5% accuracy in cross-validation tests based on a training set including the five major 7TMR classes. We use SVMlight version 6.01 developed by Joachims [32,33] for the SVM implementation with the radial basis (rbf) kernel function. We performed a grid analysis with five-fold cross validation to obtain the optimal set of parameters (γ for the rbf kernel and the trade-off, C) for our training set. For SVM-AA and SVM-di, the values used were (γ, C) = (155, 0.5) and (417, 0.5291), respectively.

PLS-ACC

This classifier uses partial least squares regression (PLS) with sequence descriptors based on the auto/cross-covariance transformation of amino acid properties [9]. We use an R implementation [34,35]: the PLS package (ver. 2.1-0) developed by Mevik and Wehrens [36,37]. The classification was done using the threshold score, 0.4982, which was obtained at the minimum error point [26]. PLS-ACC was found to perform better than profile-HMM classifiers and PSI-BLAST when training sets are small and also against short sub-sequences: its accuracy stayed consistently above 90%, whereas that of profile-HMM classifiers dropped as low as 80% [9].

All classifiers except for GPCRHMM were trained using a dataset including 1,015 each of positive (GPCR) and negative (non-GPCR) sequences (these sequences are available on the 7TMRmine website). GPCR sequences were randomly sampled from GPCRDB (June 2006 release) [28,29]. Only non-GPCR "Class Z (Archaeal/bacterial/fungal opsins)" sequences were excluded from sampling. Non-GPCR sequences were randomly sampled from UniProtKB/SwissProt (the manually curated part of UniProt) [38,39].
We manually examined this random-negative set to ensure that no known GPCR sequences were included.

Classifier performance against known proteins

To understand how these classifiers perform on actual 7TMR proteins, we tested them against the entire set of sequences obtained from GPCRDB [28,29]. In Additional file 1, the percentage of positives identified by each classifier is summarized. GPCRDB includes one non-GPCR class, "Class Z: Archaeal/bacterial/fungal opsins", which includes bacteriorhodopsins, proteorhodopsins, and related fungal opsins. They are light-driven proton and chloride pumps. Although these proteins have 7TM regions, they are not GPCRs and are not involved in signal transduction. Therefore, we consider these proteins as important negative test samples.

As shown in Additional file 1, the percentage of positives obtained by classifiers varies depending on the GPCR class. Only Class A (Rhodopsin-like), frizzled/smoothened, and vertebrate taste receptors (T2R) are consistently identified at rates higher than 96% regardless of the classifier. GPCRHMM completely missed insect odorant receptors and plant MLOs. This is because GPCRHMM is not trained for these proteins, as described earlier. Compared to alignment-based classifiers (SAM/SAM1/SAM2 and GPCRHMM), all alignment-free classifiers showed very high false positive rates (shown as % positives against Class Z). To reduce false positive rates, Moriyama et al. [14] took the intersection of six selected classifiers (SVM-AA, SVM-di, PLS-ACC, LDA, QDA, and KNN20). As shown in Additional file 1, this strategy (called "6 class") reduced the false positive rate to ~6% without affecting the true positive rates. By taking the union of "6 class" and GPCRHMM as well as SAM2, we achieved the highest coverage for all GPCR classes without increasing the false positive rate. Additional file 1 also shows the classifier performance against the GPCR datasets from two organisms (Homo sapiens and D. melanogaster).
Using the combination classifier "6 class + GPCRHMM + SAM2", nearly 100% of all known 7TMRs were recovered from these two genomes.

Transmembrane prediction methods

HMMTOP2.1 [40-42] and TMHMM2.0 [43] are both HMM-based TM-prediction methods. They are considered to be the two best TM-prediction methods (e.g., [44,45]). Many secreted proteins contain short N-terminal signal peptides, which often have strongly hydrophobic segments; consequently many TM-prediction methods misidentify these signal peptides as TM regions. Phobius [46,47] addressed this problem by combining a signal peptide model, SignalP-HMM [48], with TMHMM, improving overall accuracy in detecting and differentiating proteins with signal peptides and proteins with TM segments. We incorporated HMMTOP2.1 and Phobius in our classifier set. As shown in Figure 2A, [...]

Genes encoding transmembrane proteins constitute 20–30% of both prokaryotic and eukaryotic genomes [51-54]. Therefore, TM-region prediction is in general one of the most important steps for analyzing proteins. Inclusion of TM-prediction options adds flexibility to explore beyond just 7TM proteins. For this purpose, the users may elect to use only TM-prediction options with any number of levels (Figure 2A).

User-submitted sequences

For user-submitted protein sequences, all classifiers are run first and the identification results are displayed for users to review. If the user chooses to perform further hierarchical analysis, the option interface similar to Figure 2A [...]

Utility and discussion

7TMR protein mining from the Arabidopsis thaliana genome

7TMR proteins form the largest receptor superfamily in vertebrates and other metazoans (e.g., ~800 in human, ~1,000 in Caenorhabditis elegans) [29]. However, few 7TMR candidates are reported in plants and fungi. Only 22 candidate Arabidopsis 7TMRs have been described to date [55] (a more recent review is found in Moriyama and Opiyo, in press [65]).
We explored the possibility of finding more divergent groups of 7TMR candidates from the A. thaliana genome using both alignment-free and alignment-based methods [14]. For the 7TMRmine server, we updated all classifiers using a larger training dataset, and added new classifiers (SAM1, SAM2, GPCRHMM, and Phobius). The server also includes a newer release of the A. thaliana genome (TAIR8; 32,690 proteins excluding those shorter than 35 amino acids; 27,066 proteins further excluding predicted alternative-splicing products). Table 1 summarizes the results obtained from the classifiers based on profile HMMs and TM-prediction methods. GPCRHMM predicted 39 proteins (46 including predicted alternative-splicing products) as 7TMR candidates. In A. thaliana, currently 22 (27 including predicted alternative-splicing products) are known to be 7TMRs: 15 MLOs (19 including predicted alternative-splicing products), G-protein-coupled receptor 1 (GCR1), Arabidopsis thaliana regulator of G-protein signaling 1 (AtRGS1), and five heptahelical transmembrane proteins (HHPs; 6 including predicted alternative-splicing products). GCR1 and AtRGS1 are known to directly interact with the plant Gα subunit GPA1 [56]. AtRGS1 is a putative membrane receptor for D-glucose and also functions as a GTPase activating protein to AtGPA1 [57]. Two proteins, GTG1 and GTG2 (four proteins including predicted alternative-splicing products; [58]), were claimed to be plant GPCRs based on co-immunoprecipitation of AtGPA1 with these membrane proteins. However, GTG1/GTG2 are treated separately here as their animal homologues are reported to be likely channel proteins with no topological similarity to GPCRs [59]. Of the 22 known 7TMR proteins in A. thaliana, GPCRHMM recognized only GCR1 as a candidate. The AtRGS1 protein contains the RGS domain (120 amino acids) attached to the 7-TM region. 
As also described by Gookin et al. [15], GPCRHMM does not recognize AtRGS1 as a 7TMR protein unless the C-terminal RGS domain is removed. As expected, none of the MLOs and HHPs was identified by GPCRHMM. As mentioned before, the training dataset used for GPCRHMM excluded any such extremely diverged proteins [27]. On the other hand, the SAM classifiers were trained using a dataset that included a wider range of 7TMR proteins. Thus both SAM1 and SAM2 correctly identified all 15 MLOs (19 including alternative-splicing products) as well as GCR1. However, even after removing the RGS domain sequence, SAM classifiers could not identify AtRGS1 positively; only GCR1 was identified positively by both SAM2 and GPCRHMM.
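
The way thresholded profile-HMM calls and the combination classifiers described above fit together can be sketched with plain set operations. This is a minimal illustration under stated assumptions, not the 7TMRmine implementation: the three E-value thresholds and the membership of "6 class" are taken from the text, whereas the function names, the protein identifiers, the toy E-values, and the toy classifier calls are hypothetical. Treating an E-value equal to the threshold as positive is also an assumption.

```python
# E-value thresholds quoted in the text for the three profile-HMM stringency levels.
SAM_THRESHOLDS = {"SAM": 0.05, "SAM1": 4.23, "SAM2": 6.52}

# Classifiers whose intersection forms the "6 class" set (Moriyama et al. [14]).
SIX_CLASS = ("SVM-AA", "SVM-di", "PLS-ACC", "LDA", "QDA", "KNN20")


def sam_positives(evalues, level):
    """Return the IDs whose E-value does not exceed the threshold for `level`.

    Assumption: an E-value exactly equal to the threshold counts as positive.
    """
    cutoff = SAM_THRESHOLDS[level]
    return {pid for pid, evalue in evalues.items() if evalue <= cutoff}


def six_class(calls):
    """Intersect the positive sets of the six alignment-free classifiers."""
    return set.intersection(*(set(calls[name]) for name in SIX_CLASS))


def combined_candidates(calls, evalues):
    """Union of the "6 class", GPCRHMM, and SAM2 positive sets."""
    return six_class(calls) | set(calls["GPCRHMM"]) | sam_positives(evalues, "SAM2")


# Hypothetical toy data: four proteins with made-up E-values and calls.
evalues = {"P1": 0.001, "P2": 4.23, "P3": 6.0, "P4": 50.0}
calls = {name: {"P1", "P3", "P4"} for name in SIX_CLASS}
calls["LDA"] = {"P1", "P3"}      # one classifier rejects P4
calls["GPCRHMM"] = {"P1"}

print(sorted(sam_positives(evalues, "SAM")))        # → ['P1']
print(sorted(six_class(calls)))                     # → ['P1', 'P3']
print(sorted(combined_candidates(calls, evalues)))  # → ['P1', 'P2', 'P3']
```

In this toy case, P4 is dropped because a single alignment-free classifier rejects it, while P2 is recovered only through the lenient SAM2 threshold, mirroring how the intersection limits false positives and the union restores coverage.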
By using either Phobius or HMMTOP, ~200 of 27,066 A. thaliana proteins (or ~250 of 32,690 including alternative-splicing products) were predicted to have exactly seven TM-regions. 103 proteins (134 including alternative-splicing products) were predicted to be 7-TM proteins by both methods. The 22 (or 27 including alternative-splicing products) known A. thaliana 7TMR proteins were predicted to have between six and eight and between seven and ten TM-regions by Phobius and HMMTOP, respectively. Only 11 of the 22 proteins (or 13 of 27 including alternative-splicing products) are predicted to have exactly seven TM-regions by both methods. Note that GTG1 and GTG2 are predicted to have eight or nine TM-regions (one of the two GTG2 alternative-splicing products, AT4G27630.1, is predicted to have only five TM-regions by both methods). Of the 27,066 A. thaliana proteins, 969 proteins have between five and ten TM-regions by both methods. The range "5–10TMs" (by HMMTOP) was also used by Moriyama et al. [14] as the best coverage against the entire GPCR dataset for the hierarchical classification.

Figure 3 [...]
As shown in this example, users can choose classifiers in any combination in any number of levels (currently up to six) to create their own hierarchical filtering system. By using less strict methods at the earlier levels and more strict methods at the later levels, the 7TMRmine Web server facilitates the prioritization of the 7TMR protein candidate set and the generation of a protein set of manageable size for further investigation. The union and intersection of positive or negative sets can be easily obtained as shown in Figure 3C.

Distribution of transmembrane proteins among eukaryotic genomes

Using 7TMRmine, we examined the distribution of transmembrane proteins among various eukaryotes. The server currently has classification results from 68 organisms across the major eukaryotic phyla: 10 land plants (including 1 moss and 1 fern), 8 green algae, 2 diatoms, 14 fungi, 6 vertebrates, 1 urochordate, 1 cephalochordate, 1 echinoderm, 7 arthropods, 1 nematode, 2 annelids, 1 mollusc, 1 cnidarian, 1 placozoan, and 11 protists (including 1 red alga, 1 choanoflagellate and 2 Dictyostelium species). From each genome, proteins shorter than 35 amino acids and proteins with unidentified residues (letters other than the 20 standard amino acid codes, most often 'X') over more than 30% of their length are excluded. The summary statistics are shown in the "TM/7TMR Mining Summary Statistics" page (Figure 4).
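
The two exclusion criteria applied to each genome (length below 35 amino acids, or unidentified residues over more than 30% of the length) can be expressed as a small pre-filter. The sketch below is illustrative only and is not taken from the 7TMRmine code base: the function names, the parameter names, and the toy sequences are hypothetical, and treating every letter outside the 20 standard one-letter codes as "unidentified" is an assumption based on the description above.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_eligible(seq, min_length=35, max_irregular_fraction=0.30):
    """Return True if a protein sequence passes both exclusion criteria.

    A sequence is excluded when it is shorter than `min_length` residues or
    when more than `max_irregular_fraction` of its residues are letters
    outside the 20 standard codes (most often 'X').
    """
    seq = seq.strip().upper()
    if len(seq) < min_length:
        return False  # also covers the empty sequence
    irregular = sum(1 for residue in seq if residue not in STANDARD_RESIDUES)
    return irregular / len(seq) <= max_irregular_fraction


def keep_eligible(records):
    """Filter a {protein_id: sequence} mapping down to eligible proteins."""
    return {pid: seq for pid, seq in records.items() if is_eligible(seq)}


# Hypothetical toy records.
records = {
    "short": "A" * 34,                 # too short
    "clean": "ACDEFGHIKLMNPQRSTVWY" * 2,  # 40 standard residues
    "masked": "A" * 27 + "X" * 13,     # 32.5% unidentified residues
}
print(sorted(keep_eligible(records)))  # → ['clean']
```

Both cutoffs are exposed as parameters so that the same helper could be reused with other thresholds; the defaults simply restate the values given in the text.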
Distributions of TM proteins among four representative organismal groups are compared in Figure 6.

Distribution of 7TMR proteins among eukaryotic genomes

The "TM/7TMR Mining Summary Statistics" page also summarizes the distribution of 7TMR protein candidates among eukaryotes (Figure 4).

7TMR candidates in the A. thaliana, rice, and poplar genomes

As described earlier, from the A. thaliana genome, the 16 high-ranking proteins identified by Gookin et al. [15] as well as 15 of the 22 known 7TMRs are found in the 132 proteins (156 including predicted alternative-splice forms) obtained from the intersection of the "6 classifiers" AND "7–8 TM" predictions (see Venn diagrams for A. thaliana in Figure 7).

Conclusion

7TMRmine facilitates the discovery of extremely divergent 7TMR proteins from diverse genomes. By combining prediction results from various classifiers including alignment-based and alignment-free classifiers as well as transmembrane prediction methods in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. Furthermore, 7TMRmine can be used as a general transmembrane-protein classifier. Statistics provided for the 68 pre-analyzed genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla.

Availability and requirements

7TMRmine is freely available from http://bioinfolab.unl.edu/emlab/7tmr using any current Web browser.

Authors' contributions

GL wrote part of the programs, carried out analyses of genomes, and revised the manuscript. ZW designed and developed the preliminary version of the database and programs. AMJ contributed to the discussion and writing of the manuscript. ENM conceived of the study, supervised the entire project, wrote part of the programs, carried out analyses of genomes, and wrote the manuscript. ENM also maintains the Web server and database. All authors read and approved the final manuscript.

Additional file 1

Classifier performance on GPCRDB proteins. Classifiers were tested against the entire dataset of GPCRDB. The table summarizes the % positive identifications for each GPCR class as well as for two organisms (Homo sapiens and Drosophila melanogaster).

Additional file 2

Number of transmembrane regions predicted from GPCRDB proteins. Transmembrane regions were predicted from the entire GPCRDB proteins using two methods, Phobius and HMMTOP.

Additional file 3

7TMR candidate proteins identified from the Arabidopsis thaliana genome. 189 proteins (or 162 proteins excluding predicted alternative-splice products) were obtained by combining the results of eight classifiers and two TM-prediction methods.

Additional file 4

7TMR candidate proteins identified from the Oryza sativa genome. 84 proteins were obtained by combining the results of eight classifiers and two TM-prediction methods.

Additional file 5

7TMR candidate proteins identified from the Populus trichocarpa genome. 153 proteins were obtained by combining the results of eight classifiers and two TM-prediction methods.

Acknowledgements

The authors thank Qiaomei Zhong for developing the early prototype of the database and Web interface. We also thank Dr. Stephen O. Opiyo and Pooja K. Strope for training PLS, SAM, and SVM classifiers. This work was funded in part by Nebraska EPSCoR Women in Science, an NSF EPSCoR Type II grant, and grant number R01LM009219 from the National Library of Medicine to E.N.M., and by the NIGMS (GM65989-01), the DOE (DE-FG02-05er15671), and the NSF (MCB-0209711, MCB-0723515) to A.M.J. The authors have no conflicts of interest that are directly relevant to the content of this article.