Copyright © 2008 Goh et al; licensee BioMed Central Ltd.

Protein intrinsic disorder toolbox for comparative analysis of viral proteins

1 Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
2 Institute for Intrinsically Disordered Protein Research, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA
3 Institute for Biological Instrumentation, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia

Corresponding authors. Gerard Kian-Meng Goh: gerard/at/compbio.iupui.edu; A Keith Dunker: kedunker/at/iupui.edu; Vladimir N Uversky: vuversky/at/iupu.edu

Supplement: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School. Mary Qu Yang, Jack Y Yang, Hamid R Arabnia and Youping Deng. http://www.biomedcentral.com/content/pdf/1471-2164-9-S2-info.pdf

Conference: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School, 14–17 October 2007, Boston, MA, USA

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

To examine the usefulness of protein disorder predictions as a tool for the comparative analysis of viral proteins, a relational database has been constructed. The database includes proteins from influenza A and HIV-related viruses. Annotations include viral protein sequence, disorder prediction, structure, and function. The location of each protein within a virion, if known, is also recorded. Our analysis reveals a clear relationship between proximity to the RNA core and the percentage of predicted disordered residues for a set of influenza A virus proteins.
Neuraminidase (NA) and hemagglutinin (HA) proteins of the major influenza A pandemic strains tend to pair in such a way that both proteins are predicted to be either ordered-ordered or disordered-disordered. This may be the result of these proteins evolving from being lipid-associated. The high abundance of intrinsic disorder in envelope and matrix proteins from HIV-related viruses likely represents a mechanism by which HIV virions can escape the immune response despite the availability of antibodies for the HIV-related proteins. This exercise provides an example showing how the combined use of intrinsic disorder predictions and relational databases provides an improved understanding of the functional and structural behaviour of viral proteins.

Background

Goals and objectives

Structures and functions of a large number of viral proteins are not yet fully understood [1-5]. This may account for the continuing need to develop novel computational and experimental tools suitable for viral protein analysis. Although experimental techniques remain the major providers of structural and functional knowledge, the experiments are often expensive or difficult to the point of infeasibility. The use of various bioinformatics tools to predict structure and function represents an alternative approach that is gaining significant attention. Comparative computational studies have opened a new way for easier benchmarking and functional analysis of proteins. Here we examine the usefulness of intrinsic disorder predictions for studying viral proteins. To this end, a set of biocomputing tools that includes relational database design and the use of disorder prediction algorithms was developed.

Viral protein functions by proteins, location and virus type

Two families of RNA viruses, the Lentivirinae (HIV) and the Orthomyxoviridae (Influenza), were used in this comparative study.
These viral families were selected because they are widely studied due to their involvement in major outbreaks during the last century [5,6]. The Lentiviruses include the HIV and the SIV viruses among others [7], whereas the orthomyxoviruses encompass mainly the various influenza viruses [8]. The influenza A virion (which is a complete virus particle with its RNA core and protein coat) is a globular particle sheathed in a lipid bilayer derived from the plasma membrane of its host (Figure 1A).
HIV is also an enveloped virus (Figure 1B).

Table 1 presents a list of some of the most important proteins analyzed in this study. These proteins are arranged by their approximate location in the HIV and Influenza A virions [7-10]; i.e., according to their proximity to the core where the RNA is housed. The proteins that are located closer to the core are more likely to be involved in interactions with the viral RNA. Note that the exact locations of some of the proteins within the virions are not yet known. Table 1 shows that proteins similarly located within the virions of different viral types possess significant functional similarities [11]. For example, similar functions can be seen in the surface proteins (gp120, HA, NA) of both influenza A and HIV viruses. Although Table 1 lists major functions for several proteins, it is important to remember that some of the functions are not fully understood or are not known at all [1]. Multi-functionality of a protein is, of course, also possible.

Intrinsic disorder

Many proteins are intrinsically disordered; i.e., they lack rigid 3-D structure under physiological conditions in vitro, existing instead as dynamic ensembles of interconverting structures. Intrinsically disordered proteins [12] are also known by several other names, including "intrinsically unstructured" [13] and "natively unfolded" [14-16]. While the function of a given protein is often determined by its unique structure, comparative studies of several exceptions to this structure-to-function paradigm led to the realization that intrinsically disordered proteins share many sequence characteristics and so comprise a distinct cohort. These intrinsically unstructured proteins and regions differ from structured globular proteins and domains with regard to many attributes, including amino acid composition, sequence complexity, hydrophobicity, charge, flexibility [12,15], and type and rate of amino acid substitutions over evolutionary time [17].
Many of these differences between ordered and intrinsically disordered proteins were utilized to develop numerous disorder predictors. The disorder predictors used in this paper are the PONDR® (Predictors of Naturally Disordered Regions) predictors VLXT and VL3 [18-21]. We utilized these predictors to address the following question: Can disorder prediction be used to determine or map at least some of the functions of viral proteins?

Results

Predicted intrinsic disorder in various viruses

Table 2 lists the average percentages of predicted disordered residues (the percentage disorder rate) found in proteins studied by NMR or X-ray crystallography. The proteins are grouped into PDB-Select 90 proteins [22,23], lentivirus proteins, and influenza A virus proteins. Table 2 shows that the percentage of residues predicted to be disordered by PONDR® VLXT in proteins from a PDB-Select 90 set is 24 ± 2, whereas the corresponding value for PONDR® VL3 predictions is 14 ± 2. Table 2 also shows that predicted disorder is somewhat more abundant in lentivirus proteins than in proteins from the influenza virus. The values given in this table are the percentages of disordered residues pooled over all residues in a given dataset, not the averages of the per-chain percentages of disordered residues. The former provides a better gauge of the mean since the number of influenza and HIV-related proteins available in PDB [22] is relatively small.
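The distinction between a dataset-level percentage and an average of per-chain percentages can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy scores, and the per-residue cutoff of 0.5 are assumptions made for illustration only.

```python
from statistics import mean

# Assumption: a residue counts as predicted disordered when its score
# is at or above this cutoff. The cutoff is illustrative, not from the paper.
CUTOFF = 0.5


def disordered_count(scores, cutoff=CUTOFF):
    """Number of residues whose predicted score reaches the cutoff."""
    return sum(1 for s in scores if s >= cutoff)


def pooled_percent_disorder(chains, cutoff=CUTOFF):
    """Percentage of disordered residues over all residues in the dataset."""
    total = sum(len(chain) for chain in chains)
    if total == 0:
        raise ValueError("dataset contains no residues")
    disordered = sum(disordered_count(chain, cutoff) for chain in chains)
    return 100.0 * disordered / total


def per_chain_percent_disorder(chains, cutoff=CUTOFF):
    """Mean of the per-chain percentages (each chain weighted equally).

    Assumes every chain is non-empty.
    """
    return mean(100.0 * disordered_count(c, cutoff) / len(c) for c in chains)


# Hypothetical toy data: one short, fully disordered chain and one
# longer, mostly ordered chain.
toy_chains = [
    [0.9, 0.8, 0.7, 0.6],   # 4 of 4 residues at or above the cutoff
    [0.1] * 15 + [0.9],     # 1 of 16 residues at or above the cutoff
]

pooled = pooled_percent_disorder(toy_chains)        # 5 of 20 residues
per_chain = per_chain_percent_disorder(toy_chains)  # mean of 100 and 6.25
```

In the toy data, the short chain dominates the per-chain average but contributes only four residues to the pooled value, which is why the two measures can diverge when a dataset contains few chains.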
Averaged predicted disorder rates enable analysis of the viral protein disorder

Table 2 provides a simple measure to classify a given protein as ordered or disordered by prediction. For example, the averaged predicted disorder rate for proteins from PDBS90 is 24 ± 2 by PONDR® VLXT (15 ± 2 by PONDR® VL3). If this value is used as a benchmark for labelling a protein as moderately disordered or mostly structured, then any protein whose percentage of predicted disordered residues falls close to this number can reasonably be classified as 'moderately disordered' by prediction. The information in Table 2 provides the benchmarks for the analysis of the results shown in Tables 3 and 4.

Table 3 categorizes proteins found in the HIV virus by their function and arranges the data by protein location in the virion. The envelope proteins are listed first, with gp120 being the Surface Unit (SU). Just below the SU lies gp41, which is a transmembrane protein (TM). Using the information in Table 2, gp120 can be considered quite ordered by prediction, whereas gp41 should be categorized as rather disordered. The amount of predicted intrinsic disorder is high in the matrix and capsid proteins, as well as in the Vpr and Tat proteins. The Nef protein and integrase are predicted to be moderately disordered, whereas protease and reverse transcriptase contain the least predicted disorder.
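The benchmark-based labelling described above can be sketched as a small Python function. This is an illustration only: the text does not define how close to the benchmark a protein must fall, so the tolerance of 5 percentage points, the function name, and the label wording are all assumptions. The benchmark default of 24 is the PONDR® VLXT value for PDB-Select 90 quoted in the text.

```python
def classify_by_benchmark(percent_disorder, benchmark=24.0, tolerance=5.0):
    """Label a protein relative to a benchmark percentage of disorder.

    benchmark: dataset-level percentage used as the reference point
               (24 is the PONDR VLXT value quoted for PDB-Select 90).
    tolerance: hypothetical half-width of the 'close to benchmark' band;
               the paper does not specify one.
    """
    if not 0.0 <= percent_disorder <= 100.0:
        raise ValueError("percent_disorder must lie between 0 and 100")
    if percent_disorder < benchmark - tolerance:
        return "mostly ordered"
    if percent_disorder > benchmark + tolerance:
        return "rather disordered"
    return "moderately disordered"


# PONDR VLXT percentages quoted in the Discussion for two HIV proteins.
label_gp41 = classify_by_benchmark(34.0)
label_vpu = classify_by_benchmark(26.0)
```

With the assumed tolerance, the 34% quoted for gp41 falls above the band and the 26% quoted for Vpu falls inside it. A different tolerance would move these boundaries, so the labels should be read as relative to the chosen band rather than as fixed categories.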
Table 4 gives the data on the abundance of disorder in influenza A virus proteins. These data allow comparison of proteins from orthomyxovirus and lentivirus. The envelope of the orthomyxovirus contains the hemagglutinin (HA) and neuraminidase (NA) proteins. While the percentage predicted disorder rate for gp120 does not vary much between strains, predicted intrinsic disorder in both HA and NA varies significantly by subtype (see Table 5). M1 is a matrix protein, which provides a link between the surface proteins and the capsid. M1 is predicted to be moderately disordered. Both non-structural proteins of the influenza A viruses were predicted to be rather disordered. Similarly, both the nucleoprotein and the main core protein were predicted to contain a significant percentage of disordered residues.
Table 6 summarizes some trends in the distribution of intrinsic disorder among various functional classes of proteins, derived from an analysis of the literature. First, RNA-binding proteins show a strong tendency to be highly disordered, both experimentally and computationally. The next set of proteins observed to be disordered, though not as highly as RNA-binding proteins, are the DNA-binding proteins. Single-span membrane proteins also contain significant intrinsic disorder, except for the segment that crosses the membrane, which is typically predicted to be ordered. Finally, various enzymes as well as transmembrane proteins (e.g. pores) are among the polypeptides with the least intrinsic disorder. Overall, the results in this table are consistent with previous studies on all of the SwissProt proteins, which were partitioned by functional annotation [24-26].

[Figure 2]
Discussion

Predicted disorder varies with protein type, protein location and function

Predicted disorder varies with location of protein in virion

There is an interesting correlation between the percentage disorder rate and the protein localization within the virion. This phenomenon is especially clear for influenza virus (see Figure 2).

Hemagglutinin versus gp120

According to our analysis of the data in Tables 3–4 and Figure 2, gp120, like HA, is predicted to be quite ordered. There are, however, differences in the prediction results for these two proteins: gp120 was consistently predicted to be ordered, whereas the levels of predicted order in influenza HA varied with the viral subtype. A summary can be found in Table 5. It should also be noted that both HA (HA2) and NA are transmembrane proteins [28,29].

Transmembrane proteins are more ordered

In HIV, the protein that spans the lipid membrane is the transmembrane (TM) protein, gp41. This protein acts as a fusion protein [30] and functions in membrane interactions. This integral membrane protein contains a TM anchor domain that holds this envelope protein in association with the lipid bilayer [31]. This TM protein is responsible for the fusion of the viral and cellular membranes via its fusion peptide located in its extracellular, N-terminal domain [32]. Previously, transmembrane fragments of channels and pores were predicted to be highly ordered [24-26]. However, the situation might be quite different for membrane proteins with relatively large extra- and intracellular domains, e.g., for fusion proteins. We now have an opportunity to analyse the predicted disorder rate of transmembrane proteins that are involved in membrane fusion. Table 3 shows that the predicted disorder level for gp41 is quite sizeable (34% for PONDR® VLXT). In fact, the amount of disorder in this protein is significantly higher than that of the transmembrane proteins (HA, NA) of the influenza A virus.
This might also be correlated with the high level of predicted disorder in the HIV matrix. Like gp41, HA is a transmembrane glycoprotein. Both gp41 and HA are members of the class I viral fusion proteins that mediate viral entry into cells. Class I viral fusion proteins are thought to fold into a metastable prefusion conformation, which is then activated to undergo a large conformational rearrangement to a lower energy state, thereby providing the energy needed to accomplish membrane fusion [33-35]. The role of HA as a fusion protein and the associated large-scale conformational changes may help to account for the slightly higher predicted disorder rates observed for HA in many subtypes (see Table 5). Analysis of the data in Table 3 revealed that transmembrane viral proteins are, in general, characterized by relatively low predicted disorder rates. For comparison, the Vpr protein, which is present in HIV but is not expressed in SIV [36] and whose structure was determined by NMR, is predicted to be rather disordered (39% by PONDR® VLXT and 64% by PONDR® VL3, Table 3). On the other hand, Vpu was predicted to be more ordered (26% by PONDR® VLXT), in agreement with the fact that this protein has a transmembrane domain [37].

More disorder at the core

The matrix proteins, which form a layer below the lipid envelope, produced interesting data for both families of viruses. The matrices of both influenza and HIV viruses are relatively disordered (see Tables 3 and 4). The HIV matrix protein is predicted to be highly disordered, whereas the influenza virus matrix protein is predicted to be only moderately disordered (or somewhat ordered) by PONDR® VLXT. This peculiarity may highlight an important difference between the two families of viruses and may have important medical implications. For the influenza virus, the proteins that are even closer to the core include NS1, NP and PB1. All these proteins are predicted to be highly disordered.
More disorder at the core unless proteins are enzymes

While M1 is known to bind RNA, proteins that are located closer to the core are even more likely to interact with the viral RNA. This may account for the trend that the amount of predicted disorder increases for proteins that are closer to the core. This trend is clearly highlighted in Figure 2.

Predicted disorder in hemagglutinin might correlate with viral infectivity

Loss of infectivity for specific virus subtype via fatty acid deprivation

The attachment and membrane fusion of the influenza virus and the host cell are mediated by its hemagglutinin (HA). HA is a homotrimer, and each monomer comprises an ectodomain with about 510 amino acid residues, a transmembrane domain with 27 residues, and a cytoplasmic domain with 10 to 11 residues. The HA monomer is synthesized as a single polypeptide chain and cleaved into two subunits, HA1 and HA2, by proteolytic enzymes after virus budding or during intracellular transport. The HA1 and HA2 subunits are functionally specialized. HA1 carries receptor-binding activity, and HA2 mediates membrane fusion [38]. As discussed briefly above, the amount of intrinsic disorder in HA1 and HA2 varies with the viral subtype (Table 5). Comparison of past experimental data with the disorder predictions in Table 5 suggests that variations in the infectivity of the virus [39], variations in the assembly of the HA proteins, and variations in the correlation between protein-membrane interaction and the lipid raft motion may all be related to the amount of predicted disorder. For example, one of the HA functions is to assemble proteins, including those involved in the formation of pores. Acetylation of the HA molecules often affects this function, which is crucial for the infectivity of the virus. However, it has been shown that H1, H3, and H7 behave differently when the sites that are normally palmitoylated are mutated.
Viral subtypes with H1 proteins were most affected by the mutations, whereas the virions did not lose much of their infectivity in the case of H3 and H7 [40,41]. Table 5 shows that, among the viral subtypes analyzed, H1 proteins possess the least predicted disorder, whereas H3 and H7 proteins were predicted to be considerably more disordered. We showed elsewhere that enzyme-mediated posttranslational modifications usually occur in disordered regions [24-26,42]. Increased predicted intrinsic disorder in HA is thus associated with increased infectivity of the influenza virus, perhaps via changes in posttranslational modification, the ease of which may depend on the tendency to be disordered.

Intrinsic disorder may provide a bypass to the lipid raft requirement

For all enveloped viruses, the envelope is derived from the host cell during the process of virus budding. In the case of influenza virus, budding takes place at the apical plasma membrane and is heavily dependent on the presence of lipid microdomains, or "rafts" [43-45]. Lipid rafts, also known as detergent-insoluble glycosphingolipid-enriched domains, are specific domains on plasma membranes that are enriched in detergent-insoluble glycolipids (DIGs), cholesterol and sphingolipids [46-48]. Levels of cholesterol and sphingolipids can vary amongst individuals, which alters the extent of raft formation. Lipid rafts play an important role in several biological processes, including signal transduction, T-cell activation, protein sorting, and virus assembly and budding [48]. Such enveloped viruses incorporate some integral membrane proteins; among the best studied are the influenza virus hemagglutinin (HA) and neuraminidase (NA) [49]. Acetylation and palmitoylation of the envelope proteins are important for targeting these viral proteins to the lipid raft microdomains on the cell surface [50].
The C-terminal domains of both HA and NA of influenza virus are crucial for association with rafts, and this interaction constitutes part of the signaling machinery necessary for apical targeting in polarized cells. In fact, the cytoplasmic tails of HA and NA are so important for assembly that the information contained in these tails is partially redundant [51]. For example, the removal of the cytoplasmic tail or mutation of the three palmitoylated cysteine residues in the transmembrane (TM) domain and the cytoplasmic tail of influenza virus hemagglutinin (HA) was shown to decrease the association of HA with lipid rafts, decrease the incorporation of HA into virions [44], and modulate the incorporation of cholesterol into the viral envelope. The level of envelope cholesterol has been shown to play a crucial role in the HA-mediated fusion of the influenza virus with the host cell [52]. These data were obtained for the WSN (H1N1) strain of influenza virus, and the authors proposed that differences may exist with other virus strains. Perhaps the virion cholesterol is important for the organization of influenza virus HA trimers into fusion-competent domains, and perhaps the depletion of cholesterol also inhibits virus infectivity due to inefficient fusion [52]. Here we suggest that variations in intrinsic disorder in the surface proteins may play a similar role. In fact, Table 5 shows that H1 is predicted to be ordered, whereas H3 and H7 are predicted to be more disordered. This increased level of disorder might offer a mechanism for proteins to bypass the lipid raft requirement. Studies on chimeric proteins with specific swapping of regions predicted to be ordered or disordered could be used to test this proposed mechanism.
Disorder or order pairing of HA and NA may be intertwined with the evolution of the influenza viruses

Ordered-ordered versus disordered-disordered HA and NA in influenza A virus serotypes

Sixteen HA serotypes and nine NA subtypes of influenza A virus are known. Among the three influenza types, the type A viruses are the most virulent human pathogens and cause the most severe disease. The influenza A virus serotypes associated with the largest numbers of known human pandemic deaths include H1N1 ("Spanish flu"), H2N2 ("Asian flu"), H3N2 ("Hong Kong flu"), and H5N1 ("Avian flu"). Table 7 illustrates an interesting correlation between the amounts of predicted intrinsic disorder in HA and NA proteins from the different influenza A virus serotypes: in the H1N1 and H5N1 subtypes, both HA and NA are predicted to be ordered, whereas the H3N2 serotype is characterized by more disordered hemagglutinin and neuraminidase. Perhaps such a combination is not coincidental but is instead evolutionarily preferred.

Disorder as a viral weapon for evading the immune response

An understanding of viral surface proteins is crucial for developing appropriate vaccination strategies and for improving the understanding of immune responses. The comparative analysis of the intrinsic disorder distribution in the HIV and influenza virions uncovers specific patterns that could provide some useful insight into these problems. Above we showed that the level of predicted disorder varies among the HA and NA subtypes. This observation might be used for tuning vaccination strategies. However, the data in Table 5 show that the variations in predicted disorder are not large. Furthermore, in general, HA and NA can be described as highly ordered to moderately disordered (see Tables 3 and 5, and Figure 2).

The first step in HIV infection is the binding of the envelope glycoprotein gp120 to the host cell receptor CD4 [53,54].
CD4 binding induces extensive structural rearrangements in gp120, resulting in the exposure of a binding surface for the second host cell receptor, the chemokine receptor CCR5 or CXCR4 [55,56]. The interface between gp120 and CD4 is highly conserved among different HIV-1 isolates [57]. In gp120-CD4 complexes, CD4 was shown to interact with all three domains of gp120, including the inner domain, the outer domain, and the bridging β-sheet. Furthermore, in all structures of various gp120-CD4 complexes analyzed by X-ray crystallography, a deep hydrophobic cavity enclosed by conserved gp120 residues was detected [58]. CD4 residue Phe43 is the only cavity-interacting residue in CD4. It fits into the opening of this cavity [57] and was shown to contribute about 23% of the total interaction surface [58]. According to our analysis, the surface protein of the HIV virion, gp120, has a consistently low predicted disorder value across various strains of lentiviruses (data not shown). This feature may have functional implications, since a rigid structure of gp120 might be necessary for the formation of a stable complex with the host protein, CD4. (See Figure 2.)

An interesting possibility is that the high prevalence of intrinsic disorder in proteins located in close proximity to the surface of HIV-related viruses provides a mechanism for avoiding the induction of an immune response. In fact, the antigenicity of a given protein is known to reside in a restricted number of antigenic determinants (sites or epitopes) located on its surface. As antigenic determinants of several proteins have been shown to correspond to surface regions with high segmental mobility (high B-factor values), the high mobility of an antigenic determinant was suggested to help the determinant adjust to a pre-existing antibody site not fashioned to fit the exact geometry of a protein [59].
On the other hand, additional research has revealed that an effective antigenic site, while mobile, should possess an internal propensity to form ordered structure; i.e., it should not be completely disordered. Importantly, some long disordered regions and intrinsically disordered proteins promote weak immune responses or are even completely non-immunogenic [60-62]. This is further illustrated by the analysis of literature data on gp120 immunogenicity. Neutralizing antibodies play a significant role in vaccine development. The key HIV targets for neutralizing antibodies are found in the external envelope protein, gp120 [63-65]. The principal neutralizing determinant of the HIV-1 virus was mapped to the third variable (V3) loop region (residues 301–341) of gp120 [66-68]. This V3 loop is also required for viral entry into target T cells and macrophages [69] and interacts with chemokine co-receptors on the surfaces of these cells [55,56,70]. The V3 loop is characterized by a highly variable amino acid sequence, which is assumed to contribute to the ability of HIV to escape the host immune response [71]. Using solid state NMR spectroscopy, it has been shown that a 24-residue fragment of the V3 loop of HIV-1 strain III (namely residues 308–331) that includes the GPGR motif is conformationally heterogeneous [71]. Furthermore, this fragment was shown to adopt very different conformations when bound to different anti-V3 antibodies [71-73]. The disorder-to-order transition of the V3 loop has been hypothesized to play a crucial role in the function of this protein, determining its potential to interact with a variety of chemokine receptors and thus allowing different avenues into the cell [71]. The same mechanism makes devising vaccines against HIV very difficult, because some V3 loops escape detection by antibodies that specifically recognize a particular conformation but fail to bind other conformations.
These observations provide further support for the hypothesis that a high abundance of intrinsic disorder in proteins located in close proximity to the surface of HIV-1 can help this virus avoid inducing an immune response. Therefore, intrinsic disorder might represent a crucial viral weapon for evading the immune response. We previously discussed several pathogens that use disordered regions for binding, with these disordered regions being weakly immunogenic, and suggested that using disorder for binding might be a common strategy for avoiding the immune system [60]. For this mechanism, the disordered region needs to have sufficiently high flexibility. For such flexible disordered regions, we speculate that the relatively small size of the antibody binding site provides insufficient binding energy to fold the flexible region, so that the antibody cannot bind tightly enough to generate an immune response. On the other hand, in our proposal the relatively larger size of the receptor binding surface can provide sufficient energy of association to overcome the flexibility and thereby induce binding via a disorder-to-order transition. The flexibility of the key HIV proteins may be slightly less than that for the previously discussed pathogens, and so antibodies are produced and these bind to different conformational states. Yet the ability of the flexible disordered binding region to fold in different ways may confuse the immune system and may substantially weaken the overall immune response, as discussed above. Thus, antigenic sites may benefit from being somewhat flexible [59], but probably become less effective as the flexibility increases beyond some useful level.

Conclusion

The results presented in this paper show the usefulness of intrinsic disorder prediction for the comparative analysis of viral proteins.
This approach offers several advantages, including the opportunity to map proteins by functionality, predicted disorder, and locality across viral species, strains and subtypes. Furthermore, it provides useful benchmarks for the evaluation of the intrinsic disorder concept and for the analysis of various disorder predictors. Using this comparative study of predicted disorder, several interesting patterns in the behaviour of viral proteins from HIV-1 and influenza A viruses were uncovered. We have shown that the patterns of predicted disorder can be mapped and related to the functions of the various proteins. There is evidence that the functions and the amount of disorder of the proteins are related to their physical location in the virion. Some of the key findings of this paper are outlined below.

Intrinsic disorder is unevenly distributed within the virions, especially for influenza, with the least predicted disorder observed in the surface proteins and the most disorder characteristic of the proteins at the virion core. While a similar trend is observed for HIV, the changes in disorder are much less pronounced.

Proteins near the surface of HIV-related viruses are characterized by higher levels of predicted disorder than those of influenza. Although the major surface protein, gp120, has been consistently predicted to be ordered, its major neutralizing determinant is highly mobile. These data support a scenario in which HIV virions can escape the immune response despite the availability of antibodies for the HIV-related proteins.

Significant variations in the amount of predicted disorder among HA subtypes in influenza A virus were observed. This might provide an explanation for the variations in the functionality and infectivity of specific viral subtypes. Furthermore, the NA and HA of the major influenza A pandemic strains tend to pair in such a way that both are predicted to be either ordered-ordered or disordered-disordered.
Such behaviour might be linked to the evolutionary advantages of being ordered or disordered, but more experiments are needed to test this conjecture.

Methods

Tools and materials used

The programs were written in C#, JAVA®-JDBC, Microsoft® SQLSERVER, and MYSQL. Object-oriented programming in JAVA®-5 was also used. The database was designed using relational database concepts, with normalization to Third Normal Form and Boyce-Codd Normal Form [74].

PONDR® VLXT and VL3 predictors

The predictors of intrinsic disorder used in this paper are PONDR® VLXT and VL3 [18,19,21]. PONDR® VLXT was built using 15 proteins whose structures were elucidated using X-ray diffraction, NMR spectroscopy, circular dichroism spectroscopy, or limited proteolysis [19]. PONDR® VL3, on the other hand, was built using a combination of 30 neural networks and a training set of disordered regions from 150 proteins [21].

Relational database and entity relationship diagram

In order to carry out a comparative study of the viral proteins, it was necessary to develop a database that would capture the information from the amino acid sequence and the disorder prediction. The list of proteins of interest included viral proteins of lentiviruses and orthomyxoviruses. Searches were done on the list using the Entrez website [22]. Available samples were randomly chosen, with preference given to those with longer chains and those with binding partners. Whenever possible, corresponding viral proteins of different virus strains were included as samples and annotated. The respective FASTA and PDB [22] files were downloaded and stored using a JAVA® program, and the list was prepared. In order to provide a benchmark for predicted disorder, a set of proteins from PDB-Select 90 was randomly chosen and downloaded to the database. The mean and standard deviation were calculated using bootstrapping techniques when necessary [75].
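The bootstrapping step mentioned above is not described in detail, so the following Python sketch shows only one common way to obtain a bootstrap mean and its spread. The function name, the number of resamples, the fixed seed, and the toy percentages are assumptions made for illustration; they are not taken from the paper or from [75].

```python
import random
from statistics import mean, stdev


def bootstrap_mean_sd(values, n_resamples=1000, seed=0):
    """Estimate the mean and the spread of the mean by resampling.

    Each resample draws len(values) items with replacement; the function
    returns the mean of the resampled means and their standard deviation.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to bootstrap")
    if n_resamples < 2:
        raise ValueError("need at least two resamples")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    resampled_means = [mean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return mean(resampled_means), stdev(resampled_means)


# Hypothetical per-protein percentages of predicted disorder.
toy_percentages = [12.0, 30.0, 25.0, 18.0, 41.0, 22.0, 27.0, 15.0]
boot_mean, boot_sd = bootstrap_mean_sd(toy_percentages)
```

The spread returned here describes the uncertainty of the estimated mean, which is one plausible reading of a "mean ± value" entry; whether the published tables report this quantity or the spread across proteins is not stated in the text.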
PDB-Select 90 [23] is defined as a representative, non-redundant subset of the PDB [22], made up of proteins that have no more than 90% sequence identity [23].

[Figure 3]
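The kind of relational layout used in this section can be sketched in Python as follows. This is a heavily simplified illustration, not the authors' schema: only the names seq_access and seq_predn come from the text, and treating seq_access as a table is itself an assumption. Every column name, the toy record, and the 0.5 cutoff are hypothetical, and SQLite (via Python's standard sqlite3 module) stands in for the MySQL server used in the study. The seq_access_atom name is omitted because its contents are not described.

```python
import sqlite3

# Hypothetical columns; only the table names appear in the paper.
SCHEMA = """
CREATE TABLE seq_access (
    accession TEXT PRIMARY KEY,
    virus     TEXT NOT NULL,
    protein   TEXT NOT NULL,
    sequence  TEXT NOT NULL
);
CREATE TABLE seq_predn (
    accession TEXT    NOT NULL REFERENCES seq_access(accession),
    predictor TEXT    NOT NULL,
    position  INTEGER NOT NULL,
    score     REAL    NOT NULL,
    PRIMARY KEY (accession, predictor, position)
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up protein record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO seq_access VALUES (?, ?, ?, ?)",
        ("DEMO1", "toy virus", "toy protein", "MKTA"),
    )
    toy_scores = [0.9, 0.2, 0.7, 0.1]  # invented per-residue scores
    conn.executemany(
        "INSERT INTO seq_predn VALUES (?, ?, ?, ?)",
        [("DEMO1", "VLXT", i + 1, s) for i, s in enumerate(toy_scores)],
    )
    conn.commit()
    return conn


def percent_disorder(conn, accession, predictor, cutoff=0.5):
    """Percentage of residues scoring at or above the (assumed) cutoff.

    Returns None when no prediction rows match.
    """
    row = conn.execute(
        "SELECT 100.0 * SUM(score >= ?) / COUNT(*) "
        "FROM seq_predn WHERE accession = ? AND predictor = ?",
        (cutoff, accession, predictor),
    ).fetchone()
    return row[0]


conn = build_demo_db()
demo_percent = percent_disorder(conn, "DEMO1", "VLXT")
```

Storing one row per residue and predictor keeps the prediction table in a normalized form and lets percentages for any protein, strain or predictor be computed with a single query.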
Using a set of programs written in JAVA®, the PDB and FASTA files were searched, and the essential information was placed in the MySQL tables using accessions seq_access and seq_access_atom. The necessary FASTA files were then used to generate PONDR® VLXT and PONDR® VL3 scores via a LINUX BASH shell script. Another JAVA® program was then used to load the predictions into the seq_predn table. Information regarding the virus and its subtype was initially stored in a Microsoft® SQLSERVER database via C# and was later transferred to the MySQL database server.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GG designed and implemented the experiments. AKD and VNU provided advice and participated in writing the manuscript.

Acknowledgements

This work was supported in part by grants R01 LM007688-01A1 (to A.K.D. and V.N.U.) and GM071714-01A2 (to A.K.D. and V.N.U.) from the National Institutes of Health and by the Programs of the Russian Academy of Sciences for the "Molecular and cellular biology" and "Fundamental science for medicine" (to V.N.U.). We gratefully acknowledge the support of the IUPUI Signature Centers Initiative.

This article has been published as part of BMC Genomics Volume 9 Supplement 2, 2008: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/9?issue=S2

References