Copyright © 2002 The Protein Society

Toward genomic identification of β-barrel membrane proteins: Composition and architecture of known structures

Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, Louisiana 70112-2699

Reprint requests to: William C. Wimley, Department of Biochemistry SL43, Tulane University Health Sciences Center, New Orleans, LA 70112-2699; e-mail: wwimley/at/tulane.edu; fax: (504) 584-2739.

Received July 18, 2001; Revised October 29, 2001; Accepted November 1, 2001.

Abstract

The amino acid composition and architecture of all β-barrel membrane proteins of known three-dimensional structure have been examined to generate information that will be useful in identifying β-barrels in genome databases. The database consists of 15 nonredundant structures, including several novel, recent structures. Known structures include monomeric, dimeric, and trimeric β-barrels with between 8 and 22 membrane-spanning β-strands each. For this analysis the membrane-interacting surfaces of the β-barrels were identified with an experimentally derived, whole-residue hydrophobicity scale, and then the barrels were aligned normal to the bilayer and the position of the bilayer midplane was determined for each protein from the hydrophobicity profile. The abundance of each amino acid, relative to the genomic abundance, was calculated for the barrel exterior and interior. The architecture and diversity of known β-barrels were also examined. For example, the distribution of rise-per-residue values perpendicular to the bilayer plane was found to be 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane. Also, as noted by other authors, nearly every known membrane-spanning β-barrel strand was found to have a short loop of seven residues or less connecting it to at least one adjacent strand.
Using this information we have begun to generate rapid screening algorithms for the identification of β-barrel membrane proteins in genomic databases. Application of one algorithm to the genomes of Escherichia coli and Pseudomonas aeruginosa confirms its ability to identify β-barrels, and reveals dozens of unidentified open reading frames that potentially code for β-barrel outer membrane proteins.

Keywords: Proteomic, genomic, β-barrel, membrane protein, outer membrane, dyad repeat

The β-barrel is one of two known structural motifs for membrane-spanning proteins. As many as several hundred β-barrel species can be found in the outer membrane of Gram-negative bacteria (Schulz 2000; Alm et al. 2000; Molloy et al. 2000), and they also occur in the outer membranes of mitochondria (Benz 1994) and chloroplasts (Fischer et al. 1994). In addition to these native proteins, the β-barrel motif is also used by a large, diverse set of secreted membrane permeabilizing protein toxins and antibiotics that assemble into β-barrels on exogenous membranes (Saier 2000).

In a recent review, Schulz (2000) summarized the main structural features shared by all known β-barrel membrane proteins in a list of 10 explicit rules: in summary, known β-barrels are composed of an even number of membrane-spanning β-strands with an antiparallel β-meander topology. Neighboring strands in the barrel are connected by alternating long and short loops. The lipid-interacting outer surfaces of all β-barrels are hydrophobic, and have a band of aromatics near the bilayer interfaces, while the internal residues have an intermediate polarity. Known structures contain between 8 and 22 strands and include monomeric, dimeric, and trimeric β-barrels. Many of these features are apparent in the structure of the dimeric β-barrel phospholipase, OmpLA, which is shown in Figure 1.
One might assume that knowing these explicit rules would make the prediction of β-barrel structure and topology and the identification of β-barrels in genome databases readily solvable problems. In fact, several different types of structure prediction algorithms have been applied with mixed success (Schirmer and Cowan 1993; Fischbarg et al. 1995; von Heijne 1996), and recent structure prediction algorithms based on neural networks have been able to make reasonably accurate predictions of β-barrel structure and topology (Gromiha et al. 1997; Jacoboni et al. 2001). But these predictions were made for proteins already known to be β-barrel membrane proteins by other means. A more difficult part of the problem, and one that has not yet been solved, is the accurate identification of β-barrel membrane proteins in genome databases from physical principles. Currently, β-barrels are identified in genome annotations mainly by their homology to known β-barrels. Each Gram-negative bacterial genome has hundreds of "putative" and "probable" outer membrane proteins identified in this way. It would also be useful to be able to identify them through their fundamental physical properties so that novel classes of β-barrels can be identified, and so that the homology-based annotation can be verified. Because each bacterial genome has as many as 1000 hypothetical or unknown proteins that have not been classified at all, there are undoubtedly many β-barrel membrane proteins that have not yet been identified.

We are broadly interested in understanding β-barrel membrane proteins through a knowledge of their composition and physical properties and through parallel studies of how model β-sheets assemble in membranes (Bishop et al. 2001). In theory, a thorough understanding of the fundamental physical principles should contain sufficient information to allow researchers to determine if an unknown protein sequence is a β-barrel membrane protein.
For α-helical bundle membrane proteins this idea is a proven one; prediction algorithms based on the physical principle that membrane-spanning helices will have a contiguous stretch of 19 or more hydrophobic residues have very high accuracy (Rost et al. 1995; Casadio et al. 1996; Krogh et al. 2001), exceeding 99% in recent applications (S. Jayasinghe, K. Hristova, and S.H. White, 2001). However, β-barrel membrane proteins have been more difficult to identify from physical principles for several reasons. First, their hydrophobic, membrane-interacting residues are cryptic, hidden in the alternating inside-outside (dyad repeat) motif. Second, compared to helical membrane proteins, there are many fewer membrane-interacting residues on each strand, and this reduces the uniqueness of the membrane-spanning sequences. And third, some β-sheets in soluble proteins have, superficially, many of the same physical properties, such as similar strand length and amphipathicity, as the β-sheets of β-barrel membrane proteins. In this work we set out to analyze the composition and architecture of all β-barrel membrane proteins of known structure, including many new structures, and to generate a body of data that will be a useful starting point in the rapid identification of β-barrel membrane proteins in genome databases.

Results

The β-barrel database

All of the initial β-barrel structures published in the early 1990s belong to the closely related class of trimeric porins of 16 or 18 membrane-spanning β-strands. The architecture of this class of porins has been discussed in the literature (Seshadri et al. 1998). In the last few years, the total number of known β-barrel membrane proteins has nearly doubled, and the architectural diversity of known structures has increased significantly with the addition of new β-barrel membrane proteins having different functions, topology, and architecture.
For example, three-dimensional structures are now known for the monomeric, TonB-dependent transport proteins FepA (Buchanan et al. 1999) and FhuA (Locher et al. 1998), which have 22 β-strands each, and for the trimeric, single-barrel transporter TolC (Koronakis et al. 2000) in which each monomer contributes four β-strands to a 12-stranded barrel. New additions also include the first known dimeric β-barrel, OmpLA (Snijder et al. 1999), shown in Figure 1.

For this work we identified all β-barrel membrane proteins in the Protein Data Bank (Berman et al. 2000) and used a BLAST (Altschul et al. 1990) sequence alignment to screen each sequence against all other sequences in the PDB. For closely homologous or identical sequences (i.e., those with more than 70% conserved residues) we eliminated all but one member. The β-barrel database that we used in the calculations is described in detail in Table 1. It has 15 diverse members comprising a total of 210 membrane-spanning β-strands with more than 2000 amino acids in the membrane-spanning segments.
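The redundancy screen described above can be illustrated with a small sketch. It assumes that pairwise similarity values (for example, parsed from BLAST output) are already available as fractions between 0 and 1; the function name `nonredundant`, the protein names, and the numbers are hypothetical and are not taken from the paper. Only the 70% cutoff comes from the text.

```python
def nonredundant(names, identity, threshold=0.70):
    """Greedy filter: keep a sequence only if it is not too similar to any kept one.

    identity maps frozenset({a, b}) -> fraction of conserved residues (0..1).
    Pairs missing from the mapping are treated as unrelated (0.0).
    """
    kept = []
    for name in names:
        # Drop the sequence if it exceeds the cutoff against any representative.
        if all(identity.get(frozenset((name, k)), 0.0) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical pairwise values, for illustration only (not from the paper).
pairs = {
    frozenset(("protA", "protB")): 0.92,
    frozenset(("protA", "protC")): 0.31,
    frozenset(("protB", "protC")): 0.29,
}
representatives = nonredundant(["protA", "protB", "protC"], pairs)
print(representatives)
```

Because the filter is greedy, the surviving member of each homologous group depends on input order; the text does not say how the retained member was chosen.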
Identification of membrane-spanning segments

Three features, which are present in all β-barrel structures, were used to align the XY plane of each protein's Cartesian coordinates with the putative plane of the bilayer: the band of aromatics that lies in the bilayer interfacial region (Schiffer et al. 1992; von Heijne 1994; Yau et al. 1998), the band of charged residues just outside of the aromatics, and the band of aliphatic residues that interact with the hydrocarbon core of the bilayer (see Fig. 1).

After aligning the structures along the bilayer normal, we identified all β-strands in each structure using the annotation in the PDB datafile, and we identified the β-strands that span the membrane by inspection of molecular graphics images. One additional residue beyond the designated membrane-spanning β-sheet was also included in each strand segment. Residues in a membrane-spanning strand were designated as either exposed, internal, or involved in protein–protein interfaces. Exposed residues were those whose Cα to Cβ vector extended away from the axis of the barrel and whose side chain was more than 50% "solvent" exposed on the barrel surface. Internal residues were those whose Cα to Cβ vector pointed towards the interior of the barrel. The geometry of β-sheet secondary structure places side chains on alternating inner and outer surfaces of the β-sheet, so this distinction is unambiguous. We classified the numerous glycine residues in the β-barrel database by the orientation of their Cα-H vectors and the exposure of the α carbon. We did not differentiate between internal residues that were exposed to water within an aqueous pore and those that were buried in the protein. Residues in protein–protein contacts were those residues whose Cα to Cβ vector was oriented out from the barrel axis, but whose side chain was not exposed in the multimer structure because of protein–protein contacts.
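The inward/outward test on the Cα to Cβ vector can be sketched as below. This is only an illustration under stated assumptions: the barrel axis is taken to be parallel to z (true after alignment to the bilayer normal) and is approximated by the XY centroid of the Cα atoms; the function names are hypothetical. The 50% exposure criterion and the Cα-H rule for glycine are not implemented here.

```python
def barrel_axis_xy(ca_atoms):
    """Approximate the barrel axis as the XY centroid of the CA atoms (assumption)."""
    n = len(ca_atoms)
    return (sum(a[0] for a in ca_atoms) / n, sum(a[1] for a in ca_atoms) / n)


def facing(ca, cb, axis_xy):
    """Return 'out' if the CA->CB vector points away from the barrel axis, else 'in'.

    ca, cb: (x, y, z) coordinates; axis_xy: (x, y) of an axis assumed parallel to z.
    """
    radial = (ca[0] - axis_xy[0], ca[1] - axis_xy[1])  # axis -> CA, in the XY plane
    side = (cb[0] - ca[0], cb[1] - ca[1])              # CA -> CB, in the XY plane
    dot = radial[0] * side[0] + radial[1] * side[1]
    return "out" if dot > 0 else "in"


# Toy coordinates (Å), for illustration only.
axis = (0.0, 0.0)
outward = facing((10.0, 0.0, 0.0), (11.5, 0.0, 0.3), axis)
inward = facing((0.0, 10.0, 3.0), (0.0, 8.6, 3.2), axis)
print(outward, inward)
```

An outward-facing residue would still need the exposure check to separate lipid-exposed positions from protein–protein contacts.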
Because we are trying to characterize and exploit the unique physical properties of the membrane-interacting surfaces of these proteins, we have excluded the residues in protein–protein contacts from the database. The properties and composition of these residues, which are similar to protein–protein interfaces in soluble proteins, have been discussed (Seshadri et al. 1998).

Identification of the bilayer midplane with hydrophobicity profiles

Hydrophobicity profiles for the external and internal residues for all XY-aligned structures were calculated by summing the hydrophobicity of all β-strand residues within a 5-Å sliding window that was moved along the axis of the bilayer normal. Examples of hydrophobicity profiles for external residues are shown in Figure 2A and B.
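A minimal sketch of such a profile follows. It assumes each residue is reduced to a z-coordinate along the bilayer normal and a one-letter code, and that free energies follow the convention in which negative values are hydrophobic. The scale values in the example are placeholders for illustration, not the published octanol scale, and the function name and step size are hypothetical.

```python
def hydrophobicity_profile(residues, scale, window=5.0, step=1.0,
                           z_min=-30.0, z_max=30.0):
    """Sum residue free energies inside a window slid along the bilayer normal.

    residues: list of (z, one_letter_code); scale: code -> free energy.
    Returns a list of (z_center, summed_free_energy).
    """
    profile = []
    n_steps = int(round((z_max - z_min) / step))
    for i in range(n_steps + 1):
        z = z_min + i * step
        # Include every residue whose position lies within half a window of z.
        total = sum(scale[aa] for (rz, aa) in residues if abs(rz - z) <= window / 2)
        profile.append((z, total))
    return profile


# Placeholder scale and toy residues, for illustration only.
toy_scale = {"L": -1.0, "V": -0.5, "K": 2.0}
toy_residues = [(0.0, "L"), (1.0, "V"), (12.0, "K")]
profile = hydrophobicity_profile(toy_residues, toy_scale)
print(dict(profile)[0.0], dict(profile)[12.0])
```

Running the function separately on exposed and internal residues would give the two kinds of profile described in the text.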
The midpoint of the negative ΣΔG band, as delineated by the crossover points, was taken to be the midpoint of the bilayer. We transformed the coordinates of the β-barrel structures so that the bilayer midplane for all structures was set to z = 0. This places all of the proteins in the database on a universal "bilayer" coordinate system. The transbilayer profiles for all of the β-barrel proteins in the database (e.g., Fig. 2A,B) were remarkably similar. Composite profiles calculated from the sum of all the β-barrels are shown in Figure 3A and B.
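The midplane step can be sketched as follows, assuming the profile is a list of (z, ΣΔG) samples ordered by z. The crossover points are located by linear interpolation between neighboring samples, which is an assumption of this sketch; the text does not specify how they were determined. The function name is hypothetical.

```python
def bilayer_midplane(profile):
    """Midpoint of the widest negative band bounded by two crossover points.

    profile: list of (z, summed_free_energy), sorted by z.
    Samples that are exactly zero are not treated as crossings in this sketch.
    """
    entry, bands = None, []
    for (z0, g0), (z1, g1) in zip(profile, profile[1:]):
        if g0 * g1 < 0:                            # sign change between samples
            zc = z0 + (z1 - z0) * g0 / (g0 - g1)   # linear interpolation to zero
            if g0 > 0:                             # entering the negative band
                entry = zc
            elif entry is not None:                # leaving the negative band
                bands.append((entry, zc))
                entry = None
    if not bands:
        raise ValueError("no negative band bounded by two crossover points")
    lo, hi = max(bands, key=lambda b: b[1] - b[0])
    return (lo + hi) / 2.0


# Toy profile whose hydrophobic band is centered at z = 3 Å.
toy_profile = [(-17.0, 2.0), (-7.0, -2.0), (3.0, -4.0), (13.0, -2.0), (23.0, 2.0)]
mid = bilayer_midplane(toy_profile)
shifted = [(z - mid, g) for z, g in toy_profile]   # midplane now at z = 0
print(mid)
```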
Composition of β-barrels

The β-barrel database contains 1592 amino acids in membrane-spanning β-barrels that are either exposed or internal and about 400 additional residues that are found at protein–protein interfaces. Raw abundance (Fig. 4) […]
The information content of an amino acid abundance measurement such as those shown in Figure 4A and B […]
Architecture of β-barrels

The goal of this work is to obtain information from known β-barrels that will be useful in characterizing unknown sequences in genome databases. Thus, we also need to explore the architecture and architectural diversity of known structures. The most relevant architectural variable is the rise per residue of the β-strands along the direction normal to the bilayer plane. Simulations have shown that the shear number and tilt angle of β-barrels can vary within certain bounds (Murzin et al. 1994; Sansom and Kerr 1995), as reflected in the known structures. Although the maximum possible rise per residue is about 3.6 Å for a β-strand perpendicular to the bilayer, known structures (Schulz 2000) and theory (Sansom and Kerr 1995) suggest that tilted strands are energetically preferred. We determined the distribution of β-barrel rise per residue values at the bilayer midplane by calculating the value, over the three residues closest to the midplane, for each membrane-spanning strand. The results, shown in Figure 6, […]
We also calculated the distribution of loop length in the β-barrels in the database. These data are shown in Figure 7.
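The two architectural measurements in this section, rise per residue at the midplane and loop length, can be sketched as below. The sketch assumes that the "three residues closest to the midplane" form a window centered on the residue with the smallest |z| in bilayer coordinates, and that strands are given as inclusive (first, last) residue numbers; both conventions and the function names are assumptions, not details from the paper.

```python
def rise_per_residue(strand_z):
    """Rise per residue (Å) over the three residues nearest the midplane (z = 0).

    strand_z: z-coordinates of consecutive residues along one strand.
    """
    if len(strand_z) < 3:
        raise ValueError("need at least three residues")
    # Residue closest to the midplane, clamped so a 3-residue window fits.
    center = min(range(len(strand_z)), key=lambda i: abs(strand_z[i]))
    center = min(max(center, 1), len(strand_z) - 2)
    # Two residue-to-residue steps separate the window's end points.
    return abs(strand_z[center + 1] - strand_z[center - 1]) / 2.0


def loop_lengths(strands):
    """Number of residues between consecutive strands.

    strands: list of (first_residue, last_residue), inclusive, in sequence order.
    """
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(strands, strands[1:])]


# Toy data, for illustration only.
rise = rise_per_residue([-8.1, -5.4, -2.7, 0.0, 2.7, 5.4])
loops = loop_lengths([(5, 14), (18, 27), (40, 49)])
print(rise, loops)
```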
Discussion

Uniqueness of membrane β-barrel dyad repeats

Membrane-spanning β-strands, like all β-sheets, have a dyad repeat topology in which alternating residues are oriented toward alternating faces of the sheet. In β-barrel membrane proteins about half of the membrane-spanning residues are hydrophobic residues that are oriented toward the membrane lipids, while the other half are more hydrophilic residues that are oriented towards the interior of the barrel. Several β-barrel identification algorithms have been developed, in part, on the idea that membrane β-barrels could be recognizable through the dyad repeat of hydrophobic (external) and hydrophilic (internal) residues (e.g., Fischbarg et al. 1995). However, difficulties arise when genome databases are screened for β-barrel membrane proteins using this simple idea because the interiors of membrane-spanning β-barrels are not necessarily very hydrophilic, and because many soluble β-sheets also have a similar dyad repeat motif in which one hydrophobic face of a sheet is buried and one hydrophilic face is more exposed to the aqueous phase. Our goal in this work was to use the known β-barrels to generate a data set based on the observed abundance of the amino acids and the architecture of β-barrel membrane proteins that will further help to differentiate β-barrel membrane proteins from the abundant amphipathic β-sheets of soluble proteins. From the strand length distribution shown in Figure 6 […]
β-barrel profiles

An example of a 10 residue sliding window score profile using the abundance data in Table 2 is shown in Figure 10A.
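One way such a sliding-window profile could be computed is sketched below. The scoring rule is an assumption made for illustration, not the scoring function used in the paper: alternate positions in each 10-residue window are scored against exterior and interior relative-abundance tables, both dyad registers are tried, and the better average is kept. The abundance values are placeholders and do not come from Table 2.

```python
def window_scores(seq, external, internal, window=10):
    """Score each window by the better of the two dyad-repeat registers.

    external, internal: residue -> relative abundance (1.0 means genomic average).
    Residues missing from a table are scored as 1.0.
    """
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        best = max(
            sum((external if (i + phase) % 2 == 0 else internal).get(aa, 1.0)
                for i, aa in enumerate(segment)) / window
            for phase in (0, 1)
        )
        scores.append(best)
    return scores


# Placeholder abundance tables, for illustration only (not Table 2).
toy_external = {"L": 2.0, "S": 0.5}
toy_internal = {"L": 0.5, "S": 2.0}
scores = window_scores("LSLSLSLSLSLS", toy_external, toy_internal)
print(scores)
```

Trying both registers means a window scores well whether its first residue faces the lipid or the barrel interior.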
To improve the ability to rapidly recognize β-barrels in genome databases and to simplify the sliding window average, we also incorporated the architectural data (Figs. 6 […]) […]

Screening of genomic data

These analyses are being conducted so that we can begin to develop methods for rapidly identifying potential β-barrels in genome databases. Potential β-barrels can then be further analyzed with neural network-based structure prediction algorithms (Gromiha et al. 1997; Jacoboni et al. 2001) and with molecular biology and proteomics tools (Molloy et al. 2000). A rapid genomic screening algorithm requires a simple parameterization or scoring of each protein sequence. One feature we expect to find in all β-barrel membrane proteins is a set of roughly 5 to 15 peaks in the β-hairpin analysis like that in Figure 10B.
Using this simple and rapid scoring algorithm we have begun to analyze the whole genomes of Gram-negative bacteria. Here we discuss preliminary results from the genomes of Escherichia coli and Pseudomonas aeruginosa as examples. After scoring and ranking all the open reading frames in these two genomes, we examined the 125 highest scoring proteins for each genome. These proteins, which represent about 2.5% of all open reading frames, fall between 1.7 and 5.5 in β-barrel score (Fig. 11).
Conclusions

We have analyzed the amino acid composition and architecture of all β-barrel membrane proteins of known structure. These data have been used to develop a simple algorithm for rapidly screening genomes for potential β-barrel membrane proteins. Application of this algorithm to the genomes of the Gram-negative bacteria Escherichia coli and Pseudomonas aeruginosa has revealed dozens of potential β-barrel membrane proteins that have not previously been identified or annotated as such. Future experiments will be directed toward refinement of the screening algorithm and toward application of proteomics methods to determine if the potential β-barrels that we have identified can be expressed as β-barrel membrane proteins in bacterial outer membranes.

Materials and methods

Transformation of PDB coordinates to the bilayer plane

Each protein's XYZ PDB coordinates were transformed to align the "bilayer plane" of the protein with the XY plane of the coordinate system. First, the PDB coordinate file was converted to a kinemage file using PreKin (Richardson and Richardson 1994). With the program Mage (Richardson and Richardson 1994) we viewed the kinemage and used the position of the external aromatics, aliphatics, and charged residues to align each protein with the XY plane. The transformation matrix was obtained from Mage and used in a modified version of the program KinPlot (Wimley et al. 1994) to transform the coordinates and rewrite them in PDB format. The output of this procedure is a PDB format file in which the plane of the bilayer is coincident with the XY plane of the atomic coordinate system. Alignment of the proteins along the z-axis is described in the text. All the software used in this work that is not publicly obtainable is available from the author upon request.
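As an illustration of the coordinate step only, the sketch below applies a 3×3 rotation matrix and an optional shift to the fixed-column coordinates of PDB ATOM/HETATM records and writes them back in the same format. It is a stand-in written for this example, not the modified KinPlot program; the function name, the matrix, and the sample record are hypothetical.

```python
def transform_atom_line(line, rot, shift=(0.0, 0.0, 0.0)):
    """Rotate and shift the coordinates of one PDB ATOM/HETATM record.

    rot: 3x3 matrix as nested lists; shift: (dx, dy, dz) added after rotation.
    Other record types are returned unchanged.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line
    # PDB fixed columns 31-38, 39-46, 47-54 hold x, y, z (format 8.3f).
    xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
    new = [sum(rot[r][c] * xyz[c] for c in range(3)) + shift[r] for r in range(3)]
    return line[:30] + "".join(f"{v:8.3f}" for v in new) + line[54:]


# Hypothetical record: 30-character prefix, coordinates, occupancy and B-factor.
example = "ATOM      1  CA  ALA A   1    " + "  11.000  22.000  33.000" + "  1.00  0.00"
rot_z90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # 90 degree rotation about z
moved = transform_atom_line(example, rot_z90, shift=(0.0, 0.0, -3.0))
print(moved)
```

The shift argument could carry the z offset that places the bilayer midplane at z = 0.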
Hydrophobicity profiles

Hydrophobicity profiles were calculated over a 5-Å sliding average window, which was moved across the protein in the bilayer coordinate system along a line normal to the bilayer. The "location" of each residue was taken to be the XYZ coordinates of the β-carbon, or the α-carbon for glycine. We examined the differences that would occur in the locations of long polar side chains, such as lysine, if we instead used the position of the polar side-chain moiety, but we found only small net differences from the position of the β-carbon (~1 Å or less). The octanol hydrophobicity scale, which has been discussed in detail elsewhere (Wimley et al. 1996; White and Wimley 1998; White and Wimley 1999), is based on the partitioning of peptides of the form AcWL-X-LL into bulk octanol. The scale is less permissive of polar residues, and appears to be a good scale for mimicking the environment of membrane proteins.

Electronic supplemental material

Electronic supplemental material consists of tabulated amino acid abundance data (Table 2) and tables of sorted β-barrel scores for the complete genomes of the two Gram-negative bacteria discussed in the text: Escherichia coli and Pseudomonas aeruginosa. After the file header, the genomic data are given in five columns: β-barrel score (sorted), protein length, number of peaks in the β-hairpin score greater than 4.0 (Fig. 10) […]

Acknowledgments

The New Orleans Protein Folding Intergroup is gratefully acknowledged for many invaluable discussions, and we thank Samuel J. Landry and William F. Walkenhorst for critically reading the manuscript. We are indebted to Dr. Harald Engelhardt (Max-Planck Institute for Biochemistry, Munich) for sending the coordinates of Omp32 before their release from the PDB. Funded by NIH (GM60000) and the Louisiana Board of Regents Support Fund 1999-02-RD-A-43.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

Notes

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.29402

References