J Mol Biol. 1987 Jul 5;196(1):199-216.

Determinants of a protein fold. Unique features of the globin amino acid sequences.

MRC Laboratory of Molecular Biology, Cambridge, England.


The three-dimensional structures of globins are known, from crystallographic analyses, to be very similar. Their amino acid sequences, however, differ greatly. Only two residues are absolutely conserved in all sequences, and the residue identities of some pairs of sequences are only 16%. We have determined the nature and exact extent of the sequence variations and the extent to which the conserved features of the globin sequences are unique to this family. The 226 globin sequences now known were aligned and analysed. Because distantly related protein sequences cannot be aligned correctly without the use of structural data, we developed a method that incorporated structural information into the alignment procedure. Analysis of the aligned sequences shows that: (1) Although individual chains vary in size between 132 and 157 residues, deletions and insertions leave only 102 residue sites common to all globins. These sites form six separate regions. Insertions and deletions between these regions mean that their separations can vary in different sequences. (2) Within the conserved regions there are 32 sites that almost always contain hydrophobic residues. In the known structures, these sites are in the protein interior. We measured the variations in the size of the residues that occur in the 226 sequences at these sites. At six sites the residues differ in size by less than 40 Å³, at 11 sites they differ by 40 to 100 Å³, and at 15 sites they differ by more than 100 Å³. There are two other conserved buried sites: one contains the His linked to the haem iron and the other usually contains a His involved with the haem ligand. (3) Within the conserved regions there are another 32 sites that are almost always occupied by charged, polar or small non-polar (Gly or Ala) residues. In the known structures, these sites are on the protein surface.
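The size classification in point (2) can be illustrated with a minimal sketch. It assumes that "differ in size" means the spread between the largest and smallest residue volume observed at a site. The volume table holds approximate values for a few residues and is not necessarily the scale used in the paper, and the name `size_class` is hypothetical.

```python
# Approximate residue volumes in cubic angstroms. ASSUMPTION: illustrative
# values only; the abstract does not say which volume scale was used.
VOLUME = {
    "G": 60.1, "A": 88.6, "V": 140.0, "L": 166.7,
    "I": 166.7, "M": 162.9, "F": 189.9, "W": 227.8,
}


def size_class(residues):
    """Bin a site by the volume spread of the residues observed there.

    `residues` is an iterable of one-letter codes seen at one aligned site.
    The bin edges (40 and 100 cubic angstroms) are the ones quoted in the text.
    """
    volumes = [VOLUME[r] for r in set(residues)]
    spread = max(volumes) - min(volumes)
    if spread < 40:
        return "<40"
    if spread <= 100:
        return "40-100"
    return ">100"
```

For example, a site where only Leu and Ile occur falls in the smallest bin, while a site that tolerates both Gly and Phe falls in the largest.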
To determine the extent to which the conserved features found for the globin sequences are unique to that protein family, the following procedure was used. The six conserved regions, and the residue restrictions that occur at the 66 sites within these regions, were encoded into two "templates". One was based only on the sequences so far determined; the other was extended to include as yet unobserved substitutions that seemed plausible on the basis of size, hydrophobicity and polarity. Each of the 3286 non-globin sequences in the data bank was then examined by a computer program to see how closely it could be matched to these templates. (ABSTRACT TRUNCATED AT 400 WORDS)
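The abstract does not describe how the matching program worked, so the following is only a sketch of one way a template of this kind could be scored against a sequence. It assumes a template is a list of regions, each a list of allowed-residue sets, with a permitted range of spacings between consecutive regions, and it reports the fewest violated sites over all placements. The toy regions, residue classes, gap range and function names are all invented for illustration; a real template would hold six regions and 66 restricted sites.

```python
from functools import lru_cache

# ASSUMPTION: toy residue classes and a toy two-region template.
HYDROPHOBIC = set("AVLIMFWC")
SURFACE = set("DEKRNQSTGAH")

# Each site is a set of allowed residues; None marks an unrestricted site.
REGIONS = [
    [HYDROPHOBIC, None, SURFACE, HYDROPHOBIC],
    [SURFACE, HYDROPHOBIC, HYDROPHOBIC],
]
GAPS = [(0, 3)]  # (min, max) residues allowed between consecutive regions


def region_mismatches(seq, start, region):
    """Count restricted sites of `region` violated when placed at `start`."""
    return sum(
        1
        for offset, allowed in enumerate(region)
        if allowed is not None and seq[start + offset] not in allowed
    )


def best_mismatches(seq, regions=REGIONS, gaps=GAPS):
    """Return the fewest violated sites over all placements, or None if
    the template cannot fit inside the sequence at all."""

    @lru_cache(maxsize=None)
    def place(i, start):
        region = regions[i]
        end = start + len(region)
        if end > len(seq):
            return None  # region i does not fit at this position
        cost = region_mismatches(seq, start, region)
        if i == len(regions) - 1:
            return cost
        lo, hi = gaps[i]
        tails = [place(i + 1, end + g) for g in range(lo, hi + 1)]
        tails = [t for t in tails if t is not None]
        return cost + min(tails) if tails else None

    scores = [place(0, s) for s in range(len(seq))]
    scores = [s for s in scores if s is not None]
    return min(scores) if scores else None
```

A score of zero means every restricted site is satisfied. Under this scheme, the stricter and the extended templates from the text would differ only in the contents of the allowed-residue sets.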

[Indexed for MEDLINE]
