Nucleic Acids Res. 2006 Jan 1; 34(Database issue): D777–D780.
Published online 2005 Dec 28. doi:  10.1093/nar/gkj053
PMCID: PMC1347416

Epitome: database of structure-inferred antigenic epitopes


Immunoglobulin molecules specifically recognize particular areas on the surface of proteins. These areas are commonly dubbed B-cell epitopes. The identification of epitopes in proteins is important for the design of both experiments and vaccines. Additionally, the interactions between epitopes and antibodies have often served as a model for protein–protein interactions. One of the main obstacles in creating a database of antigen–antibody interactions is the difficulty in distinguishing between antigenic and non-antigenic interactions. Antigenic interactions involve specific recognition sites on the antibody's surface, while non-antigenic interactions are between a protein and any other site on the antibody. To solve this problem, we performed a comparative analysis of all protein–antibody complexes for which structures have been experimentally determined. Additionally, we developed a semi-automated tool that identifies the antigenic interactions within the known antigen–antibody complex structures. We compiled those interactions into Epitome, a database of structure-inferred antigenic residues in proteins. Epitome consists of all known antigen/antibody complex structures, a detailed description of the residues that are involved in the interactions, and their sequence/structure environments. Interactions can be visualized using an interface to Jmol. The database is available at http://www.rostlab.org/services/epitome/.


Protein–antigen structures

Antigen–antibody complexes have long been used as a model for understanding the general phenomenon of molecular recognition (1–5). The number of experimental high-resolution 3D structures of antibody–antigen complexes in the PDB (6) has increased significantly over recent years. Several groups have used these data to analyze and characterize antigenic interactions, i.e. interactions between the protein (the antigen) and the Complementarity Determining Regions (CDRs) of the antibody (7,8). An important first step in studying antigenic interactions is the characterization of CDRs. MacCallum et al. (8) observed that the hypervariable loops of CDRs adopt only a limited number of backbone conformations that are determined by a few key residues. Two recent studies have suggested that the amino acid composition and the length of CDRs determine the type of antigen that can be bound (9,10). Several studies have attempted to differentiate the residues on the antigen surface that are involved in the antigenic interaction from all others (5,7,11). The results of these studies were rather inconsistent. Differences in the data sets chosen (some of which were very small) and in the methodologies may explain some of those inconsistencies. Most importantly, however, the definitions of the CDRs often differed greatly, i.e. even if two studies investigate the same PDB complex and use the same methodology, they might disagree on which of the interactions are antigenic (7). An important ramification of this problem was unveiled by Blythe and Flower (12), who showed that most existing B-cell epitope prediction methods do not work adequately. One explanation for this observation could be that most methods rely on inaccurate identifications of epitopes.

Definition of the CDRs

Antibodies are composed of a skeleton of beta-sheets. Most of the amazing variety of antibodies is realized by differences in six hypervariable loops of the CDRs. Therefore, the CDRs have previously been defined through these six loops. The first definition of CDRs was as regions in the Kabat sequence variability plot (13,14). The residues in these regions are identified through an alignment between the query sequence and a consensus motif for antibodies. Although widely used, the Kabat CDR definitions can be problematic because CDRs that are in structural loops often have very unusual sequences that are not captured by regular sequence motifs (15). In fact, any method based only on sequence information is prone to misaligning and therefore mis-assigning loopy CDRs. Chothia and co-workers (16) therefore based their CDR identification on structural information. Initially, hypervariable loops were defined according to a few structures. Later, the numbering of the residues that was used to locate the CDRs was changed to account for structures that became available subsequently (17). Studies also differ in their definition of secondary structures, thereby increasing the inconsistency in defining hypervariable loops. Additional disadvantages of both the Kabat and the Chothia et al. methods are described elsewhere (http://www.bioinf.org.uk/abs/).

Here, we address these problems through a comprehensive study of all known antigen–antibody complexes in the PDB. Analyzing the structures, we identified the consensus residues on the antibodies and thereby identified the CDRs on all known protein–antibody complexes (details below). This initial set of CDRs facilitated the automatic generation of a database with all known antigenic residues in the PDB; we also included the sequence environment and a detailed description of the CDR with which they interact. Several databases of antibody–antigen complex structures are available (15,18,19). Some of these databases focus on the structural aspects of the interaction (19,20). There are also databases that compile B-cell epitopes without their corresponding antibodies (12,21). However, none of these databases explicitly locates the CDRs or identifies the antigenic residues semi-automatically. In this sense, our resource is more comprehensive and easily adjustable to growing data, as more 3D structures of antigen–antibody complexes become available. Thus, the databases mentioned above, particularly the ones that are not structure based, are complementary to Epitome.


Extraction of 3D structures and identification of CDRs

To identify all structures in the PDB that contain at least one antibody–antigen complex, we searched the PDB with BLAST (22), using a consensus sequence of an antibody as the query. The rationale for using BLAST rather than PSI-BLAST was to avoid capturing molecules such as T-cell receptors which, despite their similarity to antibodies, participate in cell-mediated immune response, and therefore represent a different type of antigenic interaction. We then added PDB structures that contain an immunoglobulin fold from the Structural Classification of Proteins database (SCOP) (23) and PDB entries that are identified as antibody–antigen complexes through keywords (e.g. ‘antibody’ and ‘antigen’). We discarded all complexes with T-cell receptors or MHC molecules, since these are formed during cell-mediated immune response. We labeled two residues as interacting if any of their respective atoms were within ≤6 Å of each other (24). This resulted in our final list of interactions between antibodies and antigens. Thus, we define an antibody–antigen interaction as spatial proximity between a residue within the CDRs and a residue on the surface of the antigenic protein.
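The contact criterion above can be illustrated with a minimal Python sketch. It is not the code used to build Epitome; the data layout (a dictionary from a residue label to a list of atom coordinates) and the function names `residues_in_contact` and `find_contacts` are assumptions made for this example. Only the 6 Å cutoff and the any-atom-pair rule come from the text.

```python
from itertools import product
from math import dist

CUTOFF = 6.0  # distance cutoff in Å, as stated in the text


def residues_in_contact(atoms_a, atoms_b, cutoff=CUTOFF):
    """Return True if any atom of one residue lies within `cutoff` Å
    of any atom of the other residue."""
    return any(dist(a, b) <= cutoff for a, b in product(atoms_a, atoms_b))


def find_contacts(antibody, antigen, cutoff=CUTOFF):
    """Return sorted (antibody_residue, antigen_residue) label pairs in contact.

    Both arguments map a residue label (hypothetical format, e.g. "H:33")
    to a list of (x, y, z) atom coordinates in Å.
    """
    return sorted(
        (ab_id, ag_id)
        for ab_id, ab_atoms in antibody.items()
        for ag_id, ag_atoms in antigen.items()
        if residues_in_contact(ab_atoms, ag_atoms, cutoff)
    )


# Toy coordinates, invented purely for illustration.
antibody = {"H:33": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]}
antigen = {"C:45": [(6.5, 0.0, 0.0)], "C:80": [(20.0, 0.0, 0.0)]}
print(find_contacts(antibody, antigen))
```

The all-against-all loop is quadratic in the number of atoms; for whole PDB entries a spatial index (for example a k-d tree) would be the usual choice, but the criterion itself stays the same.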

We located the CDRs in the known protein–antibody complexes through the following knowledge-based approach. We began by creating multiple structure alignments of antibody structures using SKA (25,26). Since the light and heavy chains have different CDRs, we performed two separate multiple structure alignments, one for each type of antibody chain. Additionally, because our database included several redundant sequences, we ran the structural alignment program on a sequence-unique subset of all protein–antibody complexes. As antibody sequences are highly similar to each other, the criterion for the redundancy of the complex set was determined by the antigen sequences; sequence redundancy was reduced at HSSP-values of 0 (corresponding to <33% pairwise sequence identity for long alignments) (27–30). Then, we identified structurally aligned positions that interact with a protein in more than 10% of the complexes of the alignment. We defined the borders of the CDRs through those highly populated positions. Given the CDRs in the aligned antibodies, we transferred their location to the antibody chains of the corresponding sequence–structure family that they represent by structural pairwise alignments using Combinatorial Extension (CE) (31) (Figure 1). Finally, we defined all the residues on the protein surface that are in contact with the residues on the antibody CDRs as antigenic residues.
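The step that turns contact frequencies into CDR borders can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be, for each complex in the alignment, the set of alignment columns whose antibody residue contacts the antigen. The "more than 10%" threshold is taken from the text. The text does not say how populated positions are merged into regions, so the `max_gap` parameter and both function names are hypothetical.

```python
from collections import Counter


def populated_columns(contact_columns, n_columns, min_fraction=0.10):
    """Return alignment columns that contact the antigen in MORE than
    `min_fraction` of the aligned complexes.

    contact_columns: one set of column indices per complex.
    """
    n_complexes = len(contact_columns)
    if n_complexes == 0:
        return []
    counts = Counter(col for cols in contact_columns for col in cols)
    return [c for c in range(n_columns) if counts[c] / n_complexes > min_fraction]


def segments(columns, max_gap=2):
    """Merge sorted column indices into (start, end) regions.

    Columns separated by at most `max_gap` unpopulated columns are joined.
    The gap tolerance is an assumption made for this sketch.
    """
    regions = []
    for c in columns:
        if regions and c - regions[-1][1] <= max_gap + 1:
            regions[-1][1] = c  # extend the current region
        else:
            regions.append([c, c])  # start a new region
    return [tuple(r) for r in regions]


# Ten toy complexes aligned over twelve columns (invented data).
contacts = [{3, 4, 9}, {3, 4, 9}, {3, 9}, {3}, {3, 5}] + [set()] * 5
cols = populated_columns(contacts, n_columns=12)
print(cols, segments(cols))
```

In this toy input, column 5 is contacted in exactly 10% of the complexes and is therefore excluded, because the criterion in the text is strictly more than 10%.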

Figure 1
Antigenic residues according to Epitome. Complex structure of quail lysozyme (in blue) and the light chain of an antibody (in green), as taken from PDB ID 1bql (33). The residues that are defined to be in CDR 1 of the light chain according to Kabat definition ...

Content statistics

Epitome currently contains 142 antigens from protein–antibody complex structures, with a total of 10 180 antigenic interactions. Of these complexes, 63 contain sequence-unique antigens, i.e. no two of these 63 antigens are similar enough in sequence to enable coarse-grained homology modeling.

Input and fields

Epitome users can search for epitopes either by querying the database or by entering a sequence and ‘BLASTing’ for similar sequences that are stored in the database. The fields that can be queried include one or more of the following: PDB identifier (four-character code used by the PDB, e.g. 1pdb); antigen chain ID (PDB identifier for the chain of the antigen, e.g. 1pdb_C); antigen residue type (one-letter code for amino acids, e.g. Y corresponds to tyrosine); antigen residue secondary structure state as defined by DSSP (32) (one-letter code; GHI corresponds to helical structures, EB to strands and TSL to other); antigen residue solvent accessibility (the input is the accessible surface in Å² as defined by DSSP (32), and the search returns all residues with accessibility values greater than or equal to the input value); antigen residue position (the residue number as annotated in the PDB file); heavy/light chain (the interaction involves residues that are located on the light chain, the heavy chain or both chains of the antibody); antibody chain identifier (similar to the antigen chain identifier); antibody residue type (one-letter code for amino acids, e.g. C corresponds to cysteine); antibody residue position in the PDB (the position of the antibody residue that is involved in the interaction as annotated by the PDB); and CDR number (possible values: 1, 2, 3).
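How a combined query over these fields behaves can be shown with a small Python sketch. The record layout, the field names and the `search` function are hypothetical and do not reflect the actual Epitome schema or web interface. Only the grouping of DSSP codes (GHI, EB, TSL) and the "greater than or equal to" rule for accessibility are taken from the description above.

```python
from dataclasses import dataclass

# Three-state grouping of DSSP codes as described in the text.
DSSP_CLASS = {
    **dict.fromkeys("GHI", "helix"),
    **dict.fromkeys("EB", "strand"),
    **dict.fromkeys("TSL", "other"),
}


@dataclass(frozen=True)
class Interaction:
    """One antigenic interaction (hypothetical record layout)."""
    pdb_id: str
    antigen_chain: str
    antigen_residue: str      # one-letter amino acid code
    antigen_position: int     # residue number from the PDB file
    dssp_state: str           # one-letter DSSP code
    accessibility: float      # accessible surface in Å²
    antibody_chain_type: str  # "heavy" or "light"
    cdr: int                  # 1, 2 or 3


def search(records, *, residue=None, dssp_class=None,
           min_accessibility=None, chain_type=None, cdr=None):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for r in records:
        if residue is not None and r.antigen_residue != residue:
            continue
        if dssp_class is not None and DSSP_CLASS.get(r.dssp_state) != dssp_class:
            continue
        # Accessibility is a lower bound: keep values >= the input value.
        if min_accessibility is not None and r.accessibility < min_accessibility:
            continue
        if chain_type is not None and r.antibody_chain_type != chain_type:
            continue
        if cdr is not None and r.cdr != cdr:
            continue
        hits.append(r)
    return hits


# Invented example records; "1pdb" is the placeholder identifier from the text.
records = [
    Interaction("1pdb", "C", "Y", 53, "H", 80.0, "heavy", 3),
    Interaction("1pdb", "C", "Y", 20, "E", 15.0, "light", 1),
    Interaction("1pdb", "C", "K", 97, "T", 120.0, "heavy", 2),
]
for hit in search(records, residue="Y", min_accessibility=50):
    print(hit)
```

Criteria that are left unset do not restrict the result, which mirrors the "one or more of the following" wording of the field list.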


Results for database queries are presented as a table that lists all features of the result sets (Figure 2). The antigen results include the residues in the environment of the antigenic residue (highlighted in red). If a user performs a BLAST sequence search against the Epitome database to find PDB structures containing antigens with similar sequences, the output lists all complex structures that contain proteins with a high degree of similarity to the input sequence, together with the corresponding E-value and BLAST score of each pairwise sequence alignment. Additionally, each PSI-BLAST hit contains a link that can trigger another database query.

Figure 2
Screenshot of a database entry. Each line of the table represents a different antigenic interaction, i.e. an interaction of a protein surface residue with an antibody surface residue that is located on one of the antibody's 6 CDRs. Note that the search could ...


Since most Epitome entries were identified using the SCOP database, Epitome updates will follow updates of SCOP, i.e. Epitome will be updated twice a year as soon as SCOP updates its parseable files. Additionally, all the other programs used to create the database are installed locally and can be run automatically.


Thanks to Jinfeng Liu (Columbia) for computer assistance and to Andrew Kernytsky and Henry Bigelow for helpful comments on the manuscript. Thanks also to the anonymous reviewer for an immensely supportive, helpful and enjoyable critique. This work was supported by the grants RO1-GM64633-01 from the National Institutes of Health (NIH), and RO1-LM07329-01 from the National Library of Medicine (NLM). Last, not least, thanks to Helen Berman (Rutgers), Phil Bourne (UCSD) and their crews for maintaining an excellent PDB, and to all experimentalists who enabled this analysis by making their data publicly available. Funding to pay the Open Access publication charges for this article was provided by the National Library of Medicine (NLM).

Conflict of interest statement. None declared.


1. Jones S., Thornton J.M. Prediction of protein-protein interaction sites using patch analysis. J. Mol. Biol. 1997;272:133–143. [PubMed]
2. Lo Conte L., Chothia C., Janin J. The atomic structure of protein–protein recognition sites. J. Mol. Biol. 1999;285:2177–2198. [PubMed]
3. Chen R., Mintseris J., Janin J., Weng Z. A protein–protein docking benchmark. Proteins. 2003;52:88–91. [PubMed]
4. Jones S., Thornton J.M. Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 1997;272:121–132. [PubMed]
5. Jones S., Thornton J.M. Principles of protein-protein interactions. Proc. Natl Acad. Sci. USA. 1996;93:13–20. [PMC free article] [PubMed]
6. Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
7. Davies D.R., Cohen G.H. Interactions of protein antigens with antibodies. Proc. Natl Acad. Sci. USA. 1996;93:7–12. [PMC free article] [PubMed]
8. MacCallum R.M., Martin A.C., Thornton J.M. Antibody–antigen interactions: contact analysis and binding site topography. J. Mol. Biol. 1996;262:732–745. [PubMed]
9. Collis A.V., Brouwer A.P., Martin A.C. Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 2003;325:337–354. [PubMed]
10. Almagro J.C. Identification of differences in the specificity-determining residues of antibodies that recognize antigens of different size: implications for the rational design of antibody repertoires. J. Mol. Recognit. 2004;17:132–143. [PubMed]
11. Van Regenmortel M.H.V. Structure of Antigens. CRC Press, Inc.; 1992. 2000 Corporate Blvd, N.W., Boca Raton, Florida 33431.
12. Blythe M.J., Flower D.R. Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci. 2005;14:246–248. [PMC free article] [PubMed]
13. Wu T.T., Kabat E.A. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 1970;132:211–250. [PMC free article] [PubMed]
14. Johnson G., Wu T.T. Kabat Database and its applications: 30 years after the first variability plot. Nucleic Acids Res. 2000;28:214–218. [PMC free article] [PubMed]
15. Allcorn L.C., Martin A.C. SACS—self-maintaining database of antibody crystal structure information. Bioinformatics. 2002;18:175–181. [PubMed]
16. Chothia C., Lesk A.M., Tramontano A., Levitt M., Smith-Gill S.J., Air G., Sheriff S., Padlan E.A., Davies D., Tulip W.R. Conformations of immunoglobulin hypervariable regions. Nature. 1989;342:877–883. [PubMed]
17. Al-Lazikani B., Lesk A.M., Chothia C. Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. 1997;273:927–948. [PubMed]
18. Saha S., Bhasin M., Raghava G.P. Bcipep: a database of B-cell epitopes. BMC Genomics. 2005;6:79. [PMC free article] [PubMed]
19. Peters B., Sidney J., Bourne P., Bui H.H., Buus S., Doh G., Fleri W., Kronenberg M., Kubo R., Lund O., et al. The design and implementation of the immune epitope database and analysis resource. Immunogenetics. 2005;57:326–336. [PubMed]
20. Kaas Q., Ruiz M., Lefranc M.P. IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data. Nucleic Acids Res. 2004;32:D208–D210. [PMC free article] [PubMed]
21. McSparron H., Blythe M.J., Zygouri C., Doytchinova I.A., Flower D.R. JenPep: a novel computational information resource for immunobiology and vaccinology. J. Chem. Inf. Comput. Sci. 2003;43:1276–1287. [PubMed]
22. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
23. Murzin A.G., Brenner S.E., Hubbard T., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995;247:536–540. [PubMed]
24. Ofran Y., Rost B. Analysing six types of protein–protein interfaces. J. Mol. Biol. 2003;325:377–387. [PubMed]
25. Petrey D., Xiang Z., Tang C.L., Xie L., Gimpelev M., Mitros T., Soto C.S., Goldsmith-Fischman S., Kernytsky A., Schlessinger A., et al. Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins. 2003;53(Suppl. 6):430–435. [PubMed]
26. Petrey D., Honig B. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 2003;374:492–509. [PubMed]
27. Mika S., Rost B. UniqueProt: creating representative protein sequence sets. Nucleic Acids Res. 2003;31:3789–3791. [PMC free article] [PubMed]
28. Rost B. Twilight zone of protein sequence alignments. Protein Eng. 1999;12:85–94. [PubMed]
29. Sander C., Schneider R. Database of homology-derived structures and the structural meaning of sequence alignment. Proteins. 1991;9:56–68. [PubMed]
30. Schneider R., de Daruvar A., Sander C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1997;25:226–230. [PMC free article] [PubMed]
31. Shindyalov I.N., Bourne P.E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 1998;11:739–747. [PubMed]
32. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. [PubMed]
33. Chacko S., Silverton E.W., Smith-Gill S.J., Davies D.R., Shick K.A., Xavier K.A., Willson R.C., Jeffrey P.D., Chang C.Y., Sieker L.C., et al. Refined structures of bobwhite quail lysozyme uncomplexed and complexed with the HyHEL-5 Fab fragment. Proteins. 1996;26:55–65. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press