Mol Immunol. 2010 Jan;47(4):694-700. doi: 10.1016/j.molimm.2009.10.028. Epub 2009 Nov 24.

A germline knowledge based computational approach for determining antibody complementarity determining regions.

Author information

  • R&D Informatics, Centocor Discovery Research, San Diego, CA 92121, USA.


Determination of framework regions (FRs) and complementarity determining regions (CDRs) in an antibody is essential for understanding the underlying biology, as well as for antibody engineering and optimization. However, no computational algorithm has been available to delimit an antibody sequence, or a library of sequences, into FRs and CDRs in a coherent and automatic fashion. Based on the mapping relationships between mature antibody sequences and their corresponding germline gene segments, a novel computational algorithm has been developed for the automatic determination of CDRs. Although a human can make more than 10^12 different antibody molecules in its preimmune repertoire to fight off invading pathogens, these antibodies are generated by rearrangement of a very limited number of germline variable (V), diversity (D) and joining (J) gene segments, followed by somatic hypermutation. The framework regions FR1, FR2 and FR3 of mature antibodies are encoded by germline V gene segments, while FR4 is encoded by J gene segments. Because there are only a limited number of germline gene segments, these genes can be pre-delimited to generate a knowledge base of FRs and CDRs. Then, for a given antibody sequence, the algorithm scans each pre-delimited gene in the knowledge base, finds the best-matching V and J segments, and identifies the FRs and CDRs accordingly. The algorithm was stringently tested on nearly 25,000 human antibody sequences from NCBI and proved to be very robust: over 99.7% of the sequences could be delimited computationally. Of the delimited sequences, only 0.28% have somatic insertions or deletions in FRs, and their delimitation results need manual checking. Another feature of the algorithm is that it is independent of the CDR definition, so it can easily be extended to CDR definitions other than the most widely used Kabat, Chothia and IMGT definitions.
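
A minimal sketch of the delimitation step described above, under simplifying assumptions: the knowledge base holds two hypothetical toy V genes and two toy J genes (short made-up protein sequences with made-up FR/CDR boundaries, not real germline entries), and matching uses ungapped identity rather than a full alignment, so, like the cases the abstract flags for manual checking, it cannot handle insertions or deletions. The names `V_GENES`, `J_GENES` and `delimit` are illustrative only; the abstract does not describe the authors' exact scoring scheme.

```python
from typing import Dict, List, Tuple

Regions = List[Tuple[str, str]]

# Hypothetical toy knowledge base: each germline segment is stored "pre-delimited"
# as an ordered list of (region, subsequence). NOT real IMGT/Kabat germline entries.
V_GENES: Dict[str, Regions] = {
    "V_toy1": [("FR1", "QVQLV"), ("CDR1", "GYTF"), ("FR2", "WVRQA"),
               ("CDR2", "ISPY"), ("FR3", "RVTMT")],
    "V_toy2": [("FR1", "EVQLL"), ("CDR1", "GFTF"), ("FR2", "WFRQA"),
               ("CDR2", "ISGS"), ("FR3", "RFTIS")],
}
J_GENES: Dict[str, Regions] = {
    # The N-terminal residues encoded by a J segment fall in CDR3; the rest is FR4.
    "J_toy1": [("CDR3", "FDY"), ("FR4", "WGQGT")],
    "J_toy2": [("CDR3", "MDV"), ("FR4", "WGKGT")],
}


def gene_seq(regions: Regions) -> str:
    """Concatenate the pre-delimited pieces back into the full segment."""
    return "".join(seq for _, seq in regions)


def boundaries(regions: Regions) -> Dict[str, Tuple[int, int]]:
    """Map each region name to its (start, end) offsets within the segment."""
    out, pos = {}, 0
    for name, seq in regions:
        out[name] = (pos, pos + len(seq))
        pos += len(seq)
    return out


def identity(a: str, b: str) -> int:
    """Identical positions in an ungapped comparison (zip stops at the shorter)."""
    return sum(x == y for x, y in zip(a, b))


def delimit(query: str) -> Dict[str, str]:
    # 1. Best V: a mature variable domain starts with its V segment, so compare
    #    each germline V against the start of the query.
    v_name = max(V_GENES, key=lambda g: identity(query, gene_seq(V_GENES[g])))
    v_bounds = boundaries(V_GENES[v_name])
    v_len = v_bounds["FR3"][1]  # assumes FR3 is the last region of the V segment

    # 2. Best J: slide each J segment along the query downstream of V (no gaps).
    best = None  # (score, j_name, offset)
    for j_name, regions in J_GENES.items():
        j_seq = gene_seq(regions)
        for off in range(v_len, len(query) - len(j_seq) + 1):
            score = identity(query[off:off + len(j_seq)], j_seq)
            if best is None or score > best[0]:
                best = (score, j_name, off)
    if best is None:
        raise ValueError("query is too short to contain a J segment")
    _, j_name, off = best
    fr4_start, fr4_end = boundaries(J_GENES[j_name])["FR4"]

    # 3. Transfer germline boundaries onto the query. CDR3 is whatever lies
    #    between the end of FR3 (from V) and the start of FR4 (from J).
    result = {name: query[s:e] for name, (s, e) in v_bounds.items()}
    result["CDR3"] = query[v_len:off + fr4_start]
    result["FR4"] = query[off + fr4_start:off + fr4_end]
    result["V"], result["J"] = v_name, j_name
    return result


if __name__ == "__main__":
    # Toy query: mutated V_toy1 (T->S in CDR1) + junction "ARGGS" + J_toy1.
    r = delimit("QVQLVGYSFWVRQAISPYRVTMT" + "ARGGS" + "FDYWGQGT")
    print(r["V"], r["J"], r["CDR3"])  # V_toy1 J_toy1 ARGGSFDY
```

Because the FR/CDR boundaries live only in the knowledge base, switching this sketch from one CDR definition to another means re-delimiting the germline entries rather than changing `delimit`, which mirrors the definition independence claimed in the abstract.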
In addition to delimiting antibody sequences into FRs and CDRs, the algorithm is useful for sequence annotation and quality control, because it detects unusual sequence patterns and features. Furthermore, it is suggested that the algorithm could easily be embedded into other applications, for example to create a gene-family-specific PSSM (position-specific scoring matrix) for antibody engineering, or to number an antibody sequence automatically.
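
As an illustration of the PSSM use case, the following sketch builds a standard log-odds position-specific scoring matrix from sequences that are assumed to be already aligned to equal length and already grouped into a single germline gene family (for instance by their best-matching germline V gene). The uniform background frequency, the pseudocount of 1, the function names and the toy sequences are all assumptions for the example; the abstract does not specify how such a PSSM would be constructed.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM from equal-length, pre-aligned sequences of one gene family.

    score(pos, aa) = log2(p(aa at pos) / background(aa)), using an assumed uniform
    background of 1/20 and an additive pseudocount so unseen residues get a
    finite, negative score.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need a non-empty set of equal-length sequences")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):  # one column per aligned position
        counts = Counter(column)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position scores; residues outside AMINO_ACIDS raise KeyError."""
    return sum(col[aa] for col, aa in zip(pssm, seq))


if __name__ == "__main__":
    family = ["QVQLV", "QVQLQ", "QVKLV"]  # hypothetical toy FR1 fragments
    pssm = build_pssm(family)
    print(score(pssm, "QVQLV") > score(pssm, "EVQLL"))  # True
```

A sequence that resembles the family consensus scores higher than one that deviates from it, which is the property that makes a family-specific PSSM useful for ranking candidate substitutions during engineering.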

Copyright 2010 Elsevier Ltd. All rights reserved.

[PubMed - indexed for MEDLINE]