Math Biosci. 2005 Feb;193(2):223-34.

Percolation of annotation errors through hierarchically structured protein sequence databases.

Author information

Medical Research Council Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 2SR, UK. wally.gilks@mrc-bsu.cam.ac.uk


Databases of protein sequences have grown rapidly in recent years as a result of genome sequencing projects. Annotating protein sequences with descriptions of their biological function ideally requires careful experimentation, but this work lags far behind. Instead, biological function is often imputed by copying annotations from similar protein sequences. This gives rise to annotation errors and, more seriously, to chains of misannotation. The authors of [Percolation of annotation errors in a database of protein sequences (2002)] developed a probabilistic framework for exploring the consequences of this percolation of errors through protein databases, and applied their theory to a simple database model. Here we apply the theory to hierarchically structured protein sequence databases, and draw conclusions about database quality at different levels of the hierarchy.
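To make the idea of a chain of misannotation concrete, here is a deliberately simple toy model in Python. It is not the probabilistic framework of this paper or of the 2002 study it builds on, and it describes a single flat database rather than a hierarchical one. The model, the functions `expected_error_fraction` and `fixed_point`, and the parameters `p_exp` (chance an entry is annotated experimentally), `e_exp` (error rate of experimental annotation) and `e_copy` (chance of a fresh error when copying from a correct entry) are all hypothetical. Under these assumptions, a new entry that copies from a wrong entry is always wrong, so errors propagate along copy chains.

```python
def expected_error_fraction(n_entries: int, p_exp: float, e_exp: float, e_copy: float) -> float:
    """Expected fraction of misannotated entries after n_entries additions.

    Hypothetical toy model (not the published framework):
    - the first entry is annotated experimentally;
    - each later entry is annotated experimentally with probability p_exp,
      and such an annotation is wrong with probability e_exp;
    - otherwise the entry copies the annotation of a uniformly chosen
      existing entry: a wrong source always gives a wrong copy, and a
      correct source is miscopied with probability e_copy.
    """
    f = e_exp  # expected error fraction after the first (experimental) entry
    for n in range(2, n_entries + 1):
        # probability that entry n is wrong, given the current error fraction f
        x = p_exp * e_exp + (1 - p_exp) * (f + (1 - f) * e_copy)
        # update the running mean over n entries: f_n = ((n - 1) * f_{n-1} + x) / n
        f += (x - f) / n
    return f


def fixed_point(p_exp: float, e_exp: float, e_copy: float) -> float:
    """Limit of the recursion above as the database grows (needs p_exp + e_copy > 0)."""
    return (p_exp * e_exp + (1 - p_exp) * e_copy) / (p_exp + (1 - p_exp) * e_copy)


print(expected_error_fraction(100_000, 0.5, 0.05, 0.10))
print(fixed_point(0.5, 0.05, 0.10))
```

In this toy setting, copying without fresh errors (`e_copy = 0`) leaves the error fraction at `e_exp`, while as experimental annotation becomes rare (`p_exp` near 0) with any nonzero `e_copy`, the limiting error fraction approaches 1.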
