WormBook. 2005 Sep 23:1-23.

Genomic classification of protein-coding gene families.

Division of Biology, 156-29, California Institute of Technology, Pasadena, CA 91125, USA.


This chapter reviews analytical tools currently in use for protein classification and gives an overview of the C. elegans proteome. Computational analysis of proteins relies heavily on hidden Markov models of protein families. Proteins can also be classified by predicted secondary or tertiary structures, hydrophobic profiles, compositional biases, or size ranges. Strictly orthologous protein families remain difficult to identify except through skilled manual curation. The InterPro and NCBI KOG classifications encompass 79% of C. elegans protein-coding genes; in both classifications, a small number of protein families account for a disproportionately large number of genes. C. elegans protein-coding genes include at least ~12,000 orthologs of C. briggsae genes and at least ~4,400 orthologs of non-nematode eukaryotic genes. Some metazoan proteins conserved in other nematodes are absent from C. elegans. Conversely, 9% of C. elegans protein-coding genes are conserved among all metazoa or eukaryotes yet have no known function.
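One of the non-HMM criteria mentioned above is the hydrophobic profile. As an illustration only (this is not presented as the chapter's own method), the following minimal sketch computes a sliding-window hydropathy profile using the Kyte-Doolittle scale. The function name `hydropathy_profile` and the default window of 9 residues are assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Return the mean Kyte-Doolittle score of every full window along seq.

    Hypothetical helper for illustration. Raises KeyError on residues
    outside the 20 standard one-letter codes (e.g. 'X').
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    # Running sum: add the residue entering the window, drop the one leaving it.
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    # A short made-up sequence with a hydrophobic core flanked by charged residues.
    print(hydropathy_profile("KKDEILLVAIVLLFKRDE", window=9))
```

A protein sequence of length n yields n - window + 1 values. In Kyte and Doolittle's original heuristic, 19-residue windows averaging above roughly 1.6 mark candidate transmembrane segments; a classifier could compare such profiles, but the thresholds here are assumptions, not values taken from the chapter.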

[Indexed for MEDLINE]