NCBI Prokaryotic Genome Annotation Pipeline

The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see PubMed article), developed in 2005, has been replaced with an upgraded version that is capable of processing a larger data volume. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

The NCBI prokaryotic annotation pipeline is available as a service for GenBank submitters. The pipeline can annotate both complete genomes and draft WGS genomes consisting of multiple contigs. You can request PGAP annotation when you submit your genome to GenBank.

Both WGS and non-WGS genomes, including gapless complete bacterial chromosomes, can be submitted via the Submission Portal. You will be asked to choose whether the genome being submitted is considered WGS or not. The differences for GenBank purposes are:

non-WGS
    Each chromosome is in a single sequence and there are no extra sequences.
    Each sequence in the genome must be assigned to a chromosome, plasmid, or organelle.
    Plasmids and organelles can still be in multiple pieces.

WGS
    One or more chromosomes are in multiple pieces and/or some sequences are not assembled into chromosomes.

In both cases:
    There can still be gaps within the sequences; you will supply that information in the submission.
    Plasmids and organelles can still be in multiple pieces.
    Internal sequences must be arranged in the correct order and orientation.
    Sequences concatenated in unknown order are not allowed.
    Submission is through the Genome Submission Portal. See the genome submission instructions page for details.
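
The WGS versus non-WGS distinction described above can be sketched as a small decision function. This is only an illustration of the stated rules, not part of any NCBI tool: the Seq class, its kind and replicon fields, and the classify_submission function are hypothetical names introduced here, and the Submission Portal itself simply asks you to make this choice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

ASSIGNED_KINDS = ("chromosome", "plasmid", "organelle")


@dataclass
class Seq:
    """One submitted sequence (hypothetical model, for illustration only)."""
    name: str
    kind: Optional[str] = None      # one of ASSIGNED_KINDS, or None if unassigned
    replicon: Optional[str] = None  # label of the replicon this piece belongs to


def classify_submission(seqs: List[Seq]) -> str:
    """Return "WGS" or "non-WGS" following the rules listed in the text."""
    if not seqs:
        raise ValueError("no sequences to classify")
    # A sequence not assigned to a chromosome, plasmid, or organelle
    # counts as an extra sequence, which makes the genome WGS.
    if any(s.kind not in ASSIGNED_KINDS for s in seqs):
        return "WGS"
    # Count the pieces of each chromosome; plasmids and organelles are
    # ignored because they may be in multiple pieces in either case.
    pieces = Counter(s.replicon or s.name for s in seqs if s.kind == "chromosome")
    if any(count > 1 for count in pieces.values()):
        return "WGS"
    return "non-WGS"


complete = [
    Seq("seq1", "chromosome", "chr"),
    Seq("seq2", "plasmid", "pA"),
    Seq("seq3", "plasmid", "pA"),  # a plasmid in two pieces is still allowed
]
draft = [
    Seq("contig1", "chromosome", "chr"),
    Seq("contig2", "chromosome", "chr"),  # the chromosome is in two pieces
]
print(classify_submission(complete))  # non-WGS
print(classify_submission(draft))     # WGS
```

The sketch covers only the rules that separate the two categories. The requirements that apply in both cases, such as correct order and orientation of internal sequences, are not checked here.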


All RefSeq bacterial and archaeal genomes, with the exception of RefSeq Prokaryotic Reference Genomes, are annotated using NCBI's prokaryotic genome annotation pipeline. Additional information on this policy is available here:

For information about RefSeq Eukaryotic genomes, please see: Eukaryotic Genome Annotation

Last updated: 2017-11-13T22:58:45Z