NCBI Gpipe Accessions

A new class of NCBI identifier

NCBI now assigns an intermediate identifier, the gpipe accession, to genomic sequences that are being processed by NCBI's genome annotation pipeline. These identifiers appear on the ACCESSION line of the annotated RefSeq record and are being phased in with genomic Reference Sequences released after May 2009. Gpipe accessions have the format GPC_######### for chromosomes and GPS_######### for scaffolds. They are assigned at the beginning of an annotation cycle so that records can be retrieved from the NCBI nucleotide database by gpipe accession or gi. Gpipe accessions thus support both NCBI's internal process flow and preliminary review by collaborating groups that have been given the accessions before the information is made fully public.
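
As an illustration of the format described above, the following Python sketch classifies an identifier as a chromosome or scaffold gpipe accession. The function name classify_gpipe_accession is hypothetical, and the nine-digit width is an assumption read from the GPC_######### pattern and the example accession on this page, not a documented guarantee.

```python
import re

# Assumed pattern: GPC_ or GPS_ prefix followed by exactly nine digits,
# as suggested by the GPC_#########/GPS_######### notation.
GPIPE_RE = re.compile(r"^(GPC|GPS)_(\d{9})$")


def classify_gpipe_accession(accession):
    """Return 'chromosome', 'scaffold', or None if the text is not a gpipe accession."""
    match = GPIPE_RE.match(accession.strip())
    if match is None:
        return None
    # GPC_ is used for chromosomes, GPS_ for scaffolds.
    return "chromosome" if match.group(1) == "GPC" else "scaffold"
```

Under these assumptions, an ordinary RefSeq accession such as NC_000018 is not matched and the function returns None.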


In the flat-file view of RefSeq records, the gpipe accession appears on the ACCESSION line after the RefSeq accession, for example:


       LOCUS       NC_000018           78077248 bp    DNA     linear   CON 10-JUN-2009
       DEFINITION  Homo sapiens chromosome 18, GRCh37 primary reference assembly.
       ACCESSION   NC_000018 GPC_000000042
       VERSION     NC_000018.9  GI:224589809
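
A minimal Python sketch of how a script might separate the gpipe accession from the other accessions on such an ACCESSION line. The helper name split_accession_line is hypothetical, and the sketch assumes a single-line ACCESSION field with whitespace-separated values, as in the example above.

```python
def split_accession_line(line):
    """Split a flat-file ACCESSION line into (other accessions, gpipe accessions)."""
    fields = line.split()
    if not fields or fields[0] != "ACCESSION":
        raise ValueError("not an ACCESSION line")
    accessions = fields[1:]
    # Gpipe accessions are recognized here by their GPC_/GPS_ prefix only.
    gpipe = [a for a in accessions if a.startswith(("GPC_", "GPS_"))]
    others = [a for a in accessions if a not in gpipe]
    return others, gpipe
```

Applied to the ACCESSION line in the example record, this separates NC_000018 from GPC_000000042.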

In the ASN.1 of RefSeq records, the gpipe accession appears in the sequence ID block, for example:


         Seq-entry ::= seq {
          id {
            other {
              accession "NC_000018" ,
              version 9 } ,
            gpipe {
              accession "GPC_000000042" ,
              version 1 } ,
            gi 224589809 } ,
          descr {
            title "Homo sapiens chromosome 18, GRCh37 primary reference assembly" ,
        ...
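
For quick inspection of text ASN.1 like the fragment above, a regular expression can pull out the gpipe accession and version. This is only a sketch: the function name find_gpipe_id is hypothetical, it assumes the gpipe block is laid out with accession followed by version as shown here, and it is not a general ASN.1 parser.

```python
import re

# Matches a block of the form: gpipe { accession "..." , version N }
GPIPE_BLOCK_RE = re.compile(
    r'gpipe\s*\{\s*accession\s+"([^"]+)"\s*,\s*version\s+(\d+)\s*\}'
)


def find_gpipe_id(asn1_text):
    """Return (accession, version) from the first gpipe block, or None if absent."""
    match = GPIPE_BLOCK_RE.search(asn1_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```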

The gpipe accession does not appear in the FASTA sequence identifier or definition line of NCBI RefSeqs.

Note that a gpipe accession always has a version of '1', because the gpipe identifier is a synonym for one specific version of a sequence.

Page updated: July 21, 2009