Accession Number prefixes: Where are the sequences from?

DDBJ/EMBL/GenBank Accession Prefix Format

The formats for GenBank accession numbers are:

Nucleotide: 1 letter + 5 numerals OR 2 letters + 6 numerals
Protein: 3 letters + 5 numerals
WGS: 4 letters + 2 numerals for WGS assembly version + 6-8 numerals
MGA: 5 letters + 7 numerals
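
The format rules above translate directly into regular expressions. The following is a minimal sketch, not an official NCBI validator: the names `ACCESSION_PATTERNS` and `classify_accession` are hypothetical, and stripping a trailing sequence version such as `.1` is an assumption not stated in the list above.

```python
import re
from typing import Optional

# Patterns transcribed from the format list above; the labels are illustrative.
ACCESSION_PATTERNS = [
    ("nucleotide", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    ("protein", re.compile(r"[A-Z]{3}\d{5}")),
    # 4 letters + 2 numerals (assembly version) + 6-8 numerals
    ("WGS", re.compile(r"[A-Z]{4}\d{2}\d{6,8}")),
    ("MGA", re.compile(r"[A-Z]{5}\d{7}")),
]


def classify_accession(accession: str) -> Optional[str]:
    """Return the format class of an accession, or None if no pattern matches."""
    # Assumption: drop an optional sequence-version suffix such as ".1".
    base = accession.strip().upper().split(".", 1)[0]
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.fullmatch(base):
            return label
    return None
```

The four patterns cannot overlap, because each class uses a different number of leading letters (or, for the two nucleotide forms, a different letter and digit count), so the order of the list does not affect the result.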

The members of the International Nucleotide Sequence Database Collaboration (DDBJ, EMBL, and GenBank) all receive sequence submissions, assign accessions, and exchange data, so that each of the three databases represents the total collection. Accession assignment is managed by prior agreement within the collaboration on which group 'owns' which accession prefix. Use this list of accession number prefixes as a guide only: in some cases the assignments are not adhered to. For instance, some ESTs and GSSs from GenBank carry a prefix allocated to direct submissions.

Allocation of Accession Prefixes

Nucleotide Accession Prefixes
Prefix | Database | Type
BA, DF, DG, LD | DDBJ | CON division
AN | EMBL | CON division
CH, CM, DS, EM, EN, EP, EQ, FA, GG, GL, JH, KB, KD, KE, KI, KK, KL, KN, KQ, KV, KZ | NCBI | CON division
C, AT, AU, AV, BB, BJ, BP, BW, BY, CI, CJ, DA, DB, DC, DK, FS, FY, HX, HY, LU | DDBJ | EST
F | EMBL | EST
H, N, T, R, W, AA, AI, AW, BE, BF, BG, BI, BM, BQ, BU, CA, CB, CD, CF, CK, CN, CO, CV, CX, DN, DR, DT, DV, DW, DY, EB, EC, EE, EG, EH, EL, ES, EV, EW, EX, EY, FC, FD, FE, FF, FG, FK, FL, GD, GE, GH, GO, GR, GT, GW, HO, HS, JG, JK, JZ | GenBank | EST
D, AB, LC | DDBJ | Direct submissions
V, X, Y, Z, AJ, AM, FM, FN, HE, HF, HG, FO, LK, LL, LM, LN, LO, LP, LQ, LR, LS, LT | EMBL | Direct submissions
U, AF, AY, DQ, EF, EU, FJ, GQ, GU, HM, HQ, JF, JN, JQ, JX, KC, KF, KJ, KM, KP, KR, KT, KU, KX, KY, MF | GenBank | Direct submissions
AP | DDBJ | Genome project data
BS | DDBJ | Chimpanzee genome data
AL, BX, CR, CT, CU, FP, FQ, FR | EMBL | Genome project data
AE, CP, CY | GenBank | Genome project data
AG, DE, DH, FT, GA, LB | DDBJ | GSS
B, AQ, AZ, BH, BZ, CC, CE, CG, CL, CW, CZ, DU, DX, ED, EI, EJ, EK, ER, ET, FH, FI, GS, HN, HR, JJ, JM, JS, JY, KG, KO, KS | GenBank | GSS
AK | DDBJ | cDNA projects
AC, DP | GenBank | HTGS
E, BD, DD, DI, DJ, DL, DM, FU, FV, FW, FZ, GB, HV, HW, HZ, LF, LG, LV, LX, LY | DDBJ | Patents
A, AX, CQ, CS, FB, GM, GN, HA, HB, HC, HD, HH, HI, JA, JB, JC, JD, JE | EMBL | Patents (nucleotide only)
I, AR, DZ, EA, GC, GP, GV, GX, GY, GZ, HJ, HK, HL, KH | GenBank | Patents (nucleotide)
G, BV, GF | GenBank | STS
BR | DDBJ | TPA
BN | EMBL | TPA
BK | GenBank | TPA
HT, HU | DDBJ | TPA CON division
BL, GJ, GK | GenBank | TPA CON division
EZ, HP, JI, JL, JO, JP, JR, JT, JU, JV, JW, KA | GenBank | TSA
FX, LA, LE, LH, LI, LJ | DDBJ | TSA
S | GenBank | From journal scanning
AD | GenBank | From GSDB
AH | GenBank | Segmented set header
AS | GenBank | Other - not currently being used
BC | GenBank | MGC project
BT | GenBank | FLI-cDNA projects
J, K, L, M | GenBank | from GSDB direct submissions
N | GenBank and DDBJ | N0-N2 were used initially by both groups but have been removed from circulation; N2-N9 are ESTs
AAAA-AZZZ, JAAA-JZZZ, LAAA-LZZZ, MAAA-MZZZ, NAAA-NZZZ, PAAA-PZZZ, QAAA-QZZZ, RAAA-RZZZ | GenBank | WGS
BAAA-BZZZ | DDBJ | WGS
CAAA-CZZZ, FAAA-FZZZ, OAAA-OZZZ | EMBL | WGS
DAAA-DZZZ | GenBank | WGS TPA
EAAA-EZZZ | DDBJ | WGS TPA
GAAA-GZZZ | GenBank | TSA
HAAA-HZZZ | EMBL | TSA
IAAA-IZZZ | DDBJ | TSA
KAAA-KZZZ | GenBank | Targeted Gene Projects
SAAA-SZZZ | GenBank | Pre-Grant/Patent Application Data
AAAAA-AZZZZ | DDBJ | MGA
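
One way to use the nucleotide table programmatically is a dictionary lookup on the leading letters of an accession. The sketch below is illustrative only: `SAMPLE_PREFIXES` holds just a handful of rows copied from the table (a real lookup would need every row), and the names `SAMPLE_PREFIXES`, `FOUR_LETTER_BY_FIRST`, and `lookup_nucleotide_prefix` are hypothetical. Because the assignments are not always adhered to, the result is best treated as a hint rather than proof of origin.

```python
import re

# A small sample of rows from the nucleotide table; NOT the full list.
SAMPLE_PREFIXES = {
    "U": ("GenBank", "Direct submissions"),
    "AB": ("DDBJ", "Direct submissions"),
    "AJ": ("EMBL", "Direct submissions"),
    "AF": ("GenBank", "Direct submissions"),
    "AP": ("DDBJ", "Genome project data"),
    "AC": ("GenBank", "HTGS"),
    "BK": ("GenBank", "TPA"),
}

# Four-letter prefixes are allocated in blocks keyed by their first letter.
FOUR_LETTER_BY_FIRST = {
    **dict.fromkeys("AJLMNPQR", ("GenBank", "WGS")),
    "B": ("DDBJ", "WGS"),
    **dict.fromkeys("CFO", ("EMBL", "WGS")),
    "D": ("GenBank", "WGS TPA"),
    "E": ("DDBJ", "WGS TPA"),
    "G": ("GenBank", "TSA"),
    "H": ("EMBL", "TSA"),
    "I": ("DDBJ", "TSA"),
    "K": ("GenBank", "Targeted Gene Projects"),
    "S": ("GenBank", "Pre-Grant/Patent Application Data"),
}


def lookup_nucleotide_prefix(accession):
    """Return (database, type) for an accession's prefix, or None if unknown."""
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group()
    if len(prefix) == 4:
        return FOUR_LETTER_BY_FIRST.get(prefix[0])
    return SAMPLE_PREFIXES.get(prefix)
```

For example, `lookup_nucleotide_prefix("BAAA01000001")` resolves through the four-letter block table, while `lookup_nucleotide_prefix("AF123456")` uses the sampled two-letter rows.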

Protein Accession Prefixes
Prefix | Database | Type
BAA-BZZ | DDBJ | Protein ID
CAA-CZZ, SAA-SZZ | EMBL | Protein ID
AAA-AZZ | GenBank | Protein ID
AAE | GenBank | Protein ID for Patents (note that there are also some patent proteins with the prefixes AAA and AAC)
FAA-FZZ | DDBJ | TPA Protein ID
DAA-DZZ | GenBank | TPA or TPA WGS Protein ID
GAA-GZZ | DDBJ | WGS Protein ID
EAA-EZZ, KAA-KZZ, OAA-OZZ | GenBank | WGS Protein ID
IAA-IZZ | DDBJ | TPA WGS Protein ID
LAA-LZZ | DDBJ | TSA Protein ID
JAA-JZZ | GenBank | TSA Protein ID
MAA-MZZ, NAA-NZZ, PAA-PZZ | GenBank | WGS/TSA Protein ID
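
Protein prefixes are allocated in three-letter blocks, so the first letter identifies the block, with AAE as a specific exception inside the GenBank AAA-AZZ block. The sketch below checks the exception before falling back to the block; the names `PROTEIN_EXCEPTIONS`, `PROTEIN_BY_FIRST_LETTER`, and `lookup_protein_prefix` are hypothetical, and the code does not model the patent proteins that the table says also exist under AAA and AAC.

```python
# Exact three-letter prefixes that override their block (from the table above).
PROTEIN_EXCEPTIONS = {"AAE": ("GenBank", "Protein ID for Patents")}

# Block allocations keyed by the first letter of the three-letter prefix.
PROTEIN_BY_FIRST_LETTER = {
    "A": ("GenBank", "Protein ID"),
    "B": ("DDBJ", "Protein ID"),
    **dict.fromkeys("CS", ("EMBL", "Protein ID")),
    "D": ("GenBank", "TPA or TPA WGS Protein ID"),
    **dict.fromkeys("EKO", ("GenBank", "WGS Protein ID")),
    "F": ("DDBJ", "TPA Protein ID"),
    "G": ("DDBJ", "WGS Protein ID"),
    "I": ("DDBJ", "TPA WGS Protein ID"),
    "J": ("GenBank", "TSA Protein ID"),
    "L": ("DDBJ", "TSA Protein ID"),
    **dict.fromkeys("MNP", ("GenBank", "WGS/TSA Protein ID")),
}


def lookup_protein_prefix(accession):
    """Return (database, type) for a protein accession, or None if unallocated."""
    prefix = accession.strip().upper()[:3]
    if len(prefix) < 3 or not prefix.isalpha():
        return None
    # Exact-prefix exceptions take priority over the first-letter block.
    return PROTEIN_EXCEPTIONS.get(prefix) or PROTEIN_BY_FIRST_LETTER.get(prefix[0])
```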

Swiss-Prot/UniProtKB accession numbers follow a different format.

RefSeq Accession Format

The RefSeq projects are NCBI sequence annotation projects and are not part of DDBJ/EMBL/GenBank. RefSeq accession numbers can be distinguished from GenBank accessions by their distinct format: an underscore in the third position.
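
The third-position rule makes a quick screen straightforward. This is a minimal sketch with a hypothetical function name; requiring two letters before the separator is an added assumption based on commonly seen RefSeq prefixes such as NM_ and NP_, and the check says nothing about whether the rest of the accession is valid.

```python
import re


def looks_like_refseq(accession):
    """True if the accession has two letters followed by '_' in position three."""
    # Assumption: the two leading characters are letters (e.g. NM_, NP_, NC_).
    return re.match(r"[A-Z]{2}_", accession.strip().upper()) is not None
```

Screening with this check first lets a pipeline route RefSeq identifiers away from the DDBJ/EMBL/GenBank prefix tables, which do not apply to them.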

Questions or Comments?
Write to the NCBI Service Desk

Revised April 17, 2017.