How to: Submit sequence data to NCBI

Starting with...

NOTES

SUBMISSION TOOLS & HELP DOCUMENTS

Simple Sequence Submissions

Single nucleotide sequence

or

Several nucleotide sequences for different genes or loci

Sequences should be contiguous bases of cDNA or genomic DNA, not complete genomes. Complete genomes should be submitted via the appropriate protocol indicated below.

Records with simple annotation may be submitted by BankIt or Sequin, while records with complicated annotation may be more easily submitted via Sequin.

BankIt or Sequin

Group of nucleotide sequences for the same gene or locus

Includes:

  • population studies (sequences for a single organism)
  • phylogenetic studies (sequences for multiple organisms)
  • environmental samples (such as cultured or uncultured bacteria or metagenomic samples)

BankIt or Sequin

Batches of Sequences

Includes:

  • Expressed Sequence Tags (ESTs)
  • Genome Survey Sequences (GSSs)

Batch submit guidance page

Genomic Assembly Submissions

Small complete genomes

Includes chloroplasts, mitochondria, plasmids, phages, and viruses
(Locus_tag or BioProject registration is NOT required.)

Sequin

Large complete genomes

Includes paired chromosomes and plasmids, as well as bacterial or eukaryotic chromosomes

Questions regarding a specific submission that are not answered in the documented instructions can be sent to genomes@ncbi.nlm.nih.gov.

Prokaryotic Genomes submission

Eukaryotic Genomes submission

Incomplete genomes

These can be whole genome shotgun (WGS) sequences. WGS submissions should be prepared using the tbl2asn or Sequin tools. For assistance, contact genomes@ncbi.nlm.nih.gov.

Assembly submission information & examples

WGS submissions
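The three genome categories above can be read as a small decision rule. The sketch below is illustrative only: the function name, the `genome_type` labels, and the returned strings are assumptions made for this example and are not part of any NCBI tool.

```python
# Hypothetical labels for the small genome types listed above.
SMALL_GENOME_TYPES = {"chloroplast", "mitochondrion", "plasmid", "phage", "virus"}


def genome_route(complete: bool, genome_type: str) -> str:
    """Suggest a submission route for a genome assembly, following the notes above."""
    kind = genome_type.strip().lower()
    if kind not in SMALL_GENOME_TYPES | {"prokaryote", "eukaryote"}:
        raise ValueError(f"unrecognized genome_type: {genome_type!r}")
    if not complete:
        # Incomplete genomes go in as WGS, prepared with tbl2asn or Sequin.
        return "WGS submission"
    if kind in SMALL_GENOME_TYPES:
        # Small complete genomes: no locus_tag or BioProject registration needed.
        return "Sequin"
    if kind == "prokaryote":
        return "Prokaryotic Genomes submission"
    return "Eukaryotic Genomes submission"


print(genome_route(complete=True, genome_type="plasmid"))
print(genome_route(complete=False, genome_type="eukaryote"))
```

A plasmid submitted together with its chromosome counts as a large complete genome in the notes above, so it would be passed here as "prokaryote" rather than "plasmid".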

High Throughput Genome Sequences (HTGSs)

Clones (e.g., BACs) from large-scale, clone-based genome sequencing projects that are to be released quickly into GenBank can be submitted via the HTGS system. Sequences that are to be kept confidential or are few in number should be submitted as described above for Single nucleotide sequences.

HTGS submissions require prior communication with NCBI staff, so please read about the HTGS submission process for details.

HTGS submissions

Other Submission Types

Barcode of Life sequences

Mitochondrial cytochrome oxidase I sequences that are part of the Barcode of Life initiative can be submitted using a customized BankIt.

Barcode submit page

New sequence annotation for a non-RefSeq record submitted to GenBank by someone else

Third Party Annotation (TPA) submissions can be created for annotation of existing GenBank records when the submitter has experimental or inferential evidence that will be published in a peer-reviewed biological journal.

Please read about the TPA database and its submission policies before submission.

TPA information

TPA FAQs

Computationally assembled transcript sequences

These records, assembled from sequence data that have already been submitted to SRA or the Trace Archive, may be candidates for submission to the Transcriptome Shotgun Assembly (TSA) repository.

TSA information

Variations or Polymorphisms [1]

Single nucleotide polymorphisms as well as short insertions and deletions (<50 bp) should be submitted to dbSNP, while large structural variations and copy number variation (CNV) data should be submitted to dbVar.

Please note that human variations/polymorphisms with clinical relevance should be submitted through a specialized Human Variation Batch submission process using HGVS nomenclature.
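As a rough illustration of the size rule above, the sketch below picks a destination from a variant's length. The names are hypothetical, and sending every variant of 50 bp or more to dbVar is an assumption: the text gives the cutoff only for dbSNP.

```python
from enum import Enum


class VariationTarget(Enum):
    DBSNP = "dbSNP"
    DBVAR = "dbVar"
    CLINICAL = "Human Variation Batch submission"


# From the text: dbSNP takes SNPs and insertions/deletions shorter than 50 bp.
SHORT_VARIANT_LIMIT_BP = 50


def variation_target(length_bp: int, human_clinical: bool = False) -> VariationTarget:
    """Suggest where a variation record should be submitted."""
    if length_bp < 1:
        raise ValueError("length_bp must be a positive number of bases")
    if human_clinical:
        # Clinically relevant human variations use the specialized process.
        return VariationTarget.CLINICAL
    if length_bp < SHORT_VARIANT_LIMIT_BP:
        return VariationTarget.DBSNP
    # Assumption: anything at or above the cutoff is treated as structural.
    return VariationTarget.DBVAR


print(variation_target(1).value)
print(variation_target(12000).value)
```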

Variation Submission Portal

Primers, siRNAs, or probes

Primer or nucleotide-based probe sequences should be submitted to the Probe Database.

Probe submit page

High throughput sequences

The Sequence Read Archive (SRA) accepts reads from high throughput sequencing instruments. Some submissions include sets of SRA reads as part of a comprehensive package. For the specific datasets described below, please initiate submissions with the appropriate archive:

  • Human sequence or metagenome sequence data derived from clinical isolates or from sources with privacy concerns should be submitted to dbGaP.
  • Functional genomics studies that examine gene expression, regulation or epigenomics (using methods such as RNA-Seq, miRNA-Seq, ChIP-Seq or methyl-Seq) should be submitted to GEO.
  • Transcript survey sequence assemblies should go to the Transcriptome Shotgun Assembly (TSA) archive.
  • Non-human and environmental metagenomics data should go to the Metagenome archive.
  • Whole genome sequence assemblies should be submitted to WGS.
  • Capillary traces should be deposited in the Trace Archive.
  • Sequences from the Barcode of Life project should be submitted to Barcode.

Curators of these resources will assist submitters in sending the data to SRA during the submission process.

For data types not mentioned above, submit directly to SRA:

SRA submit page
SRA submission guidance
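The routing list for high throughput data can be summarized as a lookup with SRA as the fallback. This is a sketch only: the dictionary keys are hypothetical labels invented for the example, and the values are the archive names used in the list above.

```python
# Hypothetical dataset labels mapped to the archive where submission starts.
SPECIALIZED_ARCHIVES = {
    "human_privacy_restricted": "dbGaP",
    "functional_genomics": "GEO",
    "transcript_assembly": "TSA",
    "metagenome": "Metagenome archive",
    "whole_genome_assembly": "WGS",
    "capillary_trace": "Trace Archive",
    "barcode_of_life": "Barcode",
}


def starting_archive(dataset_kind: str) -> str:
    """Return the archive to start with; anything unlisted goes directly to SRA."""
    key = dataset_kind.strip().lower()
    return SPECIALIZED_ARCHIVES.get(key, "SRA")


print(starting_archive("functional_genomics"))
print(starting_archive("amplicon_reads"))
```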

[1] If you need a GenBank accession number for Variation or Polymorphism submissions, you will need to annotate the variations as SNPs, insertions/deletions, or microsatellite regions on a nucleotide sequence and submit this to GenBank using the appropriate mechanism for the sequence type.
