SRA Submission Frequently Asked Questions

Note that this FAQ covers specific questions that may arise during the submission process. For a more complete overview, please review the SRA Quick Start Guide.

  1. BioSample: How many samples do I need for my SRA submission?
  2. How do I import a BioProject or BioSample accession into the SRA?
  3. What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?
  4. What is an MD5 checksum and how do I compute it?
  5. How do I connect to the SRA by FTP?
  6. I have sent my files, but my submission still reports "waiting for files" – is something wrong?
  7. Which accession(s) should be cited in a publication?
  8. How do I release data?
  1. BioSample: How many samples do I need for my SRA submission?

    A BioSample is a record of a biological isolate with unique physical properties. In most cases, biological and technical replicates should not be considered unique BioSamples (see below). For environmental samples, each physical isolate should be considered a BioSample, whereas uniquely attributable reads within an isolate should not. Note that a given data file can be linked to only a single BioSample. Please feel free to contact the curation staff at sra@ncbi.nlm.nih.gov if you have more than 25 BioSamples and would like assistance with your submission.

    Examples:

    • 23,000 unique 16S amplicons from a single seawater collection point – 1 BioSample (1 sample was collected and then analyzed to deduce 16S diversity)
    • 3 "identical" transgenic mice treated with the same drug as part of an experiment – 1 BioSample (Please see below for how biological and technical replicates can be represented in the SRA)
    • CHO cells infected with a virus and sampled at 0, 2, 4, and 8 hours post infection – 4 BioSamples (4 time points)
    • RNA-Seq data from a single male anteater taken from the brain, heart, lungs, testes, and liver – 5 BioSamples (5 different tissues isolated)

  2. How do I import a BioProject or BioSample accession into the SRA?

    BioProject and BioSample submissions must be made through the Submission Portal prior to transmitting data files to the SRA. Once you begin a BioProject or BioSample submission, it will be assigned a temporary tracking ID (SUB[number]) – this is not the final accession! Once a BioProject is complete, it is assigned an accession like PRJNA[number]. Once a BioSample submission is complete, each sample will receive an accession like SAMN[number]. When creating SRA experiments, please specify the PRJNA[number] accession as your BioProject, and SAMN[number] as your BioSample. Note that a given data file can be linked to a single BioSample only. Please feel free to contact the curation staff at sra@ncbi.nlm.nih.gov if you have more than 25 BioSamples and would like assistance with your submission.
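
    As a rough illustration of the difference between a temporary tracking ID and a final accession, the sketch below labels an identifier by its prefix. The patterns are inferred only from the shapes named in this answer (SUB[number], PRJNA[number], SAMN[number]); the function name and labels are hypothetical, and the check says nothing about whether an accession actually exists.

```python
import re

# Patterns inferred from the identifier shapes named in this FAQ.
# The FAQ does not state how many digits to expect, so any run of
# digits is accepted here (an assumption).
PATTERNS = {
    "temporary submission ID": re.compile(r"SUB\d+"),
    "BioProject accession": re.compile(r"PRJNA\d+"),
    "BioSample accession": re.compile(r"SAMN\d+"),
}


def classify_identifier(identifier: str) -> str:
    """Return a label for the identifier, or 'unrecognized'."""
    text = identifier.strip()
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return label
    return "unrecognized"


if __name__ == "__main__":
    # Made-up identifiers, used only to exercise the function.
    for value in ("SUB123456", "PRJNA123456", "SAMN01234567"):
        print(value, "->", classify_identifier(value))
```

    An identifier labelled "temporary submission ID" should not be entered as the BioProject or BioSample when creating SRA experiments.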

  3. What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?

    A BioSample is a record of a biological isolate with unique physical properties. In most cases, biological and technical replicates should not be considered unique BioSamples. Instead, multiple SRA Experiments can be linked to a single BioSample to clearly indicate that the data are the result of biological or technical replicates. Note that a given data file can be linked to only a single BioSample.

    Each SRA Experiment is a unique sequencing library for a specific sample. Importantly, much of the descriptive information that is displayed in the public record of your data is captured at the level of the SRA Experiment. It is therefore imperative that you provide a clear, broadly understandable Title and Description for each Experiment:

    'SRA experiment Entrez report' image

    SRA Runs are simply a manifest of data file(s) that should be linked to a given sequencing library – no information present in the Run is displayed on the public record of your project (see the above image). Note that all data files listed in a Run will be merged into a single .sra archive file, so files from different samples or replicates should not be grouped in the same Run. Paired-end data files (forward/reverse), conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end.
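
    The hierarchy described above can be sketched as a small data model: one BioSample, several Experiments (one per library or replicate), and Runs that list data files. The class names, field names, and the checking function are hypothetical and do not reflect the SRA's internal schema; the check only encodes the rule that a data file may be linked to a single BioSample.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Run:
    # Data files that will be merged into one archive; a paired-end
    # forward/reverse pair belongs together in the same Run.
    files: List[str]


@dataclass
class Experiment:
    # One sequencing library for one sample.
    title: str
    biosample: str
    runs: List[Run] = field(default_factory=list)


def files_linked_to_multiple_samples(experiments: List[Experiment]) -> List[str]:
    """Return file names that appear under more than one BioSample."""
    samples_for_file: Dict[str, Set[str]] = {}
    for experiment in experiments:
        for run in experiment.runs:
            for name in run.files:
                samples_for_file.setdefault(name, set()).add(experiment.biosample)
    return sorted(n for n, s in samples_for_file.items() if len(s) > 1)


if __name__ == "__main__":
    # Two replicates of one (made-up) BioSample, each its own Experiment,
    # each with a single Run holding a paired-end file pair.
    replicates = [
        Experiment("Replicate 1", "SAMN00000001",
                   [Run(["rep1_R1.fastq.gz", "rep1_R2.fastq.gz"])]),
        Experiment("Replicate 2", "SAMN00000001",
                   [Run(["rep2_R1.fastq.gz", "rep2_R2.fastq.gz"])]),
    ]
    print(files_linked_to_multiple_samples(replicates))
```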

  4. What is an MD5 checksum and how do I compute it?

    MD5 checksums are used by the SRA to verify the integrity of transmitted data. An MD5 checksum is a 32-character hexadecimal string such as:

    bf4ac50dcd58bd2860dfac48c7fca348

    Linux and Mac users can generate MD5 checksums with the command-line utilities "md5sum" (Linux) and "md5" (Mac). Windows users will need to download a third-party utility.
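
    Where Python is available, the same checksum can also be computed on any platform with the standard library's hashlib module. The following is a minimal sketch, not an SRA-provided tool; the function name and chunk size are arbitrary choices. The file is read in chunks so that large sequencing files do not need to fit in memory.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python md5_of_file.py FILE [FILE ...]
    for file_path in sys.argv[1:]:
        print(md5_of_file(file_path), file_path)
```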

  5. How do I connect to the SRA by FTP?

    The SRA recommends that you use a dedicated FTP client to transmit your data files. Once you have completed an SRA Run, you will be provided the current login username and password to access our FTP server.

    Key points:

    • You should create a unique folder on the FTP server – many submitters use one of their accessions.
    • We DO NOT accept .zip or .rar compressed files; .gz and .bz2 compressed files ARE accepted.
    • As part of our normal pipeline, files are regularly moved from FTP to a staging area for processing. Files "disappearing" from FTP is completely normal.
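
    Before uploading, it can help to screen a list of file names for the two archive types that are not accepted. The sketch below flags only .zip and .rar, as stated above; it makes no claim about which other formats are accepted, and the function name is hypothetical.

```python
from pathlib import PurePath

# Archive extensions this FAQ says are NOT accepted.
REJECTED_SUFFIXES = {".zip", ".rar"}


def is_rejected_archive(filename: str) -> bool:
    """Return True if the file name ends in .zip or .rar (case-insensitive)."""
    return PurePath(filename).suffix.lower() in REJECTED_SUFFIXES


if __name__ == "__main__":
    # Made-up file names, used only to exercise the function.
    for name in ("reads_1.fastq.gz", "reads_2.fastq.bz2", "all_reads.zip"):
        status = "not accepted" if is_rejected_archive(name) else "not flagged"
        print(name, "->", status)
```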

  6. I have sent my files, but my submission still reports "waiting for files" – is something wrong?

    There is a delay between when files are received and when they are identified and queued for processing by our pipeline, typically on the order of minutes to hours. If it has been more than 2 hours and you see no change in status, please verify the following:

    • File name AND MD5 sum of your transmitted files match those entered into the webpage exactly. If there is any discrepancy in file name or MD5 sum, your files will not be linked and processed.
    • Check your FTP client to confirm that the transmission was completed successfully.

    If both of the above are correct and your submission has not progressed, please email sra@ncbi.nlm.nih.gov for assistance. Please provide as much information as possible (accession(s), file name(s), MD5 sum(s), etc.).
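
    The first check can be partly automated on your own machine. The sketch below compares the file names and MD5 sums you entered into the webpage against your local copies of the files; it cannot see what actually arrived on the server. The function name and the shape of the `entered` mapping (file name to MD5 string) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def find_mismatches(entered: Dict[str, str], directory: str) -> List[str]:
    """Compare entered file names and MD5 sums against local files.

    `entered` maps each file name, exactly as typed into the submission
    page, to the MD5 sum typed alongside it. Returns one message per problem.
    """
    problems = []
    for name, expected in entered.items():
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"{name}: no local file with this exact name")
            continue
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large files are not loaded whole.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        actual = digest.hexdigest()
        if actual != expected.strip().lower():
            problems.append(f"{name}: local MD5 {actual} differs from entered value")
    return problems
```

    An empty result means the local files agree with what was entered; any message returned is worth including when you email for assistance.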

  7. Which accession(s) should be cited in a publication?

    Please cite the SRA Study accession, SRP[number], in any publications that reference the data. A search for the SRP accession will return all of the data linked to the project.

  8. How do I release data?

    At the bottom of each submission page, there is a "set release date" field and button:

    'set release date' image

    This can be used to trigger the release of SRA data, or to extend the hold date up to 1 year from the current date. Note that linked BioProjects and BioSamples will be released when SRA data are released. The converse is not true – setting immediate release for BioProjects and BioSamples will not set SRA data to also be released. When creating an SRA submission for immediate release, please set the release date to today’s date.

    Please also note that there is a delay from data release to full indexing on public NCBI pages. This delay is usually less than 24 hours, so data released today may not be fully visible until tomorrow morning.