Microbial Genome Submission Check

  1. Main page
    1. Name of submission.
    2. Parent accession.
    3. File/Browse.
  2. Results page.
  3. Downloaded html files.
  4. General description.
  5. Preparations.
  6. Check types.
    1. Self-consistency checks.
      1. Complete overlaps.
      2. RNA overlaps.
      3. Partial overlaps.
    2. Comparison checks.
      1. Frameshifts.
      2. Truncated proteins.
      3. Missing RNAs.
      4. RNA strand mismatches.
  7. Viewing the results.
  8. Frequently Asked Questions.
  1. Main page
    1. Name of submission.
      The name that will be used for the results file output.
    2. Parent accession.
      Enter a RefSeq nucleotide accession number to EXCLUDE that record from the comparison results.
    3. File/Browse.
      Enter the filename of your submission. The file must be in ASN.1 text format (either Seq-entry or Seq-submit). Files of this type are used for submission to GenBank and can be produced in a variety of ways (see GenBank General Genome Submissions and subsequent pages for more information). The file should contain a SINGLE entry only, not a set of sequences/records: for example, a single chromosome of an organism.
  2. Results page.
    Once the tool has finished processing, it will say 'Job done' and a download link will appear. Download the resulting zip file, which contains a number of HTML files, and extract all of them into the same directory on your local computer (for example, with WinZip). The main HTML file (which has the name you provided during submission) can then be opened in a web browser; it contains the links to the subsequent pages and to the BLAST results.
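    WinZip is only one option; any tool that extracts the whole archive will do. As a minimal sketch, assuming Python 3 and a hypothetical helper name unpack_results, the standard zipfile module can extract every page into one directory so that the relative links between the HTML files keep working:

```python
import zipfile
from pathlib import Path


def unpack_results(archive, dest):
    """Extract every file in the results archive into one directory.

    Extracting the whole archive, instead of opening single pages from
    inside an archive viewer, keeps the links between the pages working.
    Returns the extracted HTML files, sorted by path.
    """
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    return sorted(out_dir.rglob("*.html"))
```

    For example, unpack_results("my_genome.zip", "my_genome_results") returns the extracted pages, which can then be opened in a browser; both names are hypothetical.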
  3. Downloaded html files.
    The main HTML file contains the results output. When it is opened in a web browser, it provides links to the individual results and to the BLAST pages.
  4. General description.
    Introduction.
    To help uncover potential errors in prokaryotic genome annotation, the tool performs a number of analyses. The results are now used internally by GenBank and RefSeq, both to help genome submitters to GenBank and to update annotation in RefSeq records. NCBI encourages anyone planning to submit a genome to GenBank to use this tool to uncover potential problems that should be examined prior to submission.
  5. Preparations.
    Upon submission, the nucleotide sequence is extracted. The locations of the proteins/genes are copied for later analysis, and the protein sequences are extracted and used for a BLAST search against proteins encoded by complete prokaryotic RefSeq genomes and for an RPS-BLAST search against the profiles in the CDD database (E-value cutoff 1e-06 for both BLAST and RPS-BLAST). The BLAST results are stored for later analysis, and the relevant BLAST results are provided in the output files. The nucleotide sequence is also checked for tRNAs, using tRNAscan-SE (bacterial and archaeal tRNAs, using covariance analysis), and for rRNAs, using an internal ribosomal RNA database. The tRNA and rRNA results are stored for later analysis, and the relevant results are provided in the output files.
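    The page does not say how the stored hits are kept. As a sketch only, assuming the hits are held in the default 12-column tabular format of BLAST+ (-outfmt 6) and reading "e-06" as an E-value cutoff of 1e-06, applying the cutoff and grouping hits by query protein could look like this (hits_by_query is a hypothetical name):

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

EVALUE_CUTOFF = 1e-6  # assumed reading of the "e-06" cutoff


def hits_by_query(lines, cutoff=EVALUE_CUTOFF):
    """Group tabular BLAST hits by query, keeping those at or below the cutoff."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(row["evalue"]) <= cutoff:
            hits[row["qseqid"]].append(row)
    return dict(hits)
```

    The same function works for RPS-BLAST hits written in the same tabular format; an open file handle can be passed directly as lines.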
  6. Check types.
    1. Self-consistency checks.
      Self-consistency checks are based exclusively on analysis of input annotations and the genomic sequence.
      1. Complete overlaps.
        Genes that are completely overlapped by another gene (on the same or the opposite strand) are reported. Annotations of this type are suspicious and should be manually inspected.
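        A minimal sketch of this check, assuming 1-based inclusive gene coordinates on a linear sequence (features spanning the origin of a circular chromosome are not handled). The Gene class and complete_overlaps function are hypothetical names, and the pairwise loop is chosen for clarity rather than speed:

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive; start <= end
    strand: str  # "+" or "-"


def complete_overlaps(genes):
    """Return (outer, inner) name pairs where inner lies entirely within outer.

    Strand is ignored, so containment on the same or the opposite strand
    is reported alike.
    """
    found = []
    for a, b in combinations(genes, 2):
        if a.start <= b.start and b.end <= a.end:
            found.append((a.name, b.name))
        elif b.start <= a.start and a.end <= b.end:
            found.append((b.name, a.name))
    return found
```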
      2. RNA overlaps.
        Overlaps between RNA and other features (RNA and CDS) are reported. Annotations of this type are rare and should be manually inspected.
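        A sketch of the RNA/CDS case, assuming 1-based inclusive coordinates and a hypothetical set of feature types treated as RNAs. Feature, RNA_KINDS and rna_cds_overlaps are illustrative names, not part of the tool:

```python
from dataclasses import dataclass

# Hypothetical set of feature types treated as RNAs.
RNA_KINDS = {"tRNA", "rRNA", "tmRNA", "ncRNA"}


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str    # feature type, e.g. "CDS" or "tRNA"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlap_len(a, b):
    """Number of bases shared by two features (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def rna_cds_overlaps(features):
    """Return (rna, cds, shared bases) for every RNA that overlaps a CDS."""
    rnas = [f for f in features if f.kind in RNA_KINDS]
    cdss = [f for f in features if f.kind == "CDS"]
    return [(r.name, c.name, overlap_len(r, c))
            for r in rnas for c in cdss if overlap_len(r, c) > 0]
```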
      3. Partial overlaps.
        Partial overlaps are overlaps of any type that are longer than 30 bases. Overlaps of this type do occur in genome annotations and are reported here for manual inspection. A large number of overlaps, or overlaps of significant length, may indicate annotation problems.
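        A sketch of the 30-base rule, under two assumptions not stated on this page: coordinates are 1-based and inclusive, and pairs in which one gene lies entirely inside the other are left to the complete-overlap check. The Gene class and partial_overlaps function are hypothetical names:

```python
from dataclasses import dataclass
from itertools import combinations

MIN_OVERLAP = 30  # report overlaps longer than this many bases


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def partial_overlaps(genes, min_overlap=MIN_OVERLAP):
    """Return (gene, gene, shared bases) for partial overlaps above the limit."""
    found = []
    for a, b in combinations(sorted(genes, key=lambda g: g.start), 2):
        shared = min(a.end, b.end) - max(a.start, b.start) + 1
        contained = ((a.start <= b.start and b.end <= a.end)
                     or (b.start <= a.start and a.end <= b.end))
        if shared > min_overlap and not contained:
            found.append((a.name, b.name, shared))
    return found
```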
    2. Comparison checks.
      These checks compare the input genome annotations with external reference genomes and proteins, and with computations done by the tool.
      1. Frameshifts.
        BLAST results are compared for adjacent genes on the same strand to find pairs that hit the same subject (a common BLAST hit). Because gene fusions and splits do occur in prokaryotes, the BLAST hits are also searched for any other subject (not the common BLAST hit) that covers 90% of the query protein; if one is found, the frameshift is not reported, on the assumption that the gene is 'real'. Any pair of genes that fails to meet this criterion is reported as a potential frameshift and should be manually inspected.
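        The decision rule for frameshifts can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: "adjacent" is taken to mean consecutive in genome order, "covers 90%" is read as at least 90% of the query, and a pair is suppressed if either of its genes has such a hit to a subject outside the common set. The input layout and the function name are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical layout: gene name -> list of (subject id, fraction of query covered)
Hits = Dict[str, List[Tuple[str, float]]]

FULL_COVERAGE = 0.90  # assumed reading of "covers 90% of the query protein"


def frameshift_candidates(genes: List[Tuple[str, str]], hits: Hits,
                          full_coverage: float = FULL_COVERAGE):
    """genes: list of (name, strand) tuples in genome order."""
    reported = []
    for (a, strand_a), (b, strand_b) in zip(genes, genes[1:]):
        if strand_a != strand_b:
            continue  # only adjacent genes on the same strand are compared
        common = ({s for s, _ in hits.get(a, [])}
                  & {s for s, _ in hits.get(b, [])})
        if not common:
            continue  # no common BLAST hit, so no frameshift is suspected
        # A different subject covering (nearly) the whole protein suggests the
        # gene is real, e.g. a genuine gene split or fusion, so do not report.
        looks_real = any(s not in common and cov >= full_coverage
                         for g in (a, b) for s, cov in hits.get(g, []))
        if not looks_real:
            reported.append((a, b, sorted(common)))
    return reported
```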
      2. Truncated proteins.
        The results from the RPS-BLAST search against conserved domains are analyzed for cases where the conserved domain is only partially covered by the query. A potential truncation is reported when the domain alignment covers at least 90% of the protein while the protein covers 80% or less of the domain. The results should be manually inspected for incorrectly annotated start sites (N-terminal truncations), frameshifts (C-terminal truncations), or any other type of potential problem.
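        A sketch of the coverage test, assuming the RPS-BLAST alignment coordinates and lengths are available for each hit (DomainHit and truncation_type are hypothetical names). The final step, which labels a hit N-terminal or C-terminal by comparing how much of the domain is missing at each end, is an illustrative heuristic added here, not a rule documented for the tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    query: str
    qstart: int  # alignment start on the protein (1-based)
    qend: int    # alignment end on the protein
    qlen: int    # protein length
    sstart: int  # alignment start on the domain profile (1-based)
    send: int    # alignment end on the domain profile
    slen: int    # domain profile length


def truncation_type(hit, min_protein_cov=0.90, max_domain_cov=0.80):
    """Return "N-terminal" or "C-terminal" for a suspected truncation, else None."""
    protein_cov = (hit.qend - hit.qstart + 1) / hit.qlen
    domain_cov = (hit.send - hit.sstart + 1) / hit.slen
    if protein_cov < min_protein_cov or domain_cov > max_domain_cov:
        return None
    # Heuristic: whichever end of the domain has more unaligned positions.
    missing_n = hit.sstart - 1
    missing_c = hit.slen - hit.send
    return "N-terminal" if missing_n >= missing_c else "C-terminal"
```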
      3. Missing RNAs.
        RNA annotations are checked against tRNAscan-SE predictions for tRNAs and against an internal ribosomal RNA database for structural RNAs. Completely missing tRNAs with a score above 60.0, and missing high-scoring ribosomal RNAs, are reported. These should be examined to see whether they can be added to the genome annotation.
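        A sketch of the tRNA part of this check, assuming that "completely missing" means a predicted tRNA that overlaps no annotated RNA at all, and that predictions arrive as coordinates plus a tRNAscan-SE score (the Rna class and missing_trnas function are hypothetical):

```python
from dataclasses import dataclass

MIN_TRNA_SCORE = 60.0  # report only predictions scoring above this


@dataclass(frozen=True)
class Rna:
    name: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    score: float = 0.0  # tRNAscan-SE score; unused for submitted features


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def missing_trnas(predicted, annotated, min_score=MIN_TRNA_SCORE):
    """Predicted tRNAs above the score cutoff that overlap no annotated RNA."""
    return [p for p in predicted
            if p.score > min_score
            and not any(overlaps(p, a) for a in annotated)]
```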
      4. RNA strand mismatches.
        Occasionally, RNAs are annotated on the wrong strand (a mistake that is difficult to make with protein-coding genes). When overlapping computed and submitted RNA annotations are on opposite strands, they are reported and should be examined to see whether the wrong strand was used in the genome annotation.
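        A sketch of the strand comparison, assuming 1-based inclusive coordinates and strands written as "+" or "-" (the Rna class and strand_mismatches function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rna:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def strand_mismatches(computed, submitted):
    """Pairs of overlapping computed/submitted RNAs that lie on opposite strands."""
    return [(c.name, s.name)
            for c in computed for s in submitted
            if c.start <= s.end and s.start <= c.end and c.strand != s.strand]
```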
  7. Viewing the results.
    When the job is done, you will see a "Download results" button. Save the archive file locally and unpack it before viewing the main HTML page (index.html). Do not try to follow the hyperlinks directly from the WinZip (or other extraction tool) viewing window.
  8. Frequently Asked Questions.
    Q. I opened the zip file in WinZip and clicked on the input file, but the page does not look right and the hyperlinks do not work.
    A. Do not do that. Unpack the whole archive into a separate directory and then browse.