Discrepancy Report

Introduction

The Discrepancy Report evaluates one or more ASN.1 files, looking for suspicious annotation or annotation discrepancies that NCBI staff have found to occur commonly in genome submissions, both complete and incomplete (WGS). Problems that this function was written to find include inconsistent locus_tag prefixes, missing protein_ids, missing gene features, and suspect product names. The function is available in a specially configured Sequin, which evaluates a single file at a time; multiple files can be evaluated at once with the command-line program asndisc.

If you have questions about the Discrepancy Report, please contact us by email at genomes@ncbi.nlm.nih.gov prior to creating your submission.

Table of Contents

  1. Using Sequin
  2. Using asndisc
  3. Examples
  4. Discrepancy Report Tests

Using Sequin

Sequin can be configured to have the Discrepancy Report function available by following the "Configuring Sequin for HTG Submissions" directions in the HTG section. "Discrepancy Report" will then be present as an option in the Special menu.

When you run the Discrepancy Report within Sequin, a box pops up with the results. The left-hand frame lists the problems, and the right-hand frame lists the features with the selected problem. Selecting a feature in the right-hand frame causes Sequin to jump to that feature in the standard display. Double-clicking a feature in the right-hand frame opens that feature's editor so that you can change it. After making all the changes, click the "Recheck" button to run the Discrepancy Report again. Be sure to save your changes with File-Save or File-Save As.

Using asndisc

The command-line program asndisc is available by anonymous FTP. Copy the correct version for your platform, uncompress the file, rename it to "asndisc", and set the permissions as necessary for the platform.
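The uncompress, rename, and set-permissions steps can be scripted. The following is a minimal sketch only: it assumes the download is gzip-compressed and that the platform uses Unix-style execute permissions, neither of which is stated here. The function name install_asndisc and its arguments are hypothetical.

```python
import gzip
import os
import shutil
import stat


def install_asndisc(compressed_path, target_dir):
    """Uncompress a download, name it 'asndisc', and make it executable.

    Assumption: the download is gzip-compressed. Other archive formats
    would need a different uncompress step.
    """
    target = os.path.join(target_dir, "asndisc")
    # Uncompress and rename in one step by writing straight to the final name.
    with gzip.open(compressed_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Add execute permission for user, group, and others (Unix-style).
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

On platforms without Unix-style permissions, the chmod step may be unnecessary.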

asndisc examines all the files with a common suffix in a directory and collates all the discrepancies into an output file. Each problem in the output file is prefaced with DiscRep, so that the types of problems can be easily found. The standard usage runs all of the tests, but specific tests can be enabled or disabled. In addition, expanded reports of particular tests can be generated. Running "asndisc -" provides the list of arguments.
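Because each problem is prefaced with DiscRep, the problem lines can be pulled out of an output file with a simple filter. This is a sketch under the assumption that the prefix appears at the start of the line; the helper name discrep_lines and the sample text are invented for illustration and do not reproduce real asndisc output.

```python
def discrep_lines(report_text, prefix="DiscRep"):
    """Return the lines of an output file that start with the given prefix."""
    return [
        line.strip()
        for line in report_text.splitlines()
        if line.lstrip().startswith(prefix)
    ]


# Made-up sample text; the real layout of the output file may differ.
sample = """DiscRep: first made-up summary line
  detail line for a feature
DiscRep: second made-up summary line
"""

for line in discrep_lines(sample):
    print(line)
```

To apply it to a real output file, read the file's contents into a string and pass that string to discrep_lines.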

This is the recommended usage:

Examples

The Discrepancy Report is something of a blunt instrument that reports everything that fails its tests; it does not consider whether those failures are real problems or just a reflection of the biology. For example, here is a summary of the analysis of a submission, performed with the default settings of asndisc:

Since this was a eukaryotic organism with introns, the "features have joined locations" report is expected. Similarly, since the submitters have UTR information for some mRNAs, those mRNAs will extend beyond their CDS, generating "coding regions or mRNAs have inconsistent gene locations" reports. However, the other reports need to be investigated to determine whether they indicate a real problem with the annotation. For example, EC numbers need to be placed in the EC_number qualifier. Similarly, RNA features (mRNA, tRNA, etc.) need to have products.
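The reasoning in the paragraph above amounts to sorting report lines into those that the biology explains and those that still need a look. A minimal sketch of that triage follows. The two quoted phrases come from the paragraph; the function name triage, the tuple name, and the sample lines in the usage note are illustrative assumptions, and the list of expected phrases has to be chosen for each genome.

```python
# Phrases whose reports are expected for this particular submission
# (a eukaryote with introns and with UTR information for some mRNAs).
EXPECTED_FOR_THIS_GENOME = (
    "features have joined locations",
    "coding regions or mRNAs have inconsistent gene locations",
)


def triage(summary_lines, expected_phrases=EXPECTED_FOR_THIS_GENOME):
    """Split summary lines into (expected, to_investigate) lists."""
    expected, to_investigate = [], []
    for line in summary_lines:
        if any(phrase in line for phrase in expected_phrases):
            expected.append(line)
        else:
            to_investigate.append(line)
    return expected, to_investigate
```

For example, given the made-up lines "12 features have joined locations" and "3 RNA features have no product", the first lands in the expected list and the second in the list to investigate.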

Here is the summary of the expanded report that examined only SUSPECT_PRODUCT_NAMES:

Again, review the names and fix those that are incorrect. Since this is a eukaryote, it is possible that some of these are nuclear genes encoding organellar proteins, so perhaps those reports should be ignored. In contrast, no product name should contain the word 'partial'. See the product name guidelines in the Prokaryotic and Eukaryotic annotation guidelines for recommended and inappropriate product name formats.
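The rule that no product name should contain the word 'partial' is easy to check before submission. The sketch below is not the test that the Discrepancy Report itself runs; the function name is hypothetical, and matching the whole word case-insensitively is an assumption.

```python
import re

# Match 'partial' as a whole word, in any letter case (an assumption).
PARTIAL_WORD = re.compile(r"\bpartial\b", re.IGNORECASE)


def names_containing_partial(product_names):
    """Return the product names that contain the word 'partial'."""
    return [name for name in product_names if PARTIAL_WORD.search(name)]
```

Matching on word boundaries means that a name such as "impartial factor" is not flagged, while "hypothetical protein, partial" is.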

After you have run the Discrepancy Report and fixed the problematic annotation, tell us when you submit your genome which reports you think can be ignored, and why. If you are not certain whether a particular test is important for your genome, please ask us.

Discrepancy Report Tests

The available tests are:

The standard configuration of the Discrepancy Report within the genome-center-configured Sequin turns off these mitochondrial-related tests:

Revised December 16, 2007
