NCBI Genomic Biology
AGP Validation
File structure and content can be validated at two levels:
I. File content and structure

program: agp_validate
purpose: this program checks text formatting and consistency against the AGP Specification. It also prints statistics for components, gaps, scaffolds and objects.
availability: agp_validate is available by anonymous FTP. Download the version for your platform, uncompress the file, rename it to "agp_validate", and set its permissions (for example, make it executable) as your platform requires.
usage: agp_validate [-options] [input files...]
Run without options, agp_validate performs all validation checks except those that require the component sequences to be in GenBank, and reports statistics for components, gaps, scaffolds and objects. The -alt and -a options check component accessions (and, with -alt, component lengths and the taxonomy ID of the source) against GenBank, so they require the components to be available in GenBank.
options:
-alt Check component Accessions, Lengths and Taxids using GenBank data.
This can be very time-consuming, and is done separately from most other checks.
-a Check component Accessions (but not lengths or taxids); faster than "-alt".
-species Allow components from different subspecies during Taxid checks (implies -alt).
-list List error and warning messages.
-limit COUNT Print only the first COUNT messages of each type.
Default=10. To print all, use: -limit 0
-skip WHAT Do not report lines with a particular error or warning message.
-only WHAT Report only this particular error or warning.
Multiple -skip or -only options are allowed. 'WHAT' may be:
- an error or warning code (e01, ..., w21, ...; see -list)
- a part of the actual message
- a keyword: all, warn[ings], err[ors], alt
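The -limit, -skip and -only rules above can be illustrated with a small Python sketch of message filtering. It is not agp_validate's implementation. The Message type and the function names are hypothetical; the assumptions that codes beginning with "e" are errors and codes beginning with "w" are warnings (suggested by the e01/w21 examples), that a message "type" means its code, that text matching is a case-sensitive substring match, and that -only is applied before -skip and -limit are all illustrative choices. The alt keyword is left out because its set of codes is not listed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Message:
    code: str  # e.g. "e01" or "w21"; assumed: "e" = error, "w" = warning
    text: str  # human-readable message text


def matches(msg: Message, what: str) -> bool:
    """Return True if one -skip/-only argument selects this message."""
    if what == "all":
        return True
    if what in ("warn", "warnings"):
        return msg.code.startswith("w")
    if what in ("err", "errors"):
        return msg.code.startswith("e")
    # Otherwise WHAT is either an exact code or part of the message text.
    return what == msg.code or what in msg.text


def filter_messages(messages: Iterable[Message], skip=(), only=(),
                    limit: int = 10) -> List[Message]:
    """Apply -only, then -skip, then -limit (0 means no limit) per code."""
    shown = Counter()
    kept = []
    for msg in messages:
        if only and not any(matches(msg, w) for w in only):
            continue
        if any(matches(msg, w) for w in skip):
            continue
        if limit and shown[msg.code] >= limit:
            continue
        shown[msg.code] += 1
        kept.append(msg)
    return kept
```

For example, filter_messages(msgs, skip=["warn"]) drops every w-coded message, and limit=0 disables the per-code cap, mirroring "-limit 0".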
Error level violations:
- Incorrect number of columns (excluding comments): There should be 9 tab-separated columns; the first 8 should not be empty.
- Empty columns (other than column 9 in gap lines).
- Empty lines.
- Non-positive integers in the following columns:
  - 2: object_beg
  - 3: object_end
  - 4: part_num
  - 6b: gap_length
  - 7a: component_beg
  - 8a: component_end
- Object ranges that are non-sequential and/or overlapping.
- object_end is less than object_beg.
- component_end is less than component_beg.
- The length of the component span (columns 7a and 8a) does not match the length of the object span (columns 2 and 3).
- The gap length (column 6b) does not match the length of the object span (columns 2 and 3).
- Invalid terms or symbols in the following columns:
  - 5: component_type
  - 7b: gap_type
  - 8b: linkage
  - 9a: orientation
- Linkage=yes with a gap_type other than fragment, clone or repeat.
- Multiple objects with the same object name (column 1).
- An object whose first line does not have an object_beg of 1.
- An object whose first line does not have a part_num of 1.
- An object whose lines are non-sequential and/or interleaved with lines of other objects.
- A component orientation of 0 or na used in a non-singleton scaffold.
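Several of the error-level checks above reduce to arithmetic on the columns of one line plus a little per-object state. The Python sketch below illustrates a subset of them: column count, positive integers, span lengths, and contiguous, sequential objects. It is not agp_validate's code. The function names are hypothetical; it assumes, as in the AGP specification, that component_type N or U marks a gap line; it does not validate the terms in columns 5, 7b, 8b or 9a; and it reports any line without exactly 9 columns as an error, although agp_validate only warns about some such cases (a trailing tab, or a gap line missing column 9).

```python
from typing import Iterable, List, Optional, Tuple

GAP_TYPES = {"N", "U"}  # assumption: column 5 values that mark gap lines


def _positive_ints(values: List[str]) -> Optional[List[int]]:
    """Parse every value as an integer; return None unless all are > 0."""
    try:
        nums = [int(v) for v in values]
    except ValueError:
        return None
    return nums if all(n > 0 for n in nums) else None


def check_agp(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for a subset of the error checks."""
    errors = []
    prev = {}       # object name -> (object_end, part_num) of its latest line
    current = None  # object name on the previous data line
    for no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # comment lines are not checked
        if not line.strip():
            errors.append((no, "empty line"))
            continue
        f = line.split("\t")
        if len(f) != 9 or "" in f[:8]:
            errors.append((no, "need 9 tab-separated columns, first 8 non-empty"))
            continue
        nums = _positive_ints(f[1:4])
        if nums is None:
            errors.append((no, "columns 2-4 must be positive integers"))
            continue
        obj, (beg, end, part) = f[0], nums
        if end < beg:
            errors.append((no, "object_end is less than object_beg"))
            continue
        span = end - beg + 1
        if f[4] in GAP_TYPES:
            gap = _positive_ints(f[5:6])  # column 6b: gap_length
            if gap is None or gap[0] != span:
                errors.append((no, "gap_length invalid or not equal to object span"))
        else:
            if f[8] == "":
                errors.append((no, "empty column 9 on a component line"))
            comp = _positive_ints(f[6:8])  # columns 7a and 8a
            if comp is None or comp[1] < comp[0] or comp[1] - comp[0] + 1 != span:
                errors.append((no, "component span invalid or not equal to object span"))
        # Per-object checks: contiguous lines, start at 1, sequential ranges.
        if obj in prev and obj != current:
            errors.append((no, f"lines for object {obj} are not contiguous"))
        elif obj not in prev and (beg != 1 or part != 1):
            errors.append((no, "object must start at object_beg 1 and part_num 1"))
        elif obj in prev and (beg, part) != (prev[obj][0] + 1, prev[obj][1] + 1):
            errors.append((no, "object range or part_num is not sequential"))
        prev[obj] = (end, part)
        current = obj
    return errors
```

Because each object's last object_end and part_num are remembered, a repeated object name shows up as non-contiguous lines, and a gap or overlap between consecutive ranges shows up as a non-sequential line.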
Warning level violations:
- A gap at the beginning or the end of an object.
- Consecutive gap lines of the same type.
- Overlapping spans used for a given component_id.
- Non-draft component_id used more than once.
- Non-draft component spans out of order.
- An object with no components.
- A gap line with only 8 columns (column 9 should be present but empty).
- Extra tab character at the end of the line.
- File missing a line separator at the end.
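A few of the warnings above can be sketched in Python over lines that have already passed the error checks: gaps at either end of an object, consecutive gaps of the same gap_type, overlapping spans of one component_id, and objects with no components. The function name is hypothetical, N or U in column 5 is assumed to mark a gap line, and the draft/non-draft warnings are omitted because they depend on component_type values not listed here.

```python
from collections import defaultdict
from typing import List, Tuple

GAP_TYPES = {"N", "U"}  # assumption: column 5 values that mark gap lines


def check_warnings(rows: List[List[str]]) -> List[Tuple[int, str]]:
    """Return sorted (row_index, message) pairs for a subset of the warnings.

    rows holds the 9 fields of each non-comment line, already error-free.
    """
    warnings = []
    by_object = defaultdict(list)  # object name -> row indices, in file order
    for i, f in enumerate(rows):
        by_object[f[0]].append(i)

    for obj, idx in by_object.items():
        gap = [rows[i][4] in GAP_TYPES for i in idx]
        if all(gap):
            warnings.append((idx[0], f"object {obj} has no components"))
            continue
        if gap[0]:
            warnings.append((idx[0], "gap at the beginning of the object"))
        if gap[-1]:
            warnings.append((idx[-1], "gap at the end of the object"))
        for k in range(1, len(idx)):
            a, b = idx[k - 1], idx[k]
            # Column 7b holds gap_type on gap lines.
            if gap[k - 1] and gap[k] and rows[a][6] == rows[b][6]:
                warnings.append((b, "consecutive gap lines of the same type"))

    spans = defaultdict(list)  # component_id -> [(beg, end, row_index)]
    for i, f in enumerate(rows):
        if f[4] not in GAP_TYPES:
            spans[f[5]].append((int(f[6]), int(f[7]), i))
    for comp, ranges in spans.items():
        max_end = 0
        for beg, end, i in sorted(ranges):
            if beg <= max_end:  # starts inside an earlier span of this component
                warnings.append((i, f"overlapping spans for component {comp}"))
            max_end = max(max_end, end)
    return sorted(warnings)
```

Tracking the largest component_end seen so far, rather than comparing only adjacent spans, also catches a short span nested inside a much longer earlier one.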
Error level violations for GenBank-based validation (-alt):
- Invalid component_id.
- Component is not in GenBank.
- component_id is ambiguous without an explicit version.
- component_end is greater than the sequence length.
- Components with a taxonomy ID different from the inferred taxonomy ID for the AGP.
- Unable to infer a taxonomy ID for the AGP because fewer than 80% of the components have the same taxonomy ID.
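The taxonomy ID inference described above amounts to a majority vote with an 80% threshold. The Python sketch below shows that rule under stated assumptions: the function name is hypothetical, the taxonomy IDs are assumed to have already been retrieved from GenBank, each distinct component_id is counted once, and the subspecies relaxation of -species is not modeled.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Optional, Tuple


def infer_taxid(component_taxids: Dict[str, int],
                threshold: Fraction = Fraction(4, 5)
                ) -> Tuple[Optional[int], List[str]]:
    """Infer the AGP's taxonomy ID from its components' taxonomy IDs.

    Returns (taxid, components_with_a_different_taxid), or (None, []) when
    fewer than `threshold` (80%) of the components share one taxonomy ID.
    """
    if not component_taxids:
        return None, []
    taxid, count = Counter(component_taxids.values()).most_common(1)[0]
    # Exact rational comparison avoids floating-point rounding at the boundary.
    if count < threshold * len(component_taxids):
        return None, []
    mismatches = sorted(c for c, t in component_taxids.items() if t != taxid)
    return taxid, mismatches
```

With five components of which four share a taxonomy ID, exactly 80% agree, so an ID is inferred and the fifth component is reported; with two of three agreeing, no ID can be inferred.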
Page last updated: February 9, 2007