AGP Validation
AGP file structure and content can be validated with the agp_validate program described below. agp_validate does not currently validate overlaps or switch points; this additional level of validation may be added in the future.
File content and structure validation
program: agp_validate
purpose: this program checks AGP files for text formatting and consistency (as defined by the AGP Specification).
usage overview: agp_validate [-options] [FASTA files...] [AGP files...]
Run without any options, agp_validate performs a large number of validations on the input AGP files (listed below) and also generates a report of component, gap, scaffold, and object statistics. If component FASTA sequence files are provided, agp_validate also checks that component spans do not exceed the sequence lengths. If the component sequences are available in GenBank, agp_validate can optionally perform additional checks using the sequence lengths, versions, and taxonomy IDs retrieved from GenBank.
availability: agp_validate is available by anonymous FTP. Download the version for your platform, uncompress the file, rename it to "agp_validate", and set its permissions as necessary for the platform.
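To give a rough idea of the kind of statistics mentioned above, here is a hypothetical Python sketch that counts objects, component lines, and gap lines by type. It does not reproduce agp_validate's actual report (scaffold statistics, for example, are omitted), the function name is made up for illustration, and it assumes that N or U in column 5 marks a gap line.

```python
from collections import Counter
from typing import Dict, Iterable


def agp_summary(lines: Iterable[str]) -> Dict[str, object]:
    """Hypothetical summary: count objects, component lines and gap lines."""
    objects = set()
    component_types: Counter = Counter()
    gap_types: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed line; left to the error checks
        objects.add(cols[0])
        if cols[4] in {"N", "U"}:  # assumption: N/U mark gap lines
            gap_types[cols[6]] += 1  # column 7b: gap_type
        else:
            component_types[cols[4]] += 1  # column 5: component_type
    return {
        "objects": len(objects),
        "component_lines": sum(component_types.values()),
        "gap_lines": sum(gap_types.values()),
        "by_component_type": dict(component_types),
        "by_gap_type": dict(gap_types),
    }
```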
Error level violations reported include:
- Incorrect number of columns (excluding comments): There should be 9 tab-separated columns; the first 8 should not be empty.
- Non-positive integers in the following columns:
  - 2: object_beg
  - 3: object_end
  - 4: part_number
  - 6b: gap_length
  - 7a: component_beg
  - 8a: component_end
- object_end is less than object_beg.
- component_end is less than component_beg.
- The length of the component span (columns 7a and 8a) does not match the length of the object span (columns 2 and 3).
- The gap length (column 6b) does not match the length of the object span (columns 2 and 3).
- Linkage=yes with a gap_type other than fragment, clone or repeat.
- Object does not start with an object_beg coordinate of 1.
- Object has ranges that are non-sequential and/or overlapping.
- Object does not start with a part_number of 1.
- Object has non-sequential lines and/or lines mixed with other objects.
- Multiple objects with the same object name (column 1).
- Component orientation of 0 or na used for a non-singleton scaffold.
- Invalid terms or symbols in the following columns:
  - 5: component_type
  - 7b: gap_type
  - 8b: linkage
  - 9a: orientation
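The following is a minimal Python sketch of a few of the line-level error checks above; it is not agp_validate's own implementation. It assumes AGP v1.1 vocabularies for component_type, gap_type, linkage, and orientation, and treats N (or U, from AGP v2.0) in column 5 as marking a gap line. The function names and message texts are hypothetical. Object-level and file-level checks (coordinate and part_number sequencing, duplicate object names, orientation in non-singleton scaffolds) are not covered.

```python
# Hypothetical sketch of some line-level AGP error checks; not agp_validate's code.
# Assumptions: AGP v1.1 vocabularies below, and column 5 values N/U mark gap lines.
GAP_COMPONENT_TYPES = {"N", "U"}
COMPONENT_TYPES = {"A", "D", "F", "G", "O", "P", "W"} | GAP_COMPONENT_TYPES
GAP_TYPES = {"fragment", "clone", "contig", "centromere", "short_arm",
             "heterochromatin", "telomere", "repeat"}
LINKABLE_GAP_TYPES = {"fragment", "clone", "repeat"}  # linkage=yes is allowed
ORIENTATIONS = {"+", "-", "0", "na"}


def positive_int(text):
    """Return text as an int if it is a positive integer, otherwise None."""
    return int(text) if text.isdecimal() and int(text) > 0 else None


def check_line(line):
    """Return error messages for one non-comment AGP line."""
    cols = line.rstrip("\n").split("\t")
    # Column count/emptiness only; the 8-column gap-line and trailing-tab
    # warnings are not handled here.
    if len(cols) < 8 or "" in cols[:8]:
        return ["expected 9 tab-separated columns with the first 8 non-empty"]
    errors = []
    beg, end, part = (positive_int(c) for c in cols[1:4])
    for name, value in (("object_beg", beg), ("object_end", end),
                        ("part_number", part)):
        if value is None:
            errors.append(f"{name} must be a positive integer")
    obj_len = None
    if beg and end:
        if end < beg:
            errors.append("object_end is less than object_beg")
        else:
            obj_len = end - beg + 1
    if cols[4] not in COMPONENT_TYPES:
        errors.append(f"invalid component_type {cols[4]!r}")
        return errors
    if cols[4] in GAP_COMPONENT_TYPES:  # gap line: columns 6b, 7b, 8b
        gap_len = positive_int(cols[5])
        if gap_len is None:
            errors.append("gap_length must be a positive integer")
        elif obj_len is not None and gap_len != obj_len:
            errors.append("gap_length does not match the object span")
        if cols[6] not in GAP_TYPES:
            errors.append(f"invalid gap_type {cols[6]!r}")
        if cols[7] not in {"yes", "no"}:
            errors.append(f"invalid linkage {cols[7]!r}")
        elif cols[7] == "yes" and cols[6] not in LINKABLE_GAP_TYPES:
            errors.append("linkage=yes with a gap_type other than "
                          "fragment, clone or repeat")
    else:  # component line: columns 7a, 8a, 9a
        comp_beg, comp_end = positive_int(cols[6]), positive_int(cols[7])
        if comp_beg is None or comp_end is None:
            errors.append("component_beg and component_end must be positive integers")
        elif comp_end < comp_beg:
            errors.append("component_end is less than component_beg")
        elif obj_len is not None and comp_end - comp_beg + 1 != obj_len:
            errors.append("component span length does not match the object span")
        orientation = cols[8] if len(cols) > 8 else ""
        if orientation not in ORIENTATIONS:
            errors.append(f"invalid orientation {orientation!r}")
    return errors
```

For example, a gap line whose gap_length disagrees with its object span and that uses linkage=yes with gap_type contig would produce two error messages.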
Warning level violations reported include:
- Gap at the beginning or the end of an object.
- Consecutive gap lines of the same type.
- Overlapping spans used for a given component_id.
- Non-draft component_id used more than once.
- Non-draft component spans out of order.
- Gap line has only 8 columns.
- Gap line has text in column 9.
- Extra tab character at the end of the line.
- Component type is not consistent with the line format.
- Component type is not consistent with the component_id accession.
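Several of the warnings above, and some of the object-level errors listed earlier, depend on the order of lines within one object rather than on any single line. A minimal Python sketch, assuming the lines of one object have already been parsed, in file order, into a hypothetical Row tuple (gap_type is None for component lines). It does not detect objects whose lines are mixed with other objects, duplicate object names, or component_id reuse, which need file-wide state.

```python
from typing import List, NamedTuple, Optional, Tuple


class Row(NamedTuple):
    """Hypothetical parsed AGP line, reduced to the fields these checks need."""
    object_beg: int
    object_end: int
    part_number: int
    gap_type: Optional[str]  # None for a component line


def check_object(rows: List[Row]) -> List[Tuple[str, str]]:
    """Return (level, message) pairs for the lines of one object, in file order."""
    findings: List[Tuple[str, str]] = []
    if not rows:
        return findings
    if rows[0].object_beg != 1:
        findings.append(("error", "object does not start at object_beg 1"))
    if rows[0].part_number != 1:
        findings.append(("error", "object does not start with part_number 1"))
    for prev, cur in zip(rows, rows[1:]):
        # Each range must begin exactly one base after the previous one ends.
        if cur.object_beg != prev.object_end + 1:
            findings.append(("error", f"part {cur.part_number}: "
                             "range is non-sequential or overlapping"))
        if cur.part_number != prev.part_number + 1:
            findings.append(("error", f"part {cur.part_number}: "
                             "part_number is not sequential"))
        # Two gap lines in a row with the same gap_type only draw a warning.
        if prev.gap_type is not None and prev.gap_type == cur.gap_type:
            findings.append(("warning", f"part {cur.part_number}: "
                             f"consecutive gaps of type {cur.gap_type}"))
    if rows[0].gap_type is not None:
        findings.append(("warning", "object begins with a gap"))
    if rows[-1].gap_type is not None:
        findings.append(("warning", "object ends with a gap"))
    return findings
```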
Additional errors and warnings reported when optional validations are invoked:
- Invalid component_id. [-alt or -g option]
- Component is not in GenBank. [-alt option]
- component_id is ambiguous without an explicit version. [-alt option]
- component_end is greater than the sequence length. [-alt option, or FASTA files provided]
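When FASTA files are supplied, the component_end check amounts to comparing each component line against the length of the matching sequence. A minimal Python sketch, assuming the FASTA record ID is the first word after '>' and matches component_id exactly (agp_validate's actual ID matching may differ). Component IDs with no matching FASTA record are silently skipped here, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = (line[1:].split() or [""])[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sum residues over wrapped lines
    return lengths


def check_component_ends(agp_lines: Iterable[str],
                         lengths: Dict[str, int]) -> List[str]:
    """Report component lines whose component_end exceeds the sequence length."""
    problems = []
    for line in agp_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip malformed lines and gap lines (assumed: column 5 is N or U).
        if len(cols) < 8 or cols[4] in {"N", "U"}:
            continue
        component_id, component_end = cols[5], cols[7]
        length = lengths.get(component_id)
        if (length is not None and component_end.isdecimal()
                and int(component_end) > length):
            problems.append(f"{component_id}: component_end {component_end} "
                            f"exceeds sequence length {length}")
    return problems
```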
