Announcing NCBI's plan for adopting v2.0 of the AGP specifications
NCBI is switching from the old version of the AGP specification (version 1.1) to the new version (version 2.0) because the latter can convey valuable information on the nature of the evidence linking sequences on either side of a gap. This change will affect users who obtain AGP files from the NCBI FTP site, as well as users who submit AGP files to GenBank as part of a genome assembly submission.
Timeline

| Date | Change |
|---|---|
| 2012 Feb 10 | AGP files written in 2.0 format. AGP 2.0 files accepted in submissions. |
| 2012 Jul 1 | AGP 1.1 files no longer accepted. |
AGP files produced by NCBI
The AGP files that NCBI posts on the GenBank genomes FTP site (ftp://ftp.ncbi.nih.gov/genbank/genomes/) and on the RefSeq genome FTP site (ftp://ftp.ncbi.nih.gov/genomes/) will be written in v2.0 format starting on 10th February 2012. These v2.0 AGPs can be identified by the header line "##agp-version 2.0". AGP parsers will need slight modifications to read v2.0 AGPs (see the list of changes from v1.1 to v2.0 in the AGP v2.0 specification). Existing AGP files already on the FTP site will not be updated; they will remain in v1.1 format.
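Because v2.0 files announce themselves with a header line, a parser can check for it before choosing how to read the rest of the file. The following is a minimal Python sketch of that check; the function name `agp_version` is hypothetical, and treating a file with no `##agp-version` header as v1.1 is an assumption of this sketch rather than an NCBI rule.

```python
def agp_version(lines):
    """Return the AGP version declared in the header, or "1.1" if none is found.

    Assumption (not from the AGP spec): a file with no "##agp-version"
    header before its first data line is treated as v1.1.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("##agp-version"):
            parts = stripped.split()  # handles space- or tab-separated values
            if len(parts) >= 2:
                return parts[1]
        if not stripped.startswith("#"):
            break  # reached the first data line without a version header
    return "1.1"
```

For example, `agp_version(open("example.agp"))` (a hypothetical file name) returns `"2.0"` for a file that starts with `##agp-version 2.0`.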
AGP files submitted to NCBI
GenBank will accept AGP files in v2.0 format from 10th February 2012 onwards. GenBank will continue to accept AGP files in the old v1.1 format until 30th June 2012, but will convert them to v2.0 format. AGP files in v1.1 format will not be accepted after 1st July 2012.
Table 1. Mapping of AGP v1.1 gaps to AGP v2.0 gaps
Note: AGP v1.1 gaps can be mapped forward to AGP v2.0 gaps; however, ambiguities prevent AGP v2.0 gaps from being mapped back to AGP v1.1 gaps.
| AGP v1.1 gap_type | AGP v1.1 linkage | AGP v2.0 gap_type | AGP v2.0 linkage |
|---|---|---|---|
| fragment | no | scaffold | yes |
| fragment | yes | scaffold | yes |
| clone | yes | scaffold | yes |
| repeat | yes | repeat | yes |
| clone | no | contig | no |
| contig | no | contig | no |
| repeat | no | repeat | no |
| centromere | no | centromere | no |
| telomere | no | telomere | no |
| short_arm | no | short_arm | no |
| heterochromatin | no | heterochromatin | no |
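The mapping in Table 1 can be expressed as a simple lookup, which also makes the note about one-way conversion concrete: several v1.1 gaps collapse onto the same v2.0 gap. This is a hedged Python sketch, not NCBI's converter; the names `V11_TO_V20`, `map_gap_v11_to_v20` and `v11_sources` are hypothetical, and only the gap_type and linkage columns are handled.

```python
# Table 1 as a lookup: (v1.1 gap_type, v1.1 linkage) -> (v2.0 gap_type, v2.0 linkage).
# Only these two columns are covered; other AGP v2.0 columns are out of scope here.
V11_TO_V20 = {
    ("fragment", "no"): ("scaffold", "yes"),
    ("fragment", "yes"): ("scaffold", "yes"),
    ("clone", "yes"): ("scaffold", "yes"),
    ("repeat", "yes"): ("repeat", "yes"),
    ("clone", "no"): ("contig", "no"),
    ("contig", "no"): ("contig", "no"),
    ("repeat", "no"): ("repeat", "no"),
    ("centromere", "no"): ("centromere", "no"),
    ("telomere", "no"): ("telomere", "no"),
    ("short_arm", "no"): ("short_arm", "no"),
    ("heterochromatin", "no"): ("heterochromatin", "no"),
}


def map_gap_v11_to_v20(gap_type, linkage):
    """Forward-map one AGP v1.1 gap to its AGP v2.0 equivalent."""
    key = (gap_type, linkage)
    if key not in V11_TO_V20:
        raise ValueError(f"no v2.0 mapping for v1.1 gap {gap_type}/{linkage}")
    return V11_TO_V20[key]


def v11_sources(gap_type, linkage):
    """Return every v1.1 gap that maps onto the given v2.0 gap.

    More than one result means the reverse (v2.0 -> v1.1) mapping is ambiguous.
    """
    return sorted(k for k, v in V11_TO_V20.items() if v == (gap_type, linkage))
```

For instance, `map_gap_v11_to_v20("clone", "no")` gives `("contig", "no")`, while `v11_sources("scaffold", "yes")` lists three different v1.1 gaps, which is why a v2.0 scaffold gap cannot be mapped back to a single v1.1 gap.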
Display of assembly gaps and linkage evidence by INSDC
The International Nucleotide Sequence Database Collaboration (INSDC) recently added a new feature type called "assembly_gap", with the associated qualifiers "gap_type" and "linkage_evidence" (see INSDC Feature Table Definitions). DDBJ, ENA and GenBank will use the "assembly_gap" feature to display information derived from version 2.0 AGP files in their flat-file views of sequence records.
Table 2. Mapping of AGP v2.0 gaps to INSDC features
| AGP v2.0 gap_type | AGP v2.0 linkage | INSDC /gap_type qualifier |
|---|---|---|
| scaffold | yes | "within scaffold" |
| repeat | yes | "repeat within scaffold" |
| contig | no | "between scaffolds" |
| repeat | no | "repeat between scaffolds" |
| centromere | no | "centromere" |
| telomere | no | "telomere" |
| short_arm | no | "short_arm" |
| heterochromatin | no | "heterochromatin" |
