BankIt Submission Help: Feature Table File
BankIt accepts features as a five-column, tab-delimited
table file. The feature table specifies the location and type of each feature,
and BankIt processes the feature intervals and translates any CDS features into
The feature table format allows different kinds of features (e.g., gene,
mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be
annotated. The valid features
are restricted to those approved by the International Nucleotide Sequence Database Collaboration.
Preparing the Feature Table File
The first line of the feature table contains the
following basic information
The sequence identifier (Sequence_ID) must match the label used to identify each
table's corresponding sequence in the nucleotide FASTA file.
Subsequent lines of the table list the features.
Prepare the feature table file in a text editor and save it as plain ASCII
text (not .rtf or .doc).
Format for a feature table:
- Each feature is shown on a separate line.
- Multiple nucleotide intervals for a feature are on subsequent lines.
- Qualifier(s) describing a feature are on the line(s) below that feature and its intervals.
- Each column is separated by a tab.
As shown in the examples below:
Column 1: Start location (first nucleotide) of a feature
Column 2: Stop location (last nucleotide) of a feature
Column 3: Feature name (for example, 'CDS' or 'mRNA' or 'rRNA' or 'gene' or
Column 4: Qualifier name (for example, 'product' or 'number' or 'gene' or 'note')
Column 5: Qualifier value
Note in the examples below that 'gene' is both a Feature and a
Qualifier and must be entered in two separate columns.
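The column layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a BankIt tool: the function name `format_feature` and its arguments are hypothetical, and it assumes that qualifier lines leave columns 1 to 3 empty (three leading tabs) so that the qualifier name and value fall in columns 4 and 5.

```python
def format_feature(intervals, feature_name, qualifiers=()):
    """Return the tab-delimited table lines for one feature.

    intervals  -- list of (start, stop) pairs, e.g. [("<1", "1009")]
    qualifiers -- iterable of (name, value) pairs, e.g. [("product", "acid trehalase")]
    (Hypothetical helper; the names are not part of BankIt.)
    """
    if not intervals:
        raise ValueError("a feature needs at least one interval")

    lines = []
    first_start, first_stop = intervals[0]
    # Columns 1-3: start, stop and feature name on the feature's first line.
    lines.append(f"{first_start}\t{first_stop}\t{feature_name}")
    # Any further intervals go on the following lines, using columns 1-2 only.
    for start, stop in intervals[1:]:
        lines.append(f"{start}\t{stop}")
    # Qualifiers: columns 1-3 left empty, then qualifier name and value.
    for name, value in qualifiers:
        lines.append(f"\t\t\t{name}\t{value}")
    return lines


if __name__ == "__main__":
    rows = format_feature([("<1", "1009")], "CDS", [("product", "acid trehalase")])
    print("\n".join(rows))
```

Writing the rows with explicit tab characters avoids the common problem of a text editor silently converting tabs to spaces.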
The examples below show sample tables and illustrate a number of points
about the table format.
<1	>1050	gene
<1	1009	CDS
			product	acid trehalase
<1	>1050	mRNA
			product	acid trehalase
2626	2590	tRNA
1080	1210	CDS
			note	alternatively spliced
1055	1210	mRNA
1055	1340	gene
1055	1079	5'UTR
1316	1340	3'UTR
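The location conventions used in these examples (a "<" or ">" next to a number for partial features, and a start greater than the stop for features on the complementary strand) can be read programmatically. The following is a minimal sketch, not BankIt's own parser: the function names are hypothetical, and it assumes that "<" on the start position marks a 5' partial feature and ">" on the stop position marks a 3' partial feature. A single-base interval (start equal to stop) is reported as plus strand here, which is an arbitrary choice.

```python
def parse_position(token):
    """Split a location such as '<1' or '>1050' into (position, has_partial_marker)."""
    has_marker = token.startswith(("<", ">"))
    return int(token.lstrip("<>")), has_marker


def describe_interval(start_token, stop_token):
    """Summarize strand and partialness for one interval (hypothetical helper)."""
    start, _ = parse_position(start_token)
    stop, _ = parse_position(stop_token)
    return {
        # Reversed interval locations indicate the complementary strand.
        "strand": "minus" if start > stop else "plus",
        # Assumption: "<" on the start marks a 5' partial feature.
        "five_prime_partial": start_token.startswith("<"),
        # Assumption: ">" on the stop marks a 3' partial feature.
        "three_prime_partial": stop_token.startswith(">"),
    }


if __name__ == "__main__":
    print(describe_interval("<1", ">1050"))
    print(describe_interval("2626", "2590"))
```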
- Features that are on the complementary strand, such as the tRNA-Phe, are indicated by reversing the interval locations.
- Locations of partial (incomplete) features are indicated with a ">" or
"<" next to the number. In the Seq1 example, the gene, CDS and mRNA all
begin upstream of the start of the nucleotide sequence.
The "<" symbol indicates that they are 5' partial features and the ">" symbol
indicates that the gene and mRNA are 3' partial.
Furthermore, for the protein to translate correctly, the correct reading frame
must be indicated with the qualifier "codon_start" on the CDS. There is no need
to indicate codon_start on complete CDSs, as it is assumed that the translation
starts at the first nucleotide of the interval if no codon_start is provided.
- If a feature contains multiple intervals, like the spliced tRNA-Phe, each
interval is listed on a separate line by its start and stop position before
subsequent qualifier lines.
- Gene features are always a single interval, and their location should cover
the intervals of all the relevant features (for example: CDS plus 5'UTR plus 3'UTR).
- If a protein has more than one name, each can be listed as a
separate product qualifier on the CDS in the table. The value of the first
product qualifier will become the /product on the CDS in the flatfile, and any
additional product qualifiers will be shown as a /note on the CDS in the
flatfile. All CDS features must have at least one product.
- A flatfile /note can be added to any feature using the qualifier note in the