BankIt Submission Help: Nucleotide FASTA file
Use Plain Text Format:
- Use a text editor (for example, WordPad) to prepare the FASTA file of nucleotide sequences.
- Be sure to save your file as Plain Text or Text document.
- If you are not sure that the "Save" option in your program does this automatically, use "Save As...". In the "Save as type:" pull-down menu, select "Text Document".
- If using Word, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Plain Text (*.txt)".
- Do not save the file as .doc or .rtf (rich text format); BankIt will not allow you to upload a file that is not plain text.
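Before uploading, the saved file can be checked to confirm it really is plain text. The following is a minimal Python sketch, not part of BankIt: the function name describe_non_plain_text and the list of file signatures are assumptions made for illustration, and the check is a heuristic rather than the test BankIt itself applies.

```python
from pathlib import Path

# Leading bytes of common non-plain-text formats (illustrative, not exhaustive).
NON_TEXT_SIGNATURES = {
    b"{\\rtf": "an RTF (rich text format) file",
    b"\xd0\xcf\x11\xe0": "a legacy Word .doc file",
    b"PK\x03\x04": "a zip container such as .docx",
}


def describe_non_plain_text(data: bytes):
    """Return a reason the bytes do not look like plain text, or None."""
    for signature, label in NON_TEXT_SIGNATURES.items():
        if data.startswith(signature):
            return f"looks like {label}"
    if b"\x00" in data:
        return "contains NUL bytes, so it is probably binary"
    try:
        # Assumption: a plain text file decodes cleanly as UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "is not valid UTF-8 text"
    return None


def check_file(path: str):
    """Read a file from disk and apply the same heuristic."""
    return describe_non_plain_text(Path(path).read_bytes())
```

Calling check_file on the saved FASTA file returns None when nothing suspicious is found, or a short description of the problem otherwise.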
- Each entry in the FASTA file consists of a Definition Line followed by the sequence data.
- The Definition Line for each sequence begins with a ">" followed by a Sequence_ID (SeqID). The SeqID identifies the same specimen in all the steps of a submission (for example, in the nucleotide FASTA file, in a protein FASTA file, or in a Source Modifier file).
- Each SeqID must be unique within the file.
- SeqIDs may not contain spaces.
- SeqIDs may contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).
- SeqIDs must be 25 characters or fewer.
- The SeqID must be separated by a space from the rest of the Definition Line text.
- It is recommended that the Definition Line include the organism name. If organism names are not included in the FASTA Definition Lines, they must be provided in a separate table on a later page of the submission process.
- The Organism Name must be provided in this format:
[organism=Organism Name] (opening square bracket, the word "organism", an equal sign, the organism name, closing square bracket).
- Source Modifiers provided in the FASTA file Definition Line must follow the same format as Organism Name. Examples: [isolate=mosquito12] [clone=AC3] [strain=BuzzLY]
- A brief, free-text description of the sequence may follow the formatted Organism Name and Source Modifiers. Examples: 'cytochrome oxidase I, partial CDS' 'trnH-psbA intergenic spacer'
- The FASTA Definition Line may not contain any internal hard returns.
- However, the FASTA Definition Line must be separated from the actual sequence by a hard return.
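The SeqID and Definition Line rules above can be checked before upload. The following Python sketch is one way to do so; the function name check_fasta is hypothetical, it assumes "letters" and "digits" mean ASCII letters and digits, and it is not the validation BankIt itself performs.

```python
import re

MAX_SEQID_LENGTH = 25
# Assumption: "letters" and "digits" mean ASCII letters and digits.
SEQID_CHARS = re.compile(r"[A-Za-z0-9_.:*#-]+")


def check_fasta(text: str) -> list:
    """Return human-readable problems found in FASTA text (empty if none)."""
    problems = []
    seen = set()
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence data, not a Definition Line
        # The SeqID is everything between ">" and the first space.
        seqid = line[1:].split(" ", 1)[0]
        if not seqid:
            problems.append(f"line {number}: no SeqID directly after '>'")
            continue
        if len(seqid) > MAX_SEQID_LENGTH:
            problems.append(f"line {number}: SeqID '{seqid}' is longer than 25 characters")
        if not SEQID_CHARS.fullmatch(seqid):
            problems.append(f"line {number}: SeqID '{seqid}' contains a character that is not allowed")
        if seqid in seen:
            problems.append(f"line {number}: SeqID '{seqid}' is not unique")
        seen.add(seqid)
        # The line after a Definition Line must hold sequence data.
        next_line = lines[number] if number < len(lines) else ""
        if not next_line.strip() or next_line.startswith(">"):
            problems.append(f"line {number}: no sequence data follows the Definition Line")
    return problems
```

For example, check_fasta(open("sequences.txt").read()) returns an empty list when every Definition Line passes these checks; the file name sequences.txt is a placeholder.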
Spaces and hard returns must be placed correctly within a FASTA file for the Definition Lines and sequence(s) to be read correctly.
- Sample FASTA Definition Lines:
>Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
>Seq2 [organism=uncultured bacillus sp.] [isolate=A2] corticotropin (CT) gene, complete cds
>Seq3 [organism=Phalaenopsis equestris var. leucaspis]
>Seq9 [organism=Petunia integrifolia subsp. inflata]
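To show how the parts of a Definition Line fit together, the following Python sketch splits one of the sample Definition Lines above into its SeqID, its bracketed [name=value] modifiers, and its free-text description. The function name parse_defline and the regular expression are assumptions made for illustration; BankIt's own parser may behave differently.

```python
import re

# Matches one [name=value] pair; the value runs up to the closing bracket.
MODIFIER = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_defline(defline: str):
    """Split a Definition Line into (seqid, modifiers, description)."""
    if not defline.startswith(">"):
        raise ValueError('a Definition Line must begin with ">"')
    # The SeqID ends at the first space; the rest holds modifiers and text.
    seqid, _, rest = defline[1:].strip().partition(" ")
    modifiers = {
        name.strip(): value.strip() for name, value in MODIFIER.findall(rest)
    }
    # Whatever remains after removing the bracketed pairs is the description.
    description = MODIFIER.sub("", rest).strip()
    return seqid, modifiers, description


seqid, modifiers, description = parse_defline(
    ">Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds"
)
print(seqid)        # Seq1
print(modifiers)    # {'organism': 'Carpodacus mexicanus', 'clone': '6b'}
print(description)  # actin (act) mRNA, partial cds
```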