BankIt Submission Help: Protein FASTA

The format of the protein FASTA file is similar to the format of the nucleotide FASTA file.

Like the nucleotide FASTA file, the protein FASTA file contains a Sequence_ID followed by the sequence data, but it does not include the organism name or any other source modifiers.

For the protein FASTA definition line, start with a > followed by the Sequence_ID of the nucleotide sequence that translates to the protein sequence.

Use the same Sequence_ID in the protein FASTA file that you used for the corresponding sequence in the nucleotide FASTA file.
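
The matching requirement can be checked before upload with a short script. The following is a minimal sketch, not part of BankIt: the function names and the example data are hypothetical, and it assumes the Sequence_ID is the first whitespace-delimited token after the >.

```python
def read_sequence_ids(fasta_text):
    """Return the Sequence_ID of every definition line, in file order."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption of this sketch: the Sequence_ID is the first
            # whitespace-delimited token after ">".
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def unmatched_protein_ids(nucleotide_fasta, protein_fasta):
    """Return protein Sequence_IDs with no counterpart in the nucleotide FASTA."""
    nucleotide_ids = set(read_sequence_ids(nucleotide_fasta))
    return [pid for pid in read_sequence_ids(protein_fasta)
            if pid not in nucleotide_ids]


# Hypothetical example data, not taken from a real submission.
nucleotide_fasta = (
    ">Seq1 [organism=Homo sapiens]\nATGGCCAAATAA\n"
    ">Seq2 [organism=Homo sapiens]\nATGCCCTTTTAA\n"
)
protein_fasta = ">Seq1\nMAK\n>Seq3\nMPF\n"

print(unmatched_protein_ids(nucleotide_fasta, protein_fasta))  # ['Seq3']
```

Here Seq3 is reported because no nucleotide sequence in the example carries that Sequence_ID.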

There must NOT be a space between the > and the Sequence_ID.

There must be a hard return between the >Sequence_ID definition line and the actual protein sequence.

Format of a protein FASTA definition line showing placement of spaces and hard returns
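
The two placement rules above (no space after the >, and a hard return before the sequence) can be sketched as a simple check. This is an illustrative example under stated assumptions, not the validation BankIt itself performs; the function name and sample strings are hypothetical.

```python
def check_definition_lines(fasta_text):
    """Return (line_number, message) pairs for definition-line problems."""
    problems = []
    lines = fasta_text.splitlines()
    for index, line in enumerate(lines):
        if not line.startswith(">"):
            continue
        number = index + 1
        # Rule: no space between ">" and the Sequence_ID.
        if len(line) == 1 or line[1].isspace():
            problems.append((number, "missing Sequence_ID directly after '>'"))
        # Rule: the protein sequence starts on the line after the definition line.
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if not next_line.strip() or next_line.startswith(">"):
            problems.append((number, "no sequence on the line after the definition line"))
    return problems


# Hypothetical example data.
good = ">Seq1\nMAKLV\n"
bad = "> Seq1\nMAKLV\n>Seq2\n>Seq3\nMPF\n"

print(check_definition_lines(good))  # []
print(check_definition_lines(bad))
```

For the second string, the sketch flags line 1 (space after the >) and line 3 (definition line with no sequence after it).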

Correct IUPAC codes for amino acids can be found in the GenBank Submissions Handbook.
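
As a rough illustration, sequences can be screened for unexpected characters before submission. The alphabet below is an assumption of this sketch (the 20 standard one-letter amino acid codes plus the ambiguity codes B, Z, and X); the GenBank Submissions Handbook remains the authoritative list, and ALLOWED_CODES should be adjusted to match it. The function name and sample data are hypothetical.

```python
# Assumed alphabet: 20 standard amino acid codes plus ambiguity codes B, Z, X.
# Adjust to the list given in the GenBank Submissions Handbook.
ALLOWED_CODES = frozenset("ACDEFGHIKLMNPQRSTVWYBZX")


def invalid_residues(fasta_text, allowed=ALLOWED_CODES):
    """Map each Sequence_ID to the sorted characters that are not in `allowed`."""
    bad = {}
    current_id = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            continue
        if current_id is None:
            continue  # text before the first definition line is ignored here
        unexpected = set(line.strip().upper()) - allowed
        if unexpected:
            bad.setdefault(current_id, set()).update(unexpected)
    return {seq_id: sorted(chars) for seq_id, chars in bad.items()}


# Hypothetical example data.
print(invalid_residues(">Seq1\nMAKLV\n>Seq2\nMP1F-\n"))  # {'Seq2': ['-', '1']}
```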

Sample Protein FASTA

Sample Protein FASTA File

For barcode submissions, a file of protein sequences in FASTA format is optional; you may provide one, but it is not required.