The format of the protein FASTA file is similar to the format of the nucleotide FASTA file.
Like the nucleotide FASTA file, the protein FASTA file contains a Sequence_ID followed by the sequence data, but its definition line does not include the organism name or any other source modifiers.
For the protein FASTA definition line, start with a > followed by the Sequence_ID of the nucleotide sequence that translates to the protein sequence.
Use the same Sequence_ID in the protein FASTA file that you used for the corresponding sequence in the nucleotide FASTA file.
There must NOT be a space between the > and the Sequence_ID.
There must be a hard return between the >Sequence_ID line and the actual protein sequence.
Correct IUPAC codes for amino acids can be found in the GenBank Submissions Handbook.
>Seq1
LYLIFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMI
GAPDMAFPRMNNMSFWLLPPSFLLLLASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGVSS
ILGAINFITTAINMKPPTLSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGD
PVLYQHLFWFFGHPEVYILIL
>Seq2
VGTALXLLIRAELXQPGALLGDDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMN
NMSFWLLPPSFLLLMASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGISSILGAINFITTA
INMKPPALSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPVLYQHLFWFF
GHPEVYILIL
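The formatting rules above can be checked before submission with a short script. The following is only a minimal sketch, not an official validation tool: the names check_protein_fasta, ALLOWED_RESIDUES, and nucleotide_ids are hypothetical, and the set of accepted residue letters is an assumption (the 20 standard amino acids plus B, Z, X, U, O, and J) that should be compared against the GenBank Submissions Handbook.

```python
# Assumption: accepted one-letter codes. Confirm the real list in the
# GenBank Submissions Handbook before relying on this set.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUOJ")


def check_protein_fasta(text, nucleotide_ids=None):
    """Return a list of problem messages; an empty list means none were found.

    nucleotide_ids is an optional set of Sequence_IDs taken from the
    nucleotide FASTA file, used to confirm that the IDs match.
    """
    problems = []
    record_line = None   # line number of the current definition line
    residue_count = 0    # residues read so far for the current record

    def close_record():
        # A definition line must be followed by sequence data on later lines.
        if record_line is not None and residue_count == 0:
            problems.append(
                f"line {record_line}: no sequence after the definition line")

    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line:
            continue
        if line.startswith(">"):
            close_record()
            record_line, residue_count = number, 0
            header = line[1:]
            if header[:1].isspace():
                problems.append(
                    f"line {number}: space between '>' and the Sequence_ID")
            fields = header.split()
            if not fields:
                problems.append(f"line {number}: missing Sequence_ID")
                continue
            if len(fields) > 1:
                # Either source modifiers were added or the hard return
                # before the sequence is missing.
                problems.append(
                    f"line {number}: extra text after the Sequence_ID")
            if nucleotide_ids is not None and fields[0] not in nucleotide_ids:
                problems.append(
                    f"line {number}: {fields[0]} is not in the nucleotide FASTA")
        elif record_line is None:
            problems.append(
                f"line {number}: sequence data before any definition line")
        else:
            bad = sorted(set(line.upper()) - ALLOWED_RESIDUES)
            if bad:
                problems.append(
                    f"line {number}: unexpected characters {''.join(bad)}")
            residue_count += len(line)
    close_record()
    return problems
```

To use it, read the protein FASTA file into a string and pass it to check_protein_fasta, optionally with the set of Sequence_IDs from the nucleotide FASTA file. A record whose sequence sits on the same line as its >Sequence_ID is reported, because that layout lacks the required hard return.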
For Barcode submissions, you may optionally provide a file of protein sequences in FASTA format. This protein FASTA file is not required.