Preparing a Traces Information File

Traces are the sequence data (chromatograms), base calls, and quality estimates for single-pass reads from large-scale sequencing projects. They are maintained in the NCBI permanent repository, Trace Archive, and are linked to their respective records in GenBank through Entrez.

The Barcode Submission Tool accepts traces as compressed archives accompanied by a Trace Information file, which describes the traces in the archive.

Setting up the Trace Information Table

The Trace Information file is a tab-delimited text file of information describing the traces for all the specimens in a submission.

Use the following template to create the Trace Information table.

The template can be edited with a spreadsheet program like Excel or a text editor like WordPad, or the file can be created by a program or script.

Template_ID, Trace_file, Trace_format, Center_Project, Program_ID, and Trace_End are required for each trace.
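For files created by a program or script, the following is a minimal Python sketch, not an official tool. It writes the header row and one tab-delimited line per trace, and refuses rows that lack any of the six required fields. The function name write_trace_info is hypothetical, and the row values are illustrative ones copied from the sample table on this page.

```python
import csv
import io

# Column labels, spelled as in the header row of the sample table.
COLUMNS = ["Template_ID", "Trace_file", "Trace_format",
           "Center_Project", "Program_ID", "Trace_End"]


def write_trace_info(rows, out):
    """Write a header row, then one tab-delimited line per trace.

    rows is a list of dicts keyed by the column labels; out is any
    writable text stream. (Hypothetical helper, for illustration only.)
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        # All six fields are required for each trace.
        missing = [c for c in COLUMNS if not row.get(c)]
        if missing:
            raise ValueError(f"missing required field(s): {missing}")
        writer.writerow([row[c] for c in COLUMNS])


# Illustrative row, using values from the sample table.
rows = [
    {"Template_ID": "Seq1", "Trace_file": "traces/HCIUP1D61225.scf",
     "Trace_format": "SCF", "Center_Project": "my_proj",
     "Program_ID": "ABCD", "Trace_End": "F"},
]

buf = io.StringIO()
write_trace_info(rows, buf)
trace_info_text = buf.getvalue()
```

To write to disk instead of a string buffer, pass a file opened in text mode with newline="" (for example, a file named trace_info.txt, which is only a placeholder name).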

Contents of the Trace Information Table

The first row in the table contains the labels for each column.

Template_ID - identifies a sequence and must be the same value as the Sequence_ID used in the nucleotide FASTA file.

Trace_file - the path to a specific trace in the trace archive. If you set up the trace archive by putting all the traces into a directory (folder) named traces, the path starts with "traces/". For example: traces/filename.scf.

Note: If you set up your traces directory with subdirectories (e.g., one for each submission set or each organism), the path listed in the Trace_file column must include the subdirectory name. For example: traces/subdirectory_name/filename.scf.

Trace_format - names the format of the provided trace file. Trace_format can have the following values: SCF, SFF, ZTR, and ABI.

Center_Project - a sequencing center's internal designation for a specific sequencing project. This field can be useful for grouping related traces.

Program_ID - the base calling program. This field is free text. Including the program name and a version number or date is very useful.

Trace_End - labels which end of the sequence is contained in the read. Possible values: F, R, and N, for Forward, Reverse, and uNknown.
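When the table is generated by a script, the Trace_file value can be derived from each trace's location on disk. The sketch below assumes a local folder (here called submission, a hypothetical name) that contains the traces directory, and assumes forward slashes are wanted, as in the examples above. The helper name trace_file_value is also hypothetical.

```python
from pathlib import PurePath


def trace_file_value(trace_path, archive_root):
    """Return the path of a trace relative to archive_root.

    The result uses forward slashes, matching the "traces/..." form
    shown in the examples, and keeps any subdirectory names.
    """
    return PurePath(trace_path).relative_to(archive_root).as_posix()


# A trace stored in a subdirectory of traces keeps that subdirectory.
value = trace_file_value(
    "submission/traces/subdirectory_name/filename.scf", "submission")
```

relative_to raises ValueError if the trace is not located under archive_root, which catches traces that were left outside the archive folder.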

A Trace_file may appear only once in a Trace Information file; a Template_ID may appear more than once.

Sample Trace Information Table
Template_ID Trace_file Trace_format Center_Project Program_ID Trace_End
Seq1 traces/HCIUP1D61225.scf SCF my_proj ABCD F
Seq2 traces/HCIUP1D61235.scf SCF my_proj ABCD F
Seq3 traces/HCIUP1D61245.scf SCF my_proj ABCD F
Seq3 traces/HCIUP1D61246.scf SCF my_proj ABCD F
Seq4 traces/HCIUP1D61265.scf SCF my_proj ABCD F
Seq5 traces/HCIUP1D61275.scf SCF my_proj ABCD F
Seq6 traces/HCIUP1D61207.scf SCF my_proj ABCD F
Seq7 traces/HCIUP1D61217.scf SCF my_proj ABCD F

The Sample Trace Information Table is available as a tab-delimited text file (right-click to save).
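The field rules described on this page can be checked locally before uploading. The following Python sketch is only an illustration, not the checks the submission tool itself performs. It verifies that every required field is present, that Trace_format and Trace_End hold one of the listed values, and that no Trace_file appears twice. It assumes exact, case-sensitive matching of values, and the function name check_trace_info is hypothetical.

```python
import csv
import io

REQUIRED = ["Template_ID", "Trace_file", "Trace_format",
            "Center_Project", "Program_ID", "Trace_End"]
TRACE_FORMATS = {"SCF", "SFF", "ZTR", "ABI"}
TRACE_ENDS = {"F", "R", "N"}  # Forward, Reverse, uNknown


def check_trace_info(text):
    """Return a list of problems found in tab-delimited table text."""
    problems = []
    seen_files = set()
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # The header is line 1, so data rows are counted from line 2
    # (assuming the file contains no blank lines).
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED:
            if not row.get(column):
                problems.append(f"line {line_no}: {column} is missing")
        trace_format = row.get("Trace_format")
        if trace_format and trace_format not in TRACE_FORMATS:
            problems.append(f"line {line_no}: bad Trace_format {trace_format!r}")
        trace_end = row.get("Trace_End")
        if trace_end and trace_end not in TRACE_ENDS:
            problems.append(f"line {line_no}: bad Trace_End {trace_end!r}")
        trace_file = row.get("Trace_file")
        if trace_file:
            # A Trace_file may appear only once; Template_ID may repeat.
            if trace_file in seen_files:
                problems.append(f"line {line_no}: duplicate Trace_file {trace_file!r}")
            seen_files.add(trace_file)
    return problems


# Two rows from the sample table that share a Template_ID.
sample = "\n".join([
    "\t".join(REQUIRED),
    "Seq3\ttraces/HCIUP1D61245.scf\tSCF\tmy_proj\tABCD\tF",
    "Seq3\ttraces/HCIUP1D61246.scf\tSCF\tmy_proj\tABCD\tF",
]) + "\n"
sample_problems = check_trace_info(sample)
```

A repeated Template_ID, as in the two Seq3 rows, is not reported, because only Trace_file has to be unique.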

Saving the Trace Information Table

When using a spreadsheet program, be sure to save your file as tab-delimited text. If you are not sure that the "Save" option in your program will do this for you, use "Save As..."

In Excel, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Text (Tab delimited) (*.txt)".