dbSTS: database of "Sequence Tagged Sites"
Sequence Tagged Sites (STSs) are short (about 200-500 bp) sequences that are operationally unique in a genome (i.e., can be specifically detected by PCR in the presence of all other genomic sequences) and that define a specific position on the physical map. STSs can therefore be used to generate mapping reagents that map to single positions within the genome.
STSs are usually submitted to GenBank and dbSTS as batches of dozens to thousands of entries, with a great deal of redundancy in the citation, submitter and library information. To improve the efficiency of the submission process for this type of data, we have designed a special streamlined submission process and data format (see below).
In dbSTS and GenBank an STS record includes a SEQUENCE, which is usually the sequenced product (amplicon) of a Polymerase Chain Reaction (PCR) using specific PRIMERS. In some cases a researcher may have primer sequences, but will not have determined the sequence which they amplify. Often knowing primer sequences is all that is needed for mapping or genotyping experiments.
For those cases in which you only have primer sequences you may consider submitting data to the NCBI Probe Database.
The NCBI has established a public access database, the Probe Database, for archiving primers and other nucleic acid reagents designed for use in a wide variety of biomedical and research applications.
Please contact the Probe Database administrator (email@example.com) to deposit primers and any other sequences or data that were used or obtained in your experiments and that are not a sequenced amplicon. Submitters who have prepared their files in dbSTS submission format (described below) can continue to use that format for Probe submissions. New users, and users who did not prepare their files in dbSTS submission format, should contact the Probe Database administrator (firstname.lastname@example.org) to inquire about the Probe Database submission format.
The batch submission process for STS sequence data involves the completion of six file types: Publication, Source, Contact, Protocol, Buffer, and STS.
Typically a batch of STSs share the same publication, source, contact, protocol, and buffer information. You only need to prepare one of each of those files.
If any of the STS files have different publication, source, contact, protocol, or buffer information, you must complete a new file for that information.
Send the completed files to:
You can attach all the files to a single email message, or you can include them in the body of the email message. Please be sure that they are in plain text (ASCII) format.
We prefer to have the individual STS data files batched together as much as possible: for example, all STS entries in one file.
You can submit sources, publications, contacts, protocols, and buffers together in one file. You can also send them in the same file as the STS entries - the TYPE field will differentiate them for the parsing software.
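As a rough illustration of how the TYPE field lets one file carry several kinds of entries, the following Python sketch splits a combined file on the `||` terminator used in the format examples and groups the entries by their TYPE value. The function names `split_entries` and `_entry_type` are hypothetical, and this is not the actual NCBI parsing software; it only reads the TYPE line and drops blank lines.

```python
from collections import defaultdict


def _entry_type(lines):
    """Return the value of the first TYPE: line, or 'Unknown' if absent."""
    for line in lines:
        if line.startswith("TYPE:"):
            return line[len("TYPE:"):].strip()
    return "Unknown"


def split_entries(text):
    """Group the raw entries of a submission file by their TYPE value.

    Assumes every entry ends with a line containing only '||'.
    """
    grouped = defaultdict(list)
    current = []
    for line in text.splitlines():
        if line.strip() == "||":
            if current:
                grouped[_entry_type(current)].append("\n".join(current))
            current = []
        elif line.strip():
            current.append(line.rstrip())
    return dict(grouped)


sample = """TYPE: Buffer
NAME: STS-1 (E.Green)
BUFFER:
MgCl2: 1.5 mM
||
TYPE: Cont
NAME: Eric Green
||
TYPE: Buffer
NAME: STS-2 (E.Green)
BUFFER:
MgCl2: 2.5 mM
||
"""

entries = split_entries(sample)
print(sorted(entries))          # entry types present in the file
print(len(entries["Buffer"]))   # number of Buffer entries
```

A submitter could run a check like this before mailing a combined file to confirm that every entry is terminated and carries a recognizable TYPE.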
When STS data is loaded into the database, checks are run to determine if the given primer sequences are found in the STS sequence and if the given length of the STS is accurate.
If an entry does not pass these checks, it sometimes indicates that there was an error in the sequences in the input file.
Entries that do not pass this validation check will be returned to the submitter so that they can be re-checked and corrected, if necessary, before entry.
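Submitters may want to run a similar check before sending their files. The sketch below is one possible reading of those checks, not the code NCBI runs. It assumes the backward primer is written 5' to 3' on the opposite strand, so that its reverse complement appears in the submitted sequence, and that SIZE is the span from the start of the forward primer to the end of the backward primer site. The function names are hypothetical. The data come from the sWSS282 example entry in this document.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT letters such as N are kept."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_sts(sequence, f_primer, b_primer, size):
    """Return a list of problems; an empty list means both checks passed."""
    seq = "".join(sequence.split()).upper()  # remove line breaks and blanks
    problems = []
    f_start = seq.find(f_primer.upper())
    if f_start == -1:
        problems.append("forward primer not found")
    # Assumption: the backward primer site lies downstream of the forward one.
    b_site = reverse_complement(b_primer)
    b_start = seq.find(b_site, max(f_start, 0))
    if b_start == -1:
        problems.append("backward primer not found")
    if not problems:
        span = b_start + len(b_site) - f_start
        if span != size:
            problems.append(f"SIZE is {size} but primers span {span} bases")
    return problems


SWSS282 = """
ATTCTATCCAAGTCTCAAGGCCCCACAACCTGGAGCTCTGATGCTCAAGCACAGGAGAAG
ATGGGTGTCCAGCTCAAACACAGAGAACACATTCACCCTTCCCTGCCTTTTTGTTCTGTT
CAGACCCTCAGCAGATAGGATGCCTGCCCACAGCGGTAAGGGCACATCTTCCTTACTGTC
TGTCAATTCAGATGCTGATCACTCTGGT
"""

print(check_sts(SWSS282, "AAGCACAGGAGAAGATGG", "GAATTGACAGACAGTAAGGAAG", 143))
```

For the example entry the primers span exactly 143 bases, so the list of problems is empty. Primers containing ambiguity codes other than N would need extra handling that this sketch omits.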
You will receive a list of dbSTS IDs and GenBank accession numbers from a dbSTS curator via email.
Once your sequences are released into the public database, they will be available from the STS division of GenBank and from the separate dbSTS site (How to Access STS Entries). The sequences and accession numbers in both sources are the same, but there is additional annotation in the dbSTS records such as references to the top nucleotide and protein matches.
If you would like your sequences held confidential until publication, you can indicate that by putting the release date in the PUBLIC field of the STS files. Your sequences will be released on that date, or when the accession numbers or sequence data are published, whichever comes first.
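The PUBLIC field expects dates as mm/dd/yyyy, and a blank value requests immediate release. A minimal Python sketch for producing that line follows. The helper name `public_field` is hypothetical.

```python
from datetime import date


def public_field(release=None):
    """Return a PUBLIC: line; None means immediate release (blank value)."""
    if release is None:
        return "PUBLIC:"
    return "PUBLIC: " + release.strftime("%m/%d/%Y")


print(public_field(date(1999, 12, 31)))  # PUBLIC: 12/31/1999
print(public_field())                    # PUBLIC:
```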
Updates to STS entries are submitted in essentially the same way as new entries. To change any item in the STS input file (other than STS# or CONT_NAME), complete an input file with new data in the fields that need to be changed. In the STATUS field, enter "Update" instead of "New".
In addition to the fields to be changed, updates must include the TYPE, STATUS, STS#, and CONT_NAME fields.
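The update rules can be sketched as a small Python helper that always writes the four identifying fields and then only the fields being changed. The name `build_update` and the `MULTILINE` set are assumptions made for this illustration. Multi-line fields are written on the line below their tag, as the STS field descriptions require.

```python
# Fields whose value starts on the line below the tag in the STS format.
MULTILINE = {"CITATION", "PCR_PROFILE", "COMMENT", "SEQUENCE"}

# Fields that identify the entry and cannot be changed by an update file.
FIXED = ("TYPE", "STATUS", "STS#", "CONT_NAME")


def build_update(sts_id, cont_name, changes):
    """Format an Update entry: identifying fields plus the changed fields."""
    for tag in changes:
        if tag in FIXED:
            raise ValueError(f"{tag} cannot be changed through an update file")
    lines = [
        "TYPE: STS",
        "STATUS: Update",
        f"CONT_NAME: {cont_name}",
        f"STS#: {sts_id}",
    ]
    for tag, value in changes.items():
        if tag in MULTILINE:
            lines.extend([f"{tag}:", value])
        else:
            lines.append(f"{tag}: {value}")
    lines.append("||")
    return "\n".join(lines)


print(build_update("DXYS112", "Thomas Hudson", {"SIZE": "231"}))
```

Rejecting STS# and CONT_NAME in `changes` mirrors the rule that those two values are the match keys for an update and have to be changed by email instead.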
For changes in Publication, Contact, or Source data, or for changes in STS#'s or CONT_NAME, send an email message describing the change that is needed.
Send the update files to: email@example.com
If you have questions about the STS submission format, please contact
File Types
There are six types of deliverable files: Publication, Source, Contact, Protocol, Buffer, and STS.
Each STS file needs to reference the Publication, Source, and Contact data. Therefore the Publication, Source, and Contact files must be in the database when the STS file is entered. Once these files have been submitted and entered, they do not need to be re-submitted for additional STS files that have the same Publication, Source, or Contact.
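Because STS entries refer to other entries by exact name, a mismatch in spelling or spacing breaks the link. The following Python sketch, which is not NCBI's own validation, reports the references in an STS entry that match none of the names a submitter knows to exist. The function name and the dictionary layout are assumptions made for this illustration. The field names and values are taken from the examples in this document.

```python
def missing_references(sts, known):
    """Return (field, value) pairs in an STS entry that match no known name.

    `sts` maps STS field tags to values; `known` maps each referencing
    field to the set of names already submitted or present in the file.
    """
    missing = []
    for field, names in known.items():
        value = sts.get(field, "")
        if value not in names:  # exact string comparison
            missing.append((field, value))
    return missing


known = {
    "CONT_NAME": {"Eric Green"},            # NAME of Contact entries
    "SOURCE": {"Bovine sperm"},             # NAME of Source entries
    "CITATION": {"Human chromosome 7 STS"}, # TITLE of Publication entries
}
sts = {
    "CONT_NAME": "Eric Green",
    "SOURCE": "Human EGreen",
    "CITATION": "Human chromosome 7 STS",
}
print(missing_references(sts, known))
```

Here the SOURCE value "Human EGreen" is flagged because no Source entry with that name is in the `known` set. In a real submission that Source might already be in the database from an earlier batch.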
TYPE: Entry type - must be "Pub" for publication entries. **Obligatory field**
MEDUID: Medline/PubMed unique identifier. Not obligatory, include if you know it.
TITLE: Title of article. (Begin on line below tag, use multiple lines if necessary) **Obligatory field**
AUTHORS: Author name, format: Name,I.I.; Name2,I.I.; Name3,I.I. (Begin on line below field tag, use multiple lines if necessary) **Obligatory field**
JOURNAL: Journal name
VOLUME: Volume number
SUPPL: Supplement number
ISSUE: Issue number
I_SUPPL: Issue supplement number
PAGES: Page, format: 123-9
YEAR: Year of publication. **Obligatory field**
STATUS: Status field. 1=unpublished, 2=submitted, 3=in press, 4=published. **Obligatory field**
||
TYPE: Pub
MEDUID:
TITLE:
Human chromosome 7 STS
AUTHORS:
Green,E.
YEAR: 1996
STATUS: 1
||
TYPE: Pub
MEDUID: 96172835
TITLE:
CpG islands of chicken are concentrated on microchromosomes
AUTHORS:
McQueen,H.A.; Fantes,J.; Cross,S.H.; Clark,V.H.; Archibald,A.L.; Bird,A.P.
JOURNAL: Nat. Genet.
VOLUME: 12
PAGES: 321-324
YEAR: 1996
STATUS: 4
||
TYPE: Entry type - must be "Source" for source entries. **Obligatory field**
NAME: Name of source. **Obligatory field**
ORGANISM: Organism from which source prepared: Scientific name. **Obligatory field**
STRAIN: Organism strain
CULTIVAR: Plant cultivar
SEX: Sex of organism (female, male, hermaphrodite)
ORGAN: Organ name
TISSUE: Tissue type
CELL_TYPE: Cell type
CELL_LINE: Name of cell line
STAGE: Developmental stage
VECTOR: Name of vector.
V_TYPE: Type of vector (Cosmid, Phage, Plasmid, YAC, Other)
HOST: Laboratory host name
DESCR: Description of source preparation methods, vector, etc. This field starts on the line below the DESCR: tag.
||
TYPE: Source
NAME: cSRL flow sorted Human Chromosome 11 specific cosmid
ORGANISM: Homo sapiens
VECTOR: sCos-1
V_TYPE: Cosmid
DESCR:
Human Chromosome 11 specific cosmid library prepared from flow sorted human Chromosome 11 derived from Chinese Hamster Ovary (CHO) monochromosomal somatic cell hybrid, J1
||
TYPE: Source
NAME: Bovine sperm
ORGANISM: Bos taurus
STRAIN: Holstein
SEX: male
TISSUE: seminal vesicle
CELL_TYPE: sperm
STAGE: adult
VECTOR: pBluescript
V_TYPE: Plasmid
DESCR:
Genomic PstI fragments cloned into pBluescript
||
TYPE: Entry type - must be "Cont" for contact entries. **Obligatory field**
NAME: Name of person who provided the STS.
FAX: Fax number as string of digits.
TEL: Telephone number as string of digits.
EMAIL: E-mail address
LAB: Laboratory providing STS.
INST: Institution name
ADDR: Address string, comma-delimited.
||
TYPE: Cont
NAME: Eric Green
FAX:
TEL:
EMAIL: firstname.lastname@example.org
LAB: Center for Genetics in Medicine
INST: Washington University School of Medicine
ADDR: Box 8232, 4566 Scott Avenue, St. Louis, MO 63110, USA
||
TYPE: Entry type - must be "Protocol" for protocol entries. **Obligatory field**
NAME: Name of protocol. **Obligatory field**
PROTOCOL: Description of protocol used. Starts on the line below the PROTOCOL tag. Lay out this description as you want it to appear in GenBank, using blanks, not tabs, to line up columns.
||
TYPE: Protocol
NAME: STS-A (E.Green)
PROTOCOL:
Template: 30-100 ng
Primer: each 1 uM
dNTPs: each 200 uM
Taq Polymerase: 0.05 units/ul
Total Vol: 5 ul
||
TYPE: Protocol
NAME: STS-B (E.Green)
PROTOCOL:
Template: 30-100 ng
Primer: each 1 uM
dNTPs: each 200 uM
Taq Polymerase: 0.05 units/ul
Total Vol: 10 ul
||
TYPE: Entry type - must be "Buffer" for buffer entries. **Obligatory field**
NAME: Name of buffer. **Obligatory field**
BUFFER: Description of buffer used. Starts on the line below the BUFFER tag. Lay out this description as you want it to appear in GenBank, using blanks, not tabs, to line up columns.
||
TYPE: Buffer
NAME: STS-1 (E.Green)
BUFFER:
MgCl2: 1.5 mM
KCl: 50 mM
Tris-HCl: 10 mM
pH: 8.3
||
TYPE: Buffer
NAME: STS-2 (E.Green)
BUFFER:
MgCl2: 2.5 mM
KCl: 50 mM
Tris-HCl: 10 mM
pH: 8.3
||
TYPE: Entry type - must be "STS" for STS entries. **Obligatory field**
STATUS: Status of STS entry - "New" or "Update". **Obligatory field**
CONT_NAME: Name of contact. (Must be identical string to the NAME field of the Contact file.) **Obligatory field**
PROTOCOL: Protocol name. (Must be identical string to the NAME field of the Protocol file.) **Obligatory field for New entries**
BUFFER: Buffer name. (Must be identical string to the NAME field of the Buffer file.) **Obligatory field for New entries**
SOURCE: Source name. (Must be identical string to the NAME field of the Source file.) **Obligatory field for New entries**
CITATION: Journal citation. (Must be identical string to the TITLE field of the Publication file.) Starts on line below CITATION: tag - use continuation lines if necessary. **Obligatory field for New entries**
STS#: STS id assigned by contact lab. **Obligatory field** For STS entry updates, this is the string we match on.
SYNONYMS: Synonyms list, separated by commas.
PRIMER_DB: Database which contains the sequence used as the source of the primer sequences, if relevant.
PRIMER_ACC: Accession number of the sequence from which primer sequences were derived.
GB#: GenBank accession number.
GDB#: Human genome database accession number.
GDB_DSEG: Human genome database Dsegment number.
CLONE: Clone id.
P_END: Which end sequenced, e.g. 5'
DNA_TYPE: Genomic (default), cDNA, Viral, Synthetic, Other.
SIZE: Size of STS (in nucleotides); includes primer sites.
F_PRIMER: Sequence of forward primer.
B_PRIMER: Sequence of backward primer.
PCR_PROFILE: Description of PCR profile. Starts on line below the PCR_PROFILE: tag. Line up data as you wish it to appear in GenBank. Use blanks, not tabs, to format this data.
PUBLIC: Date for public release. **Obligatory field** Leave blank for immediate release. Use the date format mm/dd/yyyy (e.g., 12/31/1999).
GENE_SYMBOL: Putative gene symbol.
GENE_NAME: Full name of putative gene.
PRODUCT: Putative product identification.
COMMENT: Comments about STS. Starts on line below COMMENT: tag.
SEQUENCE: Sequence string. Starts on line below SEQUENCE: tag. **Obligatory field for New entries**
||
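The obligatory-field rules for STS entries can be checked mechanically before submission. The Python sketch below is an illustration under stated assumptions, not part of the dbSTS software. It treats an entry as a dictionary of tag to value, requires non-empty values for the obligatory fields, and treats PUBLIC as a tag that must be present but may be blank. The names `ALWAYS_REQUIRED`, `NEW_REQUIRED` and `missing_obligatory` are hypothetical.

```python
# Obligatory for every STS entry, New or Update.
ALWAYS_REQUIRED = ("TYPE", "STATUS", "CONT_NAME", "STS#")
# Obligatory only when STATUS is "New".
NEW_REQUIRED = ("PROTOCOL", "BUFFER", "SOURCE", "CITATION", "SEQUENCE")


def missing_obligatory(entry):
    """Return the obligatory tags that are absent or empty in an STS entry."""
    required = list(ALWAYS_REQUIRED)
    if entry.get("STATUS") == "New":
        required += NEW_REQUIRED
    missing = [tag for tag in required if not entry.get(tag, "").strip()]
    # Assumption: the PUBLIC tag must appear, but a blank value is allowed
    # and means immediate release.
    if "PUBLIC" not in entry:
        missing.append("PUBLIC")
    return missing


new_entry = {
    "TYPE": "STS",
    "STATUS": "New",
    "CONT_NAME": "Eric Green",
    "PROTOCOL": "STS-A (E.Green)",
    "BUFFER": "STS-1 (E.Green)",
    "SOURCE": "Human EGreen",
    "CITATION": "Human chromosome 7 STS",
    "STS#": "sWSS282",
    "PUBLIC": "",
}
print(missing_obligatory(new_entry))  # SEQUENCE has not been filled in yet
```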
TYPE: STS
STATUS: New
CONT_NAME: Eric Green
PROTOCOL: STS-A (E.Green)
BUFFER: STS-1 (E.Green)
CITATION:
Human chromosome 7 STS
SOURCE: Human EGreen
STS#: sWSS282
SYNONYMS:
F_PRIMER: AAGCACAGGAGAAGATGG
B_PRIMER: GAATTGACAGACAGTAAGGAAG
DNA_TYPE: Genomic
P_END:
PUBLIC:
PRODUCT:
GENE_SYMBOL:
GENE_NAME:
SIZE: 143
PCR_PROFILE:
Presoak: 0 degrees C for 0.00 minute(s)
Denaturation: 92 degrees C for 1.00 minute(s)
Annealing: 60 degrees C for 2.00 minute(s)
Polymerization: 72 degrees C for 2.00 minute(s)
PCR Cycles: 35
Thermal Cycler: Perkin Elmer TC
SEQUENCE:
ATTCTATCCAAGTCTCAAGGCCCCACAACCTGGAGCTCTGATGCTCAAGCACAGGAGAAG
ATGGGTGTCCAGCTCAAACACAGAGAACACATTCACCCTTCCCTGCCTTTTTGTTCTGTT
CAGACCCTCAGCAGATAGGATGCCTGCCCACAGCGGTAAGGGCACATCTTCCTTACTGTC
TGTCAATTCAGATGCTGATCACTCTGGT
||
Example of a sequence update:
TYPE: STS
STATUS: Update
CONT_NAME: Thomas Hudson
STS#: DXYS112
F_PRIMER: CTTCAGATCAGATTAAGGTGCTCT
B_PRIMER: GGGAAGCATTGACTGCATTA
PUBLIC:
SIZE: 231
SEQUENCE:
CTNTACAGCAAGCTTAGTATCATCCTCTTCAGATCAGATTAAGGTGCTCTTGAAAGCTCA
GANNNTTGTATTTGTTTAAATGCACAGTAATTAAAAGTNTTTTTTTTAATCAGCAAAAGC
AGTTAAAGTAAANCAANATATTNANGCCNAAANTNTATTTATNTCACATATCCTGANGTG
GCNCTNNCANGNTGTTNTNCATGGGGNAAATNTGCATCTGTAGATCTGTTGNTTCANTAA
TGCAGTCAATGCTTCCCTTTGNNCAGNTCTAGGGTAGNTTAAATNAGANTCTTNCANCTT
TNNNGGNCTGAAAAGAANNATTTAACCNCCTTGTNNANNCTGGAAACCNNGCTACCTNTG
NAGGTNNTCGTNCTNCCNTNNCANCGTTTTGCTGTTTGCTANGTCAAGCCTCTTGCCTTC
NTCCGNCCCAAGTANCCNGTNCTNGGGCACTNAAAACCCNNNTTTTNGGACCANGCNNGN
ANGCCCCANATT
||
On-Line STS Database, Data Input Format Specification
This draft document is being made available solely for review purposes and should not be quoted, circulated, reproduced or represented as an official NCBI document. The draft is undergoing revisions and should not be considered or represented as reflecting the views, positions or intentions of the NCBI or the National Library of Medicine.