dbEST: database of "Expressed Sequence Tags"
Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads from mRNA (cDNA). Typically they are produced in large batches. They represent a snapshot of genes expressed in a given tissue and/or at a given developmental stage. They are tags (some coding, others not) of expression for a given cDNA library.
about ESTs can be found in:
Most EST projects develop large numbers of sequences. These are commonly submitted to GenBank and dbEST as batches of dozens to thousands of entries, with a great deal of redundancy in the citation, submitter and library information. To improve the efficiency of the submission process for this type of data, we have designed a special streamlined submission process and data format.
dbEST also includes sequences that are longer than traditional ESTs, or that are produced as single sequences or in small batches. Among these are products of differential display experiments and RACE experiments. What these sequences share with traditional ESTs, regardless of length, quality, or quantity, is that little information can be annotated in the record.
If a sequence is later characterized and annotated with biological features such as a coding region, 5'UTR, or 3'UTR, it should be submitted through the regular GenBank submissions procedure (via BankIt or Sequin), even if part of the sequence is already in dbEST.
dbEST is reserved for single-pass reads. Assembled sequences should not be submitted to dbEST. GenBank will accept assembled EST submissions for the forthcoming TSA (Transcriptome Shotgun Assembly) division. Please contact firstname.lastname@example.org for more information about submitting EST assemblies. The individual reads that make up the assembly should be submitted to dbEST, the Trace Archive, or the Short Read Archive (SRA) before the assemblies are submitted. For additional information about submitting to Trace or SRA, please see the Trace web site.
NOTE: Beginning in 2009, sequences derived from "next generation" sequencing platforms, including Roche 454, Illumina, Applied Biosystems SOLiD, and Helicos Biosciences HeliScope, should be submitted to the Short Read Archive (SRA). (For information, contact email@example.com.)
The following sequences should not be included in EST submissions: mitochondrial sequences, rRNA, viral sequences, and vector sequences. Vector and linker regions should be removed from EST sequences before submission.
There are two parts to the submission instructions, one for the sequence data, and one for any mapping data. (NOTE: starting in 2009 map data will no longer be entered for dbEST submissions.)
The batch submission process for EST sequence data involves the completion of four file types:
a. Publication
b. Library
c. Contact
d. EST
The format for each file is described below.
If all the ESTs share the same Publication, Library, and Contact information, you only need to prepare one of each of those files. Then complete a separate EST file (file type d) for each sequence.
If any of the EST files have different Publication, Library, or Contact information, you must complete a new file of type a, b, or c.
Once we have entered particular Publication, Library, or Contact information into the database, you do not need to resend the data input files.
Send the completed files to:
You can attach all the files to a single email message, or you can include them in the body of the email message. Please be sure that they are in plain text (ASCII) format.
We prefer to have the individual EST data files batched together as much as possible.
You can submit library, publication, and contact data together in one file. You can also send them in the same file as the EST entries - the TYPE field will differentiate them for the parsing software.
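The way the TYPE field lets one file carry mixed entries can be sketched in Python. This is a minimal illustration, not the parser dbEST actually uses: the function names `parse_records` and `group_by_type` are hypothetical, and it assumes that each record ends with a line containing only `||` (as in the examples in this document) and that a tag is an uppercase name followed by a colon at the start of a line.

```python
import re
from collections import defaultdict

# Assumption: a field tag is an uppercase name (letters, digits, "_" or "#")
# followed by a colon at the start of a line, e.g. "TYPE:", "EST#:".
# A continuation line that happens to look like a tag would be misread.
TAG_RE = re.compile(r"^([A-Z][A-Z0-9_#]*):(.*)$")


def parse_records(text):
    """Return a list of dicts, one per record terminated by '||'."""
    records, current, last_tag = [], {}, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "||":
            if current:
                records.append(current)
            current, last_tag = {}, None
            continue
        match = TAG_RE.match(line)
        if match:
            last_tag = match.group(1)
            current[last_tag] = match.group(2).strip()
        elif line and last_tag is not None:
            # Continuation line: the value started on (or below) the tag line.
            current[last_tag] = (current[last_tag] + "\n" + line).strip()
    if current:
        records.append(current)
    return records


def group_by_type(records):
    """Group parsed records by their TYPE field (Pub, Lib, Cont, EST)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get("TYPE", "")].append(record)
    return dict(grouped)
```

Calling `group_by_type(parse_records(text))` on a combined file would give one list of records per entry type, which is roughly the separation the TYPE field makes possible.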
You will receive a list of dbEST IDs and GenBank accession numbers from a dbEST curator via email.
If you would like your sequences held confidential until publication, you can indicate that by putting the release date in the PUBLIC field of the EST files. Your sequences will be released on that date, or when the accession numbers or sequence data are published, whichever comes first.
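A small Python sketch of how a submitter might read and write the PUBLIC field, assuming the MM/DD/YYYY format given in the EST field description (example value 9/11/1994) and that a blank value means immediate release. The helper names are hypothetical.

```python
from datetime import date, datetime


def parse_public_field(value):
    """Return None for immediate release, otherwise the release date."""
    value = value.strip()
    if not value:
        return None  # blank PUBLIC field: release immediately
    # Raises ValueError if the text is not a valid MM/DD/YYYY date.
    return datetime.strptime(value, "%m/%d/%Y").date()


def format_public_field(release):
    """Write a date in the unpadded style of the example, e.g. 9/11/1994."""
    return f"{release.month}/{release.day}/{release.year}"
```

For example, `format_public_field(date(1994, 9, 11))` produces the same string as the example in the field description.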
Once your sequences are released into the public database, they will be available from GenBank, accessible through Entrez Nucleotides.
Updates to EST entries are submitted in essentially the same way as new entries. To change any item in the EST input file (other than EST# or CONT_NAME), complete an input file with new data in the fields that need to be changed. In the STATUS field, enter "Update" instead of "New".
In addition to the fields to be changed, updates must include the TYPE, STATUS, EST#, and CONT_NAME fields.
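As an illustration, an update entry could be assembled like this in Python. The function `build_update_record` is a hypothetical helper, and the closing `||` line is inferred from the example records in this document.

```python
# Fields the helper writes itself; they cannot be passed in as changes.
FIXED_FIELDS = {"TYPE", "STATUS", "EST#", "CONT_NAME"}


def build_update_record(est_id, cont_name, changes):
    """Return the text of an EST update entry.

    `changes` maps field tags (for example "PUT_ID") to their new values.
    EST# and CONT_NAME identify the entry and are not themselves changed.
    """
    clash = FIXED_FIELDS.intersection(changes)
    if clash:
        raise ValueError(f"not allowed in changes: {sorted(clash)}")
    lines = [
        "TYPE: EST",
        "STATUS: Update",
        f"EST#: {est_id}",
        f"CONT_NAME: {cont_name}",
    ]
    lines.extend(f"{tag}: {value}" for tag, value in changes.items())
    lines.append("||")
    return "\n".join(lines) + "\n"
```

Only the four identifying fields and the changed fields are written; everything else in the existing entry is left out of the update file.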
For changes in Publication, Contact, or Source data, or for changes in EST#'s or CONT_NAME, send an email message describing the change that is needed.
Send the update files to: firstname.lastname@example.org
If you have questions about the EST submission format, please contact
File Types
There are four types of deliverable files: Publication, Library, Contact, and EST.
Each EST file needs to reference the Publication, Library, and Contact data. Therefore the Publication, Library, and Contact files must be in the database when the EST file is entered. Once these files have been submitted and entered, they do not need to be re-submitted for additional EST files that have the same Publication, Library, or Contact.
TYPE: Entry type - must be "Pub" for publication entries. **Obligatory field**
MEDUID: Medline unique identifier. Not obligatory, include if you know it.
TITLE: Title of article. (Begin on same line or the line below tag, use multiple lines if necessary) **Obligatory field**
AUTHORS: Author name, format: Name,I.I.; Name2,I.I.; Name3,I.I. (Begin on the same line or the line below tag, use multiple lines if necessary) **Obligatory field**
JOURNAL: Journal name
VOLUME: Volume number
SUPPL: Supplement number
ISSUE: Issue number
I_SUPPL: Issue supplement number
PAGES: Page, format: 123-9
YEAR: Year of publication. **Obligatory field**
STATUS: Publication status. 1=unpublished, 2=submitted, 3=in press, 4=published **Obligatory field**
||
TYPE: Pub
MEDUID: 92347897
TITLE: Expressed sequence tags and chromosomal localization of cDNA clones from a subtracted retinal pigment epithelium library
AUTHORS: Gieser,L.; Swaroop,A.
JOURNAL: Genomics
VOLUME: 13
ISSUE: 2
PAGES: 873-6
YEAR: 1992
STATUS: 4
||
Pub data template with required and most often used fields:
TYPE: Pub
TITLE: title
AUTHORS: authors
JOURNAL:
VOLUME:
ISSUE:
PAGES:
YEAR:
STATUS:
||
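A hedged Python sketch of filling in the Pub template programmatically. The names `PUB_STATUS`, `format_authors`, and `pub_record` are hypothetical. The field order simply follows the field list above; whether the order matters to the parsing software is not stated here.

```python
# Publication status codes as listed in the STATUS field description.
PUB_STATUS = {"unpublished": 1, "submitted": 2, "in press": 3, "published": 4}

# Order taken from the Pub field list; an assumption, not a stated requirement.
PUB_TAG_ORDER = [
    "TYPE", "MEDUID", "TITLE", "AUTHORS", "JOURNAL", "VOLUME",
    "SUPPL", "ISSUE", "I_SUPPL", "PAGES", "YEAR", "STATUS",
]


def format_authors(authors):
    """Format (surname, initials) pairs as 'Name,I.I.; Name2,I.I.'."""
    return "; ".join(
        f"{surname},{'.'.join(initials)}." for surname, initials in authors
    )


def pub_record(fields):
    """Return the text of a Pub entry built from a dict of tag -> value."""
    fields = dict(fields, TYPE="Pub")
    lines = [f"{tag}: {fields[tag]}" for tag in PUB_TAG_ORDER if tag in fields]
    lines.append("||")
    return "\n".join(lines) + "\n"
```

With the Genomics example, `format_authors([("Gieser", "L"), ("Swaroop", "A")])` gives the AUTHORS string shown in that record, and `PUB_STATUS["published"]` gives its STATUS code.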
TYPE: Entry type - must be "Lib" for library entries. **Obligatory field**
NAME: Name of library. **Obligatory field**
ORGANISM: Organism from which library prepared. Scientific name. **Obligatory field**
STRAIN: Organism strain
CULTIVAR: Plant cultivar
ISOLATE: Individual isolate from which the sequence was obtained
SEX: Sex of organism (female, male, hermaphrodite)
ORGAN: Organ name
TISSUE: Tissue type
CELL_TYPE: Cell type
CELL_LINE: Name of cell line
STAGE: Developmental stage
HOST: Laboratory host
VECTOR: Name of vector.
V_TYPE: Type of vector (Cosmid, Phage, Plasmid, YAC, other)
RE_1: Restriction enzyme at site1 of vector
RE_2: Restriction enzyme at site2 of vector
DESCR: Description of library preparation methods, vector, etc. Text starts on the same line or the line below the DESCR: tag.
||
TYPE: Lib
NAME: Rat embryonic day 17 post-fertilization Library
ORGANISM: Rattus norvegicus
STRAIN: Sprague-Dawley
SEX: male
STAGE: embryonic day 17 post-fertilization
TISSUE: aorta
CELL_TYPE: vascular smooth muscle
DESCR:
||
Lib data template with required and most often used fields:
TYPE: Lib
NAME:
ORGANISM:
STRAIN:
CULTIVAR:
SEX:
ORGAN:
TISSUE:
CELL_TYPE:
CELL_LINE:
STAGE:
HOST:
VECTOR:
V_TYPE:
RE_1:
RE_2:
DESCR: description
||
TYPE: Entry type - must be "Cont" for contact entries. **Obligatory field**
NAME: Name of person submitting the EST. **Obligatory field**
FAX: Fax number as string of digits.
TEL: Telephone number as string of digits.
EMAIL: E-mail address
LAB: Laboratory providing EST.
INST: Institution name
ADDR: Address string, comma delineation.
||
TYPE: Cont
NAME: Sikela JM
FAX: 303 270 7097
TEL: 303 270
EMAIL: email@example.com
LAB: Department of Pharmacology
INST: University of Colorado Health Sciences Center
ADDR: Box C236, 4200 E. 9th Ave., Denver, CO 80262-0236, USA
||
Contact data template with required and most often used fields:
TYPE: Cont
NAME:
FAX:
TEL:
EMAIL:
LAB:
INST:
ADDR:
||
TYPE: Entry type - must be "EST" for EST entries. **Obligatory field**
STATUS: Status of EST entry - "New" or "Update". **Obligatory field**
CONT_NAME: Name of contact (must be identical string to the contact entry) **Obligatory field**
CITATION: Journal citation. (Must be identical string to the publication title) Begins on the same line or the line below tag - use continuation lines if necessary. **Obligatory field**
LIBRARY: Library name. (Must be identical string to library name entry.) **Obligatory field**
EST#: EST id assigned by contact lab. For EST updates, this is the string we match on. Length limit: 64 characters. **Obligatory field**
GB#: GenBank accession number
GDB#: Genome database accession number
GDB_DSEG: Genome database Dsegment number
CLONE: Clone id.
SOURCE: Source providing clone e.g. ATCC
SOURCE_DNA: Source id number for the clone as pure DNA
SOURCE_INHOST: Source id number for the clone stored in the host.
OTHER_EST: Other ESTs on this clone.
DBNAME: Database name for cross-reference to another database
DBXREF: Database cross-reference accession
PCR_F: Forward PCR primer sequence
PCR_B: Backward PCR primer sequence
INSERT: Insert length (in bases)
ERROR: Estimated error in insert length (bases)
PLATE: Plate number or code
ROW: Row number or letter
COLUMN: Column number or letter
SEQ_PRIMER: Sequencing primer description or sequence.
P_END: Which end sequenced e.g. 5'
HIQUAL_START: Base position of start of highest quality sequence (default=1)
HIQUAL_STOP: Base position of last base of highest quality sequence.
DNA_TYPE: cDNA (default), Genomic, Viral, Synthetic, Other
PUBLIC: Date of public release. Leave blank for immediate release. Format: 9/11/1994 (MM/DD/YYYY) **Obligatory field**
PUT_ID: Putative identification of sequence by submitter.
TAG_LIB: Name of library whose tag is found in this sequence.
TAG_TISSUE: Tissue that was source for the tagged library, if a library tag was found.
TAG_SEQ: The actual sequence of the library tag found in the EST read. If the tag was searched for and not found, put 'Not found' in this field.
POLYA: Y or N to indicate if a polyA tail was or was not found in the EST sequence.
COMMENT: Comments about EST. Starts on the same line or the line below COMMENT: tag.
SEQUENCE: Sequence string. Starts on the same line or the line below SEQUENCE: tag. **Obligatory field**
||
TYPE: EST
STATUS: New
CONT_NAME: Kerlavage AR
EST#: HHC189f
CLONE: HHC189
SOURCE: ATCC
SOURCE_INHOST: 65128
OTHER_EST: HHC189r
CITATION: Complementary DNA sequencing: expressed sequence tags and human genome project
SEQ_PRIMER: M13 Forward
P_END: 5'
HIQUAL_START: 1
HIQUAL_STOP: 285
DNA_TYPE: cDNA
LIBRARY: Hippocampus, Stratagene (cat. #936205)
PUBLIC:
PUT_ID: Actin, gamma, skeletal
COMMENT: This is a comment about the sequence. It may span several lines.
SEQUENCE:
AATCAGCCTGCAAGCAAAAGATAGGAATATTCACCTACAGTGGGCACCTCCTTAAGAAGCTG
ATAGCTTGTTACACAGTAATTAGATTGAAGATAATGGACACGAAACATATTCCGGGATTAAA
CATTCTTGTCAAGAAAGGGGGAGAGAAGTCTGTTGTGCAAGTTTCAAAGAAAAAGGGTACCA
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
TGTTAGGAAATGGCAAAGTATTGATGATTGTGTGCTATGTGATTGGTGCTAGATACTTTAAC
TGAGTATACGAGTGAAATACTTGAGACTCGTGTCACTT
||
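The HIQUAL_START and HIQUAL_STOP fields mark a sub-range of the sequence. A minimal Python sketch of extracting that range follows, under the assumption that positions are 1-based and that HIQUAL_STOP is inclusive (it is described as the position of the last high-quality base, and HIQUAL_START defaults to 1). The function name is hypothetical.

```python
def high_quality_region(sequence, hiqual_start=1, hiqual_stop=None):
    """Return the bases from hiqual_start to hiqual_stop, 1-based inclusive."""
    # The SEQUENCE value may span several lines; drop all whitespace first.
    bases = "".join(sequence.split())
    if hiqual_stop is None:
        hiqual_stop = len(bases)  # assumption: no stop means "to the end"
    if not 1 <= hiqual_start <= hiqual_stop <= len(bases):
        raise ValueError("HIQUAL positions fall outside the sequence")
    return bases[hiqual_start - 1:hiqual_stop]
```

For the example entry, with HIQUAL_START 1 and HIQUAL_STOP 285, this would return the first 285 bases of the read.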
EST data template with required and most often used fields:
TYPE: EST
STATUS:
CONT_NAME:
CITATION: publication title
LIBRARY:
EST#:
CLONE:
SOURCE:
SOURCE_DNA:
SOURCE_INHOST:
PCR_F:
PCR_B:
INSERT:
ERROR:
PLATE:
ROW:
COLUMN:
SEQ_PRIMER:
P_END:
HIQUAL_START:
HIQUAL_STOP:
DNA_TYPE:
PUBLIC:
PUT_ID:
POLYA:
COMMENT: comments
SEQUENCE: sequence
||
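Before sending files, a submitter might check the obligatory fields with something like the following Python sketch. It is not an official validator: `check_record` is a hypothetical helper that works on a record held as a dict of tag to value. The obligatory lists are taken from the field descriptions in this document. PUBLIC is treated as a tag that must be present but may be blank, and Update entries are only checked for TYPE, STATUS, EST#, and CONT_NAME.

```python
OBLIGATORY = {
    "Pub": ["TYPE", "TITLE", "AUTHORS", "YEAR", "STATUS"],
    "Lib": ["TYPE", "NAME", "ORGANISM"],
    "Cont": ["TYPE", "NAME"],
    "EST": ["TYPE", "STATUS", "CONT_NAME", "CITATION", "LIBRARY",
            "EST#", "PUBLIC", "SEQUENCE"],
}
UPDATE_FIELDS = ["TYPE", "STATUS", "EST#", "CONT_NAME"]
MAY_BE_BLANK = {"PUBLIC"}  # blank PUBLIC means immediate release


def check_record(record):
    """Return a list of problems found in one record; empty if none."""
    entry_type = record.get("TYPE", "")
    if entry_type not in OBLIGATORY:
        return [f"unknown TYPE {entry_type!r}"]
    status = record.get("STATUS", "").strip()
    required = OBLIGATORY[entry_type]
    if entry_type == "EST" and status == "Update":
        required = UPDATE_FIELDS
    problems = []
    for tag in required:
        if tag not in record:
            problems.append(f"{tag} is missing")
        elif not record[tag].strip() and tag not in MAY_BE_BLANK:
            problems.append(f"{tag} is empty")
    if entry_type == "EST":
        if status and status not in ("New", "Update"):
            problems.append("STATUS must be New or Update")
        if len(record.get("EST#", "")) > 64:
            problems.append("EST# is longer than 64 characters")
    return problems
```

An empty list means no problems were found by these checks; it does not guarantee that the entry will be accepted.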
The following pairs of fields must contain identical strings:
CONT_NAME field of the EST file and NAME field of the Contact file.
LIBRARY field of the EST file and NAME field of the Library file.
CITATION field of the EST file and TITLE field of the Publication file.
We scan these fields from the EST file and match them automatically to Library, Contact, and Publication records in the other tables, so content, spelling, letter case, and spacing must match.
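The exact-match requirement can be checked locally before submission. The Python sketch below uses hypothetical names and assumes the records are held as dicts of tag to value. It compares strings exactly, so differences in case or spacing count as mismatches. A reported mismatch is not necessarily an error, because the referenced Publication, Library, or Contact record may already be in the database and therefore absent from the batch.

```python
# EST field -> (TYPE of the record it refers to, field in that record)
LINKS = {
    "CONT_NAME": ("Cont", "NAME"),
    "LIBRARY": ("Lib", "NAME"),
    "CITATION": ("Pub", "TITLE"),
}


def unmatched_references(records):
    """Return (EST#, field, value) for EST fields with no exact match."""
    known = {est_field: set() for est_field in LINKS}
    for record in records:
        for est_field, (target_type, target_field) in LINKS.items():
            if record.get("TYPE") == target_type and target_field in record:
                known[est_field].add(record[target_field])
    problems = []
    for record in records:
        if record.get("TYPE") != "EST":
            continue
        for est_field in LINKS:
            value = record.get(est_field)
            # Plain string comparison: case and spacing must be identical.
            if value not in known[est_field]:
                problems.append((record.get("EST#"), est_field, value))
    return problems
```

For instance, an EST entry whose CONT_NAME is "sikela JM" would be reported when the Contact record's NAME is "Sikela JM".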