The DDBJ/EMBL/GenBank

                            Feature Table:
                                Definition

 

 

 

                                                    Version 7  Oct 2007

 

 

 

 

 

              DNA Data Bank of Japan, Mishima, Japan.

     EMBL Nucleotide Sequence Database, Cambridge, UK.

                   GenBank, NCBI, Bethesda, MD, USA.


1 Introduction
2 Overview of the Feature Table format
2.1 Format Design
2.2 Key aspects of this feature table design
2.3 Feature Table Terminology
3 Feature table components and format
3.1 Naming conventions
3.2 Feature keys
3.2.1 Purpose
3.2.2 Format and conventions
3.2.3 Key groups and hierarchy
3.2.4 Feature key examples
3.3 Qualifiers
3.3.1 Purpose
3.3.2 Format and conventions
3.3.3 Qualifier values
3.3.4 Qualifier examples
3.4 Feature labels
3.4.1 Purpose
3.4.2 Format and conventions
3.4.3 Examples of feature labels
3.5 Location
3.5.1 Purpose
3.5.2 Format and conventions
3.5.3 Location examples
4 Feature table Format
4.1 Format examples
4.2 Definition of line types
4.3 Data item positions
4.4 Use of blanks
5 Examples of sequence annotation
5.1 Eukaryotic gene
5.2 Bacterial operon
5.3 Artificial cloning vector (circular)
5.4 Plasmid
5.5 Repeat element
5.6 Immunoglobulin heavy chain
5.7 T-cell receptor
5.8 transfer RNA
6 Limitations of this feature table design
7 Appendices
7.1 Appendix I: EMBL, GenBank and DDBJ entries
7.1.1 EMBL Format
7.1.2 GenBank Format
7.1.3 DDBJ Format
7.2 Appendix II: Feature table: Backus-Naur form
7.3 Appendix III: Feature keys reference
7.4 Appendix IV: Summary of qualifiers for feature keys
7.4.1 Qualifier List
7.4.2 Feature qualifiers mapped to Feature keys
7.5 Appendix V: Controlled vocabularies
7.5.1 Nucleotide base codes (IUPAC)
7.5.2 Modified base abbreviations
7.5.3 Amino acid abbreviations
7.5.4 Modified and unusual Amino Acids
7.5.5 Genetic Code Tables
7.5.6 Country Names

 


1 Introduction

Nucleic acid sequences provide the fundamental starting point for describing and understanding the structure, function, and development of genetically diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their inception used tables of sites and features to describe the roles and locations of higher order sequence domains and elements within the genome of an organism.
In February, 1986, GenBank and EMBL began a collaborative effort (joined by DDBJ in 1987) to devise a common feature table format and common standards for annotation practice.

2 Overview of the Feature Table format

The overall goal of the feature table design is to provide an extensive vocabulary for describing features in a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three databases to exchange data on a daily basis.

The range of features to be represented is diverse, including regions which:

·        perform a biological function,

·        affect or are the result of the expression of a biological function,

·        interact with other molecules,

·        affect replication of a sequence,

·        affect or are the result of recombination of different sequences,

·        are a recognizable repeated unit,

·        have secondary or tertiary structure,

·        exhibit variation, or have been revised or corrected.

 

 

2.1 Format Design

The format design is based on a tabular approach and consists of the following items:

Feature key

a single word or abbreviation indicating functional group 

Location

instructions for finding the feature

Qualifiers

auxiliary information about a feature


 

2.2 Key aspects of this feature table design

·        Feature keys allow specific annotation of important sequence features.

·        Related features can be easily specified and retrieved.

Feature keys are arranged hierarchically, allowing complex and compound features to be expressed. Both location operators and the feature keys show feature relationships even when the features are not contiguous. The hierarchy of feature keys allows broad categories of biological functionality, such as rRNAs, to be easily retrieved.

·        Generic feature keys provide a means for entering new or undefined features.
A number of "generic" or miscellaneous feature keys have been added to permit annotation of features that cannot be adequately described by existing feature keys. These generic feature keys will serve as an intermediate step in the identification and addition of new feature keys. The syntax has been designed to allow the addition of new feature keys as they are required.

·        More complex locations (fuzzy and alternate ends, for example) can be specified.
Each end point of a feature may be specified as a single point, an alternate set of possible end points, a base number beyond which the end point lies, or a region which contains the end point.

·        Features can be combined and manipulated in many different ways.

The location field can contain operators or functional descriptors specifying what must be done to the sequence to reproduce the feature. For example, a series of exons may be "join"ed into a full coding sequence.

·        Standardized qualifiers provide precision and parsability of descriptive details.

A combination of standardized qualifiers and their controlled-vocabulary values enables free-text descriptions to be avoided.
 

·        The nature of supporting evidence for a feature can be explicitly indicated.

Features, such as open reading frames or sequences showing sequence similarity to consensus sequences, for which there is no direct experimental evidence can be annotated. Therefore, the feature table can incorporate contributions from researchers doing computational analysis of the sequence databases. However, all features that are supported by experimental data will be clearly marked as such.

·        The table syntax has been designed to be machine parsable.

A consistent syntax allows machine extraction and manipulation of sequences coding for all features in the table.


2.3 Feature Table Terminology

The format and wording in the feature table use common biological research terminology whenever possible. For example, an item in the feature table such as:

Key             Location/Qualifiers

CDS             23..400

                /product="alcohol dehydrogenase"

                /gene="adhI"

 

might be read as:

The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called "alcohol dehydrogenase" and is coded for by a gene called "adhI".

A more complex description:

Key             Location/Qualifiers

CDS             join(544..589,688..>1032)

                /product="T-cell receptor beta-chain"


which might be read as:

This feature, which is a partial coding sequence, is formed by joining the indicated elements to form one contiguous sequence encoding a product called T-cell receptor beta-chain.

The following sections contain detailed explanations of the feature table design showing conventions for each component of the feature table, examples of how the format might be implemented, a description of the exact column placement of all the data items and examples of complete sequence entries that have been annotated using the new format. The last section of this document describes known limitations of the current feature table design.

Appendix I gives an example database entry for the DDBJ, GenBank and EMBL formats. Appendix II describes the format in Backus-Naur Form (BNF). Appendices III and IV provide reference manuals for the feature table keys and qualifiers, respectively. Appendix V includes controlled vocabularies such as nucleotide base codes, modified base abbreviations and genetic code tables.

This document defines the syntax and vocabulary of the feature table. The syntax is sufficiently flexible to allow expression of a single biological entity in numerous ways. In such cases, the annotation staffs at the databases will propose conventions for standard means of denoting the entities.

This feature table format is shared by GenBank, EMBL and DDBJ. Comments, corrections, and suggestions may be submitted to any of the database staffs. New format specifications will be added as needed.


3 Feature table components and format

3.1 Naming conventions

Feature table components, including feature keys, qualifiers, accession numbers, database name abbreviations, feature labels, and location operators, are all named following the same conventions. Component names may be no more than 20 characters long (feature keys 15, feature qualifiers 20) and must contain at least one letter. Case should not be regarded as significant in comparing feature labels ("Prot1" and "pROT1" are the same). The following characters are permitted to occur in feature table component names:

·        Uppercase letters (A-Z)

·        Lowercase letters (a-z)

·        Numbers (0-9)

·        Underscore (_)

·        Hyphen (-)

·        Single quotation mark or apostrophe (')

·        Asterisk (*)
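
The naming rules above can be checked mechanically. The following Python sketch is one possible reading of them; the function names and the `max_len` parameter are hypothetical and are not part of this definition.

```python
import re

# Characters permitted in component names (Section 3.1): letters, digits,
# underscore, hyphen, apostrophe and asterisk.
_NAME_RE = re.compile(r"[A-Za-z0-9_\-'*]+")


def is_valid_component_name(name: str, max_len: int = 20) -> bool:
    """Return True if `name` satisfies the naming rules of Section 3.1.

    `max_len` is 20 by default; pass 15 when checking a feature key.
    """
    if not 0 < len(name) <= max_len:
        return False
    if not _NAME_RE.fullmatch(name):
        return False
    # A name must contain at least one letter.
    return any(ch.isalpha() for ch in name)


def same_label(a: str, b: str) -> bool:
    """Compare two feature labels without regard to case."""
    return a.lower() == b.lower()
```

For instance, `is_valid_component_name("misc_feature", max_len=15)` accepts the key, while a name made only of digits or one containing a blank is rejected.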

3.2 Feature keys

3.2.1 Purpose

Feature keys indicate (1) the biological nature of the annotated feature or (2) information about changes to or other versions of the sequence. The feature key permits a user to quickly find or retrieve similar features or features with related functions.

3.2.2 Format and conventions

There is a defined list of allowable feature keys, which is shown in Appendix III. Each feature must contain a feature key.


3.2.3 Key groups and hierarchy

The feature keys fall into families which are in some sense similar in function and which are annotated in a similar manner. A functional family may have a "generic" or miscellaneous key, which can be recognized by the 'misc_' prefix, that can be used for instances not covered by the other defined keys of that group.

The feature key groups are listed below with a short definition and an annotation example:

1.    Difference and change features
Indicate ways in which a sequence should be changed to produce a different "version":

misc_difference location
              /replace="change_location"

2.    Expression signal features
Indicate regions containing a signal that alters a biological function:

misc_signal     location

3.    Transcript features
 Indicate products made by a region:

misc_RNA        location

4.    Binding features
Indicate that a sequence or nucleotide is covalently, non-covalently, or otherwise bound to something else:

misc_binding    location
              /bound_moiety="bound molecule"

5.    Repeat features
Indicate repetitive sequence elements:

repeat_region   location

6.    Recombination features
Indicate regions that have been either inserted or deleted by recombination:

misc_recomb     location

7.    Structure features
Indicate sequence for which there is secondary or tertiary structural information:

misc_structure  location

In addition to the functional groupings shown above, the feature keys can also be arranged in a hierarchical tree based on the degree of specificity or level of detail known about a feature. This hierarchy is shown in outline form in Appendix III where the most general level is the 'misc_feature' key and other keys are arranged in increasing level of detail. By using more general keys, features can be annotated even if their biological functions are insufficiently well characterized to assign them more specific keys.

3.2.4 Feature key examples

Key                     Description     

 

CDS                     Protein-coding sequence

RBS                     ribosome binding site

rep_origin              Origin of replication

protein_bind            Protein binding site on DNA

tRNA                    mature transfer RNA

 

See Appendix III for descriptions of all feature keys.

3.3 Qualifiers

3.3.1 Purpose

Qualifiers provide a general mechanism for supplying information about features in addition to that conveyed by the key and location.

3.3.2 Format and conventions

Qualifiers take the form of a slash (/) followed by the qualifier name and, if applicable, an equal sign (=) and a value. Each qualifier should have a single value; if multiple values are necessary, these should be represented by iterating the same qualifier, e.g.:

Key             Location/Qualifiers

 

CDS             1..1000

                /codon=(seq:"cug",aa:Ser)

                /codon=(seq:"tga",aa:Trp)

 

If the location descriptor does not need a continuation line, the first qualifier begins a new line in the feature location column. If the location descriptor requires a continuation line, the first qualifier may follow immediately after the location. Any necessary continuation lines begin in the same column. See Section 4 for a complete description of data item positions.
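
As an illustration of the qualifier form described above, the following Python sketch splits a qualifier into its name and value and groups iterated qualifiers under one name. The function names are hypothetical, and the sketch assumes each qualifier has already been assembled into a single string (continuation lines joined).

```python
def parse_qualifier(text: str):
    """Split '/name=value' or '/name' into (name, value).

    The value is returned exactly as written (quotes included) and is
    None for qualifiers that carry no value, such as '/rearranged'.
    """
    if not text.startswith("/"):
        raise ValueError("a qualifier must begin with a slash")
    name, sep, value = text[1:].partition("=")
    if not name:
        raise ValueError("missing qualifier name")
    return name, (value if sep else None)


def collect_qualifiers(lines):
    """Map each qualifier name to the list of values given for it."""
    grouped = {}
    for line in lines:
        name, value = parse_qualifier(line.strip())
        grouped.setdefault(name, []).append(value)
    return grouped
```

Applied to the two '/codon' lines of the example above, `collect_qualifiers` returns a single 'codon' entry holding both values in their original order.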


 

3.3.3 Qualifier values


Since qualifiers convey many different types of information, there are several value formats:

1.    Free text

2.    Controlled vocabulary or enumerated values

3.    Citation or reference numbers

4.    Sequences

5.    Feature labels

3.3.3.1 Free text

Most qualifier values will be a descriptive text phrase which must be enclosed in double quotation marks. When the text occupies more than one line, a single set of quotation marks is required at the beginning and at the end of the text. The text itself may be composed of any printable characters (ASCII values 32-126 decimal). If double quotation marks are used within a free text string, each set (") must be 'escaped' by placing a second double quotation mark immediately before it (""). For example:

              /note="This is an example of ""escaped"" quotation marks"
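
The escaping rule can be sketched in Python as a pair of helper functions. The names `quote_free_text` and `unquote_free_text` are hypothetical; the sketch assumes the value has already been joined into one string if it spanned several lines.

```python
def quote_free_text(text: str) -> str:
    """Enclose free text in double quotation marks, doubling embedded ones."""
    return '"' + text.replace('"', '""') + '"'


def unquote_free_text(value: str) -> str:
    """Reverse quote_free_text: strip the enclosing quotes, undo the doubling."""
    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        raise ValueError("free text must be enclosed in double quotation marks")
    return value[1:-1].replace('""', '"')
```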

3.3.3.2 Controlled vocabulary or enumerated values

Some qualifiers require values from a controlled vocabulary and are entered without quotation marks. For example, the '/direction' qualifier has only three values: 'left', 'right' or 'both'. Qualifier value controlled vocabularies, like feature table component names, must be treated as completely case insensitive: they may be entered and displayed in any combination of upper and lower case ('/direction=Left' '/direction=left' and '/direction=LEFT' are all legal and all convey the same meaning). The database staffs reserve the right to regularize the case of qualifier values in the interest of readability, unlike the case of feature labels where the databases will maintain the case as originally entered (see Section 3.4.2). Qualifier value controlled vocabularies will be maintained by the cooperating database staffs. Examples of controlled vocabularies can be found in Appendices IV and V. The database staff should be contacted for the current lists.
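
A minimal Python sketch of case-insensitive handling for the '/direction' qualifier follows. The function name and the choice to regularize to lower case are assumptions made for this example; the three permitted values are those given above.

```python
# The three values permitted for /direction, held in lower case.
DIRECTION_VALUES = {"left", "right", "both"}


def normalize_direction(value: str) -> str:
    """Return the lower-case form of a /direction value, or raise ValueError."""
    lowered = value.lower()
    if lowered not in DIRECTION_VALUES:
        raise ValueError(f"not a permitted /direction value: {value!r}")
    return lowered
```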

3.3.3.3 Citation or reference numbers

The citation or published reference number (as enumerated in the entry 'REFERENCE' or 'RN' data item) should be enclosed in square brackets (e.g., [3]) to distinguish it from other numbers.

3.3.3.4 Sequences

Literal sequences of nucleotide bases in location descriptors, e.g., join(12..45,"atgcatt",988..1050), have been illegal since the implementation of version 2.1 of the Feature Table Definition Document (December 15, 1998).

3.3.4 Qualifier examples

 

Key             Location/Qualifiers

 

source          1..1509

                /organism="Mus musculus"

                /strain="CD1"

                /mol_type="genomic DNA"

promoter        <1..9

                /gene="ubc42"

mRNA            join(10..567,789..1320)

                /gene="ubc42"

CDS             join(54..567,789..1254)

                /gene="ubc42"

                /product="ubiquitin conjugating enzyme"

                /function="cell division control"

3.4 Feature labels

The /label= qualifier takes as its value a feature label. Feature labels follow the same naming conventions as other feature table components (e.g., keys and qualifiers). While feature labels are optional, attaching a label to a feature allows it to be referred to unambiguously. For example, the feature label can be used to refer unambiguously to a coding region that resides in a different entry from the exons of which it is composed.

3.4.1 Purpose

The feature label identifies a feature item within an entry and, when combined with the entry's primary accession number and the name of the database from which it came, is a permanent internationally unique tag for that feature. There are, however, certain situations in which a "permanent" feature may "disappear" from the distributed version of the database and others in which it may be desirable to change a feature's label. 

3.4.2 Format and conventions

Each feature in a feature table may have a label which must be unique within that entry, but which may be the same as feature labels used in other entries. A feature can be given any label. However, labels containing meaningful abbreviations will be much more easily remembered than non-descriptive labels. Because letter case is not significant, two features within one entry cannot have labels that differ only in case: '16S_rRNA' and '16s_rRNA' could not both be used in the same entry.

The full feature name syntax is as follows:

          Database name::primary accession number:feature label

References to a feature should use as much of the full feature name as required to unambiguously identify the feature.

3.4.3 Examples of feature labels

Feature label           Description    

 

adhI                    adhI gene coding for alcohol dehydrogenase

tfp35                   tail fiber protein 35

3'-ltr                  long terminal repeat

a1col_x51               prepro-alpha-1-collagen, exon 51

X10045:diff1            first conflict for the sequence of entry X10045

GB::K10675:catexA       feature with label catexA in entry K10675 of the

                        GenBank databank
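
The full feature name syntax of Section 3.4.2 can be taken apart as in the following Python sketch. The `FeatureName` type and `parse_feature_name` function are hypothetical names; the sketch assumes that a reference is either a bare label, an accession number and label, or the complete three-part name.

```python
from typing import NamedTuple, Optional


class FeatureName(NamedTuple):
    database: Optional[str]   # database name abbreviation, if given
    accession: Optional[str]  # primary accession number, if given
    label: str                # feature label


def parse_feature_name(text: str) -> FeatureName:
    """Parse 'database::accession:label', 'accession:label' or 'label'."""
    database = None
    rest = text
    if "::" in text:
        database, rest = text.split("::", 1)
    accession = None
    label = rest
    if ":" in rest:
        accession, label = rest.split(":", 1)
    if not label:
        raise ValueError(f"no feature label in {text!r}")
    return FeatureName(database, accession, label)
```

Because a colon is not among the characters permitted in component names, splitting on it does not risk cutting a label in two.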

3.5 Location

3.5.1 Purpose

The location indicates the region of the presented sequence which corresponds to a feature.

3.5.2 Format and conventions

The location contains at least one sequence location descriptor and may contain one or more operators with one or more sequence location descriptors. Base numbers refer to the numbering in the entry. This numbering designates the first base (5' end) of the presented sequence as base 1.

Base locations beyond the range of the presented sequence may not be used in location descriptors, the only exception being location in a remote entry (see 3.5.2.1, e).  

 

Location operators and descriptors are discussed in more detail below. 

 

3.5.2.1 Location descriptors

The location descriptor can be one of the following:

(a) a single base number

(b) a site between two indicated adjoining bases

(c) a single base chosen from within a specified range of bases (not allowed for new

    entries)

(d) the base numbers delimiting a sequence span

(e) a remote entry identifier followed by a local location descriptor

    (i.e., a-d)

 

A site between two adjoining nucleotides, such as an endonucleolytic cleavage site, is indicated by listing the two points separated by a caret (^). The permitted formats for this descriptor are n^n+1 (for example, 55^56) or, for circular molecules, n^1, where "n" is the full length of the molecule (i.e., 1000^1 for a circular molecule of length 1000).

A single base chosen from a range of bases is indicated by the first base number and the last base number of the range separated by a single period (e.g., '12.21' indicates a single base taken from between the indicated points). From October 2006 the usage of this descriptor is restricted: it is illegal to use "a single base from a range" (c) either on its own or in combination with the "sequence span" (d) descriptor for newly created entries. The existing entries where such descriptors exist are going to be retrofitted.

 

Sequence spans are indicated by the starting base number and the ending base number separated by two periods (e.g., '34..456'). The '<' and '>' symbols may be used with the starting and ending base numbers to indicate that an end point is beyond the specified base number. The starting and ending base positions can be represented as distinct base numbers ('34..456') or a site between two indicated adjoining bases.

 

A location in a remote entry (not the entry to which the feature table belongs) can be specified by giving the accession number and sequence version of the remote entry, followed by a colon ":", followed by a location descriptor which applies to that entry's sequence (e.g., J12345.1:1..15; see also the examples below).
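
The descriptor forms (a) to (e) can be recognized with a few regular expressions, as in the following Python sketch. The `Descriptor` type, the `parse_descriptor` function and the `circular_length` parameter are hypothetical. The sketch assumes that '<' appears only on the starting base and '>' only on the ending base of a span, and that a remote entry identifier has the form accession.version.

```python
import re
from typing import NamedTuple, Optional


class Descriptor(NamedTuple):
    kind: str                     # "base", "site", "one_of" or "span"
    start: int
    end: int
    start_fuzzy: bool = False     # True when the start carries '<'
    end_fuzzy: bool = False       # True when the end carries '>'
    remote: Optional[str] = None  # accession.version of a remote entry


_REMOTE = re.compile(r"([A-Za-z][A-Za-z0-9_]*\.\d+):(.+)")
_SPAN = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")
_SITE = re.compile(r"(\d+)\^(\d+)")
_ONE_OF = re.compile(r"(\d+)\.(\d+)")
_BASE = re.compile(r"\d+")


def parse_descriptor(text: str, circular_length: Optional[int] = None) -> Descriptor:
    """Parse one location descriptor (no operators)."""
    remote = None
    m = _REMOTE.fullmatch(text)
    if m:
        remote, text = m.group(1), m.group(2)
    m = _SPAN.fullmatch(text)
    if m:
        return Descriptor("span", int(m.group(2)), int(m.group(4)),
                          m.group(1) == "<", m.group(3) == ">", remote)
    m = _SITE.fullmatch(text)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        # n^1 is permitted only on a circular molecule of length n.
        wraps = circular_length is not None and (start, end) == (circular_length, 1)
        if end != start + 1 and not wraps:
            raise ValueError(f"invalid site descriptor: {text}")
        return Descriptor("site", start, end, remote=remote)
    m = _ONE_OF.fullmatch(text)
    if m:
        # Form (c); not allowed for new entries but present in older ones.
        return Descriptor("one_of", int(m.group(1)), int(m.group(2)), remote=remote)
    if _BASE.fullmatch(text):
        return Descriptor("base", int(text), int(text), remote=remote)
    raise ValueError(f"unrecognized location descriptor: {text}")
```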

3.5.2.2 Operators

The location operator is a prefix that specifies what must be done to the indicated sequence to find or construct the location corresponding to the feature. A list of operators is given below with their definitions and most common format.

complement(location)

Find the complement of the presented sequence in the span specified by "location" (i.e., read the complement of the presented strand in its 5'-to-3' direction)

join(location,location, ... location)

The indicated elements should be joined (placed end-to-end) to form one contiguous sequence

order(location,location, ... location)

The elements can be found in the specified order (5' to 3' direction), but nothing is implied about the reasonableness of joining them

 

Note : location operator "complement" can be used in combination with either "join" or "order" within the same location; combinations of "join" and "order" within the same location (nested operators) are illegal.
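
The restriction in the note above lends itself to a simple check. The Python sketch below flags any location string in which both "join" and "order" occur; the function name is hypothetical, and the check deliberately does nothing beyond detecting the two operator names.

```python
import re


def mixes_join_and_order(location: str) -> bool:
    """Return True if both join() and order() occur in one location."""
    operators = set(re.findall(r"\b(join|order)\(", location))
    return operators == {"join", "order"}
```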


 

 

3.5.3 Location examples


The following is a list of common location descriptors with their meanings:

 

Location                  Description  

 

467                       Points to a single base in the presented sequence

 

340..565                  Points to a continuous range of bases bounded by and

                          including the starting and ending bases

 

<345..500                 Indicates that the exact lower boundary point of a feature

                          is unknown.  The location begins at some  base previous to

                          the first base specified (which need not be contained in

                          the presented sequence) and continues to and includes the

                          ending base

 

<1..888                   The feature starts before the first sequenced base and

                          continues to and includes base 888

 

1..>888                   The feature starts at the first sequenced base and

                          continues beyond base 888

 

102.110                   Indicates that the exact location is unknown but that it is

                          one of the bases between bases 102 and 110, inclusive

 

123^124                   Points to a site between bases 123 and 124

 

join(12..78,134..202)     Regions 12 to 78 and 134 to 202 should be joined to form

                          one contiguous sequence

 

 

complement(34..126)       Start at the base complementary to 126 and finish at the

                          base complementary to base 34 (the feature is on the strand

                          complementary to the presented strand)

 

 

complement(join(2691..4571,4918..5163))

                          Joins regions 2691 to 4571 and 4918 to 5163, then

                          complements the joined segments (the feature is on the

                          strand complementary to the presented strand)

 

join(complement(4918..5163),complement(2691..4571))

                          Complements regions 4918 to 5163 and 2691 to 4571, then

                          joins the complemented segments (the feature is on the

                          strand complementary to the presented strand)

 

J00194.1:100..202         Points to bases 100 to 202, inclusive, in the entry (in

                          this database) with primary accession number 'J00194'

 

join(1..100,J00194.1:100..202)

                          Joins region 1..100 of the existing entry with the region

                          100..202 of remote entry J00194
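
To make the effect of the operators concrete, the following Python sketch extracts the sequence that a location designates from a presented sequence. It is a simplified illustration, not a complete implementation: the names `split_args` and `extract` are hypothetical; remote entries, between-base sites and "one of" descriptors are not handled; "order" is rejected because it does not define a contiguous sequence; and '<' and '>' are ignored, so a partial feature is cut at the stated base.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def split_args(text: str):
    """Split a comma-separated operand list, respecting nested parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def extract(location: str, sequence: str) -> str:
    """Return the bases designated by `location` (base 1 is sequence[0])."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        inner = extract(location[len("complement("):-1], sequence)
        # Complement each base, then reverse to read 5' to 3'.
        return inner.translate(_COMPLEMENT)[::-1]
    if location.startswith("join(") and location.endswith(")"):
        operands = split_args(location[len("join("):-1])
        return "".join(extract(part, sequence) for part in operands)
    if location.startswith("order("):
        raise ValueError("order() does not define a contiguous sequence")
    start, _, end = location.replace("<", "").replace(">", "").partition("..")
    first = int(start)
    last = int(end) if end else first
    return sequence[first - 1:last]
```

On the same sequence, a location of the form complement(join(a,b)) and the corresponding join(complement(b),complement(a)) yield the same bases, which is the equivalence shown by the last two complement examples in the list above.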


 

4 Feature table Format

This section describes the columnar format used to write the feature table in "flat-file" form for distributions of the database.

4.1 Format examples

Feature table format example (EMBL):

     source          1..1859

                     /db_xref="taxon:3899"

                     /organism="Trifolium repens"

                     /tissue_type="leaves"

                     /clone_lib="lambda gt10"

                     /clone="TRE361"

                     /mol_type="genomic DNA"

     CDS             14..1495

                     /db_xref="MENDEL:11000"

                     /db_xref="SWISS-PROT:P26204"

                     /note="non-cyanogenic"

                     /EC_number="3.2.1.21"

                     /product="beta-glucosidase"

                     /protein_id="CAA40058.1"

                     /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSR.......

---------+---------+---------+---------+---------+---------+---------+---------

1       10        20        30        40        50        60        70       79

 

Feature table format example (GenBank):

 

     source          1..8959

                     /organism="Homo sapiens"

                     /db_xref="taxon:9606"

                     /mol_type="genomic DNA"

     gene            212..8668

                     /gene="NF1"

     CDS             212..8668

                     /gene="NF1"

                     /note="putative"

                     /codon_start=1

                     /product="GAP-related protein"

                     /protein_id="AAA59924.1"

                     /translation="MAAHRPVEWVQAVVSRFDEQLPIKTGQQNTHTKVSTE.......

---------+---------+---------+---------+---------+---------+---------+---------

1       10        20        30        40        50        60        70       79

 

Feature table format example (DDBJ):

 

 

     source          1..2136

                     /clone="pK28"

                     /organism="Rattus norvegicus"

                     /strain="Sprague-Dawley"

                     /tissue_type="kidney"

                     /mol_type="genomic DNA"

     mRNA            19..2128

     CDS             31..1212

                     /codon_start=1

                     /function="Dual specificity protein tyrosine/threonine

                     kinase"

                     /product="MAP kinase kinase"

                     /protein_id="BAA02603.1"

                     /translation="MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKL.......

---------+---------+---------+---------+---------+---------+---------+---------

1       10        20        30        40        50        60        70       79

 

4.2 Definition of line types

The feature table consists of a header line, which contains the column titles for the table, and the individual feature entries. Each feature entry is composed of a feature descriptor line and qualifier and continuation lines, if needed. The feature descriptor line contains the feature's name, key, and location. If the location cannot be contained on the first line of the feature descriptor, it is continued on a continuation line immediately following the descriptor line. If the feature requires further attributes, feature qualifier lines immediately follow the corresponding feature descriptor line (or its continuation). Qualifier information that cannot be contained on one line continues on the following continuation lines as necessary.

Thus, there are 4 types of feature table lines:

      Line type            Content                 #/entry     #/feature

      ---------            -------                 -------     ---------

 

      Header               Column titles           1*          N/A

      Feature descriptor   Key and location        1 to many*  1

      Feature qualifiers   Qualifiers and values   N/A         0 to many

      Continuation lines   Feature descriptor or   0 to many   0 to many

                           qualifier continuation

 

4.3 Data item positions

The position of the data items within the feature descriptor line is as follows:

     column position    data item

     ---------------    ---------

 

     1-5                blank

     6-20               feature key

     21                 blank

     22-80              location

 

Data on the qualifier and continuation lines begin in column position 22 (the first 21 columns contain blanks). The EMBL format for all lines differs from the GenBank / DDBJ formats in that it includes a line type abbreviation in columns 1 and 2.
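
The column positions above translate directly into string slices. The following Python sketch writes and reads feature table lines and groups them into features; all function names are hypothetical. Two assumptions are made that this definition does not state: a continuation line is taken to belong to the most recent qualifier if one exists (otherwise to the location), and a continued free-text value is rejoined with a single space, which would not be appropriate for values such as /translation.

```python
def format_feature_line(key, data):
    """Lay out one line: key in columns 6-20, data from column 22."""
    return " " * 5 + (key or "").ljust(15) + " " + data


def parse_feature_line(line):
    """Return (key, data); key is None on qualifier and continuation lines.

    Columns 1-5 are ignored, so an EMBL line type abbreviation in
    columns 1 and 2 does not interfere.
    """
    key = line[5:20].strip() or None
    data = line[21:80].strip()
    return key, data


def group_features(lines):
    """Collect descriptor, qualifier and continuation lines into features."""
    features = []
    for line in lines:
        key, data = parse_feature_line(line)
        if key is None and not data:
            continue  # blank line
        if key is not None:
            features.append({"key": key, "location": data, "qualifiers": []})
        elif not features:
            raise ValueError("qualifier or continuation line before any feature")
        elif data.startswith("/"):
            features[-1]["qualifiers"].append(data)
        elif features[-1]["qualifiers"]:
            # Assumption: continued free text is rejoined with one space.
            features[-1]["qualifiers"][-1] += " " + data
        else:
            # Continuation of the location descriptor.
            features[-1]["location"] += data
    return features
```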

4.4 Use of blanks

Blanks (spaces) may, in general, be used within the feature location and qualifier values to make the construction more readable. The following rules should be observed:

·        Names of feature table components may not contain blanks (see Section 3.1)

·        Operator names may not be separated from the following open parenthesis (the beginning of the operand list) by blanks.

·        Qualifiers may not be separated from the preceding slash or the following equals sign (if present) by blanks.


 

5 Examples of sequence annotation

The examples below show the preferred sequence annotations for a number of commonly occurring sequence types. These examples may not be appropriate in all cases but should be used as a guide whenever possible.

 

5.1 Eukaryotic gene

source             1..1509
                   /organism="Mus musculus"

                   /strain="CD1"

                   /mol_type="genomic DNA"

promoter           <1..9

                   /gene="ubc42"

mRNA               join(10..567,789..1320)

                   /gene="ubc42"

CDS                join(54..567,789..1254)

                   /gene="ubc42"

                   /product="ubiquitin conjugating enzyme"

                   /function="cell division control"

                   /translation="MVSSFLLAEYKNLIVNPSEHFKISVNEDNLTEGPPDTLY

                   QKIDTVLLSVISLLNEPNPDSPANVDAAKSYRKYLYKEDLESYPMEKSLDECS

                   AEDIEYFKNVPVNVLPVPSDDYEDEEMEDGTYILTYDDEDEEEDEEMDDE"

exon               10..567

                   /gene="ubc42"

                   /number=1

intron             568..788

                   /gene="ubc42"

                   /number=1

exon               789..1320

                   /gene="ubc42"

                   /number=2

polyA_signal       1310..1317

                   /gene="ubc42"

 


 

 

5.2 Bacterial operon

source             1..9430
                   /organism="Lactococcus sp."
                   /strain="MG1234"
                   /mol_type="genomic DNA"

operon             160..6865
                   /operon="gal"
-35_signal         160..165
                   /operon="gal"
                   /experiment="experimental evidence, no additional details
                   recorded"
-10_signal         179..184
                   /operon="gal"
                   /experiment="experimental evidence, no additional details
                   recorded"
CDS                405..1934
                   /operon="gal"
                   /gene="galA"
                   /product="galactose permease"
                   /function="galactose transporter"
                   /experiment="experimental evidence, no additional details
                   recorded"
CDS                2003..3001
                   /operon="gal"
                   /gene="galM"
                   /product="aldose 1-epimerase"
                   /EC_number="5.1.3.3"
                   /function="mutarotase"
CDS                3235..4537
                   /operon="gal"
                   /gene="galK"
                   /product="galactokinase"
                   /EC_number="2.7.1.6"
                   /experiment="experimental evidence, no additional details
                   recorded"
mRNA               189..6865
                   /operon="gal"
                   /experiment="experimental evidence, no additional details
                   recorded"

5.3 Artificial cloning vector (circular)

source             1..5300

                   /organism="Cloning vector pABC"

                   /lab_host="Escherichia coli"

                   /mol_type="other DNA"

                   /focus

source             1..5138

                   /organism="Escherichia coli"

                   /mol_type="other DNA"

                   /strain="K12"

source             5139..5247

                   /organism="Aequorea victoria"

                   /mol_type="other DNA"

                   /dev_stage="adult"

source             5248..5300

                   /organism="Escherichia coli"

                   /mol_type="other DNA"

                   /strain="K12"

CDS                join(complement(<1..799),complement(5080..5120))

                   /gene="mob1"

                   /product="mobilization protein 1"

CDS                complement(1697..2512)

                   /gene="Km"

                   /product="kanamycin resistance protein"

CDS                3037..3711

                   /gene="rep1"

                   /product="replication protein 1"

CDS                complement(4170..4829)

                   /gene="Cm"

                   /product="chloramphenicol resistance protein"

CDS                5139..5247

                   /gene="GFP"

                   /product="green fluorescent protein"

 


 

5.4 Plasmid

source             1..2245

                   /organism="Escherichia coli"

                   /plasmid="Plasmid XYZ"

                   /strain="K12"

                   /mol_type="genomic DNA"

rep_origin         6

                   /direction=LEFT

                   /note="ori"

CDS                join(complement(567..795),complement(21..349))

                   /gene="trbC"

                   /product="transfer protein C"

CDS                803..1344

                   /gene="traN"

                   /product="transfer protein N"

CDS                1559..1985

                   /gene="incA"

                   /product="incompatibility protein A"

CDS                join(2004..2195,3..20)

                   /gene="finP"

                   /product="fertility inhibition protein P"

5.5 Repeat element

source             1..1011

                   /organism="Homo sapiens"

                   /clone="pha281u/1DO"

                   /mol_type="genomic DNA"

repeat_region      80..401

                   /rpt_type=DISPERSED

                   /rpt_family="Alu-J"


 

5.6 Immunoglobulin heavy chain

 

source             1..321

                   /organism="Mus musculus"

                   /strain="BALB/c"

                   /cell_line="hybridoma 1A4"

                   /rearranged

                   /mol_type="mRNA"

CDS                <1..>321

                   /codon_start=1

                   /gene="VFM1-DFL16.1-JH4"

                   /product="immunoglobulin heavy chain"

V_region           1..277