Schema Change Document Description:

 

The table that lists the schema change has three columns.  The first two columns show the build id. The third column has change description.

Table names are in bold. Only columns are indented and only columns with changes are shown under the table.

Please see data dictionary for full table details.

 ‘ -- -- --‘ means no corresponding objects ( table or column) for the build.

For example, in build 118, a new table “SNPAlleleFreq” is added.

 

Build 127 Schema Change

Date last updated: March 2, 2007

 

-          For human, SubInd table is replaced with a partitioned view of a set of underlying tables by chromosomes in two separate databases to keep each database size small for ease of maintenance: human_gty1 has SubInd for chromosome 1-7.  human_gyt2 has SubInd for chromosome 8-22, X, Y, Mitochondrial.

-          Table name format is: SubInd_ch1, SubInd_ch2 etc. SubInd_Un is for any snp that is mapped to a contig but the contig’s chromosome is unknown; SubInd_NotOn is for any snp that can not be aligned to any contigs.

 

 

 

Build 125 Schema Change

Date last updated: September 28, 2005

 

-          Major changes are in alignment types in mapping tables.  Detailed description of new mapping tables is in:  ftp://ftp.ncbi.nih.gov/snp/specs/b125_mapping.doc

-          The other major change is that starting b125 the old dbSNP database is split by organisms into 39 different snp databases with identical schema.

-          Except mapping table changes, other changes are minimal.  Please check back around October 6, 2005 for updated b125 ER diagram.

 

Build 123 Schema Change

Date last updated: November 3, 2004

 

 

Build 122

Build 123

Note

 

 

 

    AlleleMotif                        

    AlleleMotif                         

 

        repeat_cnt                     

        repeat_cnt                     

 

          Datatype:int                 

          Datatype:real                

We have submitted variation as (ACTGTGGGGACACCTCGCCTGCACCCACCTCCCTCCCCCCAGCTCCCGAGC)10.5/11/11.5/12/14/15/21.  So repeat_cnt need to real to record the repeat value of "10.5" in this case.

 

 

 

    ContigInfo                         

    ContigInfo                         

 

        contig_acc                     

        contig_acc                     

 

          Datatype:varchar(15)         

          Datatype:varchar(25)         

To accomate longer contig accession names.

 

 

 

    -- -- --                            

    db_build_pipeline                  

This is an internal table to track tasks. It is in draft state.

 

 

 

    -- -- --                           

    dn_Motif_rev                       

Motif table has rev_motif_id to point to the reverse motif sequence in the same Motif table.  This table is a denormalized table that makes it easier to find the reverse form of a Motif.  This table is used in internal computations.

 

 

 

    -- -- --                            

    dn_UniGty_allele                   

This table is used in internal computation.

 

 

 

    -- -- --                           

    dn_UniVariation_rev                

Similar to dn_Motif_rev. This table makes it easier to get the reverse form of a univariation string.

 

 

 

 

 

 

    -- -- --                           

    hsSupplContigInfo                  

dbSNP build 123 human snp are mapped to Genome build 35.  To assist users who still work with NCBI build 34.3 map, we have also mapped the new dbSNP build 123 refSNP(rs) set against build 34.3. The results are in SNPContigLoc_b34_3.  This table stores the contig information for build 34.3. It has exactly the same structure as ContigInfo, which has build 35 contig information.

 

 

 

    -- -- --                           

    hsSupplMapInfo                     

Please see hsSupplContigInfo note. This table is the "SNPMapInfo" counterpart for build 123 rs onto build 34.3. SNPMapInfo, of course, is the map statistics for dbSNP build 123 rs hits on NCBI build 35.

 

 

 

    -- -- --                           

    IndGrpByPop                        

This table assigns "individual group such as Asian" to population id. Individual group is a way to group snp frequency data for analysis. Ind_grp is in SubmittedIndividual, IndivBySource and Individual table.

 

 

 

    -- -- --                           

    IndGrpByPopInd                     

This table assigs "individual group" to each individual id within a population. This is in addition to the above IndGrpByPop because submitters define population and sometimes a submitted population contains individuals from several "Individual groups".

 

 

 

    IndMergeArch                       

    IndMergeArch                       

We assign individual id (ind_id) based on submission information for example Coriell number. When there is no information on submitted individual id, we assign new ind_id. Later we may get Coriell numbers and find out that several separate ind_id are actually  one. We merge these ind_id. The merge process is saved in this table.

        old_ind_id                     

        -- -- --                       

 

 

 

 

        -- -- --                       

        ind_id_high                    

old_ind_id-> new_ind_id was in previous build. To be more precise, we changed to column name to: ind_id_high and ind_id_low. Since when we merge ind_id, we always keep the ind_id that has the lowest number.

 

 

 

        new_ind_id                     

        -- -- --                        

 

 

 

 

        -- -- --                       

        ind_id_low                     

 

 

 

 

        -- -- --                       

        build_id                       

 

 

 

 

        -- -- --                        

        create_time                    

 

 

 

 

    -- -- --                           

    IUPACna                            

This is a table that assists simple computation of IUPack bases.

 

 

 

 

 

 

    -- -- --                           

    mmContigExon_35_1                  

Started mapping of mouse to NCBI build 35.1.

 

 

 

 

 

 

    RsMergeArch                        

    RsMergeArch                        

 

        build_id                       

        build_id                       

smallint ranges from -32,768 - 32,767. Enough for our build_id. A quick way to save space.

          Datatype:int                 

          Datatype:smallint            

 

 

 

 

        -- -- --                       

        rsCurrent                      

When an rs merges away several times, we need to trace rsHigh->rsLow several time to get the current rs. For convenience, rsCurrent and orien2Current are added. If an rs only merged once, then rsLow is the same as rsCurrent.

        -- -- --                       

        orien2Current                  

see above.

 

 

 

    SNP                                

    SNP                                

 

        avg_heterozygosity             

        avg_heterozygosity             

 

          Datatype:float               

          Datatype:real                

"real" offers plenty precision for the field and saves storage space for this table.

 

 

 

        het_se                         

        het_se                         

 

          Datatype:float               

          Datatype:real                 

see above.

 

 

 

        avg_max_heterozygosity         

        -- -- --                       

Remove not used field to save space.

        snp_class                      

        -- -- --                       

snp_class is the class for the variation type. This is replace by field univar_id which points to the unified variation string for the snp in UniVariation table. Use UniVariation.subsnp_class for this information instead of the old snp_class.

        -- -- --                        

        exemplar_subsnp_id             

For a refSNP exemplar ss, we use to use view SNPSeqPick. When querying large sets of data,  using SNPSeqPick is inefficient and slow. Add this new field simplifies many queries that need to get exemplar ss data for an rs, for example the query to get rs fasta.

        -- -- --                       

        univar_id                      

 This univar_id points to the unified variation string for the snp in UniVariation table. This makes it easy to get all the alleles for an rs.

        -- -- --                       

        cnt_subsnp                     

A refSNP is a cluster of subsnp(ss). Cnt_subsnp is an important property of rs. Although it can be derived from SNPSubSNPLink table, it is convenient to save it here. This is a tradeoff between space and speed.

 

 

 

    SNP_mRNA_pos                       

    -- -- --                           

This table used to store a snp position on an mrna_acc. The position is derived from ContigExon. But due to gaps and alignment exceptions, the position in this table has not been precise.  This table is only used to link to NCBI SeqViewer when a snp is on mRNA. It meant to center the SeqViewer on the snp. But since mRNA sequence is usually short enough to page through, we rather simply bring the mRNA SeqViewer on the first position than keep an inprecise position. We may store this information again in the future when better ways to compute is found.

 

 

 

    SNPAllele                          

    -- -- --                           

This table is unnecessary now that we have univar_id in SNP and we have UniVariAllele table which lists all alleles for each unified variation. For ex. For '(CT)3/4/5/6', UniVariAllele will have allele_id for (CT)3, (CT)4, (CT)5 and (CT)6.

 

 

 

    -- -- --                           

    SNPAncestralAllele                 

We used to have ancestral_allele_flag in SNPAllele table. Since SNPAllele table can be dropped, we need this table to save the ancestral allele data.

 

 

 

    SNPContigLoc                       

    SNPContigLoc                       

 

        allele                         

        allele                          

 

          Datatype:varchar(255)        

          Datatype:varchar(1200)       

New mapping program in build 35 can find alignment with bigger gaps, so allele length needs to be increased. We see that most of the long alleles contain repeats. In the future we may try to condense the allele using repeat string and number the way we do with STR in variation string.

 

 

 

    -- -- --                           

    SNPContigLoc_b34_3                 

This table has exactly the same structure as SNPContigLoc, which has dbSNP build 123 rs hits on NCBI build 35 and other alternative assemblies. But this table SNPContigLoc_b34_3 has dbSNP build 123 rs hits on NCBI build 34.3 to assist users who still use the 34.3 genome maps.

 

 

 

 

 

 

    SNPFlankStatus                     

        -- -- --                       

Starting build 123, mapping programs use megablast internal repeat masker. Before build 123, dbSNP repeat masks flank sequences and save the masking status in SNPFlankStatus.   So this table is no longer needed.

 

 

 

 

 

 

 

 

 

    SNPHWProb                          

    SNPHWProb                          

 

        ind_cnt                         

        ind_cnt                        

 

          Datatype:int                 

          Datatype:smallint            

smallint ranges from -32,768 - 32,767. Enough for this field. A quick way to save space.

 

 

 

    -- -- --                            

    SNPIndGtyFlag                      

Since a snp is a cluster of ss and an ind_id is a cluster of submitted_ind_id. When we have individual genotype for several ss and several submitted_ind_id that cluster to one rs and one ind_id, individual genotype should be same. But we have found out many records with different genotype data.  This table flags this interesting fact.

 

 

 

 

 

 

    SNPSubSNPLink                      

    SNPSubSNPLink                       

 

        comment                        

        comment                        

 

          Null Option:NOT NULL         

          Null Option:NULL             

Comment field is raraly if ever used but occupies large space. So we nullified it but kept the field for compatible with old sql codes. In the future, we may drop this completely.

 

 

 

    SubInd                             

    SubInd                             

 

        batch_id                       

        batch_id                       

 

          Datatype:int                 

          Datatype:smallint            

smallint ranges from -32,768 - 32,767. Enough for this field. A quick way to save space.

 

 

 

        submitted_ind_id               

        submitted_ind_id               

 

          Datatype:int                 

          Datatype:smallint            

smallint ranges from -32,768 - 32,767. Enough for this field. A quick way to save space.

 

 

 

 

 

 

    SubSNP                             

    SubSNP                             

We added three derived fields to SubSNP table to speed up data dump by organism, chromosome, rs and then within each rs cluster, property of each ss.

        -- -- --                       

        snp_id                         

 

        -- -- --                       

        tax_id                         

 

        -- -- --                       

        chr_id                          

 

 

 

 

    SubSNPHWProb                       

    -- -- --                           

Hardy-Weinberg probability (HWP) are computed at each rs level (SNPHWProb) and at each submitted ss and population level( FreqSummaryBySsPop).  This table is therefore dropped.

 

 

 

    SubSNPRepMask                      

    -- -- --                           

We dropped this table for the same reason in SNPFlankStatus note.  Starting from dbSNP build 123, mapping uses megablast internal repeat mask, so we do not track the mask status of submitted snp seqeunce with dbSNP.

 

 

 

    -- -- --                           

    UniVariAllele                      

For each unified variation, this table lists all alleles within the variations. Please see SNP.univar_id note for more details.

 

 

 

    UniVariation                       

    UniVariation                       

This table keeps all unique variation.

        allele_cnt                     

        allele_cnt                     

 

          Datatype:int                 

          Datatype:smallint            

smallint is enough. No need for int.

 

 

 

        -- -- --                        

        src_code                       

This code is a foreign key to UniVariationSrcCode. It tracks the source the first time a univariation is added. For ex. The variation string is added for a reverse form of a variation.

 

 

 

        -- -- --                       

        rev_univar_id                  

Added this field so it is easy to find the reverse form of a univariation.

 

 

 

    -- -- --                           

    UniVariationSrcCode                 

Please see UniVariation.src_code and the table data for detail.

 

 

 

    -- -- --                           

    VariComplexAllele                  

This is an internal table to handle complex variation forms.

 

Build 122 Schema Change

Date last updated: August 18, 2004

 

Build 121

Build 122

Note

    -- -- --                           

    ChiSqPValueLookUp                   

This table replaces ChiSqPValue. This is used to get the p-value from ChiSq when computing H-W Prob.

 

 

 

    dn_Allele_rev                      

    dn_Allele_rev                      

This table is used internally for processing.

        rev_flag                       

        rev_flag                       

 

          Datatype:bit                 

          Datatype:tinyint             

 

 

 

 

    -- -- --                            

    dn_Gty_rev                         

Used for internal processing.

 

 

 

    dn_snpFxnCnt                       

    dn_snpFxnCnt                       

This is a summary table showing count of snp in each functional class(intron, coding etc).

        -- -- --                       

        tax_id                         

Added tax_id to this table since we start to have genomes for more organisms.

 

 

 

    dn_table_rowcount                  

    dn_table_rowcount                  

This table lists the row count of each table. When users re-create their own dbSNP, they can use this table to check if they data were loaded OK.

        org_prefix                     

        -- -- --                        

org_prefix and tax_id is not used yet. Currently, the table rowcount just the rowcount for a table. In the future when we have organism division databases, we may have tables shared by multiple organism. tax_id field will be used to show row count per tax_id.

        -- -- --                       

        tax_id                         

 

 

 

 

    FreqSummaryBySsPop                 

    FreqSummaryBySsPop                 

Column type changes for this table. Float to real to save storage space.

        source                         

        source                         

 

          Datatype:char(2)             

          Datatype:varchar(1)          

We compute H-W prob from either individual genotype data(I), or just genotype frequency (P). So we only need varcchar(1).

 

 

 

        chisq                          

        chisq                          

 

          Datatype:float               

          Datatype:real                

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

        df                             

        df                             

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

        hwp                            

        hwp                            

 

          Datatype:float               

          Datatype:real                

 

          Null Option:NOT NULL         

          Null Option:NULL             

 

 

 

 

    -- -- --                           

    ggaContigExon                      

The new set of chicken mapping table are added. They all have prefix 'gga'.  Organism prefix is defined in OrganismTax.prefix column.

 

 

 

    -- -- --                           

    ggaContigInfo                      

 

 

 

    -- -- --                           

    ggaMapLink                         

 

 

 

    -- -- --                           

    ggaMapLinkPid                      

 

 

 

    -- -- --                           

    ggaMapList                         

 

 

 

 

    -- -- --                           

    ggaSNPContigLoc                    

 

 

 

    -- -- --                           

    ggaSNPContigLocusId                

 

 

 

    -- -- --                           

    ggaSNPMapInfo                      

 

 

 

    -- -- --                           

    IndGrpCode                         

Added this new table to group individuals. See the data in IndGrpCode for a list of groups. Currently, the grouping only applies to human.

 

 

 

    IndivBySource                       

    IndivBySource                      

        -- -- --                       

        src_ind_grp                    

This column stores grouping supplied by sample source ,such as Coriell.

 

 

 

    Individual                          

    Individual                         

 

        -- -- --                       

        ind_grp                        

This column stores the grouping code assigned by dbSNP.

 

 

 

    IndivSourceCode                     

    IndivSourceCode                    

        -- -- --                       

        src_type                       

Added this column to distinguish various sources. For example, Coriell's src_type is "repository" and CEPH is "curator".

 

 

 

        -- -- --                       

        display_order                  

One individual often has several id from different source. This value will be used to order the display.

 

 

 

    -- -- --                            

    IndMergeArch                       

As we get more individual sample information, we may see that previously two separated individuals are actually the same. This table keeps individual merging history. This is not used yet.

 

 

 

 

 

 

    miscMapLink                        

    miscMapLink                        

        -- -- --                       

        i_tax_id                       

For organisms with no NCBI genome assembly, the hit to GenBank accessions are stored in this table. Since there're multiple organisms, tax_id was added. But since tax_id is in SNP table, we could get tax_id by joining SNP and this table. So this column may be unnecessary and be taken out in future schema modifications.

 

 

 

    Publication                        

    Publication                        

 

        asn                            

        -- -- --                       

This field was not used.

 

 

 

    SNPHWProb                           

    SNPHWProb                          

This table stays basically the same except the column name listed below. The column name is not consistent with table FreqSummaryBySsPop which has H-W prob for each ss over an population.

        degreeFreedom                  

        df                             

 

 

 

 

        chi_sq                         

        chisq                          

 

 

 

 

        hw_prob                        

        hwp                             

 

 

 

 

        -- -- --                       

        ind_cnt                        

Added column ind_cnt for individual count.

 

 

 

        source                         

        -- -- --                        

This field is removed because it is not as meaningful for an refSNP(rs) as the source in FreqSummaryBySsPop: some member ss may have individual genotype while other may only have genotype frequency.

 

 

 

    SubmittedIndividual                 

    SubmittedIndividual                

        -- -- --                       

        loc_ind_grp                    

loc_ind_grp will store the individual group name submitted by submitter.

 

 

 

    -- -- --                            

    UniVar_Valid_Allele                

This table is used internally.

 

 

Build 121 Schema Change

Date last updated: June 2, 2004

 

To keep dbSNP in manageable size, we have partitioned four large tables by ss# and moved the data with ss# less than 20 million to a supplemental database called dbSNP_sup.

The four large tables that have been partitioned are: SubSNPSeq3, SubSNPSeq5, SubSNPCommLine and SubSNPAcc.

 

For ss# between 1 and 10,000,000, it is stored in table <tablename>_p1, where <tablename> represents the above four tables.

For ss# between 10,000,001 and 20,000,000, it is stored in table <tablename>_p2

Currently, we do not have ss# over 30,000,000, when we do, we will create <tablename>_p3 for ss# between 20,000,001 and 30,000,000,

 

 For example:

SubSNPSeq3_p1 has 3’ sequence for all ss# between 1 and 10,000,000.

SubSNPSeq3_p2 has 3’ sequence for all ss# between 10,000,001 and 20,000,000.

 

SubSNPSeq3_p1 and SubSNPSeq3_p1 still have the same column definitions as the original SubSNPSeq3.

 

In dbSNP, for ss# over 20,000,000, we still store them in dbSNP, the main database.  We renamed these four tables by adding a suffix “_ins” to note that we will be add new data into these tables as new submissions come in.   The original names for these four tables are now partitioned views across dbSNP and dbSNP_sup having all ss#.  For more details on partitioned view,

 

This change will be reflected in ftp bcp files. Using SubSNPSeq3 as an example, instead of having a huge file for SubSNPSeq3 prior to this buid, we now have bcp file for SubSNPSeq_p1, SubSNPSeq_p2 and SubSNPSeq_ins.  Between builds, sequence for old snp seldom change (they only change when submitter updates them which is rare). We will have separate small update file for SubSNPSeq3_p1 and SubSNPSeq3_p2.  So for each new build, users only need to get the new bcp file from SubSNPSeq3_ins.

 

 

Build 120 Schema Change

Date last updated: April 24, 2004

Note:  ER diagram and Data dictionary for build 120 will be updated on soon.

 

 

Build 119

Build 120

Note

    agContigInfo                       

    agContigInfo                       

Please see change note in "Build 117 Schema Change - ContigInfo" in this document.

        -- -- --                       

        group_term                     

 

        -- -- --                       

        group_label                    

 

        -- -- --                       

        contig_label                   

 

 

 

 

    agSNPMapInfo                       

    agSNPMapInfo               

Please see change note in "Build 117 Schema Change-SNPMapInfo" in this document.

        -- -- --                        

        hap_cnt                        

 

        -- -- --                       

        unplaced_cnt                   

 

        -- -- --                       

        ref_cnt                        

 

        -- -- --                        

        assembly_cnt                   

 

 

 

 

    -- -- --                           

    dn_table_rowcount

Added this table to simply store the row count for each table. So users who create local dbSNP can compare the counts.

 

 

 

    -- -- --                           

    dn_validationSummary

An internal validation table.

 

 

 

    Individual                         

    Individual                  

SubmittedIndividual has tax_id due to annotation process. It is possible several submitted individuals are from the same sample. This curated sample data is in Individual table. So to normalize data,  tax_id should be in Individual table. Currently, we have tax_id in both SubmittedIndividual and Individual tables. Eventually, we will only have tax_id in Individual table.

        -- -- --                       

        tax_id                         

 

 

 

 

SNPHapMapPriority

    SNPHapMapPriority

 

        priority_code                  

        priority_code                  

 

Datatype:tinyint             

          Datatype:smallint            

HapMapClassCode has 9 codes, so priority_code needs to be a smallint.

 

Build 119 Schema Change

Date last updated: Jan. 30, 2004

Note:  ER diagram and Data dictionary for build 119 will be updated on soon.

Please see most recent data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html

 

Build 118

Build 119

Note

 

 

 

    -- -- --                           

    dgContigInfo                       

Added 'dg' prefixed tables for dog snp mapping data.

    -- -- --                           

    dgSNPContigLoc                     

 

    -- -- --                           

    dgSNPContigLocusId                 

 

    -- -- --                           

    dgSNPMapInfo                        

 

 

 

 

    -- -- --                           

    dn_chr_locusList                   

This table is a de-normalized table, storing list of locus_id for each contig chromosomes. This is used in individual genotype summary page.

 

 

 

    -- -- --                           

    dn_gty_rsCnt_byChr                 

This table is a de-normalized table, storing genotyped snp count for each individual on chromosomes.

 

 

 

    -- -- --                            

    dn_gty_rsCnt_byLocus               

This table is a de-normalized table, storing genotyped snp count for each individual on a locus.

 

 

 

    MethodClass                        

    MethodClass                         

 

 

 

 

        -- -- --                       

        validation_status              

Added validation_status. Value 0 means the method is computational. 1 - method can be considered validating when multiple ss  are in the same rs cluster.

 

 

 

 

 

 

    SNPContigLocusId                   

    SNPContigLocusId                   

Added mrna_ver and protein_ver. Contig_acc, mrna_acc and protein_acc fields are changed to varcahr(15) to be consistent with accession fields in other tables such as SNPContigLoc.  The following int field are changed to tinyint: contig_ver, fxn_class, reading_frame to save space.

 

 

 

    SubmittedIndividual                

    SubmittedIndividual                

 

        -- -- --                       

        loc_ind_id                     

Submitter individual id can be case sensitive. For ex. Mouse strain name use case in special ways. This field is needed to preserve case information.

 

Build 118 Schema Change

Date last updated: Nov. 14, 2003

Please see data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b118.html

 

Build 117

Build 118

Note

 

 

 

    -- -- --                           

    dgMapLink                          

Added mapping data for dog.

 

 

 

    -- -- --                           

    dgMapLinkPid                       

Added mapping data for dog.

 

 

 

    dn_batchCount                      

    dn_batchCount                      

 

        build_id                        

        -- -- --                       

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    dn_handleCount                     

    dn_handleCount                     

 

        build_id                       

        -- -- --                       

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    dn_IND_batchCount                  

    dn_IND_batchCount                  

 

        build_id                       

        -- -- --                       

We will keep snp counts in each batch for current batch only. So there is no need for this column.

 

 

 

    SubSNPGenotypeSum                  

    FreqSummaryBySsPop                 

This table is renamed to be consistent with the two tables used to compute this summary: GtyFreqBySsPop and GtyFreqBySsPop. The new name clearly shows that the data ins for each SubSNP within a specific population.

        source                         

        source                         

 

          Datatype:char(1)             

          Datatype:char(2)             

Use char(2): GF - source data is genotype frequency, IG - source data is individual genotype, AF - source data is allele frequency.

        non_founder_cnt                

        non_founder_ind_cnt            

Renamed so it is clear that the number in this column count of individuals, not chromosomes.

        -- -- --                       

        het                            

heterozygosity

        -- -- --                       

        het_se                         

standard error for the above heterozygosity.

 

 

 

    mmContigInfo                       

    mmContigInfo                       

The changes in this table is the same in ContigInfo. Please see change description for ContigInfo in build 117.

        contig_class                    

        -- -- --                       

 

        -- -- --                       

        group_term                     

 

        -- -- --                       

        group_label                    

 

        -- -- --                       

        contig_label                   

 

 

 

 

    mmSNPContigLocusId                 

    mmSNPContigLocusId                 

 

        -- -- --                       

        mrna_ver                        

Added this to keep track the version of the mrna accession, like the contig_ver.

        -- -- --                       

        protein_ver                    

Added this to keep track the version of a protein accession, like the contig_ver.

 

 

 

    mmSNPMapInfo                       

    mmSNPMapInfo                       

The changes in this table are the same in ContigInfo. Please see change description for ContigInfo in build 117.

        -- -- --                        

        ref_cnt                        

 

        -- -- --                       

        assembly_cnt                   

 

 

 

 

    -- -- --                           

    SNPAlleleFreq                      

This new table keeps the average allele frequency for refSNP.

 

 

 

    -- -- --                           

    SNPGtyFreq                         

This new table keeps the average genotype freqeuncy for refSNP.

 

 

 

    SubmittedIndividual                

    SubmittedIndividual                

 

        -- -- --                       

        loc_ind_alias                  

Added this column to track the different id for a submitted individual for genotype data.

 

 

 

    t_vwSNP_avgFreq                    

    -- -- --                           

This is replaced with SNPAlleleFreq.

 

 

 

    t_vwSNP_MinorAlleleCnt2More        

    -- -- --                           

This  table is no longer needed.

 

 

 

    t_vwSubPopGty_alleleFreq           

    -- -- --                           

This table is replaced with GtyFreqBySsPop.

 

 

 

 t_vwSubSNP_validationByMinorAlleleCnt 

    -- -- --                            

This table is no longer needed.

 

 

 

 

Build 117 Schema Change

 

Date last updated: Oct. 16, 2003

Note:  Data dictionary and ER diagram for build 117 will be available on Oct.17, 2003.

 

Build 116

Build 117

Note

 

 

 

    Allele                             

    Allele                             

Removed 3 columns in Allele table because an allele may have many motifs (ex. (GAT)10AAT(GAT)5). So keep allele motif relationship in enw table AlleleMotif.

        motif_id                       

        -- -- --                       

 

        repeat_cnt                      

        -- -- --                       

 

        rev_allele                     

        -- -- --                       

 

        -- -- --                       

        last_updated_time              

 

 

 

 

    -- -- --                           

 

Added this table to save computed allele frequency per subsnp and population. The frequency computed from individual genotype takes the highest precedence which then takes precedence over the allele frequency computed from genotype frequency. When none of the above data is available, we use the submitted allele frequency.

 

 

 

    -- -- --                           

    AlleleMotif                        

Added this table to keep the relationship between an allele and all its motif.

 

 

 

    ContigInfo           

ContigInfo

contig_class was used to track the haplotype data of humans and the strain data of mouse for each of the various contigs. But it became clear that different ncbi assembly tasks required different ways of grouping the alternatives so the single category contig_class was broken out into three fields. For dbSNP, the group_term gives the best breakdown of the various alternative forms, but contig_label and/or group_label give a better idea of the common name associated with haplotype, strain, alternate_assembly, etc.

        contig_class                   

        -- -- --                       

 

        -- -- --                        

        group_term                     

id used to group contigs together for annotation purposes. This allows
                annotaters to group contigs by assembly (see note below for PAR
                special case).
                Possible values:
                        ref_[strain/haplotype] (haplotype human, strain others)
                        alt_[strain/haplotype]
                        ref_PAR
                        alt_assembly
                the ref_ and alt_ prefixes assume that these are contigs
                of type NT/NW and the NCBI-assembly team will be loading these to ID.
                ref_PAR: The pseudoautosomal region requires special processing.
                The Y contigs of the PAR region will have this group_term to
                make sure the NCBI-assembly team don't penalize features in this
  region for going to two chromosomes. These contigs will be joined back into the
                reference assembly upon loading to MapViewer
                alt_assembly: this designation is used when the NCBI-assembly
  team have an extra assembly
                for a chromosome and the NCBI-assembly team don't want to load
  the annotation/contigs from this chromosome to ID.

        -- -- --                       

        group_label                    

Label used to label an assembly in the MapViewer.
                On a non-sequence based map, this term can serve to group
                objects into a common coordinate system.
                Example values:
                        reference
                        C57BL/6J
                        129_substrain

        -- -- --                       

        contig_label                   

Label used to label a particular contig in an assembly. For
                example, in mouse, the NCBI-assembly team have grouped all contigs that are of
                type 129/??? for annotation purposes. When the NCBI-assembly team display a
                particular contig, it would be useful to display the
                actual strain name (129/Sv, 129/Ola, etc).
                Example values:
                        C57BL/6J
                        129/SvJ
                        129/SvEvTac

 

 

 

    -- -- --                            

    GtyFreqBySsPop                  

Added this table to save computed genotype frequency per subsnp and population. The frequency computed from individual genotype takes the precedence over submitted genotype frequency.

 

 

 

    ObsVariation                       

    ObsVariation                       

 

        -- -- --                       

        var_flag                       

refers to VarFlagCode.code

 

 

 

    PopClassUniqCount                   

    -- -- --                           

 Not used.

 

 

 

    -- -- --                           

    rnContigExon                       

Added mapping information for rat, with prefix 'rn' added to mapping tables. This note also applies to the following 7 tables.

 

 

 

    -- -- --                           

    rnContigInfo                       

 

 

 

 

    -- -- --                           

    rnMapLink                          

 

 

 

 

    -- -- --                           

    rnMapLinkPid                       

 

 

 

 

    -- -- --                           

    rnMapList                          

 

 

 

 

    -- -- --                            

    rnSNPContigLoc                     

 

 

 

 

    -- -- --                           

    rnSNPContigLocusId                 

 

 

 

 

    -- -- --                           

    rnSNPMapInfo                        

 

 

 

 

    SnpFunctionCode                    

    SnpFunctionCode                    

 

        -- -- --                       

        is_coding                      

Added this column for internal query.

 

 

 

    SNPMapInfo                         

    SNPMapInfo                         

 

        -- -- --                       

        ref_cnt                        

set greater than zero if the snp maps onto the reference strand.

        -- -- --                       

        assembly_cnt                   

Counts the number of alternate assemblies (excluding reference) which the snp hits.

 

 

 

    sts_markers                        

    -- -- --                            

 

 

 

 

    SubPop                             

    SubPop                             

Change some float column to real to save storage space.

 

 

 

    -- -- --                           

    ValidGtyByVariation                 

Internal use currently.

 

 

 

    -- -- --                           

    VarFlagCode                        

Some submitted variation pattern is problematic. Define codes to track problems.

 

Build 116 Schema Change

 

Date: Aug 7, 2003

 

The following details all schema changes between build 116 and 115, sorted by table names.

Table names are in bold. Only columns are indented and only columns with changes are shown under the table. See data dictionary for full table details.

  -- -- --‘ means no corresponding objects ( table or column) for the build.

For example, in build 116, pop_cnt is added to table dn_batchCount.

 

Please see ER diagram for more details: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/erd_dbSNP.pdf

Please see data dictionary at: ftp://ftp.ncbi.nih.gov/snp/mssql/schema/DataDictionary_b116.html