Computation of Average Heterozygosity and Standard Error for dbSNP RefSNP Clusters

John Spouge, Lon Phan, and Stephen Sherry
NCBI, 2000-2002

Average heterozygosity is computed for each refSNP cluster from all the variation data submitted for its member ss# (submitted SNP) records. There are three types of variation data in dbSNP: direct measures of heterozygosity in a sample, "binned" allele frequency estimates that can only be resolved to a small number of classes, and point estimates based on moderate to large sample sizes. An estimate of heterozygosity is computed for each class of data, and the estimates are then combined in a weighted sum, with each term weighted according to its standard error so that less precise estimates contribute less. This produces a linear estimate of h with minimum variance.
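The weighted combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dbSNP implementation: it assumes the per-class estimates are independent and unbiased, and it uses the standard inverse-variance weights (each weight proportional to 1/SE^2), which is the weighting that gives a minimum-variance linear combination under those assumptions. The function name combine_estimates and the example numbers are hypothetical.

```python
import math


def combine_estimates(estimates):
    """Combine (h, se) pairs into one estimate by inverse-variance weighting.

    Assumption (not taken from the source text): the estimates are
    independent and unbiased, so weights of 1 / se**2 minimize the
    variance of the combined linear estimate.
    """
    if not estimates:
        raise ValueError("at least one (h, se) pair is required")
    for _, se in estimates:
        if se <= 0:
            raise ValueError("standard errors must be positive")

    # Raw weight of each term is the reciprocal of its variance.
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)

    # Weighted mean of the heterozygosity estimates.
    h = sum(w * h_i for w, (h_i, _) in zip(weights, estimates)) / total

    # Standard error of the combined estimate.
    se = math.sqrt(1.0 / total)
    return h, se


# Hypothetical inputs: one (h, se) pair per class of variation data.
example = [(0.40, 0.10), (0.20, 0.10)]
h_avg, h_se = combine_estimates(example)
print(h_avg, h_se)
```

With equal standard errors the result reduces to the plain mean of the inputs; an estimate with a larger standard error pulls the combined value less.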