Gene. 2006 Feb 1;366(2):316-24. Epub 2005 Nov 28.

Sequence context analysis of 8.2 million single nucleotide polymorphisms in the human genome.

Author information

Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298, USA.


We analyzed n-mers (n=3-8) in the local environment of 8,249,446 human SNPs and compared their distribution with that in the genome reference sequences. The results revealed that short sequences containing at least one CpG dinucleotide occurred more frequently in the local SNP sequences than in the genome sequences. To exclude the hypermutability effect of methylated CpG dinucleotides on the sequence context of SNPs, we examined the distribution patterns for each of the six categories of substitution. We observed a similar pattern (i.e., CpG-containing n-mers vs. non-CpG-containing n-mers) in SNP categories A/G, C/T and C/G, but the opposite pattern in category A/T. We next identified 34,928 putative CpG islands in the human genome and located 133,591 SNPs within these islands. In the CpG islands, CpG SNPs were 3.92-fold less prevalent relative to the presence of CpG dinucleotides. Conversely, in the human genome as a whole, the frequency of CpG dinucleotides at the polymorphic sites was 6.09 times that in the genome reference sequences. These results support earlier views of mutational suppression at CpG sites in CpG islands and of hypermutability of the methylated CpG dinucleotides that are prevalent in the non-CpG-island sequences of the human genome. Our study represents a comprehensive investigation of the sequence context of SNPs in the human genome and in human CpG islands.
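The comparison described in the abstract (n-mer frequencies near SNPs versus the reference sequence) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `count_nmers`, `cpg_fraction`, and `cpg_enrichment` are hypothetical, flanking sequences are assumed to be plain strings, and the toy sequences in the usage lines are invented for demonstration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_nmers(seq, n):
    """Count all overlapping n-mers in seq, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        nmer = seq[i:i + n]
        if set(nmer) <= VALID_BASES:
            counts[nmer] += 1
    return counts


def cpg_fraction(counts):
    """Fraction of counted n-mers that contain at least one CpG ('CG')."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    with_cpg = sum(c for nmer, c in counts.items() if "CG" in nmer)
    return with_cpg / total


def cpg_enrichment(snp_flanks, reference, n):
    """Ratio of the CpG-containing n-mer fraction in SNP flanks to that in the reference.

    Hypothetical helper: snp_flanks is an iterable of flanking-sequence strings,
    reference is a single reference-sequence string.
    """
    snp_counts = Counter()
    for flank in snp_flanks:
        snp_counts.update(count_nmers(flank, n))
    ref_frac = cpg_fraction(count_nmers(reference, n))
    if ref_frac == 0:
        raise ValueError("reference contains no CpG-containing n-mers")
    return cpg_fraction(snp_counts) / ref_frac


# Toy usage with made-up sequences (not data from the study).
flanks = ["ACGA"]
reference = "AAACGAAA"
ratio = cpg_enrichment(flanks, reference, n=3)
```

A ratio above 1 would indicate that CpG-containing n-mers are over-represented near SNPs relative to the reference. Repeating the calculation separately for each substitution category (A/G, C/T, and so on) would mirror the stratified comparison the abstract describes.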

[Indexed for MEDLINE]
