Proc Natl Acad Sci U S A. 1993 Dec 15; 90(24): 11995–11999.

Number of CpG islands and genes in human and mouse.


Estimation of gene number in mammals is difficult due to the high proportion of noncoding DNA within the nucleus. In this study, we provide a direct measurement of the number of genes in human and mouse. We have taken advantage of the fact that many mammalian genes are associated with CpG islands whose distinctive properties allow their physical separation from bulk DNA. Our results suggest that there are approximately 45,000 CpG islands per haploid genome in humans and 37,000 in the mouse. Sequence comparison confirms that about 20% of the human CpG islands are absent from the homologous mouse genes. Analysis of a selection of genes suggests that both human and mouse are losing CpG islands over evolutionary time due to de novo methylation in the germ line followed by CpG loss through mutation. This process appears to be more rapid in rodents. Combining the number of CpG islands with the proportion of island-associated genes, we estimate that the total number of genes per haploid genome is approximately 80,000 in both organisms.
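The final step of the abstract, combining island counts with the proportion of island-associated genes, can be illustrated with a small back-calculation. The sketch below is not the authors' procedure. It assumes one CpG island per island-associated gene, so that total genes = islands / fraction of genes with an island. The abstract does not state the proportions, so they are derived here from the quoted figures (45,000 and 37,000 islands, roughly 80,000 genes) rather than taken from the paper. The function names are hypothetical.

```python
# Illustrative arithmetic only: the relation genes = islands / fraction
# is an assumption (one island per island-associated gene).

HUMAN_ISLANDS = 45_000   # CpG islands per haploid genome, from the abstract
MOUSE_ISLANDS = 37_000   # CpG islands per haploid genome, from the abstract
TOTAL_GENES = 80_000     # approximate gene number, from the abstract


def implied_island_fraction(cpg_islands: int, total_genes: int) -> float:
    """Fraction of genes that must carry an island for the totals to agree."""
    return cpg_islands / total_genes


def estimate_gene_number(cpg_islands: int, island_fraction: float) -> float:
    """Total genes implied by an island count and an island-associated fraction."""
    return cpg_islands / island_fraction


human_fraction = implied_island_fraction(HUMAN_ISLANDS, TOTAL_GENES)
mouse_fraction = implied_island_fraction(MOUSE_ISLANDS, TOTAL_GENES)

print(f"human: implied island-associated fraction = {human_fraction:.4f}")
print(f"mouse: implied island-associated fraction = {mouse_fraction:.4f}")
```

Under these assumptions, an equal gene total in both species requires a smaller island-associated fraction in mouse than in human, which is consistent with the abstract's statement that island loss appears to be more rapid in rodents.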



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

