• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of genoresGenome ResearchCSHL PressJournal HomeSubscriptionseTOC AlertsBioSupplyNet
Genome Res. Nov 2000; 10(11): 1690–1696.
PMCID: PMC310957

Single Nucleotide Polymorphisms in Wild Isolates of Caenorhabditis elegans


Caenorhabditis elegans (isolate N2 from Bristol, UK) is the first animal of which the complete genome sequence was available. We sampled genomic DNA of natural isolates of C. elegans from four different locations (Australia, Germany, California, and Wisconsin) and found single nucleotide polymorphisms (SNPs) by comparing with the Bristol strain. SNPs are under-represented in coding regions, and many were found to be third base silent codon mutations. We tested 19 additional natural isolates for the presence and distribution of SNPs originally found in one of the four strains. Most SNPs are present in isolates from around the globe and thus are older than the latest contact between these strains. An exception is formed by an isolate from an island (Hawaii) that contains many unique SNPs, absent in the tested isolates from the rest of the world. It has been noticed previously that conserved genes (as defined by homology to genes in Saccharomyces cerevisiae) cluster in the chromosome centers. We found that the SNP frequency outside these regions is 4.5 times higher, supporting the notion of a higher rate of evolution of genes on the chromosome arms.

Caenorhabditis elegans is the first animal of which the genome was sequenced (The C. elegans Sequencing Consortium 1998). Recently, the genome sequence of Drosophila has also become available (Adams et al. 2000). C. elegans is a sexually-reproducing animal, but the egg-laying animals are actually hermaphrodites: They produce some sperm that they can use to self-fertilize. Self-fertilization quickly results in inbred lines. Although the generation time of C. elegans is ~3–4 days, it is likely that in the wild the average time of clonal expansion without male–female mating is much longer. The strain Bristol N2, of which the genome sequence was determined, was isolated from mushroom compost in Bristol, UK, before 1956 (Nicholas et al. 1959; Fatt and Dougherty 1963) and frozen by John Sulston in 1969 (Brenner 1974). This animal occurs worldwide; isolates have been found on all continents except Antarctica (Hodgkin and Doniach 1997). Based on restriction fragment length polymorphisms (RFLPs) associated with Tc1 transposons, at least 20 races were defined. Previous research has indicated that spontaneous mutation rates in C. elegans are low (Anderson 1995), except for transposon insertions in strains that show germ-line transposition. Most strains have been stored frozen since their isolation from nature (Hodgkin and Doniach 1997). For this reason we consider it likely that the single nucleotide polymorphism (SNP) pattern we observe in the strains is identical to that of the original isolate. In this paper we sampled the genome of different natural isolates of C. elegans for SNPs. We investigated the nature of the polymorphisms and determined how they are distributed over the chromosomes and whether we could see differences between coding and noncoding regions. We also investigated how SNPS are distributed over natural isolates from over the globe, and we used this to infer relationships among them. We found that SNP patterns can be shared between strains and that SNP levels are elevated on the chromosome arms.


SNP Frequencies

We performed shotgun sequence analysis of 970 random clones from several natural isolates (AB1, CB4857, RC301, and TR403), which resulted in ~730 kb of sequence information (Table (Table1),1), and we searched for SNPs by comparing with the Bristol N2 sequence. In total we found 366 SNPs. SNPs were defined as small substitutions, deletions, or insertions, mostly of 1–3 nucleotides (Jakubowski and Kornfeld 1999). TR403 has the lowest frequency of SNPs (on average 1 in 8750 bp; Table Table1)1) and CB4857 the highest (1 in 1445 bp). Table Table22 shows the source of all natural isolates used in this study.

Table 1
Sequence Statistics
Table 2
Strains Analyzed in This Study

The majority (90%) are point mutations of one nucleotide, but we also encounter small deletions or insertions, and substitutions of 2 bp (Fig. (Fig.1A).1A). The data set does not permit a determination of which sequence was the ancestral and which the derived sequence. Therefore, the Bristol N2 sequence was taken as ancestral. We found transitions to be over-represented: Sixty-one percent of all single base pair substitutions are transitions. This is also the case for SNPs recently described for human (Hacia et al. 1999) and Drosophila (Petrov and Hartl 1999). We analyzed the distribution of SNPs over coding versus noncoding DNA. For the classification we used the Genefinder predictions (unpublished software developed by P. Green and L. Hillier). We find 15% of the SNPs in exons, 85% of the SNPs in non-exon DNA. Twenty-seven percent of the genome is exonic (The C. elegans Sequencing Consortium 1998); so there is a twofold under-representation of SNPs in exons. Of the SNPs in coding regions, 53% do not change the coded amino acid; there is a clear bias for SNPs in third base positions (Fig. (Fig.1B).1B). As expected, selective removal of deleterious mutations has played an important part in the generation of the SNP pattern as we observe it today (Stenico et al. 1994; Shabalina and Kondrashov 1999).

Figure 1
Nature of the SNPs. (A) The number of base pairs involved in substitutions, insertions, and deletions is indicated. Note that we find on average one SNP per 2000 bp. After checking the Bristol N2 sequence for 63 potential SNPs, we found 3 to be a sequence ...

SNPs Are Shared

We initially found each SNP in one natural isolate. To check whether they also occurred in other natural isolates, the study was broadened by inclusion of a set of six additional natural isolates. We analyzed all SNPs that change the recognition site of a restriction enzyme, and in addition we sequenced all remaining SNPs on autosomes I and III. We found that most SNPs are not unique to the isolate they were detected in and probably preceded the latest contact between the strains; of 109 SNPs that were tested in other strains even within this small set of natural isolates, only 25 were found to be unique (Fig. (Fig.2A).2A). Isolates from the same geographical region are not necessarily similar. For example, the two Australian strains AB1 and AB4 are not at all identical (one has its chromosome II pattern largely in common with Bristol N2, the other with CB4857 and KR314); nevertheless the X chromosomes are almost identical for the SNPs tested.

Figure 2
Comparison of SNPs among different natural isolates. (A) The presence of SNPs originally identified in one strain was tested in nine other strains and Bristol N2 by RFLP analysis or sequencing. SNP-containing clones are shown vertically along the chromosomes ...

To further investigate the diversity of strain variants at one geographical location, we also sampled a collection of strains that had all been isolated in California. Some were isolated at several time points from the same vegetable garden or flower bed (Hodgkin and Doniach 1997). These strains were tested for all SNPs that can be visualized by RFLP. We found strains to have essentially three different SNP patterns (Fig. (Fig.2B).2B). Some were largely similar to the English N2 strain, others to CB4857 from Claremont, but an intermediate type was also observed. Some phylogenies have been proposed for natural isolates of C. elegans, based on phenotypical traits (Dion and Brun 1971; Egilmez et al. 1995; Abdul Kader and Cote 1996; Hodgkin and Doniach 1997) as well as genome typing (Egilmez et al. 1995; Hodgkin and Doniach 1997). Figure Figure22 indicates that one cannot speak of lineages in the strictest sense, because the similarity between strains is different for separate regions of the genome. No clear correlation between DNA variation and the geographical origin was seen. Analysis of genetic diversity within and between Arabidopsis thaliana ecotypes resulted in similar findings (Innan et al. 1997; Breyne et al. 1999).

We investigated at a microlevel how SNPs were distributed between the strains; we sequenced, now in a directed fashion, the environment of two regions that seemed highly polymorphic based on the number of SNPs found in shotgun clones (H04J21 for CB4857; W08E3 for AB1). Within these regions we found the level of polymorphism with Bristol N2 to be 1 in 200 bp for CB4857 (117 SNPs in 23 kb) and 1 in 170 bp for AB1 (86 SNPs in 14.5 kb), compared with 1 in 1800 bp for the whole SNP set, indicating the presence of highly polymorphic regions. The presence of SNPs in the other natural isolates was tested by sequencing 1-kb regions and also stretches 5 kb and 50 kb on each side of these regions. We find that the SNPs in the natural isolates do occur in tracts (Fig. (Fig.3).3). Presumably C. elegans lineages have been reproductively isolated for prolonged times, accumulated many mutations, and then came into contact again with other populations, resulting in polymorphic tracts. Many SNPs of the strain from Hawaii (CB4856) were absent in all other tested isolates. This is also true for a large set of SNPs from many different regions of the CB4856 genome (R. Waterston, pers. comm.), which were tested for their presence in the other strains: SNPs (63 of 90) of CB4856 were absent in the other nine strains (S. Wicks et al., in prep). This suggests that this strain has been reproductively isolated and diverged significantly. From the strains studied in this paper, this is the only one from an isolated island.

Figure 3
Detailed comparison of SNPs among isolates in SNP-rich regions near clone W08E3 on chromosome I (A) and clone H04J21 on chromosome III (B). Regions of ~1 kb flanking the highly polymorphic clones were analyzed by directed sequencing at 1, 5, and ...

Elevated SNP Levels Suggest Higher Evolutionary Rates

It was found previously that the rate of meiotic recombination is more than five times higher on the chromosome arms than in the central clusters (Barnes et al. 1995). The genome sequence revealed that genes with similarity to the yeast genome were more frequent on the autosome centers than on the arms (Fig. (Fig.4A),4A), whereas inverted and tandem repeats clustered mainly on the arms (The C. elegans Sequencing Consortium 1998; Surzycki and Belknap 2000). The authors suggested that possibly there was a higher evolutionary rate on the chromosome arms. Can we still find indications for a differential evolution rate in these segments of the genome? Figure Figure4A4A shows the distribution of the SNPs we found on chromosome I: Polymorphism levels are elevated on the arms. A systematic analysis of the SNP density on all the autosomes revealed that the SNPs were not distributed randomly (χ2 = 40.28, P < 0.001) but were elevated on the autosome arms. We found SNPs to be 4.5 times more abundant on the arms (L and R; Fig. Fig.4B)4B) of the autosomes than on the central region (C; Fig. Fig.4B),4B), whereas the shotgun clones were distributed uniformly (Fig. (Fig.4C).4C). These data support the notion of more rapid evolvement of DNA on the autosome arms in the current C. elegans species.

Figure 4
SNP distribution on the chromosomes. (A) DNA repeats and SNPs on chromosome I are found mainly on the chromosome arms, whereas similarity with yeast genes is highest on the autosome centers (figure adapted from The C. elegans Sequencing Consortium 1998 ...


We have characterized DNA polymorphisms in four C. elegans strains that were isolated from diverse geographical locations. We checked SNPs in the strain in which they were found and also checked the original Bristol N2 sequence by analyzing PCR products derived from N2 DNA. Any mutation that occurred during the 10 or more years of lab culturing of Bristol N2 would show up as a SNP that was unique for Bristol N2. We found none of those, suggesting that spontaneous mutations in Bristol N2 are extremely rare and that, by analogy, the SNPs we detect in natural isolates existed before the strains were isolated from nature.

Most polymorphisms do not alter the coding amino acid; they are found in introns or on the third base of an exon, supporting the idea of gene conservation. Furthermore, higher levels of polymorphism are found on the autosome arms than on the centers. In combination with the finding that more conserved sequences are found mainly in the chromosome centers, this suggests a higher tolerance of polymorphisms on the arms. It remains to be explained which mechanisms are responsible for the intriguing differences between the arms and the central clusters.

Analysis of the 24 isolates suggests that SNPs are shared between different strains. Most SNPs have probably occurred by mutagenic events before the last contact between the different continents. At one geographical location, worm types can be found that are similar to worms from several other locations (i.e., Californian strains). This could be explained by the idea of long range dispersal, spreading of worms as dauer larvae in soil adhering to birds or other animals (Hodgkin and Doniach 1997). The SNP patterns described here are in agreement with the classification of worm races made by Hodgkin and Doniach (1997). For example, based on their Tc1 pattern, plug formation, and clumping phenotype, these authors suggested that CB4858 was similar to AB4. AB4 and CB4858 show almost the same SNP pattern; only one SNP on chromosome V is not present in CB4858.

The SNPs found in this study can also be used efficiently as a tool for gene mapping. SNP markers can easily be detected by sequencing or restriction digestion, and they do not interfere with subtle phenotypes. A simple cross can be used to determine linkage to a chromosome (Jakubowski and Kornfeld 1999; S. Wicks, in prep.).

Analysis of SNP patterns can be used to further characterize the natural history of the worm against the background of a sequenced genome. The analysis is probably facilitated by the hermaphrodite lifestyle of C. elegans, which results in inbreeding and, thus, extended conservation of haplotypes. With the exception of the Hawaiian CB4856 strain, continent-specific genotypes were not recognized, and not many SNPs were unique and unshared between isolates from different regions of the world. SNPs were searched in a limited set of strains. It cannot be excluded that analysis of other natural isolates might reveal a clear subspecies that diverged significantly from all other known isolates. The analysis of SNP patterns within a global species of which the genome sequence is known such as C. elegans can provide a new perspective on many aspects of population biology and evolution.


Strains and Genomic DNA Isolation

Natural isolates of C. elegans were obtained from the Caenorhabditis elegans Genetics Center (University of Minnesota, St. Paul). Initially, polymorphisms to Bristol N2 were searched in AB1 from Australia, CB4857 from California, RC301 from Germany, and TR403 from Wisconsin. Later, SNPs were also verified in other natural isolates (Table (Table2).2). The origin of these strains is extensively described in a paper by Hodgkin and Doniach (1997). Genomic DNA of C. elegans strains N2, CB4857, AB1, RC301, and TR403 was isolated as described by Sulston and Hodgkin (1988).

Cloning of Genomic DNA and SNP Detection

Genomic DNA (~20 μg ) was partially digested with 10 units of Sau3A1 (Roche Molecular Biochemicals) in a 50-μL reaction containing 1× SuRECut buffer A for 5 min at room temperature and loaded on a 1% 1× TAE–agarose gel. After electrophoresis, fragments between 1000 bp and 1500 bp were purified from the gel by freezing the excized bands in liquid nitrogen in separate tubes and centrifuging 10 min at maximum speed. Supernatants were extracted twice with phenol–chloroform, and DNA was precipitated with 0.1 volume of NaAc (pH 5.2) and 2.5 volumes of ethanol. After centrifugation, the pellet was redissolved in 50 μL H20. To create an overhang at the 3′ end for more efficient cloning in a pGEM-T vector (Promega), DNA was incubated with 5 units of Taq polymerase (GIBCO BRL) for 20 min at 72°C in 1× PCR buffer with 0.2 mm dNTPS. Fragments were subsequently ligated into the vector and transformed into DH5α cells. Transformants were grown on ampicillin selective plates and used for sequencing with SP6 and T7 primers on an ABI 377 sequencer. Sequence traces were aligned to the Bristol N2 sequence using the C. elegans BLAST server. All sequence differences with N2 were confirmed by visually analyzing the raw data to exclude mistakes in base calling. Clones with similarity to repetitive sequences were discarded. For confirmation, oligonucleotides were designed to amplify the genomic region containing the SNP. After amplification of 1 μL of genomic DNA (20 ng/μL) and 20 μL of PCR mix [4.4 pmoles of each oligonucleotide, 0.5 unit ofTaq polymerase (GIBCO BRL), 2 μm of each dNTP in 50 mm KCl, 20 mm Tris-HCl (pH 8.3), 1.5 mm MgCl2] with 35 cycles (1 min at 95°C, 1 min at 52°C, and 1 min at 72°C), the presence of a SNP was confirmed. This was done either by sequencing of the PCR product or by digestion of the product with a restriction enzyme. SNPs will be submitted to the SNP database of the Sanger Centre.


We thank Amanda McMurray and Jane Rogers of the Sanger Centre and Roelof Pruntel for help in sequencing, Dr. R.H. Waterston for sharing unpublished data, Dr. J. Hodgkin and the Caenorhabditis Genetics Center for strains, and Dr. Stephen Wicks for critical reading of the manuscript.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.


E-MAIL ln.wank.boin@kretsalp; FAX 31302516554.

Article published online before print: Genome Res., 10.1101/gr.147100.

Article and publication are at www.genome.org/cgi/doi/10.1101/gr.147100.


  • Abdul Kader N, Cote MG. Isolation, identification and characterization of some strains of Caenorhabditis elegans (Maupas, 1900) from Quebec. Fund App Nemat. 1996;19:381–389.
  • Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. [PubMed]
  • Anderson P. Mutagenesis. In: Epstein HF, Shakes DC, editors. Methods in cell biology. Caenorhabditis elegans: Modern biological analysis of an organism. San Diego, CA: Academic Press; 1995. pp. 31–48.
  • Barnes TM, Kohara Y, Coulson A, Hekimi S. Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans. Genetics. 1995;141:159–179. [PMC free article] [PubMed]
  • Brenner S. The genetics of Caenorhabditis elegans. Genetics. 1974;77:71–94. [PMC free article] [PubMed]
  • Breyne P, Rombaut D, Van Gysel A, Van Montagu M, Gerats T. AFLP analysis of genetic diversity within and between Arabidopsis thaliana ecotypes. Mol & Gen Genet. 1999;261:627–634. [PubMed]
  • The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans : A platform for investigating biology. Science. 1998;282:2012–2018. [PubMed]
  • Dion M, Brun JL. Genetic mapping of the free-living nematode Caenorhabditis elegans Maupas 1900, var. Bergerac. I. Study of two dwarf mutants. Mol & Gen Genet. 1971;112:133–151. [PubMed]
  • Egilmez NK, Ebert RH, II, Shmookler Reis RJ. Strain evolution in Caenorhabditis elegans: Transposable elements as markers of interstrain evolutionary history. J Mol Evol. 1995;40:372–381. [PubMed]
  • Fatt HV, Dougherty EC. Genetic control of differential heat tolerance in two strains of the nematode Caenorhabditis elegans. Science. 1963;141:266–267. [PubMed]
  • Hacia JG, Fan JB, Ryder O, Jin L, Edgemon K, Ghandour G, Mayer RA, Sun B, Hsie L, Robbins CM, et al. Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays. Nat Genet. 1999;22:164–167. [PubMed]
  • Hodgkin J, Doniach T. Natural variation and copulatory plug formation in Caenorhabditis elegans. Genetics. 1997;146:149–164. [PMC free article] [PubMed]
  • Innan H, Terauchi R, Miyashita NT. Microsatellite polymorphism in natural populations of the wild plant Arabidopsis thaliana. Genetics. 1997;146:1441–1452. [PMC free article] [PubMed]
  • Jakubowski J, Kornfeld K. A local, high-density, single-nucleotide polymorphism map used to clone Caenorhabditis elegans cdf-1. Genetics. 1999;153:743–752. [PMC free article] [PubMed]
  • Nicholas WL, Dougherty EC, Hansen EL. Axenic cultivation of C. briggsae (Nematoda: Rhabditidae) with chemically undefined supplements; comparative studies with related nematodes. Ann NY Acad Sci. 1959;77:218–236.
  • Petrov DA, Hartl DL. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc Natl Acad Sci. 1999;96:1475–1479. [PMC free article] [PubMed]
  • Shabalina SA, Kondrashov AS. Pattern of selective constraint in C. elegans and C. briggsae genomes. Genet Res. 1999;74:23–30. [PubMed]
  • Stenico M, Lloyd AT, Sharp PM. Codon usage in Caenorhabditis elegans: Delineation of translational selection and mutational biases. Nucleic Acids Res. 1994;22:2437–2446. [PMC free article] [PubMed]
  • Sulston J, Hodgkin J. Methods. In: Wood WB, editor. The nematode Caenorhabditis elegans. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory; 1988. pp. 587–606.
  • Surzycki SA, Belknap WR. Repetitive-DNA elements are similarly distributed on Caenorhabditis elegans autosomes. Proc Natl Acad Sci. 2000;97:245–249. [PMC free article] [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press
PubReader format: click here to try


Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...