DNA Res. 2009 Jun; 16(3): 187–193.
Published online 2009 Apr 10. doi:  10.1093/dnares/dsp005
PMCID: PMC2695772

Development of Genome-wide Simple Sequence Repeat Markers Using Whole-genome Shotgun Sequences of Sorghum (Sorghum bicolor (L.) Moench)


Simple sequence repeat (SSR) markers with a high degree of polymorphism contribute to the molecular dissection of agriculturally important traits in sorghum (Sorghum bicolor (L.) Moench). We designed 5599 non-redundant SSR markers, including regions flanking the SSRs, in whole-genome shotgun sequences of sorghum line ATx623. (AT/TA)n repeats constituted 26.1% of all SSRs, followed by (AG/TC)n at 20.5%, (AC/TG)n at 13.7% and (CG/GC)n at 11.8%. The chromosomal locations of 5012 SSR markers were determined by comparing the locations identified by means of electronic PCR with the predicted positions of 34 008 gene loci. Most SSR markers had a similar distribution to the gene loci. Among 970 markers validated by fragment analysis, 67.8% (658 of 970) markers successfully provided PCR amplification in sorghum line BTx623, with a mean polymorphism rate of 45.1% (297 of 658) for all SSR loci in combinations of 11 sorghum lines and one sudangrass (Sorghum sudanense (Piper) Stapf) line. The product of 5012 and 0.678 suggests that ∼3400 SSR markers could be used to detect SSR polymorphisms and that more than 1500 (45.1% of 3400) markers could reveal SSR polymorphisms in combinations of Sorghum lines.

Key words: sorghum (Sorghum bicolor (L.) Moench), simple sequence repeat (SSR), fragment analysis, genome-wide

Sorghum (Sorghum bicolor (L.) Moench) is the world’s fifth most important cereal crop, after wheat, rice, maize and barley1 and was grown on 43 million hectares in 2004 (http://faostat.fao.org/). There is a great demand for sorghum for human food, livestock feed and green manure production. As in other crop species, it is necessary to improve traits with economic value, including yield potential, sugar content and disease resistance, to enhance sorghum’s potential as a crop. To dissect the morphological and physiological traits of sorghum at a genetic level, many types of molecular markers have been developed, including restriction-fragment-length polymorphisms (RFLPs),2–4 amplified-fragment-length polymorphisms5 and simple sequence repeats (SSRs).6–11 SSR markers are mostly codominant, are readily amplified by polymerase chain reaction (PCR) and are effective at detecting genotype variation caused by a high degree of polymorphism. However, only a few hundred sorghum SSR markers are publicly available for use in fine mapping and map-based cloning of genes of interest.

Among the five major cereal crops, the genome of sorghum is the second smallest (750 Mb) after that of rice (440 Mb) and is between one-third and one-quarter the size of the maize genome (2500 Mb).12 Sorghum is more closely related to maize and sugarcane than to rice, and it shared a common ancestor as recently as 18–25 million years ago with maize13 and 10 million years ago with sugarcane.14 Owing to its small genome size and evolutionary divergence, sorghum is a potentially important model plant among the panicoid grasses. In sorghum, whole-genome shotgun (WGS) sequencing by means of methylation filtration has tagged 96% of the sorghum genes,15 and recently the whole genome was sequenced and annotated.16 These sequence data were produced by the U.S. Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/) in collaboration with the research community. The most recent versions of the Sbi1 and Sbi1_4 annotations use pseudo-chromosome alignment (JGI Phytozome Sorghum bicolor website: http://www.phytozome.net/sorghum). The availability of sorghum genomic sequences now provides an opportunity to extend the bank of SSR markers. The sorghum genome sequence project identified ∼71 000 SSRs in the genome;16 however, no information on the SSR motifs and primers is available.

We have developed a new set of SSR markers to facilitate the genetic and molecular dissection of sorghum genes that encode traits with economic value, including quantitative traits. This was achieved by using the following procedures: (i) design of primer pairs in regions flanking the SSR motifs from WGS sequences; (ii) mapping of the genomic positions of designed SSR primer pairs by means of electronic PCR (ePCR)17 and comparison of the positions of the new SSR markers with gene loci predicted using the Sbi1_4 annotations and (iii) experimental validation of representative SSR markers by means of fragment analysis among 11 sorghum lines and one sudangrass line.

The screening of 570 794 WGS sequences (accession numbers CL147592-CL197752, CC058553-CC059980, BZ329127-BZ342789, BZ342901-BZ352342, BZ365856-BZ368372, BZ369686-BZ370012, BZ421595-BZ424357, BZ625682-BZ629992, BZ779555-BZ781928, CW020594-CW502582 and CW512190-CW514008) from the ATx623 line of sorghum for di-, tri-, tetra- and pentanucleotide repeats provided 11 684 sequences suitable for the design of SSR markers using the SSRIT software.18 To minimize duplication, the 11 684 sequences were screened for redundancy by comparisons among the primer sequences and by means of BLASTN analysis against genomic sequences. Of the 570 794 sequences, 0.98% (5599) contained unique SSR markers that were non-redundant and non-overlapping in sorghum chromosomes. These sequences are referred to as SSR markers and are provided in Supplementary Table S1. Primer sets for SSR markers were automatically designed by using the Perl script srchssr2.pl19 to control the Primer3 core program.20 All parameters of srchssr2.pl were set at the default values.
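The screening step can be illustrated with a minimal sketch that scans a sequence for perfect di- to pentanucleotide repeats. It is not the SSRIT program used in this study; the regular-expression approach, the function name `find_perfect_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration, since the thresholds actually applied are not stated here.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the paper).
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect di- to pentanucleotide repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_n - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are copies of a shorter unit (AAAA, ATAT, ...),
            # because those runs belong to the shorter motif class.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy sequence invented for this example.
example = "GGCC" + "AT" * 8 + "CCGGA" + "CGC" * 5 + "TTA"
print(find_perfect_ssrs(example))
```

Each hit records where the repeat starts, so flanking sequence on both sides can be passed to a primer-design program such as Primer3.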

An analysis of the association between SSR motifs and the rate of polymorphism is important for the development of effective SSR markers. Table 1 summarizes the relative frequency of the 10 most common SSR motifs among the set of 5599 SSR markers. The most common dinucleotide motifs were AT/TA (26.1%), AG/TC (20.5%), AC/TG (13.7%) and CG/GC (11.8%). McCouch et al.21 reported that AT-rich motifs in rice have a larger number of repeats and longer repeat tracts than other dinucleotide motifs and were associated with high rates of polymorphism. In our study, the most repetitive SSR motif with a perfect repeat was (TAT)65 in the AT-rich SSR marker SB4593. Moreover, the mean repeat number for the (AT/TA)n motif with three or more repeats was 6.74, which is approximately twice the mean repeat number of 3.31 calculated for the (CG/GC)n motif. We therefore concluded that sorghum SSR markers with an AT-rich motif would show higher polymorphism than SSR markers with other dinucleotide motifs.
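As an illustration of how motif classes such as AT/TA or AG/TC can be tallied, the sketch below collapses each motif to a canonical form (the alphabetically smallest rotation of the motif or of its reverse complement) and reports the share and mean repeat number per class. This grouping rule is an assumption that happens to reproduce the four dinucleotide classes named above; the function names and the toy input are invented for the example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Smallest rotation of the motif or of its reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, revcomp) for i in range(len(s))]
    return min(rotations)


def summarize_motifs(ssrs):
    """Map motif class -> (percent of all SSRs, mean repeat number)."""
    repeats_by_class = defaultdict(list)
    for motif, n_repeats in ssrs:
        repeats_by_class[canonical_motif(motif)].append(n_repeats)
    total = sum(len(v) for v in repeats_by_class.values())
    return {
        motif_class: (100.0 * len(v) / total, sum(v) / len(v))
        for motif_class, v in repeats_by_class.items()
    }


# Toy (motif, repeat count) pairs, not data from this study.
toy = [("AT", 8), ("TA", 6), ("GC", 3), ("CG", 3)]
print(summarize_motifs(toy))
```

Under this rule TC and AG fall into the same class, as do TG and AC, because a repeat read on one strand appears as the reverse complement on the other.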

Table 1
Distributions of the 10 most common SSR motifs in the data set of 5599 newly developed SSR markers (which are described in Supplementary Table S1)

WGS sequence analysis by means of methylation filtration15 is likely to lead to the design of SSR markers from hypomethylated or low-copy regions of the sorghum genome. Bedell et al.15 revealed a 200% increase in the number of SSRs in WGS sequences identified by means of methylation filtration compared with the number identified without filtration and an increase in the overall proportion of GC-rich trinucleotide sequences. Similarly, we found that the most frequent trinucleotide motif was GC-rich (CGC/GCG)n and that the rate of appearance of this motif was 3.9%. GC-rich and trinucleotide motifs have also been reported as highly frequent motifs among the documented SSRs found in expressed sequence tags of wheat,22 barley,23 sugarcane,24 perennial ryegrass25 and maize.26 GC-rich and trinucleotide-based microsatellites are most likely to be derived from the coding region of the genome.

Map positions for 89.5% (5012 of 5599) of the markers were provided by means of ePCR. Failures to obtain map positions for some 10.5% of the markers may have resulted from the design of the primers on the basis of the WGS sequences of the male-sterile line ATx623, which would not have been mapped when the Sbi1 data derived from the male sterility-maintenance line BTx623 were used for ePCR analysis. To identify target genes and analyze quantitative trait loci in detail, it is necessary to prioritize the development of markers in gene-rich regions. The distribution of our SSR markers appears to be preferentially located at positions with 34 008 gene loci (Fig. 1), and it can be attributed to WGS sequences covering 96% of the sorghum genes over 65% of the genome length.15 The low number of designed SSR markers near the centromeres of the chromosomes is consistent with the low annotation rate for genes at these locations (Fig. 1). It is possible that these loci correspond to methylated or multi-copied genes.
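The idea behind ePCR mapping can be sketched as a search for a forward primer and the reverse complement of the reverse primer within a bounded distance on a reference sequence. The code below is a simplified stand-in, not the ePCR program itself: it assumes exact matches, searches one strand only, and uses a hypothetical `max_product` limit of 1000 bp.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_pcr(reference, forward, reverse, max_product=1000):
    """Return (start, end, product_size) for exact primer-pair hits on the given strand."""
    reference = reference.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse)  # how the reverse primer appears on this strand
    hits = []
    start = reference.find(forward)
    while start != -1:
        site = reference.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            if end - start > max_product:
                break  # later sites only give longer products
            hits.append((start, end, end - start))
            site = reference.find(reverse_site, site + 1)
        start = reference.find(forward, start + 1)
    return hits


# Toy reference and primers invented for this example.
reference = "TTT" + "ACGTAC" + "AT" * 5 + "CCTGAA" + "GG"
print(in_silico_pcr(reference, "ACGTAC", "TTCAGG"))
print(in_silico_pcr(reference, "GGGGGG", "TTCAGG"))  # primer absent: no position
```

A primer pair designed from a sequence that differs from the reference returns no hit in this exact-match sketch, which mirrors the kind of failure suggested above for the markers without map positions.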

Figure 1
Distribution of SSR markers within the sorghum chromosomes. Bars at each chromosome represent the positions of the 34 008 gene loci,16 SSR markers developed in this study, polymorphic SSR markers without an AT/TA motif detected by means of fragment analysis ...

The ePCR mapping of 317 SSR markers established in previous studies6–9,11 identified the chromosomal locations of 260 of the markers, of which 21 had 50 bp or more of their sequences in common (Supplementary Table S2). This result shows that 239 of the previous SSR markers have different positions within the sorghum genome. A comparison of the distribution of the 5012 markers identified here with the distribution of the 239 non-redundant markers revealed that only 61 of the 5012 markers had 50 bp or more of their sequences in common within the 10 sorghum chromosomes (Supplementary Table S3). Furthermore, Fig. 1 shows that the previously reported SSR markers covered only a small proportion of sorghum’s chromosomes. In particular, there was only one marker (Xtxp6) on the short arm of chromosome 6. Together, these results show that most of the SSR markers in the set designed here are unique and cover the regions that currently lack a sufficient number of SSR markers in each sorghum chromosome.
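One way to picture the 50-bp redundancy check is as an interval comparison between mapped amplicons. The sketch below assumes that "sequences in common" can be approximated by overlapping coordinates on the same chromosome; that interpretation, the half-open coordinate convention and all marker names and positions are hypothetical.

```python
def shared_bp(a, b):
    """Overlap in bp between two (chromosome, start, end) half-open intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def redundant_pairs(new_markers, old_markers, min_shared=50):
    """Return sorted (new_name, old_name) pairs sharing at least min_shared bp."""
    pairs = []
    for new_name, new_pos in new_markers.items():
        for old_name, old_pos in old_markers.items():
            if shared_bp(new_pos, old_pos) >= min_shared:
                pairs.append((new_name, old_name))
    return sorted(pairs)


# Invented marker names and coordinates, for illustration only.
new_markers = {
    "new_a": ("chr1", 100, 300),
    "new_b": ("chr1", 1000, 1200),
    "new_c": ("chr2", 100, 300),
}
old_markers = {
    "old_x": ("chr1", 240, 400),
    "old_y": ("chr1", 1180, 1400),
}
print(redundant_pairs(new_markers, old_markers))
```

The nested loop is adequate for a few thousand markers against a few hundred; a sorted sweep would be preferable for much larger sets.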

We mapped a set of 970 randomly chosen markers that had fewer AT/TA repeats to possible chromosome locations by means of ePCR and tested the results by means of fragment analysis. Forward and reverse primers were redesigned to provide accurate genotyping with post-PCR fluorescent labeling. AC, AG or AT was added to the 5′ end of the forward primers, and GTTT was added to the 5′ end of the reverse primers. PCR amplification was then performed and the PCR product was directly labeled with fluorescence-labeled R110-ddUTP by the single-tube method.27 The labeled PCR products were then diluted and mixed with Hi-Di formamide containing a size standard. The heat-denatured products were then analyzed with an ABI Prism 3700 Genetic Analyzer, and the resulting allele data were analyzed with GeneMapper v3.7 software.

We screened for SSR polymorphisms among a total of 12 lines: four male sterility-maintenance sorghum lines (BTx623, BTx624, MS79B, Nakei MS-3B), seven inbred lines (74LH3213, Challwaxy Sorghum, JN43, SIL-05, bmr-6, Sennkinnshiro, Takakibi) and one sudangrass line (Greenleaf). To do so, we used the following criteria to define the fragment size that would be used for genotyping. (i) A marker was retained for the subsequent analysis if it amplified successfully in BTx623 and the observed fragment closest in size to the expected PCR product differed from that expected size by ≤5 bp. (ii) An SSR polymorphism was scored when the corresponding fragment sizes differed by ≥2 bp between the two lines of a given pairwise combination among the 12 lines. The mean frequency of total SSR polymorphisms between any two of the 12 lines (Table 2) was nearly identical to that used for calculating both the gene diversity index (D)28 and the degree of differentiation (δT).29
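The two criteria can be sketched as follows. The function names and the fragment sizes are invented, and the treatment of missing calls (a marker is skipped for any pair of lines in which either line lacks a fragment size) is an assumption, since the handling of missing data is not described here.

```python
from itertools import combinations


def passes_reference_check(expected_size, observed_size, max_diff=5):
    """Criterion (i): the reference line amplified and lies within max_diff bp of expectation."""
    return observed_size is not None and abs(observed_size - expected_size) <= max_diff


def mean_pairwise_polymorphism(sizes, lines, min_diff=2):
    """Criterion (ii): mean over line pairs of the percent of markers differing by >= min_diff bp."""
    rates = []
    for a, b in combinations(lines, 2):
        scored = [
            (calls[a], calls[b])
            for calls in sizes.values()
            if calls.get(a) is not None and calls.get(b) is not None
        ]
        if not scored:
            continue  # no marker was called in both lines of this pair
        polymorphic = sum(1 for x, y in scored if abs(x - y) >= min_diff)
        rates.append(100.0 * polymorphic / len(scored))
    if not rates:
        raise ValueError("no pair of lines shares a scored marker")
    return sum(rates) / len(rates)


# Toy fragment sizes (bp) for two markers in three hypothetical lines.
sizes = {
    "m1": {"A": 200, "B": 200, "C": 204},
    "m2": {"A": 150, "B": 152, "C": 154},
}
print(passes_reference_check(200, 203))
print(mean_pairwise_polymorphism(sizes, ["A", "B", "C"]))
```

Averaging per-pair rates, rather than pooling all comparisons, gives each pair of lines equal weight even when some pairs have fewer scored markers.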

Table 2
Mean rates (%) of SSR polymorphisms detected by means of fragment analysis in 11 sorghum lines and one sudangrass line (Greenleaf)

Of the 970 markers, 822 (84.7%) were identified as useful for the detection of polymorphisms; those excluded were susceptible to the experimental error peculiar to fragment analysis. These results are consistent with previous reports of the successful amplification of 90% (ref. 8) and 85% (ref. 7) of target SSRs. Moreover, 67.8% (658 of 970) of the SSR markers amplified successfully in BTx623 and gave a PCR product size that corresponded to the expected product size (Supplementary Table S4). The genotypes of the 12 lines were determined by means of PCR amplification using the 658 primer sets. The mean value of the SSR polymorphisms was 45.1% between any two lines in all 12 lines, including sudangrass (Table 2). The combinations between pairs of sorghum lines (i.e. excluding sudangrass) showed a lower polymorphism rate (43.0%, Table 2), but the difference was only 2.1 percentage points. In previous studies, mean values of polymorphism rate for SSR loci were reported as 0.54 (ref. 6) and 0.69 (ref. 8), but these values were detected by using gel electrophoresis, which has lower sensitivity than fragment analysis. Moreover, in a previous report of fragment analysis,9 the estimated polymorphism rate (46.0%) was similar to our mean value of SSR polymorphisms (45.1%). This suggests that the SSR polymorphism rate might be affected more by differences in the materials used in the experiments than by the sensitivity of the analytical methods.

In general, the diversity of SSRs developed from cDNA-derived RFLP probe sequences is lower than the genetic variation detected by using SSR loci isolated from SSR-enriched genomic libraries. Schloss et al.9 also suggested that the low variation in SSR loci developed from RFLP probes was associated with a small proportion of dinucleotide motifs and differences in the length of the repeat units. Because of problems related to stutter and noise, we did not attempt to validate SSR markers with the (AT/TA)n motif by using fragment analysis. However, if a set of SSR marker motifs contains markers with an (AT/TA)n motif and an alternative highly sensitive method could be used for polymorphism detection, we would expect to see the same or a higher level of diversity than obtained here, since the AT-rich motif has a greater number of repeat units than most other motifs (Table 1).

We did not observe a clear correlation between the distributions of the number of SSR motifs on each chromosome. The distribution of the 558 SSR markers for which we detected polymorphisms in the combination of the 12 lines appears in Fig. 1. These markers were not remarkably localized on each chromosome compared with the distribution of 239 previously developed markers. These results suggest that whether or not newly developed SSR markers include an (AT/TA)n motif, these markers will be helpful in the construction of a linkage map for molecular dissection of sorghum’s agronomic traits.

Validation data from our fragment analyses allowed estimation of the polymorphism frequency within the 5012 loci. When the rate of successful amplification with BTx623 was 67.8% and the mean value of the SSR polymorphisms was 45.1%, ∼3400 (67.8% of 5012) were identified as suitable markers for the detection of SSR polymorphisms among Sorghum spp. lines and ∼1500 (45.1% of 3400) markers could potentially indicate SSR polymorphisms in the combinations of Sorghum spp. lines. From this result and an estimated total genomic sequence length of 750 Mb, we estimated that the density of SSR markers developed here for use in polymorphism detection would be approximately one per 220 kb. However, because the gene-rich region and the newly developed markers tend to be located at positions away from the centromere, the distance between the SSR markers is likely to be <220 kb in the gene-rich regions. For this reason, the SSR markers developed here have the potential to be utilized not only as DNA markers for use in breeding but also as a means of map-based cloning. The use of public genomic sequences may provide additional markers in regions with fewer SSR markers.
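The arithmetic behind these estimates can be reproduced in a few lines. The sketch below uses only figures quoted in the text (5012 mapped markers, 67.8% amplification, 45.1% polymorphism, 750 Mb genome) and follows the text in rounding the usable set to 3400 before the later steps; the function name and the returned keys are hypothetical.

```python
def estimate_marker_yield(mapped=5012, amplification_rate=0.678,
                          polymorphism_rate=0.451, genome_bp=750e6,
                          rounded_usable=3400):
    """Reproduce the back-of-the-envelope estimates quoted in the text."""
    usable = mapped * amplification_rate             # markers expected to amplify
    polymorphic = rounded_usable * polymorphism_rate  # of those, expected to be polymorphic
    spacing_kb = genome_bp / rounded_usable / 1000.0  # mean spacing of usable markers
    return {"usable": usable, "polymorphic": polymorphic, "spacing_kb": spacing_kb}


result = estimate_marker_yield()
print(round(result["usable"]))       # close to the ~3400 quoted in the text
print(round(result["polymorphic"]))  # close to the ~1500 quoted in the text
print(result["spacing_kb"])          # close to the ~220 kb quoted in the text
```

The spacing is a genome-wide average; because the markers cluster in gene-rich regions, local spacing there would be smaller than this figure.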


This work was supported by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Genomics for Agricultural Innovation, SOR 0006).


Edited by Satoshi Tabata


1. Doggett H. Sorghum. 2nd ed. New York: John Wiley and Sons, Inc.; 1988.
2. Pereira M. G., Lee M., Bramel-Cox P., Woodman W., Doebley J., Whitkus R. Construction of an RFLP map in sorghum and comparative mapping in maize. Genome. 1994;37:236–243.
3. Nagamura Y., Tanaka T., Nozawa H., Kaidai H., Kasuga S., Sasaki T. Syntenic regions between rice and sorghum genomes. In: Nakagawa H., Kobayashi M., editors. Proceedings of the International Workshop; Utilization of Transgenic Plant and Genome Analysis in Forage Crops. Tochigi: National Grassland Research Institute; 1998. pp. 97–103.
4. Bowers J. E., Abbey C., Anderson S., et al. A high-density genetic recombination map of sequence-tagged sites for sorghum, as a framework for comparative structural and evolutionary genomics of tropical grains and grasses. Genetics. 2003;165:367–386.
5. Menz M. A., Klein R. R., Mullet J. E., Obert J. A., Unruh N. C., Klein P. E. A high-density genetic map of Sorghum bicolor (L.) Moench based on 2926 AFLP, RFLP and SSR markers. Plant Mol. Biol. 2002;48:483–499.
6. Brown S. M., Hopkins M. S., Mitchell S. E., et al. Multiple methods for the identification of polymorphic simple sequence repeats (SSRs) in sorghum [Sorghum bicolor (L.) Moench]. Theor. Appl. Genet. 1996;93:190–198.
7. Bhattramakki D., Dong J., Chhabra A. K., Hart G. E. An integrated SSR and RFLP linkage map of Sorghum bicolor (L.) Moench. Genome. 2000;43:988–1002.
8. Kong L., Dong J., Hart G. E. Characteristics, linkage-map positions, and allelic differentiation of Sorghum bicolor (L.) Moench DNA simple-sequence repeats (SSRs). Theor. Appl. Genet. 2000;101:438–448.
9. Schloss J., Mitchell E., White M., et al. Characterization of RFLP probe sequences for gene discovery and SSR development in Sorghum bicolor (L.) Moench. Theor. Appl. Genet. 2002;105:912–920.
10. Tao Y. Z., Jordan D. R., McIntyre C. L., Henzell R. G. Construction of a genetic map in a sorghum recombinant inbred line using probes from different sources and its comparison with other sorghum maps. Aust. J. Agric. Res. 1998;49:729–736.
11. Taramino G., Tarchini R., Ferrario S., Lee M., Pé M. E. Characterization and mapping of simple sequence repeats (SSRs) in Sorghum bicolor. Theor. Appl. Genet. 1997;95:66–72.
12. Arumuganathan K., Earle E. D. Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 1991;9:208–218.
13. Doebley J., Durbin M., Golenberg E. M., Clegg M. T., Ma D. P. Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence among the grasses (Gramineae). Evolution. 1990;44:1097–1108.
14. Sobral B. W. S., Braga D. P. V., LaHood E. S., Keim P. Phylogenetic analysis of chloroplast restriction enzyme site mutations in the Saccharinae Griseb. subtribe of the Andropogoneae Dumort. tribe. Theor. Appl. Genet. 1994;87:843–853.
15. Bedell J. A., Budiman M. A., Nunberg A., et al. Sorghum genome sequencing by methylation filtration. PLoS Biol. 2005;3:e13.
16. Paterson A. H., Bowers J. E., Bruggmann R., et al. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457:551–556.
17. Schuler G. D. Sequence mapping by electronic PCR. Genome Res. 1997;7:541–550.
18. Temnykh S., DeClerck G., Lukashova A., Lipovich L., Cartinhour S., McCouch S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 2001;11:1441–1452.
19. Fukuoka H., Nunome T., Minamiyama Y., Kono I., Namiki N., Kojima A. Read2Marker: a data processing tool for microsatellite marker development from a large data set. Biotechniques. 2005;39:472, 474, 476.
20. Rozen S., Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. In: Misener S., Krawetz S. A., editors. Methods in Molecular Biology. Totowa: Humana Press Inc; 2000. pp. 365–386.
21. McCouch S. R., Teytelman L., Xu Y., et al. Development and mapping of 2240 new SSR markers for rice (Oryza sativa L.). DNA Res. 2002;9:199–207.
22. Yu J. K., Dake T. M., Singh S., et al. Development and mapping of EST-derived simple sequence repeat markers for hexaploid wheat. Genome. 2004;47:805–818.
23. Thiel T., Michalek W., Varshney R., Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 2003;106:411–422.
24. Cordeiro G. M., Casu R., McIntyre C. L., Manners J. M., Henry R. J. Microsatellite markers from sugarcane (Saccharum spp.) ESTs cross transferable to erianthus and sorghum. Plant Sci. 2001;160:1115–1123.
25. Asp T., Frei U. K., Didion T., Nielsen K. K., Lubberstedt T. Frequency, type, and distribution of EST-SSRs from three genotypes of Lolium perenne, and their conservation across orthologous sequences of Festuca arundinacea, Brachypodium distachyon, and Oryza sativa. BMC Plant Biol. 2007;7:36.
26. Kantety R. V., La Rota M., Matthews D. E., Sorrells M. E. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol. Biol. 2002;48:501–510.
27. Inazuka M., Tahira T., Hayashi K. One-tube post-PCR fluorescent labeling of DNA fragments. Genome Res. 1996;6:551–557.
28. Nei M. Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA. 1973;70:3321–3323.
29. Gregorius H. R. The relationship between the concepts of genetic diversity and differentiation. Theor. Appl. Genet. 1987;74:397–401.

Articles from DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes are provided here courtesy of Oxford University Press