Biol Proced Online. 2003; 5: 63–68.
Published online 2003 Mar 4. doi: 10.1251/bpo47
PMCID: PMC152575

Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison


Satellite DNA sequences are known to be highly variable and to be subject to concerted evolution, which homogenizes member sequences within a species. We analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequencies of the sequence array and the phylogenetic distances between member sequences. Nucleotide frequency calculation and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about twice the average of the other three fishes, and that the latest sequence homogenization event occurred more recently in D. puntazzo than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA.

Keywords: DNA, satellite; evolution, molecular; phylogeny


Tandemly arrayed repetitive DNA sequences, known as satellite DNA, are common in the centromeric regions of vertebrate chromosomes. Satellite DNA evolves through changes in both copy number and nucleotide sequence (reviewed in 1). Although some centromeric satellite DNA is known to participate in the construction of functional centromeres (2-6), its nucleotide sequences are highly variable. Because of their high sequence diversity among closely related species, satellite DNA sequences are often used for phylogenetic and taxonomic analyses (7-11). Garrido-Ramos et al. (10) determined the nucleotide sequences of centromeric satellite members from Sparidae fishes and showed that at least two monophyletic groups exist within the family. To do so, they reconstructed the phylogeny of Sparidae by comparing the consensus satellite DNA sequences of the respective species, an approach justified by their observation that the genetic distances between repeat units within a species were smaller than those between repeat units from different species. However, the mode of evolutionary change in satellite DNA sequences may vary among species, so in some instances the "consensus sequence" may not be the most representative of the member sequences. In addition, although the results of Garrido-Ramos et al. (10) suggested that evolutionary rates differ among the species, these rates had not yet been analyzed quantitatively.

The intraspecific sequence divergence among members of a satellite DNA family is likely to be affected by two factors: the evolutionary rate and the time elapsed since the latest sequence homogenization event. Within a species, satellite DNA exhibits internal sequence variability that depends on the ratio between the rates of mutation and of homogenization/fixation (12). In the present work, the interspecific phylogenetic distances and intraspecific sequence variation of Sparidae satellite DNA were re-examined to obtain more precise information about its mode of evolution. We estimated the relative evolutionary rate of each species and evaluated differences in the time elapsed since the latest event of concerted evolution.

Comparison and alignment of monomer satellite sequences

The nucleotide sequences of the satellite DNA of six Sparidae fishes were retrieved from the GenBank/EMBL/DDBJ International Databases. We analyzed a total of thirty-four satellite members whose nucleotide sequences had been determined from cloned genomic DNA (not from PCR-amplified DNA). They were aligned by minimizing the SI(k) scores (see below) and are shown in Fig. 1.

Fig. 1
Alignment of nucleotide sequences of Sparidae satellite DNA monomeric units. Dashes (-) indicate the sites of gaps. Nucleotide position 1 is located at the C residue of the HindIII restriction site. Sequence origins and accession numbers are given at ...

The measure SI(k) was described previously (11) and was used successfully to align the nucleotide sequences of the gene encoding DNA topoisomerase II (13). As noted by Garrido-Ramos et al. (10), the region around position 170 contains numerous gaps (insertions and deletions), and the sequences in this region are not similar enough to align members from different species. Kato (14) proposed a monomer register for satellite DNA, obtained by examining its subrepeat organization, and the gaps appear to lie at the junctions of the registered monomers (Fig. 2).

Fig. 2
Subrepeat alignment of Percoidei satellite monomeric units. A satellite DNA member from Diplodus annularis (Z48694) exemplifies the subrepeat organization. ...

Amplification of unit-length monomers might have introduced species-specific differences into this region, probably via recombination, and it seems reasonable to hypothesize that this amplification, combined with the changing satellite DNA sequences, causes speciation. Because this region is species-specific and cannot be aligned reliably across species, it was excluded, and the regions spanning positions 1 to 159 and 177 to 187 (170 positions in total) were used for the phylogenetic analysis described below.
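The site selection described above can be expressed as a short Python sketch. It assumes that every sequence has already been aligned to a common length (as in Fig. 1) and that positions are counted 1-based and inclusive in alignment coordinates; the names `RETAINED_RANGES` and `select_sites` are hypothetical.

```python
# Hypothetical helper: keep only the alignment columns used for the phylogenetic
# analysis (1-based, inclusive positions 1-159 and 177-187, as stated in the text).
RETAINED_RANGES = [(1, 159), (177, 187)]


def select_sites(aligned_seq, ranges=RETAINED_RANGES):
    """Concatenate the columns of one aligned sequence that fall inside `ranges`."""
    return "".join(aligned_seq[start - 1:end] for start, end in ranges)


# The retained ranges cover 159 + 11 = 170 positions, the N used for SIGM.
assert sum(end - start + 1 for start, end in RETAINED_RANGES) == 170
```

Applying `select_sites` to every aligned member yields sequences of equal length (170 characters) on which the per-position and pairwise calculations can operate.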

Sequence variation within the species

Intraspecific sequence variation was evaluated using the measure SI(k), defined as

$$\mathrm{SI}(k) = \sum_{i \in \{A,C,G,T\}} n_{ik}^{2} \qquad [1]$$

and SIGM, defined as

$$\mathrm{SI}_{\mathrm{GM}} = \left[\prod_{k=1}^{N} \mathrm{SI}(k)\right]^{1/N} \qquad [2]$$

where $n_{ik}$ is the relative frequency of nucleotide i (i = A, C, G, or T) at position k of the aligned sequences, N is the total number of aligned positions analyzed (N = 170 in the present work), and SIGM is the geometric mean of SI(k) over the N positions. SIGM can be written as a function of time t (see Appendix):

$$\mathrm{SI}_{\mathrm{GM}} = \frac{1 + 3\exp(-8\lambda t/3)}{4} \qquad [3]$$

where λ is the average rate of substitution per site per unit of evolutionary time, and t is the time since sequence homogenization (concerted evolution). Table 1 summarizes the SIGM scores for the respective species, and Fig. 3 shows the distribution of mean SI(k) scores for the six Sparidae fishes.
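The two measures, SI(k) as the sum of squared nucleotide frequencies at position k (the form given in equation [19]) and SIGM as their geometric mean over the N positions, translate directly into code. The following Python sketch is illustrative only: how gaps were treated is not stated in the text, so here, as an assumption, gap and ambiguity symbols are left out of the frequency count, and the function names are hypothetical.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def si_column(column):
    """SI(k): sum of the squared relative frequencies of A, C, G and T at one position.

    Assumption: gap ('-') and ambiguity symbols are excluded from the count.
    """
    counts = Counter(c.upper() for c in column if c.upper() in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no A/C/G/T")
    return sum((counts[base] / total) ** 2 for base in NUCLEOTIDES)


def si_gm(aligned):
    """SIGM: geometric mean of SI(k) over all N aligned positions."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # Average the logarithms rather than multiplying N values below 1 directly.
    log_sum = sum(math.log(si_column([seq[k] for seq in aligned])) for k in range(length))
    return math.exp(log_sum / length)
```

For example, the two sequences `AC` and `AG` give SI = 1 at the first position and 0.5 at the second, so SIGM = √0.5 ≈ 0.707. A fully conserved position scores 1, and a position with all four nucleotides equally frequent scores 1/4.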

Table 1
Intraspecific variation of satellite DNA
Fig. 3
Distribution of SI(k) scores in satellite DNA. The geometric means of six Sparidae fishes are calculated for each position and plotted against the nucleotide sequence.

The variable sites are clustered at the edges of subregion E and within subregion F, but rarely occur in the middle of subregion E (see Fig. 2). This may indicate that subregion E corresponds to a particular structural domain crucial to the function of satellite DNA. Warburton et al. (15) showed that the recombination window within which sequence similarity is conserved is about 20 bp long. Subregion E of the Sparidae satellite DNA may thus serve as a window for recombination during sequence homogenization.

The average number of substitutions per site since sequence homogenization was estimated by calculating λt from the observed SIGM scores (Table 1). The λt score is the product of the evolutionary rate and the time elapsed since the sequence homogenization event. Independently, the evolutionary distance between two DNA sequences can be evaluated using the Jukes-Cantor distance (J-C d), which estimates 2λt, where t is the time since the divergence of the two sequences; the factor 2 arises because substitutions accumulate independently along both lineages. J-C d is expressed in terms of the proportion of identical nucleotides at the aligned sites of the two sequences (q), which can be written as follows (16, 17); note that the right side of equation [5] has the same form as equation [3].

$$\text{J-C } d = 2\lambda t = -\frac{3}{4}\ln\!\left(\frac{4q - 1}{3}\right) \qquad [4]$$

$$q = \frac{1 + 3\exp(-8\lambda t/3)}{4} \qquad [5]$$
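Because equation [5] expresses q in the same form that equation [3] gives for SIGM, the inversion in equation [4] can be applied to either quantity. A minimal Python sketch, with hypothetical function names, converts an identity proportion q, or an observed SIGM, into an estimate of 2λt:

```python
import math


def jc_from_q(q):
    """Equation [4]: 2*lambda*t = -(3/4) * ln((4q - 1) / 3).

    The correction is only defined for q > 1/4 (fewer than 3/4 of sites differing).
    """
    if not 0.25 < q <= 1.0:
        raise ValueError("q must lie in (0.25, 1]")
    return -0.75 * math.log((4 * q - 1) / 3)


def two_lambda_t_from_sigm(sigm):
    """Apply the same inversion to SIGM (equation [3]) to estimate 2*lambda*t."""
    return jc_from_q(sigm)


def q_from_lambda_t(lambda_t):
    """Equation [5], used here to check that the inversion round-trips."""
    return (1 + 3 * math.exp(-8 * lambda_t / 3)) / 4
```

For instance, λt = 0.05 gives q ≈ 0.906 from equation [5], and `jc_from_q` maps that value back to 2λt = 0.1.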

The J-C d scores were calculated for every pair of member sequences using the program Dnadist included in PHYLIP ver. 3.5c (18); the average interspecific distances are listed in Table 2, and the intraspecific averages of J-C d are listed in Table 1. The estimates of λt obtained by the two procedures are in good agreement (Fig. 4), which suggests that nucleotide frequency calculation is an effective way to describe intraspecific divergence within a satellite DNA family.
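The averaging step can be sketched as follows. This is not PHYLIP's Dnadist; it is a hypothetical Python stand-in that computes the Jukes-Cantor distance for each pair of aligned sequences and averages it over all pairs within one species (as in Table 1) or between two species (as in Table 2). Comparing only sites where both sequences carry A, C, G or T is an assumption and may differ from Dnadist's handling of gaps.

```python
import math
from itertools import combinations, product
from statistics import mean


def jc_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned sequences (equation [4])."""
    # Assumption: compare only sites where both sequences carry A, C, G or T.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    q = sum(a == b for a, b in sites) / len(sites)  # proportion of identical sites
    if q <= 0.25:
        raise ValueError("sequences too divergent for the J-C correction")
    return -0.75 * math.log((4 * q - 1) / 3)


def intraspecific_mean(seqs):
    """Average J-C d over all unordered pairs of members from one species."""
    return mean(jc_distance(a, b) for a, b in combinations(seqs, 2))


def interspecific_mean(seqs_x, seqs_y):
    """Average J-C d over every pairing of a member of species X with one of species Y."""
    return mean(jc_distance(a, b) for a, b in product(seqs_x, seqs_y))
```

Within a species, n members give n(n - 1)/2 pairs; between two species with m and n members, all m × n pairs contribute to the average.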

Table 2
Average scores of interspecific distances (J-C d)
Fig. 4
Comparison of 2λt scores obtained using two different protocols. The 2λt scores calculated for six Sparidae fishes by pairwise sequence comparison (J-C d, vertical axis) are plotted against those obtained by nucleotide frequency calculation (SIGM, ...

Interspecific and intraspecific relationships between Sparidae satellite DNA

The evolutionary distance between two populations (interspecific divergence) can be estimated by calculating differences in nucleotide frequency as described previously (11). However, this distance will be underestimated if sequence homogenization has occurred at the monomeric level of the satellite DNA in the lineage, and the magnitude of the error will depend on the time elapsed since the homogenization. In primate alpha-satellite DNA, sequence homogenization occurred at the level of higher-order repeats (HORs) rather than at the monomeric level, so nucleotide frequency calculation within the respective HORs has been used successfully to define the distances between satellite arrays and to reconstruct the phylogenetic relationships of the HORs (11). In contrast, because sequence homogenization may have occurred at the monomeric level in Sparidae satellite DNA, the interspecific distances between satellite DNA members should be evaluated by pairwise sequence comparison. We used two distance measures, J-C d (16) and Kimura's two-parameter distance (19), to evaluate the phylogenetic relationships of the satellite DNA members. As reported by Garrido-Ramos et al. (10), members from the same species clustered together, indicating that concerted evolution occurred after speciation. Figure 5 shows a phylogenetic tree of the six Sparidae fishes reconstructed from the interspecific averages of J-C d scores.
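For comparison with J-C d, Kimura's two-parameter distance (19) treats transitions (A↔G, C↔T) and transversions separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. A minimal Python sketch follows; the function name is hypothetical, and comparing only sites that carry A, C, G or T in both sequences is an assumption.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    # Assumption: only sites where both sequences carry A, C, G or T are compared.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    differences = [(a, b) for a, b in sites if a != b]
    # A transition exchanges two purines or two pyrimidines; anything else is a transversion.
    transitions = sum(1 for a, b in differences
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    p = transitions / n
    q = (len(differences) - transitions) / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

When transitions and transversions occur in the proportions expected under equal substitution rates (P = Q/2), the two-parameter distance reduces to J-C d.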

Fig. 5
Unrooted Fitch-Margoliash tree for six Sparidae fishes. The branch lengths and tree topology were computed using the program "Fitch" (18) according to the method of Fitch and Margoliash (22). Distance matrix of average J-C d scores (Table ...

The phylogenetic trees drawn from the two distance matrices (J-C d and the distance based on Kimura's two-parameter model) were identical (data not shown). The order of branching within the Diplodus cluster differed from that described by Garrido-Ramos et al. (10): we found the closest relative of Diplodus annularis to be D. puntazzo, not D. sargus. In their work, a neighbor-joining tree grouped D. annularis with D. sargus with lower bootstrap support, whereas a UPGMA tree showed the same topology with higher bootstrap support. Because UPGMA assumes a constant evolutionary rate across lineages, this discrepancy may be caused by differences in evolutionary rate within the genus Diplodus; in particular, the evolutionary rate of D. puntazzo appears to be higher than those of D. sargus and D. annularis. In addition, the satellite DNA of D. bellottii has apparently evolved much faster than that of the others (the longest branch in Fig. 5).

Evaluating evolutionary rate differences among Diplodus species

The branch connecting the common ancestor of Diplodus to D. bellottii is much longer than the branches connecting the common ancestor to the other Diplodus species (Fig. 5). Together with the data in Table 1, this finding indicates that nucleotide substitutions accumulated more frequently in D. bellottii than in the others. Because the length of each branch in the phylogenetic tree represents a λt score, and because the time since a bifurcation is the same for both descendant branches, the relative evolutionary rates of the different lineages can be estimated from the branch lengths. Assuming that the evolutionary rate of a common ancestor is the average of those of its descendants, the lengths of branches A to F in Fig. 5 can be written as follows:

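$$A = \lambda_0 t_0 \qquad [6]$$

$$B = \lambda_1 t_0 \qquad [7]$$

$$C = \lambda_2 t_1 \qquad [8]$$

$$D = \lambda_3 t_2 \qquad [9]$$

$$E = \frac{(\lambda_0 + \lambda_1)\, t_3}{2} \qquad [10]$$

$$F = \frac{(\lambda_0 + \lambda_1 + \lambda_2)\, t_4}{3} \qquad [11]$$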

where $\lambda_n$ denotes the evolutionary rate of lineage n, and it is assumed that

$$t_2 = t_4 + t_1 = t_4 + t_3 + t_0. \qquad [12]$$

The branch lengths were calculated from the J-C d scores using the program Fitch included in PHYLIP ver. 3.5c (18); the relative values of λ and t are summarized in Table 3. The relative evolutionary rates of satellite DNA were apparently higher in D. bellottii (branch D) and D. puntazzo (branch B) than in the other Diplodus species.
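Equations [6] to [12] can be solved one unknown at a time once a time scale is fixed, because branch lengths determine rates and times only up to a common factor. The Python sketch below sets t0 = 1 (an arbitrary choice of unit) and reads equation [10] as E = (λ0 + λ1)t3/2 and equation [11] as F = (λ0 + λ1 + λ2)t4/3, following the stated assumption that an ancestral rate is the average of its descendants' rates. It is not the Fitch program, only the algebra applied to branch lengths it produces; the function name is hypothetical.

```python
def relative_rates_and_times(A, B, C, D, E, F):
    """Solve equations [6]-[12] for relative rates (lambda0-3) and times (t0-t4).

    Rates and times are identifiable only up to a common scale factor, so t0 is
    fixed at 1 here (an assumed, arbitrary unit).
    """
    t0 = 1.0
    lam0 = A / t0                        # [6]  A = lambda0 * t0
    lam1 = B / t0                        # [7]  B = lambda1 * t0
    t3 = 2 * E / (lam0 + lam1)           # [10] E = (lambda0 + lambda1) * t3 / 2
    t1 = t3 + t0                         # [12] t1 = t3 + t0
    lam2 = C / t1                        # [8]  C = lambda2 * t1
    t4 = 3 * F / (lam0 + lam1 + lam2)    # [11] F = (lambda0 + lambda1 + lambda2) * t4 / 3
    t2 = t4 + t1                         # [12] t2 = t4 + t1
    lam3 = D / t2                        # [9]  D = lambda3 * t2
    return (lam0, lam1, lam2, lam3), (t0, t1, t2, t3, t4)
```

As a check with made-up branch lengths (not the values behind Table 3), A = 1, B = 2, C = 2.25, D = 7.5, E = 0.75 and F = 1.5 return rates (1, 2, 1.5, 3) and times (1, 1.5, 2.5, 0.5, 1).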

Table 3
Relative evolutionary rate and time for each branch

To assess these differences in evolutionary rate further, the distances between each species in the genus Diplodus and two outgroup species (Spondyliosoma cantharus and Lithognathus mormyrus) were compared. The average distances and standard deviations are listed in Table 4.

Table 4
Average distance between Diplodus and outgroup species

The data show that D. bellottii has a significantly higher evolutionary rate than the other Diplodus species (p<0.001, two-sample t-test with Welch's correction). The distance between D. puntazzo and the outgroups tended to be larger than that between D. sargus and the outgroups, but the difference was not significant (p<0.1, two-sample t-test with Welch's correction and Mann-Whitney test) because the differences in average scores were small.
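Welch's correction replaces the pooled variance of Student's t-test with separate per-sample variances and an approximate (Welch-Satterthwaite) number of degrees of freedom. The standard-library Python sketch below computes only the statistic and the degrees of freedom; a p-value additionally requires the t distribution, which SciPy provides through `scipy.stats.ttest_ind(x, y, equal_var=False)` (and `scipy.stats.mannwhitneyu` for the Mann-Whitney test). The function name and the sample values used to check it are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_x, sample_y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(sample_x), len(sample_y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of each mean, from unbiased sample variances.
    se2_x = variance(sample_x) / nx
    se2_y = variance(sample_y) / ny
    t = (mean(sample_x) - mean(sample_y)) / math.sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df
```

In the comparison described above, the two samples would be the member-to-outgroup distances for two Diplodus species.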

Table 5 shows the relative times since sequence homogenization, which were calculated for Diplodus fishes from the relative evolutionary rates (Table 3) and intraspecific variations (Table 1).
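Since the intraspecific estimate is the product λt, dividing it by the relative rate λ of the same lineage leaves a quantity proportional to the time since homogenization; either λt or 2λt can be used, because only ratios matter. A short Python sketch of that arithmetic, with hypothetical names and one reference species scaled to 1:

```python
def relative_homogenization_times(lambda_t, relative_rate, reference):
    """Relative time since the latest homogenization event for each species.

    lambda_t:      species -> intraspecific lambda*t (or 2*lambda*t; only ratios matter)
    relative_rate: species -> relative evolutionary rate lambda of that lineage
    reference:     species whose time is scaled to 1
    """
    raw = {species: lambda_t[species] / relative_rate[species] for species in lambda_t}
    return {species: value / raw[reference] for species, value in raw.items()}
```

With toy inputs in which species y shows twice the λt of species x but evolves four times as fast, the latest homogenization in y is estimated to be half as old as in x.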

Table 5
Relative evolutionary time after sequence homogenization

The results suggest that sequence homogenization occurred more recently in D. puntazzo than in the other three Diplodus species. The frequency of sequence homogenization may vary among fish species, although its trigger is as yet unknown. Elder and Turner (20) showed that sequence homogenization events occur very frequently in pupfish and that the homogenized segments are rapidly fixed in the respective local populations. Charlesworth et al. (21) proposed that copy number affects the evolutionary rate of a given family of repetitive DNA. Thus, the different evolutionary rates in Diplodus fishes might reflect differences in satellite copy number.


In the present work, the intraspecific similarity of satellite DNA was evaluated effectively both by calculating nucleotide frequencies within each species and by calculating pairwise distances that estimate the number of substitutions per site between two sequences. Using fish satellite DNA as an example, we observed differences among species in both the evolutionary rate and the timing of sequence homogenization. These results suggest that the mode of evolution of satellite DNA can differ among closely related species.


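Appendix

Let n1, n2, n3, and n4 be the relative frequencies of the four nucleotides (n1 + n2 + n3 + n4 = 1) at position k at time t. If each nucleotide is replaced at rate λ, with each of the other three nucleotides equally likely as the replacement, the differential equations describing the nucleotide frequencies are: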

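$$\frac{dn_1}{dt} = n_1(1-\lambda) + (1-n_1)\frac{\lambda}{3} - n_1 = \frac{\lambda}{3} - \frac{4\lambda n_1}{3} \qquad [13]$$

$$\frac{dn_2}{dt} = n_2(1-\lambda) + (1-n_2)\frac{\lambda}{3} - n_2 = \frac{\lambda}{3} - \frac{4\lambda n_2}{3} \qquad [14]$$

$$\frac{dn_3}{dt} = n_3(1-\lambda) + (1-n_3)\frac{\lambda}{3} - n_3 = \frac{\lambda}{3} - \frac{4\lambda n_3}{3} \qquad [15]$$

$$\frac{dn_4}{dt} = n_4(1-\lambda) + (1-n_4)\frac{\lambda}{3} - n_4 = \frac{\lambda}{3} - \frac{4\lambda n_4}{3}. \qquad [16]$$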

At the time of sequence homogenization (t = 0), n1 = 1 and n2 = n3 = n4 = 0.

Thus, the solutions of the differential equations are:

$$n_1 = \frac{1 + 3\exp(-4\lambda t/3)}{4} \qquad [17]$$

$$n_2 = n_3 = n_4 = \frac{1 - \exp(-4\lambda t/3)}{4}. \qquad [18]$$

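SI(k) can thus be written as

$$\mathrm{SI}(k) = n_1^2 + n_2^2 + n_3^2 + n_4^2 = \frac{\left(1 + 3e^{-4\lambda t/3}\right)^2 + 3\left(1 - e^{-4\lambda t/3}\right)^2}{16} = \frac{1 + 3\exp(-8\lambda t/3)}{4}. \qquad [19]$$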
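The closed-form result can be checked numerically. The Python sketch below integrates equations [13] to [16] with a simple forward-Euler scheme from the initial condition n1 = 1, n2 = n3 = n4 = 0 and compares the resulting sum of squared frequencies with equation [19]; the function names, step count and parameter values are arbitrary choices for illustration.

```python
import math


def si_closed_form(lam, t):
    """Equation [19]: SI(k) = [1 + 3 exp(-8*lambda*t/3)] / 4."""
    return (1 + 3 * math.exp(-8 * lam * t / 3)) / 4


def si_numerical(lam, t_end, steps=100_000):
    """Forward-Euler integration of dn_i/dt = lambda/3 - 4*lambda*n_i/3 (equations [13]-[16])."""
    n = [1.0, 0.0, 0.0, 0.0]  # homogenized state at t = 0
    dt = t_end / steps
    for _ in range(steps):
        n = [ni + dt * (lam / 3 - 4 * lam * ni / 3) for ni in n]
    return sum(ni * ni for ni in n)
```

For λ = 0.1 and t = 5 the two values agree to well within 10^-5, and both approach 1/4, the value for a completely randomized position, as t grows.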


References

1. Ugarkovic D, Plohl M. Variation in satellite DNA profiles – cause and effects. EMBO J. 2002;21:5955–5959. doi: 10.1093/emboj/cdf612.
2. Willard HF. Chromosome-specific organization of human alpha satellite DNA. Amer J Human Genet. 1985;37:524–532.
3. Willard HF. Centromeres of mammalian chromosomes. Trends Genet. 1990;6:410–416. doi: 10.1016/0168-9525(90)90302-M.
4. Masumoto H, Masukata H, Muro Y, Nozaki N, Okazaki T. A human centromere antigen (CENP-B) interacts with a short specific sequence in alphoid DNA, a human centromeric alphoid. J Cell Biol. 1989;109:1963–1973.
5. Zinkowski RP, Meyne J, Brinkley BR. The centromere-kinetochore complex: a repeat subunit model. J Cell Biol. 1989;113:1091–1110.
6. Ikeno M, Masumoto H, Okazaki T. Distribution of CENP-B boxes reflected in CREST centromere antigenic sites on long-range α-satellite DNA arrays of human chromosome 21. Human Mol Genet. 1994;3:1245–1257.
7. Laursen HB, Jørgensen AL, Jones C, Bak AL. Higher rate of evolution of X chromosome α-repeat DNA in human than in the great apes. EMBO J. 1992;11:2367–2372.
8. Franck JPC, Kornfield I, Wright JM. The utility of SATA satellite DNA sequences for inferring phylogenetic relationships among the three major genera of Tilapiine Cichlid fishes. Mol Phylogenet Evol. 1994;3:10–16.
9. Garrido-Ramos MA, Jamilena M, Lozano R, Rejon CR, Rejon MR. The EcoRI centromeric satellite DNA of the Sparidae family (Pisces, Perciformes) contains a sequence motive common to other vertebrate centromeric satellite DNA. Cytogenet Cell Genet. 1995;71:345–351.
10. Garrido-Ramos MA, de la Herran R, Jamilena M, Lozano R, Rejon CR, Rejon MR. Evolution of centromeric satellite DNA and its use in phylogenetic studies of the Sparidae family (Pisces, Perciformes). Mol Phylogenet Evol. 1999;12:200–204.
11. Kato M, Kato A, Shimizu N. A method for evaluating phylogenetic relationship of α-satellite DNA suprachromosomal family by nucleotide frequency calculation. Mol Phylogenet Evol. 1999;13:329–335.
12. Dover GA. Molecular drive in multigene families: how biological novelties arise, spread and are assimilated. Trends Genet. 1986;2:159–165. doi: 10.1016/0168-9525(86)90211-8.
13. Kato M, Ozeki M, Kikuchi A, Kanbe T. Phylogenetic relationship and mode of evolution of yeast DNA topoisomerase II gene in the pathogenic Candida species. Gene. 2001;272:275–281. doi: 10.1016/S0378-1119(01)00526-1.
14. Kato M. Structural bistability of repetitive DNA elements featuring CA/TG dinucleotide steps and mode of evolution of satellite DNA. Eur J Biochem. 1999;265:204–209. doi: 10.1046/j.1432-1327.1999.00714.x.
15. Warburton PE, Waye JS, Willard HF. Nonrandom localization of recombination events in human alpha satellite repeat unit variants: implications for higher order structural characteristics within centromeric heterochromatin. Mol Cell Biol. 1993;13:6520–6529.
16. Jukes TH, Cantor CR. Evolution of protein molecules. In: Munro HN, editor. Mammalian Protein Metabolism. New York: Academic Press; 1969. pp. 21–132.
17. Nei M. Molecular Evolutionary Genetics. New York: Columbia University Press; 1987.
18. Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle; 1993.
19. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 1980;16:111–120.
20. Elder Jr JF, Turner BJ. Concerted evolution at the population level: pupfish HindIII satellite DNA sequences. Proc Natl Acad Sci USA. 1994;91:994–998.
21. Charlesworth B, Sniegowski P, Stephan W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature. 1994;371:215–220.
22. Fitch WM, Margoliash E. Construction of phylogenetic trees. Science. 1967;155:279–284.
23. Page RDM. TreeView for Macintosh (PPC) version 1.5. Distributed by the author, Division of Environmental and Evolutionary Biology, IBLS, University of Glasgow, Glasgow; 1998.
