Genome Res. Nov 2001; 11(11): 1958–1967.
PMCID: PMC311132

The Complete Sequence of the Zebrafish (Danio rerio) Mitochondrial Genome and Evolutionary Patterns in Vertebrate Mitochondrial DNA


We describe the complete sequence of the 16,596-nucleotide mitochondrial genome of the zebrafish (Danio rerio); contained are 13 protein genes, 22 tRNAs, 2 rRNAs, and a noncoding control region. Codon usage in protein genes is generally biased toward the available tRNA species but also reflects strand-specific nucleotide frequencies. For 19 of the 20 amino acids, the most frequently used codon ends in either A or C, with A preferred over C for fourfold degenerate codons (the lone exception was AUG: methionine). We show that rates of sequence evolution vary nearly as much within vertebrate classes as between them, yet nucleotide and amino acid composition show directional evolutionary trends, including marked differences between mammals and all other taxa. Birds showed similar compositional characteristics to the other nonmammalian taxa, indicating that the evolutionary trend in mammals is not solely due to metabolic rate and thermoregulatory factors. Complete mitochondrial genomes provide a large character base for phylogenetic analysis and may provide for robust estimates of phylogeny. Phylogenetic analysis of zebrafish and 35 other taxa based on all protein-coding genes produced trees largely, but not completely, consistent with conventional views of vertebrate evolution. It appears that even with such a large number of nucleotide characters (11,592), limited taxon sampling can lead to problems associated with extensive evolution on long phyletic branches.

Mitochondria provide the primary source of cellular ATP in eukaryotes via the process of oxidative phosphorylation (Saraste 1999). In animals, extranuclear mitochondrial genomes are typically circular, and with few exceptions, code for 13 subunits of the oxidative phosphorylation machinery as well as genes for two rRNA subunits and 22 tRNAs (Boore 1999). Mutations in mitochondrial DNA (mtDNA) have a number of known deleterious effects. At least 50 base substitutions and hundreds of insertion/deletion mutations have been identified in human mtDNA (MITOMAP 2000), with effects ranging from degenerative diseases (Wallace 1999) to aging (Michikawa et al. 1999) to cancer (Polyak et al. 1998; Fliss et al. 2000). In addition to their role as the powerhouse of the cell, mitochondria are also involved in regulating programed cell death (apoptosis) (Green and Reed 1998; Susin et al. 1999), and mutagenic reactive oxygen species are generated in the process of energy production (Croteau and Bohr 1997). Pathologies can result directly from the loss of ATP production in affected tissues, the build-up of oxygen radicals due to downstream blockage of the oxidative phosphorylation pathway, or unregulated apoptosis (for review, see Wallace 1999).

Hundreds of mitochondria and thousands of mtDNAs are inherited maternally through the cytoplasm of the oocyte (Lightowlers et al. 1997). If a zygote receives more than one form of mtDNA (heteroplasmy), different forms can be randomly distributed to daughter cells during cell division and, over many cell generations, can drift to high or low frequencies in various cell lineages (Hauswirth and Laipis 1982). Thus, if one of the mutant forms is deleterious, disease may affect lineages where it reaches sufficiently high frequency (Wallace 1999). Somatic mutations in mtDNA appear to behave similarly and may be a significant source of mitochondrial disease. Given the central role of mitochondria in cell physiology, mutations (either inherited or somatic) are probably responsible for many developmental abnormalities. Although a high frequency of mutant mtDNA molecules is likely to be lethal during embryogenesis, oocytes with moderate to low levels of heteroplasmy occur at detectable levels (Lightowlers et al. 1997). Mitochondrial mutations probably affect a number of both general and tissue-specific developmental processes; however, the role of mitochondria in early development has not been well characterized.

Structurally, most animal mitochondrial genomes contain the same 37 genes, and among vertebrates the gene order is highly conserved (Brown 1985; Boore 1999). Vertebrate mitochondrial genomes are typically ~16 kb and are extremely compact with no introns and few, if any, intergenic spacers. The only significant noncoding sequence is the control region, which is involved in regulating transcription and replication (Clayton 1982; Shadel and Clayton 1997) and is usually <5% of the total genome size. Since its endosymbiotic origin around 1.5 billion yr ago, a substantial fraction of original mitochondrial genes have moved to the nucleus (Gray et al. 1999). The products of many of these genes remain essential for oxidative phosphorylation or housekeeping functions and are selectively transported into the mitochondria after translation in the cytoplasm (Schatz 1996). A result of this evolutionary trend is that some mitochondrial abnormalities are due to mutations in genes that now reside in the nucleus and are inherited in a Mendelian fashion, rather than through the maternal, haploid inheritance of mtDNA (Wallace 1999). Knowledge of gene location (and thus mode of inheritance) is therefore essential for accurate characterization of the developmental–genetic basis of mitochondrial abnormalities.

Mitochondrial genomes from ~100 species have now been sequenced (Boore 1999; Curole and Kocher 1999). Their small size and relative autonomy from the nucleus make mitochondrial genomes valuable windows on the process of genome evolution and on cytonuclear interactions (Babcock and Asmussen 1996; Gray et al. 1999). Mitochondrial sequences have also proven to be of great utility in molecular phylogenetic studies, and complete genome sequences have provided valuable insights into deeper-level phylogenetic problems (e.g., Zardoya and Meyer 1997; Boore and Brown 1998; Mindell et al. 1998; Naylor and Brown 1998; Rasmussen and Arnason 1999).

The zebrafish, Danio rerio, (Actinopterygii, Cyprinidae) has become a prominent vertebrate genetic and developmental model system (Detrich et al. 1998). Here we describe the zebrafish mitochondrial genome with respect to gene content and organization, codon usage, nucleotide composition, and putative functional motifs. We also use a phylogeny based on the mitochondrial genomes of 36 vertebrate species to characterize the zebrafish relative to other vertebrates. We test evolutionary hypotheses pertaining to nucleotide and amino acid composition, genome-wide patterns of variability, and evolutionary rates among major vertebrate lineages. The wild-type zebrafish mitochondrial genome and its relation to other vertebrates may provide an important baseline for studies of development, disease, and mitochondrial function. It will also complement the sequence of the zebrafish nuclear genome that may be completed in the near future.


The total length of the zebrafish mitochondrial genome was 16,596 bp, within 100 bp of the length of other teleost fishes. It is particularly close to two other cyprinid fishes, only 18 bp longer than the goldfish (Carassius auratus) and 21 bp longer than the carp (Cyprinus carpio). The zebrafish gene order and content (Table 1) are identical to the common vertebrate form. The genome contains 13 protein-coding genes: seven subunits of the NADH ubiquinone oxidoreductase complex (ND), one subunit of the ubiquinol cytochrome c oxidoreductase complex (Cytb), three subunits of the cytochrome c oxidase complex (CO), and two subunits of ATP synthase (ATP). Also contained are small (12S) subunit and large (16S) subunit ribosomal RNA genes and 22 tRNA genes. A noncoding control region contains the origin of heavy strand replication, and the origin of light strand replication is found within a complex of five tRNA genes. Sequence data provided no evidence of length or nucleotide site heteroplasmy in the individual zebrafish studied.

Table 1
Organization of the Zebrafish Mitochondrial Genome

Protein-Coding Genes

Salient features of zebrafish mitochondrial genes are listed in Table 1. All but one of the protein-coding genes begin with the orthodox ATG start codon; the lone exception is COI, which begins with GTG. Stop codons include seven TAA and three TAG. The COII, ND4, and Cytb genes do not possess proper stop codons but do show a terminal T or TA. This condition is not uncommon among vertebrate mitochondrial genes, and it appears that TAA stop codons are created via posttranscriptional polyadenylation (Ojala et al. 1981). Reading frames of two pairs of genes, ATP8–ATP6 and ND4L–ND4, each overlap by seven nucleotides, and ND5–ND6 overlap by four nucleotides. This is also a common vertebrate feature, although in mammals ATP8 and ATP6 overlap by 40–46 bp. A number of other genes share one or two nucleotides in common with adjacent tRNA genes (Table 1).
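The completion of a terminal T or TA into a TAA stop codon can be illustrated with a short sketch. The function name, the toy sequences, and the tail length are hypothetical and chosen only for illustration; the sketch recognizes only the TAA and TAG stops mentioned in the text.

```python
from typing import Optional

# Stop codons reported for zebrafish protein genes in the text.
STOPS = {"TAA", "TAG"}

def stop_after_polyadenylation(cds: str, tail_length: int = 10) -> Optional[str]:
    """Return the in-frame stop codon present once a poly(A) tail is added.

    `cds` is read from the first base of the start codon. A length that is
    not a multiple of three means the final codon is incomplete (a terminal
    T or TA) and can only be completed by the appended A residues.
    """
    cds = cds.upper()
    remainder = len(cds) % 3  # 1 for a terminal T, 2 for a terminal TA
    if remainder == 0:
        last = cds[-3:]
        return last if last in STOPS else None
    with_tail = cds + "A" * tail_length
    start = len(cds) - remainder
    codon = with_tail[start:start + 3]
    return codon if codon in STOPS else None

# Hypothetical toy reading frames, not zebrafish sequence.
terminal_t = stop_after_polyadenylation("ATGGCAT")
terminal_ta = stop_after_polyadenylation("ATGGCATA")
complete_tag = stop_after_polyadenylation("ATGGCATAG")
no_stop = stop_after_polyadenylation("ATGGCAC")
```

In this toy case both the terminal T and the terminal TA are completed to TAA by the tail, whereas a frame ending in C gains no stop codon.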

Nucleotide composition (Table 2) reflects a vertebrate bias against G on the light strand. (For all protein genes except ND6, the heavy strand serves as the template for transcription; however, we discuss gene sequences in terms of the light strand, which is the sense strand with respect to mRNA. Only for ND6 is the heavy strand the sense strand.) The bias against G is particularly marked at third codon positions, where only 7% of sites are G. A complementary bias is maintained in ND6, in which C is found at only 12% of all sites and only 3% of third positions.
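Frequencies by codon position, such as the proportion of G at third positions, can be tallied as in the following minimal sketch. It assumes an in-frame coding sequence given as the sense strand; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

def position_frequencies(cds: str) -> dict:
    """Return base frequencies at codon positions 1, 2, and 3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    result = {}
    for offset in range(3):
        counts = Counter(cds[offset:usable:3])
        total = sum(counts[b] for b in "ACGT")  # ambiguous bases are skipped
        result[offset + 1] = {
            b: (counts[b] / total if total else 0.0) for b in "ACGT"
        }
    return result

# Hypothetical toy sequence: ATG GCA ACA TTA
freqs = position_frequencies("ATGGCAACATTA")
third_position_a = freqs[3]["A"]
third_position_g = freqs[3]["G"]
```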

Table 2
Nucleotide Frequencies and Proportions in Zebrafish Protein-coding Genes

Codon usage patterns in zebrafish are shown in Table 3. Three other vertebrates (another fish, a frog, and human) are shown for comparison. For amino acids with fourfold degenerate third positions, codons ending in A are always the most frequent in zebrafish, followed in frequency by codons ending in T or C. Among twofold degenerate codons, C appears to be used somewhat more than T. Consistent with the overall bias against G, G is the least common third position nucleotide in all categories except for arginine and glycine codons (where G is similar in frequency to C and T but still much less than A). These patterns are generally similar across vertebrate groups, although the frog, Xenopus, shows a tendency to use T more frequently than C (Roe et al. 1985).

Table 3
Comparison of Codon Usage (Number of Codons) among Zebrafish and Selected Vertebrates

Because there are only 22 tRNAs in the mitochondria, there is only one specific tRNA species for most amino acids. The exceptions are leucine and serine, which have two tRNAs serving six possible codons each. Thus, of the possible codons for any amino acid, only one (occasionally two) will be perfectly complementary to the available tRNA anticodon, and translation of the others must involve nonspecific (wobble) base pairing. For each amino acid, only codons ending in A or C will be perfectly matched by a complementary tRNA anticodon (Table 3). The matched nucleotide is A for fourfold codons and twofold purine codons, and C for twofold pyrimidine codons (except methionine, AUG, which also serves as a start codon). Thus in zebrafish, the most frequently used codons are those with matching tRNAs except for isoleucine, methionine, and phenylalanine. This indicates that there may be some advantage to matched codons and anticodons in protein translation, although, as seen in Xenopus, the phenomenon is not universal.
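The matching rule described above can be written out as a small sketch. It assumes the standard vertebrate mitochondrial genetic code, in which eight codon boxes are fourfold degenerate and the remaining boxes split into a pyrimidine pair and a purine pair. The names and the toy codon counts are hypothetical, and codons are written in DNA form (T for U).

```python
# First two bases of the fourfold degenerate codon boxes in the vertebrate
# mitochondrial code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
# Stop codons of the vertebrate mitochondrial code; they have no tRNA.
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}

def is_matched(codon: str) -> bool:
    """True if the codon is the perfectly matched one for its family."""
    codon = codon.upper()
    if codon in STOP_CODONS:
        return False
    if codon in ("ATA", "ATG"):  # methionine: the matched codon is AUG
        return codon == "ATG"
    if codon[:2] in FOURFOLD_PREFIXES:
        return codon[2] == "A"
    # Twofold families: A for the purine pair, C for the pyrimidine pair.
    return codon[2] in "AC"

def matched_fraction(codon_counts: dict) -> float:
    """Fraction of sense codons that are perfectly matched to a tRNA."""
    sense = {c.upper(): n for c, n in codon_counts.items()
             if c.upper() not in STOP_CODONS}
    total = sum(sense.values())
    matched = sum(n for c, n in sense.items() if is_matched(c))
    return matched / total if total else 0.0

# Hypothetical toy counts, not values from Table 3.
toy_counts = {"CTA": 6, "CTT": 2, "TTC": 3, "TTT": 1}
toy_fraction = matched_fraction(toy_counts)
```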

Codon bias may be associated with strand-specific nucleotide bias in mtDNA. It is noteworthy that, for ND6, 16 of the 20 amino acids are preferentially specified by codons that do not match the available tRNA. This is almost exclusively due to the high number of codons ending in T. Strand-specific substitution bias favoring A over T on the light strand and thus T over A on the complementary heavy strand appears to be the explanation. This bias is particularly evident at third codon positions, where substitutions are less likely to alter encoded amino acids and nucleotides are freer to vary. For all protein genes on the light strand (= sense strand), the frequency of A at third positions is 44% and the frequency of T is 27%. Conversely, for ND6 (sense strand = heavy strand) the frequency of A at third positions is 24% and the frequency of T is 52%. It therefore appears that nucleotide bias is a strand-specific phenomenon rather than being strictly a result of codon preference for translation efficiency. The preference for A may also be related to transcription efficiency, as ATP is generally the most common ribonucleotide available in mitochondria (Xia 1996). The substantial bias against G may also be due in part to selection against less stable G nucleotides on the light strand, which is exposed as a single strand for a considerable length of time during the asymmetrical replication of mtDNA (Clayton 1982).
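The complementary relationship between the two strands can be made concrete with a brief sketch: any excess of A over T on one strand necessarily appears as an excess of T over A on its reverse complement. The helper names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in `seq` occupied by `base`."""
    seq = seq.upper()
    return seq.count(base) / len(seq) if seq else 0.0

# Hypothetical A-rich, G-poor toy sequence standing in for the light strand.
light = "ATAACATTAACA"
heavy = reverse_complement(light)

light_a = base_fraction(light, "A")
heavy_t = base_fraction(heavy, "T")
light_g = base_fraction(light, "G")
heavy_c = base_fraction(heavy, "C")
```

Here `light_a` equals `heavy_t` and `light_g` equals `heavy_c`, which is why a gene whose sense strand is the heavy strand (ND6 in the text) shows the mirror-image composition.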

Transfer and Ribosomal RNA Genes

All zebrafish mitochondrial tRNA genes possess anticodons that match the vertebrate mitochondrial genetic code. Each tRNA sequence may be folded into a cloverleaf structure with 7 bp in the aminoacyl stem, 5 bp in the TψC and anticodon stems, and 4 bp in the DHU stem. tRNA stem regions include numerous noncomplementary and T–G base pairings, several of which are shared with carp. Such mutations appear to accumulate in mitochondrial genes, in part because mtDNA is not subject to the process of recombination, which may facilitate elimination of deleterious mutations (Lynch 1997). As in other vertebrates, it appears that CCA nucleotides are added posttranscriptionally to the 3′ ends to form mature, functional species (Roe et al. 1985).

Both ribosomal gene sequences may be folded into secondary structures. Potential secondary structures have free energies of −220.3 kcal/mole for the 12S subunit and −335.6 kcal/mole for the 16S subunit. Stem regions appear to be conserved, whereas loop regions are somewhat more variable relative to other vertebrate sequences. The functional requirement for specific base pairing appears to constrain the evolution of stems relative to some portions of loops. This pattern is consistent with phylogenetic studies of a number of vertebrate groups (e.g., Sullivan et al. 1995).

Noncoding Sequences

The major noncoding region (control region) in mtDNA regulates replication and transcription (Clayton 1982,1991; Shadel and Clayton 1997). The primary sequence of much of the control region does not appear to be particularly important for regulatory function, as this region shows extensive variability across taxonomic groups and even among closely related species. The 950-bp zebrafish control region was much less similar to other fishes than were the coding sequences, with numerous nucleotide substitutions and insertions and deletions. However, several important regulatory elements are present. Conserved sequence blocks (CSBs) 1–3, found in the 3′ end of the control region, appear to be involved in positioning RNA polymerase both for transcription and for priming replication (Clayton 1991; Shadel and Clayton 1997). All three of these elements are identifiable in zebrafish and they show strong similarity to CSBs identified by authors of other vertebrate sequences (Roe et al. 1985; Foran et al. 1988) (Table 4). Another relatively conserved element is the termination associated sequence (TAS) located at the 5′ end of the control region. This sequence appears to act as a signal for termination of D-loop strand (7S DNA) synthesis. A sequence present in zebrafish (tacataaaatgcat) shows limited similarity with TAS sequences identified in other vertebrates. Regulatory elements in the zebrafish also show strong similarity to a North American cyprinid fish, the spotfin shiner (Cyprinella spiloptera; Broughton and Dowling 1994). Between tRNAasn and tRNAcys of zebrafish is a 32-bp noncoding sequence similar to the origin of light strand replication (OL) in other vertebrates. Secondary structures at OL appear to act as initiation signals for light strand replication (Wong and Clayton 1985). The zebrafish sequence may potentially form such a secondary structure consisting of a perfect 11-bp stem and a 14-bp loop. All of these features reflect strong conservation of vertebrate mitochondrial regulatory elements.

Table 4
Sequences of Conserved mtDNA Regulatory Elements

Mitochondrial Genome Evolution


Phylogenetic analysis was used to estimate relationships among the zebrafish and 35 other vertebrates and to assess historical information content of mitochondrial genomes. The majority of complete mitochondrial sequences in GenBank are from mammals; some of these were not included to provide more balanced sampling of vertebrate groups. Maximum parsimony, maximum likelihood, and distance-based phylogenetic methods each have strengths and weaknesses under different evolutionary conditions (Swofford et al. 1996); thus, substantial differences in trees estimated by different methods may indicate violation of assumptions specific to each. Maximum likelihood using the general time reversible model (Lanave et al. 1984) yielded the tree that best matched the generally accepted phylogeny of vertebrates (Fig. 1). The position of zebrafish in all trees was as expected: sister to the other members of the family Cyprinidae (carp and goldfish) within actinopterygian fishes.

Figure 1
Maximum likelihood estimate of vertebrate phylogeny (−ln likelihood = 239855.997). Likelihood analysis used the GTR + I + Γ model with parameter values estimated from the maximum parsimony ...

Relationships of the coelacanth and reptiles did not match the expected phylogeny in trees produced by any method in which sea lamprey was used as the outgroup, and their positions varied considerably with different methods. The coelacanth was sister to bony fishes rather than the expected sister-group to tetrapods for all methods. The amphibians were basal to all other ingroup taxa (including fishes) in the parsimony analysis and they were the sister-group to fishes in the distance tree using LogDet distances (Steel 1994). However, when lamprey was excluded and no outgroup was specified, the coelacanth and amphibians assumed their expected positions for most methods (data not shown).

Samples for each of the problematic groups were small and the included taxa are at the ends of long phyletic branches. Extreme branch-length variation is known to cause problems for all phylogenetic methods because multiple changes per nucleotide site resulting from extensive evolution on long branches can obscure historical signal and lead to spurious similarity between long-branch taxa. Maximum likelihood appears to be least susceptible to this condition (Hillis et al. 1994; Huelsenbeck 1995). Alternatively, LogDet distances are relatively insensitive to nucleotide frequency heterogeneity and may yield trees different from methods that do not account for nucleotide composition when this attribute varies substantially among taxa. We note that in tests of nucleotide composition homogeneity (not shown), nearly all comparisons of taxa showed significant differences, probably due (in part) to the greater statistical power afforded by large sample sizes (i.e., many nucleotides per sequence). However, because taxa with similar nucleotide frequencies were not always grouped together, the present results appear to be primarily due to problems associated with long branches. We suggest that relationships among basal tetrapods may be better resolved when long branches are broken up as additional mitochondrial genome sequences become available. This issue is particularly relevant to the outgroup. The lamprey was on the longest branch on the tree and the extensive amount of evolution between it and the ingroup taxa has likely obscured much historical signal, resulting in misleading character-state polarities. The more accurate result obtained when lamprey was excluded supports this hypothesis. The position of the turtle (rather than the lizard) as sister to birds in all analyses conflicts with most hypotheses of reptilian evolution, although a similar result has been observed previously (Zardoya and Meyer 1998) and may also be due to sparse taxon sampling and long branches. 
However, the turtle sequence appears to be unusual with respect to nucleotide and amino acid composition (see following). Within mammals, where taxon sampling is denser, relationships are largely as expected (for all methods). The nontraditional arrangement of the nonplacental mammals as a monophyletic group and the sister-group relationship of perissodactyls with the carnivore plus pinnipeds are consistent with recent evidence (Penny et al. 1999; Waddell et al. 1999). Ultimately, the fact that complete protein-coding sequences were not sufficient to correctly resolve all relationships emphasizes the importance of extensive taxon sampling even when the amount of data per taxon is high.

Evolutionary Genomic Patterns

Among vertebrate lineages there is substantial variation in thermal regulation, metabolic rate, and generation time. Despite extreme conservation of mitochondrial gene content and order, variation in these factors could influence genome-wide evolutionary rates, nucleotide composition, and/or functional constraints on proteins. Therefore, we might predict differences in genomic features that are independent of selection acting on individual genes, such as a correlation between rates of sequence divergence and body size or thermal habit (endotherms vs. ectotherms), two factors correlated with metabolic rate (Martin and Palumbi 1993).

Rates of Evolution across the Mitochondrial Genome

Nucleotide and amino acid substitution rates vary across mitochondrial genomes and within individual genes. Figure 2 shows a strong correlation between regional nucleotide variation and amino acid variation for all protein-coding genes. The magnitude of variability at third codon positions is consistently higher than at first and second positions, but the spatial pattern of variability is consistent over all positions. Rates of amino acid change appear to be lowest for the COI and Cytb genes and somewhat higher for others. All of the ND genes as well as ATPase6 show a clear pattern of alternating regions of high and low variability. For Cytb it has been shown that the most variable regions frequently correspond to hydrophobic membrane-spanning domains of the protein (Irwin et al. 1991; Naylor et al. 1995). Across the genome there is not perfect correspondence between variability and hydropathy (Engelman et al. 1986); however, peaks of variability appear to coincide with extremes of hydropathy, either positive or negative. This indicates that in many polar or nonpolar regions the specific identity of amino acid residues may be less important, relative to function, than simply maintaining the relative polarity of the domain.
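Regional variability of this kind is often summarized with a sliding window over an alignment. The sketch below is one simple way to do so and is an assumption, not the measure used for Figure 2: it scores each window by the proportion of alignment columns that are not identical across all sequences. The function name, window size, and toy alignment are hypothetical.

```python
def windowed_variability(alignment, window, step):
    """Proportion of variable columns in successive windows of an alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # A column is variable if it contains more than one character state.
    variable = [len(set(column)) > 1 for column in zip(*alignment)]
    return [
        sum(variable[start:start + window]) / window
        for start in range(0, length - window + 1, step)
    ]

# Hypothetical toy alignment: a conserved half followed by a variable half.
toy_alignment = ["AAAAGGTT", "AAAAGCTA", "AAAACGAT"]
profile = windowed_variability(toy_alignment, window=4, step=4)
```

The same function can be applied to an amino acid alignment, so nucleotide and amino acid profiles can be compared window by window.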

Figure 2
Variability of nucleotides, amino acids, and hydropathy across the mitochondrial genome. The sequence is of the concatenated protein-coding genes and the bar at top indicates the limits of each gene. The sequence for ND6 is the reverse complement to maintain ...

Rates of Evolution among Lineages

There are two primary methods for evaluating molecular evolutionary rates. Absolute rates may be obtained by relating the amount of time that two taxa have been evolving independently (based on fossil or stratigraphic divergence dates) to the amount of change that has accumulated on particular lineages. Alternatively, relative rates may be evaluated by comparing the amount of change in two lineages since they diverged from a common ancestor, and do not require dates to be known. The latter method involves counting the number of unique nucleotide differences on each lineage relative to a third (the outgroup). The test statistic is asymptotically chi-square distributed and can be used to determine whether the number of changes on two lineages is significantly different (Tajima 1993). In the present case with 36 species, the number of possible combinations of two species plus an outgroup is prohibitively large; we therefore report selected comparisons with the phylogenetically closest outgroups (Table 5). Absolute rates are more difficult to quantify because few divergence dates are known with confidence, and branch lengths (the amount of divergence) on the phylogenetic tree seem disproportionately short for older taxa. Hence, we report only relative rate comparisons.
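The counting step of the relative rate test can be sketched as follows, using the simplest form of Tajima's test, in which the statistic (m1 − m2)² / (m1 + m2) is compared with a chi-square distribution with one degree of freedom. This is a minimal illustration under that assumption, not the MEGA implementation; the function name and toy sequences are hypothetical.

```python
import math

def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Return (m1, m2, chi_square, p_value) for three aligned sequences.

    m1 counts sites where only seq1 differs from the other two sequences,
    m2 counts sites where only seq2 differs. Sites with gaps or ambiguous
    bases are skipped.
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not {a, b, o} <= set("ACGT"):
            continue
        if a != o and b == o:
            m1 += 1
        elif b != o and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return m1, m2, chi_square, p_value

# Hypothetical toy alignment: three unique changes in seq1, one in seq2.
out = "ACGTACGTAC"
s1 = "TCGTTCGTTC"
s2 = "ACGAACGTAC"
m1, m2, chi_square, p_value = tajima_relative_rate(s1, s2, out)
```

With only four informative sites the toy comparison is far from significant; real comparisons draw on thousands of aligned nucleotides.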

Table 5
Relative Rate Tests among Pairs of Vertebrate Lineages

Relative rate tests indicate that evolutionary rates vary considerably within all vertebrate groups. Nearly all comparisons among taxonomic classes showed significant differences; however, many intraclass comparisons were significantly different as well (e.g., zebrafish–cod, chicken–crow, platypus–mouse, cat–rhino; Table 5). Rate differences within major groups are frequently as great as differences between groups. For example, the rate for the chicken is not significantly different from the rate for the lizard, yet rates for chicken and crow are significantly different, apparently due to a rate increase in the crow lineage. Rates in birds differ from mammals in some cases (chicken–human, crow–mouse) but not others (chicken–mouse, crow–rat). The fact that rates differ among some endothermic vertebrates but may be similar between endotherms and ectotherms indicates that there are factors other than metabolic rate (and its associated oxygen radical production) that have important influence on rates of sequence evolution.

Nucleotide and Amino Acid Frequencies

Nucleotide frequencies varied by codon position and by taxonomic group (Table 6). At first and second positions, the relative frequencies were T > C ≥ A > G and at third positions, relative frequencies were A > C > T > G. Although nucleotide frequencies and GC percent at third positions varied greatly within groups and there were no clear trends among groups, first and second position GC percent varied less and there was a clear difference between mammals (always <44%) and nonmammals (>45%, except for the turtle). This difference corresponds to a marked difference in amino acid composition of translated proteins. Amino acids with G or C in the first or second position of their codons were much more frequent in nonmammals than in mammals (Table 7). In particular, the frequency distribution of arginine (CGN) is completely nonoverlapping between the two groups, alanine (GCN) is also nonoverlapping except for the turtle, and glycine (GGN) is nearly so. For proline (CCN), the nonprimate mammals show frequencies lower than nonmammals; however, the primates appear to reverse this trend, having frequencies comparable to nonmammals. In a complementary way, A- and U-containing codons are more frequent in mammals. For example, isoleucine (AUY) is completely nonoverlapping and asparagine (AAY) is nearly so. Tyrosine (UAY) would be completely nonoverlapping except (again) for the turtle. Thus, whereas third position nucleotide frequencies vary widely from lineage to lineage, first and second position frequencies vary less and they reflect a strong difference in amino acid frequencies among mammalian and nonmammalian taxa. It is curious that the greater frequency of G- and C-containing codons in nonmammals is opposite of the expectation that organisms operating under cooler temperatures should contain more A and T and that G and C should be more frequent under warmer metabolic conditions.
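GC percent at first and second positions, the quantity that separates mammals from nonmammals above, can be computed separately from third-position GC percent as in this minimal sketch. It assumes an in-frame coding sequence; the function name and toy sequence are hypothetical.

```python
def gc_percent_by_position(cds: str):
    """Return (GC% at first plus second positions, GC% at third positions)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    first_second = "".join(cds[i:i + 2] for i in range(0, usable, 3))
    third = cds[2:usable:3]

    def gc_percent(bases: str) -> float:
        counted = [b for b in bases if b in "ACGT"]  # skip ambiguous bases
        if not counted:
            return 0.0
        return 100.0 * sum(b in "GC" for b in counted) / len(counted)

    return gc_percent(first_second), gc_percent(third)

# Hypothetical toy sequence: GCA (Ala) ATT (Ile) CGA (Arg)
gc12, gc3 = gc_percent_by_position("GCAATTCGA")
```

Because first and second positions largely determine the encoded amino acid, a shift in `gc12` implies a shift toward residues such as alanine, arginine, and glycine, whereas `gc3` can change with little effect on the protein.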

Table 6
Range of Nucleotide Frequencies (Percent) by Taxonomic Group and Codon Position
Table 7
Frequency Range of Selected Amino Acids among All Mitochondrial Proteins for Taxonomic Groups

Tests of amino acid frequency homogeneity within vertebrate groups (Table 8) indicate that significant departure from homogeneity occurs only when the sample contains both amniotes and nonamniotes or mammals and nonmammals. It appears that there is substantially more variation among the nonmammals or nonamniotes than within the mammals or amniotes. It is also noteworthy that endothermic birds are more similar to ectotherms than to the endothermic mammals in amino acid frequency. These results indicate that common ancestry and long-term historical trends may have more influence on nucleotide and amino acid composition than does thermal habit or metabolic rate.
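A homogeneity test of this kind can be sketched with a Pearson chi-square statistic on a taxon-by-amino-acid count table. The choice of the Pearson statistic is an assumption made for illustration, since the text does not specify the test used for Table 8; the function name and toy counts are hypothetical.

```python
def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    Each row holds the amino acid counts for one taxon; each column is one
    amino acid. Columns with no observations are ignored.
    """
    row_totals = [sum(row) for row in table]
    if any(total == 0 for total in row_totals):
        raise ValueError("every taxon needs at least one observation")
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    kept = [j for j, total in enumerate(col_totals) if total > 0]
    statistic = 0.0
    for i, row in enumerate(table):
        for j in kept:
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (row[j] - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(kept) - 1)
    return statistic, degrees_of_freedom

# Hypothetical toy counts for two taxa and two amino acids.
same_stat, same_df = chi_square_homogeneity([[10, 20], [20, 40]])
diff_stat, diff_df = chi_square_homogeneity([[10, 20], [20, 10]])
```

The first toy table has identical proportions in both taxa and gives a statistic of zero; the second has reversed proportions and gives a clearly larger value.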

Table 8
Tests of Amino Acid Frequency Homogeneity for All 13 Genes within Selected Groups of Vertebrate Taxa


Despite extensive variation in mitochondrial genome structure among animal phyla, gene order and content are identical among zebrafish and the majority of vertebrates. Of 56 known vertebrate mitochondrial genomes, 44 have this same gene order, including all 30 placental mammals and 12 fishes (all but the sea lamprey) (Boore 1999). Gene order variants exist in some birds (Desjardins and Morais 1990), crocodilians (Janke and Arnason 1997), and snakes (Kumazawa et al. 1998); however, their taxonomic scope tends to be limited. The patterns of strand-specific nucleotide bias and unequal codon usage seen in the zebrafish are also conserved among vertebrates. In contrast, there are substantial differences in evolutionary rates and nucleotide and amino acid composition among vertebrate groups. The most striking finding is that rates of sequence evolution vary widely both within and among major groups, whereas nucleotide composition tends to be conserved within groups but varies substantially between mammals and all other taxa. This trend is also shown at the amino acid level where lower GC percent in mammals (particularly at first and second positions) results in amino acids with A + T-containing codons being more frequent than those with G + C-containing codons. All other taxa show higher GC percent and corresponding differences in amino acid composition. Whether this is based on differences in nucleotide bias at the genome level or on adaptive significance of proteins with different amino acid composition is not clear. However, these differences appear to be independent of selection on specific genes and metabolic rate or thermal habit.


Mature specimens of Danio rerio, strain ABC, were obtained from the University of Oregon Zebrafish Facility. Total DNA was isolated from ~25 mg of muscle tissue from a single individual. The preparation used the QIAamp tissue kit (QIAGEN) with DNA resuspended in 10 mM Tris (pH 8.0). Generation of sequencing templates used shotgun cloning of mtDNA. The mitochondrial genome was amplified by long PCR in two overlapping fragments of ~9 kb each. Reactions used Herculase DNA polymerase blend (Stratagene) according to supplier recommendations. One primer set consisted of 16S-AH (5′ atgtttttggtaaacaggcg 3′) located in the 16S rRNA gene and Arg-BL (5′ caagacccttgatttcggctca 3′) in the arginine tRNA gene. The other primer set consisted of 16S-BL2 (5′ tggtgcagccgctattaagg 3′) in the 16S rRNA gene and NAP-2H (5′ tggagcttctacgtgrgcttt 3′) in the ND4 gene. The two fragments overlapped over ND4L and part of ND4. The small gap between the divergent 16S primers was amplified (and sequenced separately) with primers 16S-ArL (5′ ctcggcaacacaagcctcgc 3′) and 16S-BrH (5′ tyayagatagaaactgacctgg 3′). Products of long PCR were purified with QIAquick PCR purification columns (QIAGEN) and randomly fragmented using a nebulizer. Broken ends of DNA strands were repaired with Klenow polymerase and phosphorylated with T4 polynucleotide kinase. Fragments in the range of 600–3000 bp were selectively recovered from a 1.0% agarose gel. These fragments were blunt-end ligated into (SmaI cut, dephosphorylated) pUC19 at 4°C overnight and transformed into Escherichia coli JM109 cells. White colonies were picked and grown in 96-deep well blocks (1.75 mL Luria Bertani medium plus ampicillin per well).

Plasmids from four blocks (= 384 clones) were purified via alkaline lysis methods using a Hydra microdispenser (Robbins Scientific). Sequencing used universal primers, BigDye terminator chemistry, and ABI 377 instruments. Individual sequences were assembled and analyzed with phred/phrap/CONSED (Ewing et al. 1998; Gordon et al. 1998). A few small gaps were closed via sequencing with specific primers on whole-genome templates generated via long-PCR using primers 16S-AH and 16S-BL2. Sequence coverage was sufficient to achieve a CONSED error rate of 0.01%. The zebrafish sequence was manipulated and aligned with Sequencher (GeneCodes). Genes were identified by alignment and comparison with sequences from other vertebrate taxa, particularly the common carp (Cyprinus carpio; Chang et al. 1994) and goldfish (Carassius auratus; Murakami et al. 1998). The complete sequence is available under GenBank accession no. AC024175.
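For orientation, the reported error rate can be related to the phred quality scale, in which a quality value Q corresponds to an error probability P = 10^(-Q/10). The sketch below assumes that the 0.01% figure is a per-base error probability on this scale and that errors are uniform across the genome; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a phred quality value to a per-base error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Convert a per-base error probability to a phred quality value."""
    return -10 * math.log10(p)


error_rate = 0.01 / 100   # 0.01% expressed as a probability
genome_length = 16_596    # zebrafish mitochondrial genome length (nt)

print(f"phred-equivalent quality: {error_to_phred(error_rate):.1f}")
print(f"expected errors over the genome: {error_rate * genome_length:.2f}")
```

Under these assumptions, an error rate of 0.01% corresponds to a quality value of about 40, or fewer than two expected errors across the complete sequence.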

Protein-coding genes from 35 vertebrate species obtained from GenBank were concatenated and aligned using Sequencher and corrected by eye to preserve reading frame. Regions of gene overlap and regions of unique insertions (gap in all but one sequence) were excluded, yielding a total alignment of 11,592 nucleotides. Of these, 8,189 were variable and 7,354 were parsimony informative (variants occurring in two or more sequences). Phylogenetic analyses using maximum parsimony, maximum likelihood, and neighbor-joining were performed with the computer program PAUP* ver. 4b8 (Swofford 2001). Parsimony searches used 100 random addition sequence replicates with TBR branch swapping and equal weighting of all characters/character-states. Maximum likelihood analysis used empirical mean nucleotide frequencies and empirical transition:transversion ratios. Other model parameters for likelihood were obtained by optimization on trees obtained either from maximum parsimony or neighbor-joining with LogDet distances. These parameters included the shape parameter (α) for a discrete (four-class) gamma approximation of among site rate variation, the estimated proportion of invariant sites, and values for nucleotide change-rate matrices. The choice of likelihood rate model was based on model performance using the likelihood ratio test. Given the large numbers of taxa and characters, maximum likelihood analyses were computationally intensive and only a few rearrangements (<200) were evaluated for each search. However, all likelihood searches using the GTR model quickly converged on the same tree, regardless of starting parameters. Calculation of codon usage and nucleotide and amino acid frequencies used the computer program DAMBE ver. 4.0.43 (Xia 2000) and relative rate tests were conducted with MEGA ver. 2.0 (Kumar et al. 2001).
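The distinction between variable and parsimony-informative sites can be made concrete with a short sketch. It uses the conventional definition, under which a site is informative when at least two character states each occur in at least two sequences; the function name `classify_sites` and the toy alignment are hypothetical, and this is not the PAUP* implementation.

```python
from collections import Counter


def classify_sites(alignment: list[str]) -> tuple[int, int]:
    """Count variable and parsimony-informative columns in an alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    variable = informative = 0
    for column in zip(*alignment):
        # Tally unambiguous bases only; gaps and ambiguity codes are skipped.
        counts = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in two or more taxa.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical four-taxon toy alignment, for illustration only.
toy = ["AAGTC", "AAGTA", "ACTTC", "ACTGA"]
print(classify_sites(toy))
```

In the toy alignment the first column is constant, the fourth is variable but uninformative (a single taxon differs), and the remaining three are informative. The same tally on the 11,592-site alignment is what separates the 8,189 variable sites from the 7,354 informative ones.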


We thank W. Trevarrow for providing zebrafish specimens and S. Kenton for assistance with computational analysis and interpretation. Funding for this work was provided by grants from the National Institutes of Health and National Science Foundation EPSCoR program to B.A.R.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.


E-MAIL rbroughton@ou.edu; FAX (405) 325-7702.

Article published on-line before print: Genome Res., 10.1101/gr.156801.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.156801.


  • Babcock C, Asmussen MA. Effects of differential selection in the sexes on cytonuclear polymorphism and disequilibria. Genetics. 1996;144:839–853. [PMC free article] [PubMed]
  • Boore JL. Animal mitochondrial genomes. Nucleic Acids Res. 1999;27:1767–1780. [PMC free article] [PubMed]
  • Boore JL, Brown WM. Big trees from little genomes: Mitochondrial gene order as a phylogenetic tool. Curr Opin Genet Dev. 1998;8:668–674. [PubMed]
  • Broughton RE, Dowling TE. Length variation in mitochondrial DNA of the minnow Cyprinella spiloptera. Genetics. 1994;138:179–190. [PMC free article] [PubMed]
  • Brown WM. The mitochondrial genome of animals. In: MacIntyre RJ, editor. Molecular evolutionary genetics. New York: Plenum; 1985. pp. 95–130.
  • Chang Y-S, Huang F-L, Lo T-B. The complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome. J Mol Evol. 1994;38:138–155. [PubMed]
  • Clayton DA. Replication of animal mitochondrial DNA. Cell. 1982;28:693–705. [PubMed]
  • ————— Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biochem Sci. 1991;16:107–111. [PubMed]
  • Croteau DL, Bohr VA. Repair of oxidative damage to nuclear and mitochondrial DNA in mammalian cells. J Biol Chem. 1997;272:25409–25412. [PubMed]
  • Curole JP, Kocher TD. Mitogenomics: Digging deeper with complete mitochondrial genomes. Trends Ecol Evol. 1999;14:394–398. [PubMed]
  • Desjardins P, Morais R. Sequence and gene organization of the chicken mitochondrial genome. J Mol Biol. 1990;212:599–634. [PubMed]
  • Detrich HW, Westerfield M, Zon LI. The zebrafish: Biology. New York: Academic Press; 1998.
  • Engelman DM, Steitz TA, Goldman A. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biomol Struct. 1986;15:321–353. [PubMed]
  • Ewing B, Hillier L, Wendl M, Green P. Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185. [PubMed]
  • Fliss MS, Usadel H, Caballero OL, Wu L, Buta MR, Eleff SM, Jen J, Sidransky D. Facile detection of mitochondrial DNA mutations in tumors and bodily fluids. Science. 2000;287:2017–2019. [PubMed]
  • Foran DR, Hixson JE, Brown WM. Comparisons of ape and human sequences that regulate mitochondrial DNA transcription and D-loop synthesis. Nucleic Acids Res. 1988;13:5841–5861. [PMC free article] [PubMed]
  • Gordon D, Abajian C, Green P. Consed: A graphical tool for sequence finishing. Genome Res. 1998;8:195–202. [PubMed]
  • Gray MW, Burger G, Lang BF. Mitochondrial evolution. Science. 1999;283:1476–1481. [PubMed]
  • Green DR, Reed JC. Mitochondria and apoptosis. Science. 1998;281:1309–1312. [PubMed]
  • Hauswirth WW, Laipis PJ. Mitochondrial DNA polymorphism in a maternal lineage of Holstein cows. Proc Nat Acad Sci. 1982;79:4686–4690. [PMC free article] [PubMed]
  • Hillis DM, Huelsenbeck JP, Cunningham CW. Application and accuracy of molecular phylogenies. Science. 1994;264:671–677. [PubMed]
  • Huelsenbeck JP. Performance of phylogenetic methods in simulation. Syst Biol. 1995;44:17–48.
  • Irwin DM, Kocher TD, Wilson AC. Evolution of the cytochrome b gene of mammals. J Mol Evol. 1991;32:128–144. [PubMed]
  • Janke A, Arnason U. The complete mitochondrial genome of Alligator mississippiensis and the separation between recent Archosauria (birds and crocodiles). Mol Biol Evol. 1997;14:1266–1272. [PubMed]
  • Kumar S, Tamura K, Jakobsen IB, Nei M. MEGA2: Molecular Evolutionary Genetics Analysis software. Bioinformatics. 2001 (in press). [PubMed]
  • Kumazawa Y, Ota H, Nishida M, Ozawa T. The complete nucleotide sequence of a snake (Dinodon semicarinatus) mitochondrial genome with two identical control regions. Genetics. 1998;150:313–329. [PMC free article] [PubMed]
  • Lanave C, Preparata G, Saccone C, Serio G. A new method for calculating evolutionary substitution rates. J Mol Evol. 1984;20:86–93. [PubMed]
  • Lightowlers RN, Chinnery PF, Turnbull DM, Howell N. Mammalian mitochondrial genetics: Heredity, heteroplasmy and disease. Trends Genet. 1997;13:450–455. [PubMed]
  • Lynch M. Mutation accumulation in nuclear, organelle, and prokaryotic genomes: Transfer RNA genes. Mol Biol Evol. 1997;14:914–925. [PubMed]
  • Martin AP, Palumbi SR. Body size, metabolic rate, generation time, and the molecular clock. Proc Nat Acad Sci. 1993;90:4087–4091. [PMC free article] [PubMed]
  • Michikawa Y, Mazzucchelli F, Bresolin N, Scarlato G, Attardi G. Aging-dependent large accumulation of point mutations in the human mtDNA control region for replication. Science. 1999;286:774–779. [PubMed]
  • Mindell DP, Sorenson MD, Dimcheff EE. Multiple independent origins of mitochondrial gene order in birds. Proc Nat Acad Sci. 1998;95:10693–10697. [PMC free article] [PubMed]
  • MITOMAP: A Human Mitochondrial Genome Database. 2000. Center for Molecular Medicine, Emory University, Atlanta, GA. http://www.gen.emory.edu/mitomap.html.
  • Murakami M, Yamashita Y, Fujitani H. The complete sequence of mitochondrial genome from a gynogenetic triploid ‘ginbuna’ (Carassius auratus langsdorfi). Zool Sci. 1998;15:335–337. [PubMed]
  • Naylor GJP, Brown WM. Amphioxus mitochondrial DNA, Chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst Biol. 1998;47:61–76. [PubMed]
  • Naylor GJP, Collins TM, Brown WM. Hydrophobicity and phylogeny. Nature. 1995;373:565–566. [PubMed]
  • Ojala D, Montoya J, Attardi G. tRNA punctuation model of RNA processing in human mitochondria. Nature. 1981;290:470–474. [PubMed]
  • Penny D, Hasegawa M, Waddell PJ, Hendy MD. Mammalian evolution: Timing and implications from using the LogDeterminant transform for proteins of differing amino acid composition. Syst Biol. 1999;48:76–93. [PubMed]
  • Polyak K, Li Y, Zhu H, Lengauer C, Willson JK, Markowitz SD, Trush MA, Kinzler KW, Vogelstein B. Somatic mutations of the mitochondrial genome in human colorectal tumours. Nat Genet. 1998;20:291–293. [PubMed]
  • Rasmussen A-S, Arnason U. Phylogenetic studies of complete mitochondrial DNA molecules place cartilaginous fish within the tree of bony fish. J Mol Evol. 1999;48:118–123. [PubMed]
  • Roe BA, Ma D-P, Wilson RK, Wong JF-H. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J Biol Chem. 1985;260:9759–9774. [PubMed]
  • Saraste M. Oxidative phosphorylation at the fin de siecle. Science. 1999;283:1488–1493. [PubMed]
  • Schatz G. The protein import system of mitochondria. J Biol Chem. 1996;271:31763–31766. [PubMed]
  • Shadel GS, Clayton DA. Mitochondrial DNA maintenance in vertebrates. Annu Rev Biochem. 1997;66:409–435. [PubMed]
  • Shimodaira H, Hasegawa M. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999;16:1114–1116.
  • Steel MA. Recovering a tree from the leaf colourations it generates under a Markov model. Appl Math Lett. 1994;7:19–23.
  • Sullivan J, Holsinger KE, Simon C. Among-site rate variation and phylogenetic analysis of 12S rRNA in sigmodontine rodents. Mol Biol Evol. 1995;12:988–1001. [PubMed]
  • Susin SA, Lorenzo HK, Zamzami N, Marzo I, Snow BE, Brothers GM, Mangion J, Jacotet E, Constantini P, Loeffler M, et al. Molecular characterization of mitochondrial apoptosis-inducing factor. Nature. 1999;397:441–446. [PubMed]
  • Swofford DL. PAUP*: Phylogenetic analysis using parsimony and other methods. Sunderland, MA: Sinauer; 2001.
  • Swofford DL, Olsen GJ, Waddell PJ, Hillis DM. Phylogenetic inference. In: Hillis DM, Moritz C, Mable B, editors. Molecular systematics. Sunderland, MA: Sinauer; 1996. pp. 407–514.
  • Tajima F. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics. 1993;135:599–607. [PMC free article] [PubMed]
  • Waddell PJ, Cao Y, Hasegawa M, Mindell DP. Assessing the cretaceous superordinal divergence times within birds and placental mammals by using whole mitochondrial protein sequences and an extended statistical framework. Syst Biol. 1999;48:119–137. [PubMed]
  • Wallace DC. Mitochondrial diseases in man and mouse. Science. 1999;283:1482–1488. [PubMed]
  • Wong TW, Clayton DA. In vitro replication of human mitochondrial DNA: Accurate initiation at the origin of light-strand synthesis. Cell. 1985;42:951–958. [PubMed]
  • Xia X. Maximizing transcription efficiency causes codon usage bias. Genetics. 1996;144:1309–1320. [PMC free article] [PubMed]
  • ————— Data analysis in molecular biology and evolution. Boston: Kluwer Academic Publishers; 2000.
  • Zardoya R, Meyer A. The complete DNA sequence of the mitochondrial genome of a ‘living fossil’, the coelacanth (Latimeria chalumnae). Genetics. 1997;146:995–1010. [PMC free article] [PubMed]
  • ————— Complete mitochondrial genome suggests diapsid affinities of turtles. Proc Nat Acad Sci. 1998;95:14226–14231. [PMC free article] [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press