Copyright © 2005 Heron Publishing, Victoria, Canada
Identification of replication origins in archaeal genomes based on the Z-curve method
1 Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060, China
2 Department of Physics, Tianjin University, Tianjin 300072, China
* Corresponding author (Email: ctzhang/at/tju.edu.cn)
Received July 15, 2004; Accepted August 31, 2004.
Abstract
The Z-curve is a three-dimensional curve that
constitutes a unique representation of a DNA sequence, i.e., both the
Z-curve and the given DNA sequence can be uniquely
reconstructed from the other. We employed Z-curve
analysis to identify one replication origin in the
Methanocaldococcus jannaschii genome, two replication
origins in the Halobacterium species NRC-1 genome and
one replication origin in the Methanosarcina mazei
genome. One of the predicted replication origins of
Halobacterium species NRC-1 is the same as a
replication origin later identified by in vivo experiments. The
Z-curve analysis of the Sulfolobus
solfataricus P2 genome suggested the existence of three
replication origins, which is also consistent with later experimental
results. This review aims to summarize applications of the
Z-curve in identifying replication origins of
archaeal genomes, and to provide clues about the locations of as yet
unidentified replication origins of the Aeropyrum
pernix K1, Methanococcus maripaludis S2,
Picrophilus torridus DSM 9790 and Pyrobaculum
aerophilum str. IM2 genomes.
Keywords: Halobacterium, Methanocaldococcus jannaschii, Methanosarcina mazei
Introduction
The Archaea are a group of prokaryotes that were recognized in 1977 as
an independent monophyletic domain of life
(Woese and Fox 1977). The
evolutionary relationships among the Archaea and the other domains of
life, the Bacteria and the Eukarya, are uncertain. However, based on
similarities in the proteins involved, the process of replication in
archaea appears to be more closely related to that in eukarya than in
bacteria (Edgell and Doolittle
1997, Tye 2000,
MacNeill 2001,
Giraldo 2003,
Kelman and Hurwitz 2003). Our
understanding of archaeal replication mechanisms has advanced
dramatically in the past few years
(Bernander 2000,
2003,
Kelman 2000,
Tye 2000,
Bohlke et al. 2002,
Grabowski and Kelman 2003,
Kelman and Kelman 2003), and it
appears that archaea have a simplified version of the eukaryotic
replication apparatus. Clarification of the archaeal replication
mechanism is therefore important not only to the understanding of
archaeal replication, but also for the insight it may provide into the
replication mechanisms of eukarya.
Replication initiates bidirectionally at a specific locus called the
origin of replication. Knowing the positions and sequences of
replication origins is critical to understanding the initiation phase of
replication. To date, replication origins have been identified in vivo
for only four of the 19 available archaeal genomes
(Myllykallio et al. 2000,
Maisnier-Patin et al. 2002,
Berquist and DasSarma 2003,
Matsunaga et al. 2003,
Lundgren et al. 2004,
Robinson et al. 2004). The
experimental methods for identifying replication origins in vivo are
reliable, but time-consuming and labor-intensive. In
silico analysis, however, is fast and suitable for handling a
large number of genomes. In addition, in some experimental methods,
e.g., as used to identify the replication origin of
Halobacterium species NRC-1
(Berquist and DasSarma 2003), the
replication origin must first be located approximately in a known
sequence.
With the advent of the post-genomic era, genomic data are accumulating
exponentially. High-throughput methods for genome annotation, e.g.,
replication origin identification, are thus needed to meet the challenge
of interpreting this information. The identification of replication
origins based on in silico analysis has been the
subject of intensive study during the past few years. The GC skew method
was first proposed to detect nucleotide composition asymmetry around the
replication origin (Lobry
1996a). Other algorithms were later proposed to
tackle the same task (Grigoriev
1998, McLean et al. 1998,
Mrazek and Karlin 1998,
Salzberg et al. 1998,
Rocha et al. 1999).
The Z-curve is a three-dimensional curve that
constitutes a unique representation of a DNA sequence, i.e., for the
Z-curve and the given DNA sequence, each can be
uniquely reconstructed from the other
(Zhang and Zhang 1991,
1994). We have used
Z-curve analysis to identify one replication origin in
the Methanocaldococcus jannaschii genome
(Zhang and Zhang
2004b), two replication origins in the
Halobacterium species NRC-1 genome
(Zhang and Zhang
2003c) and one replication origin in the
Methanosarcina mazei genome
(Zhang and Zhang 2002). One
predicted replication origin of Halobacterium species
NRC-1 is the same as the replication origin later identified by in vivo
experiments (Berquist and DasSarma
2003). The Z-curve analysis suggested the
existence of three replication origins in the Sulfolobus
solfataricus P2 genome, and indicated their approximate
locations (Zhang and Zhang
2003c), the results being consistent with the
results of subsequent in vivo studies
(Lundgren et al. 2004,
Robinson et al. 2004).
This review summarizes past applications of the Z-curve
in identifying replication origins in archaeal genomes, and applies the
same technique in the search for clues about the locations of as yet
unidentified archaeal replication origins.
The Z-curve representation of genome sequences
The Z-curve is a three-dimensional curve that provides
a unique representation of a DNA sequence in that the DNA sequence and
the Z-curve can each be uniquely reconstructed from the
other. Therefore, the Z-curve contains all the
information that the corresponding DNA sequence carries. The resulting
curve has a zigzag shape, hence the name Z-curve. A DNA
sequence can be analyzed by studying the corresponding
Z-curve. One of the advantages of the
Z-curve is its intuitiveness; the entire
Z-curve of a genome can be viewed on a computer screen
or on paper, regardless of genome length, thus allowing both global and
local compositional features of genomes to be easily grasped. By
combining use of the Z-curve with statistical analysis,
better results may be obtained.
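The Z-curve is composed of a series of nodes,
$P_0, P_1, P_2, \ldots, P_N$, with coordinates $x_n$, $y_n$ and $z_n$
($n = 0, 1, 2, \ldots, N$, where $N$ is the length of the DNA sequence), which are
uniquely determined by the Z-transform of a DNA
sequence (Zhang and Zhang 1991,
1994,
Zhang et al. 2003):
$$
\begin{aligned}
x_n &= (A_n + G_n) - (C_n + T_n) \equiv R_n - Y_n,\\
y_n &= (A_n + C_n) - (G_n + T_n) \equiv M_n - K_n,\\
z_n &= (A_n + T_n) - (G_n + C_n) \equiv W_n - S_n,
\end{aligned}
\qquad n = 0, 1, 2, \ldots, N, \tag{1}
$$
where $A_n$, $C_n$, $G_n$ and $T_n$ are the
cumulative occurrence numbers of A, C, G and T, respectively, in the
subsequence from the first base to the $n$th base in the
sequence. We define $A_0 = C_0 = G_0 = T_0 = 0$, and therefore,
$x_0 = y_0 = z_0 = 0$. Here R,
Y, M, K,
W and S represent the purine,
pyrimidine, amino, keto, weak hydrogen (H) bond and strong H bond bases,
respectively, according to the Recommendation 1984 by the NC-IUB
(Cornish-Bowden 1985). The
Z-curve is defined as the sequential connection of the
nodes $P_0, P_1, P_2, \ldots, P_N$ with
straight lines. Note that the Z-curve always starts
from the origin of the three-dimensional coordinate system. Once the
coordinates $x_n$, $y_n$ and $z_n$
($n = 1, 2, \ldots, N$) of a
Z-curve are given, the corresponding DNA sequence can
be reconstructed from the so-called inverse
Z-transform:
$$
\begin{aligned}
A_n &= \tfrac{1}{4}\,(n + x_n + y_n + z_n),\\
C_n &= \tfrac{1}{4}\,(n - x_n + y_n - z_n),\\
G_n &= \tfrac{1}{4}\,(n + x_n - y_n - z_n),\\
T_n &= \tfrac{1}{4}\,(n - x_n - y_n + z_n),
\end{aligned}
\qquad n = 0, 1, 2, \ldots, N, \tag{2}
$$
where $A_n + C_n + G_n + T_n = n$.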
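As an illustration of the Z-transform and its inverse described above, the following Python sketch builds the nodes of a Z-curve from a sequence and then recovers the sequence from those nodes. The function names (`z_transform`, `inverse_z_transform`, `cumulative_counts`) and the example sequence are hypothetical and chosen only for this illustration. The sketch assumes an unambiguous sequence containing only A, C, G and T, and it is not the Zplotter implementation.

```python
# Per-base increments (dx, dy, dz) implied by the sign conventions in the text:
# x: purine (A, G) minus pyrimidine (C, T)
# y: amino (A, C) minus keto (G, T)
# z: weak H-bond (A, T) minus strong H-bond (G, C)
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
BASE = {step: base for base, step in STEP.items()}


def z_transform(seq):
    """Return the nodes P_0..P_N of the Z-curve as (x, y, z) tuples."""
    nodes = [(0, 0, 0)]  # the curve always starts at the origin
    for base in seq.upper():
        dx, dy, dz = STEP[base]  # KeyError for anything other than A, C, G, T
        x, y, z = nodes[-1]
        nodes.append((x + dx, y + dy, z + dz))
    return nodes


def cumulative_counts(n, x, y, z):
    """Cumulative base counts at node n, recovered from its coordinates."""
    return {
        "A": (n + x + y + z) // 4,
        "C": (n - x + y - z) // 4,
        "G": (n + x - y - z) // 4,
        "T": (n - x - y + z) // 4,
    }


def inverse_z_transform(nodes):
    """Rebuild the sequence from the differences between successive nodes."""
    bases = []
    for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:]):
        bases.append(BASE[(x1 - x0, y1 - y0, z1 - z0)])
    return "".join(bases)


if __name__ == "__main__":
    example = "GATTACA"  # hypothetical toy sequence
    curve = z_transform(example)
    print(curve[-1])
    print(inverse_z_transform(curve) == example)
```

Each base moves the curve by one of four distinct unit steps, so the step between two consecutive nodes identifies the base. This is why the sequence and the curve determine each other.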
The three components of the Z-curve,
$x_n$, $y_n$ and $z_n$, represent three independent
distributions that completely describe the DNA sequence being studied.
The components $x_n$, $y_n$ and $z_n$
display the distributions of purine versus pyrimidine (R vs. Y), amino
versus keto (M vs. K) and weak H-bond versus strong H-bond (W vs. S)
bases along the sequence, respectively. In the subsequence
from the first base to the $n$th base of the sequence,
when purine bases (A and G) are in excess of pyrimidine bases (C and T),
$x_n > 0$; otherwise, $x_n < 0$; and when the numbers of purine
and pyrimidine bases are identical, $x_n = 0$.
Similarly, when amino bases (A and C) are in excess of keto bases (G and
T), $y_n > 0$; otherwise, $y_n < 0$; and when the numbers of amino and
keto bases are identical, $y_n = 0$. Finally,
when weak H-bond bases (A and T) are in excess of strong H-bond bases (G
and C), $z_n > 0$; otherwise, $z_n < 0$; and when the numbers of weak and
strong H-bond bases are identical, $z_n = 0$.
The $x_n$ and $y_n$
components are termed RY and MK disparity curves, respectively. The AT
and GC disparity curves are defined by $(x_n + y_n)/2$ and
$(x_n - y_n)/2$, which show the excess of A
over T and G over C, respectively, along the genome. The RY and MK
disparity curves, as well as the AT and GC disparity curves, can be used to
predict replication origins.
Figure 1
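The four disparity curves defined above can be computed in a single pass over a sequence. The Python sketch below is a minimal illustration under the assumption that the input contains only A, C, G and T. The function name `disparity_curves` and the toy sequence are hypothetical and are not taken from the Z-curve software.

```python
def disparity_curves(seq):
    """Return the RY, MK, AT and GC disparity curves, each of length N + 1."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    curves = {"RY": [0], "MK": [0], "AT": [0], "GC": [0]}
    for base in seq.upper():
        counts[base] += 1  # KeyError for anything other than A, C, G, T
        a, c, g, t = (counts[b] for b in "ACGT")
        x = (a + g) - (c + t)  # purine minus pyrimidine
        y = (a + c) - (g + t)  # amino minus keto
        curves["RY"].append(x)
        curves["MK"].append(y)
        # x + y = 2(A - T) and x - y = 2(G - C), so integer division is exact.
        curves["AT"].append((x + y) // 2)
        curves["GC"].append((x - y) // 2)
    return curves


if __name__ == "__main__":
    for name, values in disparity_curves("GATTACA").items():
        print(name, values)
```

For a real genome, the curves would be plotted against position and inspected for global minima and maxima.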
Replication origin identification in the Methanocaldococcus jannaschii genome
Methanocaldococcus jannaschii is an autotroph that
grows at pressures greater than 20 MPa and at temperatures up to 94
°C (Jones et al. 1983). As
the first completely sequenced archaeon
(Bult et al. 1996), M.
jannaschii is notorious for the difficulty it presents to those
seeking to identify its replication origins. Despite extensive efforts,
the locations of the replication origins of this species remain elusive
8 years after the publication of its complete genome sequence. Ambiguous
results were obtained in identifying the replication origins of
M. jannaschii based on all in silico
genome analyses, which usually assess biases in nucleotide, codon and
oligomer usages (Salzberg et al.
1998, Lopez et al. 1999,
Rocha et al. 1999). Recently, a
technique called marker frequency analysis was successfully applied in
vivo to identify the location of the replication origin of the archaeon
Archaeoglobus fulgidus. It failed, however, in the case
of M. jannaschii
(Maisnier-Patin et al. 2002).
Unlike other archaeal genomes, that of M.
jannaschii was generally thought to lack a clear
cdc6 homologue
(Bernander 2000).
The RY disparity curve for the M. jannaschii genome
shows a global minimum at the position of about 695 kb, indicating that
the genome changes from CT-rich to AG-rich at this site
(Figure 2a).
A closer look at the region revealed that an intergenic region of about
700 bp between the cdc6 homologue and an adjacent gene
has many characteristics of a replication origin. This intergenic region
lies between the ORFs MJ0773 and MJ0774, at positions 694,540–695,226 bp of
the genome. The region is 687 bp in length and is highly AT-rich (80%).
In addition, there are multiple copies of direct repeat elements and AT
stretches. This region contains almost all the features of known
replication origins and is, therefore, very likely a true replication
origin, which has been designated oriC1
(Zhang and Zhang
2004b).
As noted above, marker frequency analysis located a replication origin of
A. fulgidus in vivo, whereas M. jannaschii displayed a
complex pattern of marker frequency distributions with multiple peaks
and valleys. An intriguing explanation proposed for this pattern is that
it reflects the presence of multiple replication origins
(Maisnier-Patin et al. 2002). The
features of the MK disparity curve for M. jannaschii
are consistent with this hypothesis.
The MK disparity curve for M. jannaschii shows four
extremes, one of which is associated with the putative replication origin
oriC1 (Figure 2a).
Replication origin identification in the Halobacterium species NRC-1 and Sulfolobus solfataricus genomes
Halobacterium NRC-1 belongs to the obligatorily
halophilic Halobacterium species, and is an
experimental model among archaea. The exact locations of all replication
origins have not been identified, although the possibility of multiple
replication origins was suggested based on the GC-skew analysis
(Ng et al. 2000,
Kennedy et al. 2001).
The RY and MK disparity curves show two relatively sharp and two
relatively broad peaks. Interestingly, two of the three
cdc6 genes are located at the positions of the two
sharp peaks (Figure 2b).
The putative replication origin oriC1 lies in the
intergenic region close to the cdc6-1
gene, at positions 921,863–922,014 bp, and
contains two long direct repeats. The putative replication
origin oriC2 lies in the intergenic region close to the
cdc6-3 gene, at positions
1,806,444–1,807,229 bp. In addition, a helicase gene is
located about 20 kb away from each of these two regions
(Zhang and Zhang
2003c). Soon afterwards, a replication origin of
Halobacterium NRC-1 was identified in vivo by
Berquist and DasSarma (2003).
These authors found that sequences located up to 750 bp upstream of the
orc7 gene (cdc6-3)
translational start, plus the orc7 gene and 50 bp
downstream, are sufficient to endow the plasmid with replication
ability. Further, they found that the sequence within the 750-bp region
upstream of orc7 contains a nearly perfect inverted
repeat of 31 bp, which flanks an extremely AT-rich stretch of 189 bp.
The region containing these inverted repeats and AT-rich stretch is
within the predicted oriC2, 1,806,444–1,807,229
bp (Zhang and Zhang
2003c).
A breakthrough in the study of archaeal replication origins was the
demonstration that S. solfataricus has multiple
replication origins. This is the first archaeon found to have multiple
replication origins, referred to as oriC1 and
oriC2, according to the nomenclature of
Lundgren et al. (2004) and
Robinson et al. (2004). The
replication origins oriC1 and oriC2
are located at sites close to cdc6-1
and cdc6-3, respectively
(Robinson et al. 2004).
Interestingly, the RY disparity curve for the archaeon S.
solfataricus shows a global maximum around the position of the
cdc6-3 gene, whereas the MK disparity
curve shows a maximum at the position of
cdc6-1
(Figure 2c).
Replication origin identification in the Methanosarcina mazei genome
The archaeon Methanosarcina mazei and related species
have great ecological importance, because they are the only organisms
that ferment acetate, methylamines and methanol to methane, carbon
dioxide and ammonia. Since acetate is the precursor of 60% of the
methane produced on Earth, these organisms contribute significantly to
the production of this greenhouse gas
(Deppenmeier et al. 2002).
Both RY and MK disparity curves for M. mazei show a
global maximum at about 1600 kb and a minimum at about 3600 kb
(Figure 2d).
Common features of archaeal replication origins
So far, replication origins of four archaea have been identified in
vivo. Two replication origins have been identified in the S.
solfataricus P2 genome by 2-D gel analysis
(Robinson et al. 2004) and the
approximate location of the third was suggested by marker frequency
analysis (Lundgren et al. 2004).
One replication origin has been identified in Pyrococcus
abyssi GE5 based on oligomer skew analysis, which was later
confirmed in vivo (Lopez et al.
1999, Myllykallio et al.
2000, Matsunaga et al.
2003). An autonomously replicating sequence element has been
identified in Halobacterium sp. NRC-1
(Berquist and DasSarma 2003). The
marker frequency analysis showed a candidate region of a replication
origin in A. fulgidus; however, the exact location of
the replication origin has not been determined
(Maisnier-Patin et al. 2002).
Common features of archaeal replication origins can be summarized based
on what is known about replication origins identified in vivo. Except
for that of A. fulgidus, all identified replication origins
are associated with an extreme in one of the components of the
Z-curve. In addition, the extremes associated with
replication origins are relatively sharp compared with those associated
with replication termini, probably because termination sometimes occurs
at multiple loci. These replication origins are located immediately
beside a cdc6 gene. This is similar to the case in
bacteria, where a gene coding for DnaA is frequently close to the
oriC (Mackiewicz et al.
2004). Replication origins are highly AT-rich. The
identified replication origins have AT stretches, as well as multiple
copies of direct or inverted repeat elements. Furthermore, some
replication origins, e.g., those of S. solfataricus,
contain conserved Cdc6 binding elements.
Based on the above conserved features, some putative replication origins
have been identified by in silico analysis, but have
yet to be confirmed in vivo. These include a replication origin of
Methanothermobacter thermautotrophicus str. Delta H
(Lopez et al. 1999), a
replication origin of Methanosarcina acetivorans C2A
(Galagan et al. 2002), one of the
two putative replication origins in Halobacterium sp.
NRC-1 (Zhang and Zhang
2003c), a replication origin in the M.
mazei genome (Zhang and Zhang
2002) and a replication origin in the M.
jannaschii genome (Zhang and
Zhang 2004b). A replication origin of
Pyrococcus furiosus DSM 3638 and a replication origin
of Pyrococcus horikoshii OT3 were identified based on
homologue analysis with Pyrococcus abyssi
(Lopez et al. 1999). In
addition, a replication origin of Thermoplasma
acidophilum DSM 1728 was predicted based on different
nucleotide skews; however, other conserved features of archaeal
replication origins, e.g., the close proximity to a
cdc6 gene and the presence of repeat elements, were not
mentioned (Ruepp et al. 2000).
Furthermore, one replication origin of Methanopyrus
kandleri AV19 was predicted based on the GC-skew analysis;
however, the GC-skew figure provided by the authors does not seem to
have a clear minimum or maximum at the site of the predicted replication
origin (Slesarev et al. 2002).
Moreover, the various components of the Z-curve show a
complex pattern in the case of M. kandleri
(Figure 3a).
Besides the above common features observed among replication origins,
there are some differences. For instance, sometimes all disparity curves
(MK, RY, AT and GC) show a global maximum or minimum for a given origin,
whereas in other cases, only one or a subset of curves shows significant
peaks. In addition, in the A. fulgidus genome, although
an approximate region of replication origin was suggested by marker
frequency analysis, both Z-curve
(Figure 3b)
A reasonable procedure for identifying replication origins by the
Z-curve method appears to be: (1) generate RY, MK, AT
and GC disparity curves for the available genomes; and (2) if there is a
minimum or maximum in any of the curves, investigate the regions around
each extreme for replication origin-specific features such as the
presence of cdc6 genes or AT-rich intergenic regions
that contain repeats.
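The two-step procedure above can be sketched in Python as follows. This is only a minimal illustration: it locates the global extremes of an RY disparity curve and reports the AT fraction of a window around one of them as a crude stand-in for the manual inspection in step (2). The function names, the window size and the toy sequence are hypothetical. Searching for cdc6 genes or repeat elements requires genome annotation and is outside the scope of this sketch.

```python
def ry_disparity(seq):
    """Cumulative purine-minus-pyrimidine count x_n for n = 0..N."""
    curve = [0]
    for base in seq.upper():
        if base in "AG":
            curve.append(curve[-1] + 1)
        elif base in "CT":
            curve.append(curve[-1] - 1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return curve


def global_extremes(curve):
    """Return (index of global minimum, index of global maximum).

    Ties are resolved in favour of the first occurrence.
    """
    indices = range(len(curve))
    return min(indices, key=curve.__getitem__), max(indices, key=curve.__getitem__)


def at_fraction(seq, center, half_window):
    """AT fraction of the window [center - half_window, center + half_window)."""
    lo = max(0, center - half_window)
    hi = min(len(seq), center + half_window)
    window = seq[lo:hi].upper()
    return sum(base in "AT" for base in window) / len(window)


if __name__ == "__main__":
    # Hypothetical toy sequence: pyrimidine-rich, then purine-rich.
    toy = "CTCTCTCT" + "AGAGAGAG"
    curve = ry_disparity(toy)
    i_min, i_max = global_extremes(curve)
    print("RY minimum at position", i_min, "value", curve[i_min])
    print("AT fraction near minimum:", at_fraction(toy, i_min, 4))
```

In practice the same search would be repeated for the MK, AT and GC curves. Because archaeal chromosomes are circular, the choice of starting coordinate affects where the curve begins but not where the composition switches polarity.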
Z-curve analysis of archaeal genomes with unknown replication origins
In seven out of the 19 available archaeal genomes, replication origins
have yet to be identified, and clues to some of their locations have not
been found. These seven genomes are Aeropyrum pernix
K1, Methanococcus maripaludis S2, Nanoarchaeum
equitans Kin4-M, Picrophilus torridus DSM
9790, Pyrobaculum aerophilum str. IM2,
Sulfolobus tokodaii str. 7 and Thermoplasma
volcanium GSS1. Among these seven genomes, the
Z-curves for N. equitans Kin4-M and
S. tokodaii str. 7 have a complex pattern, i.e., no
global minima or maxima (Figures 3c).
The RY and MK disparity curves for T. volcanium GSS1
show a similar pattern to that of T. acidophilum DSM
1728 and have a global minimum and maximum (data not shown), suggesting
the presence of a single replication origin. However, no replication
origin-specific features, such as the presence of a
cdc6 gene, could be found around the
Z-curve extremes. The Z-curves for the
remaining four genomes, A. pernix K1, M.
maripaludis S2, P. torridus DSM 9790 and
P. aerophilum str. IM2 show some replication
origin-specific features at the extremes, and thus provide additional
clues to regions that may contain replication origins.
Robinson et al. (2004) found some
conserved Cdc6 binding elements across archaeal genomes. In the
A. pernix K1 genome, such an element is located at 445
kb of the genome (Robinson et al.
2004). At 445 kb, the GC disparity curve shows a minimum,
implying that the nucleotide composition changes around this site
(Figure 4a).
A putative replication origin has been assigned in the M.
jannaschii DSM 2661 genome
(Zhang and Zhang
2004b). A relative of M.
jannaschii DSM 2661, M. maripaludis S2, has
been sequenced recently. The AT disparity curve for M.
maripaludis S2 shows a global minimum, suggesting the presence
of a replication origin around this site. In addition, the pattern of
the AT disparity curve for M. maripaludis is similar to
the RY disparity curve of M. jannaschii (compare
Figures 4b).
The RY disparity curve for the P. torridus DSM 9790
genome shows a global minimum at the position 650 kb
(Figure 4c).
Among the 19 available archaeal genomes, the Z-curves
for the genomes of four species show a complex pattern, with no clear
global minima or maxima: M. kandleri AV19, A.
fulgidus DSM 4304, N. equitans Kin4-M and
S. tokodaii str. 7
(Figure 3).
Comparison of the Z-curve method with others
Various methods for the graphical representation of DNA sequences have
been proposed, such as the H curve
(Hamori and Ruskin 1983), the
chaos game representation (Jeffrey
1990), the color DNA tetragram
(Pickover 1992) and the
two-dimensional DNA walk (Gates
1986, Lobry
1996b). It was shown that most are special cases
of the Z-curve, and an extensive comparison between the
Z-curve and other methods proposed before 1994 was
detailed in Zhang and Zhang
(1994). It is noteworthy that the so-called purine excess and
keto excess (Freeman et al. 1998)
are identical to the x and y
components of the Z-curve, which was proposed 4 years
earlier (Zhang and Zhang 1994).
Traditionally, the GC skew analysis is often used to assess the
nucleotide compositional asymmetry around the replication origin. The GC
skew is defined as (C – G)/(C + G), where C and G are the number
of C and G residues in a sliding window
(Lobry 1996a).
Later, a method of cumulative GC skew without sliding windows was
proposed, which is thought to give better resolution
(Grigoriev 1998). Because the
Z-curve provides a unique representation of a DNA
sequence, it contains all the information that the DNA sequence carries.
Therefore, the Z-curve is not merely another DNA walk; rather, almost
all DNA walks are special cases of the Z-curve or functions of
$x_n$, $y_n$ and $z_n$. For instance, the cumulative GC skew is
equal to $(y_n - x_n)/(n - z_n)$ (see Equation 1). Indeed, almost all the
replication origins that were identified based on the GC skew, including
those of bacteria, viruses and mitochondria, are indicated by a change
in polarity in the Z-curve
(Zhang et al. 2003). However, for
some genomes, e.g., that of S. solfataricus, GC skew
failed to show the compositional asymmetry around the replication
origins that is detected with the Z-curve
(Zhang and Zhang
2003c).
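The relation between the GC skew and the Z-curve components quoted above can be checked directly from the component definitions. The short derivation below is a sketch that assumes the definitions $x_n = (A_n + G_n) - (C_n + T_n)$, $y_n = (A_n + C_n) - (G_n + T_n)$ and $z_n = (A_n + T_n) - (G_n + C_n)$ implied by the sign conventions given earlier, and that applies the skew definition (C – G)/(C + G) to the subsequence from the first base to the $n$th base.

```latex
\begin{align*}
y_n - x_n &= \bigl[(A_n + C_n) - (G_n + T_n)\bigr] - \bigl[(A_n + G_n) - (C_n + T_n)\bigr]
           = 2\,(C_n - G_n),\\
n - z_n   &= (A_n + C_n + G_n + T_n) - \bigl[(A_n + T_n) - (G_n + C_n)\bigr]
           = 2\,(C_n + G_n),\\
\frac{y_n - x_n}{n - z_n} &= \frac{C_n - G_n}{C_n + G_n}.
\end{align*}
```

The ratio is undefined for any prefix that contains neither G nor C.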
Availability of the Z-curve drawing software
Software has been developed to facilitate the use of the
Z-curve. The software, Zplotter online, draws and
manipulates the Z-curve online, based on a user’s
input sequence. With this software, RY, MK, AT and GC disparity curves
can be shown for a user’s DNA sequence in the forward (5′ to
3′) and inverted (3′ to 5′) directions and for their
complementary strands. The resolution of any local parts of each curve
can be arbitrarily adjusted with the built-in zoom function. The
Z-curve coordinates can also be shown by putting the
cursor at the site of interest. In addition, a user can download the
local version of the Zplotter program and run it on their own computer.
This software is freely available from the Z-curve
database (Zhang et al. 2003) at
http://tubic.tju.edu.cn/zcurve/.
Perspective
In bacteria, replication initiates at a unique site, whereas in eukarya,
replication occurs at multiple sites along the genome. A recent
breakthrough was the demonstration that the archaeon S.
solfataricus has at least two replication origins—the
first example of the presence of multiple replication origins in archaea
(Robinson et al. 2004).
Eukaryotic genomes, such as the human genome, have thousands of
replication origins, thus complicating the study of replication. In this
respect, the simplified version of eukaryotic replication, i.e.,
archaeal replication that utilizes two or three replication origins, is
an excellent model, especially for the study of how the cell coordinates
replication occurring at multiple origins. The Z-curve
analysis of the Halobacterium species NRC-1 and
M. jannaschii genomes suggests that these genomes may
also have multiple replication origins, and it points to some candidate sites;
for example, the second replication origin of
Halobacterium species NRC-1 is suggested to lie at
921,863–922,014 bp of the genome
(Zhang and Zhang
2003c,
2004b). It is
hoped that further in vivo studies will confirm the multiple replication
origins in the Halobacterium species NRC-1 and
M. jannaschii genomes.
The Z-curve is a powerful tool for in
silico identification of archaeal and bacterial replication
origins. Because the Z-curve contains all the
information that the corresponding DNA sequence carries, the DNA
sequence can be studied by geometrical methods with the
Z-curve, an approach that complements widely used
mathematical methods. Consequently, the Z-curve has
been used for many purposes in addition to the identification of
replication origins. For instance, algorithms based on the
Z-curve have been used to recognize protein-coding
genes in both prokaryotic (Guo et al.
2003) and eukaryotic genomes
(Zhang and Wang 2000).
Furthermore, it has been shown that the algorithm based on the
Z-curve is among the best available for gene
recognition (Gao and Zhang 2004).
The Z-curve has also been used in isochore
identification (Zhang and Zhang
2003a,
2004a),
detection of horizontally transferred genomic islands
(Zhang and Zhang
2004c), comparative genomics
(Zhang and Zhang
2003b), and in studying the distribution of
nucleotide composition (Ou et al.
2003). With the availability of an increasing number of complete
genome sequences, it is hoped that the Z-curve will play
an increasingly important role in genome research.
Acknowledgments
The present study was supported in part by the 973 Project of China
(Grant 1999075606).
References
Bernander R. Chromosome replication, nucleoid segregation and cell division in archaea. Trends Microbiol. 2000;8:278–283.
Bernander R. The archaeal cell cycle: current issues. Mol. Microbiol. 2003;48:599–604.
Berquist B.R., DasSarma S. An archaeal chromosomal autonomously replicating sequence element from an extreme halophile, Halobacterium sp. strain NRC-1. J. Bacteriol. 2003;185:5959–5966.
Bohlke K., Pisani F.M., Rossi M., Antranikian G. Archaeal DNA replication: spotlight on a rapidly moving field. Extremophiles. 2002;6:1–14.
Brochier C., Forterre P., Gribaldo S. Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biol. 2004;5:R17.
Bult C.J., White O., Olsen G.J., et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. 1996;273:1058–1073.
Cohen G.N., Barbe V., Flament D., et al. An integrated analysis of the genome of the hyperthermophilic archaeon Pyrococcus abyssi. Mol. Microbiol. 2003;47:1495–1512.
Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 1985;13:3021–3030.
Deppenmeier U., Johann A., Hartsch T., et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 2002;4:453–461.
Edgell D.R., Doolittle W.F. Archaea and the origin(s) of DNA replication proteins. Cell. 1997;89:995–998.
Fitz-Gibbon S.T., Ladner H., Kim U.J., Stetter K.O., Simon M.I., Miller J.H. Genome sequence of the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. Proc. Natl. Acad. Sci. USA. 2002;99:984–989.
Freeman J.M., Plasterer T.N., Smith T.F., Mohr S.C. Patterns of genome organization in bacteria. Science. 1998;279:1827.
Futterer O., Angelov A., Liesegang H., Gottschalk G., Schleper C., Schepers B., Dock C., Antranikian G., Liebl W. Genome sequence of Picrophilus torridus and its implications for life around pH 0. Proc. Natl. Acad. Sci. USA. 2004;101:9091–9096.
Galagan J.E., Nusbaum C., Roy A., et al. The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res. 2002;12:532–542.
Gao F., Zhang C.T. Comparison of various algorithms for recognizing short coding sequences of human genes. Bioinformatics. 2004;20:673–681.
Gates M.A. A simple way to look at DNA. J. Theor. Biol. 1986;119:319–328.
Giraldo R. Common domains in the initiators of DNA replication in bacteria, archaea and eukarya: combined structural, functional and phylogenetic perspectives. FEMS Microbiol. Rev. 2003;26:533–554.
Grabowski B., Kelman Z. Archaeal DNA replication: eukaryal proteins in a bacterial context. Annu. Rev. Microbiol. 2003;57:487–516.
Grigoriev A. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 1998;26:2286–2290.
Guo F.B., Ou H.Y., Zhang C.T. ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes. Nucleic Acids Res. 2003;31:1780–1789.
Hamori E., Ruskin J. H curves, a novel method of representation of nucleotide series especially suited for long DNA sequences. J. Biol. Chem. 1983;258:1318–1327.
Jeffrey H.J. Chaos game representation of gene structure. Nucleic Acids Res. 1990;18:2163–2170.
Jones W.J., Leigh J.A., Mayer F., Woese C.R., Wolfe R.S. Methanococcus jannaschii sp. nov., an extremely thermophilic methanogen from a submarine hydrothermal vent. Arch. Microbiol. 1983;136:254–261.
Kawarabayasi Y., Sawada M., Horikawa H., et al. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3. DNA Res. 1998;5:55–76.
Kawarabayasi Y., Hino Y., Horikawa H., et al. Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1. DNA Res. 1999;6:83–101, 145–152.
Kawarabayasi Y., Hino Y., Horikawa H., et al. Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain 7. DNA Res. 2001;8:123–140.
Kawashima T., Amano N., Koike H., et al. Archaeal adaptation to higher temperatures revealed by genomic sequence of Thermoplasma volcanium. Proc. Natl. Acad. Sci. USA. 2000;97:14257–14262.
Kelman Z. The replication origin of archaea is finally revealed. Trends Biochem. Sci. 2000;25:521–523.
Kelman Z., Hurwitz J. Structural lessons in DNA replication from the third domain of life. Nat. Struct. Biol. 2003;10:148–150.
Kelman L.M., Kelman Z. Archaea: an archetype for replication initiation studies? Mol. Microbiol. 2003;48:605–615.
Kennedy S.P., Ng W.V., Salzberg S.L., Hood L., Dassarma S. Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res. 2001;11:1641–1650.
Klenk H.P., Clayton R.A., Tomb J.F., et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature. 1997;390:364–370.
Lecompte O., Ripp R., Puzos-Barbe V., Duprat S., Heilig R., Dietrich J., Thierry J.C., Poch O. Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea. Genome Res. 2001;11:981–993.
Liu J., Smith C.L., Deryckere D., Deangelis K., Martin G.S., Berger J.M. Structure and function of Cdc6/Cdc18: implications for origin recognition and checkpoint control. Mol. Cell. 2000;6:637–648.
Lobry J.R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 1996;13:660–665.
Lobry J.R. A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria. Biochimie. 1996;78:323–326.
Lopez P., Philippe H., Myllykallio H., Forterre P. Identification of putative chromosomal origins of replication in archaea. Mol. Microbiol. 1999;32:883–886.
Lundgren M., Andersson A., Chen L., Nilsson P., Bernander R. Three replication origins in Sulfolobus species: synchronous initiation of chromosome replication and asynchronous termination. Proc. Natl. Acad. Sci. USA. 2004;101:7046–7051.
Mackiewicz P., Zakrzewska-Czerwinska J., Zawilak A., Dudek M.R., Cebrat S. Where does bacterial replication start? Rules for predicting the oriC region. Nucleic Acids Res. 2004;32:3781–3791.
MacNeill S.A. Understanding the enzymology of archaeal DNA replication: progress in form and function. Mol. Microbiol. 2001;40:520–529.
Maisnier-Patin S., Malandrin L., Birkeland N.K., Bernander R. Chromosome replication patterns in the hyperthermophilic euryarchaea Archaeoglobus fulgidus and Methanocaldococcus (Methanococcus) jannaschii. Mol. Microbiol. 2002;45:1443–1450.
Marchler-Bauer A., Anderson J.B., Deweese-Scott C., et al. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 2003;31:383–387.
Matsunaga F., Norais C., Forterre P., Myllykallio H. Identification of short ‘eukaryotic’ Okazaki fragments synthesized from a prokaryotic replication origin. EMBO Rep. 2003;4:154–158.
McLean M.J., Wolfe K.H., Devine K.M. Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes. J. Mol. Evol. 1998;47:691–696.
Mrazek J., Karlin S. Strand compositional asymmetry in bacterial and large viral genomes. Proc. Natl. Acad. Sci. USA. 1998;95:3720–3725.
Myllykallio H., Lopez P., Lopez-Garcia P., Heilig R., Saurin W., Zivanovic Y., Philippe H., Forterre P. Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon. Science. 2000;288:2212–2215.
Ng W.V., Kennedy S.P., Mahairas G.G., et al. Genome sequence of Halobacterium species NRC-1. Proc. Natl. Acad. Sci. USA. 2000;97:12176–12181.
Ou H.Y., Guo F.B., Zhang C.T. Analysis of nucleotide distribution in the genome of Streptomyces coelicolor A3(2) using the Z curve method. FEBS Lett. 2003;540:188–194.
Pickover C.A. DNA and protein tetragrams: biological sequences as tetrahedral movements. J. Mol. Graph. 1992;10:2–6, 17.
Robb F.T., Maeder D.L., Brown J.R., Diruggiero J., Stump M.D., Yeh R.K., Weiss R.B., Dunn D.M. Genomic sequence of hyperthermophile, Pyrococcus furiosus: implications for physiology and enzymology. Methods Enzymol. 2001;330:134–157.
Robinson N.P., Dionne I., Lundgren M., Marsh V.L., Bernander R., Bell S.D. Identification of two origins of replication in the single chromosome of the archaeon Sulfolobus solfataricus. Cell. 2004;116:25–38.
Rocha E.P., Danchin A., Viari A. Universal replication biases in bacteria. Mol. Microbiol. 1999;32:11–16.
Ruepp A., Graml W., Santos-Martinez M.L., et al. The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum
. Nature. 2000;407:508–513. [PubMed] R54. Salzberg S.L., Salzberg A.J., Kerlavage A.R., Tomb J.F. Skewed oligomers and origins of replication. Gene. 1998;217:57–67. [PubMed] R55. She Q., Singh R.K., Confalonieri F., et al. The complete genome of the crenarchaeon Sulfolobus solfataricus P2. Proc. Natl. Acad. Sci. USA. 2001;98:7835–7840. [PubMed] R56. Slesarev A.I., Mezhevaya K.V., Makarova K.S., et al. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. USA. 2002;99:4644–4649. [PubMed] R57. Smith D.R., Doucette-Stamm L.A., Deloughery C., et al. Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J. Bacteriol. 1997;179:7135–7155. [PubMed] R58. Tye B.K. Insights into DNA replication from the third domain of life. Proc. Natl. Acad. Sci. USA. 2000;97:2399–2401. [PubMed] R59. Waters E., Hohn M.J., Ahel I., et al. The genome of Nanoarchaeum equitans: insights into early archaeal evolution and derived parasitism. Proc. Natl. Acad. Sci. USA. 2003;100:12984–12988. [PubMed] R60. Woese C.R., Fox G.E. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc. Natl. Acad. Sci. USA. 1977;74:5088–5090. [PubMed] R61. Zhang C.T., Wang J. Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. Nucleic Acids Res. 2000;28:2804–2814. [PubMed] R62. Zhang C.T., Zhang R. Analysis of distribution of bases in the coding sequences by a diagrammatic technique. Nucleic Acids Res. 1991;19:6313–6317. [PubMed] R63. Zhang R., Zhang C.T. Z curves, an intutive tool for visualizing and analyzing the DNA sequences. J. Biomol. Struct. Dyn. 1994;11:767–782. [PubMed] R64. Zhang R., Zhang C.T. Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method. Biochem. Biophys. Res. Commun. 2002;297:396–400. [PubMed] R65. Zhang C.T., Zhang R. 
An isochore map of the human genome based on the Z curve method. Gene. 2003;317:127–135. [PubMed] R66. Zhang R., Zhang C.T. Identification of genomic islands in the genome of Bacillus cereus by comparative analysis with Bacillus anthracis
. Physiol. Genomics. 2003;16:19–23. [PubMed] R67. Zhang R., Zhang C.T. Multiple replication origins of the archaeon Halobacterium species NRC-1. Biochem. Biophys. Res. Commun. 2003;302:728–734. [PubMed] R68. Zhang C.T., Zhang R. Isochore structures in the mouse genome. Genomics. 2004;83:384–394. [PubMed] R69. Zhang R., Zhang C.T. Identification of replication origins in the genome of the methanogenic archaeon, Methanocaldococcus jannaschii
. Extremophiles. 2004;8:253–258. [PubMed] R70. Zhang R., Zhang C.T. A systematic method to identify genomic islands and its applications in analyzing the genomes of Corynebacterium glutamicum and Vibrio vulnificus CMCP6 chromosome I. Bioinformatics. 2004;20:612–622. [PubMed] R71. Zhang C.T., Zhang R., Ou H.Y. The Z curve database: a graphic representation of genome sequences. Bioinformatics. 2003;19:593–599. [PubMed] |