Nucleic Acids Res. Feb 15, 1998; 26(4): 1107–1115.
PMCID: PMC147337

GeneMark.hmm: new solutions for gene finding.

Abstract

The number of completely sequenced bacterial genomes has been growing fast. Computer methods for finding genes are available, yet there is still a need for more accurate algorithms. The GeneMark.hmm algorithm presented here was designed to improve the quality of gene prediction in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into a naturally derived hidden Markov model framework, with gene boundaries modeled as transitions between hidden states. We also used a specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets, including 10 complete bacterial genomes. The new algorithm was shown to be significantly more accurate than GeneMark in exact gene prediction. Interestingly, high gene-finding accuracy was observed even when Markov models of order zero, one and two were used. We present an analysis of false positive and false negative predictions, with the caution that these categories are not precisely defined if public database annotation is used as a control.
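The idea of modeling gene boundaries as transitions between hidden states can be illustrated with a toy example. The sketch below is not the GeneMark.hmm model: it assumes a plain two-state hidden Markov model (`noncoding`, `coding`) with order-zero nucleotide emissions and standard Viterbi decoding, and every probability in it is a hypothetical value chosen only for illustration. It shows how the most probable state path yields boundary positions wherever the decoded state changes.

```python
import math

STATES = ("noncoding", "coding")

# Hypothetical parameters for illustration only; not taken from the paper.
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
# Order-zero emissions: each base is scored independently of its neighbours.
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq under the toy model."""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Return 0-based positions where the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    toy = "AT" * 5 + "GC" * 5 + "AT" * 5
    print(boundaries(viterbi(toy)))
```

On this toy sequence (an AT-rich stretch, a GC-rich stretch, then another AT-rich stretch, 10 bases each), the decoded path changes state at positions 10 and 20. A realistic gene finder would additionally need codon-position-aware models, both strands, and start/stop codon constraints, none of which this sketch attempts.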



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
