BMC Bioinformatics. 2006 May 21;7:263.

Automatic generation of gene finders for eukaryotic species.

Author information

1
Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen Ø, Denmark. kasper@binf.ku.dk

Abstract

BACKGROUND:

The number of sequenced eukaryotic genomes is rapidly increasing. This means that over time it will be hard to keep supplying customised gene finders for each genome. This calls for procedures to automatically generate species-specific gene finders and to re-train them as the quantity and quality of reliable gene annotation grows.

RESULTS:

We present a procedure, Agene, that automatically generates a species-specific gene predictor from a set of reliable mRNA sequences and a genome. We apply a hidden Markov model (HMM) that implements explicit length distribution modelling for all gene structure blocks using acyclic discrete phase type distributions. The state structure of each HMM is generated dynamically from an array of sub-models to include only gene features represented in the training set.
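
As an illustration of the length modelling mentioned above, the following is a minimal sketch of how a discrete phase type distribution assigns probabilities to sequence lengths. It is not taken from Agene. The function names, the two-state chain, and its probabilities are hypothetical and chosen only for the example. The length L is the number of steps spent in the transient states before absorption, so P(L = n) = alpha T^(n-1) t, where alpha is the initial distribution, T the transition matrix among transient states, and t the vector of exit probabilities. The distribution is acyclic when the states can be ordered so that T is upper triangular.

```python
def phase_type_pmf(alpha, T, max_len):
    """Return [P(L=1), ..., P(L=max_len)] for a discrete phase type distribution.

    alpha: initial probabilities over the transient states.
    T:     transition probabilities among the transient states (row i -> column j).
    """
    m = len(alpha)
    # Exit probability of each state: the mass that does not stay among transient states.
    exit_prob = [1.0 - sum(T[i]) for i in range(m)]
    state = list(alpha)  # distribution over transient states at the current step
    pmf = []
    for _ in range(max_len):
        # Probability of being absorbed at this step, i.e. of this exact length.
        pmf.append(sum(state[i] * exit_prob[i] for i in range(m)))
        # Advance the chain one step among the transient states.
        state = [sum(state[i] * T[i][j] for i in range(m)) for j in range(m)]
    return pmf


def is_acyclic(T):
    """True if T is upper triangular in the given state ordering (only self-loops and forward moves)."""
    return all(T[i][j] == 0.0 for i in range(len(T)) for j in range(i))


# Hypothetical two-state chain: each state repeats with probability 0.5,
# otherwise it moves forward (state 0 -> state 1 -> absorption).
alpha = [1.0, 0.0]
T = [[0.5, 0.5],
     [0.0, 0.5]]

pmf = phase_type_pmf(alpha, T, 50)
```

In this sketch the shortest possible length is 2, because both states must be visited once, and the probabilities follow (n - 1) * 0.5^n. This shows how a small chain of states can give a non-geometric length distribution, which a single self-looping HMM state cannot.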

CONCLUSION:

Acyclic discrete phase type distributions are well suited to model sequence length distributions. The performance of each individual gene predictor on each individual genome is comparable to the best of the manually optimised species-specific gene finders. It is shown that species-specific gene finders are superior to gene finders trained on other species.

PMID: 16712739
PMCID: PMC1522026
DOI: 10.1186/1471-2105-7-263
[Indexed for MEDLINE]
Free PMC Article