Efficient algorithms for the computational design of optimal tiling arrays

IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):557-67. doi: 10.1109/TCBB.2008.50.

Abstract

The representation of a genome by oligonucleotide probes is a prerequisite for the analysis of many of its basic properties, such as transcription factor binding sites, chromosomal breakpoints, gene expression of known genes, and detection of novel genes, in particular those coding for small RNAs. An ideal representation would consist of a high-density set of equidistantly spaced oligonucleotides with similar melting temperatures that do not cross-hybridize with other regions of the genome. Such a design is typically called a tiling array or genome array. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates. Selecting probes requires multi-criterion optimization, which we cast as a shortest path problem. Standard linear-time algorithms allow us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer for most problem variants. The solutions to this multi-criterion optimization adapt spatially to the problem instance. Our formulation easily incorporates experimental constraints on specific regions of interest, as well as trade-offs between hybridization parameters, probe quality and tiling density. A web application is available at http://tileomatic.org.
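The shortest-path idea can be illustrated with a minimal sketch. It assumes a simplified, hypothetical cost model: each edge between consecutive chosen probes costs the squared deviation of their gap from a target spacing, plus a per-probe quality penalty. The abstract does not give the paper's actual multi-criterion cost function, so this model is only an illustration. The names `tiling_path`, `target_spacing` and `max_gap` are also illustrative assumptions. Candidates sorted by genomic position form a directed acyclic graph, so a single forward pass of edge relaxation finds a globally optimal path in time linear in the number of nodes plus edges.

```python
import math


def tiling_path(candidates, start, end, target_spacing, max_gap):
    """Hypothetical sketch: choose probes by DAG shortest path.

    candidates: list of (position, probe_cost) tuples, assumed to lie
    strictly between `start` and `end`. Returns (total_cost, positions).
    """
    # Virtual source/sink probes mark the region boundaries (zero probe cost).
    nodes = [(start, 0.0)] + sorted(candidates) + [(end, 0.0)]
    positions = [p for p, _ in nodes]
    n = len(nodes)
    best = [math.inf] * n   # best[k]: cheapest cost of a path source -> k
    parent = [-1] * n       # predecessor on that cheapest path
    best[0] = 0.0

    # Position order is a topological order, so one pass suffices.
    for i in range(n):
        if best[i] == math.inf:
            continue  # node unreachable under the max_gap constraint
        j = i + 1
        # Edges only go forward, to probes within max_gap of probe i.
        while j < n and positions[j] - positions[i] <= max_gap:
            gap = positions[j] - positions[i]
            if gap > 0:
                # Hypothetical edge cost: spacing deviation + probe penalty.
                cost = best[i] + (gap - target_spacing) ** 2 + nodes[j][1]
                if cost < best[j]:
                    best[j] = cost
                    parent[j] = i
            j += 1

    if best[-1] == math.inf:
        return math.inf, []  # no tiling path respects max_gap
    # Walk predecessors back from the sink, skipping the virtual endpoints.
    path = []
    k = parent[n - 1]
    while k > 0:
        path.append(positions[k])
        k = parent[k]
    return best[-1], path[::-1]


cost, path = tiling_path(
    [(10, 0.0), (20, 5.0), (22, 0.0), (30, 0.0), (41, 0.0)],
    start=0, end=50, target_spacing=10, max_gap=15,
)
print(cost, path)  # 7.0 [10, 20, 30, 41]
```

In this toy instance, the probe at position 20 carries a quality penalty, but it is still chosen over the probe at 22 because it keeps the downstream spacing close to the target. The choice reflects a trade-off between probe quality and tiling regularity of the kind the abstract describes. Bounding edges by `max_gap` keeps the graph sparse. Constraints for regions of interest could, under the same assumptions, be expressed by reweighting or removing edges.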

MeSH terms

  • Algorithms*
  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA / genetics*
  • DNA Probes / genetics*
  • Molecular Sequence Data
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA Probes
  • DNA