Copyright ©2007 Bentham Science Publishers Ltd.

Recent Computational Approaches to Understand Gene Regulation: Mining Gene Regulation In Silico

1 MRC-BSU, Robinson Way, Cambridge, UK
2 Wellcome Trust/Cancer Research UK Gurdon Institute of Cancer and Developmental Biology, Cambridge, UK
3 Leeds University, Leeds, UK

*Address correspondence to this author at the Biostatistics Unit, MRC, Institute of Public Health, Robinson Way, Cambridge, CB2 0SR, UK; E-mail: irina.abnizova/at/mrc-bsu.cam.ac.uk

Received November 30, 2006; Revised December 13, 2006; Accepted December 15, 2006.

Abstract

This paper reviews recent computational approaches to understanding gene regulation in eukaryotes. Cis-regulation of gene expression by the binding of transcription factors is a critical component of cellular physiology. In eukaryotes, a number of transcription factors often work together in a combinatorial fashion to enable cells to respond to a wide spectrum of environmental and developmental signals. Integration of genome sequences and/or Chromatin Immunoprecipitation on chip data with gene-expression data has facilitated in silico discovery of how the combinatorics and positioning of transcription factor binding sites underlie gene activation in a variety of cellular processes. Because gene regulation is an extremely complex and intriguing process, all relevant points of view and related links should be carefully considered. Here we attempt to compile an inventory, without claiming it to be comprehensive or complete, of computational biology topics related to gene regulation that may shed light on the process, and we briefly review current work in each area. We consider the following computational areas:
o gene regulatory network construction;
o evolution of regulatory DNA;
o structural and statistical informational properties of regulatory DNA;
o and finally, regulatory RNA.
Key Words: Gene expression, transcription factors, transcriptional regulation, computational methods, data integration.

1. INTRODUCTION

Transcription regulation has been responsible for organismal complexity and diversity in the course of biological evolution and adaptation, and it is determined largely by the context-dependent and complicated behaviour of cis-regulatory elements and of the transcription factors (TFs) that bind to them [1,2]. Initiation of transcription in higher organisms requires the binding of multiple transcription factor molecules to transcription regulatory regions, such as promoters and enhancers. Relatively little is known about the regulation of transcription in eukaryotes [3–6].

To start with, let us try to make an inventory of known transcription factors and their binding sites, also called cis-regulatory elements. It is estimated that there are around 2000 TFs in mammalian genomes [7,8] and around 1000 in flies and worms [9]. However, information on binding sites or interacting protein partners is currently available for only a minority of TFs [10]. DNA-binding site models are available for around 500 vertebrate TFs, and approximately 5000 binding sites are known for approximately 3000 genes. This suggests [5] that the total number of such sites in the genomes of higher organisms could be at least of the order of thousands or more. Thus, our knowledge of TFs, and especially of their binding sites and regulated genes, is currently severely limited.

Full genome sequences are now available for a number of organisms, and high-throughput technologies continue to advance. In silico methods should therefore be used to integrate diverse data sources toward unravelling the sophisticated and complex nature of transcriptional regulation. From a computational point of view, one may try to understand transcription regulation directly, by inferring Genetic Regulatory Networks (GRN) from gene and protein interactions.
We briefly describe GRN construction in Section 2.
Indirect approaches to understanding gene regulation include studies of (i) regulatory DNA evolution, (ii) statistical, information and structural properties of regulatory regions, and (iii) regulatory RNA. Sections 3, 4 and 5 review work in these areas.

2. GENE REGULATORY NETWORKS AND THEIR BUILDING BLOCKS

2.1. Overview of Gene Regulatory Networks

By definition, a gene regulatory network (GRN) is a collection of DNA segments in a cell that interact with each other and with other substances in the cell, thereby governing the rates at which genes in the network are transcribed into mRNA. In a GRN (for detailed reviews see [11,12]), genes are nodes in a complex network, with inputs being proteins such as transcription factors, and outputs being the level of gene expression. Gene networks are only beginning to be understood, and the next step for biology is to attempt to deduce the function of each gene "node" and to assist in modelling cell behaviour.

Mathematical models of GRNs have been developed so that their predictions can be tested. Various modelling techniques have been used, including Boolean networks [13–15], Petri nets [16,17], Bayesian networks [18–21], graphical Gaussian models [22–24] and sets of differential equations [25,26]. Some recent advances in GRN construction and application are well presented in a number of excellent reviews: [27] for yeast and fly, [28] for sea urchin development, and [29] for comparative genomic applications. An elegant statistical approach to GRNs [30] computes significant regulatory network motifs.

2.2. Gene Networks Construction in Eukaryotes by In Silico Methods

Understanding and constructing complex gene regulatory networks in higher organisms is an extremely difficult task.
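To make the Boolean-network formalism of Section 2.1 concrete, here is a minimal sketch that simulates a hypothetical three-gene network. The genes A, B and C and their update rules are invented for illustration and are not taken from the cited studies. Gene C is switched on only by the combination of A and B, a crude stand-in for multi-factor control, and C represses B, which closes a negative feedback loop.

```python
from itertools import product

def step(state):
    """Synchronous Boolean update: every gene reads the previous state (a, b, c)."""
    a, b, c = state
    return (
        a,              # A: external input signal that maintains itself
        a and not c,    # B: activated by A, repressed by C (negative feedback)
        a and b,        # C: requires A AND B together (combinatorial input)
    )

def attractor(state):
    """Iterate until a state repeats and return the cycle that is reached.
    The state space is finite (2**3 states), so a repeat is guaranteed."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

def canonical(cycle):
    """Rotate a cycle so that its smallest state comes first."""
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])

def all_attractors():
    """Collect the distinct attractors reached from all 8 initial states."""
    return {canonical(attractor(s)) for s in product((False, True), repeat=3)}

if __name__ == "__main__":
    for cycle in sorted(all_attractors(), key=len):
        print(len(cycle), [tuple(int(g) for g in s) for s in cycle])
```

In this toy network the system falls into a fixed point with all genes off whenever the input A is off, and into a four-state oscillation whenever A is on. Synchronous updating is itself a modelling assumption; asynchronous update schemes can yield different attractors.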
In contrast to prokaryotes, where transcriptional regulation can be understood in terms of induction by single factors, regulation in eukaryotes is mainly carried out by sophisticated interactions of multiple transcription factors. Additionally, the regulatory sites are distributed over large regions of the genome, including intronic sequences [31,32]. Individual binding sites are rarely strongly conserved; only their combinatorial action gives rise to specific control. Computational methods are therefore needed to infer gene regulation and to integrate data for GRN construction.

GRNs are Constructed from Diverse Sources of Information:

a) Transcription Factor Binding Information

The signals that determine activation and repression of specific genes in response to appropriate stimuli are one of the most important, but least understood, types of information encoded in genomic DNA. These signals, the Transcription Factor Binding Sites (TFBS), constitute an important building block of GRNs. Consequently, a vast number of TFBS motif search algorithms have been developed. A detailed review of motif discovery methods is beyond the scope of this paper, so we recommend the following reviews on the subject: [27, 33–42]. Some reviews focus on specific techniques such as phylogenetic footprinting [43], or on specific genomes [44]. Alternative approaches to motif discovery are compared in the excellent works [45,46].

b) Using Information from Micro-Array Data

To improve TFBS predictions and GRN construction [47], groups of co-regulated genes inferred from expression profiling [48,49] are used as the search space for putative
TFBS motifs. The algorithm in [50] searches for combinations of transcription factor binding sites that are enriched in a set of potentially co-regulated genes with respect to the whole genome. The ‘motif regressor’ algorithm [51] was proposed for discovering sequence motifs upstream of genes that undergo expression changes in a given condition; its authors integrated motif discovery with genome-wide expression analysis.

c) Comparative Genomics

Comparative genomics is currently a powerful tool for finding phylogenetically conserved regions and elements [52–56], which can then be used in GRN construction. In [57], a comparative analysis of the human, mouse, rat and dog genomes was used to create a systematic catalogue of common regulatory motifs in promoters and 3’ untranslated regions (3’ UTRs). Recently, the phylogenetic shadowing approach [58,59] was developed and applied to compute and statistically evaluate conservation profiles of multiple sequence alignments from closely related species. This statistical method permitted the accurate prediction of transcriptional regulatory elements in human–primate comparisons and validated the use of comparative genomics for deciphering primate-specific functional DNA sequences.

d) Genome Organisation: cis-Regulatory Modules Prediction

Several groups [6, 60–63] have proposed that dense clusters of motifs may diagnose regulatory regions more accurately. An approach based on e-clustering of binding sites [63] within cis-regulatory modules (CRMs) is studied in [64]. Searching for co-occurrence of binding sites that form regulatory modules has been shown to be an effective way of increasing prediction specificity without losing sensitivity [65].
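The idea that dense clusters of predicted sites mark regulatory modules can be sketched as a sliding-window count over site positions, with overlapping dense windows merged into candidate CRMs. The window length, step size and minimum site count below are arbitrary illustrative defaults, not values from [60–65], and the function name is hypothetical; real methods assess cluster significance statistically (see Section 4.2.2).

```python
from bisect import bisect_left

def candidate_crms(site_positions, seq_len, window=500, min_sites=4, step=50):
    """Return (start, end) intervals where a sliding window contains at least
    `min_sites` predicted binding sites; overlapping windows are merged."""
    sites = sorted(site_positions)
    dense = []
    for start in range(0, max(seq_len - window, 0) + 1, step):
        end = start + window
        # Number of sites with start <= position < end, via binary search.
        count = bisect_left(sites, end) - bisect_left(sites, start)
        if count >= min_sites:
            dense.append([start, end])
    # Merge overlapping or touching dense windows into contiguous modules.
    merged = []
    for start, end in dense:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]

if __name__ == "__main__":
    # Hypothetical site positions: five clustered sites and one isolated site.
    print(candidate_crms([1000, 1050, 1120, 1200, 1300, 5000], seq_len=6000))
```

With these hypothetical positions the sketch reports a single module spanning positions 750 to 1550 and ignores the isolated site at 5000.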
A new method [66], relying on the principle that CRMs generally contain several phylogenetically conserved binding sites for a few different TFs, allows the prediction of a large number of CRMs within the human genome. A subset of these is shown to be bound in vivo by TFs using ChIP (Chromatin Immunoprecipitation) on chip experiments (the ChIP-on-chip technology is introduced briefly in subsection 2.2 i). The analysis reveals, among other things, that CRM density varies widely across the genome, with CRM-rich regions often located near genes encoding transcription factors involved in development. Predicted CRMs show a surprising enrichment near the 3’ end of genes and in regions far from genes.

e) Combinatorial Interaction of Transcription Factors

Many cis-regulatory elements occur in a combinatorial manner [67–69]. TFs that act synergistically are likely to have their cis-elements co-localised on the genome at specific distances apart. A number of methods therefore use this combinatorial information to improve TFBS prediction accuracy [70,71]. Some methods integrate combinatorial information with other data sources: for example, [73] develops a computational search for functional transcription factor (TF) combinations using phylogenetically conserved sequences and microarray-based expression data.

f) Context-Dependent Behaviour of cis-Regulatory Elements

Some methods exploit the context-dependent behaviour of cis-regulatory elements [74,75]. Since sequence-dependent DNA structure may play a role in protein–DNA interactions, sequence-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region are worth exploring.

g) Integration of Available Tools for Transcription Regulation Studies

Currently, the most advanced methods, such as [72,76,77], tend to integrate multiple sources of computational and experimental information.
In [77] the authors integrate ChIP-on-chip, motif and microarray data. The seqVISTA tool [76] makes it possible to integrate a range of gene regulation analysis software (including eight web servers) that is scattered over the Internet.

h) Metabolic Pathways

A metabolic pathway is a series of chemical reactions, catalysed by enzymes, occurring within a cell. Often, the initiation of another metabolic pathway occurs as a result. Network representations of biological pathways [78] offer a functional view of molecular biology that is different from, and complementary to, sequence, expression and structure databases. A wide range of digital collections of pathway data is currently available, differing in the organisms included, the functional areas covered, the details of modelling and the support for dynamic pathway construction. Databases [79] that represent pathway data at the level of individual interactions make it possible to combine data from different pathways and to query by network connectivity.

i) ChIP-on-Chip Experiment Information Usage

The results of ChIP (Chromatin Immunoprecipitation) on chip experiments are utilised in [80–82]. The recent appearance of genome-wide binding assays such as ChIP-on-chip has significantly increased the chance of reliably identifying actual binding site regions [83,81]. A short description of the ChIP-on-chip technology can be found in [84,85]. Several motif finding and network discovery algorithms have been developed based on this new type of data. MDscan was designed for motif finding from sequences obtained from a ChIP-on-chip experiment, using a semi-Bayesian scoring function [86]. An improved version of MDscan, Motif Regressor [87], identifies a set of non-redundant candidate motifs using MDscan followed by linear regression analysis. The motifs whose promoter matching scores are significantly correlated with ChIP-on-chip enrichment or downstream gene expression values are then selected.
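The selection step just described can be sketched as follows. This is a deliberately simplified stand-in rather than Motif Regressor itself: each motif is tested on its own with a Pearson correlation between per-gene promoter matching scores and expression values (for a fixed number of genes, ranking by |r| matches ranking by the slope t-statistic of a one-predictor linear regression), whereas [87] fits a regression model over candidate motifs. The function names, data and 0.5 cutoff are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_motifs(scores, expression, min_abs_r=0.5):
    """scores: {motif: per-gene promoter matching scores};
    expression: per-gene log expression ratios (same gene order).
    Returns {motif: r} for motifs passing the cutoff, strongest first."""
    kept = {m: pearson(s, expression) for m, s in scores.items()}
    kept = {m: r for m, r in kept.items() if abs(r) >= min_abs_r}
    return dict(sorted(kept.items(), key=lambda item: -abs(item[1])))

if __name__ == "__main__":
    expression = [2.0, 1.5, 0.1, -0.2, 0.0]          # hypothetical log ratios
    scores = {
        "motif_A": [4.0, 3.0, 0.5, 0.0, 0.2],         # tracks expression
        "motif_B": [1.0, 0.0, 1.0, 0.0, 1.0],         # unrelated pattern
    }
    print(select_motifs(scores, expression))
```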
However, Motif Regressor is not widely used because of its computational intensity. The computational modelling of genome binding events based on ChIP-on-chip data has been extensively developed in [84,81,82,88–90]. Joint Binding Deconvolution [84], a recently presented probabilistic method, combines additional experimental data about Chromatin Immunoprecipitation with sequence information to improve the spatial resolution of transcription factor binding locations. Several methods [81,90,91,77] combine different kinds of data: [81,90] use the GRAM (Genetic Regulatory Modules) algorithm [81] to cluster genes based on both ChIP data and gene expression data. Another recently developed computational approach, ‘ReMoDiscovery’ [77], concurrently exploits three independent data sources: ChIP-chip data, motif information and gene expression profiles.

3. EVOLUTION STUDIES OF REGULATORY DNA SEQUENCES

3.1. Evolution of cis-Regulatory Sequences

Gene expression is central to the genotype–phenotype relationship in all organisms, and it is an important component of the genetic basis for evolutionary change in diverse aspects of phenotype. However, the evolution of transcriptional regulation remains understudied and poorly understood. No general framework exists for understanding, interpreting and predicting how transcription evolves. The first comprehensive attempts to build one are presented in [4,92]. In [4], evolutionary patterns within regulatory regions and elements are studied in detail in terms of macro and micro changes and their rates.

3.2. Comparative Genomics and Types and Rates of Evolution within Conserved Non-coding Elements

In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein-coding DNA contains important elements directing the development and physiology of the organism, such as promoters, enhancers, insulators and micro-RNA genes.
It is not easy to recognise these elements and study their molecular evolution and function from the sequence alone [93–95]. In the general case, where the transcription factor target sites are not known in advance, interspecific sequence comparison is now the method of choice [92] for physically identifying putative cis-regulatory modules in the intronic or intergenic DNA sequence of given animal genes. These key regulatory units of the genome [96] are evolutionarily conserved relative to flanking sequence. Thus, cis-regulatory modules can be detected computationally by interspecific comparison of the sequence surrounding the gene of interest, where they appear as blocks of sequence that have remained relatively similar between two or more species. Following this suggestion [92], several highly conserved non-coding sequences have recently been identified in vertebrate genomes [97–99]. When some of these sequences were tested in vivo, the majority indeed appeared to drive tissue-specific gene expression during early development. However, several studies indicate that the system is more complicated, with regulatory regions having an underlying pattern of evolution not directly visible from simple sequence comparisons [100–102]. To overcome this problem, the authors of [103] propose an approach to studying the rate of evolution of functional conserved non-coding elements (CNEs) at a macroevolutionary scale by means of cross-genomic comparison of two outgroup species. They conclude that the method can be used to test hypotheses about the rate and pattern of evolution of putative cis-regulatory elements.

When speaking about types of evolution, a higher eukaryotic genome such as the human genome is often considered to consist of three sequence types, each distinguished by its mode of evolution.
Purifying selection is estimated to act on 2.5–5.0% of the genome [104], whereas virtually all remaining sequence is considered to have evolved neutrally and to be non-functional. The third mode of evolution, positive selection of advantageous changes, is considered rare. In [104] the authors review the evolutionary evidence for the majority of human-conserved DNA lying outside the protein-coding sequence. They argue that within this non-coding fraction lies at least 1 Mb of functional sequence that has accumulated many beneficial nucleotide replacements, suggesting that a significant proportion of human sequence may have evolved adaptively and thus diverged to a greater extent than expected under neutral evolution. A pattern of adaptive evolution in conserved non-coding DNA of Drosophila was confirmed in [105]. Enough sequences from related organisms are now available to detect positive selection acting on only a portion of a gene in a subset of the sequences. The authors of [106] used this comparative approach to understand the adaptations that E. coli has made to colonise and survive in the urinary tract. Regulatory mechanisms can also be compared between species by means of genome comparison. Studies of conserved non-coding sequences [107], which are putative regulatory elements, indicate that plants have far fewer CNEs per gene than mammals, suggesting that plants have less complex regulatory mechanisms.

3.3. Regulation and Phenotypic Changes

A considerable proportion of heritable human phenotypic variation (including disease) is thought to result from altered gene expression. Disease-associated single nucleotide polymorphisms (SNPs) are not confined to coding regions: there is strong evidence from experiments and association studies that SNPs within regulatory regions may cause disease [108–111].
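A common way to ask whether a regulatory SNP might disrupt a binding site, one of the two strategies used by the tools discussed in this subsection, is to compare PWM scores for the reference and alternative alleles. The sketch below assumes a toy count matrix, a uniform background, a pseudocount of 0.5, forward-strand scanning only, and a sequence at least as long as the motif; none of these choices, nor the function names, is taken from PupaSNP Finder or rSNP_Guide.

```python
from math import log2

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores vs a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def best_covering_score(seq, pwm, pos):
    """Highest PWM score among windows of `seq` that overlap index `pos`
    (assumes len(seq) >= motif width and uppercase A/C/G/T only)."""
    width = len(pwm)
    starts = range(max(0, pos - width + 1), min(pos, len(seq) - width) + 1)
    return max(sum(col[b] for col, b in zip(pwm, seq[s:s + width])) for s in starts)

def snp_score_change(seq, pos, alt, pwm):
    """Alternative-allele minus reference-allele best covering score
    (negative values suggest the variant weakens a predicted site)."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return best_covering_score(alt_seq, pwm, pos) - best_covering_score(seq, pwm, pos)

if __name__ == "__main__":
    # Hypothetical motif that strongly prefers "TGA".
    pwm = log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}])
    print(snp_score_change("CCTGACC", 3, "C", pwm))   # G->C inside the site
    print(snp_score_change("CCTGACC", 0, "G", pwm))   # change outside the site
```

For this toy motif the first call returns about -4.39 (that is, log2(1/21)), because the G to C change destroys one matched position, and the second returns 0.0. A score drop alone does not establish a functional effect, which is exactly the difficulty noted in the text.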
Several reviews have argued that changes in transcriptional regulation constitute a major component of the genetic basis for phenotypic evolution [1, 112–116]. By combining experimental evidence and computation, it has also been demonstrated that the promoter regions of human genes provide a rich source of functional single nucleotide polymorphisms [117,118]. As many as 35% of promoter SNPs may be of functional significance [118]. However, identifying and evaluating them is unlikely to be easy [119–121]. Progress in characterising regulatory SNPs has been limited by our relative inability to discriminate between functional and non-functional SNPs. Existing bioinformatics tools such as PupaSNP Finder (http://www.pupasnp.org/) [122] and rSNP_Guide (http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/) [123] attempt to do this either by assessing the degree of evolutionary conservation displayed by the variant site or by estimating whether or not the polymorphic variant disrupts transcription factor binding. Both the TRANSFAC database [124] and an annotated database of human promoter SNPs [125,126] have commonly been used to assess the potential effect of a given polymorphic variant on transcription factor binding and gene expression. There are, however, currently no computational tools that can assess directly from promoter DNA sequence whether or not a given variant is likely to alter gene expression and hence be of functional significance. In [127], an attempt was made to define some of the characteristics of promoter polymorphisms with functional effects on gene expression. Sequence variants that altered gene expression by 1.5-fold or more were strongly biased toward locations in the core and proximal promoter regions. A recent study [128] aimed to identify DNA sequence features that could allow in silico estimation of the likely functional consequences of single nucleotide changes in human gene promoter regions.

3.4. Regulation and Duplication

The idea that gene duplication has a major role in evolution was developed by Susumu Ohno in his classic book “Evolution by Gene Duplication” (1970); gene duplication is now widely accepted as a major evolutionary force [129].
Gene duplication occurs when an error in homologous recombination, a retrotransposition event, or duplication of an entire chromosome leads to the duplication of a region of DNA containing a gene [130]. As a result, one copy of the duplicated gene is often freed from selective pressure. This copy may either acquire mutations that lead to a gene with a novel function or acquire deleterious mutations and become a pseudogene. Regulatory elements are known to participate in organismal evolution [130,131]; after gene duplication, these elements therefore affect subfunctionalisation and neofunctionalisation processes. Gene duplication also contributes to the evolution of gene regulatory networks. Comparative genome analyses reveal a surprising constancy in genetic content: vertebrate genomes have only about twice as many genes as invertebrate genomes, and the increase is primarily due to the duplication of existing genes rather than the invention of new ones. It has been suggested [132] that organismal complexity arises from progressively more elaborate regulation of gene expression. Accordingly, the authors of [133] propose a model in which gene duplication and the evolution of cis-regulatory elements can be considered responsible for morphological diversity and the emergence of the modern vertebrate body plan.

4. INFORMATION, STATISTICAL AND STRUCTURAL PROPERTIES OF REGULATORY REGIONS

4.1. Periodicities, Repeats and Information Content within Regulatory DNA

Regulatory Motifs are Often Placed Periodically within Regulatory DNA, the Periods Reflecting DNA 3D Structure

Multiple binding motifs, and even multiple binding sites for the same motif, present in regulatory regions are often described as ‘regulatory clusters’ [6,60,61,133–135].
Statistical models based on motif clustering are helpful for finding novel CRMs in the genome, but they very often consider only site density (cluster significance) [68] and relative site affinity (such as a weighted matrix score) [135]. However, specific arrangements of binding motifs within regulatory regions are known to be necessary for proper biological function. The biological reasons for a specific arrangement of sites in promoters are clear: the transcription factors bound to promoter DNA are also involved in specific protein–protein interactions [136,137], so the binding motifs must be distributed in the promoter in a non-random fashion. In other words, the arrangement of binding motifs can control the formation of the 3D protein complexes involved in initiating specific transcription.

One of the Best-Known Periods within Regulatory DNA is 10–11 bp

Specific arrangements of binding sites are known from many examples in biology [138,139]. In vitro analysis of binding site arrangements in the rat collagenase-3 promoter [140] revealed that a 10 bp (‘helical’) phasing in binding site distribution provides maximal transcriptional activity. The importance of ‘helical phasing’ and specific binding site arrangement was also demonstrated in vivo for the murine CD4 promoter [141]. The 10 bp ‘helical phasing’ has also been demonstrated computationally [142] using a large number of proximal eukaryotic promoters [143] and the list of binding motifs available from the TRANSFAC database [9]. The authors of [144] explored distance preferences in the arrangement of binding motifs for five transcription factors (Bicoid, Krüppel, Hunchback, Knirps and Caudal) in a large set of Drosophila cis-regulatory modules (CRMs). Analysis of non-overlapping binding motifs revealed the presence of periodic signals specific to particular combinations of binding motifs.
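A minimal way to look for such distance periodicities is to collect the distances between predicted sites of two motifs and measure, for each candidate period, how consistently those distances fall in the same phase (a discrete Fourier-style coherence). This is an illustrative sketch, not the statistic used in [142] or [144]; the function name, the 100 bp distance cutoff and the 6–20 bp period range are assumptions, the range being chosen to cover the 10–11 bp helical repeat while excluding short harmonics such as 5 bp.

```python
from math import cos, sin, pi, hypot

def distance_periodicity(pos_a, pos_b, max_dist=100, periods=range(6, 21)):
    """For each candidate period p, return the normalised phase coherence
    |sum_d exp(2*pi*i*d/p)| / n of the A-to-B distances d (0 < d <= max_dist).
    Values near 1 mean the distances are concentrated on multiples of p."""
    dists = [b - a for a in pos_a for b in pos_b if 0 < b - a <= max_dist]
    if not dists:
        return {}
    power = {}
    for p in periods:
        real = sum(cos(2 * pi * d / p) for d in dists)
        imag = sum(sin(2 * pi * d / p) for d in dists)
        power[p] = hypot(real, imag) / len(dists)
    return power

if __name__ == "__main__":
    # Hypothetical sites: motif B always lies a multiple of 10 bp downstream of A.
    power = distance_periodicity([0], [10, 20, 30, 40, 50, 60])
    best = max(power, key=power.get)
    print(best, round(power[best], 3))
```

For these synthetic positions the strongest period is 10 bp with coherence close to 1. On real data the coherence would need to be compared against shuffled or randomly placed sites before calling a period significant.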
The most striking periodic signals (10 bp for Bicoid and 11 bp for Hunchback) suggest preferential positioning of some binding site combinations on the same side of the DNA helix. The authors also analysed distance preferences in arrangements of highly correlated overlapping binding motifs, such as Bicoid and Krüppel.

Genomic DNA sequences contain a wealth of information about the bendability and curvature of the DNA molecule. For example, the well-known 10–11 bp periodicities within genomes can be attributed to supercoiled structures or wrapping around nucleosomes. In [145], the authors analyse correlation functions of relatively long motifs such as tetramers or poly(A) sequences. Periodically placed motifs may indicate regular protein binding or curvature signals. The authors detected various periodic signals, e.g. strong 10–11 bp oscillations of periodically placed poly(A), poly(T) or poly(W) stretches.

Composite Elements

A very successful by-product of site arrangement studies is the concept of composite elements (CEs) [146,147]. In the simplest case, a CE corresponds to a pair of individual binding motifs located at a particular distance from each other and involved in the formation of specific tertiary (DNA–protein–protein–DNA) complexes. Identical CEs may perform related functions in different genes. Further development of this concept resulted in the construction of a dedicated database, TRANSCompel [148], containing sequences for 256 CEs from different organisms. Currently, the CE concept is widely used for finding co-localised, synergistic (or antagonistic) binding motif pairs [149,150] or combinatorial arrays of motifs responsible for the formation of similar gene expression profiles [151–153].

Tandem Repeats and Regulatory Elements

Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest.
They are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. Using the TandemSWAN tool, the authors of [154] compared the structure and occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions, including UTRs, heterochromatic, intergenic and enhancer sequences of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with period three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are over-represented in all non-coding segments. Periods of 5–7 and 11–14 were characteristic of enhancer regions and other non-coding regions close to genes.

Information Content

Information content (IC, also called relative entropy; see [155]) is commonly used to measure TFBS motif affinity and significance [155–157]. Statistical methods for computing the P-value of IC have been defined in [158,159]. Several algorithms also use the IC measure to identify optimal motifs from input sequences [160–163].

4.2. Statistics of Regulatory DNA

We unavoidably use statistics to describe and model regulatory DNA and its activity. From biological observations and computational studies [92,4] we know some statistical properties of regulatory sequences: binding motif over-representation, motif spatial distribution, and the combinatorial nature and context-dependent behaviour of regulatory elements. It has also been suggested [4] that these properties probably originated from the evolutionary processes governing the development of transcription. Generally, when we do not have a clear understanding of a process (here, transcription regulation), we attempt to describe it statistically and learn from known examples. In this subsection we briefly review computational approaches to modelling regulatory DNA sequence and capturing its statistical properties.
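Of the statistical properties just listed, motif over-representation is the simplest to quantify. A minimal sketch, assuming that motif hits occur along the background genome as a Poisson process with a constant per-base rate (a simplification that ignores GC content, repeats and overlapping hits): the expected number of hits in the target sequences is λ = (background hits / background length) × target length, and the upper-tail probability P(X ≥ observed), with X ~ Poisson(λ), serves as a P-value. The function names and numbers are hypothetical.

```python
from math import exp, lgamma, log

def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return exp(i * log(lam) - lam - lgamma(i + 1))

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # The upper tail is large here, so 1 - P(X <= k - 1) loses little precision.
        return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))
    # For k > lam the terms shrink from i = k onward; sum until negligible.
    total, i, term = 0.0, k, poisson_pmf(k, lam)
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return total

def motif_enrichment(observed, target_len, background_hits, background_len):
    """Return (expected hits, fold enrichment, Poisson P-value) for a target set."""
    expected = background_hits * target_len / background_len
    fold = observed / expected if expected > 0 else float("inf")
    return expected, fold, poisson_upper_tail(observed, expected)

if __name__ == "__main__":
    # Hypothetical: 30 hits in 10 kb of promoters vs 100,000 hits per 100 Mb genome-wide.
    print(motif_enrichment(30, 10_000, 100_000, 100_000_000))
```

Here 10 hits are expected, so 30 observed hits give a threefold enrichment with a very small P-value. A genome-wide scan over many motifs would additionally need a multiple-testing correction.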
The statistical models and properties of regulatory DNA are used for different purposes: motif discovery, regulatory module search, construction of evolutionary models and GRNs, etc. Some of these approaches were already mentioned in the previous subsections. Many statistics have been developed to reflect different biological aspects of TFBS motifs: their internal position dependencies [164], and their spatial distribution within regulatory regions [165,166] and genome-wide [6,61,133].

4.2.1. Models of TF Binding Motifs and their Statistics

Genomic regulatory elements are commonly represented by DNA motifs. Recurrent motifs in a collection of DNA sites are most frequently modelled by sequence patterns (also called regular expressions) or by position weight matrices (PWMs, also called profiles or position-specific scoring matrices, PSSMs). DNA sequence patterns are simpler to represent, and their significance in the genome is easier to compute with statistical methods. PWMs capture the variability of a collection of DNA sites quantitatively, which is not possible with DNA patterns. The most important practical application of these models is scanning DNA sequences for new regulatory element candidates. Currently, there are two main databases containing TF binding site profiles [9,167]. JASPAR [167] contains a smaller, non-redundant set, while TRANSFAC [9] contains multiple profile models for some TFs.

Biological Issues Relating to the Predictions of Functional TF-Binding Sites

Site predictions with PWMs and DNA patterns can suffer from high false positive rates if motifs are degenerate. Despite these limitations, PWM models work fairly well in representing the specificity of DNA-binding sites and predicting the probability of TF binding to a given site [168,169].
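Two uses of a PWM mentioned above, summarising a motif's information content (Section 4.1) and scanning sequences for candidate sites, can be sketched together. This minimal example assumes a uniform background, a pseudocount of 0.5, standard log2-odds scoring, forward-strand scanning only and sequences containing only uppercase A, C, G and T; the toy count matrix, function names and score threshold are hypothetical, and the threshold directly controls the false-positive problem noted above.

```python
from math import log2

BASES = "ACGT"

def column_frequencies(counts, pseudocount=0.5):
    """Per-position base frequencies with a pseudocount added to every base."""
    freqs = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        freqs.append({b: (column.get(b, 0) + pseudocount) / total for b in BASES})
    return freqs

def information_content(counts, background=0.25):
    """Relative entropy (bits) of the motif against a uniform background,
    summed over positions; 2 bits per position is the maximum for DNA."""
    return sum(p * log2(p / background)
               for column in column_frequencies(counts) for p in column.values())

def scan(seq, counts, threshold, background=0.25):
    """Return (start, score) for every forward-strand window whose
    log2-odds score (sum over positions of log2(p / background)) >= threshold."""
    pwm = [{b: log2(p / background) for b, p in col.items()}
           for col in column_frequencies(counts)]
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(col[b] for col, b in zip(pwm, seq[start:start + width]))
        if score >= threshold:
            hits.append((start, score))
    return hits

if __name__ == "__main__":
    counts = [{"T": 10}, {"G": 10}, {"A": 10}]      # hypothetical "TGA" motif
    print(round(information_content(counts), 2))
    print(scan("CCTGACCTGA", counts, threshold=5.0))
```

For this toy motif the information content is about 3.77 bits out of a possible 6, below the maximum because of the pseudocounts, and the scan reports the two exact "TGA" occurrences at positions 2 and 7. A degenerate motif would have lower information content and, at the same threshold, many more chance hits.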
Some TFs are by nature only moderately or poorly specific in their DNA binding and achieve higher specificity only in the context of other binding partners [170,171]. One also has to remember that chromatin structure and DNA methylation play important roles in gene regulation [172–174].

Positional Dependencies Within TFBS

The simple PWM approach assumes that each position within a DNA site contributes independently to the binding free energy of the TF (mononucleotide model). The score of an aligned substring is the log-likelihood of the substring under a product multinomial distribution. PWM scores can also be described in a physical framework as the sum of binding energies for all nucleotides aligned with the PWM [175,176]. The independence assumption, however, has been shown not to hold in many (around 25%) cases [164,177,178]. In such cases, higher-order models are better representations and give more accurate predictions when searching for candidate sites [164,178,179, 181]. Most higher-order models [164,178–183] assume dinucleotide interactions, which require less information [9] than higher orders and have been confirmed by experimental data [177,178]. Many different position-dependent extensions to basic PWMs have been proposed in the literature [164,184, 185]. Some authors [186] model probabilistic motifs as nth-order Markov chains, and a variable-length Markov model (VLMM) [187] has been proposed for motifs with varying dependencies.

4.2.2. CRM Statistics

Different statistics are typically measured within cis-regulatory modules:
o significant maximal distance between single motifs [189–193]. Some works [63,194,195] model the inter-motif distance with a Poisson distribution. The distance between motifs is expected to be constant (conserved) within regulatory modules [133,196];
o combinations of single motif occurrences [197–199, 133] and their inter-motif distances [68,190,194,75, 200].
o multiple motif occurrences [201–202,191,51,63], when the presence of multiple TFBS and their interactions is assumed;
o density of motifs in a window [68]. They apply a logistic regression model to assess the significance of motif combinations. Because regulatory regions may consist of different CRMs, they compute

4.2.3. Genome-wide Statistics

o genome-level over-representation of a motif: the genome-wide motif frequency is used as a background expectation [64,205–208], and genes are sought around which more motifs are observed than expected;
o genome-wide over-representation of motif combinations [209].

4.2.4. Correspondence with Experimental Data

o correspondence between microarray data and combinations of single motifs and modules [210–214]. Sometimes this correspondence takes the form of a joint likelihood [213];
o correspondence between motif over-representation and cross-species conservation [215].

4.2.5. Statistical Motif Discovery Algorithms

The statistical findings mentioned above are usually exploited in statistical motif discovery algorithms. These algorithms are typically based on either iterative refinement or stochastic optimisation. Expectation maximisation (EM) [216–220] is the most widely used iterative refinement method, but variable EM [194] has also been used. The stochastic optimisation technique most commonly used for motif discovery is Gibbs sampling [221–225], sometimes combined with general Metropolis-Hastings sampling [222,226,227]. More recently, genetic algorithms [228], evolutionary Monte Carlo [226] and simulated annealing [229,211,215] have also been developed and successfully applied by the bioinformatics community.

4.3. Regulation and Structural DNA Properties

The use of structural information [230–236] has considerable potential for assisting regulatory region prediction.
For example, methods that predict regions of helix destabilisation have been developed to help reduce false positives [237], and distance correlations between large sets of elements have been used to identify over-represented correlations without the need for training data [238]. The three-dimensional structure of DNA, densely packed as chromatin, inhibits transcriptional initiation in vivo [239]. The bendability of a region, as well as its position in DNA loops, may indicate whether or not it contains regulatory elements. Several works take DNA structure into account, for example [75], and [240,241], which study helical parameter features. However, structural approaches have so far been of limited general use, since the derivation of quantitative structural parameters depends on the small number of solved protein–DNA complex structures.

The distance between the transcription start site (TSS) and the translation start site (TLS) has been explicitly used in promoter prediction algorithms [242]. Some algorithms incorporate this information informally, by restricting their search to regions defined relative to the TLS [243], while others incorporate probability distributions of the TSS–TLS distance.

As noted above, chromatin structure and DNA methylation play important roles in gene regulation [172–174]. Large portions of chromosomal DNA are sequestered by histones as part of the nucleosomal structure and are therefore not accessible for TF binding. DNA methylation can inhibit the interaction of regulatory proteins with their cognate DNA sites and can also influence chromatin structure. When searching genome-wide for putative binding sites with a motif model, one typically does not know which chromosomal regions are open for binding by regulatory proteins, or what the binding partners of a given TF are.
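A rough sense of why a blind genome-wide scan returns so many candidate sites comes from counting chance matches. The sketch below is a back-of-the-envelope calculation, not a published method: it assumes that genomic positions are independent draws from a uniform base composition (real genomes are neither uniform nor independent), a round genome length of 3 × 10^9 bp, and purely illustrative patterns; `match_probability` and `expected_chance_hits` are hypothetical helper names. Under these assumptions, a pattern of length L scanned on both strands of a genome of length N is expected to match by chance 2(N - L + 1) times the product, over positions, of the total background frequency of the bases allowed at each position.

```python
def match_probability(pattern, base_freqs=None):
    """Chance that one random position matches a degenerate pattern.

    pattern: list of strings, each listing the bases allowed at that position,
    e.g. ["T", "GT", "A"] for T[GT]A. Positions are assumed to be independent
    draws from base_freqs (uniform by default), which real genomes are not.
    """
    if base_freqs is None:
        base_freqs = {b: 0.25 for b in "ACGT"}
    p = 1.0
    for allowed in pattern:
        p *= sum(base_freqs[b] for b in allowed)
    return p


def expected_chance_hits(pattern, genome_length, both_strands=True, base_freqs=None):
    """Expected number of random matches when scanning a whole genome."""
    windows = genome_length - len(pattern) + 1
    strands = 2 if both_strands else 1
    return strands * windows * match_probability(pattern, base_freqs)


GENOME_LENGTH = 3_000_000_000  # assumed round figure for a mammalian genome

exact_site = list("TGACTCAG")  # 8 fully specified positions (illustrative)
degenerate = ["T", "GT", "A", "CG", "T", "C", "A"]  # T[GT]A[CG]TCA (illustrative)

print(round(expected_chance_hits(exact_site, GENOME_LENGTH)))  # 91553
print(round(expected_chance_hits(degenerate, GENOME_LENGTH)))  # 1464844
```

Under these simplifying assumptions, an exactly specified 8-bp site is expected about 91,553 times by chance, and a 7-bp pattern with two twofold-degenerate positions about 1.46 million times. Numbers of this size make clear why sequence matching alone cannot separate functional from non-functional sites.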
Without such information, a blind search of the entire genome usually returns a large number of sites, many of which would probably bind the TF if the DNA were open for binding, but which are biologically non-functional in vivo. Regulatory sequences may often lie outside the nucleosome structure, or at least be accessible part of the time owing to chromatin remodelling [244], so the rate of non-functional site predictions in these sequences is likely to be lower than in other parts of the genome. However, without information on DNA accessibility, methylation status or other binding partners, binding site predictions with individual models are likely to produce many false positives.

The genomic context of a regulatory element is important for its activity, and the presence of CpG islands may be a relevant factor. Both high GC content and the presence of CpG islands may indicate [245] that a region contains regulatory elements.

5. REGULATORY RNA

Cis-regulatory elements are not the only participants in the regulation of gene expression. It is now evident [246–250] that non-protein-coding RNA (ncRNA) plays a critical role in regulating the timing and rate of protein translation. Large-scale cDNA sequencing and genome tiling array studies have shown [248] that around 50% of human genomic DNA is transcribed; of this, only about 2% is translated into protein, and the remaining 98% consists of ncRNAs. Today, we know very little about the regulatory mechanisms and functions of these ncRNAs. Indeed, recent evidence [250] suggests that the majority of the genomes of mammals and other complex organisms is in fact transcribed into ncRNAs, many of which are alternatively spliced and processed into smaller products.
These ncRNAs include microRNAs and snoRNAs (many, if not most, of which remain to be identified), probably other classes of yet-to-be-discovered small regulatory RNAs, and tens of thousands of longer transcripts (including complex patterns of interlacing and overlapping sense and antisense transcripts), most of whose functions are unknown. These RNAs (including those derived from introns) appear to comprise a hidden layer of internal signals that control various levels of gene expression in physiology and development, including chromatin architecture and epigenetic memory, transcription, RNA splicing, editing, translation and turnover. Regulatory RNA may play a significant role in disease and may be responsible for a large amount of the genetic variation within and between species. The potential importance of ncRNAs is suggested by the observation that the complexity of an organism correlates poorly with its number of protein-coding genes but strongly with its number of ncRNA genes, and that only a small fraction (2–3%) of transcripts in the human genome are actually translated into proteins.

The review [246] discusses several known RNA-based mechanisms for the regulation of protein synthesis. In eukaryotes, ncRNAs have been shown [247] to operate at virtually every level of the transmission of genetic information.

Recently, regulatory motifs for microRNA genes were discovered by cross-genome comparison [57]. In this work, a comparative analysis of the human, mouse, rat and dog genomes was performed to create a systematic catalogue of common regulatory motifs in promoters and 3’ untranslated regions (3’ UTRs). Nearly half of the 3’-UTR motifs, which are involved in post-transcriptional regulation, are associated with microRNAs (miRNAs), leading to the discovery of many new miRNA genes and their likely target genes.
Their results suggest that previous estimates of the number of human miRNA genes were low, and that miRNAs regulate at least 20% of human genes. 6. CONCLUSIONS Significant progress in transcription regulation understanding in silico has been achieved, however, much more research is required. The field of gene expression regulation brings together researchers from several disciplines, in particular from biology, statistics and informatics. Additionally, research in the field is fairly recent and moving at a fast pace. This has resulted in a broad range of computational methods that are described with different vocabulary and different foci, different ways regarding the integration of experimental information. It is not easy to navigate within and understand this information. Clearly, significant advances have been made over the past two and half decades in methods of the DNA regulatory elements representation, modelling and identification. However, our knowledge of the transcriptional regulation of gene expression, its variation and evolution is still limited. It is hoped that integration of techniques and experiences across from existing approaches will give rise to refined and advanced methods with higher level of regulatory mechanism understanding than what we have seen so far. ACKNOWLEDGEMENTS We are thankful to Graham Ellis for support and to Rene te Boekhorst for hard work with references. REFERENCES 1. Davidson E. Genomic Regulatory Systems. Academic Press; 2001. 2. Arnone M, Davidson EH. The hardwiring of development: organization and function of genomic regulatory system. Development. 1997;124:1851–64. [PubMed] 3. Dermitzakis M, Reymond A, Antonarakis S. Conserved non-genic sequences-an unexpected feature of mammalian genomes. Nat Rev Genet. 2005;6:151–157. [PubMed] 4. Wray GA, Hahn M, Abouheif E, Balhoff J, Pizer M, Rockman M, Romano L. The Evolution of Transcriptional Regulation in Eukaryotes. Mol Biol Evol. 2003;20(9):1377–1419. [PubMed] 5. 
Yuh C, Bolouri H, Davidson E. Genomic cis-regulatory logic: functional analysis and computational model of a sea urchin gene control system. Science. 1998;279:1896–1902. [PubMed] 6. Berman B, Nibu Y, Pfeiffer P, Tomancak P, Celniker S, Rubin G, Levine M, Eisen M. Exploiting TFBS clustering to identify CRM involved in pattern formation in Drosophila genome. Proc Natl Acad Sci USA. 2002;99(2):757–762. [PubMed] 7. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, low K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O’Connor MJ, 
Okazaki Y, Oliver K, Larty EO, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Strange-Thomann NS, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Vidal AU, Vinson JP, von, Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. [PubMed] 8. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck 
JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de, la, Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de, Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. [PubMed] 9. 
Matys V, Fricke E, Geffers R, Gößling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Münch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E. TRANSFAC®: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003;31:374–378. [PubMed] 10. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–151. [PubMed] 11. Bower J, Bolouri H, editors. Computational Modeling of Genetic and Biochemical Networks, Computational Molecular Biology Series. MIT Press; 2001. 12. Smolen P, Baxter DA, Byrne JH. Modeling transcriptional control in gene networks - methods, recent results, and future directions. Bull Math Biol. 2000;62:247–92. [PubMed] 13. Shmulevich IE, Dougherty R, Kim S, Zhang W. Probabilistic Boolean Networks. A Rule-based Uncertainty Model for Gene Regulatory Networks. Bioinformatics. 2002;18(2):261–274. [PubMed] 14. Brazma A, Schlitt T. Reverse engineering of gene regulatory networks: a finite state linear model. Genome Biol. 2003;4:P5. 15. Akutsu T, Miyano S, Kuhara S. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac Symp Biocomput. 1999:17–28. [PubMed] 16. Chen M. Modelling and Simulation of Metabolic Networks: Petri Nets Approach and Perspective. In: Amorski K, editor. Modelling and Simulation. Darmstadt: 2002. pp. 441–444. Proceedings of ESM2002. 17. Genrich H, Kueffner R, Voss K. Executable Petri Net Models for the Analysis of Metabolic Pathways. Int J Soft Tools Technol Tran. 2001;3:394–404. 18. Friedman N, Goldszmidt M, Wyner A. Data Analysis with Bayesian Networks: A Bootstrap Approach. 1999 UAI 99. 19. Barash Y, Friedman N. Context-Specific Bayesian Clustering for Gene Expression Data. J Comput Biol. 2002;9(2):169–91. [PubMed] 20. Zou M, Conzen S. 
A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course micro-array data. Bioinformatics. 2005;21(1):71–79. [PubMed] 21. Hartemink A. Bayesian Networks and Informative Priors: Transcriptional Regulatory Network Models. In: Do KA, Müller P, Vannucci M, editors. Bayesian Inference for Gene Expression and Proteomics. Cambridge, UK: Cambridge University Press; 2006. pp. 401–424. 22. Toh H, Horimoto K. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics. 2002;18:287–297. [PubMed] 23. Wu X, Ye Y, Subramanian K. Interactive analysis of gene interactions using graphical Gaussian model. ACM SIGKDD Workshop on Data Mining in Bioinformatics. 2003;3:63–69. 24. Waddell PJ, Kishino H. Cluster inferences methods and graphical models evaluated on NCI60 microarray gene expression data. Genome Inform. 2000;11:129–140. 25. Chen T, He HL, Church GM. Modeling gene expression with differential equations. Pac Symp Biocomput. 1999:29–40. [PubMed] 26. Hasty J, McMillen D, Isaaks F, Collins J. Computational studies of gene regulatory networks: in numero molecular biology. Nat Rev Genet. 2001;268(2):268–279. [PubMed] 27. Siggia E. Computational methods for transcriptional regulation. Curr Op Genetics Dev. 2005;15:214–221. 28. Oliveri P, Davidson E. Gene regulatory network controlling embrionic specification in the sea urchin. Curr Op Genetics Dev. 2004;14:351–360. 29. Li H, Wong W. Model based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001;98:31–36. [PubMed] 30. Shen-Orr S, Mhilo R, Mangan S, Alon U. Network motifs in transcriptional regulation network of E. coli. Nat Genet. 2002;31:64–68. [PubMed] 31. 
Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, Wheeler R, Wong B, Drenkow J, Yamanaka M, Patel S, Brubaker S, Tammana H, Helt G, Struhl K, Gingeras TR. Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004;116:499–509. [PubMed] 32. Euskirchen G, Royce TE, Bertone P, Martone R, Rinn JL, Nelson FK, Sayward F, Luscombe NM, Miller P, Gerstein M, Weissman S, Snyder M. CREB binds to multiple loci on human chromosome 22. Mol Cell Biol. 2004;24:3804–3814. [PubMed] 33. Abnizova I, Gilks WR. Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in vertebrate genomes. Brief Bioinform. 2006;7(1):48–54. [PubMed] 34. Vavouri T, Elgar G. Prediction of cis-regulatory elements using binding site matrices-the success, the failures and the reasons for both. Curr Opin Genet Develop. 2005;15(4):395–402. 35. Brazma AI, Jonassen I, Vilo J, Ukkonen E. Pedicting gene regulatory elements in silico on a genomic scale. Genome Res. 1998;8:1202–1215. [PubMed] 36. Pavesi G, Mauri G, Pesole G.
In silico representation and discovery of transcription factor binding sites. Brief Bioinform. 2004;5(3):217–36. [PubMed] 37. Wasserman WW, Krivan W. In silico identification of metazoan transcriptional regulatory regions. Naturwissenschaften. 2003;90(4):156–66. [PubMed] 38. Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol. 2003;5:201. [PubMed] 39. Sandve GK, Finn D. A survey of motif discovery methods in an integrated framework. Biol Direct. 2006;1:11. [PubMed] 40. van Helden J. Metrics for comparing regulatory sequences on the basis of pattern counts. Bioinformatics. 2004;20:399–406. [PubMed] 41. Frith M, Fu Y, Chen J, Hansen U, Weng Z. Detection of functional motifs via statistical representation. Nucleic Acid Res. 2004;32:1372–1381. [PubMed] 42. Wasserman WW, Fickett JW. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 1998;278:167–181. [PubMed] 43. Blanchette M, Bataille A, Chen X, Poitras C, Laganière C, Deblois G, Giguere V, Feretti V, Bergeron D, Coulombe B, Robert F. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 2006;16:656–68. [PubMed] 44. Hannenhalli S, Levy S. Promoter prediction in the human genome. Bioinformatics. 2001;17(Suppl 1):S90–96. [PubMed] 45. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005;23:137–44. [PubMed] 46. Li N, Tompa M. Analysis of computational approaches for motif discovery. Algorithms Mol Biol. 2006;1:8. [PubMed] 47. Hartemink AJ, Gifford D, Jaakkola T, Young R. 
Combining location and expression data for principled discovery of genetic regulatory network models. Pac Symp Biocomput. 2002:437–49. [PubMed] 48. Kielbasa SM, Blüthgen N, Sers C, Schäfer R, Herzel H. Prediction of cis-regulatory elements of coregulated genes. Genome Inform. 2004;15:117–124. [PubMed] 49. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nature Genet. 2001;29:153–159. [PubMed] 50. Kreinman G. Identification of sparsely distributed clusters of cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res. 2004;32(9):2889–2900. [PubMed] 51. Conlon EM, Liu XS, Lieb JD, Liu JS. Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci USA. 2003;100(6):3339–44. [PubMed] 52. Wasserman WW, Palumbo M, Thompson W, Fickett JW, Lawrence CE. Human-mouse genome comparisons to locate regulatory sites. Nature Genet. 2000;26:225–228. [PubMed] 53. Dieterich C, Wang H, Rateitschak K, Luz H, Vingron M. CORG: a database for comparative regulatory genomics. Nucleic Acids Res. 2003;31:55–57. [PubMed] 54. Blanchette M, Schwikowski B, Tompa M. Algorithms for phylogenetic footprinting. J Computational Biol. 2002;2:11–23. 55. Dermitzakis ET, Clark AG. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002;19:1114–1121. [PubMed] 56. Hardison R. Comparative genomic. PloS Biol. 2003;1(2):e58. [PubMed] 57. Xie X, Jun L, Kulbokas E, Golub T, Mootha W, Lindblad-Toh K, Lander E, Kellis M. Systematic discovery of regulatory motifs in human promoters and 3’ UTRs by comparison of several mammals. Nature. 2005;1038:3441. 58. Bofelli D, Nobrega M, Rubin E. Comparative genomics at the vertebrate extremes. Nat Rev Genet. 2005;6:151–157. [PubMed] 59. Ovcharenko I, Boffelli D, Loots G. eShadow: a tool for comparing closely related sequences. Genome Res. 2004;14(6):1191–8. [PubMed] 60. 
Markstein M, Markstein P, Markstein V, Levine MS. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc Natl Acad Sci USA. 2002;99:763–768. [PubMed] 61. Rajewsky N, Vergassola M, Gaul U, Siggia ED. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinform. 2002;3:30. 62. Bailey TL, Noble WS. Searching for statistically significant regulatory modules. Bioinformatics. 2003;19(Suppl 2):II16–II25. [PubMed] 63. Frith MC, Li MC, Weng Z. Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 2003;31:3666–3668. [PubMed] 64. Rebeiz M, Reeves N, Posakony J. SCORE: A computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Proc Natl Acad Sci USA. 2002;99(15):9888–9893. [PubMed] 65. Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004;5:276–287. [PubMed] 66. Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D, Coulombe B, Robert F. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression . Genome Res. 2006;16:656–68. [PubMed] 67. Michelson A. Deciphering genetic regulatory codes: a challenge for functional genomicc. Proc Natl Acad Sci USA. 2002;99:546–548. [PubMed] 68. Kel-Margoulis OV, Tchekmenev D, Kel AE, Goessling E, Hornischer K, Lewicki-Potapov B, Wingender E. Composition-sensitive analysis of the human genome for regulatory signals. In Silico Biol. 2003;3:0013. 69. Wagner A. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics. 1999;15(10):776–84. [PubMed] 70. Hannenhalli S, Levy S. Predicting transcription factor synergism. Nucleic Acids Res. 2002;30:4278–4284. 
[PubMed] 71. Qiu P, Ding W, Jiang Y, Greene JR, Wang L. Computational analysis of composite regulatory elements. Mamm Genome. 2002;13:327–332. [PubMed] 72. Blüthgen N, Kielbasa S, Herzel H. Inferring combinatorial regulation of transcription in silico. Nucleic Acids Res. 2005;33(1):272–279. [PubMed] 73. Zhou Z, Shendure J, Church J. Discovering functional transcription-factor combinations in the human cell cycle. Genome Res. 2005;15:848–855. [PubMed] 74. Nguen D, D’haeseleer P. Deciphering principles of transcription regulation in eukaryotic genomes. Mol Sys Biol. 2006:0012. 75. Beiko RG, Charlebois RL. GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA. BMC Bioinform. 2005;6:36. 76. Hu Z, Fu Y, Halees A, Kielbasa S, Weng S. SeqVISTA: a new module of integrated computational tools for studying transcriptional regulation. Proc Natl Acad Sci USA. 2004;32:W235–W241. 77. Lemmens K, Dhollander T, De Bie T, Monsieurs P, Engelen K, Smets B, Winderickx J, De Moor B, Marchal K. Inferring transcriptional modules from ChIP-chip, motif and microarray data. Genome Biol. 2006;7(5):R37. [PubMed] 78. van Helden J, Wernisch L, Gilbert D, Wodak S. Graph-based analysis of metabolic networks. In: Mewes H, Weiss B, Seidel H, editors. Ernst Schering Research Foundation Workshop. Vol. 38. Berlin Heidelberg: Springer-Verlag; 2002. pp. 245–274. (Bioinformatics and Genome Analysis). ISBN 3-540-42893-3. 79. Schaefer C. Pathway databases. Ann NY Acad Sci. 2004;1020:77–91. [PubMed] 80. Buck MJ, Lieb J. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics. 2004;83(3):349–60. [PubMed] 81. Bar-Joseph Z, Gerber G, Rinaldi N, Yoo J, Gordon B, Fraenkel E, Jaakkola T, Young R, Gifford D. Computational discovery of gene modules and regulatory networks. Nat Biotechnol. 2003;21(11):1337–42. [PubMed] 82. 
Harbison C, Gordon B, Lee T, Rinaldi N, MacIsaak K, Danford T, Gerber G, Hannet N, Tagne J, Reynolds D, Yoo J, Jennings E, Pokholok D, Zeitlinger J, Dowell T, Kellis M, Rolfe A, Takusagawa K, Lander E, Gifford D, Fraenkel E, Young R. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004;431(7004):99–104. [PubMed] 83. Ren B, Cam H, Takahashi Y, Volkert T, Terragni J, Young RA, Dynlacht BD. E2F integrates cell cycle progression with DNA repair, replication, and G2/M checkpoints. Genes Dev. 2002;16:245–256. [PubMed] 84. Qi Y, Rolfe A, MacIsaak K, Gerber G, Pokholok D, Zeitlinger J, Danford T, Dowell T, Fraenkel E, Jaakkola T, Young R, Gifford D. High-resolution computational models of genome binding events. Nat Biotechnol. 2006;24(8):963–70. [PubMed] 85. Hong P, Liu S, Zhou Q, Lu X, Liu J, Wong W. A boosting approach for motif modeling using ChIP-chip data. Bioinformatics. 2005;21(11):2636–43. [PubMed] 86. Liu XS, Brutlag D, Liu J. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol. 2002;20(8):835–9. [PubMed] 87. Liu Y, Wei L, Batzoglou S, Brutlag D, Liu J, Liu S. A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Res. 2004;32(Web Server issue):W204–7. [PubMed] 88. Lee T, Rinaldi N, Roberts F, Odom D, Bar-Joseph Z, Gerber G, Hannet N, Harbison C, Thompson C, Simon I, Zeitlinger J, Jennings E, Murray H, Gordon B, Ren B, Wurick J, Tagne J, Volkert T, Fraenkel E, Gifford D, Young R. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002;298(5594):799–804. [PubMed] 89. Schreiber J, Jenner R, Murray H, Gerber G, Gifford D, Young R. Coordinated binding of NF-kappaB family members in the response of human cells to lipopolysaccharide. Proc Natl Acad Sci USA. 2006;103(15):5899–904. [PubMed] 90. Takusagawa KT, Gifford D. Negative information for motif discovery. Pac Symp Biocomput. 2004:360–71. [PubMed] 91. 
Hartemink AJ, Gifford D, Jaakkola T, Young R. Combining location and expression data for principled discovery of genetic regulatory network models. Pac Symp Biocomput. 2002:437–49. [PubMed] 92. Cameron R, Chow S, Berney K, Chiu T, Yuan Q, Krämer A, Helguero A, Ransick A, Yun M, Davidson E. An evolutionary constraint: Strongly disfavored class of change in DNA sequence during divergence of cis-regulatory modules. Proc Natl Acad Sci USA. 2005;102(33):11769–11774. [PubMed] 93. Dermitzakis ET, Clark AG. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol. 2002;19:1114–1121. [PubMed] 94. Tautz D. Evolution of transcriptional regulation. Curr Opin Genet Dev. 2000;10:575–579. [PubMed] 95. Ludwig M, Bergman C, Patel H, Kreitman M. Evidence for stabilizing selection in eukaryotic enhancer element. Nature. 2000;403:564–567. [PubMed] 96. Britten RJ, Davidson EH. Q Rev Biol. 1971;46:111–138. [PubMed] 97. Woofle A, Goodson M, Goode D, Snell P, Smith S, Vavouri T, McEwen G, Gilks W, Walter K, Abnizova I, Edwards Y, Elgar G. Highly conserved non coding sequences are associated with developmental control genes in vertebrates. PloS Biol. 2005;3:e7. [PubMed] 98. Bofelli D, Nobrega M, Rubin E. Comparative genomics at the vertebrate extremes. Nat Rev Genet. 2005;6:151–157. [PubMed] 99. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent W, Mattick H, Haussler D. Ultraconserved elements in human genome. Science. 2004;304:1321. [PubMed] 100. Levy S, Hennenhalli S, Workman C. Enrichment of regulatory signals in conserved non-coding genomic sequence. Bioinformatics. 2001;17:871–877. [PubMed] 101. Hancock JM, Shaw P, Bonneton F, Dover G. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects. Mol Biol Evol. 1999;16:253–265. [PubMed] 102. Tautz D. Evolution of transcriptional regulation. Curr Opin Genet Dev. 2002;10:575–579. [PubMed] 103. 
Wagner GP, Fried C, Prohaska SJ, Stadler PF. Divergence of Conserved Non-Coding Sequences: Rate Estimates and Relative Rate Tests. Mol Biol Evol. 2004;21:2116–2121. [PubMed] 104. Ponting C, Lunter G. Signatures of adaptive evolution within human non-coding sequence. Hum Mol Genet. 2006;15(Review Issue 2):R170–R175. [PubMed] 105. Andolfatto P. Adaptive evolution of non-coding DNA in Drosophila. Nature. 2005;437:1149–1152. [PubMed] 106. Chen S, Hung C, Xu C, Reigstad C, Magrini V, Sabo A, Blasiar A, Bieri T, Meyer R, Ozersky P, Armstrong T, Fulton R, Latreille J, Spieth J, Hooton T, Mardis E, Hultgren J, Gordon J. Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: A comparative genomics approach. Proc Natl Acad Sci USA. 2006;103(15):5977–5982. [PubMed] 107. Lockton S, Gaut B. Plant conserved non-coding sequences and paralogue evolution. Trends Genet. 2005;21(1):60–65. [PubMed] 108. Monsuur AJ, de, Bakker PI, Alizadeh BZ, Zhernakova A, Bevova MR, Strengman E, Franke L, van’t, Slot R, van, Belzen MJ, Lavrijsen IC, Diosdado B, Daly MJ, Mulder CJ, Mearin ML, Meijer JW, Meijer GA, van, Oort E, Wapenaar MC, Koeleman BP, Wijmenga C. Myosin IXB variant increases the risk of celiac disease and points toward a primary intestinal barrier defect. Nat Genet. 2005;37:1341–4. [PubMed] 109. Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD. Replication of putative candidate-gene associations with rheumatoid arthritis in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4. Am J Hum Genet. 2005;77:1044–60. [PubMed] 110. 
Ueda H, Howson JM, Esposito L, Heward J, Snook H, Chamberlain G, Rainbow DB, Hunter KM, Smith AN, Di Genova G, Herr MH, Dahlman I, Payne F, Smyth D, Lowe C, Twells RC, Howlett S, Healy B, Nutland S, Rance HE, Everett V, Smink LJ, Lam AC, Cordell HJ, Walker NM, Bordin C, Hulme J, Motzo C, Cucca F, Hess JF, Metzker ML, Rogers J, Gregory S, Allahabadia A, Nithiyananthan R, Tuomilehto-Wolf E, Tuomilehto J, Bingley P, Gillespie KM, Undlien DE, Ronningen KS, Guja C, Ionescu-Tirgoviste C, Savage DA, Maxwell AP, Carson DJ, Patterson CC, Franklyn JA, Clayton DG, Peterson LB, Wicker LS, Todd JA, Gough SC. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003;423:506–11. [PubMed] 111. Morahan G, Huang D, Ymer SI, Cancilla MR, Stephen K, Dabadghao P, Werther G, Tait BD, Harrison LC, Colman PG. Nat Genet. 2001;27:218–21. [PubMed] 112. Stern D. Perspective: evolutionary developmental biology and the problem of variation. Evolution. 2000;54:1079–1091. [PubMed] 113. Tautz D. Evolution of transcriptional regulation. Curr Opin Genet Dev. 2000;10:575–579. [PubMed] 114. Purugganan MD. The molecular population genetics of regulatory genes. Mol Ecol. 2000;9:1451–1461. [PubMed] 115. Wray GA, Lowe C. Developmental regulatory genes and echinoderm evolution. Syst Biol. 2000;49:28–51. [PubMed] 116. Wilkins AS. The evolution of developmental pathways. Sunderland, MA: Sinauer Associates; 2002. 117. Buckland P, Hoogendoorn B, Coleman S, Guy C, Smith S, O’Donovan M. Strong bias in the location of functional promoter polymorphisms. Hum Mutat. 2005;24:35–42. [PubMed] 118. Hoogendoorn B, Coleman SL, Guy CA, Smith SK, O’Donovan MC, Buckland PR. Functional analysis of polymorphisms in the promoter regions of genes on 22q11. Hum Mutat. 2004;24:35–42. [PubMed] 119. Mooney S. Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis. Brief Bioinform. 2005;6:44–56. [PubMed] 120. Pastinen T, Hudson TJ.
Cis-acting regulatory variation in the human genome. Science. 2004;306:647–650. [PubMed] 121. Hudson TJ. Wanted: regulatory SNPs. Nat Genet. 2003;33:439–440. [PubMed] 122. Conde L, Vaquerizas JM, Santoyo J, Al-Shahrour F, Ruiz-Llorente S, Robledo M, Dopazo J. PupaSNP Finder: a web tool for finding SNPs with putative effect at transcriptional level. Nucleic Acids Res. 2004;32:W242–W248. [PubMed] 123. Ponomarenko JV, Merkulova TI, Orlova GV, Fokin ON, Gorshkova EV, Frolov AS, Valuev VP, Ponomarenko MP. rSNP_Guide, a database system for analysis of transcription factor binding to DNA with variations: application to genome annotation. Nucleic Acids Res. 2003;31:118–121. [PubMed] 124. Kang HJ, Choi KO, Kim BD, Kim S, Kim YJ. FESD: a Functional Element SNPs Database in human. Nucleic Acids Res. 2005;33:D518–D522. [PubMed] 125. Tahira T, Baba S, Higasa K, Kukita Y, Suzuki Y, Sugano S, Hayashi K. dbQSNP: a database of SNPs in human promoter regions with allele frequency information determined by single-strand conformation polymorphism-based methods. Hum Mutat. 2005;26:69–77. [PubMed] 126. Chuzhanova NA, Anassis EJ, Ball EV, Krawczak M, Cooper DN. Meta-analysis of indels causing human genetic disease: mechanisms of mutagenesis and the role of local DNA sequence complexity. Hum Mutat. 2003;21:28–44. [PubMed] 127. Khan I, Mort M, Buckland P, Cooper D, O’Donovan M, Chuzhanova N. In silico discrimination of single nucleotide polymorphisms and pathological mutations in human gene promoter regions by means of local DNA sequence context and regularity. In Silico Biol. 2006;6:0003. 128. Ohno S. Evolution by gene duplication. Springer-Verlag; 1970. 129. Zhang J. Evolution by gene duplication: an update. Trends Ecol Evol. 2003;18(6):292–298. 130. Duarte J, Cui L, Wall P, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis C. Expression Pattern Shifts Following Duplication Indicative of Sub-functionalization and Neo-functionalization in Regulatory Genes of Arabidopsis.
Mol Biol Evol. 2006;23(2):469–478. [PubMed] 131. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424:147–151. [PubMed] 132. McEwen G, Woolfe A, Goode D, Vavouri T, Callaway H, Elgar G. Ancient duplicated conserved noncoding elements in vertebrates: A genomic and functional analysis. Genome Res. 2006;16:451–465. [PubMed] 133. Wagner A. Genes regulated cooperatively by one or more transcription factors and their identification in whole eukaryotic genomes. Bioinformatics. 1999;15:776–784. [PubMed] 134. Pickert L, Reuter I, Klawonn F, Wingender E. Transcription regulatory region analysis using signal detection and fuzzy clustering. Bioinformatics. 1998;14:244–251. [PubMed] 135. Lifanov AP, Makeev VJ, Nazina AG, Papatsenko DA. Homotypic regulatory clusters in Drosophila. Genome Res. 2003;13:579–588. [PubMed] 136. Chytil M, Peterson BR, Erlanson DA, Verdine GL. The orientation of the AP-1 heterodimer on DNA strongly affects transcriptional potency. Proc Natl Acad Sci USA. 1998;95:14076–14081. [PubMed] 137. Remenyi A, Tomilin A, Scholer HR, Wilmanns M. Differential activity by DNA-induced quarternary structures of POU transcription factors. Biochem Pharmacol. 2002;64:979–984. [PubMed] 138. Dion V, Coulombe B. Interactions of a DNA-bound transcriptional activator with the TBP-TFIIA-TFIIB-promoter quaternary complex. J Biol Chem. 2003;278:11495–11501. [PubMed] 139. Inoue J, Sato R, Maeda M. Multiple DNA elements for sterol regulatory element-binding protein and NF-Y are responsible for sterol-regulated transcription of the genes for human 3-hydroxy-3-methylglutaryl coenzyme A synthase and squalene synthase. J Biochem. 1998;123:1191–1198. [PubMed] 140. D’Alonzo RC, Selvamurugan N, Karsenty G, Partridge NC. Physical interaction of the activator protein-1 factors c-Fos and c-Jun with Cbfa1 for collagenase-3 promoter activation. J Biol Chem. 2002;277:816–822. [PubMed] 141. Sarafova S, Siu G.
Precise arrangement of factor-binding sites is required for murine CD4 promoter function. Nucleic Acids Res. 2000;28:2664–2671. [PubMed] 142. Ioshikhes I, Trifonov EN, Zhang MQ. Periodical distribution of transcription factor sites in promoter regions and connection with chromatin structure. Proc Natl Acad Sci USA. 1999;96:2891–2895. [PubMed] 143. Praz V, Perier R, Bonnard C, Bucher P. The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Res. 2002;30:322–324. [PubMed] 144. Makeev V, Lifanov A, Nazina N, Papatsenko D. Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information. Nucleic Acids Res. 2003;31(20):6016–6026. [PubMed] 145. Kumar L, Futschik M, Herzel H. DNA motifs and sequence periodicities. In Silico Biol. 2006;6:0008. 146. Diamond MI, Miner JN, Yoshinaga SK, Yamamoto KR. Transcription factor interactions: selectors of positive or negative regulation from a single DNA element. Science. 1990;249:1266–1272. [PubMed] 147. Kel A, Kel-Margoulis O, Ivanova T, Wingender E. Cluster-Scan: A Tool for Automatic Annotation of Genomic Regulatory Sequences by Searching for Composite Clusters. Proc German Conf Bioinform. 2001:96–101. 148. Kel-Margoulis OV, Kel AE, Reuter I, Deineko IV, Wingender E. TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res. 2002;30:332–334. [PubMed] 149. Hannenhalli S, Levy S. Predicting transcription factor synergism. Nucleic Acids Res. 2002;30:4278–4284. [PubMed] 150. Qiu P, Ding W, Jiang Y, Greene JR, Wang L. Computational analysis of composite regulatory elements. Mamm Genome. 2002;13:327–332. [PubMed] 151. Wasserman WW, Fickett JW. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 1998;278:167–181. [PubMed] 152. Krivan W, Wasserman WW. A predictive model for regulatory sequences directing liver-specific transcription.
Genome Res. 2001;11:1559–1566. [PubMed] 153. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet. 2001;29:153–159. [PubMed] 154. Boeva V, Regnier M, Papatsenko D, Makeev V. Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics. 2006;22(6):676–684. [PubMed] 155. Berg OG, von Hippel PH. Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol. 1987;193:723–750. [PubMed] 156. Schneider TD, Stormo GD, Gold L, Ehrenfeucht A. Information content of binding sites on nucleotide sequences. J Mol Biol. 1986;188:415–431. [PubMed] 157. Stormo GD, Fields DS. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci. 1998;23:109–113. [PubMed] 158. Hertz GZ, Stormo GD. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics. 1999;15:563–577. 159. Nagarajan N, Jones N, Keich U. Computing the P-value of the information content from an alignment of multiple sequences. Bioinformatics. 2005;21:i311–i318. [PubMed] 160. Stormo GD, Hartzell GW III. Identifying protein-binding sites from unaligned DNA fragments. Proc Natl Acad Sci USA. 1989;86:1183–1187. [PubMed] 161. Hertz GZ, Hartzell GW III, Stormo GD. Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Comput Appl Biosci. 1990;6:81–92. [PubMed] 162. Lawrence CE, Reilly AA. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins. 1990;7:41–51. [PubMed] 163. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262:208–214. [PubMed] 164. Zhou Q, Liu JS.
Modelling within-motif dependence for transcription factor binding site predictions. Bioinformatics. 2004;20:909–916. [PubMed] 165. FitzGerald P, Shlyakhtenko A, Mir A, Vinson C. Clustering of DNA Sequences in Human Promoters. Genome Res. 2004;14:1562–1574. [PubMed] 166. Papatsenko DA, Makeev VJ, Lifanov AP, Regnier M, Nazina AG, Desplan C. Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res. 2002;12:470–481. [PubMed] 167. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004;32:D91–D94. [PubMed] 168. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. [PubMed] 169. Stormo GD, Fields DS. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem Sci. 1998;23:109–113. [PubMed] 170. Kechris KJ, van Zwet E, Bickel PJ, Eisen MB. Detecting DNA regulatory motifs by incorporating positional trends in information content. Genome Biol. 2004;5(7):R50. [PubMed] 171. Scherf M, Klingenhoff A, Werner T. Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. J Mol Biol. 2000;297(3):599–606. [PubMed] 172. Ashraf SI, Ip YT. Transcriptional control: repression by local chromatin modification. Curr Biol. 1998;8:R683–R686. [PubMed] 173. Razin A. CpG methylation, chromatin structure and gene silencing-a three-way connection. EMBO J. 1998;17:4905–4908. [PubMed] 174. Farkas G, Leibovitch BA, Elgin SC. Chromatin organization and transcriptional control of gene expression in Drosophila. Gene. 2000;253:117–136. [PubMed] 175. Barash Y, Elidan G, Friedman N, Kaplan T. Modeling Dependencies in Protein-DNA Binding Sites. In Proc of the 7th Int Conf Res Comput Mol Biol. 2003:28–37. 176. Berg OG, von Hippel PH.
Selection of DNA binding sites by regulatory proteins. Statistical-mechanical theory and application to operators and promoters. J Mol Biol. 1987;193(4):723–50. [PubMed] 177. Man TK, Stormo GD. Non-independence of Mnt repressor-operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 2001;29:2471–2478. [PubMed] 178. Bulyk ML, Johnson PL, Church GM. Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors. Nucleic Acids Res. 2002;30:1255–1261. [PubMed] 179. Djordjevic M, Sengupta AM, Shraiman BI. A biophysical approach to transcription factor binding site discovery. Genome Res. 2003;13:2381–2390. [PubMed] 180. King OD, Roth FP. A non-parametric model for transcription factor binding sites. Nucleic Acids Res. 2003;31:e116. [PubMed] 181. Gershenzon NI, Stormo GD, Ioshikhes IP. Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res. 2005;33:2290–2301. [PubMed] 182. Stormo G, Schneider T, Gold L, Ehrenfeucht A. The ‘Perceptron’ algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 1982;10(9):2997–3011. [PubMed] 183. Stormo G, Schneider T, Gold L. Quantitative analysis of the relationship between nucleotide sequence and functional activity. Nucleic Acids Res. 1986;14(16):6661–79. [PubMed] 184. Benos PV, Bulyk ML, Stormo GD. Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res. 2002;30:4442–4451. [PubMed] 185. O’Flanagan RA, Paillard G, Lavery R, Sengupta AM. Non-additivity in protein-DNA binding. Bioinformatics. 2005;21:2254–2263. [PubMed] 186. Lim L, Burge C. A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA. 2001;98(20):11193–8. [PubMed] 187. Cawley S. 
Statistical models for DNA sequencing and analysis. PhD thesis. Berkeley, CA: University of California at Berkeley; 2000. 188. Xing E, Jordan M, Karp R, Russell S. A hierarchical bayesian markovian model for motifs in biopolymer sequences. In: Becker S, Thrun S, Obermayer K, editors. Advances in Neural Information Processing Systems. Vol. 16. Cambridge, MA: MIT Press; 2002. 189. Wasserman W, Fickett J. Identification of regulatory regions which confer muscle-specific gene expression. J Mol Biol. 1998;278:167–81. [PubMed] 190. Sinha S, van Nimwegen E, Siggia ED. A probabilistic method to detect regulatory modules. Bioinformatics. 2003;19(Suppl 1):i292–301. [PubMed] 191. GuhaThakurta D, Stormo GD. Identifying target sites for cooperatively binding factors. Bioinformatics. 2001;17(7):608–21. [PubMed] 192. Rebeiz M, Reeves N, Posakony J. SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc Natl Acad Sci USA. 2002;99(15):9888–93. [PubMed] 193. Aerts S, Van Loo P, Thijs G, Moreau Y, De Moor B. Computational detection of cis-regulatory modules. Bioinformatics. 2003;19(Suppl 2):ii5–ii14. [PubMed] 194. Xing EP, Wu W, Jordan MI, Karp RM. Logos: a modular bayesian model for de novo motif detection. J Bioinform Comput Biol. 2004;2:127–54. [PubMed] 195. Gupta M, Liu JS. De novo cis-regulatory module elicitation for eukaryotic genomes. Proc Natl Acad Sci USA. 2005;102(20):7079–84. [PubMed] 196. Frech K, Werner T. Specific modelling of regulatory units in DNA sequences. Pac Symp Biocomput. 1997:151–62. [PubMed] 197. Sharan R, Ovcharenko I, Ben-Hur A, Karp RM. CREME: a framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics. 2003;19(Suppl 1):i283–i291. [PubMed] 198. Scherf M, Klingenhoff A, Werner T.
Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. J Mol Biol. 2000;297(3):599–606. [PubMed] 199. Brazma A, Vilo J, Ukkonen E, Valtonen K. Data mining for regulatory elements in yeast genome. Proc Int Conf Intell Syst Mol Biol. 1997;5:65–74. [PubMed] 200. Mahony S, Hendrix D, Golden A, Smith TJ, Rokhsar DS. Transcription factor binding site identification using the self-organizing map. Bioinformatics. 2005;21(9):1807–1814. [PubMed] 201. Bussemaker HJ, Li H, Siggia ED. Regulatory element detection using correlation with expression. Nat Genet. 2001;27(2):167–71. [PubMed] 202. Wang T, Stormo GD. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics. 2003;19(18):2369–80. [PubMed] 203. Bussemaker HJ, Li H, Siggia ED. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci USA. 2000;97(18):10096–100. [PubMed] 204. Gupta M, Liu JS. Discovery of Conserved Sequence Patterns Using a Stochastic Dictionary Model. J Am Stat Assoc. 2003;98:55–66. 205. Takusagawa KT, Gifford DK. Negative information for motif discovery. Pac Symp Biocomput. 2004:360–71. [PubMed] 206. Sinha S, Tompa M. A statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol. 2000;8:344–54. [PubMed] 207. Tompa M. An exact method for finding short motifs in sequences, with application to the ribosome binding site problem. Proc Int Conf Intell Syst Mol Biol. 1999:262–71. Heidelberg, Germany. [PubMed] 208. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262(5131):208–14. [PubMed] 209. Sharan R, Ovcharenko I, Ben-Hur A, Karp RM. CREME: a framework for identifying cis-regulatory modules in human-mouse conserved segments. Bioinformatics. 2003;19(Suppl 1):i283–91.
[PubMed] 210. Curran MD, Liu H, Long F, Ge N. Statistical methods for joint data mining of gene expression and DNA sequence database. SIGKDD Explor Newsl. 2003;5(2):122–129. 211. Segal E, Yelensky R, Koller D. Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics. 2003;19(Suppl 1):i273–82. [PubMed] 212. Hong P, Liu X, Zhou Q, Lu X, Liu JS, Wong WH. A boosting approach for motif modeling using ChIP-chip data. Bioinformatics. 2005;21(11):2636–2643. [PubMed] 213. Holmes I, Bruno WJ. Finding regulatory elements using joint likelihoods for sequence and expression profile data. Proc Int Conf Intell Syst Mol Biol. 2000;8:202–10. [PubMed] 214. Bussemaker HJ, Li H, Siggia ED. Regulatory element detection using correlation with expression. Nat Genet. 2001;27(2):167–71. [PubMed] 215. Grad YH, Roth FP, Halfon MS, Church GM. Prediction of similarly-acting cis-regulatory modules by subsequence profiling and comparative genomics in D. melanogaster and D. pseudoobscura. Bioinformatics. 2004;20(16):2738–2750. [PubMed] 216. Bailey TL, Elkan C. The value of prior knowledge in discovering motifs with. Proc Int Conf Intell Syst Mol Biol. 1995;3:21–9. [PubMed] 217. Lawrence CE, Reilly AA. An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins. 1990;7:41–51. [PubMed] 218. Sinha S, Blanchette M, Tompa M. PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics. 2004;5:170. [PubMed] 219. Prakash A, Blanchette M, Sinha S, Tompa M. Motif discovery in heterogeneous sequence data. Pac Symp Biocomput. 2004:348–59. [PubMed] 220. Ao W, Gaudet J, Kent WJ, Muttumu S, Mango SE. Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science. 2004;305(5691):1743–6. [PubMed] 221. Liu X, Brutlag DL, Liu JS.
BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001;6:127–138. [PubMed] 222. Jensen ST, Liu XS, Liu JS, Zhou Q. Computational Discovery of Gene Regulatory Binding Motifs: A Bayesian Perspective. Statist Sci. 2004;19:188–204. 223. Thompson W, Rouchka EC, Lawrence CE. Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 2003;31(13):3580–5. [PubMed] 224. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262(5131):208–14. [PubMed] 225. Neuwald AF, Liu JS, Lawrence CE. Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci. 1995;4(8):1618–32. [PubMed] 226. Gupta M, Liu JS. Discovery of Conserved Sequence Patterns Using a Stochastic Dictionary Model. J Am Stat Assoc. 2003;98(461):55–66. 227. Zhou Q, Wong W. CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc Natl Acad Sci USA. 2004;101(33):12114–9. [PubMed] 228. Aerts S, Van Loo P, Moreau Y, De Moor B. A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes. Bioinformatics. 2004;20(12):1974–6. [PubMed] 229. Zhao X, Huang H, Speed TP. Finding short DNA motifs using permuted Markov models. In: RECOMB ’04: Proceedings of the eighth annual international conference on Computational molecular biology. New York, NY, USA: ACM Press; 2004. pp. 68–75. 230. Steffen NR, Murphy SD, Tolleri L, Hatfield GW, Lathrop RH. DNA sequence and structure: direct and indirect recognition in protein-DNA binding. Bioinformatics. 2002;18:S22–S30. [PubMed] 231. Liu R, Blackwell TW, States DJ. Conformational model for binding site recognition by the E.coli MetJ transcription factor. Bioinformatics. 2001;17:622–633. [PubMed] 232. Morozov AV, Havranek JJ, Baker D, Siggia ED.
Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res. 2005;33:5781–5798. [PubMed] 233. Kaplan T, Friedman N, Margalit H.
Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comput Biol. 2005;1:e1. [PubMed] 234. Mandel-Gutfreund Y, Margalit H. Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites. Nucleic Acids Res. 1998;26:2306–2312. [PubMed] 235. Kono H, Sarai A. Structure-based prediction of DNA target sites by regulatory proteins. Proteins. 1999;35:114–131. [PubMed] 236. Havranek JJ, Duarte CM, Baker D. A simple physical model for the prediction and design of protein-DNA interactions. J Mol Biol. 2004;344:59–70. [PubMed] 237. Benham CJ. Computation of DNA structural variability-a new predictor of DNA regulatory regions. Comput Appl Biosci. 1996;12:375–381. [PubMed] 238. Quandt K, Grote K, Werner T. GenomeInspector: a new approach to detect correlation patterns of elements on genomic sequences. Comput Appl Biosci. 1996;12:405–413. [PubMed] 239. Pedersen AG, Baldi P, Chauvin Y, Brunak S. The biology of eukaryotic promoter prediction-a review. Comput Chem. 1999;23(3-4):191–207. [PubMed] 240. Ponomarenko J, Ponomarenko M, Frolov A, Vorobyev D, Overton G, Kolchanov NA. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics. 1999;15(7-8):654–68. [PubMed] 241. El Hassan MA, Calladine C. Conformational characteristics of DNA: empirical classifications and a hypothesis for the conformational behaviour of dinucleotide steps. Philos Trans R Soc Lond A. 1997;355(1722):43–100. 242. Burden S, Lin XR, Zhang R. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics. 2005;21(5):601–607. [PubMed] 243. Liu R, States D. Consensus promoter identification in the human genome utilizing expressed gene markers and gene modeling. Genome Res. 2002;12:462–469. [PubMed] 244. Mellor J. The dynamics of chromatin remodeling at promoters. Mol Cell. 2005;19:147–157.
[PubMed] 245. Pudimat R, Schukat-Talamazzini E, Backofen R. Feature Based Representation and Detection of Transcription Factor Binding Sites. Proc German Conf Bioinform. 2004:43–52. 246. Perkins D, Jeffries C, Sullivan P. Expanding the ‘central dogma’: the regulatory role of nonprotein coding genes and implications for the genetic liability to schizophrenia. Mol Psychiatry. 2005;10(1):69–78. [PubMed] 247. Szymanski M, Barciszewski J. RNA regulation in mammals. Ann N Y Acad Sci. 2006;1067:461–8. [PubMed] 248. Qi L, Li X, Zhang S, An D. Genetic regulation by non-coding RNAs. Sci China C Life Sci. 2006;49:201–17. [PubMed] 249. Yang L, Ke Y. Noncoding RNA, a new focus of functional genomic study. Beijing Da Xue Xue Bao. 2006;18(38):444–6. [PubMed] 250. Mattick J, Makunin I. Non-coding RNA. Hum Mol Genet. 2006;15:R17–R29. [PubMed]
[Proc Int Conf Intell Syst Mol Biol. 1997]Bioinformatics. 1999 Oct; 15(10):776-84.
[Bioinformatics. 1999]Bioinformatics. 2003; 19 Suppl 1():i292-301.
[Bioinformatics. 2003]J Bioinform Comput Biol. 2004 Mar; 2(1):127-54.
[J Bioinform Comput Biol. 2004]Proc Natl Acad Sci U S A. 2000 Aug 29; 97(18):10096-100.
[Proc Natl Acad Sci U S A. 2000]J Mol Biol. 1987 Feb 20; 193(4):723-50.
[J Mol Biol. 1987]Bioinformatics. 2005 Jun; 21 Suppl 1():i311-8.
[Bioinformatics. 2005]Proc Natl Acad Sci U S A. 2002 Jan 22; 99(2):757-62.
[Proc Natl Acad Sci U S A. 2002]J Bioinform Comput Biol. 2004 Mar; 2(1):127-54.
[J Bioinform Comput Biol. 2004]Proc Natl Acad Sci U S A. 2002 Jul 23; 99(15):9888-93.
[Proc Natl Acad Sci U S A. 2002]Pac Symp Biocomput. 2004; ():360-71.
[Pac Symp Biocomput. 2004]Science. 1993 Oct 8; 262(5131):208-14.
[Science. 1993]Bioinformatics. 2003; 19 Suppl 1():i283-91.
[Bioinformatics. 2003]Nat Genet. 2001 Feb; 27(2):167-71.
[Nat Genet. 2001]Proc Int Conf Intell Syst Mol Biol. 2000; 8():202-10.
[Proc Int Conf Intell Syst Mol Biol. 2000]Bioinformatics. 2004 Nov 1; 20(16):2738-50.
[Bioinformatics. 2004]Proc Int Conf Intell Syst Mol Biol. 1995; 3():21-9.
[Proc Int Conf Intell Syst Mol Biol. 1995]Science. 2004 Sep 17; 305(5691):1743-6.
[Science. 2004]J Bioinform Comput Biol. 2004 Mar; 2(1):127-54.
[J Bioinform Comput Biol. 2004]Pac Symp Biocomput. 2001; ():127-38.
[Pac Symp Biocomput. 2001]Protein Sci. 1995 Aug; 4(8):1618-32.
[Protein Sci. 1995]Bioinformatics. 2002; 18 Suppl 1():S22-30.
[Bioinformatics. 2002]J Mol Biol. 2004 Nov 12; 344(1):59-70.
[J Mol Biol. 2004]Comput Appl Biosci. 1996 Oct; 12(5):375-81.
[Comput Appl Biosci. 1996]Comput Appl Biosci. 1996 Oct; 12(5):405-13.
[Comput Appl Biosci. 1996]Comput Chem. 1999 Jun 15; 23(3-4):191-207.
[Comput Chem. 1999]Bioinformatics. 1999 Jul-Aug; 15(7-8):654-68.
[Bioinformatics. 1999]Bioinformatics. 2005 Mar 1; 21(5):601-7.
[Bioinformatics. 2005]Genome Res. 2002 Mar; 12(3):462-9.
[Genome Res. 2002]Curr Biol. 1998 Sep 24; 8(19):R683-6.
[Curr Biol. 1998]Gene. 2000 Aug 8; 253(2):117-36.
[Gene. 2000]Mol Cell. 2005 Jul 22; 19(2):147-57.
[Mol Cell. 2005]Mol Psychiatry. 2005 Jan; 10(1):69-78.
[Mol Psychiatry. 2005]Hum Mol Genet. 2006 Apr 15; 15 Spec No 1():R17-29.
[Hum Mol Genet. 2006]Sci China C Life Sci. 2006 Jun; 49(3):201-17.
[Sci China C Life Sci. 2006]Hum Mol Genet. 2006 Apr 15; 15 Spec No 1():R17-29.
[Hum Mol Genet. 2006]Mol Psychiatry. 2005 Jan; 10(1):69-78.
[Mol Psychiatry. 2005]Ann N Y Acad Sci. 2006 May; 1067():461-8.
[Ann N Y Acad Sci. 2006]