• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of plosonePLoS OneView this ArticleSubmit to PLoSGet E-mail AlertsContact UsPublic Library of Science (PLoS)
PLoS ONE. 2008; 3(8): e3105.
Published online Aug 29, 2008. doi:  10.1371/journal.pone.0003105
PMCID: PMC2518617

Systems Biology of the Clock in Neurospora crassa

Nicholas James Provart, Editor

Abstract

A model-driven discovery process, Computing Life, is used to identify an ensemble of genetic networks that describe the biological clock. A clock mechanism involving the genes white-collar-1 and white-collar-2 (wc-1 and wc-2) that encode a transcriptional activator (as well as a blue-light receptor) and an oscillator frequency (frq) that encodes a cyclin that deactivates the activator is used to guide this discovery process through three cycles of microarray experiments. Central to this discovery process is a new methodology for the rational design of a Maximally Informative Next Experiment (MINE), based on the genetic network ensemble. In each experimentation cycle, the MINE approach is used to select the most informative new experiment in order to mine for clock-controlled genes, the outputs of the clock. As much as 25% of the N. crassa transcriptome appears to be under clock-control. Clock outputs include genes with products in DNA metabolism, ribosome biogenesis in RNA metabolism, cell cycle, protein metabolism, transport, carbon metabolism, isoprenoid (including carotenoid) biosynthesis, development, and varied signaling processes. Genes under the transcription factor complex WCC ( = WC-1/WC-2) control were resolved into four classes, circadian only (612 genes), light-responsive only (396), both circadian and light-responsive (328), and neither circadian nor light-responsive (987). In each of three cycles of microarray experiments data support that wc-1 and wc-2 are auto-regulated by WCC. Among 11,000 N. crassa genes a total of 295 genes, including a large fraction of phosphatases/kinases, appear to be under the immediate control of the FRQ oscillator as validated by 4 independent microarray experiments. Ribosomal RNA processing and assembly rather than its transcription appears to be under clock control, suggesting a new mechanism for the post-transcriptional control of clock-controlled genes.

Introduction

To explain how a complex trait works, systems biology begins with organizing macromolecules into a genetic network [1]. The biological clock is an example of how a complex trait with numerous pleiotropic phenotypes can emerge from the interaction of only a few regulatory macromolecules. For two reasons much of what we know about the clock at the molecular level comes from the study of the filamentous fungus, Neurospora crassa [2]. First, this complex trait is easy to observe and to manipulate in N. crassa (Fig. 1). Second, as a well-studied microbial system, it has been possible to identify three molecular building blocks of the clock, the genes white-collar-1 (wc-1), white-collar-2 (wc-2), and frequency (frq). The genes wc-1 and wc-2 encode PAS-domain containing transcription factors [3] that turn on the clock oscillator. The WC-1 protein also acts as a blue-light receptor [4]. The gene frq encodes the clock oscillator FRQ [5] and is activated by the WHITE-COLLAR transcription factor protein complex WCC = WC-1/WC-2. The FRQ protein in turn appears to function as a cyclin to recruit an as yet to be identified kinase/phosphatase pair for the phosphorylation-dependent inactivation of WCC [6].

Figure 1
The clock of N. crassa is remarkably adaptive in its entrainment to varied artificial days.

This information enabled formulation of the detailed genetic network shown in Fig. 2 that explains how the clock functions [7]. In this network model, the WCC protein activates the oscillator gene frq. The active frq1 gene is then transcribed into its cognate mRNA frqr1, which in turn is translated into its cognate protein FRQ. The FRQ protein, in turn, deactivates the WCC in the P reaction. FRQ thereby closes a loop of dynamical frustration wherein WCC turns on the oscillator gene whose product shuts down the activator WCC. This dynamical frustration (i.e., negative feedback loop between WCC and FRQ) explains in part how clock oscillations arise [7]. In addition, WCC activates a number of clock-controlled genes (ccgs) that serve as outputs of this clock mechanism. The number of these ccgs in the genome, and hence the extent of clock control over metabolism, is largely unknown but see [8].

Figure 2
A genetic network for the biological clock from [7].

Our goal is to refine systematically the genetic network model of the clock mechanism [7] and to explore the metabolic context of the clock. To achieve this goal Locke et al. [9] proposed using an iterative process of modeling and experimentation to identify and validate genetic networks. Along these lines, we introduce a model-driven discovery process called Computing Life in Fig. 3 [1], [10]. In this paradigm, a cycle of modeling and genomics experiments are used to identify and, with each cycle, tighten our estimates on model parameters and on model predictions for the biological clock. The biological system is first perturbed. Measurements on all relevant species are made by RNA and protein profiling [1]. An ensemble of genetic network model parameters is generated for the process of interest [11][12], [7]. Predictions are made from the model ensemble and compared with available data. Revision of the model then poses the difficult choice of what perturbation experiment is to be done next to improve maximally our knowledge of the genetic network?

Figure 3
Computing Life Paradigm.

One approach to the problem of “informative” experiment design has been to assume that genetic networks are in steady state and/or are linear, and under these conditions predictions are made about the next round of perturbations [13][16]. This cannot be done here because the biological clock is usually not in steady state but rather approaching a stable limit cycle [17]. Also, the steady-state approach discards most information contained in observations on network dynamics, i.e., its time-dependent behavior. Another approach is to generate an entire compendium of profiling experiments for varied genetic and environmental perturbations [18]. Such profiling experiments are costly, however, and it is desirable that every experiment, at each stage, be maximally informative about the underlying genetic network. Here, we describe how this process of choosing the Maximally Informative Next Experiment (or MINEing) can be guided by the continuously refined network model in an intelligent and cost-effective way while fully exploiting the information contained in the observed network dynamics. Our tracing of three cycles through the Computing Life paradigm in the context of refining our network model for the biological clock's mechanism in Neurospora crassa illustrates this approach.

Materials and Methods

Describing the genetic network

All stages of the Computing Life paradigm in Fig. 3 involve the use of the genetic network. The methods of describing, fitting, predicting with, and evaluating the genetic network are first described, and then we continue to trace the methodology used to complete the cycle in Fig. 3, providing a methodological walk through the Computing Life paradigm in Materials and Methods.

Kinetics model and the model ensemble

The starting point for our MINE design approach is a kinetic rate equation model for the time-dependence of the molecular species concentrations in the network, based, e.g., on mass-action kinetics. The model in Fig. 2, for example, specifies a system of 16 ordinary differential equations (ODEs) that describe the temporal profiles of genes and their products. In general, these ODEs have the form:

equation image
(1)

where X[equivalent][X1, … XN]T is a N×1 vector of species concentrations, with N denoting the number of molecular species evolving according to the kinetics rate equations; G[equivalent][G1, … GN]T specifies the kinetics, i.e., Gn(X,t;θ,u) is the net rate of production of species n at time t, given the species concentrations X. The model parameter vector or, for the short, “the model” θ[equivalent][θ1, …θM]T in G is the M×1 vector of all unknown model parameter variables, including, for example, unknown reaction rate constants, species' initial concentrations and unit conversion factors. The rate functions G also depend on an array of “control variables” which are known and can be varied by the experimenter. These control variables specify, for example, the nature of the perturbations and external conditions to be applied to the biological system or, more generally, the experiment to be done. This array of control variables is denoted by a vector u (of unspecified dimension) and, for short, is referred to as “the experiment” in the following.

Fitting, Predicting with, and evaluating ensembles of genetic networks

Profiling experiments tend to generate data at only a few time points, the reaction networks are large, and their kinetics models are rich in unknown θ-parameters. An ensemble method of genetic network identification [11], [12], [19], [20], [7] is therefore used to constrain the model parameters θ, using the model likelihood or some other criterion to select members of the model ensemble. The model ensemble Q(θ[center dot]) is a probability distribution on the parameter space of rate coefficients and initial species concentrations [21]. When viewed as a function of θ, the ensemble Q(θ) can be the likelihood function. This model ensemble summarizes what we know and, equally importantly, what we do not know about the biological network, given the prior or “old” experimental data. We refer to Ref. [7] for a detailed description of the construction of Q(θ[center dot]) from prior experimental data and its numerical implementation by way of a Metropolis Monte Carlo ensemble simulation algorithm ens.f90.

With the ensemble in hand it is possible to make predictions from the ensemble means of the species concentrations, as shown in Results. It is also possible to take the expected species trajectories of the ensemble member θ and compare them to the observed species trajectories using a figure of merit, such as the likelihood Q(θ) or χ2 = −2 lnQ(θ)+const, to evaluate goodness of fit of the ensemble as described in [11], [7]. A direct graphical evaluation of the ensemble's goodness of fit by can be assessed also by overlaying the observed trajectories of species concentrations onto the ensemble mean trajectories +/− the ensemble standard errors in these mean trajectories, as again shown in Results.

Selecting an optimal perturbation

The next stage in the Computing Life paradigm is selecting a perturbation in Fig. 3. We describe for the first time a novel method for selecting an optimal perturbation involving evaluating the Maximally Informative Next Experiment.

MINE design as “microscopy” in model space

For a given choice of model θ, let f(θ,u) denote a kinetics model prediction for a single species log-concentration, log(y), to be measured for a single time point by the next profiling experiment, i.e., y is one of the elements of X, to be measured at some specific observation time t. The vector u, as explained above, comprises all control variables which are known and describe the externally imposed conditions of the experiment. However, u should now be understood also to comprise all control variables defining the specific data point y to be measured, including, for example, the choice of molecular species to be observed and the time of observation. We will need to generalize this notation when the planned next experiment measures multiple variables y1,…yd. Let F(θ, U): = [f(θ, u1),…f(θ, ud)]T denote d×1 vector of the corresponding predicted log-outcomes and U: = [u1,…ud] the (super-)vector of corresponding control parameter vectors ui where ui specifies the control variables for the measurement of the data point yi for i = 1, …d. The log-variables to be observed, log(yi), will also be referred to, for short, as the “observables” in the following and U, for short, as “the next experiment”. We are using log-concentrations instead of the concentrations themselves here, in order to obtain scale-free (i.e. concentration-unit-independent) MINE criteria, as explained below.

Clearly, the question of which next experiment U is “maximally informative” is not a mathematically well-defined problem. We have to make an ad hoc choice for a design criterion and then try it out in real-life applications. The basic conceptual ideas underlying this ad hoc construction of a MINE criterion are borrowed from microscopy: we want to use whatever experimental technique is available to us to “look into” or “image” the inner workings of the cell. A microscope generates images of the cell's material components in three-dimensional physical space or in some lower-dimensional projection thereof. Profiling experiments, by analogy, generate images of the cell's (very!) high-dimensional kinetics parameter space.

Ideally, we would like to be able to obtain highly resolved images, allowing us to determine accurately a genetic network's “location” in kinetics parameter space, specified by a unique choice of parameter vector θ. Unfortunately, and again in analogy to microscopy, the images we do get from present-day profiling experiments do not allow us to completely re-construct θ: our “vision” in θ–space is seriously blurred. The model ensemble Q(θ) captures the constraints imposed on θ, i.e., what we know; but the spread of Q(θ) within those constraints in θ-space, also captures the blurring, i.e., what we do not know, given the prior experimental data. Our goal is therefore to reduce this blurring as much as possible when performing the next experiment: we want to tune our “model parameter microscope” to get a different view of θ-space with the maximum possible resolution.

An important aspect to keep in mind here is that every imaging procedure, be it optical microscopy or RNA profiling, requires a mathematical model which relates the observed image (F) to the underlying object (θ). The mapping function F(θ,U) captures that imaging model. Without such an imaging model, we cannot, for example, re-construct the shape, size and location (θ) of a cellular organelle from the light intensity pattern (F) of the cell's magnified image produced by an optical microscope. In the case of optical and electron microscopes, the imaging model is, by now, well-established, highly reliable, and commonly known as physical optics. In the case of profiling experiments, an appropriate imaging model framework may well be mass balance kinetics, but the details, as illustrated by Fig. 2 or or44 are still very much subject to debate.

Figure 4
Alternate genetic network for the biological clock from [33].

Since the profiling experiment is sparse and noisy, we do not have a sufficient amount of sufficiently diverse experimental data to “look in all directions” of the kinetics parameter space. Each experiment only yields a (in general non-linear) projection of object points θ in the M-dimensional kinetics parameter space onto the image points F(θ,U) in d-dimensional image space. Sparsity and noise imply that typically only a lower dimensional image sub-space, of dimension deff<M, can actually be resolved by the experiment. The MINE design procedure cannot eliminate the blurring of our vision; but it can help to minimize the blur.

Criterion 1: MINE by maximal distance in image space

To develop these notions into a quantitative MINE criterion, let us first consider the simplest case: the design of a Maximally Informative Next Experiment to measure only a single data point y. Suppose we randomly draw two possible choices of models from the model ensemble Q, denoted by θ and θ', which both give predictions consistent with the “old” experimental data (within the experimental uncertainties). To distinguish between these two choices, we want to perform the next experiment with control vector u. The predicted outcomes for this next experiment would be, respectively, f(θ[center dot],u) and f(θ',u). The crucial point to notice here is this: the more these two predicted outcomes f(θ,u) or f(θ',u) differ from each other, the “better” the next experiment will allow us to discriminate between the two model choices. As a “metric” of the difference between the two members of the ensemble, we could choose, for example,

equation image
(2)

The Maximally Informative Next Experiment u is then the one that maximizes this difference metric. Letting the joint distribution of the randomly drawn pair Q(θ,θ') = Q(θ)Q(θ'), the foregoing criterion can be applied to an ensemble of models by choosing u such that it maximizes the average of Vθ[center dot]'(u):

equation image
(3)

Where ∫θ denotes integration or summation over all θ-components and E[…] denotes the mean over the ensemble probability distribution Q(θ). Our MINE criterion V(u) is then the variance in our prediction within the ensemble and can be evaluated by Monte Carlo (MC) methods [11]. To achieve maximum “resolution” in model (θ-) space, we thus “tune” the next experiment u so that it will take us into those regions of prediction (f-) space where there is the most uncertainty: the MINE design criterion [3] rationalizes and advocates discovery.

A straightforward generalization of the foregoing criterion to the case of the next experiment measuring multiple variables y1,…yd is to replace the square of the predicted one-dimensional “image difference”, Δf(θ,θ',u): = f(θ,u)−f(θ',u), in [2] by the corresponding squared “length” of the d-dimensional “image difference vector”

equation image
(4)

That is, we replace Vθ,θ[center dot]' (u) in [2] by

equation image
(5)

where |…| denotes the Euclidean norm, i.e., |Φ|: = TΦ)1/2 for Φ = 1,…Φd]T. Inserting this into [3] we get the MINE design criterion

equation image
(6)

It is easy to see from [6] that maximizing this V(U) will collapse all ui at the same u-point where the individual variance in the 1-dimensional (single-variable) prediction space, E[|f(.,ui)|2]−|E[f(.,ui)]|2, is largest. So, this MINE criterion would demand that the next experiment simply observe the same y-variable d times, instead of observing d independent y-variables. Clearly, this criterion lacks the ability to enforce independence of multiple observables.

Criterion 2: MINE by maximal volume in image space

To construct a likely more useful MINE criterion, which does enforce some measure of independence of the observables, we are again guided by the microscopy analogue. Suppose we are “viewing” a certain “object space volume” νo through our microscope. By way of the mapping function, this produces from νo an “image difference volume” νΔ in d-dimensional image difference space. That is, νΔ is the volume swept out by the image difference vector ΔF(θ,θ',U) for all pairs of object points (θ,θ') in νo×νo; or, formally, νΔ(νo,U): = ΔF(νo,νo,U). Our notation makes it explicit here that νΔ depends on the choice of the control vector U, as well as on νo.

To formulate an improved MINE criterion, we propose to invoke the volume νΔ swept out by ΔF, instead of the Euclidean norm of ΔF used in [6]. The basic microscopy-inspired idea here is this: the greater the volume amount contained in νΔ(νo,U), the more detail we should be able to discern in νo. In other words, we should be able to gain more information about the contents of νo if we tune our microscope's control vector U so as to increase the d-dimensional image difference volume amount, denoted by |νΔ(νo,U)|. However, unlike the Euclidean distance criterion [6], the requirement of sweeping out a higher-dimensional volume νΔ will naturally enforce a certain degree of independence of the observables. Notice here that the Euclidean norm measures just the length of the ΔF-vector and this can be maximized even if ΔF sweeps out only a 1-dimensional sub-manifold. By contrast, νΔ is by construction a higher-dimensional manifold with a dimensionality of up to d or M, whichever is less.

The next question is then how to choose an appropriate νo, or the corresponding νΔ, in terms of the ensemble pair distribution Q(θ,θ') = Q(θ)Q(θ'). Again, this requires an ad hoc decision and we are guided in making it by computational expediency. In fact, the foregoing considerations of constructing a νΔ from an underlying νo in object (θ-) space should only be regarded as a heuristic motivation for introducing such a νΔ. As a practical matter, our νΔ-based MINE approach is greatly simplified if we do not try to construct an exact νΔ from a given νo. Rather, we will define a “representative” νΔ, swept out by ΔF(θ,θ',U) when θ and θ' are drawn from “typical” values prescribed by the ensemble pair distribution Q(θ,θ') = Q(θ)Q(θ'). This νΔ will be constructed from the characteristic variance/co-variance ellipsoid of ΔF and it will be again dependent on the control vector U.

To that end, it is conceptually (but not computationally) useful to first define the ensemble distribution of ΔF

equation image
(7)

where Φ: = 1, …Φd]T is any point in ΔF-space and δ(…) is the Dirac delta-function in d dimensions. QΔ(Φ,U) is the probability density for ΔF(θ,θ',U) to take on the value Φ, given that θ and θ' are independently distributed according to Q(θ) and Q(θ'), respectively. QΔ(Φ,U) defines an effective νΔ(U) in the image difference (ΔF-) space by way of the characteristic ellipsoid of ΔF's d×d variance/co-variance matrix D(U), given by

equation image
(8)

Note that ΔF's characteristic variance/co-variance ellipsoid is centered at the origin, Φ = 0, since QΔ(Φ,U) is even in Φ. i.e., QΔ(−Φ,U) = QΔ(Φ,U) due to ΔF(θ',θ,U) = −ΔF(θ,θ',U). The squared half-axis lengths of the characteristic ellipsoid are the D-matrix eigenvalues, with corresponding eigenvectors defining the respective half-axis orientations. The ellipsoid's half-axes are orthogonal to each other and they define a rectangular prism in ΔF-space whose volume amount, by a universal constant prefactor, is proportional to that of the ellipsoid. Instead of the ellipsoid itself, we therefore choose this “variance/co-variance prism” of ΔF as our image difference volume νΔ(U). The square of its volume amount, |νΔ(U)|2, is the determinant of D(U) and this is what we can use as a possible MINE criterion to be maximized:

equation image
(9)

This is sometimes referred to as the generalized variance, and its distribution is known exactly if the predictions are Gaussian over the ensemble [22, Th. 3.2.15.]

We should strongly emphasize here that our invocation of the (co-)variance ellipsoid of ΔF does not imply or require the ΔF-distribution QΔ(Φ,U) to be Gaussian. Via the matrix D(U) in [8], such an ellipsoid can be constructed for any QΔ(Φ,U). Given the ad hoc character of the entire MINE approach, the resulting νΔ is “as good as any” for purposes of representing a “characteristic volume” swept out by ΔF(θ,θ',U). The main advantage of this construction is its computational feasibility: combining [4], [7] and [8], we can re-write Dik(U) in terms of ensemble means over Q(θ) as

equation image
(10)

which can be calculated by ensemble Monte Carlo evaluation [7], [11] of the required ensemble means E[…]. We also note here that the D-matrix, as well as the E-matrix defined below, is scale-free, that is, independent of the choice of model concentration units, since the definition of Dik(U) invokes only ΔF(θ,θ',U) [4] which involves only the differences of log-concentrations or, equivalently, logs of only concentration ratios. Hence, the Dik(U) matrix elements, the corresponding Eik(U) matrix elements defined below, and all MINE criteria developed here based on these two matrices are scale-free.

Hilbert Space picture of MINE formalism

The foregoing results can also be re-stated in terms of a Hilbert Space (HS) formalism, using a HS of functions defined on the kinetics model parameter (θ-) space where the (co-)variance serves as the HS inner product. That is, for any pair of such functions, g(θ) and h(θ), we define the HS inner product by:

equation image
(11)

The log variables log(yi) are now represented by HS vectors fi:

equation image
(12)

and the (co-)variance matrix element Dik is simply the inner product of HS vectors fi and fk,

equation image
(13)

The ensemble standard deviation of the observable predicted by fi is the HS vector norm or “length”, denoted by ||fi||, where |||| is defined by ||g||: = (g|g)1/2. For notational convenience, we are occasionally suppressing the dependence on U or ui for quantities like fi or Dik.

Independence of observables log(y1),…,log(yd) is very naturally represented in this formalism in terms of linear independence of the corresponding HS vector set f1,…,fd. These d HS vectors span a finite-dimensional subspace in HS which has dimension d, if f1,…,fd are linearly independent; else it is less than d. A linearly independent HS vector set f1,…,fd also spans a d-dimensional prism in HS, and the MINE criterion [9] has a simple interpretation in terms of this HS prism: from [13], it is easy to show that det(D) is the square of the volume of this HS prism. Hence, the characteristic (co-)variance prism νΔ in d-dimensional ΔF-space has an alternative (“dual”) representation in terms of the HS subspace prism volume. In contrast νΔ in ΔF-space, the HS prism spanned by f1,…,fd is, in general, not rectangular since f1,…,fd are not guaranteed to be mutually orthogonal with respect to their HS inner product [11].

If the f1,…,fd become linearly dependent, their HS prism collapses to a lower-dimensional one, and det(D) vanishes. In the other extreme, if the observables are uncorrelated, the corresponding f1,…,fd are mutually orthogonal in terms of the HS inner product, i.e., they are maximally independent: their HS prism volume is simply given by the product of their vector lengths ||fi|| and hence det(D) = (||f1||·…·||fd||)2. In general, for correlated observables, the f1,…,fd are non-orthogonal and we have det(D)<(||f1||·…·||fd||)2. The ratio det(D)/(||f1||·…·||fd||)2 can be regarded as a composite measure of the degree of independence of the observables and this ratio, as discussed further below, depends only on the HS “angles” between pairs of fi-vectors, but not on their individual lengths ||fi||. Maximizing V(U) = det(D(U)) therefore requires a compromise between maximal mutual independence of all observables and maximal variance of each individual observable.

We note in passing that the Euclidean distance criterion [6] can also be expressed in terms of the (co-)variance matrix D: the right-hand side of [6] is the trace of D(U). However, in contrast to the volume criterion [9], for the Euclidean distance criterion [6], V(U) is simply the sum of the squared HS vector lengths, i.e., V(U) = trace(D(U)) = ||f1||2+…+||fd||2, and that is maximized when each individual HS vector length is maximal, that is, when each observable has maximal variance, regardless of any co-variance correlations between observables, as already discussed under [6].

Criterion 3: MINE by maximal observational independence

Based on these considerations, we propose one further MINE criterion which more strongly than [9] emphasizes independence of the observables. Instead of the original HS vectors fi, we use normalized HS vectors

equation image
(14)

to define a new “normalized” (co-)variance matrix or correlation matrix, denoted by E, analogous to [13],

equation image
(15)

This is the well known correlation matrix between the predictions [22]. Our proposed third MINE criterion is then to maximize

equation image
(16)

In contrast to [9], the variances of the observables do not affect det(E(U)), only their degree of linear independence does. In HS geometrical language, det(E(U)) is the squared volume of a prism spanned by the HS unit vectors g1,…,gd, and that volume is determined entirely by the pairwise “angular” relations between the gi, not by their individual lengths which are all fixed at ||gi|| = 1.

Such a MINE criterion is likely advantageous in applications where the HS vectors of the observables, f1,…,fd, are “almost” linearly dependent. The greatest gain in information from the next experiment is then likely achieved by improving the independence of the observables, rather than by maximizing their individual variances. This scenario of “almost” linearly dependent (i.e., highly correlated) observables is what we have in fact encountered, consistently, in our MINE calculations for the three Computing Life cycles reported in this work. Maximizing det(E(U)) is therefore the MINE criterion we have employed to guide the design of our cycle 1, cycle 2 and cycle 3 experiments, subject to additional numerical modifications now to be discussed.

There are several additional reasons why the det(E(U)) is the preferred MINE criterion. The correlation matrix is a well known measure of linear dependence between variables (i.e., the predictions F) as well as a well known measure of stochastic dependence when the predictions F are Gaussian over the ensemble. The det(E(U)) MINE measure is bounded between 0 and 1. The value of 1 denotes linear independence, and in the case of Gaussian predictions, complete stochastic independence of the predictions. The value 0 means perfect linear dependence of the predictions, and in the case of Gaussian predictions, perfect stochastic dependence. The measure is familiar and easy to interpret, and finally, the det(E(U)) has well known distributional properties, particularly when the predictions are Gaussian [22].

Volume collapse pathology

The lack of sufficient linear independence of the g1,…,gd HS vector set (or, equivalently, of the f1,…,fd set) is most easily diagnosed numerically by calculating the d eigenvalues of the D-matrix, denoted by λn = λn(U) and enumerated by n = 1,…d in descending order, with corresponding complete, orthonormal d×1 eigenvectors e(n). Since D is non-negative, so are the exact λn. Given the exact λn and e(n), we can decompose D into its eigenvector representation

equation image
(17)

and det(D) is simply the product of the exact λn. However, in our actual MINE calculations, we encounter the numerical difficulty that the HS vector set is numerically “almost” linearly dependent. This numerical pathology manifests itself in the fact that the ratio of smallest eigenvalue λd to largest eigenvalue λ1 becomes of order or smaller than the machine precision εmp. All eigenvalues λn for which the numerical λn1–ratio is comparable to or less than εmp are then dominated by rounding errors, i.e., they are numerically not calculable and neither, therefore, is det(D). In geometrical terms this simply means that, in the D-matrix characteristic ellipsoid, the ellipsoid half-axis along the corresponding eigenvector direction e(n) has “almost” collapsed to zero and is numerically indistinguishable from zero, √(λn) being the length of that half-axis. However, such an almost collapsed ellipsoid still contains useful information about the “range” swept out by the image-difference function ΔF which can be exploited for MINE design.

To remedy the “ellipsoid volume collapse” pathology, we therefore propose to introduce a numerically stable lower cut-off into the eigenvalue spectrum of D, by replacing λn with a modified eigenvalue μn according to

equation image
(18)

with a fixed “cut-off ratio” εcut = 10−10. This is typically at least 2 or 3 orders of magnitude larger than the machine precision εmp. An almost collapsed characteristic ellipsoid is thus “fattened up” to have a half-axis of at least √(εcutλ1) along every eigenvector direction. The numerically inaccessible exact det(D) is then replaced by the numerically stable determinant of a modified D-matrix,

equation image
(19)

with

equation image
(20)

for purposes of MINE calculations. To use the MINE criterion [16] instead of [9], the same cut-off procedure can be employed to generate a modified, numerically stable E-matrix with determinant det(E(cut)(U)). This is what we have actually done in the MINE calculations reported here. We also note in closing that the Euclidean distance criterion [6], V(U) = trace(D(U)), is numerically not affected by the ellipsoid volume collapse: trace(D) is the sum, not the product, of the D-eigenvalues and it is therefore dominated by the numerically well-controlled largest eigenvalues only.

Cycle 1–3 kinetics ensemble simulations and MINE calculations

In all MINE calculations reported here, the observables log(y1),…log(yd) were chosen to be the log-concentrations of the 3 clock RNA species which are represented in our network model shown in Fig. 4: log([frqr]), log([wc-1r]), and log([wc-2r]) where [wc-1r] is the combined total of the “r0” and “r1” versions of the wc-1-RNA, i.e., [wc-1r]: = [wc-1r0]+[wc-1r1]. Each of these 3 RNA concentrations was to be measured at 13 observation time points, tj with j = 1,…13; hence there were d = 3×13 = 39 data points y1,…yd to be observed in the next experiment and the index “i” in the above sections therefore represents both the time index j and the species index n, i.e., i↔(j,n) with n = 1,2,3 for the 3 clock RNA species. The observation times tj are chosen to have equidistant spacing tS, after an initial time lag of tL, measured from the starting time of the experiment, t = 0, where t = 0 is defined by the initial Light-to-Dark (L/D) transition. Hence,

equation image
(21)

In each MINE cycle, an ensemble simulation was performed for the kinetics model for Fig. 4, using the same ensemble simulation procedure as described in [7], to generate a “representative” MC sample of 40,000 random θ-vectors, drawn from the respective ensemble distribution Q(θ) for that cycle. A subset of 200 random θ-vectors from this sample was then used to calculate MC estimates for the ensemble expectation values E[…] for evaluation of the E-matrix via [15].

In the cycle 1 ensemble simulation, Q(θ) was constructed, as described in [7], from the same prior (“old”) experimental input data as shown in Figs. 2 and and44 of Ref. [7], taken from the literature, Refs. [3], [23], [24] and [25]. For the cycle 2 (cycle 3) ensemble simulation, Q(θ) was revised by adding the new experimental data, from the new cycle 1 (cycle 2) experiment to the cycle 1 (cycle 2) prior experimental data set. In addition, the cycle 2 and cycle 3 ensemble simulations included conidiation density data from the 48 hr artificial day, interpreted as a measure of the CCG protein concentration in Fig. 1, as described in [7]. These conidiation data were generated in a race tube experiment with a periodic light/dark (L/D) exposure with a 48 h period and are the data shown in the right-most panel of Fig. 1.

In all new experiments reported here, where light exposure was applied, the light intensity at the sample location was about 70 µmole(photons)/(s·m2) in Einsteinian units, or about 15 W/m2 in radiometric units, or about 5,300lux~490 ft-candles in photometric units, assuming a “cool white” spectral distribution, generated by our fluorescent light source. [The approximate conversion factors are 1 W/m2↔4.622 µmole(photons)/(s[center dot]m2)↔350.7lux = 32.58 ft-candles for the “cool white” spectrum, as given in [26].] By contrast, the light intensity was only 20 µmole(photons)/(s[center dot]m2), with unspecified spectral distribution, for the light exposure experimental data we have taken from the literature [25]. Since our kinetics ensemble simulations for cycle 2 and 3 incorporate both our new experimental data and the literature data [25] into the respective distributions Q(θ), we have assumed that both, the literature experiments and ours, were performed with the same photon spectral distribution. We have therefore modeled all light exposure experiments in terms of photon “concentrations” [entering into the reaction rate function G in [1]] which are chosen proportional to the respective light intensities. For all light-exposed experimental data used [25] or reported here, the light exposure was periodic, starting at the initial (t = 0) L/D transition, with a 50% duty cycle (i.e., the same duration of D and L) and a phasing of either D/L (i.e. dark first, then light) or L/D, as indicated in the discussion of the respective results. This time-dependent light exposure was modeled, as in [7], by a corresponding time-dependent periodic photon concentration of rectangular pulse shape, entering into in the rate function G.

In cycle 1, V(U) = det(E(U)) was maximized with respect to tS and tL, for a “next” experiment designed to measure the three clock RNA species in the dark. The resulting optimal MINE values of tS = 0 and tL = 5 h (see Fig. 5) were slightly modified, to tS = 0 and tL = 4 h, so as to keep the total duration of the cycle 1 experiment below the ~50 h limit imposed by experimental clock stability constraints in liquid cultures. The latter values of tS and tL were then used, without further adjustments, in all subsequent cycle 1, cycle 2, and cycle 3 RNA profiling experiments and in the corresponding cycle 2 and cycle 3 MINE calculations.

Figure 5
MINE calculation to determine when to start sampling (tL) and how often (tS).

In cycle 2, V(U) = det(E(U)) was maximized with respect to the period tP of the alternating light exposure, for a “next” experiment designed to measure the three clock RNA species subject to a 50% duty cycle and D/L phasing. From the resulting optimal range of tP ~20 h–24 h (see Results), tP = 24 h was chosen for the actual cycle 2 experiment.

In cycle 3, V(U) = det(E(U)) was maximized with respect to the “gene knock-down” transcription ratio (TR), and with respect to the choice of the gene species to be knocked down, for a “next” experiment designed to measure the three clock RNA species in the dark, but with one of the three clock genes subjected to an experimentally controlled reduction in its transcription rate coefficient. The resulting “most informative” gene was found to be wc-1, with a transcription rate coefficient reduced to TR = 10% of the wild-type value (see Results). In the actual cycle 3 experiment, this MINE-recommended knock-down value for wc-1 was approximated, within the limitations of experimental control, by a TR of about 30% of the wild-type value. The clock RNA data from this wc-1 knock-down experiment were then incorporated with all other prior (literature, cycle 1 and cycle 2) experimental data into the Q(θ) for one “terminal” ensemble simulation for the network in Fig. 2 and and4.4. The clock RNA results from this terminal ensemble simulation, along with the respective new RNA data generated in the 3 MINE cycles, are shown in Results for the network in Fig. 4.

Perturbing the genetic network

Once the MINE perturbation experiment is designed, the perturbation is implemented at the next stage in the cycle in Fig. 3., as now described.

Strains

All but one strain used (namely OR-74A below) carry a band (bd) mutation, permitting the observation of the clock in race tubes. The bd mutation (Fungal Genetics Stock Center 1858) was used for the first series of microarray experiments in the dark as well as the second series of microarray experiments examining a light-response. Strain 87-84-6-8 [7] carrying a mutation in wc-1 and an inducible copy qa:wc-1+ at the his-3 locus was used in the dial down of wc-1 expression [7]. Dial-down was achieved by shifting liquid cultures from 0.3% quinic acid to 2% galactose. Strain 74A-OR23-1A was used in shift experiments from sucrose (1.5%) to quinic acid (.3%) as a control to identify QA inducible genes [27]. Strain 93-4 (frq) qa:frq+ transformed with pDE3dBHqa:frq+ was kindly provided by Deborah Bell-Pedersen (Biology Department, Texas A & M University) to test for auto-feedback loops in wc-1 and wc-2.

Liquid Growth Conditions for harvesting RNAs

Establishment of liquid cultures followed Nakashima [28] and [5], and the cultures were grown for 48 hrs in petri plates. Half-cm mycelial disks were cut from the mat and dropped into 500 ml flasks with 100 ml Fries+2.0% glucose+.5% arginine+supplements [29].

Cycle 1

In the first series of microarray experiments to identify circadian genes (cycle 1) all flasks were placed in a shaker (New Brunswick Scientific, Edison, NJ, Series 25) at 150 rpm and at 25°C for the same period of time, 50 hrs, and also were given at least 2 hrs of 70 micromoles per second per meter squared (µM/s/m2) before L/D (light to dark) transition. A total of 13 flasks were harvested by vacuum filtration, one flask every 4 hrs starting at time 0, the L/D transition time, in such a way that the total growth time of each liquid culture was kept constant [42]. After the L/D transition initiating the experiment, the flasks were shaken at 150 rpm and left in the dark. Cells were placed at −70°C to await RNA harvesting using a High Pure RNA kit (Roche, Inc.). The harvested RNA was subjected to RT-PCR and then microarray analysis as described below.

Cycle 2

In the second series of microarray experiments to study the light-response (cycle 2) the total growth time was not controlled, and 24 hrs after the L/D transition the light (70 µM/s/m2) was turned back on for 24 hrs. 17 flasks were harvested at time points 0 hr, 4 hrs, 8 hrs, 12 hrs, 16 hrs, 20 hrs, 24 hrs, 24 hrs+20 m, 24 hrs+40 m, 25 hrs, 26 hrs, 28 hrs, 32 hrs, 36 hrs, 40 hrs, 44 hrs, and 48 hrs.

Cycle 3

In the third series of microarray experiments to study the WCC-response (cycle 3), prior to the L/D transition, mycelial disks were transferred into 500 ml flasks with 100 ml Fries+0.3% quinic acid [29]. Fourteen flasks received a total of 4 hrs of 70 µM/s/m2 before L/D transition. Cultures were transferred by vacuum filtration to new 500 ml flasks with 2% galactose+Fries medium and placed in the dark. Flasks were harvested at 0, 10 m, 20 m, 30 m, 40 m, 50 m, 1 hr, 2 hrs, 4 hrs, and 8 hrs. Four additional replicate 0 time points were harvested as well.

Observing the outcome of the perturbation

Observing the outcome of the perturbation completes the cycle through Fig. 3. Measurements are made on the system by a combination of race tube assays, real-time polymerase chain reaction (RT-PCR), and profiling with oliginucleotide arrays, as now described.

Race tube assay for biological clock

Starter cultures were made on 0.1% glucose+0.17% arginine+Vogel's medium [29] and subject to a 23 µM/s/m2 light source. Conidia were filtered with glass wool as described in [30] and used to inoculate race tubes layered with 20 ml of 0.1% Glucose+0.17% arginine+Vogel's medium or 0.001 M quinic acid+0.17% arginine+Vogel's medium [29]. A total of 135 replicates tubes were inoculated and either subject to low (23 µM/s/m2) or high intensity pulses (70 µM/s/m2) once per hour for 90 sec over a twelve hour period or subjected to no light pulses to measure period and a phase response as described in [30] and [31].

RNA isolation

RNAs were isolated using the High Pure RNA isolation kit (Roche, Inc.). Their quality and quantify was assessed using an RNA Nano LabChip (Agilent Technologies, Inc.). Generally only samples with a ratio of at least 1.00 for 28S/18S rRNA on the LabChips were used.

Real-time PCR to validate microarray experiments

Results on Combimatrix chips were cross-validated by real-time polymerase chain reaction (RT-PCR). cDNAs were synthesized from 1.6 µg total RNA using a High-Capacity cDNA Archive (synthesis) kit (Applied Biosystems, Inc.). The wc-1, wc-2, frq, kal-1, rpn-4, rrg-1, pab-1, rok-1, lhp-1, and rRNA cDNAs were detected by RT-PCR (ABI-Prism 7500, Applied Biosystems, Inc.) according to manufacturer directions using Taq Man probes against an rRNA standard. Triplicate reactions (50 µl) were analyzed using the ΔΔC. method (Applied Biosystems, Inc.).

Design of 12K Oligonucleotide Arrays (Combimatrix, Inc.)

These arrays were constructed with Version 3 of the Neurospora crassa genome sequence [32] from a file labeled neurospora_3_gene_dna_3205.txt downloaded from the Broad Institute Web site. From this sequence Combimatrix, Inc. designed 12,000 oligonucleotides of ~35 nt to synthesize electrochemically on their chips. Several genes were represented multiple times with oligonucleotides derived from the following genes: wc-1 (5 duplicates); wc-2 (5); frq (5); rDNA (>7), qa-x (5); qa-2 (5), qa-4 (5); qa-3 (5); qa-y (5); qa-y (5); qa-1F (5); qa-1S (5). In addition, 633 negative control oligonucleotides were added from bacterial, plant, and λ-phage sequences as well as features with no oligonucleotides and quality control oligonucleotides. The rDNA derived oligonucleotide sequences were treated as negative controls in the microarray experiments as well because none of the at least eight rDNA-derived oligonucleotides on each chip showed up in an expanded Fig. 6 with 4721 circadian genes [33] with or without a known LRE upstream. In addition 22 duplicate oligonucleotides from each of 4 distinct λ-phage sequences were added to be used as positive controls for all but 4 arrays (by spiking them into the amplified RNA (aRNA) probe). These positive and negative controls were scattered at random on each array. The arrays with their design are at http://www.yale.edu/townsend/Links/ffdatabase/introduction.htm [34]. Their accession numbers are 13, 34, and 36 for cycle 1, cycle 2, and cycle 3, respectively. The same design was used on all Combimatrix chips except for samples 48 hrs, 44 hrs, 40 hrs, and 36 hrs on cycle 1.

Figure 6
Transcriptional profile of approximately 2436 putative genes with LREs upstream at 0, 4, 8, …, 48 hrs (values staggered on x-axis) after shift from light to dark (L/D) after background subtraction, normalization within arrays relative to the grand ...

RNA amplification and oligonucleotide array hybridization

750 ng of total RNA (as determined by a Nano LabChip (Agilent Technologies, Inc.)) was subjected to one round amplification using the MessageAmp aRNA Amplification Kit (Ambion Inc), which uses an “Eberwine type” amplification. Biotin-11-UTP and CTP (Enzo Life Sciences, Inc.) were incorporated during the in vitro transcription reaction. A total of 5 µg of amplified RNA (aRNA) was fragmented, A total of 10 pM each biotinylated spike-in oligonucleotide (phage) was added with hybridization solution, and hybridized according to manufacturer's protocol rev 2.03 (http://www.combimatrix.com). Hybridization was performed at 45°C for 24 hrs using a 25% formamide based solution. Washing was done according to manufacturers protocol rev 2.03. Streptavidin Alexa Fluor® 647 conjugate (Invitrogen) was used at a final concentration of 1.0 µg/ml to visualize hybridized targets. Laser confocal scanning was performed on a GSI Lumonics ScanArray 5000 (now manufactured by Perkin-Elmer, Inc.) using a single laser power and a photomultiplier (PMT) gain setting adjusted less than 10% between arrays. Versions of image software MI_Version_5_4_3, MI_Version_5_5_0, MI_Version_5_6_0, and MI_Version_5_7_0 (Combimatrix, Inc.) were used to obtain spot intensities (such as median foreground count) on each array feature for microarray analysis.

Quality Control on RNAs

RNA samples was confirmed to have a ratio of at least 1.00 for 28S/18S rRNA on the LabChips (Agilent Technologies, Inc.). RNAs used for aRNA synthesis and hybridization to chips were visually scanned for trends in the foreground median count in control sequences in the (x,y) coordinates. For the 4 λ-oligonucleotides spiked into each aRNA, the coefficient of variation (CV) in median foreground count was computed, and if the chip had a CV greater than 0.65 (n = 88), the sample was usually not used and redone. All chips were verified to have 51% of its features above median background (with the exception of samples 48 hrs, 44 hrs, 40 hrs, and 36 hrs on cycle 1, which were at 44–45%). This percentage (51%) of identified genes with expression above background is higher than that reported (38%) in [35] and close to the 60% reported in [36]. RT-PCR was done in parallel on all RNA samples, and if the relative quantification by RT-PCR with ΔΔC method did not agree quantitatively with the median foreground counts obtained on wc-1, wc-2, and frq from the Combimatrix chip hybridizations, the aRNAs and hybridizations to oligonucleotide arrays were redone. Two samples on cycle 1 and two samples on cycle 2 required redoing. Four samples of cycle 3 required redoing because of gradients on the foreground median counts on the arrays. Data are deposited in the Neurospora crassa functional genomics database at http://www.yale.edu/townsend/Links/ffdatabase/introduction.htm [34] under accession numbers 13 (cycle 1), 34 (cycle 2), and 36 (cycle 3).

Microarray Analysis

The median foreground (FG) counts were used on all 12K features. From each median foreground count on an oligonucleotide array a background subtraction was performed using the 5th percentile of the following negative control oligonucleotide features: (1) plant; (2) bacterial; (3) phage not spiked into aRNA; (4) quality control oligonucleotides (QC); and (5) empty. Then the median foreground counts were normalized within arrays by multiplying each feature's median FG count on a particular chip x (median FG across all chips in the cycle) / (median FG count on the particular chip). A MIPS functional classification was assigned each feature on a chip [37]. Hierarchical clustering of genes was implemented using the methods proposed in [38] and implemented in Cluster 3.0 [39] available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Options selected for analysis were log transformation, mean-centering and normalization followed by average linkage (i.e., UPGMA) using Euclidean distance. Trees were displayed with Java TreeView 1.0.12 [40] available at http://treeview.sourceforge.net.

Searching for WCC and QA-1F binding sites in silico

Putative WCC GATX-binding sites or Light-Response Elements (LREs) were identified with the program pattern (Accelrys, Inc.) operating on the 1000 nt upstream of each identified gene [32] from file neurospora_crassa_#10BD2C.fasta from the Broad Institute Web site. The offset used was 1 and overhang 0. A mismatch of 2 was allowed. Patterns searched for were:

  • GATG{5,20}GATG{5,20}
  • GATC{5,20}GATC{5,20}
  • GATA{5,20}GATA{5,20}
  • GATT{5,20}GATT{5,20}.

Putative QA-1F-binding sites were identified with the program pattern (Accelrys, Inc.) operating on the 1000 nt upstream of each identified gene [32] from file neurospora_crassa_#10BD2C.fasta from the Broad Institute Web site. The offset used was 1 and overhang 0. Patterns [41] searched for were:

GGATAA{4}TTATCC, GGRTAA{4}TTATCC, GGGTAA{4}TTATCC, GGATAA{4}TTATCC, GGGTAA{4}TTAAGC, GGTTAT{4}TCATCC, GGATGA{4}TTAACC, GGCTAA{4}TTAACA, GGGTAA{4}TTTTCC, GGCAAA{4}TCATCC, GGATAA{4}TAACCC, GGGGAA{4}TTATAG, GGATGA{4}TTCTCC, GGCGAA{4}TTACCC, CGTTAA{4}TTATTC, and GGCTCA{4}TCATCA.

Results

Genetic networks for the biological clock guide the MINEing for clock-controlled genes

The genetic network models in Figs. 2 and and44 [7], [43] make three predictions about each gene in the genome under clock-control: a clock-controlled gene should: (1) maintain an endogenous circadian rhythm when the organism is grown in the dark; (2) be light-responsive when the organism is moved from Dark to Light (D/L); (3) change its expression when the level of the transcription factor WCC is dialed down (see Materials and Methods). The Computing Life paradigm is used below to discover these clock-controlled genes.

Maximally Informative Next Experiment (MINE)

The objective of the MINE approach is to develop a quantitative criterion (or criteria) for the amount of additional information that can be gained about the genetic network from the “next” experiment to be performed; and then to maximize this “measure of additional information”, denoted by V(U), with respect to the choice of the design or “control” parameters of the next experiment, denoted by the control vector U. Control vector U comprises all those parameters which are known to, and are to some extent controllable by, the experimenter and which completely characterize measurements to be performed and the external conditions and perturbations applied to the biological system during the experiment. Two critical inputs for the MINE calculation are the underlying network kinetic rate equation model of the genetic network and any available “prior” or “old” experimental data. In a recently developed ensemble simulation approach [7], [11], these two inputs are combined both to constrain the unknown kinetics model parameters and to predict the likely information content V(U) for the next experiment, given U. Technical details and underlying conceptual ideas of the MINE approach are described in the Materials and Methods. Here we have used one of the MINE criteria in the Materials and Methods, V(U) = det(E(U)), Eq. (16), to guide the design of new experiments on the biology of the clock. This criterion is the determinant of the ensemble correlation matrix E(U) between predictions. The predictions here are of the log concentrations of wc-1, wc-2, and frq mRNAs over time in the next experiment.

When the predictions of two models in the fitted ensemble (the collection of all models consistent with available data) are highly correlated, the models will be difficult to distinguish by the next experiment U; when the predictions of two models in the ensemble are less correlated, they will be more easily distinguished in the next experiment U. A higher value of the MINE criterion V(U) recommends the experiment U for which predictions between any two randomly selected models in the fitted ensemble are more uncorrelated and hence more distinguishable. Each MINE calculation is done within the constraint of a fixed budget (i.e., 13 microarray chips per experimental cycle or equivalently, 13 time points to be sampled). The budget and hence the number of time points determine the dimension of the correlation matrix E(U).

Two possible hypotheses have been developed for the clock mechanism in Figs. 2 and and4,4, from [7] and [33]. An older and slightly more realistic genetic network [33], [43] in Fig. 4 was used to guide the MINEing because the simpler genetic network in Fig. 2 was developed while the MINE experiments were underway. The older network [33] allows a light and dark form of WCC [4]. At the conclusion of the Computing Life enactment, these two different networks are tested against each other.

Cycle 1 - Which genes are circadian?

The first series of microarray experiments were designed to determine how many genes are under clock-control. If such genes were outputs of the clock mechanism in Fig. 4, then they should be able to maintain an endogenous rhythm of ~22 hrs (hrs) in the dark. The first experiment involves growing the organism in the dark for 48 hrs to observe the endogenous rhythm. The initial MINE design question concerns how often should we sample and when should we start sampling. The spacing between observations is denoted by tS, and the delay till the first observation by tL. The maximum in spacing (tS) is limited by the time over which circadian rhythms are maintained in liquid culture and the cost constraint of 13 microarray chips (see Materials and Methods). A MINE calculation using published data [3], [23][25] results in Fig. 5, based on the genetic network in Fig. 4.

The “best” experiment, with maximum det(E(U)) – in the upper back corner of Fig. 5 – is to start sampling immediately and to use the maximum spacing of 4 hrs between observations. This was the microarray experiment, performed (see Materials and Methods) with the results shown in Fig. 6. These experimental results would suggest that as many as 43% of the genes could be clock-controlled. A more detailed statistical analysis below reduces this percentage to 25%. There are 2436 (22%) circadian genes with light response elements (LREs) upstream [4] out of 11,000 genes, which is still considerably higher than 2–10% of circadian genes reported for Drosophila [44], [45] and Arabidopsis [46][48], and 10% higher than that reported in [8] for Neurospora. Our percentage, however, is not out of line with estimates of 36% based on in vivo enhancer traps in Arabidopsis [49].

In addition to the 11,000 N. crassa genes on each chip (including 43 genes used as positive controls), the chips carried 633 negative control oligonucleotide sequences including those derived from plant, bacterial, phage, and N. crassa rDNA sequences. The empirical false positive and false negative rates are reported in Table 1 for each microarray experiment. For the first microarray experiment (cycle 1), the empirical false positive rate is 18% with the nominal significance level of the periodicity test used. Of these 4721 circadian genes, 2436 of them have a LRE upstream [4]. With a multiple test correction suggested for microarray analysis by Storey [50] and implemented as in Benjamini and Hochberg [51], the number of circadian genes with upstream LREs drops to 1460. In this multiple test correction the ranked list of genes sorted by their P-values is simply trimmed from the high end near the nominal significance level (of P-values) to control the False Discovery Rate (FDR) as described in [52]. The target FDR is set to the nominal significance levels in Table 1. If we subtract the 18% of false positives, then 43−18 = 25% of the genome would appear to be under clock-control. The estimated percentage of circadian genes (25% corrected for false positives) is close to the uncorrected percentage of circadian genes with LRE(s) upstream (22%). Requiring the presence of an upstream LRE appears to be a good filter for circadian genes.

Table 1
Observed fraction of false positives and false negatives among 633 negative controls on each microarray chip (see Materials and Methods) and among 43 distinct genes as positive controls using reported clock-associated genes [93].

Circadian oscillations are seen in Fig. 6 along rows by the alternating pattern of red (high expression) and green (low expression) for different genes, albeit with different phases. There also appear to be two clusters of known clock-associated genes with similar transcriptional profiles at the top and half way up Fig. 6 with different phases. The distribution of periods of oscillation is found in Fig. 7A with a mean of 24.92+/−0.09 hrs, implying the oscillations are circadian as predicted. This compares well with the average period between conidiating bands of 21.64+/−0.05 hrs obtained from 135 race tube assays (see Materials and Methods).

Figure 7
A. Blue bars show the frequency (count/2436) of 2436 genes in Fig. 6 by their period of oscillation in hrs.

The phase of the clock-controlled genes is also interesting in Fig. 7B. Dusk (L/D transition) is taken as the zero time. The phase [var phi] of 100% corresponds to being 360 degrees out of phase with genes at midnight. There are morning and evening genes in Fig. 7B. The frq mRNA has a phase of 48%+/−16% as a typical morning gene, and the wc-1 mRNA has a phase of 69+/−12% as a typical dawn gene, being a blue-light receptor [4]. RNA metabolism genes tend to be dawn genes as well with a mean of 62+/−5% (Fig. 7B), while regulators tend to be dusk genes with a mean of 28+/−9%. Cell cycle genes among clock-controlled genes (in red in Fig. 7B) tend to be morning genes with a mean of 47+/−8%, as is the cell cycle checkpoint kinase prd-4 [53]. This is consistent with light triggering conidiation in Fig. 1. The phase of genes may provide some clues as to how the clock allows the organism to adapt to its environment (see Discussion).

A naïve expectation might be that while the phase would vary between different ccgs, as in Fig. 7B, the gene periods in Fig. 7A would be expected to be the same. Several possible causes for this variation in period of clock-controlled genes in Fig. 7A present themselves. There is noise in mRNA profiling measurements on which the period estimates are based; there is also intrinsic noise in mRNA levels from cell to cell [54]; and we are observing the system over a short time interval covering only 2 periods of oscillation. As illustrated in [7], it can take longer than 2 periods before the limit cycle is established and, during the transient prior to that, neither period nor phase of oscillation are well-defined. The finite observation time also limits the accuracy of the measured period by way of an uncertainty principle [55]: the shorter the observation time, the greater the uncertainty in frequency and period.

Cycle 1 microarray results were validated by RT-PCR (see next section) on twelve genes including wc-1, wc-2, and frq relative to an rRNA standard with excellent quantitative agreement (Fig. 8). The surprise was seeing oscillations in the wc-2 mRNA with a period of 22.17 hrs+/−1.66 hrs, which have not been reported before (see validation in the next section). The presence of a LRE upstream of the wc-2 gene would suggest adding additional feedback loops to wc-1 and wc-2 in Figs. 2 and and44 to make them autoregulatory. Evidence for an autoregulatory loop for wc-1 has recently been provided [56]. Oscillations in wc-1 mRNA are weak if plotted on the same scale as frq mRNA levels, as expected [24], [7]. The periods for frq, wc-1, and wc-2 mRNA oscillations of 21.40+/−1.69 hrs, 23.5+/−2.47, and 22.17+/−1.66, respectively, in Fig. 8 agree well with the period of banding in race tubes above, namely 21.64+/−0.05 hrs [31], [33].

Figure 8
RT-PCR results for cycles 1–3 validate results of oligonucleotide arrays for wc-1, wc-2, and frq mRNA levels in cycles 1–3 of the Computing Life paradigm.

RT-PCR confirms microarray results in cycle 1

The cycle 1 microarray results were validated by RT-PCR (see Materials and Methods) on twelve genes including wc-1, wc-2, and frq relative to an rRNA standard with excellent quantitative agreement (Fig. 8). This period of oscillation in wc-2 mRNA (22.17 hrs+/−1.66 hrs) in Fig. 8 is not significantly different from that of the frq mRNA (21.40 hrs+/−1.69 hrs). The cycle 1 experiment was repeated in its entirety as well as the measurement of wc-2 mRNA levels by RT-PCR with almost the same results (results not shown).

RNA metabolism genes

As a confirmation of microarray results on RNA metabolism genes, an entire replicate of the cycle 1 experiment was conducted (results not shown and same replicate used for validating wc-2 microarray results), and the levels of LHP1, PAB1, and ROK1 homologs' (denoted lhp-1, pab-1, and rok-1 in N. crassa) mRNAs were measured every 4 hrs over a 48 hr window in the dark by RT-PCR (See Materials and Methods) in two replicates of cycle 1 (including the original cycle 1 experiment in the dark. A combined estimate of the amplitudes based on 26 time points was tested with an F-test (as described in the legend of Fig. 6) and found to be F1,24 = 5.55, P = 0.04 for the lhp-1, F1,24 = 3.60, P = 0.07 for pab-1, and F1,24 = 4.96, P = 0.04 for rok-1. All three genes had an estimated period of 17 hrs.

Regulators

An entire replicate of the cycle 1 experiment was conducted (results not shown and same replicate used for validating wc-2 microarray results), and the levels of kal-1 and rpn-4 mRNAs were measured every 4 hrs over a 48 hr window in the dark by RT-PCR (see Materials and Methods) in two replicates of cycle 1 (including the original cycle 1 experiment in the dark). A combined estimate of the amplitudes based on 26 time points was tested with an F-test (as described in the legend of Fig. 6) and found to be significant (F1,24 = 4.95, P = 0.04 with a period of 23 hrs for kal-1 and F1,24 = 6.65, P = 0.02 with a period of 30 hrs for rpn-4).

Signaling

The gene rrg-1 has a role in the osmotic response signaling pathway [57][58]. An entire replicate of the cycle 1 experiment was conducted (results not shown and same replicate used for validating wc-2 microarray results), and the levels of rrg-1 mRNA were measured every 4 hrs over a 48 h window by RT-PCR (see Materials and Methods) in two replicates of cycle 1 (including the original cycle 1 experiment in the dark). The combined estimate of amplitude based on 26 time points was tested with an F-test (as described in the legend of Fig. 6) and found to be significant (F1,24 = 6.82, P = 0.02) with a period of 17 hrs.

Cycle 2 - Which genes are light-responsive?

Each of these 2436 circadian genes from cycle 1 could be under the control of: (1) WCC; (2) a different oscillator [8]; or (3) multiple oscillators [59]; or be false positives. The chance of the latter is only 18% (see Table 1). As shown in Fig. 1, an important element to the clock is light-entrainment. As the organism is grown under different “artificial days”, that is, different periods of alternating light exposure, the organism speeds up or slows down its biological clock.

If these genes were under WCC-control, then they should also be light-responsive according to the genetic network hypotheses in Figs. 2 and and4.4. This poses the question of what artificial day period should be used for the experiment. A MINE calculation results in Fig. 9 using published results [3], [23][25] plus the data in Fig. 6 (cycle 1), i.e., the MINE calculations are cumulative with respect to data already obtained.

Figure 9
MINE calculation to determine what artificial day to use in cycle 2.

The MINE results, shown in Fig. 9, suggest a long artificial day with a half-period of daylight of between 19 and 24 hrs. A second cycle of microarray experiments was therefore performed in which the light was turned back on after 24 hrs in a 48 hour observation period (see Materials and Methods). Results are shown in Fig. 10. Among these 3374 light-responsive genes (or 31% of N. crassa genes), 1725 (or 16% of the genes) responded to light in Fig. 10 and possessed LREs upstream. With the Benjamini and Hochberg [51] multiple-test correction, 1026 out of these 1725 genes with upstream LREs remain significant. In a similar experiment Lewis et al. [60] report detecting 22 light-responsive genes induced out of 1343 distinct genes arrayed as cDNAs, or 3%, and Ma et al. [61] report 34% of the unique genes in Arabidopis thaliana induced or repressed. Among the 31% of light-responsive genes detected here, up to 17% could be false positives, leaving 14% = 31%−17% as light-inducible. Among the 31% of light-responsive genes, 56% were induced (as opposed to repressed). The percentage of 0.56×14% = 8% is still higher than the 3% of Lewis et al. [60] (and less than the 31% of A. thaliana). The estimated percentage of light-responsive genes (14% corrected for false positives) is close to the uncorrected percentage of light-responsive genes with LRE(s) upstream (16%). Requiring an upstream LRE also appears to be a good filter for light-responsive genes.

Figure 10
Transcriptional profile of approximately 1725 genes with LREs upstream at 0, 4, 8, …, 24, 24.3333, 24.6667, 25, 26, 28, …, 48 hrs (values staggered on x-axis) after shift from light to dark (L/D) followed by D/L transition 24 hrs later, ...

A total of 768 genes were both circadian and light-responsive (Fig. 11), and the chance that any one of these 768 genes is a false positive would be (0.18)×(0.17) = 0.03 (Table 1), since the experiments were done independently. These 768 genes then remain candidates for ccgs in Fig. 2 or or4.4. The response by genes to light falls into two clusters, one cluster being turned off (in the top part of Fig. 10) and one cluster being turned on (in the bottom part of Fig. 10). The positive response of some genes to light appears largely transient with a burst of expression after the Dark-to-Light (D/L) transition while other genes appear to have a sustained response after the D/L transition. Most of the known clock-associated genes fall within the bottom cluster of light-responsive genes with LREs upstream, as expected.

Figure 11
Classification of 4380 N. crassa genes with upstream LREs in a Venn Diagram by their response in each of the three microarray experiments: (1) cycle 1 (assay for circadian rhythm); (2) cycle 2 (assay for light response); and (3) cycle 3 (assay for response ...

As a control, these results from cycle 2 were compared with a near replicate of this experiment reported in [60], using a different microarray technology and only a sample of the genes in N. crassa. With a power of 53% (Table 1) in our experiments, we would expect to see around 53% concordance with the experiments reported in [60]. In fact, we saw 64%+/−20% of the genes reported as light-responding in [60] in our cycle 2 experiments with good agreement to the 53% expectation.

Cycle 3 - Which genes are under WCC-control?

Another prediction of the genetic networks in Fig. 2 or or44 is that if WCC were dialed down (i.e., the mRNA level of wc-1 is reduced by use of a QA-inducible promoter as described in Materials and Methods), then a gene under its direct or indirect control should experience a sudden change in its mRNA level. To test this with a gene knock-down experiment, it is necessary to ascertain first what gene should be perturbed to yield maximum information about the genetic network in Fig. 4. A MINE calculation was done using published results [3], [23][25] plus the data in Fig. 6 (cycle 1) and the data in Fig. 10 (cycle 2) as described in Materials and Methods.

The MINE calculation in Fig. 12, suggests that the most informative knock-down is to reduce wc-1 to 10% of its original transcriptional activity. As detailed in Methods and Materials, the knock-down was engineered with a mutation in the native wc-1 and with a quinic acid inducible copy of wc-1+ introduced at another locus, producing a knock-down to 30% of its original activity; the results are shown in Fig. 13. A total of 4655 WCC-responsive genes were found to respond, but only 2323 of these genes had a LRE upstream as reported in Fig. 13. With the Benjamini and Hochberg [51] multiple-test correction, 1445 out of these 2323 genes with upstream LREs remain significant. The estimated percentage of WCC-responsive genes (20% corrected for false positives) is close to the uncorrected percentage of WCC-responsive genes with LRE(s) upstream (21%). Requiring an upstream LRE remains a good filter for WCC-responsive genes.

Figure 12
A 90% knock-down of the wc-1 gene is the MINE experiment.
Figure 13
Transcriptional profile of approximately 2323 genes with upstream LREs at 0, 0.1667, 0.3333, 0.5000, 0.6667, 0.8333, 1, 2, 4, 8 hrs (time points appear on x-axis) after shift simultaneously from light to dark (L/D) and from quinic acid (0.3%) ...

Most of the frq gene mRNAs were dialed down, as expected. The frq gene belongs to a cluster of genes being turned off in the top part of Fig. 13. Both wc-1 and wc-2 responded differently than frq and belong to the second larger cluster of genes being turned on (at least transiently) in the bottom part of Fig. 13. They have a fast transient response about 20–40 m after the QA/GAL (L/D) transition and then a drop off (see Fig. 8).

Also of interest are the 440 genes that are both circadian and light-responsive, but not under WCC control in Fig. 11. These 440 light-responsive and circadian genes which are not apparently under WCC-control, could be responding indirectly to genes under the control of WCC. They could also be false-negatives under the WCC-responsive assay (Table 1); they could be responding through a yet to be identified oscillator [8]; or they could be responding through multiple oscillators [59]. The chance that any one of them is a false positive is 0.03 = (0.18)×(0.17). The genes cpc-1, NCU05429, and ccg-16 (WO6H02), have been previously identified as candidates for being under the control of another oscillator [8]. All three were found to be circadian here (confirming the results in [8]), and cpc-1 and ccg-16 were found to be light-responsive as well. All of them were found not to be WCC-responsive here, although in the case of ccg-16 this appears to be a false negative [59].

Lewis et al. [60] conducted a related experiment that over-expressed wc-1. Only 18% of the induced genes corresponded to ones that we detected. A similar result was seen with overexpression of CLOCK in Drosophila [36]. There could be a variety of explanations, but it is not unexpected in a signaling system that over-expression leads to a different outcome than a knock-down, particularly when there are other coupled interacting pathways to those in Fig. 2 or or4.4. See, for example, the platelet-derived growth factor β receptor (PDGFRβ) signaling system [63] or the sonic Hedgehog (shh) signaling system in neurogenesis [64], where high and low levels of shh have different neurogenetic outcomes. The response of the clock to wc-1 over-expression is apparently not the same as lowering wc-1 expression. An additional MINE calculation analogous to Fig. 12 (results not shown) also suggested that that an over-expression experiment would not be as informative as a knock-down.

The Computing Life paradigm has led us to the discovery of 328 clock-controlled genes supported by all three series of microarray experiments and having an upstream LRE. Among these 328 genes, the chance of a false positive is (0.18)×(0.17)×(0.22) = 0.0067 (Table 1), the three microarray experiments having been done independently. A total of 104 of these 328 genes survive the multiple test correction in all three cycles [51]. These genes satisfy the three predictions of the genetic network and constitute clock-controlled genes (Fig. 11). Of these 328 clock-controlled genes, 314 of them are distinct on the arrays (some genes are represented multiple times; see Materials and Methods).

Direct test of the auto-feedback loops activating wc-1 and wc-2

All three cycles of microarray experiments support the presence of auto-feedback loops for WCC activating wc-1 and wc-2. In cycle 1 there was evidence that wc-1 and wc-2 mRNAs were circadian in Fig. 8. In cycle 2 there was a fast light-response by wc-1 and wc-2 (of less than an hour) in Fig. 8. In cycle 3 both wc-1 and wc-2 were WCC-responsive in Fig. 8 and have upstream LRE elements. An experiment with a short 6 hr artificial day is predicted by a MINE calculation in Fig. 9 to be highly informative about the genetic network in Fig. 4. A prediction of the genetic network in Fig. 4 is that the auto-feedback loops added should permit entrainment to a short artificial day of 6 hrs duration (as in Fig. 1) independent of FRQ. To test this hypothesis, a strain (93-4) with a frq mutation was subjected to a short artificial day as seen in Fig. 14A. As can be seen, the rapid conidiation pattern with frq in Fig. 14A is indistinguishable from a bd mutant in Fig. 1. To rule out that the conidiation response to a short artificial day is under the control of an independent light-response pathway, a mutant in bd, his-3, wc-1 (87-84-6) was generated by a cross, bd his-3 (87-84)×wc-1 (FGSC 3914). As can be seen (Fig. 14B), the wc-1 mutation almost entirely removed banding under the artificial day of 6 hrs. To confirm this finding, the bd, his-3, wc-1 (87-84-6) strain was transformed with a plasmid containing a QA-inducible wc-1+ as described ([7] and Materials and Methods) and was found to band weakly when wc-1+ was induced and not to band when wc-1+ was not induced (results not shown). This establishes that banding under a short artificial day is under the direct clock control of wc-1. In a similar entrainment experiment a double mutant wc-2KO, bd from the cross wc-2KO (FGSC 11124, 65)×bd (FGSC 1858) was subjected to a 6 hr artificial day. As can be seen in Fig. 14C, wc-2 also nearly removed all banding. As a final confirmation of these experiments, the frq gene was over-expressed on .001 M QA as well as turned off on glucose in a strain with a QA-inducible copy of the frq gene (see Materials and Methods), and these two conditions had no effect on the rapid banding (results not shown). These results serve to confirm that the rapid light response to a 6 hr day is due to the auto-feedback loops to wc-1 (and wc-2) and not due to the FRQ oscillator.

Figure 14
Light entrainment response under short artificial day by frq and wc-1 mutations provides evidence for autofeedback loops on wc-1 and wc-2 in Fig. 2.

The clock mechanism of three genes wc-1, wc-2, and frq is pleiotropic in its effects on metabolism

These 314 clock-controlled genes identified are involved in a broad range of biological functions: DNA metabolism, replication, repair, cell cycle, RNA metabolism, transport, carbon and energy metabolism, isoprenoid biosynthesis (including carotenoids), development, and signaling (Fig. 15A). Periods and phases of all 295 (314–19; see below about 19) ccgs are similar in distribution to all circadian genes (Fig. 7) with upstream LREs.

Figure 15
The clock of N. crassa has 295 distinct clock-controlled genes of diverse function as outputs.

The connection of the clock to development has been reported (ccg-2, ccg-4, ccg-6, and con-6). A recent connection to RNA metabolism has been through the frequency RNA helicase (frh) gene, whose product FRH co-immunopreciptates with FRQ [66]. The microarray experiments here identified frh and 16 additional genes in RNA metabolism under clock control (Fig. 15B). In addition to frh, 4 additional genes with products homologous to ATP-dependent RNA helicases in S. cerevisiae, namely ROK1, HAS1, PRP16, and RRP3, are among the 295 clock-controlled genes. At least three of these RNA helicases are involved in ribosomal RNA processing. While ribosome transcription is not under clock control (the ribosomal RNAs are not circadian in Cycle 1), almost all of the ccgs in RNA metabolism are involved in ribosome processing and assembly, i.e. ribosome biogenesis. These include SEN2, SOF1, LHP1, RRP3, POP4, UTP5, RCL1, ABD1, and PAB1 homologs in S. cerevisiae. The yeast PAB1 is a poly-A binding protein, and LHP1 is another distinct RNA-binding protein involved in the maturation of tRNAs and snRNAs. LHP1 has been implicated not only in the biogenesis of noncoding RNAs, but a recent ChIP/chip experiment in S. cerevisiae has demonstrated that it targets a number of mRNAs [67]. The RNase P (POP4) binds to the RPR1 RNA, which is also a target of Lhp1p, and the PAB1 mRNA is also apparently a target of Lhp1p in yeast as well [67]. Inada and Guthrie [67] also report enrichment of the gene encoding the snoRNA U3 among targets of Lhp1p. The product of UTP5 is part of the processome containing the U3 snoRNA and involved in ribosome biogenesis. The clock's regulation of the ribosome appears to occur through its biogenesis rather than its transcription. This is a novel mechanism by which the clock can regulate clock-controlled genes post-transcriptionally.

Connections of the clock to DNA metabolism are recently reported in humans [68]. A human clock CLK2 protein physically associates with S-phase checkpoint components ATR, ATRIP, claspin, and the checkpoint kinase, Chk1. Also human CLK2-depleted cells accumulate DNA damage, engage in radio-insensitive DNA synthesis, and fail to recruit proteins, such as RAD51, functioning in human recombination pathways. Here several putative checkpoint-associated proteins (e.g., NCU00560 and NCU04326) as well as 8 genes involved directly in purine/pyrimidine metabolism (NCU0 7590, 8359, 4323, 6262, 3194, 5542, and 4195) and repair (uvs-6/NCU00901) appear to be clock-controlled genes. The uvs-6 gene is a homolog of RAD50 in S. cerevisiae involved in double-stranded break repair. As predicted from the results on humans [68], the RAD51 homolog (mei-3/NCU02741) was circadian and light-responsive in N. crassa in cycles 1 and 2, but not WCC-responsive (cycle 3).

The clock connection to the cell cycle has been only recently reported in Neurospora through prd-4, a homolog of Chk2 in yeast, a second checkpoint kinase [53]. In addition to prd-4, we have identified 2 other putative cell cycle checkpoint genes as clock-controlled genes (NCU00560 and NCU04326), homologs of CDC4 and CDC28 in S. cerevisiae. Up to 16% of rhythmic genes (cycle 1) may be involved in the cell cycle in some mouse tissues in contrast to the 3% in Fig. 15A identified by more stringent criteria (i.e., positive in cycles 1–3) in N. crassa [69].

In that carbon metabolism showed up as significant and may have arisen due to the use of the QA-inducible switch in the last series of microarray experiments, one additional control was performed with wild type (OR74A – see Materials and Methods) in which many QA-inducible genes were identified with microarrays by a shift from 1.5% sucrose to 0.3% QA over a 0 to 8 hr window [11], [27]. Of the 314 distinct clock-controlled genes identified, only 19 of them were QA-inducible (with most of them being unclassified in function). Only 2 of the QA-inducible clock-controlled genes were involved in carbon metabolism. Subtracting the 19 QA-inducible ccgs from the 314 distinct ccgs, 295 clock-controlled genes remained.

Approximately one-half of these 295 clock-controlled genes are of unknown function. The most prevalent known function among these genes are phosphatases and kinases. They make up almost half [11] of the 23 genes with products involved in protein synthesis (Fig. 16B), processing, and degradation, and at least three of the genes under DNA metabolism are known kinases/phosphatases as well (CK1, HHP1 homolog, and PP1). This plethora of phosphatases and kinases may reflect the role they play in modifying/linking the functions of wc-1, wc-2, and frq as regulators of (to) other pathways, as well as in the coupling of the clock to varied signaling pathways and the cell cycle. For example, the phosphatases PP1 and PP2a, dephosphorylate FRQ in vitro, thereby altering oscillator behavior [70], and the kinases CK1a and CKII mediate the phosphorylation of WCC [71]. After that, DNA metabolism, RNA metabolism, and carbon/energy metabolism represent equally important outputs of the clock. The clock outputs are representative of the frequency of these functions in the proteome [37] with two exceptions: a deficit of transport and unclassified genes that are ccgs.

Figure 16
Transcriptional profiles of individual genes with upstream LRE elements in the dark (cycle 1) in the functional categories of: (A) regulation (MIPS functional classification categories 1.02.04, 11.02.03, and 16, [37]); (B) putative phosphatases/kinases ...

While only one ccg in Fig. 15 has a product classified as a transcriptional activator (PRO1 homolog) involved in fruiting body development (kal-1, NCU07392, [72]), four other ccg genes were classified as regulators. Their individual cyclical transcriptional profiles are given in Fig. 16A. One of these putative regulators is an inferred ornithine decarboxylase antizyme involved in sulfur and nitrogen regulation (NCU07155) [37]. This connection has also been reported in Arabidopsis [47], [73]. The remaining putative regulators identified under clock control in Fig. 2 or or44 were NCU00045, NCU01640 (rpn-4), and NCU06108.

Earlier work has suggested a link between signal transduction pathways for conidiation and the clock [74]. From microarray analysis here the clock is tied into a number of other signal transduction pathways as well, including stress (ccg-9/NCU09559), oxidative stress (NCU05169), light (vivid/NCU03967), mating (ccg-4/NCU02500, NCU03378, NCU07335), and osmo-sensing (os-1, rrg-1, hpt-1). The last output to the clock has only been reported recently (Fig. 16C). For example, cut-1 involved in osmo-sensing has been reported to be under WCC control [75]. Jones et al. [57] have reported a role of rrg-1 in osmo-sensing reminiscent of the os mutants. The genes rrg-1 and hpt-1 have an upstream LRE and were found to be circadian in cycle 1, but not light-responsive or WCC-responsive [58].

Impact of standard that the period of a ccg is between 16 and 30 hrs

To provide insights on the impact of the standard [5] of requiring genes to have a period between 16 and 30 hrs to be declared clock-controlled genes, we tightened the standard to 17–29 hrs because of the up turn at the extremes of the distribution of periods in Fig. 7A. The result was declaring 2172 genes with upstream LREs as circadian as opposed to 2436 genes in Fig. 6. This reduces the number of clock-controlled genes identified by all three cycles of microarray experiments to 290. If the 14 duplicates of frq, wc-1, and wc-2 as well as 18 remaining QA-inducible genes are removed, we are left with 258 clock-controlled genes with a range of periods between 17 hrs and 29 hrs. Genes affected in the text can be determined from their reported periods. Further trimming of the range of acceptable periods in Fig. 7A will gradually shrink the number of clock-controlled genes identified.

Identifying an ensemble of genetic networks for the biological clock of N. crassa

The culmination of Computing Life is the identification of an ensemble of genetic networks describing how the clock functions from 3 cycles of microarray experiments initiated from published data [3], [23][25]. Results are summarized in Table 2. For 69% = (100×18/26) of the rate constants in common with Table 1 [7], standard errors were reduced by the addition of data from cycles 1–3. Measured lifetimes of the wc-1 mRNA and FRQ protein remain concordant with estimated values in Table 2 with an order of magnitude increase in the amount of data (see Table 3). The long lifetime of the wc-1 mRNA provided a critical test of the genetic networks [7], and the long lifetime of the wc-1 mRNA of 7.4 hrs = D7−1 continues to be supported by microarray data here. Transcription rates of frq, A and An external file that holds a picture, illustration, etc.
Object name is pone.0003105.e024.jpg, as well as the deactivation rate of WCC, P, were previously identified as critical parameters for maintaining oscillations through the negative feedback loop in Fig. 2 [7]. These constants are now more sharply defined in Table 2. Eleven of the 26 parameters identified in Table 1 of [7] are not significantly different from those in Table 2, although a majority of the rate constants are estimated more precisely. Precision of cycles 1–3 are assessed further in the next section and Table 3.

Table 2
Rate coefficients in the genetic network model (Fig. 4) of the biological clock (n = m = 4) based on data from cycles 1–3 predicting the clock's observed oscillations, light response, and wc-1 perturbation ...
Table 3
The quality of fit of the model usually improves in successive cycles through the Computing Life paradigm. Several measures of fit are reported.

The behavior of the ensemble is displayed in Fig. 17. In cycle 1 the predicted oscillations of frq mRNAs are displayed with microarray measurements. The predicted oscillations in wc-1 and wc-2 mRNAs are much reduced relative to the frq mRNAs. In cycle 2 in the first 24 hrs the measurements and predictions track those in cycle 1; the correlation over all 87808 = 12544×7 microarray features between cycles 1 and 2 for the first 7 time points is 0.82. When the light is turned back on at 24 hrs into the cycle 2 experiment, the coordinated response of the ensemble and microarray data (particularly the frq mRNA) to light can be seen as the clock resets [62]. In cycle 3, the slow decline in the wc-1 mRNAs is seen corresponding to a lifetime of 7.4 hrs = D7−1. An alternative ensemble in Fig. 2 from Yu et al. [7] was tested and performed about the same as the genetic networks in Fig. 4; the distribution of chi-squared statistics [7] for the ensemble fitted to Fig. 2 is largely overlapping with the distribution of chi-squared statistics for the networks in Fig. 4 (results not shown). By Occam's Razor the simpler network with fewer parameters in Fig. 2 is then preferred. The series of model-guided experiments now has identified and selected an ensemble of genetic networks describing the clock mechanism in Fig. 2.

Figure 17
An ensemble of genetic networks predicts the mRNA levels of wc-1, wc-2, and frq for cycles 1–3.

Comparison of the precision and power of microarray experiments in cycles 1–3 with other microarray experiments

A standardized way of assessing progress in the Computing Life paradigm as well for comparing the power of different microarray experiments here with others in the literature and in the future would be useful. Progress here is measured by the error per observation σ2 or error variance.

In linear and nonlinear models a standard approach to estimating the precision of an experiment is to estimate the error variance σ2 [76], as it appears in the likelihood for the genetic network [7]. Townsend [77] illustrates by simulation and data analysis that such a common variance component can be extracted from each of a variety of microarray experiments and used to compare different experiments. Under the multivariate Gaussian assumption leading to the likelihood in [7], a simple estimator for the error per observation σ2 can be constructed for successive cycles of the Computing Life paradigm:

equation image

where n is the number of observations, χ2min is the minimum chi-squared statistic over the ensemble [7], and σ02 is a preliminary estimate of the error per observation in the multivariate Gaussian likelihood [7]. This preliminary estimate of σ02 was allowed to vary across observations. In the preliminary data drawn from the literature [3], [23][25], σ02 is 0.02 for the genetic network in Fig. 2 [7] and used on the RHS of expression above. Preliminary estimates of σ02, 4σ02, and 36σ02 for published data [3], [23][25], microarray data in cycles 1–3, and for conidial banding data from Fig. 1, respectively, were used in calculating χ2min. These weightings were selected to give equal weight per time to different experiments in the ensemble fitting process.

In Table 3, the progress in reducing the error variance in successive cycles through the Computing Life paradigm is reported. The fourth cycle began with a switch to the genetic network in Fig. 2. In cycle 4 all experiments in different cycles were allowed to have their own initial conditions for initial species concentrations. An additional 842 data points of conidial banding data were collected under the regimen of a 48 hr artificial day (cycle 2). A downward trend in the estimated error variance across cycles is evident.

The final estimate of σ2 is 0.03, slightly larger than our initial guess based on published data from Northerns, Westerns, and race tubes [3], [23][25]. The advantage of having this estimated error variance is that it can be readily compared with other families of models, such as simpler linear models, used in microarray analysis [77] as well as to other experiments by other laboratories. The estimated error variance also allows diagnosis of whether or not further experiments will refine the model ensemble. Based on the downward trend in the estimated error variance further cycles would be predicted to be profitable.

In each cycle of the Computing Life paradigm we constructed a test statistic (F or t) for a response on a gene by gene basis and calculated the same for all genes with LRE elements. Imagine extracting a ranked list of these significant statistics in a particular cycle. Townsend [77] has shown that the median value of this significant test statistic in this list is a good proxy for power from simulations. This statistic is called the gene expression level 50 (GEL50). With each GEL50 statistic, there can be an associated fold-change in expression level that can often be substituted for the original statistic for ease of interpretation. The advantage of this GEL50 statistic is that it allows easy comparison across experiments reported in the literature and in the future. The GEL50 is reported for cycles 1–3 in Table 3. These values are in the range of at least 5 other microarray studies [77].

Discussion

What we know and do not know about clock-controlled genes

Model-guided microarray experiments through MINEing have revealed much of what we know and do not know about the biological clock in Fig. 11. At the center of the Venn Diagram there is a highly cross-validated set of 295 distinct clock-controlled genes behaving as the clock mechanisms in Fig. 2 or or44 would predict. To date, only sixteen clock-controlled genes have been discovered in N. crassa in over 40 years of clock biology [59]. This set of 295 genes is circadian, light-responsive, and under WCC-control and spans a broad array of functions (Fig. 15). It is quite remarkable that only three genes, wc-1, wc-2, and frq, could have such diverse and pleiotropic effects on the organism's transcriptome, and the full extent of the clock's role in the metabolic web has not been evidenced till now.

The series of model-guided array experiments also point to an intriguing set of 440 genes or 57% ( = 100×440/(328+440) in Fig. 11) of the genes that are circadian and light-responsive and not under direct WCC control. No more than half of these 440 genes can be explained by false negatives in the cycle 3 experiment assaying for WCC control according to Table 1. These genes could be under the indirect control of the clock mechanism. Some candidate regulators implicated by all three microarray experiments include five genes, NCU00045, NCU01640 (rpn-4), NCU06108, NCU07155, and NCU07392 (kal-1). One of these may be a regulator of nitrogen and sulfur metabolism (NCU07155), and another is a homolog to PRO1 (called kal-1), a homeo-box containing transcription factor with a typical GAL4-like DNA binding domain involved in fruiting body development [72]. Another possibility is that some of these mRNAs are modified post-transcriptionally by RNA-binding proteins which are ccgs, such as the LHP1 and PAB1 homologs, to control expression. In that RNA-binding proteins appear to have specific mRNA target populations [78], they may provide another mechanism for the combinatorial clock control of gene expression post-transcriptionally.

There are WCC-responsive genes that are circadian only (612), but not light-responsive as well as those that are light-responsive only (396), but not circadian in Fig. 11. The genes, which are circadian only and not light-responsive, could be explained by other regulators involved with the clock mechanism that suppress a light-response (or be under the control of an oscillator not connected to the light-response through WCC as suggested by Correa et al., [8]). The possibility of an oscillator not connected to WCC is discussed in the next section below. Similarly, genes that are light-responsive only, but not circadian, might also be explained by regulators of the light-response that interact with WCC to repress a circadian response (or be part of a light-response pathway not connected to the known clock mechanism, but see below). One potential list of candidates as modulators of the circadian- and light-response are the 11 phosphatases and kinases that are clock-controlled genes in Fig 15. These two functions of being circadian and being light-responsive can be separated in the action of WCC through the outputs of the clock. Other kinds of posttranslational modifications of histones could be involved in chromatin-remodeling of upstream sequences to clock-controlled genes to determine gene activation through WCC [79].

Data on wc-1 cis-regulation by Kaldi et al. [56] and on frq cis-regulation by Belden et al. [79] suggest a simpler hypothesis that invokes only one new regulator class. The wc-1 gene has at least two kinds of promoters upstream responding to WCC. The CCAAT box promoter in front of wc-1 is not light-responsive, while the TAATTA promoter in front of wc-1 for WCC is light-responsive. The LREs determine light-responsiveness [4], while other genes, such as clockswitch (csw-1), regulate accessibility to promoters through chromatin-remodeling around the Clock Box or C box upstream of frq to determine the endogenous circadian rhythm. Froehlich et al. [80] have shown that the C box is necessary and sufficient for endogenous circadian expression of frq. An additional LRE more proximal to frq is light-responsive [4]. Genes (295 out of 328) that have both types of elements, C boxes and LREs, might be circadian and light-responsive. Genes (612) with a C box and no LRE might be circadian but not light-responsive. Light-responsive genes (396) could be explained by only having the LRE upstream. This alternative would also not require any new regulators beyond the enzyme CSW-1 enabling chromatin- remodeling around the C box. In Table 4 there is a significant association of genes being light-responsive and not circadian vs. being circadian and not light-responsive and the number of upstream LREs. Unfortunately, there was no difference in the incidence of the known Clock box among the circadian/non-light-responsive (612) versus the 396 light-responsive/noncircadian genes. Further characterization of Clock boxes in front of other genes besides frq appears warranted.

Table 4
There is an association between whether or not a WCC-responsive gene is light-responsive and the number of its LREs upstream.

Fig. 11 would suggest considerable limits on what we know about the clock mechanism. There are 1056 genes that are circadian only and not WCC-responsive and 561 genes that are light-responsive only and not WCC-responsive, and no more than half of these can be explained as false negatives in cycle 3 (WCC response array experiment). These two sets of genes are not explainable directly by WCC. A similar observation was made on CLOCK in Drosophila [36]. This percentage of the unexplained has increased from the earlier array experiments carried out by Lewis et al. [60]. This would suggest that there are other genes involved in the biological clock mechanism and other genes involved in a light response.

From the model-guided array experiments there is a concluding suggestion on a new direction for MINEing for clock-controlled genes. There are 987 genes that apparently are not circadian or light-responsive, but under WCC control. The clock is well-known to respond to a variety of environmental cues or zeitgebers, such as temperature [81]. The fact that these 987 genes do not respond to light would suggest future cycles of discovery to examine how these 987 genes respond to temperature [82].

The circadian response of genes with upstream LREs is stochastically independent of their light response conditional on the WCC response

The microarray experiments in cycles 1–3 allow us to infer a broad relationship of circadian, light, and WCC- responses by different genes in the genome from Fig. 11. We have established in Fig. 11 that the circadian and light-responses can be separated by the response of clock-controlled genes. It is natural then to ask how are these responses related in different genes? For example, we can ask if the circadian and light responses of different genes with LREs upstream are stochastically independent across all genes with an LRE element. A simple contingency table model in which the circadian and light- response of each of the 5702 genes with LRE elements are conditionally independent given the WCC response can be fitted as shown in Table 5 fairly well. The resulting goodness of fit chi-squared statistic X22 is 7.20, which is barely a significant departure from the model at the 0.05 level. What this implies is that the circadian and light responses of genes with LREs upstream are almost entirely explained by whether or not they have a response to WCC. When the WCC response is off, then the circadian response of a gene with an upstream LRE is more probable (pC>qC in Table 5). When the WCC response is on, then the light response of a gene with an upstream LRE is more probable (qL>pL in Table 5). This would suggest that any gene, such as csw-1 [79], acting on the light or circadian response of a gene with an upstream LRE would need to act through WCC. The only explanation for light and circadian responses of ccgs in Fig. 2 or or44 is WCC. If there were another oscillator acting on genes with upstream LREs, then its outputs would be constrained to behave in the same way as the known clock mechanism. That is, an additional oscillator would be coupled to the known clock mechanism in Fig. 2 through WCC and would have an independent light and circadian gene outputs conditional on a gene's response to WCC. If there were additional information, such as the number of upstream LREs (as in Table 4), then more could be said. This conditional independence result does not hold across all 11,000 genes, if the same model in Table 5 is fitted to all genes (X22 = 771.70, P<0.001).

Table 5
The circadian response of each gene with an upstream LRE is conditionally independent of its light response.

Clock as adaptation

The clock-controlled genes and their time of action in Fig. 7B provide a possible narrative on adaptation. This complex trait controls levels of asexual reproduction as shown in Fig. 1. The complex trait has a clear genetic basis in Fig. 2. Variation in the clock trait in natural populations has been demonstrated in D. melanogaster [83], A. thaliana [84], and N. crassa [85]. There may be the opportunity for natural selection to act on the clock network [86] through simple sequence repeats, for example, in WC-1 and FRQ [85]. It is clear that the clock controls fecundity in the asexual part of the life cycle in Fig. 1. In addition, longevity is in part under clock control. The bd mutation as an allele of ras-1 [87] is a well-known longevity gene in S. cerevisiae [88]. In addition, the longevity assurance gene LAG1 homolog [89] (designated NCU00008 in N. crassa) is under WCC-control from the cycle 3 experiments. There appears to be a strong connection between the clock and the fitness components of fecundity and longevity, as has been reported in other model systems [90].

The organism needs to protect its DNA from light. Light triggers the onset of conidiation at dawn, thereby placing the DNA into these environmentally robust packages, spores. Prior to cell cycle initiation in the morning to produce the spores, DNA synthesis must be reinitiated and completed before entry into cell-cycle checkpoints on the following day (Fig. 7B). The development of structures to produce these spores requires carbon and energy as well as activation of the developmental program in the morning (Fig. 7B) as well as some initial groundwork by regulators “as early immediate genes (IEGs)” in the evening. Implementation of the developmental program is timed through clock-controlled genes positioned in the Central Dogma played out in ribosome assembly, protein synthesis, modification, and degradation taking place in the morning. One of the features of spores that make them so resistant to environmental insults is the synthesis and incorporation of the isoprenoids as pigments. Apparently their biosynthesis is sufficiently important that the organism continues working on these protective factors in the afternoon. The clock mechanism can tune the phase of clock-controlled genes by adjusting the rate constants (particularly the degradation constants of CCGs) in the genetic network for ccgs in Fig. 2 as well as their level of expression. Setting the phase may set the level of WCC experienced by the ccg as shown in (Fig. 7B). This adaptationist interpretation of the role of different clock-controlled genes helps to explain why the clock orchestrates such a diverse set of players in the cell.

Using the Maximally Informative Next Experiment (MINE) and its Consequences

While the MINE is the next experiment U* to give us the most new information about the network in Fig. 4, U* is defined within a set of possible experiments that are ultimately specified by the goal(s) of the experimenter. Within the constraint of a particular biological goal, such as finding clock-controlled genes, finding the MINE U* provides an avenue to obtain the most information about the genetic network in each successive cycle through the Computing Life paradigm. This still leaves the choice of a set of possible experiments, from which U* is drawn, in the hands of the experimenter. In the present case, the biological goal of identifying clock-controlled genes sets the stage on which MINE plays. Identifying clock-controlled genes leads us to ask the genetic network in Fig. 4 for their predicted behavior, thus establishing the set of experiments in cycles 1–3 to be considered. In this context MINE is a tool to achieve a particular biological goal. One could naively make the biological goal coincident with the unconstrained objective of learning the most about the genetic network, the criterion of MINE, but then the set of possible experiments U becomes very large and the optimization of V(U) computationally intractable. Adopting a particular biological goal, such as finding ccgs, puts structure on the design question and thereby enables the researcher to parameterize the optimization of the next experiment U.

There have been several consequences to the use of model-guided discovery by MINE through the choice of experiments. In detecting circadian rhythms in cycle 1 MINE involved a design with even spacing between observations of 4 hrs and with sampling starting immediately without delay (Fig. 5). This design in cycle 1 conforms to practice in previous experiments [5]. However, in detecting the light-response in cycle 2 and the WCC-response in cycle 3 there is a departure from conventional wisdom. In cycle 2 prior experiments on light-entrainment, researchers have avoided a very short 6 hr artificial day [25], and as a consequence have missed an opportunity to examine the autofeedback in the networks as in Fig. 14. In detecting a response to perturbations in the clock mechanism in cycle 3, conventional wisdom would have us focus on perturbing the oscillator gene frq [8]. In contrast the MINE calculations in Fig. 12 pointed to engineering mutations in wc-1 to obtain more information about the network. Even when a mutation in the transcription factor WC-1 or CLOCK was pursued [36], [60], experimenters have elected to overexpress the transcription factor. The result has been a paucity of responding genes [36], [60]. The MINE calculations in Fig. 12 suggested that a knockdown would be more informative about the genetic network in Fig. 2 or or4,4, and this was the experiment adopted here. As a consequence, by following conventional wisdom without the input of MINE, it is likely our screen would have missed the clear and unbiased identification of 295 clock-controlled genes in Fig. 15 or as precise an identification of the genetic network (see Table 3). The connection of the clock to ribosome biogenesis in Fig. 15B in particular is then an outcome of MINE suggesting a new direction of exploration in microarray experiments to refine our understanding of the clock mechanism.

It may at first seem surprising, or accidental, that MINE design actually helps increase the experimental sensitivity for ccg detection: the MINE approach, as described in Materials and Methods, optimizes the experimental sensitivity for discerning between unknown model parameter vectors Θ; it does not per se optimize the next experiment for detection of new molecular or gene species (such as ccgs) that are not explicitly included in the network model in Fig. 2. However, these two seemingly unrelated features are actually closely linked, as we will now explain.

The MINE optimization does tend to select experimental conditions U* which enhance the predicted response for the molecular species to be observed in the next experiment. (If the to-be-observed species were responding weakly their observation would hardly improve discrimination between different Θ). One of those to-be-observed molecular species in each of our 3 MINE cycles was the frq RNA. The frq gene, however, is co-regulated (by WCC) with all clock-controlled genes. For purposes of external perturbation response, the frq gene itself is in fact like a typical ccg. Thus, by choosing the MINE U* to enhance the frq response we are in effect also enhancing the response of all other ccgs, since they are subject to similar regulatory control. Hence, the MINE-directed discovery of our 295 new clock-controlled genes is not accidental; it is an expected and highly desirable by-product of the MINE optimization. The broader conclusion to be drawn from these considerations is that, for gene-regulatory systems, MINE optimization, in general, will improve the conditions for experimental detection of new, unknown gene species which are co-regulated with the already known species targeted by the next experiment.

There are three advances here over earlier analyses of the genome-wide logical circuitry [1], [91], [92] of fundamental processes, such as development, carbon metabolism, and the clock. One, a working kinetics model of the clock [7] gives a complete quantitative description of genetic network dynamics in Fig. 17. Two, an ensemble method was developed for identifying a genetic network with many parameters from limited data [7] with the results in Table 2. Three, a new methodology (i.e., MINE design) was developed for a model-driven discovery workflow cycle (Computing Life) in profiling experiments. This new methodology resulted in the identification of most downstream clock-controlled genes in Fig. 15A and an unexpected connection between the clock and ribosome biogenesis in Fig. 15B.

Acknowledgments

We thank S. Bhandarkar, M. Bender, N. Hilton, J. Reeves, and T. Taha for comments and Deborah Bell-Pedersen (Biology Department, Texas A & M University), M. E. Case (Department of Genetics, University of Georgia, J. C. Dunlap (Department of Genetics, Dartmouth Medical School), J. J. Loros (Department of Biochemistry, Dartmouth Medical School), H. Colot (Department of Genetics, Dartmouth Medical School), Y. Liu (Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX), P. Cheng (Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX), and Suzannah Rutherford (Fred Hutchison Cancer Research Center, Seattle, WA) for reagents and advice.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: NSF DBI-0243574, NSF MCB-0542915, NSF BES-0425762, Sun Grid Grant 12021957, The University of Minnesota Supercomputer Center and the University of Georgia College of Agricultural and Environmental Sciences supported this work. These funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001;292:929–934. [PubMed]
2. Dunlap JC. Molecular bases for circadian clocks. Cell. 1999;96:271–290. [PubMed]
3. Crosthwaite SK, Dunlap JC, Loros JJ. Neurospora wc-1 and wc-2: transcription, photoresponses, and the origins of circadian rhythmicity. Science. 1997;276:763–769. [PubMed]
4. Froehlich AC, Liu Y, Loros JJ, Dunlap JC. White Collar-1, a circadian blue light photoreceptor, binding to the frequency promoter. Science. 2002;297:815–819. [PubMed]
5. Aronson BD, Johnson KA, Loros JJ, JC Dunlap JC. Negative feedback defining a circadian clock: autoregulation of the clock gene frequency. Science. 1994;263:1578–1584. [PubMed]
6. Schafmeier T, Haase A, Kaldi K, Scholz J, Fuchs M, et al. Transcriptional feedback of Neurospora Circadian clock gene by phosphorylation-dependent inactivation of its transcription factor. Cell. 2005;122:235–246. [PubMed]
7. Yu Y, Dong W, Altimus C, Tang X, Griffith J, et al. A genetic network for the clock of Neurospora crassa. Proc Natl Acad Sci U S A. 2007;104:2809–2814. [PMC free article] [PubMed]
8. Correa A, Lewis ZA, Greene AV, March IJ, Gomer RH, Bell-Pedersen D. Multiple oscillators regulate circadian gene expression in Neurospora. Proc Natl Acad Sci U S A. 2003;100:13597–13602. [PMC free article] [PubMed]
9. Locke JWC, Southern MM, Kozma-Bognar L, Hibbard V, Brown PE, et al. Extension of a genetic network model by iterative experimentation and mathematical analysis. Molecular Systems Biology. 2005;1:10.1038. [PMC free article] [PubMed]
10. Arnold J, Schüttler H-B, Logan DA, Battogtokh D, Griffith J, et al. Metabolomics Chapter 22. In: An Z, editor. Handbook of Industrial Mycology. NY, NY: Marcel-Dekker; 2004. pp. 597–634.
11. Battogtokh D, Asch DK, Case ME, Arnold J, Schüttler H-B. An ensemble method for identifying regulatory circuits with special reference to the qa gene cluster of Neurospora crassa, Proc Natl Acad Sci U S A. 2002;99:16904–16909. [PMC free article] [PubMed]
12. Brown KS, Sethna JP. Statistical mechanical approaches to models with many poorly known parameters. Phys. Rev E. 2003;68:021904-1–021904-9. [PubMed]
13. Yeung MK, Tegner J, Collins JJ. Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci U S A. 2002;99:6163–6168. [PMC free article] [PubMed]
14. Tegner J, Yeung MKS, Hasty J, Collins JJ. Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci U S A. 2003;100:5944–5949. [PMC free article] [PubMed]
15. Gardner TS, diBernardo D, Lorenz D, Collins JJ. Inferring genetic networks and identifying compound mode of action via expression profiling. Science. 2003;301:102–104. [PubMed]
16. Xiong M, Li J, Fang X. Identification of genetic networks. Genetics. 2004;166:1037–1052. [PMC free article] [PubMed]
17. Strogatz SH. Exploring complex networks. Nature. 2001;410:268–276. [PubMed]
18. Hughes TR, Marton MJ, Jones AR, Robers CJ, Stoughton R, et al. Functional discovery via a compendium of expression profiles. Cell. 2000;102:109–126. [PubMed]
19. Clarke DC, Betterton MD, Liu X. TGF-β concentration dependence of Smad phosphorylation kinetics: a systems perspective. IEEE Systems Biology. 2006;153:412–424. [PubMed]
20. Violin D, DiPilato LM, Yildirim N, Elston TC, Zhang J, et al. β2-Adrenergic Receptor Signaling and Desensitization Elucidated by Quantitative Modeling of Real Time cAMP Dynamics. J Biol Chem. 2008;283(5):2949–2961. [PubMed]
21. Alves R, Savageau MA. Systemic properties of ensembles of metabolic networks: application of graphical and statistical methods to simple unbranched pathways. Bioinformatics. 2000;16:534–547. [PubMed]
22. Muirhead RJ. Aspects of Multivariate Statistical Theory. New York: Wiley; 1982.
23. Garceau NY, Liu Y, Loros JJ, Dunlap JC. Alternative initiation of translation and time-specific phosphorylation yield multiple forms of the essential clock protein FREQUENCY. Cell. 1997;89:469–476. [PubMed]
24. Lee K, Loros JJ, Dunlap JC. Interconnected feedback loops in the Neurospora circadian system. Science. 2000;289:107–110. [PubMed]
25. Görl M, Merrow M, Huttner B, Johnson J, Roenneberg T, Brunner M. A PEST-like element in FREQUENCY determines the length of the circadian period in Neurospora crassa. EMBO J. 2001;20:7074–7084. [PMC free article] [PubMed]
27. Logan DA, Koch AL, Dong W, Griffith J, Schüttler H-B, et al. Genome-wide expression analysis of genetic networks in Neurospora crassa. Bioinformation. 2007;1(10):390–395. [PMC free article] [PubMed]
28. Nakashima H. A liquid culture system for the biochemical analysis of the circadian clock of Neurospora. Plant Cell Physiol. 1981;22:231–238.
29. Davis RH, De Serres FG. Genetic and microbiological research techniques for Neurospora crassa. Microbiological Reviews. 1970;27:79–143.
30. Dharmananda S. Studies of the circadian clock of Neurospora crassa: light-induced phase shifting. Santa Cruz: Ph.D. dissertation. University of California – Santa Cruz; 1980.
31. Arnold J, Dong W, Griffith J, Choi D, Baldwin J, et al. Systems Biology of the Biological Clock. 2006. pp. 119–125. IEEE Proceedings of the First International Conference on Computational Systems Biology (ICCSB-2006),
32. Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature. 2003;422:859–868. [PubMed]
33. Dong W, Tang X, Yu Y, Griffith J, Nilsen R, et al. Systems biology of the Neurospora biological clock. IET Systems Biology. 2007;1(5):257–265. [PubMed]
34. Dunlap JC, Borkovich KA, Henn MR, Turner GE, Sachs MS, et al. Enabling a community to dissect an organism: overview of the Neurospora functional genomics project. Advances in Genetics. 2007;57:49–96. [PMC free article] [PubMed]
35. Kasuga T, Townsend JP, Tian C, Gilbert LB, Mannhaupt G, et al. Long-oligomer microarray profiling in Neurospora crassa reveals the transcriptional program underlying biochemical and physiological events of conidial germination. Nucl Acids Res. 2005;33:6469–6485. [PMC free article] [PubMed]
36. McDonald MJ, Rosbash M. Microarray analysis and organization of circadian gene expression in Drosophila. Cell. 2001;107:567–578. [PubMed]
37. Mannhaupt G, Montrone C, Haase D, Mewes HW, Aign V, et al. What's in the genome of a filamentous fungus: analysis of the Neurospora crassa genome. Nucl Acids Res. 2003;31:1944–1954. [PMC free article] [PubMed]
38. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868. [PMC free article] [PubMed]
39. De Hoon MJ, Imoto S, Nolan J, Miyano S. Open source clustering software. Bioinformatics. 2004;20:1453–1454. [PubMed]
40. Saldanha AJ. Java Treeview – extensible visualization of microarray data. Bioinformatics. 2004;20:3246–3248. [PubMed]
41. Giles NH, Geever RF, Asch DK, Avalos J, Case ME. Organization and regulation of the Qa (Quinic Acid) genes in Neurospora crassa and other fungi. J Heredity. 1991;82:1–7. [PubMed]
42. Nowrousian M, Duffield GE, Loros JJ, Dunlap JC. The gene frequency is required for temperature-dependent regulation of many clock-controlled genes in Neurospora crassa. Genetics. 2003;164:923–933. [PMC free article] [PubMed]
43. Yu Y. Monte Carlo studies of genetic networks with special reference to the biological clock of Neurospora crassa. Athens, GA: Ph.D. Dissertation. University of Georgia; 2005.
44. Ueda HR, Matsumoto A, Kawamura M, Iino M, Tanimura T, et al. Genome-wide transcriptional orchestration of circadian rhythms in Drosophila. J Biol Chem. 2002;277:14048–14052. [PubMed]
45. Lin Y, Han M, Shimada B, Wang L, Gibler TM, et al. Influence of the period-dependent circadian clock on diurnal, circadian, and aperiodic gene expression in Drosophila melanogaster. Proc Natl Acad Sci U S A. 2002;99:9562–9567. [PMC free article] [PubMed]
46. Gutierrez RA, Ewing RM, Cherry JM, Green PJ. Identification of unstable transcripts in Arabidopsis by cDNA microarray analysis: rapid decay is associated with a group of touch- and specific clock-controlled genes. Proc Natl Acad Sci U S A. 2002;99:11513–11518. [PMC free article] [PubMed]
47. Harmer SL, Hogenesch JB, Straume M, Chang H-S, Han B, et al. Orchestrated transcription of key pathways in Arabidopsis by the circadian clock. Science. 2000;290:2110–2113. [PubMed]
48. Covington MF, Harmer SL. The circadian clock regulates auxin signaling and responses in Arabidopsis. PLoS Biology. 2004;5:1773–1784. [PMC free article] [PubMed]
49. Michael TP, McClung CR. Enhancer trapping reveals widespread circadian clock transcriptional control in Arabidopsis. Plant Physiology. 2003;132:629–639. [PMC free article] [PubMed]
50. Storey JD. A direct approach to false discovery rates. J R Stat Soc B. 2002;64:479–498.
51. Benjamini Y, Hochberg Y. Controlling the false discovery rate – a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B. 1995;57:289–300.
52. Dai J, Li L, Kim S, Kimball B, Jazwinski SM, et al. Exact sample size needed to detect dependence in 2×2×2 tables. Biometrics. 2007;63:1245–1252. [PMC free article] [PubMed]
53. Pregueiro AM, Liu Q, Baker CL, Dunlap JC, Loros JJ. The Neurospora Checkpoint Kinase 2: a regulatory link between circadian and cell cycles. Science. 2006;313:644–649. [PubMed]
54. Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297:1183–1186. [PubMed]
55. Heisenberg W. The Physical Principles of the Quantum Theory. New York: Dover Publications; 1949.
56. Kaldi K, Gonzalez BH, Brunner M. Transcriptional regulation of the Neurospora circadian clock gene wc-1 affects the phase of circadian output. EMBO Reports. 2006;7:199–204. [PMC free article] [PubMed]
57. Jones CA, Greer-Phillips SE, Borkovich KA. The response regulator RRG-1 functions upstream of a MAPK pathway impacting asexual development, female fertility, osmotic stress and fungicide resistance in Neurospora crassa. Molecular Biology of the Cell. 2007;18:2123–2136. [PMC free article] [PubMed]
58. Vitalini MW, de Paula RM, Goldsmith CS, Jones CA, Borkovich KA, et al. Circadian rhythmicity mediated by temporal regulation of the activity of p38 MAPK. Proc Natl Acad Sci U S A. 2007;104:18223–18228. [PMC free article] [PubMed]
59. De Paula RM, Lewis ZA, Greene AV, Seo KS, Morgan LW, et al. Two circadian timing circuits in Neurospora crassa cells share components and regulate distinct rhythmic processes. J Biol Rhythms. 2006;21:159–168. [PubMed]
60. Lewis ZA, Correa A, Schwerdtfeger C, Link KL, Xie X, et al. Overexpression of White Collar-1 (WC-1) activates circadian clock-associated genes, but is not sufficient to induce most light-regulated gene expression in Neurospora crassa. Molecular Microbiology. 2002;45:917–931. [PubMed]
61. Ma L, Li J, Qu L, Hager J, Chen Z, et al. Light control of Arabidopsis development entails coordinated regulation of genome expression and cellular pathways. Plant Cell. 2001;13:2589–2607. [PMC free article] [PubMed]
62. Crosthwaite SK, Loros JJ, Dunlap JC. Light-induced resetting of a circadian clock is mediated by a rapid increase in frequency transcript. Cell. 1995;81:1003–1012. [PubMed]
63. Fambrough D, McClure K, Kazlauskas A, Lander ES. Diverse signaling pathways activated by growth factor receptors induce broadly overlapping, rather than independent, sets of genes. Cell. 1999;97:727–741. [PubMed]
64. Pierani A, Brenner-Morton S, Chiang C, Jessell TM. A sonic-hedgehog-independent, retinoid-activated pathway of neurogenesis in the ventral spinal cord. Cell. 1999;97:903–915. [PubMed]
65. Colot HV, Park G, Turner GE, Ringelberg C, Crew CM, et al. A high-throughput gene knockout procedure for Neurospora reveals functions for multiple transcription factors. Proc Natl Acad Sci U S A. 2006;103:10352–10357. [PMC free article] [PubMed]
66. Cheng P, He Q, Wang L, Liu Y. Regulation of the Neurospora circadian clock by an RNA helicase. Genes and Develop. 2005;19:234–241. [PMC free article] [PubMed]
67. Inada M, Guthrie C. Identification of Lhp1p-associated RNAs by microarray analysis in Saccharomyces cerevisiae reveals association with coding and noncoding RNAs. Proc Natl Acad Sci U S A. 2004;101:434–439. [PMC free article] [PubMed]
68. Collis SJ, Barber LJ, Clark AJ, Martin JS, Ward JD, et al. HCLK2 is essential for the mammalian S-phase checkpoint and impacts on Chk1 stability. Nature Cell Biology. 2007;9:391–401. [PubMed]
69. Miller BH, McDearmon EL, Panda S, Hayes KR, Zhang J, et al. Circadian and CLOCK-controlled regulation of the mouse transcriptome and cell proliferation. Proc Natl Acad Sci U S A. 2007;104:3342–3347. [PMC free article] [PubMed]
70. Yang Y, He Q, Cheng P, Wrage P, Yarden O, et al. Distinct roles for PP1 and PP2A in the Neurospora circadian clock. Genes and Develop. 2004;18:255–260. [PMC free article] [PubMed]
71. He Q, Cha J, He Q, Lee H-C, Yang Y, et al. CKI and CKII mediate the FREQUENCY-dependent phosphorylation of the WHITE COLLAR complex to close the Neurospora circadian negative feedback loop. Genes and Develop. 2007;20:2552–2565. [PMC free article] [PubMed]
72. Masloff S, Jacobsen S, Poggeler S, Kuck U. Functional analysis of the C6 zinc finger gene pro1 involved in fungal sexual development. Fungal Genet Biol. 2002;36:17–116. [PubMed]
73. Gutierrez RA, Stokes TL, Thum K, Xu X, Obertello M, et al. Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1. Proc Natl Acad Sci U S A. 2008;105:4939–4944. [PMC free article] [PubMed]
74. Correa A, Bell-Pedersen D. Distinct signaling pathways from the circadian clock participate in regulation of rhythmic conidiophore development in Neurospora crassa. Eukaryotic Cell. 2002;1:273–280. [PMC free article] [PubMed]
75. Youssar L, Avalos J. Light-dependent regulation of the cut-1 of Neurospora, involved in the osmotic stress response. Fung Genet And Biol. 2006;43:752–763. [PubMed]
76. Davidian M, Giltinan DM. Nonlinear Models for Repeated Measurement Data. New York: Chapman and Hall; 1995.
77. Townsend JP. Resolution of large and small differences in gene expression using models for the Bayesian analysis of gene expression levels and spotted DNA microarrays. BMC bioinformatics. 2004;5:54. doi:10.1186/1471-2105-5-54. [PMC free article] [PubMed]
78. Gerber AP, Herschlag D, Brown PO. Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLOS Biology. 2004;2:342–354. [PMC free article] [PubMed]
79. Belden WJ, Loros JJ, Dunlap JC. Execution of the circadian negative feedback loop in Neurospora requires the ATP-dependent chromatin-remodeling enzyme CLOCKSWITCH. Molecular Cell. 2006;25:587–600. [PubMed]
80. Froehlich AC, Loros JJ, Dunlap JC. Rhythmic binding of a WHITE COLLAR-containing complex to the frequency promoter is inhibited by FREQUENCY. Proc Natl Acad Sci U S A. 2003;100:5914–5919. [PMC free article] [PubMed]
81. Liu Y, Garceau NY, Loros JJ, Dunlap JC. Thermally regulated translational control of FRQ mediates aspects of temperature responses in the Neurospora circadian clock. Cell. 1997;89:477–486. [PubMed]
82. Ruoff P, Loros JJ, Dunlap JC. The relationship between FRQ-protein stability and temperature compensation in the Neurospora circadian clock. Proc Natl Acad Sci U S A. 2005;102:17681–17686. [PMC free article] [PubMed]
83. Sawyer LA, Hennessy JM, Peixoto AA, Rosato E, Parkinson H, et al. Natural variation in a Drosophila clock gene and temperature compensation. Science. 1997;278:2117–2120. [PubMed]
84. Michael TP, Salome PA, Yu HJ, Spender TR, Sharp EL, et al. Enhanced fitness conferred by naturally occurring variation in the circadian clock. Science. 2003;302:1049–1053. [PubMed]
85. Michael TP, Park S, Kim T-S, Booth J, Byer A, et al. Simple sequence repeats provide a substrate for phenotypic variation in the Neurospora crassa circadian clock. PLOS One. 2007;2(8):e795. [PMC free article] [PubMed]
86. Wagner A. Circuit topology and the evolution of robustness in two-gene circadian oscillators. Proc Natl Acad Sci U S A. 2005;102:11775–11780. [PMC free article] [PubMed]
87. Belden WJ, Larrondo LF, Froehlich AC, Shi M, Chen C-H, et al. The band mutation in Neurospora crassa is a dominant allele of ras-1 implicating RAS signaling in circadian output. Genes and Dev. 2007;21:1494–1505. [PMC free article] [PubMed]
88. Jazwinski SM. Growing old: metabolic control and yeast ageing. Annu Rev Microbiol. 2002;56:769–792. [PubMed]
89. D'mello NP, Childress AM, Franklin DS, Kale SP, Pinswasdi C, et al. Cloning and characterization of LAG1, a longevity-assurance gene in yeast. Journal of Biological Chemistry. 1994;269:15451–15459. [PubMed]
90. Lakowski B, Hekimi S. Determination of life-span in Caenorhabditis elegans by four clock genes. Science. 1996;272:1010–1013. [PubMed]
91. Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, et al. A genomic regulatory network for development. Science. 2002;295:1669–1678. [PubMed]
92. Smith J, Theodoris C, Davidson EH. A gene regulatory network subcircuit drives a dynamic pattern of gene expression. Science. 2007;318:794–797. [PubMed]
93. Loros JJ, Dunlap JC. Genetic and molecular analysis of Circadian rhythms in Neurospora. Annu Rev Physiol. 2001;63:757–794. [PubMed]
94. Bloomfield P. Fourier analysis of time series: an introduction. New York: Wiley; 1976.
95. Emery B, de la Cruz J, Rocak S, Deloche O, Linder P. Has1p, a member of the DEAD-box family, is required for 40S ribosomal subunit biogenesis in Saccharomyces cervisiae. Mol. Microb. 2004;52:141–158. [PubMed]
96. Cheng P, Yang Y, Liu Y. Interlocked feedback loops contribute to the robustness of the Neurospora circadian clock. Proc Natl Acad Sci U S A. 2001;98:7408–7413. [PMC free article] [PubMed]

Articles from PLoS ONE are provided here courtesy of Public Library of Science
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...