
Bioinformatics, PMC2832819

# Estimating replicate time shifts using Gaussian process regression

Kevin K. Lin^{2,†}, Bogi Andersen^{2}, Padhraic Smyth^{1} and Alexander Ihler^{1,*}

^{1}Department of Computer Science and ^{2}Departments of Medicine and Biological Chemistry, University of California Irvine, Irvine, CA 92697, USA

^{*}Corresponding author.

^{†}Present address: Cancer Research Institute, University of California, San Francisco, CA 94158, USA

## Abstract

**Motivation:** Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate.

**Results:** We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study.

**Availability:** Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/

**Contact:** ihler/at/ics.uci.edu

## 1 INTRODUCTION

A typical microarray time-course expression dataset consists of measurements taken at a relatively small number of time points (e.g. 5 to 10), where at each time point microarray measurements are obtained on a small number (e.g. 3) of replicate samples. There has been considerable work in recent years in bioinformatics on the development of statistical techniques for accurately inferring expression profiles from such data, in the face of both measurement noise and biological variation across replicates (Bar-Joseph, 2004; Tai and Speed, 2006). In this article, we focus on a different source of variation that has received little attention to date, namely uncertainty about the precise biological time at which measurements were taken. We are specifically interested in the case where replicates that were measured at the same time point are in fact developing at different rates and correspond to different developmental times. The general intuition we pursue is that even though the underlying true expression profiles for each gene are masked by considerable noise, we can nonetheless infer time shifts for each replicate by analyzing all genes simultaneously.

As a motivating example, in this article, we use a time-series dataset of mRNA expression for the hair-growth cycle in the mouse, with microarray measurements for 6333 hair-cycle-related genes from 2 or 3 replicates at each of 9 time points, resulting in a total of 24 individual mice (Lin *et al.*, 2004, 2009); see Figure 1. Hair follicles grow in repeated cycles, each of which can be coarsely broken down into three phases: anagen, catagen and telogen. These cycles have been well-characterized morphologically, but are understood incompletely at the molecular level. Time-course microarray data have been shown to be useful for automated, reliable identification of hair cycle-associated genes (Lin *et al.*, 2004). However, in addition to the usual measurement noise, a significant source of variability arises from the fact that each sample in the time course is taken from a different individual, or replicate, and since each replicate develops at a slightly different rate, we can never obtain truly synchronous time points. From morphological observation, we find that after a few weeks, replicates which are of the same age (time since birth) may differ in the stage of hair follicle development by as much as 2–3 days. If the expression profile is changing rapidly over time, these developmental differences can result in major discrepancies among the replicates' observations, leading to poor estimates of the underlying expression patterns. Moreover, if the time interval between successive samples is relatively small, it is even possible for two replicates measured at successive time points to be in reverse order, i.e. the replicate measured at the later time point may be less developed than the earlier measurement. Figure 2 illustrates this point with images from two pairs of replicates at successive time points. 
If the estimated profiles are disturbed by such desynchronization, analysis tasks, such as estimating the underlying expression profiles and identifying periodically expressed genes, become inherently more difficult. These desynchronization effects have been discussed previously, but have not been addressed in a systematic manner in the existing literature (Erdal *et al.*, 2004; Wichert *et al.*, 2004). For datasets measuring multiple modalities (transcriptome and proteome, for example) or relatively short transient responses, we may expect timing and desynchronization effects to be more pronounced, and methods of coping with these effects may become critical to analysis.


In this article, we describe a Gaussian process regression (GPR) approach that explicitly models desynchronization effects among replicates. GPR is a Bayesian non-linear regression technique, and has been previously applied in several contexts in bioinformatics (Gao *et al.*, 2008; Kirk and Stumpf, 2009; Lawrence *et al.*, 2007; Yuan, 2006). Rather than specifying predetermined shapes or other parametric assumptions, such as linear or polynomial regression, GPR is a semi-parametric method that uses the data themselves to represent the function, smoothed by an assumed covariance structure. Since we have little if any prior knowledge to determine what expression patterns to expect for different genes, it is reasonable to use GPR to automatically discover the shapes. We note in passing that our general approach for modeling desynchronization among replicates should be generally applicable to other statistical profile modeling methods such as splines (e.g. Bar-Joseph *et al.*, 2003).

To consider replicate desynchronization, we assume that each replicate has an ‘ideal’ physiological age, and that the expression profile viewed with respect to the physiological age is shared across replicates. The observed age is then the ideal physiological age plus a time shift due to developmental drift. We model the time shift using a Gaussian prior, and use a maximum *a posteriori* (MAP) approach to estimate the time shifts and expression patterns simultaneously. We evaluate our method using the aforementioned hair-cycle dataset, comparing our predicted time shifts with morphological observations, and also using cross-validation to measure the predictive accuracy of the model. Our experiments indicate that the time-shift estimates from our model both agree with independent morphological evidence and provide more accurate prediction of expression profiles for out-of-sample replicates and genes. Our approach should be generally useful for reducing uncertainty and improving the quality of inferred profiles for time-course microarray data, as well as more specific tasks such as analysis of differential expression in time-course data (Storey *et al.*, 2005).

Our method is substantially different from work on ‘aligning’ time-course datasets using techniques, such as dynamic time-warping algorithms (Aach and Church, 2001), continuous-time modeling of expression data by B-splines (Bar-Joseph *et al.*, 2003; Kaminski and Bar-Joseph, 2007) and discriminative hidden Markov models (Lin *et al.*, 2008). Those approaches focus on finding an optimal matching between two sequences by aligning the time points, provided that there are enough time points for matching to occur. Such methods, however, are not directly applicable to situations where each replicate is measured at only one time point, as in the aforementioned hair-cycle dataset. In contrast, the methodology we propose in this article can be applied whether each replicate is observed at a single time or at many.

As with any method of estimating expression profiles, we assume that the signals are sufficiently smooth to be estimated from the measurements. Furthermore, we assume that at least some signals are smooth compared with the temporal uncertainty, to allow the direction and magnitude of shifts to be estimated. Finally, we also assume that all genes are affected jointly by the time shift, which could be violated if some genes are influenced by external timing effects; in theory the model could be extended to include such effects. Our method is most similar to the general framework of total least squares (TLS) or error-in-variables (EIV) modeling in regression (Markovsky and Van Huffel, 2007; Van Huffel *et al.*, 2007), which minimizes the weighted sum of errors on both dependent and independent variables; our model can be viewed as a Gaussian process (GP) version of TLS or EIV.

## 2 METHODS

We used Affymetrix Mouse Genome 430 2.0 DNA microarrays to profile mRNA expression of 45 101 probe sequences in mouse back skin in the second hair growth cycle (Lin *et al.*, 2009). Nine representative time points were selected to measure the gene expression, shown in Figure 1. Two or three replicates were profiled at each time point, and we restricted our attention to 6333 genes that had been previously determined to be hair-cycle regulated (Lin *et al.*, 2004). Expression values were normalized by taking logarithms and subtracting their mean value across the replicates and time points.

### 2.1 GPR

Let *t*_{i}, for *i*=1, 2, …, *n* be the collection of time points at which we measure the expression (Fig. 1). Let *y*^{g,m}(*t*) be the expression of the *g*-th gene from the *m*-th replicate at time *t*. For convenience, we write the data in vector form, defining *y*^{g,m}_{j}=*y*^{g,m}(*t*_{j}), and writing *y*^{g}=[*y*^{g,1}_{1}, *y*^{g,2}_{1}, *y*^{g,1}_{2},…, *y*^{g,2}_{n}, *y*^{g,3}_{n}]^{T} and **y**=[*y*^{1}, *y*^{2},…, *y*^{N}].

Let us first suppose that there is no time shifting in the data. We model the expression profile *y*^{g,m}(*t*) as a GP, so that any finite number of *y*^{g,m}(*t*_{i}) have a jointly Gaussian distribution. Since the expression data of each gene have been normalized to have mean zero across time, we assume that the mean of this GP is zero, and the GP is determined by its covariance function. A common, reasonable choice of covariance function is the squared exponential (Rasmussen and Williams, 2006)

$$k(t_i, t_{i'}) = \sigma_f^2 \exp\left(-\frac{(t_i - t_{i'})^2}{2 l^2}\right),$$

where σ_{f}^{2} is the variance of any particular point *y*^{g,m}(*t*), and *l* is the length parameter. For two nearby time points, *t*_{i}≈*t*_{i′}, we have *k*(*t*_{i}, *t*_{i′})≈σ_{f}^{2}, meaning the profile values are highly correlated. In contrast, when *t*_{i} is further away from *t*_{i′}, *k*(*t*_{i}, *t*_{i′}) decreases toward zero, making the profile values uncorrelated. The length parameter *l* determines how fast the correlation decays with time.
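The behavior of the squared exponential covariance described above can be checked numerically. The following is a minimal sketch in Python, not the authors' Matlab code; the function name `sq_exp_cov`, the parameter values and the list of days are illustrative assumptions.

```python
import math

def sq_exp_cov(t1, t2, sigma_f=1.0, length=3.0):
    """Squared exponential covariance between two times (in days).

    sigma_f**2 is the variance at any single time point and `length`
    controls how quickly the correlation decays as |t1 - t2| grows.
    """
    return sigma_f ** 2 * math.exp(-((t1 - t2) ** 2) / (2.0 * length ** 2))

# Illustrative nominal measurement days (assumed values, for demonstration only).
days = [23, 25, 27, 29, 34]
for t in days:
    # Covariance with the first day: equals sigma_f**2 at zero lag, then decays.
    print(t, round(sq_exp_cov(days[0], t), 4))
```

A larger `length` makes distant time points more strongly correlated, which corresponds to smoother profiles.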

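The microarray data are noisy due to measurement errors and biological factors: two measurements taken at the same physiological age will not be exactly equal. We therefore model each observation as a noisy observation of the underlying expression profile. This is equivalent to modifying the covariance function to

$$k\big((t_i, m), (t_{i'}, m')\big) = \sigma_f^2 \exp\left(-\frac{(t_i - t_{i'})^2}{2 l^2}\right) + \sigma_n^2\, \delta(t_i, m;\, t_{i'}, m'), \qquad (1)$$

where δ(·; ·) is the Kronecker delta, equal to one if and only if *t*_{i}=*t*_{i′} and *m*=*m*′. Then, the joint distribution of *y*^{g} is Gaussian,

$$y^g \sim \mathcal{N}(0, \mathbf{K}), \qquad (2)$$

where the covariance matrix **K** is given by

$$[\mathbf{K}]_{(i,m),(i',m')} = k\big((t_i, m), (t_{i'}, m')\big),$$

with rows and columns ordered as in *y*^{g}.

Notice that the last term in Equation (1) is non-zero only in the diagonal of **K**. This ensures that if σ_{n}^{2}>0 then **K** is always non-singular.

We further assume that the expression profiles of different genes *y*^{g} are statistically independent, and share the same GP parameters: **θ**={*l*, σ_{f}, σ_{n}}. We can estimate the parameters by maximizing the log likelihood, which is given by

$$\log p(\mathbf{y} \mid \boldsymbol{\theta}) = \sum_{g=1}^{N} \log \mathcal{N}(y^g; 0, \mathbf{K}) = -\frac{1}{2} \sum_{g=1}^{N} (y^g)^T \mathbf{K}^{-1} y^g - \frac{N}{2} \log |\mathbf{K}| - \frac{NM}{2} \log 2\pi,$$

where *N* is the number of genes and *M* is the length of each vector *y*^{g}.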

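Using GPR, we can estimate the gene expression at any given time point *t*_{*}. For a given *t*_{*}, the conditional distribution of *y*^{g}(*t*_{*}), denoted *y*^{g}_{*}, given the observed data *y*^{g}, is also Gaussian:

$$y^g_* \mid y^g \sim \mathcal{N}\big(K_* \mathbf{K}^{-1} y^g,\; K_{**} - K_* \mathbf{K}^{-1} K_*^T\big),$$

where *K*_{**}=*k*(*t*_{*}, *t*_{*}) and *K*_{*} is the row vector of covariances between *t*_{*} and the observed times, with one entry *k*(*t*_{*}, *t*_{i}) for each observation in *y*^{g}.

The minimum mean squared error estimate for *y*^{g}_{*} is the mean of this conditional distribution, and the estimate uncertainty is given by the conditional variance:

$$\hat{y}^g_* = K_* \mathbf{K}^{-1} y^g, \qquad \mathrm{var}(y^g_*) = K_{**} - K_* \mathbf{K}^{-1} K_*^T. \qquad (3)$$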
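The conditional mean and variance described above can be computed with a Cholesky factorization of the covariance matrix. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses only the Python standard library (NumPy would be the usual choice), treats each entry of `times` as a separate observation with independent noise on the diagonal, and all function names and numerical values are hypothetical.

```python
import math

def sq_exp(t1, t2, sigma_f, length):
    """Noise-free squared exponential covariance."""
    return sigma_f ** 2 * math.exp(-((t1 - t2) ** 2) / (2.0 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(times, y, t_star, sigma_f, length, sigma_n):
    """Conditional mean and variance of one gene's profile at time t_star."""
    n = len(times)
    # Observation covariance: smooth part plus noise on the diagonal only.
    K = [[sq_exp(times[i], times[j], sigma_f, length)
          + (sigma_n ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp(t_star, t, sigma_f, length) for t in times]
    L = cholesky(K)
    alpha = chol_solve(L, y)        # K^{-1} y
    v = chol_solve(L, k_star)       # K^{-1} K_*^T
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = sq_exp(t_star, t_star, sigma_f, length) - sum(
        ks * vi for ks, vi in zip(k_star, v))
    return mean, var

# Two replicates on each of three nominal days; expression values are made up.
times = [23.0, 23.0, 25.0, 25.0, 27.0, 27.0]
y = [0.4, 0.6, 0.1, -0.1, -0.5, -0.3]
print(gp_predict(times, y, 24.0, sigma_f=1.0, length=3.0, sigma_n=0.3))
```

Because the noise term sits on the diagonal, replicates sharing the same nominal time still give a non-singular covariance matrix, as noted in the text.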

### 2.2 Modeling temporal uncertainty

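Now suppose that each replicate has an ideal, physiological age, denoted *t̃*^{m}_{i}, which corresponds to the degree of physiological development at the time at which the replicate is measured. Our observed age *t*_{i}, which is the nominal or temporal time at which the replicate is measured, can be treated as a noisy observation of *t̃*^{m}_{i}:

$$t_i = \tilde{t}^m_i + \tau^m_i,$$

where τ^{m}_{i} is the time shift associated with the *m*-th replicate at time *t*_{i}; for convenience, we also write the vector **τ**=[τ^{1}_{1}, τ^{2}_{1},…, τ^{3}_{n}]^{T}.

It is natural to model the τ^{m}_{i} using a Gaussian prior distribution,

$$\tau^m_i \sim \mathcal{N}(0, \sigma_\tau^2).$$

To incorporate the time shifts into GPR, we still model the expression values using a GP as in Equation (2), but Equation (1) is replaced by

$$k\big((t_i, m), (t_{i'}, m')\big) = \sigma_f^2 \exp\left(-\frac{(\tilde{t}^m_i - \tilde{t}^{m'}_{i'})^2}{2 l^2}\right) + \sigma_n^2\, \delta(t_i, m;\, t_{i'}, m'), \qquad \tilde{t}^m_i = t_i - \tau^m_i.$$

Thus, the expression profiles are GP with respect to the ideal, physiological ages *t̃*^{m}_{i}, rather than the nominal or observed ages *t*_{i}.

We employ a MAP approach to estimate the τ^{m}_{i} and **θ**, by optimizing the posterior distribution over both. The posterior distribution is given by

$$p(\boldsymbol{\tau}, \boldsymbol{\theta} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\tau}, \boldsymbol{\theta})\, p(\boldsymbol{\tau}) = \prod_{g=1}^{N} \mathcal{N}\big(y^g; 0, \mathbf{K}(\boldsymbol{\tau}, \boldsymbol{\theta})\big) \prod_{i,m} \mathcal{N}(\tau^m_i; 0, \sigma_\tau^2). \qquad (4)$$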

Maximization of Equation (4) was carried out using the large-scale *fminunc* algorithm in MATLAB's non-linear optimization toolbox. This algorithm uses a trust-region-based Newton method, which iteratively optimizes a local quadratic approximation to the objective function in a small neighborhood around the current estimate. For more information, see the documentation (MathWorks, 2009).

Note that Equation (4) places a prior distribution on **τ**, but assumes no (informative) prior information about **θ**. The former reflects our intent that biological time be similar to the measurement time, i.e. we should not allow **τ** to take on arbitrary values. However, we have no specific prior information about the GP parameters **θ**; if additional information were available it could be included in Equation (4).
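To make the MAP objective concrete, the sketch below evaluates a negative log posterior (up to additive constants) for given time shifts and GP parameters. It is a hedged illustration, not the authors' Matlab code: the function names, the sign convention (physiological age taken as nominal time minus the shift) and the toy data are assumptions, and only the objective is shown, not the trust-region optimization.

```python
import math

def build_K(t_shifted, sigma_f, length, sigma_n):
    """Covariance over shifted (physiological) times, with noise on the diagonal."""
    n = len(t_shifted)
    return [[sigma_f ** 2 * math.exp(-((t_shifted[i] - t_shifted[j]) ** 2)
                                     / (2.0 * length ** 2))
             + (sigma_n ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(A):
    """Lower-triangular L with L L^T = A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z

def neg_log_posterior(tau, theta, times, Y, sigma_tau):
    """Negative log posterior over (tau, theta), dropping additive constants.

    tau   : one time shift per replicate
    theta : (length, sigma_f, sigma_n), shared by all genes
    Y     : list of per-gene expression vectors, aligned with `times`
    """
    length, sigma_f, sigma_n = theta
    t_shifted = [t - s for t, s in zip(times, tau)]  # assumed sign convention
    L = cholesky(build_K(t_shifted, sigma_f, length, sigma_n))
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(L)))
    nll = 0.0
    for y in Y:
        z = forward_solve(L, y)     # y^T K^{-1} y equals ||L^{-1} y||^2
        nll += 0.5 * sum(v * v for v in z) + 0.5 * log_det
    prior = sum(s * s for s in tau) / (2.0 * sigma_tau ** 2)
    return nll + prior

# Toy data: five replicates, two genes (all values made up for illustration).
times = [23.0, 23.0, 25.0, 25.0, 27.0]
Y = [[0.1, 0.2, 0.5, 0.4, -0.3], [1.0, 0.8, 0.1, 0.0, -0.9]]
print(neg_log_posterior([0.0] * 5, (3.0, 1.0, 0.3), times, Y, sigma_tau=1.0))
```

Shifting every replicate by the same amount leaves the likelihood term unchanged, since the covariance depends only on time differences; only the prior term changes. A general-purpose minimizer can then be applied to this function over **τ** and **θ** jointly.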

## 3 RESULTS

We estimate the time shifts and GP parameters by optimizing Equation (4) for the microarray data described in Section 2.1, and can compute the estimated profiles for each gene using Equation (3). When referring to the data, we use the notation ‘*t*_{i}-*m*’ to represent the *m*-th replicate measured at the *i*-th time point, *t*_{i}. For example, ‘23-1’ denotes the first replicate measured on the 23rd day.

Figure 3 shows example curves fit with and without time shifts. In general, the overall curve shapes are similar when estimated with and without time shifting, since the **τ** remain small. However, time shifts can refine the pattern, decrease random fluctuation and reduce uncertainty. Figure 3a–c shows three genes fit using GPR without time shifting. The curves display considerable random fluctuation and fitting errors. It is hard to discern meaningful patterns. Figure 3d–f fits the same genes, but including time shifts. The small fluctuations in Figure 3a–c are interpreted as arising from the time shifts; the estimated curves are smoother, highlighting the underlying patterns, and the estimated noise σ_{n} is decreased.


The changes in estimated parameters are quantified in Figure 4, which shows the estimated values of **θ** as σ_{τ} is increased. For σ_{τ} near zero, little or no time shifting occurs. As σ_{τ} increases, the length scale *l* also increases, suggesting that the random fluctuations are decreased in the shifted data. The noise σ_{n} decreases, indicating a better fit to the data, and σ_{f} increases, as we explain the variations within each time point as variation across time. Although this could indicate overfitting, we show later that time shifted profiles improve the predictive accuracy in cross-validation, suggesting that these changes do in fact reflect the underlying structure of the data. We also see that the parameter estimates are fairly stable across a wide range of σ_{τ}.

**Fig. 4.** Estimates of **θ**=[*l*, σ_{n}, σ_{f}]^{T} when using different values of σ_{τ} are plotted. The length scale *l* increases when σ_{τ} increases, and the value of **θ** changes little when σ_{τ} …

We can also characterize the stability of our estimate using the Laplace approximation to the posterior, characterized by a covariance matrix around the MAP estimate. We find that the primary direction of uncertainty is an equal shift of all τ (which yields the same expression profiles and is thus unobservable from the data); the second allows for an equal shift of replicates in days 23–29, which yields very similar profiles due to the large gap between day 29 and day 34. Ignoring these two directions, the estimates of τ are highly certain, each with a residual posterior standard deviation (SD) between 0.01 and 0.07 (compared with a minimum 2-day separation between measurement times).

It is important to note that the time shifts are optimized using the entire set of genes, i.e. they are not optimized individually for each gene. For example, 34-1 (the circled point in Fig. 3d–f) is not strongly encouraged to shift by the profile in Figure 3d, since the expression profile is relatively flat during that period: only a slight improvement is obtained by the large time shift indicated by the arrow at datum 34-1 in Figure 3d. However, if we look at the genes in Figure 3e and f, the rightward time shift of 34-1 greatly reduces its distance from the profile, strongly suggesting that the time shift of 34-1 is preferred by these and similarly shaped genes. The time shifts integrate the information across all the genes, and can thus reflect the relative development stage of the entire system.

### 3.1 Comparing with observed morphology

To verify that the time shifts we learn correspond to actual differences in the developmental rates among the replicates, we can compare to estimates of the physiological age based on morphological observations. We had a domain expert independently estimate the developmental stage and rank the replicates in order of degree of development using only images from histological sections of the replicates taken at the time of measurement. We then compared this ranking with the replicate order predicted by our algorithm's estimates of biological time *t̃*^{m}_{i}, based solely on the gene expression measurements.

The rank correlation coefficient between these two global orderings was found to be 0.98, showing that they are in close correspondence. However, it is not immediately obvious how to measure the significance of this number. Both rankings rely on the nominal ordering of measurements indicated by their respective days, e.g. that measurements on day 23 are almost certainly earlier than measurements from day 34. It is difficult to know how much influence this implied ordering has on the rankings.

However, we can control for the implied ordering and assess the significance of our ranking quantitatively by comparing only the *within-group* orders, i.e. the relative ordering of replicates measured at the same time. These relative orders are listed in Table 1. For example, for day 23, both our expert and the algorithm ranked 23-1 as being at an earlier developmental stage than 23-2, so the order for both is [23-1, 23-2], abbreviated as [1, 2] on day 23. We can then compute the rank correlation coefficient ρ_{i} between the orderings at each time point, also shown in Table 1. We score the full ranking by simply taking the average score (correlation coefficient) over the time points, yielding an overall score ρ̄.

It is then possible to assess the significance of ρ̄, compared with the plausible null hypothesis that the ordering of replicates measured on the same day is uniformly random. We compute the *P*-value, or probability under the null hypothesis of obtaining a score as high or higher than the observed correlation, to be 0.0037. This shows that the expert ranking based on histology and the algorithm ranking based on expression are in fact in close correspondence.
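The within-group scoring and its significance test can be sketched as follows. This is an illustration under assumptions: it uses Spearman's rank correlation for untied ranks and computes the *P*-value by exhaustively enumerating all within-group orderings, which may differ from the procedure the authors used; the rankings shown are hypothetical and are not the values from Table 1.

```python
from itertools import permutations, product

def spearman(a, b):
    """Spearman rank correlation for two untied rankings of the same items."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def mean_within_group_rho(expert, algo):
    """Average of the per-time-point rank correlations."""
    return sum(spearman(e, a) for e, a in zip(expert, algo)) / len(expert)

def exact_p_value(expert, algo):
    """P(score >= observed) when each group's ordering is uniformly random."""
    observed = mean_within_group_rho(expert, algo)
    # Scores attainable within each group, one per possible ordering.
    per_group = [[spearman(e, p) for p in permutations(e)] for e in expert]
    total = hits = 0
    for combo in product(*per_group):
        total += 1
        if sum(combo) / len(combo) >= observed - 1e-12:
            hits += 1
    return observed, hits / total

# Hypothetical rankings for three time points (groups of 2, 3 and 3 replicates).
expert = [[1, 2], [1, 2, 3], [1, 2, 3]]
algo = [[1, 2], [1, 3, 2], [1, 2, 3]]
print(exact_p_value(expert, algo))
```

With groups of only two or three replicates the number of joint orderings stays small, so exact enumeration is feasible and no sampling is needed.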

However, this *P*-value may underestimate the quality of the algorithm ranking. We have controlled for the information from the measurement day by using within-group rankings, but this ignores any reordering of replicates across days. Both rankings occasionally reversed the ordering implied by the measurement day, ranking a replicate at an earlier day as being developmentally later than some replicate at the next time point. Quantifying the significance of these reversals is difficult, but we can discuss them anecdotally.

In the morphologically based ordering, there were only two instances in which replicates at successive time points appeared to be in reversed order of morphological development. These are replicates 25-1 and 23-2, in which the latter appears to be further developed in the anagen stage of the hair cycle, and the pair 39-3 and 37-2, in which the latter appears to be further along in the catagen stage, as shown in Figure 2. Both of these order reversals were correctly predicted by the model, as shown in Figure 5. However, the model also predicts two other reversals (also at time points 37–39), which were not predicted by our expert. Without ground truth, the accuracy and significance of these reversals are difficult to quantify.

Finally, we test the sensitivity of the algorithm's ordering to the choice of the parameter σ_{τ}, which controls how easy it is to shift each time away from its nominal value. Figure 5 shows the time shifts found using several different values of σ_{τ}. Although the values of τ_{i} do change with the variance σ_{τ}, the relative differences among the τ_{i} do not change significantly, especially among replicates measured at the same time point. This makes the rank correlation relatively stable and insensitive to σ_{τ}.

### 3.2 Predictive accuracy in cross-validation

Another way to measure whether the learned time shifts correspond to real developmental phenomena is to check whether they improve our ability to *predict* the expression levels of data not used in the learning process. We use cross-validation to maximize the amount of data on which we can make predictive measurements.

Specifically, we subdivide the data into two sets, training and test, in which the test set is made up of the intersection of a subset of genes and a subset of the replicates (Fig. 6a). Only the training dataset is used for learning the expression patterns of the genes and time shifts of the replicates, maximizing Equation (4) on the training set to obtain an estimate of **θ** and **τ**. Since the training set includes *some* genes for every replicate, we obtain an estimate of each replicate's time shift τ_{i} and thus its estimated biological time *t̃*_{i}. Similarly, since the training data include *some* replicate measurements for every gene in the training data, we can compute an estimate of the gene's profile and predicted expression level at the held-out replicates' times *t*_{i} using Equation (3). We then measure the mean squared error (MSE) between the predicted values and the measured expression levels for the test replicates and genes, to see if the time shifts improve our predictive accuracy.

**Fig. 6.** (**a**) Data partitioning for cross-validation. The testing data consist of the intersection of a subset of genes and of replicates; this allows us to predict the profile shape for all genes and time shift for all replicates while withholding unused data …

We use a cross-validation strategy for evaluating the predictions. First, we randomly partition the genes into 10 subsets. In each round of cross-validation, one of these subsets is used to define the testing set, and the other nine subsets are assigned to the training set. Similarly, we randomly select one replicate at each time point to define the test set; recall that the test set is defined as the intersection of the test genes and test replicates. We select 100 such groups for each subset of genes; each round of cross-validation consists of leaving out all the data corresponding to the randomly selected replicates for the current partition of genes.
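The partitioning scheme described above can be sketched as a generator of (test genes, test replicates) pairs. This is a minimal illustration, assuming replicate labels of the form used in the text (e.g. '23-1'); the function names, the seed and the small example inputs are hypothetical.

```python
import random

def make_cv_rounds(gene_ids, replicates_by_time, n_folds=10,
                   draws_per_fold=100, seed=0):
    """Yield (test_genes, test_replicates) for each cross-validation round.

    Genes are randomly split into n_folds subsets. For each subset, one
    replicate per time point is drawn at random, draws_per_fold times.
    """
    rng = random.Random(seed)
    genes = list(gene_ids)
    rng.shuffle(genes)
    folds = [genes[k::n_folds] for k in range(n_folds)]
    for test_genes in folds:
        for _ in range(draws_per_fold):
            test_reps = [rng.choice(reps) for reps in replicates_by_time.values()]
            yield set(test_genes), set(test_reps)

def is_training_entry(gene, replicate, test_genes, test_reps):
    """An entry is held out only if BOTH its gene and its replicate are test items."""
    return not (gene in test_genes and replicate in test_reps)

# Small hypothetical example: 20 genes, two time points.
replicates = {23: ["23-1", "23-2"], 25: ["25-1", "25-2", "25-3"]}
rounds = list(make_cv_rounds(range(20), replicates, n_folds=10, draws_per_fold=3))
test_genes, test_reps = rounds[0]
print(len(rounds), sorted(test_genes), sorted(test_reps))
```

Because only the intersection is withheld, every replicate still has training genes from which to estimate its time shift, and every gene still has training replicates from which to estimate its profile.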

In each round of cross-validation, we predict the time shift and expression profile of each gene/replicate pair, and measure the error from the observed expression value. We compute the MSE for each replicate by averaging over all rounds in which that replicate was left out, and compare with the MSE for a prediction made without time shifts. The percentage reduction in MSE is shown for each replicate in Figure 6b. We can see that most replicates improve their MSE with time shifts; only two (25-1 and 41-1) show any increase, and these increases are relatively small. The average MSE over all replicates is 0.2042 with time shifts as compared with 0.2515 without, resulting in a 20% reduction of average MSE. Figure 6c shows an anecdotal example of the held-out measurements and their estimated time shifts compared with the profile learned on the training data.

Finally, to test whether the predictive accuracy is sensitive to the variance in time, we performed the same cross-validation study at various values of σ_{τ}. The results are shown in Figure 7. For very low σ_{τ}, time shifting is essentially disallowed and the MSE matches that of the model without time shifts. However, as σ_{τ} increases, the MSE drops rapidly, and is relatively stable over the rest of the interval. We conclude that beyond very low values (empirically σ_{τ}>0.1), the choice of σ_{τ} does not greatly influence the predictive accuracy.

## 4 DISCUSSION AND CONCLUSION

We have argued that due to biological variation, time-course microarray data suffer from noise not only in the observed expression values, but also in the axis corresponding to the time of observation. Typical estimates of the expression profile assign all the uncertainty to the observation value, providing less accuracy and lower confidence in the resulting shapes. By introducing a model with uncertainty in the time axis, we can accurately infer the relative degree of development in each replicate and improve our estimates of the temporal expression profile.

The predictions of our GPR model show high consistency with human-generated estimates based on morphological observation (Table 1). Differences between the two estimation methods could be due to a number of factors. The expert ranking involves some subjective uncertainty, which can be hard to gauge; for example, the orientations of the histological sections are not always ideal, making morphological estimates more difficult. There is also always the possibility of human error in the data collection and processing.

As an example, we examined the three replicates on the 27th day, in which the ordering of the estimated time shifts from the model differs significantly from the expert's ordering based on morphological results (Table 1). We find that the expression of keratin-associated proteins, an excellent marker for anagen progression, supports the hypothesis that replicate 27-4 is in fact more delayed in development, as predicted by the algorithm.

In addition to not requiring histological sections, one advantage to the regression approach to estimating time shifts is that it provides a quantitative estimate of the amount of relative development, as compared with a subjective ordering. For the purposes of improving the estimated shapes of expression profiles, the proposed method also has the advantage that it works directly with the relevant measurements. While this carries some risk of overfitting, the predictive improvement seen in cross-validation suggests that the phenomena being identified are real effects in the expression data.

In terms of further improvements, in the general approach presented here, we modeled different gene expression profiles as independent GP sharing the same GP parameters and time shifts. While these assumptions give results that justify the inclusion of time shifts, there is still room to further refine and improve upon the model. For example, since different genes promote or inhibit one another in a regulation network and often share similar basic shapes, the assumption of independence is overly simple. One possible way to address this issue would be to group the genes into clusters, where each cluster shares an underlying shape or parameters. The grouping could be estimated simultaneously with the GP parameters, by modeling the data as a mixture model and using statistical estimation techniques such as the expectation–maximization method. There may also be temporal effects that are not shared by all genes or are unrelated to development, such as external effects, which would require further extension of the model.

*Funding*: National Institutes of Health–National Institute of Arthritis and Musculoskeletal and Skin Diseases (grant AR 44882 to B.A., including a BIRT revision award); National Science Foundation Grant (NSF IIS-0431085 to P.S.); National Library of Medicine–National Research Service (Award 5 T15 LM00744 to K.K.L.).

*Conflict of interest*: none declared.

## REFERENCES

- Aach J, Church G. Aligning gene expression time series with time warping algorithms. Bioinformatics. 2001;17:495–508.
- Bar-Joseph Z. Analyzing time series gene expression data. Bioinformatics. 2004;20:2493–2503.
- Bar-Joseph Z, et al. Continuous representations of time-series gene expression data. J. Comput. Biol. 2003;10:341–356.
- Erdal S, et al. A time series analysis of microarray data. In: BIBE'04: Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering. Los Alamitos, CA, USA: IEEE Comp. Soc.; 2004. pp. 366–375.
- Gao P, et al. Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics. 2008;24:i70–i75.
- Kaminski N, Bar-Joseph Z. A patient-gene model for temporal expression profiles in clinical studies. J. Comput. Biol. 2007;14:324–338.
- Kirk P, Stumpf M. Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data. Bioinformatics. 2009;25:1300–1306.
- Lawrence ND, et al. Modelling transcriptional regulation using Gaussian processes. In: Schölkopf B, et al., editors. Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press; 2007. pp. 785–792.
- Lin K, et al. Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance. Proc. Natl Acad. Sci. USA. 2004;101:15955–15960.
- Lin K, et al. Circadian clock genes contribute to the regulation of hair follicle cycling. PLoS Genet. 2009;5:e1000573.
- Lin T, et al. Alignment and classification of time series gene expression in clinical studies. Bioinformatics. 2008;24:i147–i155.
- Markovsky I, Van Huffel S. Overview of total least-squares methods. Signal Processing. 2007;87:2283–2302.
- MathWorks. Unconstrained nonlinear optimization. 2009. Available at http://www.mathworks.com/access/helpdesk/help/toolbox/optim/ug/brnoxr7-1.html (last accessed January 12, 2010).
- Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. Cambridge, MA, USA: MIT Press; 2006.
- Storey J, et al. Significance analysis of time course microarray experiments. Proc. Natl Acad. Sci. USA. 2005;102:12837–12842.
- Tai Y, Speed T. A multivariate empirical Bayes statistic for replicated microarray time course data. Ann. Stat. 2006;34:2387–2412.
- Van Huffel S, et al. Total least squares and errors-in-variables modeling. Comput. Stat. Data Anal. 2007;52:1076–1079.
- Wichert S, et al. Identifying periodically expressed transcripts in microarray time series data. Bioinformatics. 2004;20:5–20.
- Yuan M. Flexible temporal expression profile modelling using the Gaussian process. Comput. Stat. Data Anal. 2006;51:1754–1764.

**Oxford University Press**


Estimating replicate time shifts using Gaussian process regression. Bioinformatics. Mar 15, 2010; 26(6): 770.
