

# Validity and Power in Hemodynamic Response Modeling: A Comparison Study and a New Approach

^{1}Department of Statistics, Columbia University, New York, New York

^{2}Department of Psychology, Columbia University, New York, New York

^{*}Correspondence to: Martin Lindquist, Department of Statistics, 1255 Amsterdam Ave., 10th Fl., MC 4409, New York, NY 10027. Email: martin@stat.columbia.edu

## Abstract

One of the advantages of event-related functional MRI (fMRI) is that it permits estimation of the shape of the hemodynamic response function (HRF) elicited by cognitive events. Although studies to date have focused almost exclusively on the magnitude of evoked HRFs across different tasks, there is growing interest in testing other statistics, such as the time-to-peak and duration of activation. While there are many ways to estimate such parameters, we suggest three criteria for optimal estimation: 1) the relationship between parameter estimates and neural activity must be as transparent as possible; 2) parameter estimates should be independent of one another, so that true differences among conditions in one parameter (e.g., hemodynamic response delay) are not confused for apparent differences in other parameters (e.g., magnitude); and 3) statistical power should be maximized. In this work, we introduce a new modeling technique, based on the superposition of three inverse logit functions (IL), designed to meet these criteria. In simulations based on real fMRI data, we compare the IL model with several other popular methods, including smooth finite impulse response (FIR) models, the canonical HRF with derivatives, nonlinear fits using a canonical HRF, and a standard canonical model. The IL model achieved the best overall balance between parameter interpretability and power. The FIR model was the next-best choice, with gains in power at some cost to parameter independence. We provide software implementing the IL model.

**Keywords:** fMRI, hemodynamic response, magnitude, delay, latency, brain imaging, timing, analysis, neuroimaging methods

## INTRODUCTION

Linear and nonlinear statistical models of functional MRI (fMRI) data simultaneously incorporate information about the shape, timing, and magnitude of task-evoked hemodynamic responses. Most brain research to date has focused on the magnitude of evoked activation, although magnitude cannot be measured without assuming or measuring timing and shape information as well. Currently, however, there is increasing interest in measuring onset, peak latency, and duration of evoked fMRI responses [Bellgowan et al., 2003; Henson et al., 2002; Hernandez et al., 2002; Menon et al., 1998; Miezin et al., 2000; Rajapakse et al., 1998; Saad et al., 2003]. Measuring timing and duration of brain activity has obvious parallels to the measurement of reaction time widely used in psychological and neuroscientific research, and thus may be a powerful tool for studying brain correlates of human performance. Recent studies, for instance, have found that although event-related blood oxygenation level-dependent (BOLD) responses evolve slowly in time, meaningful latency differences between averaged responses on the order of 100–200 ms can be detected [Aguirre et al., 1999; Bellgowan et al., 2003; Formisano and Goebel, 2003; Formisano et al., 2002; Henson et al., 2002; Hernandez et al., 2002; Liao et al., 2002; Richter et al., 2000]. In addition, accurate modeling of hemodynamic response function (HRF) shape may prevent both false-positive and -negative results from arising due to ill-fitting constrained canonical models [Calhoun et al., 2004; Handwerker et al., 2004].

A number of fitting procedures exist that potentially allow for characterization of the latency and duration of fMRI responses. Such characterization requires only a model that extracts the shape of the HRF evoked by different types of cognitive events. From the estimated shape, summary measures of psychological interest (e.g., magnitude, delay, and duration) can be estimated. In this article we focus on the estimation of response height (H), time-to-peak (T), and full-width at half-maximum (W) as potential measures of response magnitude, latency, and duration (Fig. 1). These are not the only measures of interest. Time-to-onset is also important, although it appears to be related to T but less reliable [Miezin et al., 2000]. However, H, T, and W capture important aspects of the response that may interest psychologists, as they relate to the latency and duration of brain responses to cognitive events. As we show here, not all modeling strategies work equally well for this purpose; they differ in the validity and the statistical precision of the estimates they provide.

Ideally, estimated parameters of the HRF (e.g., H, T, and W) should be interpretable in terms of changes in neuronal activity, and they should be estimated such that statistical power is maximized. The issue of interpretability is difficult, as the evoked HRF is a complex, nonlinear function of neuronal and vascular changes [Buxton et al., 1998; Logothetis, 2003; Mechelli et al., 2001; Vazquez and Noll, 1998; Wager et al., 2005]. Essentially, the problem can be divided into two parts, shown in Figure 2.


The first issue is the question of whether changes in physiological, neuronal-level parameters (such as the magnitude, delay, and duration of evoked changes in neuronal activity) translate into changes in corresponding parameters of the HRF. Potential relationships are schematically depicted on the left side of Figure 2. Ideally, changes in neuronal parameters would each produce unique changes in one parameter of the HR shape, shown as solid arrows. However, neuronal changes may produce true changes in multiple aspects of the HR shape, as shown by the dashed arrows on the left side of Figure 2. The second issue is whether changes in the evoked HR are uniquely captured by parameter estimates of H, T, and W. That is, whatever combination of neurovascular effects leads to the evoked BOLD response, does the statistical model of the HRF recover the true magnitude, time-to-peak, and width of the response? This issue concerns the accuracy of the statistical model of the evoked response and the independence of H, T, and W parameter estimates, irrespective of whether the true HR changes were produced by uniquely interpretable physiological changes.

In this article we start from the assumption that meaningful changes can be captured in a linear or nonlinear time-invariant system, and chiefly address the second issue of whether commonly used HR models can accurately estimate true changes in the height, time-to-peak, and width of HR responses. That is, we assess the interpretability of H, T, and W estimates (right boxes in Fig. 2) given true changes in the shape of the evoked signal response (center boxes in Fig. 2).

Importantly, however, the complex relationship between neuronal activity and evoked signal response also places important constraints on the ultimate neuronal interpretation of evoked fMRI signal. While a full analysis of BOLD physiology is beyond the scope of the current work, we provide a brief analysis of some important constraints in the Discussion, and refer the reader to more detailed descriptions of BOLD physiology [Buxton et al., 1998; Logothetis, 2003; Mechelli et al., 2001; Vazquez and Noll, 1998; Wager et al., 2005]. In spite of this limitation, the estimation of the magnitude, latency, and width of empirical BOLD responses to psychological tasks is of great interest, because these responses may provide meaningful brain-based correlates of cognitive activity [e.g., Bellgowan et al., 2003; Henson et al., 2002].

### To Assume or Not to Assume?

Typically used linear and nonlinear models for the HRF vary greatly in the degree to which they make a priori assumptions about the shape of the response. In the most extreme case, the shape of the HRF is completely fixed; a canonical HRF is assumed, and the height (i.e., amplitude) of the response alone is allowed to vary [Worsley and Friston, 1995]. The magnitude of the height parameter is taken to be an estimate of the strength of activation. By contrast, one of the most flexible models, a finite impulse response (FIR) basis set, contains one free parameter for every time-point following stimulation in every cognitive event type modeled [Glover, 1999; Goutte et al., 2000; Ollinger et al., 2001]. Thus, the model is able to estimate an HRF of arbitrary shape for each event type in each voxel of the brain. A popular related technique is the *selective averaging* of responses following onsets of each trial type [Dale and Buckner, 1997; Maccotta et al., 2001]; a time × condition analysis of variance (ANOVA) model is often used to test for differences between event types.

Many basis sets fall somewhere midway between these two extremes and have an intermediate number of free parameters, providing the ability to model a family of plausible HRFs throughout the brain. For example, a popular choice is to use a canonical HRF and its derivatives with respect to time and dispersion (we use TD to denote this hereafter) [Friston et al., 1998; Henson et al., 2002]. Such approaches also include the use of basis sets composed of principal components [Aguirre et al., 1998; Woolrich et al., 2004], cosine functions [Zarahn, 2002], radial basis functions [Riera et al., 2004], spline basis sets, and a Gaussian model [Rajapakse et al., 1998]. Recently, a method was introduced [Woolrich et al., 2004] that allows the specification of a set of optimal basis functions. In this method a large number of sensibly shaped HRFs are randomly generated, and singular value decomposition is applied to the set of functions to find a small number of basis functions that optimally span the space of the generated functions. Another promising approach uses spectral basis functions to provide independent estimates of magnitude and delay in a linear modeling framework [Liao et al., 2002].

Because linear regression is limited in its ability to provide independent estimates of multiple parameters of the HRF, a number of researchers have used nonlinear fitting of a canonical function with free parameters for magnitude and onset/peak delay [Kruggel and von Cramon, 1999; Kruggel et al., 2000; Miezin et al., 2000]. The most common criticisms of such approaches are their computational costs and potential convergence problems, although increases in computational power make nonlinear estimation over the whole brain feasible.

In general, the more basis functions used in a linear model or the more free parameters in a nonlinear one, the more flexible the model is in measuring the magnitude and other parameters of interest. However, flexibility comes at a cost: More free parameters means more error in estimating them, fewer degrees of freedom, and decreased power and validity if the model regressors are collinear. In addition, even if the basis functions themselves are orthogonal, as with a principal components basis set, this does not guarantee that the regressors, which model multiple overlapping events throughout an experiment, are orthogonal. Finally, it is easier and statistically more powerful to interpret differences between task conditions (e.g., A – B) on a single parameter such as height than it is to test for differences in multiple parameters (A1A2A3 – B1B2B3)—conditional, of course, on the interpretability of those parameter estimates. The temporal derivative of the canonical SPM HRF, for example, is not uniquely interpretable in terms of activation delay; both magnitude and delay are functions of the two parameters [Calhoun et al., 2004; Liao et al., 2002].

All these problems with multiple-parameter basis sets suggest that using a single, canonical HRF is a good choice. Indeed, it offers optimal power if the shape is specified exactly correctly. However, the shape of the HRF varies as a function of both task and brain region, and any fixed model is bound to be wrong in much of the brain [Birn et al., 2001; Handwerker et al., 2004; Marrelec et al., 2003; Wager et al., 2005]. If the model is incorrectly specified, then statistical power will decrease, and the model may also produce invalid and biased results. In addition, using a canonical HRF provides no way to assess latency and duration—in fact, differences between conditions in response latency will be confused for differences in amplitude [Calhoun et al., 2004].

Thus, neither the fixed-response model nor the completely flexible model appears to be an optimal solution, and using a restricted set of basis functions is an alternative that may preserve validity and power within a plausible range of true HRFs [Woolrich et al., 2004]. However, an advantage of the more flexible models is that *height, latency*, and *response width* (duration) can potentially be assessed. This article is dedicated to consideration of the validity and power of such estimates using several common basis sets. In this work we also introduce a new technique for modeling the HRF, based on the superposition of three inverse logit functions (IL), which balances the need for both interpretability and flexibility of the model. In simulations based on actual HRFs measured in a group of 10 participants, we compare the performance of this model to four other popular choices of basis functions. These include an enhanced smooth FIR filter [Goutte et al., 2000], a canonical HRF with time and dispersion derivatives (TD) [Calhoun et al., 2004; Friston et al., 1998], the nonlinear (NL) fit of a Gamma function used by Miezin et al. [2000], and the canonical SPM HRF [Friston et al., 1998]. We show that the IL model can capture magnitude, delay, and duration of activation with less error than the other methods tested, and provides a promising way to flexibly but powerfully test the magnitude and timing of activation across experimental conditions.

### Criteria for Model Comparison

Ideally, differences in estimates of H, T, and W across conditions would reflect differences in the height, time-to-peak, and width of the true BOLD response (and, ideally, unique changes in corresponding neuronal effects as well, although this is unlikely under most conditions due to the complex physiology underlying the BOLD effect). These relationships are shown as solid lines connecting true signal responses and estimated responses in the right side of Figure 2. A 1:1 mapping between true and estimated parameters would render estimated parameters uniquely interpretable in terms of the underlying shape of the BOLD response. As the example above illustrates, however, there is not always a clean 1:1 mapping, indicated by the dashed lines in Figure 2. True differences in delay may appear as estimated differences in H (for example) if the model cannot accurately account for differences in delay. This potential for cross-talk exists among all the estimated parameters. We refer to this potential as *confusability*, defined as the bias in a parameter estimate that is induced by true changes in another nominally unrelated parameter. In our simulations, based on empirical HRFs, we independently varied true height, time-to-peak, and response width (so that the true values are known). We show that there is substantial confusability between true differences and estimates, and that this confusability is dependent on the HRF model used. Thus, the chosen modeling system places practical constraints on the interpretability of H, T, and W estimates.

Of course, the interpretability of H, T, and W estimates also depends on the relationship between underlying changes in neural activity and changes in the magnitude and shape of the true fMRI signal [Buckner, 2003; Buxton et al., 1998; Logothetis, 2003; Riera et al., 2004], shown by solid arrows (expected relationships) and dashed arrows (problematic relationships) on the left side of Figure 2. Underlying BOLD physiology limits the ultimate interpretability of the parameter estimates in terms of physiological parameters—e.g., prolonged changes in postsynaptic activity. Because of the complexity of making such interpretations, we do not attempt to relate BOLD signal to underlying neuronal activity, but rather treat the evoked HRF as a signal of interest. Future work may provide the basis for more accurate models of BOLD responses with physiological parameters that can be practically applied to cognitive studies [e.g., Buxton et al., 1998]. For the present, we feel it is important to acknowledge some of the theoretical limitations imposed by BOLD physiology on the interpretation of evoked BOLD magnitude, latency, and response width, and thus we return to this point in the following sections.

## MATERIALS AND METHODS

In this section we introduce a method for modeling the hemodynamic response function, based on the superposition of three inverse logit (IL) functions, and describe how it compares to four other popular techniques—a nonlinear fit on two gamma functions (NL), the canonical HRF + temporal derivative (TD), a finite impulse response basis set (FIR), and the canonical SPM HRF (Gam)—in simulations based on empirical fMRI data.

### Overview of the Models

We begin with an overview of the models included in our simulation study.

#### i) Inverse logit model

The logit function is defined as x = log(*p*(1−*p*)^{−1}), where *p* takes values between 0 and 1. Conversely, we can express *p* in terms of *x* as:

$$p = \frac{1}{1 + e^{-x}}.$$

This function is typically referred to as the inverse logit function, and an example is shown in Figure 3A. In what follows we denote this function by *L*(*x*), i.e., *L*(*x*) = *p*.

**Figure 3.** … *L*((*t* − *T*)/*D*) with parameters: (**A**) α = 1.0, *T* = 15 and *D* = 1.33; (**B**) α = −1.3, *T* = 27 and *D* = 2.5; and (**C**) α = 0.3, *T* = 66 and *D* = 2. (**D**) The three functions in (A)–(C) superimposed (bold …

*L*(*x*) has several important properties. It is an increasing function of *x* that approaches 0 as *x* → −∞ and 1 as *x* → ∞. In addition, *L*(*t* − *T*) = 0.5 when *t* = *T*.

To derive a model for the hemodynamic response function that can efficiently capture the details that are inherent in the function, such as the positive rise and the postactivation undershoot, we will use a superposition of three separate inverse logit functions: The first describing the rise following activation, the second the subsequent decrease and undershoot, while the third describes the stabilization of the HRF, shown in Figure 3A–C.

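Our model of the hemodynamic response function, *h*(*t*), can therefore be written in the following form:

$$h(t;\theta) = \alpha_1 L\left(\frac{t - T_1}{D_1}\right) + \alpha_2 L\left(\frac{t - T_2}{D_2}\right) + \alpha_3 L\left(\frac{t - T_3}{D_3}\right).$$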

In this particular model the function *h*(*t*) is based on nine variable parameters (seven free parameters after imposing additional constraints), given by θ = (α_{1}, *T*_{1}, *D*_{1}, α_{2}, *T*_{2}, *D*_{2}, α_{3}, *T*_{3}, *D*_{3}). The α parameters control the direction and amplitude of the curve. If α is positive, α · *L*(*x*) is an increasing function that takes values between 0 and α. If α is negative, α · *L*(*x*) is a decreasing function that takes values between α and 0. The parameter *T* shifts the center of the function by *T* time units. In effect, it defines the time point *t* at which *L*((*t* − *T*)/*D*) = 1/2, and it can be used as a measure of the time to half-peak. Finally, the parameter *D* controls the steepness of the slope of the curve and acts as a scaling parameter.

In our implementation of the model we begin by constraining the amplitude of the third inverse logit function, so that the fitted response ends at magnitude 0, by setting α_{3} = |α_{2}| − |α_{1}|. In addition, we want the function *h*(*t*; θ) to begin at zero at the time point *t* = 0. Therefore, we place the constraint *h*(0; θ) = 0 on the model. For a positive initial rise (α_{1} > 0) followed by an undershoot (α_{2} < 0), this implies that:

$$\alpha_2 = \alpha_1 \, \frac{L(-T_1/D_1) - L(-T_3/D_3)}{L(-T_3/D_3) - L(-T_2/D_2)}.$$

Applying these two constraints to the amplitudes of the basis functions leads to a model with seven variable parameters. Figure 3A–C shows an example of how varying the parameters controls the shape of the function *L*(*x*). Superimposing these three curves gives the function depicted in Figure 3D, which shows an example of an IL fit (solid line) to an empirical HRF (dashed line). This function efficiently captures the major features typically present in the HRF and illustrates how effectively three inverse logit functions can describe its basic shape.
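The two amplitude constraints can be made concrete in a short sketch. It assumes the sign pattern α_{1} > 0, α_{2} < 0 (rise, then undershoot), and it solves *h*(0; θ) = 0 for α_{2}. It then sets α_{3} = |α_{2}| − |α_{1}|, so the seven remaining inputs are α_{1}, *T*_{1}, *D*_{1}, *T*_{2}, *D*_{2}, *T*_{3} and *D*_{3}. The function names `inv_logit`, `il_amplitudes` and `il_hrf` are hypothetical helpers, not the authors' software.

```python
import math


def inv_logit(x):
    """L(x) = 1 / (1 + exp(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def il_amplitudes(alpha1, T1, D1, T2, D2, T3, D3):
    """Return (alpha1, alpha2, alpha3) satisfying h(0) = 0 and alpha3 = |alpha2| - |alpha1|.

    Assumes a positive rise (alpha1 > 0) followed by an undershoot (alpha2 < 0).
    """
    L1 = inv_logit(-T1 / D1)  # each component evaluated at t = 0
    L2 = inv_logit(-T2 / D2)
    L3 = inv_logit(-T3 / D3)
    alpha2 = alpha1 * (L1 - L3) / (L3 - L2)
    alpha3 = abs(alpha2) - abs(alpha1)
    return alpha1, alpha2, alpha3


def il_hrf(t, alpha1, T1, D1, T2, D2, T3, D3):
    """Seven-parameter IL model h(t; theta) with both constraints applied."""
    a1, a2, a3 = il_amplitudes(alpha1, T1, D1, T2, D2, T3, D3)
    return (a1 * inv_logit((t - T1) / D1)
            + a2 * inv_logit((t - T2) / D2)
            + a3 * inv_logit((t - T3) / D3))
```

With, for example, α_{1} = 1, *T* = (6, 10, 16) and *D* = (1.5, 2.3, 2), the response starts at 0 and dips below baseline around *t* = 14. It then returns to 0 for large *t*.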

The interpretability of the parameters in the model is increased if the first and second, and the second and third, IL functions are made as orthogonal as possible to one another. This will be true if the following conditions hold:

and

where *k* is a constant (see Appendix for more details). To ensure that these constraints hold, restrictions can be placed on the space of possible parameter values allowed in fitting the model.

##### Problem formulation

Let us define *f*(*t*; θ) to be the convolution between the IL model for the hemodynamic response, denoted by *h*(*t*; θ), and a known stimulus function, *s*(*t*). Our nonlinear regression model for the fMRI response at time *t*_{i} can be written as:

$$y_i = f(t_i;\theta) + \varepsilon_i, \qquad i = 1, \ldots, N,$$

where the errors are jointly Gaussian, $(\varepsilon_1, \ldots, \varepsilon_N)^T \sim N(0, V\sigma^2)$. In matrix format we can write this as:

$$Y = F(X;\theta) + E,$$

where $Y = (y_1, \ldots, y_N)^T$ is the data vector, $E = (\varepsilon_1, \ldots, \varepsilon_N)^T$ is a noise vector, and $F(X;\theta) = (f(t_1;\theta), \ldots, f(t_N;\theta))^T$.
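The convolution that defines *f*(*t*; θ) can be approximated numerically by a discrete sum. The sketch below uses a rectangle-rule approximation. It assumes that *h* and *s* are sampled on the same grid with spacing `dt`, starting at *t* = 0. The name `convolve_response` is hypothetical.

```python
def convolve_response(h, s, dt):
    """Approximate f(t_i) = integral of h(u) * s(t_i - u) du on a common grid.

    h: sampled HRF h(0), h(dt), ...; s: sampled stimulus function.
    Returns a list with the same length as s.
    """
    f = []
    for i in range(len(s)):
        acc = 0.0
        # Only lags 0..i contribute (causal response), truncated at len(h).
        for k in range(min(i + 1, len(h))):
            acc += h[k] * s[i - k]
        f.append(acc * dt)
    return f
```

Because the system is linear and time-invariant, two stimulus impulses produce the sum of two shifted copies of *h*. This superposition is what makes overlapping events hard to separate.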

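The goal of our analysis is to find the parameters θ* such that the model best fits the data in the least-squares sense, i.e., we seek:

$$\theta^* = \arg\min_{\theta} S(\theta), \tag{8}$$

where

$$S(\theta) = \left(Y - F(X;\theta)\right)^T V^{-1} \left(Y - F(X;\theta)\right). \tag{9}$$

Under the assumption that the noise is independent and identically distributed (*iid*), *V* = *I* and Eq. 9 can be written in the form:

$$S(\theta) = \sum_{i=1}^{N} \left(y_i - f(t_i;\theta)\right)^2. \tag{10}$$

In this situation, with Gaussian noise, the value θ* that minimizes *S*(θ) is equivalent to the maximum likelihood estimate (MLE) of θ.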

It is well known that fMRI noise typically exhibits temporal dependence, and it is crucial that this dependence be taken into consideration when fitting the model. In our implementation we assume that the noise term can be modeled using an AR(1) model. Because *F*(*X*; θ) is a nonlinear function of θ, finding the parameters that minimize Eq. 9 will almost always involve an iterative search method. To improve computational efficiency, we would like to avoid repeatedly inverting the matrix *V*. Fortunately, under the assumption of AR(1) noise, with correlation matrix entries *V*_{ij} = ϕ^{|i−j|}, we can express the inverse of *V* as:

$$V^{-1} = \frac{1}{1-\phi^2}\begin{pmatrix} 1 & -\phi & & & \\ -\phi & d & -\phi & & \\ & \ddots & \ddots & \ddots & \\ & & -\phi & d & -\phi \\ & & & -\phi & 1 \end{pmatrix}, \tag{11}$$

where *d* = 1 + ϕ^{2}. This expression lets us avoid repeated inversion of the correlation matrix, and we can rewrite Eq. 9 as:

$$S(\theta) = \frac{1}{1-\phi^2}\left[r_1^2 + r_N^2 + d\sum_{i=2}^{N-1} r_i^2 - 2\phi\sum_{i=1}^{N-1} r_i r_{i+1}\right], \tag{12}$$

where

$$r_i = y_i - f(t_i;\theta).$$

Note that for ϕ = 0, *V* = *I* and Eq. 12 reduces to the *iid* cost function. In what follows we include ϕ in the parameter vector, i.e., we write θ for (θ, ϕ).
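As a sketch of how the tridiagonal form avoids any matrix work, the AR(1) cost can be evaluated in a single pass over the residuals. The code assumes that *V* is the AR(1) correlation matrix (σ^{2} factored out), that |ϕ| < 1 and that there are at least two observations. The name `ar1_cost` is hypothetical.

```python
def ar1_cost(y, f, phi):
    """AR(1)-weighted cost S = r^T V^{-1} r, with V_ij = phi**|i - j|.

    Uses the tridiagonal form of V^{-1} (diagonal 1, d, ..., d, 1 and
    off-diagonal -phi, all scaled by 1 / (1 - phi**2)), so no matrix is
    built or inverted. Requires len(y) >= 2 and |phi| < 1.
    """
    r = [yi - fi for yi, fi in zip(y, f)]  # residuals r_i = y_i - f(t_i; theta)
    d = 1.0 + phi ** 2
    quad = r[0] ** 2 + r[-1] ** 2 + d * sum(ri * ri for ri in r[1:-1])
    cross = sum(r[i] * r[i + 1] for i in range(len(r) - 1))
    return (quad - 2.0 * phi * cross) / (1.0 - phi ** 2)
```

The result equals the prewhitened sum of squares r_{1}^{2} + Σ_{i≥2}(r_{i} − ϕ r_{i−1})^{2}/(1 − ϕ^{2}). This provides a convenient check of Eq. 12.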

The optimization problem stated in Eq. 12 can be solved using a number of different methods. Traditionally, deterministic methods have been used to solve the problem. More recently, with increased computational power, stochastic approaches have received more attention.

##### Deterministic solutions

The optimization problem stated in Eq. 8 can be solved using numerical algorithms such as the Gauss-Newton or Levenberg-Marquardt algorithms. Both methods are iterative and use the Jacobian of the model function at the current solution, and both converge quickly near a solution. The Gauss-Newton method has quadratic convergence (strictly, when the residuals at the solution are small), which implies that there exists a constant μ > 0 such that:

$$\|\theta_{k+1} - \theta^*\| \le \mu \, \|\theta_k - \theta^*\|^2$$

for each iteration *k*, where θ_{k} denotes the estimate of the parameter vector after the *k*th iteration and θ* the true minimum. The Levenberg-Marquardt algorithm combines the Gauss-Newton algorithm with gradient descent. It behaves like gradient descent far from the minimum, which makes it more stable. Near the minimum it behaves like Gauss-Newton and shares its fast local convergence. Although their local convergence properties are comparable, the Levenberg-Marquardt algorithm is more robust, in the sense that it can find a solution even when it starts far from the final minimum. Both algorithms are easily implemented for the IL model, because the inverse logit function has a simple derivative, *L*′(*x*) = *L*(*x*)(1 − *L*(*x*)).

##### Stochastic solutions

The problem with deterministic methods is that they converge to a local minimum near the initial value, regardless of whether it is the global minimum. Hence, the parameter estimates depend strongly on the initial values given to the algorithm. Nonlinear cost functions commonly have multiple local minima in addition to the global minimum being sought. It may therefore be beneficial to use a stochastic approach that samples points across the whole parameter space, because such approaches are less likely to become trapped in a local minimum. These methods are computationally slower than deterministic methods. However, they are more likely to find the global minimum, and at the very least they allow us to check whether the fits obtained with the faster deterministic methods are accurate.

The *simulated annealing* algorithm [Kirkpatrick et al., 1983; Metropolis et al., 1953] is one such approach. It moves randomly through parameter space, searching for a solution that minimizes the value of the cost function. The method allows an initially wide exploration of parameter space, which is increasingly narrowed around the global minimum as the method progresses. This is possible because the algorithm's random search accepts not only changes that decrease the value of the cost function, but also some changes that increase it.

There are four steps to implementing the simulated annealing algorithm:

1. Choose an initial value for the parameter vector θ_{0}. (Unlike with the L-M algorithm, this choice is not critical.)
2. Choose a new candidate solution, θ_{i+1}, based on a random perturbation of the current solution θ_{i}.
3. If the candidate solution decreases the error, as defined by the cost function *S*(θ) (Eq. 12), automatically accept the new solution. If the error increases, accept the candidate solution with probability min{exp((*S*(θ_{i}) − *S*(θ_{i+1}))/τ_{i}), 1}, where τ_{i} is the so-called *temperature function* at iteration *i*. The temperature function decreases with each iteration of the algorithm. As τ_{i} → 0, the parameters are updated only if Δ*S* = *S*(θ_{i+1}) − *S*(θ_{i}) < 0.
4. Update τ_{i} to τ_{i+1} and repeat from Step 2.

Setting the temperature function is a critical part of the simulated annealing method. High values of τ give wider exploration and less chance of getting stuck in a local minimum, while lower values reduce the likelihood of moving unless the error decreases. By starting with a large value of τ and letting it decrease to zero, we allow wide exploration at the beginning of the algorithm, which narrows as the number of iterations increases. If the temperature function decreases slowly enough, the global minimum is reached with probability 1. However, it is typically not practical to use such a slowly decreasing schedule, and therefore it cannot be guaranteed that a global optimum will be reached.

The candidate solution is obtained by perturbing the current solution by the outcome of a uniformly distributed random variable, which we denote Δθ. In our implementation we vary the amount each component of θ is allowed to jump according to the following:

The objective function, as stated in Eq. 12, is not convex. Therefore, whether a deterministic method converges to the global optimum depends strongly on the initial values given to the algorithm. To circumvent this issue, we recommend using the simulated annealing approach, and this is the model fitting method we use in the rest of this article. To determine an appropriate temperature function, we randomly generated a number of sensibly shaped HRFs, which we used as pilot data to calibrate our schedule. In our implementation we let τ_{i} = *C*/log(1 + *i*), where *C* is a large positive number chosen so that the acceptance rate of the algorithm is ~80%. For the simulation study performed here we used values on the order of *r*_{1} = 5, *r*_{2} = 0.1, *r*_{3} = 0.1, and ϕ = 0.1. Other distribution functions could also be used instead of the uniform to perturb the solution.
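The four steps above, with the logarithmic schedule τ_{i} = *C*/log(1 + *i*), can be sketched generically. The sketch makes two assumptions. Each component *j* is perturbed by a draw from U(−*r*_{j}, *r*_{j}), which stands in for the article's per-component jump rule. The defaults *C* = 1 and 20,000 iterations are arbitrary choices for a toy problem, not calibrated to an 80% acceptance rate. The function returns the best point visited. The name `simulated_annealing` is hypothetical.

```python
import math
import random


def simulated_annealing(cost, theta0, radii, C=1.0, n_iter=20000, seed=0):
    """Minimize cost(theta) by simulated annealing; return (best_theta, best_cost).

    Each component j is perturbed by a draw from U(-radii[j], radii[j]);
    worse candidates are accepted with probability exp((S_old - S_new) / tau_i),
    where tau_i = C / log(1 + i).
    """
    rng = random.Random(seed)
    theta = list(theta0)
    s_cur = cost(theta)
    best, s_best = list(theta), s_cur
    for i in range(1, n_iter + 1):
        tau = C / math.log(1.0 + i)  # logarithmic cooling schedule
        cand = [th + rng.uniform(-r, r) for th, r in zip(theta, radii)]
        s_new = cost(cand)
        # Improvements are always accepted; uphill moves only sometimes.
        if s_new <= s_cur or rng.random() < math.exp((s_cur - s_new) / tau):
            theta, s_cur = cand, s_new
            if s_cur < s_best:
                best, s_best = list(theta), s_cur
    return best, s_best
```

For the IL model, `cost` would be Eq. 12 evaluated at the augmented parameter vector (θ, ϕ). Keeping the best point visited, rather than the final state, is a common safeguard when the schedule is too fast to guarantee convergence.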

We tested the convergence properties of the simulated annealing approach at a number of randomly chosen starting points, and it consistently converged to the global minimum. Simulated annealing converged much more reliably than the deterministic methods. To better characterize the distribution of the parameter estimates obtained using simulated annealing, we also performed a series of 1,000 simulations at each of five plausible signal-to-noise ratio (SNR) levels for fMRI data, ranging from 0.05 to 0.5. Visual inspection of the distributions suggested that the parameter estimates were normally distributed for each SNR. This conclusion was supported by tests of skewness and excess kurtosis on each distribution, for which the 95% confidence intervals all contained 0, as expected if the parameter estimates are normally distributed.

#### ii) Nonlinear fit on two Gamma functions

The model consists of a linear combination of two Gamma functions with a total of six variable parameters, i.e.:

$$h(t) = A\left(\frac{t^{\alpha_1 - 1}\beta_1^{\alpha_1}e^{-\beta_1 t}}{\Gamma(\alpha_1)} - c\,\frac{t^{\alpha_2 - 1}\beta_2^{\alpha_2}e^{-\beta_2 t}}{\Gamma(\alpha_2)}\right),$$

where *A* controls the amplitude, α and β control the shape and scale, respectively (in this parameterization β acts as an inverse scale, or rate), and *c* determines the ratio of the response to undershoot. Γ represents the gamma function, which acts as a normalizing constant. This model can fit a wide variety of HRF shapes within the ranges of commonly observed event-related responses. The six parameters of the model are fit using the Levenberg-Marquardt algorithm.
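The two-Gamma function can be evaluated directly, as in the sketch below. The defaults are the fixed values quoted in section (v) for the canonical SPM HRF (α_{1} = 6, α_{2} = 16, β_{1} = β_{2} = 1, *c* = 1/6). The NL model would instead leave all six arguments free for the optimizer. The name `double_gamma_hrf` is hypothetical.

```python
import math


def double_gamma_hrf(t, A=1.0, alpha1=6.0, alpha2=16.0, beta1=1.0, beta2=1.0, c=1.0 / 6.0):
    """Two-Gamma HRF; the defaults are the canonical values given in section (v)."""
    if t <= 0.0:
        return 0.0  # causal response: zero before stimulus onset
    # Gamma densities with shape alpha and rate beta (normalized by Gamma(alpha)).
    peak = t ** (alpha1 - 1) * beta1 ** alpha1 * math.exp(-beta1 * t) / math.gamma(alpha1)
    under = t ** (alpha2 - 1) * beta2 ** alpha2 * math.exp(-beta2 * t) / math.gamma(alpha2)
    return A * (peak - c * under)
```

With the default values, the function peaks near 5 s (the mode of the first Gamma density, α_{1} − 1 = 5) and is negative around 15 s, where the undershoot term dominates.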

#### iii) Temporal derivative

This model consists of a linear combination of the canonical HRF, which is described in greater detail in section (v), and its temporal derivative. Therefore, there are two variable parameters: the amplitudes of the HRF and its derivative. Amplitude estimation was performed using the estimation procedure outlined in Calhoun et al. [2004].
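One commonly cited form of the Calhoun et al. [2004] amplitude estimate combines the two fitted coefficients as sign(β̂_{1}) · sqrt(β̂_{1}^{2} + β̂_{2}^{2}). Here β̂_{1} is the canonical-HRF coefficient and β̂_{2} is the temporal-derivative coefficient. We assume that form here; the sketch does not reproduce the paper's full procedure. The name `td_amplitude` is hypothetical.

```python
import math


def td_amplitude(beta_hrf, beta_deriv):
    """Combined TD amplitude sign(b1) * sqrt(b1**2 + b2**2) (assumed form).

    beta_hrf: coefficient of the canonical HRF regressor.
    beta_deriv: coefficient of its temporal-derivative regressor.
    """
    # hypot gives the Euclidean norm; copysign keeps the sign of the HRF term.
    return math.copysign(math.hypot(beta_hrf, beta_deriv), beta_hrf)
```

Taking the sign from the canonical term keeps activations and deactivations distinguishable. Taking the magnitude from both terms partially recovers amplitude that a latency shift moves into the derivative regressor.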

#### iv) Smooth FIR

In our implementation we used a semiparametric smooth FIR model [Goutte et al., 2000], as we expected it to outperform the standard FIR model. In general, the FIR basis set contains one free parameter for every time point following stimulation in every cognitive event type modeled. Assume that *x*(*t*) is a T-dimensional vector of stimulus inputs, which is equal to 1 at time *t* if a stimulus is present at that time point and 0 otherwise. We can then define the design matrix corresponding to the FIR filter of order *d* as:

In addition, let Y be the vector of measurements.

The traditional least-squares solution,

$$\hat{\beta} = (X^T X)^{-1} X^T Y,$$

is very sensitive to noise. The individual parameter estimates will also be noisy, which considerably increases the variance of H, T, and W estimates. In particular, FIR HRF estimates contain high-frequency noise that is unlikely to be part of the underlying hemodynamic response. To constrain the fit to be smoother (but otherwise of arbitrary shape), Goutte et al. [2000] placed a Gaussian prior on β and calculated the maximum a posteriori estimate:

$$\hat{\beta}_{map} = \left(X^T X + \sigma^2 \Sigma^{-1}\right)^{-1} X^T Y,$$

where the elements of Σ are given by

$$\Sigma_{ij} = \upsilon \exp\left(-\frac{h}{2}(i - j)^2\right).$$

This is equivalent to the solution of the least-squares problem with a penalty function, i.e., β_{map} is the solution to the problem:

$$\min_{\beta} \; \|Y - X\beta\|^2 + \sigma^2 \sum_{i,j} S_{ij}\,\beta_i \beta_j,$$

where *S*_{ij} are the components of the matrix Σ^{−1}. Note that replacing Σ with the identity matrix gives the ridge regression solution [Jain, 1985]. As with ridge regression, the estimates will be biased, with a certain amount of shrinkage.

The parameters of this model are *h*, *υ*, and σ. The parameter *h* controls the smoothness of the filter, and Goutte et al. [2000] recommend setting it a priori to:

We used this value in our implementation. In calculating the filter, only the ratio of the parameters *υ* and σ is actually of interest, and we determined empirically, using pilot data, that the ratio

gave rise to adequately smooth FIR estimates, without giving rise to significant biases in the estimates due to shrinkage.
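A sketch of the smooth FIR estimate follows. It assumes the Gaussian prior covariance Σ_ij = υ·exp(−(h/2)(i − j)²) and the MAP form β_map = (XᵀX + σ²Σ⁻¹)⁻¹XᵀY, which follows from a zero-mean Gaussian prior and white noise of variance σ². For smooth priors Σ is nearly singular, so the code uses the algebraically equivalent form ΣXᵀ(XΣXᵀ + σ²I)⁻¹Y, which avoids inverting Σ. Only the ratio σ²/υ enters either form. The values of h and of the ratio below are hypothetical placeholders, not the values used in this study.

```python
import numpy as np


def fir_design(x, order):
    """Column k holds the stimulus vector x delayed by k samples."""
    n = len(x)
    X = np.zeros((n, order))
    for k in range(order):
        X[k:, k] = x[: n - k]
    return X


def smooth_fir(x, y, order, h, ratio):
    """MAP FIR estimate with prior K_ij = exp(-h/2 (i-j)^2); ratio = sigma^2 / upsilon."""
    X = fir_design(x, order)
    idx = np.arange(order)
    K = np.exp(-0.5 * h * (idx[:, None] - idx[None, :]) ** 2)
    # Push-through identity: (X'X + r K^-1)^-1 X' = K X' (X K X' + r I)^-1
    G = X @ K @ X.T + ratio * np.eye(len(y))
    return K @ X.T @ np.linalg.solve(G, y)


rng = np.random.default_rng(0)
x = np.zeros(720)
x[rng.choice(700, size=40, replace=False)] = 1.0   # hypothetical event onsets
true_h = np.sin(np.linspace(0.0, np.pi, 30))        # hypothetical smooth HRF
y = fir_design(x, 30) @ true_h + rng.normal(0.0, 0.5, size=720)
beta_map = smooth_fir(x, y, order=30, h=0.3, ratio=1.0)
```

Setting *h* very large makes K the identity, and the estimate then reduces to ridge regression with penalty σ²/υ. This matches the remark above about replacing Σ with the identity matrix.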

#### v) Gamma

This model again consists of a linear combination of two gamma functions. However, in this implementation all parameters except the amplitude are fixed, giving rise to a model with only one variable parameter. The other parameters were set to α_{1} = 6, α_{2} = 16, β_{1} = β_{2} = 1, and *c* = 1/6, which are the defaults implemented in SPM99 and SPM2.
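With every shape parameter fixed, fitting the Gam model reduces to ordinary least squares on one convolved regressor. The sketch below assumes:

- a stimulus vector sampled at the 0.5-s TR used in this study;
- an intercept column for the session mean;
- the SPM-style defaults listed above.

The onsets and the amplitude of 3 are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

TR = 0.5  # seconds


def canonical_hrf(t):
    """alpha1 = 6, alpha2 = 16, beta1 = beta2 = 1, c = 1/6."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0


def gam_regressor(stim, tr=TR, hrf_len=32.0):
    """Stimulus vector convolved with the fixed canonical HRF."""
    h = canonical_hrf(np.arange(0.0, hrf_len, tr))
    return np.convolve(stim, h)[: len(stim)]


def fit_gam(stim, y, tr=TR):
    """Return the single free parameter: the response amplitude."""
    reg = gam_regressor(stim, tr)
    X = np.column_stack([reg, np.ones_like(reg)])  # amplitude + session mean
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]


stim = np.zeros(720)
stim[::40] = 1.0  # hypothetical: one event every 20 s
y = 3.0 * gam_regressor(stim) + 10.0
amplitude = fit_gam(stim, y)
```

Any mismatch between the true response shape and the canonical shape is absorbed into this one number, which is the source of the height bias discussed in the Results.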

### Estimating Parameters

After fitting each of the models, the next step is to estimate the height (H), time-to-peak (T), and width (W). Of particular interest is to estimate the difference in H, T, and W across different psychological event types. Most of the models used have closed form solutions describing the fits (the Gamma-based models and IL), and hence clear estimates of H, T, and W can be derived from combinations of parameter estimates. However, a lack of closed form solution (e.g., for FIR models) does not preclude reading off the values from the fits.

When H, T, and W cannot be calculated directly using a closed form solution, we used the following procedure to estimate them from fitted HRF estimates. The time-to-peak is found by taking the derivative of the model function and setting it equal to 0. To ensure that this is a maximum, we check that the second derivative is less than 0. If dual peaks exist, we choose the first one. Hence, our estimate of time-to-peak is *T* = min{*t* | *h*′(*t*) = 0 and *h*″(*t*) < 0}, where *t* indicates time and *h*′(*t*) and *h*″(*t*) denote the first and second derivatives of the HRF *h*(*t*). For high-quality HRFs this is sufficient. In practical applications across a wide range of studies, however, it is also desirable to constrain the peak to be neither the first nor the last parameter estimate. To estimate the height we use *H* = *h*(*T*). Finally, to estimate the width we perform the following steps:

- Find the earliest time point *t*_{u} such that *t*_{u} > *T* and *h*(*t*_{u}) < *H*/2, i.e., the first point *after* the peak that lies below half maximum.
- Find the latest time point *t*_{l} such that *t*_{l} < *T* and *h*(*t*_{l}) < *H*/2, i.e., the last point *before* the peak that lies below half maximum.
- Both *t*_{u} and *t*_{l} take values below 0.5*H*, so the distance *d* = *t*_{u} − *t*_{l} overestimates the width. Similarly, both *t*_{u−1} and *t*_{l+1} take values above 0.5*H*, so the distance *d* = *t*_{u−1} − *t*_{l+1} underestimates the width. We use linear interpolation to better approximate the time points in the intervals (*t*_{l}, *t*_{l+1}) and (*t*_{u−1}, *t*_{u}) where *h*(*t*) equals 0.5*H* (Δ_{l} and Δ_{u} below are fractions of one sampling interval). This gives:

$$W=({t}_{u-1}+{\mathrm{\Delta}}_{u})-({t}_{l+1}-{\mathrm{\Delta}}_{l}) \tag{24}$$

where

$${\mathrm{\Delta}}_{l}=\frac{h({t}_{l+1})-0.5H}{h({t}_{l+1})-h({t}_{l})} \tag{25}$$

and

$${\mathrm{\Delta}}_{u}=\frac{h({t}_{u-1})-0.5H}{h({t}_{u-1})-h({t}_{u})}. \tag{26}$$

For high-quality HRFs this procedure suffices. If the HRF estimates begin substantially above or below 0 (the session mean), however, it may be desirable to compute local HRF deflections by calculating H relative to the average of the first one or two estimates.

For the Gamma-based models simple contrasts exist for the magnitude. For TD we use the bias corrected amplitude estimate given by Calhoun et al. [2004]. For the IL model we derive a number of contrasts in the Appendix, the results of which are presented here. If the constraints given in Eqs. 4 and 5 hold, the first and second logit functions are approximately orthogonal and the estimates of H, T, and W are given by:

and

Note that the estimates of H and T are independent of one another. The estimate of W depends to a certain degree on both H and T. However, the simulation studies presented here show that it is less affected by changes in H and T than the corresponding estimates from the other models.

Note that although we use model-derived estimates of H, T, and W where possible, the direct approach of estimation from the fitted HRFs is also valid. This is aided in the case of the IL model by the fact that the inverse logit function has a straightforward derivative, as *L*′(*x*) = *L*(*x*)(1 − *L*(*x*)).
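The derivative identity makes the peak easy to locate numerically. The sketch below assumes the IL model is the sum of three scaled and shifted inverse logits, h(t) = Σ α_i L((t − T_i)/D_i), so that h′(t) = Σ (α_i/D_i) L(1 − L). That parameterization is defined elsewhere in the paper and is assumed here. The parameter values are hypothetical. They are chosen so the response rises, undershoots, and returns to baseline (α_1 + α_2 + α_3 = 0).

```python
import numpy as np
from scipy.optimize import brentq


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def il_hrf(t, alphas, Ts, Ds):
    """Superposition of three inverse logit functions."""
    return sum(a * inv_logit((t - T) / D) for a, T, D in zip(alphas, Ts, Ds))


def il_hrf_deriv(t, alphas, Ts, Ds):
    """Analytic derivative using L'(x) = L(x)(1 - L(x)) and the chain rule."""
    total = 0.0
    for a, T, D in zip(alphas, Ts, Ds):
        L = inv_logit((t - T) / D)
        total = total + (a / D) * L * (1.0 - L)
    return total


params = ((1.0, -1.3, 0.3), (3.0, 8.0, 14.0), (1.0, 1.5, 2.0))  # hypothetical

# Find the first + to - sign change of h'(t) on a grid, then refine it with a root finder.
grid = np.linspace(0.0, 30.0, 601)
dh = il_hrf_deriv(grid, *params)
i = np.flatnonzero((dh[:-1] > 0) & (dh[1:] <= 0))[0]
T_peak = brentq(il_hrf_deriv, grid[i], grid[i + 1], args=params)
H_peak = il_hrf(T_peak, *params)
```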

### Simulation Study

The simulations are based on actual HRFs obtained from a visual-motor task in 10 participants (spiral gradient echo imaging at 3T, 0.5 s TR) [Noll et al., 1995]. Seven oblique slices were collected through visual and motor cortex at high temporal resolution, 3.12 × 3.12 × 5 mm voxels, TR = 0.5 s, TE = 25 ms, flip angle = 90°, FOV = 20 cm. Participants viewed contrast-reversing checkerboards (16 Hz, 250 ms stimulation, full-field to 30° of visual angle) and made manual button-press responses upon detection of each stimulus. “Events” consisted of 1, 2, 5, 6, 10, or 11 such stimuli spaced 1 s apart, followed by 30 s of rest (open-eye fixation). For the simulation study, we used the 5-stimulus events only; 16 such events were presented to each participant. BOLD activity time-locked to event onset, averaged across a region in the left primary visual cortex defined in a separate localizer scan for each individual, served as the true HRFs in our simulation. Thus, we obtained 10 empirical HRFs, one for each participant. These data have been used previously to describe nonlinearities in BOLD data [Wager et al., 2005].

We began by constructing stimulus functions for 6-min runs of randomly intermixed event types (A, B), occurring at random intervals of length 2–18 seconds. Assuming a linear time-invariant system, the stimulus functions were convolved with the empirically derived HRFs, and AR(1) noise was added to the resulting time course.
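A sketch of how one simulated run might be generated follows. Several details are assumptions chosen for illustration, because the section does not report them:

- intervals drawn from a uniform distribution;
- A/B labels assigned at random with equal probability;
- the AR(1) coefficient `rho`;
- the noise standard deviation.

The HRFs are passed in as arrays sampled at the 0.5-s TR.

```python
import numpy as np
from scipy.signal import lfilter


def simulate_run(hrf_a, hrf_b, rng, tr=0.5, run_s=360.0, isi=(2.0, 18.0),
                 rho=0.3, noise_sd=1.0):
    """Return (y, stim_a, stim_b) for one run of randomly intermixed A/B events."""
    n = int(round(run_s / tr))
    stim_a, stim_b = np.zeros(n), np.zeros(n)
    t = rng.uniform(*isi)
    while t < run_s:
        idx = int(round(t / tr))
        if idx < n:
            target = stim_a if rng.random() < 0.5 else stim_b
            target[idx] = 1.0
        t += rng.uniform(*isi)
    # Linear time-invariant model: each stimulus train convolved with its own HRF.
    signal = np.convolve(stim_a, hrf_a)[:n] + np.convolve(stim_b, hrf_b)[:n]
    # AR(1) noise: e[k] = rho * e[k-1] + w[k], with white Gaussian w.
    noise = lfilter([1.0], [1.0, -rho], rng.normal(0.0, noise_sd, size=n))
    return signal + noise, stim_a, stim_b


hrf = np.exp(-0.5 * ((np.arange(40) * 0.5 - 5.0) / 1.5) ** 2)  # hypothetical HRF shape
y, stim_a, stim_b = simulate_run(hrf, 0.5 * hrf, np.random.default_rng(1))
```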

The HRFs for A and B were modified prior to creating the time course in order to create three kinds of “true” effects: an A – B amplitude difference, a time-to-peak difference, and a duration difference. In total we ran three types of simulations:

- S1. (Height modulation) The HRF corresponding to event B has half of the amplitude of the HRF corresponding to event A. In this scenario there is a true A – B difference in H of 0.5, but no time-to-peak or duration difference.
- S2. (Delay modulation) The HRF corresponding to event B has a 3-s onset delay compared to HRF A. In this scenario there is a 3-s difference in T between the HRFs, but no amplitude or duration difference.
- S3. (Duration modulation) The width of HRF B is increased by 4 s compared to HRF A by extending the time at peak by eight time points (0.5 s TR). In this scenario there is a 4-s difference in W between the HRFs, but no amplitude or time-to-peak difference.
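The three manipulations can be sketched as operations on an HRF sampled at the 0.5-s TR. Two details are assumptions. The 3-s delay is implemented by prepending six zero samples and dropping the last six. The 4-s plateau is implemented by repeating the peak sample eight more times, which lengthens the array by eight samples.

```python
import numpy as np

TR = 0.5  # seconds


def height_mod(hrf_a):
    """S1: HRF B has half the amplitude of HRF A."""
    return 0.5 * np.asarray(hrf_a, dtype=float)


def delay_mod(hrf_a, delay_s=3.0, tr=TR):
    """S2: HRF B is HRF A shifted later by delay_s seconds."""
    k = int(round(delay_s / tr))
    hrf_a = np.asarray(hrf_a, dtype=float)
    return np.concatenate([np.zeros(k), hrf_a[:-k]])


def width_mod(hrf_a, extra_s=4.0, tr=TR):
    """S3: hold HRF A at its peak value for an extra extra_s seconds."""
    k = int(round(extra_s / tr))
    hrf_a = np.asarray(hrf_a, dtype=float)
    p = int(np.argmax(hrf_a))
    return np.concatenate([hrf_a[: p + 1], np.full(k, hrf_a[p]), hrf_a[p + 1:]])


hrf_a = np.exp(-0.5 * ((np.arange(60) * TR - 5.0) / 1.5) ** 2)  # hypothetical HRF A
```

Holding the response at its peak leaves the first time-to-peak unchanged, because the peak sample stays at the same index. The manipulation therefore isolates W, as intended in S3.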

Each of these three simulations was performed using the HRF for each of the 10 participants without modifications for HRF A, and modified as above for HRF B. For each participant the simulation was repeated 1,000 times using different simulated AR(1) noise in each repetition.

We were interested in the efficiency and bias of A – B differences for individuals and in the group analysis treating participant as a random effect. For each participant in each simulation we estimated A – B differences in H, T, and W. We quantified the relative statistical power of each type of model to recover these “true” effects. We also quantified the confusability of true differences in one effect (e.g., the manipulation of T in S2) with apparent differences (bias) in another (e.g., the estimated W in S2). This was accomplished by examining the relative statistical power across model types for detecting these “crossed” effects, whose magnitude—if H, T, and W estimates are independent—should be 0, as well as calculating how the true change in one parameter induced changes in the bias of the other nonmodulated parameters.
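One way to compute these summaries is sketched below. It assumes power is the fraction of simulation repetitions in which a group-level one-sample t-test on the per-participant A – B estimates is significant. Bias is taken as the mean estimate minus the true difference. For a nonmodulated (“crossed”) parameter the true difference is 0, so the same function returns the false-positive rate. The array layout and α = 0.05 are assumptions.

```python
import numpy as np
from scipy import stats


def power_and_bias(diffs, true_diff, alpha=0.05):
    """diffs: array of shape (n_reps, n_subjects) holding estimated A - B differences."""
    diffs = np.asarray(diffs, dtype=float)
    # Random-effects test: one t-test across participants per repetition.
    pvals = stats.ttest_1samp(diffs, 0.0, axis=1).pvalue
    power = np.mean(pvals < alpha)
    bias = diffs.mean() - true_diff
    return power, bias


rng = np.random.default_rng(0)
# Hypothetical estimates: 1,000 repetitions x 10 participants.
modulated = 0.5 + rng.normal(0.0, 0.05, size=(1000, 10))
crossed = rng.normal(0.0, 0.05, size=(1000, 10))
power_h, bias_h = power_and_bias(modulated, 0.5)
power_t, bias_t = power_and_bias(crossed, 0.0)
```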

### Application to Voxel-wise Time Courses

Using data from the same experiment described in the previous section, we extracted time courses from individual voxels in the visual cortex of each of the 10 subjects. To each voxel-wise time course we applied the five fitting procedures used here and estimated H, T, and W for each.

### Relationships between Neural Activity and Activation Parameters

Relating neural activity to model parameters is complex, and ultimately places constraints on the interpretation of the parameter estimates. Here, we conduct a preliminary exploration of the conditions under which changes in neuronal activation parameters may lead to specific changes in corresponding HR parameters. We stress that our analysis here is necessarily greatly simplified; however, it may provide some rules of thumb for the range of conditions under which H, T, and W might roughly correspond to changes in neuronal activity magnitude, onset delay, and duration. For the purposes of this illustration, we assume that changes in neural firing rates (or postsynaptic activity) during brief periods of cognitive activity constitute neural “events”—for example, an “event” may consist of a brief memory refreshing operation that increases neural activity briefly and recurs with some frequency.

The theoretical relationships between neural events (event magnitude, event train onset, and event train duration) and fMRI signal (H, T, and W) vary depending on the duration of event trains and nonlinear properties of the response. We consider these relationships assuming linear responses and, separately, assuming nonlinear magnitude saturation effects using estimates from previous work [Wager et al., 2005]. To construct predicted HRs for a response that saturates nonlinearly over time, we performed the modified convolution procedure described in Wager et al. [2005]. We applied it to event trains that varied in event magnitude, onset, and duration. We varied the length of epochs from brief, 1-s events to 18-s stimulation epochs. We then examined whether true differences between two conditions A and B yield estimated differences only in the parameters varied or in others as well.
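For intuition, the linear half of this comparison can be sketched directly. Trains of brief 1-Hz events with a given magnitude, onset, and duration are convolved with an SPM-style canonical HRF, and H and T are read off the prediction. This omits the nonlinear saturation model of Wager et al. [2005], so it shows only the linear baseline. The grid spacing and parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

DT = 0.1       # seconds; hypothetical fine time grid
LENGTH = 60.0  # seconds of simulated time


def canonical_hrf(t):
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0


def event_train(magnitude, onset, duration, dt=DT, length=LENGTH):
    """One brief event per second, starting at `onset`, for `duration` seconds."""
    s = np.zeros(int(round(length / dt)))
    for k in range(int(round(duration))):
        s[int(round((onset + k) / dt))] = magnitude
    return s


def predicted_response(stim, dt=DT):
    """Linear prediction: event train convolved with the canonical HRF."""
    h = canonical_hrf(np.arange(0.0, 32.0, dt))
    return np.convolve(stim, h)[: len(stim)]


def height_and_time(resp, dt=DT):
    i = int(np.argmax(resp))
    return resp[i], i * dt


H1, T1 = height_and_time(predicted_response(event_train(1.0, 0.0, 1)))  # baseline
H2, T2 = height_and_time(predicted_response(event_train(2.0, 0.0, 1)))  # doubled magnitude
H3, T3 = height_and_time(predicted_response(event_train(1.0, 0.0, 2)))  # doubled duration
```

In this linear sketch, doubling the magnitude doubles H without moving T. Lengthening a brief train, by contrast, raises both H and T, so a height or latency difference alone cannot distinguish the two.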

## RESULTS

### Organization of Results

In three simulations we varied the true difference in H (S1), T (S2), and W (S3) between two versions of the same empirical HRFs (HRF A and HRF B). In Figures 4–6 the results are shown for each of the three simulations. In the top row the true effects are shown by horizontal lines, and means and error bars for each of the 10 “participants,” each with a unique empirically derived HRF, are shown by the vertical lines. In the bottom panels the between-subjects (“random effects”) means and standard errors are shown. These can be used to assess the significance of the modulated HRF A – HRF B effect in each simulation, as well as biases in estimates of nonmodulated parameters. Figure 7A summarizes these results in bias vs. variance plots for the H, T, and W effects for each simulation type. Figure 7B (which we denote as confusability plots) shows a scatterplot of the change in bias for the two nonmodulated parameters for each simulation type. Tables I-III show the average magnitude (M), latency (L), and width (W) over the “participants” and repetitions for each of the five models and event types, and can be used to assess the accuracy of each fit. For comparison purposes, the true values imposed by the manipulations are also shown on the bottom row. Finally, Table IV provides an overall summary of statistical power for estimating both modulated and nonmodulated (crosstalk) effects across all the simulations.

*Figure 7.* **A:** Bias vs. variance plots for the estimated A – B difference. Each row represents a simulation (S1–S3) and each column represents an estimated parameter (H, T, and W). **B:** Scatterplots of the change in bias for the two nonmodulated parameters, induced …

For each simulation type (S1–S3) we discuss the bias present in the estimates of H, T, and W for both event types (A and B), using each of the five models. Figure 8 shows typical fits for each event, model, and simulation type and gives an indication of the apparent biases present in the estimates. We also discuss the accuracy of each model in estimating A – B effects, the confusability of modulated effects with those that are not modulated, and the power of each method to detect true effects. The results for each simulation type are described below.

### Simulation 1: Modulation of Height

The results of Simulation 1 are summarized in Table I and Figures 4, 7, and 8. Truth was an A – B H difference of 0.5, with no modulation of T or W. Table I shows the average estimates of the parameters H, T and W for each event type and each model. The means and error bars for each of the 10 “participants” are shown by the vertical lines in the top panel of Figure 4. In the bottom panel the between-subjects means and standard errors are shown, as would be most relevant for a group analysis. These results are summarized in the bias vs. variance plot appearing in the first row of Figure 7A. The first column of Figure 7B shows the change in bias in the estimated T and W effect that is induced by the change in height. Finally, the first column of Figure 8 shows a typical fit for each model, selected to be representative of the thousands of model fits performed.

When the height of HRF B is modulated, the IL model gives a good overall fit for each event type, although T is slightly underestimated (Table I). Figure 4 shows that the IL model produces accurate estimates of the A – B height difference, and Table IV shows that it is second in statistical power only to the smooth FIR model. The IL model also produces the least bias in both T and W (bias is undesirable) of any of the models (Figures 4 and 7; Table IV). There is almost no crosstalk present, as both the A – B latency and width effects are nonsignificant. This can also be seen in the first column of Figure 7B, where the point corresponding to the IL model lies extremely close to the origin.

The NL model effectively estimates the A – B height difference. However, this model has the least statistical power of all included models. Table I also shows that both H and W are underestimated for both HRF A and B. Moreover, as Figures 4 and 7 show, amplitude modulation induces bias in estimates of T (HRF B is estimated to peak later).

The TD model gives perhaps the best overall estimates of A – B effects, although it is not the most powerful. Table I shows that in the individual fits for HRF A and B, the estimated parameter values for H and T are consistently close to the true values. However, W is underestimated for both event types. Table IV shows that the TD model, together with the IL model, has the lowest parameter confusability of all the models: T and W estimates are relatively unaffected by modulation of H, and are not statistically significant. Each of the other three models has some degree of confusability with T and W.

For the FIR model there is a surprisingly strong bias present in the estimate of both T and W, although the bias in T induced by the amplitude change is a fraction of the power to detect changes in H. The bias arises solely from the estimate of HRF B. The model parameters indicate that this method gives rise to an estimated HRF that is taller and has a shorter width and a later peak than the true curve. The estimate of HRF A, on the other hand, is extremely accurate, and this model is the most statistically powerful at detecting the A – B height difference.

Finally, the estimate of height for the Gam model is biased for both HRF A and B, but the estimate of A – B is accurate. The bias arises because the true width of the underlying HRFs is shorter than the width of the canonical fitted function, which causes the estimate of H to be too low. Note that the blue bars in Figure 4 indicate that no estimate is available for T and W using the Gam model, because both the width and latency are fixed when using a canonical HRF.

### Simulation 2: Modulation of Hemodynamic Delay

Simulation 2 involved a true 3-s difference in T, and no modulation of H and W. The results are summarized in Table II and Figures 5, 7, and 8. Table II shows the average estimates over the 1,000 repetitions for each event type and model. The results for each of the 10 individual “participants” are shown in Figure 5, while the second row of Figure 7A and the second column of Figure 7B show the bias vs. variance and confusability plots, respectively. Finally, the second column of Figure 8 shows a typical fit for each model, selected to be representative of all the model fits performed.


For true changes in T we obtain a good fit with the IL model, with no significant crosstalk present (Figs. 5 and 7; Table IV). The NL model gives a rather accurate estimate for the difference in time-to-peak, but H and W estimates for HRF B are severely corrupted by the delay. Thus, the delayed HRF B has a substantially smaller estimated magnitude, and modulation of T also induces A – B differences in both the estimates of H and W (Fig. 7; Table IV).

For the TD model, both H and T of HRF B are underestimated. The shift is too large for this model to handle, as it can only accommodate shifts of ~1 s. Modulation of T induces A – B differences in the estimates of both H and W (Fig. 7; Table IV).

The FIR model, on the other hand, gives a good overall fit for both event types with the width being slightly underestimated. The estimates of the A – B differences are extremely accurate, with little to no confusability present. In addition, it is the most statistically powerful at detecting the A – B latency difference.

As expected, the Gam model is unable to handle shifts in T, and a strong bias is induced in H. In addition, since the latency and width are fixed, we have no estimate of these components. These results are not surprising, as this is a highly constrained model that is only effective if the true shape is consistent with the model. It is therefore unable to appropriately model shifts in onset or prolonged duration in the underlying signal.

### Simulation 3: Modulation of Response Width

Finally, Simulation 3 involved a 4-s extension of W for condition B, and no modulation of H or T. The results are summarized in Table III and Figures 6–8. Table III shows the average estimates over the 1,000 repetitions for each event type and model, while the results for each of the 10 “participants” are shown in Figure 6. The third row of Figure 7A shows bias vs. variance plots and the third column of Figure 7B shows confusability plots. The last column of Figure 8 shows a typical fit for each model, selected to be representative of the thousands of model fits performed.

When the width of HRF B is extended, the IL model produces differences in estimated W (desirable) and T (undesirable). Figure 6 shows that the IL model provides the most accurate estimates of W. Its power to detect differences in W is second to the smooth FIR model, but it is substantially greater than that of the other models. The IL model also shows the least bias in estimates of H and T. Table III shows that in the individual fits for HRF A and B, the estimated parameter values are consistently very close to the true values.

With true differences in W, the amplitude estimate of HRF B using the NL model is consistently underestimated, leading to a bias in H for A – B. Estimated differences in T are also created, and these are actually more reliable than estimates of W (Table IV). Since the shape of the gamma density is fixed in this model, the shape can be scaled but not stretched. Hence, the increased width pulls the function away from its true position during the rise, thus delaying the time-to-peak and shortening the width. Thus, true differences in some measures (H, T, and W) are highly confusable, as they induce estimated differences in multiple measures.

For TD the magnitude estimate of HRF B is consistently overestimated. The estimate of T is clouded by the estimate of width (T is overestimated, W is underestimated). The added width pulls the function away from its true position during the rise, thus delaying the time-to-peak and thereby shortening the width. The model has difficulty detecting the true A – B effect in W. In fact, it produces spurious differences in both H and T that are more reliable than its estimates of W.

The FIR model fits the general shape of both event types well, except that it has difficulty modeling the plateau present in event type B. The plateau has a length of 4 s and the time-to-peak is estimated uniformly over the plateau, giving a mean T estimate that overestimates T by ~2 s. Estimated differences in T were more reliable than estimates of W.

Lastly, as expected, strong bias exists for the Gam model, as this model is unable to handle prolonged duration in the underlying signal.

### Application to Voxel-wise Time Courses

We applied the five fitting methods to time courses obtained from individual voxels contained in the visual cortex. Figure 9 shows the results from one representative subject, whose data consisted of 89 separate voxels. Panels B and C show representative fits from an individual voxel and panel A illustrates the consistency of the estimators over the 89 voxels. Consistency is important, as we expect brain responses in these prelocalized regions of the visual cortex to be relatively homogeneous across voxels (which average over ocular dominance columns and other functional features), and so it is likely that much of the variability across voxels in some of the fits is due to error. The results show that the IL model gives the most consistent estimates across the 89 voxels for each of H, T, and W.

### Relationships between Neural Activity and Activation Parameters

Figure 10A shows a train of brief stimulus events (vertical lines) occurring every 1 s for 18 s, which are intended to serve as a simplified model of neural activity, and the HR shape that is predicted from the (nonlinear) results in Wager et al. [2005]. Different task states may change the *magnitude* of neural activity during events, the *onset latency* for the event train, and/or the *duration* of the event train. If the “true” HR delay predicted by our model varies as a function of changes in true neural magnitude (and so on for other parameter combinations), then the HRF will be of limited usefulness, because it cannot provide information about the type of neuronal change that occurred.

*Figure 10.* **A:** A train of events (18 s; one burst of simulated neural activity per second) and the predicted activation …

We first deal with the interpretability of H estimates. For brief events, increases in H were caused by either true increases in magnitude or increases in duration. This is because increases in the duration of brief events (Fig. 10D,G) tended to translate into changes in HR height. Changes in H for the three types of simulated neuronal effects (increases in magnitude, onset latency, and duration) are shown by the solid lines in Figure 10E–G. Conversely, true increases in magnitude did not evoke changes in T or W (Fig. 10B,E).

Figure 10B shows HRFs for conditions A and B (solid and dashed lines, respectively) at short and long epoch durations. Figure 10E shows epoch duration on the x-axis and parameter differences (A – B) on the y-axis; an ideal, unbiased response would be a flat line at 0.5 for H (solid line) and flat lines at zero for T and W (dashed and dotted lines, respectively). That is, magnitude increases produced expected increases in H for brief events, although observing H cannot tell us about whether the magnitude or duration of neuronal activity was different across conditions. For longer epochs, magnitude increases produced increases only in H; the confusability between true duration and apparent height fell to zero after about 8 s. Thus, the HRF height for brief events is not uniquely interpretable, but the HRF height for longer epochs is. ^{*}

We next turn to the interpretability of estimates of T. For brief events, changes in T could be caused by true changes in onset (Fig. 10C,F) or by changes in duration (Fig. 10D,G). This is because duration increases also increased the peak latency. For longer epochs, T changes could be caused by true changes in onset or changes in height (Fig. 10B,E). This is because height increases disproportionately affect the early part of the HR (a nonlinear effect not observed with the linear canonical HRF), shifting T earlier for intense stimuli. Thus, T changes are not uniquely interpretable in terms of neuronal latency.

For short epochs, changes in W were not reliably evoked by any manipulation. True changes in duration produced the expected changes in W, but at much reduced levels (Fig. 10D,G). Changes in W for all types of simulated neuronal effects are shown by the dotted lines in Figure 10E–G. For long epochs, changes in W were produced only by changes in duration, and these appeared to reach their asymptotic true values with a 10-s stimulation epoch (that is, 10 s for condition A and 13 s for condition B in our simulations). Thus, changes in W may be interpreted as changes in neuronal response duration.

## DISCUSSION

To date most fMRI studies have been primarily focused on estimating the magnitude of evoked HRFs across different tasks. However, there is a growing interest in testing other statistics as well, such as the time-to-peak and duration of activation [Bellgowan et al., 2003; Formisano and Goebel, 2003; Richter et al., 2000]. The onset and peak latencies of the HRF can, for instance, provide information about the timing of activation for various brain areas and the width of the HRF provides information about the duration of activation. However, the independence of these parameter estimates has not been properly assessed, as it appears that even if basis functions are independent (or a nonlinear fitting procedure provides nominally independent estimates), the parameter estimates from real data may not be independent.

The present study seeks to both bridge this gap in the literature and present a new estimation method based on the use of inverse logit functions. To assess independence, we determine the amount of confusability between estimates of height (H), time-to-peak (T), and full-width at half-maximum (W) and actual manipulations in the amplitude, time-to-peak and duration of the stimulus. This was investigated using a simulation study that was based on empirical HRFs and illustrated how a variety of popular methods work on actual fMRI data. It is important to note that this is not an exhaustive survey of HRF fitting methods, and some very promising linear methods are not addressed in our simulations [e.g., Henson et al., 2002; Liao et al., 2002]. In addition, Ciuciu et al. [2003] introduced an unsupervised FIR model which estimates its parameters using an EM-type algorithm. This promising approach may potentially improve on the fit of the smoothed (supervised) FIR used here and decrease the amount of confusability present in that model.

In this work we identified the interpretability of parameter estimates and statistical power to detect true effects as two important criteria for a modeling system. Our results show that with any of the models we tested there is some degree of confusability between true differences and estimates. With some models the confusability is profound. For example, delaying the onset of activation by 3 s produced highly reliable changes in estimated response magnitude in most models tested. Even models that attempt to account for delay such as a gamma function with nonlinear fitting [Miezin et al., 2000] or temporal and dispersion derivatives [Calhoun et al., 2004; Friston et al., 1998] showed strong biases. As might be expected, the derivative models and related methods [e.g., Henson et al., 2002; Liao et al., 2002] may be quite accurate for very short shifts in latency (<1 s) but become progressively more inaccurate as the shift increases. The IL model and the smooth FIR model did not show large biases, and the IL model showed by far the least amount of confusability of all the models examined.

The strongest biases were found for all models when the response width was manipulated by extending the HRF at its peak by 4 s. No model was bias-free, but the IL model showed no bias in H and only a slight bias in T (Table IV). This feature may be useful in comparing task conditions that involve processes extended in time over a number of seconds. Examples include working memory and expectation/anticipation paradigms, and tasks with long separation between phases of trials (e.g., cue–target). By contrast, the FIR model sacrifices some interpretability, particularly in dealing with prolonged stimulation periods, for the benefit of power. It may be an excellent choice for modeling shorter-duration events, whereas the IL model may fare better with longer and more variable epochs. In fact, the ability to model both events and extended epochs is a design feature that motivated our development of the IL model.

Notably, the smooth FIR model had the highest power for estimating true effects of all the models (Table IV). The canonical HRF did not fare well because the empirical HRFs on which our study was based tended to peak earlier than the canonical HRF, and because individual differences in the shape and timing of activity were translated into differences in H. The IL and smooth FIR models can account for individual differences in timing and delay without affecting H, which increases power in H estimation. The nonlinear gamma and derivative-based models have a limited ability to do this, and power is lower on average across H and T estimates. Interestingly, the derivative model has high power for estimating H but not T, and vice versa for the nonlinear gamma model. The IL and smooth FIR models are both consistently high in power and less biased than either of the other methods, with the FIR model having higher power, but increased bias compared to the IL model. As for the individual model fits, both the FIR and IL models are able to accurately fit HRF A (Tables I-III). However, the IL model is far more effective at modeling HRF B in all three simulation types, and thereby gives rise to less crosstalk than the FIR model.

### Relationships between Neural Activity and Activation Parameters

As mentioned in the Introduction, problems with parameter interpretability can come from two major sources. This article addresses the simpler issue of whether differences in evoked HRF shape can be accurately captured by a variety of linear models. The best models (IL and smooth FIR) were able to accurately capture changes in HRs with high sensitivity and specificity; that is, changes in one estimate were seldom confused for another. Ultimately, researchers may want to interpret parameter changes in terms of underlying neuronal activity. This is a much more complex problem that involves building physiological models of the sources of BOLD signal [Buxton et al., 1998; Logothetis, 2003; Mechelli et al., 2001; Vazquez and Noll, 1998; Wager et al., 2005].

Based on preliminary analysis using a simple nonlinear model [Wager et al., 2005], it appears that estimated latency differences are not uniquely attributable to neuronal onset delays, but could be caused by true differences in firing rate, delay, or duration. Estimated width differences may generally be attributable to increases in the duration of neuronal activity. For brief events, estimated height differences could be caused by either duration increases or activity magnitude increases. For longer epochs (>8 s) estimated height differences are caused only by increases in firing rate. These results do not render models of the HRF useless; finding differences in HRF time-to-peak among conditions would constitute scientific evidence that may correspond with behavioral performance or distinguish the responses of one brain region from another. In addition, finding a significant difference in T but no difference in W (for brief events) or no difference in H (for long events) may be sufficient evidence to make a claim about differences in neuronal onset latency. Other combinations of significant results may be similarly interpretable, depending on the specifics of the study.

However, this simulation has many limitations. First, it does not attempt to model physiological parameters. Second, the nonlinearity estimates used do not take into account differences in stimulation density: in these simulations, all models use trains of brief stimuli repeated at 1-s intervals, consistent with the density used in the experiments from which the nonlinearity estimates were derived [Wager et al., 2005]. In addition, the nonlinear model here provides only a rough characterization of nonlinearities, which may vary with both brain region and task. Thus, these results are suggestive but cannot provide definitive guidelines on the complex issue of how evoked HRF shapes relate to underlying neuronal activity.

### Choosing a Hemodynamic Response Model

When determining which HRF model to use, the first question one faces is how strong the a priori assumptions should be. Models with few assumptions and many free parameters have the flexibility to fit a large variety of shapes and can handle unexpected behavior in the underlying response. However, as the number of parameters in the model increases, the residual degrees of freedom available for statistical tests of those parameters decrease. In addition, it is much simpler and more statistically powerful to test contrasts across event types (e.g., A – B) on a single parameter, such as height, than to test for differences in multiple parameters (e.g., A1A2A3 – B1B2B3). An ANOVA F-test can test for differences across multiple parameters, but its statistical power decreases sharply as the number of parameters included in the test grows, and a significant result still leaves open the problem of interpreting which parameters carry the difference.
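The power loss can be illustrated with the noncentral F distribution. This is a minimal sketch, assuming that the total effect (the noncentrality parameter) stays fixed as more parameters enter the test and a hypothetical design of 200 scans with two conditions; the helper name and all numbers are illustrative, not taken from the study.

```python
from scipy import stats


def f_test_power(noncentrality, df_num, df_den, alpha=0.05):
    """Power of an F-test for a given noncentrality parameter.

    The critical value comes from the central F distribution; power is the
    probability that a noncentral F variate exceeds that critical value.
    """
    f_crit = stats.f.ppf(1.0 - alpha, df_num, df_den)
    return stats.ncf.sf(f_crit, df_num, df_den, noncentrality)


# Hypothetical setup: the same total effect (noncentrality 10) is tested with
# 1, 3, 6 or 12 parameters per condition. Two conditions plus an intercept are
# modeled in 200 scans, so each extra parameter also costs residual df.
N_SCANS = 200
NONCENTRALITY = 10.0

powers = {}
for n_params in (1, 3, 6, 12):
    df_num = n_params                    # parameters in the A - B contrast
    df_den = N_SCANS - 2 * n_params - 1  # residual degrees of freedom
    powers[n_params] = f_test_power(NONCENTRALITY, df_num, df_den)

for n_params, power in powers.items():
    print(f"{n_params:2d} parameters per condition: power = {power:.3f}")
```

With the signal held fixed, spreading the test over more parameters lowers power at every step; in practice extra parameters may also pick up some additional signal, which this sketch deliberately ignores.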

Critically, free parameters in most flexible basis sets are not directly interpretable (e.g., as the response magnitude or latency). Consider, for example, the TD model (the canonical HRF with its temporal and dispersion derivatives). Let A1 and B1 denote the parameter estimates for the canonical HRF for conditions A and B, A2 and B2 those for the temporal derivative, and A3 and B3 those for the dispersion derivative. One cannot simply fit the basis set, compute the contrast A1 – B1 while ignoring the other parameters, and interpret the result as the difference in magnitude between A and B. The amplitude of the fitted response depends on a combination of all three parameters, so each one is interpretable only in the context of the others.
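A minimal illustration of this point, assuming an SPM-style double-gamma shape (gamma densities with shapes 6 and 16 and an undershoot ratio of 1/6) and, for brevity, only the temporal derivative; the dispersion derivative is omitted. The simulated response has exactly the canonical height (1.0) but is delayed by 2 s. The weight on the canonical regressor falls noticeably below 1, while the peak of the combined fit is a different number, so the canonical weight alone cannot be read as the response magnitude.

```python
import numpy as np
from scipy.stats import gamma

DT = 0.1
t = np.arange(0.0, 32.0, DT)


def double_gamma(times):
    """SPM-style double-gamma shape (assumed parameters: shapes 6 and 16,
    undershoot ratio 1/6). gamma.pdf returns 0 for negative times."""
    return gamma.pdf(times, 6) - gamma.pdf(times, 16) / 6.0


scale = double_gamma(t).max()
h = double_gamma(t) / scale          # canonical regressor with unit peak
dh = np.gradient(h, DT)              # temporal-derivative regressor
X = np.column_stack([h, dh])

# Hypothetical true response: same height as the canonical shape, 2 s later.
delay = 2.0
y = double_gamma(t - delay) / scale

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

print(f"canonical weight (A1-style estimate): {beta[0]:.3f}")
print(f"temporal-derivative weight:           {beta[1]:.3f}")
print(f"peak of the fitted response:          {fitted.max():.3f}")
```

Adding the dispersion derivative would not remove the problem; it adds a third weight that the fitted amplitude also depends on.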

This suggests that using a single canonical HRF may perhaps be the best choice. If the actual shape of the HRF matches the model perfectly and is invariant across the brain, a single canonical HRF does offer optimal power. However, it is reasonable to assume that the shape of the HRF varies as a function of both task and brain region; any fixed model will therefore be wrong in much of the brain, and wrong to different degrees across individuals. If the model is misspecified, statistical power decreases and the model may also produce invalid and biased results, as our study showed. As is well known in statistics, the fact that a linear model explains a significant amount of the variance in the data is no guarantee that the model is correct. For example, imagine an experiment with trials spaced 15 s apart. A canonical HRF such as the one used in SPM, consisting of a positive-going gamma function peaking at 6 s and a negative-going gamma function peaking at 16 s, is used to model the response at the onset of each trial. Now imagine that a particular brain region shows activity increases not in response to the trial onset, but during the intertrial interval, in preparation for the predictable onset of the next trial. Such a region would be likely to show a *negative* activation, leading the researchers to erroneously infer that the region was deactivated by the task, when in fact it is activated in anticipation of the task. Such potential problems require checking model assumptions, including that the model is correctly specified, which is difficult in brain imaging because of the massive number of tests involved (although methods have been developed [Luo and Nichols, 2003]).
Finally, a canonical HRF provides no way to assess latency or duration, and differences in response latency between conditions will be confused with differences in amplitude [Calhoun et al., 2004].
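A minimal sketch of this latency-amplitude confusion, assuming the same SPM-style double-gamma shape (gamma densities with shapes 6 and 16, undershoot ratio 1/6): responses of identical height but increasing onset delay are fit with the canonical regressor alone. The delays are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

DT = 0.1
t = np.arange(0.0, 32.0, DT)


def double_gamma(times):
    # SPM-style double gamma (assumed parameters); zero for negative times.
    return gamma.pdf(times, 6) - gamma.pdf(times, 16) / 6.0


scale = double_gamma(t).max()
h = double_gamma(t) / scale  # canonical regressor with unit peak

delays = [0.0, 1.0, 2.0, 3.0]
amplitudes = []
for delay in delays:
    # Hypothetical response with the SAME height as h but a later onset.
    y = double_gamma(t - delay) / scale
    # Single-regressor least squares: beta = <h, y> / <h, h>.
    amplitudes.append(float(h @ y / (h @ h)))

for delay, amp in zip(delays, amplitudes):
    print(f"delay {delay:.0f} s -> estimated amplitude {amp:.3f}")
```

Only the zero-delay response recovers the true amplitude of 1; every later response is reported as smaller, even though all four have the same height.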

In this work we introduced a new HRF modeling technique, based on the superposition of three inverse logit functions, which attempts to balance flexibility and ease of interpretation. Our study demonstrated the efficiency of the fitting procedure relative to four other commonly used models. In particular, the IL model was by far the most effective at modeling the combination of HRF types A and B in each of the three types of simulations, and it therefore gave rise to significantly less crosstalk than the other models. The major drawback of our method is that it is relatively time-consuming, because it relies on a nonlinear fitting procedure. The ultimate speed of the IL model will depend on whether deterministic (e.g., Gauss-Newton or Levenberg-Marquardt) or stochastic (e.g., simulated annealing) algorithms are used. The deterministic algorithms take on the order of 5 times longer than the FIR model, while the simulated annealing algorithm roughly doubles that time. As an alternative to nonlinear least-squares fitting, one could instead use a priori knowledge to specify every parameter in the model except the three amplitude parameters, and use the three resulting inverse logit functions as temporal basis functions in the GLM framework. Alternatively, one could follow the methodology outlined in Woolrich et al. [2004] and generate a large number of plausible HRF shapes by randomly sampling parameter values from an appropriate range. Singular value decomposition can then be used to find the optimal basis set (in the least-squares sense) spanning the space of generated functions, and this set can serve as the temporal basis functions.
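The sampling-plus-SVD alternative can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the IL form δ_i L((t − T_i)/D_i) with L(x) = 1/(1 + e^{−x}), the uniform sampling ranges, the choice δ3 = −(δ1 + δ2) to return the response to baseline, and the helper names `il_hrf` and `sample_il_shapes` are all hypothetical.

```python
import numpy as np


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def il_hrf(t, T, D, delta):
    """Sum of three inverse logit functions, delta_i * L((t - T_i) / D_i)."""
    return sum(d * inv_logit((t - Ti) / Di) for Ti, Di, d in zip(T, D, delta))


def sample_il_shapes(t, n_shapes, rng):
    """Draw IL-shaped HRFs with parameters from hypothetical uniform ranges."""
    shapes = np.empty((n_shapes, t.size))
    for i in range(n_shapes):
        T1 = rng.uniform(2.0, 5.0)
        T2 = T1 + rng.uniform(3.0, 7.0)
        T3 = T2 + rng.uniform(3.0, 7.0)
        D = rng.uniform(0.3, 1.0, size=3)
        d1 = 1.0
        d2 = -rng.uniform(1.0, 1.5)  # decrease that overshoots into an undershoot
        d3 = -(d1 + d2)              # return to baseline: d1 + d2 + d3 = 0
        shapes[i] = il_hrf(t, (T1, T2, T3), D, (d1, d2, d3))
    return shapes


rng = np.random.default_rng(0)
t = np.arange(0.0, 30.0, 0.5)
shapes = sample_il_shapes(t, 1000, rng)

# Rows of Vt are orthonormal temporal patterns ordered by how much of the
# sampled shapes' energy they capture; keep the first few as a basis set.
_, s, Vt = np.linalg.svd(shapes, full_matrices=False)
n_basis = 3
basis = Vt[:n_basis]
explained = s**2 / np.sum(s**2)
print("energy captured per basis function:", np.round(explained[:n_basis], 3))

# Use the basis as ordinary GLM regressors for a new noisy IL-shaped response.
true = il_hrf(t, (4.0, 10.0, 16.0), (0.5, 0.6, 0.8), (1.0, -1.2, 0.2))
y = true + rng.normal(0.0, 0.1, size=t.size)
X = basis.T
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
```

In a real analysis each basis function would be convolved with the stimulus onsets before entering the design matrix; the sketch fits the basis directly to a single response to stay short.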

## CONCLUSIONS

In this work we introduce a new technique for modeling the HRF, based on the superposition of three inverse logit functions (IL), which balances the need for interpretability and flexibility of the model. In simulations based on actual HRFs, measured on a group of 10 participants, we compare the performance of this model to four other popular choices of basis functions. We show that the IL model can capture magnitude, delay, and duration of activation with less error than the other methods tested, and therefore provides a promising way to flexibly but powerfully test the magnitude and timing of activation across experimental conditions.

## Acknowledgments

The authors thank Andrew Gelman for helpful suggestions in preparing the article.

## APPENDIX

#### Conditions to Ensure Minimal Overlap between the IL Functions

The interpretability of the parameters in the IL model is increased if the first and second, and the second and third, IL functions are made as close to orthogonal as possible. This implies that the rise in the first function needs to stabilize before the decrease in the second function begins. In principle, the first function will not reach its maximum value of 1 until *t* = ∞. However, one can impose the constraint that the first function completes 99% of its rise before the second function completes 1% of its decrease; that is, defining *t*_{1} and *t*_{2} by *L*_{1}(*t*_{1}) = 0.99 and *L*_{2}(*t*_{2}) = 0.01, we need to derive constraints on the model parameters that ensure *t*_{1} < *t*_{2}.

To find these constraints, we need to re-express *t*_{1} and *t*_{2} in terms of the parameters of the model. Define *t*_{1} as the time point when *L*_{1}(*t*_{1}) = *c*, where *c* = 0.99 in the example above but can reasonably be set to other values as well. This implies that:

$$L_1(t_1) = \frac{1}{1 + e^{-(t_1 - T_1)/D_1}} = c.$$

Through simple algebra, this equation can be rewritten as:

$$t_1 = T_1 - D_1 \log\left(c^{-1} - 1\right) = T_1 + D_1 k, \tag{31}$$

where *k* = log((*c*^{−1} − 1)^{−1}); for *c* = 0.99, this gives *k* = log 99 ≈ 4.6.

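In a similar manner, defining *t*_{2} as the time point when *L*_{2}(*t*_{2}) = 1 − *c* (0.01 in the example above), we can rewrite *t*_{2} as:

$$t_2 = T_2 - D_2 k.$$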

Combining these two expressions, the condition *t*_{1} < *t*_{2} can be written as

$$T_1 + k\,(D_1 + D_2) < T_2. \tag{33}$$

Using exactly the same reasoning, an equivalent condition for minimizing the overlap between the second and third IL functions is given by:

$$T_2 + k\,(D_2 + D_3) < T_3. \tag{34}$$
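A small numerical check of these constraints, assuming each IL component has the form *L*_{i}(*t*) = *L*((*t* − *T*_{i})/*D*_{i}) with *L*(*x*) = 1/(1 + e^{−x}). The onset and slope values, and the helper names `overlap_times` and `overlap_constraints`, are hypothetical.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def overlap_times(T, D, c=0.99):
    """Return (t1, t2): L1 reaches c at t1, and L2 reaches 1 - c at t2."""
    k = math.log(1.0 / (1.0 / c - 1.0))
    t1 = T[0] + D[0] * k
    t2 = T[1] - D[1] * k
    return t1, t2


def overlap_constraints(T, D, c=0.99):
    """Check constraints (33) and (34) for IL onsets T and slopes D."""
    k = math.log(1.0 / (1.0 / c - 1.0))
    first_second = T[0] + k * (D[0] + D[1]) < T[1]
    second_third = T[1] + k * (D[1] + D[2]) < T[2]
    return first_second, second_third


T = (3.0, 8.0, 14.0)
D = (0.3, 0.5, 0.5)
t1, t2 = overlap_times(T, D)
print(f"t1 = {t1:.3f} s, t2 = {t2:.3f} s")
print("L1(t1) =", inv_logit((t1 - T[0]) / D[0]))
print("L2(t2) =", inv_logit((t2 - T[1]) / D[1]))
print("constraints hold:", overlap_constraints(T, D))
print("constraints hold with T2 = 5 s:", overlap_constraints((3.0, 5.0, 14.0), D))
```

Moving the second onset from 8 s to 5 s violates constraint (33): the first function has not finished rising when the second starts to fall.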

#### Parameter Estimates

Assuming the two constraints (33) and (34) hold, the estimates for height, time-to-peak, and width are easily expressed as functions of the parameters of the model.

##### Height

Assuming *c* ≈ 1, the first and second IL functions will have minimal overlap, and the height can reasonably be estimated as the amplitude of the first IL function, i.e., *H* = δ_{1}.

##### Time-to-peak

Again assuming *c* ≈ 1, the time-to-peak can be estimated, using Eq. 31, as the time at which the first IL function completes a fraction *c* of its rise: *T* = *T*_{1} + *D*_{1}*k*.

##### Width

To find the full width at half maximum (FWHM), we need to determine (a) the time point when the first IL function reaches half of its height and (b) the time point when the second IL function has removed half of that height, so that the response falls back to 0.5δ_{1}. The time point (a) is simply given by *T*_{1}, so the problem boils down to finding time point (b), i.e., the time *t** at which |δ_{2}|*L*_{2}(*t**) = 0.5δ_{1}. This implies that

$$L_2(t^*) = \frac{1}{1 + e^{-(t^* - T_2)/D_2}} = \frac{\delta_1}{2|\delta_2|},$$

which can be rewritten as:

$$t^* = T_2 + D_2 \log\left(\frac{\delta_1}{2|\delta_2| - \delta_1}\right).$$

Hence, the FWHM is the distance between *t** and *T*_{1}, i.e.:

$$W = t^* - T_1 = T_2 - T_1 + D_2 \log\left(\frac{\delta_1}{2|\delta_2| - \delta_1}\right).$$
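The summary parameters can be computed and checked against a finely sampled curve. This is a sketch under the same assumed IL form, *L*_{i}(*t*) = *L*((*t* − *T*_{i})/*D*_{i}); solving *L*_{1}(*t*_{1}) = *c* directly gives *t*_{1} = *T*_{1} + *D*_{1}*k*, which the code uses for time-to-peak, and the width uses the magnitude |δ_{2}| of the negative second amplitude. The parameter values and the helper `il_summary` are hypothetical, chosen to satisfy constraints (33) and (34).

```python
import numpy as np


def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))


def il_hrf(t, T, D, delta):
    """Sum of three inverse logit functions, delta_i * L((t - T_i) / D_i)."""
    return sum(d * inv_logit((t - Ti) / Di) for Ti, Di, d in zip(T, D, delta))


def il_summary(T, D, delta, c=0.99):
    """Height, time-to-peak and FWHM of an IL model with minimal overlap."""
    k = np.log(1.0 / (1.0 / c - 1.0))
    d1, d2 = delta[0], abs(delta[1])
    if 2.0 * d2 <= d1:
        raise ValueError("the response never falls back below half height")
    height = d1
    time_to_peak = T[0] + D[0] * k
    t_star = T[1] + D[1] * np.log(d1 / (2.0 * d2 - d1))
    width = t_star - T[0]
    return height, time_to_peak, width


T = (3.0, 8.0, 14.0)
D = (0.3, 0.5, 0.5)
delta = (1.0, -1.3, 0.3)  # d1 + d2 + d3 = 0, so the response returns to baseline
H, T_peak, W = il_summary(T, D, delta)

# Numerical check on a fine grid.
t = np.arange(0.0, 30.0, 0.001)
h = il_hrf(t, T, D, delta)
above = t[h >= 0.5 * H]
W_numeric = above[-1] - above[0]
print(f"H = {H:.3f} (grid max {h.max():.3f})")
print(f"T = {T_peak:.3f} s (response there {il_hrf(T_peak, T, D, delta):.3f})")
print(f"W = {W:.3f} s (grid FWHM {W_numeric:.3f} s)")
```

The closed-form width matches the grid estimate closely because, with these parameters, the first function is essentially flat and the third has not yet started when the response crosses half height on its way down.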

## Footnotes

^{*}Note, however, that these results do not necessarily hold for processes with a different neuronal density (e.g., spike bursts every 500 ms instead of 1 s), and they are presented mainly for illustrative purposes here.

## References

- Aguirre GK, Zarahn E, D’Esposito M. The variability of human, BOLD hemodynamic responses. Neuroimage. 1998;8:360–369.
- Aguirre GK, Singh R, D’Esposito M. Stimulus inversion and the responses of face and object-sensitive cortical areas. Neuroreport. 1999;10:189–194.
- Bellgowan PS, Saad ZS, Bandettini PA. Understanding neural system dynamics through task modulation and measurement of functional MRI amplitude, latency, and width. Proc Natl Acad Sci U S A. 2003;100:1415–1419.
- Birn RM, Saad ZS, Bandettini PA. Spatial heterogeneity of the nonlinear dynamics in the fMRI BOLD response. Neuroimage. 2001;14:817–826.
- Buckner RL. The hemodynamic inverse problem: making inferences about neural activity from measured MRI signals. Proc Natl Acad Sci U S A. 2003;100:2177–2179.
- Buxton RB, Wong EC, Frank LR. Dynamics of blood flow and oxygenation changes during brain activation: the balloon model. Magn Reson Med. 1998;39:855–864.
- Calhoun VD, Stevens MC, Pearlson GD, Kiehl KA. fMRI analysis with the general linear model: removal of latency-induced amplitude bias by incorporation of hemodynamic derivative terms. Neuroimage. 2004;22:252–257.
- Ciuciu P, Poline J-B, Marrelec G, Idier J, Pallier Ch, Benali H. Unsupervised robust non-parametric estimation of the hemodynamic response function for any fMRI experiment. IEEE Trans Med Imag. 2003;22:1235–1251.
- Dale AM, Buckner RL. Selective averaging of rapidly presented individual trials using fMRI. Hum Brain Mapp. 1997;5:329–340.
- Formisano E, Goebel R. Tracking cognitive processes with functional MRI mental chronometry. Curr Opin Neurobiol. 2003;13:174–181.
- Formisano E, Linden DE, Di Salle F, Trojano L, Esposito F, Sack AT, Grossi D, Zanella FE, Goebel R. Tracking the mind’s image in the brain. I. Time-resolved fMRI during visuospatial mental imagery. Neuron. 2002;35:185–194.
- Friston KJ, Josephs O, Rees G, Turner R. Nonlinear event-related responses in fMRI. Magn Reson Med. 1998;39:41–52.
- Glover GH. Deconvolution of impulse response in event-related BOLD fMRI. Neuroimage. 1999;9:416–429.
- Goutte C, Nielsen FA, Hansen LK. Modeling the haemodynamic response in fMRI using smooth FIR filters. IEEE Trans Med Imag. 2000;19:1188–1201.
- Handwerker DA, Ollinger JM, D’Esposito M. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage. 2004;21:1639–1651.
- Henson RN, Price CJ, Rugg MD, Turner R, Friston KJ. Detecting latency differences in event-related BOLD responses: application to words versus nonwords and initial versus repeated face presentations. Neuroimage. 2002;15:83–97.
- Hernandez L, Badre D, Noll D, Jonides J. Temporal sensitivity of event-related fMRI. Neuroimage. 2002;17:1018–1026.
- Jain RK. Ridge regression and its application to medical data. Comput Biomed Res. 1985;18:363–368.
- Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science. 1983;220:671–680.
- Kruggel F, von Cramon DY. Temporal properties of the hemodynamic response in functional MRI. Hum Brain Mapp. 1999;8:259–271.
- Kruggel F, Wiggins CJ, Herrmann CS, von Cramon DY. Recording of the event-related potentials during functional MRI at 3.0 Tesla field strength. Magn Reson Med. 2000;44:277–282.
- Lange N, Zeger SL. Non-linear Fourier time series analysis for human brain mapping by functional magnetic resonance imaging (with discussion). Appl Stat J R Stat Soc C. 1997;46:1–29.
- Liao CH, Worsley KJ, Poline JB, Aston JA, Duncan GH, Evans AC. Estimating the delay of the fMRI response. Neuroimage. 2002;16:593–606.
- Logothetis NK. The underpinnings of the BOLD functional magnetic resonance imaging signal. J Neurosci. 2003;23:3963–3971.
- Luo WL, Nichols TE. Diagnosis and exploration of massively univariate neuroimaging models. Neuroimage. 2003;19:1014–1032.
- Maccotta L, Zacks JM, Buckner RL. Rapid self-paced event-related functional MRI: feasibility and implications of stimulus- versus response-locked timing. Neuroimage. 2001;14:1105–1121.
- Marrelec G, Benali H, Ciuciu P, Pelegrini-Issac M, Poline JB. Robust Bayesian estimation of the hemodynamic response function in event-related BOLD fMRI using basic physiological information. Hum Brain Mapp. 2003;19:1–17.
- Mechelli A, Price CJ, Friston KJ. Nonlinear coupling between evoked rCBF and BOLD signals: a simulation study of hemodynamic responses. Neuroimage. 2001;14:862–872.
- Menon RS, Luknowsky DC, Gati JS. Mental chronometry using latency-resolved functional MRI. Proc Natl Acad Sci U S A. 1998;95:10902–10907.
- Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21:1087–1092.
- Miezin FM, Maccotta L, Ollinger JM, Petersen SE, Buckner RL. Characterizing the hemodynamic response: effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. Neuroimage. 2000;11:735–759.
- Noll DC, Cohen JD, Meyer CH, Schneider W. Spiral K-space MR imaging of cortical activation. J Magn Reson Imag. 1995;5:49–56.
- Ollinger JM, Shulman GL, Corbetta M. Separating processes within a trial in event-related functional MRI. Neuroimage. 2001;13:210–217.
- Rajapakse JC, Kruggel F, Maisog JM, von Cramon DY. Modeling hemodynamic response for analysis of functional MRI time-series. Hum Brain Mapp. 1998;6:283–300.
- Richter W, Somorjai R, Summers R, Jarmasz M, Menon RS, Gati JS, Georgopoulos AP, Tegeler C, Ugurbil K, Kim SG. Motor area activity during mental rotation studied by time-resolved single-trial fMRI. J Cogn Neurosci. 2000;12:310–320.
- Riera JJ, Watanabe J, Kazuki I, Naoki M, Aubert E, Ozaki T, Kawashima R. A state-space model of the hemodynamic approach: nonlinear filtering of BOLD signals. Neuroimage. 2004;21:547–567.
- Saad ZS, DeYoe EA, Ropella KM. Estimation of FMRI response delays. Neuroimage. 2003;18:494–504.
- Vazquez AL, Noll DC. Nonlinear aspects of the BOLD response in functional MRI. Neuroimage. 1998;7:108–118.
- Wager TD, Vazquez A, Hernandez L, Noll DC. Accounting for nonlinear BOLD effects in fMRI: parameter estimates and a model for prediction in rapid event-related studies. Neuroimage. 2005;25:206–218.
- Woolrich MW, Behrens TE, Smith SM. Constrained linear basis sets for HRF modelling using variational Bayes. Neuroimage. 2004;21:1748–1761.
- Worsley KJ, Friston KJ. Analysis of fMRI time-series revisited—again. Neuroimage. 1995;2:173–181.
- Zarahn E. Using larger dimensional signal subspaces to increase sensitivity in fMRI time series analyses. Hum Brain Mapp. 2002;17:13–16.

Validity and Power in Hemodynamic Response Modeling: A Comparison Study and a New Approach. NIHPA Author Manuscripts. Aug 2007; 28(8):764.
