PLoS Comput Biol. Mar 2010; 6(3): e1000709.
Published online Mar 12, 2010. doi:  10.1371/journal.pcbi.1000709
PMCID: PMC2837394

Comparing Families of Dynamic Causal Models

Konrad P. Kording, Editor

Abstract

Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This “best model” approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data.

Author Summary

Bayesian model comparison provides a formal method for evaluating different computational models in the biological sciences. Emerging application domains include dynamical models of neuronal and biochemical networks based on differential equations. Much previous work in this area has focussed on selecting the single best model. This approach is useful but can become brittle if there are a large number of models to compare and if different subjects use different models. This paper shows that these problems can be overcome with the use of Family Level Inference and Bayesian Model Averaging within model families.

Introduction

Mathematical models of scientific data can be formally compared using Bayesian model evidence [1]–[3], an approach that is now widely used in statistics [4], signal processing [5], machine learning [6], natural language processing [7], and neuroimaging [8]–[10]. An emerging area of application is the evaluation of dynamical system models represented using differential equations, both in neuroimaging [11] and systems biology [12]–[14].

Much previous practice in these areas has focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model [15]–[18]. This ‘best model’ approach is very useful but, as we shall see, can become brittle if there are a large number of models to compare, or if, in the analysis of data from a group of subjects, different subjects use different models (as is the case for a random effects analysis [19]). This brittleness refers to the fact that which model is best can depend critically on which set of models is being compared. In random effects analysis, augmenting the comparison set with a single extra model can, for example, reverse the ranking of the best and second best models. To address this issue we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families.

We envisage that these methods will be useful for the comparison of large numbers of models (e.g. tens, hundreds or thousands). In the context of neuroimaging, for example, inferences about changes in brain connectivity can be made using Dynamic Causal Models [20],[21]. These are differential equation models which relate neuronal activity in different brain areas using a dynamical systems approach. One can then ask a number of generic questions. For example: Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by changes in forward or backward connections? A schematic of a DCM used in this paper is shown in Figure 1. The particular questions we will address in this paper are (i) which regions receive driving input? and (ii) which connections are modulated by other experimental factors?

Figure 1
Dynamic Causal Models.

This paper proposes that the above questions are best answered by ‘Family level inference’, that is, inference at the level of model families rather than at the level of the individual models themselves. As a simple example, in previous work [19] we have considered comparison of a number of DCMs, half of which embodied linear hemodynamics and half nonlinear hemodynamics. The model space was thus partitioned into two families: linear and nonlinear. One can compute the relative evidence of the two model families to answer the question: does my imaging data provide evidence in favour of linear versus nonlinear hemodynamics? This effectively removes uncertainty about aspects of model structure other than the characteristic of interest.

We have provided a simple illustration of this approach in previous work [19]. We now provide a formal introduction to family level inference and describe the key issues. These include, importantly, the issue of how to deal with families that do not contain the same number of models. Additionally, this paper shows how Bayesian model averaging can be used to provide a summary measure of likely parameter values for each model family. We provide an example of family-level inference using data from neuroimaging, a DCM study of auditory word processing, but envisage that the methods can be applied throughout the biological sciences. Before proceeding we note that the use of Bayesian model averaging is a standard approach in the field of Bayesian statistics [4], but has yet to be applied extensively in computational biology. The use of model families is also accommodated naturally within the framework of hierarchical Bayesian models [1] and is proposed to address the well known issue of model dilution [4].

Materials and Methods

This section first briefly reviews DCM and methods for computing the model evidence. We then review the fixed and random effects methods for group level model inference, which differ as to whether or not subjects are thought to use the same or a different model. This includes the description of a novel Gibbs sampling method for random effects model inference that is useful when there are many models to compare. We then show that, for random effects inference, the selection of the single best model can be critically dependent on the set of models that are to be compared. This then motivates the subsequent subsection on family level inference, in which inferences about model characteristics are invariant to the comparison set. We describe family level inference in both a fixed and random effects context. The final subsection then describes a sample-based algorithm for implementing Bayesian model averaging using the notion of model families.

Dynamic Causal Models

Dynamic Causal Modelling is a framework for fitting differential equation models of neuronal activity to brain imaging data using Bayesian inference. The DCM approach can be applied to functional Magnetic Resonance Imaging (fMRI), Electroencephalographic (EEG), Magnetoencephalographic (MEG), and Local Field Potential (LFP) data [22]. The empirical work in this paper uses DCM for fMRI. DCMs for fMRI comprise a bilinear model for the neurodynamics and an extended Balloon model [23] for the hemodynamics. The neurodynamics are described by the following multivariate differential equation

$$\dot{z}_t = \left( A + \sum_{j} u_t(j) B^j \right) z_t + C u_t \tag{1}$$

where $t$ indexes continuous time and the dot notation denotes a time derivative. The $i$th entry in $z_t$ corresponds to neuronal activity in the $i$th region, and $u_t(j)$ is the $j$th experimental input.

A DCM is characterised by a set of ‘exogenous connections’, $A$, that specify which regions are connected and whether these connections are unidirectional or bidirectional. We also define a set of input connections, $C$, that specify which inputs are connected to which regions, and a set of modulatory connections, $B^j$, that specify which intrinsic connections can be changed by which inputs. The overall specification of input, intrinsic and modulatory connectivity comprises our assumptions about model structure. This in turn represents a scientific hypothesis about the structure of the large-scale neuronal network mediating the underlying cognitive function. A schematic of a DCM is shown in Figure 1.
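To make the roles of the three connectivity sets concrete, the following is a minimal sketch of the bilinear neurodynamics, integrated with a simple forward Euler scheme. It assumes a state equation of the form dz/dt = (A + sum_j u_t(j) B[j]) z + C u_t, with `A` the connections between regions, `B[j]` the modulation of those connections by input `j`, and `C` the input connections. The function name, the Euler step size and the two-region parameter values are illustrative assumptions, not values from the paper, and DCM itself uses its own integration scheme.

```python
import numpy as np

def simulate_bilinear(A, B, C, u, dt=0.01):
    """Forward-Euler integration of dz/dt = (A + sum_j u_t(j) B[j]) z + C u_t.

    A: (R, R) connections between regions
    B: (J, R, R) modulatory connections, one matrix per input
    C: (R, J) input connections
    u: (T, J) inputs sampled every dt seconds
    Returns z with shape (T, R), starting from z = 0.
    """
    T = u.shape[0]
    R = A.shape[0]
    z = np.zeros((T, R))
    for t in range(T - 1):
        # Effective connectivity at time t: A plus the input-weighted modulations.
        A_eff = A + np.tensordot(u[t], B, axes=1)
        z[t + 1] = z[t] + dt * (A_eff @ z[t] + C @ u[t])
    return z

# Hypothetical two-region network (all values are made up for illustration).
A = np.array([[-1.0, 0.0],
              [0.5, -1.0]])      # region 1 -> region 2, with self-decay
B = np.zeros((2, 2, 2))
B[1, 1, 0] = 0.4                 # input 2 strengthens the 1 -> 2 connection
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])       # input 1 drives region 1 only
u = np.zeros((2000, 2))
u[100:600, 0] = 1.0              # driving input switched on for 5 s
u[:, 1] = 1.0                    # modulatory context present throughout
z = simulate_bilinear(A, B, C, u)
```

Changing which entries of `A`, `B` and `C` are allowed to be non-zero is what distinguishes one candidate model structure from another in this setting.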

In DCM, neuronal activity gives rise to fMRI activity by a dynamic process described by an extended Balloon model [24] for each region. This specifies how changes in neuronal activity give rise to changes in blood oxygenation that are measured with fMRI. It involves a set of hemodynamic state variables, state equations and hemodynamic parameters, $h$. In brief, for the $i$th region, neuronal activity $z(i)$ causes an increase in vasodilatory signal $s_i$ that is subject to autoregulatory feedback. Inflow $f_i$ responds in proportion to this signal with concomitant changes in blood volume $v_i$ and deoxyhemoglobin content $q_i$.

$$\begin{aligned}
\dot{s}_i &= z(i) - \kappa_i s_i - \gamma_i (f_i - 1) \\
\dot{f}_i &= s_i \\
\tau_i \dot{v}_i &= f_i - v_i^{1/\alpha} \\
\tau_i \dot{q}_i &= f_i \frac{E(f_i, \rho_i)}{\rho_i} - v_i^{1/\alpha} \frac{q_i}{v_i}
\end{aligned} \tag{2}$$

Outflow is related to volume $f_{out} = v^{1/\alpha}$ through Grubb's exponent $\alpha$ [20]. The oxygen extraction is a function of flow $E(f, \rho) = 1 - (1 - \rho)^{1/f}$ where $\rho$ is resting oxygen extraction fraction. The Blood Oxygenation Level Dependent (BOLD) signal is then taken to be a static nonlinear function of volume and deoxyhemoglobin that comprises a volume-weighted sum of extra- and intra-vascular signals [20]

$$\begin{aligned}
y_i &= g(q_i, v_i) = V_0 \left( k_1 (1 - q_i) + k_2 \left( 1 - \frac{q_i}{v_i} \right) + k_3 (1 - v_i) \right) \\
k_1 &= 7 \rho_i, \quad k_2 = 2, \quad k_3 = 2 \rho_i - 0.2
\end{aligned} \tag{3}$$

where $V_0$ is resting blood volume fraction. The hemodynamic parameters comprise $h = \{\kappa, \gamma, \tau, \alpha, \rho\}$ and are specific to each brain region. Together these equations describe a nonlinear hemodynamic process that converts neuronal activity in the $i$th region $z(i)$ to the fMRI signal $y_i$ (which is additionally corrupted by additive Gaussian noise). Full details are given in [20],[23].

In DCM, model parameters $\theta = \{A, B, C, h\}$ are estimated using Bayesian methods. Usually, the $B$ parameters are of greatest interest as these describe how connections between brain regions are dependent on experimental manipulations. For a given DCM indexed by $m$, a prior distribution, $p(\theta|m)$, is specified using biophysical and dynamic constraints [20]. The likelihood, $p(y|\theta,m)$, can be computed by numerically integrating the neurodynamic (equation 1) and hemodynamic processes (equation 2). The posterior density $p(\theta|y,m)$ is then estimated using a nonlinear variational approach described in [23],[25]. Other Bayesian estimation algorithms can, of course, be used to approximate the posterior density. Reassuringly, posterior confidence regions found using the nonlinear variational approach have been found to be very similar to those obtained using a computationally more expensive sample-based algorithm [26].

Model Evidence

This section reviews methods for computing the evidence for a model, $m$, fitted to a single data set $y$. Bayesian estimation provides estimates of two quantities. The first is the posterior distribution over model parameters $p(\theta|y,m)$ which can be used to make inferences about model parameters $\theta$. The second is the probability of the data given the model, otherwise known as the model evidence. In general, the model evidence is not straightforward to compute, since this computation involves integrating out the dependence on model parameters

$$p(y|m) = \int p(y|\theta,m) \, p(\theta|m) \, d\theta \tag{4}$$

A common technique for approximating the above integral is the Variational Bayes (VB) approach [27]. This is an analytic method that can be formulated by analogy with statistical physics as a gradient ascent on the ‘negative variational Free Energy’ (or Free Energy for short), $F(m)$, of the system. This quantity is related to the model evidence by the relation [27],[28]

$$\log p(y|m) = F(m) + KL\left[ q(\theta|m) \,\|\, p(\theta|y,m) \right] \tag{5}$$

where the last term in Eq.(5) is the Kullback-Leibler (KL) divergence between an ‘approximate’ posterior density, $q(\theta|m)$, and the true posterior, $p(\theta|y,m)$. This quantity is always positive, or zero when the densities are identical, and therefore $\log p(y|m)$ is bounded below by $F(m)$. Because the evidence is fixed (but unknown), maximising $F(m)$ implicitly minimises the KL divergence. The Free Energy then becomes an increasingly tighter lower bound on the desired log-model evidence. Under the assumption that this bound is tight, model comparison can then proceed using $F(m)$ as a surrogate for the log-model evidence.
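The decomposition of the log evidence into a Free Energy term plus a KL divergence can be checked numerically in a toy setting. The sketch below assumes a model whose single parameter takes only three discrete values, so that the evidence, the true posterior and the Free Energy of any approximate posterior `q` can all be computed exactly. The prior, likelihood and `q` values are made up for illustration; this is not the variational scheme used for DCM.

```python
import numpy as np

def free_energy(q, prior, lik):
    """F = sum_theta q(theta) * log( p(y|theta) p(theta) / q(theta) )."""
    return float(np.sum(q * (np.log(lik) + np.log(prior) - np.log(q))))

def kl_divergence(q, p):
    """KL[q || p] for two discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))

# Hypothetical three-valued parameter.
prior = np.array([0.5, 0.3, 0.2])        # p(theta | m)
lik = np.array([0.10, 0.40, 0.05])       # p(y | theta, m) for the observed y
evidence = float(np.sum(lik * prior))    # p(y | m), an exact sum in this toy case
posterior = lik * prior / evidence       # p(theta | y, m)

q = np.array([0.4, 0.4, 0.2])            # an arbitrary approximate posterior
F = free_energy(q, prior, lik)
gap = kl_divergence(q, posterior)

# log evidence = F + KL, so F is a lower bound that is tight when q = posterior.
print(np.log(evidence), F + gap, F)
```

Setting `q` equal to `posterior` drives the KL term to zero, at which point the Free Energy equals the log evidence.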

The Free Energy is but one approximation to the model evidence, albeit one that is widely used in neuroimaging [29],[30]. A simpler approximation, the Bayesian Information Criterion (BIC) [11], uses a fixed complexity penalty for each parameter. This is to be compared with the free energy approach in which the complexity penalty is given by the KL-divergence between the prior and approximate posterior [11]. This allows parameters to be differentially penalised. If, for example, a parameter is unchanged from its prior, there will be no penalty. This adaptability makes the Free Energy a better approximation to the model evidence, as has been shown empirically [6],[31].

There are also a number of sample-based approximations to the model evidence. For models with small numbers of parameters the Posterior Harmonic Mean provides a good approximation. This has been used in neuroscience applications, for example, to infer based on spike data whether neurons are responsive to particular features, and if so what form the dependence takes [32]. For models with a larger number of parameters the evidence can be well approximated using Annealed Importance Sampling (AIS) [33]. In a comparison of sample-based methods using synthetic data from biochemical networks, AIS provided the best balance between accuracy and computation time [13]. In other comparisons, based on simulation of graphical model structures [6], the Free Energy method approached the performance of AIS and clearly outperformed BIC. In this paper model evidence is approximated using the Free Energy.

Fixed Effects Analysis

Neuroimaging data sets usually comprise data from multiple subjects, as the perhaps subtle cognitive effects one is interested in are often only manifest at the group level. In this and following sections we therefore consider group model inference where we fit models $m = 1, \ldots, M$ to data from subjects $n = 1, \ldots, N$. Every model is fitted to every subject's data. In Fixed Effects (FFX) Analysis it is assumed that every subject uses the same model, whereas Random Effects (RFX) Analysis allows for the possibility that different subjects use different models. This section focusses on FFX.

Given that our overall data set, $Y$, which comprises data for each subject, $y_n$, is independent over subjects, we can write the overall model evidence as

$$p(Y|m) = \prod_{n=1}^{N} p(y_n|m) \tag{6}$$

Bayesian inference at the model level can then be implemented using Bayes rule

$$p(m|Y) = \frac{p(Y|m) \, p(m)}{\sum_{m'=1}^{M} p(Y|m') \, p(m')} \tag{7}$$

Under uniform model priors, $p(m)$, the comparison of a pair of models, $m = i$ and $m = j$, can be implemented using the Bayes Factor which is defined as the ratio of model evidences

$$BF_{ij} = \frac{p(Y|m=i)}{p(Y|m=j)} \tag{8}$$

Given only two models and uniform priors, the posterior model probability is greater than 0.95 if the BF is greater than twenty. Bayes Factors have also been stratified into different ranges deemed to correspond to different strengths of evidence. ‘Strong’ evidence, for example, corresponds to a BF of over twenty [34]. Under non-uniform priors, pairs of models can be compared using Odds Ratios. The prior and posterior Odds Ratios are defined as

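$$\begin{aligned}
\pi^0_{ij} &= \frac{p(m=i)}{p(m=j)} \\
\pi_{ij} &= \frac{p(m=i|Y)}{p(m=j|Y)}
\end{aligned} \tag{9}$$

respectively, and are related by the Bayes Factor

$$\pi_{ij} = BF_{ij} \times \pi^0_{ij} \tag{10}$$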

When comparing two models across a group of subjects, one can multiply the individual Bayes factors (or exponentiate the sum of log evidence differences); this is referred to as the Group Bayes Factor (GBF) [16]. As is made clear in [19] the GBF approach implicitly assumes that every subject uses the same model. It is therefore a Fixed Effects analysis. If one believes that the optimal model structure is identical across subjects, then an FFX approach is entirely valid. This assumption is warranted when studying a basic physiological mechanism that is unlikely to vary across subjects, such as the role of forward and backward connections in visual processing [35].
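As a rough illustration of fixed effects inference, the sketch below sums subject-wise log evidences to obtain the group log evidence, converts these to posterior model probabilities with Bayes rule, and computes a log Group Bayes Factor as the sum of log evidence differences. The function names and the log evidence values are hypothetical; in practice each log evidence would be approximated, for example by the Free Energy of a fitted model.

```python
import numpy as np

def ffx_posterior(log_ev, log_prior=None):
    """Posterior model probabilities p(m|Y) under a fixed effects assumption.

    log_ev: (N subjects, M models) array of log evidences log p(y_n|m).
    log_prior: optional (M,) log prior over models; uniform if omitted.
    """
    log_ev = np.asarray(log_ev, dtype=float)
    group = log_ev.sum(axis=0)            # log p(Y|m) = sum_n log p(y_n|m)
    if log_prior is None:
        log_prior = np.zeros_like(group)  # uniform prior; the constant cancels
    s = group + log_prior
    s -= s.max()                          # subtract the max for numerical stability
    p = np.exp(s)
    return p / p.sum()

def log_group_bayes_factor(log_ev, i, j):
    """Sum over subjects of log evidence differences between models i and j."""
    log_ev = np.asarray(log_ev, dtype=float)
    return float(np.sum(log_ev[:, i] - log_ev[:, j]))

# Hypothetical log evidences for 3 subjects and 2 models.
log_ev = [[-100.0, -101.0],
          [-120.0, -121.0],
          [-90.0, -91.0]]
post = ffx_posterior(log_ev)
log_gbf = log_group_bayes_factor(log_ev, 0, 1)
print(post, np.exp(log_gbf))
```

With two models and uniform priors the posterior probability of the first model is BF / (1 + BF), so a Bayes Factor of twenty corresponds to 20/21, which is just above 0.95.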

Random Effects Analysis

An alternative procedure for group level model inference allows for the possibility that different subjects use different models. This may be the case in neuroimaging when investigating pathophysiological mechanisms in a spectrum disease or when dealing with cognitive tasks that can be performed with different strategies. RFX inference is based on the characteristics of the population from which the subjects are drawn. Given a candidate set of $M$ models, we denote $r_m$ as the frequency with which model $m$ is used in the population. We also refer to $r_m$ as the model probability.

We define a prior distribution over $r$ which in this paper, and in previous work [19], is taken to be a Dirichlet density (but see later)

$$p(r|\alpha) = \mathrm{Dir}(\alpha) = \frac{1}{Z(\alpha)} \prod_{m} r_m^{\alpha_m - 1} \tag{11}$$

where $Z(\alpha)$ is a normalisation term and the parameters, $\alpha_m$, are strictly positively valued and can be interpreted as the number of times model $m$ has been observed or selected. For $\alpha_m > 1$ the density is convex in $r$-space, whereas for $\alpha_m < 1$ it is concave.

Given that we have drawn $N$ subjects from the population of interest we then define the indicator variable $a_{nm}$ as equal to unity if model $m$ has been assigned to subject $n$. The probability of the ‘assignation vector’, $a_n$, is then given by the multinomial density

$$p(a_n|r) = \mathrm{Mult}(r) = \prod_{m} r_m^{a_{nm}} \tag{12}$$

The model evidence, $p(y_n|m)$, together with the above densities for model probabilities and model assignations constitutes a generative model for the data, $Y$ (see figure 1 in [19]). This model can then be inverted to make inferences about the model probabilities from experimental data. Such an inversion has been described in previous work, which developed an approximate inference procedure based on a variational approximation [19] (this was in addition to the variational approximation used to compute the Free Energy for each model). The robustness and accuracy of this method were verified via simulations using data from synthetic populations with known frequencies of competing models [19]. This algorithm produces an approximation to the posterior density $p(r|Y)$ on which subsequent RFX inferences are based.
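The first two layers of this generative model can be sketched by forward sampling: draw population model frequencies from a Dirichlet density, then draw one model assignment per subject from a multinomial density with those frequencies. The function name, the prior counts and the number of subjects below are illustrative assumptions. The final layer, generating data from the assigned model, is omitted.

```python
import numpy as np

def sample_population(alpha, n_subjects, seed=0):
    """Draw model frequencies r ~ Dir(alpha), then one-hot assignments a_n ~ Mult(r)."""
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(alpha)                     # population model frequencies
    a = rng.multinomial(1, r, size=n_subjects)   # row n is the assignation vector of subject n
    return r, a

# Hypothetical setting: 3 models, 20 subjects, one prior count per model.
r, a = sample_population(alpha=[1.0, 1.0, 1.0], n_subjects=20)
print(r)              # frequencies with which each model is used in the population
print(a.sum(axis=0))  # number of subjects assigned to each model
```

Random effects inference runs this process in reverse: given the evidence of each model for each subject's data, it infers the frequencies `r`.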

As we shall see in the following section, unbiased family level inferences require uniform priors over families. This requires that the prior model counts, $\alpha_{prior}(m)$, take on very small values (see equation 24). These values become smaller as the number of models in a family increases. It turns out that although the variational algorithm is robust for $\alpha_{prior}(m) = 1$, it is not accurate for $\alpha_{prior}(m) \ll 1$. This is a generic problem with the VB approach and is explained further in the supporting material (see file Text S1). For this reason, in this paper we take a Gibbs sampling approach instead of a VB approach. Additionally, the use of Gibbs sampling allows us to relax the assumption made in VB that the posterior densities over $a$ and $r$ factorise [19]. Gibbs sampling is the Monte-Carlo method of choice when it is possible to iteratively sample from the conditional posteriors [1]. Fortunately, this is the case with the RFX models as we can iterate between sampling from $p(r|a,Y)$ and $p(a|r,Y)$. Such iterated sampling eventually produces samples from the marginal posteriors $p(r|Y)$ and $p(a|Y)$ by allowing for a sufficient burn-in period after which the Markov-chain will have converged [1]. The procedure is described in the following section.

Gibbs sampling for random effects inference over models

First, model probabilities are drawn from the prior distribution

$$r \sim \mathrm{Dir}(\alpha_{prior}) \tag{13}$$

where by default we set $\alpha_{prior}(m) = 1/M$ for all $m$ (but see later). For each subject $n = 1, \ldots, N$ and model $m = 1, \ldots, M$ we use the model evidences from model inversion to compute

$$\begin{aligned}
u_{nm} &= \exp\left( \log p(y_n|m) + \log r_m \right) \\
g_{nm} &= \frac{u_{nm}}{\sum_{m=1}^{M} u_{nm}}
\end{aligned} \tag{14}$$

Here, $g_{nm}$ is our posterior belief that model $m$ generated the data from subject $n$ (these posteriors will be used later for Bayesian model averaging). For each subject, model assignation vectors are then drawn from the multinomial distribution

$$a_n \sim \mathrm{Mult}(g_n) \tag{15}$$

We then compute new model counts

$$\begin{aligned}
\beta_m &= \sum_{n=1}^{N} a_{nm} \\
\alpha_m &= \alpha_{prior}(m) + \beta_m
\end{aligned} \tag{16}$$

and draw new model probabilities

$$r \sim \mathrm{Dir}(\alpha) \tag{17}$$

Equations 14 to 17 are then iterated $N_{it}$ times. For the results in this paper we used a total of $N_{it} = 20{,}000$ samples and discarded the first 10,000. These remaining samples then constitute our approximation to the posterior distribution $p(r|Y)$. From this density we can compute usual quantities such as the posterior expectation, denoted $E[r|Y]$ or $\langle r|Y \rangle$. This completes the description of model level inference.
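The sampling scheme above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it draws the model probabilities from a Dirichlet prior, then repeatedly (i) computes each subject's posterior belief over models from the log evidences and the current probabilities, (ii) draws one model assignment per subject, (iii) adds the resulting counts to the prior counts, and (iv) redraws the model probabilities. The function name, the default prior count of 1/M per model, the short chain length and the synthetic log evidences are all assumptions made for the example.

```python
import numpy as np

def gibbs_rfx(log_ev, alpha_prior=None, n_samples=2000, burn_in=1000, seed=0):
    """Gibbs sampler for population model probabilities r given log evidences.

    log_ev: (N subjects, M models) array of log p(y_n|m).
    Returns the post-burn-in samples of r, shape (n_samples - burn_in, M).
    """
    rng = np.random.default_rng(seed)
    log_ev = np.asarray(log_ev, dtype=float)
    N, M = log_ev.shape
    if alpha_prior is None:
        alpha_prior = np.full(M, 1.0 / M)   # assumed default prior counts
    r = rng.dirichlet(alpha_prior)          # initial draw from the prior
    kept = np.empty((n_samples - burn_in, M))
    for it in range(n_samples):
        # Posterior belief g[n, m] that model m generated subject n's data,
        # computed in log space; log(0) = -inf is harmless here.
        with np.errstate(divide="ignore"):
            log_u = log_ev + np.log(r)
        log_u -= log_u.max(axis=1, keepdims=True)
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)
        # Draw a one-hot assignation vector for each subject.
        a = np.stack([rng.multinomial(1, g[n]) for n in range(N)])
        # New model counts, then new model probabilities.
        alpha = alpha_prior + a.sum(axis=0)
        r = rng.dirichlet(alpha)
        if it >= burn_in:
            kept[it - burn_in] = r
    return kept

# Synthetic example: 9 subjects favour model 0 and 3 favour model 1,
# each by 5 log units; model 2 is never favoured.
log_ev = np.full((12, 3), -5.0)
log_ev[:9, 0] = 0.0
log_ev[9:, 1] = 0.0
samples = gibbs_rfx(log_ev)
expected_r = samples.mean(axis=0)   # posterior expectation of r
print(expected_r)
```

Working in log space and subtracting the row maximum before exponentiating avoids underflow, since log evidences for real data are typically large negative numbers.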

The above algorithm was derived for Dirichlet priors over model probabilities (see equation 11). The motivation for the Dirichlet form originally derived from the use of a free-form VB approximation [27] in which the optimal form for the approximate posterior density over $r$ would be a Dirichlet if the prior over $r$ was also a Dirichlet. This is not a concern in the context of Gibbs sampling. In principle any prior density over $r$ will do, but for continuity with previous work we follow the Dirichlet approach.

We end this section by noting that the Gibbs sampling method is to be preferred over the VB implementation for model level inferences in which the number of models exceeds the number of subjects, $M > N$. This is because it is important that the total prior count, $\sum_m \alpha_{prior}(m)$, does not dominate over the number of subjects, otherwise posterior densities will be dominated by the prior rather than the data. This is satisfied, for example, by $\alpha_{prior}(m) = 1/M$. However, as described in the supporting material (see file Text S1), the VB implementation does not work well for small $\alpha_{prior}(m)$. But if we wish to compare a small number of models then VB is the preferred method because it is faster as well as being accurate, as shown in previous simulations [19].

Comparison Set

We have so far described procedures for Bayesian inference over models $m = 1, \ldots, M$. These models comprise the comparison set, $S$. This section points out a number of generic features of Bayesian model comparison.

First, for any data set there exists an infinite number of possible models that could explain it. The purpose of model comparison is not to discover a ‘true’ model, but to determine which model, among a set of plausible alternatives, is most ‘useful’, i.e. represents an optimal balance between accuracy and complexity. In other words, Bayesian model inference has nothing to say about ‘true’ models. All that it provides is an inference about which is more likely, given the data, among a set of candidate models.

Second, we emphasise that posterior model probabilities depend on the comparison set. For FFX inference this can be clearly seen in equation 7 where the denominator is given by a sum over $S$. Similarly, for RFX inference, the dependence of posterior model probabilities on the comparison set can be seen in equation 14. Other factors being constant, posterior model probabilities are therefore likely to be smaller for larger $S$.

Our third point relates to the ranking of models. For FFX analysis the relative ranking of a pair of models is not dependent on $S$. That is, if $p(m_i \mid Y, S_A) > p(m_j \mid Y, S_A)$ then $p(m_i \mid Y, S_B) > p(m_j \mid Y, S_B)$ for any two comparison sets $S_A$ and $S_B$ that contain models $m_i$ and $m_j$. This follows trivially from equation 7 as the comparison set acts only as a normalisation term.

However, for group random effects inference the ranking of models can be critically dependent on the comparison set. That is, if $\langle r_i \mid Y, S_A \rangle > \langle r_j \mid Y, S_A \rangle$ then it could be that $\langle r_i \mid Y, S_B \rangle < \langle r_j \mid Y, S_B \rangle$, where $\langle r_i \mid Y, S_A \rangle$ is the posterior expected probability of model $i$ given comparison set $S_A$. The same holds for other quantities derived from the posterior over $r$, such as the exceedance probability (see [19] and later). This means that the decision as to which is the best model depends on $S$. This property arises because different subjects can use different models and we illustrate it with the following example.

Consider that $S_A$ comprises just two models, $m = 1$ and $m = 2$. Further assume that we have $N = 17$ subjects and model $m = 1$ is preferred by 7 of these subjects and $m = 2$ by the remaining 10. We assume, for simplicity, that the degrees of preference (i.e. differences in evidence) are the same for each subject. The quantity $\langle r_m \mid Y, S_A \rangle$ then simply reflects the proportion of subjects that prefer model $m$ [19]. So $\langle r_1 \mid Y, S_A \rangle = 7/17$, $\langle r_2 \mid Y, S_A \rangle = 10/17$ and for comparison set $S_A$ model 2 is the highest ranked model. Although the differences in posterior expected values are small, the corresponding differences in exceedance probabilities will be much greater. Now consider a new comparison set $S_B$ that contains an additional model $m = 3$. This model is very similar to model $m = 2$ such that, of the ten subjects who previously preferred it, six still do but four now prefer model $m = 3$. Again, assuming identical degrees of preference, we now have $\langle r_1 \mid Y, S_B \rangle = 7/17$, $\langle r_2 \mid Y, S_B \rangle = 6/17$ and $\langle r_3 \mid Y, S_B \rangle = 4/17$. So, for comparison set $S_B$ model $m = 1$ is now the best model. So which is the best model: model one or two?

We suggest that this seeming paradox shows, not that group random effects inference is unreliable, but that it is not always appropriate to ask which is the best model. As is usual in Bayesian inference it is wise to consider the full posterior density rather than just the single maximum posterior value. We can ask what is common to models two and three. Perhaps they share some structural assumption such as the existence of certain connections or other characteristic such as nonlinearity. If one were to group the models based on this characteristic then the inference about the characteristic would be robust. This notion of grouping models together is formalised using family-level inference which is described in the following section. One can then ask: of the models that have this characteristic what are the typical parameter values? This can be addressed using Bayesian Model Averaging within families.
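The arithmetic of this example can be checked with a short sketch. It adopts the simplification used in the text, namely that the posterior expected frequency of a model equals the proportion of subjects preferring it; the function and variable names are ours and purely illustrative.

```python
from fractions import Fraction

def expected_frequencies(preferences):
    """Proportion of subjects preferring each model.

    Assumes identical degrees of preference across subjects, so the
    expected model frequency reduces to a simple proportion.
    """
    total = sum(preferences.values())
    return {m: Fraction(count, total) for m, count in preferences.items()}

# First comparison set: models 1 and 2, preferred by 7 and 10 subjects.
r_A = expected_frequencies({1: 7, 2: 10})
# Enlarged comparison set: model 3 attracts four of model 2's ten subjects.
r_B = expected_frequencies({1: 7, 2: 6, 3: 4})

best_A = max(r_A, key=r_A.get)  # highest ranked model in the first set
best_B = max(r_B, key=r_B.get)  # highest ranked model in the enlarged set

# Grouping the two similar models (2 and 3) into one family restores the
# original proportion, so the family-level conclusion is unchanged.
family_23 = r_B[2] + r_B[3]
```

The best single model changes between the two comparison sets, whereas the summed frequency of the family containing models 2 and 3 is the same as that of model 2 in the original set.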

Family Inference

To implement family level inference one must specify which models belong to which families. This amounts to specifying a partition, $F$, which splits S into $K$ disjoint subsets. The subset $f_k$ contains all models belonging to family $k$ and there are $N_k$ models in the $k$th subset.

Different questions can be asked by specifying different partitions. For example, to test model space for the ‘effect of linearity’ one would specify a partition into linear and nonlinear subsets. One could then test the same model space for the ‘effect of seriality’ using a different partition comprising serial and parallel subsets. The subsets must be non-overlapping and their union must be equal to S. For example, when testing for effects of ‘seriality’, some models may be neither serial nor parallel; these models would then define a third subset.

The usefulness of the approach is that many models (perhaps all models) are used to answer (perhaps) all questions. This is similar to factorial experimental designs in psychology [36] where data from all cells are used to assess the strength of main effects and interactions. We now relate the two levels of inference: family and model.

Fixed effects

To avoid any unwanted bias in our inference we wish to have a uniform prior at the family level

$$p(f_k) = \frac{1}{K}$$
(18)

Given that this is related to the model level as

$$p(f_k) = \sum_{m \in f_k} p(m)$$
(19)

the uniform family prior can be implemented by setting

$$p(m) = \frac{1}{K N_k} \quad \forall m \in f_k$$
(20)

The posterior distribution over families is then given by summing up the relevant posterior model probabilities

$$p(f_k \mid Y) = \sum_{m \in f_k} p(m \mid Y)$$
(21)

where the posterior over models is given by equation 7. Because posterior probabilities can be very close to unity we will sometimes quote one minus the posterior probability. This is the combined probability of the alternative hypotheses, which we refer to as the alternative probability, [inline equation].
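The fixed effects family computations can be sketched as follows. This is a minimal illustration rather than the implementation used for the results: the family names, model labels and log evidences are hypothetical, and the model posterior is obtained by applying Bayes rule to the (group) log evidences and the model priors, which we assume is what equation 7 expresses.

```python
import math

def model_priors(families):
    """Model priors that make the family prior uniform: p(m) = 1 / (K * N_k)."""
    n_families = len(families)
    return {m: 1.0 / (n_families * len(members))
            for members in families.values() for m in members}

def model_posteriors(log_evidence, prior):
    """Bayes rule over the comparison set, computed stably in log space."""
    log_joint = {m: log_evidence[m] + math.log(prior[m]) for m in log_evidence}
    top = max(log_joint.values())
    weights = {m: math.exp(v - top) for m, v in log_joint.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def family_posteriors(families, posterior):
    """Family posterior: sum of posterior probabilities of the family's models."""
    return {k: sum(posterior[m] for m in members)
            for k, members in families.items()}

# Hypothetical partition with unequal family sizes and made-up log evidences.
families = {"linear": ["m1"], "nonlinear": ["m2", "m3", "m4"]}
log_evidence = {"m1": -100.0, "m2": -101.0, "m3": -101.0, "m4": -101.0}

prior = model_priors(families)
posterior = model_posteriors(log_evidence, prior)
family_post = family_posteriors(families, posterior)
alternative = {k: 1.0 - p for k, p in family_post.items()}
```

With uniform model priors the three similar nonlinear models would jointly receive three quarters of the prior mass; the family-based priors give each family one half, so here the linear family is favoured purely on the evidence.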

Random effects

The family probabilities are given by

$$s_k = \sum_{m \in f_k} r_m$$
(22)

where $s_k$ is the frequency of the family of models in the population. We define a prior distribution over this probability using a Dirichlet density

$$p(s) = \mathrm{Dir}(\gamma)$$
(23)

A uniform prior over family probabilities can be obtained by setting $\gamma_k = 1$ for all $k$. From equations 13 and 22 we see that this can be achieved by setting

$$\alpha_{prior}(m) = \frac{1}{N_k} \quad \forall m \in f_k$$
(24)

We can then run the Gibbs sampling method described above for drawing samples from the posterior density $p(r \mid Y)$. Samples from the family probability posterior, $p(s \mid Y)$, can then be computed using equation 22.

The posterior means, $\langle s_k \mid Y \rangle$, are readily computed from these samples. Another option is to compute an exceedance probability, $\varphi_k$, which corresponds to the belief that family $k$ is more likely than any other (of the $K$ families compared), given the data from all subjects:

$$\varphi_k = p(\forall j \neq k: s_k > s_j \mid Y)$$
(25)

Exceedance probabilities are particularly intuitive when comparing just two families as they can be written:

$$\varphi_1 = p(s_1 > s_2 \mid Y) = p(s_1 > 0.5 \mid Y)$$
(26)
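Given samples of the model frequencies, the family-level quantities follow by summation and counting. The sketch below is illustrative only: in place of output from the Gibbs sampler it draws model frequencies from a Dirichlet density with hypothetical parameters, and the partition into two families is likewise made up.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def family_samples(r_samples, families):
    """Family frequencies: sum the model frequencies within each family."""
    return [[sum(r[m] for m in members) for members in families]
            for r in r_samples]

def exceedance_probabilities(s_samples):
    """Fraction of samples in which each family has the largest frequency."""
    wins = [0] * len(s_samples[0])
    for s in s_samples:
        wins[s.index(max(s))] += 1
    return [w / len(s_samples) for w in wins]

rng = random.Random(0)
families = [[0, 1], [2, 3, 4, 5]]                 # hypothetical partition of six models
alpha_post = [4.5, 3.5, 1.25, 1.25, 1.25, 1.25]   # hypothetical stand-in for sampler output

r_samples = [sample_dirichlet(alpha_post, rng) for _ in range(5000)]
s_samples = family_samples(r_samples, families)
posterior_means = [sum(s[k] for s in s_samples) / len(s_samples)
                   for k in range(len(families))]
phi = exceedance_probabilities(s_samples)
```

With two families the exceedance probability of the first is simply the fraction of samples in which its frequency exceeds one half.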

Family level inference addresses the issue of ‘dilution’ in model selection [4]. If one uses uniform model priors and many models are similar, then excessive prior probability is allocated to this set of similar models. One way of avoiding this problem is to use priors which dilute the probability within subsets of similar models [4]. Grouping models into families, and setting model priors according to e.g. equation 24, achieves exactly this.

Bayesian Model Averaging

So far, we have dealt with inference on model-space, using partitions into families. We now consider inference on parameters. Usually, the key inference is on models, while the maximum a posteriori (MAP) estimates of parameters are reported to provide a quantitative interpretation of the best model (or family). Alternatively, people sometimes use subject-specific MAP estimates as summary statistics for classical inference at the group level. These applications require only a point (MAP) estimate. However for completeness, we now describe how to access the full posterior density on parameters, from which MAP estimates can be harvested.

The basic idea here is to use Bayesian model averaging within a family; in other words, summarise family-specific coupling parameters in a way that avoids brittle assumptions about any particular model. For example, the marginal posterior for subject $n$ and family $k$ is

$$p(\theta_n \mid Y, m \in f_k) = \sum_{m \in f_k} q(\theta_n \mid y_n, m)\, p(m_n \mid Y)$$
(27)

where $q(\theta_n \mid y_n, m)$ is our variational approximation to the subject specific posterior and $p(m_n \mid Y)$ is the posterior probability that subject $n$ uses model $m$. We could take this to be $p(m_n \mid Y) = p(m \mid Y)$ under the FFX assumption that all subjects use the same model, or [inline equation] under the RFX assumption that each subject uses their own model (see equation 14).

Finally, to provide a single posterior density over subjects one can define the parameters for an average subject

$$\theta = \frac{1}{N} \sum_{n=1}^{N} \theta_n$$
(28)

and compute the posterior density $p(\theta \mid Y)$ from the above relation and the individual subject posteriors from equation 27.

Equation 27 arises from a straightforward application of probability theory in which a marginal probability is computed by marginalising over quantities one is uninterested in (see also equation 4 for marginalising over parameters). Use of equation 27 in this context is known as Bayesian Model Averaging (BMA) [4],[37]. In neuroimaging BMA has previously been used for source reconstruction of MEG and EEG data [9]. We stress that no additional assumptions are required to implement equation 27.

One can make $f_k$ small or large. If we make $f_k = S$, the entire model-space, the posteriors on the parameters become conventional Bayesian model averages where [inline equation]. Conversely, if we make [inline equation], a single model, we get conventional parameter inference of the sort used when selecting the best model; i.e., [inline equation]. This is formally identical to using [inline equation] under the assumption that the posterior model density is a point mass at [inline equation]. More generally, we want to average within families of similar models that have been identified by inference on families.

One can see from equation 27 that models with low probability contribute little to the estimate of the marginal density. This property can be exploited to speed up the implementation of BMA by excluding low probability models from the summation. This can be implemented by including only models for which

$$\frac{p(m \mid Y)}{p(m_{MAP} \mid Y)} \geq \pi_{OCC}$$
(29)

where $\pi_{OCC}$ is the minimal posterior odds ratio. Models satisfying this criterion are said to be in Occam's window [38]. The number of models in the window, $N_{OCC}$, is a useful indicator as smaller values correspond to peakier posteriors. In this paper we use [inline equation]. We emphasise that the use of Occam's window is for computational expedience only.
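The windowing step can be sketched in a few lines. The posterior probabilities and the threshold below are hypothetical, and renormalising the retained probabilities is our assumption about how the truncated sum would be used.

```python
def occams_window(posterior, min_odds):
    """Keep models whose posterior odds relative to the best model are at
    least min_odds, then renormalise the retained probabilities."""
    best = max(posterior.values())
    kept = {m: p for m, p in posterior.items() if p / best >= min_odds}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}

# Hypothetical posterior model probabilities and threshold.
posterior = {"m1": 0.60, "m2": 0.30, "m3": 0.09, "m4": 0.01}
window = occams_window(posterior, min_odds=0.1)
n_in_window = len(window)
```

Here the model with odds of roughly 0.017 relative to the best model is excluded, and the number of retained models indicates how peaked the posterior is.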

Although it is fairly simple to compute the MAP estimates of the Bayesian parameter averages analytically, the full posteriors per se have a complicated form. This is because they are mixtures of Gaussians (and delta functions for models where some parameters are precluded a priori). This means the posteriors can be multimodal and are most simply evaluated by sampling. The sampling approach can be implemented as follows. This generates $i = 1, \dots, N_{samp}$ samples from the posterior density $p(\theta \mid Y)$. For each sample, $i$, and subject $n$ we first select a model as follows. For RFX we draw from

$$m_n \sim \mathrm{Mult}(g_n)$$
(30)

where the $m$th element of the vector $g_n$ is the posterior model probability for subject $n$, $g_{nm}$ (we will use the expected values from equation 14). For FFX the model probabilities are the same for all subjects and we draw from

$$m_n \sim \mathrm{Mult}(r)$$
(31)

where $r$ is the $M \times 1$ vector of posterior model probabilities with $m$th element equal to $p(m \mid Y)$. For each subject one then draws a single parameter vector, $\theta_{in}$, from the subject and model specific posterior

$$\theta_{in} \sim q(\theta_n \mid y_n, m_n)$$
(32)

These $N$ samples can then be averaged to produce a single sample

$$\theta_i = \frac{1}{N} \sum_{n=1}^{N} \theta_{in}$$
(33)

One then generates another sample by repeating steps 30/31, 32 and 33. The $N_{samp}$ samples then provide a sample-based representation of the posterior density $p(\theta \mid Y)$ from which the usual posterior means and exceedance probabilities can be derived. Model averaging can also be restricted to be within-subject (using equations 30/31 and 32 only). Summary statistics from the resulting within-subject densities can then be entered into standard random effects inference (e.g. using t-tests) [19].
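The sampling steps above can be sketched for a single scalar parameter. This is a simplified illustration under stated assumptions: each subject-and-model posterior is represented by a hypothetical (mean, standard deviation) pair, a standard deviation of zero stands for a model in which the parameter is fixed at zero, and all numbers are made up.

```python
import random

def bma_samples(subject_posteriors, model_probs, n_samples, rng):
    """Sample-based Bayesian model average for one scalar parameter.

    subject_posteriors[n][m] is a (mean, sd) pair for subject n under model m.
    model_probs[n] holds the model probabilities for subject n; for FFX
    these are identical across subjects, for RFX they are subject specific.
    """
    n_subjects = len(subject_posteriors)
    samples = []
    for _ in range(n_samples):
        total = 0.0
        for n in range(n_subjects):
            models = range(len(model_probs[n]))
            # Select a model for this subject (equations 30/31).
            m = rng.choices(models, weights=model_probs[n])[0]
            mean, sd = subject_posteriors[n][m]
            # Draw a parameter value from that model's posterior (equation 32).
            total += rng.gauss(mean, sd) if sd > 0 else mean
        # Average over subjects to obtain one sample (equation 33).
        samples.append(total / n_subjects)
    return samples

rng = random.Random(1)
# Hypothetical setting: model 0 fixes the connection at zero, model 1 estimates it.
subject_posteriors = [[(0.0, 0.0), (0.5, 0.1)] for _ in range(3)]
ffx_probs = [[0.2, 0.8] for _ in range(3)]

samples = bma_samples(subject_posteriors, ffx_probs, 4000, rng)
bma_mean = sum(samples) / len(samples)
```

Restricting the inner loop to one subject, and omitting the averaging step, gives the within-subject variant mentioned above.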

For any given parameter, some models assume that the parameter is zero. Other models allow it to be non-zero and its value is estimated. The posterior densities from equation 27 will therefore include a delta function at zero, the height of which corresponds to the posterior probability mass of models which assume that the parameter is zero. For the applications in this paper, the posterior densities from equation 27 will therefore correspond to a mixture of delta functions and Gaussians because $q(\theta_n \mid y_n, m)$ for DCMs has a Gaussian form. This is reminiscent of the model selection priors used in [39] but in our case we have posterior densities.

Results

We illustrate the methods using neuroimaging data from a previously published study on the cortical dynamics of intelligible speech [17]. This study applied dynamic causal modelling of fMRI responses to investigate activity among three key multimodal regions: the left posterior and anterior superior temporal sulcus (subsequently referred to as regions P and A respectively) and pars orbitalis of the inferior frontal gyrus (region F). The aim of the study was to see how connections among regions depended on whether the auditory input was intelligible speech or time-reversed speech. Full details of the experimental paradigm and imaging parameters are available in [17].

An example DCM is shown in figure 1. Other models varied as to which regions received direct input and which connections could be modulated by ‘speech intelligibility’. Given that each intrinsic connection can be either modulated or not, there are $2^6 = 64$ possible patterns of modulatory connections. Given that the auditory stimulus is either a direct input to a region or is not, there are $2^3 = 8$ possible patterns of input connectivity. But we discount models without any input so this leaves 7 input patterns. The 64 modulatory patterns were then crossed with the 7 input patterns producing a total of $64 \times 7 = 448$ different models. These models were fitted to data from a total of [inline equation] subjects (see [17] for details). Overall [inline equation] DCMs were fitted. The next two sections focus on family level inference. As this is a methodological paper we present results using both an FFX and RFX approach (ordinarily one would use either FFX or RFX alone).
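The size of this model space can be confirmed by enumeration. The sketch below assumes that the intrinsic connections are the six directed connections among the three regions; the ordering it produces is arbitrary and does not reproduce the model numbering used in this paper.

```python
from itertools import product

regions = ["P", "A", "F"]
# Directed connections between distinct regions.
connections = [(src, dst) for src in regions for dst in regions if src != dst]

# Each connection is either modulated (1) or not (0).
modulatory_patterns = list(product([0, 1], repeat=len(connections)))
# Each region either receives direct input or not; discard the no-input pattern.
input_patterns = [p for p in product([0, 1], repeat=len(regions)) if any(p)]

model_space = [(inp, mod) for inp in input_patterns for mod in modulatory_patterns]
```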

Input Regions

Our first family level inference concerns the pattern of input connectivity. To this end we assign each of the $M = 448$ models to one of $K = 7$ input pattern families. These are family A (models 1 to 64), F (65 to 128), P (129 to 192), AF (193 to 256), PA (257 to 320), PF (321 to 384) and PAF (385 to 448). Family PA, for example, has auditory inputs to both region P and A.

The first two numerical columns of Table 1 show the posterior family probabilities from an FFX analysis computed using equation 21. These are overwhelmingly in support of models in which region P alone receives auditory input (alternative probability [inline equation]). The last two columns in Table 1 show the corresponding posterior expectations and exceedance probabilities from an RFX analysis computed using equation 25. The conclusions from RFX analysis are less clear cut. But we can say, with high confidence (total exceedance probability, [inline equation]), that either region A alone or region P alone receives auditory input. Out of these two possibilities it is much more likely that region P alone receives auditory input (exceedance probability [inline equation]) rather than region A (exceedance probability [inline equation]). Figure 2 shows the posterior distributions $p(s_k \mid Y)$, from an RFX analysis, for each of the model families.

Figure 2
RFX posterior densities for input families.
Table 1
Inference over input families.

Forward versus Backward

Having established that auditory input most likely enters region $P$ we now turn to a family level inference regarding modulatory structure. For this inference we restrict our set of candidate models, $S$, to the 64 models receiving input to region $P$. We then assign each of these models to one of $K = 4$ modulatory families. These were specified by first defining a hierarchy with region P at the bottom, A in the middle and F at the top, in accordance with recent studies that tend to place F above A in the language hierarchy [40]. For each structure we then counted the number of forward, $n_F$, and backward, $n_B$, connections and defined the following families: predominantly forward (F, $n_F > n_B$), predominantly backward (B, $n_B > n_F$), balanced (BAL, $n_F = n_B$), or None.
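The assignment of modulatory structures to families can be sketched as follows. This illustration assumes that a connection is forward when it runs up the hierarchy (P below A below F) and backward when it runs down, and that family None contains the structure with no modulated connections; the names are ours.

```python
from collections import Counter
from itertools import product

LEVEL = {"P": 0, "A": 1, "F": 2}  # position in the assumed hierarchy
CONNECTIONS = [(src, dst) for src in LEVEL for dst in LEVEL if src != dst]

def modulatory_family(pattern):
    """Classify a pattern of modulated connections (one 0/1 flag per connection)."""
    n_forward = sum(on for (src, dst), on in zip(CONNECTIONS, pattern)
                    if LEVEL[dst] > LEVEL[src])
    n_backward = sum(on for (src, dst), on in zip(CONNECTIONS, pattern)
                     if LEVEL[dst] < LEVEL[src])
    if n_forward == 0 and n_backward == 0:
        return "None"
    if n_forward > n_backward:
        return "F"
    if n_backward > n_forward:
        return "B"
    return "BAL"

# Family sizes over all 64 modulatory patterns.
family_sizes = Counter(modulatory_family(p)
                       for p in product([0, 1], repeat=len(CONNECTIONS)))
```

Under these assumptions the families differ in size, which is the situation in which family-based model priors differ from uniform model priors.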

The first two numerical columns of Table 2 show the posterior family probabilities from an FFX analysis. We can say, with high confidence (total posterior probability, [inline equation]), that [inline equation]. The last two columns in Table 2 show the posterior expectations and exceedance probabilities from an RFX analysis. These were computed from the posterior densities shown in Figure 3. The conclusions we draw, in this case, are identical to those from the FFX analysis. That is, we can say, with high confidence (total exceedance probability, [inline equation]), that [inline equation].

Figure 3
RFX Posterior densities for modulatory families.
Table 2
Inference over modulatory families.

Relating Family and Model Levels

Family level posteriors are related to model level posteriors via summation over family members according to equation 21 for FFX and equation 22 for RFX. Figure 4 shows how the posterior probabilities over input families break down into posterior probabilities for individual models. Figure 5 shows the same for the modulatory families.

Figure 4
Model level inference for input families.
Figure 5
Model level inference for modulatory families.

The maximum posterior model for the input family inference is model number 185 having posterior probability [inline equation]. Given that all families have the same number of members, the model priors are uniform, so the maximum posterior model is also the one with highest aggregate model evidence. This model has input to region P and modulatory connections as shown in Figure 6(a).

Figure 6
Likely models.

The model evidence for the DCMs fitted in this paper was computed using the free energy approximation. This is to be contrasted with previous work in which (the most conservative of) AIC and BIC was used [17]. One notable difference arising from this distinction is that the top-ranked models in [17] contained significantly fewer connections than those in this paper (one sample t-test, [inline equation]). The top 10 models in [17] contained an average of 2.4 modulatory connections whereas those in this paper contained an average of 4.5. This difference reflects the fact that the AIC/BIC approximation to the log evidence penalizes models for each additional connection (parameter) without considering interdependencies or covariances amongst parameters, whereas the free energy approximation takes such dependencies into account.

Model Averaging

We now follow up the family-level inferences about input connections with Bayesian model averaging. As previously discussed, this is especially useful when the posterior model density is not sharply peaked, as is the case here (see Figure 4). All of the averaging results in this paper are obtained with an Occam's window defined using a minimal posterior odds ratio of [inline equation].

For FFX inference the input was inferred to enter region P only. We therefore restrict the averaging to those 64 models in family P. This produces 16 models in Occam's window (itself indicating that the posterior is not sharply peaked). The worst one is [inline equation] with [inline equation]. The posterior odds of the best relative to the worst is only [inline equation] (the largest it could be is [inline equation]), meaning these models are not significantly better than one another. Four of the models in Occam's window are shown in Figure 6. Figure 7 shows the posterior densities of average modulatory connections (averaging over models and subjects). The height of the delta functions in these histograms corresponds to the total posterior probability mass of models which assume that the connection is zero.

Figure 7
Average Modulatory Connections from FFX for input family P.

For RFX inference the input was inferred to most likely enter region P alone (posterior exceedance probability, [inline equation]). In the RFX model averaging the Occam's windowing procedure was specific to each subject, thus each subject can have a different number of models in Occam's window. For the input model P family there was an average of [inline equation] models in Occam's window and Figure 8 shows the posterior densities of the average modulatory connections (averaging over models and subjects). Both the RFX and FFX model averages within family P show that only connections from P to A, and from P to F, are facilitated by speech intelligibility.

Figure 8
Average Modulatory Connections from RFX for input family P.

Discussion

This paper has investigated the formal comparison of models using Bayesian model evidence. Previous application of the method in the biological sciences has focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. We have shown that this ‘best model’ approach, though useful when the number of models is small, can become brittle if there are a large number of models, and if different subjects use different models.

To overcome this shortcoming we have proposed the combination of two further approaches (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic one is interested in. Bayesian model averaging can then be used to provide a summary measure of likely parameter values for each family.

We have applied these approaches to neuroimaging data, specifically a DCM study of auditory word processing using fMRI. Our results indicate that spoken words most likely stimulate a region in posterior STS and that, if the word is intelligible, connections are strengthened to both anterior STS and an inferior frontal region. These conclusions were drawn based on family level inference and Bayesian model averaging.

The model evidence for the DCMs fitted in this paper was computed using the free energy approximation, whereas previous work used (the most conservative of) AIC and BIC [17]. As a result, the highly ranked models contained significantly more connections than in the previous study. This is due to a bias in the AIC/BIC criterion, which leads to overly simple models being selected. Previous work in graphical models favours the free energy approach over BIC [6], and work on biochemical models finds annealed importance sampling (AIS) to be the best of the more computationally expensive sampling methods. The relative merits of the different model selection criteria, as applied to brain imaging models and data, will be addressed in a future publication. The family level inference procedures described in this paper can be applied whatever method is used for estimating the model evidence.

Interestingly, the use of BMA produced an average network structure with speech input to region P, and modulatory connections from P to A and from P to F. This is exactly the winning model from earlier work [17] (based on the AIC/BIC approximation of model evidence). It is not, however, the best model as indicated by the free energy. The model with the highest free energy (see Figure 6(a)) does not have significantly higher evidence than the second best model, or indeed any model in Occam's window. This indicates that, in the particular example we have studied, the use of Bayes factors or posterior odds ratios would be inconclusive, whereas clear conclusions can be drawn from family level inference.

This paper has also introduced a Gibbs sampling method for RFX model level inference when the number of models is large. This sampling method should be preferred to the previously suggested VB method [19] when the number of models exceeds the number of subjects (i.e. [inline formula: pcbi.1000709.e295.jpg]). We do emphasise, however, that for RFX model level inferences involving a small number of models (as in previous work [19]) the VB approach is perfectly valid, and is indeed the preferred approach because it is faster.
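To make the idea of Gibbs sampling for RFX model level inference concrete, the following is a minimal sketch of one standard scheme for a Dirichlet-multinomial group model. It is an illustration under stated assumptions, not the implementation used in this paper: the prior counts `alpha0`, the function names, the chain lengths and the example log evidences are all hypothetical.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_rfx(log_ev, alpha0=None, n_samples=2000, burn_in=500, seed=0):
    """log_ev[n][k] is the log evidence of model k for subject n.

    Returns (posterior mean model frequencies, exceedance probabilities).
    """
    rng = random.Random(seed)
    n_models = len(log_ev[0])
    alpha0 = alpha0 or [1.0] * n_models      # assumed flat Dirichlet prior
    r = [1.0 / n_models] * n_models          # initial model frequencies
    mean_r = [0.0] * n_models
    wins = [0] * n_models
    for it in range(burn_in + n_samples):
        # Step 1: sample each subject's model assignment given r.
        counts = [0] * n_models
        for subj in log_ev:
            top = max(subj)                  # subtract max for stability
            w = [r[k] * math.exp(subj[k] - top) for k in range(n_models)]
            k = rng.choices(range(n_models), weights=w)[0]
            counts[k] += 1
        # Step 2: sample r given the assignment counts.
        r = sample_dirichlet([a + c for a, c in zip(alpha0, counts)], rng)
        if it >= burn_in:
            for k in range(n_models):
                mean_r[k] += r[k] / n_samples
            wins[r.index(max(r))] += 1       # which model is most frequent?
    xp = [w / n_samples for w in wins]
    return mean_r, xp

# Made-up data: 8 subjects favour model 0, 2 subjects favour model 1.
log_ev = [[0.0, -5.0, -5.0]] * 8 + [[-5.0, 0.0, -5.0]] * 2
mean_r, xp = gibbs_rfx(log_ev)
print(mean_r, xp)
```

The exceedance probability of a model is estimated here as the fraction of retained samples in which its frequency is the largest. Each sweep costs on the order of the number of subjects times the number of models, so the sketch remains usable when there are many models.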

The issue of family versus model level inference is orthogonal to the issue of random versus fixed effects analysis. The same criteria regarding FFX versus RFX apply at the family level as at the model level. For the data in this paper one might use RFX analysis, as auditory word processing is part of the high level language system and one might expect differences in the neuronal instantiation (e.g. lateralisation). If the issue remains unclear, one could adopt a more pragmatic approach by first implementing an FFX analysis and, if there appear to be outlying subjects, following this up with an RFX analysis.

Family level inferences under FFX assumptions are simple to implement. Families with (the same and) different numbers of models are accommodated by setting model priors using equation 20, model posteriors are computed using equation 7, and family level posteriors using equation 21. This is a simple non-iterative procedure. Family level inferences under RFX assumptions are more subtle and have been the main focus of this paper. Families with (equal and) unequal numbers of models are accommodated using the model priors in equation 24, model posteriors are computed using an iterative Gibbs sampling procedure, and family level posteriors are computed using equation 22. We envisage that family level inference under RFX assumptions will be particularly useful in neuroimaging studies of high level cognition or for clinical groups where there is a high degree of intersubject variability. Where subjects can be clearly divided into two or more groups on behavioural or other grounds (e.g. patients and controls), then it would be correct to group the models accordingly, and proceed with a between group analysis on selected parameters of the averaged models.
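The non-iterative FFX procedure summarised above can be sketched in a few lines. The sketch assumes that the model prior is uniform over families and then uniform over the models within each family, and that under FFX the log evidences add over subjects; the function name and the example values are hypothetical and are not taken from the equations cited above.

```python
import math

def ffx_family_posterior(log_ev, families):
    """log_ev: dict model -> list of per-subject log evidences.
    families: dict family -> list of model names (a partition of the models).
    Returns (model_posterior, family_posterior).
    """
    n_fam = len(families)
    # Assumed prior: 1/K per family, shared equally by its N_k models.
    log_prior = {m: -math.log(n_fam * len(ms))
                 for ms in families.values() for m in ms}
    # FFX: all subjects use the same model, so log evidences sum.
    log_joint = {m: sum(log_ev[m]) + log_prior[m] for m in log_prior}
    top = max(log_joint.values())            # subtract max for stability
    unnorm = {m: math.exp(v - top) for m, v in log_joint.items()}
    z = sum(unnorm.values())
    model_post = {m: u / z for m, u in unnorm.items()}
    # Family posterior: sum of posteriors of the models in the family.
    family_post = {f: sum(model_post[m] for m in ms)
                   for f, ms in families.items()}
    return model_post, family_post

# Two families of unequal size with identical (made-up) evidence per model.
families = {"A": ["a1"], "B": ["b1", "b2", "b3"]}
log_ev = {m: [-10.0, -12.0] for ms in families.values() for m in ms}
model_post, family_post = ffx_family_posterior(log_ev, families)
print(family_post)
```

With identical evidence for every model, the two families come out equally probable even though family B contains three times as many models, which is the purpose of balancing the model priors across families of unequal size.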

Finally, we comment on the broader issue of comparison of discrete models (the ‘Discrete’ approach adopted in this work) versus a hierarchical approach embodying Automatic Relevance Determination (ARD) in which irrelevant connections are ‘switched off’ during model fitting [41] (for the case of DCMs the ARD approach is currently hypothetical as no such algorithm has yet been implemented). The ARD approach provides an estimate of the marginal density [inline formula: pcbi.1000709.e296.jpg] directly without recourse to Bayesian model averaging. The Discrete approach allows for quantitative family-level inferences about issues such as whether processing is serial or parallel, linear or nonlinear. Additionally, Bayesian Model Averaging can be used with the Discrete approach to provide estimates of the marginal density [inline formula: pcbi.1000709.e297.jpg]. Overall, the ARD approach is probably the preferred method if one is solely interested in the marginal density over parameters, because it will likely be faster. If one is additionally interested in quantitative family-level inference then the Discrete approach would be the method of choice.

We expect that the comparison of model families will prove useful for a range of model comparison applications in biology, from connectivity models of brain imaging data, to behavioural models of learning and decision making, and dynamical models in molecular biology.

Supporting Information

Text S1

Supplementary Information

(0.08 MB PDF)

Acknowledgments

We thank Uta Noppeney and Dominik Bach for providing examples where the ranking of models from group random effects inference is critically dependent on the comparison set. We thank Nelson Trujillo-Barreto for discussions regarding dilution in model selection.

Footnotes

The authors have declared that no competing interests exist.

This work was supported by the Wellcome Trust. JD and KES also acknowledge support from Systems X, the Swiss Systems Biology Initiative and the University Research Priority Program “Foundations of Human Social Behavior” at the University of Zurich. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. Boca Raton: Chapman and Hall; 1995.
2. Bernardo J, Smith A. Bayesian Theory. Chichester: Wiley; 2000.
3. Mackay D. Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press; 2003.
4. Hoeting J, Madigan D, Raftery A, Volinsky C. Bayesian Model Averaging: A Tutorial. Statistical Science. 1999;14:382–417.
5. Penny W, Roberts S. Bayesian multivariate autoregressive models with structured priors. IEE Proceedings on Vision, Image and Signal Processing. 2002;149:33–41.
6. Beal M, Ghahramani Z. The Variational Bayesian EM algorithms for incomplete data: with application to scoring graphical model structures. In: Bernardo J, Bayarri M, Berger J, Dawid A, editors. Bayesian Statistics 7, Cambridge University Press; 2003.
7. Kemp C, Perfors A, Tenenbaum JB. Learning overhypotheses with hierarchical Bayesian models. Dev Sci. 2007;10:307–21.
8. Penny W, Kiebel S, Friston K. Variational Bayesian Inference for fMRI time series. NeuroImage. 2003;19:727–741.
9. Trujillo-Barreto N, Aubert-Vazquez E, Valdes-Sosa P. Bayesian model averaging in EEG/MEG imaging. NeuroImage. 2004;21:1300–1319.
10. Friston K, Harrison L, Daunizeau J, Kiebel S, Phillips C, et al. Multiple sparse priors for the M/EEG inverse problem. NeuroImage. 2008;39:1104–1120.
11. Penny W, Stephan K, Mechelli A, Friston K. Comparing Dynamic Causal Models. NeuroImage. 2004;22:1157–1172.
12. Girolami M. Bayesian inference for differential equations. Theoretical Computer Science. 2008;408:4–16.
13. Vyshemirsky V, Girolami M. Bayesian ranking of biochemical system models. Bioinformatics. 2008;24:833–9.
14. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf M. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface. 2009;6:187–202.
15. Acs F, Greenlee M. Connectivity modulation of early visual processing areas during covert and overt tracking tasks. NeuroImage. 2008;41:380–8.
16. Stephan K, Marshall J, Penny WD, Friston K, Fink G. Interhemispheric integration of visual processing during task-driven lateralization. Journal of Neuroscience. 2007;27:3512–3522.
17. Leff A, Schofield T, Stephan K, Crinion J, Friston K, et al. The cortical dynamics of intelligible speech. J Neurosci. 2008;28:13209–15.
18. Summerfield C, Koechlin E. A neural representation of prior information during perceptual inference. Neuron. 2008;59:336–47.
19. Stephan K, Penny W, Daunizeau J, Moran RJ, Friston KJ. Bayesian model selection for group studies. NeuroImage. 2009;46:1004–17.
20. Friston K, Harrison L, Penny W. Dynamic Causal Modelling. NeuroImage. 2003;19:1273–1302.
21. Friston K. Causal modelling and brain connectivity in functional magnetic resonance imaging. PLoS Biol. 2009;7:e1000033.
22. Daunizeau J, Kiebel SJ, Friston KJ. Dynamic causal modelling of distributed electromagnetic responses. NeuroImage. 2009;47:590–601.
23. Friston K. Bayesian estimation of dynamical systems: An application to fMRI. NeuroImage. 2002;16:513–530.
24. Buxton R, Uludag K, Dubowitz D, Liu T. Modelling the hemodynamic response to brain activation. NeuroImage. 2004;23:220–233.
25. Friston K, Mattout J, Trujillo-Barreto N, Ashburner J, Penny W. Variational free energy and the Laplace approximation. NeuroImage. 2007;34:220–234.
26. Chumbley J, Friston K, Fearn T, Kiebel S. A Metropolis-Hastings algorithm for dynamic causal models. NeuroImage. 2007;38:478–87.
27. Penny W, Kiebel S, Friston K. Variational Bayes. In: Friston K, Ashburner J, Kiebel S, Nichols T, Penny W, editors. Statistical Parametric Mapping: The analysis of functional brain images. London: Elsevier; 2006.
28. Beal M. Variational algorithms for approximate Bayesian inference. 2003. Ph.D. thesis, University College London.
29. Woolrich M, Behrens T, Smith S. Constrained linear basis sets for HRF modelling using Variational Bayes. NeuroImage. 2004;21:1748–1761.
30. Sato M, Yoshioka T, Kajihara S, Toyama K, Goda N, et al. Hierarchical Bayesian estimation for MEG inverse problem. NeuroImage. 2004;23:806–826.
31. Roberts S, Penny W. Variational Bayes for Generalised Autoregressive models. IEEE Transactions on Signal Processing. 2002;50:2245–2257.
32. Cronin B, Stevenson I, Sur M, Kording K. Hierarchical Bayesian modeling and Markov chain Monte Carlo sampling for tuning curve analysis. J Neurophysiol. 2010;103:591–602.
33. Neal RM. Annealed importance sampling. Statistics and Computing. 2001;11:125–139.
34. Raftery A. Bayesian model selection in social research. In: Marsden P, editor. Sociological Methodology. Cambridge, Mass: 1995. pp. 111–196.
35. Chen CC, Henson RN, Stephan KE, Kilner JM, Friston KJ. Forward and backward connections in the brain: a DCM study of functional asymmetries. NeuroImage. 2009;45:453–62.
36. Howell D. Statistical methods for psychology. Duxbury Press; 1992.
37. Penny W, Mattout J, Trujillo-Barreto N. Bayesian model selection and averaging. In: Friston K, Ashburner J, Kiebel S, Nichols T, Penny W, editors. Statistical Parametric Mapping: The analysis of functional brain images. London: Elsevier; 2006.
38. Madigan D, Raftery A. Model selection and accounting for uncertainty in graphical models using Occam's window. Journal of the American Statistical Association. 1994;89:1535–1546.
39. Clyde M, Parmigiani G, Vidakovic B. Multiple shrinkage and subset selection in wavelets. Biometrika. 1998;85:391–402.
40. Visser M, Jefferies E, Ralph MAL. Semantic Processing in the Anterior Temporal Lobes: A Meta-analysis of the Functional Neuroimaging Literature. J Cogn Neurosci: Epub ahead of print 2009
41. MacKay DJC. Bayesian non-linear modeling for the prediction competition. In: GR H, editor. Maximum Entropy and Bayesian method. Santa Barbara: Kluwer Academic Publisher; 1993. pp. 221–234.

Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science
