- Journal List
- NIHPA Author Manuscripts
- PMC3148097

# How each movement changes the next: an experimental and theoretical study of fast adaptive priors in reaching

## Abstract

Most voluntary actions rely on neural circuits that map sensory cues onto appropriate motor responses. One might expect that for everyday movements, like reaching, this mapping would remain stable over time, at least in the absence of error feedback. Here we describe a simple and novel psychophysical phenomenon in which recent experience shapes the statistical properties of reaching, independent of any movement errors. Specifically, when recent movements are made to targets near a particular location, subsequent movements to that location become less variable, but at the cost of increased bias for reaches to other targets. This process exhibits the variance-bias tradeoff that is a hallmark of Bayesian estimation. We provide evidence that this process reflects a fast, trial-by-trial learning of the prior distribution of targets. We also show that these results may reflect an emergent property of associative learning in neural circuits. We demonstrate that adding Hebbian (associative) learning to a model network for reach planning lead to a continuous modification of network connections that biases network dynamics toward activity patterns associated with recent inputs. This learning process quantitatively captures the key results of our experimental data in human subjects, including the effect that recent experience has on the variance-bias tradeoff. This network also provides a good approximation to a normative Bayesian estimator. These observations illustrate how associative learning can incorporate recent experience into ongoing computations in a statistically principled way.

Experience has a profound effect on nearly all human behavior. For goal-directed behaviors such as reaching, the neural circuits that map sensory cues onto motor output continuously adapt to maintain behavioral stability in the face of external perturbations or internal noise (Thoroughman and Shadmehr, 2000; Scheidt et al., 2001; Cheng and Sabes, 2007; Shadmehr and Mussa-Ivaldi 1994; Diedrichsen et al. 2010; Kluzik et al. 2008; Smith et al. 2006). There is evidence that these adaptive processes follow Bayesian statistical principles (Slijper et al., 2009; Kording and Wolpert, 2004; Wei and Kording 2010), so that behavior is guided by both current sensory signals and a prior expectation of those signals derived from experience. Similar observations have been made in a variety of perceptual and cognitive processes (Weiss et al. 2002; Miyazaki et al., 2005; Knill, 2007; Sato et al., 2007; Kersten et al., 2004; Lu et al., 2008; Lages and Heron, 2008), suggesting that Bayesian principles may capture a general property of neural computation. Despite this wealth of experimental evidence, it is not well understood how neural circuits could learn such priors from recent experience.

To better understand how behavior is shaped by recent actions, we investigated a novel form of experience-dependent learning using visually-guided reaching. We found that the sensorimotor system appears to maintain a prior expectation for movement planning that is continually updated based on the sequence of recent reaches. This phenomenon is consistent, at the qualitative level, with adaptive Bayesian estimation. We next explored our hypothesis that the activity of a sensorimotor network could itself create and maintain such priors via Hebbian learning. Specifically, we propose a model in which ongoing activity continuously modifies the structure of synaptic connections within a competitive neural network. These changes bias the network dynamics towards recent activity patterns, effectively creating a prior on the network computations. We show that this simple model accurately captures the results from several novel behavioral experiments, suggesting a potential mechanism for adaptive Bayesian estimation.

## Materials and Methods

### Experimental Procedures

A total of 24 healthy, right-handed participants were tested (10 female, age range: 18–32 years) Subjects were paid for their participation and were naïve to the purpose of the experiment. All the experimental procedures were approved by the UCSF Human Research Protection Program. Subjects performed a series of trials in which they reached to visual targets in a virtual feedback apparatus (Sober and Sabes, 2003;Fig 1a). At the beginning of each trial, subjects placed the tip of their right index finger at a central start location positioned approximately 29 cm in front of the midline of their chest. We used an arrow-field paradigm to guide their finger to the start location without providing visual information about absolute position (Sober and Sabes, 2005). After the start location was reached and after a variable delay (500–1500 ms), the target appeared 12 cm from the start location (unfilled circle, 15 mm in radius) and a “go tone” was played. Subjects were instructed to move as soon as possible and reach to put their finger in the center of the target circle. Once the finger had moved a quarter of the distance to the target, continuous feedback of finger position was displayed (filled white circle, 5 mm radius). Trials were terminated when the finger remained still in the target for 200 ms. At the end of each reach, participants received feedback in the form of a bonus score designed to encourage subjects to execute a quick, single and accurate reach. The score was based on both reaction time and the distance between the target center and the location where the finger first decelerated below 25 mm/sec. No bonus was given and a warning message appeared when the peak tangential velocity fell below 650 mm/sec or rose above 950 mm/sec.

**...**

Participants in Experiment 1 (n = 8, 3 female) were tested on eight blocks of 110 trials in a single session. Each block began with 10 context trials, followed by a randomized sequence of 80 context and 20 probe trials. Probe trial targets were fixed at *θ* = 150° (relative to rightward axis) for all trial blocks. For the context trials, target angles were selected from a different distribution in each trial block, either the “repeated target” condition with all trials at the probe-target location, a normal distribution of targets with standard deviation σ_{Target} = 1°, 2° 3°, 5°, 10° or 15°, or a uniform distribution of targets on the circle.

Participants in Experiment 2 (n=8, 3 female) were tested on six blocks of 90 trials in each of four experimental sessions. Each block began with 10 context targets, followed by two repetitions of each probe target (14 trials) randomized with 66 context trials. A single context-target distribution was used for each session, either the repeated target condition, a normal distribution with σ_{Target} = 7.5° or 15°, or a uniform target distribution. Seven probe target locations were used in this experiment. These were defined with respect to the repeated target location: *θ−θ*_{repeat} = 0°, ±30°, ±60° or ±90°. Half of the subjects were tested with *θ*_{repeat} = 150°, half with *θ*_{repeat} = 60°.

Participants in Experiment 3 (n=8, 4 female) were tested on 6 blocks of 120 trials. Targets were presented in sequential order, starting at 0° and proceeding in 3° increments around the circle. Trial blocks had either clockwise or counter-clockwise target rotation, 3 blocks per condition with order randomized.

### Data Analysis

Movement trajectories were obtained from the position of an infrared LED located on the right index fingertip. Generally, positional data were not smoothed before analysis. However on approximately 5% of trials there were missing data samples due to obstructed view of the fingertip LED. For these trials, the missing datapoints were interpolated using a cubic spline method (*spline* in Matlab). However, removing these trials appeared to have no qualitative affect on the final results.

The initial movement angle *θ*_{MV} was computed as the angle between fingertip position at the beginning of the reach and its position 100 ms after reach onset. This timing was chosen because it samples the movement before feedback-control mechanisms can affect reach dynamics. Since the variability of initial movement angles for a given target *θ* is always small compared to the full circle, circular statistics are not needed; therefore, the standard deviation of initial angles, *σ*_{MV}, was computed in the usual linear way; analysis with circular statistics yielded nearly identical results (not shown). For each subject, Grubbs’ test (Grubbs, 1950) was used to identify and exclude outlier trials from analysis (0.11% of trials excluded, with no more than three trials total excluded per subject). In order to quantify the movement error in the direction of the center probe target, *θ*_{repeat}, we defined the bias (toward *θ*_{repeat}) in terms of the angular errors for each pair of probe targets located at the same distance δ from the center target position,

In Experiment 3 we estimated the mean angular error, *θ _{MV}−θ_{targ}*, separately for blocks with clockwise and counter-clockwise target rotations.

### Normative Bayesian Model

We quantified the predictions of a simple Bayesian estimator (Figure 3) by computing the MAP (maximum *a posteriori)* estimate for a normal prior and likelihood with parameters fit to the experimental data. Bayes’ rule states

**...**

The true target *θ* gives rise to a sensory signal *x*, and we model the likelihood of *θ* given *x* as a normal distribution centered on *θ* with variance σ^{2}_{Likelihood}. The prior is modeled as a normal distribution with mean and variance σ^{2}_{Prior}. In this case, the peak of the posterior distribution, i.e. the MAP estimate of *θ*, is given by

where ${\sigma}_{\mathit{Posterior}}^{2}={({\sigma}_{\mathit{Prior}}^{-2}+{\sigma}_{\mathit{Likelihood}}^{-2})}^{-1}$. If we assume that the mean of the prior is known without uncertainty, then the variance of the MAP estimate is given by

In order to compare the normative Bayesian model with empirical data, we assume that the empirical initial angle is a direct readout of *θ _{MAP}* and that the prior distribution accurately reflects the distribution of context trial targets (i.e., =

*θ*

_{repeat}and ${\sigma}_{\mathit{Prior}}^{2}={\sigma}_{\mathit{Context}}^{2}$). The only remaining free parameter of the model is

*σ*, which we fit to the empirical data using Equation 4. This was done separately for each participant in Experiment 1 (maximum likelihood fit using

_{Likelihood}*fmincon*in Matlab). With this procedure, we obtained a mean

*σ*= 7.2°. We then predicted the mean bias in Experiment 2,

_{Likelihood}*θ*, using the Equation

_{MAP}−θNote, however, that this model and the fit value of *σ _{Likelihood}* are only meant to provide qualitative comparisons to the data, as the assumptions that go into the model, in particular that the prior variance matches the context variance, are unlikely to be correct, as described below.

### Adaptive Bayesian Model

The normative Bayes model described above does not address how priors are learned or the rate of this learning. In order to estimate subjects’ learning rates, we also implemented a simple adaptive version of the Bayesian model. For each trial *n*, the MAP estimate of the target location is computed using Equation 3, where the mean and variance of the prior are iteratively updated, given the current input *θ _{n}*, using the update rules

where *β* [0,1]β is the learning rate. The free parameters *β* and
${\sigma}_{\mathit{Likelihood}}^{2}$ were fit to the experimental data, minimizing the square error between the per-trial MAP estimates of target location and subjects’ movement errors (*fmincon* in Matlab). These fits were performed separately for each subject and each experimental session in Experiment 2 (excluding the Uniform condition). Cases where the fitted learning rate for a particular subject and session was either the maximum or minimum allowed value (0.999 and 0.001 respectively) were excluded from subsequent analysis (4 of 24 fits). With this procedure, the mean estimate for *σ _{Likelihood}* was 10.0° (6.9° standard deviation), slightly larger than that found with the normative Bayesian model.

### Adaptive Network—Model Architecture

The network architecture and dynamics largely follows an approach used previously (Deneve et al, 1999, 2001, 2007; Latham et al., 2003; Pouget et al., 2002) and were designed to yield line-attractor dynamics. The network consists of a single layer of 180 neurons that receive activation from both exogenous inputs and from lateral connections. On simulated trial *n*, the mean input activation to neuron *i* is a function of the difference between the target location *θ*(*n*) and the “preferred direction” of the neuron
${\theta}_{i}^{\ast}$:

where *ω* determines the spread of input activation across the population (full-width at half-max, or FWHM), *γ* is the input gain, and *I*_{0} is the baseline input rate. The preferred directions were evenly spaced at Δ= 2^{−} intervals. For network simulations, the dynamics of the network were iterated five times for each trial. The input activation for unit *i* on iteration *t* is given by *I _{i,t}* = |

*Ī*+

_{i}*η*|

_{i,t}_{+}, where |

*x*|

_{+}is the positive part of

*x*and the vector

*ξ*of noise terms is a mean-zero multivariate Guassian with covariance matrix

In words, the noise has a Fano factor *F* (ratio of variance to mean), a correlation coefficient *ρ* between nearest-neighbor units, and a correlation coefficient for other pairs that falls off with the distance between the two preferred directions, with a FWHM of . With the parameter values used in our simulations (see Table 1), we observed an average neuron-neuron correlation coefficient of 0.10 in the input activations.

In addition to the input activation, units received recurrent activation via lateral connections within the network. The initial connection strength between a pair of units is determined by the distance between the units’ preferred directions in the circular space:

where *ω _{L}* is the FWHM. On each iteration

*t*, the total activation for unit

*i*was then computed as the sum of the input and recurrent activations:

where *X _{i}*

_{,}

_{t}_{−1}are the firing rates on the previous iteration, with initial values

*X*

_{i}_{,0}= 0. Total activation was normalized on each iteration to obtain the new firing rates:

where the parameters *a* and *b* were set to the values 0.002 and 0.001, respectively.

After each iteration, the recurrent weights *W _{ij}* were updated using a normalized Hebbian learning rule (Oja 1982):

where *β* is the learning rate and *α* is the normalization parameter (“Oja’s alpha”).

At the end of each trial, the estimated movement vector (*θ _{MV}*) was calculated using a population vector decode (Georgopolous et al. 1988):

We selected to use five iterations per simulated trial because this decode converged after about 3–4 network iterations.

### Adaptive Network—Model Fitting

The network model parameters used in our simulations are shown in Table 1. These values were obtained by fitting the model to the context-dependent changes in movement variance observed experimentally. Specifically, we fit the model to the variance curves in Figure 2A (Experiment 1) and Figure 5A (Experiment 2) using the full-experiment simulations (see below); the parameters were selected to maximize the sum of the R^{2} values for each plot. For computational efficiency, this optimization was carried out in three steps. First, using a reduced network with N=90 neurons, we ran a large number of full-experiment simulations with random parameter values (approximately 3000 runs). We then qualitatively determined the subset of the parameters that did not correlate well with the goodness of fit: the baseline firing rate *I _{0}*, Fano factor

*F*, and the Oja’s α. In the second stage, we ran a large number of optimization runs for the remaining parameters on the N=90 network with different initial conditions, using a generic nonlinear optimization routine (Matlab’s

*fmincon*). From all these runs, we selected the best N=90 network. Finally, we increased the number of neurons from 90 to 180 and re-optimized the parameters from the best network, with the scale-dependent parameters

*τ*and

*β*adjusted inversely proportional to network size.

**...**

### Adaptive Network – Simulations

#### Block-wise simulations

We first measured the effect of context target distribution by independently simulating blocks of trials with different input distributions. Each simulation consisted of a series of 100 training trials with the inputs *θ*(*n*) drawn from a given distribution followed by a series of probe trials in which learning was turned off (*β*= 0). All context distributions were centered at 0°, which we refer to as the “repeated target angle”. Probe inputs were presented from −140° to +140° in 20° intervals and each input was repeated 100 times. From each simulation, we computed the bias and trial-by-trial variance of the network output *θ _{MV}* for the probe targets. For each context distribution, we repeated this simulation 50 times and computed the average variance and bias. The results are presented in Figure 2.

#### Full-experiment simulations

For each behavioral experiment, we ran the network with the exact sequence of targets that each human subject experienced. The sequence included both context and probe trials. In order to allow for some unlearning between trial blocks, the network weights were decayed back towards their initial values, ${W}_{ij}^{0}$ (Equation 12), after each block:

where the parameter *ε* determines the degree of unlearning. The network output was analyzed in the same manner as the experimental data.

### Comparison of Adaptive Network and Normative Bayesian Models

Lastly, we quantified how closely the network model approximates the normative Bayesian model. Specifically, we set out to ask whether the bias and variance of the network output changes as predicted by the Bayesian model as a function of the context and likelihood variances. The network was trained using the block-wise protocol described above with four different context distributions and tested with a series of probe targets and with different input gains, γ (Figure 8C–F).

**...**

We first equate output of the network to the MAP estimate of the Normative Bayesian model that it effectively implements. Thus, the output variance of the network for a given gain *γ* and context,
${\sigma}_{\mathit{Output}\mid \gamma ,{\sigma}_{\mathit{Context}}}^{2}$, is a measure of the
${\sigma}_{\mathit{MAP}}^{2}$ for those parameters. The effective Likelihood variance for gain *γ*,
${\stackrel{\u2322}{\sigma}}_{\mathit{Likelihood},\gamma}^{2}$, was thus determined by measuring the output variability of the network in the absence of a prior, i.e. with the network trained in the uniform context. The effective variance of the prior for a given training context,
${\stackrel{\u2322}{\sigma}}_{\mathit{Prior},{\sigma}_{\mathit{Context}}}^{2}$, was then computed from Equation 4 using the network simulations with gain *γ*= 60. In summary, the procedure for determining the effective parameters is given by the following two expressions:

We use these parameters to predict the variance and bias of the network for other combinations of gain and context variance (Figure 8C–F).

## Results

### How past actions shape future behavior

Experience-dependent changes in reach planning were evaluated using the well-studied paradigm of “center-out” reaching to visual targets that were arrayed radially about a fixed starting point (Figure 1A). Across blocks of trials, we manipulated the statistics of recent movements by varying the probability distributions of the target angles for the majority trials within a block, i.e., for the “context trials” (90 per block in Experiment 1, 76 in Experiment 2). If recent experience shapes the sensorimotor map, then we expected that changing the context target distribution should affect both the precision and accuracy reach planning. We evaluated such changes using a fixed set of “probe” targets that were randomly interleaved with the context trials (20 per block in Experiment 1, 14 in Experiment 2). Our principle measures of reach planning are the mean of the initial reach direction (θ_{MV}, for “movement vector”, Figure 1A) and its standard deviation across trials (σ_{MV}). These measures are made 100ms after movement onset, before feedback can affect the action (Desmurget and Grafton, 2000).

In the first experiment, we measured how the variance of initial reach directions to a single probe target (grey circle, Figure 1A) depended on the spread of context targets about the probe location. Eight different target distributions were used, ranging from the *Repeated* target condition, with a single target location for all probe and context reaches (Figure 1B), to the *Uniform* condition, with context trial targets selected uniformly about the circle (Figure 1D). The remaining conditions had normally distributed target angles, with the mean at the probe target and with different standard deviations (Figure 1C). We found that the variance of reaches to the probe target changed across conditions (repeated measures, F(7, 49) = 4.33, p < 0.001), with less variable contexts generally leading to less variable probe reaches (Figure 2A). These data show that the repetition of similar reaching movements improves performance on those actions, i.e. “practice makes perfect” on a short timescale.

In a second experiment, we measured how the distribution of context targets affects the mean movement error (bias) at an array of probe targets (white circles, Figure 1A). When context reaches are all made to a single target, in this case the center probe target (gray circle), movements to the other probe targets are strongly biased “inwards” towards the center position, compared to trial blocks with uniformly distributed context targets (Figure 2B). This bias is stronger for probe locations further from the center target position, and this effect attenuates as the distribution of context targets becomes more variable (repeated measures Context × Target interaction: F(9,63) = 5.20, p < 0.001). These results show that the reduced movement variability for repeated target directions comes at the cost of increased movement bias for other target directions.

### Experience-dependent changes, not cognitive strategy

We will argue below that the experience-dependent changes in reaching shown in Figure 2 are the result of an automatic learning process. However, it is also plausible that these effects reflect a high-level strategy, where subjects simply predict future target positions given recent history. To distinguish these possibilities, we conducted a third experiment in which future targets are predictable from recent reaches yet different from them. Within each block of 120 trials, targets were presented sequentially around the circle in either the clockwise (CW) or counter-clockwise (CCW) direction, stepping in 3° increments. Subjects were aware of this pattern and could have predicted the location of the next target, resulting in little or no direction-dependent errors. In contrast, if future movements are always biased towards recently presented targets, then subjects should demonstrate a CW bias during CCW trial blocks and vice versa. Indeed, significant direction-dependent errors were observed (T(7) = 3.28, p < 0.025; Figure 2C), indicating that these experience-dependent effects in Figure 2 are not the result of predictive, cognitive strategies.

### Bayesian Interpretation

The tradeoff we observed between variance and bias can be qualitatively understood within a Bayesian framework for reach planning. In this framework, sensory signals, *x*, give rise to a likelihood function for the current target position, *L*(*θ; x*). Following Bayes’ rule, this likelihood is combined with a prior expectation of the target, taking the form of a probability distribution *p*(*θ*), to yield a posterior distribution *p*(*θ*|*x*). The peak of *p*(*θ*|*x*) is the *maximum a posteriori* (MAP) estimate of the target location and is used to select the appropriate motor response (Fig, 3a; see Methods). If *p*(*θ*) is an adaptive prior, then the bias and variance of reaching will reflect recent movement statistics. For example, when repeated movements are made to the center probe target, there is an increasing expectation that future movements will also be made in that direction; i.e., the prior probability distribution tightens about the center target. A tighter prior decreases the variance of the MAP estimate, but also biases it towards the center target. The width of the prior distribution modulates the degree of these effects (Figure 3B,C).

The variance and bias effects of the simple Bayesian model follow the same trends as those observed experimentally, however, the detailed shapes of these curves are not the same (compare Figs. 3B,C with Figs. 2A,B). More sophisticated Bayesian estimation models can capture some of these details. For example, a “robust” Bayesian models (Kording and Wolpert, 2004; Knill, 2007) would predict that bias scales less than linearly with target distance, as seen in our data (Figure 2B). However, we show later that the details of Figure 2 can be largely explained as the result of an iterative learning process acting on the specific order of trials used in our experiments.

### Trial-by-trial learning rates

If the variance-bias tradeoff in Figure 2 arises from a process of adaptive Bayesian estimation, then we should be able to see the effects of learning evolve over time. Figure 4 illustrates how the average reach bias evolves for the ±90° probe targets over the course of an experimental session in Experiment 2. We did not observe a large increase in bias across the session, although there is a trend for a slight increase within the first trial block, at least for the repeated target and 15° contexts. While these data might seem inconsistent with an adaptive Bayesian model, it is important to note that before the first ±90° probe target of a trial block occurred, a minimum of 10 context trials and an average of over 22 context trials had already taken place. If learning occurs on relatively fast timescale, then it would have already approached the asymptotic value for that context by the time of the first probe trials.

In order to measure the effective learning rates in our experiments, we used a simple incremental learning algorithm to model an adaptive Bayesian estimator. After each trial, the algorithm updates its estimate of the prior distribution, specifically the mean and the variance
${\sigma}_{\mathit{Prior}}^{2}$, based on that trial’s target location (*Methods*, Equation 6). The model includes two free parameters, the Likelihood variance
${\sigma}_{\mathit{Likelihood}}^{2}$ and a learning rate *β* that determines the weight given to the last trial in update algorithm. In the limit of *β* =0, the system is not adaptive and the initial conditions for and
${\sigma}_{\mathit{Prior}}^{2}$ are used for all trials. In the limit of *β* =1, the estimates have no “memory”, e.g., the mean of the prior is set to the target on the previous trial. We fit this model separately to the data from each subject and session in Experiment 2 (excluding sessions with the Uniform context), minimizing the sum-squared prediction error for movement angle. The best-fit learning rates show a large degree of heterogeneity across subjects, with a median value of 0.25 and a positive skew (standard deviation = 0.27, mean = 0.29). Still, learning was generally fast. For example, with the median learning rate, the estimated mean of the prior would reach 66% of its asymptotic value within four trials. The presence of such fast learning rates explains why we see little change in the measured bias across a session (Figure 4).

### The effect of trial-by-trial learning on movement bias and variance

The adaptive Bayesian model provides a much better account of the mean bias data than the simple the normative model, predicting a lower magnitude for the bias and better capturing the dependence of the bias on probe distance (compare Figure 5A to Figure 3C). This difference is not due to the difference in fit likelihoods for the two models (mean σ = 7.2° for the normative model and σ = 10.0° for the adaptive model), since the larger mean variance used in the adaptive model would only increase the magnitude of the bias. Rather, the improvement in fit is due to the effects of trial-by-trial learning and the actual sequence of targets experienced by the subjects. In particular, the presence of the probe trials prevents the prior distribution from converging to the context target distribution.

The influence of trial-by-trial learning can also be seen in the variability of movements. A notable feature of the experimental data is that the reach variability in Experiment 2 depends significantly on the interaction between the context and the target location (Figure 5B; repeated measures Context-×-Target, F[9,63] = 2.26, p = 0.029). In the context of the normative Bayesian model, this result is unexpected, since the model predicts that reach variance should decrease monotonically with context variance, independent of target location (Equation 4 and Figure 3B). We observed the predicted trend at the repeated target location for both Experiment 1 (Figure 2A) and Experiment 2. However, the opposite trend was seen for probe targets further away from the repeated target location, i.e., an *increase* in reach variability was observed as the context distribution narrows (Figure 5B, solid lines). The increased variability is not simply due a large change in bias during the early phase of learning (see Figure 4), as the patterns are qualitatively unchanged when the data are analyzed separately for each of the six trial blocks in the session (not shown).

These results can be understood as the result of the trial-by-trial learning process. Under the normative Bayesian model (Equation 3), the MAP estimate of target location is given by the expression

where is the mean of the prior and *x* is the target estimate based on the sensory input alone. Learning-dependent fluctuations in and
${\sigma}_{\mathit{Prior}}^{2}$ would give rise to additional trial-by-trial variability in *θ _{MAP}*, beyond t hat predicted by the normative Bayesian model. If learning occurs on a short timescale, then such fluctuations are expected even after learning nominallyn asymptotes for a given context. The second term in Equation 16 is linear in the target distance −

*x*, and so one can show that the additional output variance due to fluctuations in ${\sigma}_{\mathit{Prior}}^{2}$ should increase quadratically with the target distance. Specifically, under the normative Bayesian model, the additional output variance due to trial-by-trial fluctuations in the prior is given by,

where θ reflects the true value of the target (i.e., the mean value of ** x**) and the parameters k

_{1}and k

_{2}only depend on the current values of the Likelihood and prior variances. As predicted, the experimental data show a roughly quadratic increase in movement variance with target location (Figure 5B, solid lines). Furthermore, the adaptive Bayesian model reproduces the variance effects seen in our data from Experiment 2 (Figure 5B, dashed lines).

The adaptive Bayesian model also provides a much better account of movement variance at the center target than the simple the normative model (compare Figure 5C to Figure 3B). While the dependence on context is comparable for the two models, only the adaptive Bayesian model provides an accurate estimate of the lower-bound on movement variance. Specifically, the simple normative model predicts no residual variability in the repeated target case, while the adaptive Bayesian model accurately predicts this value. Again, this difference is due to the effects of trial-by-trial learning. While the nominal target is at the same location on every trial, sensory noise injects trial-by-trial variability into the parameters of the prior distribution and prevents the variance of the prior from converging to zero.

Together, these results suggest that the context-dependent changes in reach variance and bias are indeed the effect of a trial-by-trial learning process. These findings also illustrate that the process of learning can itself be responsible for a large portion of the trial-by-trial variability observed in sensorimotor tasks (Cheng and Sabes 2006, 2007), even in a case such as this where the apparent goal of learning is the reduction of movement variability.

### Adaptive network model

While the adaptive Bayesian model captures many of the key features of the experimental data, it does not address the underlying mechanism for learning. In particular, we are interested in discovering candidate neural mechanisms that can link normative models of learning to the neural circuits that control behavior. Here we propose a parsimonious approach to adaptive Bayesian estimation within cortical sensorimotor circuits, similar to that proposed in previous studies (Wu et al. 2002, 2003; Wu and Amari 2005).

Consider a recurrently connected network of neurons (Figure 6A) with dynamics that allow it to efficiently extract information from noisy inputs (Pouget et al., 1998; Deneve et al., 1999, 2001; Latham et al. 2003). On each simulated trial, neurons receive input activation that is determined by the current target angle, the neuronal tuning curves (mean activation vs. target angle), and correlated noise (Wu and Amari, 2002), features that are consistent with physiological observations of sensorimotor cortex (e.g., Burnod et al., 1999; Georgopolous et al., 1986). The pattern of activity across the network is driven by both the input activation and recurrent activation between neurons with similar tuning curves. At the end of each trial (5 iterations of the network dynamics), the planned movement direction is read out from the pattern of activity across the network using a population vector decoder (Georgopolous et al., 1986,Georgopolous et al., 1988).

**...**

In order for the network to learn from experience, a normalized Hebbian learning rule (Hebb, 1949; Oja, 1982) is applied to the recurrent connections so that the changes in connectivity between any two units reflects the trial-by-trial correlations in their firing rates. On every trial this learning rule acts to strengthen connections that give rise to the pattern of activity associated with the current movement direction, slightly biasing the dynamics of the network towards that pattern. We expected that repeated presentations of a narrow range of targets would strengthen the associated patterns of activity, thereby creating an effective prior on subsequent trials.

We first tested this idea by examining whether the variance and bias of the network change with the statistics of recent experience in a manner similar to that observed experimentally. For each simulated trial block, the network was initialized with a weight structure that has been shown to yield nearly optimal outputs for the case of a flat prior, i.e. that approximates maximum likelihood estimation (Pouget et al., 1998; Latham et al. 2003), and with network parameters fit to the experimental data (see Methods). The network was then trained on a set of context trials with the input target angle drawn from one of the distributions used experimentally (see ‘Block-wise simulations’, Methods). After training, learning was turned off and a series of simulated probe trials was performed in order to measure the trial-by-trial variance and bias of the network output.

As expected, the distribution of training targets affects both the variance and bias of the network output. After training with repeated inputs to the same target, i.e., 0° σ, the network’s output variability is greatly reduced compared to the case of training with uniform targets, while intermediate training distributions lead to intermediate output variability (Figure 6B). Training with repeated inputs also resulted in a marked bias towards the repeated target (Figure 6C). These effects become smaller with wider training target distributions.

### Simulated experiments with the adaptive network model

While the block-wise network simulations in Figure 6 show the same variance and bias trends as the behavioral data, they do not account for the finer details of the data. This is perhaps to be expected. In the real experiment, learning is never “turned off”, so the probe trials themselves contribute to the learned effects. Also, the influences of learning can carry over between blocks, either because learning is slow compared to the timescale of a block or due to the presence of multiple timescales of sensorimotor learning (Smith et al., 2006; Kording et al., 2007; Avillac et al., 2005). This makes the results dependent on the specific ordering of trials and conditions.

This effect was particularly important for the variance experiment, because the distribution of context targets differed across blocks within the same experimental session. Thus, information learned in one block could carry over and influence performance in future blocks. We therefore conducted network simulations with the exact sequence of trials that subjects experienced in each session. Because subjects were allowed to rest between trial blocks and learned information may have been lost during this delay (Kording et al., 2007), we also allowed for some unlearning between simulated blocks, with the network weights partially decaying back to initial values.

With these full-experiment simulations, we found that the network was able to reproduce the human psychophysical data with good accuracy. The model matches much of the apparent “noise” in variance-by-context effects in Experiment 1 (Figure 7A), suggesting that these features can be explained by the specific sequence of trials and blocks used in our experiment. For example, the relatively low variance in the uniform context condition appears to be due to the fact that this condition was often presented in one of the last two blocks of the session, when the cumulative effects of learning were greatest. The model also naturally captures the pattern of context and target-dependent variability that we observed in Experiment 2 (Figure 7C). Finally, even though the network parameters were fit on the reach variance data, the network model is able to predict the pattern of reach biases that were observed in both Experiment 2 (Figure 7B) and Experiment 3 (Figure 7D).

**...**

The good match between the data and the adaptive network model for the variance in Experiment 2 (Figure 7C) suggests that the effective learning rate of the network model is also in the same range as that observed experimentally. We explored this issue further by looking at how the within-session changes in network bias compare to the changes in bias observed in Experiment 2 (Figure 7E). The network provides a good match to the data for wider context distributions. However, as the context variance becomes smaller, there are more pronounced learning-dependent increases in bias across trial block for the adaptive network model than for the experimental data, particularly during the repeated target condition. This difference is offset by a larger bias in the initial trial block for the experimental data, suggesting that the effective learning rate in the model is slower than that observed experimentally. As noted in the Discussion, we hypothesize that this difference results from the fact that the network does not include a sufficiently strong stabilizing force that would cause learning to asymptote.

### Adaptive Bayesian priors emerge from Hebbian learning

We have just shown that the network model can accurately emulate the experience-dependent changes in reach variance and reach that we observed behaviorally. In the network, these changes arise from modifications to the recurrent connections, represented by the matrix of connection weights, with matrix element *(i,j)* representing the connection strength from unit *j* to unit *i* (Figure 8A). In the Repeated target condition, the activity pattern representing the repeated, 0° target is reinforced by Hebbian learning on every trial, causing an increase in the connection strengths between units with PDs near this target (see Figure 8B). Similar, but attenuated, changes are apparent when the network is trained with broader target distributions centered at 0°. The enhancement in recurrent connectivity around the 0° unit effectively shifts the energy landscape of the network, deepening the basin of attraction in that region.

These learning-related changes in recurrent connections, and the resulting changes in network dynamics, alter the variance and bias of the network output. We characterized these effects with a range of input gains following training with different target distributions and then compared the network performance to the predictions of a matched normative Bayesian model. The matched value of σ^{2}_{Likelihood} for each input gain was determined from the output variance of the network trained on the uniform context condition (Figure 8c, gray curve). The matched values of σ^{2}_{Prior} were then determined from the network outputs in the 60 Hz input condition (Figure 8c, vertical gray bar). Details of these computations are given in the Methods.

Figure 8C shows the network output variance as a function of training context and testing input gain. At lower input gains, the network and Bayesian models diverge sharply. This difference arises from the fact that in the normative Bayesian model the prior distribution is known, and so there is no noise in estimating its mean (see Methods). Thus, when the input gain is low, σ^{2}_{Likelihood} is high and the MAP estimate is dominated by the noise-free prior. In contrast, the network never achieves a noise-free prior because there is persistent and stochastic baseline input (see Methods). As a result, the network exhibits monotonically increasing output variance as the input gain is reduced. If, however, we reduce the level of baseline noise (from 5Hz in the network simulation to 1 Hz), then the network model provides a very close approximation to the matched normative Bayesian model (Fig 8D). Furthermore, the same Bayesian model provides a close match to the network bias across training context and input gain (Figure 8E), even though the parameters of the Bayesian model were determined using only the network variance. These simulations show that the adaptive network model can provide a close approximation to an ideal Bayesian estimator.

## Discussion

We have shown that visually-guided reaching exhibits a novel form of experience-dependent learning where the statistics of recent movements affect future actions. This learning produces a variance-bias tradeoff that is qualitatively consistent with the process of Bayesian estimation. In particular, we show that repeated performance of movements with similar goal parameters (i.e. target locations) improves the precision of subsequent actions with these goals, resulting in a short-term version of “practice makes perfect.” However, this advantage comes at the cost of reduced accuracy for movements with dissimilar goals.

Others have reported evidence for Bayesian processes in sensorimotor control, primarily reflected as changes in movement bias (Kording and Wolpert, 2004; Miyazaki et al., 2005; Kording et al., 2007) or learning rates (Wei and Kording, 2010; Huang and Shadmehr, 2009). Here we show that such Bayesian affects can be observed even in the absence of the feedback perturbations used in previous studies (Kording and Wolpert, 2004; Wei and Kording, 2010). We show that changes in both movement bias *and* variance are consistent with the presence of an adaptive prior. In particular, an iteratively adaptive Bayesian model can capture many of the key features of our experimental data.

One limitation of the experimental design used here is that we cannot determine the stage at which these bias and variance effects occur, e.g., target selection, movement vector estimation or both (Sober and Sabes, 2003). A recent report by Diedrechsen at al. (2010) shows a potentially related form of “use-dependent learning.” In one of their experiments, passive movements were made to one side or another of an elongated target. This resulted in a bias toward the same side of the target during subsequent voluntary movements. Because their “training” movements were passive and the target was not precisely defined, it seems likely that the effect is movement-dependent, not target-dependent. To the extent that the same mechanism is at play in our study, our effect is also likely to be at least partly movement-dependent.

With the adaptive Bayesian model we have argued that these effects are the result of a trial-by-trial learning process. With this model we were able to estimate the rate of learning and illustrate features of the bias and variance effects that likely result from the presence of trial-by-trial learning. We then showed that Hebbian learning in a simple network model makes a plausible candidate for the mechanism underlying this adaptive estimation process, providing a quantitative match to the experience-dependent changes in movement variance and bias that we observed experimentally (Figure 7A–D). It is important to note that the adaptive Bayesian model and the network model serve very different purposes in this paper. Therefore they were fit differently to the data and have different numbers of free parameters. Thus, the goodness-of-fit of the two models cannot be directly compared and we do not claim that one provides a better account of the data than the other. An important point of comparison, however, is the effective learning rates of the two models. The learning rates estimated from adaptive Bayesian model are fast, consistent with the evolution of the reach bias across blocks in Experiment 2. In contrast, the best-fit network model appears to have a slower effective learning rate (Figure 7E). This slower rate is most likely due to the fact that the model does not include a sufficiently strong stabilizing force that causes learning to asymptote in the way the empirical learning does (Figure 4). This may simply result from the choice of learning rule: Oja’s rule guarantees stability for the square-norm of the weights converging onto a cell, but only in the asymptotic limit. A detailed study of weight normalization in this model and its effects on trial-by-trial learning are left to future work.

Finally, we have shown that the network model can implement a close approximation to a normative Bayesian estimator. Other network models have been proposed for how Bayesian priors can be incorporated into cortical networks (Pouget et al., 2003; Deneve and Pouget 2004; Ma et al., 2006; Wu et al., 2003; Wu and Amari, 2005). In many of these models, priors are explicitly represented by a separate set of units whose inputs act as an additional source of information. The adaptive network approach offers two advantages. First, the prior naturally emerges within the network connections, removing the need for another set of units to encode this information. Second, Hebbian learning provides a simple mechanism by which these priors can be learned. A similar approach has been previously explored by Wu and colleagues (Wu et al., 2003; Wu and Amari, 2005). They showed that Hebbian learning in a very similar network model acts to smooth stochastic input signals over time, reducing output variance when the input is stationary at the cost of slow reaction to changing inputs (i.e. increased bias). This smoothing approximates a form of Bayesian estimation, a simple Kalman filter. We have extended these results, showing that a recurrent network with Hebbian learning approximates Bayesian estimation across a range input statistics and on behaviorally relevant time-scales. Furthermore, we showed that the network output provides a good quantitative match to psychophysical data. Overall, these results suggest that prior experience may be incorporated into sensorimotor planning via Hebbian learning and as a natural byproduct of the ongoing dynamics of the underlying neural circuits.

## Acknowledgments

We would like to thank Charles Biddle-Snead for help with the network models. This project was funded by NIH grant P50 MH077970 (Conte Center) and the Swartz Foundation.

## Footnotes

**Conflicts of Interest:** Neither TV nor PS report any conflicts of interests in regards to this study.

## Reference List

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (1.7M) |
- Citation

- Striatal prediction error modulates cortical coupling.[J Neurosci. 2010]
*den Ouden HE, Daunizeau J, Roiser J, Friston KJ, Stephan KE.**J Neurosci. 2010 Mar 3; 30(9):3210-9.* - Learning with slight forgetting optimizes sensorimotor transformation in redundant motor systems.[PLoS Comput Biol. 2012]
*Hirashima M, Nozaki D.**PLoS Comput Biol. 2012; 8(6):e1002590. Epub 2012 Jun 28.* - The generalization of visuomotor learning to untrained movements and movement sequences based on movement vector and goal location remapping.[J Neurosci. 2013]
*Wu HG, Smith MA.**J Neurosci. 2013 Jun 26; 33(26):10772-89.* - Dynamic neural correlates of motor error monitoring and adaptation during trial-to-trial learning.[J Neurosci. 2014]
*Tan H, Jenkinson N, Brown P.**J Neurosci. 2014 Apr 16; 34(16):5678-88.* - Parieto-frontal coding of reaching: an integrated framework.[Exp Brain Res. 1999]
*Burnod Y, Baraduc P, Battaglia-Mayer A, Guigon E, Koechlin E, Ferraina S, Lacquaniti F, Caminiti R.**Exp Brain Res. 1999 Dec; 129(3):325-46.*

- Visuomotor adaptation needs a validation of prediction error by feedback error[Frontiers in Human Neuroscience. ]
*Gaveau V, Prablanc C, Laurent D, Rossetti Y, Priot AE.**Frontiers in Human Neuroscience. 8880* - Bayesian Action&Perception: Representing the World in the Brain[Frontiers in Neuroscience. ]
*Loeb GE, Fishel JA.**Frontiers in Neuroscience. 8341* - Cerebellar and Prefrontal Cortex Contributions to Adaptation, Strategies, and Reinforcement Learning[Progress in brain research. 2014]
*Taylor JA, Ivry RB.**Progress in brain research. 2014; 210217-253* - Visual target distance, but not visual cursor path length produces shifts in motor behavior[Frontiers in Psychology. ]
*Wendker N, Sack OS, Sutter C.**Frontiers in Psychology. 5225* - Decay of motor memories in the absence of error[The Journal of neuroscience : the official ...]
*Vaswani PA, Shadmehr R.**The Journal of neuroscience : the official journal of the Society for Neuroscience. 2013 May 1; 33(18)7700-7709*

- PubMedPubMedPubMed citations for these articles

- How each movement changes the next: an experimental and theoretical study of fas...How each movement changes the next: an experimental and theoretical study of fast adaptive priors in reachingNIHPA Author Manuscripts. 2011 Jul 6; 31(27)10050

Your browsing activity is empty.

Activity recording is turned off.

See more...