
# Attention stabilizes the shared gain of V4 populations.

### Author information

- 1 Center for Neural Science, Howard Hughes Medical Institute, New York University, New York, United States.
- 2 Department of Neuroscience and Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, United States.

### Abstract

Responses of sensory neurons represent stimulus information, but are also influenced by internal state. For example, when monkeys direct their attention to a visual stimulus, the response gain of specific subsets of neurons in visual cortex changes. Here, we develop a functional model of population activity to investigate the structure of this effect. We fit the model to the spiking activity of bilateral neural populations in area V4, recorded while the animal performed a stimulus discrimination task under spatial attention. The model reveals four separate time-varying shared modulatory signals, the dominant two of which each target task-relevant neurons in one hemisphere. In attention-directed conditions, the associated shared modulatory signal decreases in variance. This finding provides an interpretable and parsimonious explanation for previous observations that attention reduces variability and noise correlations of sensory neurons. Finally, the recovered modulatory signals reflect previous reward, and are predictive of subsequent choice behavior.

#### KEYWORDS:

attention; computation; computational biology; neuroscience; sensory; statistics; systems biology; vision

- PMID: 26523390
- PMCID: PMC4758958
- DOI: 10.7554/eLife.08998

- [Indexed for MEDLINE]

**DOI:**http://dx.doi.org/10.7554/eLife.08998.003

**DOI:**http://dx.doi.org/10.7554/eLife.08998.004

*σ*², the Fano factor (variance divided by the mean) should increase with firing rate, *μ*. On the other hand, for an additive noise source of variance *σ*², the Fano factor should *decrease* with firing rate. These expected trends are illustrated in panel (**a**). Panel (**b**) shows the mean value of these quantities, estimated for the cue-towards condition across all cells. The data are clearly consistent with a multiplicative noise source. A similar trend is observed in the cue-away condition, albeit with an overall lower mean rate, and higher Fano factor. This analysis assumes that the noise source has constant variance across stimulus conditions. There were some small, non-monotonic changes in the estimated shared modulators’ variance over the stimulus sequence. Factoring these in does not change the direction of the predictions or the data shown here. A similar analysis can be performed on pairwise response statistics (as stimuli evoke higher mean rates, a multiplicative model predicts correlations will increase, while an additive model predicts correlations will decrease). But these predictions prove more sensitive to the assumption of stability in *σ*², so we omit them here.
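The opposite trends for multiplicative and additive noise can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions not stated in the caption: spike counts are Poisson, the multiplicative source is a gain with mean 1 and variance σ² that scales the rate, and the additive source is a zero-mean term with variance σ² added to the rate. Under those assumptions the law of total variance gives a Fano factor of 1 + σ²μ in the multiplicative case and 1 + σ²/μ in the additive case. The function names are hypothetical.

```python
import numpy as np

def fano_multiplicative(mu, sigma2):
    """Fano factor for Poisson counts with rate g * mu (assumed model).

    With E[g] = 1 and Var[g] = sigma2, the count variance is
    mu + sigma2 * mu**2, so variance / mean = 1 + sigma2 * mu.
    """
    mu = np.asarray(mu, dtype=float)
    return 1.0 + sigma2 * mu

def fano_additive(mu, sigma2):
    """Fano factor for Poisson counts with rate mu + a (assumed model).

    With E[a] = 0 and Var[a] = sigma2, the count variance is
    mu + sigma2, so variance / mean = 1 + sigma2 / mu.
    """
    mu = np.asarray(mu, dtype=float)
    return 1.0 + sigma2 / mu

# Illustrative firing rates (mean counts) and noise variances.
rates = np.array([2.0, 5.0, 10.0, 20.0])
fano_mult = fano_multiplicative(rates, sigma2=0.05)  # rises with rate
fano_add = fano_additive(rates, sigma2=2.0)          # falls with rate
```

The values of `sigma2` here are arbitrary; only the direction of each trend matters for the comparison described above.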

**DOI:**http://dx.doi.org/10.7554/eLife.08998.005

(**a**) Performance comparison of various submodels, measured as log-likelihood (LL) of predictions on held-out data. Values are expressed relative to performance of a stimulus-drive-only model (leftmost point), and increase as each model component (cue, slow drift, and different numbers of shared modulators) is incorporated. The grey square shows the predictive LL for a two-modulator model, with each modulator constrained to affect only one hemisphere (i.e. with coupling weights set to zero for neurons in the other hemisphere). This restricted model is used for all results from onwards, excepting the fine temporal analysis of .

(**b**) Modulators are anatomically selective. Inferred coupling weights for a two-modulator model, fit to a population of units recorded on one day. Each point corresponds to one unit. As the model does not uniquely define the coordinate system (i.e. there is an equivalent model for any rotation of the coordinate system), we align the mean weight for LHS units to lie along the positive x-axis (see Materials and methods).

(**c**) Distribution of inferred coupling weights aggregated over all recording days indicates that each shared modulator provides input primarily to cells in one hemisphere.

(**d**) Hemispheric modulators are functionally selective. Units which are better able to discriminate standard and target stimuli in the cue-away condition have larger coupling weights (blue line). Discriminability is estimated as the difference in mean spike count between standard and target stimuli, divided by the square root of their average variance (*d*′). Values are averaged over units recorded on all days, subdivided into five groups based on their coupling weights. Shaded area denotes ±1 standard error. Pearson correlation over all units is *r* = 0.42. This relationship is not seen for the weights that couple neurons to the slow global drift signal (gray line, Pearson correlation *r* = 0.00). The relationship between *d*′ and cue weight is significant, but weaker than for modulator weight (*r* = 0.24); this is not shown here as the cue weights are differently scaled.

(**e**) Same as in (**d**), but with units subdivided into subgroups according to mean firing rate. Each line represents a subpopulation of ∼500 units with similar firing rates (from red to blue: 0–7; 7–12; 12–17; 17–25; 25–35; 35–107 spikes/s). Within each group, the Pearson correlations between *d*′ and coupling weight are between 0.2 and 0.3, but the correlations between mean rate and coupling weight are weak or negligible.
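The discriminability measure described in panel (**d**) can be written out directly. This is a minimal sketch, not the authors' code: the function name is hypothetical, the sign convention (target minus standard) is an assumption, and so is the use of the unbiased sample variance.

```python
import numpy as np

def d_prime(counts_standard, counts_target):
    """Difference in mean spike count divided by the square root of the
    average of the two variances.

    Assumptions: sign is target minus standard, and variances use the
    unbiased estimator (ddof=1).
    """
    standard = np.asarray(counts_standard, dtype=float)
    target = np.asarray(counts_target, dtype=float)
    mean_difference = target.mean() - standard.mean()
    average_variance = 0.5 * (standard.var(ddof=1) + target.var(ddof=1))
    return mean_difference / np.sqrt(average_variance)

# Toy spike counts for one unit: means 2 and 4, variance 1 in each condition.
example_d_prime = d_prime([1, 2, 3], [3, 4, 5])
```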

**DOI:**http://dx.doi.org/10.7554/eLife.08998.006

*F*, the cue-dependent gains *C*, and the slow global drifts *D*. Then, for a given number of simulated modulators *K* (from 1 to 12), we sampled a (*T* × *K*) matrix of time-varying modulator values (each value i.i.d. Gaussian), and a (*K* × *N*) matrix of random weights (each weight i.i.d. Gaussian), producing a net modulator matrix *M* as the matrix product of these two. We then sampled spike counts *Y* from the generative model, . For each population and *K*, we chose the scaling parameter *λ* such that the median noise correlation between the simulated neurons matched the median noise correlation between the actual neurons.

We next fitted the models to these simulated datasets to see how well they recovered the underlying structure. As for the actual data (), we evaluated the model fits via the predictive log-likelihood on held-out data. Each colored line shows the predictive LLs of the fitted models for a given “ground truth” number of modulators. In comparison, the grey squares show the model performance for the actual data.

There are two important patterns here. First, for simulated models containing up to 8 modulators, the predictive LLs are greatest when we fit a model having the same number of modulators as the ground truth number used to simulate the data. This demonstrates that the model is in principle able to recover more modulators than the 4 we fit to the actual data. Second, as the number of simulated modulators increases, the ability of the fitted models to make predictions on held-out data declines. This is because the total energy of the shared gain fluctuations is constrained by the measured noise correlations, and is spread amongst the simulated modulators. In this respect, the model predictions on the actual data are most consistent with simulations of 3 or 4 modulators.

Finally, it is worth noting that, in simulation, when fitting more modulators than the ground truth, the predictive performance suffers. This reflects overfitting to the noise in the training set. We do not see as pronounced a decline for the model fits to the actual data: instead, the predictive LLs appear to saturate with the number of modulators. This difference between the actual and synthetic data likely reflects our assumption in the simulations that the modulators were all of equal magnitude. A saturation of predictive LL may arise when there is a small set of dominant modulators, and a number of weaker ones. We do not have the statistical power to explore such a long tail of modulatory influences within this dataset, and focus instead on the strongest components.
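The sampling step of this simulation can be sketched as follows. The generative model itself is not reproduced in the caption, so the form used here is an assumption: Poisson counts whose rate is the stimulus drive multiplied by an exponentiated, scaled net modulator. The cue gains *C* and slow drifts *D* are left out for brevity, and `simulate_counts` and its arguments are hypothetical names.

```python
import numpy as np

def simulate_counts(F, K, lam, rng):
    """Sample spike counts driven by K shared modulators.

    F   : (T, N) array of stimulus-driven mean counts (hypothetical input)
    K   : number of simulated modulators
    lam : scaling applied to the net modulator (assumed to act inside
          the exponent)
    """
    T, N = F.shape
    modulators = rng.standard_normal((T, K))  # time-varying values, i.i.d. Gaussian
    weights = rng.standard_normal((K, N))     # coupling weights, i.i.d. Gaussian
    M = modulators @ weights                  # net modulator matrix, shape (T, N)
    rate = F * np.exp(lam * M)                # assumed multiplicative gain
    return rng.poisson(rate), M

rng = np.random.default_rng(0)
F = np.full((2000, 6), 5.0)                   # constant drive, for illustration
Y, M = simulate_counts(F, K=2, lam=0.2, rng=rng)
```

In a full replication, `lam` would be tuned per population and per *K* until the median noise correlation of `Y` matched that of the recorded data, as the caption describes.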

**DOI:**http://dx.doi.org/10.7554/eLife.08998.007

*π*/2 indicates near orthogonality of the hemispheric weights.

Second column: In , we showed that neurons which were task-relevant (i.e. had larger *d*′ values) were more strongly coupled to the (1D) hemispheric shared modulators. Here, we show that this holds in higher dimensions. For each recording day, we measured the magnitudes of all units’ coupling weight vectors, . Green histograms show the distribution of magnitudes for the quartile of units with largest *d*′ values; brown histograms show the distribution of magnitudes for the quartile with the smallest *d*′ values.

Third column: In , we showed that the variance of the (1D) hemispheric shared modulators changed according to the attentional cue: specifically, when the cue switched, one hemispheric modulator decreased in variance, while the other increased in variance. To show that this holds in higher dimensions, it is necessary to construct an appropriate metric for this change in second-order statistics that generalizes to higher dimensions, and that also does not depend on a choice of coordinate system. To accomplish this, we measure the effect of the attentional cue as a change in the covariance of the (multivariate) modulator. Considering the change from the cue-right to the cue-left condition, we can measure the effect on the modulator’s second-order statistics via the ratio of the two modulator covariances, . The eigenvalues of this matrix then provide a coordinate-system-free measure of how the modulator statistics change. If the largest eigenvalue, *λ*_{max}, is significantly greater than 1, then there is a direction in modulation space that became more variable due to the switch in cue. If the smallest eigenvalue, *λ*_{min}, is significantly less than 1, then there is a direction in modulation space that became less variable due to the switch in cue. Eigenvalues close to 1 indicate that the variance of modulation in that direction was unchanged by the cue. Thus these two values, *λ*_{max} and *λ*_{min}, play an analogous role to the ratios of modulator variance examined in . The scatter plots show the distribution of *λ*_{max} and *λ*_{min} for the higher-dimensional modulator models. Blue points show these eigenvalues from each recording day; red points show the distributions obtained if we shuffle the cue labels for each trial. Importantly, when *λ*_{max} exceeds 1 and *λ*_{min} is less than 1 (i.e. when the points lie in the lower right quadrant), then the change in attentional cue is causing an increase in modulator variance in one direction, and a decrease in modulator variance in an orthogonal direction. These effects are clear (and significant, compared with the null distribution in red) in all cases.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.008
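The eigenvalue measure described for the third column can be illustrated with a few lines of NumPy. The exact form of the covariance ratio is not reproduced in the caption, so this sketch assumes it is the cue-left covariance multiplied by the inverse of the cue-right covariance; the function name and inputs are hypothetical.

```python
import numpy as np

def covariance_ratio_eigenvalues(mods_cue_right, mods_cue_left):
    """Return (lambda_min, lambda_max) of the assumed covariance ratio.

    Each input is a (trials, K) array of modulator values from one cue
    condition. The eigenvalues of inv(cov_right) @ cov_left equal those
    of cov_left @ inv(cov_right) and are real and positive when both
    covariances are positive definite.
    """
    cov_right = np.atleast_2d(np.cov(mods_cue_right, rowvar=False))
    cov_left = np.atleast_2d(np.cov(mods_cue_left, rowvar=False))
    eigenvalues = np.linalg.eigvals(np.linalg.solve(cov_right, cov_left))
    eigenvalues = np.sort(eigenvalues.real)
    return eigenvalues[0], eigenvalues[-1]

# Toy data: switching the cue doubles the spread along the first axis
# and halves it along the second, so variances change by 4x and 0.25x.
cue_right = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
cue_left = cue_right * np.array([2.0, 0.5])
lam_min, lam_max = covariance_ratio_eigenvalues(cue_right, cue_left)
```

Because the eigenvalues are unchanged by any invertible linear change of coordinates applied to both conditions, the measure does not depend on how the modulator axes are oriented, which is the property the caption asks for.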

**DOI:**http://dx.doi.org/10.7554/eLife.08998.009

*r*² = 0.21).

**DOI:**http://dx.doi.org/10.7554/eLife.08998.010

(**a**) Example values of the cue signal (imposed by experiment), the slow drift (inferred), and a single hemispheric modulator (inferred) across stimulus presentations for one day and hemisphere. In the model, the gain of each neuron is obtained by exponentiating a weighted sum of these three signals (see ). Histogram in the bottom left shows the distribution of modulator values when the monkey was cued towards the contralateral side (blue), and away from it (red).

(**b**) Modulator variance decreases under cued attention. Histogram shows the ratio of modulator variances estimated in the two cue conditions. Averaged across days and hemispheres, cued attention reduces modulator variance by 23%.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.011

(**a**) The classical model of attention. Simulation of two neurons with positive coupling weights, and , to the cue signal. When the cue is directed to the corresponding spatial location (top), both the mean and variance of the simulated neurons’ spike counts increase (bottom). Shaded areas demarcate analytic iso-density contours, i.e. the shape of the joint spike count distributions.

(**b**) The effect of the shared modulator. Simulation of two neurons with positive coupling weights, and , to a shared modulator. A decrease in modulator variance leads to a decrease in both the variance and correlation of spike counts (bottom).

(**c**) Effects on an example pair of units within the same hemisphere, on one day of recording. The cue increases the gain of both cells (numbers indicate cue coupling weights), and the inferred modulator exhibits a decreased variance in cued trials (again numbers indicate coupling weights; top). The spiking responses of the cells exhibit a combination of the effects simulated in (**a**) and (**b**): increased mean, decreased Fano factor, and decreased correlation (bottom; means from 7.0 to 8.1 and 6.0 to 7.2 spikes/stim respectively; Fano factors from 1.9 to 1.6 and 1.7 to 1.6 respectively; correlation from 0.19 to 0.10). The shaded areas demarcate smoothed iso-density contours estimated from the data.
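The effect described in panel (**b**) can be reproduced qualitatively with a small simulation. This is a sketch under stated assumptions, not the simulation behind the figure: two Poisson neurons share a zero-mean Gaussian modulator, each neuron's gain is the exponential of its coupling weight times the modulator, and the base rate, weights, and modulator standard deviations are arbitrary illustrative values.

```python
import numpy as np

def simulate_pair(modulator_sd, n_trials, rng, base_rate=7.0, weights=(1.0, 1.0)):
    """Simulate two neurons coupled to one shared modulator.

    Returns the Fano factor of each neuron and the correlation between
    their spike counts. All parameter values are illustrative.
    """
    modulator = rng.normal(0.0, modulator_sd, size=n_trials)
    gains = np.exp(np.outer(modulator, np.asarray(weights)))  # (n_trials, 2)
    counts = rng.poisson(base_rate * gains)
    fano = counts.var(axis=0, ddof=1) / counts.mean(axis=0)
    correlation = np.corrcoef(counts, rowvar=False)[0, 1]
    return fano, correlation

rng = np.random.default_rng(1)
# Larger modulator variance stands in for the cue-away condition,
# smaller variance for the cue-toward condition.
fano_away, corr_away = simulate_pair(0.4, 20000, rng)
fano_toward, corr_toward = simulate_pair(0.2, 20000, rng)
```

With these settings the cue-toward run shows both lower Fano factors and a lower count correlation than the cue-away run, matching the direction of the effect in the caption.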

**DOI:**http://dx.doi.org/10.7554/eLife.08998.012

(**a**) Observed Fano factor and noise correlations, as a function of model coupling weight. Units from all days are divided into five groups, based on their fitted coupling weight to their respective population modulator (model-based quantities), as in . Points indicate the average Fano factors and noise correlations (model-free quantities) within each group, when attention was cued towards the associated visual hemifield (blue) and away from it (red). Shaded area denotes ±1 standard error. Unitwise Spearman correlations: *ρ* = 0.31/0.44/0.40/0.51 (Fano factor, cue towards / Fano factor, cue away / noise correlation, cue towards / noise correlation, cue away).

(**b**) Comparison of model-predicted vs. measured decrease in Fano factor and noise correlation. Units are divided into ten groups, based on coupling weights (darker points indicate larger weight). The model accounts for 62% of the cue-induced reduction in Fano factor, and 71% of the reduction in noise correlation.

(**c**) Comparison of cue weights and modulator weights. Units that are strongly coupled to the cue signal are typically strongly coupled to the modulator signal, though the relationship is only partial (unitwise Spearman correlation: *ρ* = 0.26). These results are robust when controlled for firing rate ().

**DOI:**http://dx.doi.org/10.7554/eLife.08998.013

(**a**) We repeated the analysis of , but subdivided the total population of units in two ways: first, by mean firing rate into six groups (rows), and then by coupling weight into five subgroups (points on each plot). Each row thus replicates for a controlled subpopulation of approximately 500 units with similar firing rates. Within each group, the correlation between mean rate and modulator coupling weight was weak or negligible. Nevertheless, the relationships of Fano factor and noise correlation to modulator weight remain.

(**b**) We also repeated the analysis of , subdividing the total population of units by mean firing rate into six groups, as in the rows of (**a**). Again, the relationship between cue and modulator weights remains.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.014

(**a**) Joint statistics of the two hemispheric modulators. Blue points: simultaneous values of the two modulators aggregated over all days. Thick black ellipse: iso-density contour at one standard deviation of the Gaussian density matching the empirical covariance. Thinner black ellipse: two standard deviations. Dashed lines: principal axes (eigenvectors) of this covariance, with the thicker dashed line indicating the axis with the larger eigenvalue. The vertical elongation of the ellipse shows that the variance of the modulator for the cued side is smaller than the variance of the modulator for the opposite side. The slight clockwise orientation shows that the two modulators have a very small positive correlation (, negligible).

(**b**) Autocorrelation of modulators across successive stimulus conditions. Individual lines show the within-block autocorrelation of each estimated modulator; the thick line shows the average across days and hemispheres. For simplicity of presentation, the targets and the gaps between trials have been ignored. The time constant of this process is on the order of several seconds.

(**c**) Average time course of shared modulation within each stimulus presentation. We extended the population response model by allowing the value of the modulator to change over the course of each stimulus presentation. Given limitations of the data at fine temporal resolutions, we assumed that the temporal evolution of the modulator within each stimulus presentation followed some stereotyped pattern (up to a scale factor that could change from one stimulus presentation to the next; see Materials and methods). Fine blue lines: modulators’ (normalized) temporal structure extracted for each recording day. Heavy black line: average across days. Grey shaded area: normalized peri-stimulus time histogram (arbitrary units) of spiking responses during presentations of the standard stimuli, averaged across all units, days, and cue conditions, with zero denoting spontaneous rate. Shared modulation predominantly occurs during the sustained period and is nearly absent during the onset transient.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.015

(**a**) Average effect of modulator values on subsequent behavioral performance, averaged across all days and difficulty levels. Values show the average change in hit probability for targets on the cued side (left column) and the opposite side (right column) following a unit increase in the cued (top row) and opposite (bottom row) modulators. *, **, ***. Full psychometric curves are shown in .

(**b**) Average effects of previous trial reward on current trial performance. Note that this is a direct comparison of the behavioral data, and does not involve the modulator model.

(**c**) Average effects of previous trial reward on the value of the two hemispheric shared modulators.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.016

(**a**) Psychometric performance, averaged across all days. These plots expand the results of showing the interacting effects of trial difficulty and the two hemispheric modulators on task performance. The hit probability is shown as a function of the orientation change in degrees for trials where the target was on the cued side (black points) or the opposite side (single gray point; opposite side targets were only presented at 12 deg). For each condition, the color gradient shows the effects that values of the cued modulator (left panel) and opposite modulator (right panel) have on the hit probability. We fit a family of psychometric curves to the cued-target conditions, with the two modulator values as regressors; the colored lines in each panel show two of these curves, indicating the biasing effect of (left panel), and (right panel) on performance.

(**b**) Left: average effects of previous trial reward on current trial performance, from . Note that this is a direct comparison of the behavioral data, and is not dependent on the modulator model. Right: effect of reward on performance predicted by chaining together the effects of reward on modulator () and modulator on performance (). The biasing effect of reward on behavior, as mediated by the V4 modulators, is consistent with the observed data (left), but captures only a relatively small proportion of the total reward bias (∼5–10%).

To estimate the total behavioral reward bias, we fitted Bernoulli-GLMs (i.e. GLMs with a Bernoulli observation process) which predict the response (hit/miss), given the previous trial’s reward (hit for target on cued side/hit for target on opposite side/other) as regressors. When the current trial’s target was cued, we treated the orientation change as an additional regressor, and we included a lapse parameter as behavioral performance typically saturated below 100% correct (). The effect of previous reward in this model manifests as a bias term within the sigmoid (logistic) nonlinearity.

To estimate the V4-mediated reward bias, we measured how large these total behavioral reward biases were if they had to pass through the “bottleneck” of the V4 modulators. We thus fitted three GLMs: a Gaussian-GLM which predicts the cued modulator on a trial, given the previous reward (and the previous modulator values); a second Gaussian-GLM which predicts the opposite modulator on a trial in the same way; and a Bernoulli-GLM which predicts the response (hit/miss), with the two modulators on that trial as regressors. By multiplying these two effects together (the average change in modulators due to each previous reward state in the first and second GLMs, with the modulator coefficients in the third GLM), we obtained the desired quantities.

**DOI:**http://dx.doi.org/10.7554/eLife.08998.017

(**a**) Illustration of how shared gain fluctuations would behave if they were noise, i.e. undesirable random fluctuations. In baseline conditions (red), gain fluctuations would be expected to have similar variance for all neurons in the V4 population. The action of attention would be expected to reduce the variance of gain fluctuations in task-relevant neurons, so as to mitigate their adverse effect on coding precision (see ).

(**b**) Contrary to this simple “noise” interpretation, the variance of shared gain fluctuations is markedly larger for task-relevant neurons than task-irrelevant neurons in baseline (cued away) conditions. Moreover, although this variance decreases under attentional cueing (cued toward), it remains larger for the task-relevant neurons. Functional relevance for each unit is measured as *d*′ (as in ); shared gain variability, , is measured as the total variance of model-estimated gain fluctuations (from slow drift and modulators combined). These results are robust when controlled for firing rate ().

**DOI:**http://dx.doi.org/10.7554/eLife.08998.018

**DOI:**http://dx.doi.org/10.7554/eLife.08998.019

**DOI:**http://dx.doi.org/10.7554/eLife.08998.020

### Publication types, MeSH terms, Grant support

#### Publication types

#### MeSH terms

- Action Potentials
- Animals
- Attention*
- Haplorhini
- Models, Neurological
- Perception*
- Sensory Receptor Cells/physiology*