Fixational eye movements enhance the precision of visual information transmitted by the primate retina

Fixational eye movements alter the number and timing of spikes transmitted from the retina to the brain, but whether these changes enhance or degrade the retinal signal is unclear. To quantify this, we developed a Bayesian method for reconstructing natural images from the recorded spikes of hundreds of retinal ganglion cells (RGCs) in the macaque retina (male), combining a likelihood model for RGC light responses with the natural image prior implicitly embedded in an artificial neural network optimized for denoising. The method matched or surpassed the performance of previous reconstruction algorithms, and provides an interpretable framework for characterizing the retinal signal. Reconstructions were improved with artificial stimulus jitter that emulated fixational eye movements, even when the eye movement trajectory was assumed to be unknown and had to be inferred from retinal spikes. Reconstructions were degraded by small artificial perturbations of spike times, revealing more precise temporal encoding than suggested by previous studies. Finally, reconstructions were substantially degraded when derived from a model that ignored cell-to-cell interactions, indicating the importance of stimulus-evoked correlations. Thus, fixational eye movements enhance the precision of the retinal representation.

Figure S4: Re-analysis of Figure 3 using LPIPS. (a) Re-analysis of Figure 3a (reconstruction quality as a function of quantity of fixational drift eye movements) using LPIPS, an alternative measure of perceptual distance based on a deep neural network trained for object recognition. Smaller LPIPS values correspond to higher quality. Each color corresponds to a different preparation (same color convention as Figure 3a). Error bars correspond to the standard error of the sample mean. The results using LPIPS are consistent with those using MS-SSIM: regardless of whether eye movements were known a priori (Known-LNBRC-dCNN, dashed lines) or jointly estimated along with the image (Joint-LNBRC-dCNN, solid lines), LPIPS decreased with increasing eye movements (images improved in quality with increasing eye movements). Failure to compensate for eye movements (Zero-LNBRC-dCNN, dotted lines) resulted in LPIPS increasing with increasing eye movements. (b) Re-analysis of Figure 3b (population-specific reconstruction quality as a function of quantity of fixational drift eye movements) using LPIPS. The solid lines correspond to the parasol-only Joint-LNBRC-dCNN reconstructions, and the dashed lines to midget-only Joint-LNBRC-dCNN reconstructions. Error bars correspond to the standard deviation of the sample mean. In all cases, LPIPS decreased with increasing eye movements, demonstrating that drift eye movements improved both the parasol cell and midget cell signals.

Rather than build an expression for $q(w \mid s, y)$ all at once, we use a particle filter to represent $q^{(j)}(w \mid s, y)$, our estimate of $q$ after the first $j$ frame transitions, and iteratively update the particle filter once for each frame transition. $q^{(j)}$ must satisfy the functional form for $q$, and can be written as

$$q^{(j)}(w \mid s, y) = \frac{p(s \mid w, y)\, r^{(j)}(w)}{\int p(s \mid w', y)\, r^{(j)}(w')\, dw'}$$

In order to update the particle filter for timestep $j+1$, we work with the unnormalized distribution (i.e., the numerator in the above expression) $\gamma_j(w) = p(s \mid w, y)\, r^{(j)}(w)$. Using a sampling distribution $v_{j+1}(w^{(j+1)} \mid w^{(j)})$, the particle filter update weight $\alpha_{j+1}$ can be expressed as

$$\alpha_{j+1} = \frac{\gamma_{j+1}(w^{(j+1)})}{\gamma_j(w^{(j)})\, v_{j+1}(w^{(j+1)} \mid w^{(j)})}$$

Note that we have yet to define what $r$ and $v$ are, and that we are free to define these however we wish. Let us define $v_{j+1}$ as

$$v_{j+1}(w \mid w^{(j)}) = \begin{cases} p(w_{j+1} \mid w^{(j)}_j) & \text{if } w_{j+2,\ldots,T} = w_{j+1} \text{ and } w_{0,\ldots,j} = w^{(j)}_{0,\ldots,j} \\ 0 & \text{otherwise} \end{cases}$$
This definition means that we update the particle for timestep $j+1$ by setting the eye position $w_{j+1}$, as well as all subsequent eye positions, to the same random draw from the distribution $p(w_{j+1} \mid w_j)$, while leaving all eye positions from previous times $t \le j$ unchanged.
We then define $r^{(j)}$ to be the following:
• For $i = 0$ and any value of $j$, $r_0(w$ …

Since $y^{(i)}$ is a fixed known constant from the previous iteration of the algorithm, the term $\mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}[\log q(w \mid y^{(i)}, s)]$ is simply a constant and can be discarded from the optimization. In addition, since the term $\mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}[\log p(w)]$ does not depend on the optimization variable $y$, we can discard it as well. This leaves the problem

$$\arg\max_y \; \log p(y) + \mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}\left[\log p(s \mid y, w)\right] \qquad (3.1)$$

We have freedom to pick the distribution $q(w \mid y^{(i)}, s)$, but what distribution gives us the tightest bound? As it turns out, the gap between the inequality and the equality is exactly the KL divergence between $q(w \mid y^{(i)}, s)$ and $p(w \mid y, s)$:

$$\begin{aligned} D_{\mathrm{KL}}\left(q(w \mid y^{(i)}, s) \,\|\, p(w \mid y, s)\right) &= \mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}\left[\log \frac{q(w \mid y^{(i)}, s)}{p(w \mid y, s)}\right] \\ &= \mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}\left[\log q(w \mid y^{(i)}, s)\right] - \mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}\left[\log \frac{p(w, s \mid y)}{p(s \mid y)}\right] \\ &= \log p(s \mid y) - \mathbb{E}_{w \sim q(w \mid y^{(i)}, s)}\left[\log \frac{p(w, s \mid y)}{q(w \mid y^{(i)}, s)}\right] \end{aligned}$$

Hence we want to choose a distribution $q(w \mid y^{(i)}, s)$ that is as close to $p(w \mid y, s)$ as possible to make the lower bound tight. Explicitly computing $p(w \mid y, s)$ is difficult, so we instead iteratively build an approximation to it using particle filtering.
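Taken together, this yields an alternating scheme: an E-step that refreshes the particle approximation to $q(w \mid y^{(i)}, s)$, and an M-step that ascends the objective in (3.1) with the expectation replaced by a weighted sum over particles. The Python sketch below illustrates this alternation; the interfaces (`log_prior_grad`, `log_lik_grad`, `particle_filter`) are placeholders assumed for illustration, not the actual implementation.

```python
import numpy as np

def em_reconstruct(spikes, log_prior_grad, log_lik_grad, particle_filter,
                   y_init, n_outer=20, n_mstep=50, lr=1e-2):
    """Alternating image / eye-trajectory estimation (illustrative sketch).

    spikes          : recorded RGC spike trains s
    log_prior_grad  : y -> gradient of log p(y) (e.g., via a denoiser prior)
    log_lik_grad    : (y, w, s) -> gradient of log p(s | y, w) w.r.t. y
    particle_filter : (s, y) -> (particles, weights) approximating q(w | y, s)
    """
    y = y_init.copy()
    for _ in range(n_outer):
        # E-step: rebuild the particle approximation at the current image.
        particles, weights = particle_filter(spikes, y)

        # M-step: gradient ascent on log p(y) + E_q[log p(s | y, w)],
        # approximating the expectation by a weighted sum over particles.
        for _ in range(n_mstep):
            grad = log_prior_grad(y)
            for w_k, a_k in zip(particles, weights):
                grad = grad + a_k * log_lik_grad(y, w_k, spikes)
            y = y + lr * grad
    return y
```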

Sequential importance sampling
Define $\gamma_n(w_{0:n})$ as an unnormalized distribution. Let $p(w_n \mid w_{n-1})$ be the hidden state transition kernel, and let $p(s_n \mid w_n)$ be the observation model.
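For concreteness, the sketch below shows one sequential-importance-sampling update in Python, using the transition kernel itself as the proposal (the bootstrap-filter simplification, under which the incremental weight $\gamma_n / (\gamma_{n-1}\, v)$ reduces to the observation likelihood $p(s_n \mid w_n)$). States are scalar for simplicity, and the function names are illustrative assumptions.

```python
import numpy as np

def sis_step(particles, log_weights, log_obs, transition_sample, rng):
    """One sequential-importance-sampling update (illustrative sketch).

    particles         : (K, n) array; row k holds trajectory w_{0:n-1}
    log_weights       : (K,) log importance weights
    log_obs           : w_n -> log p(s_n | w_n), vectorized over particles
    transition_sample : (w_prev, rng) -> draw from p(w_n | w_{n-1})
    """
    # Propose the next state for each particle from the transition kernel.
    w_next = transition_sample(particles[:, -1], rng)
    particles = np.concatenate([particles, w_next[:, None]], axis=1)

    # With the transition kernel as proposal, the incremental weight
    # gamma_n / (gamma_{n-1} * v) reduces to the observation likelihood.
    log_weights = log_weights + log_obs(w_next)

    # Renormalize in log space for numerical stability.
    log_weights = log_weights - np.max(log_weights)
    log_weights = log_weights - np.log(np.sum(np.exp(log_weights)))
    return particles, log_weights
```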

Figure S1: LNBRC fit quality for fixational drift natural movies, for cells of each of the major cell types in the macaque monkey (ON parasol, OFF parasol, ON midget, OFF midget). (a) Comparison of LNBRC-simulated repeat rasters and PSTH with real recorded repeats in response to fixational drift natural movie stimuli, for a single example ON parasol RGC. (b) Histogram showing the fraction of PSTH variance explained by the LNBRCs, for every ON parasol RGC recorded in one preparation. The fraction of explained variance was systematically high for every cell, suggesting that LNBRCs can accurately represent retinal responses to natural movie stimuli. (c) and (d) Same as (a) and (b) for OFF parasol RGCs. (e) and (f) Same as (a) and (b) for ON midget RGCs. (g) and (h) Same as (a) and (b) for OFF midget RGCs.

Figure S2: Additional example fixational drift stimulus images and reconstructions, from the same experimental preparation as Figure 2d from the main text, with additional comparisons against simpler reconstruction methods. Columns (left to right): Simulated noiseless reconstruction, a reconstruction of the stimulus from linear projections onto the LNBRC filters (see Methods); Known-LNBRC-dCNN, MAP reconstruction with known eye movements, using LNBRC encoding and dCNN prior; Zero-LNBRC-dCNN, MAP reconstruction while incorrectly assuming no eye movements, using LNBRC encoding and dCNN prior; Joint-LNBRC-dCNN, reconstruction by jointly estimating image and eye movements, using LNBRC encoding and dCNN prior; Known-LNBRC-1F, MAP reconstruction with known eye movements, using LNBRC encoding and a simpler 1/F Gaussian prior; Known-LNP-dCNN, MAP reconstruction with known eye movements, using a simpler LNP encoding and dCNN prior. The original stimuli were drawn from the ImageNet database [1], and are unavailable due to copyright restrictions.

Figure S3: Performance comparison of the LNBRC-dCNN MAP reconstruction algorithm against simpler alternatives, for the eye movements natural movies, when the eye movements are known a priori. N = 1992 test images were used to perform the comparisons in each panel. (a) Comparing LNBRC-dCNN against LNP-dCNN, in which the LNBRC encoding model was replaced with a simpler LNP encoding model. Reconstructions using the LNBRC encoding model were systematically better than those using the LNP model, demonstrating that the more sophisticated LNBRC encoding model contributed substantially to reconstruction quality. (b) Comparing LNBRC-dCNN against LNBRC-1/F, in which the dCNN prior was replaced with a simpler Gaussian 1/F prior. Reconstructions using the dCNN prior were systematically better than those using the 1/F prior, demonstrating that the dCNN natural image prior contributed substantially to reconstruction quality.
Figure S5: Comparison of parasol-only and midget-only reconstructions for the fixational drift eye movements stimulus, in one preparation. (a) Example reconstructions, using the Joint-LNBRC-dCNN simultaneous eye movement estimation and image reconstruction algorithm. Columns (left to right): Simulated noiseless parasol RGC reconstruction, reconstructions of the stimuli from linear projections onto the parasol RGC LNBRC filters only (see Methods); Joint-LNBRC-dCNN parasol RGCs only, joint reconstructions computed from experimental data using only parasol RGCs; Simulated noiseless midget RGC reconstruction, reconstructions of the stimuli from linear projections onto the midget RGC LNBRC filters only (see Methods); and Joint-LNBRC-dCNN midget RGCs only, joint reconstructions computed from experimental data using only midget RGCs. The midget-only reconstructions contained greater fine spatial detail than the parasol-only reconstructions. The original stimuli were drawn from the ImageNet database [1], and are unavailable due to copyright restrictions. (b) Comparison of MS-SSIM reconstruction quality between the midget-only and parasol-only reconstructions. For nearly every image, the midget-only reconstruction quality exceeded that of the parasol-only reconstructions. (c) Comparison of LPIPS perceptual distance between midget-only and parasol-only reconstructions. For nearly every image, the midget-only perceptual distance was smaller than that of the parasol-only reconstructions.

Figure S6: Re-analysis of Figure 4 using LPIPS. (a) Flashed natural image reconstruction performance as a function of spike timing perturbation, in four experimental preparations (colors). The x-axis is plotted on a log scale, with a broken axis to facilitate comparison with unperturbed data. Error bars in all panels correspond to the standard error of the sample mean. Performance at each level of temporal perturbation was evaluated using N = 1500, N = 1750, N = 750, and N = 750 images for the blue, black, green, and yellow preparations, respectively. LPIPS remained relatively constant up to 10 ms of spike timing jitter. (b) Fixational drift natural image reconstruction performance as a function of spike timing perturbation, in three experimental preparations. The solid lines correspond to joint reconstruction of the image and eye trajectory, while the dashed and faded lines correspond to reconstruction of the image alone with known eye trajectory. Performance at each level of temporal perturbation was evaluated using N = 1992 images for each experimental preparation. For the fixational drift stimulus, LPIPS gradually increased with increasing spike time perturbation, and reconstruction quality degraded measurably with 2-5 ms of spike timing perturbation.
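Both metrics used in these re-analyses are available as open-source packages. The following is a minimal sketch of how such scores might be computed; the AlexNet backbone, the [0, 1] input range, and the tiling of grayscale images to RGB are assumptions for illustration, not necessarily the exact procedure used in the paper.

```python
import torch
import lpips                          # pip install lpips
from pytorch_msssim import ms_ssim    # pip install pytorch-msssim

def perceptual_scores(recon, target):
    """recon, target: (N, 1, H, W) grayscale tensors with values in [0, 1]."""
    # LPIPS expects 3-channel inputs scaled to [-1, 1]; lower = better.
    to_rgb = lambda x: x.repeat(1, 3, 1, 1) * 2.0 - 1.0
    lpips_fn = lpips.LPIPS(net='alex')
    d = lpips_fn(to_rgb(recon), to_rgb(target)).mean().item()

    # MS-SSIM operates on the raw intensity range; higher = better.
    q = ms_ssim(recon, target, data_range=1.0).item()
    return d, q
```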