Decoding network-mediated retinal response to electrical stimulation: implications for fidelity of prosthetic vision

Objective. Patients with photovoltaic subretinal implant PRIMA demonstrated letter acuity ∼0.1 logMAR worse than sampling limit for 100 μm pixels (1.3 logMAR) and performed slower than healthy subjects tested with equivalently pixelated images. To explore the underlying differences between natural and prosthetic vision, we compare the fidelity of retinal response to visual and subretinal electrical stimulation through single-cell modeling and ensemble decoding. Approach. Responses of retinal ganglion cells (RGCs) to optical or electrical white noise stimulation in healthy and degenerate rat retinas were recorded via multi-electrode array. Each RGC was fit with linear–nonlinear and convolutional neural network models. To characterize RGC noise, we compared statistics of spike-triggered averages (STAs) in RGCs responding to electrical or visual stimulation of healthy and degenerate retinas. At the population level, we constructed a linear decoder to determine the accuracy of the ensemble of RGCs on N-way discrimination tasks. Main results. Although computational models can match natural visual responses well (correlation ∼0.6), they fit significantly worse to spike timings elicited by electrical stimulation of the healthy retina (correlation ∼0.15). In the degenerate retina, response to electrical stimulation is equally bad. The signal-to-noise ratio of electrical STAs in degenerate retinas matched that of the natural responses when 78 ± 6.5% of the spikes were replaced with random timing. However, the noise in RGC responses contributed minimally to errors in ensemble decoding. The determining factor in accuracy of decoding was the number of responding cells. To compensate for fewer responding cells under electrical stimulation than in natural vision, more presentations of the same stimulus are required to deliver sufficient information for image decoding. Significance. 
Slower-than-natural pattern identification by patients with the PRIMA implant may be explained by the lower number of electrically activated cells than in natural vision, which is compensated by a larger number of the stimulus presentations.


Introduction
Age-related macular degeneration (AMD) is a leading cause of untreatable blindness. Geographic atrophy (GA), the atrophic form of advanced AMD, affects around 3% of people above the age of 75, and around 25% of those above 90 [1,2]. Due to gradual loss of photoreceptors in the central macula, GA patients experience severe deterioration in high-resolution central vision, compromising their ability to read and recognize faces. Although central vision degrades over time, patients retain their low-resolution peripheral vision, and hence typically do not lose visual acuity beyond 20/400. One approach to restoration of sight in retinal degeneration is to replace the missing photoreceptors with photodiodes [3,4], which convert the incident light into electric current flowing through the retina, and thus convey visual information to the secondary neurons by electrical stimulation. Massive amplification mechanisms in photoreceptors enable their operation in a very broad (about 10 orders of magnitude) range of light intensity. Photodiodes, which lack such amplification, require much brighter illumination (about 1 mW mm⁻²) in order to provide sufficient current for retinal stimulation. Therefore, a photovoltaic system for restoration of sight includes augmented-reality glasses, where images captured by the camera are projected into the eye using more intense light. To avoid perception of this intense light by the remaining photoreceptors, a near-infrared wavelength (880 nm) is used. To provide charge-balanced electrical stimulation, the light is pulsed, and to enable stable visual percepts, the pulse repetition rate should exceed the frequency of flicker fusion (around 30 Hz). For functional restoration of sight, prosthetic vision should be encoded in a way that the brain can easily decode and, ideally, should be as close to natural as possible.
In the healthy retina, optical information (local light intensity) is converted via phototransduction into a decrease of the cell potential (hyperpolarization) in photoreceptors, which, in turn, reduces the rate of release of the neurotransmitter glutamate in synapses with the secondary neurons: bipolar and horizontal cells. By providing lateral inhibition that forms an antagonistic surround, horizontal cells perform the first step in spatial contrast enhancement. Bipolar cells electrically integrate inputs from multiple photoreceptors and relay these signals to the tertiary retinal neurons, the retinal ganglion cells (RGCs), with their inputs regulated by amacrine cells primarily via lateral inhibition. Finally, RGCs convert these signals into bursts of action potentials ('spikes'), which propagate via the optic nerve to the brain. Encoding of the visual information in different types of RGCs varies by multiple properties: ON and OFF pathways, sizes of receptive fields (RFs), transient and sustained responses, chromatic sensitivity, etc. Signals from the overlapping mosaics of the various types of RGCs are further processed in the brain before merging into a single visual percept. In atrophic AMD, photoreceptors in the central macula slowly degenerate and disappear, while the inner retinal neurons remain largely intact, albeit with some rewiring [5-7].
Ex-vivo and in-vivo animal studies with photovoltaic replacement of photoreceptors demonstrated preservation of multiple features of the natural retinal signal processing: sizes of the RGC RFs, with antagonistic surround [8] and non-linear summation of subunits [3], flicker fusion [9] and adaptation to static images [10], and spatial resolution matching the pixel pitch (55 and 75 μm) [9]. Interestingly, both ex-vivo and in-vivo studies demonstrated not only excitatory (ON) but also inhibitory (OFF) responses to electrical stimulation. The latter may be explained by stimulation of rod bipolar cells, which feed into the ON and OFF cone pathways via amacrine cells. Contrast sensitivity of prosthetic vision in rats appears to be about five times lower than natural [11]. This is likely because horizontal cells connect to the terminals of photoreceptors, so in GA they become detached from the remaining neural network. Therefore, some reduction in contrast sensitivity is expected [11], and for prosthetic vision it might be partially compensated by image processing prior to projection onto the implant. A clinical trial demonstrated that patients correctly perceive various patterns of lines and letters, demonstrating monochromatic shaped vision with resolution closely matching the pixel size (100 μm in the first trial) [4]. They also report flicker fusion at frequencies exceeding 30 Hz.
One of the features of such prosthetic vision, however, appears to be the lower than normal speed of the pattern and letter recognition. In the first trial, it took about 4 s for letter identification by patients with PRIMA implants [4], while it takes less than half a second in normal subjects, when font sizes exceed the acuity limit [12]. Here, using a retrospective analysis of the previously recorded data [8], we investigate the potential retinal underpinnings of this phenomenon by comparing RGC responses to visual and electrical stimulation in healthy and degenerate rat retina recorded on a multi-electrode array (MEA). In particular, we assess the amount of noise in various cellular responses, as well as the strategies for image recognition based on ensemble encoding by a population of cells.

Photovoltaic implants
Photovoltaic arrays (1 mm diameter, 30 μm in thickness, with 75 μm pixels) (figure 1(a)) were manufactured from crystalline silicon, as described earlier [13], to produce anodic-first pulses. Active electrode was 20 μm in diameter, and each pixel was surrounded by a return electrode, connected into a mesh common to all pixels. Both electrodes were coated with SIROF film of about 300 nm in thickness [9].

Retinal recording
The retinal data were taken from previously published recordings [8]. We used Long-Evans (LE, n = 4) and Royal College of Surgeons (RCS, n = 4, P120-130) rats as models of the healthy and degenerate retina, respectively. All experimental procedures were approved by the Stanford Administrative Panel on Laboratory Animal Care, and all animals were kept in accordance with the institutional guidelines and conformed to the guidelines of the Association for Research in Vision and Ophthalmology Statement for the Use of Animals in Ophthalmic and Vision Research. Eyes were enucleated from euthanized (390 mg kg⁻¹ pentobarbital sodium, 50 mg ml⁻¹ phenytoin sodium) rats. A section of the retina (∼3 mm × 3 mm) was dissected and placed with the ganglion cell side facing a 512-electrode MEA (figure 1(b)) [14]. The retina was constantly perfused with Ames' medium and bubbled with a mixture of 95% O₂ and 5% CO₂. The body temperature of rats is around 36 °C-37 °C. To allow long recordings ex-vivo, the metabolic rate is slowed down by lowering the temperature to around 30 °C. We picked 29.4 °C to be consistent with previous ex-vivo studies [3,8,10]. For electrical stimulation, an implant was placed onto the subretinal side of the tissue (figures 1(c) and (d)). A nylon mesh (∼100 μm cell size) was used to press the implant and retina onto the MEA for better contact [10]. Voltage waveforms from each of the 512 electrodes on the MEA were sampled at 20 kHz, amplified and digitized using custom-made readout electronics and a data acquisition system [14].

Stimulation protocol
For electrical stimulation, an 880 nm diode laser coupled via a 400 μm multimode fiber was used for illumination. The beam exiting from the fiber was collimated and homogenized using a 2° divergence microlens array. In the same optical path, we placed a yellow LED (591 nm) for visual stimulation. Both light sources were used as backlighting for an LCD screen (Holoeye HEO-0017) to generate images [3,10]. The 8-bit LCD panel had a 60 Hz native frame rate, 1024 × 768 resolution, and a white-to-black intensity ratio of 10000:1 at 591 nm and 200:1 at 880 nm. Projected onto the retina, each screen pixel formed a 6 × 6 μm² square.
To characterize spatiotemporal properties of RGCs, a spatiotemporal binary white noise stimulus was used, where each pixel in each frame had a 50% chance of being bright or dark [15]. The white noise for visual stimulation was shown at 30 Hz frame rate, and made up of pixels of 60 μm in width on the retina. The white noise for electrical stimulation was displayed at 20 Hz frame rate, with the backlight laser pulsing at 4 ms, and consisted of pixels of 70 μm in width on the retina. Each white noise stimulus lasted for 30 min.
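As an illustration of the stimulus described above, the following minimal sketch generates a binary white noise movie in which every pixel of every frame is independently bright or dark with equal probability. The function name, the 8 × 8 frame size and the fixed seed are assumptions made for the example and are not taken from the original experiments.

```python
import random

def binary_white_noise(n_frames, height, width, seed=0):
    """Return n_frames frames; each pixel is 1 (bright) or 0 (dark) with p = 0.5."""
    rng = random.Random(seed)  # fixed seed so the movie is reproducible
    return [[[rng.getrandbits(1) for _ in range(width)]
             for _ in range(height)]
            for _ in range(n_frames)]

# Frame counts implied by a 30 min stimulus at the two frame rates in the text.
visual_frames = 30 * 60 * 30      # 30 Hz visual white noise
electrical_frames = 30 * 60 * 20  # 20 Hz electrical white noise

# Short demo movie (hypothetical 8 x 8 pixel grid).
movie = binary_white_noise(n_frames=1000, height=8, width=8)
mean_intensity = sum(p for f in movie for row in f for p in row) / (1000 * 8 * 8)
```

Because the pixels are independent in space and time, the mean intensity stays close to 0.5 and the stimulus has no spatiotemporal correlations, which is what makes the spike-triggered average a usable estimate of the linear filter.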

Spike sorting
Electrical stimulation generated artifacts by saturating the MEA recording amplifiers, so part of the recorded waveforms had to either be discarded or adjusted (see supplementary figure 1 (available online at https://stacks.iop.org/JNE/17/066018/mmedia)). To pre-process the data for the spike-sorting pipeline, the recording of the first 8.25 ms after the laser pulse was replaced with a randomly generated noise ('blanking') that matched the noise level of the electrode. All action potentials during this period were lost, which may lead to underestimation of the cell responsiveness. Afterwards, to remove any lingering capacitive decay outside of the blanked period, we fitted the trace with a 7th-order polynomial, and then subtracted it out from the original trace.
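The blanking step can be sketched as follows. At the 20 kHz sampling rate, 8.25 ms corresponds to 165 samples per pulse. The sketch assumes zero-mean Gaussian replacement noise with a standard deviation supplied by the caller; the source only states that the noise matched the electrode noise level, so the distribution, the function name and its arguments are assumptions. The subsequent polynomial subtraction is not shown.

```python
import random

SAMPLE_RATE_HZ = 20_000  # MEA sampling rate from the text
BLANK_MS = 8.25          # blanked window after each laser pulse

def blank_artifact(trace, pulse_onsets, noise_sd, seed=0):
    """Replace the samples following each pulse onset with synthetic noise.

    trace        : list of voltage samples from one electrode
    pulse_onsets : sample indices at which laser pulses start
    noise_sd     : noise level of this electrode (assumed Gaussian, zero mean)
    """
    rng = random.Random(seed)
    n_blank = round(BLANK_MS * SAMPLE_RATE_HZ / 1000)  # 165 samples
    out = list(trace)  # copy so the raw recording is left untouched
    for onset in pulse_onsets:
        for i in range(onset, min(onset + n_blank, len(out))):
            out[i] = rng.gauss(0.0, noise_sd)
    return out
```

Any spike falling inside the blanked window is lost, which is the source of the possible underestimation of responsiveness noted above.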
The artifact-subtracted raw data were then used to find and sort the action potentials ('spikes'). A negative voltage deflection exceeding three times the root-mean-squared noise on each electrode was considered a spike. Custom-made software was used to perform spike sorting, as described previously [3,14,16]. We applied dimensionality reduction to the detected spike waveforms using principal component analysis, followed by expectation-maximization clustering [14]. For each putative neuron, we calculated its electrophysiological image (EI), which is the average electrical signal measured on the whole MEA when the neuron produced a spike. An EI typically shows the soma location and axonal trajectory of the RGC [17,18]. Electrically stimulated cells generally have faster, but weaker STAs, similar to previous observations [8,11]. Visual ON stimulation results in hyperpolarization of photoreceptors, and visual OFF leads to depolarization. Subretinal anodic stimulation of the LE retina will depolarize photoreceptor terminals, generating the same effect as a visual OFF stimulus. Therefore, the polarity of the peak closest to the right is inverted, and a visual LE ON cell is an electrical OFF cell. To date, there is no clear indication whether electrical ON and OFF RGCs in the degenerate retina correspond to any specific natural type of RGCs.
For our analysis, we selected cells with the following four criteria. First, for each candidate neuron, an estimate of the fraction of spikes coming from other neurons ('contaminating spikes') was obtained from the number of refractory period violations in the spike train [14]. Cells with over 10% contamination were excluded. Second, RGCs with backward propagating axonal signals were also excluded. Each of these criteria removed less than 10% of cells. Third, RGCs with the time course signal-to-noise ratio (SNR) below 3 were excluded from the analysis. For the SNR calculation, the peak value of the time course was used as a signal, and the root-mean-square value of the 10 time course values farthest from the time of the action potential was used as noise [8,11]. This criterion excluded a higher ratio of RCS RGCs than LE RGCs, exemplified by the SNR difference between the time courses in figures 2(c) and (e). Depending on the preparation, this criterion removed around 24%-40% of the initially identified LE visual cells, and 92%-94% of the RCS cells. Fourth, for LE retinas, we only included cells that responded to both visual and electrical stimulation, and had their somas under the implant. The restricted area of interest allows for a fairer comparison between the number of responsive cells across retina types. This procedure retained around 19%-24% of LE cells. Within the implant region, approximately 23%-32% of the visually responsive cells were also electrically responsive. Overall, compared to the initially identified cells, around 13%-18% of the LE RGCs and 6%-8% of RCS RGCs were included in our analysis.
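The third criterion can be written out as a short sketch. It assumes that the STA time course is stored with the frame closest to the spike first, so that the last 10 entries are those farthest from the action potential, and that the peak is taken as the largest absolute value; both conventions are assumptions for the example.

```python
import math

def time_course_snr(time_course, n_noise=10):
    """SNR of an STA time course.

    time_course[0] is assumed to be the frame nearest the spike;
    the last n_noise entries are the frames farthest from it.
    """
    signal = max(abs(v) for v in time_course)               # peak value
    tail = time_course[-n_noise:]                           # far-from-spike frames
    noise = math.sqrt(sum(v * v for v in tail) / len(tail))  # root-mean-square
    return signal / noise

# Hypothetical time course: a peak of 8 followed by a unit-RMS tail.
example = [0.0, 3.0, 8.0, 2.0] + [1.0, -1.0] * 5
keep_cell = time_course_snr(example) >= 3  # selection threshold from the text
```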

Modeling RGC responses
Experimental data for each cell were fitted to a linear-nonlinear (LN) model [15] and a convolutional neural network (CNN) model [19], as illustrated in figure 3. Due to drift in the recorded data, we used a train-test-discard split of 20/20/60 (see supplementary figure 2). Only 20% was chosen for training, because we fitted our models to segments of data that were consistent with respect to drift, while keeping enough data to avoid model overfitting. The model was tested on another 20% of the data, drawn from the remaining 80%. Since there were no repeated stimuli, we compared the model predictions to experimental data by applying Gaussian broadening to each spike with σ = 2 white noise frames for smoothing (figure 5(a)). This was applied to both the spikes measured in experiments and the model-predicted spikes. We then computed the Pearson correlation coefficient for the resulting traces. In addition, all model fits were five-fold cross-validated, with the entire recording divided into equally sized sections. Neither model overfitted to any particular segment of the data.
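The comparison metric described above can be sketched as follows: both spike trains are binned per white noise frame, convolved with a Gaussian of σ = 2 frames, and then correlated. The kernel truncation at 4σ and the function names are assumptions for the example.

```python
import math

def gaussian_smooth(counts, sigma=2.0):
    """Convolve per-frame spike counts with a normalized Gaussian kernel."""
    radius = int(4 * sigma)  # truncate the kernel at 4 sigma (assumption)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(counts)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + j - radius
            if 0 <= idx < n:
                acc += kv * counts[idx]
        out.append(acc)
    return out

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def smoothed_correlation(recorded, predicted, sigma=2.0):
    """Correlate recorded and predicted spike trains after Gaussian broadening."""
    return pearson(gaussian_smooth(recorded, sigma), gaussian_smooth(predicted, sigma))

# Two toy spike trains whose spikes are offset by one frame.
recorded = [1 if t in (10, 30) else 0 for t in range(50)]
predicted = [1 if t in (11, 31) else 0 for t in range(50)]
raw_r = pearson(recorded, predicted)
smooth_r = smoothed_correlation(recorded, predicted)
```

In this toy case the unsmoothed correlation is slightly negative because no spikes coincide, whereas the smoothed correlation is high. The broadening therefore tolerates timing jitter of a frame or two but still penalizes larger variability in spike timing.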

LN model-Mathematically, the LN model (figure 3(a)) is set as follows:
$$R(s) = N(w \cdot s) \tag{1}$$
where R is the response, s is the stimulus, w is the linear filter (weights), and N is the static nonlinearity. The linear filter w can be computed from the STA of the response to the white noise stimulus. The static nonlinearity can be extracted by mapping the empirical cell activity to the stimulus convolved with the linear weights (w · s) [15].
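A minimal sketch of this two-step fit is given below, using a simulated cell so that the example is self-contained. The filter w is estimated as the spike-weighted average stimulus, and the nonlinearity N is estimated by sorting stimuli by their generator signal w · s into equal-count bins and taking the mean spike count in each bin. The simulated filter, the rectifying spike probability, the bin count and all function names are assumptions for illustration, not the fitting code used in the study.

```python
import random

def dot(w, s):
    return sum(a * b for a, b in zip(w, s))

def spike_triggered_average(stimuli, spikes):
    """Spike-weighted mean stimulus (stimuli are zero-mean, so no offset is removed)."""
    n_spikes = sum(spikes)
    dim = len(stimuli[0])
    return [sum(c * s[d] for s, c in zip(stimuli, spikes)) / n_spikes
            for d in range(dim)]

def fit_nonlinearity(stimuli, spikes, w, n_bins=10):
    """Mean spike count as a function of the generator signal w . s."""
    pairs = sorted((dot(w, s), c) for s, c in zip(stimuli, spikes))
    size = len(pairs) // n_bins
    centers, rates = [], []
    for b in range(n_bins):
        chunk = pairs[b * size:(b + 1) * size]
        centers.append(sum(g for g, _ in chunk) / len(chunk))
        rates.append(sum(c for _, c in chunk) / len(chunk))
    return centers, rates

def simulate_cell(w_true, n_stimuli, seed=0):
    """Hypothetical LN cell: binary +/-1 stimulus, rectified spike probability."""
    rng = random.Random(seed)
    stimuli = [[rng.choice((-1, 1)) for _ in w_true] for _ in range(n_stimuli)]
    spikes = [int(rng.random() < 0.3 * max(0.0, dot(w_true, s))) for s in stimuli]
    return stimuli, spikes

w_true = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
stimuli, spikes = simulate_cell(w_true, n_stimuli=5000)
w_est = spike_triggered_average(stimuli, spikes)           # linear stage
centers, rates = fit_nonlinearity(stimuli, spikes, w_est)  # static nonlinearity
```

For this simulated cell the estimated filter is close to proportional to the true one, and the binned rates rise with the generator signal, which is the shape expected of a rectifying nonlinearity.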

CNN model-An implementation of a CNN model has proven useful for modeling the healthy salamander retina [19], and here we used a similar architecture (figure 3(b)) with two convolution blocks followed by a dense layer. Each convolution block consisted of a 2D convolution (weights), a parametric rectified linear unit, batch normalization (norm), and a dropout layer. These last two components sped up the training while also regularizing the network to prevent overfitting. The number of filters and their dimensions for each convolution block were picked to optimize model performance on test data while avoiding overfitting and are listed in table 1. The output of the second convolution layer was flattened into a 1D vector before being fed into the final dense linear layer, which had a number of units matching the number of recruited RGCs in the retina.
The network was trained using the gradient-descent ADAM optimizer [20] and a Poisson log-likelihood. L2 weight regularization was employed on the convolution and linear layers, while L1 regularization was used on the output of the network. Especially for cells with lower firing rates, L1 can efficiently zero-out many weights. The complete loss function combined these three terms, with R_i denoting the model response i. The input stimulus was similar to that used in computing STAs, while the response now included all activity and inactivity. For visual stimulation, 20 consecutive movie frames (spanning 600 ms) were considered one stimulus, and the spike rate during 33 ms following the stimulus was taken as the target response. Similarly, for electrical stimulation, five movie frames (250 ms) and the following 50 ms of activity were considered a stimulus-response pair. To improve precision of spike timing while increasing the training sample count, the electrical stimulus was up-sampled with linear interpolation to 250 Hz, and the corresponding RGC spiking activity was binned to match the stimulus frame rate. During validation, the predicted activations were down-sampled back to the original frame rate before the correlation was computed.
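For orientation, a loss of the kind described above (a Poisson log-likelihood term, an L2 penalty on weights and an L1 penalty on network outputs) can be sketched as below. This is one common form and only an assumption about the exact expression: the coefficient values, the averaging over samples and the small constant added inside the logarithm are choices made for the example.

```python
import math

def poisson_nll(predicted, observed, eps=1e-8):
    """Mean Poisson negative log-likelihood, omitting the constant log(k!) term."""
    return sum(p - k * math.log(p + eps)
               for p, k in zip(predicted, observed)) / len(predicted)

def regularized_loss(predicted, observed, weights, l1=1e-3, l2=1e-3):
    """Poisson loss plus L2 on weights and L1 on predicted responses.

    l1 and l2 are hypothetical coefficients; in practice they would be
    selected by a hyperparameter search.
    """
    l2_term = l2 * sum(w * w for w in weights)
    l1_term = l1 * sum(abs(p) for p in predicted)
    return poisson_nll(predicted, observed) + l2_term + l1_term

# The Poisson term is smallest when the predicted rate equals the observed count.
loss_match = poisson_nll([2.0], [2])
loss_low = poisson_nll([1.0], [2])
loss_high = poisson_nll([3.0], [2])
```

The L1 term on the outputs pushes predicted rates towards zero, which suits cells that are silent in most time bins.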
The CNN is parameterized by 13 different hyperparameters, including filter count, size, stride, and nonlinearity for each of the two convolution blocks. In addition, we also explored different values for the learning rate, L1 and L2 coefficients, batch size, and dropout probability. We performed 100 trials for each dataset with randomized values for all parameters using the SHERPA hyperparameter optimization library [21], and the best-performing hyperparameters were picked. Networks were trained for 50 epochs on electrical datasets and 50 epochs on visual datasets. The total training time on a single NVIDIA Titan X GPU was 30 min and 2 h, respectively.

RGC noise estimation
We estimated the noise level in RGC firing under electrical stimulation using the algorithm illustrated in figure 4(a). First, we passed the white noise stimulus through an LN model for an LE RGC under visual stimulation, generating a simulated spike train. For the noise-injection step, we then removed spikes randomly at some predefined ratio, which we denote noise ratio (NR). Spontaneous spikes were added into the spike train to match the original average spike rate via a Poisson process. With the new noise-injected spike train, we re-computed the STA. The boxed regions in figure 4(b) with dimensions (length, width, time) = (3 px, 3 px, 4 frames) were used for the subsequent analyses. Out of all LE retinas, we selected the cell that had the median AUC under visual stimulation as a reference. Noise-injection into the STA of this cell yielded a family of CCCs shown in figure 6(c). To characterize the noise of each cell under electrical stimulation, we matched its CCC to the curve in the family that has the most similar AUC, and the resulting matching NR characterizes the cell (figure 6(d)).
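The noise-injection step can be sketched as follows. Each spike is removed with probability equal to the noise ratio, and the same number of spikes is added back at random times so that the average rate is unchanged. Drawing the replacement times uniformly with an exactly matched count is a simplification of the Poisson process mentioned above; this choice, the function name and the frame-index representation of spikes are assumptions for the example.

```python
import random

def inject_noise(spike_frames, n_frames, noise_ratio, seed=0):
    """Replace a fraction of spikes with randomly timed ones.

    spike_frames : frame indices at which the simulated cell fired
    n_frames     : total number of stimulus frames
    noise_ratio  : probability that any given spike is replaced (NR)
    """
    rng = random.Random(seed)
    kept = [t for t in spike_frames if rng.random() >= noise_ratio]
    n_removed = len(spike_frames) - len(kept)
    # Spontaneous spikes: uniform random times, count matched to the removed spikes.
    added = [rng.randrange(n_frames) for _ in range(n_removed)]
    return sorted(kept + added)

# Hypothetical spike train with one spike every fifth frame.
original = list(range(0, 1000, 5))
noisy = inject_noise(original, n_frames=1000, noise_ratio=0.78)
```

Recomputing the STA from such a train at several noise ratios gives the family of reference curves against which the electrical responses are matched.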

Ensemble encoding
To evaluate how much information is encoded by the ensemble of cells for the pattern recognition task, we simulated projection of a pixelated Landolt-C onto a piece of retina (figure 7(a)). Each presentation of the C lasted for five movie frames, and was spatially pixelized into either 70 μm (for electrical stimulus) or 60 μm (for visual stimulus) pixels.

Author Manuscript
$$\hat{s} = \underset{s}{\arg\min}\, \lvert w \cdot s - w \cdot s_C \rvert \tag{6}$$
where s_C denotes the Landolt-C stimulus and s a block of the white noise stimulus. The 500 blocks that best satisfied the above criterion were chosen, and the average RGC activity in the 30 ms following each block was considered a response to the Landolt-C. Afterwards, responses of all cells were concatenated into a template with time bins of 5 ms. Four different templates were created for four orientations (up, down, left, right) of the C using the same procedure. We then simulated 10000 trials with random orientations, each of which was decoded and compared with its ground-truth orientation. For each trial, the number of spikes in each time bin was simulated as a Poisson process with its mean matching the spike rate in the bin in the corresponding template. The generated spiking pattern for each trial was correlated to all templates, and the one with the highest Pearson's r was considered the decoded orientation. Decoding accuracy was taken as the ratio of correctly decoded trials to the total. For the four LE retinas, the cell counts were 49, 49, 21, and 20; for the four RCS retinas, the cell counts were 19, 14, 13, and 9. To study the effect of the number of cells on decoding accuracy, we fixed the size of the C at 14 pixels. To study the effect of C size and number of flashes, we included all cells on each retina into the decoder.
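The template-matching decoder can be illustrated with the sketch below. Each orientation has a template of mean spike counts (cells × six 5 ms bins, concatenated), a trial is drawn bin by bin from a Poisson distribution around the true template, and the decoded orientation is the template with the highest Pearson's r. The templates here are random numbers standing in for measured responses, and the trial count is reduced for speed; these, along with the function names, are assumptions for the example.

```python
import math
import random

N_BINS_PER_CELL = 6  # 30 ms response window in 5 ms bins
ORIENTATIONS = ("up", "down", "left", "right")

def poisson_sample(rng, mean):
    """Draw a Poisson count with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # e.g. a trial with no spikes carries no information
    return cov / math.sqrt(vx * vy)

def decode(trial, templates):
    """Return the orientation whose template correlates best with the trial."""
    return max(templates, key=lambda name: pearson(trial, templates[name]))

def decoding_accuracy(templates, n_trials, seed=0):
    rng = random.Random(seed)
    names = list(templates)
    correct = 0
    for _ in range(n_trials):
        truth = rng.choice(names)
        trial = [poisson_sample(rng, m) for m in templates[truth]]
        correct += decode(trial, templates) == truth
    return correct / n_trials

def toy_templates(n_cells, seed=1):
    """Hypothetical templates: random mean counts in [0, 1) per bin."""
    rng = random.Random(seed)
    return {o: [rng.random() for _ in range(n_cells * N_BINS_PER_CELL)]
            for o in ORIENTATIONS}

acc_few = decoding_accuracy(toy_templates(2), n_trials=500)
acc_many = decoding_accuracy(toy_templates(40), n_trials=500)
```

Even with identical per-bin Poisson variability, the decoder built from 40 toy cells is far more accurate than the one built from 2, which illustrates why the number of responding cells, rather than single-cell noise, can dominate ensemble decoding accuracy.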

Single-cell response modelling
Each selected RGC recorded on an MEA was fitted with both an LN model [15] and a CNN model [19]. The 30 min long recordings were split into a train-test-discard ratio of 20/20/60. After model training, both spikes in the test data and the model predictions were broadened with a Gaussian filter of σ = 2 white noise frames. The filtered test data were then correlated with the filtered model predictions using Pearson's correlation coefficient.
For the natural response of the healthy retina, the LN model fitted to levels similar to previous reports in salamander and rat retinas (correlation in the range of 0.3) [19,22]. The CNN model fitted much better to the spike trains elicited by visual stimulation in ON and OFF cells (correlation of about 0.6, figure 5(a) top), agreeing with earlier studies in the salamander retina [23]. However, the predictions of both models correlated significantly worse with retinal responses to electrical stimulation of the healthy or degenerate retina (figure 5(a), center and bottom). Across a population of cells and multiple retinas (n = 4 each), CNN fits to electrical data reached a correlation of only ∼0.15, significantly lower than 0.6 for the LE visual response (figure 5(b)). The LN model fitted distinctly better to electrical OFF cells than ON cells in LE retinas (p < 10⁻⁷, two-sample t-test), while the CNN model fitted with less discrepancy between the two cell types (p = 0.013). For electrical ON cells, the CNN model fitted significantly better than the LN model (p < 10⁻⁹), but the same cannot be said for electrical OFF cells. In RCS retinas, correlation with the CNN model was similar to the LE retina under electrical stimulation (figure 5(b)). However, correlations with the LN model were far worse for RCS OFF cells than for LE electrical OFF cells, while that for electrical ON cells remained similar.

Noise estimation in RGC firing
To characterize the noise of each RGC, we computed a characteristic correlation curve (CCC) and its corresponding area under the curve (AUC) for each cell and its STA according to equations (3)-(5) (see section 2). If a cell responds perfectly only to one single type of stimulus with no spontaneous firing, the AUC would be 1 and the CCC would be a flat line at correlation equal to 1. If the cell is firing strictly spontaneously, the CCC would resemble the noise curve in figure 6(a). Figure 6(a) illustrates the CCCs of three example cells, as well as the CCC for an STA generated from randomly sampled white noise frames, which is described by a square root dependence on the normalized number of spikes. The CCC for RCS is the closest to the noise curve and has the lowest AUC, followed by LE electrical. The distribution of AUCs across the cell population in retinas, shown in figure 6(b), confirms that RCS responses were the noisiest, followed by the LE electrical responses. By replacing spikes in LE visual responses with randomly timed spikes (see section 2), we can generate a family of CCCs with various NRs (figure 6(c)). At a certain NR, the noise-injected visual CCC matches that from electrical responses. Compared to the LE visual response with median AUC, the NR was 55.3 ± 22.5% and 78.2 ± 6.5% for LE electrical and RCS, respectively (figure 6(d)). All RCS responses were at least 55% noisier than natural.

Ensemble encoding
To evaluate the efficacy of ensemble encoding of visual information, we simulated the flashing of Landolt-C patterns, and the resultant spiking patterns of RGCs on a retina. The spiking patterns of all included cells on the same piece of retina were used to train a decoder (see section 2). Afterwards, with test trials, we compute the accuracy of decoding the orientation of a displayed Landolt-C.
Accuracy of decoding the orientation of the Landolt-C rises with an increasing number of recruited cells (figure 7(b)). Since the cells were ranked by their independent decoding accuracies, the first few cells contributed to the faster rise in accuracy. In addition, beyond the first few, recruited cells started carrying redundant information, which improved accuracy with diminishing returns. Such a trend is generally observed in decoding the ensemble of neural signals for applications in brain-machine interfaces [24]. Notably, neither the presence of photoreceptors nor the stimulation type affected the decoding accuracy significantly, despite the spiking being much more stochastic under electrical stimulation, as discussed previously. Decoding accuracy also rises steeply with increasing C size until it reaches 4-5 pixels (figure 7(b)), where the accuracy flattens out because the gap in the C now exceeds one pixel, and hence it is fully resolved. The asymptotic level of accuracy was determined primarily by the number of recorded cells in the retina. For example, retinas with 49 recorded cells can reach 60%-80% accuracy, while only 50% can be achieved with 19-21 cells (figure 7(c)).
Increasing the number of stimulus presentations also increased the decoding accuracy ( figure 7(d)). Therefore, to compensate for fewer cells responding in RCS retinas, more presentations are required to accumulate the same amount of information for the image decoding. To achieve 75% accuracy in decoding the orientation of letter C, LE retinas required 2-3 flashes of the image, while the RCS retinas needed 13 presentations.

Discussion
The fact that predictive retinal models perform worse for RCS under electrical stimulation than for LE retina under visual stimulation is not surprising and can be explained by several factors, including the increased spontaneous firing rate in the degenerate retina [25,26], likely due to higher uptake of retinoic acid [27]. There is evidence that even in healthy mice retinas, intrinsic variability of the RGCs response to network-mediated electrical activation is higher than with natural visual stimulation [28]. Replicability and reliability of the network-mediated stimulation also decreases with retinal rewiring during degeneration [29]. In addition, fewer and weaker responding cells in the RCS retina compared to normal could be expected both in-vivo and ex-vivo due to limited penetration depth of electric field into the inner nuclear layer (INL), especially with residual debris separating the degenerate retina from the stimulating array.
Our model predictions for the network-mediated stimulation of RCS retina are worse than in other studies that directly stimulate RGCs, either electrically [22,30] or optogenetically [31]. Direct stimulation of RGCs bypasses many noise sources, such as bipolar and amacrine cells, connecting synapses, and any pathological rewiring of the retina, and therefore each RGC can be more reliably modeled. The tradeoff for this reliability is the loss of inherently preserved retinal signal processing, such as antagonistic center-surround organization [8] and subunit summation [3]. As demonstrated in our ensemble encoding simulation and evidenced in clinical studies [4], it appears that for pattern perception, the ensemble encoding is more relevant than reproducibility of the individual cell responses.
On the other hand, we found that the response of the healthy retina to electrical stimulation is much noisier than natural. Also, the number of electrically activated RGCs is lower than natural by a factor of ∼3-4. This might be related to the difference in mechanisms of natural and electrical activation of the photoreceptors. In natural vision, due to the rather slow phototransduction cascade, a millisecond flash causes photoreceptor hyperpolarization for tens of milliseconds [32]. Under electrical stimulation, however, membrane potential is affected directly, and therefore it closely follows the electrical pulse duration (<10 ms), much shorter than the natural response. Another factor might be related to the fact that our experiments were performed in the dark. Since the dark-adapted photoreceptors are depolarized, further depolarization of the terminals by the electric field is quite limited, effectively restricting the dynamic range. Both factors likely contribute to the lower-than-natural SNR and fewer activated cells when the healthy retina is stimulated electrically.
The cell count is further reduced when we compare across healthy and degenerate retinas, even when both are under electrical stimulation. Within the implant boundary, the smaller number of RCS RGCs and their corresponding spotted distribution of RFs may be due to multiple reasons. First, the limited penetration depth of the electric field has a stronger effect on the INL than on photoreceptors. Photoreceptor outer segments all reached the implant, allowing electrical access to all cells. However, cell bodies in the INL are staggered in depth, so the deeper bipolar cells might not reach the stimulation threshold. Second, since the RCS rat models retinitis pigmentosa (RP), significant retinal rewiring could have resulted in less-responsive or even non-responsive regions. The retina in AMD patients is unlikely to rewire to the same extent as in end-stage RP, and hence we can expect a higher cell count and a more complete mosaic in clinical testing. Third, the thinner than normal RCS retina is more prone to mechanical damage during tissue preparation and mounting. Fourth, to avoid mechanical damage, we elected to remove vitreous less vigorously, which may have left some residual vitreous and debris that impeded electrical coupling to the MEA. These last two reasons would cause a lower cell count ex-vivo than in clinical practice. With the SNR >3 selection criterion, it is likely that a fraction of ON-OFF cells was excluded because the ON and OFF responses counteracted and canceled or weakened each other in STA computation. In addition, since we did not further sub-classify cells beyond ON and OFF, it is likely that some direction-selective cells are included, although for electrical responses, it is unclear whether direction selectivity is preserved.
By construction, under a radially symmetric stimulus, the STA of an RGC is the first-order term in the Wiener kernel series expansion of the cell's response function [33]. Therefore, the LN model can be considered a single-filter approximation, while the CNN model can fit better due to inclusion of multiple linear filters and a better approximation of nonlinearities [34]. Indeed, previous studies have shown that CNN models fit markedly better than the LN model in the salamander retina [19], which we also observed here in the healthy rat retina under visual stimulation. Surprisingly, there is little difference between the two models fitted to LE retina OFF cells stimulated electrically, indicating that the responses were predominantly single-filter ( figure 5(b)). An interpretation is that OFF cell responses can be described nearly completely using only a single RF, which means these cells only respond to a limited subspace of stimuli. Consequentially, these RGCs may fail to respond to certain classes of spatial patterns that require subunit computation, such as null stimuli [35], where the linear filter of a cell is scaled and subtracted away from an otherwise response-inducing white noise stimulus. Computationally, LN is a simpler model that captures lower-order computations, while CNN can learn a better higher-order representation, if it exists. The disparity between LN and CNN models for RCS data suggests that even the degenerate retina retains some degree of higher-order computation. For example, from the response to alternating grating (both ex-vivo and in-vivo), we know that nonlinear summation of subunits occurs also in the degenerate retina [3], but whether the number of computational subunits for each RGC matches that of the healthy retina remains unknown.
It is worth noting that the correlation we compute here is not conventional. Commonly, comparing a model prediction with experimental data requires some form of repeated stimulus, for example a 30 s white noise movie repeated ten times. The average response is then correlated with the model prediction. In our case, however, we did not have repeated stimuli. As a workaround, we applied a Gaussian filter to broaden the spikes recorded over 6 min of non-repeating white noise. If cells spike with perfect timing and repeatability, the correlation should resemble that of the conventional method. However, if spike rates and spike times are highly variable, as for RGCs in a degenerate retina, the correlation should be lower than with conventional methods. In addition, without repeated stimuli, we cannot select cells using reliability-based selection criteria [22]. Instead, we selected cells with SNR above 3 in the STA time course, which may include noisier cells. Consequently, the absolute correlation values reported here are not directly comparable with those of previous studies [22,23].
In ensemble encoding, we made two important assumptions. First, the input strength (w s) was calculated with linear weights extracted from the binary white noise. Since pixels in the stimulus are spatiotemporally independent, the resulting trained weights are generally biased against spatiotemporally correlated stimuli, such as long straight edges and bars, drifting objects, and even natural scenes [36]. Therefore, the current method underestimates the accuracies in the LE retina. It is unknown whether directional sensitivity remains intact in the degenerate retina or how many nonlinear subunits exist under electrical stimulation, so the accuracy curves for the RCS retina in figure 7(b) may or may not be underestimated. Second, the letter C was placed at the same location over the five frames in which it was displayed. Normally, with microsaccades, a visual pattern can be displaced by ∼70 μm between frames presented at 20-33 Hz [37]. Since RFs form a tightly packed mosaic, we assumed translational symmetry in the response, i.e. no matter where the stimulus is displayed on the retina, the retinal output carries an equivalent amount of information. This assumption cannot be made if the retina is foveated, as the density of photoreceptors varies with eccentricity. However, since the rat retina is not foveated, we adopted this assumption. As a result, the current analysis assumes that the effect of eye movements on the amount of information available for pattern identification is well approximated even without moving the letter C.
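The Gaussian-broadening workaround described above for correlating non-repeated recordings with a model prediction can be sketched as follows. This is an illustrative toy and not the pipeline used in this study: the filter width (5 bins), the synthetic predicted rate, and the two synthetic spike trains are all assumptions. It only shows why variable spike timing lowers the correlation obtained this way.

```python
import math
import random


def gaussian_kernel(sigma_bins, truncate=3.0):
    """Normalized Gaussian weights truncated at +/- truncate * sigma."""
    half = int(truncate * sigma_bins + 0.5)
    weights = [math.exp(-0.5 * (i / sigma_bins) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(binned_spikes, sigma_bins):
    """Broaden a binned spike train by convolving it with a Gaussian."""
    kernel = gaussian_kernel(sigma_bins)
    half = len(kernel) // 2
    n = len(binned_spikes)
    out = [0.0] * n
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < n:  # bins outside the recording contribute nothing
                acc += w * binned_spikes[idx]
        out[t] = acc
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


n_bins = 2000
# Hypothetical model prediction: a half-rectified sinusoidal firing rate.
predicted_rate = [max(0.0, math.sin(2 * math.pi * t / 100))
                  for t in range(n_bins)]

# "Reliable" cell: spikes exactly where the predicted rate is near its peak.
reliable = [1 if r > 0.9 else 0 for r in predicted_rate]

# "Variable" cell: same spike count, but spike times drawn at random.
rng = random.Random(0)
variable = [0] * n_bins
for t in rng.sample(range(n_bins), sum(reliable)):
    variable[t] = 1

corr_reliable = pearson(smooth(reliable, 5.0), predicted_rate)
corr_variable = pearson(smooth(variable, 5.0), predicted_rate)
```

Under these assumptions, the precisely timed train yields a much higher correlation with the prediction than the randomly timed train with the same spike count, consistent with the expectation that highly variable cells score lower with this single-trial measure.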
Single-cell SNR played little role in ensemble encoding of the visual information. As demonstrated in figure 7(b), all retinas had similar accuracies, even though LE RGCs under visual stimulation had far better SNR. One reason might be that single-cell SNR, as calculated, does not limit the amount of information propagating downstream if the encoded visual signals are orthogonal to the major noise eigenmodes [38]. Since visual information is distributed across the retina, the more cells recruited for decoding, the higher the accuracy. Unlike natural visual stimulation, electrical stimulation affects bipolar cells more strongly if they reside near the electrode surface, and hence fewer RGCs responded than under natural stimulation. To compensate for the reduced amount of visual information transmitted, the stimulus needs to be replayed multiple times. This may explain the longer time patients require to recognize letters and other patterns in clinical trials [4].
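The trade-off between the number of responding cells and the number of stimulus presentations can be illustrated with a toy simulation. This is a sketch under strong assumptions and not the decoder used in this study: cells have random Gaussian tuning to each stimulus class, noise is independent Gaussian, and classification uses a nearest-template rule in place of a trained linear decoder. All names and parameter values are hypothetical.

```python
import random


def decode_accuracy(n_cells, n_repeats, noise_sd=3.0,
                    n_classes=4, n_trials=400, seed=0):
    """Fraction of trials in which a nearest-template decoder picks the
    correct class from a noisy population response averaged over repeats."""
    rng = random.Random(seed)
    # Hypothetical mean response of each cell to each stimulus class.
    tuning = [[rng.gauss(0.0, 1.0) for _ in range(n_cells)]
              for _ in range(n_classes)]
    correct = 0
    for _trial in range(n_trials):
        true_class = rng.randrange(n_classes)
        # Average the noisy population response over repeated presentations.
        mean_resp = [0.0] * n_cells
        for _rep in range(n_repeats):
            for c in range(n_cells):
                mean_resp[c] += tuning[true_class][c] + rng.gauss(0.0, noise_sd)
        mean_resp = [v / n_repeats for v in mean_resp]

        # Nearest-template rule: choose the class whose mean response is
        # closest (squared Euclidean distance) to the observed response.
        def distance(k):
            return sum((mean_resp[c] - tuning[k][c]) ** 2
                       for c in range(n_cells))

        guess = min(range(n_classes), key=distance)
        correct += int(guess == true_class)
    return correct / n_trials


acc_few_single = decode_accuracy(n_cells=5, n_repeats=1)
acc_few_repeated = decode_accuracy(n_cells=5, n_repeats=8)
acc_many_single = decode_accuracy(n_cells=40, n_repeats=1)
```

In this toy model, averaging over repeated presentations of the same stimulus improves the accuracy of a small population, much as recruiting more cells does for a single presentation. This is only meant to convey the qualitative argument made above; the actual accuracy curves depend on the recorded cells and the decoder used.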
As in the Sloan font, the gap in a Landolt-C is 1/5 of the letter size [39]. For letter sizes smaller than 4 pixels, the gap is not fully resolved but is encoded as a shade of grey different from the rest of the ring, which led to lower identification accuracy. Once the gap is fully resolved, i.e. for letter sizes greater than 5 pixels, decoding accuracy remains relatively stable. This indicates that with the pixel size used in these studies (70 μm), the limiting factor in resolution lies strictly at the implant level (pixel size) rather than at the biological level (subunit size). Essentially, prosthetic vision can resolve spatial features down to the pixel size with stable accuracy, which matches our previous in-vivo measurements [3].
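The gap arithmetic behind this argument can be written out in a few lines. The sketch below assumes that "fully resolved" means the gap spans at least one whole pixel, which is a simplification introduced here; the function names are hypothetical, and the 70 μm pixel size is the value quoted above.

```python
def gap_in_pixels(letter_size_pixels):
    """Landolt-C gap width in pixels: the gap is 1/5 of the letter size."""
    return letter_size_pixels / 5


def gap_is_fully_resolved(letter_size_pixels):
    """Assumption: the gap is fully resolved once it covers >= 1 pixel."""
    return gap_in_pixels(letter_size_pixels) >= 1.0


def gap_in_micrometers(letter_size_pixels, pixel_size_um=70.0):
    """Gap width on the retina for a given pixel pitch (default 70 um)."""
    return gap_in_pixels(letter_size_pixels) * pixel_size_um


# Gap width for a range of letter sizes, in pixels and in micrometers.
gap_table = {
    size: (gap_in_pixels(size), gap_in_micrometers(size),
           gap_is_fully_resolved(size))
    for size in range(2, 9)
}
```

With these definitions, a 4-pixel letter has a gap of 0.8 pixel, which can only be rendered as an intermediate grey level, while a 5-pixel letter has a gap of exactly one pixel (70 μm at this pixel pitch).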
In conclusion, we found that LN and CNN models matched the RGC activity elicited by subretinal electrical stimulation less accurately than natural responses, likely due to the weaker-than-natural response and higher spontaneous firing in the degenerate retina. Despite the noisier signal, visual information is still encoded across the ensemble of cells in the retina, which allows patients to perform visual discrimination tasks, albeit more slowly than with natural vision because fewer RGCs respond.
Number of filters and their dimensions for each convolution block and for each stimulation type.