Light-Field Microscopy for Optical Imaging of Neuronal Activity: When Model-Based Methods Meet Data-Driven Approaches

Understanding how networks of neurons process information is one of the key challenges in modern neuroscience. A necessary step towards this goal is the ability to observe the dynamics of large populations of neurons over a large area of the brain. Light-field microscopy (LFM), a type of scanless microscope, is a particularly attractive candidate for high-speed three-dimensional (3D) imaging. It captures volumetric information in a single snapshot, allowing volumetric imaging at video frame-rates. Specific features of imaging neuronal activity using LFM call for the development of novel machine learning approaches that fully exploit priors embedded in physics and optics models. Signal processing theory and wave-optics theory could play a key role in filling this gap, and contribute to novel computational methods with enhanced interpretability and generalization by integrating model-driven and data-driven approaches. This paper provides a comprehensive survey of state-of-the-art computational methods for LFM, with a focus on model-based and data-driven approaches.


Introduction
One of the key goals of neuroscience is understanding how networks of neurons in the brain process information. Achieving this goal requires the ability to capture the dynamics of large populations of neurons at high speed and resolution over a large area of the brain. Whilst there are many viable techniques for observing brain activity, optical imaging with fluorescent indicators is a popular strategy for recording the activity of neurons owing to its high spatial resolution and potential scalability.
A fluorescent indicator transduces biophysical changes into changes in fluorescence. The most commonly used indicators in functional neuroimaging are those that respond to changes in membrane potential or calcium concentration. Although calcium indicators monitor neuronal membrane potential only indirectly, they are most commonly used because their slow transients and high signal amplitude make them easier to detect than membrane potential indicators. In particular, due to their slow decay, calcium indicators can be used with diffraction-limited point scanning techniques and single-pixel detectors that collect the emitted fluorescence photons irrespective of the path they take to reach the detectors, as shown in the leftmost sub-figure in Fig. 1. This leads to multi-photon microscopy, an imaging approach robust to scattering and well suited for imaging deep in tissue. However, this approach inherently results in a low temporal resolution due to the use of point scanning techniques, and is therefore ill-suited to capturing fast biological dynamics.
Neurons communicate through electrical impulses lasting around 1 millisecond and repeated at rates up to hundreds of Hertz. Therefore, the study of neural information coding requires high temporal resolution imaging methods. In recent years, we have witnessed substantial progress in fluorescence microscopy imaging, in particular due to improved fluorescent indicators of neuronal activity and to the use of alternative scanning strategies for faster acquisition [1,2]. For example, scanning with lines or sheets, instead of points, speeds up acquisition through spatial parallelization (Box 1 and Fig 1) [1]. Furthermore, scanless, whole-volume illumination (so-called "wide-field" illumination) is the most efficient illumination scheme, maximizing photon generation rates and imaging speeds. This speed increase, however, comes at the cost of increased background interference since out-of-focus light appears in the in-focus image, reducing contrast. Interestingly, light-field microscopy (LFM) can exploit this out-of-focus fluorescence by reassigning photons back to their correct 3D locations. This makes LFM a particularly attractive candidate for high-speed three-dimensional (3D) bioimaging.
The original LFM was designed by Levoy et al. [3], where a microlens array was inserted at the native image plane (NIP) of a widefield microscope to capture a four-dimensional (4D) light-field (including 2D lateral position and 2D angular information), as shown in Fig. 2 and Fig. 3. The angular information in turn relays depth information for volumetric reconstruction. In this way, LFM can capture volumetric information of the incident light in a single snapshot, allowing 3D imaging at video frame-rates. The ability of LFM to image neuronal activity over volumetric regions at video frame-rates is generating tremendous excitement in the neuroscience community with plenty of recent breakthroughs including the imaging of whole-brain neuronal activity in some small organisms (e.g. C. elegans, drosophila, larval zebrafish [4][5][6][7]). Despite its advantages in fast and large-scale 3D imaging, the original LFM has several critical limitations that in the beginning hampered its widespread use, such as compromised spatial resolution, presence of reconstruction artifacts, time-consuming reconstruction, low signal-to-noise ratio and image degradation caused by scattering in thick tissues.
In what follows, we review fundamental aspects of LFM, describe in Box 2 the wave-optics model used in this context [8], and point out current challenges and viable solutions in Section 2. Then, we briefly review the latest progress on LFM optical systems in Section 3, and present a thorough survey of state-of-the-art computational methods developed for improving LFM performance in Section 4. We highlight the key aspects of these computational methods and also the potential gains that one can obtain when model-based priors are incorporated into data-driven approaches. We also offer suggestions on applying various machine learning concepts and techniques to LFM for neuroimaging in Section 5. Finally, we conclude in Section 6.

Background
Capturing the 4D light-field information with a single camera sensor introduces inherent trade-offs between spatial resolution and angular resolution, because the camera pixels now encode 4 dimensions rather than 2. In the context of imaging neuronal activity, the reduced spatial resolution makes faithful reconstruction of the monitored volume from light-field data problematic, and this issue may become more severe due to light scattering in deep tissue, which cannot be mitigated as in raster-scanning, single-detector microscopes. This makes it more difficult to distinguish and localize neurons deep in the tissue. Moreover, some fine neural structures, such as axons, dendrites and synapses, may not be clearly visualized from individual views. Accordingly, neural dynamics recorded from these fine neural structures may suffer from a low signal-to-noise ratio [4,8,10]. Furthermore, the depth-dependent sampling leads to a non-uniform spatial resolution in the reconstructed volumes and can cause square-shaped artifacts in regions near the native object plane (NOP) due to coarse sampling. Finally, the forward process projects the 3D object domain onto the 2D camera plane through convolution with a high-dimensional PSF that varies spatially in both lateral and axial dimensions. Traditional model-based volumetric reconstruction methods often rely on iterative deconvolution to solve this highly complex inverse problem, which incurs a substantial computational cost.
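To make the forward process concrete, the sketch below models the sensor image as a sum of per-depth 2D convolutions with depth-dependent PSFs. This is a deliberate simplification: the true LFM PSF also varies laterally within each microlens period, so assuming a laterally shift-invariant PSF per depth is an idealization (closer in spirit to pupil-plane designs discussed later). All names and shapes here are illustrative, not from the cited works.

```python
import numpy as np
from scipy.signal import fftconvolve

def lf_forward(volume, psfs):
    """Toy LFM forward model: sum of per-depth 2D convolutions.

    volume : (D, H, W) fluorescence density, one slice per depth.
    psfs   : (D, h, w) depth-dependent PSFs, assumed laterally
             shift-invariant per depth (a simplification of the
             true, laterally varying LFM PSF).
    Returns a single 2D sensor image of shape (H, W).
    """
    return sum(fftconvolve(volume[d], psfs[d], mode="same")
               for d in range(volume.shape[0]))

# Tiny example: two depths with Gaussian PSFs of different widths,
# mimicking depth-dependent blur.
yy, xx = np.mgrid[-7:8, -7:8]
psfs = np.stack([np.exp(-(xx**2 + yy**2) / (2 * s**2)) for s in (1.0, 3.0)])
psfs /= psfs.sum(axis=(1, 2), keepdims=True)   # each PSF integrates to 1

vol = np.zeros((2, 64, 64))
vol[0, 20, 20] = 1.0      # near-focus point source
vol[1, 40, 40] = 1.0      # defocused point source
img = lf_forward(vol, psfs)
```

Since each PSF is normalized, the image conserves the total emitted signal, and the near-focus source produces a sharper, higher peak than the defocused one.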
Facing these challenges, two parallel viable solutions can be exploited independently or jointly to address the above issues: (1) to improve the acquisition by designing alternative optical systems, or (2) to improve the reconstruction by designing advanced computational algorithms.
With advanced optical system design, LFM is endowed with desirable properties. For example, it is able to produce diverse sampling patterns by recording the 4D information at the pupil plane instead of the native imaging plane, which improves information acquisition performance. This can be combined with refined selective illumination modalities, such as light-sheet excitation, to provide optical sectioning. These new features contribute to eliminating background interference caused by out-of-focus light and scattering, thus increasing the signal-to-noise ratio, spatial resolution and volume coverage, which further fulfills the potential of LFM for imaging whole-brain neuronal activity in freely behaving organisms.
Aside from improving LFM optical systems, the other alternative is to improve post-processing performance by developing advanced computational methods capable of better exploiting angular information. Furthermore, proper learned or model-based priors can be incorporated into these methods. At present, we are witnessing an emerging trend in which an increasing amount of effort is devoted to developing more efficient and effective computational algorithms for fast volumetric reconstruction, neuron localization and neuronal activity demixing using light-field data. These methods can be generally categorized into two classes: model-based approaches and data-driven approaches. The model-based category attempts to enhance the reconstruction by incorporating additional priors, such as smoothness, spatial and temporal sparsity, low-rankness, the depth-related shearing property and the shift-invariance property in phase-space. In contrast, the data-driven, a.k.a. learning-based, category capitalizes on machine learning to pave the way to advanced computational LFM. In particular, a growing number of researchers are using deep learning approaches to achieve unprecedented performance. Also, the flexibility of the microscope system opens up the possibility of creatively combining model-based methods with data-driven methods to improve the interpretability and credibility of learned models.

Alternative Optical Systems for LFM
Since the first design of LFM proposed in [3], there has been steady technological improvement on the optical system. A variety of advanced LFM systems have emerged to optimize the light-field recording [6,[11][12][13]. These new designs are able to produce diverse sampling patterns to improve acquisition performance [6,11], or able to introduce refined illumination strategies to provide optical sectioning [12,13], or able to combine both benefits [14].
Representative designs that exploit the insight of diverse sampling include eXtended field of view light-field microscopy (XLFM) [6] and Fourier light-field microscopy [11], see also Fig. 4 (a). A common feature of these designs is that a (customized) microlens array is placed at a plane conjugate to the rear pupil plane of the imaging objective, instead of at the NIP as in the original LFM. Such designs, under ideal conditions, exhibit 2D spatially invariant point spread functions (PSFs), which produce diverse sampling patterns and thereby avoid square-shaped artifacts near the focal plane.
Other works that leverage the insight of refined illumination include hybrid light-sheet and light-field microscopy (LSLFM) [12] (see Fig. 4 (b)) and confocal light-field microscopy [13]. Here, instead of using wide-field illumination, LSLFM [12] uses a scanning light-sheet for excitation and a microlens array for light-field imaging. Such a design simplifies the detection by limiting the illumination to the volume of interest. Based on the same optical design as XLFM, i.e. placing a microlens array at the objective's conjugate pupil plane for more diverse sampling, confocal LFM [13] improves the illumination by shaping the excitation laser beam into a plane, which selectively illuminates an axial plane (x-z plane). This enables background-free, fast, volumetric imaging of neural dynamics deep in mouse and zebrafish brains.
Isotropic spatial resolution light-field microscopy (Iso-LFM) [14] proposes a dual-view light-field imaging system that combines the benefits of both refined illumination and diverse sampling. In particular, Iso-LFM [14] combines selective-volume illumination with simultaneous acquisition of orthogonal (perpendicular) light-fields to yield 3D images with high, isotropic spatial resolution and exhibits a significant reduction of reconstruction artifacts, thereby overcoming some current limitations of light-field microscopy implementations (see also Fig. 4 (c)). The selective-volume illumination is implemented by confining the excitation light spatially to the volume of interest, which optimizes signal-to-background contrast and minimizes erroneous reconstruction artifacts from out-of-volume emitters. Then, Iso-LFM detects the emitted fluorescence via two identical objectives placed perpendicular to each other and orthogonal to the illumination objective. This design provides a dual-view capability that enables dual-view data fusion and deconvolution of the simultaneously acquired light-fields. This configuration achieves an effectively isotropic spatial resolution of 2 µm, and also substantially reduces the presence of image planes containing reconstruction artifacts (so-called artifact planes).

Novel Computational Methods for LFM
Advanced optical hardware systems can introduce extra system complexity. Furthermore, changes to the optical system do not in themselves address the computational issues related to reconstructing volumes from light-fields. Therefore, to mitigate or overcome the aforementioned issues, an increasing amount of research is devoted to advanced computational post-processing strategies. At present, we are witnessing an emerging trend of more efficient and effective computational algorithms being developed for fast volumetric reconstruction, neuron localization and signal demixing.
In particular, some of these methods fall into the model-based category, which advances the light-field model by incorporating additional priors, such as smoothness, non-negativity, spatial and temporal sparsity, low-rankness, phase-space priors, etc. [4,7,9,[15][16][17][18][19][20][21]. In contrast, others fall into the data-driven category, which exploits machine learning to achieve more efficient computational solutions. In particular, by virtue of the superior capability of deep neural networks in approximating highly complex functions effectively, more and more algorithms based on deep learning are being proposed for fast and robust volumetric reconstruction and neuron localization [22][23][24][25][26]. These cutting-edge computational approaches have opened new avenues to push the limits of LFM.

Model-based computational approaches

Deconvolution models-Based on the wave-optics model of LFM described in Box 2 [8], a Richardson-Lucy (R-L) 3D deconvolution algorithm was developed for volumetric reconstruction from LFM images, which was further refined in [4].
However, as 3D deconvolution with light-field images is an ill-posed tomographic inverse problem [4], wherein multiple different perspectives of a 3D volume are involved, the iterative R-L deconvolution method has high computational complexity and also tends to generate reconstruction artifacts, especially near the native focal plane. The underlying reason for these artifacts lies in the depth-dependent sampling of LFM; that is, LFM captures a different amount of information at different depths, which is reflected in the PSFs. In particular, the PSF corresponding to the in-focus plane (depth=0, at the NOP) is highly redundant, implying very low angular resolution, and this leads to square-like artifacts during R-L deconvolution. To overcome these limitations, one feasible solution is to impose additional priors on the light-field model to constrain the reconstructed volume to lie in a more appropriate space. A series of studies described below have attempted to improve the efficiency and performance of volumetric reconstruction by incorporating advanced priors into the model.
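The multiplicative R-L update at the heart of these reconstructions can be sketched in a few lines. The toy below is a minimal 2D illustration with a single, spatially invariant, normalized PSF (the real LFM problem involves a depth-varying, high-dimensional PSF, but the update x ← x · Aᵀ(y / Ax) has the same form); all variable names are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(y, psf, n_iter=50, eps=1e-12):
    """Plain Richardson-Lucy deconvolution (2D sketch).

    Update: x <- x * A^T(y / (A x)), where A is convolution with a
    normalized `psf` and A^T is correlation (convolution with the
    flipped kernel). Assumes psf.sum() == 1 so A^T 1 ~ 1.
    """
    psf_flip = psf[::-1, ::-1]
    x = np.full_like(y, y.mean())          # flat non-negative initialization
    for _ in range(n_iter):
        est = fftconvolve(x, psf, mode="same")
        ratio = y / (est + eps)            # data-fidelity ratio
        x = x * fftconvolve(ratio, psf_flip, mode="same")
    return x

# Toy check: blur a point source, then deconvolve it back.
yy, xx = np.mgrid[-5:6, -5:6]
psf = np.exp(-(xx**2 + yy**2) / 4.0); psf /= psf.sum()
truth = np.zeros((32, 32)); truth[16, 16] = 1.0
blurred = fftconvolve(truth, psf, mode="same")
recon = richardson_lucy(blurred, psf, n_iter=100)
```

On noiseless data the iterations progressively sharpen the blurred spot back towards the point source; with real, noisy light-field data the same iterations amplify noise and artifacts, which is what motivates the regularized variants discussed next.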
Stefanoiu et al. [9] propose an aliasing-aware deconvolution method for artifact-free 3D reconstruction, which employs depth-dependent anti-aliasing filters to remove artifacts from the reconstructed volume in each R-L iteration. Such iterative "deconvolution + anti-aliasing filtering" operations lead to the iterative aliasing-aware deconvolution, which enforces smoothness priors in a smoothing expectation-maximization scheme.
Furthermore, Verinaz et al. [15] incorporate more elaborate priors into the model to enforce more faithful solutions. To address the slow speed of the original R-L deconvolution, they also propose a method to simplify the light-field forward model, which significantly reduces the computational complexity. Based on the simplified light-field model and elaborate priors, an Alternating Direction Method of Multipliers (ADMM) optimization strategy is developed to solve the 3D deconvolution problem and to achieve artifact-free volumetric reconstruction.
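The structure of such ADMM-based reconstruction (a data term from a linear forward model plus a prior handled by a proximal step) can be illustrated on a small toy problem. The sketch below solves a generic sparsity-regularized least-squares problem with a dense matrix standing in for the light-field operator; the actual forward model and priors in [15] differ, and all names here are illustrative.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, y, lam=0.1, rho=1.0, n_iter=100):
    """ADMM sketch for min_x 0.5*||Ax - y||^2 + lam*||x||_1.

    x-update: regularized least squares (Cholesky factor reused);
    z-update: proximal step enforcing the prior;
    u-update: dual ascent on the splitting constraint x = z.
    """
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    L = np.linalg.cholesky(AtA + rho * np.eye(n))   # factor once, reuse
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Aty + rho * (z - u)))
        z = soft(x + u, lam / rho)
        u = u + x - z
    return z

# Toy example: recover a 3-sparse signal from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x_true = np.zeros(80); x_true[[5, 30, 70]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = admm_lasso(A, y, lam=0.05, n_iter=200)
```

The design point this illustrates is the one exploited in [15]: once the forward model is simplified so that the x-update is cheap, the prior enters only through a proximal step, making it easy to swap in different regularizers.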

Spatio-temporal factorization models-In certain scenarios, volumetric reconstruction is not the ultimate goal, as it may be of more interest to localize target neurons and extract their functional activity. To identify target neurons and their activity, Nobauer et al. [7] proposed Seeded Iterative Demixing (SID) LFM, which performs localization by adding a segmentation operation to the deconvolved volume. The identified individual neurons yield light-field footprints that aid the subsequent spatio-temporal demixing. Specifically, SID is an iterative source-extraction procedure for scattered LFM data that seeds the inference with information obtained from remnant ballistic light. In this process, instead of frame-by-frame reconstruction of LFM images, SID achieves neuron localization and neuronal activity demixing by performing non-negative matrix factorization on the scattered spatio-temporal (functional) LFM data. The key concepts of this method are illustrated pictorially in Fig. 5.
Considering that sparsity priors can improve reconstruction performance, spatial resolution and SNR, Yoon et al. [16] propose a sparse decomposition LFM. This strategy converts the inherent temporal sparsity of neuronal activity into spatial sparsity of 2D images to achieve, in densely packed samples, the level of resolution expected for sparse samples. In particular, [16] decomposes the light-field time series into a low-rank non-negative component that corresponds to the static part, e.g. background, and a sparse non-negative component that corresponds to the neuronal activity. After the decomposition, high-resolution volume reconstruction can be achieved by applying Richardson-Lucy deconvolution with sparsity regularization to the sparse component rather than to the raw light-field images.
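The low-rank + sparse split can be illustrated on a toy "movie" matrix (pixels × time). The sketch below alternates two proximal steps, singular value thresholding for the low-rank background and soft-thresholding for the sparse activity; the published method additionally enforces non-negativity, which is omitted here for brevity, and all parameter values are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, t):
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def lowrank_sparse(Y, tau=1.0, lam=0.1, n_iter=50):
    """Alternating sketch of Y ~ L (static background) + S (activity).

    Minimizes 0.5*||Y - L - S||_F^2 + tau*||L||_* + lam*||S||_1 by
    alternating proximal updates of L and S.
    """
    L = np.zeros_like(Y); S = np.zeros_like(Y)
    for _ in range(n_iter):
        L = svt(Y - S, tau)
        S = soft(Y - L, lam)
    return L, S

# Toy movie: rank-1 static background + three transient "events".
rng = np.random.default_rng(1)
bg = np.outer(rng.random(100), np.ones(60))          # constant over time
events = np.zeros((100, 60))
events[[10, 50, 90], [5, 30, 55]] = 5.0              # brief activity spikes
Y = bg + events
L, S = lowrank_sparse(Y, tau=2.0, lam=0.5)
```

At the optimum, representing a localized transient in the low-rank term would cost far more nuclear norm than its l1 cost in S, so the activity lands in the sparse component, which is exactly why deconvolving S alone improves resolution and SNR.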

Phase-space models-Phase-space, a.k.a. spatial-angular space or epipolar plane image (EPI) space [18,27], is an appropriate space in which to reveal the structure of the light-field, and this can be used to directly localize neurons and to obtain their functional activity. An example of an EPI obtained from light-field microscopy images is depicted in Fig. 2. One particularly useful property in phase-space is the depth-related shearing principle: each point in the volume traces out a tilted line in phase-space, and the slope of this line is proportional to the depth. This property can also be explained using the phase-space Wigner function, which states that light propagation in space can be represented by a simple shearing operation in phase-space [18].
Compressive LFM: Based on these principles, Liu et al. [19] demonstrate the use of phase-space imaging for 3D localization of multiple point sources inside scattering material. Pegard et al. [20] further extend [19] by presenting a compressive light-field microscopy method which takes advantage of the spatial and temporal sparsity of fluorescence signals to identify and localize each neuron in a 3D volume, with scattering and aberration effects naturally included and without ever reconstructing the volume. Specifically, non-negative matrix factorization is first used to sparsify the raw LFM video data. Once the training data is sparse, each frame can be separated into single sources using the phase-space sparse coding approach [19]. This leads to a "footprint" dictionary composed of each neuron's light-field signature. Then, each new LFM frame is decomposed as a linear, positive superposition of elements of the footprint dictionary. The coefficients of the sparse decomposition are a quantitative measure of the functional fluorescence signals (in this case calcium transients), corresponding to the magnitude of the recorded neuronal activity.
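The factorization step common to SID and compressive LFM can be sketched with plain multiplicative-update NMF on a toy (pixels × time) matrix: columns of W act as per-neuron "footprints" and rows of H as their activity traces. The published methods add seeding, sparsity and phase-space structure on top; this bare sketch, with illustrative data, only shows the core decomposition.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF: V (pixels x time) ~ W @ H, W,H >= 0."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activity traces
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spatial footprints
    return W, H

# Toy data: two non-overlapping footprints with distinct calcium-like
# transients (Gaussian bumps standing in for fluorescence time courses).
f1 = np.zeros(50); f1[5:10] = 1.0
f2 = np.zeros(50); f2[30:35] = 1.0
t = np.arange(40)
a1 = np.exp(-((t - 10) ** 2) / 8.0)
a2 = np.exp(-((t - 25) ** 2) / 8.0)
V = np.outer(f1, a1) + np.outer(f2, a2)
W, H = nmf(V, rank=2)
```

Because the two sources here have disjoint spatial supports, the non-negativity constraint alone is enough to demix them; in scattered LFM data the footprints overlap, which is why the seeding and sparsity machinery of SID is needed.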
Convolutional sparse coding LFM: Along this line, Song et al. [21] propose a convolutional sparse coding LFM model, as shown in Fig 6. Different from compressive LFM [19,20], this work capitalizes on the shift-invariance property of phase-space to perform convolutional sparse coding (CSC) on EPIs with respect to a depth-related EPI dictionary. The selected dictionary elements and the corresponding sparse coefficients help to identify and localize neurons. The phase-space shift-invariance property brings two benefits: 1) it allows the convolution of a large input EPI with relatively small elements from an EPI dictionary, thus reducing the computational complexity; 2) it allows one to consider only the depth range when synthesizing dictionary elements, because the transverse shift is captured by the convolution operation, which reduces the dictionary size. A more realistic and faithful dictionary design contributes to enhanced sparse coding performance, leading to improved robustness to light scattering and improved 3D localization accuracy.

Data-driven computational approaches
In contrast to model-based methods, data-driven approaches attempt to automatically learn models from training data and incorporate the discovered information into models, without requiring prior knowledge. This provides an appealing algorithmic alternative to overcome the shortcomings of model-based methods. Among data-driven approaches, deep neural networks, a.k.a. deep learning, have attracted considerable attention as they provide unprecedented performance and efficiency in a variety of real-world signal and image processing tasks, including image de-noising, de-blurring, super-resolution, etc. To date, a tremendous amount of effort is being invested in applying deep learning to the neuroimaging domain [22][23][24][25][26]. Multiple algorithms have recently been proposed to replace iterative 3D deconvolution by a deep neural network (e.g. [28]) to improve reconstruction speed and quality. We first introduce some representative purely data-driven approaches and then present some interpretable model-inspired data-driven approaches for LFM.

VCD-Net:
Wang et al. [23] adapted U-net to achieve real-time and artifact-free volumetric reconstruction with uniform spatial resolution, referred to as the view-channel-depth network (VCD-Net). Different from [22], which performs R-L deconvolution on raw LFM images to obtain a low-resolution volume and then trains a 3D U-net for super-resolution, VCD-Net does not require the R-L deconvolution step. Instead, it extracts multiple views from raw light-field images and then directly maps them to the high-resolution volume (i.e. 3D image stack) through cascaded convolutional layers. To train the network, synthetic LFM images are generated by applying the forward model in [8] to ground truth high-resolution volumetric data acquired beforehand. In particular, the data preparation procedure consists of obtaining a number of high-resolution 3D volumes of stationary samples using synthetic or experimental methods and then performing light-field projection. Light-field projection converts these high-resolution 3D images into 2D light-field images, which are then paired with the ground truth data for network training. Once the data is ready, VCD-Net is trained to transform input raw light-fields back into 3D volumes, which are compared with the high-resolution ground truth to guide the optimization of the network.

LFM-Net:
A similar concept was adopted in [29] to reconstruct high-quality confocal image stacks from LFM images using a U-net, which is trained on aligned pairs of LFM images and confocal image stacks to learn the direct mapping between them. Different from VCD-Net [23], LFM-Net is trained on real LFM images rather than synthetic ones, and it does not convert raw 2D LFM images into a sequence of views in advance. During inference, LFM-Net achieves fast (10 frames per second, mostly limited by the camera's exposure time of 100 ms) and high-quality (close to confocal imaging under similar optical settings) 3D volumetric reconstruction.

HyLFM-Net:
Although deep learning-based methods empirically demonstrate excellent image reconstruction performance and good generalization ability, no theoretical guarantees on generalization can be given. Out of this concern, Wagner et al. [24] suggest extensively validating and, if needed, retraining the network for each experimental setting. To this end, they developed a hybrid light-field light-sheet microscope, shown in Fig. 7, where the light-field data is used as the input and the aligned high-resolution light-sheet data as ground truth to train the deep network. Moreover, when applying the trained network to new light-field data, new light-sheet data can be continuously acquired, which allows for fine-tuning or retraining of the network on-the-fly if an inconsistency is found during continuous validation. Such a continuous validation mechanism serves as a dynamic calibration to ensure good generalization of the deep network. This is realized by adding a simultaneous, selective-plane illumination microscopy modality (i.e. a light-sheet microscope) to the LFM setup, which continuously produces high-resolution ground truth images of single planes for validation, training or refinement of the CNN. The training can thus be performed both on static sample volumes and dynamically from a single plane that sweeps through the volume during 3D image acquisition.

Model-inspired data-driven approaches-
Despite the good performance of modern deep networks in a variety of tasks, such purely data-driven approaches are also subject to some limitations [30]. Apart from challenges that commonly appear in inverse bio-imaging problems, such as imperfect knowledge of the forward model and lack of ground truth data, deep learning also has its own specific limitations. For example, generic deep networks are often empirically designed and typically adopt a hierarchical architecture composed of many layers and parameters. Although such designs endow deep networks with a tremendous capability of modelling obscure, or even unknown, physical systems, the network structures lack interpretability and involve an excessive number of trainable parameters, which may cause overfitting and degraded robustness and generalizability. Moreover, LFM for imaging neuronal activity has specific features, for example the inherent spatial and temporal sparsity of fluorescence neuronal signals. This calls for the development of novel deep learning methods that are able to fully exploit priors embedded in physical models and in the application considered.

CISTA-net LFM:
Song et al. [26] proposed a model-inspired deep learning approach to perform fast and robust 3D localization of neurons using light-field microscopy images. This is achieved by developing a deep network that efficiently solves a convolutional sparse coding (CSC) problem to map an EPI to corresponding sparse codes that are associated with the 3D locations of target neurons, as shown in Fig. 8:

min over {x_m} of (1/2) ‖y − Σ_{m=1}^{M} d_m ∗ x_m‖₂² + λ Σ_{m=1}^{M} ‖x_m‖₁,

where y is the input EPI, d_m are the depth-related dictionary elements, x_m are the corresponding sparse code maps, M is the number of depths to be covered, ‖·‖₂ is the ℓ2 norm and ‖·‖₁ is the ℓ1 norm. The network architecture is designed systematically by unrolling the Convolutional Iterative Shrinkage and Thresholding Algorithm (CISTA), so that each iteration step of CISTA gives one layer of the network. The complete network is formed by concatenating multiple such layers; a forward pass through the network is therefore equivalent to executing the CISTA algorithm a finite number of times. Note that the network contains domain-specific layers which explicitly exploit physical priors, while the parameters are learned from a training dataset. Such a design enables the network to leverage both the domain knowledge implied in the model and new knowledge learned from the data, thereby combining the advantages of model-based and learning-based methods.
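The unrolling idea can be sketched in NumPy: one "layer" performs a single proximal-gradient (ISTA) step of the CSC objective, a gradient step on the data term followed by soft-thresholding, and stacking layers is equivalent to running a fixed number of iterations. In the learned network the step size, threshold and dictionary become trainable; here they are fixed constants, and the toy dictionary and EPI below are purely illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cista_layer(codes, y, dicts, step, lam):
    """One unrolled CISTA iteration = one network 'layer' (sketch).

    codes : (M, H, W) current sparse code maps, one per depth.
    y     : (H, W) input EPI.
    dicts : (M, h, w) depth-related EPI dictionary elements.
    """
    recon = sum(fftconvolve(codes[m], dicts[m], mode="same")
                for m in range(dicts.shape[0]))
    resid = y - recon                      # data-fidelity residual
    new = np.stack([codes[m] + step * fftconvolve(resid,
                                                  dicts[m][::-1, ::-1],
                                                  mode="same")
                    for m in range(dicts.shape[0])])
    return soft(new, step * lam)           # proximal (shrinkage) step

def cista(y, dicts, n_layers=30, step=0.05, lam=0.05):
    codes = np.zeros((dicts.shape[0],) + y.shape)
    for _ in range(n_layers):              # stacked layers = fixed iterations
        codes = cista_layer(codes, y, dicts, step, lam)
    return codes

# Toy dictionary: a vertical atom (in-focus) and a tilted atom (deeper).
d0 = np.zeros((5, 5)); d0[:, 2] = 1.0; d0 /= np.linalg.norm(d0)
d1 = np.eye(5); d1 /= np.linalg.norm(d1)
dicts = np.stack([d0, d1])

y = np.zeros((32, 32)); y[8:13, 8:13] = d1   # EPI containing one tilted atom
codes = cista(y, dicts)
```

The code map for the tilted atom responds most strongly, and its peak location gives the transverse position, which is exactly how the selected dictionary element (depth) and sparse coefficient location (position) jointly localize a neuron.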

LISTA-net LFM:
Verinaz et al. [25] proposed a novel deep network that combines the light-field wave-optics model with the Learned Iterative Shrinkage-Thresholding Algorithm (LISTA), a well-known deep unrolling/unfolding network designed for efficient sparse coding, as shown in Fig. 9 (a). Their analysis shows that when a LFM image is rearranged into a group of views presented as a 3D array, the LFM physics model can be conveniently approximated by a linear convolutional network. Accordingly, they modified LISTA to take a group of views as the input and to generate a high-resolution volume. At each unfolded iteration, the forward model and its transpose are computed repeatedly to obtain a 3D volume at the output from the original light-field. This provides an effective way to incorporate physics knowledge into the network architecture, as well as to improve its interpretability. Furthermore, inspired by Wasserstein Generative Adversarial Networks (WGANs), an adversarial training strategy is adopted, which makes it possible to train the network under realistic conditions such as lack of labelled data and noisy measurements, as only unlabelled data is needed to compute a properly designed adversarial loss. Fig. 9 (b) shows reconstructed neurons compared with ISRA, an improved version of the R-L deconvolution method.

Ground truth for data-driven approaches
When applying data-driven approaches to map light-field images to high-resolution 3D volumes, the ground truth can be provided by other high-resolution imaging modalities, including but not limited to confocal, light-sheet and multi-photon microscopy images [24,29]. However, due to the high cost of obtaining ground truth, data-driven methods commonly lack labelled data. This makes it challenging to directly exploit supervised learning methods for training, and it may also cause overfitting and reduced generalization capabilities. To mitigate the impact of the lack of ground truth data, other learning techniques and strategies, such as transfer learning, adversarial training, semi-supervised learning and unsupervised learning, can be adopted to develop feasible solutions. For example, one can use well-founded optics models to generate a large amount of synthetic data to train data-driven models [23,26]. The trained models can then serve as a good initialization for further fine-tuning on sparsely labelled real data. Alternatively, domain adaptation [31][32][33] can be exploited to learn domain-invariant feature representations by minimizing a distance metric, e.g. the maximum mean discrepancy (MMD), or by minimizing an adversarial loss between the source (synthetic) and target (real) distributions. In this way, a model trained on the labelled source data can be directly applied to the target domain.
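As an illustration of the distance-metric route, the sketch below computes a (biased) squared MMD with an RBF kernel between two feature batches; a domain-adaptation loss would add this term to the training objective so that features of synthetic (source) and real (target) light-fields become indistinguishable. The feature dimensions and data here are synthetic placeholders.

```python
import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    """Biased squared maximum mean discrepancy with an RBF kernel.

    X, Y : (n, d) feature batches from the two domains.
    Returns E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)], which is ~0 when
    the two batches come from the same distribution.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same_a = rng.standard_normal((200, 8))       # "source" features
same_b = rng.standard_normal((200, 8))       # same distribution
shifted = rng.standard_normal((200, 8)) + 2.0  # domain-shifted features
close = mmd_rbf(same_a, same_b)
far = mmd_rbf(same_a, shifted)
```

Minimizing such a term over the feature extractor's parameters (with X from synthetic and Y from real light-field data) is the MMD variant of domain adaptation mentioned above; the adversarial variant replaces the kernel distance with a learned discriminator loss.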

Efficient data processing and model training
Due to its video-rate imaging speed, LFM can generate large amounts of data, which poses a severe challenge for real-time processing and analysis of neural dynamics. Fortunately, owing to their much higher inference efficiency compared with iterative approaches, deep learning models are a promising avenue for real-time post-processing and analysis, such as volumetric reconstruction, 3D localization and neural activity demixing [23][24][25][26],[29].
To train a compact model efficiently, knowledge distillation [34][35][36] concepts can be tailored to LFM-based neuroimaging. Taking the depth localization of neurons using LFM as an example, this problem can be formulated as a multi-class, multi-label classification problem after converting the original light-field into an Epipolar Plane Image (EPI), a particular spatio-angular feature in phase-space [26]. Different from common classification tasks, this task has the specific feature that the hard labels are well-structured. In particular, for an arbitrary class label, its adjacent/neighbouring class labels exhibit high correlation and coherence. This leads to a Gaussian-shaped group structure that encodes the inter-class relationships, similar to the class probabilities represented by a softened teacher model's logits. Based on this observation, [26] proposes to exploit this specific prior knowledge and to directly construct the soft labels (i.e. class probabilities) by convolving the hard labels (i.e. ground truth) with a Gaussian kernel, followed by normalization.
In this way, the Gaussian kernel plays the role of a "temperature" scaling function as in knowledge distillation. It effectively smooths the probability distribution to reveal inter-class relationships identified by human experts, who serve as the teacher model for this task. A larger kernel width corresponds to a higher "temperature", allowing the model to pay more attention to inter-class correlations. The soft-labels are then used to compute the loss function, more specifically, the distillation loss. Soft-labels also benefit training: the incorporated inter-class relationships provide additional guidance that accelerates training and enforces group sparsity in the network's predictions.
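The soft-label construction described above can be sketched as a normalized convolution of one-hot (possibly multi-label) depth labels with a Gaussian kernel. The class count and kernel width below are illustrative values, not those of [26]:

```python
import numpy as np

def gaussian_soft_labels(hard_labels, num_classes, sigma=1.5):
    """Convolve one-hot depth labels with a Gaussian kernel, then normalize,
    so that neighbouring depth classes receive correlated probability mass.
    A larger sigma acts like a higher distillation "temperature"."""
    one_hot = np.zeros(num_classes)
    one_hot[np.asarray(hard_labels)] = 1.0
    radius = int(4 * sigma)
    taps = np.arange(-radius, radius + 1)
    kernel = np.exp(-taps ** 2 / (2 * sigma ** 2))
    soft = np.convolve(one_hot, kernel, mode="same")
    return soft / soft.sum()       # valid probability distribution

# Two neurons, at depth classes 10 and 30, out of 51 depth planes.
soft = gaussian_soft_labels([10, 30], num_classes=51, sigma=1.5)
```

The resulting `soft` vector peaks at the true depth classes and decays smoothly over their neighbours, and would replace the hard one-hot targets in the distillation loss.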

Conclusion
In this article, we provided an extensive review of the latest progress on LFM for imaging neuronal activity. Advanced optical systems focus on improving information acquisition through enhanced sampling diversity and refined illumination strategies, while advanced computational algorithms attempt to improve post-processing performance by establishing more powerful models equipped with predefined or learned priors. Both strategies, and their combinations, promise to provide an unprecedented capability for imaging neuronal activity at high resolution and speed across large volumes. We envision that more creative applications of deep learning and related data-driven approaches will be conceived and implemented for light-field microscopy. As computational algorithms become an increasingly essential component of post-processing, we will witness further untapped potential and more widespread use of LFM.

Box 1
Sequential versus parallel imaging modalities.

Fig. 1.
Illustration of imaging modalities categorized by acquisition mode. In a sequential acquisition, as shown in the first sub-figure, a focal spot is scanned across one or more dimensions in space to cover the entire volume. Consequently, the location of the signal is determined by the instantaneous position of the excitation beam, and a point detector can be used to collect emitted fluorescence photons irrespective of the path by which they reach the detector. Such imaging modalities confer robustness to light scattering and are therefore well-suited for deep-tissue imaging, but at the cost of reduced temporal resolution. In parallel acquisition modes, as shown in the other three sub-figures, some or all voxels are recorded simultaneously on a camera sensor array, which enables higher volume rates. In particular, light-sheet imaging scans the light-sheet excitation plane to quickly build up the whole volume plane by plane. In contrast, light-field imaging enables simultaneous recording of all voxels within a volume: fluorescence generated throughout the volume is captured through a microlens array that encodes both position and angular information simultaneously. However, unlike point-scanning modalities, parallel acquisition modes are inherently vulnerable to scatter-induced crosstalk between neighbouring camera pixels. See also [1] for a description of different imaging modalities.

Box 2
Light-field microscopy and its wave-optics model.
The microlens-based light-field imaging system in Fig. 2 aims to transform the light-field from the world space into the image space of the main lens, thereby sampling the light-field at the sensor plane. Each lenslet with its underlying group of pixels forms an in-camera sampling scheme, analogous to a tiny camera with very few pixels, that observes the in-camera light-field. The diffraction pattern generated by an ideal point source propagated through an optical system is the system's impulse response, commonly referred to as the point spread function (PSF). A wave-optics forward model has been developed in [8], in which each lenslet of the MLA acts as a thin phase mask with transmittance φ(x) = exp(−iπ‖x‖₂² / (λ f_MLA)), and Fresnel propagation of the masked wavefront to the sensor predicts the light-field PSF h(x, p). Alternatively, for arbitrary distances between the MLA and the sensor, a more accurate Rayleigh-Sommerfeld diffraction solution can be used to predict h(x, p) [9].
The light-field PSF has a complex and translation-variant pattern which depends on the specific 3D position of the point source. Thus, the image formation cannot be modeled as a convolution of a scene with a single PSF, as is commonly done in conventional image formation models. Instead, the wavefront recorded at the sensor plane is described using a more general linear superposition integral (4) and a corresponding discretized version (5).
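In discrete form, this superposition reduces to a linear system f = Hg in which each column of H holds the vectorized intensity PSF of one voxel. A toy NumPy sketch follows; the dimensions and random PSFs are purely illustrative placeholders, not a calibrated optical model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_voxels = 256, 40          # sensor pixels, volume voxels

# Columns of H: vectorized intensity PSF |h(., p)|^2 of each voxel p.
# Here they are random; in practice they come from the wave-optics model.
H = rng.random((n_pixels, n_voxels)) ** 2

g = np.zeros(n_voxels)                # sparse volume of emitters
g[[5, 22]] = 1.0                      # two active neurons

f = H @ g                             # sensor-plane intensity pattern
# Reconstruction then amounts to inverting f = H g, e.g. with
# Richardson-Lucy deconvolution or sparsity-regularized solvers.
```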
f(x) = ∫ |h(x, p)|² g(p) dp,  (4)

where f(x) and f denote the continuous and discrete 2D intensity patterns at the sensor plane, and p ∈ ℝ³ is the position in a volume containing isotropic emitters whose combined intensities are distributed according to g(p). The discretized counterpart reads f = Hg, (5), with each column of H holding the sampled intensity PSF of one voxel.

(a) The Fourier LFM places a microlens array at the rear pupil plane of the objective, instead of the NIP. (b) LSLFM [12] uses a scanning light-sheet for excitation and a microlens array for light-field imaging. (c) Isotropic spatial resolution light-field microscopy (Iso-LFM) [14] is a dual-view light-field imaging system that combines selective-volume illumination with simultaneous acquisition of orthogonal (perpendicular) light-fields.

Convolutional sparse coding LFM capitalizes on the shift-invariance property in the phase-space to perform convolutional sparse coding (CSC) on EPIs with respect to a depth-related EPI dictionary. The selected dictionary elements and corresponding sparse coefficients help to identify and localize neurons. Reproduced from [21].

Purely data-driven approaches, e.g. deep learning for light-field imaging, usually map a light-field image or its multi-view images to a volume after training on LFM images with another high-resolution imaging modality, such as light-sheet microscopy images, used as the ground truth. Redrawn from HyLFM-Net [24].

(a) The architecture of the network is designed systematically by unrolling the convolutional Iterative Shrinkage and Thresholding Algorithm (CISTA) to efficiently solve a convolutional sparse coding (CSC) problem. The goal is to map an Epipolar Plane Image (EPI) to corresponding sparse codes associated with the 3D locations of target neurons. (b) Knowledge distillation concepts are tailored for the task to train a compact model efficiently.
In particular, after formulating the localization problem as a classification problem in the phase-space, prior knowledge such as the Gaussian-shaped group structure and inter-class relationships is effectively incorporated into the constructed soft-labels to achieve knowledge distillation from experts to the model. (c) Left: LFM images of a neuron at 3 different depths (−15, 0, +15 μm) and corresponding EPIs with background removed using matrix factorization. Right: Comparison of sparse coding and depth detection performance of convolutional sparse coding LFM [21] and CISTA-net LFM [26]. The data-driven approach [26] obtains sparse codes of higher quality, which leads to more accurate localization results.

Fig. 9. A model-inspired data-driven approach, LISTA-net LFM [25], for volumetric reconstruction using LFM data. LISTA-net LFM combines deep learning with the light-field wave-optics model to achieve fast, high-resolution 3D reconstruction from LFM data.
(a) The architecture of the network is designed by unrolling the learned Iterative Shrinkage-Thresholding Algorithm (LISTA). An adversarial training strategy is exploited to train the network on unlabelled noisy measurements in an unsupervised manner. (b) Neurons reconstructed from an LFM image using the improved Richardson-Lucy deconvolution method ISRA [4] (top) and the data-driven approach LISTA-net LFM [25] (bottom). The reconstruction from [25] reveals details such as dendrites and axons more clearly and with fewer artifacts, and better suppresses background blurring and scattering.
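One layer of the kind of unrolled network used in LISTA-net can be sketched as follows. Here `W_e`, `S` and the threshold `theta` would be learned from data in practice; the random values below, as well as the dimensions, are placeholders for illustration only:

```python
import numpy as np

def soft_threshold(x, theta):
    # Proximal operator of the l1 norm: shrink values towards zero.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lista_forward(y, W_e, S, theta, n_layers=5):
    """Unrolled ISTA: z <- soft(W_e y + S z, theta), repeated n_layers times.
    In LISTA-style networks, W_e, S and theta are trainable parameters
    rather than being derived analytically from the sensing matrix."""
    z = soft_threshold(W_e @ y, theta)
    for _ in range(n_layers - 1):
        z = soft_threshold(W_e @ y + S @ z, theta)
    return z

rng = np.random.default_rng(2)
m, n = 32, 64                       # measurement and sparse-code dimensions
y = rng.normal(size=m)              # e.g. a vectorized light-field patch
W_e = 0.1 * rng.normal(size=(n, m)) # learned encoder (placeholder values)
S = 0.1 * rng.normal(size=(n, n))   # learned mutual-inhibition matrix
z = lista_forward(y, W_e, S, theta=0.05)
```

The returned sparse code `z` would then be mapped to the reconstructed volume by a learned or model-derived decoder, which is where the wave-optics forward model enters the training of [25].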