Deep learning can accelerate and quantify simulated localized correlated spectroscopy

Nuclear magnetic resonance spectroscopy (MRS) allows for the determination of atomic structures and concentrations of different chemicals in a biochemical sample of interest. MRS is used in vivo clinically to aid in the diagnosis of several pathologies that affect metabolic pathways in the body. Typically, this experiment produces a one dimensional (1D) 1H spectrum containing several peaks that are well associated with biochemicals, or metabolites. However, since many of these peaks overlap, distinguishing chemicals with similar atomic structures becomes much more challenging. One technique capable of overcoming this issue is the localized correlated spectroscopy (L-COSY) experiment, which acquires a second spectral dimension and spreads overlapping signal across this second dimension. Unfortunately, the acquisition of a two dimensional (2D) spectroscopy experiment is extremely time consuming. Furthermore, quantitation of a 2D spectrum is more complex. Recently, artificial intelligence has emerged in the field of medicine as a powerful force capable of diagnosing disease, aiding in treatment, and even predicting treatment outcome. In this study, we utilize deep learning to: (1) accelerate the L-COSY experiment and (2) quantify L-COSY spectra. All training and testing samples were produced using simulated metabolite spectra for chemicals found in the human body. We demonstrate that our deep learning model greatly outperforms compressed sensing based reconstruction of L-COSY spectra at higher acceleration factors. Specifically, at four-fold acceleration, our method has less than 5% normalized mean squared error, whereas compressed sensing yields 20% normalized mean squared error. We also show that at low SNR (25% noise compared to maximum signal), our deep learning model has less than 8% normalized mean squared error for quantitation of L-COSY spectra. These pilot simulation results appear promising and may help improve the efficiency and accuracy of L-COSY experiments in the future.


Introduction
Magnetic resonance imaging (MRI) is a popular imaging modality capable of providing valuable anatomical and functional information in vivo. By utilizing a strong magnetic field and radio-frequency (RF) waves, MRI successfully images hydrogen atoms in their local chemical environment, allowing for useful soft tissue contrast. One technique that allows for the metabolic investigation of different tissues is the magnetic resonance spectroscopy (MRS) method. In particular, single-voxel 1 H MRS is capable of providing biochemical information from a volume of interest (VOI) in the human body 1 . MRS provides a 1 H spectrum rich with peaks representative of various chemicals. Furthermore, this spectrum can be quantified by using a spectral fitting algorithm 2-6 to yield chemical, or metabolite, concentrations. MRS, and more specifically the point resolved spectroscopy (PRESS) experiment, has been used to explore pathologies affecting the brain 7 , prostate 8 , liver 9 , breast 10 , as well as other sites, and is often used in combination with other imaging studies to discern how metabolic alterations in tissues correlate with anatomical abnormalities.
Unfortunately, one-dimensional (1D) spectroscopy techniques such as PRESS have a disadvantage when it comes to quantifying overlapping metabolite spectral signals. Since many metabolites are found in the body at very low concentrations, separating these signals from more dominant spectral peaks becomes very challenging. For this reason, several approaches have been developed to better quantify these lower concentrated metabolites, including J-editing techniques [11][12][13][14] and two-dimensional (2D) spectral acquisitions [15][16][17][18][19] . In particular, 2D MRS offers the advantage of quantifying all metabolite signals in a single scan at the expense of increasing acquisition time. A typical 2D MRS experiment includes a time increment, t 1 , in the pulse sequence to acquire data from the indirect temporal dimension. Combined with the acquisition of the direct temporal dimension, t 2 , a 2D spectrum, S(F 2 ,F 1 ), can be acquired by Fourier transforming the 2D temporal data, s(t 2 ,t 1 ).

Figure 1. Two proposed implementations for the D-UNet architecture are shown. A) A non-uniformly sampled L-COSY experiment is
reconstructed into the fully sampled spectrum. While under-sampling can be performed in both the t 2 and t 1 dimensions, this study analyzes the reconstruction of L-COSY spectra acquired using non-uniform sampling along only the t 1 dimension. B) Several metabolite spectra are identified from a fully sampled L-COSY spectrum using a D-UNet model. The intensities of the metabolite spectra directly correlate to concentration values, and therefore this study also investigates the potential application of deep learning to quantify L-COSY spectra. In total, 17 metabolites were quantified in this simulation study.
One popular 2D MRS technique is the localized correlated spectroscopy (L-COSY) experiment 18 . This experiment acquires data by using a 90 • -180 • -t 1 -90 • -t 2 sequence and yields several cross-peaks which can be used to identify and quantify overlapping resonances. However, there are two main limitations of the L-COSY technique. First, due to the t 1 increment necessary to obtain the indirect dimension, the L-COSY scan time is very long. Second, because of the nature of an additional dimension, spectral fitting becomes more complex and therefore less ideal quantitation techniques such as peak integrals are often used. Several methods have been proposed to overcome these two challenges to improve L-COSY, including non-uniform sampling with reconstruction 20 and 2D spectral fitting using prior-knowledge 21,22 .
Recently, deep learning and artificial intelligence have become more prominent in the medical field and radiology [23][24][25][26] . These methods are often used for segmenting medical images, aiding with diagnosis, and verifying image quality. One popular deep learning architecture is the UNet 26 , which is a fully convolutional network 27 capable of image-to-image domain mapping. While UNet is often used for segmentation purposes, our group has recently demonstrated that a novel UNet architecture, the densely connected U-Net (D-UNet) [28][29][30] , is capable of reconstructing super-resolution spectroscopic images. In this study, we demonstrate that the D-UNet architecture can be used to: 1) reconstruct non-uniformly sampled (NUS) L-COSY acquisitions and 2) quantify fully sampled L-COSY spectra accurately. The D-UNet models were trained and evaluated using simulated L-COSY data. The first type of D-UNet model was trained to reconstruct NUS L-COSY. This reconstruction method was quantitatively compared to compressed sensing ( 1 -norm) reconstruction 31 . The second type of D-UNet model was trained to quantify seventeen metabolites from a simulated fully sampled L-COSY spectrum. All reconstruction results were compared to the actual simulations to evaluate the errors of the reconstructions both qualitatively and quantitatively.

Methods
As shown in Figure 1, the goal of this study was to perform two distinct tasks using the D-UNet architecture: 1) reconstruct NUS L-COSY spectra and 2) quantify L-COSY spectra. While each task used different data for training the models and testing the results, the initial simulation process to synthesize L-COSY spectra was identical for both applications.
Then, L-COSY spectra were randomly generated by modifying the original metabolite simulations, also referred to as the basis set. Each metabolite in the basis set (B m ) was first line broadened in both the direct and indirect temporal dimensions using an exponential filter and a random phase was applied to the basis metabolite signal as well: Above, B lb,m is the new line-broadened metabolite, φ r is a random angle between 0 and 2π, e −r 1,m is an exponential filter applied to the t 1 domain, and e −r 2,m is an exponential filter applied to the t 2 domain. Each metabolite was allowed to have separate line-broadening terms. The factors e −r 2,m and e −r 1,m resulted in effective line-broadenings of 5-25Hz and 0-15Hz, respectively, and were implemented in this fashion to mimic the range of common T 2 values in vivo.
Next, the individual metabolites were combined linearly using random concentration values to produce an initial L-COSY spectrum, s init : In equation 2, r 3,m is a random concentration value between 0 and 10, and is representative of the concentration value in mmol. The final L-COSY spectrum, s f , was created by adding noise to s init . The noise level could vary drastically from 0% to 25% of the maximum metabolite signal.

Non-uniform Sampling and Reconstruction
Non-uniform sampling was performed on the final s f matrix along the t 1 dimension utilizing an exponential probability density function [34][35][36] . This NUS scheme emphasized sampling earlier t 1 points more due to the fact that these points have less T 2 decay (more signal). The last t 1 point was sampled for all of the NUS schemes. The three sampling masks used in this study are displayed in Figure 2. A t 1 point was sampled if the value in the mask was 1, and it was not sampled if the value in the mask was 0. The number of points sampled for each mask were 75, 50, and 25 resulting in a scan acceleration factor of 1.3x, 2x, and 4x, respectively.
Aside from the D-UNet reconstruction of NUS data described below, data were also reconstructed using compressed sensing reconstruction 31 . The 1 -norm minimization reconstruction was performed by solving the following optimization problem: Equation 3 is the general formulation for compressed sensing reconstruction. u is the reconstructed data in the (F 2 ,F 1 ) spectral domain, M is the sampling mask along the t 1 domain, F is the 2D Fourier transformation, f is the NUS data in the (t 2 ,t 1 ) temporal domain, and σ 2 is the estimate of the noise variance. The noise variance was estimated from a noisy region of the spectrum, as previously described 35,[37][38][39] .

Reconstructing NUS L-COSY with D-UNet
The densely connected UNet architecture utilized in this study was very similar to a previously reported model 29 , and the general architecture can be seen in figure 3. This model utilized the generic UNet architecture, which operates by learning important global and local features using a variety of convolutional layers. The first half of the UNet continuously uses convolutional and max pooling layers, and these layers help reduce the input matrix size. By reducing the size, the network learns the primary global features of the input images. The second half of the UNet uses deconvolutional and up-pooling layers, which restore the matrix size. This process helps learn local features that are vital to restoring the images on a finer scale. The 3/17 Figure 2. A ground truth simulated L-COSY spectrum is shown (top). Sampling schemes were applied to the simulated spectrum using the sampling masks shown in the 1 st column. These masks sampled 25, 50, and 75 t 1 points out of a total 100 t 1 points to yield 4x, 2x, and 1.3x acceleration factors, respectively. The 2 nd column shows the under-sampled spectra in the (F 2 ,F 1 ) domain and the 3 rd column shows the spectra reconstructed using a D-UNet model. Errors for each reconstruction are displayed as difference maps in the final column. architecture also leveraged densely connected convolutional layers, which aid in carrying important features throughout the learning process. All convolutional layers used a kernel size = 3 x 3, stride = 1, and a rectified linear unit (ReLU) activation function 40 .

4/17
The D-UNet model used for reconstructing the NUS L-COSY data was designed to take an NUS L-COSY spectrum as input and produce a reconstructed L-COSY spectrum as output. The NUS L-COSY data was produced by multiplying s f by the sampling mask in the (F 2 ,t 1 ) domain and then transforming this matrix back into the (F 2 ,F 1 ) domain. The output was simply the s f matrix without noise in the (F 2 ,F 1 ) domain. Both the input and output matrix sizes were 512 x 32, and corresponded to spectral ranges of 0.5-4.5ppm in the direct spectral dimension (F 2 ), and 1.2-4.3ppm in the indirect spectral dimension (F 1 ). Additionally, the inputs and outputs were inserted as three different channels into the network with each channel representing the real, imaginary, and magnitude information of the spectrum. Finally, all inputs and outputs were normalized to be in between values of 0 and 1, and were normalized based on the maximum value of the magnitude images. The loss function was the mean squared error (MSE) between the reconstructed L-COSY (Recon) and the actual simulated L-COSY (Actual), which was defined as: The Adam optimizer 41 was used with a learning rate set to 1e −3 . Three D-UNet models with identical architecture were trained to reconstruct spectra sampled using the masks shown in figure 2. A total of 40,000 simulated NUS L-COSY spectra were simulated for each sampling scheme, and 100 spectra were used to evaluate the results as an independent test set. The batch size for the training was 10 samples per batch.

Quantitation of L-COSY with D-UNet
The quantitation of fully sampled L-COSY data was performed in a similar manner to the method described above. The input to the quantitation D-UNet was the L-COSY spectrum as a 512 x 32 matrix with three channels representing the magnitude, real, and imaginary components of the spectrum. The input was scaled from 0 to 100 based on the maximum of the magnitude spectrum. The output of the network was a 512 x 32 matrix representative of each metabolite basis set. Therefore, since 17 metabolites were quantified, the output had 17 channels representing the magnitude spectrum for each metabolite. All other training parameters were identical to those described above. A total of 21,000 simulated L-COSY spectra were used for training, and 100 spectra were used for testing the results independently.

Evaluation
All of the results were compared to the actual simulated spectra by utilizing the MSE metric from equation 4. For the non-uniformly sampled spectral reconstruction, the MSE was calculated over all 100 test spectra and compared to the MSE of the 1 -norm reconstruction from equation 3 for all acceleration factors. Normalized MSE was also used, and errors were normalized based on the maximum signal intensity of the spectrum. In addition to MSE, the quantitation with D-UNet also investigated the effect of noise on the quantitative results. Specifically, ten different noise levels were evaluated on the same 100 spectra to determine how the signal-to-noise ratio (SNR) affects the model results and overall stability. These noise levels ranged from 0% to 25% of the maximum signal intensity.

NUS L-COSY reconstruction
The NUS L-COSY spectra reconstructed using the D-UNet architecture can be seen in figure 2. The non-uniform sampling produces several F 1 ridging artifacts present in the spectral domain, which are ultimately removed by using the D-UNet models. For training, the MSE loss function achieved a loss of approximately 3e −5 for each of the models. The errors as the difference between the Actual and Recon spectra are also shown for each acceleration factor.
A qualitative comparison between the D-UNet reconstruction and 1 -norm minimization methods are shown in figure  4 for the 4x reconstructions. While the D-UNet reconstruction displays minimal errors surrounding the major peaks, the compressed sensing results show large errors. Due to the iterative reconstruction, several false cross-peaks also appear in the Acceleration D-UNet 1 1.3x 0.0388 0.0192 2x 0.0170 0.0681 4x 0.0327 0.208 Table 1. Total mean squared error (MSE) over 100 testing spectra for each acceleration factor. A D-UNet model was trained to learn reconstruction for each sampling factor, and the results were compared to 1 -norm reconstruction as described in equation 3. Since the maximum signal is 1 for all spectra analyzed, the normalized MSE as a percentage for the D-UNet is 3.88%, 1.70%, and 3.27% for acceleration factors of 1.3x, 2x, and 4x, respectively. Similarly, the MSE as a percentage for the 1 -norm reconstruction is 1.92%, 6.81%, and 20.8% for acceleration factors of 1.3x, 2x, and 4x, respectively. minimization performs better than the D-UNet reconstruction. However, at higher acceleration factors where less points are sampled, the D-UNet mean error remains under 5%, whereas the 1 -norm minimization reconstruction error is larger than 20%. Once again, these values were calculated over 100 testing L-COSY data that were simulated independently of the training set.

L-COSY quantitation
The capabilities of the D-UNet to identify metabolites from a given L-COSY spectrum are demonstrated in figure 5. From the given L-COSY spectrum, 9 metabolite reconstructions are shown and compared alongside the simulated ground truth spectra: NAA, PCh, Cr3.0, mI, Gln, Glu, GABA, GSH, and Asp. In the example spectrum displayed, NAA was simulated at a concentration level of approximately 8 mmol. For GSH, which was simulated closer to 1 mmol, the reconstruction results still have similar intensity values to the simulated ground truth. While only 9 metabolite reconstructions are shown, it is important to note that all 17 metabolites in the basis set are reconstructed and could be visualized. Of course, SNR can play a large role on the performance of any quantitation algorithm, and therefore errors resulting from high noise were investigated. Figure 6 displays the effect of noise levels on the calculated mean squared error for all 17 metabolite spectral reconstructions. As expected, degrading SNR results in larger MSE values for quantitation. In addition, an example spectrum is shown at two different noise levels: noise level 2 (5% noise) and noise level 8 (20% noise). It is clear that cross-peak intensities vary largely with noise, due to the fact that cross-peaks are low signal peaks for the L-COSY experiment.
The linear relationships between the actual and predicted measurements for all 100 test spectra and 17 metabolites were also analyzed. Figure 7 shows the linear relationships for 16 metabolites quantified from the test spectra with a noise level of 5% of the maximum signal intensity. Linear fits are shown on the correlation plots between the simulated ground truth (Actual) and the reconstructed (Recon) concentrated values. In order to produce the concentration results for Recon, the maximum intensity was used from the individually reconstructed metabolite spectra from the D-UNet quantitation model.
Finally, table 2 compares the concentration values of the 17 metabolites at different SNR values. Ideally, if the quantitation was perfect, the slope would be one and the standard error would be zero. For many metabolites at noise level=2, the slope and error are close to ideal values. However, at noise level=8, slopes start to deviate largely from the ideal values and error also increases. The r 2 metric displayed is the coefficient of determination and is the variance of the fit. Overall, the r 2 values show that variance is low for the quantitative correlations at both noise levels, as demonstrated by r 2 >0.8.

Discussion
From the results, it is clear that the D-UNet architecture is capable of both reconstructing non-uniformly sampled L-COSY data and quantifying L-COSY spectra after appropriate training. Figures 2 and 5 show this qualitatively whereas tables 1 and 2 show this quantitatively. While deep learning has very recently been used for quantitation of 1D MRS 42 , to our knowledge this is the first application of deep learning for reconstructing and quantifying L-COSY MRS. For reconstruction at high acceleration factors, the D-UNet method greatly outperforms a standard compressed sensing method. Spectral quality plays a large role in determining the outcome of the quantitation method, and poor spectral quality results in higher errors, as seen in figure 6. Even though the model architecture for both applications is identical, the two models learn separate properties of the L-COSY spectrum.
The first model, which reconstructs NUS L-COSY data, learns to remove the artifacts produced from the application of a particular non-uniform sampling mask. Due to the non-uniform t 1 sampling, various ridging artifacts are present in the 7/17 Figure 5. The results for the quantitation D-UNet are displayed for an example fully sampled L-COSY spectrum. From the input spectrum (top), the deep learning model reconstructs each metabolite's magnitude spectrum individually (Recon). For comparison, the actual simulated magnitude spectra (Actual) are plotted alongside the reconstructed spectra with the same intensity windows. While only 9 metabolites are displayed, the D-UNet model produces 17 metabolite spectra. The concentrations for these spectra are proportional to the signal intensities, as is standard for most fitting algorithms. Figure 6. The mean squared error is displayed as a function of noise level for the quantitation D-UNet results (left). These results were produced by analyzing MSE for 100 identical spectra at 10 different noise levels ranging from 0% -25% noise relative to the maximum signal intensity. Two example spectra are shown displying 5% noise (middle) and 20% noise (right). Qualitatively, it is clear that cross-peak signal amplitude is greatly altered due to the added noise in the noise level = 8 spectrum. Figure 7. The relationship between the actual metabolite concentration on the x-axis (Actual) and the reconstructed metabolite concentration on the y-axis (Recon) is shown for 16 metabolites for 100 test spectra. The spectra contained approximately 5% noise signal relative to the maximum signal intensity (noise level = 2). Overall, most metabolites displayed an expected linear relationship even at lower concentration values. Quantitative values for these results are tabulated in table 2.

9/17
Noise Level = 2  Table 2. A quantitative comparison between the quantitation results for two different noise levels is shown. A perfect quantitation algorithm would produce the following results: slope = 1, r 2 = 1, and standard error (Std. Error) = 0. From the results, it is clear that higher SNR spectra produce more accurate quantitative results.
F 1 domain 20 . Depending on the sampling pattern, the artifacts will be mostly constant for each metabolite, but will still be a function of the metabolite concentration, line-broadening factor, and noise level. By providing enough example data, the network essentially learns how to identify the ridging artifacts and remove them appropriately for a given sampling mask and basis set. This is best illustrated in figure 2, where it is clear that ridging is removed in the reconstructed spectra for each acceleration factor. On the other hand, the second model that quantifies metabolite concentrations from L-COSY spectra learns a different property of the L-COSY images. After adding all of the metabolites together to form a composite spectrum, several signals overlap and are hard to disentangle. Optimization problems are able to handle this issue by fitting overlapping peaks using several parameters, often including appropriate prior-knowledge 21,22,43 . Unfortunately, these algorithms take a very long time to calculate these parameters and often yield sub-par results if the quality of the L-COSY spectrum is low (high noise, low SNR, signal contamination, etc.). The quantitation model learns how to disentangle overlapping signals through analysis of the magnitude, real, and imaginary components of the input spectrum. By training on thousands of data, the model learns which signals best represent each metabolite even if the signal is buried in another peak and noise. Furthermore, the calculation is extremely fast and is on the order of seconds for a single spectrum. In terms of accuracy, even lower concentrated metabolites simulated at less than 1 mmol are accurate (error less than 3%) for most metabolites, as shown in figure 7. While these pilot results look promising for reconstruction and quantitation of L-COSY spectra, the current implementation of this method has several weaknesses.
First, the D-UNet model requires prior-knowledge for all metabolites present in the tissue for training as well as how these 10/17 signals are affected by a particular non-uniform sampling pattern. Compressed sensing reconstructions do not require any spectral prior-knowledge, and therefore are more versatile for different sampling masks. This is not necessarily a weakness if: 1) the sampling mask used for acquisition matches the D-UNet sampling mask used for training and 2) all metabolites in the tissue are known a priori. For most experiments, 1) is easily satisfied. For healthy tissues and well documented pathologies, 2) is not an issue. However, 2) may become an issue for pathologies that are not well understood and involve unknown chemical changes. This problem may be alleviated by including prior-knowledge for all metabolites appearing in the analysis of ex vivo tissue samples of this pathology if available. For example, mass spectrometry of ex vivo tissues played a pivotal role in identifying 2-hydroxyglutarate (2HG) in certain glioma patients as a metabolite of interest 44 . Additional prior-knowledge can always be included into the training process to account for macromolecule signals or other signals that may be present in the spectrum retrospectively if necessary.
Another weakness of the current methodology is that water and fat contamination were not added to the training spectra. Due to water suppression pulses 45 , spectral distortions around the water region may affect metabolite quantitation. For 2D experiments, total removal of water signal while retaining metabolite signal is more challenging, and may affect the amplitudes of correlated cross-peaks close to water. This problem can be overcome through more advanced training, however the effects of water suppression and removal through common methods such as singular value decomposition (SVD) have to be well understood in order to be modeled correctly. Contaminating fat signal may affect quantitation of metabolites such as lactate and NAA, depending on severity. These fat signals can also be incorporated into the training process, however it is important to utilize the correct fat species.
The final weakness of the current methodology is the broadening model used to produce the training and testing data. Currently, only an exponential line-broadening term was used for this pilot study. While exponential line-broadening may be a great first approximation for peak shapes, gaussian, lorentzian and even voigt lineshapes may be present in the final experimental peaks 43 . Due to the increased number of parameters introduced with these added lineshapes, the training data size would need to be much larger for adequate training of the model. In addition, the number of features present in the model may need to be increased in order to handle the complexity of the additional broadening parameters.
Even with these weaknesses, the methodology presented here can easily be applied to other 2D MRS experiments and to iterative MRS experiments in general. The J-resolved spectroscopy (JPRESS) experiment is another useful 2D MRS technique 16,17 , and we have pilot results showing that these models apply for this type of experiment (Supplemental Information). Other 2D experiments include the nuclear overhauser effect spectroscopy (NOESY), total correlation spectroscopy (TOCSY), as well as others. Iterative MRS experiments include diffusion weighted spectroscopy [46][47][48] , J-editing spectroscopy, and any multi-TE spectroscopy 49 . In addition, this methodology could be refined for the application of super-resolution spectroscopy, including covariance spectroscopy 50,51 . However, super-resolution may be unnecessary if accurate quantitative results can already be obtained from low resolution spectra.
Simulation results are certainly powerful for evaluating the feasibility of potential applications, and this study demonstrates that the D-UNet is capable of reconstructing NUS L-COSY data and quantifying L-COSY spectra. However, these methods need to be further validated in vitro and in vivo. Also, these methods have to be compared to state of the art techniques for each application. For reconstruction, the D-UNet model should ideally be compared to compressed sensing, maximum entropy 39, 52 , or other reconstruction methods. It is important to note that while many reconstruction methods require certain sampling schemes (random, non-uniform, etc.), the D-UNet is capable of reconstructing any sampling pattern with the correct training approach. While an exponential sampling scheme was used in this study, a skewed-squared sine-bell sampling scheme may be better to implement in the future 39 . For quantitation, it is important to compare the deep learning method to other 2D in vivo fitting algorithms to assess accuracy and reproducibility 22 . After further validation, these models may easily be combined together to create a single deep learning model capable of simultaneously reconstructing and quantifying L-COSY spectra. With further improvements, this method will hopefully have the same acquisition duration as a 1D single-voxel scan (3-5 minutes), which will make this method extremely useful clinically for discerning overlapping metabolite signals.

Conclusion
We present a deep learning approach capable of reconstructing non-uniformly sampled L-COSY spectra and quantifying fully sampled L-COSY spectra. Overall, the results demonstrate accurate reconstruction and quantitation with normalized mean squared error less than 5% for most SNR levels. This technique was evaluated using simulated data, and further studies will validate this method for in vitro and in vivo measurements, and compare this method to state of the art techniques.