Life Prediction of Battery Using a Neural Gaussian Process with Early Discharge Characteristics

The state of health (SOH) prediction of lithium-ion batteries (LIBs) is of crucial importance for the normal operation of the battery system. In this paper, a new method for cycle life and full life cycle capacity prediction is proposed, which combines the early discharge characteristics with the neural Gaussian process (NGP) model. The cycle data sets of commercial LiFePO4(LFP)/graphite cells generated under different operating conditions are analyzed, and the power characteristic P is extracted from the voltage and current curves of the early cycles. A Pearson correlation analysis shows that there is a strong correlation between P and cycle life. Our model achieves 8.8% test error for predicting cycle life using degradation data for the 20th to 110th cycles. Based on the predicted cycle life, capacity degradation curves for the whole life cycle of the cells are predicted. In addition, the NGP method, combined with power characteristics, is compared with other classical methods for predicting the remaining useful life (RUL) of LIBs. The results demonstrate that the proposed prediction method of cycle life and capacity has better battery life and capacity prediction. This work highlights the use of early discharge characteristics to predict battery performance, and shows the application prospect in accelerating the development of electrode materials and optimizing battery management systems (BMS).


Introduction
In recent years, lithium-ion batteries (LIBs) have attracted widespread attention due to the advantages of their high energy density, low self-discharge characteristics, and absence of memory effects [1][2][3]. They have become the preferred energy storage systems for many engineering and industrial applications, such as portable devices, automobiles, and aerospace systems [4][5][6]. However, the performance of LIBs will deteriorate with the decrease of capacity, which may lead to equipment and system degraded capability, failures, or even catastrophic loss [7].
State of health (SOH) is an index used to evaluate the aging degree of cells, which includes capacity fade and cycle life prediction. It provides very useful information to predict when a battery should be replaced. Battery performance is usually monitored by measuring battery voltage, current, temperature, capacity, impedance, and other parameters through relevant experiments. Various corresponding methods can be used to evaluate the battery health status based on the parameters mentioned above [8]. Cycle life refers to the number of cycles in which available capacity decays to 80% of the rated capacity of the battery [9]. Therefore, it is vital to achieve an accurate and reliable prediction of SOH of LIBs, which is a key approach of battery management system (BMS) in scientific research and practical application [10]. Meanwhile, the accurate prediction model of capacity decay and cycle life statistics could significantly accelerate the development and commercialization of LIBs, curve to describe the remaining useful life (RUL) of LIBs, and proposed a LIBs cycle life estimation method based on a feedforward neural network and the importance sampling. In Ref. [30], a deep neural network was used to predict the SOH and RUL of batteries. In Ref. [31], a SOC estimation method was introduced through a combination of a radial basis function neural network (RBFNN), an orthogonal least-squares algorithm, and an adaptive genetic algorithm. The method was validated using LiFePO 4 (LFP) batteries under several different discharging conditions. Currently, there are few reports on the use of early data to predict cycle life. Recently, Severson et al. [32] proposed a prediction model combining data generation and a data-driven model, which used the discharge voltage curve of the early cycles to predict and classify the cycle life of LIBs. Their research showed the prospect of accurately predicting battery life using early discharge characteristics.
In this study, we explore the possibility of extracting battery health indicators from early discharge characteristics to predict SOH. The proposed methodology is combined with the Neural Gaussian Process (NGP) model [33][34][35], which considers battery degradation under various operating conditions. First, based on a correlation analysis, we predicted that the average power that decays in the early cycles of batteries is related to the cycle life. Afterwards, the cycle life of LIBs is predicted using the established model. The proposed method is then compared with other cycle life and capacity prediction methods.
The rest of the paper is organized as follows. The battery data preparation is presented in Section 2. The physical feature extraction is shown in Section 3. In Section 4, the methodology is introduced. The results and description are discussed in Section 5. Finally, the conclusion is summarized in Section 6.

Battery Data Preparation
We used the degradation data set of 124 commercial LFP/graphite cells (A123 Systems, model APR18650-M1A, 1.1 Ah nominal capacity, failure capacity of 0.88 Ah) generated by Severson et al. [32]. In the experiment, those cells were cycled in a forced convection temperature chamber set to 30 • C, and the detection parameter data sets were recorded by the Arbin LBT potentiostat sensor. Parameters such as voltage, capacity, current, temperature, and internal resistance were continuously measured during the cycles. The data set was generated by three batches of batteries through tests, of which the first batch contained 41 batteries, the second batch contained 43 batteries, and the third batch contained 40 batteries. All cells in this dataset were charged with a two-step fast-charging policy. This is the largest publicly available dataset for nominally the same commercial LIBs cycled under controlled conditions. We studied the performance decline of LIBs with multiple degradation mechanisms due to the manufacturing processes and operating conditions of the dataset. Meanwhile, fast charging conditions made it possible to estimate battery health and optimize battery performance under extremely fast conditions, and accelerate the commercialization of fast-charging lithium batteries. Figure 1 shows the functional relationship between the observed discharge capacity and the number of cycles in the whole life cycle. The capacity of the batteries decay slowly during the early cycles, while rapidly decaying in the later stage of the cycles. The degradation tracks of the batteries are interlaced with each other. This indicates that there is a complex, nonlinear relationship between capacity and cycle number, and it is difficult to choose an accurate aging model as the basis of battery dataset life prediction [36]. In order to overcome these difficulties, we combined key physical characteristics, which were constructed by the early discharge characteristics and reflected the state of health of LIBs, with a data-driven approach to predict the cycle life of cells.

Physical Feature Extraction
In previous reports, researchers usually used measurem age, current, and temperature to construct physical prope [24,27,29]. In this paper, in order to explore the relationship be acteristics and cycle life, we use discharge voltage ( ), disch charge time in every cycle to propose the average power function of cycle number. The power characteristic of a ba  Figure 2 the cells in the early cycles. Since the capacity increased slig stage, in order to study the relationship between the character cycle life, we use data from the 20th to 110th cycles.
show trend, and its attenuation is very small from the whole life statistics to convert the fluctuation attenuation of the averag each battery from the 20th to 110th cycles into a scalar t

Physical Feature Extraction
In previous reports, researchers usually used measurement parameters such as voltage, current, and temperature to construct physical properties to predict battery life [24,27,29]. In this paper, in order to explore the relationship between early discharge characteristics and cycle life, we use discharge voltage U(t), discharge capacity Q(t), and discharge time t in every cycle to propose the average power characteristic P, which is a function of cycle number. The power characteristic P jk of a battery per cycle is defined as where j represents the jth battery, and k stands for the kth cycle. For the kth cycle of the jth cell, U(t i ) is the discharge voltage of the cells at time t i , where i is a moment in a discharge cycle i = 2, 3, . . . , n. Q(t i ) and Q(t i−1 ) are respectively the discharge capacity of the batteries at time t i and time t i−1 in the cycle. Figure 2a is P attenuation curve of the cells in the early cycles. Since the capacity increased slightly in the early discharge stage, in order to study the relationship between the characteristics of the decay phase and cycle life, we use data from the 20th to 110th cycles. P shows a fluctuating attenuation trend, and its attenuation is very small from the whole life cycle. We use the variance statistics to convert the fluctuation attenuation of the average power characteristic P j of each battery from the 20th to 110th cycles into a scalar P Dj to establish a corresponding relationship with the cycle life L j . The relationship between P D and cycle life L is quantitatively analyzed by the Pearson rank correlation coefficients. It is expressed as,

Neural Gaussian Process Model
Although is strongly correlated with the cycle life when considering the no relationship between cycle number and the discharge capacity, the linear model predict the capacity decay. Therefore, we built a prediction model based on chara and NGP. Surprisingly, Figure 2b shows a significant correlation between the variance of characteristic P and cycle life of LIBs (ρ P D ,L = −0.936). Compared with previous studies, the correlation coefficient of power characteristic is higher than that of the voltage characteristic using the 110th cycle and the 20th cycle (ρ ∆Q 110−20 (V) = −0.932, where capacity Q is a function of voltage, and ∆Q 110−20 (V) = Q 110 (V) − Q 20 (V)) [32].

Neural Gaussian Process Model
Although P is strongly correlated with the cycle life when considering the nonlinear relationship between cycle number and the discharge capacity, the linear model fails to predict the capacity decay. Therefore, we built a prediction model based on characteristic P and NGP.
NGP is a machine learning method that combines neural network and GP. It makes use of the advantages of the neural network function approximation to solve the difficulty of selecting the kernel function of Gaussian process through experience. This allows the model flexibility to carry out nonlinear simulation to achieve target prediction. Figure 3 illustrates the schematic structure of the NGP. In order to learn and optimize the parameters of the model and generate target prediction, the model divides the data set (x 1:n , y 1:n ) into a training set (x 1:m , y 1:m ), 1 < m < n and test set (x m:n , y m:n ), where x 1:n (x 1 , . . . , x n ), y 1:n (y 1 , . . . , y n ). In particular, in the training and test stages of the model, the data consists of the context set (x C , y C ) and the target set (x T , y T ). In the training phase, the context set (x C , y C ) and the target set (x T , y T ) are derived from the training set (x 1:m , y 1:m ). In the test phase, the context set (x C , y C ) is the training set (x 1:m , y 1:m ) and the target set (x T , y T ) is the entire data set (x 1:n , y 1:n ). The marginal joint distribution of the objective function F(x T ) is defined as, whereŷ T is the prediction target vector. p denotes the abstract probability distribution over all random quantities, z is the latent variable of the model, obeys the Gaussian distribution, and measures the probability uncertainty of the model. s C is a parameter generated by the context set (x C , y C ), and z C ∼ N (µ(s C ), Iσ(s C )) is modelled by a Gaussian parameter by s C . k * is the attention mechanism of the model. This mechanism makes full use of the information of x C , and gives important contextual points that have more weight within the model. The mechanism of attention can be defined by the Laplace kernel, the dot-product kernel, etc. At the same time, the parameters of the encoder and decoder are learned by maximizing the following evidence lower bound (ELBO), where q(z T |s T ) and q(z C |s C ) are the variational posteriori of p(z T |s T ) and p(z C |s C ) , respectively. During the training, the target values y T are known, and Equation (4) is used to optimize the encoder and decoder parameters. And y T is the prediction vector of x T . During the test, Equation (3) is used to predict the target valuesŷ T .

Prediction Framework
The procedure of the battery cycle life prediction, based on early discharge characteristics, is presented in Figure 4. Moreover, the methods of voltage characteristic and the RBFNN model are compared, respectively, to illustrate the advantages of the power characteristic and NGP model proposed in this paper.
lowing evidence lower bound (ELBO), where ( | ) and ( | ) are the variational posteriori of ( | ) and ( | ), respectively. During the training, the target values are known, and Equation (4) is used to optimize the encoder and decoder parameters. During the test, Equation (3) is used to predict the target values .

Prediction Framework
The procedure of the battery cycle life prediction, based on early discharge characteristics, is presented in Figure 4. Moreover, the methods of voltage characteristic and the RBFNN model are compared, respectively, to illustrate the advantages of the power characteristic and NGP model proposed in this paper.
Suppose the battery data is ( : , : ), and divide the data set into training set ( : , : ),1 < < and test set ( : , : ). The prediction step of battery life can be expressed as follows: firstly, the power characteristic of all cells is calculated by Equation (1). Then, the battery life is predicted by Equations (3) Similar to the battery cycle life prediction, the training set data is used to optimize the model. For the test batteries, the discharge capacity data ( : * , : ) of the first Figure 3. NGP model framework. The calculation processes are defined by MLP, except for the attention mechanism and mean. The deterministic path generates attention mechanism k * , and the latent path generates latent variable z C and z T . k C , s C and s T are the intermediate variables that generate k * , z C , and z T , respectively. The methods defined by k C , s C and s T are similar. For instance, k C is a set composed of k Ci , which is a representation produced by each pair (x C , y C ) i in the input space through MLP.

Cycle Life Prediction
The battery data set consists of three sampling batches; we divided the data into a training set, the first test set, and the second test set to verify the predictive performance and adaptability of the model. In particular, we combined the first two batches of data and divided the data set into the training set (41 cells) and the first test set (43 cells, by intermittent sampling, and divided the battery data of the third batch into the second test set (40 cells). To evaluate the prediction accuracy of the proposed method, the mean absolute percent error (MAPE) is used in this paper. It is given by Suppose the battery data is (x 1:n , L 1:n ), and divide the data set into training set (x 1:m , L 1:m ), 1 < m < n and test set (x m:n , L m:n ). The prediction step of battery life can be expressed as follows: firstly, the power characteristic P D of all cells is calculated by Equation (1). Then, the battery lifeL is predicted by Equations (3)  life prediction to realize the prediction of the whole life cycle capacity degradation of cells. Specifically, we normalize C y1:L C y1 , . . . , C yL , denoted as C * y1:L , and establish a functional relationship with Q 1:L Q 1 , . . . , QL. Normalization is defined as Similar to the battery cycle life prediction, the training set data is used to optimize the model. For the test batteries, the discharge capacity data (C * y1:110 , Q 1:110 ) of the first 110 cycles are inputted into the model to obtain the capacity (C * y1:L , Q 1:L ) of the entire life cycle.

Cycle Life Prediction
The battery data set consists of three sampling batches; we divided the data into a training set, the first test set, and the second test set to verify the predictive performance and adaptability of the model. In particular, we combined the first two batches of data and divided the data set into the training set (41 cells) and the first test set (43 cells, by intermittent sampling, and divided the battery data of the third batch into the second test set (40 cells). To evaluate the prediction accuracy of the proposed method, the mean absolute percent error (MAPE) is used in this paper. It is given by where L j is the observed cycle life of the jth battery,L j is the model predicted life of the jth battery, and n represents the number of training sets or test sets. In this paper, the validity of the proposed power characteristic and model is illustrated by comparing the cycle life prediction performance of the voltage characteristic and the RBFNN model. Figures 5 and 6 show the predicted performance of the power characteristics using the NGP model and the RBFNN model, respectively. Figures 7 and 8 compare the predicted results of the voltage characteristics using the NGP model and the RBFNN model. A comparison between the predicted cycle life and the observed cycle life is shown in Figures 5a, 6a, 7a and 8a. Table 1 displays the MAPE of cycle life for each method. A smaller MAPE value indicates a better data set cycle life prediction performance. Surprisingly, the predicted error of the NGP model, using power characteristics, is 12.0% in the first test set and 8.8% in the second test set, using only the 20th to 110th cycles data. It is worth noting that the battery performance has only a very weak decay during the first 110 cycles. In the first test set, excluding a rapidly degraded battery that deviates from the relationship between P and the cycle life of the other batteries, the predicted error is reduced to 8.8%. Simultaneously, the prediction errors of the RBFNN model using power characteristics in the training set, the first test set, and the second test set are 10.2%, 10.1%, and 9.8%, respectively. The prediction errors of the NGP model using voltage characteristics in the training set, the first test set, and the second test set are 9.3%, 10.7%, and 11.9%, respectively. The prediction errors of the RBFNN model using voltage characteristics in the training set, the first test set, and the second test set are 10.5%, 12.5%, and 12.6%, respectively. As can be seen from Table 1, NGP performs better than RBFNN when using the same battery characteristics. At the same time, using the same model, the predictive performance of the power characteristics optimizes the voltage characteristics. The prediction curves of each method are shown in Figures 5b, 6b, 7b and 8b. For voltage characteristics, the prediction curve of the NGP model is smoother than that of the RBFNN model. The RBFNN model shows a sharp value in the lower left corner of Figure 8b. For the power characteristics, the prediction curves of the two models are smooth. The results show that the proposed power characteristics can improve the prediction accuracy of the model, and the NGP model has a better regression fitting performance. The NGP model, combined with the power characteristics, is an effective method to predict the cycle life of batteries.
model. The RBFNN model shows a sharp value in the lower left corner of Figure 8b. For the power characteristics, the prediction curves of the two models are smooth. The results show that the proposed power characteristics can improve the prediction accuracy of the model, and the NGP model has a better regression fitting performance. The NGP model, combined with the power characteristics, is an effective method to predict the cycle life of batteries.  show that the proposed power characteristics can improve the prediction accuracy of the model, and the NGP model has a better regression fitting performance. The NGP model, combined with the power characteristics, is an effective method to predict the cycle life of batteries.      The results in parentheses exclude a battery that deviates from the relationship between P and cycle life, or between V and cycle life.

Capacity Prediction
In this brief, batteries with a different cycle life (i.e., cycle life 335, cycle life 854, cycle life 1028, and cycle life 1638) are selected to verify the NGP model capacity prediction results, and compare the prediction results of the RBFNN model. As the prediction results of the RBFNN model are affected by the length of the training data, we not only use the data of the first 110 cycles to predict capacity, but also use the data of the first 20% of cycles of the predicted life of the power characteristics to predict capacity. The MAPE of the cycle that reaches the capacity failure threshold for the first time is used to quantify the performance of the method, based on NGP and RBFNN.  The results in parentheses exclude a battery that deviates from the relationship between P and cycle life, or between V and cycle life.

Capacity Prediction
In this brief, batteries with a different cycle life (i.e., cycle life 335, cycle life 854, cycle life 1028, and cycle life 1638) are selected to verify the NGP model capacity prediction results, and compare the prediction results of the RBFNN model. As the prediction results of the RBFNN model are affected by the length of the training data, we not only use the data of the first 110 cycles to predict capacity, but also use the data of the first 20% of cycles of the predicted life of the power characteristics to predict capacity. The MAPE of the cycle that reaches the capacity failure threshold for the first time is used to quantify the performance of the method, based on NGP and RBFNN. Figure 9 shows the capacity prediction results of the NGP and RBFNN models. The details of the capacity prediction results can be seen in Tables 2 and 3. Figure 9 shows that the capacity prediction curve of the NGP model fits the real curve better. For the failure life predicted by the capacity curve, the prediction error of the NGP model is less than 5% using the first 110 cycle data, and the performance growth is small after changing the length of training, while the performance of the RBFNN model, using only the data of the first 110 cycles, is poor with the increase of battery cycle life. When using the data of the first 20% cycles of the predicted life of the power characteristics, the RBFNN model shows better performance, but is still worse than the NGP. life predicted by the capacity curve, the prediction error of the NGP model is less than 5% using the first 110 cycle data, and the performance growth is small after changing the length of training, while the performance of the RBFNN model, using only the data of the first 110 cycles, is poor with the increase of battery cycle life. When using the data of the first 20% cycles of the predicted life of the power characteristics, the RBFNN model shows better performance, but is still worse than the NGP.  The results show that the data length has little effect on the performance of the NGP model, but has a great influence on the RBFNN model. In practice, the battery discharge data are measured, so it is difficult to increase the training length of the model. Combined with the prediction results of cycle life and capacity, the NGP model shows good predic-

Predicted capacity(Ah)
Train with the first 20% cycle life data Figure 9. The predicted discharge capacity curves.  The results show that the data length has little effect on the performance of the NGP model, but has a great influence on the RBFNN model. In practice, the battery discharge data are measured, so it is difficult to increase the training length of the model. Combined with the prediction results of cycle life and capacity, the NGP model shows good prediction accuracy.

Conclusions
This paper proposes a battery life prediction model, combining discharge characteristics and NGP, using early cycle discharge data. For the degradation data generated by commercial LFP/graphite batteries under different cycling conditions, we do not use a priori battery degradation knowledge. Rather, on the one hand, we extract the average power characteristic from the discharge voltage, battery capacity, and charging time, and, on the other hand, combined with NGP, using only the 20th to 110th cycle data, we get 8.8% life predicted error and a complete capacity degradation predicted curve. The prediction results show that the performance of the NGP and power characteristic model is better than that of the traditional FRBFF and voltage characteristic techniques. Thus, the proposed method NGP with early discharge characteristics has superior performance in cycle life and capacity prediction. In general, our approach can optimize material design, and provides a reliability assessment for the BMS by evaluating battery performance early in the battery cycle. In future, new studies should consider more battery discharge characteristics in order to improve prediction performance.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.