A distribution-based selective optimization method for eliminating periodic defects in harmonic signals

Due to environmental interference and defects in the measured objects, measurement signals are frequently affected by unpredictable noise and periodic defects, and effective methods for accurately distinguishing defect components from measurement signals are lacking. In this study, a distribution-based selective optimisation method (SOM) is proposed to mitigate the effects of noise and defect components. The SOM can be viewed as a binary or multi-class signal classifier based on an error distribution: it eliminates the periodic defect components of a measurement signal and performs signal-fitting regression simultaneously. The effectiveness, accuracy, and feasibility of the SOM are verified in theoretical simulations and real-world measurements. Based on simulations under various parameter conditions, criteria for selecting the operation variables are explained in detail. The proposed method separates defect components from measurement signals while achieving a satisfactory fitting curve for the measurement signals. The SOM has broad application prospects in signal processing and defect detection for mechanical measurement, electronic filtering, instrumentation, part maintenance, and other fields.


Introduction
Measurement data are often collected through scientific and engineering experiments. The relationships between the measurement data y and the independent variables x are examined to analyse signal characteristics; in signal processing, they can be expressed as an approximation of the discrete points (x_i, y_i) or as a fitting function f(x_i). Conventional fitting methods for measurement signals include ordinary least squares (OLS) [1], the fast Fourier transform (FFT) [2], principal component analysis (PCA) [3], the wavelet transform (WT) [4,5], least mean squares (LMS) [6,7], and maximum likelihood estimation (MLE) [8]. However, few methods can efficiently handle environmental interference, such as noise and mechanical defects, while fitting the discrete points (x_i, y_i). To provide an accurate function f(x_i) for data affected by random and nonrandom interference, researchers worldwide have optimised conventional fitting methods and proposed approaches to improve fitting accuracy and computational efficiency.
For measurement signals with smooth waveforms, many researchers have studied signal characteristics using improved fitting methods. Brown et al. [9] proposed a signal-period recognition method based on a feedback error system. Tenneti et al. [10] proposed a nested periodic matrix method for detecting signal periods. Qiu et al. [11] presented an optimisation approach to obtain the exact frequency characteristics of harmonic signals. Tan et al. [12] obtained the signal frequency using a linear model of frequency measurement based on least-squares regression analysis. Gurubelli et al. [13] developed a method for estimating the frequency of sampled sinusoidal signals. These studies have improved the fitting accuracy of measurement signals. However, these methods can handle neither large-amplitude noise nor multiple defects.
For measurement signals with noise components, researchers have proposed several methods to alleviate the effect of noise and thereby recognise the signal characteristics. Laakso et al. [14] reconstructed nonuniform measurement signals using polynomial filtering to minimise the effects of noise. He et al. [15] evaluated noise-disturbance problems and proposed a noise-elimination method for acoustic emission signals. Zhang et al. [16] proposed a modified joint maximum-likelihood estimation algorithm for burst signals. Aliev et al. [17] analysed how noise affects the estimation errors of correlation functions computed with traditional correlation analysis algorithms. Sawma et al. [18] proposed an LMS-based identification method for motor parameters, but its fitting accuracy was affected by the noise components. These studies are effective for noise reduction in harmonic signals, but they cannot handle harmonic signals with extensive interference or multiple defects.
Owing to the diversity of measured objects and the complexity of environmental factors, measurement signals are often mixed with one or more types of periodic defect components. Harmonic signals mixed with obvious periodic defects are common in practice, for example: 1) two abrupt points appear in the measurement signal of an axle profile owing to the casting joint line; 2) the measurement signal of a rotating cylindrical shaft contains multiple pulses induced by local surface damage, protrusions, and pits; and 3) measurement signals of vibration displacement, voltage, and sound level are mixed with many transient signals caused by mechanical, electromagnetic, and even power-supply impact excitations. Because the span and amplitude of such defects are usually significant, periodic defects can easily be mistaken for valid information of the harmonic signal during measurement. This produces obvious errors in the fitting curve, and the fitting accuracy of current methods is unsatisfactory. Fig. 1 shows an example waveform of a harmonic signal mixed with periodic defects; the defect amplitude is large and the defects recur periodically.
For measurement signals containing defect components, the most commonly used methods for eliminating the defects are the WT filter [19] and Gaussian process regression (GPR) [20]. However, both methods have shortcomings in the effectiveness and accuracy of defect detection. The WT filter is computationally costly and cannot guarantee fast and accurate computation simultaneously. GPR is mainly used for harmonic signals in which nonperiodic information has been lost; when a periodic defect is mixed with the measurement signal, GPR cannot distinguish between defect and non-defect information and is therefore unsuitable for harmonic signals with periodic defects. Beyond the WT filter and GPR, researchers have proposed several methods for fault diagnosis. Cheng et al. [21] proposed a noise reduction method based on adaptive weighted symplectic geometry decomposition; however, this method is limited in defect identification. Mauricio et al. [22] developed a bearing diagnostics method based on an improved envelope spectrum, which achieves high accuracy at the cost of greater complexity.
The main contribution of this study is a novel error-distribution-based selective optimisation method (SOM) that distinguishes defects and thereby provides a robust and adaptive approach to fitting measurement signals. The proposed SOM can handle large periodic defect components, set the parameters of the operation variable group automatically, and ultimately eliminate the parts of the signal derived from defect components. Using the SOM, the optimal trigonometric polynomial vector of the harmonic signal can be obtained, which enables accurate estimation of the equipment's operating performance and defects.
This study first introduces the SOM model in Section 2. The SOM procedure involves dividing the sample signal into several segments, randomly eliminating some of the segments, fitting the residual signal, analysing the coincidence error, and obtaining the optimal trigonometric polynomial vector. Section 2 also defines the selection principle for the parameters in the operation variable group. In Section 3, the effectiveness and applicability of the SOM are verified and compared with state-of-the-art signal fitting methods, including the OLS, WT, and GPR methods. Section 4 analyses the association between the characteristic parameters of measurement signals and the fitting success probability, and presents the selection principle for operation variable groups. The application of the SOM to measurement signals with defect components is verified experimentally in Section 5. Finally, concluding remarks are presented in Section 6.
The proposed selective optimisation method (SOM) is a novel signal-processing method based on error distribution that can distinguish defect components from measurement signals. The SOM has three operation variables: the number of segments per cycle N_seg, the number of eliminated segments N_del, and the sample size N_sec. The operation variable group (N_seg, N_del, N_sec) is selected according to the signal characteristics, as described in Section 4.
The SOM procedure consists of three main steps, as shown in the flow diagram in Fig. 2. The first step is signal preprocessing: the operation variables N_seg and N_sec are determined, and multiple sample signals are intercepted from the measurement signal. The second step is segmentation, elimination, and fitting of the sample signals: N_del is determined, N_del segments are selected and eliminated randomly, and the residual signal of each elimination combination is fitted. The last step is statistics and optimisation: the coincidence errors of all elimination combinations are calculated and analysed, and the optimum fitting result of the sample signal is obtained. The details of the variable settings, operational procedure, and parameter acquisition are shown in Fig. 2.

Signal preprocessing
The signal preprocessing step involves observing the data and dividing the signals into windows for better curve-fitting performance. The defect period T and defect span S_def are determined from the characteristics of the measurement signal. The defect period T is an integer multiple of the fundamental period of the measurement signal, which can be obtained by observing the signal's peaks and inflexions or by transforming the measured variables during signal acquisition. An estimate of S_def can be obtained by observing the signal waveform, and its exact value can be acquired automatically by analysing the gradient of the sample signal. The fitting accuracy improves when S_def < T/8. The segment span S_seg is determined from T and S_def. When S_seg > S_def is satisfied, the number of segments per cycle N_seg can be calculated as

$$N_{seg} = \frac{T}{S_{seg}} \qquad (1)$$
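A minimal Python sketch of this preprocessing check is given below. It assumes that S_seg is chosen so that T/S_seg is an integer (the text does not state how non-integer ratios are handled); the function names are hypothetical.

```python
import math

def segments_per_cycle(T: float, S_def: float, S_seg: float) -> int:
    """Return N_seg = T / S_seg (Eq. (1)) after checking S_seg > S_def."""
    if not S_seg > S_def:
        raise ValueError("segment span must exceed the defect span (S_seg > S_def)")
    ratio = T / S_seg
    n_seg = round(ratio)
    # Assumption: S_seg is chosen so that one cycle holds a whole number of segments.
    if not math.isclose(ratio, n_seg, rel_tol=1e-9):
        raise ValueError("T / S_seg should be an integer")
    return n_seg

def defect_span_is_small(T: float, S_def: float) -> bool:
    """The text reports better fitting accuracy when S_def < T/8."""
    return S_def < T / 8

# Example: T = 360 degrees, S_def = 30 degrees, S_seg = 60 degrees.
print(segments_per_cycle(360.0, 30.0, 60.0), defect_span_is_small(360.0, 30.0))  # 6 True
```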
In general, the probability of defect elimination increases with the sample size. Therefore, several sample signals are intercepted from the measurement signal to increase the sample size N_sec, which is usually set to 2 or 3 to balance fitting accuracy and efficiency.
The sample signals are acquired in two steps: 1) a segment is subdivided into N_sec parts, each of span S_sec = T/(N_seg N_sec); 2) N_c cycles of data (N_c is usually 3-5) are intercepted as a sample signal starting from each assigned point x_s = a + iS_sec of the measurement signal, where a is an arbitrary moment on the measurement signal (usually the initial point of signal acquisition), and iS_sec, i = 0, 1, 2, …, N_sec - 1, is the offset between the initial phases of the sample signals.
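The interception procedure can be sketched with NumPy as follows. This assumes the measurement signal is given as arrays of phase x (in degrees) and amplitude y, and that it is long enough to contain N_c full cycles after the last start point; `sample_starts` and `intercept_samples` are illustrative names, not part of the original method description.

```python
import numpy as np

def sample_starts(a: float, T: float, n_seg: int, n_sec: int) -> np.ndarray:
    """Start points x_s = a + i * S_sec, i = 0..N_sec-1, with S_sec = T / (N_seg * N_sec)."""
    s_sec = T / (n_seg * n_sec)
    return a + s_sec * np.arange(n_sec)

def intercept_samples(x, y, a, T, n_seg, n_sec, n_c):
    """Cut N_sec sample signals, each covering N_c cycles, from the measured (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    samples = []
    for x_s in sample_starts(a, T, n_seg, n_sec):
        window = (x >= x_s) & (x < x_s + n_c * T)  # half-open window of N_c cycles
        samples.append((x[window], y[window]))
    return samples

# Example: a 5-cycle signal sampled once per degree, N_seg = 6, N_sec = 2, N_c = 3.
x = np.arange(0.0, 5 * 360.0, 1.0)
y = np.sin(np.deg2rad(x))
parts = intercept_samples(x, y, a=0.0, T=360.0, n_seg=6, n_sec=2, n_c=3)
print(sample_starts(0.0, 360.0, 6, 2), [len(px) for px, _ in parts])
```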
The proposed method can handle multiple harmonic signals with various independent variables. For clarity, the independent variable x is defined as the phase, and T is set to 360° in this study. A schematic diagram of the signal preprocessing is shown in Fig. 3, where T is the defect period and each period of the signal is divided into N_seg segments S_i, i = 1, 2, …, N_seg. The first segment is divided into N_sec parts s_i, i = 1, 2, …, N_sec. If S_seg > S_def, the probabilities that a defect is contained in one segment or in two adjacent segments are

$$P(C_1) = 1 - \frac{S_{def}}{S_{seg}}, \qquad P(C_2) = \frac{S_{def}}{S_{seg}} \qquad (2)$$

where C_1 and C_2 represent the events in which the defect lies in one segment and in two adjacent segments, respectively.
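The probabilities P(C_1) = 1 - S_def/S_seg and P(C_2) = S_def/S_seg follow if the defect's start phase is uniformly distributed over the cycle. The sketch below relies on that assumption, and on T being an integer multiple of S_seg, to compare the closed form with a Monte Carlo estimate; the function names are hypothetical.

```python
import numpy as np

def contained_probabilities(S_seg: float, S_def: float):
    """P(C1) = 1 - S_def/S_seg and P(C2) = S_def/S_seg, valid for S_seg > S_def."""
    p_c2 = S_def / S_seg
    return 1.0 - p_c2, p_c2

def simulate_contained(T: float, S_seg: float, S_def: float, trials: int = 200_000, seed: int = 0):
    """Monte Carlo estimate of (P(C1), P(C2)) with a uniformly distributed defect start phase."""
    rng = np.random.default_rng(seed)
    start = rng.uniform(0.0, T, trials)
    # Because T is a multiple of S_seg, the wrap-around point is also a segment
    # boundary, so comparing unwrapped segment indices is sufficient.
    same_segment = np.floor(start / S_seg) == np.floor((start + S_def) / S_seg)
    p_c1 = float(same_segment.mean())
    return p_c1, 1.0 - p_c1

print(contained_probabilities(60.0, 30.0))    # (0.5, 0.5)
print(simulate_contained(360.0, 60.0, 30.0))  # close to (0.5, 0.5)
```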

Segmentation, elimination, and fitting
An elimination combination specifies which segments are removed in each period; it is determined on the first period of the sample signal, and the same elimination is applied to all other periods.
The first period is segmented using the N_seg and S_seg obtained during preprocessing.
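One way to apply an elimination combination defined on the first period to every period is to map each point to a segment index through its phase modulo T. This is a sketch under the assumption that segments are counted from the sample start point x0; `keep_mask` is a hypothetical helper.

```python
import numpy as np

def keep_mask(x, T: float, n_seg: int, eliminated, x0: float = 0.0) -> np.ndarray:
    """True for points kept after removing the segments listed in `eliminated`.

    The segment index of each point is computed from its phase modulo T, so the
    combination defined on the first period is applied to every period.
    """
    s_seg = T / n_seg
    idx = np.floor(np.mod(np.asarray(x, dtype=float) - x0, T) / s_seg).astype(int)
    idx = np.minimum(idx, n_seg - 1)  # guard against round-off at the end of a period
    return ~np.isin(idx, list(eliminated))

# Example: two periods sampled once per degree, N_seg = 6, eliminate segments 0 and 3.
x = np.arange(0.0, 720.0, 1.0)
mask = keep_mask(x, 360.0, 6, (0, 3))
print(int(mask.sum()))  # 480
```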
N_del of the N_seg segments are selected at random, where N_del is the number of eliminated segments and N_del < N_seg. Therefore, the number of elimination combinations over all sample signals is

$$N_{ec} = N_{sec}\binom{N_{seg}}{N_{del}} \qquad (3)$$

The corresponding segments of each elimination combination are then removed. The probabilities of eliminating the defect are

$$P(D_1) = \frac{N_{del}}{N_{seg}}, \qquad P(D_2) = \frac{N_{del}(N_{del}-1)}{N_{seg}(N_{seg}-1)} \qquad (4)$$

where D_1 and D_2 represent the events in which the defect is eliminated given C_1 and C_2, respectively. The probability is significantly lower when the defect is distributed over two segments. With S_seg > S_def, the probability that the defect is eliminated is

$$P(D) = P(C_1)P(D_1) + P(C_2)P(D_2) \qquad (5)$$

where D represents the event that the defect is eliminated. Substituting (1), (2), and (4) into (5), the elimination probability can be expressed as

$$P(D) = \frac{N_{del}\,(S_{seg}-S_{def})}{T} + \frac{N_{del}(N_{del}-1)\,S_{def}\,S_{seg}}{T\,(T-S_{seg})} \qquad (6)$$

Eq. (6) contains two operation variables, N_seg (in the form of S_seg) and N_del. If T and S_def are constant, P(D) increases with S_seg and N_del; that is, the probability P(D) that the defect is eliminated is positively correlated with the total amount of signal rejection, N_del × S_seg. However, removing too much data has obvious drawbacks. Therefore, N_del should be chosen according to N_def < N_del < N_seg/2, and S_seg should be set as S_seg = k_e S_def with k_e ∈ [1.5, 2]. If a defect is contained in one or two adjacent segments and S_seg > S_def, it is eliminated completely only if all segments containing it are eliminated. Because all combinations are evaluated, some elimination combinations necessarily remove every segment that contains a defect. The associations between each signal characteristic parameter and the SOM operation variables are analysed in Section 4 (the selection principles of the operation variable group are given in Section 4.3).
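Assuming a uniformly distributed defect position, N_ec and P(D) can be computed as below. The closed forms in the docstrings (N_ec = N_sec C(N_seg, N_del), P(D_1) = N_del/N_seg, P(D_2) = N_del(N_del - 1)/(N_seg(N_seg - 1))) are consistent with the worked example given with Fig. 4, and a brute-force enumeration of combinations is included as a cross-check. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def n_elimination_combinations(n_seg: int, n_del: int, n_sec: int) -> int:
    """N_ec = N_sec * C(N_seg, N_del)."""
    return n_sec * comb(n_seg, n_del)

def p_eliminated(T: float, S_def: float, n_seg: int, n_del: int) -> float:
    """P(D) = P(C1) P(D1) + P(C2) P(D2), assuming a uniform defect position."""
    s_seg = T / n_seg
    p_c2 = S_def / s_seg
    p_c1 = 1.0 - p_c2
    p_d1 = n_del / n_seg
    p_d2 = n_del * (n_del - 1) / (n_seg * (n_seg - 1))
    return p_c1 * p_d1 + p_c2 * p_d2

def p_by_enumeration(n_seg: int, n_del: int, defect_segments) -> float:
    """Fraction of combinations that remove every segment touched by the defect."""
    combos = list(combinations(range(n_seg), n_del))
    hits = sum(set(defect_segments) <= set(c) for c in combos)
    return hits / len(combos)

# Example of Fig. 4: T = 360, S_def = 30, N_seg = 6, N_del = 3, N_sec = 2.
print(n_elimination_combinations(6, 3, 2))        # 40
print(round(p_eliminated(360.0, 30.0, 6, 3), 4))  # 0.35
print(p_by_enumeration(6, 3, [2]), p_by_enumeration(6, 3, [5, 0]))  # 0.5 0.2
```

Segments 5 and 0 are adjacent across the period boundary, so the second enumeration also covers a defect that wraps around the cycle.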
According to the elimination principle of the SOM, each period of the sample signal is divided into N_seg segments, and N_del segments are removed according to each elimination combination. The residual signal remains periodic and can therefore be fitted using OLS.
OLS is a mathematical optimisation method for estimating the best relationship between independent and dependent variables. Let (x_i, y_i) be the points of the residual signal and (x_i, f_m(x_i)) the corresponding points of its fitting curve in the mth elimination combination. Then

$$f_m(x_i) = \mathbf{A}^{T}\mathbf{P}_m$$

where A, evaluated at x = x_i, is a trigonometric polynomial vector, A = [1 cos k_1x sin k_1x cos k_2x sin k_2x …]^T, and k_i represents the fluctuation frequency of the signal, with k_1 = 2π/T; k_i can be obtained using the frequency acquisition method [23] or by transforming the measured variables during signal acquisition. P_m is the undetermined coefficient vector, which OLS estimates by minimising the sum of squared residuals between y_i and f_m(x_i).
An example of the interception, segmentation, and elimination of a sample signal using the SOM model is given in Fig. 4. The signal parameters are T = 360° and S_def = 30°, the operation variables are N_seg = 6, N_del = 3, and N_sec = 2, and the grey bars represent the eliminated segments. In this case, the number of elimination combinations is N_ec = 40, and the probability that a given elimination combination eliminates the defect is P(D) = 35 %. All elimination combinations are therefore evaluated, so that the periodic defect is completely removed in some of them.
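A minimal NumPy sketch of the OLS step follows. It assumes the harmonic frequencies are integer multiples of k_1 (k_i = i k_1) and that x is in degrees, so that k_1 = 2π/360; `trig_design` and `fit_residual` are illustrative names.

```python
import numpy as np

def trig_design(x, ks) -> np.ndarray:
    """Rows A(x_i)^T = [1, cos k1 x_i, sin k1 x_i, cos k2 x_i, sin k2 x_i, ...]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    for k in ks:
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

def fit_residual(x, y, ks) -> np.ndarray:
    """OLS estimate of the coefficient vector P_m for the residual points (x_i, y_i)."""
    A = trig_design(x, ks)
    P, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return P

# Example: x in degrees, k1 = 2*pi/360, harmonics k_i = i * k1 (assumption).
k1 = 2 * np.pi / 360.0
ks = [k1, 2 * k1]
x = np.arange(0.0, 3 * 360.0, 1.0)
y = 1.0 + 2.0 * np.cos(k1 * x) - 0.5 * np.sin(2 * k1 * x)
keep = (x % 360.0 < 60.0) | (x % 360.0 >= 180.0)  # drop segments 1 and 2 of each period
print(np.round(fit_residual(x[keep], y[keep], ks), 6))
```

Because the residual points still span several periods, the noise-free coefficients [1, 2, 0, 0, -0.5] are recovered even though a third of each period is removed.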

Statistics and optimization of fitting results
In the SOM, the coincidence errors between the residual signals and their fitting curves are calculated to obtain the optimal fitting trigonometric polynomial of the measurement signal. Steps 1 and 2 yield N_ec fitting curves f_m(x_i), m = 1, 2, …, N_ec.
The residual sum of squares between the two curves represents the coincidence error. The coincidence error of the mth elimination combination is defined as

$$e_{cm} = \sum_{i=1}^{n}\left[y_i - f_m(x_i)\right]^2$$

where n is the number of data points in the residual signal. The coincidence errors of the N_ec combinations and the corresponding signal amplitudes are calculated, and all coincidence errors are evaluated statistically to obtain the optimum trigonometric polynomial. The distribution of the coincidence errors, with the coincidence error as the abscissa and the signal amplitude as the ordinate, is shown in Fig. 5.
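The following sketch combines the previous steps for one sample signal: it enumerates all C(N_seg, N_del) combinations, fits the kept points by OLS, and records e_cm as the residual sum of squares. It assumes the harmonic frequencies are passed explicitly in `ks` and that segments are counted from the first sample point; the statistical selection based on the error distribution in Fig. 5 is not reproduced here, and `coincidence_errors` is a hypothetical name.

```python
import numpy as np
from itertools import combinations

def coincidence_errors(x, y, T, n_seg, n_del, ks):
    """Return (combination, e_cm, P_m) for every elimination combination of one sample signal.

    e_cm is the residual sum of squares between the kept points and their OLS fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s_seg = T / n_seg
    # Segment index per point, counted from the first sample point and repeated every period.
    seg = np.minimum(np.floor(np.mod(x - x[0], T) / s_seg).astype(int), n_seg - 1)

    def design(xx):
        cols = [np.ones_like(xx)]
        for k in ks:
            cols += [np.cos(k * xx), np.sin(k * xx)]
        return np.column_stack(cols)

    results = []
    for combo in combinations(range(n_seg), n_del):
        keep = ~np.isin(seg, combo)
        A = design(x[keep])
        P, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        e_cm = float(np.sum((y[keep] - A @ P) ** 2))
        results.append((combo, e_cm, P))
    return results

# Example: a harmonic signal with a 30-degree defect at phase 100-130 in every period.
k1 = 2 * np.pi / 360.0
x = np.arange(0.0, 3 * 360.0, 1.0)
y = 1.0 + np.cos(k1 * x)
y[(x % 360.0 >= 100.0) & (x % 360.0 < 130.0)] += 5.0
res = coincidence_errors(x, y, 360.0, 6, 3, [k1])
best = min(res, key=lambda r: r[1])
print(len(res), best[0])
```

In this noise-free example, only the combinations that remove both defect-bearing segments (1 and 2) have near-zero e_cm. Picking the minimum is a simplification for illustration; the method described above evaluates the whole distribution of coincidence errors.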

Simulation signal generation
A standard harmonic signal is generated with the coefficient vector

$$[a_{00}\ a_{01}\ b_{01}\ a_{02}\ b_{02}\ \cdots\ a_{0n}\ b_{0n}]^{T}$$

Random noise components and periodic defect components are superimposed on the standard harmonic signal. The simulation signal is given by Eq. (10), and an example is shown in Fig. 7:

$$y = a_{00} + \sum_{j=1}^{n}\left(a_{0j}\cos k_j x + b_{0j}\sin k_j x\right) + f_{noise}(x) + f_{def}(x) \qquad (10)$$

where f_noise(x) is the added random noise component and f_def(x) is the added periodic defect component, which is determined by the period T, position (initial phase x_id), span S_def, maximum amplitude A_def, and number of defects N_def. The defect period equals the fundamental period of the standard harmonic signal, N_def is set to 1 or 2, and x_id, S_def, and A_def are random. Each defect recurs periodically in the simulation signal with a different amplitude. The defect component is a function of the following quantities: R_n, a random number in the range [-1, 1] that produces periodic defects with different amplitudes; n, the index of the periodic defect, n = 1, 2, …, N_c; and h(x), the distance from each point to the middle phase of the defect, which determines the shape of the defect hump:

$$h(x) = \sqrt{\left[\cos x - \cos\left(x_{id} + \frac{S_{def}}{2}\right)\right]^2 + \left[\sin x - \sin\left(x_{id} + \frac{S_{def}}{2}\right)\right]^2}$$
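A sketch of a simulation-signal generator is shown below. The exact defect function f_def(x) is not reproduced in this section, so the hump used here, R_n A_def (1 - h(x)/h_edge) clipped at zero, where h_edge is the value of h(x) at the defect edges, is a hypothetical stand-in built from h(x) as defined above. The sketch also assumes uniform noise in [-A_noise, A_noise], harmonic frequencies k_j = j k_1, one defect per cycle (N_def = 1), and phase in degrees.

```python
import numpy as np

def h_distance(x_deg, x_id: float, S_def: float) -> np.ndarray:
    """h(x): distance on the unit circle from phase x to the defect middle phase x_id + S_def/2."""
    x = np.deg2rad(np.asarray(x_deg, dtype=float))
    mid = np.deg2rad(x_id + S_def / 2.0)
    return np.sqrt((np.cos(x) - np.cos(mid)) ** 2 + (np.sin(x) - np.sin(mid)) ** 2)

def simulate_signal(coeffs, n_cycles, pts_per_cycle, A_noise, A_def, S_def, x_id, seed=0):
    """Harmonic signal + uniform noise + one hypothetical defect hump per cycle (N_def = 1).

    coeffs = [a00, a01, b01, a02, b02, ...]; harmonic j uses k_j = j * 2*pi/360 (assumption).
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_cycles * pts_per_cycle) * 360.0 / pts_per_cycle  # phase in degrees
    k1 = 2.0 * np.pi / 360.0
    y = np.full_like(x, coeffs[0])
    for j in range(1, (len(coeffs) - 1) // 2 + 1):
        y += coeffs[2 * j - 1] * np.cos(j * k1 * x) + coeffs[2 * j] * np.sin(j * k1 * x)
    y += rng.uniform(-A_noise, A_noise, x.size)     # f_noise(x), assumed uniform
    R = rng.uniform(-1.0, 1.0, n_cycles)            # R_n, one random factor per cycle
    h_edge = 2.0 * np.sin(np.deg2rad(S_def / 4.0))  # h(x) at the defect edges
    hump = np.clip(1.0 - h_distance(x, x_id, S_def) / h_edge, 0.0, None)  # hypothetical shape
    cycle = (x // 360.0).astype(int)
    y += R[cycle] * A_def * hump                    # f_def(x)
    return x, y

x, y = simulate_signal([0.0, 1.0, 0.0], n_cycles=2, pts_per_cycle=360,
                       A_noise=0.0, A_def=1.0, S_def=30.0, x_id=100.0)
```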

Signal fitting using OLS, SOM, WT and GPR
In our experiment, we used four curve-fitting methods: OLS, SOM, WT, and GPR.The fitting parameters of the four methods are defined in this subsection, and their fitting results are compared using an error calculation matrix, as shown in Fig. 8.
For the simulation signals fitted by OLS and SOM, the coefficient vectors of the fitting curves have the same form as that of the standard harmonic signal. The simulation signals were processed by the WT filter using the MATLAB Wavelet Analyzer toolbox. After tuning the hyperparameters of the WT filter, we chose the Daubechies wavelet with six vanishing moments ('db6' in MATLAB) and set the wavelet decomposition level to 4.
Similar to WT, GPR fitting of the simulation signals is performed in MATLAB (function 'fitrgp'), with the fit method set to 'exact' and the basis function set to 'pureQuadratic', for which the basis matrix is H = [1, x, x^2].
In Fig. 8, the black lines are the simulation signals, the blue curves are the standard harmonic signals, and the red, green, cyan, and orange curves represent the fitting results of the SOM, OLS, WT, and GPR methods, respectively. The fitting curves of WT and GPR deviate significantly from the standard harmonic signal, which indicates that the WT filter and GPR are not suitable for processing harmonic signals mixed with defect components; therefore, WT and GPR are not considered in the subsequent simulations. The fitting curves obtained by OLS and SOM are close to the standard harmonic signal, and their fitting errors and efficiency are calculated and compared in the following section. To demonstrate the generality of the SOM, four operation variable groups were selected randomly to process the simulation signals. The operation variable groups (N_seg, N_del, N_sec) were set as group 1: (4, 2, 5), group 2: (6, 4, 2), group 3: (8, 6, 3), and group 4: (10, 2, 3). Because the number of simulations N_rs has no apparent influence on the fitting results once many repeated simulations are performed, N_rs is set to 1000 to save simulation time while ensuring the reliability of the results. The OLS and SOM fitting errors of each simulation were calculated, and the frequency distributions of the fitting error δ are shown in Fig. 9(a1-a4), where the abscissa is the fitting error δ and the ordinate is the corresponding simulation frequency F(δ).

Efficiency analysis
Both coordinates are normalised to the range [0, 1], so that the total fitting effectiveness over the whole coordinate system equals 1 (100 %). As shown in Fig. 9(b1-b4), the abscissa is normalised to the relative error δ_r, i.e., the fitting error relative to the maximum permissible error. Fitting accuracy is effectively guaranteed when the fitting error is less than the noise amplitude A_noise; thus, the maximum permissible error is set to A_noise in this study, and δ_r = δ/A_noise. The ordinate is normalised to the fitting success probability, i.e., the total frequency of fitting errors below a given error relative to the number of repeated simulations N_rs. The fitting success probability is therefore expressed as

$$\psi(\delta_r) = \frac{1}{N_{rs}}\sum_{\delta \le \delta_r A_{noise}} F(\delta)$$

The fitting success probabilities ψ(δ_r) of OLS and SOM are shown in Fig. 9(b_i), where the red and blue lines represent the fitting success probability curves of the two methods, respectively. These curves can be interpreted in the same way as typical receiver operating characteristic (ROC) curves.
The shaded areas are the areas under the curves (AUC), which reflect the total fitting effectiveness ε of each method in the range [0, 1]. The total fitting effectiveness is given in Eq. (15). To obtain a high fitting success probability at small fitting errors, the AUC should be as large as possible.
$$\varepsilon = \int_{0}^{1}\psi(\delta_r)\,\mathrm{d}\delta_r \qquad (15)$$

The numerical results for each group are listed in Table 1, including the fitting success probability ψ(δ_r) at δ_r = 1 (100 %) and the total fitting effectiveness ε.
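The fitting success probability ψ(δ_r) and the AUC ε of Eq. (15) can be computed from the N_rs fitting errors as in the sketch below, which assumes δ_r = δ/A_noise and evaluates the integral with the trapezoidal rule on a uniform grid. The error values in the example are hypothetical.

```python
import numpy as np

def success_curve(errors, A_noise: float, n_grid: int = 1001):
    """psi(delta_r): fraction of the N_rs runs whose error satisfies delta / A_noise <= delta_r."""
    delta_r = np.asarray(errors, dtype=float) / A_noise
    grid = np.linspace(0.0, 1.0, n_grid)
    psi = (delta_r[None, :] <= grid[:, None]).mean(axis=1)
    return grid, psi

def total_effectiveness(errors, A_noise: float) -> float:
    """epsilon = integral of psi over [0, 1] (Eq. (15)), via the trapezoidal rule."""
    grid, psi = success_curve(errors, A_noise)
    return float(np.sum((psi[1:] + psi[:-1]) * np.diff(grid)) / 2.0)

# Example: N_rs = 4 hypothetical fitting errors with A_noise = 0.2.
errs = [0.02, 0.05, 0.1, 0.3]
grid, psi = success_curve(errs, 0.2)
print(psi[-1], total_effectiveness(errs, 0.2))
```

Errors above A_noise (here 0.3) never count as successes, so ψ(1) and ε penalise them in the same way as the curves in Fig. 9(b1-b4).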
As shown in Fig. 9 and Table 1, for all four operation variable groups, both the fitting success probability ψ(1) and the total fitting effectiveness ε of SOM are significantly higher than those of OLS, and ψ(1) exceeds 80 % in all groups. This suggests that, compared with OLS, SOM can significantly reduce the influence of defect components and improve fitting accuracy.
In addition, the total fitting effectiveness of SOM varies widely across operation variable groups, from 65.06 % to 96.70 %, whereas ε of OLS remains approximately the same.
This indicates that the operation variable group plays an essential role in the total fitting effectiveness, and selecting the correct group of operation variables is paramount for optimal curve fitting. The association of each signal characteristic parameter with the total fitting effectiveness and the selection principle for operation variable groups are evaluated in Section 4.

Operation variable selection and optimisation
Because the parameters of the standard harmonic signal and the interference are arbitrary, different signals yield different fitting results with the same operation variable group (N_seg, N_del, N_sec). Therefore, the relationship between the signal characteristics and the optimal operation variables is investigated to derive a selection principle for (N_seg, N_del, N_sec). In this section, the fitting effect is represented by the total failure probability ε_f = 1 - ε, which shows it more intuitively.

Selection of operation variable N_sec
To analyse the association between the sample size N_sec and the fitting effectiveness, the total failure probability ε_f was calculated for different values of N_seg and N_del, as shown in Fig. 10. Seven operation variable groups are considered: (2, 1, N_sec), (3, 2, N_sec), (4, 2, N_sec), (5, 1, N_sec), (6, 3, N_sec), (7, 5, N_sec), and (8, 7, N_sec). The abscissa represents the sample size N_sec, and the ordinate is the total failure probability ε_f.
As shown in Fig. 10, ε_f is significantly high for N_sec = 1, whereas increasing N_sec beyond 3 has only a slight effect. Thus, N_sec = 1 should be avoided in a generic algorithm, and N_sec is usually set to 2 or 3 to ensure high processing efficiency and fitting accuracy simultaneously.
As shown in Fig. 11(a) and (b), ε_f decreases slightly with increasing data volume for each operation variable group. Compared with W_s, the influences of N_c and D_sd are not apparent, and ε_f can be kept below 5 % in most cases. For constant N_c and D_sd, the ε_f values of the operation variable groups are similar, with a largest difference of 2 %.
For simulation signals containing frequency-multiplier components, ε_f of (2, 1, 3) and (8, 6, 3) in Fig. 11(c) is notably high. This shows that extracting valid information is difficult when too much of the signal is eliminated. Therefore, when harmonic signals are mixed with frequency-multiplier components, signal rejection should be minimised as far as possible while keeping S_seg > S_def.
(2) Effect of defect component parameters: As determining factors of the signal waveform, the defect parameters (e.g., S_def, A_def, and N_def) should be carefully considered to optimise the selection principle of the operation variables. The simulation signal can no longer be regarded as a low-frequency harmonic signal when S_def > T/4; therefore, S_def is studied in the range (0°, 90°]. The influence of these parameters on ε_f is shown in Fig. 12. As shown in Fig. 12(a) and (b), the influences of large defects and of the number of defects are evident and cannot be ignored. Defects are eliminated effectively, with ε_f below 5 %, only if S_seg > 1.5S_def and N_del ≥ N_def. In addition, all cases achieve high fitting accuracy when N_def = 1; however, if N_def ≥ 2, N_del should be larger than N_def to obtain higher fitting accuracy. As shown in Fig. 12(c), ε_f increases with A_def; ε_f is generally less than 5 %, but ε_f for N_del = 1 is larger than in the other cases. Fig. 12 thus shows that the operation variables should satisfy N_def + 1 ≤ N_del < N_seg/2.
(3) Effect of noise amplitude: Owing to environmental interference, measurement signals are often contaminated by various noise components. The noise amplitude A_noise reflects signal stability, and its influence on the total failure probability is shown in Fig. 13.
Fig. 13 shows that ε_f changes insignificantly except for (8, 6, 3) and (8, 4, 3). This indicates that the span of the eliminated signal (N_del S_seg) should be less than half a period (i.e., N_del < N_seg/2) when the noise component is significant.

4.2.2 Orthogonal experiment design of main parameters
According to Section 4.2.1, the influences of D_sd and N_c are slight, and the relationship between N_def and N_del has been established; therefore, these parameters are set as constants: D_sd = 1024, N_c = 5, and N_def = 1. The other parameters, W_s, S_def, A_def, and A_noise, are selected as the main variables for determining the selection principle of the operation variables N_seg and N_del. The orthogonal experimental design, with four factors and three levels, is shown in Table 2.
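Table 2 is not reproduced here, but a standard L9(3^4) orthogonal array for four three-level factors can be generated with modular arithmetic, as sketched below. Which column corresponds to W_s, S_def, A_def, or A_noise, and the actual level values, are assumptions left to the reader. When the rows are laid out as a 3 × 3 grid indexed by the first two columns, the third and fourth columns are constant along the counter-diagonal and the leading diagonal, respectively, which matches the layout described for Fig. 15.

```python
import numpy as np
from itertools import combinations

def l9_array() -> np.ndarray:
    """L9(3^4) orthogonal array: columns i, j, (i + j) mod 3, (i + 2j) mod 3 for levels 0-2."""
    rows = [(i, j, (i + j) % 3, (i + 2 * j) % 3) for i in range(3) for j in range(3)]
    return np.array(rows)

def is_orthogonal(arr: np.ndarray) -> bool:
    """Check that every pair of columns contains each of the 9 level pairs exactly once."""
    for a, b in combinations(range(arr.shape[1]), 2):
        if len(set(zip(arr[:, a].tolist(), arr[:, b].tolist()))) != 9:
            return False
    return True

design = l9_array()
print(design)
print(is_orthogonal(design))  # True
```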
To comprehensively analyse the relationship between the signal parameters and the operation variables of the SOM, the effects of signal parameter variation on the processing time t_p and the total failure probability ε_f are analysed as follows.
(1) Computational efficiency analysis: In general, the processing time t_p depends on the number of processed groups and the number of data points per group. Fig. 14 shows the processing time t_p of Case 1 and its linear regression fit.
Linear regression on these data shows that t_p is proportional to the data volume V_d and the number of elimination combinations N_ec. The linear regression curve is

$$t_p = c_f\,V_d\,N_{ec}$$

where V_d = (1 - N_del/N_seg)N_c D_sd is the data volume, and c_f is a fitting coefficient that depends on the waveform of the simulation signal. The fitting coefficients c_f of Cases 1 to 9 were calculated, and the results are shown in Table 4.
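The regression model can be sketched as below: it computes V_d and N_ec = N_sec C(N_seg, N_del) for each operation variable group and estimates c_f as a least-squares slope through the origin. Fitting without an intercept is an assumption based on the statement that t_p is proportional to V_d and N_ec; `times` would be the user's own measured processing times for the listed (N_seg, N_del) groups at fixed N_c, D_sd, and N_sec.

```python
import numpy as np
from math import comb

def data_volume(n_seg: int, n_del: int, n_c: int, d_sd: int) -> float:
    """V_d = (1 - N_del / N_seg) * N_c * D_sd."""
    return (1.0 - n_del / n_seg) * n_c * d_sd

def estimate_cf(groups, times, n_c: int, d_sd: int, n_sec: int) -> float:
    """Least-squares slope through the origin for t_p = c_f * V_d * N_ec."""
    z = np.array([data_volume(s, d, n_c, d_sd) * n_sec * comb(s, d) for s, d in groups])
    t = np.asarray(times, dtype=float)
    return float(z @ t / (z @ z))

# Example with synthetic times generated from a known c_f.
groups = [(4, 2), (6, 3), (8, 3)]
true_cf = 2e-6
times = [true_cf * data_volume(s, d, 5, 1024) * 3 * comb(s, d) for s, d in groups]
print(estimate_cf(groups, times, n_c=5, d_sd=1024, n_sec=3))
```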
Table 4 shows that the processing times are nearly identical within each of Cases 1-3, Cases 4-6, and Cases 7-9. Comparison with Table 2 shows that the main factor determining the fitting coefficient c_f is the fluctuation frequency W_s of the signal. For simulation signals with the same fluctuation frequency W_s, the signal acquisition parameters (N_c and D_sd) should be selected according to the accuracy requirement.
(2) Fitting accuracy analysis: As mentioned previously, the total failure probability is ε_f = 1 - ε. The relationships between each operation variable group (N_seg, N_del, N_sec) and ε_f are depicted in Fig. 15, where each bar represents an operation variable group (N_seg, N_del, 3). The abscissa represents N_seg = 2, 3, …, 8, the rainbow-coloured bars represent N_del = 1, 2, …, 7 in turn, and the ordinate represents ε_f. The black and pink curves represent the trends of ε_f with increasing N_seg and N_del, respectively.
As shown in Fig. 15, apart from the fixed parameters N_def = 1, D_sd = 1024, and N_c = 5, the sub-graphs in each row, column, and diagonal share the value of exactly one parameter. The three sub-graphs in each row of Fig. 15 have the same signal waveform W_s; ε_f in the first row is less than 5 % for most operation variable groups, and ε_f increases with the signal order. The sub-graphs in each column have the same S_def, and ε_f in the second column is smaller when N_del S_seg ≈ S_def, which demonstrates that the span of the removed part should be approximately equal to S_def. In addition, the sub-graphs on the counter-diagonal share A_def and those on the leading diagonal share A_noise; these two parameters have a weaker association with ε_f than S_seg and W_s.
S_seg and W_s are therefore the main factors to consider when selecting the operation variable group. Comparing ε_f in the first row and the first column shows that the influence of W_s should be evaluated first if the harmonic signal is mixed with frequency-multiplier components. When a large defect exists in the signal, N_del should be increased appropriately. In most cases, S_seg should satisfy S_seg > S_def, and S_seg = (1.5-2)S_def is commonly used. N_del should be as small as possible, provided that N_del > N_def. When S_def is uncertain, a larger N_seg with N_del = N_seg/2 is a good choice regardless of the signal parameters.

Selection principles of the optimum operation variable group
This subsection summarises the effects of the signal parameters on the total fitting effectiveness.The research results on the association of signal parameters with fitting effectiveness and suggestions for operation variables are listed in Table 5.
To obtain higher fitting accuracy, the selection principles of the operation variable group (N_seg, N_del, N_sec) are as follows:
(1) S_seg should be larger than S_def; usually S_seg = k_s S_def with k_s ∈ (1.5, 2], and N_seg = T/S_seg.
(2) N_del should ensure complete elimination of the defects while preserving as much valid information as possible; usually N_def + 1 ≤ N_del < N_seg/2.
(3) Both the processing time and the fitting accuracy increase with N_sec, which is usually set to 2 or 3 to balance the two.
(4) When harmonic signals are mixed with frequency-multiplier components, the amount of signal rejection N_del S_seg should be minimised as far as possible while satisfying principles (1)-(3).
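Principles (1)-(3) can be turned into a simple enumeration of candidate groups, sketched below under the assumption that N_seg must be an integer so that S_seg = T/N_seg; principle (4) is then applied by sorting the candidates by the rejected span N_del S_seg. The function name and default bounds are illustrative.

```python
def candidate_groups(T: float, S_def: float, N_def: int, n_sec_options=(2, 3), k_lo=1.5, k_hi=2.0):
    """Enumerate (N_seg, N_del, N_sec) satisfying selection principles (1)-(3).

    (1) S_seg = T / N_seg with k_s = S_seg / S_def in (k_lo, k_hi];
    (2) N_def + 1 <= N_del < N_seg / 2;
    (3) N_sec taken from n_sec_options (2 or 3).
    """
    if S_def <= 0:
        raise ValueError("S_def must be positive")
    groups = []
    n_seg = 1
    while T / n_seg >= k_lo * S_def:  # S_seg shrinks as N_seg grows
        k_s = (T / n_seg) / S_def
        if k_lo < k_s <= k_hi:
            for n_del in range(N_def + 1, n_seg):
                if n_del < n_seg / 2:
                    for n_sec in n_sec_options:
                        groups.append((n_seg, n_del, n_sec))
        n_seg += 1
    return groups

groups = candidate_groups(360.0, 30.0, N_def=1)
# Principle (4): prefer the smallest rejected span N_del * S_seg.
groups.sort(key=lambda g: g[1] * 360.0 / g[0])
print(groups)  # [(7, 2, 2), (7, 2, 3), (6, 2, 2), (6, 2, 3), (7, 3, 2), (7, 3, 3)]
```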
Following these principles, periodic defects are effectively eliminated and the fitting accuracy of the harmonic signals is distinctly improved. However, just as the most suitable polynomial degree n in curve fitting differs between datasets and is chosen by comparing root-mean-square errors, different operation variable groups give different fitting errors. Therefore, the coincidence error e cm is used as the criterion for finding the most suitable (N seg , N del , N sec ): the smaller the coincidence error, the higher the fitting accuracy.
The operation variables must be chosen according to the operator's requirements and the signal conditions.
In some cases, the defect span is unknown. To simplify the selection of operation variables in such cases, we propose an estimation method for the optimum operation variable group based on minimising the coincidence error e cm . All operation variable groups that satisfy the selection principles are selected and their coincidence errors are calculated; the group with the smallest coincidence error is taken as an approximation of the optimal group for the measurement signal.
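The estimation procedure amounts to an arg-min over the admissible groups. In the Python sketch below, `coincidence_error` is a hypothetical callable standing in for a full SOM run that returns e cm for a given group; the toy error values in the usage line are made up purely to show the call and are not results from this study.

```python
from typing import Callable, Iterable, Tuple

Group = Tuple[int, int, int]  # (N_seg, N_del, N_sec)

def estimate_optimum_group(groups: Iterable[Group],
                           coincidence_error: Callable[[Group], float]) -> Tuple[Group, float]:
    """Return the admissible group with the smallest coincidence error e_cm.

    `coincidence_error` is a hypothetical callable: it should run SOM with the
    given group and return the resulting e_cm.
    """
    best_group, best_err = None, float("inf")
    for group in groups:
        err = coincidence_error(group)  # one SOM run per candidate group
        if err < best_err:
            best_group, best_err = group, err
    if best_group is None:
        raise ValueError("no admissible operation variable group")
    return best_group, best_err

# Illustrative e_cm values only, not measured results.
toy_errors = {(6, 2, 3): 0.8, (7, 2, 3): 0.5, (7, 3, 3): 0.6}
print(estimate_optimum_group(toy_errors, toy_errors.__getitem__))  # -> ((7, 2, 3), 0.5)
```

Each candidate costs one full SOM run, so keeping the admissible set small through the selection principles directly reduces the search time.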

Measurement signal acquisition
In this section, a bending shaft with a small deflection y d is used as an example to validate the effectiveness of SOM in real-life engineering applications. When the bending shaft rotates, the measurement signals can be regarded as harmonic signals that are subject to systematic mechanical errors and environmental disturbances. The fundamental period of the measurement signal equals the rotational period of the bending shaft. Surface damage (e.g. burrs, pits, and bumps) appears in the measurement signal as periodic defect components of the harmonic signal, with the same period as the rotation of the bending shaft. As shown in Fig. 16, the bending shaft was clamped between a pair of centres on the experimental platform. The driving force for shaft rotation is transmitted from the stepper motor to the driving centre through a gear drive. The fluctuation signal y of the rotating bending shaft was measured by the measuring mechanism, and the rotation angle x was recorded in real time by an optical rotary encoder.
To verify the effectiveness of the SOM, we first collected the bending deformation signals of a smooth shaft. The waveform of the smooth shaft's measurement signal was flat, with only a few noise components. Then, two surface damages were introduced in-house, in sequence, at different positions on the measured profile of the bent shaft, and the deformation signals of the bending shaft were collected after each damage. Each cycle of the measurement signal after the first damage contained one periodic defect, and each cycle after the second damage contained two periodic defects. Each measurement was repeated five times. Surface photographs of the measured profiles and the corresponding measurement signals are shown in Fig. 17.

Validation of SOM's effectiveness
The three groups of measurement signals, with zero, one, and two periodic defects, were each fitted using both OLS and SOM. The first-order trigonometric component $f_1(x) = a_1\cos(x) + b_1\sin(x)$ of the measurement signals reflects the bending state of the shaft. The first-order amplitude of the measurement signal, $A_1 = \sqrt{a_1^2 + b_1^2}$, is the deflection of the shaft's bending axis at the measured point, that is, $A_1 = y_d$. The fundamental period of the measurement signals is T.
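The A 1 extraction can be illustrated with the following Python sketch, which fits a trigonometric polynomial by ordinary least squares (the OLS baseline, not SOM) and computes A 1 from the first-order coefficients. It assumes x is given in radians and covers whole periods T = 360°; the function names are illustrative, and the synthetic signal is noise-free, so the coefficients are recovered exactly.

```python
import numpy as np

def trig_design_matrix(x, n):
    """Columns 1, cos(x), sin(x), ..., cos(n x), sin(n x); x in radians."""
    cols = [np.ones_like(x)]
    for k in range(1, n + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

def first_order_amplitude(x, y, n=1):
    """Least-squares trigonometric fit; returns coefficients and A1 = sqrt(a1^2 + b1^2)."""
    A = trig_design_matrix(x, n)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    a1, b1 = coeffs[1], coeffs[2]
    return coeffs, float(np.hypot(a1, b1))

x = np.deg2rad(np.arange(0, 360, 0.5))  # one revolution, T = 360 deg
y = 0.2 + 3.0 * np.cos(x) + 4.0 * np.sin(x)
coeffs, A1 = first_order_amplitude(x, y)
print(round(A1, 6))  # -> 5.0
```

On a defect-free signal this plain fit already recovers A 1 ; once periodic defects are present, the defect samples bias a 1 and b 1 , which is the situation SOM is designed to handle.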
The fundamental period of the measurement signals equals the rotation period of the bent shaft, i.e. T = 360°. The measurement signals of the smooth shaft can be approximated as a standard harmonic signal, so the operation variable group can be chosen arbitrarily; here it is set to (10, 1, 3), and the resulting fitting curve is shown in Fig. 18 (a). The defect span of the first damage was S def1 ≈ 25°, and the operation variable groups were taken from N seg ∈ [6, 9], N del ∈ [1, N seg /2], and N sec = 3. After the second damage, the defect spans were S def1 ≈ 25° and S def2 ≈ 15°, and the operation variable groups were taken from N seg ∈ [6, 9], N del ∈ [2, N seg /2], and N sec = 3. The coincidence error e cm and the corresponding first-order amplitude A 1 of each operation variable group are listed in Table 6.
Table 6 shows that the first-order amplitudes A 1 obtained with the various operation variable groups are similar. According to the selection principles in Section 4.3, the optimal groups for the measurement signals with one damage and two damages are (8, 4, 3) and (7, 3, 3), respectively. The fitting curves obtained by the two methods and the corresponding first-order curves are shown in Fig. 18 (b) and (c), respectively.
The fitting results for each measurement signal are listed in Table 7, where A 1 is the first-order amplitude of each measurement signal, A mean is the average amplitude over the five repeated measurements, and e is the difference between A 1 of each measurement signal with defects and A mean . The value of e reflects the robustness of the fitting method.
As shown in Fig. 18 and Table 7, both OLS and SOM fit the measurement signals of the smooth shaft well. However, for measurement signals with defect components, the fitting curves produced by OLS deviate significantly from the normal trajectory, whereas those produced by SOM closely coincide with the measurement signals. Table 7 also shows that OLS yields large fitting errors for the measurement signals with defect components, whereas the fitting error of SOM, e ≈ 1, is approximately equal to the resolution of the sampling devices.
The results in this section show that the fitting accuracy of SOM is higher than that of OLS. SOM can effectively process measurement signals with periodic defects and accurately extract the component characteristics of the signal.

Conclusion
In this study, a novel distribution-based selective optimisation method (SOM) is proposed to effectively process harmonic signals with large periodic defects. To the best of our knowledge, SOM is the first defect-component elimination method based on unsupervised distribution clustering, and it was the most accurate and effective of the methods compared. The SOM process includes signal segmentation, random elimination of segments, regression fitting of the residual signal, error distribution statistics, and fitting curve optimisation. The fitting accuracy and computational effectiveness of SOM were verified on a theoretical model and compared with conventional signal fitting methods, including least mean squares, wavelet transform-based methods, and Gaussian regression.
The operation variables of the proposed SOM are the segment number, the number of eliminated segments, and the sample size. The associations of the signal characteristic parameters with the fitting effectiveness were investigated, and the selection principles for the operation variables are given in detail. The segment number is related to the defect span, the number of eliminated segments is determined by the segment number and the number of defects per cycle, and the sample size is usually set to 2 or 3 to balance fitting accuracy against processing time.
Meanwhile, we use a bent shaft with zero, one, and two defects as an example to examine the feasibility of SOM in real-life industrial settings.This experiment confirms that SOM can simultaneously eliminate periodic defect signals from harmonic signals and perform signal regression fitting both accurately and effectively.
In conclusion, this study confirms that the proposed method can effectively eliminate periodic defects and extract standard harmonic components from the measurement signal.
It has broad application prospects in the mechanical, electronic, instrumentation, and aerospace industries.

Nomenclature
The event that the defect is contained in one segment
The event that the defect is contained in two adjacent segments
D
The event that defects are eliminated
The event that defects are eliminated when C 1 is satisfied
The event that defects are eliminated when C 2 is satisfied

The fitting results of the SOM and OLS methods.
The fitting coefficient value of each case.
(a), where each star represents the fitting result of one elimination combination. Taking the frequency of each coincidence error as the ordinate gives the probability statistics of the coincidence errors shown in Fig. 5 (b). The processed signal and operation variables in Fig. 5 are identical to those in Fig. 4. As shown in Fig. 5 (a), there are three distinct clusters of points in the distribution graph, and the coincidence errors within these clusters are relatively small. Meanwhile, Fig. 5 (b) shows that the group of data with the minimum error has the highest frequency. The optimum coefficient vector P opt is obtained by selecting the group of data with the smallest coincidence error and averaging the corresponding trigonometric polynomial coefficients. The SOM fitting function can then be expressed as f opt (x i ) = P opt A. The results of fitting the harmonic signal in Fig. 4 with SOM are shown in Fig. 6.

3 Effectiveness and applicability of SOM
The effectiveness and applicability of the SOM are verified in the following theoretical simulation. The simulation signal can be considered a standard harmonic signal mixed with periodic defects and random noise. The parameters of the standard harmonic signal include the signal amplitude A 0 , the fluctuation frequency W s , the sampling density of the data acquisition equipment D sd , and the number of sampled signal cycles N c . To keep the analysis general, the signal amplitude was normalised, i.e. $A_0 = \sqrt{a_{01}^2 + b_{01}^2} = 1$. The defect component parameters include the defect amplitude A def , the defect span S def , and the number of defects per cycle N def . The noise component was determined by the noise amplitude A noise . In this section, the fitting success probabilities for various random signals are calculated to verify the effectiveness of the SOM.
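The optimisation step of SOM (fitting every elimination combination, then averaging the coefficients of the lowest-error group to form P opt ) can be illustrated with a compact Python sketch. It is not the authors' implementation: the segmentation rule (equal-width slices of one period, removed identically in every period), the use of the RMS residual of the retained points as the coincidence error, and the use of the lowest bin of an equal-width error histogram as the "smallest-error group" are all assumptions made for illustration.

```python
import itertools
import numpy as np

def trig_design_matrix(x, order):
    """Trigonometric basis [1, cos x, sin x, ..., cos(n x), sin(n x)]."""
    cols = [np.ones_like(x)]
    for k in range(1, order + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

def som_sketch(x, y, n_seg, n_del, order=1, bins=10):
    """Illustrative SOM-style fit for signals with period 2*pi (x in radians)."""
    phase = np.mod(x, 2 * np.pi)
    seg_id = np.minimum((phase / (2 * np.pi) * n_seg).astype(int), n_seg - 1)
    errors, coeffs = [], []
    for removed in itertools.combinations(range(n_seg), n_del):
        keep = ~np.isin(seg_id, removed)                  # residual signal after elimination
        A = trig_design_matrix(x[keep], order)
        p, *_ = np.linalg.lstsq(A, y[keep], rcond=None)   # residual regression fitting
        errors.append(np.sqrt(np.mean((y[keep] - A @ p) ** 2)))  # assumed error measure
        coeffs.append(p)
    errors, coeffs = np.array(errors), np.array(coeffs)
    width = (errors.max() - errors.min()) / bins          # equal-width histogram bins
    lowest = errors <= errors.min() + width               # members of the lowest bin
    p_opt = coeffs[lowest].mean(axis=0)                   # averaged coefficients P_opt
    return p_opt, errors

# Synthetic check: harmonic with A1 = 5 plus a rectangular defect between
# 0.3 and 0.6 rad in every period (all values hypothetical).
x = np.linspace(0, 2 * np.pi * 5, 5 * 1024, endpoint=False)
y = 3 * np.cos(x) + 4 * np.sin(x)
y = y + 2.0 * ((np.mod(x, 2 * np.pi) >= 0.3) & (np.mod(x, 2 * np.pi) < 0.6))
p_opt, errors = som_sketch(x, y, n_seg=8, n_del=2)
```

In this synthetic case, only the 7 of the 28 combinations that remove the defective first segment fit the residual signal exactly, so P opt recovers a 1 = 3 and b 1 = 4; combinations that keep the defect leave a substantial residual and fall outside the lowest bin.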
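$$\left[a_{e0},\ a_{e1},\ b_{e1},\ a_{e2},\ b_{e2},\ \ldots,\ a_{en},\ b_{en}\right], \qquad P_{SOM} = \left[a_{p0},\ a_{p1},\ b_{p1},\ a_{p2},\ b_{p2},\ \ldots,\ a_{pn},\ b_{pn}\right] \tag{12}$$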

Fig. 8
Fig. 8 (a) shows the curve-fitting results for the simulation signal shown in Fig. 7. In Fig. 8 (b)-(e), we provide four typical examples of defect signals and
Numerical simulations were carried out to compare the fitting success probabilities of OLS and SOM directly. Without loss of generality, the initial signal characteristic parameters were set as follows. Standard harmonic signal parameters: D sd = 1024, W s = [1, 3/7, 2, 3/4], and N c = 5. Defect component parameters: S def ∈ [0, 30°], A def ∈ [0, 5], and N def ∈ [0, 2]. Noise component parameter: A noise ∈ [0, 0.05]. By calculating the difference between the coefficient vectors of the two fitting methods and that of the simulation signals, the fitting errors of OLS and SOM can be written as δ OLS = ∑ P OLS −

4.2 Selection of operation variables N seg and N del
4.2.1 Influence of signal parameters on total failure probability
To explore the influence of each signal parameter on the total failure probability ε f , N sec was set to 3 according to the selection principle described in Section 4.1, and the operation variable groups (2, 1, 3), (5, 1, 3), (8, 1, 3), (8, 2, 3), (8, 4, 3), and (8, 6, 3) were selected for study, so that the influence of N seg , N del , and N del /N seg on ε f could be examined.
(1) Effect of standard harmonic signal parameters: the standard harmonic signal parameters mainly include the number of sampling cycles N c , the sampling density D sd , and the signal fluctuation frequency W s ; N c and D sd determine the data volume, and W s determines the waveform of the harmonic signal. Their influence on the total failure probability ε f was calculated and is shown in Fig. 11.

Fig. 5. Distribution and statistics of coincidence errors under various elimination combinations. (a) Distribution and clusters of coincidence errors. (b) Frequency histogram of coincidence errors.

Fig. 7. Simulation signal for method verification. (a) Standard harmonic signal; (b) interference component, where A noise is the noise amplitude; (c) periodic defect component, where S def is the defect span; (d) simulation signal obtained by mixing the standard signal with the defect and interference components.
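A simulation signal of the kind shown in Fig. 7 can be generated along the lines of the Python sketch below. The rectangular defect shape, the defect start angles `def_starts`, and uniformly distributed noise in [−A noise , A noise ] are assumptions; the text only names the parameters A def , S def , N def , A noise , D sd , and N c . How W s maps to harmonic coefficients is not specified here, so the sketch takes (a k , b k ) pairs directly and rescales them so that A 0 = 1.

```python
import numpy as np

def simulation_signal(harmonic, D_sd=1024, N_c=5, A_def=1.0, S_def=20.0,
                      def_starts=(90.0,), A_noise=0.02, seed=0):
    """Standard harmonic + periodic rectangular defects + uniform noise (hypothetical model).

    harmonic: (a_k, b_k) pairs for orders k = 1, 2, ...; rescaled so A0 = hypot(a_1, b_1) = 1.
    def_starts: start angles in degrees of the N_def defects in each cycle.
    """
    x_deg = np.arange(D_sd * N_c) * 360.0 / D_sd       # D_sd samples per 360 deg cycle
    x = np.deg2rad(x_deg)
    scale = 1.0 / np.hypot(*harmonic[0])                # normalise A0 to 1
    standard = sum(scale * (a * np.cos(k * x) + b * np.sin(k * x))
                   for k, (a, b) in enumerate(harmonic, start=1))
    phase = np.mod(x_deg, 360.0)
    defect = np.zeros_like(x)
    for start in def_starts:
        in_span = np.mod(phase - start, 360.0) < S_def  # S_def is the defect span in degrees
        defect[in_span] += A_def
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-A_noise, A_noise, size=x.shape)
    return x, standard + defect + noise, standard
```

For example, `simulation_signal([(3.0, 4.0)], A_def=1.0, A_noise=0.0)` yields a first-order harmonic 0.6 cos x + 0.8 sin x with one 20° defect per cycle over 5 cycles of 1024 samples each.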

Fig. 14. Relationship between processing time and data size for case 1.

Fig. 17. Surface topography and measurement signal at the measured point on the bending shaft. (a) Smooth surface; (b) first damage; (c) second damage.

Fig. 1. Waveform of harmonic signals merged with periodic defects.
Residual signal: residual data of the measurement signal after signal elimination in SOM, i.e. the processing result of Step 2 and the object of Step 3.
Fig. 2. Flow diagram of the selective optimization method.
Fig. 3. Schematic diagram of signal preprocessing.
Xin et al.Page 34 Mech Syst Signal Process.Author manuscript; available in PMC 2023 August 31.

Table 1

Table 5. Association of signal parameters with the fitting effectiveness.
Columns: Signal parameter | Change in fitting success probability as the signal parameter increases | Application suggestions for SOM variables

Table 6. Fitting results for each operation variable group.
Columns: N seg | N del | One damage: e cm , A 1 | Two damages: e cm , A 1
* Minimum coincidence error e cm in each column.