Nondestructive Prediction of Isoflavones and Oligosaccharides in Intact Soybean Seed Using Fourier Transform Near-Infrared (FT-NIR) and Fourier Transform Infrared (FT-IR) Spectroscopic Techniques

The demand for rapid and nondestructive methods to determine chemical components in food and agricultural products is growing because such methods are useful for screening food quality. This research investigates the feasibility of Fourier transform near-infrared (FT-NIR) and Fourier transform infrared (FT-IR) spectroscopy for predicting total and individual isoflavones and oligosaccharides in intact soybean samples. Partial least squares regression (PLSR) models were developed from the spectral data of 310 soybean samples, which were matched to reference values obtained with a conventional assay. The models were then tested on soybean varieties that were not involved in model construction. The best FT-NIR models predicted total isoflavones and oligosaccharides in intact seeds with acceptable performance (R²p: 0.80 and 0.72), slightly better than the models based on FT-IR data (R²p: 0.73 and 0.70). The results also demonstrate the possibility of using FT-NIR to predict individual types of the evaluated components, with prediction coefficients of determination (R²p) above 0.70. In addition, the testing results confirmed model performance, with R² and error values similar to those of the calibration model.


Introduction
The global market for functional food has grown considerably in the last decade. In Asia, the market is predicted to continue growing owing to its large population [1]. Among functional foods, soybeans are one of the most interesting products to evaluate because of their complex chemical composition. Soybeans are one of the world's most important protein sources and the primary protein supply in animal feed. Soybean also accounts for the leading share of global oilseed output and contributes 60% of the world's meal production. Wilson in Roger-Boerma [2] reported that protein […] common anthocyanins in soybeans have also been successfully predicted from NIR spectra [28]. However, all of the previous studies on soybean collected spectral data from powdered samples.
Soybean chemical components have also been investigated using spectral data obtained from seed samples. Kovalenko et al. [29] evaluated the amino acid content of soybean based on NIR spectra of the seeds. Moreover, three different sample types (single bean, whole bean, and powder) have been investigated to determine their effect on the prediction of protein and amino acids in soybeans [30], while FT-NIR and FT-IR (MIR region) spectra of intact seeds have also been used successfully to predict anthocyanin in soybean [31]. These reports reveal that acquiring NIR and MIR spectra from a single bean is a promising strategy for the nondestructive evaluation of soybean chemical components. However, soybean protein and amino acid contents are relatively high (about 40% dw), and anthocyanin is found only on the bean surface.
Our concern in this study was soybean isoflavones and oligosaccharides, which occur at low concentrations and are distributed throughout the whole bean. Therefore, the effectiveness of FT-NIR and FT-IR for predicting these components from the spectral data of a single bean was investigated. In addition, the individual types of each microcomponent were investigated in more detail, and the results are presented side by side with those of powder samples.

Sample Preparation
In this research, 310 soybean samples were used to develop models to predict the examined soybean components. These samples were supplied by the Rural Development Administration (RDA), South Korea; seventy of them had black seed coats. Bean size varied between samples, ranging from 2.90 to 7.90 mm. Bean size also varied within a single sample, which may have affected the chemical content of individual seeds [32]; this seed-to-seed variability in chemical composition is in turn reflected in the spectral data measured by each instrument. In addition, the chemical analysis was constrained by the minimum sample weight required for extraction prior to HPLC. Hence, 30 g of soybean seeds were collected from each sample, and twenty-one of these beans were selected randomly for spectral data acquisition using the Fourier transform near-infrared (FT-NIR) and Fourier transform infrared (FT-IR) systems. In total, 6510 spectra (21 beans × 310 samples) were collected with each instrument and then divided into calibration and validation datasets of about 70% and 30%, respectively. After measurement with both instruments, all 30 g of beans were ground, and the powder was passed through a No. 60 mesh sieve (250 µm opening). Spectra of the powder samples were then collected with both instruments using the same instrumental setup. Finally, the powder samples were sent for chemical analysis to obtain reference values.
An additional 65 soybean samples (50 yellow and 15 black soybeans), obtained from the same institution and not involved in model construction, were used to test the resulting models. Twenty-one beans were again selected randomly from each sample, and their spectra were collected with both instruments.
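The text does not say whether the roughly 70/30 split was made per seed spectrum or per sample. The minimal Python sketch below assumes a sample-level split, so that all 21 seed spectra of one sample (which share a single reference value) land on the same side and cannot leak between calibration and validation; the random seed and the use of NumPy are illustrative choices, not the authors' MATLAB procedure.

```python
import numpy as np

n_samples, n_seeds = 310, 21
# One entry per seed spectrum: sample 0 repeated 21 times, then sample 1, and so on.
sample_id = np.repeat(np.arange(n_samples), n_seeds)

rng = np.random.default_rng(42)               # arbitrary seed, for reproducibility only
shuffled_samples = rng.permutation(n_samples)
n_cal = n_samples * 7 // 10                   # 217 calibration samples, 93 validation samples
cal_samples = shuffled_samples[:n_cal]

# Every seed spectrum inherits the calibration/validation role of its parent sample.
is_cal = np.isin(sample_id, cal_samples)
print(sample_id.size, int(is_cal.sum()), int((~is_cal).sum()))
```

Splitting by sample rather than by seed keeps the validation statistics honest, because seeds of the same sample are more alike than seeds of different samples.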

FT-NIR Spectroscopy
A laboratory FT-NIR spectrometer (Antaris II FT-NIR analyzer, Thermo Scientific Co., Waltham, MA, USA) was used to acquire diffuse reflectance spectra of soybean seeds in the waveband between 10,000 and 4000 cm⁻¹ (1000–2500 nm). A single bean was placed in the sample holder and scanned 32 times to obtain an averaged spectrum over this waveband at 4 cm⁻¹ intervals. Environmental light was blocked during measurement by covering the sample holder with a black lid (Figure 1A). The same technique and instrumental settings were applied to collect spectra of the powder samples.
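The wavelength values quoted alongside wavenumbers throughout this paper follow λ (nm) = 10⁷ / ν̃ (cm⁻¹). The short Python sketch below shows that conversion and the size of a 10,000–4000 cm⁻¹ axis sampled every 4 cm⁻¹; the one-point-per-4-cm⁻¹ grid is an assumption, since the stored point spacing depends on the instrument software.

```python
import numpy as np

def wavenumber_to_nm(wavenumber_cm1):
    """Convert wavenumber (cm^-1) to wavelength (nm): lambda_nm = 1e7 / wavenumber."""
    return 1e7 / np.asarray(wavenumber_cm1, dtype=float)

# Assumed axis: one point every 4 cm^-1 from 4000 to 10,000 cm^-1 (end point included).
nir_axis = np.arange(4000, 10000 + 4, 4)

print(nir_axis.size)                       # number of spectral points on the assumed grid
print(wavenumber_to_nm([10000, 4000]))     # FT-NIR range limits in nm
print(wavenumber_to_nm([4000, 400]))       # FT-IR (MIR) range limits in nm
```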

FT-IR Spectroscopy
The spectra of soybean seeds and powder samples were collected using a Nicolet 6700 FT-IR spectrometer (Thermo Scientific Co., Waltham, MA, USA) operated in attenuated total reflectance (ATR) sampling mode. The absorbance spectra of each seed were collected in the MIR region from 4000 to 400 cm⁻¹ (2500–25,000 nm) at a spectral resolution of 4 cm⁻¹. During measurement, the soybean seed was placed on the surface of the diamond crystal and clamped with a pointed tip (Figure 1B), and the background was collected by scanning the empty plate. The average spectrum of 32 successive scans of each seed was used for further analysis. FT-IR in ATR sampling mode was selected because it requires only a very small quantity of sample [33], allowing the sample to be analyzed in its natural state. Moreover, FT-IR ATR allows direct measurement of a solid sample surface, without sample preparation, by pressing the sample against the ATR crystal.

Isoflavones Determination
The isoflavones were determined following a previously reported procedure [34] with some modifications. One gram of soybean sample was extracted with 70% EtOH (40 mL) at room temperature for 24 h. The 70% EtOH extract was filtered through Whatman No. 6 filter paper (Whatman Inc., Maidstone, UK). Then, 20 mL of the extract was evaporated below 40 °C and redissolved in 70% EtOH (5 mL). Before analysis, all samples were filtered through a 0.[…] The gradient elution profile was as follows: 0 min, 85% A; 0–60 min, 85–70% A; 60–65 min, 70–60% A; 66 min, 85% A; 66–75 min, 85% A. The flow rate was 1.0 mL/min, and the injection volume was 20 µL. Isoflavone contents were calculated by comparing HPLC peak areas with external standard calibration curves. The linear standard calibration curves (R² = 0.999) were generated by injecting 0–20 µg of isoflavone standards in 1 mL of 80% methanol.
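The external-standard quantification can be sketched in Python as a linear fit of peak area against standard concentration, followed by a back-calculation to µg per g of seed. The standard peak areas below are made up for illustration, and the back-calculation assumes complete recovery through the 1 g / 40 mL extraction, the 20 mL aliquot, and the 5 mL redissolution described above; the function name `content_ug_per_g` is hypothetical.

```python
import numpy as np

# Hypothetical standard series (0-20 ug/mL) and made-up peak areas, for illustration only.
std_conc = np.array([0.0, 2.5, 5.0, 10.0, 20.0])          # ug/mL
std_area = np.array([0.0, 51.0, 99.0, 202.0, 398.0])      # arbitrary area units

slope, intercept = np.polyfit(std_conc, std_area, 1)       # area = slope * conc + intercept
r2_curve = np.corrcoef(std_conc, std_area)[0, 1] ** 2      # linearity of the standard curve

def content_ug_per_g(peak_area, final_ml=5.0, extract_ml=40.0, aliquot_ml=20.0, sample_g=1.0):
    """Back-calculate ug of isoflavone per g of soybean from one HPLC peak area.

    Assumes the volumes described in the text and 100 % recovery.
    """
    conc_injected = (peak_area - intercept) / slope        # ug/mL in the redissolved extract
    return conc_injected * final_ml * (extract_ml / aliquot_ml) / sample_g

print(round(r2_curve, 4), round(content_ug_per_g(150.0), 1))
```

The factor `extract_ml / aliquot_ml` accounts for only half of the 40 mL extract being evaporated and redissolved.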

Oligosaccharides Determination
In brief, soybean powder (1.0 g) was extracted with 70% EtOH (10 mL) at room temperature for 24 h and then centrifuged at 5000× g for 20 min. Next, the supernatant was filtered with a Sep-Pak C18 solid-phase extraction cartridge (Waters, Milford, MA, USA), and the remaining residue was then dissolved in water. Finally, the diluted extract (20 µL) was injected into an HPLC system.

Spectral Data Preprocessing and Multivariate Analysis
All spectral data collected using FT-NIR and FT-IR were subjected to several preprocessing methods: normalization (mean, range, and maximum), standard normal variate (SNV), multiplicative scatter correction (MSC), and Savitzky–Golay (SG) first and second derivatives. These treatments aim to remove noise and other unwanted variation generated during measurement by instrumental or environmental effects [35,36]. The preprocessed spectral data were then divided into calibration and validation datasets of 70% and 30%, respectively.
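The seven preprocessing treatments listed above can be sketched in Python with NumPy and SciPy. The SG window length (11 points) and polynomial order (2), and the exact definitions of the three normalizations, are assumptions, because the paper does not report them and software packages define them differently. The synthetic spectra share one band shape but carry a different gain and offset per sample, which is the kind of scatter effect SNV and MSC are meant to remove.

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction against the mean spectrum (or a given reference)."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, offset = np.polyfit(ref, x, 1)   # fit x = offset + slope * ref
        corrected[i] = (x - offset) / slope
    return corrected

def sg_derivative(X, order, window=11, polyorder=2):
    """Savitzky-Golay smoothed first (order=1) or second (order=2) derivative per spectrum."""
    return savgol_filter(X, window_length=window, polyorder=polyorder, deriv=order, axis=1)

# Assumed definitions of the three normalizations.
normalize = {
    "mean": lambda X: X / X.mean(axis=1, keepdims=True),
    "max": lambda X: X / X.max(axis=1, keepdims=True),
    "range": lambda X: (X - X.min(axis=1, keepdims=True)) / np.ptp(X, axis=1, keepdims=True),
}

# Synthetic spectra: one band shape distorted by a per-sample gain and offset ("scatter").
wn = np.linspace(4000, 10000, 301)
base = np.exp(-0.5 * ((wn - 5200) / 120) ** 2) + 0.5 * np.exp(-0.5 * ((wn - 6900) / 250) ** 2)
rng = np.random.default_rng(0)
X = rng.uniform(0.8, 1.2, (5, 1)) * base + rng.uniform(-0.1, 0.1, (5, 1))

preprocessed = {"snv": snv(X), "msc": msc(X), "sg1": sg_derivative(X, 1), "sg2": sg_derivative(X, 2)}
preprocessed.update({name: f(X) for name, f in normalize.items()})
print(sorted(preprocessed))
```

Because the synthetic rows differ only by gain and offset, SNV makes them identical and MSC maps each of them onto the mean spectrum.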
Afterward, partial least squares regression (PLS-R) was selected to develop models to predict the content of the targeted components. The general PLS-R equation is expressed as follows:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} \qquad (1)$$

where X is an m × n matrix that holds the spectral values of the samples, B is the matrix of regression coefficients, and E is the error term. In this study, for constructing the PLS-R models, the spectral data of soybean seeds and powder were arranged in matrix X, while matrix Y contained the reference values obtained from chemical analysis. X and Y are decomposed into latent variables (LVs) to establish a linear relationship between the response and predictor variables:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf{T}} + \mathbf{E}_x \qquad (2)$$

$$\mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathsf{T}} + \mathbf{E}_y \qquad (3)$$

where T and U are score matrices, P and Q are loading matrices, and E_x and E_y are the error matrices of X and Y, respectively. The optimum number of LVs was selected following the standard procedure documented by Varmuza and Filzmoser [37] (Section 4.2). The coefficient of determination (R²) and the standard error (S.E.) for both the calibration and validation datasets were used to select the best model [38]. R² is the proportion of the variance in the dependent variable that is predictable from the independent variables and can be calculated with Equation (4):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (4)$$

Meanwhile, the standard error of the regression represents the average distance between the observed values and the regression line and can be calculated using Equation (5), where ŷ_i is the predicted value, y_i is the reference value, ȳ is the mean of the reference values, and n is the number of samples. All data analysis was conducted using MATLAB software (R2019a, The MathWorks, Natick, MA, USA). The model development workflow is summarized in Figure 2 (developing model).
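The analysis in the paper was done in MATLAB; the sketch below is a self-contained NumPy illustration of single-response PLS (PLS1) fitted with the NIPALS algorithm, following the decomposition described above: the scores t form the columns of T, the X loadings p form the columns of P, and the coefficient vector is b = W(PᵀW)⁻¹q. Two parts are assumptions: the number of LVs is chosen by plain 5-fold cross-validation minimizing RMSECV (a stand-in, not necessarily the Varmuza and Filzmoser procedure), and SE is computed as √(Σ(ŷᵢ − yᵢ)²/(n − 1)), one common form that may differ from the paper's Equation (5). The spectra are synthetic.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1. Returns coefficients b and intercept b0 with y_hat = X @ b + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # mean-centred; deflated after each LV
    W = np.zeros((X.shape[1], n_lv))             # X weights
    P = np.zeros((X.shape[1], n_lv))             # X loadings (columns of P)
    q = np.zeros(n_lv)                           # y loadings
    for k in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                               # X scores (one column of T)
        tt = t @ t
        P[:, k] = Xr.T @ t / tt
        q[k] = yr @ t / tt
        W[:, k] = w
        Xr = Xr - np.outer(t, P[:, k])          # deflate X
        yr = yr - q[k] * t                       # deflate y
    b = W @ np.linalg.solve(P.T @ W, q)         # b = W (P^T W)^-1 q
    return b, y_mean - x_mean @ b

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def se(y, y_hat):
    # ASSUMED form; the paper's Equation (5) may use a different denominator.
    return np.sqrt(np.sum((y_hat - y) ** 2) / (len(y) - 1))

def select_n_lv(X, y, max_lv=10, n_folds=5, seed=0):
    """LV count with the lowest k-fold RMSECV (a simple stand-in selection rule)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    rmsecv = []
    for n_lv in range(1, max_lv + 1):
        sq_err = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            b, b0 = pls1_fit(X[train_idx], y[train_idx], n_lv)
            sq_err += np.sum((X[test_idx] @ b + b0 - y[test_idx]) ** 2)
        rmsecv.append(np.sqrt(sq_err / len(y)))
    return int(np.argmin(rmsecv)) + 1

# Synthetic spectra: three overlapping Gaussian bands with random concentrations.
rng = np.random.default_rng(1)
wn = np.linspace(4000, 10000, 200)
bands = [(5200, 150), (5800, 200), (7000, 300)]
pure = np.vstack([np.exp(-0.5 * ((wn - c) / s) ** 2) for c, s in bands])
conc = rng.uniform(0.5, 2.0, size=(120, 3))
X = conc @ pure + rng.normal(scale=0.002, size=(120, wn.size))
y = conc[:, 0]                                   # target: concentration of the first band

cal, val = np.arange(84), np.arange(84, 120)     # 70 % calibration / 30 % validation
n_lv = select_n_lv(X[cal], y[cal])
b, b0 = pls1_fit(X[cal], y[cal], n_lv)
y_val_hat = X[val] @ b + b0
print(n_lv, round(r2(y[val], y_val_hat), 3), round(se(y[val], y_val_hat), 4))
```

Returning a single coefficient vector and intercept makes later prediction a plain matrix product, which matches how the models are applied to new spectra in model testing.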

Model Testing
To evaluate the applicability of the obtained models, the spectra of 1365 soybean seeds belonging to 65 soybean varieties not involved in model construction were collected using both instruments (FT-IR and FT-NIR). The predicted values of the targeted components were then calculated by multiplying the obtained regression coefficients by the seed spectral data of the new dataset. After spectral acquisition, the soybean seeds were ground and sent for chemical analysis to determine the content of the targeted components. The coefficient of determination (R²) and the standard error (SE) were then calculated with Equations (4) and (5), respectively, to evaluate model performance. The model-testing steps are summarized in Figure 2 (testing model).
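Applying a stored calibration to new seeds amounts to preprocessing the new spectra exactly as during calibration and then multiplying by the regression coefficients. The Python sketch below assumes SNV as the preprocessing step and uses a hypothetical coefficient vector, intercept, and synthetic spectra and reference values; only the structure (same preprocessing, matrix product plus intercept, then R² as in Equation (4)) reflects the text.

```python
import numpy as np

def snv(X):
    """Standard normal variate, applied row by row exactly as during calibration."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def predict(raw_spectra, b, b0, preprocess=snv):
    """Predicted content = preprocessed spectra @ regression coefficients + intercept."""
    return preprocess(raw_spectra) @ b + b0

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

rng = np.random.default_rng(7)
b = rng.normal(0.0, 0.05, size=300)             # hypothetical stored coefficient vector
b0 = 2.0                                        # hypothetical stored intercept
X_test = rng.normal(1.0, 0.1, size=(65, 300))   # hypothetical spectra, one per test variety

y_hat = predict(X_test, b, b0)
y_ref = y_hat + rng.normal(0.0, 0.1, size=65)   # synthetic "reference" values for the demo
print(round(r2(y_ref, y_hat), 3))
```

Skipping or changing the preprocessing step at this stage would silently break the model, since the coefficients only make sense for spectra transformed the same way.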

Spectral Data Interpretation
The FT-NIR and FT-IR spectra of soybean (Figure 3) show features of functional groups associated with the main soybean components, such as protein, carbohydrates, lipids, and moisture. NIR absorption arises from transitions from the fundamental (ground) vibrational state to higher-order levels (overtones) or from combination bands. For the soybean samples, the main NIR features are as follows. The band from 5200 to 5100 cm⁻¹ corresponds to the first overtone or combination mode of O-H vibration, while the band between 5600 and 5000 cm⁻¹ can be assigned to the vibration of C-H functional groups (first overtone and HC=CH) related to the fatty acid component. Protein, the main soybean component, can be associated with the band between 5000 and 4500 cm⁻¹, where N-H and C=O stretching are detected [26]. The absorption bands from 5720 to 6030 cm⁻¹ lie in the C-H first overtone region, which is correlated with carbohydrates [39]. The FT-IR spectra (Figure 3B), which arise from transitions between the ground state and the first excited vibrational level, can basically be separated into two waveband ranges. The first is the fingerprint region (1200–900 cm⁻¹), where C-O, C-C, and C-O-C stretching can be identified. The N-H stretching vibration that characterizes protein arises around 1650 cm⁻¹, while the asymmetric and symmetric CH2 and CH3 bands closely related to fatty acids were found between 3040 and 2850 cm⁻¹ [40]. Figure 3B also shows a flat line between 2600 and 2100 cm⁻¹, indicating that this band range carries no information; it was therefore excluded from model development.
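Excluding the uninformative 2600–2100 cm⁻¹ range amounts to dropping the corresponding columns of the spectral matrix before modeling. A minimal Python sketch, assuming a 400–4000 cm⁻¹ axis sampled every 4 cm⁻¹ and using placeholder spectra; the helper name `drop_band` is hypothetical.

```python
import numpy as np

def drop_band(wavenumbers, spectra, low=2100.0, high=2600.0):
    """Remove spectral columns whose wavenumber falls inside [low, high] cm^-1."""
    keep = (wavenumbers < low) | (wavenumbers > high)
    return wavenumbers[keep], spectra[:, keep]

# Assumed FT-IR axis: 400-4000 cm^-1 every 4 cm^-1, with three placeholder spectra.
wn_ir = np.arange(400.0, 4000.0 + 4.0, 4.0)
spectra = np.random.default_rng(0).random((3, wn_ir.size))
wn_kept, spectra_kept = drop_band(wn_ir, spectra)
print(wn_ir.size, wn_kept.size, spectra_kept.shape)
```

The wavenumber axis is filtered together with the spectra so that regression coefficients can still be mapped back to band positions.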

Reference Values Analysis
The results of the chemical analysis of isoflavones and oligosaccharides are presented in Table 1.
Isoflavones are the principal flavonoids in soybean and have many health benefits, especially their ability to delay the menopausal period in women. In total, twelve isoflavones belonging to the three main isoflavone types (daidzein, genistein, and glycitein) were evaluated in the examined samples. However, the aglycone forms (daidzein, genistein, and glycitein) and two acetyl glucoside forms (acetyl genistin and acetyl glycitin) were present only at low concentrations (below 50 µg/g) and within a narrow concentration range. Therefore, prediction models in this study were developed only for seven isoflavone types and for total isoflavone content, which was calculated by summing the […] Table 1. Overall, the concentration ranges obtained in this study agreed with those reported by Berhow et al. [41]. Oligosaccharides were the next soybean components observed in this study. Sucrose was the major oligosaccharide in the examined soybean samples, comprising about 50–55% of the total oligosaccharides in every sample. The galactooligosaccharides of the raffinose family (stachyose and raffinose) accounted for around 36–40%, while other types (verbascose, maltose, glucose, and fructose) were found only in trace amounts in some samples. Overall, this result was similar to that reported by Hollung et al. [42] for the carbohydrate content of Brazilian soybean. Therefore, because of the limited data for some individual oligosaccharide types, prediction models were developed only for total oligosaccharides and the three major oligosaccharides observed in the examined soybean samples.

Isoflavones Model
The chemical composition of each bean within a single sample may differ because of different maturity stages and seed sizes. Thus, the spectra of the 21 seeds of each sample were averaged, resulting in one spectrum representing one soybean sample. In addition, the chemical analysis revealed that soybean biochemical composition differs among varieties, which affected the number of spectra available for developing models for the individual isoflavone types. Details of the data are given in Table 1.
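Averaging the 21 seed spectra of each sample is a reshape followed by a mean over the seed axis. The Python sketch below assumes the seed spectra are stored sample by sample (rows 0–20 for the first sample, rows 21–41 for the second, and so on) and uses random placeholder spectra with only 100 points to keep the example light.

```python
import numpy as np

n_samples, n_seeds, n_points = 310, 21, 100     # n_points kept small for the demo
rng = np.random.default_rng(0)
# Assumed layout: rows 0-20 belong to sample 0, rows 21-41 to sample 1, and so on.
seed_spectra = rng.normal(size=(n_samples * n_seeds, n_points))

# Average the 21 seed spectra of each sample into one representative spectrum per sample.
sample_spectra = seed_spectra.reshape(n_samples, n_seeds, n_points).mean(axis=1)
print(seed_spectra.shape, sample_spectra.shape)
```

If the spectra were stored in a different order, a per-sample grouping index would be needed instead of a plain reshape.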
Our results, presented in Table 2, show that it is possible to predict total isoflavones as well as individual isoflavones in soybeans. For total isoflavones, the coefficient of determination of the calibration model (R²c) exceeded 0.8, with an error of about 0.32 mg/g, equivalent to 14% of the total isoflavone content of the examined samples. Although the R²c values for individual isoflavones were acceptable (over 0.7), the errors were relatively high (over 15%), which lowers confidence in these models. The models developed from FT-IR seed spectra performed worse than the FT-NIR models, as indicated by lower coefficients of determination (R²c, R²p) and higher errors (SEC, SEP). Meanwhile, the FT-IR powder models performed comparably to the FT-NIR models. The FT-IR sample holder measures only a tiny area of the sample, which can produce inconsistent spectra, particularly for inhomogeneous samples. Although seven data preprocessing methods were used in this study, only the preprocessing methods that attained the highest performance are presented in Table 2 for brevity. Previous studies on the application of NIR and MIR to predict isoflavones in soybean were limited by the number of samples and the concentration range of isoflavones in the examined samples. For example, multiple linear regression was applied to NIR spectra to predict isoflavone content in whole and powdered soybeans using only 48 samples [43]. Using cross-validation, that study obtained acceptable models for powder samples, with R² above 0.74 for individual isoflavones and 0.94 for total isoflavones. It also reported an excellent model based on the spectra of whole soybean seeds (a group of seeds) for predicting total isoflavones (R²: 0.90).
However, that study could not develop models for the individual isoflavone forms from whole-bean spectra. Wang et al. [44] developed a model to predict isoflavone content in kudzu from the NIR spectra of 88 samples and reported a consistent model. Berhow et al. [41] reported an excellent calibration model for total isoflavones, with R² above 0.90, using more than 700 spectra of ground soybean samples. None of these reports presented an acceptable model for predicting the individual isoflavone forms in an intact seed using NIR or MIR. In contrast, our study obtained acceptable models for predicting total isoflavones as well as individual isoflavone types from the NIR and MIR spectra of single seeds, using a large dataset.
When PLS-R is applied to NIR and MIR spectra, an essential step in developing a prediction model is identifying the specific wavebands that contribute significantly to model performance. In this study, several bands in the NIR region were identified as influential for isoflavone prediction (Figure 4A), consistent with previous research. Zhang et al. [27] reported that the bands around 5600 cm⁻¹ (1780 nm) and 6600 cm⁻¹ (1515 nm) contribute significantly to flavonoid and isoflavonoid prediction. These signals correspond to the first overtone of C-H stretching vibration. Another influential band is around 8860 cm⁻¹ (1120 nm), which can be associated with the second overtone of C-H vibration in aromatic ring structures [45].
Figure 4B shows the specific FT-IR bands identified for isoflavone detection. All of the marked bands correspond to the characteristic structure of isoflavones, which contain two aromatic rings. The bands around 1185 cm⁻¹ (8430 nm) and 2000 cm⁻¹ (5000 nm) were assigned to C-H bending vibration and to the combination band of the aromatic structure, respectively. The last significant band, around 1600 cm⁻¹ (6250 nm), corresponds to the characteristic aromatic ring bonding C=C-C [46].
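The paper marks influential bands on the regression-coefficient plots (Figure 4); one simple way to pick such bands programmatically is to locate the largest peaks of the absolute coefficient vector. The Python sketch below is an assumption about how this could be automated, not the authors' procedure: it uses SciPy's `find_peaks` with a minimum peak spacing of 200 cm⁻¹ (an arbitrary choice) on a hypothetical coefficient vector with bumps placed at the band positions discussed above plus one weaker band.

```python
import numpy as np
from scipy.signal import find_peaks

def influential_bands(wavenumbers, coef, n_bands=3, min_separation_cm1=200.0):
    """Wavenumbers of the n_bands largest |regression coefficient| peaks, sorted ascending."""
    step = abs(wavenumbers[1] - wavenumbers[0])
    distance = max(1, int(min_separation_cm1 / step))          # minimum peak spacing in points
    peaks, props = find_peaks(np.abs(coef), height=0.0, distance=distance)
    strongest = peaks[np.argsort(props["peak_heights"])[::-1][:n_bands]]
    return np.sort(wavenumbers[strongest])

# Hypothetical coefficient vector on a 4 cm^-1 NIR grid.
wn = np.arange(4000.0, 10000.0 + 4.0, 4.0)

def bump(center, amplitude, width=40.0):
    return amplitude * np.exp(-0.5 * ((wn - center) / width) ** 2)

coef = bump(5600, 1.0) + bump(6600, -0.8) + bump(8860, 0.6) + bump(7570, 0.2)
print(influential_bands(wn, coef))
```

Taking the absolute value treats strongly negative coefficients as just as influential as strongly positive ones.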

Oligosaccharides Model
The oligosaccharide prediction models based on FT-NIR and FT-IR spectral data are presented in Table 3. The results revealed that FT-NIR was a promising method for predicting total oligosaccharides and the three main short-chain carbohydrates in soybeans using two different sample types: seeds and powder. Furthermore, the statistical parameters of the FT-NIR models showed good agreement between the predicted and reference values for the powder samples, with R²c values above 0.75 and low errors (below 1%) for all evaluated components. The seed-based models also gave acceptable results, with low standard errors for both calibration and prediction (SEC and SEP below 1%), indicating small differences between predicted and reference values. In addition, the FT-NIR results were slightly better than those of FT-IR, with higher R²p values and lower standard errors (SEP) for both sample types.
Table 3. The PLSR statistical model for predicting total oligosaccharides and short-chain carbohydrates in soybean using FT-NIR and FT-IR techniques.
Carbohydrate is a macro component and one of the essential nutrients for humans. Thus, many previous studies have evaluated the robustness of NIR and MIR for predicting this component in agricultural and food products. Chen et al. [47] reported the effectiveness of NIR for predicting carbohydrates in foxtail millet: a model built with full cross-validation on 82 samples performed excellently (R² above 0.9, RMSE below 0.8%). Ferreira et al. [39] used NIR spectra of 82 varieties of Brazilian soybeans to predict dietary fiber concentration and obtained a high-performance model (R² of 0.80, RMSEP of 0.86%). In addition, the sucrose content of soybean has been successfully predicted by Choung [48].
However, most of this research was constrained by a limited number of samples, and the spectra were acquired from powder samples, which are more homogeneous. The current study developed models to predict total oligosaccharides as well as the three main short-chain carbohydrates in soybean from seed spectral data. In addition, more reference values were used to produce a more precise prediction model. Although the obtained models showed lower performance, they demonstrate the possibility of predicting oligosaccharides in soybean from intact seed samples.

The bands most highly correlated with the evaluated components can be identified from the regression coefficients. Figure 5A presents the NIR regression (beta) coefficients for predicting soybean oligosaccharides, revealing three important peaks. The peak around 4380 cm⁻¹ (2280 nm) refers to the combination of C-H stretching and CH2 deformation belonging to starch, while the peak around 5450 cm⁻¹ (1820 nm) was assigned to the second overtone of O-H or C-H stretching, possibly from the cellulose structure. The last marked band, around 7570 cm⁻¹ (1300 nm), was closely related to the C-H second overtone/combination vibration of the CH2 structure [45]. Masithoh et al. [24] reported that the bands at 4264 and 4380 cm⁻¹ were influential for glucose. Ferreira et al. [39] pointed out that the band at 5400 cm⁻¹ contributed to the NIR-based model for soybean dietary fiber. Oligosaccharides are carbohydrates, so their fundamental chemical structure is similar to that of monosaccharides and polysaccharides. Hence, the influential bands identified in this study coincide with those of previous studies that evaluated carbohydrate components.
Foods 2022, 11, 232

The regression coefficients of the FT-IR oligosaccharide model are shown in Figure 5B. The most significant bands for determining oligosaccharides were identified at 985, 1430, 1700, and 3280 cm⁻¹. The bands between 900 and 1100 cm⁻¹ can be associated with the stretching and bending vibrations of C-C and C-O bonds. The bands around 1700 cm⁻¹ represent the crystal water in raffinose pentahydrate, while the band at 3260 cm⁻¹ can be assigned to O-H stretching vibration [49].

Testing Model
Model testing is a crucial step for confirming model performance and evaluating the applicability of the obtained models. This study developed prediction models for isoflavones and oligosaccharides, both for total content and for individual types. Because vibrational spectroscopy relies on the vibrations of the chemical bonds in a compound's molecular structure, biological compounds with similar chemical structures show similar vibrational frequencies in the IR spectral range [50]. Hence, the influential bands obtained from the model analysis could not be clearly attributed to a specific individual type of component. Likewise, a prediction may derive from specific individual types or from a mixture (total) of the examined components. Schoonjans et al. [51] reported that compounds with similar chemical structures were grouped into closely related classes by hierarchical UPGMA clustering of IR spectral data.
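UPGMA is hierarchical clustering with average linkage, which SciPy exposes as `linkage(..., method="average")`. The Python sketch below illustrates the point made above with four hypothetical spectra: two pairs that share band positions (standing in for a shared chemical backbone) end up in the same clusters. The band positions and the correlation distance are illustrative assumptions, not the data or settings of Schoonjans et al.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

wn = np.arange(900.0, 1800.0, 2.0)

def band(center, width=15.0):
    return np.exp(-0.5 * ((wn - center) / width) ** 2)

# Hypothetical spectra: A1/A2 share one set of bands, B1/B2 share another.
spectra = np.vstack([
    band(1000) + band(1600),          # A1
    band(1000) + 0.9 * band(1610),    # A2: same backbone, slightly shifted and weaker band
    band(1200) + band(1450),          # B1
    band(1205) + band(1450),          # B2
])

# method="average" is UPGMA; correlation distance is an illustrative choice.
Z = linkage(spectra, method="average", metric="correlation")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Spectra that differ only by small shifts or intensity changes of shared bands are close under this distance, which is why structurally related compounds are hard to separate spectroscopically.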
In our study, the individual types of each examined component share the same chemical backbone, so the influential bands could not be clearly confirmed to originate from a specific type. From this point of view, testing the models for total components is more reasonable than testing them for individual types. Furthermore, in commercial applications, isoflavones and oligosaccharides are evaluated only as total content rather than as individual types. In addition, this research emphasized predicting soybean components from intact seed samples. Hence, the testing procedure was performed only on the seed-based models for the total content of the evaluated components.
The chemical analysis showed that four soybean varieties contained less than 0.50 mg/g isoflavones and one contained over 5.0 mg/g. These data were excluded to keep the test concentrations within the range of the model. The contents of the targeted components and the number of samples used for testing are presented in Table 4. The FT-NIR models performed promisingly in testing, with R² and error values similar to those of the calibration models (Table 5), whereas the FT-IR models gave lower R² values than their calibration models. The distribution of the predicted values and their correlation with the reference values for the FT-NIR technique are shown in Figure 6. Overall, FT-NIR gave better results than FT-IR for predicting isoflavones and oligosaccharides in soybean seeds. NIR radiation penetrates deeper into the sample than IR, which is beneficial for evaluating agricultural products, most of which are inhomogeneous. According to Williams and Norris [52], a model with R²p values between 0.66 and 0.82 can be used for sample screening.
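The exclusion step keeps only test varieties whose reference values fall inside the concentration range the model was calibrated on. The Python sketch below uses the 0.50 and 5.0 mg/g cut-offs mentioned above as the range bounds (an assumption; in practice one would take the minimum and maximum of the calibration reference values) and hypothetical reference values for five varieties.

```python
import numpy as np

def within_calibration_range(y_ref, low, high):
    """True for test samples whose reference value lies inside [low, high]."""
    y_ref = np.asarray(y_ref, dtype=float)
    return (y_ref >= low) & (y_ref <= high)

# Hypothetical total-isoflavone reference values (mg/g) for five test varieties.
y_test = np.array([0.3, 1.2, 2.8, 5.6, 4.1])
keep = within_calibration_range(y_test, 0.50, 5.0)
print(keep, y_test[keep])
```

Restricting the test set this way avoids judging the model on extrapolation, which PLS-R models are generally not built to handle.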

Conclusions
Using intact seed samples, FT-NIR and FT-IR spectroscopic techniques were investigated for determining the concentration of isoflavones and oligosaccharides in soybeans. The possibility of predicting the individual types of the targeted components was also investigated to explore the potential of these techniques for measuring microcomponents of agricultural products. In total, 6510 seeds were selected randomly from 310 soybean samples, and their spectra were collected with both instruments to develop calibration and validation models. The models were tested using 1365 seeds belonging to 65 soybean varieties that were not involved in model construction. The results showed that FT-NIR spectroscopy combined with PLSR is a promising method for predicting total isoflavones and oligosaccharides in intact soybean seeds, with prediction performance (R²p) of 0.80 and 0.72, respectively. The testing results also demonstrated good performance, close to that of the calibration models. FT-IR spectroscopy also showed acceptable results, although its performance (R²p: 0.73 and 0.70) was lower than that of FT-NIR. The results of this fundamental study provide a basis for developing a seed-sorting machine based on chemical components.