Long-Term Evaluation and Calibration of Low-Cost Particulate Matter (PM) Sensor

Low-cost light-scattering particulate matter (PM) sensors have been widely researched and deployed to overcome the low spatio-temporal resolution of government-operated beta attenuation monitors (BAMs). However, the accuracy of low-cost sensors has been questioned, which has impeded their wide adoption in practice. To evaluate the accuracy of low-cost PM sensors in the field, a multi-sensor platform was developed and co-located with a BAM in Dongjak-gu, Seoul, Korea from 15 January 2019 to 4 September 2019. In this paper, the sample-to-sample variation of low-cost sensors is analyzed using three commercial low-cost PM sensors. The influence of environmental conditions, such as humidity, temperature, and ambient light, on the PM sensors is also described. Based on this information, we developed a novel combined calibration algorithm, which selectively applies multiple calibration models and statistically reduces residuals, using a prebuilt parameter lookup table in which each cell records the statistical parameters of each calibration model at the current input parameters. Because the proposed framework significantly improves the accuracy of the low-cost PM sensors (e.g., RMSE: 23.94 → 4.70 µg/m³) and increases the correlation (e.g., R²: 0.41 → 0.89), this calibration model can be transferred to all sensor nodes through the sensor network.


Introduction
Particulate matter (PM) is classified by size bins of maximum aerodynamic diameter (e.g., PM10 < 10 µm, PM2.5 < 2.5 µm, and PM1.0 < 1 µm). Exposure to PM is regarded as a major health risk, and it causes various diseases ranging from respiratory and cardiovascular diseases to neurodevelopmental and mental disorders [1]. According to recent reviews, it accounts for up to 4.2 million deaths per year worldwide [2,3]. Because of this effect on public health, the collection and analysis of PM concentration data has become a major interest of governmental and non-governmental organizations. Meanwhile, PM concentration fluctuates spatially and temporally because of the aerodynamic nature of the particles; hence, higher spatiotemporal resolution of PM concentration data is becoming increasingly important. However, maintaining such a high resolution with government-grade air monitoring stations is nearly impossible because of their cost. Additionally, their sampling interval is rather long, which is the price paid for their data quality. For these reasons, low-cost light-scattering PM sensors have been widely used as a practical alternative to air monitoring stations in dense sensor deployments [4]. Even though data quality remains a major challenge for these sensors, they have the clear advantages of lower price, more compact size, and faster update rate [5,6]. As a result, many countries have densely deployed low-cost sensors in smart cities [7][8][9]. As of April 2020, there are 40 government-operated beta attenuation monitor (BAM) stations in Seoul releasing information to the public every hour [10]. Additionally, approximately 3500 light-scattering PM devices have been deployed in major Korean cities by leading telecommunication companies [11,12], continuously increasing the spatiotemporal resolution, as shown in Figure 1.
As the importance of low-cost sensors has increased, more research is being conducted to evaluate and calibrate low-cost light-scattering sensors. Low-cost sensors have been evaluated under various climate and weather conditions around the world, over periods ranging from a day to longer than a year [13][14][15]. These studies have several aims, such as environmental effect analysis [16], validation of newly developed sensors [17], and calibration performance evaluation. We built four kinds of rough prototypes to briefly check sample-to-sample variability (PMSA003, PMS7003 (Plantower Inc., Beijing, China [18]), SEN0177 (DFRobot Inc., Shanghai, China [19]), and HPMA115s0 (Honeywell Sensing Inc., Charlotte, NC, USA [20])). Subsequently, we chose the PMS7003 and developed a multi-sensor platform for further long-term evaluation. In Section 3.2, we describe the performance limitations of the raw signal of the low-cost sensors, identified by co-locating them with a governmental BAM for about 7.5 months. In addition, we compared the performance of the raw signal and the calibrated signals under various environmental explanatory variables, sampling intervals, and calibration methods.
Previous research on low-cost PM sensors [13][14][15][16][17] shows that the low-cost sensor has limited accuracy and requires a calibration procedure to improve it. According to a technical report from the Joint Research Center of the European Commission [21], the most common approach to PM2.5 calibration is linear calibration, which accounts for about two-thirds of all calibration cases (univariate linear regression (ULR): 46%; multivariate linear regression (MLR): 22%). Linear regression (LR) is widely used for PM calibration because it is a simple and powerful method. However, LR can under-fit when the true function of the data is not well approximated by a linear function. For example, MLR suffers severe performance degradation in a high-humidity environment [22]. Non-linear calibration, on the other hand, is largely free from this problem, but it must avoid over-fitting by selecting an appropriate order of function approximation.
Beyond the cases of a single calibration model, sequentially combined calibration models were studied. Lin et al. 2018 introduced a two-phase calibration model while using Akaike information criterion (AIC) and random forests (RF). As a first phase, several linear models are created by selecting subsets from the entire input variable space based on the AIC index. After that, RF is used to learn the residual of the linear models [23]. However, RF uses the aggregation of randomized models with several decision trees and their results are averaged in the regression problem; it is usually good at avoiding over-fitting problems, but it might present lower accuracy due to the averaged result from several decision trees. Cordero et al. 2018 obtained the calibrated PM value through the linear model to generate the difference of the raw PM value. Subsequently, a non-linear calibration among RF, support vector machine (SVM), and artificial neural networks (ANN) is performed using the difference and the input variables [24]. However, their dataset was small and the training dataset and test dataset were shared with the k-fold cross-validation method.
This paper introduces a novel combined calibration method that selects the most accurate of several models for each sample. This combined calibration differs from the cited methods in that it divides the entire input variable space into segmented cells and applies the best of multiple models in each cell. In addition, we propose further procedures that reduce the residuals probabilistically by managing the sum of the residuals generated by the selected model in each cell. This combined calibration is named segmented model and residual treatment calibration (SMART calibration). The performance of the SMART calibration method was analyzed with raw data and compared not only with other state-of-the-art calibration methods, but also with another study group's calibrated results based on 16-month datasets [25]. The comparison results show that our proposed method offers better accuracy than its counterparts.
Our contributions can be summarized as follows:
• A field evaluation of a low-cost PM2.5 sensor in Seoul, Korea was carried out and analyzed under several conditions, such as environmental explanatory variables (humidity/temperature/ambient light), sampling intervals (5 min/1 h/24 h), and calibration methods (linear/non-linear/SMART calibration).
• A novel combined calibration method was introduced to increase low-cost sensor accuracy. Its performance was compared to other calibration methods. This calibration method can also be applied to an upcoming future dataset with the previously generated models.
The remaining sections are structured as follows. Section 2 describes the overall method of this research, including data collection, data preprocessing, and data calibration. Section 3 presents the experimental results and discusses their analysis. Section 4 summarizes this paper and explains the potential use cases.

Methods
This section describes the overall procedures for the evaluation and calibration of low-cost sensors. It covers data collection (Section 2.1), data preprocessing (Section 2.2), data calibration methods (Section 2.3), and metric information (Section 2.4). Figure 2 shows the overall procedures for sensor evaluation and calibration. A multi-sensor platform was developed and co-located with the governmental BAM at the government station (Dongjak-gu, Seoul, Korea) to evaluate a low-cost light-scattering PM2.5 sensor. The data were collected for around 7.5 months (15 January 2019 to 4 September 2019). The following subsections explain these procedures in more detail.

Data Collection
In this section, the sensor configuration and deployment information on the low-cost sensor and reference system is described.

Multi-Sensor Platform-Low-Cost Light Scattering PM Sensor
We developed prototypes and roughly evaluated signal repeatability and sample-to-sample variability to select a suitable low-cost sensor from four kinds of commercial low-cost sensors. Based on this analysis, the PMS7003 (Plantower Inc., Beijing, China [18]) was chosen, and we proceeded to configure and design the multi-sensor system for long-term evaluation and calibration. The prototype evaluation is described in detail in Appendix C. The selected PM sensor and other environmental sensors were built together as a multi-sensor platform, as shown in Figure 3a. Three low-cost PM sensors are mounted on a single multi-sensor platform to identify the variation among the three sensor samples. The platform also includes humidity, temperature, and ambient light sensors to analyze and calibrate the environmental impact on the PM measurement. Low-level data collection from each sensor module is performed by an Arduino Due, and high-level communication with the sensor network is implemented on a Raspberry Pi 3B+, as shown in Figure 3b. Data are measured and stored at 1-s sampling intervals and can be transmitted to users via wired LAN or Wi-Fi.

Governmental BAM-High-End PM Monitoring Station
In Korea, the BAM is the only regulatory reference that has received formal approval from the Korean Ministry of Environment. As the reference for the experiment, the PM711 model (Kimoto Inc., Osaka, Japan [26]) was selected because it has a relatively fast sampling interval (5 min) compared with the sampling interval (1 h) of other BAMs, as shown in Figure 4b. The 5-min sampled output may be less accurate than the 1-h averaged output, since the 5-min data are the source from which the 1-h average is computed. This equipment consists of two separate racks of monitoring systems for PM2.5 and PM10 measurements. It achieves high accuracy because it includes sampling stabilizers, such as particle separators (PM2.5 and PM10 impactors) and controllers of temperature, humidity, and air flow, to supply PM stably.

Data Preprocessing
In this step, the different sampling intervals of the two instruments were matched so that the data from the multi-sensor platform could be directly compared with the data from the governmental BAM. Data were excluded during preprocessing if any sensor module produced intermittent data. The data from the multi-sensor platform were averaged with a fixed 5-min window. The preconditioned data were used to build the linear and non-linear calibration models, such as MLR, MLP, and SMART calibration, and to perform the actual calibration with the prebuilt model in the next step. To build and evaluate the calibration model, the dataset was constructed in two ways for comparison, as shown in Figure 5. One is sampling in a sequential manner (hereinafter sequential) and the other is sampling in a random manner (hereinafter shuffled), under various separation ratios (unless otherwise stated, 80% of the total dataset was randomly selected to construct the training dataset and the remaining 20% was used as the test dataset). Data preprocessing was done in Matlab R2018b [27] and Python 3. Pandas [28], a state-of-the-art Python data manipulation library, was also used for data preprocessing.
Figure 5. Default data separation methods for the training dataset and test dataset. 20% of the training dataset is used as the validation dataset to prevent over-fitting of the calibration model. The shuffled method is controlled by a fixed random seed to compare the performance between calibration algorithms.
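The preprocessing steps described in this section can be sketched roughly as follows. This is a minimal illustration in NumPy, not the authors' Matlab/pandas pipeline; the function names `average_fixed_window` and `split_dataset`, the seed value, and the synthetic readings are all assumptions made for the example.

```python
import numpy as np

def average_fixed_window(values, window):
    """Average consecutive, non-overlapping windows; drop an incomplete tail."""
    values = np.asarray(values, dtype=float)
    n_windows = len(values) // window
    return values[: n_windows * window].reshape(n_windows, window).mean(axis=1)

def split_dataset(n_samples, train_ratio=0.8, shuffled=False, seed=0):
    """Return (train_idx, test_idx) for the sequential or the shuffled method."""
    indices = np.arange(n_samples)
    if shuffled:
        # A fixed seed keeps the split identical across calibration algorithms.
        indices = np.random.RandomState(seed).permutation(n_samples)
    n_train = int(round(n_samples * train_ratio))
    return indices[:n_train], indices[n_train:]

# Hypothetical 1-s readings: one hour of data gives twelve 5-min averages.
one_second_pm = np.arange(3600, dtype=float)
five_min_pm = average_fixed_window(one_second_pm, window=300)

seq_train, seq_test = split_dataset(len(five_min_pm), shuffled=False)
shf_train, shf_test = split_dataset(len(five_min_pm), shuffled=True, seed=42)
```

With the sequential method the test indices are always the most recent samples, whereas the shuffled method draws them from the whole period; the same split function could be applied again to the training indices to carve out the 20% validation subset.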

Data Calibration
In this paper, calibration does not mean a correction of the observed data in the training dataset; it means an estimation for data not seen during training. PM2.5 (low-cost sensor), humidity, temperature, and ambient light were selected as explanatory variables, and PM2.5 (BAM) was selected as the response variable. The influence of each explanatory variable is analyzed separately in Section 3.3. The calibration methods were analyzed in three ways: linear, non-linear, and SMART calibration. Data calibration was performed with Python 3 libraries (pandas [28], keras [29], sklearn [30], and tensorflow [31]).

Linear Calibration
Based on multivariate linear regression (MLR), we selected PM (low-cost sensor), humidity, and temperature from the multi-sensor platform as explanatory variables and chose PM (governmental BAM) as the response variable. The least-squares method was applied, yielding the coefficients shown in Table 1 (the p-values of all coefficients were less than 0.00001 and are omitted hereinafter).
$$y = w_0 + \sum_{i} w_i x_i \qquad (1)$$
$y$: PM calibrated, $w_0$: intercept, $w_i$: coefficient, $x_i$: input variable measured
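A least-squares fit of this linear model might look like the sketch below. It uses `numpy.linalg.lstsq` on synthetic, noise-free data; the helper names `fit_mlr` and `predict_mlr` and the coefficient values are illustrative assumptions and are not the values of Table 1.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = w0 + sum_i w_i * x_i; returns (w0, w)."""
    design = np.column_stack([np.ones(len(X)), X])  # leading column of ones = intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict_mlr(w0, w, X):
    """Apply the fitted linear calibration to new inputs."""
    return w0 + np.asarray(X, dtype=float) @ w

# Synthetic illustration only (hypothetical ranges and coefficients).
rng = np.random.RandomState(0)
X = np.column_stack([
    rng.uniform(0, 100, 200),   # raw PM2.5 from the low-cost sensor
    rng.uniform(20, 90, 200),   # relative humidity (%)
    rng.uniform(-5, 35, 200),   # temperature (deg C)
])
true_w0, true_w = 3.0, np.array([0.5, -0.1, 0.2])
y = true_w0 + X @ true_w        # noise-free "reference" values

w0, w = fit_mlr(X, y)
calibrated = predict_mlr(w0, w, X)
```

Because the synthetic response is exactly linear, the fit recovers the generating coefficients; with real co-location data the residuals would not vanish.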

Nonlinear Calibration
Non-linear calibration was performed with a multilayer perceptron (MLP), a neural network that consists of an input layer, an output layer, and hidden layers. The calibration is performed by forming appropriately weighted sums between the neurons in adjacent layers, as shown in Figure 6. Each weighted sum passes through a non-linear activation function, the rectified linear unit (ReLU), to generate a non-linear model. The ReLU activation is given in Equation 2. PM2.5 (low-cost sensor), humidity, and temperature from the multi-sensor platform were preprocessed and used as input variables in the input layer. PM2.5 (BAM) from the governmental station was used as the output variable in the output layer. Hyperparameters were chosen manually over several trials, as shown in Table 2.
y: PM calibrated, W_i: weight matrix, x: input variable measured
Figure 6. The architecture of a fully connected neural network. The input layer (red) takes the explanatory variables and the output layer (green) corresponds to the response variable. The weight matrices (parameters) are built according to the hyperparameters.
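To make the role of the ReLU activation concrete, the following sketch runs a forward pass through a tiny fully connected network in NumPy. The weights are hand-set rather than trained, the bias terms are an assumption (the text mentions only weight matrices), and the layer sizes do not correspond to the hyperparameters of Table 2.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """Forward pass: ReLU on every hidden layer, linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

# Hypothetical network: 3 inputs (PM, humidity, temperature) -> 2 hidden -> 1 output.
W1 = np.array([[1.0, -1.0],
               [0.5,  0.0],
               [0.0,  2.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0],
               [1.0]])
b2 = np.array([0.5])

x = np.array([[1.0, 2.0, 0.0]])
y = mlp_forward(x, [W1, W2], [b1, b2])
# Hidden pre-activations are [2.0, -2.0]; ReLU zeroes the negative one,
# so the output is 2.0 * 2.0 + 0.0 * 1.0 + 0.5 = 4.5.
```

In practice the weights would be learned with a framework such as keras, as used in the paper; this sketch only shows how the activation introduces the non-linearity.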

SMART Calibration (Combined Calibration)
In this section, we introduce the SMART calibration algorithm, which, given multiple linear and non-linear calibration models, selectively maps the most probabilistically appropriate model to each input. LR is the most representative method for finding a best-fit line for approximation and estimation. However, LR is usually too simple to correctly fit the true function of complex data, and the best-fit line is strongly affected by non-linearity, outliers, and the data range. Meanwhile, non-linear calibration can produce a model whose prediction error on the training dataset decreases as the model complexity increases. In that case, however, the prediction error on the test dataset becomes large if the model is over-fitted. This is a well-known disadvantage of non-linear calibration (the limitations of linear and non-linear calibration are further described in Appendix A).
Because of these properties of linear and non-linear calibration models, each model has a "weak spot" in its domain. For instance, LR has its weak spot in the non-linear region of the domain, and MLP has its weak spot in the over-fitted region. The SMART calibration method was developed to overcome this limitation. Figure 7 shows the overall procedures of model building and model selection. First, two trained models and their residual maps are generated from the training dataset in the model building step. Second, a prevailing model map is constructed by comparing the residual maps. The prevailing model map is then used in the model selection step. In more detail, a residual map divides the full range of the explanatory variable space (e.g., temperature and humidity) into small segmented cells, as shown in Figure 8. Every residual of the training data is allocated to the corresponding cell of the residual map. The distribution of the residuals in each cell is assumed to be Gaussian, since a residual is the error of the estimator. Each cell therefore has its own probability density function (PDF), which is expressed by its mean and standard deviation, and this information is stored in the residual map. For each cell, a prevailing calibration model is defined by comparing the residual maps of the linear and non-linear models, and the prevailing calibration model of every cell is stored in the prevailing model map. Once the prevailing model map has been completed over the whole training dataset, each test sample is calibrated with the predefined suitable model and the averaged residual of the cell into which its input falls, as shown in Figure 9 (the procedures of SMART calibration are further described in Appendix D). Figures 7-9 are illustrative examples; the number and type of calibration models are not limited to MLR and MLP.
SMART calibration has the advantages of simple procedures and compatibility with several models, since it is a hierarchical calibration model. Because it depends on the consistency of the estimators, the accuracy of SMART calibration increases as the number of data points in each cell increases. Additionally, it performs well with high-bias models, but it cannot outperform its component models when all of them are high-variance models, since SMART calibration selects a model according to the variance of the data in each segmented cell.
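The model building and model selection steps described in this section can be sketched as below. This is one possible reading of the procedure, not the authors' implementation (which is detailed in Appendix D): the prevailing model of a cell is assumed to be the one with the smallest residual standard deviation, the correction is assumed to be that model's mean residual, and empty cells are assumed to fall back to the first model with no correction. All function names, bin edges, and data values are hypothetical.

```python
import numpy as np

def cell_index(x, edges):
    """Map values to bin indices in [0, len(edges) - 2], clipping outliers."""
    return np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)

def build_residual_map(residuals, rows, cols, shape):
    """Per-cell mean and standard deviation of the training residuals."""
    mean = np.zeros(shape)
    std = np.full(shape, np.inf)          # inf marks cells without training data
    for r in range(shape[0]):
        for c in range(shape[1]):
            in_cell = residuals[(rows == r) & (cols == c)]
            if in_cell.size:
                mean[r, c], std[r, c] = in_cell.mean(), in_cell.std()
    return mean, std

def build_prevailing_map(y_ref, predictions, rows, cols, shape):
    """Pick, per cell, the model with the smallest residual spread (assumed rule)."""
    maps = [build_residual_map(y_ref - p, rows, cols, shape) for p in predictions]
    means = np.stack([m for m, _ in maps])
    stds = np.stack([s for _, s in maps])
    prevailing = np.argmin(stds, axis=0)  # all-inf (empty) cells fall back to model 0
    correction = np.take_along_axis(means, prevailing[None], axis=0)[0]
    return prevailing, correction

def smart_calibrate(predictions, rows, cols, prevailing, correction):
    """Use the prevailing model of each sample's cell, then add its mean residual."""
    preds = np.stack(predictions)                       # (n_models, n_samples)
    chosen = prevailing[rows, cols]
    picked = preds[chosen, np.arange(preds.shape[1])]
    return picked + correction[rows, cols]

# Hypothetical grid: two humidity cells, one temperature cell.
hum_edges, temp_edges = np.array([0.0, 50.0, 100.0]), np.array([0.0, 40.0])
shape = (len(hum_edges) - 1, len(temp_edges) - 1)

y_bam = np.array([10.0, 20.0, 30.0, 40.0])              # reference values
pred_a = np.array([9.0, 19.0, 25.0, 45.0])              # e.g., a linear model
pred_b = np.array([12.0, 16.0, 28.0, 38.0])             # e.g., a non-linear model
rows = cell_index(np.array([20.0, 30.0, 70.0, 80.0]), hum_edges)
cols = cell_index(np.array([10.0, 10.0, 10.0, 10.0]), temp_edges)
prevailing, correction = build_prevailing_map(y_bam, [pred_a, pred_b], rows, cols, shape)

test_rows = cell_index(np.array([40.0, 60.0]), hum_edges)
test_cols = cell_index(np.array([10.0, 10.0]), temp_edges)
calibrated = smart_calibrate(
    [np.array([14.0, 33.0]), np.array([15.0, 30.0])],
    test_rows, test_cols, prevailing, correction,
)
```

In this toy example model A prevails in the low-humidity cell and model B in the high-humidity cell, so the two test samples are calibrated by different models and shifted by the mean residual stored for their cell.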

Metric Information
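Four key metrics were used to analyze the performance, as shown in Table 3: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R² (coefficient of determination). RMSE is omitted hereinafter because it can be calculated from MSE. In some analyses, the slope, intercept, mean and standard deviation, quartiles, and Pearson's correlation coefficient are also used.
Table 3. Metrics for performance analysis.
Metric | Definition
MAE | $\frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$
MSE | $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
R² | $1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
$y$: PM reference, $\hat{y}$: PM calibrated, $\bar{y}$: PM mean of reference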
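The four metrics can be computed as in the following sketch, which assumes the standard definitions of MAE, MSE, RMSE, and the coefficient of determination with the reference mean in the denominator. The function name `calibration_metrics` and the sample values are made up for illustration.

```python
import numpy as np

def calibration_metrics(y_ref, y_cal):
    """Return MAE, MSE, RMSE, and R2 of calibrated values against the reference."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    err = y_ref - y_cal
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)    # total sum of squares
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),                       # derivable from MSE
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical reference (BAM) and calibrated values in µg/m³.
metrics = calibration_metrics([10, 20, 30, 40], [12, 18, 33, 39])
# Errors are [-2, 2, -3, 1]: MAE = 2.0, MSE = 4.5, R2 = 1 - 18/500 = 0.964.
```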

Results and Discussions
This section first describes a preliminary analysis (Section 3.1) that varies the explanatory variables and sampling interval conditions. Subsequently, we compare the performance of SMART calibration under several conditions: before calibration (Section 3.2), after calibration (Section 3.3), against other calibration methods (Section 3.4), and against a previous similar study (Section 3.5).

Performance Characteristics: Explanatory Variables
The low-cost sensor is cost-effective and lightweight and provides rapid, continuous measurements, but its accuracy is limited. Such a sensor generally omits any sampling stabilizer for PM size, humidity, temperature, or flow control. As a result, the low-cost sensor is directly affected by the surrounding environment. In particular, the influence of humidity and temperature has been continuously studied by several research groups, and calibration models based on meteorological parameters have been introduced, as in Equations 3 and 4 [22,32].
y: PM calibrated, α_i: coefficient, y: PM measured, ρ: RH measured, t: temp. measured
In this section, a short-term analysis of the effects of humidity, temperature, and ambient light on PM concentration was performed, and a long-term analysis of the effects of humidity and temperature was carried out while applying linear and non-linear calibration. As a result, we found that humidity and temperature are important variables for PM concentration calibration.

Performance Characteristics: Explanatory Variables, Short-Term Analysis (45 Days)
The experimental data from 18 July 2019 to 4 September 2019 were analyzed, since the ambient light sensor data were stored only during this limited period. This period was summer in Korea, and the Korean summer climate is characterized by high temperature and high humidity. As described by Equation 3 in previous research, high humidity makes the calibration function highly non-linear. In our results, the non-linear calibration had a smaller error than the linear calibration, as shown in Table 4. Compared with the uncalibrated raw PM signal, the calibrated PM signal showed a significant improvement (e.g., MAE of MLP: 9.78 → 3.55 µg/m³), and the calibration that combined the raw PM signal with the humidity signal showed a further remarkable improvement (e.g., MAE of MLP: 3.55 → 2.99 µg/m³). For calibrations that included temperature and ambient light, the improvement was insignificant. The long-term analysis of the influence of PM, humidity, and temperature is presented in the next section.

Performance Characteristics: Explanatory Variables, Long-Term Analysis (7.5 Months)
The experimental data from 15 January 2019 to 4 September 2019 were analyzed in Table 5. Similar to the short-term analysis, calibrating the raw PM signal showed a significant improvement (e.g., MAE: 15.87 → 4.21 µg/m³), and adding the humidity signal to the raw PM signal improved the result further (e.g., MAE: 4.21 → 4.04 µg/m³). Adding the humidity signal improved the performance considerably in the short-term analysis, where the high-humidity region accounted for the majority of the data, but only slightly in the long-term analysis. However, the performance improved considerably when temperature was added, especially for the non-linear calibration (e.g., MAE: 4.04 → 3.52 µg/m³).

Performance Characteristics: Sampling Interval
In this section, the 5-min sampling interval was converted into 1-h and 24-h sampling intervals to allow comparison with previous studies. Most PM researchers have analyzed sensor performance at 1-h or 24-h sampling intervals, because the high-end BAM used as the reference-grade instrument operates at hourly sampling intervals. In particular, the Met One BAM-1020 (Met One instrument Inc., Grants Pass, OR, USA [33]), a US EPA [34] certified instrument, was used in many previous studies [14,25].
Non-overlapping windows were applied to obtain the 1-h and 24-h sampling intervals. MAE decreased with longer sampling intervals, since more aggregated data reduced the data variation, as shown in Table 6. For the 24-h sampling interval, R², which indicates the proportion of variance explained for the response variable, was lower. This lower R² results from the data range being reduced by aggregation. This can be calculated from the R² equation in Table 3 or explained by Figure 10. [35]). This descriptive statistic is also required when R² is used as a metric.
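The dependence of R² on the data range can be made explicit by rewriting the standard definition of the coefficient of determination in terms of the MSE and the variance of the reference signal. The numbers in the second part are hypothetical and only illustrate the direction of the effect; they are not taken from Table 6.

```latex
% R^2 expressed through the MSE and the (population) variance of the reference:
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
     \;=\; 1 - \frac{\mathrm{MSE}}{\sigma_y^2},
\qquad
\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 .

% Hypothetical illustration: aggregation lowers the MSE, but it shrinks the
% variance of the reference even more, so R^2 falls although the error improves.
\text{5 min: } \mathrm{MSE} = 16,\ \sigma_y^2 = 220 \;\Rightarrow\; R^2 = 1 - \tfrac{16}{220} \approx 0.93,
\qquad
\text{24 h: } \mathrm{MSE} = 9,\ \sigma_y^2 = 40 \;\Rightarrow\; R^2 = 1 - \tfrac{9}{40} \approx 0.78 .
```

This is why an error metric such as MAE or MSE and a description of the spread of the reference data are useful companions whenever R² is reported.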

Comparative Analysis: the Low-Cost Sensor and Governmental BAM (Before Calibration)
The performance of the low-cost sensor was analyzed by comparing the raw signals from the sensor platform with the reference signal from the governmental BAM (hereinafter, the raw signals of the three low-cost sensors are denoted Raw (a/b/c) and the BAM signal is denoted BAM in tables and figures). Figure 11 shows the correlation between the three low-cost sensors and the BAM. Additionally, their correlation coefficients, evaluation metrics, and statistical summary are listed in Appendix B (Tables A1 and A2).
Figure 11. Comparison between low-cost sensors and governmental BAM (before calibration).
The R² values of the low-cost sensors with the BAM were 0.416, 0.546, and 0.417. However, the R² values among the low-cost sensors indicated a very strong positive correlation: 0.937, 0.994, and 0.933. Because of the very strong correlation coefficient with the BAM output, calibration can be expected to improve the performance effectively. Additionally, the high R² among the low-cost sensors at the same location indicates that a common calibration model can be shared under the logged conditions. The data distribution shows the overall difference between the low-cost sensors and the BAM, as shown in Figure 11. The reproducibility among the low-cost sensors appeared high, with a very tight output span, whereas the reproducibility between the low-cost sensors and the BAM output appeared low, with a wide output span.
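The pattern reported here, high agreement among the low-cost sensors but lower agreement with the reference, is what one would expect when the sensors share a common environmental disturbance. The sketch below reproduces that pattern on synthetic data only; the gain of 1.5, the disturbance range, and the noise level are arbitrary assumptions and do not model the PMS7003.

```python
import numpy as np

rng = np.random.RandomState(1)
n = 2000
reference = rng.uniform(5, 60, n)          # hypothetical reference PM2.5
shared_effect = rng.uniform(0, 40, n)      # disturbance common to all sensors

def make_sensor(noise_scale):
    """Synthetic raw signal: biased gain + shared disturbance + own noise."""
    return 1.5 * reference + shared_effect + rng.normal(0, noise_scale, n)

raw_a, raw_b = make_sensor(1.0), make_sensor(1.0)

def r_squared(u, v):
    """Squared Pearson correlation coefficient between two signals."""
    return np.corrcoef(u, v)[0, 1] ** 2

r2_between_sensors = r_squared(raw_a, raw_b)
r2_with_reference = r_squared(raw_a, reference)
```

Since the disturbance is shared, it does not reduce the sensor-to-sensor correlation, which is the property that makes a single calibration model transferable between co-located units.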

Comparative Analysis: the Low-Cost Sensor and Governmental BAM (After Calibration)
MLR, MLP, and SMART calibration were carried out to evaluate the performance, following the methods in Section 2 but with a single PM sensor instead of three PM sensors. All results in this subsection were calculated on the test dataset only, since the training dataset was used to generate the calibration models. Figure 12 shows the correlation between the low-cost sensor and the BAM. Additionally, the correlation coefficients, evaluation metrics, and statistical summary are listed in Appendix B (Tables A3 and A4). The means and standard deviations were 38.12 ± 31.18 µg/m³ (raw signal), 23.13 ± 13.74 µg/m³ (MLR), 22.7 ± 13.12 µg/m³ (MLP), and 23.09 ± 13.85 µg/m³ (SMART calibration), compared with 23.10 ± 14.84 µg/m³ (BAM). The normalized mean bias error declined from 65% to 1.7%, and the relative difference in standard deviation from the BAM decreased from 110% to 11.6%, when the MLP calibration model was applied. R² was 0.41 (raw signal), 0.84 (MLR), 0.86 (MLP), and 0.89 (SMART calibration). These results show that calibration significantly improves the performance of the low-cost sensors.
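The quoted percentages can be reproduced from the means and standard deviations given in this paragraph, assuming the normalized mean bias error is the difference from the BAM value divided by the BAM value. The paper does not spell out this formula, so the helper `normalized_difference` below is an assumption that happens to match the reported figures.

```python
def normalized_difference(value, reference):
    """Relative difference of a statistic from its reference value."""
    return (value - reference) / reference

# Means and standard deviations quoted in the text (µg/m³).
bam_mean, bam_std = 23.10, 14.84
raw_mean, raw_std = 38.12, 31.18
mlp_mean, mlp_std = 22.7, 13.12

nmbe_raw = normalized_difference(raw_mean, bam_mean)   # about +0.65 (65%)
nmbe_mlp = normalized_difference(mlp_mean, bam_mean)   # about -0.017 (1.7% in magnitude)
std_raw = normalized_difference(raw_std, bam_std)      # about +1.10 (110%)
std_mlp = normalized_difference(mlp_std, bam_std)      # about -0.116 (11.6% in magnitude)
```

The signs show that the raw signal overestimates the BAM mean, while the MLP-calibrated signal slightly underestimates both the mean and the spread.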
As shown in Figure 13 and Table 7, several calibration results were analyzed under different data preprocessing conditions. Our dataset was analyzed with the shuffled method as well as the sequential method, since Korea has four distinct seasons and the 7.5-month dataset covers only a limited range of climate and seasons. The shuffled dataset yields a higher R² than the sequential dataset. On the other hand, the sequential dataset yields lower MAE and MSE than the shuffled dataset. Appendix E gives more information on several shuffled methods based on successive hourly or daily data chunk sizes.
This calibration can also be applied to an upcoming future dataset using the previously generated calibration models under the sequential method. As an example, the sequential datasets of the raw signal, the SMART calibration signal, and the governmental BAM signal are plotted in Figure 14. Specifically, a training dataset was constructed under the sequential condition from 15 January 2019 to 8 August 2019 and the calibration model was created from it. The test dataset was then built from 8 August 2019 to 4 September 2019, and the model previously derived from the training dataset was applied to it. As a result, the calibrated test dataset closely matches the BAM output (e.g., MAE = 2.79, MSE = 14.02, and R² = 0.76).

Comparative Analysis: Other Calibration Methods
The SMART calibration method was compared with other regression methods, such as lasso regularization, ridge regularization, and polynomial linear regression (PLR). Additionally, we applied state-of-the-art ensemble learning methods, such as random forests (RF), extreme gradient boosting (XGB), and light gradient boosting (LGB). The hyperparameters of these methods were exhaustively searched over specified grids: a cross-validated grid search algorithm was applied to optimize the hyperparameters, and the hyperparameter grid is described in Appendix F. The SMART calibration parameters were also customized with an increased cell size of the residual map and another calibration model. Several dataset ratios under the sequential method were applied for data preconditioning. Our calibration method achieved the smallest MAE and MSE among the twelve calibration methods, as shown in Figure 15 and Table 8.
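As an illustration of what a cross-validated grid search does, the sketch below tunes the regularization strength of a ridge regression with a hand-written k-fold loop in NumPy. It is a simplified stand-in for the library routine used in the paper: the contiguous fold layout, the grid values, and the synthetic data are assumptions, and the actual grids are those of Appendix F.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ w, w

def cv_mse(X, y, alpha, n_folds=5):
    """Mean validation MSE over contiguous (time-ordered) folds."""
    errors = []
    for val_idx in np.array_split(np.arange(len(X)), n_folds):
        train_mask = np.ones(len(X), dtype=bool)
        train_mask[val_idx] = False
        w0, w = ridge_fit(X[train_mask], y[train_mask], alpha)
        pred = w0 + X[val_idx] @ w
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(errors))

def grid_search(X, y, alphas, n_folds=5):
    """Evaluate every grid value and return the best one with all scores."""
    scores = {alpha: cv_mse(X, y, alpha, n_folds) for alpha in alphas}
    return min(scores, key=scores.get), scores

# Synthetic data (hypothetical): PM, humidity, temperature -> noisy reference.
rng = np.random.RandomState(0)
X = np.column_stack([rng.uniform(0, 100, 200),
                     rng.uniform(20, 90, 200),
                     rng.uniform(-5, 35, 200)])
y = 3.0 + X @ np.array([0.5, -0.1, 0.2]) + rng.normal(0, 2.0, 200)

alphas = [0.01, 1.0, 100.0, 10000.0]
best_alpha, scores = grid_search(X, y, alphas)
```

The same loop generalizes to several hyperparameters by iterating over the Cartesian product of their grids, which is what makes the search exhaustive.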

Comparative Analysis: Previous Similar Study
The SMART calibration result was compared with the latest results from a similar study [25], because we could not obtain a long-term dataset from other research under similar conditions. That study conducted a 16-month field test in North Carolina, USA, comparing a commercial product (PA-II (Purple Air Inc., Draper, UT, USA [36])) with a BAM 1020 (Met One instrument Inc., Grants Pass, OR, USA [33]). It included a long-term performance evaluation and a calibration on a 1-h sampling interval basis. In data preprocessing, the dataset was split by the shuffled (random) method into a 90% training dataset and a 10% test dataset. MLR with the raw PM signal, humidity, and temperature was applied as the calibration method.
Before calibration, the results of the other group's study were superior, thanks to a factory calibration during product manufacturing, as shown in Table 9. After calibration, our shuffled dataset showed a higher R² than the other group's study, and our sequential dataset with SMART calibration was superior in all performance aspects.

Conclusions
The low-cost PM sensor was evaluated and calibrated against a co-located governmental BAM at the urban air monitoring station (Dongjak-gu, Seoul, Korea). The performance of the low-cost PM sensor was analyzed using MAE, MSE, RMSE, R², slope, intercept, mean, standard deviation, and quartiles. The means and standard deviations of the raw signal of the low-cost sensor and of the BAM output were 38.15 ± 31.29 and 23.10 ± 14.84 µg/m³, with around 65% normalized mean bias error. Additionally, calibration methods, such as MLR, MLP, and SMART calibration, were compared. The means and standard deviations of the SMART-calibrated signal of the low-cost sensor and of the BAM output were 23.09 ± 13.85 and 23.01 ± 14.74 µg/m³, with around 0.35% normalized mean bias error. When the raw and calibrated signals of the low-cost sensor were compared with the BAM output using R², the correlation between the low-cost sensor and the BAM output increased: 0.41 (raw signal), 0.82 (LR), 0.84 (MLR), 0.83 (MLP), and 0.89 (SMART calibration). Furthermore, the calibration model was verified to be applicable to future datasets. These results show that calibration is strongly required when low-cost sensors are used for high-accuracy sensing.
The sample-to-sample variability of the low-cost sensors was evaluated among three co-located low-cost sensors. The sensors were very strongly correlated, with correlation coefficients ranging from 0.985 to 0.997. Based on this finding, a calibration model can be continuously updated and improved by co-locating a single multi-sensor platform with a BAM, and it can be transferred to all nodes in a sensor network to calibrate them. This approach is the base concept of an online calibration for low-cost sensors. In future studies, a mobile node converted from the co-located multi-sensor platform will travel among all of the nodes in the sensor network, performing an offline calibration of the slope and intercept of each node. This successive calibration is named Hybrid Calibration, which combines a network-wide online calibration with an individual offline calibration.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure A1 shows Anscombe's quartet, which intuitively illustrates the pitfalls of LR [37]. The four datasets have the same best-fit line slope, intercept, and R² even though the data are very different. To resolve this ambiguity, we compared calibration effectiveness using the additional metrics in Section 2.4. As shown in Figure A2, it is important to avoid generating under-fitting or over-fitting models and to find an appropriate trade-off model that reduces the total error.

Tables
This appendix section includes more detailed table information.
Table A1. Correlation coefficient and metrics of low-cost sensor and governmental BAM (before calibration).
Table A2. Descriptive statistic summary of low-cost sensor and governmental BAM (before calibration).
Figure A4. Correlation plot between inter / hetero sensors.
(a) Intercorrelation plot with heterotypic sensors (b) Calibrated PM and humidity plot over time
Figure A5. Prototypes output analysis.

The dataset was preprocessed and analyzed with several shuffled methods by selecting successive hourly or daily data chunk sizes, as shown in Table A6 and Figure A6.
Figure A6. Comparison plot by data preprocessing methods: shuffled-hourly (top) / shuffled-daily (bottom).