J Appl Physiol (1985). Dec 2011; 111(6): 1804–1812.
Published online Sep 1, 2011. doi:  10.1152/japplphysiol.00309.2011
PMCID: PMC3233887

Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample


Previous work from our laboratory provided a “proof of concept” for use of artificial neural networks (nnets) to estimate metabolic equivalents (METs) and identify activity type from accelerometer data (Staudenmayer J, Pober D, Crouter S, Bassett D, Freedson P, J Appl Physiol 107: 1300–1307, 2009). The purpose of this study was to develop new nnets based on a larger, more diverse, training data set and apply these nnet prediction models to an independent sample to evaluate the robustness and flexibility of this machine-learning modeling technique. The nnet training data set (University of Massachusetts) included 277 participants who each completed 11 activities. The independent validation sample (n = 65) (University of Tennessee) completed one of three activity routines. Criterion measures were 1) measured METs assessed using open-circuit indirect calorimetry; and 2) observed activity to identify activity type. The nnet input variables included five accelerometer count distribution features and the lag-1 autocorrelation. The bias and root mean square error for the nnet MET trained on University of Massachusetts and applied to University of Tennessee were +0.32 and 1.90 METs, respectively. Seventy-seven percent of the activities were correctly classified as sedentary/light, moderate, or vigorous intensity. For activity type, household and locomotion activities were correctly classified by the nnet activity type 98.1 and 89.5% of the time, respectively, and sport was correctly classified 23.7% of the time. This machine-learning technique operates reasonably well when applied to an independent sample. We propose the creation of an open-access activity dictionary, including accelerometer data from a broad array of activities, leading to further improvements in prediction accuracy for METs, activity intensity, and activity type.

Keywords: wearable activity monitors, intelligent prediction models

Accelerometer sensors are popular tools to estimate physical activity (PA) behavior. The devices are easy to use and impose nominal subject and researcher burden. These sensors provide objective estimates of PA features, such as point estimates of energy expenditure (EE) and categorically defined activity intensity levels. Despite their popularity, the traditional regression methods used to translate accelerometer output to estimates of EE or time spent in different activity intensity levels remain problematic. For example, traditional regression approaches are not accurate across a range of activity types and intensities (3, 7, 14, 22), and, although they often produce relatively small or nonsignificant mean differences between estimated and actual EE, the individual estimation errors are often substantial (4, 15). Recent advances in motion sensor technology permit accelerometers to capture and store more detailed information than originally possible, leading several groups to explore more advanced data processing methods, such as hidden Markov models (HMM) (18), decision trees (3), cross-sectional time series (5), multivariate adaptive regression splines (5), and artificial neural networks (nnet) (21, 23).

HMMs, decision trees, and nnets are adaptive machine-learning systems capable of “learning” the shape of complex data. When applied to accelerometer output, these machine-learning methods do not assume a simple parametric relationship (e.g., linear, exponential, cubic) between accelerometer counts and EE. This inherent flexibility allows such techniques to use more information from the acceleration signal than the counts per minute used in the traditional regression approaches. These two factors suggest machine-learning approaches will improve estimates of accelerometer-based PA metrics across a range of activity types and intensities when applied to large, diverse samples. These methods also allow us to identify activity type, which is not possible with simple regression methods. Preece and colleagues (19) review several machine-learning activity classification methods and algorithms.

Our group and others previously reported success in applying HMMs to identify specific modes of activity (6, 13, 18). The HMM method is relatively complex and relies on custom software that may be a barrier for many applied researchers. Our group (23) and de Vries et al. (8) have used nnet models to successfully identify different activity types (23) and specific activities (8). Rothney et al. (21) developed nnets using raw acceleration input features that improve EE estimates compared with traditional regression techniques. This approach is promising, but, at present, it requires expensive analytic software (Matlab, Mathworks, Cambridge, MA) and a very complex multiple accelerometer system (Intelligent Device for EE and Activity, MiniSun LLC, Fresno, CA). Thus its application to free-living environments and large-scale epidemiological studies remains impractical. De Vries et al. (8) used nnet models from one or two ActiGraph accelerometers positioned on the hip and wrist to successfully identify activity type. However, their nnets do not predict EE, which is of interest to the research community.

Our group recently published a proof-of-concept paper for two nnets using the Actigraph 7164. One nnet estimated metabolic equivalents (METs), and another nnet identified activity type (22). Our model improved MET estimates compared with three traditional regression approaches (7, 9, 24) and successfully differentiated activity type into four general categories (sedentary, locomotion, lifestyle, or vigorous sport). Unique features of our nnet prediction models are that we used a single hip-mounted accelerometer (ActiGraph 7164; ActiGraph, Pensacola, FL) and the open-source computing language and statistics package R (20) to process the data. The ActiGraph is a commonly used activity monitor in the field, and R is a free statistics package, making this model readily accessible to applied researchers without requiring expensive monitors or skills in advanced statistical methods.

Our methodology established that advanced data processing techniques (artificial nnets) improved accelerometry-based PA measurement without compromising the capacity of applied researchers to implement these tools in the field. However, our original paper was limited in that the nnets were validated on the same sample (n = 48) in which the models were developed (using cross-validation), and we used an ActiGraph accelerometer model (ActiGraph 7164) that is no longer available and is known to produce different output than more recent accelerometer hardware upgrades (e.g., ActiGraph GT1M) (10). Thus the purpose of this study was to evaluate the robustness and flexibility of the nnet method for processing GT1M accelerometer data to estimate activity METs and activity type on an independent sample.


Data collection.

At both sites, participants read and signed an informed consent document that was approved by the Institutional Review Boards at the respective universities. Participants completed a health history questionnaire to ensure eligibility criteria were met.

University of Massachusetts study protocol.

The study sample at the University of Massachusetts (UMass) included 277 participants. The sample was 50.2% women and 17% minorities. The average age was (mean ± SD) 38 ± 12.4 yr, and average body mass index (BMI) was 24.6 ± 4.01 kg/m2.

On the day of the testing, participants reported to the laboratory in a 4-h fasted state, having not consumed caffeine nor participated in exercise for the previous 4 h. Participants completed 11 out of 23 activities (each activity was performed for 7 min continuously with a 4-min rest period between activities) that were divided into two sections: treadmill activities and sport/activities of daily living (ADL). Between each activity section, participants rested for 15 min to avoid the possibility of the physiological responses elicited by prior activity influencing the responses of the subsequent activity bout. Furthermore, the order of presentation of the activity bouts was balanced across subjects.

The treadmill section consisted of six conditions: three speeds (1.34, 1.56, 2.23 m/s) performed at 0% and 3% grade. The ADL portion included five self-paced ADLs with each activity being performed for 7 min continuously. All participants ascended and descended stairs and moved a 6.0-kg box from a shelf to the floor 8 m away. The additional two ADLs were randomly selected from a menu of common household activities and sport activities using a blocked randomized design to ensure activities were completed equally among age and sex groups. There were 14 possible household and sport activities, including sweeping, mopping, gardening, trimming, mowing, raking, dusting, laundry, vacuuming, washing dishes, painting, tennis (with a partner), and basketball. A detailed description of the activities and study protocol has been published elsewhere (11).

Oxygen consumption during activities was measured using a portable metabolic system (Oxycon Mobile, Cardinal Health, Yorba Linda, CA). This portable device is a battery-operated, wireless unit that measures breath-by-breath gas exchange. It was secured to the body using a vest similar to a backpack (950 g). A face mask (Hans Rudolph, Kansas City, MO) was connected to the flow sensor unit, which measured samples of expired air using a microfuel O2 sensor and a thermal conductivity CO2 sensor. Immediately before data collection, and during the break between protocol sections, a two-point (0.2 and 2.0 l/s) air flow calibration was performed using the automatic flow calibrator, and the gas analyzers were calibrated using a certified gas mixture of 16% O2 and 4.01% CO2. The system has been shown to be valid for measurement of respiratory gas exchange during exercise (17).

University of Tennessee study protocol.

The validation sample was from the University of Tennessee (UTenn; n = 65; 58% women and 38.2% minorities). Of the 68 participants who completed the protocol, data from 65 participants were included in the analysis. Three participants were excluded due to technical problems in synchronizing the metabolic and accelerometer data. There were 18 different activities in the testing protocol.

The average age of the sample was (mean ± SD) 40.1 ± 13.0 yr, and average BMI was 27.1 ± 5.61 kg/m2. Age in the UTenn sample was not significantly different from that of the UMass sample (P = 0.6064), and BMI was significantly higher than the mean of 24.6 kg/m2 in the UMass sample (P = 0.005). Testing occurred on campus or at the participant's or investigator's home. Participants performed one of three routines, each of which included six different physical activities. For all routines, each activity was performed for 10 min, with a 3- to 5-min break between activities.

For routine 1 (n = 25), participants did laundry, including gathering clothes, loading the machines, folding clothes, and putting clothes away. They also ironed, did light cleaning, and aerobics. For routine 2 (n = 22), participants drove through a residential neighborhood, played Frisbee golf, trimmed grass using an electric trimmer, gardened, and moved dirt with a wheelbarrow. Participants also walked with a 6.8-kg box in their arms, set it down, picked it up, and carried it to another location. For routine 3 (n = 18), participants played singles tennis and completed self-paced walking and running activities. Distance was recorded to determine speed for each subject in these activities. Participants walked and ran on a track and a road course that included sidewalks, crosswalks, and a slightly hilly terrain. Participants also performed a self-paced walk carrying a 6.8-kg over-the-shoulder laptop computer case. The mean (SD) speeds for the road and track walks were 1.49 (0.18) and 1.52 (0.19) m/s, 2.70 (0.54) and 2.73 (0.62) m/s for the road and track runs, and 1.43 (0.17) m/s for the walk carrying the computer bag.

The criterion method for measuring oxygen consumption was the Cosmed K4b2 (Cosmed, Rome, Italy) portable metabolic system. The Cosmed K4b2 is a breath-by-breath gas analysis system consisting of a face mask, analyzer unit, and battery. Before testing each subject, the unit was warmed up for 45–60 min and then calibrated according to the manufacturer's instructions. Calibration of the instrument included four parts: room air calibration, reference gas calibration (16.03% O2 and 3.98% CO2), turbine flow-meter calibration with a 3.0-liter syringe (Hans Rudolph), and CO2/O2 analyzer delay calibration with the participant wearing the face mask. To reduce analyzer drift caused by extreme temperatures, the outdoor routines were not performed when the temperature was below 50°F (10°C) (7).

At both UMass and UTenn, the ActiGraph GT1M (ActiGraph, Pensacola, FL) accelerometer was used. The device is a small (3.8 × 3.7 × 1.8 cm), light-weight (27 g), uniaxial accelerometer. Detailed specifications of the monitor are published elsewhere (1). Each participant wore an ActiGraph GT1M initialized to collect data in 1-s epochs and secured on the anterior superior iliac spine along the anterior axillary line on the nondominant hip.

Nnet training and development.

For both the development (UMass) and validation data sets (UTenn), we used the identical data cleaning methods, as described by Staudenmayer et al. (23). Data points where the coefficient of variation of the counts was greater than 90% different than the mean coefficient of variation for a given activity were eliminated from the final data sets. We removed 16 of 2,745 (0.60%) subject/activity combinations for the development group data set (UMass) and 3 of 368 (0.80%) subject/activity combinations for the validation group (UTenn). The accelerometer count features used to develop the nnets were those used in Staudenmayer et al. (23) and included variables representing the signal distribution (10th, 25th, 50th, 75th, and 90th percentiles of the second-by-second accelerometer counts) and the temporal dynamics (lag-1 autocorrelation). Each subject contributed one set of features for each activity, and those features were calculated from the second-by-second accelerometer counts, excluding the first 2 min and last 10 s of accelerometer data. Each subject performed each activity for 7 min in the UMass study and 10 min in the UTenn study. The METs for each unique subject and activity combination in each study were calculated using the mean measured O2 uptake (ml·kg−1·min−1) divided by 3.5 ml·kg−1·min−1, excluding the first 2 min and last 10 s of measurements. As in Ref. 23, we did not find the inclusion of subject-specific characteristics, such as age, sex, height, weight, or body mass index, to improve the performance of the model.
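The feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names, the trimming defaults, and the choice to compute the lag-1 autocorrelation as the Pearson correlation between successive seconds are assumptions made here for clarity.

```python
import numpy as np

def count_features(counts_per_second, trim_start_s=120, trim_end_s=10):
    """Five percentiles plus lag-1 autocorrelation for one subject/activity bout.

    Hypothetical helper: trims the first 2 min and last 10 s of 1-s counts,
    as described in the text, before computing the six input features.
    """
    x = np.asarray(counts_per_second, dtype=float)
    x = x[trim_start_s:len(x) - trim_end_s]
    pcts = np.percentile(x, [10, 25, 50, 75, 90])
    # Assumed definition: correlation between the series and itself shifted by 1 s.
    a, b = x[:-1], x[1:]
    if a.std() == 0 or b.std() == 0:
        acf1 = 0.0  # assumption: treat a constant signal as having zero autocorrelation
    else:
        acf1 = float(np.corrcoef(a, b)[0, 1])
    return np.append(pcts, acf1)

def mets_from_vo2(vo2_ml_kg_min):
    """Mean measured O2 uptake (ml/kg/min, already trimmed) divided by 3.5."""
    return float(np.mean(vo2_ml_kg_min)) / 3.5
```

Each subject/activity combination then contributes one six-element feature vector and one criterion MET value.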

We developed two nnets: 1) a prediction of METs (nnetMET) and 2) a prediction of activity type (nnetACT). We used the same nnet technical specifications as Ref. 23. Briefly, nnetMET was fit to minimize the penalized squared difference between the criterion MET values and the model's predictions. The penalization was done to avoid overfitting, and the penalty value was chosen through cross-validation. The nnetACT was fit to minimize the penalized negative logistic likelihood, and the penalty value was again chosen through cross-validation. In the main analyses, we examined the accuracy and precision of the nnetMET trained on UMass data by computing the bias (mean difference between prediction and criterion measure) and root mean squared error (RMSE; square root of the mean of the squared differences between the prediction and the criterion measure) of the predictions for the UTenn data. We also compared the nnetMET prediction bias and RMSE to the bias and RMSE for the Crouter et al. (7) and Freedson et al. (9) regression equations applied to the UTenn data. The Crouter et al. (7) model development was performed with data that were not part of the UTenn validation data set. We examined activity intensity classification accuracy by comparing the actual intensity classification from the measured METs to those predicted from the nnetMET and Crouter et al. (7) and Freedson et al. (9) equations. We validated activity type categories predicted from the nnetACT trained on UMass and applied to UTenn.
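The two error summaries defined above translate directly into code. The sketch below assumes paired arrays of predicted and criterion METs; the function name is illustrative only.

```python
import numpy as np

def bias_and_rmse(predicted, measured):
    """Bias = mean(prediction - criterion); RMSE = sqrt(mean(squared differences))."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    bias = float(diff.mean())
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return bias, rmse
```

A positive bias indicates overestimation on average, while the RMSE also reflects the spread of the individual errors, so a method can have near-zero bias and still have a large RMSE.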


The mean counts per minute, the coefficient of variation for the accelerometer counts per minute, the averages of the signal distribution input features, the lag-1 autocorrelation input feature, and the mean (SD) METs for each activity for UMass and UTenn are shown in Table 1. The measured MET values for the individual activities performed by the development and validation groups are presented in Table 2 (UTenn and UMass data). Notable is that the range of mean measured METs was 1.88 (washing dishes) to 9.75 METs (treadmill, 2.23 m/s, 3% grade) for UMass and 0.78 METs (driving) to 11.17 METs (track running) for UTenn. Additionally, for UMass, 4 activities (17%) were below 3 METs, 14 activities were between 3.1 and 6 METs, and 5 activities were above 6 METs. In contrast, for UTenn, there were 8 activities (44%) below 3 METs.

Table 1.
Descriptive summary for accelerometer output from UMass and UTenn
Table 2.
Measured METs, METs predicted from nnetMET, and nnetMET biases and RMSEs for independent sample validation (UTenn) and cross-validation (UMass)

The validation of the nnetMET trained on UMass is shown in Fig. 1 (validated on UTenn). The bias was 0.32 METs, the RMSE was 1.90 METs, and the correlation between measured METs and the nnetMET was r = 0.78. Eight of the activities where METs were overestimated were in the light intensity range, and four activities greater than 6 METs (vigorous) were underestimated. The bias and RMSE for the individual activities for the nnetMET are presented in Table 2. We note that this figure suggests that a simple additive measurement error model does not explain the relationship between the nnetMET estimates and the criterion measures. Further exploration of measurement error models for nnetMET is outside the scope of the present work.

Fig. 1.
Measured metabolic equivalents (METs) vs. METs predicted from neural network (nnetMET). The nnetMET was developed on the University of Massachusetts (UMass) data set (n = 277) and applied to the University of Tennessee (n = 65) data set. The bias was 0.32 METs, ...

For comparison purposes, we applied the Freedson et al. (9) and Crouter et al. (7) regression models to UTenn and UMass data (Table 2). For all activities combined (mean measured METs = 4.32), the biases for Freedson et al. (9) and Crouter et al. (7) were −0.95 and 0.18 METs, respectively, when applied to the UTenn data (top of Table 2). The lowest mean measured METs was 1.88 for the UMass data (bottom of Table 2: washing dishes), whereas there were three UTenn activities with mean METs below 1 (top of Table 2: driving, watching television, and reading). We investigated whether those differences in activity intensity between UMass and UTenn influenced the performance of the nnet by removing the three sedentary behaviors from the UTenn data and rerunning the validation analysis. When sedentary behaviors were removed, the bias for the nnet validation decreased from 0.32 METs to 0.10 METs, and the nnet validation RMSE increased from 1.90 to 1.99 METs (Table 2, top). The bias increased in magnitude to −1.31 METs (from −0.95) for the Freedson et al. (9) equation and decreased to 0.14 METs (from 0.18) for the Crouter et al. (7) equation. The RMSEs increased to 2.26 (from 2.07) and 2.15 METs (from 1.97 METs) for the Freedson et al. (9) and Crouter et al. (7) equations, respectively.

Using UTenn, we examined the activity intensity classification accuracy for nnetMET and the Freedson et al. (9) and Crouter et al. (7) regression equations. Based on the measured METs, each activity was placed in an activity intensity category (sedentary/light: less than 3 METs, moderate: 3.0–5.99 METs, and vigorous: 6.0 METs and above). Predicted METs from the Freedson et al. (9) and Crouter et al. (7) regression equations and nnetMET were assigned to intensity categories in the same way. The confusion matrices illustrating these analyses are shown in Table 3. The Freedson et al. (9) and the Crouter et al. (7) regression equations correctly classified activity intensity 72.9 and 72.3% of the time, respectively. The nnetMET correctly classified activity intensity 77% of the time, and the classification accuracy was relatively constant across intensity categories. The nnetMET classification accuracy was lowest for vigorous activities (71.9%). This is largely due to aerobics, a vigorous activity (6.2 METs, on average) that was included in the UTenn validation data but not in the UMass training data.
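The intensity cut points above can be applied to both measured and predicted METs with a simple binning step. The following sketch is illustrative only; the function names and the integer coding of categories are assumptions introduced here.

```python
import numpy as np

# Category codes (assumed for illustration): 0 = sedentary/light, 1 = moderate, 2 = vigorous
LABELS = ("sedentary/light", "moderate", "vigorous")

def intensity_category(mets):
    """Bin METs at the cut points in the text: <3, 3.0 to <6, and >=6 METs."""
    return np.digitize(np.asarray(mets, dtype=float), [3.0, 6.0])

def percent_agreement(predicted_mets, measured_mets):
    """Percentage of observations whose predicted and measured categories match."""
    same = intensity_category(predicted_mets) == intensity_category(measured_mets)
    return 100.0 * float(np.mean(same))
```

Because agreement is judged by category, a prediction can be off by more than 1 MET and still count as correct, or be off by a small amount near a cut point and count as incorrect.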

Table 3.
Confusion matrices for intensity category classification comparing criterion measure (measured METs) and Freedson et al. (9), Crouter et al. (7), and nnetMET

We validated the nnetACT to predict activity type by developing and training the model on UMass data and applying it to UTenn data. We placed the activities into household, locomotion, and sport activity categories and did not include the UTenn sedentary behaviors, since the UMass study did not include sedentary behaviors (see Table 4 for assignment of activity type). Table 5 presents a confusion matrix illustrating the percentage of activities correctly classified. Application of the nnetACT trained on UMass to UTenn data yielded an overall correct classification rate of 80.9% (Table 5). Correct classification occurred for 98.1% of the household activities, 89.5% of the locomotion activities, and 23.7% of the sports activities. Sport activities were often misclassified as household activities. Correct classification was 97.3% when we applied the nnetACT trained on UMass data to the UMass data (using hold-one-out cross-validation) (Table 5). Classification accuracy for the individual activities (UTenn data) for nnetMET, Freedson et al. (9), and Crouter et al. (7) is shown in Table 6. Activity-specific classification accuracy ranged from 24% [Frisbee golf, Crouter et al. (7) regression equation] to 100% for most of the sedentary behaviors for all three prediction models. Household and locomotion activities were correctly classified 95% of the time, while sport activities were misclassified 76.3% of the time.
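A confusion matrix of the kind summarized in Table 5 can be tabulated from paired observed and predicted activity types. The sketch below is a generic illustration with hypothetical names; rows are the observed type and each row is expressed as percentages, so the diagonal gives the percent correctly classified per type.

```python
import numpy as np

TYPES = ("household", "locomotion", "sport")

def confusion_percent(observed, predicted, classes=TYPES):
    """Row-percentage confusion matrix and overall percent correct."""
    index = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)))
    for obs, pred in zip(observed, predicted):
        counts[index[obs], index[pred]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observations are left at 0 rather than dividing by zero.
    pct = np.divide(100.0 * counts, row_totals,
                    out=np.zeros_like(counts), where=row_totals > 0)
    overall = 100.0 * float(np.trace(counts)) / float(counts.sum())
    return pct, overall
```

Note that the overall rate weights each type by how often it occurs, so a poorly classified but less frequent category lowers the overall rate less than its own row percentage would suggest.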

Table 4.
Activities and activity type assignment
Table 5.
Confusion matrices illustrating accuracy of activity type classification
Table 6.
Intensity classification accuracy by activity for nnet, Freedson et al. (9), and Crouter et al. (7)

We also developed and cross-validated nnetMET using the hold-one-out cross-validation method. The training of the nnetMET on UMass data and cross-validation on UMass data yielded a bias of 0 METs. In comparison, the biases were −1.26 and −0.84 METs for Freedson et al. (9) and Crouter et al. (7), respectively (applied to UMass data). The RMSEs were also higher for Freedson et al. (9) and Crouter et al. (7) (2.18 and 2.05 METs, respectively) compared with the nnet (1.43 METs). An nnet was trained on a combination of the UTenn and UMass data and evaluated with hold-one-out cross-validation. Bias and RMSE were 0.0 and 1.2 METs, respectively.


The primary aim of this study was to advance the nnet methodology for assessing PA metrics using the ActiGraph GT1M accelerometer by 1) training the nnets on a large, diverse sample using a broad range of locomotion, lifestyle, and sporting activities; and 2) validating the nnets on an independent sample. The nnet methodology produced reasonably valid MET estimates, with an overall bias of 0.32 METs and RMSE of 1.90 METs. The nnet also successfully identified activity intensity category 77% of the time and activity type 80.9% of the time. These data are novel in that they move the nnet methodology from a proof of concept (22) to a viable and validated method for processing accelerometer data. An alternative approach for validating METs using a decision tree prediction model was employed by Albinali et al. (3). They successfully predicted activity type from a decision tree algorithm and then used MET values from the Compendium of Physical Activities (2) to predict METs. For model validation, our approach and that used by Albinali et al. (3) are both viable options to examine prediction model performance.

Our laboratory previously demonstrated the success of the nnet methodology in estimating METs (bias = 0.05 METs, RMSE = 1.22 METs) and identifying activity type (88.8% correct) using a “hold-one-out” cross-validation technique (23). The nnets were validated on a single observation from the original sample, and the remaining observations were used for nnet training. This process was repeated such that each observation from the original sample was used once for validation, and the results were then averaged to produce a single estimate of the precision and accuracy of the nnet model. When used in calibration studies, cross-validation provides an estimate of how well the nnet model will generalize to an independent sample. This approach is not ideal, since the validation sample is not truly independent and the activity protocol and research procedures are identical for training and validation. Thus researchers should expect that the model will not be as successful when applied to an independent sample that is performing different activities.
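The hold-one-out procedure described above can be sketched generically. To keep the example short and self-contained, a penalized linear model (ridge regression) stands in for the nnet; this substitution, the function names, and the default penalty are assumptions for illustration and are not the authors' model.

```python
import numpy as np

def fit_ridge(X, y, penalty):
    """Penalized least squares with an unpenalized intercept (stand-in for the nnet)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    P = penalty * np.eye(Xa.shape[1])
    P[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xa.T @ Xa + P, Xa.T @ y)

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def hold_one_out(X, y, penalty=1.0):
    """Train on all observations but one, predict the held-out one, repeat for each."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta = fit_ridge(X[keep], y[keep], penalty)
        preds[i] = predict(beta, X[i:i + 1])[0]
    err = preds - y
    return float(err.mean()), float(np.sqrt(np.mean(err ** 2)))  # bias, RMSE
```

Every held-out observation comes from the same protocol as the training observations, which is why error estimated this way tends to be optimistic relative to an independent sample.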

In the present study, we again demonstrate the nnets' success using cross-validation. Although a primary aim of this paper was to validate the nnets on an independent sample, we present these ancillary results (Table 2, bottom) to make several points. First, the measurement error reported using cross-validation in this study (bias = 0.00 METs, RMSE = 1.43 METs) is similar to previous cross-validation results (bias = 0.05 METs, RMSE = 1.22 METs) (23). This comparison is interesting because, in the present study, we used a much larger, more diverse sample and broader range of activities to train the nnets, yet the validity remained comparable to the smaller, less diverse sample results. Accommodation to a broad range of activities performed by a diverse population illustrates the adaptive nature of the nnet method. This inherent flexibility is an improvement over the traditional linear and nonlinear regression models that assume simple, rigid relationships between accelerometer counts and EE. It has been repeatedly documented that traditional regression models do not perform well when applied to diverse samples performing a range of activities (7, 14, 22).

The second reason to present cross-validation results is for comparison to the independent sample validation. The error reported when the nnets are cross-validated (bias = 0.00 METs, RMSE = 1.43 METs) is less than that reported using the independent sample validation (bias = 0.32 METs, RMSE = 1.90 METs). The error range is also narrower for the cross-validation compared with the independent sample validation, indicating the nnet performs better for individual activities (see Table 2). These data show the discrepancies that arise when different validation techniques are used and illustrate the need to validate PA measurement techniques with independent samples. Independent sample validation provides a clearer picture of method robustness.

Figure 1 shows the average measured and predicted METs for each activity when the nnet was trained on UMass and applied to UTenn. The closer an activity is to the line of identity, the closer the nnet MET estimate is to the measured value. The nnet that was trained on UMass and applied to UTenn tended to overestimate METs (positive bias), but this was not statistically significant overall (see Table 2). This is perhaps because the UTenn study included sedentary activities, and the UMass study did not. The UMass nnet returns a MET estimate of 1.98 METs when the counts in a minute are all zero.

In the present study, 12 activities are “different” between development and validation (track run, road run, aerobics, 15-lb. bag walk, load/unload boxes, moving dirt, track walk, Frisbee golf, road walk, reading, television, driving; see Table 2). The average RMSE for these activities is 2.25 METs. The average RMSE for activities that were “similar” between UMass and UTenn (ironing, gardening, laundry, light cleaning, trimming, tennis; see Table 2) is 1.32 METs. It is expected that the absolute errors would be larger for higher MET activities; the activities identified as being “different” had higher measured METs (mean = 4.66 METs) than the activities identified as being “similar” (mean = 3.64 METs). We also assessed this difference in terms of percent RMSE (RMSE/measured METs, expressed as a percentage). Using this approach, activities identified as different had a mean percent RMSE of 70.8%, while similar activities had a mean percent RMSE of 29.4%. This supports the observation that the error was substantially larger for activities not used in the training data set. This issue is also discussed by Albinali et al. (3), who recommend “tuning” machine-learning algorithms to individual activities to improve the precision of activity type identification.
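Percent RMSE scales the error by the size of the quantity being estimated, which is why it can exceed 100% for very low MET activities. A minimal sketch, with a hypothetical function name and made-up input values chosen only to show the arithmetic:

```python
def percent_rmse(rmse_mets, mean_measured_mets):
    """RMSE expressed as a percentage of the mean measured METs for an activity."""
    return 100.0 * rmse_mets / mean_measured_mets

# Hypothetical values: the same 1.2-MET error is large relative to a
# sedentary activity but small relative to a vigorous one.
low = percent_rmse(1.2, 0.8)    # about 150%
high = percent_rmse(1.2, 8.0)   # about 15%
```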

In our laboratory's original study (23), we suggested the nnet improved MET estimates compared with traditional regression approaches. This could not be conclusively confirmed, given that the nnet was cross-validated, while the traditional regressions were being tested on an independent sample. Table 2 presents the RMSE for the nnet method, the Freedson cut-point method (9), and the Crouter two-regression method (7), all using an independent sample for validation. These data support that the nnet improves MET estimates compared with simple regression. Although the improvement in RMSE was modest for the nnet in comparison to the regression models, across all activities, the nnet had the lowest RMSE, 1.90 METs compared with 2.07 METs [Freedson et al. (9)] and 1.97 METs [Crouter et al. (7)]. Both the nnet and the Crouter et al. (7) regression method had slightly positive biases (0.32 and 0.18 METs, respectively), indicating that they tend to overestimate METs on average, whereas the Freedson et al. (9) regression underestimated METs on average (bias = −0.95 METs). It is not surprising the nnetMET tended to overestimate METs, given that no sedentary behaviors were included in the training of the nnetMET. There were three UTenn activities below 1 MET (0.79–0.86 METs), where the nnetMET produced a substantial error (%RMSE = 149.0–196.2%). For comparison purposes, we removed these activities from the analysis and reevaluated the three prediction methods. For the nnetMET, the RMSE was slightly higher (1.99 METs), and the bias was reduced to 0.10 METs (Table 2). These data further illustrate the difficulties that prediction models face when the activities in the nnet training data set are not identical to those used in validation.

It is not clear as to why there was only a small improvement in the nnet RMSE in comparison to the RMSE from the regression models. One possible explanation is that there were several activities in the training data set that were not in the validation data sets. It is also possible that there is a limit to the size of improvements expected, given the finite range of activities performed. To address this knowledge gap, future machine-learning model development protocols should include a broad spectrum of activities, across the range of EE that represent activities performed in daily life.

The Actigraph accelerometers and the present data processing techniques were not designed to measure sedentary behavior. Recently, however, researchers have become increasingly interested in understanding the interaction between sedentary behavior and health. This shift has led to new challenges for the field of PA measurement. The nnets currently available neither identify sedentary behaviors nor accurately estimate sedentary activity METs. Some researchers advocate using an “inactivity threshold” to identify and assign METs to sedentary behaviors. Nonetheless, sedentary behaviors often make up a large portion of an individual's day (16), and thus training the nnet to identify sedentary behaviors is an important next step.

A novel feature of the nnet methodology for measuring PA is the identification of activity type. We categorized activities into household, locomotion, and sport activity type categories. Table 4 presents a confusion matrix showing the percentage of activities that the nnetACT classified into each of these categories. The nnetACT trained on UMass correctly classified 80.9% of activities from UTenn. The nnet was successful at identifying household (98.1% correct) and locomotion (89.5% correct) activities in the UTenn study. One possible reason that classification accuracy was higher for household activities than for locomotion activities is that locomotion activities were treadmill based in the training data set but were performed on a track or road in the validation data set. Nevertheless, classification accuracy was high, despite these differences in locomotion protocols. The nnetACT did not perform well for sport activities (23.7% correct), which were often misclassified into the household activities category (69.5% of the time). All of the UTenn sports (aerobics, Frisbee golf, and tennis) were poorly classified. The UTenn aerobics and Frisbee golf activities were not included in the UMass study. The third sport, tennis, was played solo against a wall in the UTenn study, whereas in the UMass study tennis was played with a partner. As the registry of activities used to train nnets expands, both in terms of the types and intensities of activities included and the number of samples available for a given activity, improvement in identification of activity type should follow.
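A confusion matrix of the kind described above can be sketched as follows. This is an illustration only: it assumes each row is an observed activity type and each cell is the percentage of that row assigned to a predicted type. The function name and the labels are hypothetical and do not reproduce Table 4.

```python
from collections import Counter

CATEGORIES = ["household", "locomotion", "sport"]


def confusion_percentages(actual, predicted, categories=CATEGORIES):
    """Return {actual: {predicted: percent}} with each row summing to 100.

    Rows with no observations are reported as all zeros.
    """
    counts = Counter(zip(actual, predicted))
    table = {}
    for a in categories:
        row_total = sum(counts[(a, p)] for p in categories)
        table[a] = {
            p: (100.0 * counts[(a, p)] / row_total if row_total else 0.0)
            for p in categories
        }
    return table


# Hypothetical labels: sport is mostly mistaken for household activity.
actual = ["household"] * 4 + ["locomotion"] * 4 + ["sport"] * 4
predicted = (
    ["household"] * 4
    + ["locomotion"] * 3 + ["household"]
    + ["household"] * 3 + ["sport"]
)

table = confusion_percentages(actual, predicted)
for a in CATEGORIES:
    print(a, {p: round(v, 1) for p, v in table[a].items()})
```

Reading along a row shows where a given activity type ends up, which is how the misclassification of sport into the household category would appear in such a table.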

A major strength of this study is our use of separate development and validation samples. Although some activities were “similar,” the activity protocols were different between the two sites. Additionally, the overall study procedures and metabolic measurement equipment were different between UMass and UTenn (e.g., activities were performed for 7 min at UMass vs. 10 min at UTenn; Oxycon Mobile vs. Cosmed K4b2). By validating the nnet on a completely independent sample, we provide researchers with evidence of how the nnet will perform when applied to other independent samples.

A second strength of this study is our use of a very large, diverse sample for the training data set. We also used a wide range of commonly performed locomotion, lifestyle, and sport activities. There will always be some level of interindividual variability in how activities are performed, but training the nnet on a broad range of activity types and intensities and on a sample with a range of physical characteristics increases the generalizability of the model. Another strength is the use of measured activity EE to compute METs as the criterion for the nnetMET model validation. Albinali and colleagues (3) used a different approach, in which raw signals from multiple accelerometers were used in machine-learning algorithms to first identify activity type. They then applied the Compendium of Physical Activities (2) to estimate MET levels, which underestimated EE by 15–21%.

Our methodology has several limitations. The nnet cannot identify sedentary behaviors. Moving forward, inclusion of sedentary behaviors in the calibration and nnet training process should be a priority. A second limitation is that the results apply only to experimental conditions in a highly controlled laboratory data collection setting. Thus differences in protocol and criterion measures may alter nnet error estimates. Additionally, the nnet produces PA estimates on a minute-by-minute basis. Free-living behavior does not take place in minute increments; thus, to apply the nnet to free-living settings, methodological advances need to include analytic procedures for identifying the end of one activity type and the beginning of the next. One possible solution to this problem is to train the nnet to identify individual activity bouts and to then produce PA estimates for specific activities. It should also be noted that the nnet algorithms may only be applied to adults 20–60 yr of age. Future investigations should develop specific nnet algorithms for children and older adults, using activities that are relevant to these age groups in model development and validation. We used the fixed denominator of 3.5 ml·kg−1·min−1 to compute activity METs. Although baseline resting metabolic rate is known to be influenced by such factors as age and fat-free mass, we used the standard of 1 MET = 3.5 ml·kg−1·min−1 to comply with recommendations for MET computation (2). A limitation of using the constant 3.5 ml·kg−1·min−1 in the denominator is evident in the UTenn validation data set, with MET values for selected sedentary behaviors (driving, TV viewing, and reading) falling below 1.0 (see Table 2). As shown in the present study, use of this constant is particularly problematic for sedentary behaviors and may lead to underestimates of their METs. Although using the 3.5 ml·kg−1·min−1 constant standardizes the expression of METs, future studies should consider this limitation in light of individual differences in resting metabolic rate.
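The effect of the fixed denominator can be illustrated with a short sketch. It assumes only the relationship stated above (METs = measured oxygen uptake divided by 3.5 ml·kg−1·min−1). The function name, the oxygen uptake value, and the individual resting value are hypothetical and are not taken from the study data.

```python
STANDARD_MET_ML_KG_MIN = 3.5  # conventional value of 1 MET


def vo2_to_mets(vo2_ml_kg_min, resting_vo2=STANDARD_MET_ML_KG_MIN):
    """Convert oxygen uptake (ml/kg/min) to METs.

    By default the fixed 3.5 ml/kg/min denominator is used; passing a
    measured resting value instead gives an individualized MET.
    """
    if resting_vo2 <= 0:
        raise ValueError("resting_vo2 must be positive")
    return vo2_ml_kg_min / resting_vo2


# Hypothetical participant whose oxygen uptake while reading is
# 2.9 ml/kg/min and whose measured resting value is 2.8 ml/kg/min.
reading_vo2 = 2.9
fixed = vo2_to_mets(reading_vo2)
individual = vo2_to_mets(reading_vo2, resting_vo2=2.8)

print(f"fixed denominator:      {fixed:.2f} METs")
print(f"individual denominator: {individual:.2f} METs")
```

For a participant whose resting oxygen uptake is below 3.5 ml·kg−1·min−1, the fixed denominator places a seated activity below 1 MET, whereas an individualized denominator keeps it at or above 1 MET.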

Finally, this analysis uses derived activity counts to produce the nnet prediction models. Future studies should employ raw acceleration features as nnet input variables to provide a universal metric for accelerometer sensor output. However, given that currently there is pervasive use of accelerometers employing integrated outputs (e.g., counts/min), nnets developed from integrated accelerometer signals remain useful.

In summary, we developed and trained nnets to estimate METs, classify activity intensity, and identify activity type. We validated these nnets on an independent sample performing activities that were not identical to those in the training data set, and we compared the nnetMET results to regression models. Our nnet produced a lower bias and RMSE than the regression models in estimating METs. The intensity classification from the nnetMET was reasonably accurate, and we were successful in identifying activity type using the nnetACT for household and locomotion activities. Further advancement of these techniques will require algorithm modification to estimate sedentary behaviors and to identify specific activity bouts under free-living conditions. The nnetMET models predict only absolute intensity, and further work is warranted to extend this approach to relative intensity predictions. We also recommend the development of an open-access PA registry in which accelerometer and metabolic data from a broad array of activities are compiled. This will facilitate refinement and improvement of machine-learning algorithms for prediction of activity EE and activity type identification.


P. Freedson is a member of the Scientific Advisory Board for Actigraph. This is the activity monitor that was used in this study. She receives an annual honorarium as a member of this board.


The authors thank the graduate and undergraduate students for assistance with data collection and the subjects for participation. The authors thank Dr. David Bassett Jr. for providing the University of Tennessee data for independent sample validation.


1. ActiGraph. Actisoft Analysis Software 3.2 User's Manual. Fort Walton Beach, FL: MTI Health Services, 2005, p. 17.
2. Ainsworth BE, Haskell WL, Whitt MC, Irwin ML, Swartz AM, Strath SJ, O'Brien WL, Bassett DR, Schmitz KH, Emplaincourt PO, Jacobs DR, Leon AS. Compendium of physical activities: an update of activity codes and MET intensities. Med Sci Sports Exerc 32, Suppl: S498–S516, 2000.
3. Albinali S, Intille S, Haskell W, Rosenberger M. Using wearable activity type detection to improve physical activity energy expenditure estimation. In: Proceedings of the 12th ACM International Conference on Ubiquitous Computing. New York: ACM, 2010, p. 311–320.
4. Bassett DR Jr, Ainsworth BE, Swartz AM, Strath SJ, O'Brien WL, King GA. Validity of four motion sensors in measuring moderate intensity physical activity. Med Sci Sports Exerc 32, Suppl: S471–S480, 2000.
5. Butte NF, Wong WW, Adolph AL, Puyau MR, Vohra FA, Zakari IF. Validation of cross-sectional time series and multivariate adaptive regression splines models for the prediction of energy expenditure in children and adolescents using doubly labeled water. J Nutr 140: 1516–1523, 2010.
6. Chang KH, Chen MY, Canny J. Tracking free-weight exercises. In: Ubicomp 2007: Ubiquitous Computing, edited by Krumm J, Abowd GD, Seneviratne A, Strang T. Innsbruck, Austria: Springer, 2007, p. 19–37.
7. Crouter SE, Clowers KG, Bassett DR Jr. A novel method for using accelerometer data to predict energy expenditure. J Appl Physiol 100: 1324–1331, 2006.
8. de Vries SI, Garre FG, Engbers LH, Hildebrandt VH, Van Buuren S. Evaluation of neural networks to identify types of activity using accelerometers. Med Sci Sports Exerc 43: 101–107, 2011.
9. Freedson P, Melanson E, Sirard J. Calibration of the Computer Science and Applications accelerometer. Med Sci Sports Exerc 30: 777–781, 1998.
10. Kozey SL, Staudenmayer JW, Troiano RP, Freedson PS. Comparison of the Actigraph 7164 and the Actigraph GT1M during self-paced locomotion. Med Sci Sports Exerc 42: 971–976, 2010.
11. Kozey SL, Lyden K, Howe CA, Staudenmayer JW, Freedson PS. Accelerometer output and MET values of common physical activities. Med Sci Sports Exerc 42: 1776–1784, 2010.
12. Kozey-Keadle SL, Libertine A, Lyden K, Staudenmayer J, Freedson P. Validation of wearable monitors for assessing sedentary behavior. Med Sci Sports Exerc 43: 1561–1567, 2011.
13. Lester J, Choudhury T, Kern N, Borriello G, Hannaford B. A hybrid discriminative/generative approach to recognizing physical activities. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence. Edinburgh: IJCAI, 2005, p. 766–772.
14. Lyden K, Kozey SL, Staudenmayer JW, Freedson PS. A comprehensive evaluation of commonly used accelerometer energy expenditure and MET prediction equations. Eur J Appl Physiol 111: 187–201, 2011.
15. Matthews CE. Calibration of accelerometer output for adults. Med Sci Sports Exerc 37, Suppl: S512–S522, 2005.
16. Matthews CE, Chen KY, Freedson PS, Buchowski MS, Beech BM, Pate RR, Troiano RP. Amount of time in sedentary behaviors in the United States, 2003–2004. Am J Epidemiol 167: 875–881, 2008.
17. Perret C, Mueller G. Validation of a new portable ergospirometric device (Oxycon Mobile) during exercise. Int J Sports Med 27: 363–367, 2006.
18. Pober DM, Staudenmayer J, Raphael C, Freedson PS. Development of novel techniques to classify physical activity mode using accelerometers. Med Sci Sports Exerc 38: 1626–1634, 2006.
19. Preece SJ, Goulermas JY, Kenney LPJ, Howard D, Meijer K, Crompton R. Activity identification using body-mounted sensors–a review of classification techniques. Physiol Meas 30: R1–R33, 2009.
20. R Core Development Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2009.
21. Rothney MP, Neumann M, Beziat A, Chen KY. An artificial neural network model of energy expenditure using nonintegrated acceleration signals. J Appl Physiol 103: 1419–1427, 2007.
22. Rothney MP, Schaefer EV, Neumann MM, Choi L, Chen KY. Validity of physical activity intensity predictions by Actigraph, Actical and RT3 accelerometers. Obesity (Silver Spring) 16: 1946–1952, 2008.
23. Staudenmayer J, Pober D, Crouter S, Bassett D, Freedson P. An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer. J Appl Physiol 107: 1300–1307, 2009.
24. Swartz AM, Strath SJ, Bassett DR Jr, O'Brien WL, King GA, Ainsworth BE. Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Med Sci Sports Exerc 32, Suppl: S450–S456, 2000.

Articles from Journal of Applied Physiology are provided here courtesy of American Physiological Society