J Appl Physiol (1985). 2018 May 1;124(5):1284-1293. doi: 10.1152/japplphysiol.00760.2017. Epub 2018 Jan 25.

Cross-validation and out-of-sample testing of physical activity intensity predictions with a wrist-worn accelerometer.

Author information

Department of Integrative Physiology and Health Science, Alma College, Alma, Michigan.
Clinical Exercise Physiology Program, Ball State University, Muncie, Indiana.
Department of Mathematics and Computer Science, Alma College, Alma, Michigan.
Department of Kinesiology, Michigan State University, East Lansing, Michigan.


Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus two combination methods) and six feature sets were used to predict activity intensity (30-s intervals) with the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3-78.5%) had higher accuracy than other machine learning models (70.9-76.4%) and accuracy similar to combination methods (77.0-77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6-92.8% vs. 87.8-91.9% in MSU data set; 79.3-80.2% vs. 76.7-78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0-77.4% vs. 74.1-79.1% in MSU data set; 76.1-77.0% vs. 75.5-77.3% in BSU data set). All machine learning models outperformed the Euclidean norm minus one/GGIR method in out-of-sample testing (69.5-78.5% vs. 53.6-70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer.

NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.
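
The difference between the two evaluation protocols named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier is a simple stand-in and not one of the study's models, the single per-window feature and the toy values for two sites are invented, and "leave-one-out" is assumed here to mean holding out one participant at a time, which the abstract does not specify.

```python
from collections import defaultdict


def fit_centroids(samples):
    """Fit a nearest-centroid stand-in classifier.

    samples: list of (feature, label) pairs, one per 30-s window.
    Returns {label: mean feature value}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for feature, label in samples:
        sums[label][0] += feature
        sums[label][1] += 1
    return {label: total / count for label, (total, count) in sums.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature value."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


def accuracy(centroids, samples):
    """Fraction of windows whose predicted label matches the observed one."""
    hits = sum(predict(centroids, f) == label for f, label in samples)
    return hits / len(samples)


def leave_one_participant_out(data):
    """Cross-validate within one data set.

    data: {participant_id: [(feature, label), ...]}.
    Each participant is held out once; the model is fit on the rest.
    Returns the mean held-out accuracy.
    """
    scores = []
    for held_out in data:
        train = [s for pid, rows in data.items() if pid != held_out for s in rows]
        scores.append(accuracy(fit_centroids(train), data[held_out]))
    return sum(scores) / len(scores)


def out_of_sample(train_data, test_data):
    """Fit on one complete data set, then test on a different data set."""
    train = [s for rows in train_data.values() for s in rows]
    test = [s for rows in test_data.values() for s in rows]
    return accuracy(fit_centroids(train), test)


# Hypothetical toy data: one invented feature value per window, two "sites".
site_a = {
    "a1": [(0.02, "sedentary"), (0.10, "light"), (0.30, "moderate")],
    "a2": [(0.03, "sedentary"), (0.12, "light"), (0.28, "moderate")],
    "a3": [(0.01, "sedentary"), (0.11, "light"), (0.32, "moderate")],
}
site_b = {
    "b1": [(0.05, "sedentary"), (0.08, "light"), (0.20, "moderate")],
    "b2": [(0.04, "sedentary"), (0.18, "light"), (0.35, "moderate")],
}

loo_acc = leave_one_participant_out(site_a)
oos_acc = out_of_sample(site_a, site_b)
print(f"leave-one-participant-out accuracy on site A: {loo_acc:.3f}")
print(f"out-of-sample accuracy (train A, test B):     {oos_acc:.3f}")
```

In this toy setup the within-site estimate is higher than the cross-site one because the second site's values are shifted relative to the first. That mirrors, in a deliberately simplified way, why the abstract recommends out-of-sample testing to confirm generalizability.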


Keywords: GENEActiv; artificial neural network; decision tree; random forest; support vector machine

[Indexed for MEDLINE]
