Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2578-2583. doi: 10.1073/pnas.1708283115. Epub 2018 Mar 12.

Training replicable predictors in multiple studies.

Author information

1. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215.
2. Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115.
3. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215; gp@jimmy.harvard.edu.

Abstract

This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.
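The multi-study ensemble idea described in the abstract can be sketched in a toy setting: train one simple learner per study, then weight each learner by its average performance on the other studies before combining. The inverse cross-study-MSE weighting below is an illustrative assumption, not the paper's exact criteria, and the per-study linear learners and synthetic data are likewise hypothetical.

```python
import random

random.seed(0)

def make_study(slope, n=50, noise=0.5):
    # Synthetic study: y = slope * x + noise, with slopes differing across studies
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [slope * x + random.gauss(0, noise) for x in xs]
    return xs, ys

def fit(xs, ys):
    # Least-squares slope through the origin: a minimal single-study learner
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mse(slope, xs, ys):
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Train one learner per study (three studies with related but distinct slopes)
studies = [make_study(s) for s in (1.8, 2.0, 2.2)]
learners = [fit(*st) for st in studies]

# Cross-study weighting: score each learner on every *other* study, then
# weight by inverse average cross-study MSE (one of many possible criteria)
weights = []
for i, m in enumerate(learners):
    errs = [mse(m, *studies[j]) for j in range(len(studies)) if j != i]
    weights.append(1.0 / (sum(errs) / len(errs)))
total = sum(weights)
weights = [w / total for w in weights]

def ensemble_predict(x):
    # Weighted combination of the per-study learners
    return sum(w * m * x for w, m in zip(weights, learners))

print(round(ensemble_predict(1.0), 2))  # pooled slope estimate
```

Learners that transfer well to the other studies receive larger weights, so the combined predictor is tilted toward models whose performance replicates across populations, which is the behavior the ensemble approach is designed to reward.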

KEYWORDS:

cross-study validation; ensemble learning; machine learning; replicability; validation

PMID: 29531060
PMCID: PMC5856504
DOI: 10.1073/pnas.1708283115
[Indexed for MEDLINE] Free PMC Article
