Stat Methods Med Res. 2016 Aug;25(4):1313-29. doi: 10.1177/0962280213480420. Epub 2013 Apr 16.

Two-stage sampling designs for external validation of personal risk models.

Author information

1. Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA. alicesw@stanford.edu
2. Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA.

Abstract

We propose a cost-effective sampling design and estimating procedure for validating personal risk models using right-censored cohort data. Validation involves using each subject's covariates, as ascertained at cohort entry, in a risk model (specified independently of the data) to assign him/her a probability of an adverse outcome within a future time period. Subjects are then grouped according to the magnitudes of their assigned risks, and within each group, the mean assigned risk is compared with the probability of outcome occurrence as estimated using the follow-up data. Such validation presents two complications. First, in the presence of right-censoring, estimating the probability of developing the outcome before death requires competing risk analysis. Second, for rare outcomes, validation using the full cohort requires assembling covariates and assigning risks to thousands of subjects. This can be costly when some covariates involve analyzing biological specimens. A two-stage sampling design addresses this problem by assembling covariates and assigning risks only to those subjects most informative for estimating key parameters. We use this design to estimate the outcome probabilities needed to evaluate model performance, and we provide theoretical and bootstrap estimates of their variances. We also describe how to choose two-stage designs with minimal efficiency loss for a parameter of interest when the quantities determining optimality are unknown at the time of design. We illustrate these methods by using subjects in the California Teachers Study to validate ovarian cancer risk models. We find that a design with optimal efficiency for one performance parameter need not be so for others, and trade-offs will be required. A two-stage design that samples all outcome-positive subjects and more outcome-negative than censored subjects will perform well in most circumstances.
The methods are implemented in the Risk Model Assessment Program, an R program freely available at http://med.stanford.edu/epidemiology/two-stage.html.
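The comparison described in the abstract (mean assigned risk versus estimated outcome probability within each risk group, allowing for censoring and death as a competing event) can be illustrated with a minimal sketch. It is not the authors' estimator and not the Risk Model Assessment Program: it uses a generic inverse-probability-weighted Aalen-Johansen cumulative incidence estimate, and it assumes each sampled subject carries a known weight equal to the inverse of his or her stage-two sampling fraction. The function names, the record layout, and the status coding (0 = censored, 1 = outcome, 2 = death) are hypothetical choices made for illustration.

```python
from collections import defaultdict


def weighted_cumulative_incidence(times, statuses, weights, horizon):
    """Weighted Aalen-Johansen estimate of P(outcome by `horizon`).

    Assumed coding (illustrative, not from the paper):
      statuses: 0 = censored, 1 = outcome of interest, 2 = death (competing)
      weights:  inverse sampling probabilities (1.0 for a full cohort)
    """
    outcome_w = defaultdict(float)    # weighted outcome count per event time
    any_event_w = defaultdict(float)  # weighted count of outcome or death
    for t, s, w in zip(times, statuses, weights):
        if s in (1, 2) and t <= horizon:
            any_event_w[t] += w
            if s == 1:
                outcome_w[t] += w

    survival = 1.0  # probability of being event-free just before time t
    cif = 0.0       # cumulative incidence of the outcome
    for t in sorted(any_event_w):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        cif += survival * outcome_w[t] / at_risk
        survival *= 1.0 - any_event_w[t] / at_risk
    return cif


def calibration_by_group(records, horizon):
    """Compare mean assigned risk with estimated outcome probability.

    records: iterable of (group, assigned_risk, time, status, weight).
    Returns {group: (weighted mean assigned risk, estimated probability)}.
    """
    groups = defaultdict(list)
    for group, risk, time, status, weight in records:
        groups[group].append((risk, time, status, weight))

    result = {}
    for group, rows in groups.items():
        risks, times, statuses, weights = zip(*rows)
        mean_risk = sum(r * w for r, w in zip(risks, weights)) / sum(weights)
        observed = weighted_cumulative_incidence(times, statuses, weights, horizon)
        result[group] = (mean_risk, observed)
    return result
```

With unit weights this reduces to the usual full-cohort estimate; in a two-stage sample, subjects from a sparsely sampled stratum would carry larger weights. Variance estimation, which the paper handles with theoretical and bootstrap estimates, is left out of this sketch.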

KEYWORDS:

Bootstrap; calibration; competing risks; discrimination; personal risk models; two-stage sampling
