Send to

Choose Destination
Biostatistics. 2009 Apr;10(2):282-96. doi: 10.1093/biostatistics/kxn035. Epub 2008 Nov 27.

A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions.

Author information

National Cancer Institute, 6130 Executive Boulevard, EPN Room 8124, Rockville, MD 20892, USA.


Constructing a confidence interval for the actual, conditional error rate of a prediction rule from multivariate data is problematic because this error rate is not a population parameter in the traditional sense--it is a functional of the training set. When the training set changes, so does this "parameter." A valid method for constructing confidence intervals for the actual error rate had been previously developed by McLachlan. However, McLachlan's method cannot be applied in many cancer research settings because it requires the number of samples to be much larger than the number of dimensions (n >> p), and it assumes that no dimension-reducing feature selection step is performed. Here, an alternative to McLachlan's method is presented that can be applied when p >> n, with an additional adjustment in the presence of feature selection. Coverage probabilities of the new method are shown to be nominal or conservative over a wide range of scenarios. The new method is relatively simple to implement and not computationally burdensome.

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Silverchair Information Systems Icon for PubMed Central
Loading ...
Support Center