We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

Results: 5

1.
Figure 4

Figure 4. From: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Correlation between internal and external validation is dependent on data analysis team. Pearson correlation coefficients between internal and external validation performance in terms of MCC are displayed for the 14 teams that submitted models for all 13 endpoints in both the original (x axis) and swap (y axis) analyses. The unusually low correlation in the swap analysis for DAT3, DAT11 and DAT36 is a result of their failure to accurately predict the positive endpoint H, likely due to operator errors (Supplementary Table 6).

. Nat Biotechnol. ;28(8):827-838.
2.
Figure 3

Figure 3. From: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Performance, measured using MCC, of the best models nominated by the 17 data analysis teams (DATs) that analyzed all 13 endpoints in the original training-validation experiment. The median MCC value for an endpoint, representative of the level of predicability of the endpoint, was calculated based on values from the 17 data analysis teams. The mean MCC value for a data analysis team, representative of the team’s proficiency in developing predictive models, was calculated based on values from the 11 non-random endpoints (excluding negative controls I and M). Red boxes highlight candidate models. Lack of a red box in an endpoint indicates that the candidate model was developed by a data analysis team that did not analyze all 13 endpoints.

. Nat Biotechnol. ;28(8):827-838.
3.
Figure 2

Figure 2. From: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Model performance on internal validation compared with external validation. (a) Performance of 18,060 models that were validated with blinded validation data. (b) Performance of 13 candidate models. r, Pearson correlation coefficient; N, number of models. Candidate models with binary and continuous prediction values are marked as circles and squares, respectively, and the standard error estimate was obtained using 500-times resampling with bagging of the prediction results from each model. (c) Distribution of MCC values of all models for each endpoint in internal (left, yellow) and external (right, green) validation performance. Endpoints H and L (sex of the patients) are included as positive controls and endpoints I and M (randomly assigned sample class labels) as negative controls. Boxes indicate the 25% and 75% percentiles, and whiskers indicate the 5% and 95% percentiles.

. Nat Biotechnol. ;28(8):827-838.
4.
Figure 1

Figure 1. From: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Experimental design and timeline of the MAQC-II project. Numbers (1–11) order the steps of analysis. Step 11 indicates when the original training and validation data sets were swapped to repeat steps 4–10. See main text for description of each step. Every effort was made to ensure the complete independence of the validation data sets from the training sets. Each model is characterized by several modeling factors and seven internal and external validation performance metrics (Supplementary Tables 1 and 2). The modeling factors include: (i) organization code; (ii) data set code; (iii) endpoint code; (iv) summary and normalization; (v) feature selection method; (vi) number of features used; (vii) classification algorithm; (viii) batch-effect removal method; (ix) type of internal validation; and (x) number of iterations of internal validation. The seven performance metrics for internal validation and external validation are: (i) MCC; (ii) accuracy; (iii) sensitivity; (iv) specificity; (v) AUC; (vi) mean of sensitivity and specificity; and (vii) r.m.s.e. s.d. of metrics are also provided for internal validation results.

. Nat Biotechnol. ;28(8):827-838.
5.
Figure 5

Figure 5. From: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Effect of modeling factors on estimates of model performance. (a) Random-effect models of external validation performance (MCC) were developed to estimate a distinct variance component for each modeling factor and several selected interactions. The estimated variance components were then divided by their total in order to compare the proportion of variability explained by each modeling factor. The endpoint code contributes the most to the variability in external validation performance. (b) The BLUP plots of the corresponding factors having proportion of variation larger than 1% in a. Endpoint abbreviations (Tox., preclinical toxicity; BR, breast cancer; MM, multiple myeloma; NB, neuroblastoma). Endpoints H and L are the sex of the patient. Summary normalization abbreviations (GA, genetic algorithm; RMA, robust multichip analysis). Classification algorithm abbreviations (ANN, artificial neural network; DA, discriminant analysis; Forest, random forest; GLM, generalized linear model; KNN, K-nearest neighbors; Logistic, logistic regression; ML, maximum likelihood; NB, Naïve Bayes; NC, nearest centroid; PLS, partial least squares; RFE, recursive feature elimination; SMO, sequential minimal optimization; SVM, support vector machine; Tree, decision tree). Feature selection method abbreviations (Bscatter, between-class scatter; FC, fold change; KS, Kolmogorov-Smirnov algorithm; SAM, significance analysis of microarrays).

. Nat Biotechnol. ;28(8):827-838.

Supplemental Content

Recent activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...
Write to the Help Desk