Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
BMC Bioinformatics. 2011 May 13;12:153. doi: 10.1186/1471-2105-12-153.

To aggregate or not to aggregate high-dimensional classifiers.

Author information

  • 1Biosystems Data Analysis group, University of Amsterdam, Amsterdam, The Netherlands.

Abstract

BACKGROUND:

High-throughput functional genomics technologies generate large amount of data with hundreds or thousands of measurements per sample. The number of sample is usually much smaller in the order of ten or hundred. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data.

RESULTS:

Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, has been selected as an example of a base learner. The multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated by simulation, genomics, proteomics and metabolomics data sets.

CONCLUSIONS:

The aggregating PCDA learner can improve the prediction performance, provide more stable result, and help to know the variability of the models. The disadvantage and limitations of aggregating were also discussed.

PMID:
21569498
[PubMed - indexed for MEDLINE]
PMCID:
PMC3113942
Free PMC Article

Images from this publication.See all images (6)Free text

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Icon for BioMed Central Icon for PubMed Central
    Loading ...
    Write to the Help Desk