J Biopharm Stat. 2008;18(5):841-52. doi: 10.1080/10543400802277967.

A predictive risk probability approach for microarray data with survival as an endpoint.

Author information

  • Biostatistics Division, Moffitt Cancer Center & Research Institute, University of South Florida, Tampa, Florida 33612, USA. Dung-Tsa.Chen@moffitt.org

Abstract

Gene expression profiling has played an important role in cancer risk classification and has shown promising results. Because gene expression profiling often involves selecting a set of top ranked genes for analysis, it is important to evaluate how modeling performance varies with the number of top ranked genes incorporated in the model. As an example, we used a colon data set collected at Moffitt Cancer Center and ranked genes based on the univariate Cox proportional hazards model. Sets of top ranked genes were selected for evaluation by choosing the top k ranked genes for k = 1 to 12,500. The analysis indicated considerable variation in classification outcomes as the number of top ranked genes changed. We developed a predictive risk probability approach that accommodates this variation by identifying a range of numbers of top ranked genes. For each number of top ranked genes, the procedure classifies each patient as having high risk (score = 1) or low risk (score = 0). The categorizations are then averaged, giving a risk score between 0 and 1 and thus a ranking of the patient's need for further treatment. Applied to the colon data set, the approach demonstrated its strength by three criteria. First, a univariate Cox proportional hazards model showed a highly statistically significant result (log-rank χ² statistic = 110 with p-value < 10^-16) for the predictive risk probability classification. Second, a survival tree model used the risk probability to partition patients into five risk groups with good separation of survival curves (log-rank χ² statistic = 215). In addition, the risk group status identified a small set of risk genes that may be practical for biological validation. Third, a resampling analysis of the risk probability suggested that the variation pattern of the log-rank χ² in the colon cancer data set was unlikely to have been caused by chance.
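
The averaging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how each patient is classified for a given k, so the per-k rule below (sum the signed expression of the top k genes and call a patient high risk when the sum exceeds the cohort median) is an assumption made only for illustration, not the authors' classifier. The function name `risk_probability`, its parameters, and the toy data are likewise hypothetical; the gene ranking is assumed to have been computed beforehand.

```python
from statistics import median


def risk_probability(expr, ranked_genes, directions, k_values):
    """Average per-k high-risk calls (1) and low-risk calls (0) for each patient.

    expr: list of patients, each a list of standardized expression values.
    ranked_genes: gene column indices, best ranked first (ranking assumed given).
    directions: +1 if higher expression is assumed to raise hazard, else -1.
    k_values: the numbers of top ranked genes to evaluate.
    """
    votes = [0] * len(expr)
    n_models = 0
    for k in k_values:
        top = ranked_genes[:k]
        # Hypothetical per-k rule: signed sum over the top k genes.
        scores = [sum(directions[g] * row[g] for g in top) for row in expr]
        cutoff = median(scores)
        for i, score in enumerate(scores):
            if score > cutoff:  # high risk for this k (score = 1)
                votes[i] += 1
        n_models += 1
    # Fraction of k values for which the patient was called high risk.
    return [v / n_models for v in votes]


# Toy data: 4 patients, 3 genes (invented values, not from the study).
expr = [
    [1.0, 0.5, -0.5],
    [0.5, -1.0, 0.25],
    [-0.5, 0.75, 1.0],
    [-1.0, -0.25, -0.75],
]
probs = risk_probability(expr, ranked_genes=[0, 1, 2],
                         directions=[1, 1, 1], k_values=[1, 2, 3])
print(probs)
```

In this toy run the first patient is called high risk for every k and the last patient for none, so their risk scores are 1 and 0; the other two patients fall in between, which is the kind of graded ranking the abstract describes.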

PMID: 18781520 [PubMed - indexed for MEDLINE]
PMCID: PMC2717790 (free PMC article)