J Biopharm Stat. 2008;18(5):841-52. doi: 10.1080/10543400802277967.

A predictive risk probability approach for microarray data with survival as an endpoint.

Author information

  • Biostatistics Division, Moffitt Cancer Center & Research Institute, University of South Florida, Tampa, Florida 33612, USA.


Gene expression profiling has played an important role in cancer risk classification and has shown promising results. Because profiling usually involves selecting a set of top-ranked genes for analysis, it is important to evaluate how model performance varies with the number of top-ranked genes included in the model. As an example, we used a colon cancer data set collected at Moffitt Cancer Center and ranked genes using univariate Cox proportional hazards models. Sets of top-ranked genes were then selected for evaluation by taking the top k ranked genes for k = 1 to 12,500. The analysis showed considerable variation in classification outcomes as the number of top-ranked genes changed. We developed a predictive risk probability approach that accommodates this variation by identifying a range of numbers of top-ranked genes. For each number of top-ranked genes in that range, the procedure classifies each patient as high risk (score = 1) or low risk (score = 0). The classifications are then averaged, giving a risk score between 0 and 1 that ranks the patient's need for further treatment. We applied the approach to the colon data set and assessed its strength by three criteria. First, a univariate Cox proportional hazards model showed that the predictive risk probability classification was highly statistically significant (log-rank χ² statistic = 110, p < 10⁻¹⁶). Second, a survival tree model used the risk probability to partition patients into five risk groups with well-separated survival curves (log-rank χ² statistic = 215). In addition, the risk group status identified a small set of risk genes that may be practical for biological validation. Third, resampling analysis of the risk probability suggested that the variation pattern of the log-rank χ² in the colon cancer data set was unlikely to have been caused by chance.
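
The averaging step described in the abstract can be illustrated with a short sketch. The abstract does not say how a patient is classified as high or low risk for a given k, so the rule used here is an assumption made only for illustration: a sign-weighted sum of standardized expression over the top k genes, split at the median. The function name, the `signs` argument, and the input layout are likewise hypothetical, and the gene ranking (obtained from univariate Cox models in the paper) is taken as a precomputed input.

```python
import numpy as np

def predictive_risk_probability(expr, ranked_genes, signs, k_values):
    """Average per-k high/low risk calls into a score between 0 and 1.

    expr         : (n_patients, n_genes) expression matrix
    ranked_genes : gene column indices, best-ranked first (precomputed)
    signs        : +1/-1 per ranked gene, the assumed direction of risk
    k_values     : the numbers of top-ranked genes to average over
    """
    expr = np.asarray(expr, dtype=float)
    ranked_genes = np.asarray(ranked_genes)
    signs = np.asarray(signs, dtype=float)

    # Standardize each gene so that no single gene dominates the sum.
    std = expr.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - expr.mean(axis=0)) / std

    calls = []
    for k in k_values:
        # ASSUMPTION: composite score plus median split; the paper's
        # actual per-k classifier is not given in the abstract.
        score = (z[:, ranked_genes[:k]] * signs[:k]).sum(axis=1)
        calls.append((score > np.median(score)).astype(int))

    # Fraction of k values for which each patient was called high risk.
    return np.mean(calls, axis=0)
```

A patient called high risk for every k in the chosen range gets a score of 1, one never called high risk gets 0, and patients whose classification flips as k changes fall in between.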

[PubMed - indexed for MEDLINE]