Calculating stage duration statistics in multistage diseases

PLoS One. 2011;6(12):e28298. doi: 10.1371/journal.pone.0028298. Epub 2011 Dec 7.

Abstract

Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for the statistical hypothesis testing needed to determine whether a treatment has a significant effect on progression, or whether a new therapy delays progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric, and computationally cheap, which makes them invaluable tools for studying the epidemiology of diseases, with the goal of identifying different patterns of progression using bioinformatics methodologies. Here we show that the regression method performs well for calculating mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. The counting method, on the other hand, yields reliable estimates of both the means and the variances of stage durations. Applications to Alzheimer disease progression are discussed.

MeSH terms

  • Algorithms
  • Alzheimer Disease / diagnosis
  • Alzheimer Disease / pathology
  • Computational Biology
  • Data Interpretation, Statistical*
  • Dementia / diagnosis
  • Dementia / pathology
  • Disease Progression
  • Humans
  • Linear Models
  • Longitudinal Studies
  • Models, Statistical
  • Probability
  • Prognosis
  • Reproducibility of Results
  • Time Factors