Am J Public Health. 1989 March; 79(3): 340–349.
PMCID: PMC1349563

Modeling and variable selection in epidemiologic analysis.


This paper provides an overview of problems in multivariate modeling of epidemiologic data, and examines some proposed solutions. Special attention is given to the task of model selection, which involves selection of the model form, selection of the variables to enter the model, and selection of the form of these variables in the model. Several conclusions are drawn, among them: a) model and variable forms should be selected based on regression diagnostic procedures, in addition to goodness-of-fit tests; b) variable-selection algorithms in current packaged programs, such as conventional stepwise regression, can easily lead to invalid estimates and tests of effect; and c) variable selection is better approached by direct estimation of the degree of confounding produced by each variable than by significance-testing algorithms. As a general rule, before using a model to estimate effects, one should evaluate the assumptions implied by the model against both the data and prior information.
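Conclusion (c) can be made concrete with a change-in-estimate check. This check asks how much the exposure effect estimate moves when a candidate variable is adjusted for. It does not ask whether that variable's association passes a significance test. The following is a minimal Python sketch under stated assumptions: binary exposure and outcome, a single categorical candidate variable summarized as stratum-specific 2x2 tables, the Mantel-Haenszel odds ratio standing in for the adjusted estimate, and a 10% cutoff taken as a conventional rule of thumb rather than a value from this paper. All function names and the example data are hypothetical.

```python
from typing import List, Tuple

# A stratum is a 2x2 table (a, b, c, d) for one level of the candidate variable:
#   a = exposed cases,   b = exposed non-cases
#   c = unexposed cases, d = unexposed non-cases
Table = Tuple[float, float, float, float]


def odds_ratio(t: Table) -> float:
    """Odds ratio ad/bc for a single 2x2 table (assumes b*c > 0)."""
    a, b, c, d = t
    return (a * d) / (b * c)


def crude_odds_ratio(strata: List[Table]) -> float:
    """Unadjusted odds ratio: collapse the strata, ignoring the candidate variable."""
    collapsed = tuple(sum(t[i] for t in strata) for i in range(4))
    return odds_ratio(collapsed)


def mantel_haenszel_odds_ratio(strata: List[Table]) -> float:
    """Mantel-Haenszel summary odds ratio, adjusted for the stratifying variable."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def change_in_estimate(strata: List[Table], threshold_pct: float = 10.0):
    """Estimate confounding directly as the % change from adjusted to crude OR.

    Returns (crude, adjusted, percent_change, flagged). The 10% default cutoff
    is an assumed rule of thumb, not a value given in the paper.
    """
    crude = crude_odds_ratio(strata)
    adjusted = mantel_haenszel_odds_ratio(strata)
    pct = 100.0 * (crude - adjusted) / adjusted
    return crude, adjusted, pct, abs(pct) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical data: the OR is exactly 1 within each stratum, but the
    # candidate variable is associated with both exposure and outcome.
    strata = [(10, 90, 40, 360), (80, 20, 20, 5)]
    crude, adjusted, pct, flagged = change_in_estimate(strata)
    print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}, "
          f"change = {pct:.0f}%, treat as confounder: {flagged}")
```

With these made-up tables the crude odds ratio is about 4.98 and the adjusted one is 1.00, so the candidate variable would be retained. The decision rests on the size of the shift in the effect estimate, not on a p-value for the covariate.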




