Ann Fam Med. May 2005; 3(Suppl 1): s46–s51.
PMCID: PMC1466952

Longitudinal Research and Data Collection in Primary Care


PURPOSE This article reviews examples of and experience with longitudinal research in family medicine. The objective is to use this empirical information to formulate recommendations for improving longitudinal research.

METHODS The article discusses 3 longitudinal studies from the Nijmegen academic family practice research network: 1 on the prognosis of depression and 1 each on the prognosis of and outcomes of care for type 2 diabetes mellitus. The Nijmegen network has recorded all episodes of morbidity encountered in Dutch family medicine since 1971 in a stable practice population. This network’s experience is evaluated to identify lessons that may help other practice-based research networks (PBRNs) in pursuing longitudinal research.

RESULTS In terms of external conditions (conditions related to the general setting), the stability of a population and a high level of continuity of care substantially enhance the ability to perform longitudinal research. In terms of internal conditions (conditions related to the PBRN), motivation of family physicians and their staff to conduct ongoing data collection, and their ownership of the data are key for success. Other critical internal conditions include standardization of data; collection of data by clinician-friendly means; training of family physicians and their staff in data collection, as well as meetings for discussion of this task; provision of feedback to practices on the research findings; use of standard procedures to promote adherence to data collection; availability of facilities for regular measurement of patients’ health status or chart review; and use of mechanisms for tracking patients who leave the practice area.

CONCLUSIONS Insight from existing experience suggests that longitudinal research can be enhanced in PBRNs. The best way forward is to build longitudinal data collection by drawing on lessons from successful studies. Primary care research policy should advocate for a role of longitudinal research and stimulate its development in PBRNs under favorable population circumstances.

Keywords: Longitudinal studies, PBRNs, primary health care, family practice, research design, databases, long-term care, historical cohort studies


This article reviews the necessary conditions for conducting longitudinal research in the family medicine setting and suggests possible ways of improving this research. It analyzes the research infrastructure needed to study patients, their illnesses, and their care over time. Three examples of longitudinal studies using a family practice database are presented to illustrate inherent problems and possible solutions.

The aim of this article is to recommend ways of improving longitudinal research in family practice, in part by making strategic choices, ie, by tapping into family medicine populations that have favorable conditions in terms of stability of the population and continuity of care, and in part by promoting better research methodology and better structuring of databases.


Family physicians (FPs) provide primary medical care for patients in the community. The access of patients to this care for any health problem and the professional working relationship with patients over time (continuity of care) form the basis of FPs’ preventive and therapeutic interventions.1,2 To provide care with an eye to patients’ futures, clinicians must have evidence on the long-term effects of preventive and therapeutic interventions.3 This need is the impetus behind longitudinal research.

Unfortunately, longitudinal research is underappreciated, and the conditions of care often pose challenges to such research: health care systems connect patients and providers for episodic rather than ongoing care, while the geographic mobility of patients and FPs hampers the establishment of lasting working relationships. As a consequence, the research infrastructure needed to study health problems in their long-term context is poorly developed.

Primary Care Practice-Based Research Networks

Primary care practice-based research networks (PBRNs) have emerged as the infrastructure for research in family medicine.3–5 PBRNs can tap into the continuity of patient care and extend the time window of research beyond the few years usually covered by research projects. The long-term natural history of disease and the outcome of care are essential pieces of information in assessing the effectiveness of family practice.

PBRNs are driven by the research interests of practitioners, resulting in their ownership of research. This ownership enhances a long-term commitment to data collection. But consistent data collection over time and ongoing adherence to study protocols also require methodologic rigor and support; for that reason, linking PBRNs to a research center or university5,6 is particularly important for longitudinal research. Models of successful research in family practice clearly show that it is possible to train FPs and their staff in data collection and to introduce a scientific esprit de corps in this setting.7–11

Structuring Longitudinal Data in Primary Care

PBRNs constitute a multicenter research setting, and standardization of data and terminology within networks is therefore essential. Standardization is particularly important for longitudinal research: data must not only be consistent across different study sites, but even more important, must be consistent over time.

To structure data longitudinally, information on visits and contacts must be organized into “episodes of illness”12 that can in turn be linked over time to individuals. A first prerequisite is to classify each health problem encountered during practice visits as either a new problem or part of an established problem, and to link the data from multiple practice visits into episodes of illness. The International Classification of Primary Care (ICPC)12 offers a framework to structure episodes, and this framework can be used even without concomitant use of the ICPC classification for recording relevant information, such as physician contact, diagnosis, and diagnostic and therapeutic procedures. This approach is used, for example, in the Nijmegen database,5 from which a number of examples are presented below.
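The linking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Nijmegen database's actual schema: the `Contact` and `Episode` record layouts, the field names, and the problem codes are all hypothetical. The only idea taken from the text is that each contact is flagged as a new problem or as part of an established one, and contacts are then grouped into episodes per patient.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Contact:
    """One practice visit for one health problem (hypothetical layout)."""
    patient_id: str
    visit_date: date
    problem_code: str      # classification rubric; values below are illustrative
    is_new_problem: bool   # FP's judgment: new problem or established problem


@dataclass
class Episode:
    """All contacts belonging to one episode of illness in one patient."""
    patient_id: str
    problem_code: str
    contacts: list = field(default_factory=list)

    @property
    def start(self) -> date:
        return self.contacts[0].visit_date


def build_episodes(contacts):
    """Link contacts into episodes, in date order, per patient and problem."""
    episodes = []
    current = {}  # (patient_id, problem_code) -> most recent Episode
    for c in sorted(contacts, key=lambda c: c.visit_date):
        key = (c.patient_id, c.problem_code)
        # A contact flagged as new (or the first ever seen) opens a new episode.
        if c.is_new_problem or key not in current:
            current[key] = Episode(c.patient_id, c.problem_code)
            episodes.append(current[key])
        current[key].contacts.append(c)
    return episodes


contacts = [
    Contact("P001", date(1975, 3, 1), "P76", True),
    Contact("P001", date(1975, 4, 12), "P76", False),  # follow-up visit
    Contact("P001", date(1982, 9, 30), "P76", True),   # later, separate episode
    Contact("P002", date(1980, 1, 15), "T90", True),
]
episodes = build_episodes(contacts)
```

Because every episode carries the patient identifier, episodes can be ordered per person to reconstruct a medical history over time.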

Using both ICPC components has advantages, however, as it helps to further structure the clinical information. In Dutch family practice, for example, the recording of information has been made easier because the ICPC has been used to structure the electronic medical record; the result is a user-friendly way of collecting and recording data under routine conditions of care. In particular, for disease-specific research, ICPC offers diagnostic criteria13 that are applicable under primary care conditions.

A second prerequisite for structuring longitudinal data is to assign episodes to individual patients, for example, through a unique personal identification code. The process can be refined by adding patients’ socioeconomic characteristics and by classifying individuals living in the same household as families. This approach is likewise facilitated in Dutch databases because the health care system works with FPs’ personal lists of patients, and whole families usually register with the same FP.
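As a rough sketch of this second prerequisite, the snippet below attaches a unique patient code and a household code to each person and groups people into families. The field names and code formats are assumptions made for illustration; the source does not describe how the codes are constructed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str     # unique personal identification code (hypothetical format)
    household_id: str   # shared by people living in the same household
    birth_year: int
    sex: str


def group_by_household(patients):
    """Return {household_id: [patient_id, ...]} so families can be analyzed as units."""
    households = defaultdict(list)
    for p in patients:
        households[p.household_id].append(p.patient_id)
    return dict(households)


patients = [
    Patient("P001", "H10", 1940, "F"),
    Patient("P002", "H10", 1938, "M"),
    Patient("P003", "H10", 1965, "F"),
    Patient("P004", "H11", 1950, "M"),
]
families = group_by_household(patients)
```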


Below, 3 examples of longitudinal studies from the Nijmegen academic family practice research network5,7–11 are discussed to illustrate a number of challenges in longitudinal research: (1) ensuring that the database can bridge time, (2) assessing how representative the data are of family practice at large, (3) maintaining scientific quality control of the data, and (4) assessing how quality and consistency of patient care may influence research results.

Research Setting

The Nijmegen academic family practice research network was founded in 1971 in 4 practices to record all episodes of morbidity for which patients consulted FPs (including those for diagnoses made by specialists after referral) and cause of death among these patients. This recording, which takes place in a stable practice population of approximately 12,000 people, has continued ever since; consequently, the data set that has developed enables the tracking of individuals’ medical histories for more than 30 years. Since 1986, the 4 practices, together with 5 other practices in the region, have been recording all data related to the process and outcomes of care among patients with chronic diseases (diabetes mellitus, hypertension, and asthma and chronic obstructive pulmonary disease) and have been giving practices and FPs structured feedback on these measures.8

The database is a key component of the Nijmegen family medicine research program of longitudinal research among patients with chronic diseases. The impetus for establishing this database was the need to access previously unavailable empirical morbidity data from family practice, at the time of founding of the Department of Family Medicine at Nijmegen University. At that time, the research interest was in the development of morbidity in families and over generations14; of note, stability of the practice population and continuity of care were such self-evident features that they were taken for granted. In hindsight and with evidence of continued stability of this population15 at a time of increasing geographic mobility in the Dutch population, no better location could have been chosen for longitudinal data collection.

A number of measures have been taken to ensure consistent recording and classification of information in the database over time:

  • Since the founding of the network, the classification for morbidity has been unchanged; conditions are classified using the Dutch translation of the E-list.16
  • All FPs in the network meet regularly to discuss and compare their approaches to registering patients and classifying conditions; in the event of disagreement, consensus is sought and formulated in the registration rules. The comparability of FPs’ performance is checked using case vignettes. These meetings remain important despite the lengthy experience of most FPs in the network and their use of the same classification.
  • New FPs joining the practices are trained in the use of the classification and the registration rules.
  • Practice assistants are trained and regularly supervised in the assignment of unique patient- and family-identifying codes, and in the entry of patients’ social and demographic information.
  • In every practice, practice assistants ensure that recorded data are transported from the practice to the central database.
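The source does not say how comparability on case vignettes is quantified. One simple possibility, shown here purely as an assumption, is raw pairwise agreement: the share of FP-pair comparisons in which both FPs assigned the same code to the same vignette. The FP labels, vignette names, and diagnoses are invented.

```python
from itertools import combinations


def pairwise_agreement(codes_by_fp):
    """Fraction of (FP pair, vignette) comparisons with identical codes."""
    agree = 0
    total = 0
    for fp_a, fp_b in combinations(sorted(codes_by_fp), 2):
        for vignette, code_a in codes_by_fp[fp_a].items():
            code_b = codes_by_fp[fp_b].get(vignette)
            if code_b is None:
                continue  # vignette not coded by both FPs
            total += 1
            agree += int(code_a == code_b)
    return agree / total if total else float("nan")


# Hypothetical codings of three vignettes by three FPs.
codes_by_fp = {
    "FP_A": {"v1": "depression", "v2": "diabetes", "v3": "asthma"},
    "FP_B": {"v1": "depression", "v2": "diabetes", "v3": "COPD"},
    "FP_C": {"v1": "depression", "v2": "diabetes", "v3": "asthma"},
}
agreement = pairwise_agreement(codes_by_fp)
```

A vignette with low agreement (here `v3`) is the kind of case that would be brought to a network meeting to reach consensus on the registration rules. Raw agreement does not correct for chance, so a chance-corrected statistic may be preferable in practice.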

All patients on the practices’ lists are informed of the use of the database for research and asked to provide written consent. If a patient leaves the practice, the patient’s new address and the name and address of the patient’s new FP are recorded to enable future contact.

Example 1: Depression Recurrence Among Family Practice Patients

Depression is a common chronic condition in family practice for which long-term treatment with antidepressant medication is recommended to prevent a recurrence. As this recommendation is based on research among patients referred for psychiatric care, the aim of the study undertaken with the Nijmegen database was to establish the incidence of recurrence after a first episode of depression among patients treated in family practice.

In looking for an alternative to a long-term prospective study, the investigators considered analyzing data from the Nijmegen family medicine database, which makes it possible to identify patients in whom depression was diagnosed up to 20 years earlier. The investigators therefore undertook a historic cohort study (described in a later section). From the database, they enrolled all patients who had experienced a first episode of depression between 1971 and 1986. Selection of this time period allowed for a follow-up of at least 10 years after the first episode.9

A major challenge was to determine whether all patients had had major depression. For the study findings to be relevant, it was essential that the condition was depression as it is currently understood and defined. It was not possible to assess the criteria used to make the diagnosis through a chart review because FPs only occasionally recorded such information. For that reason, the investigators used a proxy of diagnostic accuracy of depression by the FPs, assessed through psychiatric interviews with patients with recently diagnosed depression. This evaluation showed that in most cases, the episode had fulfilled the diagnostic criteria of major depression according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition.11 To assess the current health status and quality of life of patients enrolled in the study, all were sent a set of questionnaires.

The study found that 60% of patients in whom depression had been diagnosed did not have a recurrent episode in the 10 years thereafter—a rate higher than expected based on studies in psychiatry.9 Results obtained with the questionnaires showed that depression nonetheless continued to have a major adverse impact on patients’ quality of life years later, even with no recurrence.17

Example 2: Cardiovascular Complications Among Patients With Diabetes Mellitus

The aim of the second study using the Nijmegen database was to assess the risk of cardiovascular complications among patients with type 2 diabetes mellitus being treated in family practice. At the time of the study in 1989, this outcome was largely undocumented.

The first challenge of this study was to determine the medical history of patients since the diagnosis of diabetes. Again, the investigators formed a historic cohort, this time one of all patients in the database with diabetes mellitus diagnosed between 1971 and 1989. The complete medical history after diagnosis had been recorded and coded routinely in the database for all of the patients. The data therefore allowed a follow-up from the time of diagnosis until (1) the end of the observation period in 1994, (2) death of the patient, or (3) departure of the patient from the practice. With this approach, 265 patients were enrolled, and the maximum observation time since diagnosis was 23 years. For each diabetic patient (case), the investigators selected a nondiabetic patient (control) matched for age, sex, and social class who received care from the same FP.
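The three follow-up end points listed above amount to taking the earliest of several dates. The sketch below illustrates that rule under stated assumptions: the exact end-of-observation day (1 January 1994 here) and the function and label names are hypothetical, as the source gives only the year.

```python
from datetime import date

# Assumption: the source gives only the year 1994 for the end of observation.
END_OF_OBSERVATION = date(1994, 1, 1)


def follow_up(diagnosis, death=None, departure=None, end=END_OF_OBSERVATION):
    """Return (years of follow-up, reason follow-up ended) for one patient."""
    candidates = [(end, "end_of_observation")]
    if death is not None:
        candidates.append((death, "death"))
    if departure is not None:
        candidates.append((departure, "left_practice"))
    # Follow-up stops at whichever event comes first.
    exit_date, reason = min(candidates, key=lambda pair: pair[0])
    years = (exit_date - diagnosis).days / 365.25
    return years, reason


years_a, reason_a = follow_up(date(1980, 6, 1), death=date(1990, 6, 1))
years_b, reason_b = follow_up(date(1985, 1, 1), departure=date(1988, 1, 1),
                              death=date(1992, 1, 1))
years_c, reason_c = follow_up(date(1971, 1, 1))
```

Recording the reason alongside the duration keeps patients who moved away distinguishable from those who died, which matters when mortality is itself an outcome.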

The second challenge was determining whether all patients selected truly had diabetes mellitus. This methodologic question was particularly important because in 1985, shortly before the design of the study—but in the middle of the historic observation period—the diagnostic criteria for diabetes mellitus changed.18 The question was therefore whether patients given a diagnosis of diabetes by their FP, particularly before 1985, had diabetes mellitus according to the 1985 criteria.

The investigators undertook a chart review of all patients enrolled. Using all written notes and laboratory reports, they established that more than 95% of the cases fulfilled the reference criteria released in 1985.7,10,18 A comparison of cases fulfilling these criteria with their matched controls demonstrated elevated risks of cardiovascular morbidity and mortality among the cases, and consequently a poor prognosis of diabetes mellitus type 2 in the family practice setting.7

Example 3: Diabetes Care in Academic Family Practices

In a follow-up of the analysis of the diabetes cohort,7 the outcomes of treatment of diabetes mellitus in family practice were studied and compared with external criteria.8

An audit-and-feedback system was introduced in the network in 1992 to improve diabetes care. Using the database, the investigators assessed process of care and outcomes of care in all patients with diabetes mellitus 1 year later (1993) and again 7 years later (1999). They compared these measures with those outlined in the Dutch College of General Practitioners’ guidelines for diabetes mellitus19 and with those of a state-of-the-art randomized clinical trial.20

Between 1993 and 1999, outcomes improved substantially. By 1999, blood glucose levels were adequately controlled in 52% of patients, blood lipid levels in 83%, and systolic and diastolic blood pressure in 54% and 66%, respectively.8 These percentages were in the same order as those achieved under the conditions used in the randomized trial.20

The challenge in this study was the interpretation of the findings. The investigators concluded that high-quality diabetes treatment was feasible in family practice. But given the self-selection of FPs in the network and their academic setting, the findings were not generalizable to unselected FPs, whose diabetic patients had poorer outcomes.21,22


When building a longitudinal database and planning for high-quality longitudinal research, PBRNs must consider both external conditions (ie, those related to the general setting) and internal conditions (ie, those related to the network itself).

External Conditions

External conditions that are favorable for longitudinal research include stability of the population and continuity of care. In principle, every family practice database holds longitudinal data from patients, but the validity of those data is determined by how long patients remain in the practice. As previously noted, the population served by the Nijmegen academic family practice research network has remained very stable over time. The Dutch health care system, with its high level of continuity of care between patients and FPs, offers more favorable conditions than the US system for longitudinal research.

Internal Conditions

In planning a PBRN for longitudinal data collection, it is logical to choose a setting that offers optimal external conditions for such research, but the internal conditions that the PBRN can create and control are as important. First among these conditions is to secure the ongoing commitment of FPs and their staff to longitudinal data collection. Ownership of data is crucial to such a commitment. In addition, success breeds success, and longitudinal data collection should be encouraged as an extension of successful PBRN activities. Other conditions that PBRNs can control include the following:

  • Standardization of data between FPs and between practices, and over time
  • Integration of data collection for research with that for patient care in an FP-friendly manner
  • Training of FPs, other physicians, and staff in the use of classifications and the rules for data collection
  • Provision of meetings for FPs and other physicians in the network to discuss and compare data collection, and to give feedback from the collected data
  • Use of standard procedures to promote adherence to data collection
  • Provision of facilities for regular measurements of patients’ health status or chart reviews
  • Use of mechanisms for tracking patients who leave the practice area


Approaches When Working With Existing Databases

In the examples given above of longitudinal studies conducted with a database, patients were enrolled because of a defined health event in their medical past—in example 1, a first episode of depression; in example 2, a diagnosis of diabetes mellitus—and from that event onward, a sequence of health events (diagnoses) was constructed. This design is called a historic cohort study. Although all events studied occurred and were recorded before the time of study, it is important to emphasize that the recording was done prospectively. In addition, FPs who performed the recording did so without knowledge of later studies that would use the data. In these respects, a historic cohort study differs essentially from a retrospective study.
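A historic cohort of this kind can be assembled from prospectively recorded events roughly as follows. This is a sketch under assumptions: the event tuples, the codes, and the function name are hypothetical, and the enrollment window simply mirrors the 1971 to 1986 window of example 1.

```python
from datetime import date


def historic_cohort(events, index_code, enrol_start, enrol_end):
    """Build a cohort from events of the form (patient_id, event_date, code).

    A patient enters the cohort if the FIRST recorded occurrence of
    index_code falls inside the enrollment window; all later events form
    that patient's follow-up sequence.
    """
    histories = {}
    for pid, when, code in sorted(events, key=lambda e: e[1]):
        histories.setdefault(pid, []).append((when, code))

    cohort = {}
    for pid, history in histories.items():
        index_dates = [when for when, code in history if code == index_code]
        if not index_dates:
            continue
        index_date = index_dates[0]
        if enrol_start <= index_date <= enrol_end:
            cohort[pid] = {
                "index_date": index_date,
                "later_events": [(w, c) for w, c in history if w > index_date],
            }
    return cohort


# Hypothetical recorded events; codes are illustrative labels only.
events = [
    ("P001", date(1975, 3, 1), "depression"),
    ("P001", date(1982, 9, 30), "depression"),
    ("P002", date(1990, 5, 5), "depression"),   # first episode after the window
    ("P003", date(1979, 2, 2), "hypertension"),
]
cohort = historic_cohort(events, "depression", date(1971, 1, 1), date(1986, 12, 31))
```

The selection looks backward from the time of study, but every event it uses was recorded at the time it occurred, which is the distinction from a retrospective study drawn above.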

In example 2, the incidence of cardiovascular complications in patients with diabetes mellitus (cases) was compared with that in matched nondiabetic patients (controls). In this way, a case-control approach was built into the historic cohort study. In example 3, the outcomes of care among a cohort of patients with diabetes mellitus (assessed from their current health status) were compared with those from external sources. This study is an example of outcomes research.
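The matching described for example 2 (age, sex, social class, same FP) could be implemented along the following lines. The record fields, the two-year age tolerance, and the first-eligible selection rule are assumptions for illustration; the source does not report how closely age was matched or how ties were resolved.

```python
def find_control(case, pool, used_ids, max_age_gap=2):
    """Return the first unused nondiabetic patient matching the case, or None.

    Matching is exact on FP, sex, and social class; max_age_gap (in years of
    birth) is an assumed tolerance, not a value taken from the source.
    """
    for person in pool:
        if person["id"] in used_ids or person["has_diabetes"]:
            continue
        same_strata = all(person[k] == case[k] for k in ("fp", "sex", "social_class"))
        if same_strata and abs(person["birth_year"] - case["birth_year"]) <= max_age_gap:
            used_ids.add(person["id"])  # each control is used only once
            return person
    return None


case = {"id": "C1", "fp": "FP_A", "sex": "F", "social_class": "middle",
        "birth_year": 1930, "has_diabetes": True}
pool = [
    {"id": "N1", "fp": "FP_B", "sex": "F", "social_class": "middle",
     "birth_year": 1930, "has_diabetes": False},   # different FP: not eligible
    {"id": "N2", "fp": "FP_A", "sex": "F", "social_class": "middle",
     "birth_year": 1931, "has_diabetes": False},   # eligible
    {"id": "N3", "fp": "FP_A", "sex": "F", "social_class": "middle",
     "birth_year": 1929, "has_diabetes": False},   # eligible
]
used = set()
control = find_control(case, pool, used)
```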

When working with an existing database, these 3 approaches—historic cohort studies, case-control studies, and outcomes research—are the ones most commonly used for longitudinal research.

Alternate Approaches

An alternate approach to longitudinal research is to use a randomized controlled trial (RCT) or other interventional study as the starting point for longitudinal observation. For example, the 1986 extension of the Nijmegen database with follow-up data on chronic diseases was based on the follow-up of an RCT of cardiovascular prevention.23 An issue that investigators must keep in mind when using this approach, however, is the informed consent of patients and practitioners, which is usually given for a study that ends after a finite period. Data collected from that time forward will as a rule relate to the patients’ courses under usual care, as it is only occasionally possible to continue the experimental study conditions for a longer period. Like longitudinal studies that use existing databases, these studies must also meet the conditions of stability of the population and rigorous collection of follow-up data.


Unbiased Observation

As the emphasis in longitudinal research is on descriptive studies, investigators should take into account the methodologic limitations of this research, such as difficulty in achieving unbiased observation in some cases. Particularly when studying outcomes of care, confounding by clinical indication interferes with unbiased observation. This phenomenon has recently been analyzed in depth in the case of hormone replacement therapy (HRT).24,25 Cohort analyses documented reduced rates of cardiovascular events among HRT users, but RCTs later demonstrated elevated rates of such events in this group. The likely explanation for this contradiction was that practitioners had suspected that HRT might have cardiovascular adverse effects and therefore restricted use of this therapy to women with low cardiovascular risk—an example of confounding by clinical indication.
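A small worked example shows how this kind of confounding can reverse an apparent effect. All counts below are invented for illustration and are not taken from any HRT study. Within each risk stratum, users have a higher event rate than nonusers, yet because the therapy is given mainly to low-risk women, the pooled comparison makes it look protective.

```python
# Hypothetical counts: (cardiovascular events, number of women) per group.
strata = {
    "low_risk":  {"hrt": (16, 800), "no_hrt": (2, 200)},
    "high_risk": {"hrt": (24, 200), "no_hrt": (80, 800)},
}


def risk_ratio(exposed, unexposed):
    """Event risk among the exposed divided by risk among the unexposed."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


def pooled(arm):
    """Sum events and group sizes across strata, ignoring baseline risk."""
    events = sum(s[arm][0] for s in strata.values())
    women = sum(s[arm][1] for s in strata.values())
    return events, women


# Stratified view: compare like with like.
stratum_rr = {name: risk_ratio(s["hrt"], s["no_hrt"]) for name, s in strata.items()}

# Crude view: what a cohort analysis sees if baseline risk is not recorded.
crude_rr = risk_ratio(pooled("hrt"), pooled("no_hrt"))
```

In this constructed data set the risk ratio is above 1 in both strata but below 1 in the pooled comparison. Separating the two views is possible only when the database records the clinical background that drove the prescribing decision.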

Several strategies can be used to minimize bias in longitudinal studies. One strategy would be to include all patients with the problem being studied in the practice database—the full cohort. This is the strength of the studies described in examples 1 and 2. In example 3, however, which describes a study that excluded diabetic patients who died during the observation period, the inclusion of these patients might have yielded different study results. For this group in particular, tight metabolic and risk-factor control would have been important for optimizing outcomes, but also least likely to have been achieved—a fact that might have contributed to these patients’ deaths. Another strategy would be to include, in addition to the standard social and demographic data of patients, detailed clinical background data such as comorbidities and cotreatments, risk factors, or family medical history. These are the thick and rich descriptive data26 that family practice databases can provide.

Influence of Quality of Care

The ultimate goal of investing in the research infrastructure of PBRNs is to optimize patient care. But long-term analysis of the course of disease usually shows its course under routine clinical care and, in this way, quality of care influences the research. Studying the illnesses and diseases of patients over time requires optimal or at least consistent patient care, just as research requires high-quality and consistent data. This issue is of particular concern in longitudinal research because deviations from classification criteria or care protocols, as well as selective participation and dropout, accumulate over time; even when these events occur at a modest rate, their cumulative effect can be substantial.
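The compounding effect can be made concrete with simple arithmetic. Assuming, purely for illustration, a constant and independent annual loss rate r, the share of the original cohort still under observation after n years is (1 - r) raised to the power n. The 3% rate and the time spans below are hypothetical, not figures from the Nijmegen network.

```python
def fraction_retained(annual_loss_rate, years):
    """Share of a cohort remaining after `years` of constant annual loss."""
    return (1.0 - annual_loss_rate) ** years


# Hypothetical scenario: 3% of patients lost to follow-up each year.
after_10 = fraction_retained(0.03, 10)
after_20 = fraction_retained(0.03, 20)
after_30 = fraction_retained(0.03, 30)
```

Under these assumptions, a loss rate that looks negligible in any single year leaves only a little over half of the cohort after 20 years. This is why population stability and mechanisms for tracking patients who leave the practice carry so much weight in longitudinal work.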

Generalizability of Research Results

Participation of FPs in clinical research is inevitably a process of self-selection; furthermore, the more strenuous the research efforts, the stronger this process of self-selection. Longitudinal research requires an ongoing commitment to research and, for that reason, PBRNs involved in this type of research in particular can be expected to represent a self-selected group of FPs.

Investigators must keep sight of the implications of self-selection for research findings. In study example 3, a study of outcomes of diabetes care, the focus of research was the performance of the FPs, for example, their adherence to protocols for care. In this case, self-selection will be a major issue, and the participating FPs will not represent FPs at large. But the focus of research in the studies in examples 1 and 2 of the prognosis of depression and diabetes, respectively, was patients and their health problems. Self-selection of FPs should be much less of a problem, as FPs care for unselected patients representing the local community population. For that reason, longitudinal data collection in PBRNs leads to generalizable findings for family medicine when the focus of research is on the unselected patient population. In other words, PBRNs that have been planned under conditions that favor continuity of care still represent family practice at large. This factor should encourage the discipline of family medicine in strategically planning longitudinal databases.


PBRNs provide a solid basis for ongoing research in family practice that can capitalize on FPs’ ongoing commitment to the care of their patients over time. Longitudinal analysis of the clinical course of disease is essential to support clinical decision making that is focused on the long-term perspective. Family practice records contain a rich multitude of patient-related clinical data collected over longer periods, which in principle can be a wealth of data for researchers. The historic cohort study makes it possible to bridge a substantial time frame in the follow-up of patients. Linking longitudinal databases to PBRNs is more efficient than creating new study cohorts, and the data collected in such databases are prospective in nature.

For longitudinal databases to be scientifically valid, it is important to invest in the scientific activities of PBRNs, such as training FPs and staff in data collection and use of classification systems. Self-selection of the FPs in PBRNs has considerable implications for research on quality of care, but is less influential for research on disease or clinical course over time. As long as PBRNs care for unselected populations, longitudinal studies will represent primary care at large, and this fact allows for strategically planning PBRNs for longitudinal research under conditions favoring continuity of care.

Internationally, family medicine has developed a comprehensive classification system on which data collection can be based. This system permits introduction of clinician-friendly scientific criteria in family practice, and there is strong evidence that it is possible and relevant to engage FPs in such a process. This supports the further development of a research culture in family medicine with a longitudinal perspective. It is the best basis on which to advocate for better research funding of studies that span long periods of time.


Conflict of interest: none reported


1. Wonca Europe. The European definition of general practice/family medicine. Available at: http://www.Wonca Europe 2002. Accessed March 10, 2004.
2. Martin JC, Avant RF, Bowman MA, et al. The future of family medicine: a collaborative project of the family medicine community. Ann Fam Med. 2004;2(Suppl 1):S3–S32.
3. van Weel C, Rosser WW. Improving health care globally: a critical review of the necessity of family medicine research and recommendations to build research capacity. Ann Fam Med. 2004;2(Suppl 2):S5–S16.
4. Nutting PA, Beasley JW, Werner JJ. Practice-based research networks answer primary care questions. JAMA. 1999;281:686–688.
5. van Weel C, Smith H, Beasley JW. Family practice research networks: experiences from 3 countries. J Fam Pract. 2000;49:938–943.
6. de Gaetano G. Low-dose aspirin and vitamin E in people at cardiovascular risk: a randomised trial in general practice. Collaborative Group of the Primary Prevention Project. Lancet. 2001;357:89–95.
7. de Grauw WJ, van de Lisdonk EH, van den Hoogen HJ, van Weel C. Cardiovascular morbidity and mortality in type 2 diabetic patients: a 22-year historical cohort study in Dutch general practice. Diabet Med. 1995;12:117–122.
8. de Grauw WJ, van Gerwen WH, van de Lisdonk EH, et al. Outcomes of audit-enhanced monitoring of patients with type 2 diabetes. J Fam Pract. 2002;51:459–464.
9. van Weel-Baumgarten E, van den Bosch W, van den Hoogen H, Zitman FG. Ten year follow-up of depression after diagnosis in general practice. Br J Gen Pract. 1998;48:1643–1646.
10. Van Weel C. Validating long term morbidity recording. J Epidemiol Community Health. 1995;49(Suppl 1):29–32.
11. van Weel-Baumgarten EM, van den Bosch WJ, van den Hoogen HJ, Zitman FG. The validity of the diagnosis of depression in general practice: is using criteria for diagnosis as a routine the answer? Br J Gen Pract. 2000;50:284–287.
12. Wonca International Classification Committee. International Classification of Primary Care, ICPC-2. 2nd ed. Oxford, UK: Oxford University Press; 1998.
13. ICHPPC-2 Defined. Inclusion Criteria for the Use of the Rubrics of the International Classification of Health Problems in Primary Care. Oxford, UK: Oxford University Press; 1983.
14. Huygen FJA. Family Medicine: The Medical Life History of Families. New York, NY: Brunner Mazel; 1982.
15. van Weel C, van den Bosch WJ, van den Hoogen HJ, Smits AJ. Development of respiratory illness in childhood—a longitudinal study in general practice. J R Coll Gen Pract. 1987;37:404–408.
16. College of General Practitioners. A classification of disease. J Coll Gen Pract. 1959;2:140–159.
17. van Weel-Baumgarten EM, van den Bosch WJ, van den Hoogen HJ, Zitman FG. The long-term perspective: a study of psychopathology and health status of patients with a history of depression more than 15 years after the first episode. Gen Hosp Psychiatry. 2000;22:399–404.
18. World Health Organization, Expert Committee on Diabetes Mellitus. Technical Report No. 727. Geneva, Switzerland: World Health Organization; 1985.
19. Rutten GEHM, Verhoeven S, Heine RJ, et al. NHG-standaard diabetes mellitus type 2 (eerste herziening). Huisarts Wet. 1999;42:67–84.
20. Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). UK Prospective Diabetes Study (UKPDS) Group. Lancet. 1998;352:837–853.
21. Tacken M. De diabetespatiënt bij de huisarts. Huisarts Wet. 2002;45:509.
22. de Grauw W. De diabetespatiënt bij de huisarts [letter to the editor]. Huisarts Wet. 2003;46:44.
23. Bakx JC, Van den Hoogen HJ, Van den Bosch WJ, Thien T, van Weel C. Cardiovascular risk factors and disease in general practice: results of the Nijmegen Cohort Study. Br J Gen Pract. 2002;52:135–137.
24. Lawlor DA, Davey Smith G, Ebrahim S. Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology? Int J Epidemiol. 2004;33:464–467.
25. Vandenbroucke JP. Commentary: the HRT story: vindication of old epidemiological theory. Int J Epidemiol. 2004;33:456–457.
26. Herbert CP. Future of research in family medicine: where to from here? Ann Fam Med. 2004;2(Suppl 2):S60–S64.

Articles from Annals of Family Medicine are provided here courtesy of American Academy of Family Physicians