Creation of a linked cohort of children and their parents in a large, national electronic health record dataset

Abstract
To examine which parental health care and health factors are most strongly associated with a child's receipt of recommended care, we must be able to link children to their parents in electronic health record data. Yet there is no easy way to link these data. Our objective was to identify a national cohort of children who link to at least one parent in the same electronic health record dataset and to describe their demographics. We present a methodology to link parents and children in electronic health records, along with descriptive sociodemographic data. Participants were children with at least one encounter with a primary care clinician between January 1, 2007 and December 12, 2018 at a community health center in the OCHIN national network. We identified parents of these children who also had at least one encounter at a community health center in the network using emergency contact and guarantor record fields. A total of 227,552 children had parents with a linkable patient record. After exclusions, our final cohort included 213,513 distinct children with either one or two parent-links. Eighty-two percent of children linked to a mother only, 14% linked to a father only, and 4% linked to both a mother and a father. Most families consisted of only one linked child (61%). We were able to link 33% of children to a parent in electronic health record data from a large network of community health centers across the United States. Further analyses utilizing these linkages will allow examination of the multi-level factors that impact a child's receipt of recommended health care.


Introduction
Parental health insurance and health status are associated with their children's health insurance and receipt of health care. [1][2][3][4][5][6][7] For example, mothers with less than excellent health have increased odds of having a child with less than excellent health. [5] Maternal depression (compared to no depression) is associated with lower rates of well-child checks and recommended immunizations, and poor child health outcomes. [6] Paternal depression is associated with reports of psychological distress in children at multiple ages. [8] Additionally, adolescent daughters of mothers with a routine doctor visit in the previous 12 months are one and a half times more likely to also have a routine doctor visit in the last 12 months. [9] There is a causal link between parent and child health insurance coverage, [10,11] and a strong association between parental coverage and timely receipt of health care for children. For example, children of Medicaid-enrolled parents experienced a 29% increased probability of receiving an annual well-child visit. [12] Many of these previous studies used cross-sectional survey data to answer questions about the health care services received by parents and children. Electronic health record (EHR) data provide a unique opportunity to assess longitudinal connections between parents' and children's health care and health outcomes without self-report biases, especially for tracking longitudinal utilization, receipt of recommended care, and health status over time. EHR data also have great potential to facilitate a better understanding of multi-level influences (ie, individual, parent, family, and community-level) on children's receipt of recommended health care. Yet, linking children to their parents within EHR data is challenging.
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
As most EHR datasets do not create direct linkages for family members, there is a need to develop methodologies to do so. Here, we applied a previously validated algorithm, [13] which utilized EHR data to link children to parents in Oregon, to a large multi-state network of community health center (CHC) patients that has a centralized instance of the Epic© EHR hosted and maintained by a non-profit health information technology organization. CHCs function as the nation's health care safety net, providing health care to adults and children regardless of health insurance status. [14] We describe how we applied the previous algorithm to identify a national cohort of children that link to at least one parent in the same clinical dataset and present the demographics of the cohort (Fig. 1).

Methods
We used EHR data from OCHIN's national network of CHCs serving >2 million medically underserved patients from 18 states to link parents and children. [15,16] The patients seen in these CHCs are representative of patients seen in CHCs nationally. [14] All included clinics provide comprehensive, coordinated primary care to patients of all ages. Each patient has only 1 patient identification (ID) number. After linking parent and child, we used sociodemographic data from the Accelerating Data Value Across a National Community Health Center (ADVANCE) clinical research network (CRN), a member of PCORnet. OCHIN leads the ADVANCE CRN and selected EHR data is contained within its data warehouse.
We applied our previously validated algorithm to all OCHIN community network CHCs. [13] We included information from all children (aged 0-17) with at least 1 encounter with a primary care clinician from January 1, 2007 to December 12, 2018 (n = 649,894). We then used emergency contact information and guarantor records to identify children with parents who also had a patient record (n = 227,552). Parents also had to have at least one encounter at a networked CHC, which would give them a patient ID within the same linked CHC dataset for inclusion in the linkages.
For emergency contact we used an already-populated data point that specified whom the clinic should call if there was an urgent need. If the emergency contact was also a patient within the network, they had a patient ID. Attached to these contacts are relationship designations that include "mother" and "father," among others. There are five different emergency contact fields: mother, father, emergency contact 1, emergency contact 2, and guardian. If emergency contact 1, emergency contact 2, or guardian were populated, they contained a description of the relationship.
Figure 1. Flow diagram of cohort selection.
Children with at least one visit with a primary care clinician during the study period: n = 649,894
Children excluded due to not having at least one parent with a patient identification number through either emergency contacts or guarantor field: n = 422,342
Children with at least one 'parent' with a patient identification number documented by emergency contact or guarantor field: n = 227,552
Children excluded due to applied age restriction between parent and child: n = 2,324
Children who had a parent that was 12-55 years older than the child: n = 225,228
Children excluded due to parents having no record of at least one visit during the study period: n = 7,540
Children who had a parent with at least one visit during the study period: n = 217,688
Children excluded due to relationship discrepancies: n = 150
Children remaining: n = 217,538
Children excluded due to sex discrepancies (parent or child): n = 3,879
Children and parent with no discrepancy of listed sex: n = 213,659
Children excluded due to having more than two parent links: n = 146
Distinct children with either one or two parent-links; all parents designated as a mother or father with conventional female/male designations: n = 213,513
Of the 213,513 distinct children: 136,327 distinct parents; 220,959 distinct child-parent pairs
Guarantor field links: n = 189,602
Emergency contact links: n = 9,040
Both guarantor field and emergency contact links: n = 22,317
We included children in our cohort if their emergency contact was someone with a patient ID that was designated as a mother or a father. The emergency contact field does not allow duplicate mothers to be listed. We identified 9040 children that could be linked to a parent in the emergency contact field alone. In addition to emergency contact records we utilized guarantor records, which specify the individual responsible for payment (and a patient ID if they are also a patient within the network). These records also indicate the relationship of the guarantor to the patient. Many relationships are contained in the guarantor records, for example, aunt, brother, daughter, employer, grandmother, spouse, among others. The guarantor contact records can change over time and allow listing of up to four potential mothers and two potential fathers. To be included in our cohort, the guarantor record had to specify mother or father as a relationship; we excluded all other relationships. If a mother or father was specified as anything besides a mother or father in the additional guarantor fields, they were also excluded from our cohort. We identified 189,602 children that could be linked to a parent in the guarantor field alone. Some links were found in both the emergency contact and guarantor field (n = 22,317); none were discrepant.
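The selection rule described above can be illustrated with a minimal sketch. It is not the validated algorithm itself: the record layout, the field names (child_id, contact_patient_id, relationship, source), and the sample records are hypothetical, and dropping any child-contact pair that carries conflicting relationship labels is our reading of the exclusion rule for additional guarantor fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

PARENT_RELATIONSHIPS = {"mother", "father"}


@dataclass(frozen=True)
class ContactRecord:
    """One emergency contact or guarantor entry (hypothetical layout)."""
    child_id: str
    contact_patient_id: Optional[str]  # None if the contact is not a patient in the network
    relationship: str                  # e.g. "mother", "aunt", "employer"
    source: str                        # "emergency_contact" or "guarantor"


def parent_links(records):
    """Map (child_id, parent_id) to (relationship, sources) for usable links."""
    relationships = defaultdict(set)
    sources = defaultdict(set)
    for r in records:
        if r.contact_patient_id is None:
            continue  # a contact without a patient ID cannot be linked
        pair = (r.child_id, r.contact_patient_id)
        relationships[pair].add(r.relationship.strip().lower())
        sources[pair].add(r.source)

    links = {}
    for pair, rels in relationships.items():
        # Keep the pair only if every record agrees on a single parent label.
        if len(rels) == 1 and rels <= PARENT_RELATIONSHIPS:
            links[pair] = (next(iter(rels)), frozenset(sources[pair]))
    return links


records = [
    ContactRecord("c1", "p1", "Mother", "guarantor"),
    ContactRecord("c1", "p1", "mother", "emergency_contact"),
    ContactRecord("c1", "p2", "aunt", "guarantor"),             # not a parent
    ContactRecord("c2", None, "father", "emergency_contact"),   # no patient ID
    ContactRecord("c3", "p3", "father", "guarantor"),
    ContactRecord("c3", "p3", "brother", "guarantor"),          # conflicting label
]
links = parent_links(records)
```

In this toy input only the c1/p1 pair survives, and it is recorded as found in both the guarantor and emergency contact sources.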
From the linked emergency contacts and guarantor records we narrowed the children down to those who had parents who were 12 to 55 years older than the child (n = 225,228) in an attempt to prevent siblings from being captured as parents (per our previous algorithm). [13] We kept only children with parents who had a record of at least one encounter during the same time period listed above for the children (n = 217,688). At this point, we excluded 150 children with discrepancies in their EHR record ID (ie, they had >1 ID number). We also excluded another 3879 children for whom the child or a parent had a discrepancy in sex, meaning the child was listed as both male and female in their records or the child was linked to a mother who was listed as male in the parent medical record, or vice versa. Lastly, we excluded 146 children because they were linked to >2 parents.
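As a rough illustration of these steps, the sketch below shows one way to express the age restriction and then replays the reported exclusion counts to confirm that they arrive at the final cohort. The function names are hypothetical, and measuring the gap in whole years with inclusive bounds at 12 and 55 is an assumption; the text does not specify how the gap was computed.

```python
from datetime import date

MIN_GAP_YEARS = 12
MAX_GAP_YEARS = 55


def whole_years_between(earlier: date, later: date) -> int:
    """Whole years elapsed from `earlier` to `later`."""
    years = later.year - earlier.year
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


def plausible_parent(parent_birth: date, child_birth: date) -> bool:
    """True if the parent is 12 to 55 whole years older than the child (assumed inclusive)."""
    gap = whole_years_between(parent_birth, child_birth)
    return MIN_GAP_YEARS <= gap <= MAX_GAP_YEARS


# Counts reported in the text: starting population, then children removed per step.
START = 649_894
EXCLUSIONS = [
    ("no parent with a patient ID in either field", 422_342),
    ("parent not 12 to 55 years older than child", 2_324),
    ("parent had no visit during the study period", 7_540),
    ("record discrepancies", 150),
    ("sex discrepancies (parent or child)", 3_879),
    ("more than two parent links", 146),
]


def apply_exclusions(start, exclusions):
    """Return (reason, children remaining) after each exclusion step."""
    remaining = start
    trail = []
    for reason, n_excluded in exclusions:
        remaining -= n_excluded
        trail.append((reason, remaining))
    return trail


trail = apply_exclusions(START, EXCLUSIONS)
final_cohort = trail[-1][1]
link_rate_pct = round(100 * final_cohort / START)
```

The running totals in trail reproduce the counts given at each step, and link_rate_pct corresponds to the share of all eligible children who remain in the final linked cohort.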
We performed descriptive statistics with sociodemographic data from the EHR, including sex, age at first encounter, preferred language, region of the country, payer and federal poverty level at first encounter (or first encounter where these data were available), number of parental linkages, time in the study, average number of encounters per year, chronic conditions listed on the problem list, and age at death, if applicable. We also describe several characteristics of all children.
As the links created thus far were from one child to one parent only, we created family units to understand how many children were linked to 2 parents versus a mother or a father only. To create these categories, we took the full list of distinct children in the final cohort and joined the child's patient ID to the parent(s) patient ID. By doing this we were able to determine the children that linked to a mother only, a father only, a mother and a father, two mothers, or two fathers. Here, parents may be linked to more than one distinct family unit.
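The family-unit step can be sketched as a grouping over child-parent links. This is an illustration under assumptions rather than the study code: the tuple layout (child_id, parent_id, role) and the sample IDs are hypothetical, and treating a family unit as the exact set of parents linked to a child is our interpretation of how a parent can belong to more than one unit.

```python
from collections import defaultdict

# (number of mothers, number of fathers) mapped to the categories named in the text
CATEGORIES = {
    (1, 0): "mother only",
    (0, 1): "father only",
    (1, 1): "mother and father",
    (2, 0): "two mothers",
    (0, 2): "two fathers",
}


def categorize_children(links):
    """Assign each child a parent-link category from its linked roles."""
    roles_by_child = defaultdict(list)
    for child_id, _parent_id, role in links:
        roles_by_child[child_id].append(role)

    categories = {}
    for child_id, roles in roles_by_child.items():
        key = (roles.count("mother"), roles.count("father"))
        if key not in CATEGORIES:
            raise ValueError(f"unexpected parent links for child {child_id}: {roles}")
        categories[child_id] = CATEGORIES[key]
    return categories


def family_units(links):
    """Group children by the exact set of parents they link to (assumed definition)."""
    parents_by_child = defaultdict(set)
    for child_id, parent_id, _role in links:
        parents_by_child[child_id].add(parent_id)

    units = defaultdict(list)
    for child_id, parents in parents_by_child.items():
        units[frozenset(parents)].append(child_id)
    return dict(units)


links = [
    ("c1", "m1", "mother"),
    ("c2", "m1", "mother"),
    ("c2", "f1", "father"),
    ("c3", "f2", "father"),
]
categories = categorize_children(links)
units = family_units(links)
```

In this toy input the mother m1 belongs to two distinct units, one on her own with c1 and one together with f1 for c2, which mirrors the note that parents may be linked to more than one family unit.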
This study was reviewed and approved by the Oregon Health & Science University Institutional Review Board (STUDY00019958) with a waiver of consent and authorization, as the research involves minimal risk, does not adversely affect the rights of subjects, and could not be practicably carried out without the waiver.

Results
Our final cohort included 213,513 distinct children with either one or two parent-links; all parents were designated as a mother or a father with conventional female/male designations. Linked to these distinct children were 136,327 parents and 220,959 child-parent pairs. Eighty-six percent of links derived from the guarantor list (n = 189,602), 4% from emergency contacts (n = 9040), and 10% were found in both the guarantor list and emergency contacts (n = 22,317). (See Figure 1).
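The link-source percentages follow directly from the reported counts, as the short check below shows. The dictionary keys are our own labels; the counts are those reported in the text.

```python
# Child-parent links by the field in which they were found (counts from the text)
link_counts = {
    "guarantor only": 189_602,
    "emergency contact only": 9_040,
    "both fields": 22_317,
}

total_links = sum(link_counts.values())

# Percentage of all child-parent pairs contributed by each source, rounded to whole percent
shares = {
    source: round(100 * count / total_links)
    for source, count in link_counts.items()
}
```

The three sources sum to the 220,959 child-parent pairs, so each link is counted under exactly one of the three labels.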
Half the children were female (50%), 44% reported Hispanic ethnicity, 27% reported white race, and 36% preferred a non-English language. Most children (97%) linked to one parent and had between 1 and 3 encounters per year (52%). Seventy-seven percent of children had Medicaid recorded at their first encounter (or the first encounter where these data were available). The majority of children did not have any chronic conditions (63%).
Of the parents, 114,395 were mothers and 21,932 were fathers. Forty-two percent of linked mothers and 28% of linked fathers had Hispanic ethnicity. Fifty-seven percent of mothers had Medicaid and 11% had private coverage, whereas 42% of fathers had Medicaid and 21% had private coverage recorded at their first encounter (or the first encounter where these data were available). Thirty-five percent of mothers and 56% of fathers had between 1 and 3 encounters per year (see Table 1).
For all children, 30% preferred a non-English language, 14% had private health insurance, 69% had Medicaid coverage, and 63% lived in families earning 138% federal poverty level (FPL) (see Supplemental Table 1, Supplemental Digital Content, which describes the characteristics of both linked children and all children with a visit, http://links.lww.com/MD2/A317).
Eighty-two percent of children linked to a mother only, 14% linked to a father only, and 4% linked to both a mother and a father. Most families consisted of only 1 linked child (61%). Only 2% of children that linked within a family were designated as foster children (see Table 2).

Discussion
We applied our previous EHR algorithm (used in Oregon only) to a national network of CHCs. In our previous work using Oregon EHR data from 2002 to 2010, we were able to link 25% of children to a parent using emergency contacts and the guarantor field and validate these linkages. [13] In this cohort, we were able to link 33% of children to a parent from 2007 to 2018 using the same EHR-based data points. The demographics of children and their parents in our identified cohort are similar to the demographics of those seen in CHCs nationally: CHCs serve 1 in 3 people living in poverty, 1 in 5 uninsured persons, 1 in 5 Medicaid beneficiaries, and 1 in 7 racial and ethnic minorities (note these national numbers do not separate adults and children). [14] Thus, we believe this linkage will provide a useful cohort for future research on the parental factors that impact child health.
Previous studies using large datasets were able to identify some parental factors associated with child health care. For example, maternal lack of health insurance was associated with youth lack of health insurance, [17] Affordable Care Act Medicaid expansion was associated with increased odds of well-child care for families making less than 99% FPL, [18] and parent mental illness was associated with potentially preventable child emergency department use. [19] We will expand on these studies by using this rich cohort of linked parent and child EHR data to longitudinally track measures in real-time and investigate the relationships among parental health status and health care utilization and a child's receipt of recommended services and health. Specifically, we plan to assess the association of parent preventive care with receipt of well-child visits. We also intend to study the impact of other important parental factors, like language concordance, chronic disease diagnoses and treatment, and mental health care, on their child's receipt of health care and health. Analyses utilizing this cohort will allow examination of the multi-level factors that impact a child's receipt of recommended health care. Once factors are identified, we can make recommendations to inform health policy and primary care practice for improved child health.
Angier et al. Medicine (2021) 100:32 www.md-journal.com
Table footnote: Based on all available encounters, where mothers and fathers may have been seen before the birth of their first-born child. Only mothers and fathers that were at least 12 years older than their linked child(ren) met inclusion criteria. For example, a patient who was age 2 at a 2007 encounter and became a parent in 2017 is included in these data.
These methods have some limitations and methodological issues. We were only able to include children and parents with an encounter in one of the CHCs in the network.
We included parents who were 12 to 55 years older than the child, which could have excluded potential parents. We were unable to detect whether the discrepant sex information for child or parent was due to transgender individuals, which limited our ability to include transgender parents (or children). We excluded other relationships that may act similarly to a parent relationship, for example, a grandparent who is raising their grandchild, or a guardian acting as a child's parent. More work is needed to identify broader family definitions using EHR data. We chose a narrow definition of family to include children, mothers, and fathers only, and these exclusions may have removed some child and caregiver pairings.

Conclusion
We were able to link 33% of children to a parent in EHR data from a large network of community health centers across the United States. Future analyses utilizing these linkages will allow examination of the multi-level factors that impact a child's receipt of recommended health care and health.