7 Role of Gene-Environment Interactions in Preterm Birth


Until recently, the role of genetic susceptibility and gene-environment interactions in preterm birth has largely been unexplored. Growing evidence indicates that familial or intergenerational factors influence preterm birth. This influence may reflect shared environmental factors, genetic factors, or both. With recent advances in human genetics and molecular biology, assessment of genetic contributions to human diseases has progressed significantly, but the number of studies in this area is limited. These genetic studies have mostly been association studies that were based on small sample sizes and were not corrected for population admixture using state-of-the-art methods. The most sophisticated methodologies and technologies in genomics and proteomics have not been applied to preterm birth research. Epigenetics (the study of how gene regulatory information that is not expressed in DNA sequences is transmitted from one generation to the next) and proteomics (identification of the expression of proteins within biological fluid, tissue, or cells at a certain point in time under conditions of health or disease) have the potential to provide a greater understanding of the pathways to preterm birth but have not been adequately investigated. There is considerable room for improvement in the search for new biomarkers that predict preterm birth. While there are compelling reasons to examine gene-environment interactions, there are a limited number of published studies. Those that have been conducted suggest that individual genotypes may modify the risk of preterm birth associated with certain environmental exposures. Racial and ethnic differences in preterm birth have been discussed extensively throughout this report. Whether genetics explains a substantial proportion of these disparities remains largely unanswered.
New tools for high-throughput genotyping, coupled with very-large-scale population-based studies that use sensitive biomarkers, comprehensive exposure assessment, and advanced biotechnology and analytical strategies, are needed to unravel the complex environmental and genetic factors and the gene-gene and gene-environment interactions responsible for preterm birth. Understanding these factors and their interactions could lead to major improvements in the diagnosis, prevention, and treatment of preterm birth.

The completion of the first draft of the human genome sequence (Lander et al., 2001) and increasing information about the genome’s function have provided new opportunities for the investigation of human health and disease. Likewise, results from the exploration of human genetic variation through the International HapMap Project, spearheaded by the National Human Genome Research Institute (The International HapMap Consortium, 2003), will furnish researchers with a powerful tool for identifying variants that contribute to common diseases. This information will be especially useful when it is combined with reliable, cost-effective, high-throughput methods that can be used to genotype these variants in large population samples (Shi, 2002).

In parallel, there is a growing recognition that changes in the earth’s environment, in combination with genetic susceptibility, may contribute to many chronic diseases and may hold the key to reversing the course of some diseases (Chakravarti and Little, 2003). The improved methods for measuring nongenetic factors and environmental exposures promise to extend the scope of epidemiological investigation (Weaver et al., 1998).

Together, these developments present an exciting opportunity to address unanswered questions related to the complex contributions of genes, the environment, and gene-gene and gene-environment interactions to complex human diseases, including preterm birth. This chapter provides a review of recent progress in understanding the genetics of preterm birth, summarizes important methodological issues, and highlights areas for future research.


The available literature has provided some evidence of familial and intergenerational influences on low birth weight or preterm birth (Bakketeig et al., 1979; Carr-Hill and Hall, 1985; Khoury and Cohen, 1987; Porter et al., 1997; Varner and Esplin, 2005). A population-based cohort study of data from birth certificates and fetal death certificates from the state of Georgia between 1980 and 1995 suggests that the recurrence of preterm delivery contributes to a notable portion of all preterm births, especially for the shortest gestations (Adams et al., 2000). Analysis of the data from the live birth cohort of the 1988 U.S. National Maternal and Infant Health Survey demonstrated a strong familial aggregation of low birth weight and preterm birth in both white and African American populations (Wang et al., 1995).

Familial and intergenerational influences on preterm birth may be attributable to shared environmental factors, genetic factors, or both. Studies with twins are a powerful approach to detecting the environmental and genetic components of a given disease or trait, but very few studies of preterm birth in human twins have been performed. The heritability of preterm birth was found to be 17 to 27 percent in an Australian population (Treloar et al., 2000), and the heritability of gestational length was found to be 25 to 40 percent in a Swedish population (Clausson et al., 2000). The limited number of twin studies is in part due to the difficulty of assembling such a study population (i.e., female twins and their babies). Large registries of twins have the potential to probe not only the concordance of preterm birth among female twins but also the contributions of males and intergenerational effects.
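As a rough illustration of how twin data can separate genetic from environmental contributions, the sketch below applies Falconer's classical approximation, in which heritability is estimated as twice the difference between the monozygotic and dizygotic twin-pair correlations. This is an assumption made for illustration: the cited twin studies may have used more elaborate variance-components models, and the correlations and the function name here are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Falconer's approximation from twin-pair correlations (hypothetical helper).

    r_mz: trait correlation within monozygotic twin pairs
    r_dz: trait correlation within dizygotic twin pairs
    """
    h2 = 2.0 * (r_mz - r_dz)   # share attributed to additive genetic factors
    c2 = 2.0 * r_dz - r_mz     # share attributed to shared environment
    e2 = 1.0 - r_mz            # share attributed to unique environment and error
    return {"h2": h2, "c2": c2, "e2": e2}


# Invented correlations, chosen only to show the arithmetic.
estimates = falconer_estimates(r_mz=0.30, r_dz=0.18)
print(estimates)
```

Under this approximation the three components sum to 1, so a larger gap between the monozygotic and dizygotic correlations translates directly into a larger heritability estimate.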

As detailed below, with recent advances in human genetics and molecular biology, assessment of the contributions of genetics to human diseases has progressed from indirect measurements based on family history to direct measures of an individual’s genotype (genome sequence) at particular gene loci. Nevertheless, it should be emphasized that family history and a woman’s past medical history remain valuable tools in assessments of a woman’s risk for preterm birth.

Genetic Association Studies

Single-gene disorders have been associated with an increased risk of preterm birth, often as a result of a predisposition to polyhydramnios in pregnancies in which the fetus carries the altered gene. Among these conditions are myotonic dystrophy, Ehlers-Danlos syndrome, Smith-Lemli-Opitz syndrome, and neurofibromatosis. However, like many other complex human diseases, such as obesity, hypertension, diabetes, and asthma, preterm birth is a complex trait and possesses the following features: non-Mendelian transmission, the involvement of multiple genes, and gene-gene and gene-environment interactions. Research on the genetics of preterm birth thus faces significant challenges. The approaches available for the identification of genes that may be associated with a particular trait include positional cloning, the identification of positional candidate genes, whole-genome association analysis, and functional candidate gene analysis. Positional cloning requires extended pedigrees or sibling pairs. It has been successful for the analysis of diseases transmitted by Mendelian genetics but has been less successful for the study of more complex diseases and conditions. The identification of positional candidate genes requires linkage information, which is not possible for the genetic analysis of preterm birth. Scanning of the whole genome requires the identification of more than 100,000 single-nucleotide polymorphisms (SNPs), which are single-base-pair substitutions in the DNA sequence, and this procedure is costly. The functional candidate gene approach, that is, the study of carefully selected candidate genes associated with major pathogenic pathways of preterm birth, is feasible and is commonly used. Relevant genetic association studies are summarized below.

One characteristic of the human genome with medical and social relevance is that each person’s genome, except those of monozygotic twins, is unique. A person’s genotype represents the blending of parental genotypes. In addition, the human genome undergoes natural mutation. On average, the DNA sequences of two unrelated humans vary by millions of bases. Nearly all human genes are capable of causing disease if they are altered substantially. Mutations known to cause disease have been identified in about 1,000 genes. About 90 percent of all DNA sequence variations occur as SNPs (Brookes, 1999). The human genome contains about 10 million SNPs. A haplotype, on the other hand, represents a considerably longer sequence of nucleotides (an average of 25,000 nucleotides, which are the building blocks of DNA and genes), as well as any variants, that tend to be inherited together. Analysis of both SNPs and haplotypes is thus necessary to identify the genetic factors associated with complex diseases and syndromes, including preterm birth.

Two organizations have focused on the analysis of SNPs to identify the genes that may be associated with particular diseases and syndromes. The SNP Consortium Ltd. is a nonprofit foundation whose mission was to identify up to 300,000 SNPs distributed evenly throughout the human genome and to make the information related to those SNPs available to the public without intellectual property restrictions. Eventually, however, the SNP Consortium Ltd. discovered many more SNPs (1.5 million in total) than it had originally planned. The National Institute of Environmental Health Sciences initiated the Environmental Genome Project (EGP) in 1998 to identify polymorphisms in the genes involved in environment-induced diseases (Olden and Wilson, 2000). In addition to the identification of polymorphisms, EGP aims to characterize the functions of these polymorphisms and supports epidemiological studies of gene-environment interactions.

Studies Involving One or a Few Candidate Genes

To date, most published studies on the genetics of preterm birth have examined only one or a few genes in a given study sample. The frequent association of spontaneous preterm labor and preterm birth with histological infection-inflammation and elevated concentrations of inflammatory cytokines in body fluids has focused investigations on single gene polymorphisms in the genes for these cytokines in both the mother and the fetus (Varner and Esplin, 2005), as it has been well established that upper genital tract infections and inflammation are associated with spontaneous preterm labor and preterm birth (Goldenberg and Andrews, 1996). The polymorphisms examined include those in the genes for the cytokines tumor necrosis factor alpha (TNF-α) nucleotide 308 (Dizon-Townson et al., 1997; Roberts et al., 1999), interleukin-1β (IL-1β) nucleotides 3953 and 3954 (Genc et al., 2002), and IL-6 nucleotide 174 (Jamie et al., 2005; Simhan et al., 2003); but the findings of an association of polymorphisms in these genes and preterm birth have been inconsistent.

Other studies have examined the roles of SNPs in preterm labor and preterm birth. Toll-like receptors, which are important components of the innate immune system, have been linked to spontaneous preterm labor and preterm birth (Lorenz et al., 2002). Gene polymorphisms in matrix metalloproteinases (MMPs) and preterm premature rupture of membranes (PPROM) were examined in African Americans. The breakdown of the interstitial collagens is mediated by MMPs. A fetal genotype of a mutation in the gene for matrix metalloproteinase type 1 (MMP-1) was found in association with PPROM (Fujimoto et al., 2002). Three SNPs located at positions –799 (C to T), –381 (A to G), and +17 (C to G) (where C, T, A, and G represent the nucleotides cytosine, thymine, adenine, and guanine, respectively) from the major transcription start site in the MMP-8 gene have been identified, and the functional significance of SNP haplotypes in the MMP-8 gene and their associations with PPROM have been demonstrated (Wang H et al., 2004). MMP-8 is an enzyme that degrades the fibrillar collagens that impart strength to the fetal membranes; it is expressed by leukocytes and chorionic cytotrophoblast cells. There are cell host-dependent differences in MMP-9 promoter activity related to CA-repeat number, and fetal carriage of the 14 CA-repeat allele is associated with PPROM in African Americans (Ferrand et al., 2002). Finally, a study (Ozkur et al., 2002) found that a mutation in the β2-adrenergic receptor, the predominant β-adrenergic receptor subtype that relaxes myometrial muscle fibers at term (Liu et al., 1998), that changes the amino acid glutamic acid to glutamine at codon 27 is associated with preterm labor.

Multiple Candidate Gene Study

Investigators generally agree that the “one gene, one risk factor” approach to understanding the etiology of complex human diseases will not likely yield great progress in understanding the causes of human diseases and syndromes, and as mentioned earlier in this report, preterm birth is increasingly recognized as a syndrome with multiple etiologies. Therefore, because of the heterogeneous nature of preterm birth, it is necessary to study a large number of candidate genes to better understand genetic influences on preterm birth. However, the number of published studies of this kind is limited. One study simultaneously investigated the relationships of polymorphisms in six cytokine genes associated with inflammation (IL-1α, IL-1β, IL-2, IL-6, TNF-α, and lymphotoxin alpha [LTA]) with spontaneous preterm birth and the birth of infants who are small for gestational age in a nested case-control study with women from a prospective pregnancy cohort (Engel et al., 2005a). Two haplotypes spanning the TNF-α and LTA genes were associated with an increased risk for spontaneous preterm birth in white subjects (for the AGG haplotype, odds ratio [OR] 1.5; 95% confidence interval [CI] 0.8–2.6; for the GAC haplotype, OR 1.6; 95% CI 0.9–2.9). Additionally, carriers of the GAG haplotype were found to have a decreased risk of spontaneous preterm birth (OR 0.6; 95% CI 0.3–1.0). The TNF-α and LTA variants TNF-α(–488)A and LTA(IVS1-82)C, constituents of the AGG and GAC haplotypes, respectively, were also strongly associated with an increased risk of spontaneous preterm birth.
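For readers less familiar with the odds ratios and confidence intervals quoted in this chapter, the following sketch shows one common way to obtain them from a 2×2 case-control table: the cross-product odds ratio with a Wald (Woolf) interval on the log scale. The counts are hypothetical and are not taken from any cited study, and the cited studies may have used other estimation methods (for example, adjusted logistic regression).

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: carriers among cases      b: noncarriers among cases
    c: carriers among controls   d: noncarriers among controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only for illustration.
or_est, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=40, d=140)
print(f"OR {or_est:.2f}; 95% CI {ci_low:.2f}-{ci_high:.2f}")

# An interval that spans 1 is conventionally read as compatible with no association.
spans_one = ci_low < 1.0 < ci_high
```

With small samples the interval widens quickly, which is one reason that point estimates above 1 can still come with intervals that include 1.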

A large-scale case-control study explored the associations of 426 SNPs with preterm birth in 300 mothers with preterm deliveries (cases) and 458 mothers with term deliveries (controls) (Hao et al., 2004). Twenty-five candidate genes were included in the final haplotype analysis, and a significant association of the Factor V (F5) gene haplotype with preterm birth was revealed and remained significant after Bonferroni correction for multiple testing (p = 0.025). That study also performed exploratory ethnicity-specific analyses, which indicated that the association of the F5 gene haplotype with preterm birth was consistent across ethnic groups.
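The Bonferroni correction mentioned above can be sketched in a few lines: each raw p-value is multiplied by the number of tests (capped at 1) before being compared with the significance threshold. The example assumes 25 tests, matching the number of candidate genes in the final haplotype analysis, but the source does not state how many tests the correction actually covered, and all raw p-values below are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a list of p-values and flag those at or below alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant


# 25 invented raw p-values, one per hypothetical gene-level test.
raw_p = [0.001, 0.004, 0.03] + [0.5] * 22
adjusted, significant = bonferroni(raw_p)

for p, p_adj, sig in list(zip(raw_p, adjusted, significant))[:3]:
    print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f}, significant = {sig}")
```

The correction is conservative: in this made-up example, a raw p-value of 0.004 would pass an uncorrected 0.05 threshold but not the corrected one.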

Until now, the discovery of genes found to be associated with preterm birth has been limited to studies of candidate genes. Although it is not possible for this report to cover all candidate genes, a list of potential candidate genes affecting preterm birth is provided in Table 7-1. On the other hand, the availability of SNP gene microarrays and high-throughput genotyping technologies makes it possible to conduct studies of the association of genes with preterm birth by use of the entire genome. Gene microarrays consist of microscope-sized slides on which the sequences of thousands of genes can be placed. Hybridization of those sequences with the gene sequence to be tested can be used to determine the genotypes of the test gene. Recently, a 500,000-SNP microarray became available. In this way, the discovery of genes associated with preterm birth will not be limited to known or suspected candidate genes. Although these technologies are costly, they represent means for the systematic identification of the genes associated with preterm birth.

TABLE 7-1. Potential Candidate Genes for Preterm Births.




The goal of gene-environment studies in epidemiology is to advance knowledge of how genetic and environmental factors combine to affect the risk of disease and, more specifically, how the variations in the human genome (polymorphisms) can modify the effects of exposures to environmental health hazards (Kelada et al., 2003). There are compelling reasons to examine the association of gene-environment interactions with preterm birth. The data presented in the previous chapters and in this chapter suggest that both socioenvironmental factors and genetic factors may influence preterm birth. Given individual genetic variations and differential environmental exposures, stratification of study subjects by genotype may allow the detection of risk of preterm birth among individuals exposed to a particular environmental toxicant (Rothman et al., 2001). Furthermore, enhanced understanding of pathologic mechanisms may allow the development of drugs or interventions that can be used to prevent or treat preterm birth. To date, however, relatively few studies on the association of gene-environment interactions with preterm birth have been published (Genc et al., 2004; Macones et al., 2004; Nukui et al., 2004; Wang et al., 2000, 2002). Two of these studies are described below.

Gene-Genital Tract Infection Interaction

Given the association between genital tract infections such as bacterial vaginosis (BV) and preterm birth, a case-control study of 375 women examined the interactions among BV, the TNF-α genotype, and preterm birth (Macones et al., 2004). Maternal carriers of the rarer allele (TNF-α-2) were found to be at a significantly increased risk of spontaneous preterm birth (OR 2.7; 95% CI 1.7–4.5). The association between carriage of the TNF-α-2 allele and preterm birth was found to be modified by the presence of BV, such that women who had both the susceptible genotype and BV had increased odds of preterm birth compared with the odds for those who did not (OR 6.1; 95% CI 1.9–21.0). The study thus provides evidence that an interaction between genetic susceptibility (i.e., carriage of TNF-α-2) and an environmental factor (i.e., BV) is associated with an increased risk of spontaneous preterm birth.

Gene-Smoking Interaction

In the United States, about 13 percent of all pregnant women smoke cigarettes, which is a recognized risk factor for preterm birth. A study of 741 U.S. mothers investigated whether maternal genotypes can modify the association between maternal cigarette smoking and infant birth weight, gestational age, and intrauterine growth retardation (Wang et al., 2002). The study found that without consideration of genotype, the OR of preterm birth in association with maternal smoking was 1.8. When the mothers were stratified by their CYP1A1 genotypes, the mothers with variant genotypes had a higher risk of preterm birth. Similarly, when the mothers were stratified by their GSTT1 genotypes, the mothers with variant genotypes had a higher risk of preterm birth. More strikingly, the mothers with both CYP1A1 and GSTT1 variant genotypes had the highest risk of preterm birth (OR greater than 10). This study provides additional evidence that individual genotypes may modify the risk of preterm birth in association with an environmental exposure.
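The stratified analysis described in these studies can be illustrated with a minimal sketch: the odds ratio for the exposure (here, smoking) is computed separately within each genotype stratum, and the ratio of the stratum-specific odds ratios gives a simple measure of interaction on the multiplicative scale. All counts and stratum labels below are hypothetical, and the cited studies may have used regression models with interaction terms rather than this direct calculation.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts per genotype stratum:
# (smokers preterm, nonsmokers preterm, smokers term, nonsmokers term)
strata = {
    "common genotype": (20, 80, 50, 250),
    "variant genotype": (15, 20, 10, 60),
}

# Odds ratio for smoking and preterm birth within each genotype stratum.
stratum_or = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Ratio of stratum-specific ORs: a value far from 1 suggests effect modification.
interaction_ratio = stratum_or["variant genotype"] / stratum_or["common genotype"]

for name, value in stratum_or.items():
    print(f"{name}: OR for smoking = {value:.2f}")
print(f"ratio of ORs = {interaction_ratio:.2f}")
```

A pooled analysis that ignores genotype would average over the two strata and could mask the much higher odds ratio in the variant stratum.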


DNA is not freely floating within the cell cytoplasm or nucleus; it is organized with proteins called histones to form a complex substance known as chromatin (see Box 7-1 for definition of terms). Biostructural modifications to the DNA or the histones alter chromatin without changing the actual nucleotide sequence of the DNA. These modifications are described as epigenetic. The two main sources of epigenetic modification are DNA methylation and histone deacetylation (Haig, 2004). DNA methylation is a chemical modification of the DNA proper by an enzyme known as DNA methyltransferase. Methylation can directly switch off gene expression by preventing transcription factors from binding to promoters. However, a more general effect of methylation is the attraction of methyl-binding domain proteins. Methyl-binding domain proteins can activate histone deacetylases, which function to chemically modify histones and change chromatin structure. Chromatin containing acetylated histones is open and accessible to transcription factors, and, thus, the genes contained within that chromatin are potentially active. Histone deacetylation causes the condensation of the chromatin. When chromatin is condensed, the genes therein are unable to be expressed. In this manner, genes may be “switched off” or silenced (Haig, 2004; Henikoff et al., 2004). Thus, broadly considered, epigenetic changes to DNA or to histones can alter the expression of genes within the genome. Even in the setting of identical genetics, differences in epigenotype may account for important phenotype differences. For example, epigenetic differences may account for disease discordance among monozygotic twins (Wong et al., 2005). Epigenetics has been hypothesized to play a major role in human health and disease in a wide variety of areas, from psychoneurodevelopment (Abdolmaleky et al., 2005; Hong et al., 2005) to cancer (Laird, 2005) to heart disease (Muskiet, 2005). 
Epigenetic modification of genes may be influenced by environmental exposure, such as nutritional micronutrients. Folate, biotin, niacin, and tryptophan may all influence gene silencing (Oommen et al., 2005).

BOX 7-1

Definition of Terms in Epigenetics

Chromatin: DNA organized by histones
DNA methyltransferase: The enzyme responsible for methylating DNA

The pattern of epigenetic modifications of a genome may be termed an epigenotype (Jiang et al., 2004). Epigenotypes are, by definition, more plastic than genotypes, and are highly context dependent; that is, epigenotypes vary between cells within the same organism and are modifiable by the environment in critical windows of exposure (Henikoff et al., 2004; Jiang et al., 2004; Wang Y et al., 2004). Extreme examples of pregnancy disorders related to epigenetics are choriocarcinoma and hydatidiform moles (Xue et al., 2004). However, the consequences of epigenetic influences on pregnancy course may be much more subtle. Van Dijk and coworkers (2005) noted that epigenetic modification of the STOX1 gene might be of importance in preeclampsia. While there are no data regarding the possibility of epigenetic influences on spontaneous preterm parturition, it is important to recognize the possible influence that epigenotype may have on gene expression and, thus, the functional consequences that it may have for the length of gestation.


Despite the many advantages and advances in knowledge attributable to genomics and microarray analysis, these approaches have several limitations. Although the human genome contains approximately 30,000 genes, many more messenger RNA transcripts potentially coding for different proteins exist because of the alternate splicing of genes. Depending on codon bias, there is only a limited relationship between the expression of a gene and the amount of protein expression directed by that gene. The expression or function of proteins is modulated at many points from transcription to posttranslation, and protein expression or function cannot be reliably predicted merely by analysis of the nucleic acid sequences. Extensive posttranslational protein modification (e.g., phosphorylation, methylation, and compartmentalization) may occur and may dramatically alter the function of a protein. Because of the wide variety of posttranslational modifications, it is estimated that as many as 1 million distinct proteins derived from the 30,000 genes in the human genome may exist. This has important implications in the understanding of biological mechanisms and pathways, as well as in the development of disease-specific biomarkers, because the flow of information between cells and tissues is mediated by protein-protein interactions in both health and disease.

Recent advances in protein chemistry and the identification of peptide fragments by two-dimensional gel electrophoresis and mass spectrometric analysis have led to the emerging field of proteomics (McDonald and Yates, 2002). Proteomics refers to the identification of the global expression of proteins within a biological system (biological fluid, tissue, or cell) at a certain point in time under given conditions of health or disease. Because the expression and concentrations of many proteins depend on complex regulatory systems, the proteome, unlike the genome, is highly dynamic. The dynamic nature of the proteome complements genomics both in providing an understanding of pathophysiological processes such as preterm birth and in the discovery of protein biomarkers that may be uniquely associated with certain conditions and therefore useful as diagnostic biomarkers.

Proteomic strategies are based on the description of protein expression or protein function. Expression proteomics involves the identification or cataloguing of all proteins present within a biological system under given conditions. The differential expression of some proteins can link dynamic changes in protein expression to physiological conditions or disease states. Thus, expression profiling is uniquely suited to the identification of potential diagnostic biomarkers or to the description of biological changes that occur under certain conditions (e.g., labor). Functional proteomics places these proteins within their proper context by mapping their intracellular localization and their interactions with other proteins. Both genomics and proteomics allow for a comprehensive evaluation of proteins or messenger RNA transcripts that may provide valuable insights into the complex etiologies of preterm birth and may facilitate the discovery of potential biomarkers for preterm birth (see Shankar et al., 2005 for review).

To date, proteomics has not been vigorously applied in reproductive medicine. No research reported thus far has used a proteomics approach in the study of preterm labor or preterm birth. Several reports, however, have emphasized the potential importance of proteomics in pregnancy-related research (Page et al., 2002; Shankar et al., 2005), and others have addressed factors that are directly relevant to preterm birth, including implantation (Daikoku et al., 2005), preeclampsia (Koy et al., 2005; Myers et al., 2004; Sawicki et al., 2003), premature rupture of fetal membranes (Vuadens et al., 2003), and intra-amniotic fluid infection (Buhimschi et al., 2005; Gravett et al., 2004). For example, novel biomarkers discovered by proteomic profiling of amniotic fluid, including defensins, calgranulins, and specific proteolytic fragments of insulin-like growth factor binding protein-1, have recently been identified in intra-amniotic fluid infection, an important and potentially preventable cause of preterm birth. Two recent reports suggest that the detection of these peptides by proteomic analysis yields a sensitivity and a specificity in excess of 90 percent each for the detection of subclinical intra-amniotic fluid infection associated with preterm labor (Buhimschi et al., 2005; Gravett et al., 2004).
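Sensitivity and specificity, the two measures of diagnostic accuracy quoted above, can be computed directly from the counts of correct and incorrect test results. The sketch below uses hypothetical counts that are not drawn from the cited reports; the function name is likewise an illustrative choice.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity and specificity from confusion-matrix counts.

    tp: infected and test positive     fn: infected but test negative
    tn: uninfected and test negative   fp: uninfected but test positive
    """
    sensitivity = tp / (tp + fn)   # share of true infections that the test detects
    specificity = tn / (tn + fp)   # share of uninfected women correctly cleared
    return sensitivity, specificity


# Hypothetical counts for a biomarker panel evaluated in 100 women.
sens, spec = sensitivity_specificity(tp=28, fn=2, tn=65, fp=5)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Neither measure depends on how common the infection is in the tested population, which is why both are usually reported alongside predictive values when a biomarker is considered for clinical use.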

Proteomic profiling has also identified a unique protein expression profile in women with severe preeclampsia that precedes the clinical onset of symptoms (Koy et al., 2005; Myers et al., 2004). An improved understanding of the early events in preeclampsia and the ability to provide an early diagnosis of this and other pregnancy-related conditions are necessary to develop rational and efficacious intervention strategies that may reduce the risks of preterm birth.

Two initiatives that may lead to significant contributions of proteomics to pregnancy-related research have been instituted. The Human Proteome Organization (www.hupo.org) was organized in 2001 to facilitate proteomic research in humans. More recently, the National Institute of Child Health and Human Development initiated the Genomic and Proteomic Network for Premature Birth Research. The aim of this network is to accelerate the pace of research on preterm birth by focusing on global genomic and proteomic strategies and the dissemination of genomic and proteomic data to the scientific community. Specifically, the network will (1) design and implement hypothesis-driven, mechanistic studies based on large-scale, high-output genomic and proteomic approaches and (2) provide a public, web-based, genomic and proteomic database that the research community can use to mine and deposit data. It is anticipated that the creation of this network will hasten a deeper understanding of the pathophysiology of premature birth, lead to the discovery of novel target molecules and diagnostic biomarkers, and ultimately aid in the formulation of more effective interventions for the prevention of preterm birth.


The significance of the racial and ethnic differences of human populations is frequently debated in clinical, epidemiological, and molecular research (Ioannidis et al., 2004). The undeniable evidence of health disparities between individuals of different races and ethnicities indicates that in some cases a correlation exists between race and health or disease. However, this relationship is complex and poorly understood. First, it is essential to point out that there are no generally agreed upon definitions of race. By and large, racial categories are socially defined and are associated with certain social, cultural, educational, and economic dimensions. However, racial categories are also associated, to varying degrees, with genetic inferences of ancestry, the frequency of gene variants, and genetic effects (Bamshad, 2005). There is considerable controversy regarding the existence and importance of genetic influences on racial differences in complex diseases influenced by a large number of genes (Cooper et al., 2003).

Preterm birth is an example of a condition in which disparities among individuals of different races and ethnicities exist, with the largest and most persistent disparity occurring between Asian or Pacific Islanders and non-Hispanic black women, who have overall rates of preterm birth of 10.5 and 17.9 percent, respectively (CDC, 2005i). As discussed in Chapter 4, some evidence suggests that genetic factors may play a role in the disparities in preterm birth rates by race-ethnicity. However, the evidence by no means proves that genetic factors contribute to these disparities. The most direct way to study whether genetic factors vary among racial-ethnic groups is to find variants that influence susceptibility to the risk of preterm birth and then to assess whether these variants differ in frequency or effect across populations.

Allele Frequency

One possible reason for a genetic influence on racial disparities in preterm birth is that susceptibility variants may be present in one population but absent in others or may vary in frequency across diverse populations. This may affect the number of individuals at increased risk for preterm birth. One obvious example is the unequal distribution of disease-associated alleles for certain recessive disorders, such as sickle cell disease or Tay-Sachs disease. One study examined a total of 179 African American women and 396 white women for the presence of functionally relevant allelic variants in cytokine genes (Hassan et al., 2003). African American women were found to be significantly more likely to carry allelic variants known to up-regulate proinflammatory cytokines, and the ORs increased with the allele dose. The ORs for African American women compared with white women to have genotypes that up-regulate the proinflammatory cytokine IL-1 (IL1A-4845G/G, IL1A-889T/T, IL1B-3957C/C, and IL1B-511A/A) ranged from 2.1 to 4.9. The proinflammatory IL6-174G/G genotype was 36.5 times (95% CI 8.8–151.9) more common among African American women than among white women. The frequencies of genotypes known to down-regulate the anti-inflammatory cytokine IL-10 (IL10-819T/T and IL10-1082A/A) were elevated 3.5-fold (95% CI 1.8–6.6) and 2.8-fold (95% CI 1.6–4.9), respectively, in African American women compared with those in white women. Except for the gamma interferon genotype, cytokine genotypes found to be more common in African American women were consistently those that up-regulate inflammation (Hassan et al., 2003; Ness et al., 2004).
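To make the notion of allele frequency concrete, the sketch below counts alleles from genotype counts at a single biallelic locus and compares two groups. The genotype counts, group labels, and function name are hypothetical and are not taken from the cited studies.

```python
def variant_allele_frequency(n_ref_ref: int, n_ref_var: int, n_var_var: int) -> float:
    """Frequency of the variant allele at a biallelic locus.

    Each person carries two alleles: heterozygotes contribute one variant
    allele and variant homozygotes contribute two.
    """
    total_alleles = 2 * (n_ref_ref + n_ref_var + n_var_var)
    variant_alleles = n_ref_var + 2 * n_var_var
    return variant_alleles / total_alleles


# Hypothetical genotype counts (reference/reference, reference/variant, variant/variant).
freq_group_a = variant_allele_frequency(50, 40, 10)
freq_group_b = variant_allele_frequency(81, 18, 1)

print(f"group A: {freq_group_a:.2f}, group B: {freq_group_b:.2f}")
```

Even when a variant has the same biological effect in both groups, a difference in frequency of this kind changes how many individuals in each group carry the at-risk genotype.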

Genetic Effects

Another possible reason for a genetic influence on racial-ethnic disparities in preterm birth rates is the variation in the effect of a given genetic variant between racial groups. However, data that can be used to either support or disprove this hypothesis are limited. One study examined the genetic effects of 43 validated gene-disease associations among 697 study populations of various descents (Ioannidis et al., 2004). The frequencies of the genetic marker of interest in the control populations often (in 58 percent of the studies) showed large heterogeneity (statistically significant variability) between people of different races. Conversely, large heterogeneity in the genetic effects (ORs) between races was found for only 14 percent of the studies. This finding suggests that the frequencies of genetic markers of complex diseases often vary among populations, but their biological effects may usually be consistent across traditional racial-ethnic boundaries.

Gene-Environment Interactions

Because the constellation of socioenvironmental variables known to affect the risk of preterm birth is not equitably distributed across racial-ethnic groups, the interaction of these factors with genetic predispositions may also produce highly disparate clinical outcomes. As discussed in the previous section, this area needs to be further investigated.

In summary, the question of whether genetics explains a substantial proportion of health disparities in preterm birth is largely unanswered. It is anticipated that as populations increasingly become admixed (that is, as populations increasingly comprise couples of different ancestries), race and ethnicity will become even more inaccurate proxies for health risk. Without discounting self-identified race or ethnicity as a variable correlated with health, researchers must move beyond these weak and imperfect relationships. We need to understand not only what is downstream from race or ethnicity but also the upstream factors that explain how and why a racial-ethnic group’s disparity in health or disease exists. Such information may shed light on the pathways and mechanisms explaining why race-ethnicity is associated with health or disease. Furthermore, future genetic epidemiological studies need to employ advanced methodology in dealing with admixed populations (see section on Methodological Issues below).


Although multiple genetic markers have been identified to be potentially associated with preterm birth, preterm labor, or PPROM, none of the markers has been adequately validated as a cause of preterm birth in studies with various populations and no single marker appears to be highly sensitive or specific to preterm birth, preterm labor, or PPROM. Although many gene-environment interaction studies have been conducted with human populations in the past decade, the number of studies that have demonstrated important and consistent positive relationships between genes and the environment is remarkably small. Key methodological issues that need to be carefully addressed in future molecular genetic epidemiological studies of preterm birth are highlighted below.

Definition of Preterm Birth Phenotypes

Despite considerable research efforts, limited progress has been made in understanding the etiology of preterm birth. One important problem is the definition of the preterm birth phenotypes. The current approaches for defining and assessing preterm birth phenotypes are inadequate for etiological research and for making the optimal use of genomic data. Most previous studies have relied exclusively on the conventional definition of preterm birth (less than 37 weeks of gestation). Cases of preterm birth so defined, however, constitute a highly heterogeneous group; and even subgroups of preterm births, defined as very preterm births (less than 32 weeks of gestation), preterm labor, and preterm rupture of membranes, constitute heterogeneous groups. As such, standard genetic or epidemiological analysis may lack the power to detect the causative genes and the environmental risk factors because of the dilution effect.

One way to overcome this challenge is to divide preterm birth into more homogeneous subgroups according to the underlying pathogenic pathways to preterm birth. This will require the incorporation of detailed clinical information (for example, pregnancy complications), pathological examination of the placenta, as well as genetic or nongenetic markers to stratify preterm birth into pathogenically meaningful subgroups. Although division into more homogeneous subgroups increases the complexity of the scientific endeavor and requires much larger sample sizes, it helps to reduce genetic heterogeneity and to enhance the ability to identify genetic associations and the gene-environment interactions that are specific to pathogenic pathways of preterm birth. It also raises the possibility that the specific causes of preterm birth in a specific woman may be identifiable, preventable, or treatable. The availability of such information will be important so that interventions can be targeted to women and their infants with different underlying etiologies for preterm birth in the future.

Analytical Challenges

Analytically, in the simplest case, a case-control study in which exposure and genotype are each dichotomized, the conventional two-by-two table of exposure and disease needs to be expanded to include genotype, which yields a two-by-four table. Displayed in this manner, the raw exposure and genotype data allow relative risk estimates for each factor alone and for their joint effect to be easily generated (Botto and Khoury, 2001). Regression models of interactions can also be used (Neter et al., 1996). However, the burgeoning volume of genetic data provides both unprecedented opportunities and unprecedented challenges for dissecting the genetics of preterm birth. It requires the development and application of innovative statistical methods, which will be further elaborated in the following section.
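The two-by-four layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Botto and Khoury (2001): the counts are hypothetical, the names `Stratum` and `two_by_four` are invented for this example, and the additive measure treats odds ratios as approximations of relative risks.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Case and control counts for one genotype-exposure combination."""
    cases: int
    controls: int


def odds_ratio(stratum: Stratum, reference: Stratum) -> float:
    """Odds ratio of one stratum relative to the reference stratum."""
    return (stratum.cases * reference.controls) / (stratum.controls * reference.cases)


def two_by_four(table: dict) -> dict:
    """Summarize a two-by-four table keyed by (genotype, exposure) flags.

    (0, 0) is the reference group: no risk genotype and no exposure.
    """
    ref = table[(0, 0)]
    or_e = odds_ratio(table[(0, 1)], ref)   # exposure alone
    or_g = odds_ratio(table[(1, 0)], ref)   # genotype alone
    or_ge = odds_ratio(table[(1, 1)], ref)  # joint effect
    return {
        "or_exposure_only": or_e,
        "or_genotype_only": or_g,
        "or_joint": or_ge,
        # Ratio > 1 suggests more than multiplicative joint effects.
        "multiplicative_interaction": or_ge / (or_g * or_e),
        # Value > 0 suggests more than additive joint effects.
        "additive_departure": or_ge - or_g - or_e + 1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = {
    (0, 0): Stratum(cases=100, controls=200),
    (0, 1): Stratum(cases=60, controls=80),
    (1, 0): Stratum(cases=50, controls=100),
    (1, 1): Stratum(cases=90, controls=30),
}
summary = two_by_four(counts)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these hypothetical counts, the genotype alone carries no excess risk (OR 1.0) and the exposure alone a modest one (OR 1.5), whereas the joint OR is 6.0. A single two-by-two table of exposure and disease would hide this pattern.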

Testing of Multiple Genes

The testing of multiple genes is almost inevitable in large-scale studies of candidate genes that play a role in preterm birth. Traditional gene testing approaches study SNPs one at a time, which ignores gene-gene interactions and the linkage disequilibrium among linked SNPs. Such an approach has inherently low power, because the results of testing of multiple genes need to be adjusted by use of a Bonferroni correction to protect against an inflated Type I error. Many researchers argue that the multiplicity problems encountered in genetic epidemiology research require the use of a new paradigm. Haplotype analysis is advantageous, in that more information about variation in a gene can be captured (Nebert, 2002). Haplotype analysis has been applied in two recent studies of candidate genes that may play a role in preterm birth. In one study, haplotypes were inferred by use of the EM algorithm and the Bayesian method (Engel et al., 2005b). In another study, both Gibbs sampling and expectation-maximization were used to reconstruct haplotype phases (Hao et al., 2004). These studies demonstrated the utility of the haplotype-based approach in a large-scale study of candidate genes that may play a role in preterm birth.
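To make the idea of haplotype inference by expectation-maximization concrete, the sketch below estimates haplotype frequencies for two biallelic SNPs from unphased genotypes. It is an illustrative toy, not the software or settings used in the studies cited above. It assumes random mating (Hardy-Weinberg proportions), the genotype sample is hypothetical, and the function name `em_haplotype_frequencies` is invented for this example. Only the double heterozygote has ambiguous phase, so the E-step splits those individuals between the two possible haplotype pairs.

```python
from collections import Counter


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate frequencies of haplotypes 00, 01, 10, 11 from two-SNP genotypes.

    Each genotype is (g1, g2): copies (0, 1, or 2) of allele "1" at SNP 1 and SNP 2.
    """
    tally = Counter(genotypes)
    n_chromosomes = 2 * len(genotypes)
    freqs = [0.25, 0.25, 0.25, 0.25]  # index = 2 * allele_at_snp1 + allele_at_snp2

    for _ in range(n_iter):
        expected = [0.0] * 4
        for (g1, g2), n in tally.items():
            if (g1, g2) == (1, 1):
                # E-step: phase is unknown, either 00/11 (cis) or 01/10 (trans).
                cis = freqs[0] * freqs[3]
                trans = freqs[1] * freqs[2]
                total = cis + trans
                w_cis = cis / total if total > 0 else 0.5
                expected[0] += n * w_cis
                expected[3] += n * w_cis
                expected[1] += n * (1 - w_cis)
                expected[2] += n * (1 - w_cis)
            else:
                # At most one SNP is heterozygous, so the haplotype pair is known.
                a = (0, 0) if g1 == 0 else (1, 1) if g1 == 2 else (0, 1)
                b = (0, 0) if g2 == 0 else (1, 1) if g2 == 2 else (0, 1)
                expected[2 * a[0] + b[0]] += n
                expected[2 * a[1] + b[1]] += n
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        new_freqs = [count / n_chromosomes for count in expected]
        converged = max(abs(x - y) for x, y in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return dict(zip(("00", "01", "10", "11"), freqs))


# Hypothetical sample of 100 unphased genotypes.
sample = ([(0, 0)] * 40 + [(1, 1)] * 30 + [(2, 2)] * 10
          + [(1, 0)] * 10 + [(0, 1)] * 10)
estimated = em_haplotype_frequencies(sample)
for haplotype, freq in estimated.items():
    print(haplotype, round(freq, 4))
```

In this hypothetical sample the two SNPs are in strong linkage disequilibrium, so the algorithm assigns nearly all double heterozygotes to the 00/11 phase. Real analyses involve many SNPs, missing data, and uncertainty in the inferred phase, which this sketch does not address.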

Methods are evolving to include adjustment for covariates in the analysis (Annells et al., 2004a; Engel et al., 2005b; Hao et al., 2004; Schaid et al., 2002; Wang H et al., 2004). Although tests that incorporate haplotypes (especially within a haplotype block with limited haplotype diversity) are suggested to be more powerful than tests that incorporate only single markers, the block structure of haplotypes may not always be evident. It has been shown that under some circumstances the single-marker test is more powerful than the haplotype test. What remains unclear is how haplotyping error affects the results, given the uncertainty in inferring the SNP block structure and in the subsequent association tests performed with unrelated diploid subjects.

Population Admixture

A source of potential confounding in genetic tests is a hidden population genetic structure. The population sampled may consist of several genetically distinct subpopulations that are incompletely mixed. If those populations differ by both the prevalence of a variant allele at the candidate locus and the prevalence or magnitude of a trait, an apparent association between the allele and the trait may simply reflect confounding of the allele’s effect by subpopulation identity. Because exposure prevalences may also vary among genetically distinct subpopulations, exposure and gene-exposure interaction effects can also be biased by the subpopulation structure. For a diverse population such as that of the United States, admixture-induced bias may be relatively small for common variants (Wacholder et al., 2000). Concerns about the possible effects of population admixture have stimulated the development of family-based tests of association (Schaid and Sommer, 1993; Spielman et al., 1993) and of gene-environment interactions (Schaid, 1999; Umbach, 2000), which essentially eliminate the potential bias from population stratification.
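The confounding mechanism described above can be illustrated with a small numerical sketch. The counts are hypothetical and constructed so that the allele has no effect within either subpopulation. The function names are invented for this example, and the stratified estimate uses the standard Mantel-Haenszel pooled odds ratio, which presumes that subpopulation membership is known (in practice the structure is often hidden).

```python
def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table.

    Each stratum is (carrier_cases, carrier_controls,
                     noncarrier_cases, noncarrier_controls).
    """
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data. Subpopulation A: 80% carriers and a high disease rate.
# Subpopulation B: 20% carriers and a low disease rate.
# Within each subpopulation the carrier odds ratio is exactly 1.
subpopulation_a = (240, 80, 60, 20)
subpopulation_b = (20, 60, 80, 240)
strata = [subpopulation_a, subpopulation_b]

crude = crude_odds_ratio(strata)
adjusted = mantel_haenszel_odds_ratio(strata)
print(f"crude OR = {crude:.2f}, stratum-adjusted OR = {adjusted:.2f}")
```

Ignoring subpopulation identity yields a crude OR of about 3.4 even though the allele is unrelated to the outcome within each subpopulation. The stratified estimate returns an OR of 1.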

In addition to conventional stratified analysis by maternal subgroups such as ethnicity, various methods have been used to address population admixture, including the use of a within-population permutation procedure in the association analysis (Hao et al., 2004); the use of ancestry-informative genetic markers to infer and control for population admixture in genetic association studies (Reiner et al., 2005); and the use of admixture-matched cases and controls in genetic association studies (Tsai et al., 2006).

The Role of Maternal Versus Fetal Genes

Children receive half of their alleles from their mother and half from their father. Diseases that develop during gestation may be influenced by the genotype of the mother and the inherited genotype of the embryo-fetus. Understanding of the separate, joint, and synergistic effects of the two relevant genotypes is important to obtaining an understanding of the etiology of the disease. Understanding of these effects may also allow recurrence risk counseling. However, given the correlation between maternal and offspring genotypes, the relative importance of these two interrelated risk factors (or of their interactions with exposures) may be difficult to assess by studies with the commonly used case-control or cohort designs (Umbach, 2000).

The two-step transmission disequilibrium test was the first family-based test proposed for the differentiation of maternal and offspring genetic effects (Mitchell, 1997). However, this approach, which requires data from “pents,” which comprise data for the affected child and the child’s mother, father, and maternal grandparents, provides biased tests for maternal genetic effects when the genotype of the offspring is associated with disease. An alternative approach based on transmissions from grandparents provides unbiased tests for maternal and offspring genetic effects but requires genotype information for the paternal grandparents, in addition to that for the pents (Mitchell and Weinberg, 2005).

Sample Size and Power

In studies of gene-environment interactions, it is difficult to obtain the correct sample size and power estimate, and there are no well-established methods for doing so. Efforts to study complex gene-environment interactions are also tempered by the difficulty of obtaining adequate sample sizes. Two primary factors to be considered are the prevalence of the polymorphism in the population and the magnitude of the effect modification detected, because there is a trade-off between the prevalence of a polymorphism and the magnitude of the effect that may be detected. On the one hand, common variants are less likely to exhibit a strong effect. On the other hand, there is more statistical power in studying these variants because they are more common. Furthermore, the population-attributable risk of common variants will be greater, even if the penetrance in the population is modest.
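The point about common variants and population-attributable risk can be made concrete with Levin's formula, PAF = p(RR - 1) / [1 + p(RR - 1)], where p is the prevalence of the risk factor and RR is its relative risk. The sketch below is only an illustration: the prevalences and relative risks are hypothetical, and the function name is invented for this example.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula for the fraction of cases attributable to a risk factor."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical common variant with a modest effect: 30% carriers, RR = 1.5.
common_modest = population_attributable_fraction(0.30, 1.5)

# Hypothetical rare variant with a strong effect: 1% carriers, RR = 5.
rare_strong = population_attributable_fraction(0.01, 5.0)

print(f"common, modest effect: {common_modest:.3f}")  # about 0.130
print(f"rare, strong effect:   {rare_strong:.3f}")    # about 0.038
```

Under these assumed values, the common variant accounts for roughly three times as many cases in the population as the rare variant, despite its much weaker effect in any individual carrier.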

At present, there are two approaches to estimating sample size and power for studies of gene polymorphisms and their effects: the analytical method and the simulation method. Analytical methods require knowledge of the underlying distributions and genetic models. The available methods usually handle one exposure and one genetic marker at a time, and they are difficult to apply to higher-order interactions. In comparison, simulation methods estimate power by simulated random sampling. This approach is able to deal with higher-order interactions, but it is computationally intensive.
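A simulation-based power estimate of the kind described above might look like the following sketch. It is a simplified illustration under stated assumptions, not an established tool: it simulates a cross-sectional sample (not a case-control sample) with one binary genotype and one binary exposure that are independent of each other, generates disease status from a logistic model, and tests the multiplicative interaction with a Wald test on the log interaction odds ratio from the eight cell counts. All parameter values and the function name are hypothetical.

```python
import math
import random


def simulate_interaction_power(n, p_g, p_e, baseline_risk, or_g, or_e, or_ge,
                               n_sims=100, z_crit=1.96, seed=0):
    """Estimate power to detect a gene-environment interaction by simulation."""
    rng = random.Random(seed)
    base_odds = baseline_risk / (1.0 - baseline_risk)

    # Disease risk for each (genotype, exposure) combination under a logistic model.
    risk = {}
    for g in (0, 1):
        for e in (0, 1):
            odds = base_odds * (or_g ** g) * (or_e ** e) * (or_ge ** (g * e))
            risk[(g, e)] = odds / (1.0 + odds)

    rejections = 0
    for _sim in range(n_sims):
        cells = {(g, e, d): 0 for g in (0, 1) for e in (0, 1) for d in (0, 1)}
        for _person in range(n):
            g = int(rng.random() < p_g)
            e = int(rng.random() < p_e)
            d = int(rng.random() < risk[(g, e)])
            cells[(g, e, d)] += 1
        if min(cells.values()) == 0:
            # Continuity correction so that logarithms and reciprocals are defined.
            cells = {key: count + 0.5 for key, count in cells.items()}
        # Log interaction OR: cells with an odd g + e + d enter with a plus sign.
        log_or = sum((1 if (g + e + d) % 2 else -1) * math.log(count)
                     for (g, e, d), count in cells.items())
        std_err = math.sqrt(sum(1.0 / count for count in cells.values()))
        if abs(log_or / std_err) > z_crit:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenario: 2,000 subjects, 30% carriers, 30% exposed, 10% baseline risk.
power_null = simulate_interaction_power(2000, 0.3, 0.3, 0.10, 1.0, 1.0, or_ge=1.0)
power_alt = simulate_interaction_power(2000, 0.3, 0.3, 0.10, 1.0, 1.0, or_ge=4.0)
print(f"rejection rate with no interaction: {power_null:.2f}")
print(f"power with interaction OR of 4:     {power_alt:.2f}")
```

The same loop extends naturally to additional genes or exposures by adding terms to the risk model, which is where simulation has an advantage over closed-form calculations. The cost is run time, which grows with the number of simulated data sets and subjects.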

Data Management and Integration

With major advances in genotyping technologies, it has become practical and affordable to screen biological samples for multiple polymorphisms, whose functional relevance in relation to disease or to exposure to toxicological agents is known to varying degrees. Investigators have shown increasing interest in the development of an integrated database that links new findings on exposures, etiologic pathways, relevant genes, the polymorphisms in those genes, and their functions.

This database would guide the design of new studies as well as data analysis and interpretation of results (De Roos et al., 2004). The National Institutes of Health has funded the Pharmacogenetics Research Network and Knowledge Base (PharmGKB; http://www.pharmgkb.org and http://www.nigms.nih.gov/funding/pharmacogenetics.html). PharmGKB will become a national resource containing high-quality structured data linking genomic information, molecular and cellular phenotype information, and clinical phenotype information (Klein et al., 2001).

Reporting and Replication of Results

Negative results should have a venue for publication, and an unbiased collection of all results will have considerable value when meta-analyses are conducted (Romero et al., 2002). There has been considerable concern about the lack of ability to replicate the findings of gene-disease association studies. “The literature is full of reports of genetic linkage or association that do not hold up under scientific scrutiny.” “Replication of findings remains a critical step to confirming the presence of such effects” (Vogler and Kozlowski, 2002).

Nevertheless, progress is being made in defining quality standards for genetic-epidemiological research. On the basis of the findings presented at the Human Genome Epidemiology workshop, a checklist for the reporting and appraisal of studies of the prevalence of genotypes and studies of gene-disease associations was developed (Little et al., 2002). This checklist focuses on the selection of study subjects, the analytical validity of genotyping, population stratification, and statistical issues. Use of the checklist should facilitate the integration of evidence from genetic and epidemiological studies of preterm birth (Little et al., 2002). Ongoing evaluation is needed to make sure that such guidelines are refined and are suitable for research on the genetics of preterm birth.


For many years, research on the etiology of preterm birth has primarily focused on demographic, social-behavioral, and environmental risk factors. Until recently, the roles of genetic susceptibility and gene-environment interactions in preterm birth have largely been unexplored. The use of molecular genetic epidemiology represents a promising approach to understanding the role and biological mechanisms of the genetic and environmental factors involved in preterm birth and their interactions in the pathogenesis of preterm birth. New tools for high-throughput genotyping, coupled with very-large-scale population-based studies that use sensitive biomarkers, comprehensive exposure assessment, and advanced biotechnology and analytical strategies, are needed to unravel the complex multiple gene-environment interactions responsible for preterm birth. Understanding these factors and their interactions could lead to major improvements in the diagnosis, prevention, and treatment of preterm birth.