Metagenomics reveals novel microbial signatures of farm exposures in house dust

Indoor home dust microbial communities, important contributors to human health, are shaped by environmental factors, including farm-related exposures. Advanced metagenomic whole genome shotgun sequencing (WGS) improves detection and characterization of microbiota in the indoor built-environment dust microbiome, compared to conventional 16S rRNA amplicon sequencing (16S). We hypothesized that the improved characterization of indoor dust microbial communities by WGS will enhance detection of exposure-outcome associations. The objective of this study was to identify novel associations of environmental exposures with the dust microbiome from the homes of 781 farmers and farm spouses enrolled in the Agricultural Lung Health Study. We examined various farm-related exposures, including living on a farm, crop versus animal production, and type of animal production, as well as non-farm exposures, including home cleanliness and indoor pets. We assessed the associations of the exposures with within-sample alpha diversity and between-sample beta diversity, and the differential abundance of specific microbes by exposure. Results were compared to previous findings using 16S. We found most farm exposures were significantly positively associated with both alpha and beta diversity. Many microbes exhibited differential abundance related to farm exposures, mainly in the phyla Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria. The identification of novel differential taxa associated with farming at the genus level, including Rhodococcus, Bifidobacterium, Corynebacterium, and Pseudomonas, was a benefit of WGS compared to 16S. Our findings indicate that characterization of dust microbiota, an important component of the indoor environment relevant to human health, is heavily influenced by sequencing techniques. WGS is a powerful tool to survey the microbial community that provides novel insights on the impact of environmental exposures on indoor dust microbiota. These findings can inform the design of future studies in environmental health.


Introduction
In the northern hemisphere, humans spend 90% of their lives indoors (U.S. Environmental Protection Agency, 1989), with much of this time spent in the home, where they both contribute to and are exposed to environmental microbiota. Home dust microbiota are commonly captured by vacuuming living spaces, including bedrooms. Exposure to bacterial and fungal communities inside the home has been associated with allergic, atopic, and respiratory conditions in children and adults (Ege et al., 2011;Dannemiller et al., 2016a;Stein et al., 2016;Lee et al., 2018). These associations could reflect the direct impacts of environmental microbial exposure on inhabitants' health, as well as through indirect effects of dust microbiota on the human gut, skin, oral, and respiratory microbiomes (Lax et al., 2014;Dannemiller et al., 2016b;Gupta et al., 2020). Housing characteristics and other environmental exposures have been shown to influence indoor microbial communities, including farm-related exposures (Dannemiller et al., 2016b;Amin et al., 2022;Panthee et al., 2022;Zhou et al., 2023). Living in or near a farm environment entails unique microbial exposures and subsequent health concerns. Farm exposures have been associated with altered microbial composition in home dust, which in turn have been associated with allergic outcomes in adults and children (Birzele et al., 2017;Lee et al., 2018;Kirjavainen et al., 2019;Lee et al., 2021). Identifying environmental factors that influence home dust microbiota is a critical first step in determining exposure pathways relevant to health outcomes.
The emergence and optimization of high-throughput sequencing have enabled new approaches to assessing the composition of bacterial communities present in home dust samples, which have a complex matrix and low microbial biomass compared to host-associated microbiome samples such as stool. 16S rRNA amplicon sequencing (16S) is a traditional next-generation technique in which all amplified products are sequenced from a single gene (i.e., the 16S rRNA gene). The technique is limited, however, because annotation is based on putative associations of the 16S rRNA gene with bacterial taxa defined computationally as operational taxonomic units (OTUs). Thus, specific bacterial entities are not directly sequenced, but rather predicted based on OTUs, and consequently have more uncertainty at the lower taxonomy ranks of genus and species (Campanaro et al., 2018;Laudadio et al., 2018;Breitwieser et al., 2019). Metagenomic whole genome shotgun sequencing (WGS), in which random fragments of the genome are sequenced, is an alternative approach and offers a major advantage in that taxa can be more accurately defined at the genus/species level (Tessler et al., 2017;Laudadio et al., 2018). However, WGS is more expensive and requires more extensive data processing and analysis (Breitwieser et al., 2019;Durazzi et al., 2021). Most of the published data on associations of home dust microbiota with environmental exposures or health outcomes have relied on the older 16S methodology.
Higher taxonomic classification resolution with WGS provides a more comprehensive description of the microbial community, and may improve the ability to detect novel associations with environmental risk factors, which is important when considering environmental health pathways. In human microbial communities, especially the gut microbiome, WGS generally identifies a larger number of unique phyla and higher overall microbial diversity within samples compared to 16S (Logares et al., 2014;Chan et al., 2015;Tedersoo et al., 2015;Clooney et al., 2016;Guo et al., 2016;Ranjan et al., 2016;Tessler et al., 2017;Laudadio et al., 2018;Durazzi et al., 2021). However, results are mixed for environmental samples in water and soil (Fierer et al., 2012;Poretsky et al., 2014). At present, no research has evaluated sequencing methodology on microbial community characterization in indoor home dust samples, and how this will impact the upstream associations with farm and non-farm environmental exposures.
In the present study, we analyzed samples from 781 participant homes in the Agricultural Lung Health Study (ALHS), a study of farmers and their spouses in North Carolina and Iowa, using advanced WGS methods, and evaluated associations with farm and nonfarm exposures found to be important in previous work based on 16S, in this cohort and others (Dannemiller et al., 2016b;Lee et al., 2018;Sitarik et al., 2018). We considered both microbial community diversity levels and specific bacterial taxa, in order to determine whether WGS can provide novel insights into farming environmental exposure pathways, the results of which are relevant to the design of future research integrating environmental health and microbiology.

Study population and design
The Agricultural Lung Health Study (ALHS) is a case-control study of adult asthma nested within the Agricultural Health Study (AHS), a prospective cohort of licensed pesticide applicators, mostly farmers and their spouses, enrolled between 1993 and 1997 (Alavanja et al., 1996). ALHS participants were selected from among AHS participants who were either farmers or farm spouses in North Carolina (NC) and Iowa (IA) and completed an AHS telephone follow-up conducted from 2005 to 2010. ALHS enrolled individuals with asthma diagnosis and current asthma symptoms or medication use along with individuals with symptoms and medication use suggesting likely asthma (n = 1,223). The comparison group was a random sample of AHS participants without these criteria (n = 2,078). The Supplementary Methods further detail study population selection and inclusion criteria. The Institutional Review Board at the National Institute of Environmental Health Sciences approved the study. Written informed consent was obtained from all participants.

Dust sample and environmental exposure data collection
Of the 3,301 ALHS participants, 2,871 received a home visit and had adequate levels of collected dust from the bedroom (Figure 1), as described in Carnes et al. (2017). A trained field technician vacuumed two 1-yd² (0.84-m²) areas, one on the participant's sleeping surface and one on the floor next to the bed, for 2 min each with a DUSTREAM Collector (Indoor Biotechnologies Inc.). The samples were divided into aliquots of 50 mg and stored at −20°C until DNA processing.
During the home visit, information was obtained on environmental factors, including current (past 12 months) farming activities (living on a farm, working with crops, and working with animals), type of animals raised on the farm (beef or dairy cattle, swine, or poultry) and the presence of indoor pets (cats and dogs). Field technicians noted the presence of carpeting in the bedroom and ranked overall home cleanliness on a standardized five-point scale (Arbes et al., 2003). For our analysis, we created a binary variable comprising poor/lower (score of 1 or 2) or good/higher (score of 3-5) home condition. We categorized season of dust collection based on the date of the home visit: March 21-June 20 for spring, June 21-September 20 for summer, September 21-December 20 for fall, and December 21-March 20 for winter.
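The two derived variables described above (season of dust collection and binary home condition) can be illustrated with a minimal sketch. The function names `season_of_visit` and `home_condition` are hypothetical and are not taken from the study's code; only the date windows and score cut-offs come from the text.

```python
from datetime import date


def season_of_visit(visit: date) -> str:
    """Map a home-visit date to the season windows stated in the text."""
    month_day = (visit.month, visit.day)
    if (3, 21) <= month_day <= (6, 20):
        return "spring"
    if (6, 21) <= month_day <= (9, 20):
        return "summer"
    if (9, 21) <= month_day <= (12, 20):
        return "fall"
    # Remaining dates: December 21 through March 20.
    return "winter"


def home_condition(score: int) -> str:
    """Collapse the five-point cleanliness score into the binary variable."""
    if not 1 <= score <= 5:
        raise ValueError("cleanliness score must be between 1 and 5")
    return "poor/lower" if score <= 2 else "good/higher"
```

Comparing (month, day) tuples keeps the boundaries explicit and avoids any dependence on the year of the visit.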

DNA extraction
A random selection (n = 879, including 333 asthma cases) of dust samples was sent for WGS analysis (Figure 1). DNA extraction is described elsewhere (Lee et al., 2018). Briefly, DNA was isolated using a MoBio 96 well plate PowerSoil DNA extraction kit (QIAGEN Inc.), as recommended by the manufacturer, with the modification of loading 0.3-0.5 g per dust sample into each well and incubating in PowerSoil bead solution and C1 buffer at 70°C for 20 min before the bead-beating step to aid in lysis of spores. We quantified DNA using the NanoDrop (A260) (Thermo Fisher Scientific Inc.) and normalized to 5 ng/μl DNA.

FIGURE 1
Workflow of house dust microbiome study in WGS. This workflow includes a summary of sample selection from the Agricultural Lung Health Study (ALHS) (n = 3,301) to the house dust microbiome study with 16S (n = 879) and WGS sequencing (n = 781).

Metagenomic whole genome shotgun sequencing and preprocessing
The University of California San Diego IGM Genomics Center performed library preparation, multiplexing, and whole genome shotgun sequencing using standard techniques (Sanders et al., 2019). Extracted DNA was quantified via Qubit™ dsDNA HS Assay (ThermoFisher Scientific). The library size was selected for fragments between 300 and 700 bp using the Sage Science PippinHT and sequenced as a paired-end 150-cycle run using an Illumina HiSeq2500 v2 in Rapid Run mode.
We performed several quality control steps, which are summarized in Supplementary Figure S1. We first trimmed low-quality reads, duplicates, and adapters based on FastQC results (v0.11.5) (Andrews, 2010). We then identified and removed reads not from microbial genomes, as potential contaminant host genomic sources (human, PhiX, cow, pig, chicken, turkey, horse, goat, sheep, dog, cat, and dust mite genomes) (Supplementary Table S1) using Bowtie2 (Langmead and Salzberg, 2012) and KneadData (v0.7.10) (Beghini et al., 2021). We further assessed the taxonomic classification of sequences using Kraken2 (v2.1.1) (Wood et al., 2019) and obtained accurate estimations of abundance using Bracken (v2.5.0) (Lu et al., 2017) with pre-compiled data comprising RefSeq genomes for bacteria, archaea, eukaryotes, fungi, viruses, and plasmids and NCBI taxonomy information. Supplementary Tables S2, S3 summarize the overall read sequence statistics and proportion of host genome contaminants across samples. Additionally, we accounted for the potential introduction of contaminant DNA sequences during sample collection or laboratory processing by incorporating negative 'blank' sequencing controls of sterile water, with contaminants identified and removed with the decontam R package (v1.10.0) (Davis et al., 2018). A total of 168 taxa were filtered out (Supplementary Table S4). Because dust samples have low microbial biomass (fewer microbes), we performed two sequencing runs, each with separate quality control processes, and then performed abundance pooling across the two runs. At the sample level, we excluded low-quality samples defined by sequencing depths less than 1,000 (Supplementary Figure S2). Rare taxa were filtered out if they did not appear in at least 10 samples (Supplementary Figure S2). This quality control pipeline left 781 samples and 6,528 taxa for downstream analysis. 
A taxonomy chart was created that assigned all taxa to a taxonomic classification across the seven phylogenetic levels: kingdom, phylum, class, order, family, genus, and species. The Supplementary Methods provide details of the bioinformatic procedures.
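The sample-level and taxon-level filters described above can be sketched as follows. This is an illustration only: the function name `filter_counts` and the nested-dictionary layout are hypothetical, and the order of the two steps (samples first, then taxa, with prevalence counted among retained samples) is an assumption that the text does not specify.

```python
def filter_counts(counts, min_depth=1000, min_prevalence=10):
    """Apply depth and prevalence filters to a count table.

    counts: dict mapping sample id -> dict mapping taxon -> read count.
    Assumption: samples are filtered first, and taxon prevalence is then
    counted among the retained samples only.
    """
    # Step 1: exclude samples whose sequencing depth is below min_depth.
    kept = {
        sample: taxa
        for sample, taxa in counts.items()
        if sum(taxa.values()) >= min_depth
    }

    # Step 2: count the number of retained samples in which each taxon appears.
    prevalence = {}
    for taxa in kept.values():
        for taxon, n_reads in taxa.items():
            if n_reads > 0:
                prevalence[taxon] = prevalence.get(taxon, 0) + 1

    # Step 3: drop rare taxa seen in fewer than min_prevalence samples.
    keep_taxa = {t for t, k in prevalence.items() if k >= min_prevalence}
    return {
        sample: {t: n for t, n in taxa.items() if t in keep_taxa}
        for sample, taxa in kept.items()
    }
```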

Statistical analysis
We performed all statistical analyses and visualization in R (v4.0.3) (R Core Team, 2020). We rarefied data to the minimum library size (1,003) across all samples before calculating alpha and beta diversities using the phyloseq R package (v1.34.0) (McMurdie and Holmes, 2013). We considered both non-farming exposures, including state of residence, sex, presence of indoor pets, home condition, and season of dust collection, and farming exposures in the past 12 months, including living on a farm, crop farming, and animal farming. All exposures were treated as binary variables. For season of dust collection, we compared one season to all other seasons combined.
We included asthma as a covariate in all models due to the nested case-control design.
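Rarefying to the minimum library size, as described above, can be sketched in plain Python. The helper names are hypothetical, and subsampling without replacement is an assumption made for this illustration; it is not a statement about the settings used with phyloseq in the study.

```python
import random


def rarefy(sample_counts, depth, rng):
    """Subsample one sample's reads, without replacement, to a fixed depth."""
    if sum(sample_counts.values()) < depth:
        raise ValueError("sample has fewer reads than the target depth")
    # Expand counts into one label per read (fine for a toy example,
    # memory-hungry for real libraries), then draw `depth` reads.
    reads = [taxon for taxon, n in sample_counts.items() for _ in range(n)]
    rarefied = {}
    for taxon in rng.sample(reads, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def rarefy_to_minimum(counts, seed=0):
    """Rarefy every sample to the smallest library size in the table."""
    rng = random.Random(seed)
    depth = min(sum(taxa.values()) for taxa in counts.values())
    return {s: rarefy(taxa, depth, rng) for s, taxa in counts.items()}, depth
```

After this step every sample has the same total read count, so diversity metrics are not driven by differences in sequencing depth.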
To evaluate intra-group alpha diversity and its association with farming and non-farming exposures, we used the Shannon index, exponentially transformed for normality, as the outcome in linear models. We first fitted a baseline univariable regression model for each exposure to identify exposures associated with alpha diversity. We also considered whether associations differed by state of residence (IA or NC) by using product terms. Our final multivariable model included any exposure with a significant association with alpha diversity in the baseline univariable model, along with any significant product terms for the individual interactions of each exposure with state of residence. Detailed analytical formulas are described in the Supplementary Methods (SM3). We set p < 0.05 as the statistical significance threshold for all analyses.
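The alpha diversity outcome described above, the exponentially transformed Shannon index, can be written as a short sketch. The function names are hypothetical, and the use of the natural logarithm is an assumption; with that base, the transformed value can be read as an effective number of equally abundant taxa.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def exp_shannon(counts):
    """Exponentially transformed Shannon index, exp(H)."""
    return math.exp(shannon_index(counts))
```

For example, a sample with four equally abundant taxa has exp(H) equal to 4, and a sample dominated by a single taxon has exp(H) close to 1.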
To explore beta diversity, we calculated unweighted and weighted UniFrac distance metrics. We conducted permutational multivariate analysis of variance (PERMANOVA) to test the differences in microbial community structure across exposure levels using the adonis method in the R vegan package (v2.5.7) (Oksanen et al., 2013;Anderson, 2014). We used the R² value to quantify the percentage of variance explained. As with alpha diversity, we evaluated differences in associations by state. We conducted non-metric multidimensional scaling (NMDS) analysis to visualize the separation between samples by exposure levels in a two-dimensional space using the phyloseq (v1.34.0) (McMurdie and Holmes, 2013) and R ggplot2 (v3.3.6) (Wickham, 2016) packages.
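To make the R² from PERMANOVA concrete, the following is a minimal one-factor sketch that works directly on a distance matrix. It is not the adonis implementation and ignores covariates; the function names and the default of 999 permutations are assumptions for illustration. Sums of squares are computed from squared pairwise distances, R² is the between-group share of the total, and the p-value comes from permuting group labels.

```python
import random


def _sum_of_squares(dist, members):
    """Sum of squared pairwise distances among members, divided by their count."""
    n = len(members)
    return sum(
        dist[members[i]][members[j]] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ) / n


def permanova(dist, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA: returns (R2, pseudo-F, permutation p-value)."""
    n = len(groups)
    levels = sorted(set(groups))
    ss_total = _sum_of_squares(dist, list(range(n)))

    def statistics(labels):
        ss_within = sum(
            _sum_of_squares(dist, [i for i in range(n) if labels[i] == g])
            for g in levels
        )
        ss_between = ss_total - ss_within
        pseudo_f = (ss_between / (len(levels) - 1)) / (ss_within / (n - len(levels)))
        return ss_between / ss_total, pseudo_f

    r2, f_observed = statistics(groups)

    # Null distribution: shuffle group labels and recompute pseudo-F.
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if statistics(labels)[1] >= f_observed:
            hits += 1
    return r2, f_observed, (hits + 1) / (n_perm + 1)
```

A small R² with a small p-value, as reported for several exposures in this study, means the group difference is consistent across permutations even though it accounts for only a minor share of the total variation.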
To identify differentially abundant taxa for each exposure, we used analysis of composition of microbiomes with bias correction (ANCOM-BC, v1.0.5) models (Lin and Peddada, 2020), which are based on a linear regression framework on the log-transformed taxa counts, with exposures as independent variables and sampling fraction as an offset term. To account for variation in sequencing depth, we performed normalization by estimating the sampling fraction using the ANCOM-BC built-in algorithm. We tested taxa at the OTU level and summarized the results by genus and phylum rank. We also calculated the log2 fold-difference, which is the log2-transformed ratio of the mean abundance, after normalization by ANCOM-BC, between exposure levels. We controlled the false discovery rate (FDR) at 0.05 with the Benjamini-Hochberg (BH) method (Benjamini and Hochberg, 1995). We determined a taxon to be significantly differentially abundant if it had both p < 0.05 after FDR correction and a log2 fold-difference larger than 1 or smaller than −1. We performed sensitivity analyses to evaluate differences in associations by state of residence.
Lee et al. (2018) analyzed samples from the same population with 16S rRNA amplicon sequencing (V3-V4 region); the detailed sequencing method can be found in the Supplementary Methods (SM4) and in Lee et al. (2018). To examine differences in house dust microbial profiles between these two methods, we compared the taxonomic chart from our WGS data to the previous 16S data to determine the number of unique and overlapping microbial organisms, at the phylum rank, detected by each sequencing method. We noted how common or rare the uniquely identified phyla were based on the frequency of assigned taxa and the relative abundance across samples. In addition, we evaluated the differences between alpha diversities (richness and Shannon index) generated by the two sequencing methods by calculating Spearman's correlation coefficient.
[...] (247). Sixty percent of participants were male.
Indoor pets (dogs or cats) were present in 43% of homes. Most homes (78%) were in good/higher cleanness, and nearly all had carpeted floors (93%). Overall, 83% of participants lived on a farm, 56% farmed crops in the past 12 months, and 51% worked with farm animals in the past 12 months. Of the 401 (51%) participants who reported animal farming, 281 worked with beef cattle, 48 worked with dairy cattle, 120 worked with hogs, and 90 worked with poultry. Overall, 31% of dust samples were obtained in summer, 25% in spring, 20% in fall, and 23% in winter. Current asthma was present in 296 (37.9%) participants and the overall mean age of participants was 62 years (standard deviation 11).
After filtering out samples with low sequencing depth and filtering out rare taxa, 781 samples and 6,528 taxa remained for downstream analysis with 183,025,561 reads across all samples. At the Kingdom phylogenetic level, 5,661 taxa were assigned to Bacteria, 156 to Archaea, 96 to Eukaryota, and 615 to viruses, with an average of 2,247 (±1,226) taxa per sample (n = 781). Figure 2 outlines the phylum composition across all samples. Among the 59 phyla identified from WGS, 16 had relative abundance greater than 1% in at least one sample (Figure 2 and Supplementary Table S5). Phyla Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes were the most prominent among home dust microbial communities. At lower taxonomy rank, 1,789 unique genera were identified, of which 36 had relative abundance greater than 10% in at least one sample. The five most abundant genera were Mycobacterium, Serratia, Toxoplasma, Lactobacillus, and Alcaligenes (Supplementary Table S6).

House dust microbial community diversity analysis
Figure 3 shows the association between alpha diversity and each exposure. The presence of indoor pets and farming status (living on a farm, crop farming, animal farming with beef cattle, hogs, and poultry) were positively associated with alpha diversity, while good/higher home cleanliness was negatively associated with alpha diversity (p < 0.05). State of residence had a suggestive, but not statistically significant, association with alpha diversity (p = 0.057). In our multivariable primary model including all statistically significant exposures and all significant interaction terms with state of residence, living on a farm and animal farming remained significantly positively related to alpha diversity (Supplementary Table S7).
For beta diversity, PERMANOVA analysis revealed significant differences for all demographic characteristics and exposure levels based on unweighted UniFrac distance, although the percent variance explained by the exposure groups (R² values) was small (Supplementary Figure S3). Current [...] (Figures 4A,B). The differences in the microbial composition of home dust samples by state of residence explained around 1% of the variance of bacterial communities (p = 0.001) (Figure 4C). The results with weighted UniFrac distance were similar to those with the unweighted metric (Supplementary Figure S4).

FIGURE 2
Relative abundance at the phylum level across all home dust samples. The 16 phyla with relative abundance greater than 1% in at least one sample are color-coded according to the legend. All other phyla are represented in gray.

FIGURE 3
Association between exposures and alpha diversity (Shannon index with exponential transformation). Data were rarefied to the minimum library size (1,003) across all samples. Effect size refers to the coefficient from the regression model (difference in alpha diversity for yes versus no for each exposure). The 95% confidence interval (CI) and p-value for each exposure from the regression model are reported.

Differential abundance analysis of individual taxa
There were 372 unique taxa, belonging to 175 genera within 16 unique phyla, that were differentially abundant in relation to at least one exposure (Supplementary Tables S8, S9 and Supplementary Figure S5). Animal farming and living on a farm were associated with more differentially abundant taxa than non-farming exposures. Figure 5 includes volcano plots of differentially abundant taxa related to the presence of indoor pets, living on a farm, crop farming, and animal farming in the past 12 months, color coded by phylum. The top 10 taxa based on FDR values are labeled by their genus rank. Working with hogs was associated with the greatest number of differentially abundant taxa compared with other types of farming animals (Figure 5A and Supplementary Figure S5).
Living on a farm was associated with differential abundance of 101 taxa (increased abundance for 100 taxa and decreased abundance for one taxon in genus Dickeya), which were mainly in phyla Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria (Figure 5B). Among the top 10 taxa, two were in genus Bifidobacterium. The 26 differentially abundant taxa related to crop farming, all of which had increased abundance, were mainly in phyla Actinobacteria, Firmicutes, and Proteobacteria (Figure 5C). The most significant taxa were in genera Methanobrevibacter and Jeotgalibaca. Animal farming was associated with increased abundance for 191 taxa and decreased abundance for one taxon in phylum Firmicutes (Figure 5D). Genera Methanobrevibacter, Jeotgalibaca, Corynebacterium, Chryseobacterium, Glutamicibacter, Pseudomonas, and Rhodococcus were among the top 10 taxa. Forty-nine taxa were differentially abundant for the presence of indoor pets, mostly in phyla Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria, and Proteobacteria (Figure 5E). The taxa with the smallest FDR values were in genera Frederiksenia and Porphyromonas. Only a few differentially abundant taxa, belonging to phylum Proteobacteria, were related to the season of dust collection (Supplementary Table S8 and Supplementary Figure S5).
Many differentially abundant taxa were shared among exposures, but there were some taxa uniquely related to individual farming exposures (Figure 6 and Supplementary Table S9). In particular, there were 103 taxa assigned to 67 genera within 7 phyla (Proteobacteria, Actinobacteria, Bacteroidetes, Euryarchaeota, Firmicutes, Tenericutes, Chloroflexi) specific to animal farming. For crop farming, 2 taxa were unique: Tatumella citrea in phylum Proteobacteria and Fusarium graminearum in phylum Ascomycota (Supplementary Table S9). In terms of specific type of farm animals, 89 taxa were unique to hogs, including Clostridium, Campylobacter, Pseudomonas, and Streptococcus suis; 14 were unique to poultry, including Enterococcus, Brucella, and Escherichia genera; 5 were unique to dairy cattle, including Mycoplasma and Acinetobacter; and 26 were unique to beef cattle, including Corynebacterium and Bacillus (Supplementary Table S9). Several taxa were identified in multiple types of farming animals: 15 taxa were shared among hogs, beef cattle, and dairy cattle; only one taxon (Carnobacterium sp._CP1) was common among hogs, poultry, and beef cattle; and 24 taxa, including Methanobrevibacter, were related to either cattle type (Figure 6 and Supplementary Table S9).
As for non-farming exposures, 44 taxa were uniquely differentially abundant for presence of indoor pets, including animal-related Staphylococcus species pseudintermedius and felis. Additionally, 4 taxa were unique to home condition, 16 unique to carpeting, and 3 unique to spring dust collection (Supplementary Table S9).
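The decision rule behind the differential abundance results in this section (a BH-adjusted p-value below 0.05 together with a log2 fold-difference above 1 or below −1) can be sketched as follows. The function names are hypothetical, and the sketch operates on already computed p-values and fold-differences rather than reproducing ANCOM-BC itself.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_abundant(pvalues, log2_fd, alpha=0.05, min_abs_log2_fd=1.0):
    """Flag taxa passing both the FDR and the fold-difference thresholds."""
    adjusted = bh_adjust(pvalues)
    return [
        q < alpha and abs(fd) > min_abs_log2_fd
        for q, fd in zip(adjusted, log2_fd)
    ]
```

Requiring both criteria means that a taxon with a very small p-value but a fold-difference of less than twofold in either direction is not counted as differentially abundant.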

Sensitivity analysis by state of residence
For interaction effects by state of residence with either alpha or beta diversity, only sex, home condition, crop farming, general animal farming, beef cattle farming, and spring dust collection had significant interactions, but most effect sizes were minimal (Supplementary Tables S10-S12). Therefore, we did not carry interaction products into the differential abundance analysis. When stratifying by state of residence, several exposures, including the presence of indoor pets, living on a farm, and general animal farming, were significantly associated with either alpha or beta diversity in Iowa, where about 2/3 of participants resided, but not in North Carolina, which had a much smaller sample size (Supplementary Tables S13, S14). Fourteen phyla were consistent for both states with differentially abundant taxa by at least one exposure (Supplementary Table S8 and Supplementary Figures S6, S7).

Additional findings with WGS compared with 16S rRNA sequencing results
Whole genome shotgun sequencing data identified many more taxa and phyla than 16S rRNA. The 6,528 taxa identified by WGS data were assigned to 59 phyla, compared to 1,346 taxa from 18 phyla for 16S. The three phyla with the largest proportion of taxa assignment (most frequent) for WGS results (Proteobacteria, Actinobacteria, Firmicutes) were identical for 16S results. Among the 18 phyla identified from 16S sequencing, 17 were present in the WGS results (Figure 7 and Supplementary Table S5). Forty-two phyla were uniquely identified by WGS, of which the most frequent phyla were Uroviricota with 518 (7.9%) taxa assigned, Ascomycota with 51 (0.8%) taxa assigned, Spirochaetes with 38 (0.6%) taxa assigned, Cossaviricota with 35 (0.5%) taxa assigned, and Apicomplexa with 25 (0.4%) taxa assigned (Supplementary Table S5). Additionally, many of the unique phyla in WGS were not rare, including Apicomplexa with average relative abundance across all samples at 3%, and Ascomycota, Cossaviricota, Basidiomycota, Nucleocytoviricota, and Uroviricota at 2% each (Supplementary Table S5). When examining differences in the alpha diversity of results from WGS and 16S sequencing, Spearman's correlation coefficients for richness (rho = 0.413, p < 2.2e-16) and the Shannon index (rho = 0.355, p < 2.2e-16) were moderate.
Because more microbial organisms were detected by WGS, we observed additional associations with farming exposures compared to 16S data presented by Lee et al. (2018). Notably, a unique phylum (Ascomycota) detected only by WGS was significantly associated with crop farming. One of the phyla identified by both WGS and 16S (Tenericutes) had differentially abundant taxa based on animal farming using WGS but not with 16S (Supplementary Tables S5, S8). In addition, WGS provided the ability to assign taxa to the genus taxonomic level, including the 175 genera with differentially abundant taxa related to at least one exposure (Supplementary Table S8), compared to 16S results at the phylum and family levels. Of the 175 genera, 16 had relative abundance greater than 10% in at least one sample, including Lactobacillus, Staphylococcus, and Bacillus (Supplementary Tables S6, S8).
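The comparison of per-sample alpha diversity between the two sequencing methods relies on Spearman's rank correlation. A minimal sketch, with hypothetical function names and average ranks for ties, is shown below; it takes paired values (for example, WGS and 16S richness for the same samples) and returns rho only, without a p-value.

```python
def _average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = _average_ranks(x), _average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks are used, rho measures whether samples that are more diverse by one method also tend to be more diverse by the other, regardless of the different scales of the two methods.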

Discussion
In this study, we evaluated the associations between farming exposures and house dust microbiota using the whole genome shotgun sequencing method in a US agricultural population. Our results indicate that both indoor microbial diversity and composition in homes differ in relation to current farming exposures; living on a farm, and crop and animal farming were associated with increased within-sample microbial diversity levels and altered microbial composition. Expanding on our previous findings performed with 16S rRNA gene amplicon sequencing, we identified four times more unique microbial taxa. The improved detection of unique taxa with WGS enabled us to detect novel associations between farm exposures and increased abundance of specific microbes including Rhodococcus, Bifidobacterium, Corynebacterium, and Pseudomonas. Enhanced identification of factors that impact the indoor microbiome can improve understanding of environmental exposure pathways relevant to human health.

FIGURE 6
Differentially abundant taxa related to various types of farming animal (FDR < 0.05). Commonly identified differentially abundant taxa shared by farming animal types are connected by lines (orange), while differential taxa unique to a farm animal type are identified by a single dot (blue).

FIGURE 7
Venn diagram of the number of phyla identified in WGS (blue) and 16S (orange). Seventeen phyla were identified by both methods (Supplementary Table S14).

A unique aspect of this study was the use of the whole genome shotgun sequencing technique, compared to many previous home dust microbiome studies that use the 16S rRNA amplicon sequencing technique (Lee et al., 2018;Kirjavainen et al., 2019). This work is the first reported to use WGS to evaluate farm exposures in home dust microbiota. WGS has the advantage of sequencing the entire microbial genome, versus just a single gene, which can more accurately assign taxonomic classifications (Rausch et al., 2019). In this study, the use of WGS identified more unique microbial phyla: 42 phyla were found only using WGS, including both common and rare taxa, versus only one phylum using the 16S technique. Detection of a greater number of unique phyla from WGS compared to 16S enables better characterization of the mixed, complex microbial composition of indoor dust in homes. Consequently, we observed novel environmental exposure associations with the newly detected microbial outcomes from this more comprehensive WGS method. Expanded taxonomic detection and depiction, as well as the development of updated, robust bioinformatic and statistical tools for metagenomic data (Berg et al., 2020), will then have downstream effects on the interpretation of associations with environmental exposures.
Consistent with findings using 16S, our data with WGS found that numerous bacteria were associated with environmental exposures across various phyla. At the phylum level, Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria were positively associated with farm exposures, including living on a farm and crop and animal farming. These trends are similar to our findings using 16S, which found Firmicutes and Proteobacteria to be associated with farm exposures. In previous research, these phyla have been associated with various health conditions, such as asthma, atopy, and cardiometabolic outcomes (Ley et al., 2006;Abrahamsson et al., 2012;Lynch et al., 2014). However, in our previous 16S results, crop farming was associated with significantly decreased abundance of taxa in 16 of the 19 phyla (Lee et al., 2018), whereas using WGS all 26 of our significantly associated taxa had an increased abundance with crop farming. Complementary studies evaluating home dust in Germany and Finland (Kirjavainen et al., 2019) and classroom dust in China (Fu et al., 2021) have found positive associations between nearby farm exposure and increased abundance of Proteobacteria (including the class Alphaproteobacteria) and Actinobacteria. In turn, studies have shown that the presence in house dust of some bacteria in the Bacteroidetes and Firmicutes phyla is associated with lower risk of atopy and lower risk of asthma in early life, an important health implication for workers' children (Lynch et al., 2014;Bacharier et al., 2019).
WGS enables improved classification of microbial taxa at lower taxonomic levels, including the identification of genera that are differentially abundant by environmental exposures. Using WGS, we ascertained genera that were associated with our farming exposures, including Rhodococcus, Bifidobacterium, Corynebacterium, and Pseudomonas. Rhodococcus and Corynebacterium, gram-positive bacteria, and Pseudomonas, a gram-negative bacterium, are found commonly in environmental sources (Weinstock and Brown, 2002;Wong et al., 2010;De Bentzmann SaP, 2011). Certain strains of each can be pathogenic in immunocompromised individuals (Weinstock and Brown, 2002;Wong et al., 2010;De Bentzmann SaP, 2011), and their abundance has been shown to be elevated in dust from the homes of children with asthma and atopy (Valkonen et al., 2018). Increased abundance of these potential pathogens in the homes of farmers can have important health implications, for both infectious and allergic outcomes, for the workers and their cohabitating family members. Pseudomonas was also found to be increased using WGS in classroom dust samples in rural regions near farms compared to suburban areas in China (Fu et al., 2021). Interestingly, Rhodococcus, Pseudomonas, and Methylobacterium (another microbe positively associated with farm exposures in our data) have been previously identified in agricultural settings, where they can act as bioremediation agents and degrade certain pesticides (Pujar et al., 2022). Bifidobacterium is ubiquitous in the human and animal gastrointestinal tract and is associated with positive gut homeostasis, inhibition of pathogen colonization, and modulation of the local and systemic immune system (Kau et al., 2011;Fiocchi et al., 2012).
We observed that Methanobrevibacter and Jeotgalibaca, both previously associated with cattle rumen and manure fermentation (Skillman et al., 2006;Hatti-Kaul et al., 2018), were increased with crop and animal farming, and unique to dairy and beef cattle farming, which is consistent with previous studies evaluating farm exposures in human microbial communities (Shukla et al., 2017;Kirjavainen et al., 2019;Kraemer et al., 2021). Two taxa unique to crop farming, Tatumella citrea and Fusarium graminearum, are pathogens associated with grain production (Goswami and Kistler, 2004;Bull et al., 2012). Reassuringly, we noted an increased abundance of microbes specific to farm and companion animals associated with concurrent exposure to those animals, such as Streptococcus suis with hog farming exposure (Staats et al., 1997) and Staphylococcus pseudintermedius and Staphylococcus felis with dog and cat exposure (Bannoehr and Guardabassi, 2012;Sepich-Poore et al., 2021).
Our findings suggest that home dust microbial diversity, both alpha and beta, differs by exposure to farming activities as well as to pets. Overall, the findings from this study were generally similar to those from our previous analysis using 16S (Lee et al., 2018). For beta diversity, we found distinct microbial community structure based on farm and non-farm exposures, which was significant for all explored variables, similar to results from 16S. The coefficient-of-determination R-squared (R2) statistic was greater using 16S, which supports the hypothesis that WGS resulted in more diverse microbial community identification with greater heterogeneity, so the same exposure would account for less of the variability. Both WGS and 16S findings had low R2 explained variance, consistent with previous research (Dunn et al., 2013). Both analyses showed positive associations between alpha diversity and crop and animal farming. Living on a farm was a significant factor using WGS but not 16S. In addition, there were differences based on the type of animal production, with hog production having a positive association using WGS but not 16S, and dairy cattle production having a positive association using 16S but not WGS (although there was a positive trend).
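The reasoning about R2 above can be illustrated with a small sketch. It assumes a PERMANOVA-style partition of squared Bray-Curtis dissimilarities, which is a common way to obtain an R2 for beta diversity but is an assumption here, not a description of the study pipeline. The function names and the toy count tables are hypothetical.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(u) + sum(v)
    return num / den if den else 0.0


def permanova_r2(samples, groups):
    """Share of total squared dissimilarity explained by group membership.

    R2 = 1 - SS_within / SS_total, the partition used in PERMANOVA.
    """
    n = len(samples)
    d2 = {
        (i, j): bray_curtis(samples[i], samples[j]) ** 2
        for i, j in combinations(range(n), 2)
    }
    ss_total = sum(d2.values()) / n
    if ss_total == 0:
        raise ValueError("all samples are identical; R2 is undefined")
    ss_within = 0.0
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d2[p] for p in combinations(members, 2)) / len(members)
    return 1.0 - ss_within / ss_total


# Hypothetical toy data: two farm homes, two non-farm homes.
groups = ["farm", "farm", "nonfarm", "nonfarm"]

# Coarse profile with three taxa (illustrative only).
narrow = [[10, 0, 0], [9, 1, 0], [0, 0, 10], [0, 1, 9]]

# Same homes, plus four extra taxa that are each seen in one home only,
# mimicking the additional heterogeneity a deeper method may detect.
wide = [
    [10, 0, 0, 5, 0, 0, 0],
    [9, 1, 0, 0, 5, 0, 0],
    [0, 0, 10, 0, 0, 5, 0],
    [0, 1, 9, 0, 0, 0, 5],
]

r2_narrow = permanova_r2(narrow, groups)
r2_wide = permanova_r2(wide, groups)
print(f"R2 narrow profile: {r2_narrow:.3f}")
print(f"R2 wide profile:   {r2_wide:.3f}")
```

In this toy setting the exposure separates the homes equally well in both tables, but the extra home-specific taxa raise the within-group dissimilarity, so the same exposure explains a smaller share of the total variability.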
The differences in associations between exposures and Shannon alpha diversity in the WGS data compared to our previous 16S data are to be expected, given the differences between the methods and the batch effects that arise when comparing two methods run 3 years apart in different laboratories. Alpha diversity was slightly higher in WGS than in 16S samples, with moderate correlation between the two (Spearman's rho = 0.36). This is unsurprising, as a greater number of unique microbes were identified with WGS, and is similar to previous research on environmental samples (Tessler et al., 2017). The discrepancies in measurements and effect sizes between WGS and 16S can lead to altered interpretations regarding risk factors for dysbiosis in home dust microbial composition and highlight how the processing of microbiome samples can impact downstream analyses. The positive associations between farm exposures and alpha diversity reinforce trends observed in other literature (Stein et al., 2016;Birzele et al., 2017;Kirjavainen et al., 2019;Fu et al., 2021;Amin et al., 2022), in addition to our prior 16S analyses (Lee et al., 2018). In a study of 203 homes in Finland and Germany, homes located on farms had significantly higher indoor microbial richness and diversity compared to rural non-farm home indoor dust, which was associated with decreased asthma risk in child inhabitants (Kirjavainen et al., 2019). Amin et al. (2022) reported that airborne bacterial diversity was higher in farmers' indoor environments than in suburban homes. Using WGS, a study in Shanxi Province, China, found higher microbial diversity in schools in rural areas near farms compared to urban non-farm schools (Fu et al., 2021).
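As a reminder of what the two statistics in this paragraph measure, the following minimal sketch computes the Shannon index from a vector of taxon counts and Spearman's rho between paired per-sample values. It uses only the Python standard library. The sample values are invented for illustration and are not study data.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical counts for one dust sample profiled by two methods.
h_16s = shannon([50, 30, 20])
h_wgs = shannon([40, 25, 15, 10, 5, 5])  # more taxa detected
print(f"Shannon (16S-like profile): {h_16s:.2f}")
print(f"Shannon (WGS-like profile): {h_wgs:.2f}")

# Hypothetical paired Shannon values across five homes.
alpha_16s = [2.1, 3.0, 2.6, 3.4, 2.9]
alpha_wgs = [3.2, 3.1, 3.9, 3.8, 3.3]
print(f"Spearman rho: {spearman(alpha_16s, alpha_wgs):.2f}")
```

Because Spearman's rho depends only on ranks, a uniform upward shift in diversity under one method leaves it unchanged; a moderate rho indicates that the two methods also order the homes differently.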
While exposure to highly diverse environments in early life can be protective for some allergic outcomes (Ege et al., 2011;Depner et al., 2020), the consequences of exposure to a diverse environmental microbiome in adults are less well studied and some opposite associations have been seen. In studies of adults, high environmental microbial diversity is associated with more asthma symptoms and worsening asthma severity (Dannemiller et al., 2016a;Lai et al., 2018). On the other hand, within our farming population, lower bacterial diversity levels in home dust were associated with atopy and hay fever (Lee et al., 2021).
A limitation of this work is that we have only a single dust sample per household, collected in the bedroom. Thus, we assume the sample reflects the normal home condition. To the extent that microbial composition differs across the household (Zhou et al., 2023), this may not be true. However, people spend about a third of their time in the bedroom, making this a logical single sampling location. This limitation would be expected to lead to nondifferential misclassification and a bias toward the null. Our work benefits from an advanced next-generation technique, whole genome shotgun sequencing, to explore the impact of detailed farm exposures on the indoor microbiome in a large sample compared to previous studies. One disadvantage of this technique is that we could not assess absolute abundance of specific microbes, including pathogens (Nayfach and Pollard, 2016). Progress has been made toward doing so by combining sequencing with density measurements from flow cytometry (Hingamp et al., 2013) or quantitative PCR (Liu et al., 2012), and by incorporating DNA or mRNA standards (Satinsky et al., 2013). Nevertheless, the improved detection from WGS across novel phyla at the genus level adds insights into factors influencing the built environment microbiota, which is a key contributor to host microbiome composition and subsequent health outcomes. Future investigations of the functional capabilities of the dust microbiota, such as the presence of antibiotic resistance genes, can help better understand human health and disease etiology related to environmental exposures.
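The distinction between relative and absolute abundance noted above can be shown with a simplified sketch. It assumes that an independent total cell density per sample is available (for example from flow cytometry) and that relative abundance can be scaled by it directly. Both are simplifying assumptions, and the taxa, read counts, and densities below are invented.

```python
def relative_abundance(counts):
    """Convert read counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def absolute_abundance(counts, cells_per_mg):
    """Scale relative abundance by an externally measured total density.

    cells_per_mg is a hypothetical per-sample density measurement;
    sequencing alone does not provide it.
    """
    rel = relative_abundance(counts)
    return {taxon: frac * cells_per_mg for taxon, frac in rel.items()}


# Hypothetical read counts and densities for two homes.
home_a = {"Pseudomonas": 100, "Other": 900}
home_b = {"Pseudomonas": 50, "Other": 950}

rel_a = relative_abundance(home_a)["Pseudomonas"]
rel_b = relative_abundance(home_b)["Pseudomonas"]
abs_a = absolute_abundance(home_a, cells_per_mg=1e6)["Pseudomonas"]
abs_b = absolute_abundance(home_b, cells_per_mg=4e6)["Pseudomonas"]

print(f"Relative: home A {rel_a:.2f}, home B {rel_b:.2f}")
print(f"Absolute: home A {abs_a:.0f}, home B {abs_b:.0f} cells/mg")
```

In this invented example the taxon has a lower relative abundance in home B but a higher absolute load, which is the kind of difference that relative abundance data cannot reveal.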

Conclusion
We evaluated a comprehensive set of factors related to farming to determine their influence on the home dust microbiome, assessed using state-of-the-art whole genome shotgun sequencing. The increased identification of microbial entities by WGS led to detection of associations missed using older 16S technology. Identifying significant predictors of indoor built environment microbiota is an important element in understanding environmental exposure health pathways. The use of advanced whole genome shotgun sequencing techniques produced novel insights into these health pathways, and WGS may be considered an optimal metagenomic method for future environmental health studies.

Data availability statement
The datasets presented in this study can be found in online repositories. Microbiome sequencing data are available at the Sequence Read Archive (SRA) under project number PRJNA975673 (https://www.ncbi.nlm.nih.gov/sra/).

Ethics statement
The studies involving human participants were reviewed and approved by Institutional Review Board, National Institute of Environmental Health Sciences. The patients/participants provided their written informed consent to participate in this study.

Author contributions
SL and ML were responsible for study design and data acquisition. CP and LB initiated the ALHS study and were responsible for the sample collection. ZW designed and performed all bioinformatics and statistical analyses, with SZ, AM-R, KD, SL, and ML providing analytical input. QZ, AG, and RK planned the shotgun metagenomics sequencing and prepared raw sequence data. ZW and KD formulated the research ideas and drafted the manuscript. All authors contributed to the interpretation of results and editing of the manuscript.

Funding
This work was supported by the Intramural Research Program of the National Institutes of Health (NIH), the National Institute of Environmental Health Sciences (NIEHS) (Z01-ES049030 and Z01-ES102385), the National Cancer Institute (Z01-CP010119B), and by American Recovery and Reinvestment Act funds. The Center for Microbiome Innovation at the University of California San Diego provided support by generating sequencing data.