Comparative Genomics Identifies Features Associated with Methicillin-Resistant Staphylococcus aureus (MRSA) Transmission in Hospital Settings

ABSTRACT Methicillin-resistant Staphylococcus aureus (MRSA) is a serious public health concern in the United States. Patients colonized and/or infected can transmit MRSA to healthcare workers and subsequent patients However, the components of this transmission chain are just becoming evident, including certain patient factors, specific patient-healthcare worker interactions, and microbial factors. We conducted a comparative genomic analysis of 388 isolates from four hospitals in three states: Maryland, California, and New York. Isolates from nasal surveillance or clinical cultures were categorized as high, moderate, or low transmission surrogate outcomes based on the number of times the species was identified on the gloves or gowns of healthcare providers. The comparative analyses included a single gene, multigene, and core genome phylogenetic analysis, as well as a genome-wide association analysis to identify molecular signatures associated with the observed transmission surrogate outcomes, geographic origin, or sample source of isolation. Based on the phylogenetic analysis, 95% (n = 372) of the MRSA isolates were from four well-described genomic clades, with most of the isolates being part of the USA300 containing clade (n = 187; 48%). Genome-wide association studies also identified genes that were exclusive or prevalent among specific geographic locations. The identified genes provide insights into the transmission dynamics of MRSA isolates providing additional insights into the basis of the geographical differences of MRSA for molecular diagnostics. IMPORTANCE Methicillin-resistant Staphylococcus aureus (MRSA) is considered a serious threat to public health and contributes to the dissemination of S. aureus in both the healthcare and community setting. Transmission of MRSA between patients via healthcare worker (HCW) has been described. However, what is not understood are the genetic determinants that contribute to the transmission of MRSA from patients to HCWs. In this study, we demonstrated that certain genes may be associated with transmission in the hospital setting.

1,000 hospitalizations) (3). A third of all individuals in the United States are colonized with S. aureus, and 2% are colonized with MRSA (4,5). Patients who are colonized with MRSA can become a reservoir of transmission and have the potential to spread this bacterium to susceptible HCW and other patients in a hospital setting (6). These transmission events may result in the development of symptomatic infection or asymptomatic colonization among patients who acquire MRSA (7).
The likelihood of transmission of a bacterial isolate from a patient to an HCW can be increased by several factors, including patient characteristics and healthcare worker (HCW)-patient interaction factors (8,9). Patient-level risk factors of MRSA colonization and/or infection among patients hospitalized in the intensive care unit (ICU) include medical comorbidities and the presence of medical devices (10)(11)(12)(13). Moreover, several recent studies have examined HCW-patient interaction factors that facilitate bacterial transmission (8,9,14,15). These studies determined that HCW gowns or gloves are contaminated about 16% of the time after any patient contact. O'Hara et al. (9) identified certain HCW activities, such as wound care  [3.04, 9.39]), and nurses (OR: 3.09, CI: [1.84, 5.19]) represented the greatest risk for glove and gown contamination likely due to the activities that they are performing on colonized or infected patients (9).
The genetic determinants of virulence for MRSA have been explored in detail. However, it is unclear if certain genetic virulence determinants may contribute to the altered transmission of MRSA (6,16). Many adhesive mechanisms of MRSA involve the attachment of MRSA to host cells and to artificial surfaces (e.g., gloves and gowns of healthcare workers) including microbial surface components that recognize adhesive matrix molecules (MSCRAMMs), as well as the synthesis of polysaccharides and intercellular adhesion proteins, which were identified via traditional methods of identification (17)(18)(19)(20). The sasC genes have been known to encode colonization factors that promote the spread of MRSA through microbial cell aggregation and biofilm formation (21,22). However, it is not known whether these virulence factors contribute to the identified glove and gown transmission surrogate outcome.
In a previous study from our group, we identified that we could categorize MRSA isolates into glove and gown transmission surrogate outcomes based on the frequent or infrequent transmission to the HCW gowns and gloves, using a culture-based technique (9). In the current study, as in the previous study by O'Hara et al. (9), bacterial transmission is measured by the frequency of HCW glove or gown contamination after performing patient care activities among patients who were colonized and/or infected with MRSA. The objective of the current study was to use comparative genomics to identify genomic regions and genes that are associated with isolates that display the high or low transmission surrogate outcomes and to characterize MRSA genomic content among a large, multistate cohort. We hypothesized that adhesive mechanisms, biofilm aggregation, regulation of virulence, antibiotic resistance mechanisms, and mechanisms of persistence may all play a role in transmission either directly or indirectly. Here, we identified genetic characteristics, such as the LPGTX-motif encoding gene and sasG intercellular aggregation genes, which are both adhesive proteins that may contribute to geographical differences in the hospital setting. Overall, this study provides insight into the genetic components of MRSA transmission.

RESULTS
Phylogenetic analysis of MRSA strains. Phylogenomic analysis was performed on the 388 MRSA isolates (Table 1, Table S1) and six additional reference genomes that were publicly available in GenBank (23,24). There were four main genomic clades identified that represent over 95% of the isolates examined in this collection of isolates Mechanisms of MRSA Transmission mSphere ( Fig. 1 and Table S1). These major clades were aligned with the three dominant MLST clonal complexes (Table S1). Of the four genomic clades, 48.2% (n = 187) of the isolates were in clade 1 (USA300), 43.6% (n = 169) of the isolates were in clade 2 (USA100), 2.3% (n = 9) of the isolates were in clade 3 (USA500), and 1.8% (n = 7) in clade 4 (USA600). The remaining 16 isolates (4.1%) could not be assigned to any of these genomic clades.
We divided the isolates based on transmission frequency: high, moderate, and low transmitters based on previously identified criteria. However, for these analyses, we focused on the high and low transmitter surrogate outcome groups (9). There were no differences in the number of isolates within each of the genomic clades among the transmission groups (P = 0.31) ( Table 2). Among the high transmitter isolates, 52.8% (n = 28) were in genomic clade 2, while 50.8% (n = 21) of low transmitters were in genomic clade 1. Similarly, there were no differences in the distribution of the isolates in the phylogeny among surveillance/clinical cultures or body site of isolation (P = 0.29 and P = 0.46, respectively) ( Table 2). However, there is a statistical difference in phylogeny groups among geographic locations (P = 0.03) ( Table 2). We identified an association with the phylogenomic groups among MD and NY isolates (P = 0.02). Most NY isolates (54.6%) and CA isolates (45.1%) were in genomic clade 2, whereas the MD isolates were more likely to be from genomic clade 1 (51.5%).
Clinical characteristics and transmission. We examined different clinical characteristics of the hospitalized patients based on the transmission types of the isolates. We determined there was an association between patients with a peripherally inserted central catheter (PICC) and transmission surrogate outcomes (Table 3). Based on this observation, we analyzed if there were genes associated with PICC lines. There were no genes exclusive to the patients with a PICC line compared to patients with no PICC line. However, 22 genes were more prevalent among patients with a PICC line compared to patients without a PICC line (P , 0.05). Genes that were more prevalent in this group included LPXTG-motif encoding genes and sasC genes (Table S5).
Molecular typing. (i) spa-typing. The spa-typing technique is a single locus typing scheme that compares the 39 coding region of S. aureus protein A and assigns a type (25). We assigned 72 spa-types to 382 isolates, with the remaining six isolates unable to be assigned a spa-type. This level of unassigned isolates is not uncommon in genomic studies (26). Among the isolates that could be assigned a spa-type the most common was t008 (n = 145/382; 37.9%), and the second most common spa-type was t002 (n = 84; 21.6%) However, there were no statistical differences between spa-type and transmission (P = 0.36) ( Table 4).
(ii) SCCmec typing. The SCCmec typing method uses the mobile genetic element Staphylococcal cassette chromosome, mecA or mec C genes and assigns a SCCmec type (27). We identified four SCCmec types among 367 isolates. Twenty-one isolates were unable to be typed using this method. The most prominent type of SCCmec type among the isolate was type IV (n = 225; 61.3%) following type II (n = 134; 36.5%). SCCmec type IV was the most prominent type among the high (26/53; 49.1%) and low transmitter groups (113/183; 61.7%) (Table S1).
(iii) Multilocus sequencing typing (MLST). MLST is a typing method that uses the seven housekeeping genes found in all S. aureus to characterize the genomic divergence of isolates (28). A total of 25 MLST types were identified in this collection of isolates. However, 308/388 (79.4%) of the isolates were represented in three sequence a MRSA isolates were classified by three different surrogate outcomes: (i) high transmitters, isolates that were identified on the gloves and gowns of HCW for more than 50% of HCW-patient interactions; (ii) mid transmitters, isolates that were identified on the gown and gloves for less than 50% of HCW-patient interactions; (iii) low transmitters, no transmission event occurred.
types (ST8, ST5, and ST105). Additionally, there were 18 isolates (4.6%) that did not belong to any of the known ST groups (Table S1). These results highlight the clonal nature of the S. aureus isolates. Genome-wide association comparisons. (i) High versus low transmission surrogate outcome. We identified that 53 isolates could be associated with the high transmission surrogate outcome and 183 isolates were associated with the low transmission surrogate outcome. The remaining 152 isolates were moderate-level transmitters and are not examined in detail in these analyses. We examined the 8,768 potential coding sequences from the 236 isolates in the high and low transmission groups to determine which genes were associated with the MRSA isolates from the high or low transmission groups. Among these genes, there were 4,606 genes (52.5%) that were shared among all the isolates indicating a high level of clonality. aureus USA300-ISMMS1 chromosome (GenBank accession number NZ_CP007176) and a total of 157978 SNPs were identified using ISG.26RAxML (60) was used to create the phylogenetic tree using 100 bootstrap replicates and FigTree was used for visualizations (23,24). The isolates highlighted in green are clade 1, those in blue are clade 2, those in orange are clade 3, and those in yellow are clade 4. Other isolates not in the four major clades are uncolored. The black dots at the nodes indicate greater than 80% bootstrap support.

Mechanisms of MRSA Transmission mSphere
Three genes were exclusively identified among the low transmission group isolates. These exclusively low transmission associated genes all encoded hypothetical proteins. Additionally, we identified four genes that were exclusive to the high transmission group, encoding two hypothetical proteins, one surface G protein (SasG) (29,30), and one tandem five-TM protein. Of note, SasG is an immunodominant-containing protein that aids in the promotion of biofilm formation (29,30). While these genes are exclusive in this collection of isolates, they are not statistically significant in the greater comparison of all genes and isolates due to the presence of the genes in relatively few high or low transmission isolates, respectively. When the analysis was corrected for multiple observations (Bonferroni correction [P , 0.05]) there were no genes that were significantly more prevalent in each of the groups.
(ii) Source of isolates. We further analyzed the genomic content of the isolates based on the source of culture because certain genes may play a significant role in a certain body location. Most of the isolates were from nasal surveillance specimens (n = 201/388; 51.8%) ( Table S1). The remaining isolates were from clinical cultures (n = 187/388; 48.2%). The breakdown of the clinical isolates was as follows: 99 (52.9%) were from sputum, 35 (18.7%) from blood, 9 (4.8%) from wounds, 6 (3.2%) from urine, and 38 (20.3%) from other body sites such as tissue biopsy specimens. For this analysis, we combined all the clinical sites (n = 88) except sputum (n = 99) and compared these clinical sites to the isolates from nares surveillance cultures (n = 201). We examined sputum cultures separately from the other clinical sites because we believed that the gene hits would be different in comparison to other clinical culture sites. Comparison of genomes from nares isolates to genomes from all other body sites isolates identified no genes to be more prevalent among one of these groups (Bonferroni correction [P , 0.05]). Similarly, there were no genes that were identified to be more prevalent among clinical isolates compared to the genomes from the surveillance isolates.
(iii) Geographic location. Isolates were sampled at four hospitals in three states: Maryland, California, and New York ( Table 1). Most of the isolates were from Maryland  Among these prevalent genes from the Maryland isolates, we identified 43 genes that were exclusive to Maryland isolates from Bonferroni correction. These genes include the LPXTG-motif encoding gene and surface G protein (SasG) ( Table S2). LPXTG is a conserved peptide motif that is thought to contribute to the surface protein's ability to anchor onto the cell wall, thus also potentially enhancing adhesion and transmissibility among these isolates (31)(32)(33). Of note, SasG is an immunodominant protein that aids in the promotion of biofilm formation (29,30). Among isolates from the NY hospital, there were nine prevalent genes (Table S2). In contrast, 54 genes were statistically significant using a Bonferroni correction among the isolates from California (P , 0.05). There were zero exclusive genes among the California isolates (Table S2). The prevalent genes found in the California isolates include but are not limited to three bacitracin resistance proteins (bacA), two transposase IS66 family proteins, and several hypothetical proteins. BacA is a protein that encodes resistance to the antibiotic bacitracin, which results in the inhibition of the cell wall biosynthesis (34,35). In addition to this resistance mechanism prevalent among the California isolates, transposase IS66 family protein is a protein that is required for transposition (36).
(iv) Prevalence of identified virulence factors in MRSA genomes. There is a large body of literature on the virulence factors of S. aureus in general and MRSA specifically (6,16,(37)(38)(39) (Table S3). Genes known for virulence identified among the 388 MRSA isolates include but are not limited to several enterotoxins, fibrinogen binding proteins, staphlyocoagulase, immunoglobulin G-binding protein A, and capsular polysaccharide biosynthesis proteins (6,16,(37)(38)(39)(40)(41). Most of the virulence genes appear to be highly conserved among all the MRSA isolates in this analysis (Fig. S1). However, many of the enterotoxins (i.e., K, Q, E, C, and H) have divergent genes, which are involved in the mechanism of food poisoning (42). The exfoliative protein B gene appears to be divergent as well, which is known to be important in scalded skin syndrome (6,16). Additionally, there was one virulence factor, fibronectin-binding protein A (fnbA), that was statistically more prevalent in low transmitters compared to high transmitters (P , 0.05) ( Table S4). The fnbA encodes a protein known to be involved in biofilm formation (43). Overall, the traditional S. aureus virulence factors, as listed in Table S3 and S4, do not appear to be involved with the transmission surrogate outcomes observed in this study as they are overrepresented in each of the groups examined.

DISCUSSION
The objective of this study was to characterize the genomic content of MRSA clinical/surveillance isolates from a multistate cohort and specifically identify genetic determinants of transmission. There have been no studies to our knowledge that have characterized the genetic determinants of transmission among MRSA isolates. Our average predicted genome size was 2,857,897 with a range of 2,682,854 bp to 3,009,992 bp among 388 isolates, which is similar to other studies (44). Large scale BLAST score ratio (LS-BSR) analysis predicted 8,768 genomic coding regions across the 388 MRSA isolates, which differs from previous studies of smaller isolate samples and predictions of pan-genome size (44,45). Our study allowed the comparison of the genomic content between high and low transmission MRSA isolates, location, and clinical and surveillance isolates. Based on this novel analysis, we identified genes that were exclusive to geographic locations. However, we did not identify genes that were exclusive or more prevalent among high versus low transmission groups, clinical versus surveillance groups, or clinical source of isolation groups after correcting for multiple corrections. This is interesting because it would suggest that the genomic content is not directly related to the transmission surrogate outcomes observed.
From the phylogenetic analysis, we determined there were four major phylogenomic clades within the sequenced MRSA isolates. As expected, we see that these dominant phylogenomic groups represent common S. aureus genomic clades (6, 16) (Fig. 1), which correspond to the USA100 (CC5), USA300 (CC8), USA400 (CC1), and USA600 (CC45) groups that are common in healthcare settings (6,16). Additionally, USA300 was the most prevalent group among the isolates analyzed and this corresponds with the most frequent emergent MRSA strains in a healthcare setting in North America (6,16,46). However, we have also identified 16 isolates that are outside these dominant clades. Additionally, we identified that the spa-types t008 and t002 were the most prevalent spa-types among our isolates, which is similar to the prior literature that identified spa-types t008, t002, and t242 to be the most prevalent in the United States (47). We also found that spa-type was statistically different between geographical locations, with t008 (84%) and t002 (67%) being more prevalent in MD than in California and New York (P , 0.001).
The LS-BSR analysis identified genomic content that is unique or more prevalent among high versus low transmission groups, clinical versus surveillance isolates, as well as geographic location. A study by McClure et al. (44,48) examined genetic content to determine virulence genes in M92, a strain of MRSA found in a nasal sample from one hospital in Calgary, Alberta, Canada. This analysis was based on only one reference strain of MRSA that identified 3,152 coding regions, which was fewer than what we found and may not be representative of our diverse multistate cohort. The differences seen between the two studies may be due to our isolates being from diverse geographical regions and multiple body sites.
When comparing the genomic content among high versus low transmitters using the GWAS approach, we identified zero genes that were more or less prevalent in either of the groups when correcting for multiple observations. Similarly, we were unable to identify any genes that were prevalent or exclusive among isolates based on the source of isolation. However, we observed geographical differences in the presence or absence of certain genes.
Our results suggest that MRSA gene content does not significantly differ among the isolates from the identified transmission surrogate outcomes. However, the transmission of MRSA from the patient to an HCW occurs 16.2% of the time suggesting that activities performed by the HCW may be the main factor in transmission (9). These results combined with patient level and HCW-patient interaction factors are the first step to providing further knowledge on which patients need to be placed on contact precautions and contribute valuable information to clinical care.

MATERIALS AND METHODS
Isolate selection. MRSA isolates were obtained from patient samples collected as a part of a cohort study previously described (9). This cohort contained 403 isolates from 403 patients treated at four hospitals in the following states: Maryland, New York, and California. Swabs were cultured using local hospital laboratory protocol (9). The 403 isolates were cultured from either the following body sites/sources: Mechanisms of MRSA Transmission mSphere anterior nasal samples or the following clinical sample types: wounds, sputum, urine, blood, and other body sites (Table S1). Nasal surveillance was performed for infection control purposes to identify colonized patients. Clinical samples were ordered by physicians based on clinical signs and symptoms. To assess for MRSA transmission, gowns and gloves of HCW who entered the hospital room and provided care to the 403 patients were sampled immediately before doffing to determine if there were MRSA transmission to the gown and/or gloves. Swabs were cultured onto a CHROMagar MRSA and incubated overnight. 10 HCW-patient interactions were observed per patient. Based on the gown and glove contamination data, we defined MRSA isolates as high transmitters if the gloves and gowns of HCW were contaminated with MRSA in more than 50% of HCW-patient interactions. Whereas isolates were classified as low transmitters if there were no identified transmission events (Table 1). Those for which transmission was detected in 1 to 49% of interactions were considered moderate transmitters but were not included in the detailed comparative analysis in this study. A total of 15 isolates were not included in the comparative genomic analysis because upon sequencing and quality control of the genome data, the genome was determined to not be S. aureus, did not assemble adequately based on the N50 sequencing metric, or the total genome length or GC% not being similar to S. aureus. Thus, there were a total of 388 MRSA genomes for comparative analysis in the current study, for which we previously described the sequencing and assembly (49). The study was approved by Institutional Review Boards at the University of Maryland, Baltimore (HP-000066759-16), Weill Cornell Medicine, New York (1610017615), and LaBioMed, California (042087). Genome sequencing. Genomic DNA was isolated from cultures grown in Lysogeny Broth overnight. DNA was extracted in 96-well format from 100 mL of the sample using the MagAttract PowerMicrobiome DNA/RNA kit (Qiagen, Hilden, Germany) automated on a Hamilton Microlab STAR robotic platform. Bead disruption was conducted on a TissueLyser II (20 Hz for 20 min) instrument in a 96-deep well plate in the presence of 200 mL phenol-chloroform. Genomic DNA was eluted in 90 mL water after magnetic bead cleanup. The resulting genomic DNA was quantified by Pico Green. The sequencing libraries were generated with the KAPA HyperPrep kit (catalog number KK8504) and sequenced on the Illumina HiSeq4000 using a 150bpX2 paired-end kit. Raw sequencing reads were filtered to remove contaminating phiX reads using BBDuk of the BBTools software suite (sourceforge.net/projects/bbmap/). The raw reads were also filtered to remove contaminating Illumina adaptor sequences and quality trimmed using Trimmomatic v.0.36 (50). The resulting filtered reads were then assembled using SPAdes v3.13.0 (51). The resulting assemblies were then filtered to contain only contigs longer than 500 bp with a k-mer coverage $5Â. Genomes containing greater than 500 contigs or an aberrant GC percentage were removed from further analysis. A total of 388 genomes were used in the comparative analysis and the relevant genomic details and clinical data are included in Table S1 and were briefly described in Adediran et al. (49).
(ii) Large scale BLAST score ratio (LS-BSR). LS-BSR analysis of the complete genomic content was performed on the 388 isolates as previously described (49,52). BSR values were visualized (Fig. 2) using the ComplexHeatmaps package of R v.4.0.2 (53,54) to compare the presence or absence of genes to associate them with the observed transmission surrogate outcome (Data Set S1-Data Set S3). Hierarchical clustering of the rows and columns of the heat map was performed using default parameters. To determine if genes were present in greater frequency among transmission groups, phylogenetic groups, or clinical groups, we used Fisher's exact test to examine the frequency of these values. A P value less than 0.05 was considered significant for all comparisons. R version 4.02 was used for this analysis (54). A genome-wide association study was conducted among 388 MRSA isolates using Scoary v.1.6.16 (https://github.com/AdmiralenOla/ Scoary) (55). Genes associated with the surrogate outcome of interest at P , 0.05 were considered significant with either Bonferroni or Benjamini-Hochberg correction. A comparison between surrogate outcomes of interest was conducted to determine if there were genes exclusive to one of the surrogate outcomes of interest.
(iv) spa-typing analysis. spa-typing analysis was performed on the 388 MRSA isolates of interest using spaTyper 1.0 (Center for Genomic Epidemiology, Denmark, https://cge.cbs.dtu.dk/services/spatyper/). Genomes were uploaded to the website to identify the spa gene types for each of the individual isolates (26). Results for each isolate are presented in Table S1.
(v) SCCmec typing analysis. We conducted the SCCmec typing method on the 388 isolates of interest to determine the SCCmec element using SCCmec Finder1.2 (Center for Genomic Epidemiology, Denmark, https://cge.cbs.dtu.dk/services/SCCmecFinder) The 388 MRSA isolates were uploaded to determine SCCmec type. SCCmec types were computed based on the alignment and minimum length coverage of the alignment (57-59). MLST for each isolate is presented in Table S1.
Data availability. GenBank accession and short read archive numbers for these data are provided in Table S1.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. DATA SET S1, TXT file, 16 MB. DATA SET S2, TXT file, 7.9 MB.

ACKNOWLEDGMENTS
This project was funded in part by federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under grant numbers U19AI110820 (DAR), R01AI121146 (ADH), and 3R01AI221146-04S1 (TA).
We declare no conflict of interest.