Split k-mer analysis compared to cgMLST and SNP-based core genome analysis for detecting transmission of vancomycin-resistant enterococci: results from routine outbreak analyses across different hospitals and hospital networks in Berlin, Germany

Microb Genom. 2023 Jan;9(1):mgen000937. doi: 10.1099/mgen.0.000937.

Abstract

The increase in vancomycin-resistant Enterococcus faecium (VREfm) in recent years has been partially attributed to the rise of specific clonal lineages, which have been identified throughout Germany. To date, there is no gold standard for the interpretation of genomic data for outbreak analyses. New genomic approaches such as split k-mer analysis (SKA) could support cluster attribution for routine outbreak investigation. The aim of this project was to investigate frequent clonal lineages of VREfm identified during suspected outbreaks across different hospitals, and to compare genomic approaches including SKA in routine outbreak investigation. We used routine outbreak laboratory data from seven hospitals and three different hospital networks in Berlin, Germany. Short-read libraries were sequenced on the Illumina MiSeq system. We determined clusters using the published Enterococcus faecium-cgMLST scheme (threshold ≤20 alleles), and assigned sequence and complex types (ST, CT), using the Ridom SeqSphere+ software. Within each cluster as determined by cgMLST, we used pairwise core-genome SNP analysis and SKA at thresholds of ten and seven SNPs, respectively, to further subdivide the cgMLST clusters. In order to investigate clinical relevance, we analysed to what extent epidemiological linkage backed the clusters determined with the different genomic approaches. Between 2014 and 2021, we sequenced 693 VREfm strains, and 644 (93 %) were assigned to cgMLST clusters. More than 74 % (n=475) of the strains belonged to the six largest cgMLST clusters, comprising ST117, ST78 and ST80. All six clusters were detected across several years and hospitals without apparent epidemiological links. Core SNP analysis identified 44 clusters with a median cluster size of three isolates (IQR 2-7, min-max 2-63), as well as 197 singletons (41.4 % of 475 isolates). SKA identified 67 clusters with a median cluster size of two isolates (IQR 2-4, min-max 2-19), and 261 singletons (54.9 % of 475 isolates). Of the isolate pairs attributed to clusters, 7 % (n=3064/45 596) of pairs in clusters determined by standard cgMLST, 15 % (n=1222/8500) of pairs in core SNP-clusters and 51 % (n=942/1880) of pairs in SKA-clusters showed epidemiological linkage. The proportion of epidemiological linkage differed between sequence types. For VREfm, the discriminative ability of the widely used cgMLST-based approach at a threshold of ≤20 allele differences was insufficient to rule out hospital outbreaks without further analytical methods. Cluster assignment guided by core genome SNP analysis and the reference-free SKA was more discriminative and correlated better with obvious epidemiological linkage, at least at recently published thresholds (ten and seven SNPs, respectively) and for frequent STs. Besides higher overall discriminative power, the whole-genome approach implemented in SKA is also easier and faster to conduct and requires fewer computational resources.
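The threshold-based cluster assignment and the pair-level comparison with epidemiological linkage described above can be illustrated with a minimal sketch. It assumes single-linkage grouping (two isolates join the same cluster if any chain of pairwise distances at or below the threshold connects them); the abstract does not state which linkage rule was used, so this is an assumption. The function names, the toy distances and the toy epidemiological links are all hypothetical and are not data from the study.

```python
from itertools import combinations


def single_linkage_clusters(distances, threshold):
    """Group isolates connected by pairwise distances <= threshold.

    distances: dict mapping (isolate_a, isolate_b) -> pairwise SNP distance.
    Returns (clusters, singletons); clusters are sorted lists of size >= 2.
    Single linkage is an assumption, not a rule stated in the abstract.
    """
    isolates = sorted({iso for pair in distances for iso in pair})
    parent = {iso: iso for iso in isolates}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for iso in isolates:
        groups.setdefault(find(iso), []).append(iso)
    clusters = sorted((g for g in groups.values() if len(g) > 1), key=lambda g: g[0])
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


def linked_pair_fraction(clusters, epi_links):
    """Count within-cluster isolate pairs and how many have an epi link.

    epi_links: set of frozenset({a, b}) for epidemiologically linked pairs.
    A cluster of n isolates contributes n * (n - 1) / 2 pairs.
    """
    pairs = [frozenset(p) for c in clusters for p in combinations(c, 2)]
    linked = sum(1 for p in pairs if p in epi_links)
    return linked, len(pairs)


# Hypothetical toy data: five isolates with made-up pairwise SNP distances.
toy_distances = {
    ("A", "B"): 3, ("A", "C"): 9, ("B", "C"): 6,
    ("A", "D"): 40, ("A", "E"): 41, ("B", "D"): 42,
    ("B", "E"): 40, ("C", "D"): 43, ("C", "E"): 44,
    ("D", "E"): 9,
}
toy_epi_links = {frozenset({"A", "B"})}  # e.g. same ward, overlapping stay

clusters_7, singletons_7 = single_linkage_clusters(toy_distances, threshold=7)
clusters_10, singletons_10 = single_linkage_clusters(toy_distances, threshold=10)
linked_7, total_7 = linked_pair_fraction(clusters_7, toy_epi_links)
```

In this toy example the stricter threshold of seven leaves isolates D and E as singletons, while a threshold of ten merges them into a second cluster. This mirrors the general pattern reported above: a tighter threshold yields more singletons and smaller clusters, and a larger share of the remaining within-cluster pairs may be epidemiologically linked.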

Keywords: SNP; Vancomycin-resistant Enterococcus faecium; cgMLST; outbreak; split k-mer.

MeSH terms

  • Berlin / epidemiology
  • Disease Outbreaks
  • Genome, Bacterial
  • Germany / epidemiology
  • Gram-Positive Bacterial Infections* / epidemiology
  • Hospitals
  • Humans
  • Polymorphism, Single Nucleotide
  • Vancomycin-Resistant Enterococci* / genetics