Whole genome sequence data of Chromobacterium violaceum WCH4, a human pathogenic strain from Sabah, Malaysia

Chromobacterium violaceum is a Gram-negative, facultative anaerobic bacillus that is commonly found in soil. Infection can cause mild diarrhoea but can progress, although rarely, to multi-organ failure and death. Here we report the whole genome sequence data of Chromobacterium violaceum strain WCH4, a pathogenic strain obtained from a 78-year-old male patient suffering from an eye infection. This is a rare case of human infection by this bacterium. Blood culture and 16S rRNA sequencing confirmed the presence of C. violaceum WCH4. DNA sequencing using the Illumina HiSeq 4000 system revealed a genome size of 4,637,406 bp with a GC content of 64.89%. We identified 4,572 protein coding sequences (CDS), 78 transfer RNA (tRNA) genes, and 3 ribosomal RNA (rRNA) genes. The CDS included 1,261 hypothetical proteins and 3,311 proteins with functional assignments. We also identified seven putative efflux pump genes associated with multidrug antibiotic resistance. The genome data have been deposited at NCBI under the accession number JAFBBB000000000 and consist of the fully annotated genome and the raw sequence data. Our data resource will assist in further downstream analysis and in understanding the mechanism of the rare human infection caused by Chromobacterium violaceum strain WCH4.


This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAFBBB000000000 (www.ncbi.nlm.nih.gov/nuccore/JAFBBB000000000). The version described in this paper is version JAFBBB010000000. All the details about the genome sequencing data are available on NCBI under BioProject accession number PRJNA698279 and can be accessed using the following link (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA698279).

Value of the Data
• We present here the genome sequence data of Chromobacterium violaceum WCH4, a strain isolated from a rare case of human infection.
• The genome sequence data will be useful for medical researchers performing comparative genomic studies of clinical and non-clinical strains of the bacterium.
• The genome data can be used to identify antibiotic resistance genes and to perform downstream cluster analyses to determine the strain's placement on the phylogenetic tree.
• We are concerned about recent cases of C. violaceum fatalities in Sabah. This work highlights that accurate bacterial identification and prompt treatment are essential to prevent serious consequences.

Data Description
Chromobacterium violaceum is a Gram-negative, facultative anaerobic bacillus that is commonly found in soil. Infection can cause mild diarrhoea but can progress, although rarely, to multi-organ failure and death, and some strains are antibiotic resistant [1][2][3][4].
A 78-year-old male patient was admitted to a public hospital in Kota Kinabalu, Sabah, with severe fever, diarrhoea and an eye infection. We isolated the pathogen from the intravitreal tap and confirmed its identity through 16S rRNA Sanger sequencing (the sequence is given in Supplementary data-S1). The sequence was queried using BLASTn (https://blast.ncbi.nlm.nih.gov/Blast.cgi) against the NCBI database and matched Chromobacterium violaceum, with 99% query cover, 99.64% identity and an E value of 0.0 for the top 10 hits (BLAST results are given in Supplementary data-S1). Subsequently, we performed whole genome sequencing of the pathogen. Here, we present the data on the whole genome sequence of C. violaceum strain WCH4, which provides an initial glimpse of its pathogenicity. The data have been deposited in GenBank under the accession JAFBBB000000000 and can be viewed at www.ncbi.nlm.nih.gov/nuccore/JAFBBB000000000 (additional details are in the Specification Table).
In brief, a total of 5,061,114 paired reads (2 × 150 bp) were generated on the Illumina HiSeq 4000 from a library prepared using the NEBNext Ultra kit (New England Biolabs, NEB #E7645). These were then assembled into 37 contigs with a genome size of 4,637,406 bp at 325x coverage. The average G + C content was 64.89% and the N50 length, defined as the length of the shortest contig in the set of longest contigs that together cover at least 50% of the assembly, was 434,055 bp (Table 1).
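The assembly statistics above (N50, G + C content and coverage) can be illustrated with a minimal Python sketch. It is not the pipeline used for this dataset: the function names and the toy contigs are hypothetical, and the coverage estimate assumes that "5,061,114 paired reads" means read pairs of 2 × 150 bp with no bases lost to trimming.

```python
def n50(contig_lengths):
    """Length of the shortest contig among the longest contigs that
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def gc_percent(sequences):
    """Percentage of G and C bases across all sequences."""
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    total = sum(len(s) for s in sequences)
    return 100.0 * gc / total


def mean_coverage(read_pairs, read_length, genome_size):
    """Average depth, assuming every read has the full read length."""
    return read_pairs * 2 * read_length / genome_size


# Hypothetical toy contigs, not taken from the WCH4 assembly.
toy_contigs = ["GGCC" * 10, "GCAT" * 5, "AT" * 5]
toy_lengths = [len(c) for c in toy_contigs]

print("toy N50:", n50(toy_lengths))
print("toy GC%:", round(gc_percent(toy_contigs), 2))

# Figures reported in the text: read pairs, read length, genome size.
print("estimated coverage:", round(mean_coverage(5_061_114, 150, 4_637_406), 1))
```

Under these assumptions the estimated depth comes out slightly above the reported 325x, which is what one would expect if some bases were removed during filtering.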
This genome has 4,572 protein coding sequences (CDS), 78 transfer RNA (tRNA) genes, and 3 ribosomal RNA (rRNA) genes. The annotated genome contained 1,261 hypothetical proteins and 3,311 proteins with functional assignments (Table 2). Furthermore, annotated data on functional assignments included 994 proteins with Enzyme Commission (EC) numbers [5], 828 with Gene Ontology (GO) assignments [6], and 730 proteins that were mapped to KEGG pathways [7]. In addition, the PATRIC annotation includes two types of protein families [8], i.e. 4,272 proteins that belong to the genus-specific protein families (PLFams), and 4,318 proteins that belong to the cross-genus protein families (PGFams). The genomic features of Chromobacterium violaceum WCH4 are displayed in the circular map of the genome in Fig. 1.
In addition, a subsystem analysis of the annotated genome was performed to determine the sets of proteins that are part of a specific biological process or structural complex [9]. The RAST server-based annotation (using PATRIC) of the C. violaceum WCH4 genome resulted in a total of 284 subsystems representing 1,945 genes (Fig. 2). The distribution of the genes by subsystem category showed that the six categories with the most genes were metabolism (746 genes), followed by energy production (264 genes), protein processing (230 genes), cellular processes (189 genes), stress response, defence and virulence (152 genes) and membrane transport (142 genes). The colours of the CDS can be mapped back to the subsystem categories given in Fig. 2.

Fig. 2. An overview of the subsystem categories assigned to the genes predicted in the genome of Chromobacterium violaceum WCH4. The whole-genome sequence of the strain WCH4 was annotated using the RAST server.
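As a rough illustration of the subsystem distribution described above, the following Python sketch converts the six reported gene counts into shares of the 1,945 subsystem genes. The dictionary and variable names are hypothetical; only the counts come from the text.

```python
# Gene counts for the six largest subsystem categories, as reported in the text.
subsystem_genes = {
    "metabolism": 746,
    "energy production": 264,
    "protein processing": 230,
    "cellular processes": 189,
    "stress response, defence and virulence": 152,
    "membrane transport": 142,
}
total_subsystem_genes = 1945  # genes assigned to all 284 subsystems

# Share of each category among all subsystem genes, in percent.
shares = {
    name: 100.0 * count / total_subsystem_genes
    for name, count in subsystem_genes.items()
}

top_six_total = sum(subsystem_genes.values())
remaining_genes = total_subsystem_genes - top_six_total

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print("genes in the remaining categories:", remaining_genes)
```

The six categories do not add up to 1,945, so the remainder corresponds to genes in the smaller subsystem categories shown in Fig. 2.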

Table 3
Data on antimicrobial resistance (AMR) mechanisms and genes identified from the Chromobacterium violaceum WCH4 genome.

Protein altering cell wall charge conferring antibiotic resistance: GdpD, PgsA
Regulator modulating expression of antibiotic resistance genes: EmrAB-TolC, OxyR
Furthermore, we provide a dataset on seven antimicrobial resistance (AMR) mechanisms and their corresponding 35 AMR genes that were identified from the annotated genome (Table 3). These include antibiotic inactivation enzymes (2 genes), antibiotic targets (20 genes), antibiotic target protection protein (1 gene), efflux pumps conferring antibiotic resistance (7 genes), genes conferring resistance via absence (1 gene), proteins altering cell wall charge conferring antibiotic resistance (2 genes) and regulators modulating expression of antibiotic resistance genes (2 genes). The details of the RAST-based annotation are given in the supplementary data (S2). The genomic data reported here will pave the way for further study of the mechanism of pathogenicity of Chromobacterium violaceum WCH4.
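The per-mechanism counts listed above can be tallied in a few lines of Python as a consistency check against the reported totals of seven mechanisms and 35 genes. The dictionary name and key wording are hypothetical; the counts are those given in the text.

```python
# Number of AMR genes per mechanism, as listed in the text.
amr_gene_counts = {
    "antibiotic inactivation enzyme": 2,
    "antibiotic target": 20,
    "antibiotic target protection protein": 1,
    "efflux pump conferring antibiotic resistance": 7,
    "gene conferring resistance via absence": 1,
    "protein altering cell wall charge": 2,
    "regulator modulating expression of resistance genes": 2,
}

n_mechanisms = len(amr_gene_counts)
n_genes = sum(amr_gene_counts.values())

print(f"{n_mechanisms} mechanisms, {n_genes} AMR genes in total")
```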

Bacterial strain isolation and identification
A specimen was obtained from an intravitreal tap of the patient. The initial blood culture showed growth of C. violaceum on MacConkey and blood agar, with its characteristic violet pigment. In addition, DNA sequencing of the 16S rRNA region on the ABI 3130 Genetic Analyzer (Applied Biosystems, USA) confirmed the presence of C. violaceum strain WCH4. For whole genome sequencing, DNA was then isolated from a pure bacterial culture using the conventional phenol-chloroform protocol [10].

Genome sequencing, assembly and annotation
The genomic DNA was converted into a sequencing-ready library using the NEBNext Ultra kit (New England Biolabs, NEB #E7645). The library was then sequenced on the Illumina HiSeq 4000 (2 × 150-bp paired-end reads). We obtained approximately 5 million read pairs with a total of 1.5 Gb. The quality of the raw reads was analysed with FastQC v0.11.9 [11]. The sequences were then analysed at the Pathosystems Resource Integration Center (PATRIC) web server (https://www.patricbrc.org). The reads were assembled using Unicycler v0.4.8 [12], an assembly pipeline for bacterial genomes available at PATRIC. Filtering and polishing of the assembly were done using Pilon version 1.23 [13]. The genome was annotated using the RAST tool kit (RASTtk) [14] through the PATRIC web server [15]. For functional assignments of proteins, we mapped proteins to Enzyme Commission (EC) numbers [5], Gene Ontology (GO) terms [6], and KEGG pathways [7]. PATRIC annotation was used to assign genus-specific protein families (PLFams), cross-genus protein families (PGFams) [8] and subsystems [9]. In addition, a circular genome map was created using the 'circular viewer' functionality implemented in the PATRIC web server [15]. AMR mechanisms and genes were classified using a k-mer-based detection method, which utilizes PATRIC's curated collection of representative AMR gene sequence variants [15].

Ethics Statement
This study was registered with the National Medical Research Register, Ministry of Health Malaysia (NMRR ID: No. 19-48-45702).

CRediT Author Statement
Vijay Kumar Subbiah: Conceived and designed the experiments, Wet lab experiment, Data analysis and interpretation, Manuscript preparation, Contributed reagents/materials/analysis tools; Zulina Mazlan: Conceived and designed the experiments, Wet lab experiment, Manuscript preparation, Contributed reagents/materials/analysis tools; Nur Nashyiroh Mastor: Wet lab experiment, Data analysis and interpretation, Manuscript preparation, Contributed reagents/materials/analysis tools; Mohammad Zahirul Hoque: Conceived and designed the experiments, Data analysis and interpretation, Manuscript preparation. All authors read and approved the final manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.