PlasClass improves plasmid sequence classification

PLoS Comput Biol. 2020 Apr 3;16(4):e1007781. doi: 10.1371/journal.pcbi.1007781. eCollection 2020 Apr.

Abstract

Many bacteria contain plasmids, but distinguishing contigs that originate from a plasmid from those that are part of the bacterial genome can be difficult. This is especially true in metagenomic assembly, which yields many contigs of unknown origin. Existing tools for classifying sequences of plasmid origin give less reliable results for shorter sequences, are trained on only a fraction of the known plasmids, and can be difficult to use in practice. We present PlasClass, a new plasmid classifier. It uses a set of standard classifiers, trained for different sequence lengths on the most current set of known plasmid sequences. We tested PlasClass sequence classification on held-out data and simulations, as well as on publicly available bacterial isolates, plasmidome samples, and plasmids assembled from metagenomic samples. PlasClass outperforms the state-of-the-art plasmid classification tool on shorter sequences, which constitute the majority of assembly contigs, allowing it to achieve higher F1 scores in classifying sequences from a wide range of datasets. PlasClass also uses significantly less time and memory. PlasClass can be used to easily classify plasmid and bacterial genome sequences in metagenomic or isolate assemblies. It is available under the MIT license at https://github.com/Shamir-Lab/PlasClass.
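The idea of using a set of classifiers for different sequence lengths, and of scoring the result with F1, can be illustrated with a minimal sketch. This is not the PlasClass implementation or its API. The length bins, the function names (`pick_bin`, `classify`, `f1_score`), the 0.5 threshold, and the convention of routing each sequence to the nearest length bin are all assumptions made for illustration. The per-length models are passed in as plain callables that return a plasmid score between 0 and 1.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Sequence

# Hypothetical length bins in bases; the abstract does not state the real values.
LENGTH_BINS: List[int] = [1_000, 10_000, 100_000]


def pick_bin(seq_len: int, bins: Sequence[int] = LENGTH_BINS) -> int:
    """Return the bin closest to seq_len (ties go to the shorter bin).

    `bins` must be sorted in ascending order.
    """
    i = bisect_left(bins, seq_len)
    if i == 0:
        return bins[0]
    if i == len(bins):
        return bins[-1]
    lo, hi = bins[i - 1], bins[i]
    return lo if seq_len - lo <= hi - seq_len else hi


def classify(
    seqs: Sequence[str],
    models: Dict[int, Callable[[str], float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[bool]:
    """Label each sequence as plasmid (True) or not (False).

    Each sequence is scored by the model whose length bin is nearest
    to the sequence length.
    """
    bins = sorted(models)
    return [models[pick_bin(len(s), bins)](s) >= threshold for s in seqs]


def f1_score(truth: Sequence[bool], pred: Sequence[bool]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with plasmid as the positive class."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a real setting the callables would be trained models, one per length bin. Routing by length lets short contigs be scored by a model trained on similarly short fragments, which is where the abstract reports the largest gain.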

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods
  • DNA* / classification
  • DNA* / genetics
  • DNA, Bacterial / classification
  • DNA, Bacterial / genetics
  • Genome, Bacterial / genetics
  • Plasmids* / classification
  • Plasmids* / genetics
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • DNA, Bacterial
  • DNA

Grants and funding

DP is supported in part by an Edmond J. Safra PhD Fellowship (https://safrabio.cs.tau.ac.il/), and in part by an Israel Ministry of Immigrant Absorption PhD fellowship (https://www.gov.il/en/departments/general/research_students_scholarship). RS is supported in part by grants from the Israel Science Foundation (ISF - https://www.isf.org.il/#/) grant 1339/18, the US-Israel Binational Science Foundation (BSF - https://www.bsf.org.il/), and the US National Science Foundation (NSF - https://www.nsf.gov/) grant 2016694. IM is supported in part by ISF grant 1947/19 (ISF - https://www.isf.org.il/#/) and ERC Horizon 2020 research and innovation program grant 640384 (https://ec.europa.eu/programmes/horizon2020/en). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.