Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments

BMC Genomics. 2016 Nov 16;17(1):925. doi: 10.1186/s12864-016-3250-9.

Abstract

Background: Recent zoonotic transmissions of avian or porcine influenza viruses to humans (in 2013 and 2009) highlight how host range can expand as viruses evade species barriers. Gene reassortment, or antigenic shift, between viruses from two or more hosts can generate a new, life-threatening virus when the reassorted virus is no longer recognized by antibodies present in human populations. No large-scale study has yet addressed the mechanisms underlying host transmission. Furthermore, it is not clearly understood how the different segments of the influenza genome contribute to the final determination of host range.

Methods: To gain insight into the rules underpinning host range determination, we employed various supervised machine learning algorithms to mine reassortment changes in different viral segments across a range of hosts. Our multi-host dataset contained the whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some sequences were assigned to more than one host, so the dataset is multi-labeled, and we used a multi-label learning method to identify discriminative sequence sites. Algorithms such as CBA, Ripper, and decision trees were then applied to extract informative and descriptive association rules for each viral protein segment.
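The data preparation described above can be sketched as follows, assuming each aligned protein sequence is turned into a set of position-residue items (segment name + 1-based alignment column + residue, an assumed format chosen to resemble labels such as HA14V) and that the multi-label host annotation is split per host with binary relevance. Binary relevance is only one common multi-label strategy; the abstract does not name the method used, and the helper names `position_features` and `binary_relevance_targets` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

HOSTS = ("avian", "human", "swine")


def position_features(segment: str, aligned_seq: str) -> Set[str]:
    """Turn one aligned protein sequence into a set of position-residue items.

    Assumed item format: segment name + 1-based alignment column + residue,
    e.g. "HA14V". Gap characters ("-") are skipped.
    """
    return {
        f"{segment}{pos}{res}"
        for pos, res in enumerate(aligned_seq, start=1)
        if res != "-"
    }


def binary_relevance_targets(label_sets: List[Set[str]]) -> Dict[str, List[int]]:
    """Split multi-label host annotations into one 0/1 target vector per host
    (binary relevance), so a single-label learner can be trained per host."""
    return {host: [int(host in labels) for labels in label_sets] for host in HOSTS}


# Toy, made-up aligned HA fragments; the second strain carries two host labels.
strains: List[Tuple[str, Set[str]]] = [
    ("MKA-L", {"avian"}),
    ("MKTIV", {"human", "swine"}),
]
items = [position_features("HA", seq) for seq, _ in strains]
targets = binary_relevance_targets([hosts for _, hosts in strains])
print(sorted(items[0]))  # ['HA1M', 'HA2K', 'HA3A', 'HA5L']
print(targets)           # {'avian': [1, 0], 'human': [0, 1], 'swine': [0, 1]}
```

Each per-host target vector could then be paired with the item sets and passed to a rule learner or decision tree of choice.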

Results: We found informative rules in all segments that are common within the same host class but vary between hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions.
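To show why a combination of positions can be more informative than a single one, here is a minimal sketch of the standard support and confidence measures for a class association rule "antecedent items → host". The records below are invented toy data, not the study's strains, and `rule_stats` is a hypothetical helper.

```python
from typing import FrozenSet, List, Tuple

# Each record: (position-residue items of a strain, host labels of that strain).
Record = Tuple[FrozenSet[str], FrozenSet[str]]


def rule_stats(data: List[Record], antecedent: FrozenSet[str], host: str) -> Tuple[float, float]:
    """Support and confidence of the class association rule antecedent -> host.

    support    = fraction of all strains that contain every antecedent item
                 AND carry the host label
    confidence = fraction of strains containing the antecedent that carry the host
    """
    covered = [hosts for items, hosts in data if antecedent <= items]
    hits = sum(1 for hosts in covered if host in hosts)
    support = hits / len(data) if data else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Made-up toy records for illustration only (not the study's data).
data: List[Record] = [
    (frozenset({"HA14V", "NS1230S"}), frozenset({"avian"})),
    (frozenset({"HA14V", "NS1230S"}), frozenset({"avian"})),
    (frozenset({"HA14V", "NS1230F"}), frozenset({"human"})),
    (frozenset({"HA14I", "NS1230S"}), frozenset({"human", "swine"})),
]
print(rule_stats(data, frozenset({"HA14V"}), "avian"))             # (0.5, 0.6666666666666666)
print(rule_stats(data, frozenset({"HA14V", "NS1230S"}), "avian"))  # (0.5, 1.0)
```

In this toy set, adding NS1230S to HA14V keeps the support unchanged while raising the confidence from 2/3 to 1, which is the kind of effect a combinatorial rule can capture.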

Conclusion: In this study, host range identification was facilitated by high-support combined rules. Our major goal was to detect discriminative genomic positions able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
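One way mined rules could be used to flag potential multi-host strains is sketched below, assuming a strain is flagged when rules for more than one host fire after filtering by minimum support and confidence. This is a simple rule-voting scheme swapped in for illustration; it is not CBA's own classifier, which uses the first matching rule from a ranked list. The thresholds, rules, statistics, and the names `Rule`, `predicted_hosts`, and `is_multi_host` are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # position-residue items that must all be present
    host: str                   # host class predicted by the rule
    support: float
    confidence: float


def predicted_hosts(items: FrozenSet[str], rules: List[Rule],
                    min_support: float = 0.1, min_confidence: float = 0.8) -> Set[str]:
    """Hosts of every rule that passes both thresholds and whose antecedent is
    fully contained in the strain's items."""
    return {
        r.host
        for r in rules
        if r.support >= min_support
        and r.confidence >= min_confidence
        and r.antecedent <= items
    }


def is_multi_host(items: FrozenSet[str], rules: List[Rule],
                  min_support: float = 0.1, min_confidence: float = 0.8) -> bool:
    """Flag a strain when qualifying rules point to more than one host."""
    return len(predicted_hosts(items, rules, min_support, min_confidence)) > 1


# Invented rules and statistics, for illustration only.
rules = [
    Rule(frozenset({"HA14V", "NS1230S"}), "avian", 0.30, 0.95),
    Rule(frozenset({"NS1230S"}), "swine", 0.12, 0.85),
    Rule(frozenset({"HA14I"}), "human", 0.05, 0.90),  # ignored: support below 0.1
]
strain = frozenset({"HA14V", "NS1230S"})
print(sorted(predicted_hosts(strain, rules)))  # ['avian', 'swine']
print(is_multi_host(strain, rules))            # True
```

Requiring both a support and a confidence threshold keeps rare, possibly spurious rules from flagging a strain as multi-host.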

Keywords: Association rule mining; Detecting hot spots; Host range of influenza.

MeSH terms

  • Algorithms
  • Animals
  • Birds
  • Genome, Viral*
  • Hemagglutinin Glycoproteins, Influenza Virus / genetics
  • Hemagglutinin Glycoproteins, Influenza Virus / metabolism
  • Host Specificity
  • Humans
  • Influenza A virus / genetics*
  • Influenza A virus / isolation & purification
  • Influenza in Birds / genetics
  • Influenza in Birds / pathology
  • Influenza in Birds / transmission
  • Orthomyxoviridae Infections / pathology
  • Orthomyxoviridae Infections / transmission*
  • Orthomyxoviridae Infections / virology
  • Swine
  • Viral Proteins / genetics
  • Viral Proteins / metabolism
  • Virus Internalization
  • Zoonoses / transmission

Substances

  • Hemagglutinin Glycoproteins, Influenza Virus
  • Viral Proteins