Bioinformatics. 2019 Jun 1;35(12):1987-1991. doi: 10.1093/bioinformatics/bty938.

Pan-genomic analysis provides novel insights into the association of E.coli with human host and its minimal genome.

Yang ZK (1,2,3,4), Luo H (1,2,3), Zhang Y (4), Wang B (4), Gao F (1,2,3).

Author information

1. Department of Physics, School of Science.
2. Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China.
3. SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin, China.
4. SinoGenoMax Co., Ltd./Chinese National Human Genome Center, Beijing, China.



Bacteria can often acquire advantageous genes that enable them to adapt to rapidly changing niches, which leads to wide intraspecific variation in genome content and to genetic redundancy. The minimal genome of Escherichia coli, which is the most important bacterial species, and the association between E.coli and its human host are worthy of further exploration.


We used gene prediction and phylogenetic analysis to reveal rich phylogenetic diversity among 491 E.coli strains, as well as substantial differences between these strains in gene number and genome length. We used pan-genomic analysis to accurately identify 867 core genes, of which only 243 are also essential genes. This analysis revealed that core genes mainly provide functions essential to the basic lifestyle of E.coli, whereas accessory genes are likely to confer selective advantages such as niche adaptation or the ability to colonize specific hosts. By association analysis, we found that E.coli strains in non-human hosts may more easily utilize foreign genetic material to adapt to their surroundings, whereas the population in human hosts has higher demands for the control of population density, indicating that highly accurate quorum-sensing behavior is very important for harmony between E.coli and its human host. By considering core genes together with previously reported deletions, we propose a potential direction for further reduction of the E.coli genome.
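The core/accessory split described above can be illustrated with a minimal sketch. It assumes each strain has already been reduced to a set of gene-family identifiers and that "core" means present in every strain; the abstract does not state the exact criterion or tools used, so this is not the authors' pipeline. The function name, strain names, gene identifiers and the essential-gene list are all hypothetical toy values.

```python
from typing import Dict, Set, Tuple


def partition_pan_genome(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core and accessory genes.

    Assumption: a gene is "core" if it occurs in every strain, and
    "accessory" if it occurs in at least one strain but not in all.
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # every gene seen in any strain
    core = set.intersection(*gene_sets)  # genes shared by all strains
    return core, pan - core


# Hypothetical toy input: three strains with made-up gene-family IDs.
strain_genes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g4", "g6"},
}
# Hypothetical essential-gene list (for example, from knockout experiments).
essential_genes = {"g1", "g2", "g7"}

core, accessory = partition_pan_genome(strain_genes)
core_and_essential = core & essential_genes  # core genes that are also essential

print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("core and essential:", sorted(core_and_essential))
```

In this toy case the core set is smaller than the pan-genome and only part of it overlaps the essential-gene list, which mirrors the kind of comparison behind the 867 core genes and 243 shared essential genes reported above.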


The data, analysis process and detailed information on software tools used in this study are all available in the supplementary material.


Supplementary data are available at Bioinformatics online.
