Dis Model Mech. 2018 Dec 13;11(12). pii: dmm034546. doi: 10.1242/dmm.034546.

Identifying mouse developmental essential genes using machine learning.

Author information

1. Division of Evolution and Genomic Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PT, UK.
2. Department of Agriculture, Food and Environmental Sciences, Marche Polytechnic University, Ancona 60121, Italy.
3. Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK. andrew.doig@manchester.ac.uk, Kathryn.Hentges@manchester.ac.uk
4. Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PT, UK.
5. Division of Evolution and Genomic Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PT, UK. andrew.doig@manchester.ac.uk, Kathryn.Hentges@manchester.ac.uk

Abstract

The genes that are required for organismal survival are annotated as 'essential genes'. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.

KEYWORDS:

Essential genes; Essentiality database; Mouse knockout; Supervised machine learning

Conflict of interest statement

Competing interests: The authors declare no competing or financial interests.
