Bioinformatics. 2020 Mar 17. pii: btaa191. doi: 10.1093/bioinformatics/btaa191. [Epub ahead of print]

A Blind and Independent Benchmark Study for Detecting Differentially Methylated Regions in Plants.

Author information

Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany.
Centre for Integrative Biological Signalling Studies (CIBSS), University of Freiburg, Freiburg, Germany.
Plant Cell Biology, Faculty of Biology, University of Marburg, Marburg, Germany.
Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria.
Centre for Biological Signaling Studies (BIOSS), University of Freiburg, Freiburg, Germany.



Bisulfite sequencing (BS-seq) is a state-of-the-art technique for investigating DNA methylation and gaining insight into epigenetic regulation. Several algorithms have been published for identifying differentially methylated regions (DMRs). However, the performance of the individual methods remains unclear, and it is difficult to select the optimal algorithm for a given application.


We analyzed BS-seq data from four plants covering three taxonomic groups. We first characterized the data using multiple summary statistics describing methylation levels, coverage, and noise, as well as the frequencies, magnitudes and lengths of methylated regions. We then created simulated data sets whose characteristics most closely resembled the real experimental data. Seven algorithms for DMR identification (metilene, methylKit, MOABS, DMRcate, Defiant, BSmooth, MethylSig) were applied and their performance was assessed. A blind and independent study design was chosen to reduce bias and to derive practical guidelines for method selection. Overall, metilene had superior performance in most settings. Data attributes such as coverage and the spread of DMR lengths were found to be useful for selecting the best method for DMR detection. A decision tree for selecting the optimal approach based on these data attributes is provided. The presented procedure might serve as a general strategy for deriving algorithm selection rules tailored to the demands of specific application settings.
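
The abstract does not state how performance was scored. As a minimal sketch of one common way to compare DMR calls against simulated ground truth, the example below counts overlaps between predicted and true regions and reports precision, recall and F1. The overlap criterion, the half-open coordinate convention and the names `overlaps` and `score_dmr_calls` are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple

# A region is (chromosome, start, end) with half-open coordinates.
# This convention is an assumption for the sketch.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two regions share at least one position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def score_dmr_calls(predicted: List[Region], truth: List[Region]) -> Dict[str, float]:
    """Score predicted DMRs against simulated true DMRs (hypothetical helper).

    A predicted region counts as correct if it overlaps any true region;
    a true region counts as recovered if any predicted region overlaps it.
    """
    correct_calls = sum(any(overlaps(p, t) for t in truth) for p in predicted)
    recovered = sum(any(overlaps(t, p) for p in predicted) for t in truth)

    precision = correct_calls / len(predicted) if predicted else 0.0
    recall = recovered / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    truth = [("chr1", 100, 200), ("chr1", 500, 600)]
    predicted = [("chr1", 150, 250), ("chr2", 100, 200)]
    print(score_dmr_calls(predicted, truth))
```

In a benchmark of this kind, such a score would be computed per algorithm and per simulated data set, so that the results can be related to data attributes such as coverage.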


Scripts that were used for the analyses and that can be used for prediction of the optimal algorithm are provided at [URL missing]. Simulated and experimental data are available at [URL missing]. Supplementary Information is available at Bioinformatics online.
