PeerJ. 2019 Mar 7;7:e6562. doi: 10.7717/peerj.6562. eCollection 2019.

Lightweight data management with dtool.

Author information

Computational Systems Biology, John Innes Centre, Norwich, United Kingdom.
Contributed equally


The explosion in volumes and types of data has led to substantial challenges in data management. These challenges are often faced by front-line researchers who are already dealing with rapidly changing technologies and have limited time to devote to data management. There are good high-level guidelines for managing and processing scientific data. However, there is a lack of simple, practical tools to implement these guidelines. This is particularly problematic in a highly distributed research environment, where needs differ substantially from group to group, centralised solutions are difficult to implement, and storage technologies change rapidly. To meet these challenges we have developed dtool, a command line tool for managing data. The tool packages data and metadata into a unified whole, which we call a dataset. The dataset provides consistency checking and the ability to access metadata for both the whole dataset and individual files. The tool can store these datasets on several different storage systems, including a traditional file system, object stores (S3 and Azure), and iRODS. It includes an application programming interface that can be used to incorporate it into existing pipelines and workflows. The tool has provided substantial process, cost, and peace-of-mind benefits to our data management practices and we want to share these benefits. The tool is open source and available freely online at
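
To make the idea of a dataset concrete, the following is a minimal sketch of packaging files together with dataset-level and per-file metadata, then checking consistency against that record. It is not dtool's implementation or its API: the directory layout (`data/`, `manifest.json`), the field names, and the functions `create_dataset` and `verify_dataset` are all hypothetical, chosen only to illustrate the concept using the Python standard library.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def list_items(data_dir):
    """Map each file's path (relative to data_dir) to its absolute Path."""
    return {
        p.relative_to(data_dir).as_posix(): p
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def create_dataset(root, name, description):
    """Record dataset-level and per-file metadata in root/manifest.json.

    Hypothetical layout: the data files live under root/data.
    """
    root = Path(root)
    items = {
        relpath: {"size_in_bytes": path.stat().st_size, "hash": file_hash(path)}
        for relpath, path in list_items(root / "data").items()
    }
    manifest = {"name": name, "description": description, "items": items}
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_dataset(root):
    """Return a list of inconsistencies between the manifest and the files."""
    root = Path(root)
    manifest = json.loads((root / "manifest.json").read_text())
    on_disk = list_items(root / "data")
    problems = []
    for relpath, props in manifest["items"].items():
        if relpath not in on_disk:
            problems.append(f"missing: {relpath}")
        elif file_hash(on_disk[relpath]) != props["hash"]:
            problems.append(f"altered: {relpath}")
    # Files present on disk but never recorded in the manifest.
    for relpath in sorted(set(on_disk) - set(manifest["items"])):
        problems.append(f"unknown: {relpath}")
    return problems
```

Calling `create_dataset` once after the data are in place and `verify_dataset` at any later point returns an empty list while the files still match the record, and a list of missing, altered, or unknown items otherwise. Keeping the manifest as a plain file beside the data is one way such a record could remain independent of the storage system that holds it.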


Keywords: Bioinformatics; Data management; Data processing; Reproducibility

Conflict of interest statement

The authors declare there are no competing interests.
