Front Neuroinform. 2015 Mar 16;9:4. doi: 10.3389/fninf.2015.00004. eCollection 2015.

A scalable neuroinformatics data flow for electrophysiological signals using MapReduce.

Author information

1. Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA.
2. Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH, USA; Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, OH, USA.
3. Department of Neurology, School of Medicine, Case Western Reserve University, Cleveland, OH, USA.
4. Department of Electrical Engineering and Computer Science, School of Engineering, Case Western Reserve University, Cleveland, OH, USA.

Abstract

Data-driven neuroscience research is providing new insights into the progression of neurological disorders and supporting the development of improved treatment approaches. However, the volume, velocity, and variety of neuroscience data generated by sophisticated recording instruments and acquisition methods have exacerbated the limited scalability of existing neuroinformatics tools. This makes it difficult for neuroscience researchers to effectively leverage the growing body of multi-modal neuroscience data to advance research in serious neurological disorders, such as epilepsy. We describe the development of the Cloudwave data flow, which uses new data partitioning techniques to store and analyze electrophysiological signal data in a distributed computing infrastructure. The Cloudwave data flow uses the MapReduce parallel programming model to implement an integrated signal data processing pipeline that scales with large volumes of data generated at high velocity. Using an epilepsy domain ontology together with an epilepsy-focused, extensible data representation format called Cloudwave Signal Format (CSF), the data flow addresses the challenge of data heterogeneity and is interoperable with existing neuroinformatics data representation formats, such as HDF5. The scalability of the Cloudwave data flow is evaluated using a 30-node cluster installed with the open source Hadoop software stack. The results demonstrate that the Cloudwave data flow can process increasing volumes of signal data by leveraging Hadoop Data Nodes to reduce the total data processing time. The Cloudwave data flow is a template for developing highly scalable neuroscience data processing pipelines using MapReduce algorithms to support a variety of user applications.
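
The general idea of partitioning signal data and processing it with map and reduce steps can be illustrated with a minimal, single-process Python sketch. This is not the Cloudwave implementation and does not use Hadoop or the CSF format; the fixed-length partitioning scheme, the function names (partition, map_segment, combine, run), and the per-channel statistics are all assumptions chosen only for illustration.

```python
from collections import defaultdict
from functools import reduce


def partition(signals, segment_len):
    """Split each channel's samples into fixed-length segments.

    Hypothetical partitioning scheme: keys are (channel, segment_index).
    """
    for channel, samples in signals.items():
        for start in range(0, len(samples), segment_len):
            key = (channel, start // segment_len)
            yield key, samples[start:start + segment_len]


def map_segment(key, segment):
    """Map step: emit (channel, (count, sum, peak)) for one segment."""
    channel, _ = key
    peak = max(abs(x) for x in segment)
    return channel, (len(segment), sum(segment), peak)


def combine(a, b):
    """Reduce step: merge two partial (count, sum, peak) tuples."""
    return a[0] + b[0], a[1] + b[1], max(a[2], b[2])


def run(signals, segment_len):
    """Run the map, group-by-key, and reduce phases in one process."""
    grouped = defaultdict(list)
    for key, segment in partition(signals, segment_len):
        channel, value = map_segment(key, segment)
        grouped[channel].append(value)

    result = {}
    for channel, values in grouped.items():
        count, total, peak = reduce(combine, values)
        result[channel] = {"samples": count, "mean": total / count, "peak": peak}
    return result


if __name__ == "__main__":
    # Toy input; channel names and values are made up for the example.
    toy = {"Fp1": [1, -2, 3, 4], "Fp2": [0, 0, 6]}
    print(run(toy, segment_len=2))
```

Because each segment is mapped independently and the combine function is associative, the segments could in principle be processed on separate nodes and merged in any order, which is the property a distributed MapReduce framework relies on.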

KEYWORDS:

MapReduce; Cloudwave Signal Format; electrophysiological signal data; epilepsy and seizure ontology; epilepsy research
