BMC Bioinformatics. 2017 Mar 2;18(1):142. doi: 10.1186/s12859-017-1559-2.

Reactome pathway analysis: a high-performance in-memory approach.

Author information

1. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK.
2. Open Targets, Wellcome Genome Campus, Hinxton, UK.
3. Fundación Investigación INCLIVA, Universitat de València, Valencia, Spain.
4. Instituto de Medicina Genomica, Valencia, Spain.
5. Escuela Técnica Superior de Ingenierías, Universitat de València, Valencia, Spain.
6. Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain.
7. NYU Langone Medical Center, New York, USA.
8. Ontario Institute for Cancer Research, Toronto, Canada.
9. Department of Molecular Genetics, University of Toronto, Toronto, Canada.
10. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK. hhe@ebi.ac.uk.
11. State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine; National Center for Protein Sciences, 102206, Beijing, China. hhe@ebi.ac.uk.

Abstract

BACKGROUND:

Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research. From a performance standpoint, one of the main challenges for these methods is the constantly increasing size of the data samples.

RESULTS:

Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, determining whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree.
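As an illustration of the first and last steps only, the following minimal Python sketch looks identifiers up in a prefix tree and computes an over-representation p-value per pathway. It rests on several assumptions not taken from the abstract: a plain character trie stands in for a compressed radix tree, the statistic is a hypergeometric upper tail (a common choice for over-representation analysis, not necessarily the test Reactome applies), pathways are flat sets rather than a graph and a double-linked tree, and all class names, function names and identifiers are hypothetical.

```python
from math import comb


class TrieNode:
    """One node of a simple character trie (assumed stand-in for a radix tree)."""

    def __init__(self):
        self.children = {}     # next character -> TrieNode
        self.entities = set()  # entities whose identifier ends at this node


class IdentifierTrie:
    """Hypothetical lookup table from identifiers to entities."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, identifier, entity):
        node = self.root
        for ch in identifier:
            node = node.children.setdefault(ch, TrieNode())
        node.entities.add(entity)

    def lookup(self, identifier):
        """Return the entities stored under an exact identifier, or an empty set."""
        node = self.root
        for ch in identifier:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.entities)


def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` entities without replacement from
    `population`, of which `annotated` belong to the pathway."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


def over_representation(sample_ids, trie, pathways, population):
    """Map identifiers to entities, then score each pathway.

    `pathways` maps a pathway name to the set of entities it contains;
    `population` is the total number of entities considered (an assumption
    about how the background is defined).
    """
    found = set()
    for identifier in sample_ids:
        found |= trie.lookup(identifier)  # unknown identifiers add nothing

    results = {}
    for name, members in pathways.items():
        hits = len(found & members)
        p_value = hypergeom_tail(population, len(members), len(found), hits)
        results[name] = (hits, p_value)
    return results


# Toy data, invented purely for illustration.
trie = IdentifierTrie()
for ident, entity in [("P1", "e1"), ("P2", "e2"), ("P3", "e3"), ("P4", "e4")]:
    trie.insert(ident, entity)

pathways = {"A": {"e1", "e2"}, "B": {"e3", "e4"}}
results = over_representation(["P1", "P2", "X9"], trie, pathways, population=4)
```

In this sketch the lookup cost depends on the length of the identifier rather than on the number of stored identifiers, which is the property that makes a prefix tree attractive as a lookup table for large samples.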

CONCLUSION:

Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service that enables the analysis of genome-wide datasets within seconds and allows interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production web site, either via the AnalysisService for programmatic access or via the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository in the Reactome GitHub ( https://github.com/reactome/ ).

KEYWORDS:

Data structures; Over-representation analysis; Pathway analysis

PMID: 28249561
PMCID: PMC5333408
DOI: 10.1186/s12859-017-1559-2
[Indexed for MEDLINE]