Format

Send to

Choose Destination
Bioinformatics. 2013 Sep 1;29(17):2186-94. doi: 10.1093/bioinformatics/btt359. Epub 2013 Jul 4.

Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature.

Author information

1
Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, USA. rxx@case.edu

Abstract

MOTIVATION:

Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease-phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease-manifestation (D-M) pairs (one specific type of disease-phenotype relationship) from the wide body of published biomedical literature.

DATA AND METHODS:

Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M-specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing.

RESULTS:

In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments.

CONCLUSIONS:

The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing.

AVAILABILITY:

http://nlp.case.edu/public/data/DMPatternUMLS/

PMID:
23828786
PMCID:
PMC4068009
DOI:
10.1093/bioinformatics/btt359
[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Silverchair Information Systems Icon for PubMed Central
Loading ...
Support Center