
Data

Description
The NCBI disease corpus is a collection of 793 PubMed abstracts fully annotated at both the mention and concept levels.
Description
The BC5CDR corpus consists of 1,500 PubMed articles with 4,409 annotated chemicals, 5,818 diseases, and 3,116 chemical-disease interactions.
Description
The tmVar corpus contains 500 PubMed articles manually annotated with mutation mentions of various kinds.
Description
The weakly-labeled corpus used in (Peng et al., 2016) consists of 18,410 abstracts and 33,224 chemical-induced disease (CID) relations. The raw data was extracted from curated data in the CTD-Pfizer collaboration, which provides document-level annotations of drug-disease and drug-phenotype interactions. We applied tmChem and DNorm to recognize and normalize chemical and disease mentions, respectively. To maximize recall, we also applied a dictionary look-up method with a controlled vocabulary (MeSH). Finally, we filtered out documents with no CID relation in the title or abstract, since some asserted relations appear only in the full text.
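The final filtering step can be illustrated with a minimal sketch. It is not the authors' pipeline: a toy case-insensitive dictionary look-up stands in for tmChem, DNorm, and the MeSH look-up, and the `Document` class, the function names, and the concept identifiers (`CHEM:1`, `DIS:1`, `DIS:2`) are all hypothetical. It assumes a document is kept only when both arguments of at least one document-level relation are found in its title and abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str             # title and abstract concatenated
    relations: frozenset  # document-level (chemical_id, disease_id) pairs


def dictionary_lookup(text, vocabulary):
    """Return the concept IDs whose term occurs in the text (toy substring match)."""
    lowered = text.lower()
    return {concept for term, concept in vocabulary.items() if term.lower() in lowered}


def supported_relations(doc, vocabulary):
    """Keep only relations whose chemical and disease are both found in the text."""
    found = dictionary_lookup(doc.text, vocabulary)
    return {(chem, dis) for chem, dis in doc.relations if chem in found and dis in found}


def filter_corpus(docs, vocabulary):
    """Drop documents that have no relation supported by the title/abstract."""
    kept = []
    for doc in docs:
        supported = supported_relations(doc, vocabulary)
        if supported:
            kept.append((doc.doc_id, supported))
    return kept


# Hypothetical vocabulary and documents, for illustration only.
vocabulary = {"aspirin": "CHEM:1", "gastric ulcer": "DIS:1", "hepatitis": "DIS:2"}
docs = [
    Document("doc-a", "Aspirin use was associated with gastric ulcer.",
             frozenset({("CHEM:1", "DIS:1")})),
    # The asserted disease is never mentioned here, as if it appeared only in the full text.
    Document("doc-b", "Aspirin pharmacokinetics in adults.",
             frozenset({("CHEM:1", "DIS:2")})),
]

filtered = filter_corpus(docs, vocabulary)
```

In this toy run, `doc-a` is kept and `doc-b` is dropped, because the disease in its only relation is not mentioned in the text. A real pipeline would replace `dictionary_lookup` with the outputs of the named entity recognition and normalization tools.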
Description
The dataset contains a collection of 705,915 PubMed Phrases (Kim et al., 2018) that are beneficial for information retrieval and human comprehension.