Stopwords for biomedical literature: [DOWNLOAD]
|
- PubMed Phrases are coherent text segments that are beneficial for information retrieval and human comprehension.
- PubMed Phrases, an open set of coherent phrases for searching biomedical literature, S. Kim, L. Yeganova, D. C. Comeau, W. J. Wilbur and Z. Lu, Scientific Data, 5, 180104, 2018.
|
Word embeddings for PubMed: [DOWNLOAD]
- Word vectors (in word2vec binary format) trained on all PubMed abstracts (Mar. 2016).
- Bridging the gap: incorporating a semantic similarity measure for effectively mapping PubMed queries to documents, S. Kim, N. Fiorini, W. J. Wilbur and Z. Lu, Journal of Biomedical Informatics, 75, pp. 122-127, 2017 (original version on arXiv).
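
For readers unfamiliar with the word2vec binary format mentioned above, the following is a minimal sketch of a reader that uses only the Python standard library. It assumes the common layout written by the original word2vec tool: an ASCII header line giving the vocabulary size and vector dimension, followed by one record per word (the word, a space, then the vector as little-endian 32-bit floats, optionally followed by a newline). The function name `read_word2vec_binary` and the tiny in-memory sample are illustrative assumptions, not part of the download.

```python
import io
import struct
from typing import BinaryIO, Dict, List


def read_word2vec_binary(stream: BinaryIO) -> Dict[str, List[float]]:
    """Read all vectors from a stream in the word2vec binary layout."""
    # Header: "<vocab_size> <dimensions>\n" as ASCII text.
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])

    vectors: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        # Each word is a byte string terminated by a single space.
        chars = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("unexpected end of data while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline some writers put after a vector
                chars.extend(ch)
        word = chars.decode("utf-8", errors="replace")

        # The vector follows as `dim` float32 values (assumed little-endian).
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("unexpected end of data while reading a vector")
        vectors[word] = list(struct.unpack(f"<{dim}f", raw))
    return vectors


# Build a two-word sample in memory so the sketch runs without the real file.
sample = io.BytesIO()
sample.write(b"2 3\n")
sample.write(b"gene " + struct.pack("<3f", 1.0, 0.5, -2.0) + b"\n")
sample.write(b"protein " + struct.pack("<3f", 0.25, 0.0, 4.0) + b"\n")
sample.seek(0)

vectors = read_word2vec_binary(sample)
print(vectors["gene"])  # [1.0, 0.5, -2.0]
```

To read the downloaded file, open it with `open(path, "rb")` and pass the file object to the function. Holding every vector in a Python dictionary is memory-hungry for a vocabulary of this size, so a dedicated library is likely more practical for real use, for example gensim's `KeyedVectors.load_word2vec_format(path, binary=True)`.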
|
NCBITextLib: [LINK]
- Software library for building a large-scale data infrastructure for text mining.
|
Meshable dataset (MeSH terms): [DOWNLOAD]
Meshable dataset (MeSH term/subheading combinations): [DOWNLOAD]
- MeSH terms (sorted by document set size) and their top 100 topic terms with alpha scores. The PubMed set used to obtain topic terms was collected in Oct. 2015.
- Meshable: searching PubMed abstracts by utilizing MeSH and MeSH-derived topical terms, S. Kim, L. Yeganova and W. J. Wilbur, Bioinformatics, 32(19), pp. 3044-3046, 2016.
|
Cystic fibrosis, Deafness, DiGeorge syndrome, Autism and Hypertrophic cardiomyopathy datasets: [LINK]
- PubMed IDs used for evaluating the PAV-EM thematic clustering algorithm.
- Summarizing topical contents from PubMed documents using a thematic analysis, S. Kim, L. Yeganova, and W. J. Wilbur, Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 805-810, 2015.
|
Feature generation tool for DDIExtraction corpora: [LINK]
- The tool creates a feature set for the DDIExtraction 2013 corpus.
- Extracting drug-drug interactions from literature using a rich feature-based linear kernel approach, S. Kim, H. Liu, L. Yeganova, and W. J. Wilbur, Journal of Biomedical Informatics, 55, pp. 23-30, 2015.
|
Disease, CellLine and Reptiles datasets: [DOWNLOAD]
- Gene names annotated for PubMed documents.
- Classifying gene sentences in biomedical literature by combining high-precision gene identifiers, S. Kim, W. Kim, D. Comeau, and W. J. Wilbur, NAACL 2012 Workshop on Biomedical Natural Language Processing (BioNLP), pp. 185-192, 2012.
|