Automatic Analysis and Annotation of Document Keywords in Biomedical Literature


As a document retrieval system, PubMed aims to provide efficient access to millions of scientific documents. To do so, it matches user queries against the keywords and semantic representations of PubMed documents. One such semantic representation in MEDLINE citations is the set of Medical Subject Headings (MeSH) indexing terms, which are assigned by professional human indexers at the National Library of Medicine. Author keywords, supplied by authors when they submit an article, instead capture the topic of a document from the authors' perspective. Finally, readers have their own opinions about which words are important to an article, and these may or may not agree with the MeSH terms or author keywords of the same article.
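To make the notion of agreement between keyword sets concrete, the following is a minimal sketch of one possible measure, the Jaccard overlap between two term lists. It is an illustration only, not the project's method: the function names `normalize` and `jaccard` and the example terms are hypothetical, and real MeSH terms and author keywords would need more careful matching than exact lowercase comparison.

```python
def normalize(terms):
    """Lowercase and strip each term, dropping empty strings."""
    return {t.strip().lower() for t in terms if t.strip()}


def jaccard(a, b):
    """Jaccard overlap of two term lists: |A and B| / |A or B|."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Hypothetical example data, not taken from any real citation.
mesh_terms = ["Neoplasms", "Gene Expression", "Humans"]
author_keywords = ["gene expression", "cancer", "biomarkers"]

overlap = jaccard(mesh_terms, author_keywords)
print(f"MeSH vs. author keyword overlap: {overlap:.2f}")
```

Exact string matching understates agreement when two perspectives use synonyms (for example "Neoplasms" and "cancer"), which is one reason the perspectives can appear to disagree.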

Goals and Objectives

Our overall goal is to develop automated techniques for analyzing and annotating important document keywords in biomedical literature, drawing on the perspectives of curators, authors, and readers in the context of document indexing and retrieval. We also aim to develop machine learning approaches that predict such keywords automatically.
