PubMed Query Log Analysis for Improving User Access
Introduction
Over the last decade, the online search for biological information has progressed rapidly and has become an integral part of any scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. Improving our understanding of the growing population of Entrez users, their information needs, and the way in which they meet these needs opens opportunities to improve the information services and information access provided by NCBI.
Goals and Objectives
Our overall goal is to improve user access to various molecular biology data, including the biomedical literature, in NCBI's online Entrez databases. Through the analysis of query logs, the first goal of this research is to better understand Entrez users, including their information needs and search strategies. Furthermore, we aim to develop and evaluate search aids that help users formulate queries and explore search results, including the presentation of links between different Entrez databases.
Team Members
Research Highlights
- PubMed Query Log Analysis:
One important resource for understanding and characterizing patrons of search engines is the transaction logs. Our investigation of user interactions through one month of PubMed logs focused on analyzing user needs, different aspects of queries (e.g. length), and user search habits [Dogan et al., Database 2009]. Not only can log analysis help PubMed search as a whole, it can also play an important role in developing tools for improving the links between different Entrez databases. Through our analysis of PubMed logs, we learned that people search for certain biomedical concepts more often than others and that there are strong associations between different concepts [Neveol et al., JBI 2011].
- PubMed Query Suggestions:
Based on the PubMed query log analysis, we discovered that it is common for PubMed users to repeatedly modify their queries (search terms) before retrieving documents relevant to their information needs. Hence, we developed an automatic search aid for query formulation, namely Related Queries (RQ), which focuses on finding popular queries that contain the initial user search term, with the goal of helping users describe their information needs more precisely. This work has been integrated into PubMed since January 2009. Automatic assessment using clickthrough data shows that each day, the new feature is used consistently between 6% and 10% of the time when it is shown, suggesting that it has quickly become a popular new feature in PubMed [Lu et al., AMIA 2009; Lu and Wilbur, JBI 2008].
- Related Journals:
With the explosion of biomedical literature and the evolution of online and open access, scientists are reading more articles from a wider variety of journals. Thus, the list of core journals relevant to their research may be less obvious and may often change over time. To help researchers quickly identify appropriate journals to read and publish in, we developed a web application (http://www.ncbi.nlm.nih.gov/IRET/Journals) for finding related journals based on the analysis of PubMed log data [Lu et al., Bioinformatics, 2009].
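As a rough illustration of the Related Queries idea described under PubMed Query Suggestions (popular logged queries that contain the user's initial search term), a minimal Python sketch might look like the following. It is not the method deployed in PubMed: the function names, the normalization step, the whole-word matching rule, and the tie-breaking order are all assumptions made for illustration.

```python
from collections import Counter


def normalize(query):
    """Lowercase and collapse whitespace so trivially different queries match.

    This normalization is an assumption of the sketch, not a documented PubMed step.
    """
    return " ".join(query.lower().split())


def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def related_queries(log, term, top_n=5):
    """Return up to top_n (query, frequency) pairs from the log that contain term.

    Candidates are ranked by how often they appear in the log (most popular
    first); ties are broken alphabetically. The initial term itself is skipped,
    since suggesting it back to the user adds nothing.
    """
    term_norm = normalize(term)
    term_tokens = term_norm.split()
    if not term_tokens:
        return []

    # Count how often each normalized query appears in the log.
    counts = Counter(normalize(q) for q in log)

    candidates = [
        (query, freq)
        for query, freq in counts.items()
        if query != term_norm and contains_phrase(query.split(), term_tokens)
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]
```

Given a hypothetical log in which "breast cancer treatment" appears more often than "breast cancer gene", calling `related_queries(log, "breast cancer")` would rank the former first. A production system would also need to filter rare or noisy queries, which this sketch leaves out.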
Selected Publications
- Neveol et al.,
Semi-automatic semantic annotation of PubMed Queries: a study on quality, efficiency, satisfaction,
J Biomed Inform, 2011.
Free Access
- Dogan et al.,
Understanding PubMed user search behavior through log analysis,
Database, 2009.
Free Access
- Lu et al.,
Identifying related journals through log analysis,
Bioinformatics, 2009.
Free Access
- Lu et al.,
Finding query suggestions for PubMed,
AMIA, 2009.
Free Access
- Lu & Wilbur,
Improving accuracy for identifying related PubMed queries by an integrated approach,
J Biomed Inform, 2008.
Free Access