From: Lin, Jimmy (NIH/NLM/NCBI) [E]
Sent: Tuesday, December 09, 2008 3:57 PM
To: NLM/NCBI List ncbi-seminar
Cc: jimmylin@umd.edu
Subject: Seminar: Tues 12/16, 2pm: Is searching full text more effective than searching abstracts? (Jimmy Lin)

-------------------- SEMINAR ANNOUNCEMENT --------------------

Is searching full text more effective than searching abstracts?
Jimmy Lin (University of Maryland/NCBI)

When: Tuesday, December 16, 2008, 2pm
Where: 38A B2 NCBI Library

With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature can now go beyond searching bibliographic records (title, abstract, metadata) and directly access full-text content. Motivated by this emerging trend, I posed the following question: Is searching full text more effective than searching abstracts? I answer this question by comparing text retrieval algorithms on abstracts and full text, using a collection of 162k articles from the TREC 2007 genomics track evaluation.

In this talk, I focus on two aspects of full-text retrieval: effectiveness and efficiency. Experiments illustrate the value of full text, particularly if retrieval is performed at the level of paragraphs. These results suggest that effective retrieval algorithms should perform finer-grained analysis of full-text articles.

The benefits of full text come at considerable cost in terms of the increased volume of data that needs to be processed. To this end, Google's MapReduce programming model provides a convenient framework for taking advantage of multiple machines in a cluster. I'll discuss experiments that focus on the practicality of scaling to ever-increasing datasets.