From: Lu, Zhiyong (NIH/NLM/NCBI) [E]
Sent: Monday, March 10, 2008 9:12 AM
To: NLM/NCBI List ncbi-seminar
Subject: NCBI seminar 10am Thursday March 13

Time: 10am Thursday March 13
Location: 8th Floor Conference Room
Speaker: Adam Lee (Host: Zhiyong Lu)
Title: A Framework for Discovering Associations from the Annotated Biological Web

Abstract: During the last decade, biomedical researchers gained access to the entire human genome, reliable high-throughput biotechnologies, and affordable computational resources and network access. In combination, these tools created a new model for biomedical research that no longer uses computational tools merely to monitor research, but instead exploits them to acquire knowledge and make discoveries.

We have developed a tool to discover meaningful patterns across resources and ontologies. These patterns, corresponding to associations of pairs of CV (controlled vocabulary) terms, may yield actionable nuggets of previously unknown knowledge. Moreover, the bridge of associations across CV terms will reflect how scientists annotate data in practice. We execute a protocol to follow hyperlinks, extract annotations, and generate background LSLink (Life Sciences Link) datasets of term-links. We then mine the term-links to find potentially meaningful associations.

We use two classes of metrics to identify significant associations of pairs of CV terms. The first class is based on the LOD (logarithm of the odds) ratio and measures the extent to which a specific association of CV terms deviates from one resulting from chance alone (a random association). The second class is based on the hypergeometric distribution; it quantifies how surprising it is to find a particular pair of CV terms over-represented in a user dataset, in comparison to the background dataset.

This is ongoing research with Louiqa Raschid and Padmini Srinivasan.
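
The two classes of metrics mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the talk, so this assumes one common formulation of each: a log10 ratio of observed to expected co-occurrence for the LOD-style score, and an upper-tail hypergeometric probability for over-representation. The function names and count parameters are hypothetical and chosen only for illustration.

```python
from math import comb, log10


def lod_score(n_ab, n_a, n_b, total):
    """LOD-style score (assumed formulation, not taken from the talk).

    n_ab  : term-links annotated with both CV terms a and b
    n_a   : term-links annotated with term a
    n_b   : term-links annotated with term b
    total : all term-links in the dataset

    Returns log10(observed / expected), where "expected" is the
    co-occurrence count if a and b were paired by chance alone.
    """
    if min(n_ab, n_a, n_b, total) <= 0:
        raise ValueError("all counts must be positive")
    return log10((n_ab * total) / (n_a * n_b))


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k) (assumed formulation).

    N : term-links in the background dataset
    K : background term-links carrying the CV term pair
    n : term-links in the user dataset (drawn from the background)
    k : user-dataset term-links carrying the pair

    A small value means the pair is surprisingly over-represented
    in the user dataset relative to the background.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(lod_score(n_ab=10, n_a=100, n_b=100, total=10_000))
    print(hypergeom_pvalue(k=5, n=50, K=40, N=10_000))
```

A positive score from `lod_score` indicates that the pair co-occurs more often than chance would predict, and a small value from `hypergeom_pvalue` indicates over-representation in the user dataset. Only the standard library is used here, so the sketch has no external dependencies.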
About the speaker: Woei-jyh (Adam) Lee received his BS degree from National Taiwan University in 1993 and his MS degree from New York University in 1998. He worked on distributed objects and fault tolerance at AT&T Labs - Research in 1997. He investigated network software and management at Bell Laboratories Research from 1998 to 2000. From 2002 to 2003 he visited the Integrated Media Systems Center at the University of Southern California, specializing in continuous media streaming and multimedia networking. He also contributed to protein domain parsing and boundary prediction at the National Cancer Institute in 2004. He is currently pursuing a PhD degree in the Center for Bioinformatics and Computational Biology and the Department of Computer Science at the University of Maryland, College Park. His research interests include bioinformatics and computational biology, data management and information integration, and systems simulation and performance evaluation. He is a member of the ACM, the IEEE, the ISCB and the ISENG.