Speaker: Elo Leung, Ph.D., George Mason University
Thursday, January 17, 2008, 11AM, B2 NCBI Library

Title: A Text-Guided Clustering Approach to Capturing Spatiotemporal Patterns of Gene Expression During Drosophila melanogaster Embryogenesis

Abstract

The purpose of this study was to capture the progression of spatiotemporal gene transcript expression during Drosophila melanogaster embryogenesis and to provide an external measure of the reliability of the predictions. Microarray data provided expression measurements for each gene, but this data type is known to be difficult to assess accurately: depending on how measurements are calibrated and noise cut-offs are assigned, there will inevitably be a large number of false positives or false negatives. If the investigator intends to apply the results to a diagnostic test, false negatives will be the greater problem; if a full understanding of the participating gene networks is desired, false positives will obscure the picture. The common response to these problems is to submit the samples to follow-up wet lab assays. This avenue is not open to the computational scientist who has access only to the data, so some other method is needed to validate the importance of genes selected in the microarray data analysis. The large body of literature on genetics, molecular biology and biochemistry, stretching over decades for some organisms, is a resource that is available to a computational scientist. This experiment therefore took as its method the direct extraction of information from text and structural annotation for comparison against predictions from gene expression data. Results suggest that, for a limited number of genes, spatiotemporal expression can be derived from text mined across the online Drosophila literature, in the absence of a standard anatomical ontology, and that the derived expression predicts the exact developmental stage assayed.
That is, the spatial or temporal annotations that we were able to derive from text matched the microarray gene expression measurements at the corresponding time points during embryogenesis. The number of genes studied in the literature is far smaller than the number assayed on microarrays, but this accurate cross-referencing between text and microarray data proved useful for inferring further spatiodynamically related genes by clustering the temporal expression patterns of genes during the development of an anatomical structure.
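As a rough illustration of the general idea of text-guided clustering, the sketch below groups genes around a text-annotated "seed" gene by the correlation of their temporal expression profiles. It is not the speaker's method: the use of Pearson correlation, the `min_corr` threshold, the function names and the toy gene names and values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no temporal pattern
    return cov / sqrt(var_x * var_y)


def text_guided_clusters(profiles, text_annotations, min_corr=0.9):
    """Group genes around seeds whose anatomy is known from text.

    profiles:         gene -> expression values, one per time point
    text_annotations: seed gene -> anatomical structure mined from text
    min_corr:         hypothetical similarity threshold (an assumption)
    """
    clusters = {}
    for seed, structure in text_annotations.items():
        if seed not in profiles:
            continue  # annotated in text but not assayed on the array
        members = {
            gene
            for gene, profile in profiles.items()
            if gene != seed and pearson(profiles[seed], profile) >= min_corr
        }
        clusters.setdefault(structure, set()).update(members | {seed})
    return clusters


# Toy data with placeholder names; the values are invented for illustration.
profiles = {
    "seed_gene": [0.1, 0.5, 2.0, 3.0],  # rising over four time points
    "gene_1": [0.2, 0.6, 2.1, 3.2],     # rises with the seed
    "gene_2": [3.0, 2.0, 0.5, 0.1],     # falls as the seed rises
    "gene_3": [1.0, 1.0, 1.0, 1.0],     # flat
}
annotations = {"seed_gene": "hypothetical_structure"}

clusters = text_guided_clusters(profiles, annotations)
print(clusters)
```

In this toy case only `gene_1` joins the seed's cluster, since its profile rises in step with the seed while the others fall or stay flat. A real analysis would need a principled choice of similarity measure and threshold.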