Database (Oxford). 2014 Sep 10;2014. pii: bau089. doi: 10.1093/database/bau089. Print 2014.

A semi-automated methodology for finding lipid-related GO terms.

Author information

1
Department of Computer Science, National University of Singapore, Singapore 117417, NUS Graduate School of Integrative Science and Engineering, National University of Singapore, Singapore 117456, National Research Foundation, Singapore 138602, Department of Biochemistry, National University of Singapore, Singapore 117599 and Department of Pathology, National University of Singapore, Singapore 119074.
2
Department of Computer Science, National University of Singapore, Singapore 117417, NUS Graduate School of Integrative Science and Engineering, National University of Singapore, Singapore 117456, National Research Foundation, Singapore 138602, Department of Biochemistry, National University of Singapore, Singapore 117599 and Department of Pathology, National University of Singapore, Singapore 119074.
3
Department of Computer Science, National University of Singapore, Singapore 117417, NUS Graduate School of Integrative Science and Engineering, National University of Singapore, Singapore 117456, National Research Foundation, Singapore 138602, Department of Biochemistry, National University of Singapore, Singapore 117599 and Department of Pathology, National University of Singapore, Singapore 119074. wongls@comp.nus.edu.sg

Abstract

MOTIVATION:

Although semantic similarity in Gene Ontology (GO) and other approaches may be used to find similar GO terms, there is not yet a method to systematically find a class of GO terms sharing a common property with high accuracy (e.g., involving human curation).

RESULTS:

We have developed a methodology to address this issue and applied it to identify lipid-related GO terms, owing to the important and varied roles of lipids in many biological processes. Our methodology finds lipid-related GO terms in a semi-automated manner, requiring only moderate manual curation. We first obtain a list of lipid-related gold-standard GO terms by keyword search and manual curation. Then, based on the hypothesis that co-annotated GO terms share similar properties, we develop a machine learning method that expands the list of lipid-related terms from the gold standard. Those terms predicted most likely to be lipid related are examined by a human curator following specific curation rules to confirm the class labels. The structure of GO is also exploited to help reduce the curation effort. The prediction and curation cycle is repeated until no further lipid-related term is found. Our approach has covered a high proportion, if not all, of lipid-related terms with relatively high efficiency.
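The predict-and-curate cycle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the machine learning method, so a simple co-annotation score (the fraction of a term's co-annotated terms already labelled lipid related) stands in for it, and the use of GO structure to reduce curation effort is omitted. The function names, the `top_k` and `threshold` parameters, the `curate` callback standing in for the human curator, and the toy term and gene identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coannotation(annotations):
    """Map each term to the set of terms annotated to the same gene product."""
    neighbours = defaultdict(set)
    for terms in annotations.values():
        for a, b in combinations(sorted(set(terms)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def score(term, neighbours, positives):
    """Fraction of a term's co-annotated terms already labelled positive.

    Assumption: a stand-in for the (unspecified) classifier in the abstract.
    """
    co = neighbours.get(term, set())
    return len(co & positives) / len(co) if co else 0.0


def expand(annotations, gold_standard, curate, top_k=2, threshold=0.5):
    """Repeat prediction and curation until no new term is confirmed."""
    neighbours = build_coannotation(annotations)
    positives = set(gold_standard)
    rejected = set()
    while True:
        candidates = [t for t in neighbours
                      if t not in positives and t not in rejected]
        # Rank by descending score; break ties by term name for determinism.
        ranked = sorted(candidates,
                        key=lambda t: (-score(t, neighbours, positives), t))
        batch = [t for t in ranked[:top_k]
                 if score(t, neighbours, positives) >= threshold]
        # The curator (here a callback) confirms or rejects each prediction.
        confirmed = {t for t in batch if curate(t)}
        rejected.update(set(batch) - confirmed)
        if not confirmed:
            return positives
        positives |= confirmed


# Toy data with made-up identifiers (not real GO terms or genes).
toy_annotations = {
    "geneA": ["T1", "T2"],
    "geneB": ["T2", "T3"],
    "geneC": ["T4", "T5"],
}
toy_gold = {"T1"}
result = expand(toy_annotations, toy_gold, curate=lambda t: t in {"T2", "T3"})
```

In this toy run the gold-standard term pulls in its co-annotated neighbours one round at a time, while terms that share no annotations with confirmed terms never reach the curator. That is the effort-saving property the semi-automated design aims for.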

DATABASE URL:

http://compbio.ddns.comp.nus.edu.sg/~lipidgo.

PMID:
25209026
PMCID:
PMC4160098
DOI:
10.1093/database/bau089
[Indexed for MEDLINE]
Free PMC Article
