Prog Brain Res. 2006;158:83-108.

Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.

Author information

The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30322, USA.

Abstract

The goal of this chapter is to introduce some of the available computational methods for expression analysis. Genomic and proteomic experimental techniques are briefly discussed to help the reader understand these methods and their results in the context of their biological significance. Furthermore, a case study illustrates the use of these analytical methods to extract significant biomarkers from high-throughput microarray data. Genomic and proteomic data analysis is essential for understanding the underlying factors involved in human disease. Currently, such experimental data are generally obtained by high-throughput technologies such as microarrays or mass spectrometry, among others. The sheer amount of raw data produced by these technologies warrants specialized computational methods for data analysis. Biomarker discovery for neurological diagnosis and prognosis is one application that requires such methods. By extracting significant genomic and proteomic biomarkers in controlled experiments, we come closer to understanding how biological mechanisms contribute to neurodegenerative diseases such as Alzheimer's disease and how drug treatments interact with the nervous system. In the biomarker discovery process, several computational methods must be carefully considered to analyze genomic or proteomic data accurately. These methods include quality control, clustering, classification, feature ranking, and validation. Data quality control and normalization methods reduce technical variability and help ensure that discovered biomarkers are statistically significant. Preprocessing steps must be chosen carefully, since they may adversely affect the results of the subsequent expression analysis steps, which generally fall into two categories: unsupervised and supervised. Unsupervised (clustering) methods can be used to group similar genomic or proteomic profiles and can therefore elucidate relationships within sample groups.
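As a rough illustration of the preprocessing and clustering steps described above, the following minimal sketch in plain Python (no third-party libraries) log2-transforms hypothetical raw intensities, standardizes each gene across samples, and groups genes by expression profile with k-means. The log2 transform, per-gene z-scoring, k-means, the farthest-point initialization (chosen here only so the example is deterministic), the function names, and the data are all illustrative assumptions, not the chapter's own pipeline.

```python
import math

def log2_standardize(matrix):
    """Log2-transform raw intensities, then z-score each feature (row) across samples."""
    out = []
    for row in matrix:
        logged = [math.log2(x + 1.0) for x in row]  # pseudocount of 1 avoids log2(0)
        mean = sum(logged) / len(logged)
        var = sum((v - mean) ** 2 for v in logged) / (len(logged) - 1)
        sd = math.sqrt(var) or 1.0  # constant rows are only centered, not scaled
        out.append([(v - mean) / sd for v in logged])
    return out

def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical raw intensities: 4 genes (rows) x 6 patient samples (columns).
raw = [
    [100, 120, 110, 900, 950, 1000],  # higher in samples 4-6
    [90, 100, 95, 800, 850, 870],
    [1000, 950, 980, 100, 110, 90],   # lower in samples 4-6
    [900, 870, 910, 120, 100, 95],
]
gene_clusters = kmeans(log2_standardize(raw), k=2)
print(gene_clusters)  # → [0, 0, 1, 1]
```

Genes with similar profiles across the samples receive the same label. Standardizing each gene first makes the clustering depend on the shape of the profile rather than on absolute intensity, which is one common choice among several.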
Clustering methods can also assign biomarkers to subgroups based on their expression profiles across patient samples. Although clustering is useful for exploratory analysis, it is limited by its inability to incorporate expert knowledge. Classification and feature ranking, on the other hand, are supervised, knowledge-based machine learning methods that estimate the distribution of biological expression data and, in doing so, can extract important information about these experiments. Classification is closely coupled with feature ranking, which is essentially a data reduction method that uses classification error estimation or other statistical tests to score features. Biomarkers can then be extracted by eliminating features with insignificant scores. These analytical methods can be applied equally to genomic and proteomic data. However, because of both biological differences between the data sources and technical differences between the experimental methods used to obtain these data, a firm understanding of both the data sources and the experimental methods is important. Regardless of data quality, some discovered biomarkers will inevitably be false positives, so it is important to validate them. Although validation may be slow, the initial feature ranking and data reduction steps significantly accelerate the overall biomarker discovery process. Information obtained from validation may also be used to refine the data analysis procedures for future iterations. Biomarker validation may be performed in a number of ways: at the bench in traditional laboratories, through web-based electronic resources such as Gene Ontology and literature databases, and in clinical trials.
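The supervised steps can likewise be sketched under stated assumptions. In the minimal Python example below, features are scored by the absolute Welch t-statistic (one possible statistical test; the abstract does not specify which test is used), the top k features are kept, and the leave-one-out error of a nearest-centroid classifier is estimated on them. Features are re-ranked inside each fold so the held-out sample never influences which features are selected, which avoids an optimistically biased error estimate. The function names, the choice of classifier, and the data are hypothetical.

```python
import math

def welch_t(a, b):
    """Welch's t-statistic comparing the means of two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def rank_features(X, y):
    """Feature indices sorted by |t| between classes 0 and 1, best first."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(welch_t(a, b)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def nearest_centroid(X_train, y_train, x, feats):
    """Predict the class whose mean profile (on feats only) is closest to x."""
    best, best_d = None, math.inf
    for lab in sorted(set(y_train)):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        d = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if d < best_d:
            best, best_d = lab, d
    return best

def loocv_error(X, y, k):
    """Leave-one-out error using the top-k features, re-ranked inside each fold."""
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = rank_features(X_tr, y_tr)[:k]  # ranking never sees sample i
        if nearest_centroid(X_tr, y_tr, X[i], feats) != y[i]:
            errors += 1
    return errors / len(X)

# Hypothetical data: 6 samples (3 control = 0, 3 disease = 1) x 4 features.
X = [
    [5.0, 1.0, 3.2, 7.0],
    [5.1, 1.2, 2.9, 7.3],
    [4.9, 0.9, 3.1, 6.8],
    [5.0, 4.0, 3.0, 7.1],
    [5.2, 4.2, 3.3, 6.9],
    [4.8, 3.9, 2.8, 7.2],
]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y)[0])    # → 1
print(loocv_error(X, y, k=1))    # → 0.0
```

Only feature 1 separates the two groups here, so it ranks first and a one-feature classifier makes no leave-one-out errors. On real data with many more features than samples, a low error estimate still does not rule out false positives, which is why the validation step remains necessary.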

PMID: 17027692
DOI: 10.1016/S0079-6123(06)58004-5
[Indexed for MEDLINE]