J Neurosci Methods. 2014 Oct 30;236:19-25. doi: 10.1016/j.jneumeth.2014.08.001. Epub 2014 Aug 10.

Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

Author information

1. UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA. Electronic address: benson.irungu@uth.tmc.edu.
2. UT Center of Excellence on Mood Disorders, Department of Psychiatry and Behavioral Sciences, UT Houston Medical School, Houston, TX, USA.
3. The University of Texas Health Science Center at Houston, Department of Diagnostic & Interventional Imaging, Houston, TX, USA.

Abstract

BACKGROUND:

Neuroimaging machine learning studies have largely used supervised algorithms, which require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) in order to be 'trained' for a prediction task. Notably, this approach may not be optimal, or even possible, when the global structure of the data is not well known and the researcher has no a priori model to fit to the data.

NEW METHOD:

We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated, and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized in a 2D scatter plot and further analyzed using the K-means clustering algorithm.
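
As an illustration of the K-means step applied to a 2D map, the following minimal Python sketch runs Lloyd's algorithm on synthetic 2D points. It assumes the t-SNE embedding has already been computed elsewhere. The function name `kmeans`, the toy coordinates, and the 46/46 split are hypothetical; they are not study data and not the authors' implementation.

```python
import math
import random


def kmeans(points, k=2, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Synthetic stand-in for a 2D embedding of 92 subjects (NOT study data):
# two well-separated Gaussian blobs of 46 points each.
toy_rng = random.Random(42)
toy_embedding = (
    [(toy_rng.gauss(-10, 1), toy_rng.gauss(0, 1)) for _ in range(46)]
    + [(toy_rng.gauss(10, 1), toy_rng.gauss(0, 1)) for _ in range(46)]
)

labels, centroids = kmeans(toy_embedding, k=2)
```

With blobs this far apart, each blob should end up in its own cluster. Which blob receives label 0 and which receives label 1 depends on the random initialisation, so cluster indices carry no meaning until they are compared with external labels.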

COMPARISON WITH EXISTING METHODS:

t-SNE was evaluated against classical principal component analysis.

CONCLUSION:

Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters that corresponded to subjects' gender labels (cluster silhouette index = 0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model, which correctly identified the gender of 93.5% of subjects. Notably, from a neuropsychiatric perspective, this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders.
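
To make the two quantities above concrete, the following minimal Python sketch computes a mean silhouette index and applies a minimum distance rule. It assumes Euclidean distance and interprets "minimum distance" as assignment to the nearest cluster centroid; the abstract does not specify either choice. All function names and the toy points are hypothetical and are not study data.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette value over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def cluster_centroids(points, labels):
    """Map each cluster label to the mean of its member points."""
    grouped = {}
    for p, lab in zip(points, labels):
        grouped.setdefault(lab, []).append(p)
    return {
        lab: tuple(sum(axis) / len(members) for axis in zip(*members))
        for lab, members in grouped.items()
    }


def nearest_centroid(point, centroids):
    """Minimum distance rule: return the label of the closest centroid."""
    return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))


# Toy 2D points in two compact, well-separated groups (NOT study data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

score = silhouette_index(points, labels)
centroids = cluster_centroids(points, labels)
predicted = nearest_centroid((2.0, 2.0), centroids)
```

Silhouette values range from -1 to 1, with values near 1 indicating compact, well-separated clusters. To report an accuracy figure, the predicted cluster labels would additionally need to be matched against known labels, which this sketch leaves out.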

KEYWORDS:

Big data; Multimodal neuroimaging; Research domain criteria (RDoC); Unsupervised machine learning; t-Distributed stochastic neighbour embedding (t-SNE)

PMID: 25117552
DOI: 10.1016/j.jneumeth.2014.08.001
[Indexed for MEDLINE]
