Neural Comput. 2014 Nov;26(11):2379-94. doi: 10.1162/NECO_a_00661. Epub 2014 Aug 22.

High-dimensional cluster analysis with the masked EM algorithm.

Author information

UCL Institute of Neurology and UCL Department of Neuroscience, Physiology, and Pharmacology, University College London, London WC1E 6DE, U.K. s.kadir@ucl.ac.uk.

Abstract

Cluster analysis faces two problems in high dimensions: the "curse of dimensionality" that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a "masked EM" algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data.
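The claim that the informative feature subset differs from point to point can be illustrated with a small synthetic sketch. This is not the masked EM algorithm itself, whose details the abstract does not give; it only shows why a single global feature selection fails in this setting. The function name `per_point_masks`, the amplitude threshold, and the data-generating scheme are all assumptions made for illustration.

```python
import numpy as np

def per_point_masks(X, threshold):
    """Return a boolean array marking, for each data vector, the features
    whose absolute value exceeds `threshold` (hypothetical masking rule)."""
    return np.abs(X) > threshold

# Synthetic data (assumed for illustration): low-amplitude noise everywhere,
# plus a strong signal on a short block of features that shifts per point.
rng = np.random.default_rng(0)
n_points, n_features, n_active = 1000, 200, 5
X = rng.normal(0.0, 0.1, size=(n_points, n_features))
n_starts = n_features - n_active + 1
for i in range(n_points):
    start = i % n_starts  # deterministic, so every block position occurs
    X[i, start:start + n_active] += 5.0

masks = per_point_masks(X, threshold=1.0)

# Each point has only a handful of informative features ...
active_per_point = masks.sum(axis=1)
# ... yet across the data set every feature is informative for some point,
# so no feature can be discarded globally.
features_ever_active = masks.any(axis=0)

print("active features per point:", active_per_point.min(), "to", active_per_point.max())
print("features informative for at least one point:", int(features_ever_active.sum()))
```

In this toy setting a per-point mask retains 5 of 200 features for each vector, whereas a global selection would have to retain all 200. A clustering method that exploits such masks could in principle restrict most of its computation to the unmasked entries.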

PMID: 25149694
PMCID: PMC4298163
DOI: 10.1162/NECO_a_00661
[Indexed for MEDLINE]
Free PMC Article
