Cortex. 2014 Jun;55:122-9. doi: 10.1016/j.cortex.2013.05.008. Epub 2013 Jun 14.

Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.

Author information

1. Stroke and Dementia Research Centre, St George's, University of London, Cranmer Terrace, London SW17 0RE, UK. Electronic address: pgarrard@sgul.ac.uk.
2. Stroke and Dementia Research Centre, St George's, University of London, Cranmer Terrace, London SW17 0RE, UK.
3. UCSF Memory and Aging Center, Sandler Neurosciences Center, 675 Nelson Rising Lane, Suite 190, San Francisco, CA, USA.

Abstract

Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance, an ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative for each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%); both NBM and NBG achieved this level when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low-frequency content words, generic terms and components of metanarrative statements. For the right versus left task, the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA), may shed further light on this little-understood distinction.
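As an illustration of the two techniques named in the abstract, the following is a minimal sketch of information gain over word-presence features and of a multinomial naive Bayes classifier. It is not the authors' implementation: the abstract does not specify their feature extraction, smoothing or software, so the Laplace smoothing, the binary presence definition of IG, the function names, the labels `group_a` and `group_b`, and the four toy sentences are all assumptions invented for this example and are not study data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the document' (assumed definition)."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder

def train_nbm(docs, labels, alpha=1.0):
    """Fit multinomial naive Bayes on word counts with Laplace smoothing."""
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_like = {}, {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_prior[c] = math.log(len(class_docs) / len(docs))
        log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like

def predict_nbm(model, doc):
    """Return the class with the highest posterior log-score; unseen words are skipped."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][w] for w in doc if w in log_like[c])
              for c in log_prior}
    return max(scores, key=scores.get)

# Invented toy sentences (NOT data from the study), tokenised on whitespace.
docs = [
    "the boy is taking a thing from the thing".split(),
    "i do not know what that thing is".split(),
    "the boy steals cookies from the jar".split(),
    "the mother dries dishes while the sink overflows".split(),
]
labels = ["group_a", "group_a", "group_b", "group_b"]

# Rank the vocabulary by information gain, most informative first.
vocab = sorted({w for d in docs for w in d})
ranked = sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)

model = train_nbm(docs, labels)
prediction = predict_nbm(model, "what is that thing".split())
```

Restricting training to high-IG features, as the abstract describes, would amount to keeping only the top entries of `ranked` in each document before calling `train_nbm`; the cut-off used in the study is not stated in the abstract.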

KEYWORDS:

Discourse; Information gain; Laterality; Machine learning; Semantic dementia

PMID: 23876449
PMCID: PMC4072460
DOI: 10.1016/j.cortex.2013.05.008
[Indexed for MEDLINE]
Free PMC Article