Cortex. 2014 Jun;55:43-60. doi: 10.1016/j.cortex.2012.12.006. Epub 2012 Dec 21.

Automated classification of primary progressive aphasia subtypes from narrative speech transcripts.

Author information

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
Rotman Research Institute, Baycrest Centre, Toronto, Ontario, Canada.
Department of Speech-Language Pathology, University of Toronto, Toronto, Ontario, Canada; Toronto Rehabilitation Institute, Toronto, Ontario, Canada.
School of Rehabilitation Sciences, University of Ottawa, Ottawa, Ontario, Canada.
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
L.C. Campbell Cognitive Neurology Research Unit, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada; Department of Medicine (Neurology), University of Toronto, Toronto, Ontario, Canada.


In the early stages of neurodegenerative disorders, individuals may exhibit a decline in language abilities that is difficult to quantify with standardized tests. Careful analysis of connected speech can provide valuable information about a patient's language capacities, but to date this type of analysis has been limited by its time-consuming nature. In this study, we present a method for evaluating and classifying connected speech in primary progressive aphasia using computational techniques. Syntactic and semantic features were automatically extracted from transcriptions of narrative speech for three groups: semantic dementia (SD), progressive nonfluent aphasia (PNFA), and healthy controls. Features that varied significantly between the groups were used to train machine learning classifiers, which were then tested on held-out data. We achieved accuracies well above baseline on the three binary classification tasks. An analysis of the influential features showed that, compared with controls, both patient groups tended to use words that were higher in frequency (especially nouns for SD, and verbs for PNFA). The SD patients also tended to use words (especially nouns) that were higher in familiarity, and they produced fewer nouns, but more demonstratives and adverbs, than controls. The speech of the PNFA group tended to be slower and to contain shorter words than that of controls. The two patient groups were distinguished from each other by the SD patients' relatively greater use of words that are high in frequency and/or familiarity.
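The select-then-classify procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the significance test, the classifier, or the evaluation scheme, so Welch's t statistic, a nearest-centroid classifier, and leave-one-out evaluation are stand-ins. The feature columns and their values are invented toy data, not results from the study.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic for two samples (0.0 if both have zero variance)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_features(X, y, threshold=2.0):
    """Indices of features whose |t| between the two groups exceeds threshold."""
    first, second = sorted(set(y))
    keep = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == first]
        b = [row[j] for row, lab in zip(X, y) if lab == second]
        if abs(welch_t(a, b)) > threshold:
            keep.append(j)
    return keep


def fit_centroids(X, y, features):
    """Per-group mean vector over the selected features."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(row, centroids, features):
    """Label of the centroid closest to row (squared Euclidean distance)."""
    def dist(c):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def leave_one_out_accuracy(X, y, threshold=2.0):
    """Hold out each transcript in turn; select features on the rest only."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = select_features(X_train, y_train, threshold)
        if not features:  # fall back to all features if none pass
            features = list(range(len(X[0])))
        centroids = fit_centroids(X_train, y_train, features)
        correct += predict(X[i], centroids, features) == y[i]
    return correct / len(X)


# Hypothetical toy data, one row per transcript:
# [mean log word frequency, proportion of nouns, unrelated noise feature]
X = [
    [4.1, 0.18, 0.5], [4.3, 0.17, 0.2], [4.0, 0.20, 0.8], [4.2, 0.16, 0.4],
    [3.4, 0.26, 0.6], [3.5, 0.27, 0.3], [3.3, 0.25, 0.7], [3.6, 0.28, 0.5],
]
y = ["SD"] * 4 + ["control"] * 4

print("selected features:", select_features(X, y))
print("leave-one-out accuracy:", leave_one_out_accuracy(X, y))
```

In this sketch the features are selected inside each fold, using only the training transcripts, so the held-out transcript never influences which features are kept. Whether the study nested its selection in this way is not stated in the abstract.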


Machine learning; Narrative speech; Natural language processing; Progressive nonfluent aphasia; Semantic dementia

[Indexed for MEDLINE]