Analysing Language Disorders: The Lexical Quantification of Aphasic Speech

  1. David I. Holmes

    Faculty of Computer Science and Mathematics - University of the West of England

  2. Sameer Singh

    University of the West of England


The assessment and therapy of language-disordered patients have been an important topic in
clinical research for some three decades. Speech
and language disorders often result as a direct
consequence of strokes, tumours, brain injuries
and neurogenic diseases and disorders. Language
disorders are distinct from speech disorders, which
involve disturbances to the physical characteristics of speech and are diagnosed and treated
differently. The most commonly occurring form
of language disorder is known as “aphasia”, which
can result from impairment of the lexical, phonological, semantic and syntactic components of
language, the severity of which depends on the
extent and location of the lesion in the damaged
brain. Language disorders often co-exist with
speech disorders such as dysarthria and dyspraxia
which result from damage to the muscular control
of the speech mechanism and impairment of the
phonological system.
Aphasia is a complex disorder and most of the
taxonomic approaches proposed by researchers
over the years have been inconsistent and subjective. The two most common approaches are ‘classical’, where classification is based on the localisation of lesions in the brain, and ‘fluent-nonfluent dichotomy’ where patients are categorised
as fluent or non-fluent depending on their free
speech evaluation. The latter method is more useful for therapy but unfortunately the speech characteristics which classify patients are heavily dependent upon physical characteristics of speech
such as speech-tempo, instead of taking into account linguistic features. Linguistic criteria for
classifying patients as fluent or non-fluent are
urgently needed.
Agrammatism in speech is the most common
symptom across most of the aphasia categories.
Agrammatic speech is composed chiefly of open-class lexical items (nouns, verbs, adjectives) with
most of the closed-class lexical items (pronouns,
prepositions, articles) being either incorrectly substituted or omitted (Berndt and Caramazza, 1980).
Agrammatic speakers have a severely reduced
vocabulary, experience serious word-finding difficulties and have particular difficulties with certain grammatical structures. Their sentences are
short and broken, and are filled with redundant
stereotyped phrases. Agrammatic speech is effortful,
telegraphic, slow and poor in both
grammar and lexical richness. Patients also exhibit
automatisms (“you know”) and perseverations (“I-I-I-I went there”).
At present, several extensive aphasia test batteries
are used in clinics all over the world to evaluate
patients’ performances on language tasks. These
tests do not, however, directly assess conversational skills and therefore fail to measure the ability
of patients to communicate effectively in a social
environment as opposed to their performance in a constrained environment.
This paper looks at the problem of the quantification of the conversational speech of aphasic patients, on the basis of linguistic measures, and
proposes an ‘index of performance’ which may be
used by speech therapists as a measure of the
efficacy of the therapy programme.
Data collection
Conversational speech from a total of 100 subjects
was recorded; seventy of these were patients
(agrammatic aphasics) and the remaining thirty
were ‘normal’ (unimpaired) adults split into two
control sets of fifteen. The first set (type-I) came
from a relatively high educational background and
were either working in, or had retired from, professional occupations, whilst the second set (type-II) comprised people with low educational
backgrounds currently working as cleaners, porters or cooks. All subjects were aged 50 or over.
Each subject was recorded in private and asked
simple questions about their family, career, hobbies, etc. with minimal interruption from the interviewer. No recording lasted more than thirty minutes.
The raw data, in the form of speech recordings, were
then transcribed for lexical analysis. The words of the interviewer were erased,
interjections were ignored and a few unintelligible
utterances had to be omitted. Each transcription
consisted of at least 1,000 words, as recommended
by Andreason and Pfohl (1976), and was fed into
the Oxford Concordance Program to produce an
output-file for each subject consisting of word-frequency distributions and word listings from which
nouns, pronouns, adjectives and verbs were manually tagged.
Linguistic measures
It was tempting to borrow the majority of the
linguistic measures from stylometric studies of
written texts, but the very different nature of utterances in conversational speech caused us to settle
on the eight measures described below:
(i) Noun rate per 100 words.
(ii) Pronoun rate per 100 words.
(iii) Adjective rate per 100 words.
(iv) Verb rate per 100 words.
(v) Type-Token ratio.
(vi) ‘Clause-Like Semantic Unit’ (CSU) rate per
100 words. A CSU may be defined as a
string of words grammatically connected in
a meaningful form, and we use the term
“clause-like” since in agrammatic speech a
number of clauses are left unfinished.
(vii) Brunet’s W index. (Brunet, 1978)
(viii) Honore’s R statistic. (Honore, 1979)
These eight measures, chosen for their reliability
and effectiveness in quantifying the severity of
agrammatism in conversation, were accordingly
computed from the transcripts and OCP printouts
of all 100 subjects.
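
As a rough illustration of measures (i) to (v), (vii) and (viii), the sketch below computes them from a tokenised transcript. It uses the commonly quoted forms W = N^(V^-a) for Brunet's index and R = 100 log N / (1 - V1/V) for Honore's statistic, where N is the number of word tokens, V the number of distinct word types and V1 the number of types used exactly once. The constant `brunet_a`, the function names and the lower-casing of tokens are assumptions made for this example; the text above does not state the constant or the tokenisation rules actually used.

```python
import math
from collections import Counter


def lexical_measures(tokens, brunet_a=0.172):
    """Return N, V, V1, TTR, Brunet's W and Honore's R for a list of words.

    brunet_a is an assumed constant; values close to 0.17 are often quoted.
    """
    n = len(tokens)                                   # N: word tokens
    counts = Counter(t.lower() for t in tokens)
    v = len(counts)                                   # V: distinct types
    v1 = sum(1 for c in counts.values() if c == 1)    # V1: types used once
    ttr = v / n
    brunet_w = n ** (v ** -brunet_a)
    # R is undefined when every type occurs exactly once (V1 == V).
    honore_r = 100 * math.log(n) / (1 - v1 / v) if v1 < v else float("inf")
    return {"N": n, "V": v, "V1": v1, "TTR": ttr, "W": brunet_w, "R": honore_r}


def rate_per_100(tag_count, n_tokens):
    """Occurrences of a manually tagged category per 100 words."""
    return 100.0 * tag_count / n_tokens


# Hypothetical utterance, for illustration only.
example = lexical_measures("I went there and I went home".split())
```

A lower W and a higher R both indicate a richer vocabulary, whereas the noun, pronoun, adjective, verb and CSU rates depend on manual tagging and are obtained here simply by passing the tagged counts to `rate_per_100`.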
Multivariate analysis
A principal components analysis (PCA) was first
computed on the (100 × 8) standardized data matrix. In the plot of the data in the space of the first
two principal components, type-I and type-II ‘normals’ cluster closely together whilst patients exhibit wide variation and tend to lie to the left of the
‘normals’, which is the side of lower lexical richness. To investigate this clustering pattern, a discriminant analysis was then conducted on the two
groups of ‘normals’. This failed to reject the null
hypothesis that, in the populations from which the
samples are drawn, there is no difference between
the group means. Both types of ‘normals’ were
then combined into one group and a second discriminant analysis conducted on this enhanced group
and on the group of patients.
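
The standardisation and projection steps of such a PCA can be sketched with the standard library alone. The code below converts each column to z-scores and finds only the first principal component of the correlation matrix by power iteration; the function names, the starting vector and the iteration count are assumptions of this example, and a real analysis would normally rely on a statistics package to obtain all components.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, sample standard deviation 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def first_component(z, iterations=500):
    """Leading eigenvector and eigenvalue of the correlation matrix of z."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]
    # Uneven starting vector, so that it is unlikely to be orthogonal
    # to the leading eigenvector (an assumption, not a guarantee).
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(corr[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(corr[a][b] * vec[b] for b in range(p))
                     for a in range(p))
    return vec, eigenvalue


def component_scores(z, vec):
    """Project each standardized row onto the component."""
    return [sum(x * w for x, w in zip(row, vec)) for row in z]
```

The sign of a principal component is arbitrary, so which side of the plot corresponds to lower lexical richness has to be read from the loadings rather than assumed.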
This time we can clearly reject the null hypothesis
of no difference between group means. Examination of the relative contributions of the eight variables to the discriminant function shows that the
most important variables in terms of their discriminating power between ‘normals’ and agrammatic aphasics are the CSU rate (C-rate), the adjective rate (A-rate), Brunet’s W and the Type-Token ratio (TTR). These
results are supported by distribution-free Mann-Whitney
Whitney tests on individual variables for both
patients versus combined ‘normals’ and type-I
versus type-II ‘normals’.
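
A minimal sketch of a Mann-Whitney comparison on a single variable is given below. It counts, for every pair of subjects drawn from the two groups, which one has the larger value, and then uses the normal approximation for a two-sided p-value. The function name is hypothetical, no tie correction is applied to the variance, and the approximation is only reasonable for moderately large groups.

```python
import math


def mann_whitney(group_a, group_b):
    """Return (U_a, U_b, z, two-sided p) using the normal approximation."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0      # a outranks b
            elif a == b:
                u_a += 0.5      # ties count half
    n_a, n_b = len(group_a), len(group_b)
    u_b = n_a * n_b - u_a
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided tail area
    return u_a, u_b, z, p
```

Applied once per measure, for example to the TTR values of patients against those of the combined ‘normals’, a test of this kind makes no assumption of normality in the underlying variable.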
Final index of performance
The major attraction of producing a final index of
performance (FIP) for the lexical ability of agrammatic patients is the ability to state improvement
or performance comparisons in quantitative terms.
We propose that the FIP be derived from the
discriminant scores of the subjects. In this case the
weightings would be the unstandardized discriminant function coefficients for the eight variables.
The discriminant scores are suitably scaled so that
the FIP values lie in the (0-100) range for our
subjects, low scores representing severely impaired patients, both in lexical and syntactic terms.
The distinction between patients and ‘normals’ is
clearly visible from a plot of FIP values as is the
relative consistency of the lexical and syntactic
proficiency of the ‘normals’ and the huge variation
in the performance of patients.
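
The derivation of an index of this kind can be sketched as follows, under stated assumptions. The code computes two-group Fisher discriminant weights w = S^-1 (mean of ‘normals’ minus mean of patients), where S is the pooled within-group covariance matrix, and then rescales the resulting scores linearly so that the lowest becomes 0 and the highest 100. The min-max rescaling is an assumption, since the text says only that the scores are "suitably scaled", and the weights are proportional to, not identical with, the unstandardized coefficients reported by a statistics package. All function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(group0, group1):
    """Pooled within-group covariance matrix of two groups."""
    p = len(group0[0])
    s = [[0.0] * p for _ in range(p)]
    for grp in (group0, group1):
        m = mean_vector(grp)
        for r in grp:
            d = [r[j] - m[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    dof = len(group0) + len(group1) - 2
    return [[s[a][b] / dof for b in range(p)] for a in range(p)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting.

    Fails with a division by zero if the matrix is singular.
    """
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fisher_weights(patients, normals):
    """Discriminant weights oriented so that higher scores are more 'normal'."""
    diff = [b - a for a, b in zip(mean_vector(patients), mean_vector(normals))]
    return solve(pooled_covariance(patients, normals), diff)


def fip(rows, weights):
    """Min-max scale discriminant scores to 0-100 (scaling rule is assumed)."""
    raw = [sum(w * x for w, x in zip(weights, r)) for r in rows]
    lo, hi = min(raw), max(raw)
    return [100.0 * (s - lo) / (hi - lo) for s in raw]
```

Because min-max scaling is unchanged by adding a constant to the scores or multiplying them by a positive factor, the Fisher direction is sufficient for this sketch. For follow-up visits, the minimum and maximum from the original sample would have to be reused so that scores remain comparable over time.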
In conclusion, to test the FIP, follow-up studies
were conducted on a few patients who, in the year
between visits, had had time to make some recovery from their strokes with the help of speech
therapy. The FIP values had increased, thereby
successfully giving the speech therapist an objective quantitative measure of the conversational
speech of the patients which showed the efficacy
of the therapy and recovery programme. We hope that
FIP values will now be used for conversational
assessment in the management of aphasia.
References
Andreason, N.J. and Pfohl, B. “Linguistic Analysis of Speech in Affective Disorders”, Arch.
General Psychiatry, Vol. 33, pp 1361–1367, 1976.
Berndt, R. and Caramazza, A. “A Redefinition of
the Syndrome of Broca’s Aphasia: Implications for a Neuropsychological Model of
Language”, Applied Psycholinguistics, Vol. 1,
pp 255–278, 1980.
Brunet, E. “Le Vocabulaire de Jean Giraudoux”,
Structure et Evolution, Geneve: Slatkine, 1978.
Honore, A. “Some Simple Measures of Richness
of Vocabulary”, Association for Literary and
Linguistic Computing Bulletin, Vol. 7, pp
172–177, 1979.


Conference Info

Hosted at University of Bergen

Bergen, Norway

June 25, 1996 - June 29, 1996

Series: ACH/ICCH (16), ALLC/EADH (23), ACH/ALLC (8)

Organizers: ACH, ALLC