The Intelligent Detection of Second Language Learner Errors

Michael Levison; Greg Lessard; Derek Walker

Authorship

1. Michael Levison

Department of Computing and Information Science - Queen's University
2. Greg Lessard

Department of French - Queen's University
3. Derek Walker

Department of Computing and Information Science - Queen's University

Parent session

LING (c), Tibor Nagy

Original URL

http://web.archive.org/web/19991001055448/http://lingua.arts.klte.hu/allcach98/abst/abs29.htm

Work text


The concept of error analysis in second language (L2) acquisition received much attention in the 1970's ([13], for example), but underwent something of an eclipse in the 1980's with the advent of communicative and holistic approaches to language teaching. Reports of its death, however, are perhaps premature ([9]). There is a growing body of empirical work which demonstrates the utility of explicit instruction in L2 acquisition (for example [4],[5]), and accumulating evidence which favours the usefulness of explicit corrective feedback (for example [2]). At the same time, while pen-and-paper error analysis is undeniably tedious and fraught with error, computers offer an alternative, either in the form of corpus analysis or, more interestingly, in the form of controlled elicitation and analysis of L2 productions (see for example [8]).
In this paper we outline a system designed to provide a wide range of teaching materials to L2 learners, and describe a complex mechanism for error analysis. Systems of this kind are discussed in [1],[7],[11],[12],[14], and [6], among many others. Space constraints preclude us from providing a full review here.
VINCI is a language generation system which, given data files describing a natural language, generates utterances of the kind the user has specified. In fact, the system can generate clusters of related utterances, typically transformations of a common root. These clusters may be the question and answer in a drill exercise for language learners, the stimulus and alternative responses in a linguistic experiment, the machine's segments of a machine/human dialogue, and so on. (See [10] for details.)
Our purpose here is to explore approaches to the intelligent detection of second-language (L2) learner errors. In this context, VINCI generates a question or stimulus together with a correct or anticipated response, displaying the former while retaining the latter internally. It then awaits an actual response from the learner. The learner enters a response within a text editing environment, and signals when the task is completed. The machine normalizes the response to eliminate variations of spacing and layout. It is then ready to compare actual and anticipated responses.
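The normalization step can be pictured with a minimal Python sketch. It assumes that "variations of spacing and layout" means runs of whitespace and line breaks; the function name normalize_response is hypothetical and is not part of VINCI.

```python
import re


def normalize_response(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, line breaks)
    into a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


print(normalize_response("  il  faut\tqu'il\n soit là "))
```

After this step, two responses that differ only in layout compare as equal strings.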
In a naive system, this comparison may be restricted to exact equivalence. The learner is simply told "right" or "wrong". Such a system ignores alternative, potentially correct, responses involving synonyms or syntactic variants, and offers no feedback to help in the learning process. In contrast, a human teacher will not only pinpoint what is wrong with the response, but may attempt to diagnose the underlying cause of an error, and advise the learner about possible misconceptions.
Experiments have shown that second-language learners rarely make errors at random. Rather, they supplement their knowledge of the language with malrules -- supposed rules which are actually incorrect -- which they have inferred consciously or subconsciously from their limited experience of the language, or perhaps from their own native tongue. In fact, for a given language, the same malrules may be common to many L2 learners. It is this kind of information that an experienced teacher makes use of to help a learner, and that we would ultimately like to emulate.
A simple improvement to the naive system has the instructor (the person who formulates the exercise) specify both a "correct" response and several others which result from anticipated learner errors. The actual response can then be compared with each of these to determine which error, if any, the student has made. Each alternative may trigger a different report explaining what the error is, and the common reasons why it occurs. In practice, this approach is limited in scope, because it can be applied only where the number of potential errors is fairly small and fixed, and can be anticipated in advance. If multiple errors are possible, the number of alternatives explodes rapidly.
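As an illustration of this fixed-alternative scheme, the following Python sketch stores one correct response and a few anticipated erroneous ones, each with its own report. The exercise record, its field names and the feedback texts are invented for the example and do not come from VINCI.

```python
# Hypothetical exercise record: one correct response and a few anticipated
# erroneous responses, each paired with the report shown to the learner.
EXERCISE = {
    "correct": "il faut qu'il soit là",
    "anticipated": {
        "il faut qu'il est là":
            "The verb after 'il faut que' should be in the subjunctive.",
        "il faut que il soit là":
            "'que' is elided to 'qu'' before a vowel.",
    },
}


def check_response(response: str, exercise: dict) -> str:
    """Compare a normalized response with the correct and anticipated answers."""
    if response == exercise["correct"]:
        return "right"
    # Fall back to a bare "wrong" when the error was not anticipated.
    return exercise["anticipated"].get(response, "wrong")


print(check_response("il faut qu'il est là", EXERCISE))
```

The limitation noted above is visible in the data structure: every combination of errors needs its own entry in the table.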
At the opposite extreme, we might require that the system "parses" the student response in the hope of discovering what the learner has done wrong. Unfortunately, such parsing involves two major difficulties. First, by its very definition, an erroneous response is generally NOT an utterance of the language, and therefore cannot be parsed with respect to the language grammar. To overcome this, the common malrules must be somehow embedded in the language description, and parsing must occur with respect to this augmented grammar.
The second and more serious difficulty is that in order to fulfill the purposes for which it was designed, the VINCI system offers features well beyond the limitations of a context-free syntax, including syntactic transformations (which play a major role in describing some malrules), words which point at other words, and many others. This means that the parsing of utterances with reference to a VINCI description is not amenable to the standard parsing techniques used in other areas.
From a theoretical standpoint, any language generator can be turned into a parser, subject to some questions about the termination of the process. VINCI is no exception. If we have the syntax tree and the other data structures which lead to the correct response, and if the malrules indicate alternative paths which the student may have followed, we can have the system generate all possible erroneous responses which result from the malrules, and compare each with the student's actual response. If one (or more) of them matches it, the malrules which triggered this particular response potentially indicate what the student has done. The problem with this approach is that it is infeasible. Even a limited set of malrules applied to a generation tree in a systematic fashion yields billions of alternatives, well beyond the capacity of any real computer.
In practice, therefore, we require an approach which surpasses the limits of a small fixed error set, while keeping the error analysis within feasible bounds. For this purpose we have designed an algorithm which is based on approximate string matching, and which takes a two-level approach to the comparison of actual and correct response.
Approximate String Matching
Approximate string matching compares two similar but not identical strings, such as metaphor/metafor, to determine some minimal set of editing operations (say, delete "p", insert "f", transpose ...) for converting one into the other. This problem has been studied extensively in computer science, and several algorithms are known for its solution. (See for example [3],[15].) In carrying out such an algorithm, the computer will often compare some letter of one string with a letter of the other to decide if they are equal or not -- an operation which in this case is trivial.
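One well-known formulation of this problem is the optimal string alignment distance, which counts insertions, deletions, substitutions and transpositions of adjacent letters. The Python sketch below is offered only as an example of the family of algorithms referred to; the text does not say which algorithm the authors' system uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Allowed operations, each of cost 1: insert a letter, delete a letter,
    substitute one letter for another, transpose two adjacent letters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # the letter comparison
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # match or substitution
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("metaphor", "metafor"))
print(edit_distance("metaphor", "metahpor"))
```

The line marked "the letter comparison" is the trivial equality test mentioned above; it is the only place where the two strings are actually inspected.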
In our error-detection process, the upper level uses an approximate string matching algorithm to match the string of *words* in the correct response to the string of *words* in the student reply. In this way, it detects word insertions, word deletions and changes of word order. Typically, these might result from a misunderstanding of syntax. If several alternative syntactic forms are reasonable, these can be included as separate correct responses.
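The upper level can be sketched by running the same kind of dynamic programme over lists of words instead of letters, and then tracing back through the table to recover the operations. In this Python sketch the function name align_words and the `same` parameter are hypothetical; `same` is a hook where a looser word comparison could be supplied in place of strict equality. For brevity the sketch reports insertions, deletions and substitutions only, so a change of word order shows up as a deletion plus an insertion.

```python
from typing import Callable, List, Tuple

Op = Tuple[str, str, str]  # (operation, correct word or "", student word or "")


def align_words(correct: List[str], student: List[str],
                same: Callable[[str, str], bool] = lambda c, s: c == s
                ) -> List[Op]:
    """Align two word lists and return the edit operations, left to right."""
    n, m = len(correct), len(student)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if same(correct[i - 1], student[j - 1]) else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if same(correct[i - 1], student[j - 1]) else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                kind = "match" if sub == 0 else "substitute"
                ops.append((kind, correct[i - 1], student[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", correct[i - 1], ""))   # student omitted a word
            i -= 1
        else:
            ops.append(("insert", "", student[j - 1]))   # student added a word
            j -= 1
    ops.reverse()
    return ops


for op in align_words("je ne sais pas".split(), "je sais pas".split()):
    print(op)
```

Here the learner's omission of "ne" is reported as a single deletion, with the remaining words matched.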
The lower level of our process arises when the approximate string matching algorithm tries to compare a *word* of the correct string with a *word* of the actual response. Rather than merely look for an exact match, it tries to take account of phonetic errors, typos, problems in morphology, and even the use of synonyms and malwords. (We use malword to describe an error in which the learner substitutes a plausible non-word for the correct one: "conservatif" for "conservateur", etc.) For example, we might consider the following correct/error pairs to "match":
"metaphor"/"metahpor" (typing error)
"metaphor"/"metafor" (phonological error -- student knows the word, but not its spelling)
"soit"/"est" (failure to use subjunctive)
"grand"/"gros" (synonym -- not, of course, an error)
"écrivain"/"écriteur" (malword)
"irai"/"allerai" (improper morphology -- student is unaware of the irregularity of "aller")
Comparing Words
Let us take a closer look at comparing words. Suppose C is a word of the "correct" sentence, and S the student word with which it is being compared. Comparison involves the following steps, which are tried in order until any step decides that S matches C.
(0) Are C and S actually the same?
(1) Do C and S differ only in regard to a morphology change? (i.e. did the student type the wrong inflected part of the right word, perhaps because he/she didn't know the verb should be in the subjunctive mood?)
(2) Are C and S lexically related? (i.e. is S a synonym or a malword for C?)
(3) Are C and S phonetically similar?
(4) Do C and S differ only by a typo?
The order is partly deliberate, partly arbitrary. Many morphological and most phonological errors can also be interpreted as typos, but in such cases, we tend to prefer the "higher level" explanation. So we regard "parle"/"parlent" as a morphological or even phonological error, but "parlnet"/"parlent" as a typo. The typo must therefore be the last explanation to be considered. The order of (1) and (2), on the other hand, appears to be irrelevant. Actually, as we note below, we have also to consider combinations of the steps to determine whether, for example, a morphology error occurred in a synonym.
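The ordered sequence of tests can be sketched as a list of labelled predicates tried in turn. In the Python sketch below, everything is an assumption made for illustration: the toy tables INFLECTIONS and VARIANTS, the threshold of 2, and the use of plain Levenshtein distance for the typo test. The phonetic step is left out, and combinations of steps are not handled.

```python
from typing import Callable, List, Optional, Tuple

# Toy data, invented for illustration only.
INFLECTIONS = {"soit": {"est", "sont", "soient"},
               "parlent": {"parle", "parles", "parlons"}}
VARIANTS = {"grand": {"gros"}, "conservateur": {"conservatif"}}
TYPO_THRESHOLD = 2  # assumed value


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


Step = Tuple[str, Callable[[str, str], bool]]

# Steps in the order in which they are tried; the typo test comes last.
STEPS: List[Step] = [
    ("identical", lambda c, s: c == s),
    ("morphology", lambda c, s: s in INFLECTIONS.get(c, set())),
    ("lexical", lambda c, s: s in VARIANTS.get(c, set())),
    ("typo", lambda c, s: levenshtein(c, s) <= TYPO_THRESHOLD),
]


def classify(c: str, s: str, steps: List[Step] = STEPS) -> Optional[str]:
    """Return the label of the first step that accepts S as a match for C."""
    for label, accepts in steps:
        if accepts(c, s):
            return label
    return None


for c, s in [("parlent", "parle"), ("parlent", "parlnet"), ("grand", "gros")]:
    print(c, s, classify(c, s))
```

With these toy tables, "parle" for "parlent" is classified as a morphology error even though it would also pass the typo test, which is the preference for the "higher level" explanation described above.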
We will examine each of these steps in reverse order.
Step 4 (typos) This step merely employs an approximate string matching algorithm at the word level. This finds editing operations which convert S to C. If their number is below some threshold, S is regarded as a typo variant of C.
Step 3 (phonological errors) In this step, we investigate whether the student seems to know the correct word, but has misspelled it, replacing it by a phonologically equivalent form. By including phonemic information in the language description, we can easily arrange for VINCI to generate a phonemic representation of each word C of the expected response. We then have to analyze S to see whether it matches C phonologically.
This step is not included in the current implementation.
Step 2 (lexical errors) This covers two possibilities:
(i) that the student has used a proper synonym for the "correct" word,
(ii) that the student has used a common malword.
We can even admit further options, such as possible synonyms not often used by native speakers (pragmatically surprising), and so on.
The mechanism for the two cases is the same. The VINCI pointer mechanism allows lexical entries to point at lists of synonyms and common malwords. S is compared to each such word pointed at by C. The comparison of S with these variants should itself involve the processes used for steps 1, 3 and 4. In other words, we ask: did the student commit a typo, etc. while intending to enter this synonym?
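A minimal Python sketch of this lookup follows. The Entry record and the LEXICON table are hypothetical stand-ins for VINCI lexical entries and their pointers, and the `same` parameter marks the place where the comparisons of steps 1, 3 and 4 could be substituted for strict equality.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical lexical entry pointing at its synonyms and malwords."""
    synonyms: List[str] = field(default_factory=list)
    malwords: List[str] = field(default_factory=list)


LEXICON: Dict[str, Entry] = {
    "grand": Entry(synonyms=["gros"]),
    "conservateur": Entry(malwords=["conservatif"]),
}


def lexical_relation(c: str, s: str, lexicon: Dict[str, Entry],
                     same: Callable[[str, str], bool] = lambda a, b: a == b
                     ) -> Optional[str]:
    """Return "synonym" or "malword" if S matches a word pointed at by C."""
    entry = lexicon.get(c)
    if entry is None:
        return None
    for kind, words in (("synonym", entry.synonyms),
                        ("malword", entry.malwords)):
        if any(same(w, s) for w in words):
            return kind
    return None


print(lexical_relation("grand", "gros", LEXICON))
print(lexical_relation("conservateur", "conservatif", LEXICON))
```

Because both lists are searched by the same loop, adding a further category (for example, pragmatically surprising synonyms) would only require another field and another pair in the tuple.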
We can also use this step to handle certain variants/errors which might be considered morphological. For example, the verb "pouvoir" (present tense: "peux", "peut") might have a synonym "pouvoir_alt" (present tense: "puis", "puit"); and the verb "aller" might have a malword "aller_alt" conjugated as an -er verb. Indeed, for an irregular verb, a single malword may be used to gather a range of improper components.
Step 1 (morphological errors) By morphological error we imply that S is the result of the student's inflecting the correct word improperly. This may be because they have the wrong attributes (they didn't know that the verb should be subjunctive or that the adjective must agree with its noun), because they followed a morphology malrule (they "know" the wrong rule to conjugate some verb), or for a variety of other reasons. Some morphology malrules may be dealt with in step 2. In this step we currently consider the possibility that S is a properly inflected part of C, but not the right one. For this purpose, the different forms of C are enumerated systematically, and S is compared with each. Once again, the comparison can involve the processes of steps 3 and 4 to allow for typos or phonological errors.
In fact, this step can also be used to discover morphology errors which do not yield valid parts of C, especially those which apply to many different words. To do this, we simply add extra parts to the morphology of C, which cannot be reached in normal conjugation, but will appear in the systematic enumeration.
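The enumeration of forms, including the extra parts just described, might look like the following Python sketch. The MORPHOLOGY table, its attribute labels and the function name find_inflection are assumptions made for the example; in particular, the entry labelled "malrule" stands for an extra part that normal conjugation never reaches.

```python
from typing import Callable, Dict, Optional

# Hypothetical fragment of a morphology table: lemma -> {attributes: form}.
MORPHOLOGY: Dict[str, Dict[str, str]] = {
    "être": {
        "indicative present 3sg": "est",
        "subjunctive present 3sg": "soit",
    },
    "aller": {
        "future 1sg": "irai",
        # Extra part, never produced by normal conjugation.
        "malrule: regular -er future 1sg": "allerai",
    },
}


def find_inflection(lemma: str, s: str,
                    morphology: Dict[str, Dict[str, str]],
                    same: Callable[[str, str], bool] = lambda a, b: a == b
                    ) -> Optional[str]:
    """Enumerate the forms of the lemma and return the attributes of the
    first form that matches the student word S, or None."""
    for attributes, form in morphology.get(lemma, {}).items():
        if same(form, s):
            return attributes
    return None


print(find_inflection("être", "est", MORPHOLOGY))
print(find_inflection("aller", "allerai", MORPHOLOGY))
```

The returned attributes say which part the student actually produced, which can then be contrasted with the attributes of the expected form.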
Symptoms and Diagnosis
The steps described above reveal symptoms of learner problems, but they do not provide a diagnosis. In particular, human language teachers often need to observe student performance over time in order to discover the key behind the errors observed. The application of the environment described here to language learners will be a topic of future research. On the basis of the results obtained, we will build a diagnostic system which attempts to track and categorize errors made over one or more sessions, in order to provide higher-level diagnosis and, potentially, remedial work.
References
1. Allen, J.R. (1996) The Ghost in the Machine: Generating Error Messages in Computer Assisted Language Learning Programs. CALICO Journal 13/2-3, pp. 87-103.
2. Carroll, S., Swain, M. (1993) Explicit and Implicit Negative Feedback: An Empirical Study of the Learning of Linguistic Generalizations. Studies in Second Language Acquisition 15, pp. 357-386.
3. Crochemore, M., Rytter, W. (1994) Text Algorithms. Oxford: Oxford University Press.
4. Doughty, C. (1991) Second Language Instruction Does Make a Difference: Evidence from an Empirical Study of SL Relativization. Studies in Second Language Acquisition 13, pp. 431-469.
5. de Graaff, R. (1997) The eXperanto Experiment: Effects of Explicit Instruction on Second Language Acquisition. Studies in Second Language Acquisition 19, pp. 249-276.
6. Hart, R.S. (1994) Improved Algorithms for Identifying Spelling and Word Order Errors in Student Responses. Institution: Illinois Univ., Urbana. Language Learning Lab.
7. Heift, T., McFetridge, P. (1994) The Intelligent Workbook. In: Educational Multimedia and Hypermedia, 1994. Proceedings of ED-MEDIA '94 World Conference on Educational Multimedia and Hypermedia (Vancouver, British Columbia, Canada, June 25-30, 1994).
8. Hulstijn, J.H. (1997) Second Language Acquisition Research in the Laboratory: Possibilities and Limitations. Introduction to a special issue of Studies in Second Language Acquisition, 19, pp. 131-143.
9. James, C. (1994) Don't Shoot My Dodo: On the Resilience of Contrastive and Error Analysis. IRAL 32/2, pp. 179-200.
10. Levison, M., Lessard, G. (1996) Using a Language Generation System for Second Language Learning. Computer-Assisted Language Learning 9/2-3, pp. 181-189.
11. Nagata, N. (1993) Intelligent Computer Feedback for Second Language Instruction. Modern Language Journal 77/3, pp. 330-339.
12. Nagata, N., Swisher, M.V. (1995) A Study of Consciousness-Raising by Computer: The Effect of Metalinguistic Feedback on Second Language Learning. Foreign Language Annals 28/3, pp. 337-347.
13. Richards, J.C. (1974) Error Analysis: Perspectives on Second Language Acquisition. London: Longmans.
14. Schwind, C.B. (1995) Error Analysis and Explanation in Knowledge Based Language Tutoring. Computer Assisted Language Learning 8/4, pp. 295-324.
15. Stephen, G. (1994) String Searching Algorithms. Singapore: World Scientific.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998

"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998

109 works by 129 authors indexed

Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/

References: http://web.archive.org/web/19990225164509/http://lingua.arts.klte.hu/allcach98/abst/jegyzek.htm

Attendance: ~60 (https://web.archive.org/web/19990128030244/http://lingua.arts.klte.hu/allcach98/listpar3.htm)

Series: ACH/ALLC (10), ACH/ICCH (18), ALLC/EADH (25)

Organizers: ACH, ALLC
