Applying Machine Translation Techniques to the Evaluation of Pedagogical Grammars

  1. Simon Berry

    The Robert Gordon University

  2. Arturo Trujillo

    The Robert Gordon University


This paper describes a methodology for the evaluation of translation and usage rules of the type
usually found in pedagogical grammar texts; the
methodology is applied to two types of such rules
written from different perspectives, and the conclusions drawn from this exercise are presented.
The work is of relevance to teachers and compilers
of learner grammars, since it allows the comparison of various ways of explaining linguistic phenomena. Techniques from Machine Translation
(MT) and Natural Language Processing (NLP) are
used as part of the evaluation methodology.
1 Introduction
Assessing the quality of rules, explanations and
idiosyncrasies found in grammar texts or self-study books for foreign language learners can involve
considerable amounts of time and effort. It normally requires very experienced language teachers and lecturers compiling vast amounts of
student errors over a number of years in order to
revise and extend existing usage rules.
In order to determine more quickly and objectively
the accuracy of the rules and information provided
in traditional grammar texts – for example those
of Butt and Benjamin (1994), Grevisse (1993),
Byrne and Churchill (1986) – we have applied
techniques from Machine Translation (MT) and
Natural Language Processing (NLP) to the evaluation of rules, guidance notes and lists of exceptions
as found in such texts. We think evaluations of this
type are useful for the production of adequate
teaching and educational material for undergraduate and adult education.
2 The Methodology
The evaluation method presented in this paper
comprises an experimental MT system to which
are added suitably coded rules as found in the
grammar texts under consideration. The MT system is wide coverage in the sense that it attempts
to translate arbitrary texts; this is achieved by
allowing human intervention during the translation process. A diagram of the methodology is
presented below:
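[Figure: MT Engine, with components labelled SL, Analysis, Transfer, Generation, TL, Human Interaction and Grammar Rules]
(SL = source language; TL = target language)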
3 Machine Translation
The system design adapts techniques from lexicalist MT (Whitelock 1994; Trujillo and Berry 1996;
Poznanski et al. 1995). The principal tenet of this
approach is that machine translation may be achieved by equating sets of source and target lexical
elements. Thus, the system described consists of
SL analysis, lexical transfer and TL generation.
However, our analysis and generation mechanisms
are more restricted than those for standard approaches.
The lexicalist MT approach contrasts with other
MT approaches which either construct language
independent, interlingua representations, or involve complex transformations of SL structures into
TL structures (for a review of MT, see Hutchins
and Somers (1992)).
4 Prototype MT System
The base MT engine proposed here consists principally of a part-of-speech tagger (Brill 1994), a
morphological analyser (Antworth 1990; Koskenniemi 1983), a bilingual English-French dictionary, a monolingual French dictionary and a pattern
matching mechanism incorporating the rules to be
evaluated. The user interacts with the system to
disambiguate possible translations. A full description of the system is given in Berry and
Trujillo (1995).
5 Approach to Evaluation
Since the present goal is the evaluation of the
quality, explicitness and applicability of rules
found in grammar texts, we present some of the
types of rules found in such works.
1) One typical kind of rule involves the use of
generic or broad semantic classes for explaining conditions of usage (Byrne and Churchill 1986):
Insert a definite article in French
• before abstract nouns used in a general
• before names of substances used in a
general class
• with names of languages
2) Another kind of rule also uses semantic classes but accompanies them with lists of exceptions (Butt and Benjamin 1994:419-420):
The Spanish preposition “a” follows almost
any verb of motion. Note:
i) is omitted after verbs of motion which
are followed by “aquí”, “acá”, ...
ii) in Spain, “entrar” (to enter) is usually
followed by “en” instead of “a”.
3) A third type of rule involves associating the
word with an equivalent word or phrase in the
learner’s language (Payne 1987:51):
Hungarian “van” frequently corresponds to
English “there is”:
• “Van benzin?” (Is there any petrol?)
• “Sok szabad asztal van a teraszon” (There
are lots of free tables on the terrace)
4) Finally, rules can involve a description of the
linguistic context in which a word or construction should be used (Dunn and Yanada 1958):
One of the uses of the Japanese honorific
system is to express one’s respect for the
person spoken about. It is used on formal
occasions, when the parties do not know
each other very well, when speaking to or of
one’s own or somebody else’s parents or
grandparents, or one’s superiors or one’s
Of these, we have evaluated the first two types,
principally because they were the most relevant to
our particular study, namely the use of the definite
article in French. Furthermore, rules of types 1)
and 2) lend themselves more readily to implementation since they use information which can be
made explicit through known techniques.
6 Rule Representation
The representation paradigm implemented consists of patterns operating over texts labelled with
part-of-speech, semantic class and/or translations
depending on the stage of processing. The representation of the selected grammar and guidance
rules is as follows. For type 1) rules above, semantic class information is compiled from monolingual dictionaries (Boguraev and Briscoe 1989; Trujillo and Plowman 1991; Briscoe et al. 1993)
which can then be incorporated into patterns corresponding to the rule being evaluated. An example of such a pattern might be:
N(language) ==> {le,la} N’
where N(language) matches a noun with semantic
class language, and the output has an article inserted before it.
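As an illustration only, a pattern of this kind could be sketched in Python as follows. The `Token` class, its field names, the `insert_article` function and the sample lexical items are assumptions made for this sketch; they are not taken from the prototype.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    word: str                  # target-language word after lexical transfer
    pos: str                   # part-of-speech label, e.g. "N"
    sem: Optional[str] = None  # semantic class from a monolingual dictionary


def insert_article(tokens: List[Token], sem_class: str) -> List[str]:
    """Apply N(sem_class) ==> {le,la} N' to a labelled token sequence."""
    output = []
    for tok in tokens:
        if tok.pos == "N" and tok.sem == sem_class:
            output.append("{le,la}")  # article left unresolved for gender
        output.append(tok.word)       # N': the matched noun itself
    return output


# Hypothetical labelled input; only the second token matches N(language).
labelled = [Token("étudier", "V"), Token("espagnol", "N", "language")]
pattern_output = insert_article(labelled, "language")
print(" ".join(pattern_output))
```

Nouns whose semantic class differs from the one named in the pattern pass through unchanged, which is what allows each rule to be assessed in isolation.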
The representation of type 2) rules also exploits
patterns, ordered from most specific (attempted
first) to most general (attempted last). Specific
rules correspond to exceptions while general rules
correspond to default advice presented in the
grammar text. For the example above, we have:
enter ==> entrar en
V(motion) ==> V’ a
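The ordering of exceptions before defaults might be sketched as below. This is a minimal illustration, assuming each rule is a (name, condition, preposition) triple; the `apply_rules` function and the sample entries "go ~ ir" and "eat ~ comer" are hypothetical and do not come from the prototype.

```python
from typing import Callable, List, Optional, Tuple

# Each rule: (name, condition on the labelled verb, preposition to add).
Rule = Tuple[str, Callable[[str, Optional[str]], bool], str]

# Most specific rule first, default rule last.
RULES: List[Rule] = [
    ("enter ==> entrar en", lambda source, sem: source == "enter", "en"),
    ("V(motion) ==> V' a", lambda source, sem: sem == "motion", "a"),
]


def apply_rules(source: str, sem: Optional[str], target: str) -> Tuple[str, Optional[str]]:
    """Return the output of the first matching rule and that rule's name."""
    for name, matches, preposition in RULES:
        if matches(source, sem):
            return f"{target} {preposition}", name
    return target, None  # no rule applies: leave the verb unchanged


print(apply_rules("enter", "motion", "entrar"))  # the exception fires first
print(apply_rules("go", "motion", "ir"))         # falls through to the default
print(apply_rules("eat", None, "comer"))         # not a verb of motion
```

Because the first match wins, reversing the order of the list would let the default rule mask the exception for "enter".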
Case 3) rules are also included as patterns expressing the required equivalence:
van ==> there is
In our MT prototype, the patterns are applied after
lexical translation has taken place. This allows the
patterns to take into account both English (i.e. SL)
and French (i.e. TL) information.
7 An Example Translation
To clarify the operation of the system, consider the
translation of the following text into French:
I prefer French
The following bilingual lexicon entries and patterns are necessary:
1. I ~ Je
2. prefer ~ préférer
3. French ~ français
4. N(language) ==> {le,la} N’
Before applying the patterns, the input is tagged
with part-of-speech labels and semantic categories
(derived from a monolingual dictionary):
I/PRON prefer/V French/N(language)
The first step is lexical translation. The result is:
I/PRON/Je prefer/V/préférer French/N(language)/français
These strings are then matched against the left
hand side of rules, using the right hand side as
output. The label with a prime indicates the location of corresponding matched items on the left
hand side. For example, matching of rule 4 above
may be depicted as:
N(language) ==> {le,la} N’
Rule Output:
{le,la} French/N(language)/français
The complete translation, after deletion of the
English words and parts-of-speech is:
Je préférer {le,la} français
Since the evaluation of morphological rules is not
of primary concern, the output is not well-formed
with respect to gender. Rather, emphasis is placed
on the inclusion/omission of lexical elements, and
on the choice of particular words for a given
source text.
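The stages of this example can be traced with a short Python sketch. It assumes a toy tag table and lexicon covering only the example sentence; the `Item` type and all function names are hypothetical, and the real prototype relies on a tagger, a morphological analyser and dictionaries rather than lookup tables.

```python
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    source: Optional[str]  # English word (None for inserted material)
    pos: Optional[str]     # part-of-speech label
    sem: Optional[str]     # semantic class, if any
    target: Optional[str]  # French word once lexical translation has run


# Toy resources covering only the example sentence.
TAGS = {"I": ("PRON", None), "prefer": ("V", None), "French": ("N", "language")}
LEXICON = {"I": "Je", "prefer": "préférer", "French": "français"}


def tag(text: str) -> List[Item]:
    return [Item(w, TAGS[w][0], TAGS[w][1], None) for w in text.split()]


def translate_lexically(items: List[Item]) -> List[Item]:
    return [it._replace(target=LEXICON[it.source]) for it in items]


def apply_rule_4(items: List[Item]) -> List[Item]:
    """N(language) ==> {le,la} N'"""
    out = []
    for it in items:
        if it.pos == "N" and it.sem == "language":
            out.append(Item(None, None, None, "{le,la}"))
        out.append(it)
    return out


def show(items: List[Item]) -> str:
    """Render items in the word/label/translation notation of the example."""
    parts = []
    for it in items:
        if it.source is None:  # inserted material has no source word
            parts.append(it.target)
            continue
        label = f"{it.pos}({it.sem})" if it.sem else it.pos
        fields = [it.source, label] + ([it.target] if it.target else [])
        parts.append("/".join(fields))
    return " ".join(parts)


def strip_source(items: List[Item]) -> str:
    """Delete the English words and labels, keeping the French output."""
    return " ".join(it.target for it in items)


tagged = tag("I prefer French")
transferred = translate_lexically(tagged)
matched = apply_rule_4(transferred)
for stage in (show(tagged), show(transferred), show(matched), strip_source(matched)):
    print(stage)
```

As in the example, the verb is left uninflected and the article unresolved for gender, since only the presence or absence of the article is under evaluation.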
Once texts are translated, they are inspected for
correct and incorrect translations arising from the
particular rule being evaluated. This gives a measure of the quality of the rule; this measure may be
used for comparison with other rules.
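One simple way to turn the inspection into a number is to count the sites at which the rule output agrees with a human translation. The sketch below assumes each site is reduced to a yes/no decision (article present or absent); the `rule_score` function and the five sample decisions are invented for illustration.

```python
from typing import Sequence, Tuple


def rule_score(rule_decisions: Sequence[bool], human_decisions: Sequence[bool]) -> Tuple[int, int]:
    """Count sites where the rules agree with the human translation."""
    if len(rule_decisions) != len(human_decisions):
        raise ValueError("both sequences must cover the same sites")
    correct = sum(r == h for r, h in zip(rule_decisions, human_decisions))
    return correct, len(human_decisions)


# Invented decisions for five sites (True = article present).
rules = [True, True, False, True, False]
human = [True, False, False, True, True]
correct, total = rule_score(rules, human)
print(f"{correct}/{total}={correct / total:.0%}")
```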
8 Applying the Methodology
We have applied the above methodology to the
evaluation of rules for the translation and usage of
the French definite article, using English as the
source language. Rather than considering one particular grammar text, we identified two strategies
adopted by grammarians and applied the proposed
methodology to both. The two strategies were:
• present rules for when to insert the article;
• present rules for when to omit the article.
Two rule sets were developed, one for each strategy; both sets included general rules and more
specific rules as exemplified in the previous example. Rules describing when to insert the article
(Byrne and Churchill 1986:21-25; Farrer
1990:123-37) fell under category 1) in section 5.
The other type of rule consisted of more specific
and idiosyncratic examples of when to omit the
article (Grevisse 1993:570-71; Bescherelle
1990:50-56). Such rules fell mainly under category 2) in section 5. In evaluating both sets of rules,
the article from the English source text was deleted
and the rules were left to operate over the “articleless” raw translations. The first set of rules inserted articles based mostly on semantic categories,
while the second set introduced them only when
specific rules indicating when to omit the article
had been unsuccessfully tried.
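The difference between the two strategies lies mainly in their default behaviour, which the following sketch tries to make explicit. The semantic classes are loosely based on the type 1) example in section 5, and the omission contexts ("en", "sans") are hypothetical stand-ins; neither list reproduces the rule sets actually evaluated.

```python
from typing import Optional

# Assumed stand-ins for the two rule sets; neither list is taken from the paper.
INSERT_CLASSES = {"abstract", "substance", "language"}  # classes that trigger insertion
OMIT_AFTER = {"en", "sans"}                             # hypothetical omission contexts


def insertion_strategy(previous_word: str, sem: Optional[str]) -> bool:
    """Rule set 1: insert only when a semantic-class rule fires (default: omit)."""
    return sem in INSERT_CLASSES


def omission_strategy(previous_word: str, sem: Optional[str]) -> bool:
    """Rule set 2: insert unless a specific omission rule fires (default: insert)."""
    return previous_word not in OMIT_AFTER


sites = [
    ("préférer", "language"),  # noun of class 'language' after a verb
    ("en", "language"),        # same class, but after the preposition 'en'
    ("acheter", None),         # noun with no recorded semantic class
]
for previous_word, sem in sites:
    print(previous_word, insertion_strategy(previous_word, sem), omission_strategy(previous_word, sem))
```

The two functions agree on the first site and disagree on the other two, which is where a comparison against a human translation would separate the strategies.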
9 Results
Both sets of rules were tested against three corpora: a work of fiction, a work of non-fiction and a
randomly chosen unseen article from a journal.
This last corpus represented a genre of text on
which our rule sets had not been tested. The following results were obtained (measured as the proportion of article insertions or omissions by the
rules compared with those of a human translator):
             Fiction        Non-fiction    Journal article
Rule set 1)  168/272=62%    283/373=76%    90/119=76%
Rule set 2)  229/272=84%    342/373=92%    115/119=97%
In this table, X / Y represents the fact that the
definite article was correctly processed X times
out of Y occurrences. The size of each corpus is
indicated on the table; thus the fiction corpus
contained 2,400 words which gave rise to 272 sites
at which a definite article could have been inserted
in French. In the English corpora, the definite
article represented 6% of the total number of
words, while in the French corpora it represented
It is clear that case 2) rules performed better than
those in case 1). In fact, a chi-squared test shows
that the observed differences between the performance of the two sets of rules are significant at
the 0.005 level. In terms of size, obviously a larger text
sample would have resulted in much more definitive results, but for the purposes of rule testing (in
contrast to stylistic studies) the tests described
here reflect a real advantage for a given type of
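The reported figures can be checked with a short calculation. The exact set-up of the chi-squared test is not described here, so the sketch below makes an assumption: it treats each comparison as a 2x2 table (rule set by correct/incorrect) and computes Pearson's statistic without continuity correction, both per corpus and pooled. Since both rule sets were scored on the same sites, a paired test might be more appropriate, but that would need per-site data that is not reported.

```python
# Counts reported in the results: (correct, total sites) per corpus, in table order.
RULE_SET_1 = [(168, 272), (283, 373), (90, 119)]
RULE_SET_2 = [(229, 272), (342, 373), (115, 119)]

CRITICAL_0_005 = 7.879  # chi-squared critical value, 1 degree of freedom, alpha = 0.005


def chi_squared_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction)."""
    observed = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    grand = total_a + total_b
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    row_totals = [total_a, total_b]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Recompute the percentages shown in the results.
percentages = [[round(100 * c / t) for c, t in rule_set] for rule_set in (RULE_SET_1, RULE_SET_2)]

# One statistic per corpus, plus one for all three corpora pooled.
per_corpus = [chi_squared_2x2(c1, t1, c2, t2) for (c1, t1), (c2, t2) in zip(RULE_SET_1, RULE_SET_2)]
pooled = chi_squared_2x2(
    sum(c for c, _ in RULE_SET_1), sum(t for _, t in RULE_SET_1),
    sum(c for c, _ in RULE_SET_2), sum(t for _, t in RULE_SET_2),
)
print(percentages)
print([round(x, 2) for x in per_corpus], round(pooled, 2))
```

Under these assumptions every statistic exceeds the critical value of 7.879, which is consistent with the significance level stated above.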
Given the behaviour and distribution of the French
article, these results would suggest that at least
from a computational perspective, it is better to
indicate when to omit an article than when to
include it. It is worth noting that the rules did not
take into account the existence of a definite article
in the English original text. While this might influence the finer distinctions on article insertion in
French, the majority of decisions may be made
based purely on target language considerations.
10 Relation to Other Work
The compilation and analysis of errors by language students is well established in applied linguistics (Corder 1973:256-95; Johansson 1975:41-52;
Ellis 1990:45-46). Similarly, pedagogical grammars continue to be used for language instruction
(Krzeszowski 1975; Stern 1990:94). The present
work applies techniques from MT and NLP to the
evaluation and possible improvement of such
grammars in a controlled and relatively objective
Computer Assisted Language Learning (Ahmad et
al. 1985) techniques may also be applicable to the
evaluation of pedagogical grammars. In this case,
one would incorporate the rules to be tested into
the system and evaluate them as part of the overall
learning instruction process. However, our approach, because of its comparatively wider range of
input texts (i.e. the system’s corpus orientation
makes it easier to expand its linguistic coverage),
can at least offer a complementary mechanism for
this type of evaluation.
In the specific area of article translation, this work
resembles that of Bond et al. (1995) most closely
since their work is also based on a battery of rules
applied in a hierarchical fashion. The prototype
system presented here, however, introduces fewer
assumptions about the translation process, and
therefore allows the effect of particular rules to be
considered with minimal interference from other
system components.
11 Conclusions
We have developed and tested a mechanism for
the semiautomatic evaluation of certain types of
rules as found in grammar texts for language learners. Techniques from lexicalist MT and corpus-oriented NLP were used to automate and facilitate
this task.
Since one of the overall aims is to improve the
quality of materials for human use, the results
presented here can only be indicative of possible
avenues for improvement. They do not address the
psychological or pedagogical issues faced by
language learners, for which other types of experiments may be appropriate. One problem faced
by any attempt to evaluate grammar texts of the
type considered here is that rules in such texts are
geared towards non-experts and learners; as such,
they must provide sufficient generality and flexibility to be of maximum benefit. Unfortunately,
these very characteristics hamper adequate evaluation of their effectiveness. Furthermore, the
way in which rules are interpreted by individual
learners is hard to determine, and even harder to
express as a computer program. This work therefore has interpreted the rules in grammar texts as
literally as possible in order to factor out any
additional knowledge that may be brought to bear
on their interpretation. Still, some conclusions
may be drawn from this work.
One conclusion presented in the paper was that it
is more profitable to describe when to omit the
French article rather than when to insert it, particularly when explaining French grammar to anglophones.
Possible extensions to this work could include
refinements to the prototype MT engine and the
evaluation of other rules such as those for preposition usage. However, while more complex translation rules would require a more powerful translation engine, the lexicalist approach to machine
translation advocated here would be well suited to
further experimentation as it assumes little regarding the configurational aspects of language, and
therefore can be used to evaluate a range of grammar and translation rule types.
Finally, the resemblance of the rules we have used
to those found in expert systems (Gonzalez and
Dankel 1993) may suggest further paradigms and
infrastructure for investigation.
We wish to thank David Crossen, Jan Ijdens,
Murray Hill, Evan Antworth and Stephen McConnel; Eric Brill for making available the code of his
tagger, and our two anonymous reviewers for useful feedback.
This research was partially supported by the Nuffield Foundation (NUF-URB95) and The Robert
Gordon University.
Ahmad, K., G. Corbett, M. Rogers and R. Sussex,
(1985), Computers, Language Learning and
Language Teaching, Cambridge University
Press, UK.
Antworth, E. (1990), PC-KIMMO: a two-level
processor for morphological analysis, Occasional Publications in Academic Computing
No. 16. Dallas, TX: Summer Institute of Linguistics.
Berry, S. and A. Trujillo, (1995), The Definite
Article in English to French Machine Translation, Technical Report No. 95/10, SCMS, The
Robert Gordon University, October, Aberdeen, UK. Available from
Bescherelle. (1990), La Grammaire Pour Tous,
Boguraev, B. and E. Briscoe (eds.) (1989), Computational Lexicography for Natural Language Processing, Longman, Harlow, Essex,
Bond, F., K. Ogura and T. Kawaoka, (1995), Noun
phrase reference in Japanese-to-English machine translation, In Proceedings of the 6th
International Conference on Theoretical and
Methodological Issues in Machine Translation (TIM- 93), Katholieke Universiteit Leuven, Belgium, pp. 1–14.
Brill, E. (1994), A Report of Recent Progress in
Transformation Error-Driven Learning, Proceedings of the Tenth National Conference on
Artificial Intelligence (AAAI-94), Seattle,
Briscoe, E., A. Copestake and V. de-Paiva, (1993),
Inheritance, Defaults and the Lexicon, Cambridge University Press, Cambridge, UK.
Butt, J. and C. Benjamin, (1994), A New Reference
Grammar of Modern Spanish, Edward Arnold, London.
Byrne, L. and E. Churchill, (1986), A Comprehensive French Grammar – Third Edition, Blackwell.
Corder, S. P., (1973), Introducing Applied Linguistics, Penguin, Harmondsworth, Middlesex, UK.
Dunn, C. J. and S. Yanada, (1958), Teach Yourself
Japanese, Hodder and Stoughton, Sevenoaks,
Ellis, R., (1990), Instructed Second Language Acquisition, Basil Blackwell, Oxford, UK.
Farrer, H., (1988), A French Reference Grammar,
Oxford University Press.
Grevisse, M., (1993), Le Bon Usage – 12 édition,
Editions DUCULOT.
Gonzalez, A. J. and D. D. Dankel, (1993), The
Engineering of Knowledge-Based Systems,
Prentice Hall, NJ.
Hutchins, W. J. and H. L. Somers, (1992), An
Introduction to Machine Translation, Academic Press, London.
Johansson, S., (1975), Problems in Studying the
Communicative Effect of Learner’s Errors,
Studies in Second Language Acquisition, Vol.
1(1):41–52, Indiana University Linguistics
Club, 310 Lindley Hall, Bloomington, IN.
Koskenniemi, K., (1983), Two-level morphology:
A general computational model for word-form
recognition and production, Publication No.
11, Department of General Linguistics, University of Helsinki.
Krzeszowski, T. P., (1975), English Reference
Grammar for Polish Learners, Studies in Second Language Acquisition, Vol. 1(1):85–94,
Indiana University Linguistics Club, 310
Lindley Hall, Bloomington, IN.
Poznanski, V., J. L. Beaven and P. Whitelock,
(1995), An Efficient Generation Algorithm for
Lexicalist MT, In Proceedings of the 33rd
Annual Meeting of the Association for Computational Linguistics, June, Boston, MA.
Payne, J., (1987), Colloquial Hungarian, Routledge & Kegan Paul, London.
Stern, H., (1990), Analysis and Experience as
Variables in Second Language Pedagogy, In
The Development of Second Language Proficiency, B. Harley, P. Allen, J. Cummins and
M. Swain (eds.), Cambridge University Press,
UK, pp.93–109.
Trujillo, A. and S. Berry (1996), Connectivity in
Bag Generation, In Proceedings of the 16th
International Conference on Computational
Linguistics – COLING-96, August, Copenhagen, Denmark.
Trujillo, A. and D. Plowman, (1991), Automation
of Bilingual Lexicon Compilation, In Proceedings of MT Summit III, July, Washington DC,
pp. 51–54.
Whitelock, P., (1994), Shake-and-Bake Translation. In C. J. Rupp, M. A. Rosner, and R. L.
Johnson, editors, Constraints, Languages and
Computation, Academic Press, London, pp.


Conference Info

In review


Hosted at University of Bergen

Bergen, Norway

June 25, 1996 - June 29, 1996

Series: ACH/ICCH (16), ALLC/EADH (23), ACH/ALLC (8)

Organizers: ACH, ALLC

  • Keywords: None
  • Language: English
  • Topics: None