Computer-aided Learning of Turkish Morphology

Izzet Pembeci; Cem Bozsahin; Deniz Zeyrek

Authorship

1. Izzet Pembeci
Dept. of Computer Engineering - Middle East Technical University

2. Cem Bozsahin
Dept. of Computer Engineering - Middle East Technical University

3. Deniz Zeyrek
Dept. of Foreign Language Educ. - Middle East Technical University

Parent session

LING (c), Tibor Nagy

Original URL

http://web.archive.org/web/19991001155235/http://lingua.arts.klte.hu/allcach98/abst/abs35.htm

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

1. Introduction
This study reports on the development of a computational environment for learning Turkish morphology. The tool is designed to provide practice for non-native learners of Turkish and can also be used by students of linguistics to understand the complex nature of an agglutinating language. It gives learners a self-regulated environment in which they can experiment with Turkish word production and analysis at their own pace, with or without an instructor. In the following, we describe some aspects of Turkish morphology, the learning environment, and the underlying computational framework.
2. Aspects of Turkish Morphology
Turkish is an agglutinative language that builds words by stacking suffix upon suffix. It has rich nominal and verbal suffixation in both its inflectional and derivational systems. However, suffixation in Turkish cannot be characterized as the mere attachment of morphemes to a word, since the forms that realize a morpheme alternate. The alternation is phonologically conditioned and can be stated in terms of vowel harmony. In succinct terms, vowel harmony in Turkish is the condition where the nature of the vowel in a suffix depends on that of the vowel in the preceding syllable, namely, whether it is Rounded, Unrounded, Front or Back. Thus, the PLURAL suffix in the noun bardaklar (glasses) is -lar, containing a BACK vowel just like the preceding one, whereas it is -ler in piller (batteries), as it follows a FRONT vowel.

bardak-lar
‘glasses’
glass-PLU

pil-ler
‘batteries’
battery-PLU

In a similar fashion, the ABLATIVE marker alternates between -den/-dan, the DATIVE between -e/-a:

ev-den
‘from the house’
house-ABL

Almanya-dan
‘from Germany’
Germany-ABL

iş-e
‘to work’
work-DAT

ay-a
‘to the moon’
moon-DAT
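
The front/back alternation in the examples above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the archiphoneme symbol "A", the suffix templates, and the function names are assumptions introduced here, and the sketch ignores buffer consonants and lexical exceptions.

```python
# Illustrative sketch only: two-way (front/back) vowel harmony.
# "A" is a hypothetical placeholder for a vowel realized as e or a.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def last_vowel(stem: str) -> str:
    """Return the last vowel of the stem, scanning from the right."""
    for ch in reversed(stem):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    raise ValueError(f"no vowel found in stem: {stem!r}")


def attach(stem: str, template: str) -> str:
    """Attach a suffix template, resolving 'A' from the stem's last vowel."""
    vowel = "e" if last_vowel(stem) in FRONT_VOWELS else "a"
    return stem + template.replace("A", vowel)


# Hypothetical templates for the three suffixes discussed in the text.
PLURAL, ABLATIVE, DATIVE = "lAr", "dAn", "A"

print(attach("bardak", PLURAL))     # bardaklar
print(attach("pil", PLURAL))        # piller
print(attach("ev", ABLATIVE))       # evden
print(attach("Almanya", ABLATIVE))  # Almanyadan
print(attach("iş", DATIVE))         # işe
print(attach("ay", DATIVE))         # aya
```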

The rules for suffixation are not confined to vowel harmony. The following examples illustrate systematic consonant variation, where the initial consonant of the suffix assimilates in voicing to the final consonant of the word: it surfaces as t after a voiceless consonant and as d otherwise.

kalp-te
‘in the heart’
heart-LOC

sokak-tan
‘from the street’
street-ABL

kır-da
‘in the country’
country-LOC

kadın-dan
‘from the woman’
woman-ABL
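
The consonant alternation in these four examples can be sketched as follows. The symbol "D" for the alternating consonant and the function name are assumptions made for illustration. The suffix vowel is passed in already resolved, because vowel harmony is a separate alternation (and kalp takes a front-vowel suffix in the example above).

```python
# Illustrative sketch only: voicing assimilation of a suffix-initial consonant.
# "D" is a hypothetical placeholder realized as t or d.
VOICELESS = set("çfhkpsşt")


def attach_with_voicing(stem: str, suffix: str) -> str:
    """Realize 'D' as t after a voiceless stem-final consonant, else as d."""
    consonant = "t" if stem[-1] in VOICELESS else "d"
    return stem + suffix.replace("D", consonant, 1)


print(attach_with_voicing("kalp", "De"))    # kalpte
print(attach_with_voicing("sokak", "Dan"))  # sokaktan
print(attach_with_voicing("kır", "Da"))     # kırda
print(attach_with_voicing("kadın", "Dan"))  # kadından
```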

A Turkish word can be inflected for tense, person, case and number at the same time. Thus, a word can also function as a sentence. For instance, the Turkish equivalent of the English sentence ‘We were at the bookseller’s’ is a single word, kitapçı (bookseller) inflected for case, tense and number, respectively.
kitapçı-da-(y)dı-k ‘We were at the bookseller’s’
bookseller-LOC-COPULA-1p
3. Learning environment
The tool described in this paper is intended as a supplementary teaching aid, to be used either independently by the student or under the supervision of the course instructor. In either case, the tool gives learners an opportunity to analyze and generate Turkish words of varying internal complexity. Also, since a Turkish word can function as a sentence, practice in generating and analyzing words helps students understand the syntactic structures involved in Turkish sentences. The tool thus aims to develop an understanding of how Turkish, a typical example of an agglutinative language, works.
As a teaching aid for non-native speakers, the tool helps learners understand the internal structure of Turkish words by guiding them to analyze words into their constituent morphemes or to generate words by adding appropriate morphemes to a root. From the language teacher's viewpoint, this type of practice requires providing explicit rules of grammar, which has been de-emphasized by some communication-oriented language teaching methods [9], [10]. However, research on second language acquisition has shown that attention to grammatical rules may enhance students' performance, at least in the short run. Observations of classrooms have further revealed that language teachers typically present rules after eliciting student opinions on a particular grammatical structure, often in the form of problem-reformulation followed optionally by exemplification by the teacher or students [4]. Thus, the language teacher using the tool described here is not expected to have presented explicit rules throughout the course, but is assumed to have presented enough rules concerning the morphological structure of words.
The visual environment for the lab setting is World Wide Web access to a two-level morphological model of Turkish. The Java programming language serves as the interface between the user's view of Turkish morphology (buttons for roots, derivations, and inflections) and the tool's view of morphotactics and morphophonemics. The use of Web and Java facilities allows learners with little or no computer experience to experiment with the system. The two-level model of Turkish morphology consists of a unification-based regular grammar for handling morphotactics and a finite-state transducer for modeling phonological alternations in morphemes. These are described briefly in the next section.
The learning tool can be used in two different modes: analysis and generation. In both modes, buttons guide the learner on the part-of-speech of the source word, e.g., nominal generation, verbal analysis, etc. Both modes provide target part-of-speech information as well. For inflections, the target category is the same as that of the source. For derivations, the target category is made evident in the title of the button, e.g., ‘noun-to-verb derivations’ for the ‘len’ suffix.
In the analysis mode, the learner is expected to enter a surface form and a possible analysis for that form. The system informs the user about the success of the analysis. For example,

Evlerimizde
'in our houses'
ev-PLU-POSS1p-LOC
correct analysis

The abstract labels for morphemes, e.g., plural, are placed on the screen as buttons in random order.
The learner specifies his/her understanding of the morphotactics by selecting the morphemes in a certain order. If the analysis fails, the learner has the option of looking at the right analysis or making another attempt. Analyses may fail due to order violations (1a) or incorrect mapping of sound changes in the given phonological environment (1b).
(1)

a. *evimizlerde
ev-POSS1p-PLU-LOC
(possessive before number)

b. *evlermizde
ev-PLU-POSS1p-LOC
(no vowel elision)
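
An order violation such as (1a) can be detected and reported with a very small check. The sketch below ranks each morpheme label by a slot number and reports the first pair that is out of order. The slot table and the message wording are assumptions made for illustration, and this is a simplification, not the unification-based grammar the tool uses.

```python
# Illustrative sketch only: ranking nominal inflection labels by slot.
# The slot numbers are hypothetical; they encode number < possessive < case.
SLOT = {"PLU": 1, "POSS1p": 2, "LOC": 3, "ABL": 3, "DAT": 3}


def check_order(labels):
    """Return None if the labels respect the slot order, else a message."""
    for prev, nxt in zip(labels, labels[1:]):
        if SLOT[prev] >= SLOT[nxt]:
            return f"order violation: {prev} cannot precede {nxt}"
    return None


print(check_order(["PLU", "POSS1p", "LOC"]))  # None
print(check_order(["POSS1p", "PLU", "LOC"]))  # order violation: POSS1p cannot precede PLU
```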

In the generation (synthesis) mode, the learner picks a root and a sequence of suffixes, and is asked to provide the surface form for that sequence. This mode evaluates only the learner's knowledge of phonological alternations. On-line help explains the labels for morphemes and their allomorphs, without reference to the context in which a particular morph is used.
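
A generation exercise of this kind can be sketched as a function that builds the expected surface form and compares it with the learner's answer. The suffix templates, the archiphoneme symbols (A, I, D), and the function names below are assumptions for illustration. The sketch covers only the alternations described in this abstract and ignores other phenomena such as stem-final consonant changes and lexical exceptions.

```python
# Illustrative sketch only: generating a surface form from root + suffix labels.
FRONT = set("eiöü")
BACK = set("aıou")
VOWELS = FRONT | BACK
ROUNDED = set("oöuü")
VOICELESS = set("çfhkpsşt")

# Hypothetical templates: A = e/a, I = i/ı/u/ü, D = d/t.
TEMPLATES = {"PLU": "lAr", "POSS1p": "ImIz", "LOC": "DA", "ABL": "DAn"}


def last_vowel(form):
    """Return the last vowel of the form built so far."""
    for ch in reversed(form):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel found in {form!r}")


def high_vowel(v):
    """Four-way harmony: pick i, ı, u or ü to match the preceding vowel."""
    if v in FRONT:
        return "ü" if v in ROUNDED else "i"
    return "u" if v in ROUNDED else "ı"


def attach(form, template):
    """Append one suffix, resolving each placeholder against the form so far."""
    out = form
    for i, sym in enumerate(template):
        if sym == "A":
            out += "e" if last_vowel(out) in FRONT else "a"
        elif sym == "I":
            if i == 0 and out[-1] in VOWELS:
                continue  # suffix-initial vowel drops after a vowel-final stem
            out += high_vowel(last_vowel(out))
        elif sym == "D":
            out += "t" if out[-1] in VOICELESS else "d"
        else:
            out += sym
    return out


def generate(root, labels):
    """Build the surface form for a root and an ordered list of labels."""
    form = root
    for label in labels:
        form = attach(form, TEMPLATES[label])
    return form


def check_answer(root, labels, learner_form):
    """True if the learner's surface form matches the generated one."""
    return learner_form == generate(root, labels)


print(generate("ev", ["PLU", "POSS1p", "LOC"]))                    # evlerimizde
print(check_answer("ev", ["PLU", "POSS1p", "LOC"], "evlermizde"))  # False
```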
4. Computational Framework
The underlying morphological model is based on the two-level view of morphology [6]. Morphological parsers such as PC-Kimmo [1], [2], Keci [5] and Lexc [7] deal with ordering constraints in a similar fashion, i.e., as a finite-state automaton (FSA). An FSA for this purpose can be thought of as a graph whose nodes are alternation names and whose arcs are possible continuations from a given alternation. Although this method seems sufficient for the morphological description of a large family of languages, it is difficult at best and sometimes insufficient when a language has both prefixes and suffixes and the presence of a prefix (suffix) requires the presence of a suffix (prefix), as in Gusii [3]. A more general framework for morphotactic description is linear context-free grammars, which are powerful enough to deal with this kind of phenomenon and constrained enough to be a proper level for word grammars. Our word grammar uses the grammar writing facilities of Kimmo-2 [2] to simplify the morphotactic component of the morphological analyzer.
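
The graph view of a finite-state morphotactics can be sketched as a dictionary of arcs. The state names and the small fragment of Turkish nominal inflection below are assumptions made for illustration; they are not taken from PC-Kimmo, Keci, or Lexc. Note that when no arc is available the automaton can only reject the input.

```python
# Illustrative sketch only: morphotactics as a graph of continuations.
# Nodes are hypothetical alternation names; arcs are labelled by morphemes.
ARCS = {
    "NOUN_ROOT": {"PLU": "NUMBER", "POSS1p": "POSSESSIVE", "LOC": "CASE", "DAT": "CASE"},
    "NUMBER": {"POSS1p": "POSSESSIVE", "LOC": "CASE", "DAT": "CASE"},
    "POSSESSIVE": {"LOC": "CASE", "DAT": "CASE"},
    "CASE": {},
}
ACCEPTING = {"NOUN_ROOT", "NUMBER", "POSSESSIVE", "CASE"}


def accepts(labels, start="NOUN_ROOT"):
    """Follow one arc per label; reject as soon as no continuation exists."""
    state = start
    for label in labels:
        next_state = ARCS[state].get(label)
        if next_state is None:
            return False  # the automaton halts without saying why
        state = next_state
    return state in ACCEPTING


print(accepts(["PLU", "POSS1p", "LOC"]))  # True
print(accepts(["POSS1p", "PLU", "LOC"]))  # False
```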
Another aspect of the morphological model we developed is that it is a feature-based description of Turkish morphology. Feature specifications simplify grammar writing by putting the burden of complex morphotactics on feature specifications. Describing complicated phenomena by cumbersome continuation classes leaves almost no hope for the system to report an ordering violation to the learner in an intelligible way. It also makes the system very hard to maintain and extend. Using features, continuation classes can be reduced drastically--in principle to two: derivations and inflections--and morphotactics can be written as traditional grammar rules. Feature values are combined by the process of unification, the failure of which signifies feature mismatch. For instance, in Turkish, the -ki suffix (relativizer) may be attached to any genitive or locative marked noun (2a), but not to nouns in other cases (2b). A unification-based grammar with only two continuation classes would report the problem in (2b) as a feature violation (on case feature) in an otherwise successful parse, but a finite-state network of morphotactics would simply halt in a non-accepting state without being able to report the source of the error.
(2)

a. ev-ler-de-ki
‘those in the houses’
house-PLU-LOC-REL

b. *ev-ler-e-ki
house-PLU-DAT-REL
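
The contrast in (2) can be sketched with a minimal unification over flat feature structures. The feature names, the use of sets for alternative values, and the return convention are assumptions made for illustration; the tool's grammar formalism is richer than this. The point of the sketch is that a failed unification names the offending feature.

```python
# Illustrative sketch only: unification of flat feature structures.
# Each feature maps to the set of values it may take.
def unify(a, b):
    """Return (merged, None) on success or (None, feature) on a mismatch."""
    merged = dict(a)
    for feature, allowed in b.items():
        if feature in merged:
            common = merged[feature] & allowed
            if not common:
                return None, feature  # report which feature failed
            merged[feature] = common
        else:
            merged[feature] = allowed
    return merged, None


# Hypothetical requirement of -ki: a genitive or locative marked noun.
KI_REQUIRES = {"cat": {"noun"}, "case": {"GEN", "LOC"}}

evlerde = {"cat": {"noun"}, "number": {"PLU"}, "case": {"LOC"}}
evlere = {"cat": {"noun"}, "number": {"PLU"}, "case": {"DAT"}}

print(unify(evlerde, KI_REQUIRES)[1])  # None (2a unifies)
print(unify(evlere, KI_REQUIRES)[1])   # case (2b fails on the case feature)
```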

Computational models of phonological alternations are fashioned after generative phonology. Keci relies on ordered rules, whereas two-level phonology allows rules to be specified in any order. This is made possible by compiling all rules into a single rule that considers all feasible pairings of the surface form and the lexical form under all environments. We made use of a rule compiler (twolc) [8] to exploit this capability.
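
The order independence of two-level rules can be sketched by treating each rule as a constraint on an aligned lexical:surface pair sequence and requiring all constraints to hold at once. The code below is not twolc and does not compile anything into a transducer. The rule functions, the symbols A and D, the boundary symbol "+", and the null symbol "0" are assumptions made for illustration.

```python
# Illustrative sketch only: two-level rules as parallel constraints.
FRONT = set("eiöü")
BACK = set("aıou")
VOICELESS = set("çfhkpsşt")


def left_surface(pairs, i):
    """Surface string to the left of position i, with nulls removed."""
    return "".join(s for _, s in pairs[:i] if s != "0")


def rule_vowel_harmony(pairs):
    """A:e after a front vowel, A:a otherwise."""
    for i, (lex, surf) in enumerate(pairs):
        if lex == "A":
            vowels = [c for c in left_surface(pairs, i) if c in FRONT | BACK]
            expected = "e" if vowels and vowels[-1] in FRONT else "a"
            if surf != expected:
                return False
    return True


def rule_voicing(pairs):
    """D:t after a voiceless segment, D:d otherwise."""
    for i, (lex, surf) in enumerate(pairs):
        if lex == "D":
            left = left_surface(pairs, i)
            expected = "t" if left and left[-1] in VOICELESS else "d"
            if surf != expected:
                return False
    return True


def rule_default(pairs):
    """The boundary maps to null; all other symbols map to themselves."""
    for lex, surf in pairs:
        if lex == "+":
            if surf != "0":
                return False
        elif lex not in ("A", "D") and lex != surf:
            return False
    return True


RULES = [rule_vowel_harmony, rule_voicing, rule_default]


def licensed(lexical, surface, rules=RULES):
    """True if every rule accepts the aligned pairing; rule order is irrelevant."""
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    return all(rule(pairs) for rule in rules)


print(licensed("sokak+DAn", "sokak0tan"))  # True
print(licensed("sokak+DAn", "sokak0dan"))  # False
print(licensed("ev+DAn", "ev0den", rules=list(reversed(RULES))))  # True
```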
5. Conclusion
We aim to provide both language learners and linguistics students with an individualized learning environment. The tool helps students do traditional exercises faster; more importantly, it gives them an opportunity to understand the morphological structure of the language by analyzing and generating as many words as they want. From the computational standpoint, the facilities provided by two-level morphology for grammar writing allow two-way exercises and an extensible lexicon. The Web interface makes the system widely available outside the laboratories for private use. With these properties, the system is ready for use in the curriculum of a Turkish language course and a linguistics course. We intend to report our experience with this system after the first year of trial in a course offered at our university.
References
1. Antworth, E.L. (1990). PC-KIMMO: A Two-level Processor for Morphological Analysis. Dallas: Summer Institute of Linguistics.
2. Antworth, E.L. (1995). PC-KIMMO2 Manual, SIL.
3. Creider C., J. Hankamer, and D. Wood (1996). Mathematical Linguistics and Formal Language Theory, ms., UC Santa Cruz.
4. Crookes, G. and Chaudron, C. (1991). Guidelines for Classroom Teaching. Marianne Celce-Murcia (Ed.) Teaching English as a Second or Foreign Language. Newbury House. pp 46-67.
5. Hankamer, J. (1986). Finite-state Morphology and Left to Right Phonology. Proceedings of WCCFL 5, Stanford.
6. Koskenniemi, K. (1983). Two-level Morphology for Morphological Analysis. International Joint Conference on Artificial Intelligence, pp. 683-85.
7. Karttunen, L. (1993). Finite-state Lexicon Compiler. Technical Report ISTL-NLTT-1993-04-02, Xerox PARC, California.
8. Karttunen, L., K.R. Beesley (1992). Two-level Rule Compiler. Technical Report ISTL-92-2, Xerox PARC, California.
9. Krashen, S.D. (1982). Principles and Practice in Second Language Acquisition. Oxford: Pergamon Press.
10. Larsen-Freeman, D. (1986). Techniques and Principles in Language Teaching. Oxford University Press.

Full text license: This text is republished here with permission from the original rights holder.


Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998

"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998

109 works by 129 authors indexed

Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/

References: http://web.archive.org/web/19990225164509/http://lingua.arts.klte.hu/allcach98/abst/jegyzek.htm

Attendance: ~60 (https://web.archive.org/web/19990128030244/http://lingua.arts.klte.hu/allcach98/listpar3.htm)

Series: ACH/ALLC (10), ACH/ICCH (18), ALLC/EADH (25)

Organizers: ACH, ALLC
