Using FoxPro in Educational and Academic Settings

Béla Hollósy

Authorship

1. Béla Hollósy

Department of English Linguistics, Institute of English and American Studies - Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Parent session

Linguistic Analysis of Large Corpora , Laszlo Hunyadi

Original URL

http://web.archive.org/web/19980716093117/http://lingua.arts.klte.hu/allcach98/abst/abs22.htm

Work text

1. Introduction
The MA program in English Language and Literature has four tracks for specialisation in the third and fourth years. The Linguistics Track saw the introduction of Computational Lexicography four years ago, Computational Linguistics three years ago, Corpus Linguistics two years ago, and Database Management for Linguistic and Lexicographic Purposes as well as Computational Morphology last year, all initiated and first taught by the present author. Plans for the near future include Computer-aided Text Analysis, CALL, and Machine Translation. The number of undergraduates attending any of the above courses in the Computing Room has invariably been between 18 and 20, which has meant efficient use of the available computing resources (20 networked IBM PCs, with a Barco projector and Internet access). Alongside the new computational methodologies used in teaching, the author also manages an original research and development project (A Dictionary of Academic English) involving methods in computational lexicography, computational linguistics and corpus linguistics. FoxPro 2.6 for Windows has proved to be one of the most useful computational tools in both areas, partly because of its fast and efficient manipulation of data in relationally linked database tables, and partly because of its flexible and powerful programming capabilities for string handling [1], [14], [32]. Even in the days of object-oriented programming, a relational database management system serves useful purposes, especially where textual data analysis and text handling involving string and substring manipulation are concerned [5].
2. Using FoxPro in an Educational Environment
2.1. String Handling and Programming in FoxPro
2.1.1. Developing String Handling Skills
To facilitate students' access to the programming capabilities of FoxPro, they are gradually introduced to the most important string and text handling commands and functions, first in the interactive Command Window, later on in programs. These goals are served by the author's manuscript "An Introduction to Text Handling and Analysis", a guide which illustrates the use of memory variables, arrays, commands, and functions with original, linguistically oriented examples, and introduces students to the basics of structured programming. In addition to handouts illuminating the theoretical, empirical, and computational issues involved, they are regularly given Worksheets (altogether 12), which present them with increasingly complex problems. When they have done a Worksheet, they are presented with the solutions (hard copies as well as electronic files) and may compare their own results with them. This method ensures that students progress step by step and their instructor can give class/group/individual help as needed.
Command Language Syntax and Basic Programming in the FoxPro implementation of the XBase programming language form the introductory part of each computational course (invariably two contact hours over 14 weeks) and take up 3 weeks. This is followed by work with tables: creating tables, data input, indexing, querying and relationally linking tables (a further two weeks). Subsequently, each course branches off into directions dictated by the specificity of the subject matter (but differences begin with data input after the third week).
2.1.2. Developing Programming Skills
2.1.2.1. Computational Lexicography
In a course on Computational Lexicography [10], [15], [19], [24], [25], [26], [29], [30], [39], for example, students learn how to set up master database tables for different types of lexicographic projects (monolingual, bilingual, learner's dictionaries, thesauri, etc.). A master database table for a monolingual learner's dictionary will typically have the following fields: headword, homonym, part of speech order code, part of speech label, sense number, valency, and definition, all of which are indexed except the last two. Separate tables are then set up to handle pronunciation and pronunciation variants, spelling and spelling variants, grammar codes, inflections, collocations, usage notes, style labels, cross-references, etc. Each student has a specific project, usually targeting some notional area (such as gardening, sports, medicine, college life, linguistics, hobbies, travelling, shopping, cooking, etc.) but sometimes a formal category (such as prefixed words, derivatives, compounds, idioms, phrasal verbs, etc.). Every project has a standard part: setting up a master table and a collocation table and linking them relationally. The collocation table has additional fields such as collocation type order code, collocation type, collocation, collocation definition, collocation example, collocation example definition, and illustrative sentence, of which the first three are indexed. Students have to make sure they cater for differences between idioms, grammatical collocations and lexical collocations by coding within the collocation type field. The relational link is created on a combination of four indexed fields (headword, homonym, part of speech label and sense number). Then, depending on their individual projects, they set up other tables managing different types of information and link them to the master table as child tables.
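The master/collocation link described above can be illustrated outside FoxPro. The following is a minimal sketch in Python using the standard sqlite3 module, not the course's actual table design: the table names, column names and the sample rows for "prune" are all invented for illustration. Only the four-field link (headword, homonym, part of speech label, sense number) is taken from the text.

```python
import sqlite3

# Hypothetical schema: table names, column names and sample data are
# illustrative assumptions, not the tables used in the course.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (
    headword TEXT, homonym INTEGER, pos_order INTEGER, pos_label TEXT,
    sense_no INTEGER, valency TEXT, definition TEXT
);
-- Composite index on the four fields that carry the relational link.
CREATE INDEX master_key ON master (headword, homonym, pos_label, sense_no);

CREATE TABLE collocation (
    headword TEXT, homonym INTEGER, pos_label TEXT, sense_no INTEGER,
    colloc_type_order INTEGER, colloc_type TEXT, collocation TEXT,
    colloc_definition TEXT
);
CREATE INDEX colloc_key ON collocation (headword, homonym, pos_label, sense_no);
""")

conn.execute(
    "INSERT INTO master VALUES ('prune', 1, 1, 'v', 1, 'T', 'to cut back branches')"
)
conn.executemany(
    "INSERT INTO collocation VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("prune", 1, "v", 1, 1, "lexical", "prune a hedge", "trim a hedge"),
        ("prune", 1, "v", 1, 2, "grammatical", "prune back", "cut back hard"),
    ],
)


def collocations_for(headword):
    """Return (collocation type, collocation) pairs linked to a headword."""
    rows = conn.execute(
        """
        SELECT c.colloc_type, c.collocation
        FROM master AS m
        JOIN collocation AS c
          ON  c.headword  = m.headword
          AND c.homonym   = m.homonym
          AND c.pos_label = m.pos_label
          AND c.sense_no  = m.sense_no
        WHERE m.headword = ?
        ORDER BY c.colloc_type_order
        """,
        (headword,),
    )
    return rows.fetchall()


print(collocations_for("prune"))
```

Coding the collocation type in its own field, with a separate order code, lets the child records be listed by type without changing the link itself.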
The emphasis is not on the number of records keyed in but on providing sufficient lexical variation (homonyms, parts of speech, different senses, phrases of various types, etc.) to prove that students can handle their model with efficiency and ease. FoxPro is eminently suitable for model building, data storage and data manipulation, as was confirmed when large amounts of data (over 200,000 records) had to be handled in a scholarly project run by the author. Even so, it lacks certain features that a linguist would like to have access to. For example, indexing or sorting can only be done in ascending or descending alphabetical order. If one wishes to look at the data in other ways, such as reverse-alphabetical order, i.e. sorted from the last letter backwards (all-important when one investigates suffixed forms), or ordered by length of keys or by frequency of occurrence (standard features in high-end concordancers such as Micro OCP), one has to do some programming to achieve one's goal. Some students opt for projects involving programming tasks of this kind.
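The three alternative orderings mentioned above are easy to state precisely. The sketch below is written in Python rather than FoxPro and only illustrates the orderings themselves. The function names and the word list are hypothetical, and reverse-alphabetical order is read here as sorting on the reversed spelling of each word.

```python
from collections import Counter


def reverse_alphabetical(words):
    """Sort by the reversed spelling, so shared endings sit together."""
    return sorted(words, key=lambda w: w[::-1])


def by_length(words):
    """Sort by key length, breaking ties alphabetically."""
    return sorted(words, key=lambda w: (len(w), w))


def by_frequency(tokens):
    """Return distinct tokens, most frequent first, ties alphabetical."""
    counts = Counter(tokens)
    return sorted(counts, key=lambda w: (-counts[w], w))


words = ["kindness", "reader", "teacher", "darkness", "writer"]
print(reverse_alphabetical(words))
# ['reader', 'teacher', 'writer', 'kindness', 'darkness']
```

In the reversed ordering the -er and -ness forms fall into adjacent groups, which is what makes it useful for the study of suffixed forms.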
FoxPro lacks a left-right justification feature, so when a dictionary database is ready and an automatic entry-writing program is run, the final output will have a ragged right-hand edge. Of course, only the ablest students try their hand at writing their own automatic entry generation program, but one student in particular, who has continued her Business Trictionary program as part of her PhD work, has made good progress in that task.
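As an illustration of what a user-written justification routine has to do, here is a minimal Python sketch. It is an assumption-laden example, not the students' program: it uses greedy line filling, pads the leftmost gaps first, and leaves the last line unjustified. The function names are hypothetical.

```python
def justify_line(words, width):
    """Pad the gaps between words so the line is exactly `width` wide."""
    if len(words) == 1:
        return words[0].ljust(width)
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The leftmost `extra` gaps receive one additional space.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def justify(text, width):
    """Greedy line filling; every line but the last is fully justified."""
    lines, current = [], []
    for word in text.split():
        if current and len(" ".join(current + [word])) > width:
            lines.append(justify_line(current, width))
            current = []
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


for line in justify("prune v. to cut back the branches of a tree or bush", 20):
    print("|" + line + "|")
```

A word longer than the line width is left on a line of its own rather than being hyphenated. A real entry generator would need a policy for that case.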
2.1.2.2. Corpus Linguistics
FoxPro programming techniques are used to good effect in Corpus Linguistics [2], [7], [28], [33], [37], [38] classes as well. Naturally, one may deal with corpora as text files by annotating them or marking them up in SGML code and using concordancers or other text analysis tools to retrieve patterns from them for many different purposes. This is one line of approach. Another, equally feasible method is to append text files to database tables, so that database handling tools, query procedures and dedicated programs can reveal their structural patterns, utilising information drawn from database tables about the grammatical categorisation and subcategorisation of words, phrases and patterns. Students deal with a number of texts illustrative of Academic English that were scanned in earlier. To facilitate conversion to database format, the ASCII files have a sequential naming convention (e.g. scien*.txt or educ*.txt, acad*.txt, univ*.txt, where the asterisk stands for numbers starting from 1 up to a theoretical 999). Thus, instead of having to open every single file and append data from it to a database table separately with the 'delimited with tabs' option, the following program automates the task. (Comments, which the program ignores, follow the double ampersand sign. A semicolon at the end of a line continues the command or comment on the next line.)
CLEAR && Clears the screen.
SELECT 1 && Selects work area number 1.
USE C:\fpw26\course\course1.dbf && Opens a database.
GO TOP && Goes to the top of the database.
DELETE ALL && Marks all records for deletion in the database.
PACK && Packs the database. (Records marked for deletion are removed.)
CLOSE DATA && Closes the database.
SELECT 1 && Selects work area number 1.
USE C:\fpw26\course\course1.dbf && Opens a database.
bf="c:\fpw26\corplin\scien " && Creates a memvar with the file path ;
and the four letters of the name of the file.
fc=0 && Creates a numerical memory variable and initialises it. 'fc' ;
is mnemonic for 'file counter'.
bc=0 && Creates a numerical memory variable and initialises it. 'bc' ;
is mnemonic for 'batch counter'.
@ 3,3 SAY "Path and batch filename:" && Prints this message in the Main FoxPro;
Window
@ 4,3 SAY "[N.B. The '.txt' extension will be automatically added.]" && Prints ;
this message in the Main FoxPro Window
@ 6,3 GET bf && Gets the memory variable bf
READ && Reads the alphabetical part of the file name provided by ;
the user.
@ 8,3 SAY "Number of files to be appended:" && Provides this message.
@ 10,3 GET fc && Gets the memory variable fc
READ && Reads the number provided by the user.
@ 12,3 SAY "Number of file starting the batch:" && Provides this message.
@ 14,3 GET bc && Gets the memory variable bc.
READ && Reads the number provided by the user.
DIMENSION s(fc) && Creates an array with the name <s> and ;
specifies the number of elements it has through the value in fc.
FOR i=1 TO fc && Starts a cyclical operation, the number of cycles ;
specified by the value in the memory variable fc.
s(i)=ALLT(bf)+ALLT(STR(bc))+".txt" && Assigns character ;
type values to the elements of the array by concatenation with ;
the numerical bit incremented by one at each step and converted ;
to character type.
APPEND FROM (s(i)) FIELDS TEXT FOR !EMPTY(TEXT) ;
DELIMITED WITH TAB && Appends records, with the exception of empty ;
ones, into the database field text from the text files specified above. ;
The parentheses make FoxPro evaluate s(i) as a name expression.
bc=bc+1 && Increments the file number by one.
ENDFOR && Ends the cycle.
BROWSE && Browses the database.
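For readers without access to FoxPro, the core of the program above (building the sequential file names and keeping only the non-empty lines) can be sketched in Python. This is a rough counterpart under simplifying assumptions, not a translation. It collects records in a list instead of a database table, ignores the tab-delimiter handling and field-width truncation of APPEND FROM, and the function names are hypothetical.

```python
from pathlib import Path


def batch_filenames(stem, count, start):
    """Build names such as scien1.txt, scien2.txt, ... for one batch."""
    return [f"{stem.strip()}{number}.txt" for number in range(start, start + count)]


def append_batch(stem, count, start):
    """Collect the non-empty lines of every file in the batch as records."""
    records = []
    for name in batch_filenames(stem, count, start):
        text = Path(name).read_text(encoding="ascii", errors="replace")
        for line in text.splitlines():
            if line.strip():  # skip empty lines, as the FOR clause does
                records.append(line)
    return records


print(batch_filenames("scien", 3, 1))
# ['scien1.txt', 'scien2.txt', 'scien3.txt']
```

The three arguments correspond to the memory variables bf, fc and bc in the FoxPro program.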
Scanned texts consist of lines rather than sentences as discrete units. The most important task after appending ASCII text files to a database table as records is therefore to manipulate the records programmatically so that lines of text are turned into sentence records. This is done via a series of operations, the first being to move any chunk of text that follows a punctuation mark (the full stop, the question mark or the exclamation mark) to the beginning of the next record. This is carried out recursively, making sure that chunks of text are added to subsequent records only if the length of the new record does not exceed 254 characters, the maximum width of a character field in FoxPro. This initial step is followed by a suite of dedicated programs designed to correct the errors that necessarily occur in the first global step, by rejoining chunks of text to the end of the previous record if, for example, the separation occurred at the boundary of an abbreviation or the record starts with a lower-case letter. In this way, the database is reduced to a third or a quarter of its original size, while the resulting sentence-length records, which in most cases are longer than the original lines, serve the purposes of text retrieval much better. Of course, some sentences are longer than the maximum length of a record. In such cases, a separate routine can be written to take care of retrieval from adjacent records representing parts of a sentence. Students analyse the programs, compare them with similar tokenisation programs in the literature [9], [12], [13] and may propose improvements. They may also be asked to perform a similar task using the find-and-replace facility with pattern matching in Microsoft Word, with a view to converting lines to sentences as units before appending the text to a database table in FoxPro [20].
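The line-to-sentence conversion described above can be summarised in a short Python sketch. It is a simplified reconstruction of the idea, not the author's suite of FoxPro programs. It works on a list of strings in memory, the abbreviation list is a small invented sample, and over-long sentences are simply spread across adjacent records at a word boundary.

```python
import re

MAX_LEN = 254  # maximum width of a FoxPro character field
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "Dr.", "Prof."}  # illustrative sample


def lines_to_sentences(lines):
    """Regroup scanned lines into sentence-sized records."""
    text = " ".join(line.strip() for line in lines if line.strip())
    if not text:
        return []
    # First pass: cut after every full stop, question mark or exclamation mark.
    chunks = re.split(r"(?<=[.?!])\s+", text)
    # Second pass: undo cuts made after an abbreviation or before a
    # chunk that starts with a lower-case letter.
    sentences = []
    for chunk in chunks:
        if sentences and (
            sentences[-1].split()[-1] in ABBREVIATIONS or chunk[:1].islower()
        ):
            sentences[-1] += " " + chunk
        else:
            sentences.append(chunk)
    # Third pass: spread over-long sentences across adjacent records.
    records = []
    for sentence in sentences:
        while len(sentence) > MAX_LEN:
            cut = sentence.rfind(" ", 0, MAX_LEN + 1)
            if cut <= 0:
                cut = MAX_LEN  # no space available: hard cut
            records.append(sentence[:cut])
            sentence = sentence[cut:].lstrip()
        records.append(sentence)
    return records


scanned = ["Corpora are useful, e.g. for", "lexicography. They grow", "fast! Do they?"]
print(lines_to_sentences(scanned))
```

With the sample input, the cut after "e.g." is undone in the second pass, so three sentence records result from three scanned lines.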
Following this, text retrieval operations are carried out where collocations of various types are extracted. The students are presented with some sample programs which they run, analyse and rely on for creating their own text retrieval procedures, which will finally be evaluated by the instructor.
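One very simple form of collocation retrieval is to count the words that occur within a fixed span of a node word in each sentence record. The Python sketch below shows that technique only. It is not one of the sample programs given to students, and the function name, the span of two words and the example records are assumptions made for illustration.

```python
import re
from collections import Counter


def collocates(records, node, span=2):
    """Count the words found within `span` words of `node` in each record."""
    counts = Counter()
    for record in records:
        tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", record.lower())
        for i, token in enumerate(tokens):
            if token == node:
                window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
                counts.update(window)
    return counts


records = ["This paper draws a conclusion.", "We draw a firm conclusion here."]
print(collocates(records, "conclusion").most_common(1))
# [('a', 2)]
```

Raw window counts of this kind are dominated by function words, which is one reason why retrieval runs are normally accompanied by statistical evaluation.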
2.1.2.3. Computational Morphology
Whereas in a Corpus Linguistics course the emphasis is on units larger than the word, in a course on Computational Morphology [31], [36] the internal structure of words is in focus. Affixation, concatenative morphology, and the interaction of phonological and morphological rules tend to be the areas dealt with. One of the programs written specially for the course extracts words ending in any of a group of suffixes stored in a database table from a large textual database and outputs the results to another database table. Students experiment with setting up different types of affix lists in tables and learn how to manipulate raw texts with the aid of morphological queries.
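The suffix extraction program mentioned above can be approximated in a few lines of Python. This sketch is only an approximation of the idea. The suffix list stands in for the suffix table, the records stand in for the textual database, the function name is hypothetical, and matching is purely orthographic.

```python
import re


def words_with_suffixes(records, suffixes):
    """Group the distinct words of `records` under the suffix they end in."""
    # Try longer suffixes first so that a long suffix wins over a shorter one.
    ordered = sorted(suffixes, key=len, reverse=True)
    found = {suffix: set() for suffix in suffixes}
    for record in records:
        for word in re.findall(r"[a-z]+", record.lower()):
            for suffix in ordered:
                if word.endswith(suffix) and len(word) > len(suffix):
                    found[suffix].add(word)
                    break
    return {suffix: sorted(words) for suffix, words in found.items()}


records = ["The darkness made reading harder.", "Kindness and happiness matter."]
print(words_with_suffixes(records, ["ness", "er", "ing"]))
```

Because the match is orthographic, a monomorphemic word such as "matter" is returned alongside "harder". Filtering out such false hits is the kind of problem that leads on to proper morphological analysis.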
2.1.2.4. Educational Benefits
There are over ten databases (textual and word list databases) as well as over sixty programs and procedures (including user-defined functions) stored in the various subdirectories on the network drive and on the hard disk drives of student machines in the Computing Room. Students can run, analyse, vary or improve on them for the purposes of computationally oriented courses. Students may opt for a single course or may take any combination of them in the span of two years. FoxPro 2.6 for Windows has proved invaluable in providing a common ground for different computational courses and in equipping students with the minimum of programming skills needed to do simple or complex but original morphological, syntactic, textual and lexicological tasks. The interactive nature of their work, the immediate feedback or explanation they get via the data projector from their instructor, and the possibility of exploring new avenues when working on their own all add up to something entirely new in our curriculum structure and have important educational as well as academic consequences. The number of students writing their MA theses on some linguistic topic with a computational angle is on the increase. Two of my graduate students, Gabriella Szakal and Agoston Toth, have applied with a poster for ALLC/ACH'98.
3. Some Academic Aspects of Database Management and Programming in FoxPro
3.1. The rationale for using database management systems for linguistic and lexicographic computing
In spite of some partly justified scepticism in the literature [27, pp. 194-195], relational database management offers the right tools for model building in a number of fields, for example in computational lexicography [37, p. 148]. It is argued here that there are parallel solutions to linguistic and lexicographic data handling problems, such as 1) text annotation or markup (using, e.g. SGML) [22], [35] coupled with parsers and concordancers, 2) hypertext in an authoring environment (e.g. Hypercard, Toolbook), 3) fixed-length field database management systems (e.g. Access, FoxPro, DBase IV, Paradox, Clipper), 4) free-form database management systems (e.g. AskSam 3.0). Each of these solutions can offer certain features that the others lack, but there is also a lot of overlap in output if not in method. (On the intricate link between corpora, concordances and databases, see [27].)
3.2. Some advantages and disadvantages of using FoxPro
The various tools and resources offered by a relational database management system such as FoxPro 2.6 for Windows serve the purposes not just of business or scientific data storage and manipulation but also of humanities-based computational work in general and of linguistic and lexicographic projects in particular. The most important features that make FoxPro ideal as a PC-based database management system are as follows: 1) a powerful implementation of the XBase Language, 2) a streamlined incorporation of the SQL standard for queries, 3) fast data and table handling operations, 4) the capability of relationally linking tables to yield complex databases, 5) flexible and efficient hierarchical indexing, 6) the possibility of managing very large corpora, 7) a user-friendly interface, 8) the possibility of creating a self-contained application in the form of a compact executable file, 9) the possibility of migrating existing work to Visual FoxPro 3.0 or 5.0, an object-oriented sequel to FoxPro 2.6 for Windows.
While the advantages of using FoxPro far outweigh the disadvantages, the program unfortunately lacks some important features: 1) different formatting of data within records, 2) a versatile text editor with word processing capabilities, 3) simultaneous justification of text on the left AND on the right side, 4) reverse-alphabetical ordering, 5) pattern matching with regular expressions, 6) a data dictionary (this is rectified in Visual FoxPro). This means either that certain operations need to be carried out via links to other Windows programs or that dedicated user-defined functions and procedures have to be written to overcome the difficulties concerned.
3.3. Using FoxPro as a database management system of corpora and a programming engine for a Dictionary of Academic English
The present author is engaged in managing a lexicographic project aiming at creating a Dictionary of Academic English. Academic English [16], [17], [21], [40] is carefully distinguished from Pragmatic English on the one hand and Rhetorical English on the other. The dictionary is envisaged in three formats (an alphabetical dictionary, a notional thesaurus and a functional guide). The output of this dictionary project is expected to be a CD as well as a printed version. This work involves the modules of model building [3], [5], [23], corpus creation [4], [34], tokenisation [9], [12], [13], lemmatisation [6], [15], [30], text retrieval [3], [6], [7], homonym separation [39], sense disambiguation [33], [34], and entry writing, among other things. Nearly all operations involved are handled within FoxPro. The model is a relationally linked set of database tables (master core database, master collocation database as well as other databases handling pronunciation, usage, grammar, etc.). It is necessary to devise the entire flowchart for the various data-entry, data validation, editing and other auxiliary operations. As the container databases with the raw textual data keep growing (currently just over 200,000 records), frequent test runs of varied collocation pattern retrieval are needed, accompanied by statistical evaluations. (On computational tools to handle collocations see [3], [7], [8], [11], [21].) As editorial work on the master databases progresses, an automatic entry generator is needed, which will produce both the test runs of various entries and the final format of the printed version.
REFERENCES
1. Microsoft FoxPro 2.6 for Windows. Language Reference Relational Database Management System for MS-DOS and Windows (1989-1993) Microsoft Corporation
2. Aijmer, Karin Altenberg, Bengt (1991) (eds.) English Corpus Linguistics Longman London
3. Atkins B.T. Sue (1992) Tools for computer-aided corpus lexicography: the Hector Project in Kiefer (1992) pp. 1-59
4. Atkins, Sue Clear, Jeremy and Ostler, Nicholas (1992) Corpus Design Criteria Literary and Linguistic Computing Vol. 7, No. 1 pp. 1-16
5. Bowers, David S. (1993) (ed.) From Data to Database Second edition Aspects of Information Technology Chapman & Hall London, etc.
6. Butler Christopher S. (1985) Computers in Linguistics Basil Blackwell Oxford
7. Butler, Christopher S. (1992) (ed.) Computers and Written Texts Applied Language Studies Blackwell Oxford UK and Cambridge USA
8. Clear, Jeremy (1993) From Firth Principles Computational Tools for the Study of Collocation in Baker et al (eds.) Text and Technology John Benjamins Publishing Company Philadelphia/Amsterdam pp. 271-292
9. Coniam, David (1993) A Prototype Boundary Marker in Baker et al (eds.) Text and Technology John Benjamins Publishing Company Philadelphia/Amsterdam pp. 253-270
10. Courtin, Jacques Dujardin, Daniele and Kowarski, Irene (1992) "PILAF": Software Tools for Lexicography and Linguistic Research in Kiefer (1992) pp. 113-121
11. Fontenelle, Thierry (1992) Collocation acquisition from a corpus or from a dictionary: a comparison in Tommola et al (eds.) EURALEX'92 Proceedings I-II Part I Tampere pp. 221-228
12. Grefenstette, Gregory (1996), Approximate Linguistics in Kiefer et al (ed.) (1996) pp. 83-96
13. Grefenstette, Gregory Tapanainen, Pasi (1994) What is a Word, What is a Sentence? Problem of Tokenization in Kiefer (1994) pp. 79-87
14. Grommes, Bob Beem, W. et al (1993) Inside FoxPro 2.5 for Windows New Riders Publishing Carmel, Indiana
15. Hartmann, R.R.K. (1983) (ed.) Lexicography: Principles and Practice Applied Language Studies Academic Press, Inc. London, etc.
16. Hollósy, Béla (1989) On the Need for a Dictionary of Academic English BUDALEX'88 Proceedings Papers from the EURALEX Third International Congress Budapest, 4-9 September 1988 pp. 535-542 Akadémiai Kiadó Budapest
17. Hollósy, Béla (1993) Reflections on Academic Discourse from a Lexicographic Point of View in Hollósy et al (eds.) (1993) pp. 17-24
18. Hollósy, Béla Korponay, Béla and Laczkó, Tibor (1993) Studies in Linguistics II. A Supplement to the Hungarian Journal of English and American Studies Debrecen
19. Hollósy, Béla (1994) Corpora, Concordancing and Database Management for Lexicography in Studies in Linguistics III pp. 111-123 Debrecen
20. Hollósy, Béla (1996) Compiling a Dictionary of Academic English (Progress Report) in English Studies and the Curriculum Proceedings of the First TEMPUS-JEN Mini Conference Debrecen pp. 33-45
21. Howarth, Peter Andrew (1996) Phraseology in English Academic Writing. Some implications for language learning and dictionary making Lexicographica Series Maior 75 Max Niemeyer Verlag Tübingen
22. Ide, Nancy Véronis, Jean et al (1992) Principles for encoding machine readable dictionaries in Tommola et al (eds.) (1992) pp. 239-246
23. Ide, Nancy and Véronis, Jean (1996) Modelling Lexical Databases in Research in Humanities Computing 4 Selected Papers from the ALLC/ACH Conference, Christ Church, Oxford, April 1992 Clarendon Press Oxford, pp. 193-206
24. Kiefer, Ferenc Kiss, Gábor and Pajzs, Júlia (1992) Papers in Computational Lexicography Complex '92 Research Institute for Linguistics, Hungarian Academy of Sciences Budapest
25. Kiefer, Ferenc Kiss, Gábor and Pajzs, Júlia (1994) Papers in Computational Lexicography Complex '94 Research Institute for Linguistics, Hungarian Academy of Sciences Budapest
26. Kiefer, Ferenc, Kiss, Gábor and Pajzs, Júlia (1996) Papers in Computational Lexicography Complex '96 Research Institute for Linguistics, Hungarian Academy of Sciences Budapest
27. Kirk, John M. (1994) Corpus - Concordance - database - VARBRUL Literary and Linguistic Computing Vol. 9, No. 4 pp. 259-265
28. Leitner, Gerhard (ed.) (1992) New Directions in English Language Corpora. Methodology, Results, Software Developments Mouton de Gruyter Berlin, New York
29. Masereeuw, Pieter C. Serail, Iskandar (1992) DictEdit: a computer program for dictionary data entry and editing in Tommola et al (eds.) (1992) pp. 257-264
30. Meijs, Willem (1992) Computers and Dictionaries in Butler (ed.) (1992) pp. 141-165
31. Ritchie, Graeme D. Black, A. W., Russell, Graham J., and Pulman, Stephen G. (1992) Computational Morphology Practical Mechanisms for the English Lexicon ACL-MIT Press Series in Natural Language Processing A Bradford Book The MIT Press Cambridge, Massachusetts London, England
32. Siegel, Charles (1993) Mastering FoxPro 2.5 Special Edition Sybex San Francisco, Paris, Düsseldorf, Soest
33. Sinclair, John (1992) The automatic analysis of corpora in Svartvik (ed.) (1992) pp. 379-397
34. Sinclair, John (1987) (ed.) Looking Up An account of the COBUILD Project in lexical computing and the development of the Collins COBUILD English Language Dictionary HarperCollins Publishers London
35. Sperberg-McQueen, C.M. - Burnard, Lou (eds.) (1994), Guidelines for Electronic Text Encoding and Interchange (TEI P3) Text Encoding Initiative, Chicago, Oxford
36. Sproat, Richard (1992) Morphology and Computation ACL-MIT Press Series in Natural Language Processing. A Bradford Book The MIT Press Cambridge, Massachusetts London, England
37. Stubbs, Michael (1996) Text and Corpus Analysis Blackwell Publishers Oxford
38. Svartvik, Jan (1992) (ed.) Directions in Corpus Linguistics Proceedings of Nobel Symposium 82 Stockholm, 4-8 August 1991 Trends in Linguistics Studies and Monographs 65 Mouton de Gruyter Berlin, New York
39. Svensén, Bo (1993) Practical Lexicography Principles and Methods of Dictionary-Making Translated from the Swedish by John Sykes and Kerstin Schofield Oxford University Press Oxford New York
40. Swales, John M. (1990) Genre Analysis. English in academic and research settings Cambridge University Press Cambridge

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

In review

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1998

"Virtual Communities"

Hosted at Debreceni Egyetem (University of Debrecen) (Lajos Kossuth University)

Debrecen, Hungary

July 5, 1998 - July 10, 1998

109 works by 129 authors indexed

Conference website: https://web.archive.org/web/19991022041140/http://lingua.arts.klte.hu/allcach98/

References: http://web.archive.org/web/19990225164509/http://lingua.arts.klte.hu/allcach98/abst/jegyzek.htm

Attendance: ~60 (https://web.archive.org/web/19990128030244/http://lingua.arts.klte.hu/allcach98/listpar3.htm)

Series: ACH/ALLC (10), ACH/ICCH (18), ALLC/EADH (25)

Organizers: ACH, ALLC
