Fun with Unix Tools

poster / demo / art installation
Authorship
  1. Eileen Fitzpatrick

    Linguistics Department - Montclair State University

  2. Steve Seegmiller

    Linguistics Department - Montclair State University

Work text


Eileen Fitzpatrick
Montclair State University
fitzpatr@sapir.montclair.edu

Steve Seegmiller
Montclair State University

ACH/ALLC 1999
University of Virginia, Charlottesville, VA

editor / encoder: Sara A. Schmidt

The Unix operating system provides a set of flexible text processing tools that
offer the user features beyond those of standard concordancers, including the
ability to compare and manipulate different types of text. We show how simple
tools can be combined to accomplish sophisticated tasks, using examples from
lexicography and phonology.
The creation of an English-Karachay dictionary (Karachay is a Turkic language)
involves checking elicitations from native speakers, grammars, and glossaries
against a series of authentic texts for accuracy, citations, and words that do
not appear in our sources. Unix tools allow us to find words in context in the
manner of traditional concordancers, but they also permit the creation of three
lists -- words only in the dictionary, words only in the texts, and words in
both -- that enable us to decide which words to omit from the dictionary and
which to add.
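The three lists described above can be sketched with tr, sort, and comm. This is
a minimal illustration, not the authors' actual procedure: the function names
word_list and compare_words and the output file names are hypothetical, and the
sketch assumes plain single-byte (for example, transliterated ASCII) text with
words separated by non-letters.

```shell
#!/bin/sh
# Hypothetical helper: print the words of a file, one per line,
# lowercased, sorted, with duplicates removed.
# Assumption: ASCII letters only; other scripts need different tokenizing.
word_list() {
    LC_ALL=C tr -cs '[:alpha:]' '\n' < "$1" |
        LC_ALL=C tr '[:upper:]' '[:lower:]' |
        sed '/^$/d' |
        LC_ALL=C sort -u
}

# Hypothetical helper: compare a dictionary word file with a text file
# and write the three lists into the directory given as $3.
compare_words() {
    dict=$1
    texts=$2
    out=$3
    word_list "$dict"  > "$out/dict.words"
    word_list "$texts" > "$out/text.words"
    # comm needs both inputs sorted in the same collation order,
    # hence LC_ALL=C here and in word_list.
    LC_ALL=C comm -23 "$out/dict.words" "$out/text.words" > "$out/only_dict"
    LC_ALL=C comm -13 "$out/dict.words" "$out/text.words" > "$out/only_text"
    LC_ALL=C comm -12 "$out/dict.words" "$out/text.words" > "$out/both"
}
```

With these definitions loaded, a call such as compare_words dict.txt texts.txt
outdir (placeholder file names) would leave the dictionary-only words in
outdir/only_dict, the text-only words in outdir/only_text, and the shared words
in outdir/both.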
Spanish text-to-speech systems are set at a faster rate than English systems. One
reason for this might be that Spanish words and phrases are longer than their
English counterparts. Unix tools allow us to estimate the average number of
syllables per word by counting vowels in English and Spanish pronouncing
dictionaries; the counts show that, on average, Spanish words have twice as many
syllables as English words. Word counts of parallel corpora show that Spanish
also has more words per text than English.
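The vowel-counting estimate and the corpus word counts can be sketched as
follows. This is an illustration under stated assumptions rather than the
authors' own commands: the function names avg_vowels and word_count are
hypothetical, the input is assumed to hold one transcription per line, and each
vowel is assumed to be written as a single ASCII character. Dictionaries that
use multi-character phoneme codes would need a different tokenization.

```shell
#!/bin/sh
# Hypothetical helper: average number of vowel characters per entry.
# $1: file with one transcription per line (assumed format)
# $2: the characters to count as vowels, e.g. aeiou (assumed notation)
avg_vowels() {
    # Delete everything except the vowel characters, then count bytes.
    vowels=$(LC_ALL=C tr -cd "$2" < "$1" | wc -c | tr -d ' ')
    # Count the non-blank lines, i.e. the entries.
    entries=$(awk 'NF { n++ } END { print n + 0 }' "$1")
    awk -v v="$vowels" -v e="$entries" \
        'BEGIN { if (e > 0) printf "%.2f\n", v / e }'
}

# Hypothetical helper: number of whitespace-separated words in a file,
# for comparing the two halves of a parallel corpus.
word_count() {
    wc -w < "$1" | tr -d ' '
}
```

Running avg_vowels once on each pronouncing dictionary and word_count once on
each half of a parallel corpus gives the two pairs of figures to compare. Each
vowel character stands in for one syllable, so the result is only an estimate.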
The operations discussed here involve standard commands available on any Unix (or
Linux) system, do not require extensive training to use, and are re-usable for
widely varying applications.


Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1999

Hosted at University of Virginia

Charlottesville, Virginia, United States

June 9, 1999 - June 13, 1999

Series: ACH/ICCH (19), ALLC/EADH (26), ACH/ALLC (11)

Organizers: ACH, ALLC

Tags
  • Language: English