LEXSTATS: A program for the statistical analysis of word frequency distributions

poster / demo / art installation
Authorship
  1. Harald Baayen

    University of Nijmegen, Max Planck Institute for Psycholinguistics - University of Nijmegen

  2. Fiona J. Tweedie

    Department of Statistics - University of Glasgow

Work text

LEXSTATS: A program for the statistical analysis of word frequency distributions

Harald Baayen
University of Nijmegen
Max Planck Institute for Psycholinguistics
baayen@mpi.nl

Fiona J. Tweedie
Department of Statistics, University of Glasgow
fiona@stats.gla.ac.uk

1999

University of Virginia

Charlottesville, VA

ACH/ALLC 1999

editor

encoder

Sara A. Schmidt

Various computationally intensive statistical models are available for the
analysis of word frequency distributions (e.g., Carroll, 1967; Sichel, 1975;
Chitashvili and Baayen, 1993). These models provide linguists and
lexicographers with elegant means for obtaining characteristic textual
measures that are invariant with respect to sample size, for extrapolating
the development of the vocabulary to sample sizes larger than the observed
text size, and for estimating the population vocabulary size.
Thus far, these models have not been used widely, which is not surprising
given the absence of software implementing them. At the conference, we will
present the beta version of LEXSTATS, a user-friendly graphical interface
to a series of C programs that implement a wide range of word frequency
analyses. LEXSTATS and the underlying C code will become available as
freeware under the GNU software license.
We will illustrate LEXSTATS by applying it to word frequency distributions of
various kinds of texts as well as to word frequency distributions of a range
of morphological categories.
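
As an illustration of the kind of input such analyses start from, the following is a minimal sketch in Python. It is not LEXSTATS code, and the function names `frequency_spectrum` and `growth_rate` are hypothetical. It tabulates a frequency spectrum (the number of word types occurring exactly m times) and computes the proportion of once-occurring types among all tokens, V(1)/N, a simple estimate of how fast the vocabulary is still growing at the observed sample size. The parametric models cited above are considerably more involved and are not reproduced here.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Return (N, V, spectrum) for a list of word tokens.

    N is the number of tokens, V the number of distinct types, and
    spectrum[m] the number of types that occur exactly m times.
    """
    type_freqs = Counter(tokens)              # type -> frequency
    spectrum = Counter(type_freqs.values())   # frequency m -> number of types
    n_tokens = sum(type_freqs.values())
    n_types = len(type_freqs)
    return n_tokens, n_types, dict(spectrum)


def growth_rate(tokens):
    """Estimate the vocabulary growth rate as V(1) / N.

    V(1) is the number of types seen exactly once (hapax legomena).
    """
    n_tokens, _, spectrum = frequency_spectrum(tokens)
    if n_tokens == 0:
        return 0.0
    return spectrum.get(1, 0) / n_tokens


# Toy example (illustrative data, not from the abstract).
tokens = "the cat sat on the mat and the dog sat too".split()
n_tokens, n_types, spectrum = frequency_spectrum(tokens)
print(n_tokens, n_types, sorted(spectrum.items()))
print(growth_rate(tokens))
```

A spectrum of this form depends on the sample size N, which is why model-based measures that do not vary with N are of interest.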

References

J. B. Carroll (1967). On Sampling from a Lognormal Model of Word Frequency Distribution. In H. Kucera and W. N. Francis, Computational Analysis of Present-Day American English. Providence: Brown University Press, 406-424.

R. J. Chitashvili and R. H. Baayen (1993). Word Frequency Distributions. In G. Altmann and L. Hřebíček, Quantitative Text Analysis. Trier: Wissenschaftlicher Verlag Trier, 54-135.

H. S. Sichel (1975). On a Distribution Law for Word Frequencies. Journal of the American Statistical Association, 70: 542-547.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1999

Hosted at University of Virginia

Charlottesville, Virginia, United States

June 9, 1999 - June 13, 1999

Organizers: ACH, ALLC
