Patterns in a Rare Term: The Case of Parrots

  1. Kevin J. Keen

    University of Manitoba

  2. Paul Fortier

    University of Manitoba

Advances in computer technology, particularly in the area of graphics, mean that the person applying statistical techniques is no longer limited to testing hypotheses, which all too frequently simplify the underlying literary reality beyond recognition in order to achieve elegance in the statement of a statistical hypothesis. Software such as Minitab and JMP-IN facilitates the use of statistics as an exploratory tool, in a manner much more congenial to the aims and interests of scholars of literature.

The French noun "perroquet" (parrot) and its plural certainly qualify as a rare term, given that they appear a total of 771 times in the 114,521,745 words of the ARTFL database. This tempts one to extend the coverage as far as possible by adding the term "perruche" (parakeet). It can be quickly demonstrated, however, that the distribution pattern of "perruche" is statistically independent of that of "perroquet". In other words, such an extension of the semantic field is not justified.

In dividing up the period covered by the holdings of the ARTFL database (1600-1964), it quickly becomes apparent that simply taking equal-sized slices of the temporal continuum does not produce a good fit with the periodicity of French history. In any case, equal temporal slices do not contain an equal number of words, or even of texts. Since relative frequencies are thus required, it makes sense to organise the data in terms of important periods of French history: changes of reign in the pre-revolutionary period, changes of regime after it.

Once relative frequencies of occurrence of the word perroquet(s) have been determined for each period, as well as the relative number of texts in each period which use the word at all, analysis can proceed. The tool chosen was the control chart, a technique developed for industrial quality control and now widely available in commercial statistical software.
The underlying distributional model is the Poisson distribution (even though parrots do not eat fish), certainly the most appropriate for such a rare term.

Application of the model to the data reveals three main high points in the use of the word perroquet(s). The first is the period from the death of Louis XIV to the fall of Napoleon, which corresponds to a period of French dominance in Europe and colonial rivalry with England. A second series of high points extends from 1850 to 1879, again a period of colonial expansion, following the turmoil attendant on the fall of Napoleon and a series of revolts and revolutions aimed at working out a new socio-political system in France. Less easily understood is the third peak during the years 1908-1926, covering as it does the end of the "Belle Epoque", the First World War, and the first half of the Roaring Twenties.

Although not able to explain all the patterns in the data, this exploratory analysis does indicate the influence of political and societal reality on the language of literature. Even the results that do not seem immediately to correspond to this reality are valuable, because they foreground the areas most worthy of further study. In any case, our results demonstrate the usefulness of control charts based on the Poisson distribution for the analysis of rare terms in a large database.
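
The kind of Poisson-based control chart described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which chart variant was used, so the sketch assumes a u-chart (occurrences per unit of text, with limits that widen for periods containing fewer words). The function name `u_chart`, the period labels, and all counts and word totals below are invented for the example and are not ARTFL figures.

```python
import math


def u_chart(counts, exposures, sigma=3.0):
    """Poisson u-chart: compare each period's rate with the pooled rate.

    counts    -- occurrences of the term in each period
    exposures -- size of each period's corpus (here, millions of words)
    sigma     -- width of the control limits in standard deviations
    """
    if len(counts) != len(exposures):
        raise ValueError("counts and exposures must have the same length")

    # Centre line: pooled rate over all periods.
    center = sum(counts) / sum(exposures)

    rows = []
    for count, exposure in zip(counts, exposures):
        rate = count / exposure
        # Under a Poisson model the variance of the rate is center / exposure,
        # so smaller corpora get wider limits.
        half_width = sigma * math.sqrt(center / exposure)
        lower = max(0.0, center - half_width)
        upper = center + half_width
        if rate > upper:
            signal = "high"
        elif rate < lower:
            signal = "low"
        else:
            signal = "in control"
        rows.append({"rate": rate, "lower": lower, "upper": upper, "signal": signal})
    return center, rows


# Hypothetical data, invented for illustration only (NOT taken from ARTFL).
periods = ["period A", "period B", "period C", "period D"]
counts = [50, 130, 240, 30]            # occurrences of the term
exposures = [10.0, 20.0, 15.0, 5.0]    # millions of words per period

center, rows = u_chart(counts, exposures)
print(f"centre line: {center:.2f} occurrences per million words")
for label, row in zip(periods, rows):
    print(f"{label}: rate={row['rate']:.2f} "
          f"limits=[{row['lower']:.2f}, {row['upper']:.2f}] {row['signal']}")
```

Periods whose rate falls outside the limits are the candidates for historical interpretation. Because the limits depend on each period's word total, a short period with little text is not flagged merely for having a noisy rate.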
