Human Aging in the Trésor de la Langue Française

  Paul Fortier

    Centre on Aging - University of Manitoba

In one of the many satirical passages of his novel La Nausée, Jean-Paul Sartre has his narrator describe the historian’s craft: “Dans notre partie, nous n’avons affaire qu’à des sentiments entiers sur lesquels on met des noms génériques comme Ambition, Intérêt” [“In our area, we deal only with uncomplicated feelings on which we stick generic names like Ambition, Interest”, my translation] (Sartre, 1938, p. 8). The same temptation to over-simplify can be identified in computer-aided studies of literature. We deal in the statistics of spelled forms: after all, grouping words under their dictionary forms, particularly in a language like French, would cause a lot of extra work, and we can always claim that any such upgrading introduces human error. This paper describes progress in a large-scale project which accepts the possibility of human coding error in order to deal with semantically and morphologically complex sets of words designating phenomena of interest, in spite of the added complexity at the computing and data-processing level.


A series of projects being carried out at the University of Manitoba is examining how human aging, Alzheimer disease and other forms of senile dementia are portrayed in writings produced in France between 1789 and the 1950s. The choice of France is suggested by the fact that, from the 17th century until the present day, France has been the European country with the largest proportion of old people in its population. The period covered begins with the Revolution of 1789, which removed the last vestiges of feudal privilege from France and began its transformation into a modern liberal industrial state. It ends with the 1950s, when antibiotics came into common use and extended life expectancy quite sharply, thus changing the biological reality underlying the concepts of “old” and “aging”.

Study of this period is facilitated by the existence of the Trésor de la Langue Française database. This database was originally set up by a committee of scholars to reflect elite usage of the French language, in order to facilitate the preparation of a new dictionary of modern French. A concerted effort was made to include important writings of all types, and to distribute the texts chosen in a homogeneous fashion over the period in question. All in all, 1,079 texts were included in the database, divided into 388 works of non-fiction prose, 379 novels, 183 plays and 129 collections of poetry. The database contains almost 74 million words of text, making it one of the largest coherent corpora (as distinguished from a collection of texts) available to researchers today (see Imbs, 1971, vol. 1, preface). It can reasonably be taken to reflect the thinking of those who had their hands on the levers of power in France between 1789 and the 1950s.

Approximately 110,000 potential allusions to aging are found in the database. After one has removed occurrences of the vocabulary of aging applied to concepts and things, approximately 60,000 allusions to human aging can be found. Preliminary analysis of the data shows that towards the middle of the 19th century the outlook on aging shifted from positive to negative (Fortier et al. 1997). Further analysis is ongoing but faces a paradox: once potential allusions to human aging have been converted by human intervention into authentic allusions to human aging, statistical analysis is precluded by the fact that the data are now no longer comparable from one text to the next.


The approximately 64 words potentially evoking aging can be searched in the ARTFL database using 41 search strings submitted to the PhiloLogic software. The software returns each word in approximately 300 characters of context. These results have been downloaded and converted to word-processor format for all the texts in the original ARTFL database. These downloaded word-processor versions of the potential vocabulary of aging have now all been disambiguated by research assistants, a very time-consuming but not particularly interesting process.
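The retrieval step described above can be illustrated with a minimal sketch. It does not reproduce PhiloLogic or the project's 41 search strings: the function name `kwic`, the sample sentence and the single regular expression are all hypothetical, chosen only to show how a search string returns each hit inside a fixed amount of context.

```python
import re

def kwic(text, pattern, width=300):
    """Return (matched_word, context) pairs with about `width` characters of context.

    Hypothetical helper: half of the context is taken on each side of the hit.
    """
    half = width // 2
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - half)
        end = min(len(text), m.end() + half)
        hits.append((m.group(0), text[start:end]))
    return hits

# Invented sample text; one illustrative search string covering several spelled forms.
sample = ("Le vieillard regardait la mer. Sa vieillesse ne l'attristait pas; "
          "ce vieux meuble, lui, ne disait rien.")
hits = kwic(sample, r"\bvie(?:ill\w*|ux)\b", width=40)

for word, context in hits:
    print(f"{word!r}: ...{context}...")
```

In this invented sample the third hit, "vieux", qualifies a piece of furniture rather than a person, which is the kind of case that only reading the context can separate from an authentic allusion to human aging.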

Figure 1: Potential Allusions to Human Aging, and Authentic Allusions to Human Aging: 1789-1859


Results have now been cross-checked and verified for the first 345 texts, covering all genres in the period 1789-1859. In this period, the texts recorded total 26,600,008 words. Individual texts range in size from 643,636 words to 1,159 words. Within these twenty-six million words, 34,055 are part of the vocabulary of aging, and the number of words concerning aging varies between 1,406 and 0 in any given text. The vocabulary of human aging occurs 14,552 times in these texts, varying between 963 and 0 in individual texts.

Since the size of the texts varies so widely, it makes sense to normalize the results in terms of a base of 100,000 words. That is to say, the raw frequencies for each text are divided by the number of words in the text and then multiplied by 100,000, so that the number of words in the text does not have an overwhelming influence on the results. When the set of normalized frequencies is sorted into descending order of relative frequency of the potential vocabulary of aging in each text, and the frequencies of the potential vocabulary of aging are plotted along with the number of allusions to human aging in each text, the appropriateness of the approach used becomes evident (see Figure 1). Although the top line (potential allusions to human aging) is relatively smooth, the bottom line (authentic allusions to human aging) is quite jagged, showing that although one frequency is necessarily a subset of the other, there is quite a bit of variation in the proportion of words that evoke human aging from one text to the next.
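The normalization and sorting just described can be sketched as follows. The corpus totals are the figures reported above for 1789-1859; the three per-text records (`text_a`, `text_b`, `text_c`) are invented for illustration and do not correspond to any text in the database, and the function name `per_100k` is an assumption.

```python
def per_100k(count, text_length):
    """Relative frequency of `count` occurrences per 100,000 words of text."""
    return count / text_length * 100_000

# Corpus-wide totals reported for the 345 verified texts (1789-1859).
total_words = 26_600_008
potential = 34_055   # occurrences of the vocabulary of aging
authentic = 14_552   # occurrences referring to human aging

potential_rate = per_100k(potential, total_words)
authentic_rate = per_100k(authentic, total_words)
share = authentic / potential  # proportion of potential allusions that are authentic

# Hypothetical per-text records: (name, words, potential, authentic).
texts = [
    ("text_a", 120_000, 300, 90),
    ("text_b", 8_000, 40, 35),
    ("text_c", 50_000, 25, 5),
]

# Sort in descending order of the normalized potential frequency,
# carrying the normalized authentic frequency along for plotting.
ranked = sorted(
    ((name, per_100k(p, n), per_100k(a, n)) for name, n, p, a in texts),
    key=lambda row: row[1],
    reverse=True,
)

for name, pot, auth in ranked:
    print(f"{name}: potential {pot:.1f}, authentic {auth:.1f} per 100,000 words")
```

Taken over the whole verified sub-corpus, the reported totals work out to roughly 128 potential and 55 authentic allusions per 100,000 words, so a little over two fifths of the potential allusions concern human aging. These aggregate figures say nothing about the text-to-text variation that Figure 1 displays.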

Further Analysis

Once one has identified all the common words evoking human aging in a given text, the task is not entirely finished. When a person, whether identified by a proper noun like “M. Dupont” or by a common noun like “father”, has been clearly labelled as old in a text, every subsequent allusion to that person in the text becomes an allusion to aging. My assistants are currently upgrading texts that have been checked and verified by adding these allusions to the vocabulary of human aging on a text-by-text basis. Such modifications will further differentiate texts in terms of the number of allusions to human aging, but will provide a more accurate image of how aging was presented in the period being studied.

The database of extended allusions to human aging will be used as input to a process of identifying allusions to senile dementia in the Trésor de la Langue Française database, for senile dementia must be found where the vocabulary of human aging and the vocabulary of dementia co-occur.
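One way such co-occurrence might be detected is sketched below. The abstract does not specify the procedure, so everything here is an assumption: the two short word lists are invented stand-ins for the project's vocabularies, the function name `cooccurrences` is hypothetical, and the ten-word window is an arbitrary choice.

```python
import re

# Hypothetical, deliberately tiny word lists; the project's vocabularies are larger.
AGING = {"vieillard", "vieux", "vieille", "vieillesse"}
DEMENTIA = {"démence", "radotage"}

def cooccurrences(text, window=10):
    """Return (aging_word, dementia_word, distance) for pairs within `window` tokens."""
    tokens = re.findall(r"\w+", text.lower())
    aging_pos = [i for i, t in enumerate(tokens) if t in AGING]
    dementia_pos = [i for i, t in enumerate(tokens) if t in DEMENTIA]
    return [
        (tokens[i], tokens[j], abs(i - j))
        for i in aging_pos
        for j in dementia_pos
        if abs(i - j) <= window
    ]

# Invented sample: the first aging word is near "démence", the last one is not.
sample = ("Le vieillard tombait dans une démence paisible. Plus loin, bien plus "
          "loin dans le récit, on parle de tout autre chose que de la vieillesse.")
pairs = cooccurrences(sample, window=10)
print(pairs)
```

A token window is only one possible definition of co-occurrence; the 300-character contexts returned by the search software, or sentence boundaries, could serve equally well, and candidate passages would still need to be read and verified.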


1. Fortier, Paul et al. 1997. “Change Points: Aging and Content Words in a Large Database.” Literary and Linguistic Computing 12(1): 15-22.
2. Imbs, Paul. 1971-94. Le Trésor de la langue française: Dictionnaire de la langue du XIXe et du XXe siècle. 16 vols. Paris: C.N.R.S.

Conference Info



Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004


Organizers: ACH, ALLC

