National Research University Higher School of Economics
The goal of the study is to show links between lexical and social diachronic change. The study is conducted in the culturomics framework (Michel et al., 2011). In contrast to the Big Data approach, the study promotes the idea of medium data, i.e. an amount of data that allows both quantitative and qualitative analysis (Bonch-Osmolovskaya, 2015). The main characteristics of medium data are:
The reliability of sources, whose metadata can be filtered manually
The sufficiency of the data amount for reliable statistical measures
The possibility of additional semantic mark-up
The research is based on data from the Russian National Corpus (ruscorpora.ru) (see Plungian, Sitchinava, 2004). The study traces changes in context frequencies for the lexeme road in the period from 1800 to 2000 and correlates the observations with social and economic progress as well as with change in the conceptual language space.
Choice of concept
Russia is a big country, so transportation has traditionally been a critical problem. The choice of the word road for a culturomics study is based on our expectations of the concept’s centrality for the economy, society and culture in Russia of the 19
Road appears to be a productive sign in terms of the semiotics of art (Tchepanskaya, 2003), which is why I expected to collect numerous relevant contexts in both fiction and nonfiction. At the same time, road in Russian has several meanings, and the nature of its polysemy has been treated extensively in previous work (Arutiunova, 1999). We can distinguish three basic meanings, which are contrasted by the position of the Observer (Paducheva, 2006), the one who perceives the road. The first meaning is road as a physical object, a line on the ground that the Observer sees while standing on it. It can be characterized by the quality of its surface or by the surrounding landscape (e.g. dirty road). The second meaning is road as a vector, a line on a map that connects two points (e.g. central road). In this case the Observer operates with the abstract idea of the road’s topology. The third meaning is metonymic and stands for the travel event that the Observer experiences while moving along the road (e.g. tedious road). Finally, owing to its semiotic abundance, road is frequently used in a metaphorical sense (e.g. life path = “road of life”). At the same time, the first three meanings represent the most important parameters that determine the mobility of the population: the quality of roads, the connectedness between localities, and the time and quality of the journey. Therefore, it is insufficient to track the overall frequency of road occurrences in the corpus; it is important to distinguish how the frequency of each meaning has changed.
Different meanings of road can be captured by attributive constructions, as adjectives usually refer to only one sense. The corpus was divided into seven time periods from 1800 to 2000. To make the sub-corpora comparable, the 19th century was divided into two periods of 50 years and the 20th century into five periods of 20 years. The contexts containing constructions of adjective plus road were extracted from every sub-corpus. Noisy entries were removed, and the data were lemmatized and normalized as ipm (instances per million words). As a result, I obtained a database of 15000 constructions containing more than 1500 unique adjectives.
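The ipm normalization mentioned above can be illustrated with a minimal sketch. The period labels, adjective lemmas, hit counts and sub-corpus sizes below are invented toy values, not figures from the study; only the formula (raw count scaled to one million tokens of the sub-corpus) is standard.

```python
from collections import Counter


def ipm(count, corpus_size):
    """Instances per million tokens: raw count scaled by sub-corpus size."""
    return count * 1_000_000 / corpus_size


# Hypothetical toy data: one (period, adjective lemma) pair per extracted
# "adjective + road" construction. These are NOT the study's data.
hits = [
    ("1800-1850", "bolshoi"), ("1800-1850", "bolshoi"), ("1800-1850", "zheleznyi"),
    ("1851-1900", "zheleznyi"), ("1851-1900", "zheleznyi"), ("1851-1900", "bolshoi"),
]

# Hypothetical sub-corpus sizes in tokens (assumed values).
subcorpus_tokens = {"1800-1850": 2_000_000, "1851-1900": 4_000_000}

raw = Counter(hits)
normalized = {
    (period, lemma): ipm(n, subcorpus_tokens[period])
    for (period, lemma), n in raw.items()
}
```

Dividing by the size of each sub-corpus rather than by the whole corpus is what makes periods of unequal size comparable.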
In the next step, all the adjectives were categorized into 20 semantic domains. The domains correlate with the four basic meanings of road defined above but render more specific characteristics of different road parameters. The most frequent construction, “zheleznaya doroga” (literally iron road, meaning railroad), was placed in a separate category.
Then I applied hierarchical clustering to the data of the 20 categories (see Figure 1).
Figure 1: Clustering of semantic categories for adjectives describing road
The data of the categories in each cluster were summed and then plotted on a graph (see Figure 2).
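The two steps just described, grouping categories by the similarity of their frequency curves and then adding up the curves within each group, can be sketched as follows. The abstract does not state the distance measure or linkage rule, so Euclidean distance and average linkage are assumptions here, and the four category series are invented toy values rather than the study's 20 categories over seven periods. A library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be a common alternative to this hand-written version.

```python
from itertools import combinations
from math import dist


def agglomerate(series, k):
    """Bottom-up clustering: merge the two closest clusters until k remain.

    Assumptions (not from the source): Euclidean distance between
    per-period frequency vectors, average linkage between clusters.
    """
    clusters = [[name] for name in series]

    def linkage(a, b):
        # Mean pairwise distance between members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(series[x], series[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def cluster_totals(series, clusters):
    """Sum the member series of each cluster period by period."""
    return [
        [sum(values) for values in zip(*(series[name] for name in cluster))]
        for cluster in clusters
    ]


# Hypothetical ipm values per period for four toy categories.
series = {
    "direction": [50.0, 30.0, 10.0],
    "centrality": [48.0, 28.0, 12.0],
    "journey": [5.0, 10.0, 20.0],
    "metaphor": [6.0, 11.0, 19.0],
}

clusters = agglomerate(series, 2)
totals = cluster_totals(series, clusters)
```

With these toy values the falling curves (direction, centrality) and the rising curves (journey, metaphor) end up in separate clusters, and each cluster's summed series is what would be plotted as one line.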
The data allow many research scenarios that compare different domains, for example:
sources of domain contents and diachronic change in the distribution of sources (for example, fiction or nonfiction)
widening or narrowing of a category through time (how many adjectives the category contains), as well as the persistence of its content through time and through economic or social changes
substitution of one adjective for another (for example, all the changes connected with the replacement of horses by cars) and its time frame
migration of an adjective from one category to another
In this abstract, I will focus on the most prominent changes in the cluster graphs. As Figure 2 shows, the railroad (RR) cluster and the direction and centrality (D and C) cluster are the most distinctive in their behavior. At the beginning of the 19th century, the existence of big central roads from one town to another completely determined the mobility opportunities of the Russian population. We see that more than 50% of all occurrences of road are associated with D and C attributes (Warsaw road; big road as a specific term for a central road). In 1851, the railroad between Moscow and St. Petersburg was opened, and this fact correlates nicely with the crossing of the RR and D and C graphs in the 1850s. The intensive growth of the RR cluster in the second half of the 19th century reflects not only the growth of railroad communications in Russia but also the great conceptual influence of the railroad innovation, which can also be traced in numerous literary works of this period, such as Tolstoy’s Anna Karenina or Dostoevsky’s The Idiot. An intriguing fact about the RR cluster is its consistent fall in the 20th century, which may of course correlate with the development of automobile transportation. The sharp fall of the RR cluster in the 1960s corresponds to the growth of civil aviation; see Figure 3, which shows the opposite trend for air transportation starting from the 1960s.
Figure 3: Graph of air transportation, line with triangles marking passenger traffic
The most important generalization arising from these observations is that in the 20th century the topological (vector) meaning of road consistently fades, while reference to a road as a physical object, on the contrary, increases in frequency. In other words, while economic and industrial growth results in a diversity of means of mobility, road as a concept in the lexical space has changed the balance of its meanings, reducing the connection idea. At the same time, the metonymic usage of road as a journey increased in the 20th century, as did the metaphorical usage; the two categories are very similar in their data values, so they formed one common cluster. In the 1960s, the connection idea is transferred from the direct usages of the D and C cluster to the figurative usages of the Journey and Metaphor cluster. This means that we can document the moment when the idea of connection is separated from physical movement along the road. The tedious road is now sitting in the airport for many hours waiting for your flight.
Arutiunova, N. D. (1999). Put’ po doroge i bezdorozhju [The way on the road and off the road]. In Arutiunova, N. D., Shatunovskii, I. B. (eds.), Logicheskii analiz yazyka. Yazyki dinamicheskogo mira [Logical analysis of language. The Languages of the Dynamic World]. Dubna, pp. 3-17.
Bonch-Osmolovskaya, A. A. (2015). Medium data method for cultural studies: the case of gender studies in Russian National Corpus. Proceedings of Digital Humanities, Sydney.
Michel, J. B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., Aiden, E. L., et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176-82.
Paducheva, E. V. (2006). Nabludatel’: tipologiya i vozmozhnye traktovki [The Observer: Typology and Possible Interpretations]. Trudy mezhdunarodnoy konferentsii “Dialog-2006” [Proceedings of International Conference “Dialog-2006”]. Moscow, RSUH, pp. 403-13.
Plungian, V. A., Sitchinava, D. V. (2004). Natsionalny korpus russkogo jazyka: opyt sozdaniya korpusov tekstov sovremennogo russkogo jazyka [Russian national corpus: experience of creating corpora for contemporary Russian]. In Beliaeva et al. (eds), The proceedings of the international conference “Corpus Linguistics-2004”, Saint-Petersburg, pp. 216-39.
Tchepanskaya, T. B. (2003). Kultura dorogi v russkoi miforitual’noi traditsii XIX-XX vekov [Road culture in Russian mythic and ritual tradition of the 19th-20th centuries]. Moscow, Indrik Publ.