From Millions of Words to a Single Phrase: Examination of the TXM Software in Hebrew

poster / demo / art installation
  1. 1. Maxim Lengo

    Bar-Ilan University

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Following the technological development during the second half of the 20th Century, linguistic researchers have begun for the first time to analyze corpora of big data. One research methodology, which was developed for lexicometry and text statistical analysis, is Textometry. I.e., the attempt to combine various statistical analysis techniques, such as factorial correspondence analysis (Benzécri, 1977) and hierarchical ascendant classification (Ward Jr, 1963), with full-text search techniques such as kwic concordances (Luhn, 1960), in order to trace the precise original editorial context of any textual event participating to the analysis.The TXM software (Heiden, 2010), which was developed in France as a modular platform of a new generation of textometrical research and which I adapted to Hebrew, gives the ability to analyze a large corpus of texts as in my research, by using tools and methods based on linguistics and discourse analysis, that is, decomposing the text into factors and elements, carrying out statistical analysis, identifying the hidden social patterns, and then restoring the corpus to its original mode.As a Ph.D. student in the discipline of Social Sciences, my research focuses on the image repair theory (Benoit, 2015) as an ensemble of strategies such as evading responsibility and reducing offensiveness, used by individuals, organizations and groups in order to repair their image during times of crisis. It examines the ways in which rhetorical measures are used in online verbal exchanges among users of the online social networks who attempt to repair their personal image. To achieve my goal, I use the TXM software to analyze a corpus of more than eight million words in 365 Facebook posts, which were published by the Israeli Prime Minister Benjamin Netanyahu during his current affairs, and more than 285,000 comments, made by the users. Netanyahu's Affairs are four police investigations in which he is involved as a suspect or has given a testimony.During the poster session, I will present a review of some tools I used with the TXM software during the digitized analysis process in my Ph.D. research and the way they allowed me to recognize the following key phrase used by Netanyahu: "They have the media, we have you". Among these tools are Progression, which is the frequency of occurrence of one term throughout the text corpus; Co-occurrence, which is the frequency of occurrence of two terms in a text corpus alongside each other in a certain order; and Specificity, which is the score a term is given based on its occurrence in the corresponding part of the corpus relative to the one in the entire corpus, and indicates whether it is overused, underused or useless.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at Data for this conference were initially prepared and cleaned by May Ning.

Conference website:


Series: ADHO (15)

Organizers: ADHO