An hypothesis of formalization of literary data for text analysis: a case study on Karl Kraus' writings

paper
Authorship
  1. Daniela Alderuccio

    ENEA/UDA


Daniela Alderuccio
ENEA/UDA (Italy)
alderuccio@casaccia.enea.it

ALLC/ACH 2002, University of Tübingen, Tübingen, 2002

Editor: Harald Fuchs
Encoder: Sara A. Schmidt

Introduction
The growing availability of literary heritage on the Web is making
humanistic research easier: on the one hand it facilitates access to
information sources and documents, and on the other it provides a
knowledge representation of texts that enables their sharing and reuse. One
of the major problems in knowledge representation is the formalization of
literary data. The main difficulty is to capture the richness of word
meanings in an established form that allows automatic data processing while
still preserving the essence of what is described.
This challenge stems from the different natures of Computer Science and the
Humanities. The former is founded on establishing a formal representation
of what exists (formal languages and modeling of reality); the latter is
based on interpretation, whose subjectivity escapes classification or
rules. It is recognized that accuracy in literary analysis depends on
cultural background and literary sensibility, but the underlying ambiguity
of natural languages poses further difficulties for researchers: a specific
term may have different or contradictory meanings and interpretations, and
authors frequently use different words or expressions to refer to the same
meaning.
By developing common formalisms, Computer Science tools aim at reaching a
shareable agreement on how the world is represented. Similarly, in order to
give an objective basis to concepts (the starting point of the analysis),
applying this formal approach in the literary domain may allow experts to
define and share a common vocabulary and to agree on word senses, thus
reducing ambiguity.
In the hypothesis proposed in this paper, the use of a reference tool (such
as an ontology) seems to offer a means of meeting this challenge
successfully. Its first expected result, obtained by preventing
misunderstandings in reading texts and by limiting subjectivity in their
analysis, is a better comprehension of literary phenomena; its second,
obtained by improving the knowledge representation of a literary text, is
the retrieval of more relevant texts for research purposes.
Note: »An ontology is a specification of a conceptualization (…) That is, an
ontology is a description (like a formal specification of a program) of the
concepts and relationships that can exist for an agent or a community of
agents«, in T. Gruber, »What is an ontology?«, URL (T. R. Gruber. A
translation approach to portable ontologies. Knowledge Acquisition,
5(2):199-220, 1993).

Application and Results
In the analysis of a literary phenomenon, some of the aspects to be
considered are:
the ambiguity of natural languages, which makes it difficult for experts
to limit subjectivity in interpreting texts;
and the heterogeneity of the information sources to select from
(historical, cultural, geo-political), which creates the need to
retrieve the documents relevant to the analysis.

Identifying criteria that deepen the study of a literary phenomenon and
extract interesting documents on that subject would be of great utility.
The adoption of a linguistic resource (namely the ontology of WordNet [11])
as a reference tool seems to be a viable way to reach both goals.
In order to test this approach in humanistic research, the "Dualism Truth vs.
Propaganda" [2] in Karl Kraus has been investigated using WordNet, the
on-line reference system designed at the Cognitive Science Laboratory of
Princeton University to model lexical memory. Kraus was an Austrian
intellectual and one of the bitterest satirists of fin-de-siècle Vienna,
comparable to Jonathan Swift for his satiric vision and command of
language. He was a critic, a playwright, a poet, a journalist and the editor
of the magazine "The Torch" (Die Fackel [8]) for about 36 years. Strongly
believing in language as a medium to express the truth, he made the German
language and its misuse by the press one of his major concerns. As a
journalist he believed in informing the public rather than overwhelming it
with propaganda: his main goal was to report facts instead of interpreting
them. Referring to this informative function of journalism, he wrote: "My
duty is to tell Mankind the Truth".
Note: "Mein Pflicht ist es, den Menschen die Wahrheit zu sagen", Kraus K.:
Die Fackel, Band 11, no. 852-856 (May 1931), p. 95.
Based on Kraus' writings, the literary phenomenon under analysis has been
condensed into four keywords: "Language", "Truth", "Journalism",
"Propaganda". The meanings of these selected terms have been defined using
WordNet concept disambiguation. Because in this lexical database English
nouns, verbs, adjectives and adverbs are organized into synonym sets called
synsets (each representing one underlying lexical concept), disambiguation
is based on lexical and semantic relations with other concepts.
Note: Lexical relations: synonymy, antonymy, polysemy. Semantic relations:
hyponymy, hypernymy.
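The synset organization described above can be illustrated with a minimal sketch. It does not use the real WordNet database: the `Synset` class, the identifiers such as `language.n.01`, the glosses and the `LEXICON` entries are all hypothetical, hand-written stand-ins chosen only to show how one word can belong to several synsets and how a hypernym link relates a narrower concept to a broader one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """One lexical concept: a group of synonymous word forms plus a gloss."""
    name: str           # hypothetical identifier, e.g. "language.n.01"
    words: tuple        # synonymous word forms (lexical relation: synonymy)
    gloss: str          # short hand-written definition, not WordNet's text
    hypernym: str = ""  # name of the broader concept; "" marks a root


# Toy lexicon (an assumption for illustration, not WordNet content).
LEXICON = {s.name: s for s in [
    Synset("communication.n.01", ("communication",),
           "something that is communicated"),
    Synset("faculty.n.01", ("faculty",),
           "an inherent ability of the mind"),
    Synset("language.n.01", ("language", "linguistic communication"),
           "a system of words used to communicate", "communication.n.01"),
    Synset("language.n.02", ("language", "speech"),
           "the faculty of expressing thoughts in words", "faculty.n.01"),
    Synset("journalism.n.01", ("journalism", "news media"),
           "newspapers and magazines collectively", "communication.n.01"),
    Synset("newspaper.n.01", ("newspaper", "paper"),
           "a daily or weekly publication of news", "journalism.n.01"),
]}


def senses(word):
    """Return the names of all synsets that contain the word (polysemy)."""
    return [s.name for s in LEXICON.values() if word in s.words]


def hypernym_chain(name):
    """Follow hypernym links from a synset up to its root."""
    chain = []
    current = LEXICON[name].hypernym
    while current:
        chain.append(current)
        current = LEXICON[current].hypernym
    return chain


def is_hyponym_of(name, ancestor):
    """True if `ancestor` is reachable from `name` through hypernym links."""
    return ancestor in hypernym_chain(name)
```

In this toy lexicon the word "language" has two senses, and only an expert (or some disambiguation procedure) decides which one a given text intends; the hypernym chain then places the chosen sense within a semantic field.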
Examination of WordNet definitions has led to the exploration of the
keywords' meanings, the delimitation of their semantic fields, and the
discovery of other related pairs of opposing concepts, such as Truth vs.
Verisimilitude, Language vs. Paralanguage, Journalism vs. Propaganda. The
application of this ontology-based approach has improved the comprehension
of the "Dualism Truth vs. Propaganda" in Karl Kraus (1874-1936). As the
main consequence, using WordNet has made it possible to study the literary
phenomenon under analysis, confirming the validity of Kraus' position on
information problems and locating the core of the antagonism between
"Propaganda and Truth".
As far as the second goal of this research is concerned (that is, to find
more relevant texts for analysis), two sets of Kraus’ aphorisms (Kraus,
1955), »Writing and Reading« and »By Night«, have been digitized in order
to apply the proposed approach. Then, through a human indexing operation
performed using the ontology contained in WordNet, each aphorism has been
assigned a category based on semantic fields. The keywords selected above
(»Language«, »Truth«, »Journalism«) have been adopted as indicators of
semantic fields, and each aphorism has been labelled by the presence or
absence of these fields. Although »By Night« has no occurrences of the
keyword »Journalism«, human analysis shows that it contains two aphorisms
relevant to the comprehension of the »Dualism Truth vs. Propaganda« in Karl
Kraus. In »By Night« the keyword »Journalism« is absent, but the word
»Zeitung« (newspaper) is present: an implicit form that is semantically
related to the keyword »Journalism«. If the goal of the search were to find
all sets of aphorisms where Language and Truth and Journalism occur, this
set of aphorisms would probably have been ignored as not pertinent to the
query. By defining semantic fields and categorizing aphorisms with them,
the proposed approach has made it possible to select »By Night« as a
relevant document.
Note: "Writing and Reading" and "By Night" have been extracted from "Dicta
and Contradicta" (Sprueche und Widersprueche), a selection of aphorisms
that appeared in "The Torch" and was published in 1909.
Note: The two aphorisms are "Wort und Wesen: das ist die einzige
Verbindung, die ich je im Leben angestrebt habe" (in English: "Word and
essence: that is the only connection I have ever striven for in life"),
Kraus K. Beim Wort genommen, p. 431; Detti e Contraddetti, p. 352; and
"Zensur und Zeitung - wie sollte ich nicht zugunsten jener entscheiden? Die
Zensur kann die Wahrheit auf eine Zeit unterdruecken, indem sie ihr das
Wort nimmt. Die Zeitung unterdrueckt die Wahrheit auf die Dauer, indem sie
ihr Worte gibt. Die Zensur schadet weder der Wahrheit noch dem Wort; die
Zeitung beiden" (in English: "Censorship and newspaper - how could I not
decide in favour of the former? Censorship can suppress the truth for a
time by taking the word away from it. The newspaper suppresses the truth
for good by giving it words. Censorship harms neither the truth nor the
word; the newspaper harms both"), Kraus K. Beim Wort genommen, p. 443;
Detti e Contraddetti, p. 358.
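The difference between a literal keyword query and indexing by semantic field can be sketched as follows. In the study the indexing was performed by a human using WordNet; here, purely as an assumption for illustration, that judgement is replaced by small hand-written word lists (`FIELD_TERMS`), and all function names are hypothetical. The two texts are taken from the »By Night« aphorisms quoted in this paper (the second one shortened).

```python
import re

# Hypothetical word lists standing in for the human indexer's judgement
# about which surface words signal each semantic field.
FIELD_TERMS = {
    "Language": {"sprache", "wort", "worte"},
    "Truth": {"wahrheit"},
    "Journalism": {"journalismus", "zeitung", "presse"},
}

APHORISMS = {
    "wort_und_wesen": (
        "Wort und Wesen: das ist die einzige Verbindung, "
        "die ich je im Leben angestrebt habe"
    ),
    "zensur_und_zeitung": (
        "Die Zensur kann die Wahrheit auf eine Zeit unterdruecken, "
        "indem sie ihr das Wort nimmt. Die Zeitung unterdrueckt die "
        "Wahrheit auf die Dauer, indem sie ihr Worte gibt."
    ),
}


def tokens(text):
    """Lower-cased set of whole words occurring in the text."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_match(text, keywords):
    """Literal query: every keyword must occur as a word in the text."""
    return set(keywords) <= tokens(text)


def label_fields(text):
    """Mark each semantic field as present or absent in the text."""
    words = tokens(text)
    return {name: bool(words & terms) for name, terms in FIELD_TERMS.items()}


def field_match(text, fields):
    """Field query: every requested semantic field must be present."""
    labels = label_fields(text)
    return all(labels[name] for name in fields)
```

With these assumptions, a literal query for "wort", "wahrheit" and "journalismus" misses the censorship aphorism because the last keyword never occurs, whereas the field query for Language, Truth and Journalism retrieves it through »Zeitung«.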

Conclusions
The results achieved show that literary data formalization based on
ontologies can improve the accuracy of literary research. By including
definitions of basic concepts in the domain (also in a
machine-interpretable form), by identifying relations among them and by
defining semantic fields, WordNet allows experts to share information in a
domain, to provide critical notes and comments on texts, and to interpret
them.
Furthermore, this study shows that defining the semantic field of words (by
applying the definitions provided by an ontology) and indexing documents
through a semantic categorization is an effective way of representing the
content of a text: the ability to bring to light word meanings that are
hidden in texts in an implicit form improves the retrieval of relevant
documents, matching the needs of humanistic research.

References

[1] AA.VV. Information processing & Management - An International Journal.
New York: Elsevier Science Ltd, 37(2), 2001.

[2] D. Alderuccio. Dualism Truth vs. Propaganda in Karl Kraus. Methodology
for a computer-assisted literary analysis. Thesis, ENEA/University of Rome
»La Sapienza«, 2000.

[3] H. Arntzen. Karl Kraus und die Presse. Muenchen: Wilhelm Fink Verlag,
1975.

[4] T. De Mauro. Capire le parole. Roma-Bari: Editore Laterza, 1999.

[5] N. Guarino, R. Poli. The role of Ontology in the Information
Technology. Int’l J. Human-Computer Studies, 43(5/6):623-965, Nov.-Dec.
1995.

[6] M. Gruninger, M. Ushold. Ontologies: principles, methods and
applications. Knowledge Engineering Review, The University of Edinburgh,
11(2), June 1996.

[7] P. Kipphof. Der Aphorismus im Werke von Karl Kraus. Phil. Diss.,
Muenchen, 1961.

[8] K. Kraus. Die Fackel. Koesel Verlag, 1968.

[9] K. Kraus. Beim Wort genommen. Passau: Koesel Verlag, 1955. Transl. into
Italian in Detti e Contraddetti, Adelphi Edizioni, 1999; transl. into
English by Jonathan Mc Vity, in Kraus K., Dicta and Contradicta, Univ. of
Illinois Press, 2001.

[10] W. Mieder. Karl Kraus und der sprichwoertliche Aphorismus.
Muttersprache, 89:97-115, 1979.

[11] G. A. Miller. WordNet: a lexical data base for English. Communications
of the ACM, 38(11):39-41, 1995.

[12] G. A. Miller et al. WordNet: An on-line lexical database.
International Journal of Lexicography (special issue), 3(4), 1990.

[13] J. F. Sowa. Knowledge representation: logical, philosophical, and
computational foundations. Pacific Grove, CA: Brooks Cole Publishing Co.,
2000.

[14] E. M. Voorhees. Natural Language Processing and Information Retrieval.
In: Information extraction - Towards scalable adaptable systems. Berlin:
Springer Verlag, 1999.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2002
"New Directions in Humanities Computing"

Hosted at Universität Tübingen (University of Tubingen / Tuebingen)

Tübingen, Germany

July 23, 2002 - July 28, 2002

Conference website: http://web.archive.org/web/20041117094331/http://www.uni-tuebingen.de/allcach2002/

Organizers: ACH, ALLC
