Text structure Modelling and Language Comprehension processes

Yllias Chali; Elsa Pascual; Jacques Virbel

Authorship

1. Yllias Chali

IRIT: Institut de Recherche en Informatique de Toulouse - CNRS (Centre national de la recherche scientifique)
2. Elsa Pascual

IRIT: Institut de Recherche en Informatique de Toulouse - CNRS (Centre national de la recherche scientifique)
3. Jacques Virbel

IRIT: Institut de Recherche en Informatique de Toulouse - CNRS (Centre national de la recherche scientifique)

Parent session

Semantic modelling of texts, Nicoletta Calzolari

Original URL

https://web.archive.org/web/19990204000531/http://www.hit.uib.no/allc/chali-ny.pdf

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

1. Statement of the study
The present work is part of a project (see note 1) that brings together researchers in linguistics, neuropsycholinguistics and computer science. The project has two aims [15]:
• to obtain a precise definition of text structure, in order to use it in a natural language generation system;
• to study the impact of text structure on comprehension and memorization processes. The populations tested are normal and pathological subjects (e.g. Alzheimer's patients and right-brain-damaged patients).
This paper focuses on the first aim: we propose a model of text that is both constrained by and required for the second aim. The nature of the project led us to examine the notion of text structure in an original way. One class of neuropsycholinguistic experimental protocols, designed to evaluate text comprehension, relies on a set of questions put to the subjects. The relevance of their answers makes it possible to assess their comprehension and memorization.
However, these protocols lack a formal characterization of text: the features of the questions cannot be controlled, and the set of possible questions cannot be defined. An appropriate text representation is therefore needed to address this specific problem.
For this reason, the proposed model is based on:
• a decomposition of texts into elementary sentences that support the questions, and
• a question/answer structure.
We applied this model to a particular text, which we call the reference text. This narrative text was constructed according to neuropsycholinguistic hypotheses about comprehension and retrieval in the patients studied.
This model is intended to serve as the basis of a system that generates multiple versions of the same text, distinguished by differences in their structure.
The long-term goal is to integrate this system into a new experimental protocol. At present, the neuropsycholinguist constructs a text according to his or her hypotheses. Our idea is to let the subject build the text, using the generation system: the system generates the beginning of the text, then asks the subject a question; depending on the answer, it generates the next part of the text, and so on until the whole text has been generated.
The final text results from the set of choices made by the subject. It defines a particular version, which can be interpreted in the light of the same neuropsycholinguistic hypotheses.
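The alternation between generation and questioning described above can be sketched as a simple loop. This is only an illustrative sketch, not the authors' system: the `Step` record, the `run_protocol` function, the plan layout and the toy story are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One node of a hypothetical text plan: a fragment, then an optional question."""
    fragment: str
    question: Optional[str] = None
    # Maps each admissible answer to the name of the next step.
    next_by_answer: Dict[str, str] = field(default_factory=dict)


def run_protocol(plan: Dict[str, Step], start: str,
                 ask: Callable[[str], str]) -> List[str]:
    """Alternate text generation and questioning until no question remains."""
    text: List[str] = []
    current: Optional[str] = start
    while current is not None:
        step = plan[current]
        text.append(step.fragment)
        if step.question is None:
            break
        answer = ask(step.question)
        # In this sketch, an answer that the plan does not foresee ends the session.
        current = step.next_by_answer.get(answer)
    return text


# Invented toy content, for illustration only.
PLAN = {
    "start": Step("A traveller arrived in a village.", "Why did he come?",
                  {"rest": "rest", "trade": "trade"}),
    "rest": Step("He came because he was tired."),
    "trade": Step("He came in order to sell his goods."),
}


def scripted_subject(question: str) -> str:
    """Stands in for the subject; always gives the same answer."""
    return "trade"


version = run_protocol(PLAN, "start", scripted_subject)
```

Each sequence of answers selects one path through the plan, so the list returned by `run_protocol` corresponds to one version of the text. In an actual session, `ask` would collect the subject's answer instead of returning a scripted one.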
2. The model of text
2.1 The basis of the model
Our computational model takes into account:
• the text properties needed to test the neuropsycholinguistic hypotheses [15],
• the text structures needed for automated text generation [12, 18, 10, 17, 14, 16].
The text structure representation makes explicit the knowledge about the descriptions of, and the relations between, the structures of:
• the elements of the sentences,
• the sentences, and
• the text units.
This representation is designed to serve as the basis for generating several versions of the same text. These versions are characterized by their level of detail: the text is produced at a variable depth and degree of development. Consequently, the representation has to include elements that can be selected and structurally composed to define one text version.
To define a text structure representation that meets these needs, we draw on the following theoretical approaches:
• the surface sentences of the reference text are decomposed into a set of elementary sentences, with which a set of composition operators is associated, following Harris’ approach [6, 7];
• the coherence [13, 9] and the cohesion of the text are achieved by the dependency relations that hold between the utterances of the text. These dependencies are semantic and consist of question/answer relations whose logical types are implication, manner, time and place;
• the constitutive elements are arranged into text classes, formalized by a text grammar [19, 18, 17].
2.2 Harris’ operators system
Harris [6, 7] formulates all the necessary and sufficient properties and relations of natural language in terms of a mathematical system, and gives a characterization of the expressions of the language. He shows that sentences can be decomposed, in a one-to-one and computable way, into a set of elementary sentences, except in some degenerate cases. Conversely, he defines a recursive process that generates all sentences from a finite set of elementary assertions (K) by means of a finite set of operators (f).
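The two directions described above, generation from K by operators in f and decomposition back into elementary sentences, can be pictured with a small expression tree. This is a simplified sketch under stated assumptions: the `Kernel` and `Apply` classes, the two binary operators and the example sentences are hypothetical and are not taken from Harris' formal system.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Kernel:
    """An elementary assertion, a member of the finite set K."""
    text: str


@dataclass(frozen=True)
class Apply:
    """An operator from the finite set f, applied to kernels or to earlier results."""
    operator: str
    args: Tuple["Sentence", ...]


Sentence = Union[Kernel, Apply]

# Hypothetical binary operators, each realized here as a plain connective.
CONNECTIVES = {"because": "because", "when": "when"}


def realize(s: Sentence) -> str:
    """Generate the surface string by recursive application of operators."""
    if isinstance(s, Kernel):
        return s.text
    left, right = (realize(a) for a in s.args)
    return f"{left} {CONNECTIVES[s.operator]} {right}"


def decompose(s: Sentence) -> List[Kernel]:
    """Recover the elementary assertions, in left-to-right order."""
    if isinstance(s, Kernel):
        return [s]
    return [k for a in s.args for k in decompose(a)]


# Invented example sentences.
k1 = Kernel("the man left")
k2 = Kernel("it was late")
k3 = Kernel("the rain stopped")
complex_sentence = Apply("when", (Apply("because", (k1, k2)), k3))
surface = realize(complex_sentence)
```

Because an operator can take the result of another operator as an argument, a finite set of kernels and operators yields an unbounded set of sentences, while `decompose` always returns to the same finite base.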
2.3 The textual base
Following this approach, the reference text is decomposed into a set of elementary sentences. This set constitutes the textual base. Complex sentences are built by transforming the elementary sentences of the textual base, with the help of their associated characterizations. For each elementary sentence, these characterizations are [15]:
• a syntactic schema in terms of elementary constituents;
• a set of unary transformations (e.g. nominalization, pronominalization);
• a set of internal questions, to which the elementary sentence constitutes an answer. These questions follow from the associated syntactic schema.
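The three characterizations listed above can be pictured as one record per elementary sentence. The sketch below is an assumption made for illustration: the field names, the role labels and the role-to-interrogative table are hypothetical, and the internal questions are kept as structured pairs instead of surface strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from a constituent role to the interrogative that targets it.
INTERROGATIVE = {"subject": "who", "object": "what"}


@dataclass
class ElementarySentence:
    # Syntactic schema: role -> surface constituent.
    schema: Dict[str, str]
    # Names of the unary transformations this sentence admits.
    unary_transformations: List[str] = field(default_factory=list)

    def internal_questions(self) -> Dict[str, str]:
        """Derive the internal questions from the schema.

        Each questionable constituent yields one entry: the interrogative
        as key and the constituent that answers it as value.
        """
        return {INTERROGATIVE[role]: constituent
                for role, constituent in self.schema.items()
                if role in INTERROGATIVE}


# Invented example entry of a textual base.
entry = ElementarySentence(
    schema={"subject": "the fisherman", "verb": "repaired", "object": "the net"},
    unary_transformations=["nominalization", "pronominalization"],
)
questions = entry.internal_questions()
```

Deriving the questions from the schema, instead of storing them by hand, mirrors the statement that the internal questions follow from the associated syntactic schema.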
Subordination relations hold over the set of elementary sentences and serve to form complex sentences. The discourse is built by recursive application of operators, either to assertions (i.e. elementary sentences) or to sentences obtained by earlier operator applications.
This text representation, in terms of elementary sentences and associated transformational operators, describes the internal relations of the sentences. We need to extend it to handle relations between the elements of the text structure, in particular:
• the coherence/cohesion relations holding between text units;
• the ordering of the constitutive elements of the text classes.
2.4 The semantic relation between sentences
In transformational theory, the treatment of meaning is linked to the notion of acceptability, which is used to define the set of simple elementary sentences (i.e. sentences with one verb, a subject and its various objects). This set is regarded as the set of minimal units of meaning [4, 7]. By means of transformations, it is used to generate complex sentences. Systematically applying the unary and binary transformations to the simple and complex elementary sentences generates many text variants, differing in meaning, from the same basic set of elementary sentences.
A text, however, corresponds to one particular description of a situation in the world, so we have to restrict the combinatorial range of composition operations that the elementary sentences can undergo. To address the problem of sentence semantics within the text, we refer to the theory of the logic of questions [1, 3, 2, 8, 11, 5]. The principle of this theory is that the meaning of a sentence is constituted by the different answers to the questions that the sentence supports. This principle is stated informally by Hamblin’s postulates [5]:
1. an answer to a question is a statement;
2. knowing what counts as an answer is equivalent to knowing the question;
3. the possible answers to a question are an exhaustive set of mutually exclusive possibilities.
In a text, following this principle, a sentence evokes only part of the global description conveyed by the text, and the only way for it to interact with the other descriptive elements is through question/answer relations. For this reason, we equip the set of basic elementary sentences with question/answer relations, which allow them to compose the meaning of the text. These relations link an elementary sentence, which supports questions, to its set of answers. Four types of relations are distinguished:
• temporal, answering a "when" question;
• manner, answering a "how" question;
• place, answering a "where" question;
• implicational, answering a "why" question, which is subdivided into:
  • implicational motive, whose answer is introduced by "because";
  • implicational purpose, whose answer is introduced by "for the purpose of", "in order to", etc.
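The relation types listed above can be represented as labelled links from a supporting elementary sentence to the sentences that answer its questions. The following is a minimal sketch under assumptions: the `Relation` enumeration, the `QuestionAnswerLinks` class and the example sentences are hypothetical names and data introduced for illustration.

```python
from collections import defaultdict
from enum import Enum
from typing import DefaultDict, List, Tuple


class Relation(Enum):
    TEMPORAL = "when"
    MANNER = "how"
    PLACE = "where"
    MOTIVE = "why (because)"
    PURPOSE = "why (in order to)"


class QuestionAnswerLinks:
    """Links each (supporting sentence, relation) pair to its answering sentences."""

    def __init__(self) -> None:
        self._links: DefaultDict[Tuple[str, Relation], List[str]] = defaultdict(list)

    def link(self, support: str, relation: Relation, answer: str) -> None:
        self._links[(support, relation)].append(answer)

    def answers(self, support: str, relation: Relation) -> List[str]:
        # .get avoids creating empty entries for pairs that were never linked.
        return list(self._links.get((support, relation), []))

    def why(self, support: str) -> List[str]:
        """A 'why' question covers both implicational subtypes."""
        return (self.answers(support, Relation.MOTIVE)
                + self.answers(support, Relation.PURPOSE))


# Invented example sentences.
links = QuestionAnswerLinks()
links.link("the fisherman repaired the net", Relation.TEMPORAL, "the storm had ended")
links.link("the fisherman repaired the net", Relation.MOTIVE, "the net was torn")
links.link("the fisherman repaired the net", Relation.PURPOSE, "the fisherman goes back to sea")
why_answers = links.why("the fisherman repaired the net")
```

Restricting composition to the links recorded in such a structure is one way to limit the combinations of elementary sentences to those compatible with the situation that the text describes.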
2.5 The organization of the constitutive elements of text
Large texts with a complex structure can be divided into classes. Each class of text has a corresponding description in terms of constitutive elements. The ordering of these elements can be formalized by a text grammar [19, 18, 17]. For example, the class of technical or scientific papers corresponds to the following description:
<introduction><section 1>...<section n><conclusion>
Most of these constitutive elements can themselves be recursively decomposed into other constitutive elements, following the rules of textology, until elementary (non-decomposable) constitutive elements are reached. A hierarchical structure is thus defined over the constitutive elements. According to this text structure, the set of elementary sentences of the text is divided into disjoint subsets.
In our project, we consider a class of narrative texts. In this class, the set of elementary sentences is partitioned into three disjoint subsets, corresponding to the three episodes of the text: initial situation, event and final situation. The subset for the event episode is in turn subdivided into three subsets: anecdote, peripeteia and resolution.
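The hierarchy of the narrative class and the partition of the elementary sentences can be sketched as follows. The element names come from the description above; the dictionary encoding of the grammar, the function names and the sentence identifiers are assumptions made for illustration.

```python
from typing import Dict, List, Set

# Each decomposable constitutive element maps to its ordered sub-elements.
NARRATIVE: Dict[str, List[str]] = {
    "narrative": ["initial situation", "event", "final situation"],
    "event": ["anecdote", "peripeteia", "resolution"],
}


def leaves(element: str, grammar: Dict[str, List[str]]) -> List[str]:
    """Return the non-decomposable elements below `element`, in text order."""
    children = grammar.get(element)
    if children is None:
        return [element]
    return [leaf for child in children for leaf in leaves(child, grammar)]


def is_partition(assignment: Dict[str, Set[str]], sentences: Set[str],
                 grammar: Dict[str, List[str]], root: str) -> bool:
    """Check that the leaf subsets are pairwise disjoint and cover all sentences."""
    subsets = [assignment.get(leaf, set()) for leaf in leaves(root, grammar)]
    union: Set[str] = set().union(*subsets)
    # Equal union and equal total size together imply that no sentence is shared.
    return union == sentences and sum(len(s) for s in subsets) == len(sentences)


# Hypothetical sentence identifiers, one per elementary constitutive element.
ASSIGNMENT = {
    "initial situation": {"s1"},
    "anecdote": {"s2"},
    "peripeteia": {"s3"},
    "resolution": {"s4"},
    "final situation": {"s5"},
}
ok = is_partition(ASSIGNMENT, {"s1", "s2", "s3", "s4", "s5"}, NARRATIVE, "narrative")
```

Traversing the same dictionary only to a chosen depth would be one possible way to produce versions of the text at different levels of detail, although the source does not specify such a mechanism.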
3. Conclusion
The representation in terms of elementary sentences and associated characterizations, presented in sections 2.2 and 2.3, takes into account only the internal elements of the sentences. It is suitable for generating complex sentences. Its extension, presented in sections 2.4 and 2.5, takes into account the descriptions of, and the links between, the elements of the text structure, in particular the relations between sentence structures and between the constitutive elements of the text. The whole representation provides the basis for a system that produces coherent texts with a complex structure.
This representation is both syntactic and semantic in nature. We are currently working, on the one hand, on the logic of questions in order to characterize the semantic component further and, on the other hand, on the mechanism for producing text versions.
In the short term, this study makes it possible to address the second aim of the project, namely to bring out the interpretations underlying the differences in text structure. These structures are represented by our model. In return, the results obtained should clarify the notion of text structure.
In the long term, a major impact could concern the links between memory and language. In particular, this kind of work may help us to better understand comprehension in certain subjects (e.g. Alzheimer's patients and right-brain-damaged patients), and could help to evaluate more precisely how the qualitative processing of textual information evolves.
References
[1] L. Åqvist, A New Approach to the Logical Theory of Interrogatives. Almqvist & Wiksell, Uppsala, 1965.
[2] N. Belnap and T. Steel, The Logic of Questions and Answers. Yale University Press, New Haven, 1976.
[3] M. Cresswell, On the Logic of Incomplete Answers. Journal of Symbolic Logic, (30):65-68, 1965.
[4] M. Gross, Sur la notion harrissienne de transformation et son application au français. Revue Langages, (99), 1990.
[5] D. Harrah, The Logic of Questions. In D. Gabbay and F. Guenthner Eds., Handbook of Philosophical Logic, volume II, pages 715-764. D. Reidel Publishing Company, 1984.
[6] Z. S. Harris, Structures Mathématiques du Langage. Edition Dunod, Paris, 1971.
[7] Z. S. Harris, La genèse de l’analyse des transformations et de la métalangue. Revue Langages, (99), 1990.
[8] J. Hintikka, The Semantics of Questions and the Questions of Semantics: Case Studies in the Interrelations of Logic, Semantics and Syntax. North-Holland, Amsterdam, 1976.
[9] J. R. Hobbs, On the Coherence and Structure of Discourse. Technical Report CSLI-85-37, Center for the Study of Language and Information, 1985.
[10] E. H. Hovy, Approaches to the Planning of Coherent Text. In W. R. Swartout, C. L. Paris and W. C. Mann Eds., Natural Language Generation in Artificial Intelligence and Computational Linguistics, pages 83-102. Kluwer Academic Publishers, 1991.
[11] W. Lehnert, The Process of Question Answering. Wiley, New York, 1978.
[12] W. C. Mann, Text Generation: The Problem of Text Structure. In D. D. McDonald and L. Bolc Eds., Natural Language Generation Systems, Symbolic Computation, pages 47-68. Springer-Verlag, 1988.
[13] W. C. Mann and S. A. Thompson, Rhetorical Structure Theory. In G. Kempen Ed., Natural Language Generation, New Results in Artificial Intelligence, Psychology and Linguistics, pages 85-96. Martinus Nijhoff Publishers, 1987.
[14] M. W. Meteer, Expressibility and the Problem of Efficient Text Planning. Communication in Artificial Intelligence Series, Pinter Publishers, 1992.
[15] J. L. Nespoulous and J. Virbel, Compréhension/mémorisation de textes de différentes structures par sujets normaux et pathologiques. Technical Report, Programme de Recherche en Sciences Cognitives de Toulouse, 1991.
[16] C. L. Paris, User Modeling in Text Generation. Communication in Artificial Intelligence Series, Pinter Publishers, 1993.
[17] E. Pascual, Représentation de l’architecture textuelle et génération de texte. PhD thesis, Université Paul Sabatier, Toulouse, 1991.
[18] E. Pascual and J. Virbel, Le problème de la génération de textes structurés. Septième Congrès Reconnaissance des Formes et Intelligence Artificielle, pages 1181-1188, Paris, 1989.
[19] G. Sabah, L’intelligence artificielle et le langage, processus de compréhension, volume 2, Editions Hermès, 1989.
1. This project is supervised by Jean-Luc Nespoulous and Jacques Virbel. It takes place in the
Research Program in Cognitive Science of Toulouse, National Center of Scientific Research
(Cognisciences Program). It is financed by the
Ministry of Education and Research.

Full text license: This text is republished here with permission from the original rights holder.

Conference Info

ACH/ALLC / ACH/ICCH / ALLC/EADH - 1996

Hosted at University of Bergen

Bergen, Norway

June 25, 1996 - June 29, 1996

Conference website: https://web.archive.org/web/19990224202037/www.hd.uib.no/allc-ach96.html

Series: ACH/ICCH (16), ALLC/EADH (23), ACH/ALLC (8)

Organizers: ACH, ALLC
