How to Build a Textual Data Model

  1. Dino Buzzetti

    Università di Bologna (University of Bologna)


The purpose of considering, on the one hand, a type of
representation of textual variation based on external
structured information and, on the other, a type of
representation of external interpretational variants based
on a textualization of their comprehensive description is
to find proper ways of building an overall data model for
processing textual information of both kinds: the
representational (its expression) and the interpretational
(its content). We think that sheer representation, without
processing, leaves a digital (i.e. processable) edition
essentially incomplete.
The basic principle of interconnection between the “image”
or expression of the text [Segre, 1985: 378] and its
meaning or information content can be stated as follows:
“there may be different ways of understanding what is
said and different ways of saying what is meant,” or, to
put it in other words, “the fixity and invariance of the
expression (or the content) respectively entail the indetermination
and variance of the content (or the expression).”
[Buzzetti, 2004: 180] Applied to structural
representations both of the textual data (in terms of
“syntactic markup structures”) on the one side, and of
the corresponding information content (in terms of “objects,
properties, and relations” of specific semantic domains)
on the other, [Dubin, 2003: 2] the same principle
can be expressed by affirming “that the same markup
can convey different meanings in different contexts,”
and “that markup can communicate the same meaning in
different ways using very different syntax.” [Dubin and
Birnbaum, 2004: 1] Are there means of connecting the
two sides in a systematic way?
In his Kundige bok edition, Malte Rehbein has shown
that it is possible to produce a two-tiered representation:
a marked-up transcription of the text and its variants, and
a database that connects the definition of the several text
layers with further contextual information. Both representations
have an operational character, so that changes
in the markup modify the database entries, whereas
the database entries produce newly marked-up views of
the text. In their turn, the Signum group at the Scuola
Normale Superiore in Pisa have shown that a topic map
representation of the content of Bruno’s Eroici furori can
be connected with a joint visualization of textual occurrences
and their semantically related notions. A change
in the topic map produces a reorganization of the intratextual
relations, and conversely a new organization of
textual relations produces a different topic map.
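The two-way dependency described for the Kundige bok edition can be illustrated, in miniature, with a toy model. Everything in it is an assumption for illustration and not Rehbein's actual schema: the `<seg layer="…">` markup, the `segment` and `layer` tables and their columns, and the functions `load` and `view` are all hypothetical. Parsing the transcription rebuilds the database rows, and querying the database yields a new marked-up view restricted to selected layers and enriched with contextual information stored per layer.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical transcription: each <seg> records the text layer it belongs to.
TRANSCRIPTION = (
    '<text><seg layer="1">base text </seg>'
    '<seg layer="2">later addition </seg>'
    '<seg layer="1">continues</seg></text>'
)


def load(db: sqlite3.Connection, xml_source: str) -> None:
    """Markup -> database: rebuild the segment rows from the transcription."""
    db.execute("DELETE FROM segment")
    for pos, seg in enumerate(ET.fromstring(xml_source).iter("seg")):
        db.execute("INSERT INTO segment VALUES (?, ?, ?)",
                   (pos, int(seg.get("layer")), seg.text or ""))


def view(db: sqlite3.Connection, max_layer: int) -> str:
    """Database -> markup: a new view of layers up to max_layer, tagged by hand."""
    root = ET.Element("text")
    rows = db.execute(
        "SELECT s.layer, l.hand, s.content FROM segment s "
        "JOIN layer l ON l.id = s.layer WHERE s.layer <= ? ORDER BY s.pos",
        (max_layer,))
    for layer, hand, content in rows:
        seg = ET.SubElement(root, "seg", layer=str(layer), hand=hand)
        seg.text = content
    return ET.tostring(root, encoding="unicode")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE segment (pos INTEGER, layer INTEGER, content TEXT)")
# Contextual information attached to each layer (hypothetical values).
db.execute("CREATE TABLE layer (id INTEGER PRIMARY KEY, hand TEXT)")
db.executemany("INSERT INTO layer VALUES (?, ?)", [(1, "scribe A"), (2, "scribe B")])

load(db, TRANSCRIPTION)
print(view(db, 1))  # a view containing only the two layer-1 segments
```

Editing the markup and calling `load` again changes what `view` returns, while adding rows or columns to the `layer` table changes the markup of every generated view; this is the round trip the paragraph describes, reduced to its simplest form.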
Moreover, both case studies have implemented a comprehensive
representation of the variants, respectively
textual and interpretational, by means of the Multi-
Version Document data structure (MVD) introduced by
Desmond Schmidt. [Schmidt and Colomb, forthcoming]
The MVD data structure is a directed graph with
a start node and an end node, in which each textual layer or
interpretational description is represented by a different path
through the graph. A topic map representation can be serialized
in the XTM (XML Topic Maps) format and treated in exactly
the same way as a textual document. Both editorial
and interpretive practice imply a one-to-many relation
between the different textual or interpretational versions
and their comprehensive representation, or “logical
sum” [Thaller, 1993: 64; Buzzetti, 2002: 77-79],
although in opposite directions: the editor proceeds from the
many witnesses of the text to a single reconstruction, and
the literary critic from one edition to its many interpretations.
The MVD data structure seems to provide
a reliable and processable representation of the logical
sum of both textual and interpretational variants.
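A minimal sketch of the graph just described, assuming, as in published descriptions of the MVD, that each arc carries a text fragment together with the set of versions that share it. The class and method names (`Arc`, `MVD`, `add`, `read`, `all_versions`) and the two witnesses are hypothetical; the sketch does not merge versions automatically and is not Schmidt's implementation.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Arc:
    """One edge of the graph: a text fragment shared by a set of versions."""
    source: int
    target: int
    text: str
    versions: frozenset[str]


@dataclass
class MVD:
    """Toy multi-version document: every version is a path from start to end."""
    start: int
    end: int
    arcs: list[Arc] = field(default_factory=list)

    def add(self, source: int, target: int, text: str, *versions: str) -> None:
        self.arcs.append(Arc(source, target, text, frozenset(versions)))

    def all_versions(self) -> set[str]:
        """Every version whose path is recorded somewhere in the graph."""
        return set().union(*(a.versions for a in self.arcs))

    def read(self, version: str) -> str:
        """Follow the arcs labelled with `version` from start to end."""
        node, parts = self.start, []
        while node != self.end:
            arc = next((a for a in self.arcs
                        if a.source == node and version in a.versions), None)
            if arc is None:
                raise KeyError(f"version {version!r} stops at node {node}")
            parts.append(arc.text)
            node = arc.target
        return "".join(parts)


# Two hypothetical witnesses, A and B, that differ in one word.
mvd = MVD(start=0, end=3)
mvd.add(0, 1, "the ", "A", "B")
mvd.add(1, 2, "cat", "A")
mvd.add(1, 2, "dog", "B")
mvd.add(2, 3, " sat", "A", "B")

print(mvd.read("A"))  # the cat sat
print(mvd.read("B"))  # the dog sat
```

Shared text is stored once and variant text appears as parallel arcs, so the single graph stands for the whole set of versions. Since an XTM serialization is itself a document, competing interpretational descriptions could in principle be stored the same way, with version labels standing for interpretations rather than witnesses.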
The two case studies deal with different texts, but by
using the same kind of representation for both the textual
and the interpretational variants of the same text, we aim
to simplify the task of mapping them onto
each other according to the concept of a dynamic model
that can be elicited from the ambivalence of markup
[cf. Buzzetti, 2004 and Buzzetti, 2009]. As a diacritical
mark, markup is ambiguous and can be seen both as the
value and as the rule of a structuring operation. [Buzzetti
and McGann, 2006: 67-68] By applying this principle to
both the textual and the interpretational representations,
we can use either of them as a set of instructions
to restructure and reorder the other. A polysemic
textual fragment can easily be used to demonstrate
the working of such a model.
Can the MVD data structure provide a sound and effective
basis for mapping corresponding alternative paths
from the textual variation graph to the interpretational
variation graph and vice versa? The question is open.
But finding a proper data model for a viable computational
solution to this kind of mapping poses a crucial
challenge to any attempt to build a comprehensive
data model for textual information. We hope that
a closer illustration of the problem may help
in finding a solution.
[Buzzetti, 2002] D. Buzzetti, Digital Representation and
the Text Model, in New Literary History 33:1 (2002),
pp. 61-87.
[Buzzetti, 2004] D. Buzzetti, Diacritical Ambiguity and
Markup, in D. Buzzetti, G. Pancaldi, and H. Short (eds.),
Augmenting Comprehension: Digital Tools and the History
of Ideas, London-Oxford, Office for Humanities
Communication, 2004, pp. 175-188.
[Buzzetti, 2009] D. Buzzetti, Digital Editions and Text
Processing, in M. Deegan and K. Sutherland (eds.), Text
Editing, Print, and the Digital World, Aldershot, Ashgate, 2009, pp. 45-62.
[Buzzetti and McGann, 2006] D. Buzzetti and J. McGann,
Critical Editing in a Digital Horizon, in L. Burnard,
K. O’Brien O’Keeffe, and J. Unsworth (eds.),
Electronic Textual Editing, New York, The Modern Language
Association of America, 2006, pp. 51-71.
[Dubin, 2003] D. Dubin, ‘Object mapping for markup
semantics,’ in B.T. Usdin (ed.), Proceedings of the Extreme
Markup Languages 2003 Conference (Montreal,
Quebec, August 2003) <papers/extreme/proceedings/xslfo-pdf/2003/Dubin01/EML2003Dubin01.pdf>
(26 October 2008).
[Dubin and Birnbaum, 2004] D. Dubin and D. Birnbaum,
‘Interpretation beyond markup,’ in B.T. Usdin
(ed.), Proceedings of the Extreme Markup Languages
2004 Conference (Montreal, Quebec, August 2004),
(26 October 2008).
[Schmidt and Colomb, forthcoming] D. Schmidt and
R. Colomb, ‘A Data Structure for Representing Multiversion
Texts Online,’ International Journal of Human-
Computer Studies, forthcoming.
[Segre, 1985] C. Segre, Avviamento all’analisi del testo
letterario, Torino, Einaudi, 1985, p. 378.
[Thaller, 1993] M. Thaller, “Historical Information Science:
Is There Such a Thing? New Comments on an Old
Idea,” in T. Orlandi (ed.), Discipline umanistiche e informatica:
Il problema dell’integrazione, Roma, Accademia
Nazionale dei Lincei, 1993, pp. 51-86.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009


Series: ADHO (4)

Organizers: ADHO

  • Language: English