The "Document" in Document Architecture

  1. Helena Francke

    Swedish School of Library and Information Science - University of Borås

The concept of document architecture (DA) surfaces now and again in computer discourse, often with slightly different meanings. Sometimes it refers to the architecture of document management software; sometimes it denotes the organisation of document components, with a primary focus on the text. This presentation takes as its point of departure how document architecture is defined in the SGML ISO Standard (8879:1986) and what the term “document” implies there. It then gives an overview of how the document concept has been discussed and employed in various ways within Library and Information Science (LIS), and asks whether this discussion can serve to broaden the document concept in document architecture in a meaningful way.

The context in which this question is posed, and for which it is of particular importance, is my Ph.D. thesis, in which I attempt to use the metaphor “document architecture” as the basis for a model for the analytical study of documents with regard to media-specific characteristics (cf. Dahlström & Gunnarsson 2000). In some ways the model resembles Katherine Hayles’ Media-Specific Analysis, MSA (Hayles 2002), although its focus is not primarily on literary works, and specifically not on the meaning of the text. Rather, the document is seen as a crucial unit in the organisation of knowledge, serving to contextualise and materialise our interaction with that elusive phenomenon we tend to term “information”. However, the DA approach shares MSA’s interest in how the material of the document influences the text, as well as MSA’s applicability in the analysis of different media.

Within LIS, the view of what constitutes a document covers a wide span. Smiraglia’s quite concise “document = item” (in Smiraglia 1992, quoted in Smiraglia 2001, p. 149), Ranganathan’s media-restrictive “a record on a more or less flat surface” (Ranganathan 1963, p. 41, quoted in Buckland 1997, p. 807) and Suzanne Briet’s encompassing “all concrete or symbolic indexical signs [indice, transl. comment], preserved or recorded toward the ends of representing, or reconstituting, or of proving a physical or intellectual phenomenon” (Briet 2003) are some examples of how the definitions may differ in explicitness and focus. Implicit in all of these quite diverse definitions is the physical or material aspect of the document. The document is often seen as the material item that carries the abstract work; the object of an act of documentation. This aspect is absent from the SGML Standard definition of document, which reads: “A collection of information that is processed as a unit” (ISO 1986, p. 10). Here, the view of the document lies closer to a logically defined concept and relates closely to the SGML document, which is in essence logical and hierarchical, and which brings to the fore the relationship between the notions of document and text.

Although obviously different, all of the definitions above, with the exception of Briet’s, fit into the same category in the taxonomy of document research suggested by Roger Pédauque and a number of his colleagues (Pédauque 2003): the category of research that privileges the form and the material and structural aspects of the document. The wide span within the category is primarily due to the difference in medium of the documents that make up the conceptual basis in the different cases: the print document is described as “medium + inscription” (p. 5) and the electronic document as “structure + data” (p. 6). The category concerned here, Form, is one of three categories; the other two are Sign and Medium. Each of the categories is related to one aspect of a reading contract between a reader and a producer (p. 4). The document as form is concerned with “an object of communication governed by more or less explicit formatting rules that materialize [the] reading contract” (ibid.), which is interpreted in terms of legibility. The other two aspects are intelligibility and sociability, the former having to do with the sign (or inscription) and its meaning, and the latter with the social function that the document plays as “a tangible element of communication between human beings” (p. 17). Within LIS, document definitions can be found that fit within one or more of these aspects, and the question is whether the notion of a reading contract can be used in a meaningful way in connection with LIS document understandings to inform a model of document architecture, uniting the three aspects of the contract.

Another interesting idea when it comes to definitions is the notion of fuzzy sets. Often, definitions constitute an attempt to list the inherent properties that unite that which the concept comprises. Lakoff and Johnson (1981) argue that concepts are defined in social interaction and in language use. Therefore, “definition is not a matter of giving some fixed set of necessary and sufficient conditions for the application of a concept […]; instead, concepts are defined by prototypes and by types of relations to prototypes” (Lakoff and Johnson 1981, p. 125). We may speak of some documents as being close to the centre, and others as being further out towards the limits of the document concept set. Thus, an admittedly quite unscientific but intuitive assumption may situate a traditional printed book close to the centre of the set. It has several properties that we intuitively associate with a document: intentionality, materiality, storage potential, a communicative function, and artefactuality. This can also serve to explain why we may have some trouble accepting, for example, Chinese water poetry as a document (it lacks storage potential). It is placed at a certain distance from the centre of the set, and shares only some of the prototype’s properties.

In the presentation, I will test the appropriateness of discussing the document concept in terms of a combination of Pédauque’s reading contract and the document as a fuzzy set. The context for the discussion will be a model of document architecture intended for the analysis of traditional and new media, which may in turn serve to highlight similarities and differences in how we may approach and work with documents in different media.

Different disciplines, and indeed schools within the disciplines, have very different answers to the question of “what is a document.” In conducting a conceptual discussion of the term “document”, especially in relation to electronic media, aspects of both computer science and a number of diverse humanities disciplines may prove highly relevant as sounding boards. This is one of my incentives for wanting to discuss these questions in the multidisciplinary environment at ALLC/ACH.

Works Cited
Briet, Suzanne (2003). “What is Documentation?” Transl. Ronald E. Day and Laurent Martinet. [Qu’est-ce que la documentation? Paris: Éditions Documentaires Industrielles et Techniques, 1951.] Detroit: Wayne State University.
Buckland, Michael K. (1997). “What Is a ‘Document’?” Journal of the American Society for Information Science 48.9, 804-809.
Dahlström, Mats and Mikael Gunnarsson (2000). “Document Architecture Draws a Circle: On Document Architecture and Its Relation to Library and Information Science Education and Research.” Information Research 5.2.
Hayles, N. Katherine (2002). Writing Machines. Cambridge, MA: MIT Press.
ISO - International Organization for Standardization (1986). Information Processing: Text and Office Systems: Standard Generalized Markup Language (SGML). 1st ed. International Standard ISO 8879-1986 (E).
Lakoff, George and Mark Johnson (1981). Metaphors We Live By. Chicago & London: University of Chicago Press.
Pédauque, Roger T. (2003). “Document: Form, Sign and Medium, as Reformulated for Electronic Documents.” 3rd version, 8 July 2003. CNRS Information and Communication Science and Technology (STIC) Department.
Ranganathan, S. R. (ed.) (1963). Documentation and Its Facets. London: Asia Publishing House.
Smiraglia, Richard P. (1992). Authority Control and the Extent of Derivative Bibliographic Relationships. Ph.D. Diss. University of Chicago.
Smiraglia, Richard P. (2001). The Nature of “A Work”: Implications for the Organization of Knowledge. Lanham, MD & London: Scarecrow Press.


Conference Info



Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC