Appropriate Use Case Modeling for Humanities Documents

  1. 1. Aja Teehan

    Maynooth University (National University of Ireland, Maynooth)

  2. 2. John Keating

    Maynooth University (National University of Ireland, Maynooth)

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

1 Introduction This paper argues that the most appropriate methodology
to use when modeling
a historical document for
a software environment is one that focuses on modeling
for functionality. This functionality is derived from Use
Case modeling
that should be undertaken in consultation
with the User Group. The Use Cases are an expression
of the ecological model as they embody the use of
the document, by the User, in the software environment.
The encoding mechanism largely practiced within the
humanities computing community is represented by the
TEI (Text Encoding Initiative)[1], which seeks to provide
a set of guidelines for encoding humanities documents.
However, TEI offers no guidance in relation to
creating an encoding of a document that is supportive of
the software environment
that will host it, or the User.
We argue that modeling with recourse to the Logical, the
Physical and the Interaction classes enables not just the
generation of an appropriate encoding scheme, but also
the software to manipulate it. The argument is framed
in relation to the creation of a digital edition of an 18th
century Spanish Account Book manuscript.
2 Why Should Humanities Researchers
Employ Use Case Modeling?
The humanities computing community currently lacks a
formalised framework in relation to how to approach the
task of modeling documents to reside in software
TEI (Text Encoding Initiative) focuses mainly
on the skill of encoding using TEI, rather than the knowledge
as to how best to encode, which is independent of
any technical language. We would agree with the Kings
College Humanities Computing course designers, who
seeks to promote knowledge
based, rather than skills
based, learning[2]. It is much more diffcult, and more
valuable, to create a good model than it is to encode with
technical validity.
More and more the humanities researcher is involved
in encoding their documents
but remain uninvolved in software design. Unfortunately, this disjoin often results
in clunky systems that are not used by the community.
Bradley argues that ”HC might be more influential if
it moved its operations closer to traditional scholarly
methods”[3]. In order to ”be in a better position to develop
a model of the role of computers that does more
to support humanities research”[3] the encoder must
consider the logical, physical and interaction classes in
relation to their own domain when creating the Use Case
and dependent encoding model. Modeling how the document
is to be used, right down at the low level technical
encoding, is of paramount importance.
3 Use Case Models
A Use Case acts as a blueprint for the system design and
typically depicts the steps an actor takes while interacting
with the software in order to achieve some meaningful
goal or task, goal being higher level and task being lower
level. These explicit steps are expressed in a formalised
diagram using UML (Unified Modeling
Language) and
can be used by the software engineer to create a supportive
software environment. This is a higher level of
abstraction. The Use Cases model the ecological system,
which is then used to build a software environment that
encapsulate the functionality required by the researcher.
At a much lower level of abstraction, the researcher who
is involved in encoding their source must create
a model
of the source document that supports, and is part of, the
ecological system model.
The Alcalá Project was originally proposed as a digital
humanities project to mark a humanities collaboration
between the University of Alcalá de Henares (UAH),
Spain, and the National University of Ireland, Maynooth
(NUIM), Ireland.
The source was to be a Spanish eighteenth
century account book recording the monthly expenses
of the Royal Irish College of Saint George the
Martyr. In eight weeks, the source manuscript was chosen,
encoded and made available in a web based, dual
language, searchable and interactive environment. More
it was developed to aid the historian in answering
historically pertinent research questions that are
specifically prompted by the historical object, an account
book. In the remainder of this paper we will refer to this
digital artefact for examples. Please see fi gure 1.
In the digital edition of the Alcalá Account Book manuscript
a transcription and translation are provided for the
manuscript and are presented along with facsimile images
of the original manuscript on a page-by-page basis.
In addition, it is possible to transfer specific expenses to
a datasheet for accounting operations to be performed
upon them. This functionality was provided as a result of
Use Case analysis. A typical example of the goal a user
might wish to accomplish using the original manuscript
might be, “calculate how much was spent on bread at
the college in 1778”. At a lower level of abstraction, this
Use Case requires six steps be performed by the User. If
the User speaks only English then they must (1) select
the translation as the version to search, (2) enter bread
as the keyword for the search, (3) examine the returned
facsimiles from 1778, (4) select the entries in the account
book pertaining to bread using the checkbox, (5) transfer
the pertinent entries to the datasheet for calculation, and
(6) switch to the datasheet view to read the total. 4 A Model Framework: Logical, Physical
and Interaction Classes
The above Use Case is a good example of how the software
environment should support the User. However,
there is no detail about how the steps should be achieved
by the software, instead everything is from the User’s
point of view. Additional steps must be performed by the
software environment, for instance, the fi rst step now becomes
(1a) Interface presents translation, transcription
and facsimile image of fi rst page (1b) User selects translation
as the version to search (1c) Interface presents
translation on the screen. This poses a problem for the
researcher charged with encoding the document.
The above Use Case requires information from three different
classes: the logical,
the physical and the interaction
classes. The logical class is a model of the content of
the document, e.g. monthly expenses; the physical class
is a model of the document e.g. pages; the interaction
class is a model of how the User interacts with the document
and, by extension, how the User interacts with the
software environment’s representation of that document. The encoding, though only part of the ecological system
model, must support the functionality in the Use Case
and thus must support the three classes: logical, physical
and interaction.
Failure to support the interaction class
will result in the software engineer being unable to fulfil
the Use Cases. For instance, in the Alcalá Account Book
encoding scheme each page was labelled with a unique
page identifier, unlike the manuscript where only pages
with text were marked by an archivist. This allowed the
software engineer to return the exact page that a search
requested. Without prior knowledge of the Use Case,
“search for page number 10”, and how the software engineer
would implement that Use Case (the interaction
class), and thus his technical requirements, this could not
have been performed. Thus, the humanities researcher
who encodes must also be aware of the interaction class
and the requirements of the software engineer.
Furthermore, the encoding of the logical model must
also take cognisance of the User’s requirements. A single
document can be researched in many different ways,
for instance a historian may be interested in the social
history captured in the Alcalá Account Book manuscript
or they may be interested only in the prosopographical
information that can be gleaned from it. Choices need
to be made in representing the contents of the document.
In relation to encoding specifically, tags must be
created to give context to content, and segmentation of
the document must be performed to decide what should
be contextualised. These decisions should all be derived
from the Use Case. While encoding the information the
encoder must always be mindful that the Use Cases can
be fulfi
lled. For instance, in the Alcalá Account Book
encoding each of the expenses was labelled separately
and broken down into its description and the sum spent.
This allowed us to manipulate the fi gures separately on
the datasheet so that mathematical operations could be
performed. Without this separation of the sum spent we
would have been unable to contextualise the fi gures as
“money” and thus would have been unable to fulfill the
Use Case, “how much was spent on bread in 1778?”.
Furthermore, without recourse to the interaction class
of this Use Case, the intuitive interaction offered
by the clickable facsimile would have been foregone.
In this alternative scenario, the fi gures required
by the query are directly manipulated on the facsimile
image. The User can click on those manuscript account
book expenses that they wish to interact with. The expense
items are simultaneously selected on the facsimile,
in the translation
text, and in the transcription text; they
can then be sent to the datasheet for further manipulation.
This simultaneous selection imparts to the User the sense
that all the version are integrated, and are representative
of the original encoding. The interaction becomes more
intuitive, closer to the usability of the original document,
but enhanced. Although it is both possible and necessary
for a researcher skilled in humanities to be the primary
articulator of Use Cases that encompass both the physical
and logical classes, it is more diffcult to articulate
those parts of the Use Case that derive from this interaction
class and a dialogue should be opened here with
a practitioner knowledgeable of Computer Science and
Software Engineering.
5 Limitations of Use Cases
Use Cases have some well documented problems associated
with them [4]. The most pertinent problem is
that Use Cases can only be successfully used when the
modeler has a full understanding of the problem domain,
in this case, some humanities data or object. This limits
their usefulness to cases when it is possible
to fully
understand the humanities data or object in question before
the digitisation takes place. Data is always created
in some context and can thus be understood. This is not
necessarily true for humanities objects, such as novels.
These are less definable and thus are sometimes digitised
to aid in the investigation
of their meaning. Use Cases are
less useful when the aim of the digitisation is to promote
prima facie discovery or investigation of the humanities
object. To overcome problems associated with requirement
drift it would also be important to combine this
approach with an iterative design process, as opposed
to a sequential.
This would ensure that the Use Cases
could also be updated to reflect the most current set of
requirements for the project and help to avoid, ”the biggest
iteration of all, going back to the start”[5]. The Use
Case is just one tool available to the humanities computing
researcher from the arsenal of software engineering
paradigms, for instance Rapid Application Development
[6] and Participatory
Design (where the end Users are
actively involved as consultants in the design of the software
ecological system) would be valuable [7]. Situating
the design and use of Use Cases within these software
engineering paradigms would be even more beneficial to
the humanities computing community.
6 Are Use Cases Redundant?
Use Cases are sometimes considered to be the expression
of the obvious through highly formalised means,
the implication being that the administrative overhead
incurred is not justified for the benefit that is produced.
It may seem obvious to state that without identifying and
then isolating the required pieces of information
for a
question, you cannot answer that question. However, this
is a very basic, and very valuable, step that is missing
from many digitisation projects. For instance, a digital
repository of “The Chymistry of Isaac Newton” [8] of fers diplomatic transcription, normalised version and
correlated facsimile image for many documents, including
Newton’s most complete laboratory notebook. The
documents are fully keyword searchable. The “ultimate
goal is to provide complete annotations for each manuscript
and comprehensive interactive tools for working
with the texts”[9]. There is no doubt that this is a very
valuable source. However, the encoding does not provide
for the functionality that one would initially expect
of such a collection, nor can this functionality be added
later, without significant recoding. For instance, there is
no support in the encoding for implementing a contextual
search for logical model elements such as “experiment”,
“apparatus”, “chemical”, “method” or “conclusion". Instead
the element tags are drawn from the prose, figures,
linking, analysis, names/dates, and transcription tag sets
of the TEI. Though already a very rich and rewarding
source, it would benefit greatly from this type of functionality.
Furthermore, once the Use Cases were elicited
and described, the additional work involved in encoding
this type of functionality would have been minimal. It
may seem obvious to software engineering
experts that
to build a system one fi rst has to define precisely what
the requirements of that system are, but this is not the
widespread practice within humanities computing.
7 Conclusion
Use Cases do not, of themselves, provide automatic
quality and clarity of a digital
artefact, or of the encoding
built upon it. They function as a tool to aid the improvement
of the ecological software environment so that the
main requirements
of the User can be satisfied. The creation
and implementation of the Use Case still requires
skill and knowledge, and still depends completely on the
writer of the Use Case, the software engineer and the
encoder. However, this level of knowledge would be demonstrably
more valuable to humanities researchers than
the text-encoding skills currently being promulgated. In
relation to ascertainig the appropriate level of abstraction
McCarty posed the question, “For us in the digital
humanities, when and how does it matter that we know
directly what’s in the cellar?”[10] . We contend, “that
from the outset (when it matters) researchers should
know how, at least at a detailed pattern level, what they
want to do, now and in the future (how it matters).” [11]
This detailed pattern level is exemplifi
ed by the knowledge
of how to create Use Cases. The researcher who
wishes to operate at a lower level of abstraction and actually
encode the humanities document must fi rst have
this high-level knowledge. In order to create an encoding
scheme based around a document they should, in addition,
have knowledge of the logical, physical and interaction
classes. Only then can they appropriately
that knowledge at the skill-level in an encoding. Both the
problem domain (humanities research) and the software
engineering pattern required to create appropriate Use
Cases are very demanding of the researchers involved.
Both areas demand high levels of expertise and understanding.
Consequently, it is unusual to fi nd this level
of specialisation in one person. The solution is not to
promote the assimilation of software engineering skills
within humanities disciplines, but rather to promote the
dialogue between the experts at a suitable design and abstraction
level—that of the Use Case.
1. : Tei: P5 guidelines. Text Encoding
Initiative. Available Online. Accessed 10 March,
2. Jessop, M.: Teaching, learning and research in fi nal
year humanities computing student projects. Literary
and Linguistic Computing 20 (2005) 295–311
3. Bradley, J.: What you (fore)see is what you get:
Thinking about usage paradigms for computer assisted
text analysis. Text Technology 14 (2005) 1–18
4. Firesmith, D.G.: Use case modeling guidelines. Technology
of Object-Oriented Languages, International
Conference on (1999) 184
5. Dominick, P.G.: Tools and Tactics of Design. Wiley
& Sons (2000)
6. Martin, J.: Rapid Application Development. New
York: Macmillan (1991)
7. : Participatory design.
Computer Professionals for Social Responsibility. Available
Online. (2008) Accessed 10 March, 2009.
Newton, Issac. “Newton's Most Complete Laboratory
Notebook”. The Chymistry of Isaac Newton. Ed. Newman,
W.R. 11 February 2006. Available online. Accessed
10 March, 2009.
9. : The
Chymistry of Isaac Newton. Ed. Newman, W.R. 11 February
2006. Available online. Accessed 10 March, 2009.
10. McCarty, W.: Signs of times present and future.
Available Online. Humanist Discussion Group Vol. 22,
No. 218 (2008) Accessed 10 March, 2009.
11. Keating, J.: Signs of times present and future. Available
Online. Humanist Discussion
Group Vol. 22, No.
219 (2008) Accessed 10 March, 2009.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None