Mashing Texts: Supporting collections level text analysis

  1. 1. Peter Organisciak

    University of Alberta

  2. 2. Geoffrey Rockwell

    University of Alberta

  3. 3. Stan Ruecker

    University of Alberta

  4. 4. Susan Brown

    University of Guelph

  5. 5. Kamal Ranaweeram

    University of Alberta

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

Introduction At the 2005 Summit on Digital Tools in the Humanities
the need for tools for the Exploration of Resources
[1] was identified as one of four opportunities
for humanities computing tools. As a critical mass of
evidence useful to humanities research becomes available
on the web, researchers need tools for gathering
the resources they need to ask questions, assembling
and editing the evidence into study collections, and then
analyzing those collections. This paper will discuss the
Mashing Texts project [2] that has followed a persona
usability design process to develop stories and a prototype
for how a collections analysis tool might work to
support humanities research. In particular, we will:
1. Demonstrate the JiTR (Just-in-Time Research) prototype
and how it can be used to assemble a collection,
edit items, and analyze them
2. Discuss the persona design process and the stories
we generated to describe potential use of such a tool.
In particular we will outline two different communities
of users that ran through the design process.
3. Discuss the technical and service implications of
such a tool which has to integrate with digital colcollections,
digital archives, and text analysis systems. We believe the humanities computing community can
make the case for such an environment and that we can
outline models for integration so that such a tool can
interoperate with the content archives and text analysis
tools available.
JiTR Demonstration
The JiTR (Just-in-Time Research) prototype is the first
pass at realizing the Mashing Texts project. The simple
story about JiTR is that it lets you manage collections of
digital items and run tools that either gather items, clean
them or analyze them. In JiTR you can create collections,
add items to collections (manually or with spiders/
scrapers), edit the items (automatically or manually) and
then pass a collection as a single “text” to a text analysis
tool elsewhere. We will demonstrate a rapid research
cycle of gathering, editing and analyzing a collection.
Mashing Texts embraces principles of Internet mashups
in conceptualizing a recombinant environment whereby
the resultant product is more valuable than the sum of its
parts. JiTR is designed so that other tools can be plugged
into it like search engines and spiders to generate text
collections on demand from within the environment. As
part of the project’s emphasis on malleability of data,
JiTR allows those collections to be organized by various
tags and labels. The final part of the demonstration will
show some of the analysis tools that JiTR offers.
Persona-Scenario Methodology
JiTR is just a working prototype designed to test a design
hypothesis about a type of tool that we think is needed.
The research in this project has been the extensive design
process. If the end purpose is user functionality,
why not prototype around stories about users and what
they could do? This is the idea behind the personas and
scenarios design process, a usability design process that
we have used on other projects like the TAPoR portal.
Personas are imagined possible users that “act as standins
for real users” [3]. Personas are examples of people
who would potential like to use the system. They are
created to stimulate thinking about people, rather than
exclusively concepts. Once realistic user personas are
created, usages scenarios are described, which consider
ways users could possibly employ the final product. Scenarios
move into specifics, detailing the steps that the
user would follow in working with the system. Eventually,
the various scenarios are prioritized into primary
and secondary uses so you know what types of tasks the
product needs to support. You can also use the scenarios
to audit the prototype.
In this project we started with two constituencies that
we came to call DEEP (Distributed Electronic Editing
Platform) and BROAD (which does not yet stand for
anything) .
• DEEP personas would use JiTR to collaboratively
edit a rich born-digital collection like Orlando. To
this end we worked with the Orlando team to check
that the personas, scenarios, and tasks we described
were true to their experience.
• BROAD personas would use JiTR to rapidly gather
evidence from the web to study contemporary issues.
We imagined various users who want to use
the web as their text and therefore want to gather
subsets of web documents into study collections.
For example, a linguist might want to gather web
pages where real users use a language pattern.
Our hypothesis is that both these very different constituencies
actually need the same sort of tool and we wanted
to see if we could design something that would serve
both, another form of mashing, if you will.
This paper will outline the steps of this design process as
we believe it is particularly suited to humanities computing.
We will end by presenting the priority personas and
what these imagined people want to do.
Technical and Political Implications
It is one thing to prototype and test an idea for another
tool; it is another to develop a production tool that others
can use. We are particularly conscious that a tool like
JiTR needs to work within the ecology of tools available.
To that end we had a parallel process in the project to
identify the other tools, frameworks, and standards that
JiTR needs to work with so as to develop viable architectural
specification for the development of a production
version. This involved identifying the points of articulation
between JiTR and other tools. An obvious example
is how JiTR should work with repository systems life
Fedora. While our prototype has its own MySQL database,
a production version should not manage the repository
of texts in a collection. Instead it should have the
ability to push and pull texts from a repository, whether
it be Fedora or another system. Likewise JiTR should not
include any tools, but should have a plug-in architecture
for tools from spiders to text mining tools. Our prototype
has tools built in, but a production system would have an
interface for managing processes.
As with any development project the design process is
partly about deciding what you aren’t going to do. We
believe that a missing layer needed between repository
tools and text analysis tools is a collaborative research collections management environment. At the end of the
paper we will present the designs of how we think a
full system could support the research work flow of our
two user constituencies. How would an editor develop
a workflow for the editing of electronic texts in JiTR?
How would a researcher interested in the discourse on
the web about high performance computing gather documents,
clean them and analyze them?
[1] Summit on Digital Tools for the Humanities, http://
[2] Mashing Texts is supported by a Social Science and
Humanities Research Council of Canada Research and
Development Initiative grant. The project is openly documented
[3] Calabria, Tina. “An introduction to personas and how
to create them”,

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info


ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

  • Keywords: None
  • Language: English
  • Topics: None