An Approach to Treating Videos as Academic Documents

Stewart Arneil; Greg Newton

Authorship

1. Stewart Arneil

University of Victoria
2. Greg Newton

University of Victoria

Work text

This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

As the scope of what constitutes a text expands to
include time-oriented media such as video, it is becoming
increasingly popular to incorporate multimedia
in to the contexts of research and teaching. As audiences
become more discriminating so too have designers and
developers begun to push the limits of technology to accommodate
these new appetites for more media, more
features, more collaboration- more everything. Applications
to collect and disseminate media to an engaged
audience are becoming so sophisticated that we have
entered in to a realm where our technical reach sometimes
outstrips the users’ capacity to imagine its use; a
situation occasionally referred to as an answer without a
question. But we did have a question: how best to take
several hundred videotapes of high-profile guest lecturers
collected over the course of 20 odd years and turn
them into an engaging, easy to use webbased application
that provided users with more than just a talking head?
Our office provides specialized software development
service to, and collaborates in research with, faculty
members in at least a dozen disciplines. Our preference
is to abstract from the immediate task and look
for ways to make our work extensible or transferable to
other projects. What we were looking for was a general
purpose tool to apply to videos to help with what John
Unsworth calls “scholarly primitives”: Discovering, Annotating,
Comparing, Referring, Sampling, Illustrating,
Representing. (Unsworth, 2000)
Our initial objective was to provide simultaneous transcription
and simple valueadded features; ancillary information
in the form of links to other sites and images
germain to the current utterance. We considered existing
technologies for marking up and presenting videos such
as those in use at MIT’s OpenCourseWare <http://ocw.
mit.edu/> and TED <http://www.ted.com>) but they did
not provide anything more sophisticated than metadata
and full text search, let alone provide for multiple channels
of time-related support material. On the other hand,
our users were not interested in highly detailed performance
attributes, such as those described by Saltz (Saltz
2004). We looked at SMIL, but quickly discarded it in
light of the immaturity of existing playback systems. It
did, however, provide us with a model. XML is a heavily
used technology in our shop and we were able to produce
a TEI schema for encoding transcripts based on conventional
semantics of utterances: we specifically used the
Transcriptions of Speech module to encode all information.
Although there was a paucity of existing software that
would provide us with an inclusive playback mechanism
it was not beyond the project’s scope to produce
our own. Indeed, the more we looked at our needs, the
more reasonable it seemed. Our specification ended
up being rather short: XML would provide the natural
structure that such texts demand; multi-modal data
streams would remain separate both in terms of storage
and delivery, allowing us to abstract code such
that we could remove any dependence upon a single
media player; users should also be able to bookmark,
and therefore cite, specific points in the video. Implementation, then, would focus on treating the document
as an academic “paper”, with a feature set that anticipates
the expectations and requirements of a scholarly
user. Each timeline (transcript, events, commentary)
consists of a list of when elements; each when element
identifies timestamps in the video and relates them to
xml elements in the file. The XML files are stored in an
XML data base (eXist), which allows for highly sophisticated
xqueries if necessary. Identifying the elements in
the video stream and marking up the support documents
are done manually with commercial video playback and
XML editors (see figure 1). Our approach is complementary
in many ways to that of the AXE project (Reside,
2007).
A proof-of-concept was constructed using PHP and relying on the QuickTime player, due to its rich javascript
API. As QuickTime announces its play head position the
page determines which utterance in each timeline is current
and displays a quickly digestible block of text to the
viewer for each timeline (see figure 2). Any given utterance can be bookmarked and stored for
later retrieval, providing a pinpoint-accurate citation
which addresses the video itself; a “video quote” if you
will (see figure 2). In addition, when the user hovers over
the bookmark, the text of the utterance appears. The entire
corpus or a specific video are fully searchable, with
results being displayed as direct links in to the video.
The same interface conventions are used for the search
feature. Alternative views of the transcript are also available,
including viewing the entire text on-screen, or
choosing a XHTML or XML (P5 TEI) download. This small proof-of-concept was modified slightly for a
French instructor. She wished to assemble a number of
short videos of French-speaking people from around the
world with two goals: to improve students’ francophone
cultural literacy, and to have a corpus amenable to various
kinds of linguistic research. (Caws, 2007) We continue
to add videos to this collection as we obtain them
from around the world.
Our original task of “rescuing” the hundreds of videotaped
lectures has been scaled up to become a real project,
encouraging us to re-imagine the possibilities and
address the shortcomings of our proof-of-concept application.
We see version 2 including a more refined feature
set, with the codebase moving from PHP to Cocoon in
order to improve the portability and modularity of the
system. Refinements will include an online system for
writing transcriptions and reducing our dependency on
media players by utilizing new features in HTML5. This
functionality can also be used to provide an annotative
channel that is accessible to all users. Storage and “playback”
of annotative snippets can provide a rich layer of
added value without incurring large investments of development
time because it recycles the immensely useful
transcription code; this wiki-like feature has obvious
value in both teaching and research contexts.
Working backward from the concept of the “video
quote” we began to imagine a context in which an entire
corpus of video texts is peer-reviewed and published online.
Not original, unfortunately (http://www.jove.com/,
http://www.vjortho.com/), but the recent emergence of
such journals indicates that we may all be re-thinking
what constitutes an academic journal. We do not envisage
a simple transplant of the format in use by the above
journals. Rather, we will have to extend it to include rich
layers of annotative, transcriptive and ancillary information
which can provide a discoverable and searchable
corpus of texts which can be referenced on a granular
level, thus providing the opportunity to sample, compare
and “mash up” a corpus of video-based documents to
meet either research or instructional goals.
References
Arneil, S (2007) Francotoile, Your Gateway to Francophone
Culture http://francotoile.uvic.ca/search.php (accessed
1 Nov 2008).
Caws, C. and Arneil, S. (2007). FrancoToile: a digital
video library to develop cultural literacy in French. In
C. Montgomerie & J. Seale (Eds.), Proceedings of World
Conference on Educational Multimedia, Hypermedia
and Telecommunications 2007 (pp. 1989-1994). Chesapeake,
VA: AACE.
Elkink, M (2006) The Lansdowne Lectures Online,
http://lettuce.tapor.uvic.ca/~taprlans/ (accessed 1 Nov
2008).
Reside, D. (2007). The AXE Tool Suite: Tagging Across
Time and Space http://www.digitalhumanities.org/
dh2007/abstracts/xhtml.xq?id=145 (accessed 1 Nov
2008).
Saltz D Z (2004). Performing Arts. In Susan Schreibman,
Ray Siemens, John Unsworth (eds), A Companion
to Digital Humanities. Oxford: Blackwell, 2004.
http://www.digitalhumanities.org/companion/view? xml&chunk.id=ss1-2-11&toc.depth=1&toc.id=ss1-2-
11&brand=default (accessed 1 Nov 2008).
Unsworth, J. (2000). Scholarly Primitives: what methods
do humanities researchers have in common, and
how might our tools reflect this?. http://jefferson.village.
virginia.edu/~jmu2m/Kings.5-00/primitives.html (accessed
1 Nov 2008).

Full text license: This text is republished here with permission from the original rights holder.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Conference website: http://web.archive.org/web/20130307234434/http://mith.umd.edu/dh09/

Series: ADHO (4)

Organizers: ADHO

An Approach to Treating Videos as Academic Documents

1. Stewart Arneil

2. Greg Newton

ADHO - 2009