Standards, Specifications, and Paradigms for Customized Video Playback

panel / roundtable
  1. Jarom Lyle McDonald, Brigham Young University
  2. Alan K. Melby, Brigham Young University
  3. Harold Hendricks, Brigham Young University

Culture is fully inundated with video--from the
ubiquitous DVD and succeeding optical media
formats, to locally stored digital bits passed
between DVRs and video iPods, to over a billion
Internet-streamed videos a day. Unfortunately,
while those involved in humanities education
and research know how widespread video usage
is and are attempting to integrate such a rich
medium into what they do, they are more often
than not struggling, fighting against the medium
and associated baggage rather than using video
for their own purposes.
For all of the ways in which video differs
from other forms of media, perhaps the most
challenging obstacle to effectively utilizing video
assets as objects of teaching and research is
their inflexibility. Because of the complexity of
video technologies and the pressure of external
interests, video is an incredibly closed medium,
especially when compared to text, image, or even
audio. In many ways video resists fundamental
activities of digital humanities inquiry such
as metadata, structural, and segment analysis
and annotation. What's more, video also is,
technologically speaking, a linear medium; it
is (as much as, if not more than, other media)
architected to proceed continuously from point
A to point B, serving up bits in order and
only responding to very limited, legacy interface
controls. Even the "interactivity" touted by
content holders (such as DVD "extras") is a
rigid, linear interactivity, designed to keep the
control of playback under the stewardship and
limited scope of the video producer rather
than the needs of the learner, the desires of
the scholar, or the tastes of the consumer. To
encourage collaborative, reusable approaches to
video (while avoiding legal pitfalls or isolationist
tendencies that come with an extracted clip
approach), we need to incorporate a more
thorough, flexible, and widespread method of
video playback.
The papers in this panel will focus on data-
driven customized video playback (CVP), from
theory and methodology to real-world use cases
that are evolving and practical implementations
that are both already in use and
under development to meet the needs of the
Humanities today. The first presentation will
make the case for the fundamental groundwork
for video asset analysis and eventual customized
video playback, the Multimedia Content
Description Interface (also known as MPEG-7).
This XML standard for describing (both globally
and in timecode-associated ways) video assets
offers a markup solution that is complementary
to common video encoding containers (such
as MPEG-2 and MPEG-4), and, as XML,
can be easily coupled with other relevant
data and metadata standards as well. The
second paper will present an argument for
ways to take these video asset descriptions
and use them to enable both people and
technology to better facilitate customized video
playback using a videoclip playlist specification
(serializable as plain text, as XML, as JSON,
or as any other data exchange format). With
the segment descriptions of a thorough video
asset description, a videoclip playlist can then
define custom playback operations. The final
presentation will demonstrate several use cases
of customized video playback, along with
working models for achieving the type of
interactivity we desire with the technologies we
have today, including demonstration of a CVP
system in use at several university campuses.
The panel as a whole will argue for a unified
justification and methodology of customized
video playback, and will invite collaboration
from members of the Digital Humanities
community who wish to push these ideas
further, making our proposed
standards, specifications, and paradigms as
widespread, useful, and effective as possible.
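
To make the relationship between segment descriptions
and a videoclip playlist concrete, the following is a
minimal Python sketch. It reproduces neither the MPEG-7
schema nor the panel's playlist specification: the
Segment fields, the JSON keys ("asset", "entries",
"segment", "action"), and the function name
resolve_playlist are all hypothetical, chosen only to
show how a playlist might refer to described segments
and be resolved into an ordered set of time ranges.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One time-coded segment of a video asset description (hypothetical shape)."""
    seg_id: str
    start: float  # seconds from the beginning of the asset
    end: float    # seconds from the beginning of the asset
    label: str


def resolve_playlist(segments, playlist_json):
    """Turn a JSON playlist into an ordered list of (start, end) ranges to play."""
    by_id = {s.seg_id: s for s in segments}
    playlist = json.loads(playlist_json)
    ranges = []
    for entry in playlist["entries"]:
        seg = by_id[entry["segment"]]  # KeyError if the playlist names an unknown segment
        if entry.get("action", "play") == "play":
            ranges.append((seg.start, seg.end))
        # Any other action (here, "skip") contributes nothing to playback.
    return ranges


# Assumed example data: three described segments of one asset.
segments = [
    Segment("s1", 0.0, 42.5, "opening credits"),
    Segment("s2", 42.5, 310.0, "interview, part one"),
    Segment("s3", 310.0, 455.0, "archival footage"),
]

# The playlist reorders and filters segments without altering the video itself.
playlist_json = json.dumps({
    "asset": "example-asset",
    "entries": [
        {"segment": "s3", "action": "play"},
        {"segment": "s1", "action": "skip"},
        {"segment": "s2", "action": "play"},
    ],
})

print(resolve_playlist(segments, playlist_json))
# [(310.0, 455.0), (42.5, 310.0)]
```

Because the playlist holds only references and
operations, the same file could be shared and reused
against any copy of the asset, with no clip ever
extracted from it.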

Finding the Best in
Approaches to Video Asset
McDonald, Jarom Lyle
Brigham Young University, USA
From Google's Web Services to Wikipedia's
DBPedia project to the underlying architecture
of modern digital libraries, our notion of how
to make data more semantic is moving (slowly
but persistently) towards ideal principles that
the W3C lays out for what is commonly called
"the Semantic Web." This is even true for the
subject of my study, video data, albeit with much
less of a semantically-inflected critical mass.
There are a few solid, innovative investigations
(such as the BBC's video portal and the many
incarnations of the Joost video platform) that
are or have been working to bring technologies
such as metadata, RDF/RDFa, and SPARQL
to the storage and dissemination of video
(especially online video); but there is still a
lot of work that needs to be done in order to
make today's video assets truly useful in a way
that Tim Berners-Lee would approve of, as part of a
world "in which information is given well-defined
meaning, better enabling computers and people
to work in cooperation" (Scientific American).
While the Semantic Web includes a large
number of topics too broad to cover in this
proposal, I will focus on one particular aspect
of semantic markup that does apply to video
data. It is vital to underscore the unique nature
of video as an object of
--that is,
video is meant to be played for a viewer with
the linear, temporal nature in the forefront of
experiencing the video. Thus to describe the data
of a video asset, as a whole, in a useful way
would necessarily require a structured analysis
of more than just the metadata about the video
that you might be able to achieve with Dublin
Core, IEEE-LOM, or RDF; the most significant
need is a system that can connect such semantic
vocabularies to a thorough, analytic description
of the video content itself in as close an
approximation to the playback act as one might
reasonably be able to achieve--in other words,
a workable time-coded markup language. This
isn't to say that a video must necessarily be
viewed chronologically; rather, given that video
exists as bits served from time point A to time
point B, it must be described that way in order
to make use of the data encoded there. If a video
asset has the right description of its segmented,
time-coded parts (of which, naturally, there may
be many versions based on who is doing the
markup or who is using the materials), it will
eventually allow for more than just watching the
video; a segmentation model of video markup
is essential for enabling a system of interactive,
customized video playback.
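
As a rough illustration of what a segmentation model
offers, the sketch below (in Python) keeps two
independent time-coded descriptions of the same asset
and asks which annotations apply at a given moment of
playback. The layer names, the TimedSegment fields, and
the function segments_at are assumptions made for this
example only; they are not drawn from MPEG-7, TEI, or
SMIL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedSegment:
    start: float      # seconds, inclusive
    end: float        # seconds, exclusive
    annotation: str


def segments_at(layers, t):
    """Return {layer name: annotation} for every layer with a segment covering time t."""
    found = {}
    for name, segs in layers.items():
        for seg in segs:
            if seg.start <= t < seg.end:
                found[name] = seg.annotation
                break  # at most one annotation per layer in this sketch
    return found


# Two hypothetical versions of markup for one video, made by different people.
layers = {
    "instructor": [
        TimedSegment(0.0, 90.0, "scene 1: arrival"),
        TimedSegment(90.0, 240.0, "scene 2: negotiation"),
    ],
    "linguist": [
        TimedSegment(12.0, 30.0, "formal greeting"),
        TimedSegment(95.0, 130.0, "idiomatic refusal"),
    ],
}

print(segments_at(layers, 100.0))
# {'instructor': 'scene 2: negotiation', 'linguist': 'idiomatic refusal'}
print(segments_at(layers, 60.0))
# {'instructor': 'scene 1: arrival'}
```

Keeping each version of the markup as its own layer
lets competing descriptions coexist, so that a playback
system can choose whichever one suits the viewer at
hand.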
Several candidate languages are available and
have been explored to some extent, both
commercially and academically, but none
is completely satisfactory. Naturally, given
the success of the Text Encoding Initiative, it
makes sense to consider its ability to function
as a time-coded video markup system. In fact,
Reside (2007) and Arneil and Newton (2009)
have presented just such an idea at recent
Digital Humanities conferences. The flexibility
and thoroughness of the TEI make it an
attractive option; however, while the speech
transcription models can potentially provide
time-coded descriptions of spoken elements
of a video (and even be retrofitted to other
elements of video content), because the TEI is
a text-encoding framework, it lacks a temporal
segmentation scheme designed specifically for
existing models of video encoding and playback
(for example, referring to multiple video or
audio tracks, multiplexing metadata with the
binary streams, etc.). Most projects exploring
video markup descriptions also mention the
W3C's Synchronized Multimedia Integration
Language (SMIL). Since version 3.0, SMIL
integrates a temporal segmentation model
with one for spatial fragmentation, allowing
semantic relationships both within and between
video elements. What's more, SMIL is a W3C
recommendation, offering the potential for
tighter integration with web-delivered video
as it continues to mature. Several commercial
endeavors (including the streaming platform
Hulu) have incorporated SMIL into their
playback process, allowing for a sophisticated
combination of video annotation (for example,
Hulu uses it for their advertisements and
upcoming social viewing features) and search/
retrieval (combining the time-coded markup

Conference Info


ADHO - 2010
"Cultural expression, old and new"

Hosted at King's College London

London, England, United Kingdom

July 7, 2010 - July 10, 2010

Series: ADHO (5)

Organizers: ADHO

  • Language: English