Most video information retrieval systems today rely on a set of computationally extracted video and/or audio features, which may be complemented with manually created annotations; such annotations are arduous to create and often fall short of capturing the content. In this research we set out not only to find the computational features relevant to movies, but also to investigate how much information could be semi-automatically extracted from the documentation created during the production stages a film goes through. Our hypothesis was that this documentation, carefully created for the realization of a movie, would prove highly useful as a source of metadata describing the finished movie.
This paper presents research done at the MediaTeam Oulu research group on identifying and structuring the key metadata descriptors applicable to motion picture content analysis and to the use of motion picture content in video-on-demand applications. The underlying research includes a study of the concepts involved in narrative structure and a series of empirical viewer tests, through which a key set of metadata describing the content and structure of motion picture material was identified. Parts of this research were conducted in co-operation with the Department of Finnish, Information Studies and Logopedics at the University of Oulu.
First, established theories and concepts of narration were used to examine how movies are structured and what mechanisms are used to package and convey information to the viewer. Studying the formalisms and conventions governing film scripts and screenplays, we attempted to identify and characterize the key elements in movies, e.g. characters, actions and various types of plot points. In addition to these primary elements, we also looked at supporting elements, i.e. the kinds of production-related techniques and instruments filmmakers use to guide the viewer's attention throughout a movie. We found that, for example, editing rhythm and various changes or events in the audiovisual composition of a movie are among the most frequently used instruments for highlighting sections of a movie where the viewer should pay more (or less) attention. Our goal in studying these conventions and storytelling instruments was to first understand the domain of the filmmaker, the intended form and function of a movie, before looking at what actions or reactions the movie causes on the part of the viewer.
Second, a series of tests using short clips, trailers and an entire movie was carried out in order to investigate how people perceive and process movies. Questionnaires and interviews provided information on what kinds of things viewers notice and how they later describe what they have seen. This information was used to arrive at a key set of metadata that models movies using the same concepts viewers use in describing them. Our task was then to match these concepts to the instruments used by the filmmakers, in order to find a metadata model that is both usable and feasible to create through a semi-automatic process from what the movie offers. Furthermore, the model thus constructed was designed to be hierarchical in order to facilitate dynamic control over the level of detail in any given metadata
category, thereby enabling, for example, the smart
summarization of movies on multiple levels. This metadata
model then became the starting point for the design of the actual browser that an end-user would utilize in
navigating and searching movies.
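The idea of one hierarchical model with adjustable detail can be sketched in a few lines. The `StoryUnit` structure and its level names below are illustrative assumptions, not the project's actual schema; the point is that a single node type can serve acts, segments and scenes alike, so any metadata category can be read at a chosen level by walking the tree:

```python
from dataclasses import dataclass, field

# Illustrative node of a hierarchical movie description: the same
# structure serves acts, segments and scenes, each carrying its own
# metadata (participants, motion activity, sound properties, ...).
@dataclass
class StoryUnit:
    label: str                      # e.g. "Act 1", "Scene 1.2"
    level: str                      # "movie" | "act" | "segment" | "scene"
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def units_at_level(root, level):
    """Collect all units on one level of detail (e.g. every scene)."""
    found = [root] if root.level == level else []
    for child in root.children:
        found.extend(units_at_level(child, level))
    return found

# A toy two-act hierarchy:
movie = StoryUnit("Movie", "movie", children=[
    StoryUnit("Act 1", "act", children=[
        StoryUnit("Scene 1", "scene", {"participants": ["ANNA", "BEN"]}),
        StoryUnit("Scene 2", "scene", {"participants": ["ANNA"]}),
    ]),
    StoryUnit("Act 2", "act", children=[
        StoryUnit("Scene 3", "scene", {"participants": ["BEN"]}),
    ]),
])

print([u.label for u in units_at_level(movie, "scene")])
# prints ['Scene 1', 'Scene 2', 'Scene 3']
```

Summarization on multiple levels then amounts to choosing how deep into the tree a view descends.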
The next step was to identify the main sources of this metadata and the best methods for extracting it. In studying the movie production stages and the documentation relating to each stage, we found that the final script and storyboards, as well as the audio and video tracks of the finished movie, were the most interesting and most practical sources. A
comprehensive suite of both automatic and semi-
automatic tools for processing these documents and media
objects was developed in order to extract the necessary features and information, such as the participants in any given scene, the speakers and the level of interaction between them, motion activity detected on the video track and a wide range of sound properties extracted from the audio track. These tools included a ScriptTagger,
StoryLinker, Scene Hierarchy Editor and a management tool for the underlying database.
The ScriptTagger is a tool that takes in the raw text of the script and automatically turns it into a structured XML document, which can then be refined semi-automatically to facilitate the tagging of more advanced features. The StoryLinker is a tool used to bring
together the script, the storyboards and the video and audio tracks of the movie itself to form individual
scenes, where all of the above are linked. The Scene Hierarchy Editor is a semi-automatic tool for grouping together individual scenes using the model constructed on the basis of narrative structure and viewer tests, thus
constructing a hierarchical description of the movie. In addition to these, a number of tools were used to
extract the audio and video features. A management tool
was used to combine the output produced by all the tools above to construct a uniform database.
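As a rough illustration of the kind of processing the ScriptTagger performs, the sketch below turns a screenplay fragment into structured XML using common screenplay conventions (scene headings beginning with INT./EXT., all-caps character cues). The tag set (`script`, `scene`, `speech`, `action`) and the parsing rules are assumptions made for illustration, not the tool's documented behavior:

```python
import re
import xml.etree.ElementTree as ET

def tag_script(raw_text):
    """Turn raw screenplay text into a simple structured XML tree.

    A minimal sketch of ScriptTagger-style processing; the element
    layout here is an assumed, simplified schema.
    """
    root = ET.Element("script")
    scene = None
    speaker = None
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            speaker = None                              # blank line ends a speech
            continue
        if re.match(r"^(INT\.|EXT\.)", line):           # scene heading
            scene = ET.SubElement(root, "scene", heading=line)
            speaker = None
        elif scene is None:
            continue                                    # text before first scene
        elif line.isupper():                            # character cue
            speaker = ET.SubElement(scene, "speech", character=line)
        elif speaker is not None:                       # dialogue line
            speaker.text = (speaker.text or "") + line
        else:                                           # action description
            ET.SubElement(scene, "action").text = line
    return root

sample = """\
INT. KITCHEN - NIGHT
Anna pours coffee.

ANNA
You're late again.
"""
xml = tag_script(sample)
print(ET.tostring(xml, encoding="unicode"))
```

From structure like this, a participant list per scene falls out directly, which is exactly the kind of feature the tool suite extracts.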
Ultimately, a prototype of a content-based browser for accessing movies using the metadata model and specialized
feature visualization was developed. The implemented browser prototype is a multimodal, content-based retrieval
system that enables a wide range of searches based on
the hierarchical metadata. This prototype offers users
customized views into a movie using user-selectable criteria.
The main view offered by the prototype is a navigable hierarchical map of the movie, in which the user-selected features are visualized as navigation aids. Users can move from a higher level down to increasingly detailed descriptions of the movie, i.e. from acts to segments and ultimately to scenes, letting them navigate down to a particular sequence and compare it with similar sequences in other movies in the database. Alternatively, users may progress through a
selected movie on some chosen level of detail, seeing the structure of the movie according to the criteria they have chosen. The criteria can be changed at any level or point while browsing: features can be added or removed as the user sees fit, based on how applicable they are to the material on a given level and how well they answer the user's overall search needs. The browser can visualize any new features as long as they are submitted to the system in the correct format.
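This browsing behavior can be sketched as a projection of scene records onto a chosen level of detail with user-selected features. The feature names and record layout below are illustrative assumptions, not the prototype's actual data model:

```python
# Toy per-scene feature records; values are illustrative.
SCENES = [
    {"id": "1.1", "act": 1, "motion": 0.2, "speech": 0.9, "music": 0.1},
    {"id": "1.2", "act": 1, "motion": 0.7, "speech": 0.3, "music": 0.6},
    {"id": "2.1", "act": 2, "motion": 0.9, "speech": 0.1, "music": 0.8},
]

def view(scenes, level, features):
    """Project the movie onto one level of detail with chosen features.

    level="scene" lists every scene; level="act" averages each chosen
    feature over the scenes of an act, giving a coarser map of the movie.
    """
    if level == "scene":
        return [{"id": s["id"], **{f: s[f] for f in features}} for s in scenes]
    acts = sorted({s["act"] for s in scenes})
    return [
        {"id": f"Act {a}",
         **{f: round(sum(s[f] for s in scenes if s["act"] == a) /
                     sum(1 for s in scenes if s["act"] == a), 2)
            for f in features}}
        for a in acts
    ]

print(view(SCENES, "act", ["motion"]))             # coarse view, one criterion
print(view(SCENES, "scene", ["motion", "music"]))  # finer view, criteria changed
```

Changing the criteria mid-browse is then just calling the projection again with a different feature list, at whatever level the user currently occupies.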
The browser and its associated metadata creation tools have numerous applications, ranging from commercial video-on-demand services for consumers and media publishing houses to more research-oriented uses, such as analysis tools for film studies and for linguistic research into dramatic content, for example movies and television series. The system could, for example, be used to find certain kinds of conversations or sequences of interest based on their content, structure or audiovisual makeup. The tool is well suited to combining non-linguistic information with linguistic information, which has various applications in the study of multimodal content, for example in the investigation of expressions of stance, where paralinguistic features complement the linguistic realization of attitude and emotion.
The ideas developed and lessons learned from the construction of these tools and the browser will also be applied to a new electronic framework for the collection, management, online display, and exploitation of corpora, which is being developed within the LICHEN project (The Linguistic and Cultural Heritage Electronic Network).
MediaTeam Oulu research group, http://www.mediateam.oulu.fi/?lang=en
Department of Finnish, Information Studies and Logopedics, University of Oulu, http://www.oulu.fi/silo/
Field S. Screenplay – The Foundations of Screenwriting. Dell Publishing, 1994.
Adams B. Mapping the Semantic Landscape of Film: Computational Extraction of Indices through Film Grammar. Ph.D. thesis, Curtin University of Technology, 2003.
Lilja J, Juuso I, Kortelainen T, Seppänen T, Suominen V. Mitä katsoja kertoo elokuvasta – elokuvan sisäisten elementtien tunnistaminen ja sisällönkuvailu [What the viewer tells about a movie – identifying a movie's internal elements and describing its content]. Informaatiotutkimus 23, 2004.
Opas-Hänninen LL, Seppänen T, Juuso I, Hosio M, Marjomaa I, Anderson J. The Linguistic and Cultural Electronic Network (LICHEN): Digitizing and disseminating linguistic data for research and enhancement of the future of minority languages. Second International Conference on Arctic Research Planning (ICARP II), November 10-12, 2005, Copenhagen, Denmark.
Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)
July 5, 2006 - July 9, 2006
151 works by 245 authors indexed
The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.
Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/