Bringing Southern Oral Stories Online

poster / demo / art installation
Authorship
  1. Natasha Smith

    University of North Carolina at Chapel Hill

  2. Hugh Cayless

    New York University

  3. Joshua Berkov

    School of Communication Arts

  4. Cliff Dyer

    University of North Carolina at Chapel Hill

Work text

In recent years, oral histories have become an alternative
medium for interrogating the past and have assumed
a prominent place in historical inquiry. They
offer unique perspectives from individuals who have
witnessed history in the making and often yield unparalleled
insight into the lives and times that they record.
Long constrained by the media used to record them,
oral histories are increasingly the target of digitization
initiatives that seek to preserve these voices and make
them heard by disparate audiences. “Oral Histories of the
American South” (http://docsouth.unc.edu/sohp/) is just
such an initiative.
From its beginnings as a pilot project in 2004, this endeavor,
funded by the Institute of Museum and Library
Services, grew rapidly and attracted attention.
Documenting the American South, a digital publishing
program at the Carolina Digital Library and Archives
(CDLA), worked in close cooperation with a number
of departments at the University of North Carolina at
Chapel Hill – the UNC Library, the Southern Oral History
Program (SOHP), and the School of Education – and applied
new technologies, open standards and some of the tested
practices highlighted in other DocSouth collections. Far
from being simply a collection of digitized documents,
these oral histories undergo rigorous analysis by subject
specialists. The practice of applying such scholarship
has added considerable value to other recently published
collections and has brought together the perspectives of
historians and the first-hand experiences of witnesses to
history. Finally, it is an experimental project to build an
interface that displays interview audio and transcripts simultaneously.
The project will be complete by the time of the conference
in June 2009, and we propose to present a poster on the
work we have done, highlighting the challenges and joys
experienced by the project team of librarians, humanists,
technologists, and – not to forget – the users who
participated in several usability tests and studies.
The process of creating digital documents of these oral
histories is indeed a challenge. From the beginning, the
process of selection is a daunting prospect unto itself. It
is the task of historians from the SOHP to select 500+
representative interviews from a collection that now
numbers well over 4,000. This includes careful attention
to privacy and copyright issues. Only interviews free of
restriction are considered for inclusion in the project collection.
Cassette tapes and typescripts are the raw materials from
which these interviews are remade into digital objects.
Audio engineers at the Southern Folklife Collection,
using the best available hardware, software, and their
own considerable expertise, produce digital audio files
in both WAV (preservation) and MP3 (access) formats.
These audio data are of the highest possible quality, and
comply with international standards and best practices
for the creation and preservation of digital audio content.
The typescripts are all encoded in TEI P4 (with a plan to
convert to P5), conforming to Level 4 of the TEI in Libraries
best-practice guidelines. But the story doesn’t end here…
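As a rough illustration of the planned P4-to-P5 migration, a conversion pass might be scripted with lxml as in the Python sketch below; the stylesheet name and directory layout are assumptions for illustration, not details taken from the project's actual workflow.

from pathlib import Path
from lxml import etree

# Load a hypothetical P4-to-P5 conversion stylesheet (the name is assumed here).
transform = etree.XSLT(etree.parse("p4top5.xsl"))

for p4_file in Path("transcripts/p4").glob("*.xml"):
    # Apply the transformation and write the P5 result to a parallel directory.
    p5_doc = transform(etree.parse(str(p4_file)))
    out_file = Path("transcripts/p5") / p4_file.name
    out_file.write_bytes(etree.tostring(p5_doc, xml_declaration=True,
                                        encoding="UTF-8", pretty_print=True))
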
Once created, these newly digitized interviews are subjected
to intense historical analysis by subject specialists
in the Southern Oral History Program (SOHP). With the
guidance of scholarly advisors, specialists, mainly PhD
students in History, read through the transcripts, write abstracts,
create descriptive titles, and select particularly
powerful segments. Their decisions are based on a number
of criteria, from intrigue to major historical relevance
and from uniqueness to confirmation of commonality.
Once these segments are chosen, the PhD students then
assign keywords (category/subcategory combinations)
to each of these segments, and a given segment will usually
have multiple keywords. The keywords are then given
a rank, depending on their relevance to the segment.
A given segment might have three keywords, all of varying
degrees of importance or relevance to the segment itself.
Assigning these keywords and then prioritizing them
is what makes our soon-to-be-released advanced search
so effective. A user can type in a keyword and immediately
retrieve the interviews that are most relevant
due to our efforts to assign and then prioritize these keywords.
Finally, PhD students write short descriptions for
the selected segments and provide historical context
for the interviews.
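To make that structure concrete, the following is a minimal Python sketch of how a selected segment and its ranked keywords might be modelled; the field names and the example values are illustrative assumptions, not the project's actual schema.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Keyword:
    category: str      # broad heading, e.g. a category/subcategory pair assigned by the SOHP
    subcategory: str   # narrower heading within the category
    rank: int          # 1 = most relevant to this segment

@dataclass
class Segment:
    interview_id: str
    title: str
    start_time: str                       # offset into the audio, e.g. "0:14:32" (format assumed)
    description: str = ""
    keywords: List[Keyword] = field(default_factory=list)

    def top_keyword(self) -> Optional[Keyword]:
        # A lower rank number means higher relevance to the segment.
        return min(self.keywords, key=lambda k: k.rank) if self.keywords else None
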
Once complete, these interviews return to DocSouth,
where they are prepared for publication. DocSouth staff
collect various existing metadata from a number of
sources to create rich records for each interview, enhancing
the XML transcripts and adding the interviews to the
MySQL database. These metadata are subsequently used
by library catalogers to generate MARC records to further
enhance retrieval. A trained specialist listens to the
interviews and inserts timestamps into the files based on
the selections made by the SOHP. These text timestamps
become points of entry into the audio—clicking on them
plays back the audio for that segment.
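The mechanics of that playback can be illustrated with a small Python sketch that turns a transcript timestamp into a second offset the audio player can seek to; the timestamp format is an assumption for illustration, since the abstract does not specify how the timestamps are encoded.

def timestamp_to_seconds(stamp: str) -> int:
    """Convert a transcript timestamp such as '0:14:32' or '14:32' into a
    second offset into the MP3 (format assumed for illustration)."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# A segment marked 0:14:32 starts 872 seconds into the recording, so the
# page can tell the audio player to seek to that offset when it is clicked.
assert timestamp_to_seconds("0:14:32") == 872
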
The interviews are displayed as flat HTML files, generated
in advance from the TEI source files using XSLT.
We have implemented an advanced search interface
built with Python/Django forms
and templates. The form framework provides hooks for
robust input validation, while the templates separate the
content from the display for us, so non-tech-savvy designers
can help craft an elegant search interface. The
search index is built on Solr, a Lucene-based search engine
that allows fast, flexible searching with full-text
or fielded queries. Database access is in many places handled
by SQLAlchemy, a robust object-relational mapper written
in Python, which lets queries run against the database
without relying on cumbersome, hand-written SQL.
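As a sketch of how these pieces might fit together, a Django form can validate the user's input before it is handed to Solr. The field names, the Solr core URL, and the use of the pysolr client are assumptions for illustration, not a description of the production code.

from django.conf import settings
settings.configure()   # minimal standalone settings so the sketch runs on its own

from django import forms
import pysolr

class AdvancedSearchForm(forms.Form):
    # Hypothetical fields for illustration only.
    keyword = forms.CharField(required=False)
    interviewee = forms.CharField(required=False)

solr = pysolr.Solr("http://localhost:8983/solr/sohp")   # assumed core name

def search(form_data):
    form = AdvancedSearchForm(form_data)
    if not form.is_valid():
        return []
    # Fall back to a match-all query when no keyword is supplied; the ranked
    # keywords stored in the index are what make the result ordering meaningful.
    query = form.cleaned_data["keyword"] or "*:*"
    return solr.search(query, rows=20)
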
The success of bringing these valuable documents out
of the archives and into the eyes and ears of the public
comes from the efforts of many participants, including
librarians, historians, and education specialists. The
project team members and the various institutions and
interests they represent work together to produce digitized
oral history interviews that are the result of the
application of technology solutions, adherence to standards,
scholarship, and exceptional technical expertise.
All of these efforts bring additional value to oral history
interviews that are moving and insightful stories of an
American South in the process of profound and irrevocable
transformation.
References
http://www.tei-c.org/wiki/index.php/TEI_in_Libraries:_Guidelines_for_Best_Practices
http://www.djangoproject.com/
http://www.sqlalchemy.org/
http://lucene.apache.org/solr


Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None