Working with Alignment of Text and Sound in Spoken Corpora

poster / demo / art installation
  1. Knut Hofland

    University of Bergen

Work text

The Bergen Corpus of London Teenage Language (COLT) has been transcribed, and the cassette tapes have been digitized as Windows WAV files (9 GB). The texts have been time-aligned with the sound files at the word level by the company Softsound in the UK. The poster will describe how this material is made available through the Corpus WorkBench from IMS in Stuttgart. Users can search the corpus from a Web browser and, from the resulting concordance, play the sound corresponding to each occurrence (5-15 seconds). For this purpose, a program was written to deliver small pieces of a sound file across the Web. These sound extracts can be saved by the user and analyzed further with signal processing programs.
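
As an illustration of delivering a small piece of a sound file, the following is a minimal sketch in Python using the standard `wave` module. It is not the program described above, whose implementation is not given here. The function name `extract_clip` and its parameters are hypothetical, and the sketch assumes uncompressed PCM WAV input.

```python
import io
import wave


def extract_clip(source, start_s, end_s):
    """Return a standalone WAV file (as bytes) covering start_s..end_s seconds.

    `source` is a path or a binary file-like object holding PCM WAV data.
    Hypothetical helper for illustration only.
    """
    with wave.open(source, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp the requested window to the frames that actually exist.
        first = max(0, min(total, int(start_s * rate)))
        last = max(first, min(total, int(end_s * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)
        channels = src.getnchannels()
        width = src.getsampwidth()

    # Write the selected frames with a fresh header so the clip plays on its own.
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return out.getvalue()
```

A Web script could call such a function with the start and end times stored for a concordance hit and return the bytes with an audio content type. Because the clip carries its own header, it can also be saved and opened directly in a signal processing program.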

Two Norwegian spoken corpora are also available for searching in this way. In one corpus, a mark was inserted manually in the transcripts every 10 seconds, and a program then generated an interpolated time stamp for each word. In the other corpus, the program SyncWriter was used while transcribing the text. This program keeps track of time information for each transcribed unit, and this information can be extracted from the data file together with the text. The time stamp for each word is interpolated between these values, and the text and time information are indexed by the search software.
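
The interpolation step can be sketched as follows. This is a minimal illustration, not the program used for the corpora. It assumes that words are spread evenly between two consecutive marks, which the description above does not specify, and the function names and the `segments` input structure are hypothetical.

```python
def interpolate_word_times(words, start_s, end_s):
    """Spread words evenly across [start_s, end_s); return (word, time) pairs.

    Even spacing by word count is an assumption made for this sketch.
    """
    if not words:
        return []
    step = (end_s - start_s) / len(words)
    return [(word, start_s + i * step) for i, word in enumerate(words)]


def timestamp_transcript(segments, interval_s=10.0):
    """Assign a time stamp to every word in a manually marked transcript.

    segments[k] holds the words transcribed between mark k and mark k + 1,
    with marks placed every `interval_s` seconds.
    """
    stamped = []
    for k, words in enumerate(segments):
        stamped.extend(
            interpolate_word_times(words, k * interval_s, (k + 1) * interval_s)
        )
    return stamped
```

For units with known start and end times, such as those recorded during transcription, `interpolate_word_times` could be called directly with those two values instead of fixed 10-second boundaries.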


COLT: <>
Softsound Speech/Text alignment <>
Corpus WorkBench <>
SyncWriter <>
Demo concordance <>


Conference Info



Hosted at University of Glasgow

Glasgow, Scotland, United Kingdom

July 21, 2000 - July 25, 2000

104 works by 187 authors indexed



Series: ALLC/EADH (27), ACH/ICCH (20), ACH/ALLC (12)

Organizers: ACH, ALLC

  • Keywords: None
  • Language: English
  • Topics: None