A Linguistic Time-Capsule: The Newcastle Electronic Corpus of Tyneside English

poster / demo / art installation
  1. Will Allen

    Newcastle University

  2. Joan Beal

    University of Sheffield

  3. Karen Corrigan

    Newcastle University

  4. Hermann Moisl

    Newcastle University

  5. Charley Rowe

    Newcastle University


The projected Newcastle Electronic Corpus of Tyneside English (NECTE), supported by an AHRB award (2001-2004), will make two large speech samples, collected in 1969 and 1994 respectively, available to scholars via the WWW. The resource consists of two discrete sets of recorded speech currently housed in the Catherine Cookson Archive of Tyneside and Northumbrian Dialect at the University of Newcastle. The earlier sample was gathered during the SSRC-funded Tyneside Linguistic Survey (TLS) (Strang 1968; Pellowe et al. 1972). The 1994 recordings were collected in the same region for an ESRC-funded project, Phonological Variation and Change (PVC) (Milroy et al. 1997).

The TLS sample was originally recorded on reel-to-reel audio equipment and consists of 86 loosely structured interviews averaging 30 minutes in length. The informants are evenly divided by sex and across social-class groupings, and represent young, middle and old age cohorts. The speakers were drawn from a stratified random sample of Gateshead, the part of the Tyneside conurbation on the south bank of the Tyne.

The more recent PVC corpus was recorded with high-quality audio tape recorders and microphones and currently consists of 20 DAT tapes, each averaging 60 minutes in length. The recordings were obtained by letting pairs (dyads) of friends or relatives converse freely with minimal interference from the fieldworker. As in the TLS sample, the informants are evenly divided by sex and across social-class groupings, and represent young, middle and old age cohorts.

These two samples, when combined electronically as we propose, will provide a significant resource for anyone interested in corpus linguistics, dialect geography, historical (English) linguistics and sociolinguistics. The overarching aim of the project is therefore to improve access to NECTE and promote its re-use by producing an electronic database in a variety of formats that can be accessed according to user need. Our specific objectives are:

To subject the NECTE electronic orthographic transcriptions to two auditory 'correction passes' to ensure that they faithfully reproduce the recordings. A third pass will be conducted on word lists to verify that the Orthographic Transcription Protocol (OTP) has been applied consistently.
To provide a database catalogue for both the TLS and PVC corpora.
To translate into IPA the 1969 electronic files of phonetic analyses that were created for subsections of the TLS. This will preserve the highly accurate analytical work of the TLS project and save substantial manual transcription effort.
To create an electronic IPA phonetic transcription of samples from the PVC project and conduct two 'correction passes' of these transcriptions.
To parse and tag NECTE for a range of grammatical and discourse markers to enhance its potential applications.
To create a WWW version of the NECTE data, hosted at the University of Newcastle, with a North American mirror site.
To permit separate access to the database in the following formats: (i) digitised sound files; (ii) electronic IPA/orthographic transcription files; (iii) electronic tagged orthographic transcription files.
To find or develop software to permit the database formats above to be aligned.
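What "aligning" the formats involves can be illustrated with a minimal sketch. It assumes, hypothetically, that each recording is segmented into time-stamped utterances and that the orthographic, IPA and tagged tiers are keyed to the same time spans, so that alignment reduces to looking up every tier by a shared offset into the sound file. The names `Segment`, `AlignedRecording` and `at`, and the file name in the usage lines, are illustrative assumptions only, not part of NECTE or of any existing tool.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Segment:
    """One time-stamped utterance with parallel transcription tiers (hypothetical schema)."""
    start: float          # offset into the sound file, in seconds
    end: float            # end offset (exclusive), in seconds
    orthographic: str     # orthographic transcription of this stretch
    ipa: str = ""         # empty if no phonetic transcription exists for this stretch
    tags: tuple = ()      # grammatical or discourse-marker labels, if tagged


@dataclass
class AlignedRecording:
    sound_file: str
    segments: list = field(default_factory=list)

    def add(self, segment: Segment) -> None:
        # Keep segments sorted by start time so lookups can use binary search.
        starts = [s.start for s in self.segments]
        self.segments.insert(bisect_right(starts, segment.start), segment)

    def at(self, t: float):
        """Return the segment whose time span contains offset t, or None."""
        starts = [s.start for s in self.segments]
        i = bisect_right(starts, t) - 1
        if i >= 0 and self.segments[i].start <= t < self.segments[i].end:
            return self.segments[i]
        return None


if __name__ == "__main__":
    # Placeholder text only; no real NECTE transcriptions are reproduced here.
    rec = AlignedRecording("example_tape.wav")
    rec.add(Segment(2.5, 4.0, "second utterance", ipa="<IPA for second utterance>"))
    rec.add(Segment(0.0, 2.5, "first utterance", tags=("DM",)))
    seg = rec.at(1.0)
    print(seg.orthographic, seg.tags)
```

Keying every tier to the same time span means a user who selects a passage in any one format (sound, IPA or tagged orthography) can retrieve the corresponding stretch in the others; a production system would more likely use an established annotation format than an ad hoc structure like this one.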
We are in the process of fulfilling the objectives outlined above and would like to use the opportunity of the 2004 ALLC conference to (i) give an overview of, and progress report on, the project to date and (ii) seek advice from other corpus builders and potential end users of NECTE with particular interests in language and computing.


1. Beal, J.C. and Corrigan, K.P. (2000a) 'Comparing the present with the past to predict the future for Tyneside British English', Newcastle and Durham Working Papers in Linguistics, 6:13-27.
2. Beal, J.C. and Corrigan, K.P. (2000b) 'A time-capsule for future generations: The Newcastle-Poitiers Electronic Corpus of Tyneside English', Poster presented at Sociolinguistics Symposium 2000, University of the West of England, 27-29 April, 2000.
3. Beal, J.C. and Corrigan, K.P. (2000c) 'New Ways of Capturing the "Kodak Moment": Real-time vs. Apparent-time Analyses of Syntactic Variation in Tyneside English, 1969-1994', Paper presented at the 2nd Variation is Everywhere Workshop, University of Essex, September 2000.
4. Beal, J.C. and Corrigan, K.P. (2002) 'Relativization in Tyneside and Northumbrian English', in Poussa, P. (ed.) Dialect Contact and History on the North Sea Littoral. Lincom Europa.
5. Milroy, J. et al. (1997) Phonological Variation and Change in Contemporary Spoken British English. ESRC, Unpublished Final Report, Dept. of Speech, University of Newcastle-Upon-Tyne.
6. Pellowe, J. et al. (1972) 'A dynamic modelling of linguistic variation: the urban (Tyneside) linguistic survey', Lingua, 30: 1-30.
7. Strang, B.M.H. (1968) 'The Tyneside Linguistic Survey', Zeitschrift für Mundartforschung, NF 4 (Verhandlungen des Zweiten Internationalen Dialektologenkongresses), pp. 788-794. Wiesbaden: Franz Steiner Verlag.


Conference Info

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

Organizers: ACH, ALLC
