Summary information

Study title

Transcription textGrids for the audio edition of the British National Corpus 1993

Creator

Coleman, J, University of Oxford

Study number / PID

851496 (UKDA)

10.5255/UKDA-SN-851496 (DOI)

Data access

Open

Series

Not available

Abstract

This collection comprises the Praat TextGrids for time-aligned transcriptions of the Audio BNC sound files. Transcriptions are time-aligned at the word and phoneme levels. The collection reflects the state of our transcriptions at the end date of the project. The files, together with the .wav files to which they relate, are also available from the Audio BNC server, http://bnc.phon.ox.ac.uk/.

To use the data deposited in this zipfile:
1) Unzip the zipfile. This yields a large folder of Praat TextGrids.
2) View the Praat TextGrids using Praat software (freely available from www.praat.org) or any simple text editor. Praat can also display the TextGrid annotation files time-aligned to the Audio BNC audio .wav files. (These audio files are separately available from http://www.phon.ox.ac.uk/AudioBNC; we do not have the rights to upload them to the UK Data Service.)

Each TextGrid file name combines the alphanumeric filename of the corresponding .wav audio file, the 6-digit conversation number employed in the previously-published BNC transcripts and the 3-character alphanumeric transcription/recording code. Thus, 021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid cross-refers to the .wav file http://bnc.phon.ox.ac.uk/data/021A-C0897X0004XX-AAZZP0.wav, and to conversation 000406 from recording KDP, division 1.

A summary index to all the transcriptions (arranged by three-character BNC code) is given at http://bnc.phon.ox.ac.uk/transcripts-html/, and further details and links about the complete corpus, file naming conventions and on-line locations are given at http://www.phon.ox.ac.uk/AudioBNC. Publications documenting how this data was collected and prepared, and how we have used it in our research, are available at http://gtr.rcuk.ac.uk/project/CD8C7191-EF60-41B8-BC80-A015ACCEC8EB#tabPublications.

In this research project, Professor John Coleman and his co-workers at Oxford University Phonetics Laboratory and the University of Pennsylvania...
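The file naming convention described in the abstract can be illustrated with a short Python sketch. It is not part of the deposited collection. The pattern is inferred from the single example file name given above, so the assumptions that every name ends in a numeric division and that the .wav file shares the stem before the conversation number may not hold for every file. The names parse_textgrid_name and wav_url are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the one example in the abstract:
# <wav stem>_<6-digit conversation>_<3-character code>_<division>.TextGrid
_NAME = re.compile(
    r"^(?P<wav_stem>.+)_(?P<conversation>\d{6})_(?P<code>[A-Za-z0-9]{3})"
    r"_(?P<division>\d+)\.TextGrid$"
)


class TextGridName(NamedTuple):
    wav_stem: str       # filename of the corresponding .wav file, no extension
    conversation: str   # 6-digit BNC conversation number, kept as a string
    code: str           # 3-character transcription/recording code
    division: str       # trailing division number, kept as a string


def parse_textgrid_name(filename: str) -> Optional[TextGridName]:
    """Split a TextGrid file name into its parts, or return None if the
    name does not follow the assumed layout."""
    match = _NAME.match(filename)
    if match is None:
        return None
    return TextGridName(**match.groupdict())


def wav_url(name: TextGridName,
            base: str = "http://bnc.phon.ox.ac.uk/data/") -> str:
    """Build the audio URL, assuming it is the base URL plus the stem."""
    return f"{base}{name.wav_stem}.wav"


parts = parse_textgrid_name("021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid")
```

Names that do not match the assumed pattern return None rather than raising an error, so a folder listing can be filtered without stopping at unexpected files.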

Methodology

Data collection period

Not available

Country

United Kingdom

Time dimension

Not available

Analysis unit

Time unit
Text unit
Other

Universe

Not available

Sampling procedure

Not available

Kind of data

Numeric

Data collection mode

The original audio recordings were transcribed in ordinary English spelling by professional audio typists. The typed transcripts were time-aligned to the audio using forced alignment (an application of automatic speech recognition technology). The Praat TextGrids deposited in this collection are the resulting transcription data files.
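As a rough illustration of what the resulting files contain, the following Python sketch reads labelled time intervals from a TextGrid held in a string. It assumes the files use Praat's long text format with one-line labels, which this record does not confirm. The tier name "word" and the sample content are invented for the example, and read_intervals is a hypothetical helper rather than part of Praat or of this collection.

```python
import re
from typing import Dict, List, Tuple

Interval = Tuple[float, float, str]  # (start in seconds, end in seconds, label)

_TIER = re.compile(r'^\s*name = "(.*)"\s*$')
_START = re.compile(r'^\s*intervals \[\d+\]:?\s*$')
_XMIN = re.compile(r'^\s*xmin = (\S+)\s*$')
_XMAX = re.compile(r'^\s*xmax = (\S+)\s*$')
_TEXT = re.compile(r'^\s*text = "(.*)"\s*$')


def read_intervals(textgrid: str) -> Dict[str, List[Interval]]:
    """Collect (xmin, xmax, text) for each interval, grouped by tier name.

    Assumes the long text format and labels that fit on one line.
    """
    tiers: Dict[str, List[Interval]] = {}
    tier = None
    in_interval = False
    xmin = xmax = 0.0
    for line in textgrid.splitlines():
        m = _TIER.match(line)
        if m:
            tier = m.group(1)
            tiers[tier] = []
            in_interval = False
            continue
        if tier is None:
            continue  # still in the file header
        if _START.match(line):
            in_interval = True
            continue
        if not in_interval:
            continue  # tier-level xmin/xmax and size lines
        m = _XMIN.match(line)
        if m:
            xmin = float(m.group(1))
            continue
        m = _XMAX.match(line)
        if m:
            xmax = float(m.group(1))
            continue
        m = _TEXT.match(line)
        if m:
            # Praat writes a literal double quote inside a label as "".
            tiers[tier].append((xmin, xmax, m.group(1).replace('""', '"')))
            in_interval = False
    return tiers


# Invented sample content, for illustration only.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "hello"
        intervals [2]:
            xmin = 0.5
            xmax = 1.5
            text = "there"
'''

result = read_intervals(SAMPLE)
```

When reading a file from disk, the text encoding has to be chosen by the reader, since Praat can write TextGrids in more than one encoding.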

Funding information

Grant number

RES-062-23-2566

Access

Publisher

UK Data Service

Publication year

2014

Terms of data access

The Data Collection is available to any user without the requirement for registration for download/access.

Related publications

Not available