Summary information
Study title
Transcription textGrids for the audio edition of the British National Corpus 1993
Creator
Coleman, J, University of Oxford
Study number / PID
851496 (UKDA)
10.5255/UKDA-SN-851496 (DOI)
Data access
Open
Series
Not available
Abstract
This collection comprises the Praat TextGrids for time-aligned transcriptions of the Audio BNC sound files. Transcriptions are time-aligned at the word and phoneme levels. The collection reflects the state of our transcriptions at the end date of the project. The files, together with the .wav files to which they relate, are also available from the Audio BNC server, http://bnc.phon.ox.ac.uk/.
To use the data deposited in this zipfile:
1) Unzip the zipfile. This yields a large folder of Praat TextGrids.
2) The Praat TextGrids may be viewed using Praat software (freely available from www.praat.org), or using any simple text editor. Praat can also display the TextGrid annotation files time-aligned to the Audio BNC audio .wav files. (These audio files are separately available from http://www.phon.ox.ac.uk/AudioBNC; we do not have the rights to upload them to the UK Data Service.)
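The two steps above can also be scripted. The following is a minimal Python sketch, assuming only that the deposit is a standard zip archive; the function name `extract_textgrids` and any paths passed to it are hypothetical and not part of the collection.

```python
import zipfile
from pathlib import Path


def extract_textgrids(zip_path, dest_dir):
    """Unzip the archive into dest_dir and return the sorted .TextGrid paths."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall creates dest_dir (and any subfolders) if needed.
        archive.extractall(dest)
    # Search recursively, since the archive may unpack into a subfolder.
    return sorted(p for p in dest.rglob("*.TextGrid") if p.is_file())
```

Each returned path can then be opened in Praat or read as plain text.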
The syntax of the TextGrid file names combines the alphanumeric filename of the corresponding .wav audio file, the 6-digit conversation number employed in the previously-published BNC transcripts and the 3-character alphanumeric transcription/recording code. Thus,
021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid
cross-refers to the .wav file http://bnc.phon.ox.ac.uk/data/021A-C0897X0172XX-ABZZP0.wav, and to conversation 000406 from recording KDP, division () 1.
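The naming scheme can be unpacked programmatically. The sketch below is an illustration rather than an official parser: it assumes the four fields are joined by underscores, as in the example name above, and that the trailing division field is numeric. The pattern and the function name `parse_textgrid_name` are hypothetical.

```python
import re

# Assumed layout, inferred from the single example file name:
#   <wav file stem>_<6-digit conversation>_<3-character code>_<division>.TextGrid
TEXTGRID_NAME = re.compile(
    r"^(?P<wav_stem>.+)"
    r"_(?P<conversation>\d{6})"
    r"_(?P<code>[A-Za-z0-9]{3})"
    r"_(?P<division>\d+)"
    r"\.TextGrid$"
)


def parse_textgrid_name(filename):
    """Split a TextGrid file name into its component fields."""
    match = TEXTGRID_NAME.match(filename)
    if match is None:
        raise ValueError(f"Unexpected TextGrid file name: {filename!r}")
    return match.groupdict()


fields = parse_textgrid_name("021A-C0897X0004XX-AAZZP0_000406_KDP_1.TextGrid")
print(fields["conversation"], fields["code"], fields["division"])
```

Names that do not fit the assumed layout raise a `ValueError`, so unexpected files are reported rather than silently misparsed.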
A summary index to all the transcriptions (arranged by three-character BNC code) is given at http://bnc.phon.ox.ac.uk/transcripts-html/, and further details and links concerning the complete corpus, file naming conventions and online locations are given at http://www.phon.ox.ac.uk/AudioBNC.
Publications documenting how this data was collected and prepared, and how we have used it in our research, are available at
http://gtr.rcuk.ac.uk/project/CD8C7191-EF60-41B8-BC80-A015ACCEC8EB#tabPublications.
In this research project, Professor John Coleman and his co-workers at Oxford University Phonetics Laboratory and the University of Pennsylvania...
Topics
Keywords
Methodology
Data collection period
Not available
Country
United Kingdom
Time dimension
Not available
Analysis unit
Time unit
Text unit
Other
Universe
Not available
Sampling procedure
Not available
Kind of data
Numeric
Data collection mode
The original audio recordings were transcribed in ordinary English spelling by professional audio typists. The typed transcripts were time-aligned to the audio using forced alignment (an application of automatic speech recognition technology). The Praat TextGrids deposited in this collection are the resulting transcription data files.
Funding information
Grant number
RES-062-23-2566
Access
Publisher
UK Data Service
Publication year
2014
Terms of data access
The Data Collection is available to any user without the requirement for registration for download/access.
Related publications
Not available