Summary information

Study title

The Oxford Aesop Corpus 2010

Creator

Kochanski, G, University of Oxford
Loukina, A, University of Oxford

Study number / PID

851830 (UKDA)

10.5255/UKDA-SN-851830 (DOI)

Data access

Restricted

Series

Not available

Abstract

This corpus consists of short paragraphs and children's poetry read by native speakers of Southern British English, Russian (Moscow and St. Petersburg), Greek (Athens), Taiwanese Mandarin, and French (Paris). The data consist of speech recordings, together with the orthographic texts, automatically generated transcriptions and metadata files. Recordings were made in laboratory experiments in which participants read text from a computer screen. The speakers were 20-28 years old, born to monolingual parents, and had grown up in their respective countries. At the time of recording, all speakers were living in Oxford, UK; the non-English participants had lived outside their home country for less than four years. Speakers also read up to 700 randomly selected short sentences intended for training an automatic speech recognition system.

When we say that music, poetry and language all have rhythms, what is meant by rhythm? What accounts for the rhythmic differences between languages or dialects? Within the last decade, techniques for the quantitative measurement of rhythm have begun to appear. So far, these rhythm measures require much careful manual marking of the speech and are highly dependent on the choice of words, so they have been limited to carefully designed laboratory experiments. The aim of our project is to systematically test and improve these rhythm measurements so that they are more reliable, easier to apply, and robust enough to use outside the laboratory. This process will give us clues as to which sounds of speech contribute most to rhythm and ultimately allow us a better understanding of what we mean by the term rhythm. We aim to build tools that will open parts of linguistics to quantitative measurement. They will allow researchers to work with more natural...

Methodology

Data collection period

01/12/2008 - 31/10/2009

Country

United Kingdom

Time dimension

Not available

Analysis unit

Individual
Text unit

Universe

Not available

Sampling procedure

Not available

Kind of data

Audio

Data collection mode

Laboratory experiments were conducted with volunteers who were born into monolingual families. They were living in Oxford at the time of the research but had lived outside their home country (Russia, Greece, Taiwan or France) for less than four years. There were also native speakers of English from Southern England. All participants were aged 20-28. The experiment involved reading from a computer screen. In addition to short texts, all speakers also read up to 700 randomly selected short sentences intended for training an automatic speech recognition system. Volunteer sampling was used for this cross-sectional (one-time) study.

Funding information

Grant number

RES-062-23-1323

Access

Publisher

UK Data Service

Publication year

2015

Terms of data access

The Data Collection is available for download to users registered with the UK Data Service.

Related publications

Not available