Summary information

Study title

Amazon Mechanical Turk: Sentence annotation experiments

Creator

Lau, J, King's College London
Lappin, S, King's College London

Study number / PID

851337 (UKDA)

10.5255/UKDA-SN-851337 (DOI)

Data access

Restricted

Series

Not available

Abstract

This data collection consists of two .csv files containing lists of sentences with individual and mean sentence ratings (crowd-sourced judgements) across three modes of presentation.

This research holds out the prospect of important impact in two areas. First, it can shed light on the relationship between the representation and acquisition of linguistic knowledge, on the one hand, and learning and the encoding of knowledge in other cognitive domains, on the other. This work can, in turn, help to clarify the respective roles of biologically conditioned learning biases and data-driven learning in human cognition. Second, this work can contribute to the development of more effective language technology by providing insight, from a computational perspective, into the way in which humans represent the syntactic properties of sentences in their language. To the extent that natural language processing systems take account of this class of representations, they will provide more efficient tools for parsing and interpreting text and speech.

In the past twenty-five years, work in natural language technology has made impressive progress across a wide range of tasks, which include, among others, information retrieval and extraction, text interpretation and summarization, speech recognition, morphological analysis, syntactic parsing, word sense identification, and machine translation. Much of this progress has been due to the successful application of powerful techniques for probabilistic modeling and statistical analysis to large corpora of linguistic data. These methods have given rise to a set of engineering tools that are rapidly shaping the digital environment in which we access and process most of the information that we use. In recent work (Lappin and Shieber (2007), Clark and Lappin (2011a), Clark and Lappin (2011b)) my co-authors and I have argued that the machine learning methods that are driving the expansion of natural language technology are also directly relevant to understanding...

Methodology

Data collection period

01/10/2012 - 30/09/2015

Country

United Kingdom, United States

Time dimension

Not available

Analysis unit

Individual

Universe

Not available

Sampling procedure

Not available

Kind of data

Numeric
Text

Data collection mode

Amazon Mechanical Turk crowd sourcing

Funding information

Grant number

ES/J022969/1

Access

Publisher

UK Data Service

Publication year

2014

Terms of data access

Not available

Related publications

Not available