Summary information

Study title

British English Corpus, 2021

Creator

Baker, P, Lancaster University

Study number / PID

857510 (UKDA)

10.5255/UKDA-SN-857510 (DOI)

Data access

Open

Series

Not available

Abstract

Corpus linguistics is a British success story, revolutionising language analysis and teaching through computer-assisted examination of vast datasets. Since 2013, the Centre for Corpus Approaches to Social Science has extended this method beyond linguistics to social sciences, tackling issues like hate crime, climate change, financial reporting, and education. It has also developed freely accessible tools and resources, establishing itself as a global leader. This dataset is a one-million-word sample of written British English across 15 different genres/registers. It follows the Brown family sampling structure and is available for searches and data visualisations via #LancsBox (https://lancsbox.lancs.ac.uk) as a subcorpus of a larger Brown family corpus.

Corpus linguistics is a UK success story. It is an approach to the study of language, pioneered in large part by UK researchers, that uses computers to permit the analysis of millions, or even billions, of words of data to look for patterns of usage that are not necessarily observable otherwise. Corpus linguistics has revolutionised linguistics, changing the ways that language is analysed and how languages are taught. It is therefore an increasingly well-established approach to the study of language among linguists. Yet the analysis of language is not the sole preserve of linguists but, rather, is a thread that runs through all of the social sciences. Since 2013, the Centre for Corpus Approaches to Social Science has brought the benefits of the corpus approach to a range of social science disciplines (including Criminology, Sociology, Accountancy and Psychology), and has enabled new approaches to answering questions in those disciplines (e.g. on understanding hate crime, views on climate change, financial accounting, and learning in primary schools). The Centre has also produced new tools and resources for the large-scale study of language, and made them available free of charge to academics and non-academics in...

Topics

Media, communications and language

Society and culture

Keywords

ENGLISH (LANGUAGE)

LINGUISTICS (ELSST)

LINGUISTIC ANALYSIS (ELSST)

LANGUAGES AND LINGUISTICS EDUCATION (ELSST)

LANGUAGE (ELSST)

NATIONAL LANGUAGES (ELSST)

2024

Data collection period

Not available

Country

United Kingdom

Time dimension

Not available

Analysis unit

Text unit

Universe

Not available

Sampling procedure

Not available

Kind of data

Text

Data collection mode

The collection consists of corpus linguistics, stratified random sampling, Brown family sampling structure, part-of-speech annotation and semantic annotation.

Grant number

ES/R008906/1

Publisher

UK Data Service

Publication year

2024

Terms of data access

The Data Collection is available from an external repository. Access is available via Related Resources.

Related publications

Not available
