Summary information

Study title

British English Corpus, 2021

Creator

Baker, P, Lancaster University

Study number / PID

857510 (UKDA)

10.5255/UKDA-SN-857510 (DOI)

Data access

Open

Series

Not available

Abstract

Corpus linguistics is a British success story, revolutionising language analysis and teaching through computer-assisted examination of vast datasets. Since 2013, the Centre for Corpus Approaches to Social Science has extended this method beyond linguistics to social sciences, tackling issues like hate crime, climate change, financial reporting, and education. It has also developed freely accessible tools and resources, establishing itself as a global leader. This dataset is a one-million-word sample of written British English across 15 different genres/registers. It follows the Brown family sampling structure and is available for searches and data visualisations via #LancsBox (https://lancsbox.lancs.ac.uk) as a subcorpus of a larger Brown family corpus.

Corpus linguistics is a UK success story. It is an approach to the study of language, pioneered in large part by UK researchers, that uses computers to permit the analysis of millions, or even billions, of words of data to look for patterns of usage that are not necessarily observable otherwise. Corpus linguistics has revolutionised linguistics, changing the ways that language is analysed and how languages are taught. It is therefore an increasingly well-established approach to the study of language among linguists. Yet the analysis of language is not the sole preserve of linguists but, rather, is a thread that runs through all of the social sciences. Since 2013, the Centre for Corpus Approaches to Social Science has brought the benefits of the corpus approach to a range of social science disciplines (including Criminology, Sociology, Accountancy and Psychology), and has enabled new approaches to answering questions in those disciplines (e.g. on understanding hate crime, views on climate change, financial accounting, and learning in primary schools). The Centre has also produced new tools and resources for the large-scale study of language, and made them available free of charge to academics and non-academics in...

Methodology

Data collection period

Not available

Country

United Kingdom

Time dimension

Not available

Analysis unit

Text unit

Universe

Not available

Sampling procedure

Not available

Kind of data

Text

Data collection mode

The collection consists of corpus linguistics, stratified random sampling, Brown family sampling structure, part-of-speech annotation and semantic annotation.

Funding information

Grant number

ES/R008906/1

Access

Publisher

UK Data Service

Publication year

2024

Terms of data access

The Data Collection is available from an external repository. Access is available via Related Resources.

Related publications

Not available