Corpus linguistics is a British success story, revolutionising language analysis and teaching through computer-assisted examination of vast datasets. Since 2013, the Centre for Corpus Approaches to Social Science has extended this method beyond linguistics to social sciences, tackling issues like hate crime, climate change, financial reporting, and education. It has also developed freely accessible tools and resources, establishing itself as a global leader.
This dataset is a one-million-word sample of written British English across 15 genres/registers. It follows the Brown family sampling structure and is available for searches and data visualisations via #LancsBox (https://lancsbox.lancs.ac.uk) as a subcorpus of a larger Brown family corpus.
Corpus linguistics is a UK success story. It is an approach to the study of language, pioneered in large part by UK researchers, that uses computers to analyse millions, or even billions, of words of data and find patterns of usage that are not necessarily observable otherwise. Corpus linguistics has revolutionised linguistics, changing the ways that language is analysed and how languages are taught, and it is now an increasingly well-established approach among linguists. Yet the analysis of language is not the sole preserve of linguists; rather, it is a thread that runs through all of the social sciences. Since 2013, the Centre for Corpus Approaches to Social Science has brought the benefits of the corpus approach to a range of social science disciplines (including Criminology, Sociology, Accountancy and Psychology) and has enabled new approaches to answering questions in those disciplines (e.g. understanding hate crime, views on climate change, financial accounting, and learning in primary schools). The Centre has also produced new tools and resources for the large-scale study of language, and made them available free of charge to academics and non-academics in...
The terminology used is generally based on the DDI controlled vocabularies (Time Method, Analysis Unit, Sampling Procedure and Mode of Collection), available from the CESSDA Vocabulary Service.
Methodology
Data collection period
Not available
Country
United Kingdom
Time dimension
Not available
Analysis unit
Text unit
Universe
Not available
Sampling procedure
Not available
Kind of data
Text
Data collection mode
The collection was produced using the following methods: corpus linguistics, stratified random sampling, the Brown family sampling structure, part-of-speech annotation and semantic annotation.
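The stratified random sampling named above can be illustrated with a minimal Python sketch in which genres act as strata and a fixed number of texts is drawn at random from each. The function name `stratified_sample`, the toy sampling frame and the per-genre quotas are hypothetical; this record does not give the corpus's actual quotas or selection procedure. The figure of about 2,000 words per sample is an assumption borrowed from the original Brown Corpus design (500 samples of about 2,000 words each, giving roughly one million words).

```python
import random
from collections import defaultdict


def stratified_sample(texts, quotas, seed=None):
    """Draw a stratified random sample of texts (hypothetical helper).

    texts  -- iterable of (text_id, genre) pairs forming the sampling frame
    quotas -- dict mapping genre -> number of texts to draw from that genre
    seed   -- optional seed so that the draw can be reproduced
    """
    rng = random.Random(seed)

    # Group the sampling frame into strata, one stratum per genre.
    strata = defaultdict(list)
    for text_id, genre in texts:
        strata[genre].append(text_id)

    sample = {}
    for genre, n in quotas.items():
        pool = strata.get(genre, [])
        if n > len(pool):
            raise ValueError(
                f"genre {genre!r}: quota {n} exceeds {len(pool)} available texts"
            )
        # Simple random sampling without replacement within the stratum.
        sample[genre] = rng.sample(pool, n)
    return sample


# Hypothetical sampling frame: 3 genres with 10 candidate texts each.
frame = [(f"{g}-{i:02d}", g) for g in ("press", "fiction", "learned") for i in range(10)]
# Illustrative quotas only, not the quotas used for this corpus.
quotas = {"press": 3, "fiction": 2, "learned": 4}

sample = stratified_sample(frame, quotas, seed=1)
# Assumes roughly 2,000 words per sampled text, as in the original Brown design.
total_words = sum(len(ids) for ids in sample.values()) * 2000
print(total_words)
```

Fixing a quota for each genre before drawing is what distinguishes this from a simple random sample of the whole frame: every genre receives its planned share regardless of how many candidate texts it contributes to the frame.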
Funding information
Grant number
ES/R008906/1
Access
Publisher
UK Data Service
Publication year
2024
Terms of data access
The Data Collection is available from an external repository. Access is available via Related Resources.