This is a dataset of tweets from X. Each tweet mentions one or more UK MPs from a subset selected for our study to give a diverse representation of political leanings. Each tweet is labelled for hostility and for the identity characteristic it targets (religion, race or gender). Three annotators annotate each tweet, and each annotator also provides a confidence score for each label. Annotators are UK-based students from Computer Science and Politics.

Toxic and abusive language threatens the integrity of public dialogue and democracy. Abusive language, such as taunts, slurs, racism, extremism, crudeness, provocation and disguise, is generally considered offensive and insulting, and has been linked to political polarisation and citizen apathy; to the rise of terrorism and radicalisation; and to cyberbullying. In response, governments worldwide have enacted strong laws against abusive language that leads to hatred, violence and criminal offences against a particular group. These include legal obligations to moderate (i.e., detect, evaluate, and potentially remove or delete) online material containing hateful or illegal language in a timely manner, and social media companies have adopted even more stringent regulations in their terms of use. The last few years, however, have seen a significant surge in such abusive online behaviour, leaving governments, social media platforms, and individuals struggling to deal with the consequences.
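The annotation scheme described above (three annotators per tweet, each giving a hostility label, a targeted identity characteristic, and a confidence score) could be represented and aggregated roughly as follows. This is a minimal sketch; all field names and values are hypothetical illustrations, not the dataset's actual schema, which is documented with the data themselves.

```python
from collections import Counter

# One hypothetical annotated record: three annotators, each providing a
# hostility label, a targeted identity characteristic, and a confidence score.
record = {
    "tweet_id": "1234567890",  # hypothetical identifier
    "annotations": [
        {"hostile": True,  "target": "religion", "confidence": 0.9},
        {"hostile": True,  "target": "religion", "confidence": 0.7},
        {"hostile": False, "target": None,       "confidence": 0.6},
    ],
}

def majority_label(annotations, key):
    """Return the most common value for `key` among the annotators."""
    counts = Counter(a[key] for a in annotations)
    return counts.most_common(1)[0][0]

hostile = majority_label(record["annotations"], "hostile")  # True (2 of 3)
target = majority_label(record["annotations"], "target")    # "religion"
```

Simple majority voting is only one way to resolve disagreement; the per-label confidence scores also allow confidence-weighted aggregation or filtering of low-agreement tweets.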
The responsible (i.e., effective, fair and unbiased) moderation of abusive language carries significant practical, cultural, and legal challenges. While current legislation and public outrage demand a swift response, we do not yet have effective human or technical processes that can address this need. The widespread deployment of human content moderators is costly and inadequate on many levels: the nature of the work is psychologically challenging, and even significant efforts lag behind the deluge of data posted every second. At the same time,...
Terminology used is generally based on the DDI controlled vocabularies: Time Method, Analysis Unit, Sampling Procedure and Mode of Collection, available at the CESSDA Vocabulary Service.
Methodology
Data collection period
01/02/2020 - 31/01/2024
Country
United Kingdom
Time dimension
Not available
Analysis unit
Text unit
Universe
Not available
Sampling procedure
Not available
Kind of data
Text
Data collection mode
We collected data from X using the Twitter API v1.1, retrieving tweets based on UK MPs' user accounts (X handles). Four types of tweets were collected: tweets by the MPs, replies to their tweets, retweets by the MPs, and retweets of their tweets.
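The four collected tweet types could be distinguished using standard Twitter API v1.1 tweet-object fields (`user.screen_name`, `retweeted_status`, `in_reply_to_screen_name`). The sketch below is illustrative only: the `mp_handles` set and example payloads are hypothetical, and the dataset's actual collection pipeline may classify tweets differently.

```python
def classify_tweet(tweet: dict, mp_handles: set) -> str:
    """Assign a v1.1 tweet object to one of the four collected categories."""
    author = tweet["user"]["screen_name"]
    is_retweet = "retweeted_status" in tweet
    if author in mp_handles:
        # Tweets and retweets authored by the MPs themselves.
        return "mp_retweet" if is_retweet else "mp_tweet"
    if is_retweet and tweet["retweeted_status"]["user"]["screen_name"] in mp_handles:
        # Retweets of the MPs' tweets by other users.
        return "retweet_of_mp"
    if tweet.get("in_reply_to_screen_name") in mp_handles:
        # Replies to the MPs' tweets.
        return "reply_to_mp"
    return "other"

# Hypothetical examples:
mps = {"ExampleMP"}
mp_post = {"user": {"screen_name": "ExampleMP"}}
reply = {"user": {"screen_name": "citizen1"},
         "in_reply_to_screen_name": "ExampleMP"}
rt_of_mp = {"user": {"screen_name": "citizen2"},
            "retweeted_status": {"user": {"screen_name": "ExampleMP"}}}
```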
Funding information
Grant number
ES/T012714/1
Access
Publisher
UK Data Service
Publication year
2024
Terms of data access
The Data Collection is available from an external repository. Access is available via Related Resources.