Summary information

Study title

Hate speech and social acceptance of migrants in Europe: Analysis of tweets with geolocation

Creator

Arcila Calderón, Carlos (University of Salamanca)

Study number / PID

doi:10.17903/FK2/G83HNY (DOI)

Data access

Information not available

Series

Not available

Abstract

The data project includes a large-scale longitudinal analysis (2015-2020) of online hate speech on Twitter (N=847,978). A tweet database was generated by collecting tweets through Twitter’s Application Programming Interface (API) (v2 full-archive search endpoint, Academic Research product track), which provides access to the historical archive of messages since Twitter was created in 2006. To download the tweets, we first defined the search filter by keyword and geographic zone using the Python programming language and the NLTK, TensorFlow, Keras and NumPy libraries. We established generic words directly related to the topic, taking into account linguistic agreement in Spanish (i.e., gender and number inflections) but without considering adjectives, for instance: migrant, migrants, immigrant, immigrants, refugee (both masculine and feminine forms in Spanish), refugees (both masculine and feminine forms in Spanish), asylum seeker, asylum seekers (the keywords are available as supplementary materials). For hate speech detection in tweets, we built on a tool created and validated by Vrysis et al. (2021). For this research, the tool was retrained with supervised dictionary-based term detection and with an unsupervised approach (machine learning with neural networks), using a corpus of 90,977 short messages, of which 15,761 were in Greek (5,848 with hate toward immigrants), 46,012 were in Spanish (11,117 with hate toward immigrants) and 29,204 were in Italian (5,848 with hate toward immigrants). This corpus comes from two sources: the import of already classified messages from other databases (n=57,328, of which 5,362 are generic messages in Greek, 23,787 are generic messages and 9,727 are messages with hate toward immigrants in Spanish, and 18,452 are generic messages in Italian), and messages manually coded by local trained analysts (in Spain, Greece and Italy), using at least 2 coders with...
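As a rough illustration of the keyword and geographic search filter described in the abstract, the sketch below assembles a query string in the style of the Twitter API v2 search syntax. It is not the study's code: the keyword list reuses the English glosses quoted in the abstract (the actual Spanish keywords are in the supplementary materials), and the `build_query` helper and the use of the `place_country` operator for the geographic zone are assumptions made for illustration.

```python
# Hypothetical sketch: build a search query from topic keywords and a country.
# KEYWORDS uses the English glosses from the abstract, not the study's
# actual Spanish keyword list.

KEYWORDS = [
    "migrant", "migrants",
    "immigrant", "immigrants",
    "refugee", "refugees",
    "asylum seeker", "asylum seekers",
]


def build_query(keywords, country_code):
    """Return a query of the form (kw1 OR kw2 OR "multi word") place_country:XX."""
    terms = []
    for kw in keywords:
        # Multi-word keywords are quoted so they match as exact phrases.
        terms.append(f'"{kw}"' if " " in kw else kw)
    return f"({' OR '.join(terms)}) place_country:{country_code}"


query = build_query(KEYWORDS, "ES")
print(query)
```

The resulting string would be passed as the query parameter of a full-archive search request, together with start and end timestamps covering the study period.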

Media

Not available

Data collection period

Not available

Country

Time dimension

Not available

Analysis unit

Media unit: Text

Universe

Not available

Sampling procedure

Not available

Kind of data

Not available

Data collection mode

Content coding

Other

Publisher

Κατάλογος Δεδομένων SoDaNet (SoDaNet Data Catalogue)

Publication year

2024

Terms of data access

Not available
