Summary information

Study title

Name: TweetsCOV19 - A Semantically Annotated Corpus of Tweets About the COVID-19 Pandemic (Part 4, January 2021 - August 2022)
Published: 2022


Creator

Dimitrov, Dimitar (GESIS - Leibniz-Institut für Sozialwissenschaften)

Baran, Erdal (GESIS - Leibniz-Institut für Sozialwissenschaften)

Fafalios, Pavlos (Institute of Computer Science, FORTH-ICS, Heraklion, Greece)

Yu, Ran (GESIS - Leibniz-Institut für Sozialwissenschaften)

Zhu, Xiaofei (Chongqing University of Technology, Chongqing, China)

Zloch, Matthäus (GESIS - Leibniz-Institut für Sozialwissenschaften)

Dietze, Stefan (GESIS - Leibniz-Institut für Sozialwissenschaften & Heinrich-Heine-University Düsseldorf, Germany & L3S Research Center, Hannover, Germany)

Study number / PID

10.7802/2470 (GESIS)

10.7802/2470 (DOI)

Data access

Information not available

Series

Not available

Abstract

TweetsCOV19 is a semantically annotated corpus of tweets about the COVID-19 pandemic. It is a subset of TweetsKB and aims to capture online discourse about various aspects of the pandemic and its societal impact. The tweet metadata, along with the extracted entities, sentiments, hashtags, user mentions, and resolved URLs, is exposed in RDF using established RDF/S vocabularies. For the sake of privacy, we anonymize user IDs and do not provide the text of the tweets. More information is available on the TweetsCOV19 home page: https://data.gesis.org/tweetscov19/.

We also provide a tab-separated values (TSV) version of the dataset. Each line contains the features of one tweet, separated by the tab character ("\t"). The features, by index, are:

1. Tweet Id: Long.
2. Username: String. Encrypted for privacy reasons.
3. Timestamp: Format "EEE MMM dd HH:mm:ss Z yyyy".
4. #Followers: Integer.
5. #Friends: Integer.
6. #Retweets: Integer.
7. #Favorites: Integer.
8. Entities: String. For each entity, we aggregated the original text, the annotated entity, and the score produced by the FEL library, separated by ":" so that each entity is stored as "original_text:annotated_entity:score;". Entities are separated from each other by ";". If FEL did not find any entities, we stored "null;".
9. Sentiment: String. SentiStrength produces a positive score (1 to 5) and a negative score (-1 to -5). The two numbers are separated by a single space, with the positive score first and the negative score second (e.g. "2 -1").
10. Mentions: String. If the tweet contains mentions, we remove the "@" character and concatenate the mentions with a single space. If no mentions appear, we stored "null;".
11. Hashtags: String. If the tweet contains hashtags, we remove the "#" character and concatenate the hashtags with a single space. If no hashtags appear, we stored "null;".
12. ...
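The TSV layout described in the abstract can be read with a small line parser. The Python sketch below is a minimal, non-authoritative example, not code distributed with the dataset. The names `Entity`, `TweetRecord`, `parse_line`, and the other helpers are hypothetical. It assumes that: only the first 11 columns are documented here, so any further columns are kept unparsed in `extra`; the entity score is the last ":"-separated part and the annotated entity name contains no ":" (any colon is attributed to the original text); and timestamps use English day and month abbreviations, as the Java pattern "EEE MMM dd HH:mm:ss Z yyyy" produces under an English locale.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# Java pattern "EEE MMM dd HH:mm:ss Z yyyy" as a strptime format
# (assumes English day/month abbreviations and a numeric offset like +0000).
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"
NULL_MARKER = "null;"  # sentinel used for empty entity/mention/hashtag fields


@dataclass
class Entity:
    original_text: str
    annotated_entity: str
    score: float


@dataclass
class TweetRecord:
    tweet_id: int
    username: str  # encrypted in the published data
    timestamp: datetime
    followers: int
    friends: int
    retweets: int
    favorites: int
    entities: List[Entity]
    sentiment: Tuple[int, int]  # (positive 1..5, negative -1..-5)
    mentions: List[str]
    hashtags: List[str]
    extra: List[str] = field(default_factory=list)  # columns 12+ left unparsed


def parse_entities(raw: str) -> List[Entity]:
    """Parse 'original_text:annotated_entity:score;' chunks."""
    if raw == NULL_MARKER:
        return []
    entities = []
    for chunk in raw.split(";"):
        if not chunk:
            continue  # the field ends with ';', leaving an empty final chunk
        # Split from the right: score is last; assumes no ':' in the entity name.
        text, entity, score = chunk.rsplit(":", 2)
        entities.append(Entity(text, entity, float(score)))
    return entities


def parse_sentiment(raw: str) -> Tuple[int, int]:
    """Parse 'positive negative', e.g. '2 -1' -> (2, -1)."""
    positive, negative = raw.split()
    return int(positive), int(negative)


def parse_word_list(raw: str) -> List[str]:
    """Mentions and hashtags: 'null;' if absent, else space-separated."""
    return [] if raw == NULL_MARKER else raw.split()


def parse_line(line: str) -> TweetRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return TweetRecord(
        tweet_id=int(cols[0]),
        username=cols[1],
        timestamp=datetime.strptime(cols[2], TIMESTAMP_FORMAT),
        followers=int(cols[3]),
        friends=int(cols[4]),
        retweets=int(cols[5]),
        favorites=int(cols[6]),
        entities=parse_entities(cols[7]),
        sentiment=parse_sentiment(cols[8]),
        mentions=parse_word_list(cols[9]),
        hashtags=parse_word_list(cols[10]),
        extra=cols[11:],
    )


if __name__ == "__main__":
    # Hypothetical sample line, constructed for illustration only.
    sample = "\t".join([
        "1345", "abc123", "Fri Jan 01 12:00:00 +0000 2021",
        "10", "20", "3", "4",
        "covid:Coronavirus_disease_2019:-1.23;",
        "2 -1", "null;", "COVID19 vaccine", "extra",
    ])
    print(parse_line(sample))
```

Splitting the entity chunks from the right keeps the numeric score intact even if the original tweet text contained a colon; if annotated entity names can also contain ":", the field cannot be split unambiguously and this sketch would need adjusting.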

Topics

Not available

Keywords

Twitter, Social media, Text analysis, Discourse, Epidemic, Contagious disease

Data collection period

1 January 2021 - 1 August 2022

Country

Time dimension

Not available

Analysis unit

Not available

Universe

Not available

Sampling procedure

Not available

Kind of data

Not available

Data collection mode

Web Scraping

Publisher

GESIS Datenarchiv für Sozialwissenschaften

Publication year

2022

Terms of data access

Free access (without registration): the research data can be downloaded directly by anyone. Data can only be used for non-commercial research.

Related publications

Not available
