OAI-PMH response date: 2026-05-01T04:49:45Z; endpoint: https://datacatalogue.cessda.eu/oai-pmh/v0/oai

Record identifier: 8e14841648a020ab4d1e49d00114ef59225c2d668f1f261abbc9534b87c34692; datestamp: 2025-09-29T01:09:34Z; sets: language:en, openaire_data

Title: ParlaCAP: Dataset for tracking political agenda-setting across European parliaments
Identifier: doi:10.23669/1ZTELP
Authors: Ljubešić, Nikola; Rupnik, Peter; Kuzman Pungeršek, Taja; Porupski, Ivan; Mochtak, Michal; Dinić, Vuk; Širinić, Daniela; Kopp, Matyáš; Erjavec, Tomaž
Grant: 101129751
Organisations: CLARIN.SI; CROSSDA (Croatian Social Science Data Archive)
Subjects and keywords: Social Sciences; parliamentary debates; sentiment analysis; topic classification; PARLIAMENT; MEMBERS OF PARLIAMENT; LINGUISTIC ANALYSIS; Government, political systems and organisations

The ParlaCAP dataset consists of 8 million speeches from 28 European national and regional parliaments. Each speech is coded for the sentiment it expresses (ParlaSent coding, https://aclanthology.org/2024.lrec-main.1393/, ranging from negative through neutral to positive) and for the topic it discusses (Comparative Agendas Project coding with 22 topics, https://www.comparativeagendas.net/pages/master-codebook), and is accompanied by rich metadata on the speakers, parties and democracies. The dataset extends ParlaMint 5.0 (https://hdl.handle.net/11356/2004), which focuses primarily on the transcripts of parliamentary speeches and their metadata. Following the “text as data” paradigm, ParlaCAP automatically codes the topic and sentiment of each speech and simplifies the data to a tabular form, thereby supporting social science research on agenda-setting and negativity in political discourse across a broad set of parliaments. Automatic coding was done with multilingual transformer models: the ParlaCAP model (https://huggingface.co/classla/ParlaCAP-Topic-Classifier) for topic and the ParlaSent model (https://huggingface.co/classla/xlm-r-parlasent) for sentiment.
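Because each speech carries one topic code and one sentiment code, agenda composition and negativity reduce to simple per-parliament proportions. The following is a minimal sketch, assuming a hypothetical layout with one row per speech and hypothetical fields `parliament`, `topic` and `sentiment` (the actual column names and label strings in the ParlaCAP release may differ). The rows are invented toy values, not ParlaCAP data; the helper names `topic_shares` and `negativity` are likewise illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: one dict per speech. The field names, parliament
# codes and label strings are illustrative placeholders, not ParlaCAP data.
SPEECHES = [
    {"parliament": "SI", "topic": "Health", "sentiment": "negative"},
    {"parliament": "SI", "topic": "Health", "sentiment": "neutral"},
    {"parliament": "SI", "topic": "Macroeconomics", "sentiment": "positive"},
    {"parliament": "HR", "topic": "Education", "sentiment": "negative"},
    {"parliament": "HR", "topic": "Education", "sentiment": "negative"},
    {"parliament": "HR", "topic": "Health", "sentiment": "neutral"},
]


def topic_shares(rows):
    """Return {parliament: {topic: share of that parliament's speeches}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["parliament"]][row["topic"]] += 1
    shares = {}
    for parliament, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[parliament] = {t: n / total for t, n in topic_counts.items()}
    return shares


def negativity(rows):
    """Return {parliament: fraction of its speeches coded as negative}."""
    totals, negatives = Counter(), Counter()
    for row in rows:
        totals[row["parliament"]] += 1
        if row["sentiment"] == "negative":
            negatives[row["parliament"]] += 1
    return {p: negatives[p] / totals[p] for p in totals}


if __name__ == "__main__":
    print(topic_shares(SPEECHES))
    print(negativity(SPEECHES))
```

With 8 million speeches, the same grouping would in practice more likely be done with a dataframe library; the plain-Python version only makes the arithmetic explicit.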
Dates: 2020-01-15; 2025-07-08
Geographic coverage: Austria; Bosnia and Herzegovina; Belgium; Bulgaria; Czech Republic; Denmark; Estonia; Spain (national parliament and the regional parliaments of the Basque Country, Galicia and Catalonia); France; United Kingdom; Greece; Croatia; Hungary; Iceland; Italy; Latvia; Netherlands; Norway; Poland; Portugal; Serbia; Sweden; Slovenia; Turkey; Ukraine
Unit of analysis: Media unit: Text
Universe: Members of parliament; Members of government; Guest speakers in parliament
Kind of data: Text; Numeric
Sampling: Total universe/Complete enumeration
Mode of collection: Automated data extraction: Web scraping; Automated data extraction: Database query; Content coding
Research instrument: Programming script

Related publications:
Erjavec, T., Kopp, M., Ljubešić, N., et al. (2025). ParlaMint II: advancing comparable parliamentary corpora across Europe. Language Resources and Evaluation, 59, 2071–2102. https://doi.org/10.1007/s10579-024-09798-w
Mochtak, M., Rupnik, P., Kuzman, T., & Ljubešić, N. (2025). ParlaSent: mapping sentiment in political discourse with large language models. Political Research Exchange, 7(1). https://doi.org/10.1080/2474736X.2025.2508377
Mochtak, M., Rupnik, P., & Ljubešić, N. (2024). The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 16024–16036). Torino, Italy: ELRA and ICCL. https://doi.org/10.48550/arXiv.2309.09783
Kuzman, T., & Ljubešić, N. (2025). LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification. IEEE Access. https://doi.org/10.1109/ACCESS.2025.3544814

Source record: https://data.crossda.hr/oai; identifier: doi:10.23669/1ZTELP; datestamp: 2025-09-27T02:00:00Z; metadata format: ddi:codebook:2_5