<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='/oai/static/oai2.xsl' ?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-17T10:34:37Z</responseDate>
  <request identifier="04b5bb3d7e744843fc22c204d860412d5f870e5ca688db265034544309d69467" metadataPrefix="oai_ddi25" verb="GetRecord">https://datacatalogue.cessda.eu/oai-pmh/v0/oai</request>
  <GetRecord>
    <record>
    <header>
      <identifier>04b5bb3d7e744843fc22c204d860412d5f870e5ca688db265034544309d69467</identifier>
      <datestamp>2025-06-17T03:15:23Z</datestamp>
      <setSpec>language:en</setSpec><setSpec>openaire_data</setSpec>
    </header>
      <metadata>
        <codeBook xmlns="ddi:codebook:2_5" version="2.5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
    <docDscr>
      <citation>
        <titlStmt>
          <titl xml:lang="en">DDI study level documentation for study 10.7802/2251 The 'Call me sexist but' Dataset (CMSB)</titl>
        </titlStmt>
        <prodStmt>
        </prodStmt>
        <holdings xml:lang="en" URI="https://search.gesis.org/research_data/SDN-10.7802-2251?lang=en"/><holdings xml:lang="de" URI="https://search.gesis.org/research_data/SDN-10.7802-2251?lang=de"/>
      </citation>
    </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="en">The 'Call me sexist but' Dataset (CMSB)</titl>
        <parTitl xml:lang="de">The 'Call me sexist but' Dataset (CMSB)</parTitl>
        <IDNo xml:lang="en" agency="GESIS">10.7802/2251</IDNo><IDNo xml:lang="de" agency="GESIS">10.7802/2251</IDNo><IDNo xml:lang="en" agency="DOI">10.7802/2251</IDNo><IDNo xml:lang="de" agency="DOI">10.7802/2251</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty affiliation="GESIS - Leibniz-Institut für Sozialwissenschaften" xml:lang="en">Samory, Mattia</AuthEnty>
        <AuthEnty affiliation="GESIS - Leibniz-Institut für Sozialwissenschaften" xml:lang="de">Samory, Mattia</AuthEnty>
      </rspStmt>
      <prodStmt>
        <prodDate xml:lang="en"/>
      </prodStmt>
      <distStmt>
        <distrbtr abbr="GESIS" URI="http://www.gesis.org/" xml:lang="en">GESIS Data Archive for the Social Sciences</distrbtr><distrbtr abbr="GESIS" URI="http://www.gesis.org/" xml:lang="de">GESIS Datenarchiv für Sozialwissenschaften</distrbtr>
        <distDate xml:lang="en" date="2021"/><distDate xml:lang="de" date="2021"/>
      </distStmt>
      <verStmt>
        <version date="2021" xml:lang="en"/><version date="2021" xml:lang="de"/>
      </verStmt>
      <holdings xml:lang="en" URI="https://search.gesis.org/research_data/SDN-10.7802-2251?lang=en"/><holdings xml:lang="de" URI="https://search.gesis.org/research_data/SDN-10.7802-2251?lang=de"/>
    </citation>
    <stdyInfo>
      <subject>
      </subject>
      <abstract xml:lang="en">This dataset consists of three types of 'short-text' content:  &lt;br&gt; &lt;br&gt; 1. social media posts (tweets),  &lt;br&gt; 2. psychological survey items, and   &lt;br&gt; 3. synthetic adversarial modifications of the former two categories.   &lt;br&gt; &lt;br&gt; The tweet data can be further divided into three separate datasets based on their source:   &lt;br&gt; &lt;br&gt; 1.1 the hostile sexism dataset,  &lt;br&gt; 1.2 the benevolent sexism dataset, and   &lt;br&gt; 1.3 the callme sexism dataset.   &lt;br&gt; &lt;br&gt; 1.1 and 1.2 are pre-existing datasets obtained from Waseem, Z., &amp; Hovy, D. (2016) and Jha, A., &amp; Mamidi, R. (2017) that we re-annotated (see our paper and data statement for further information). The rationale for including these datasets specifically is that they feature a variety of sexist expressions in real conversational (social media) settings. In particular, they feature expressions that range from overtly antagonizing the minority gender through negative stereotypes (1.1) to leveraging positive stereotypes to subtly dismiss it as less capable and fragile (1.2).  &lt;br&gt; &lt;br&gt; The callme sexism dataset (1.3) was collected by us based on the presence of the phrase 'call me sexist but' in tweets. The rationale behind this choice of query was that several Twitter users post potentially sexist comments and signal this by using the phrase, which arguably serves as a disclaimer for sexist opinions.   &lt;br&gt; &lt;br&gt; The survey items (2) are taken from attitudinal surveys that are designed to measure sexist attitudes and gender bias in participants. We provide a detailed account of our selection procedure in our paper.  &lt;br&gt; &lt;br&gt; Finally, the adversarial examples (3) were generated by crowdworkers on Amazon Mechanical Turk, who made minimal changes to tweets and scale items in order to turn sexist examples into non-sexist ones. 
We hope that these examples will help us control for typical confounds in non-sexist data (e.g., topic, civility) and lead to datasets with fewer biases, and consequently allow us to train more robust machine learning models. We only asked crowdworkers to turn sexist examples into non-sexist ones, and not vice versa, for ethical reasons.  &lt;br&gt; &lt;br&gt; The dataset is annotated to capture cases where text is sexist because of its content (what the speaker believes) or its phrasing (the speaker's choice of words). We explain the rationale for this codebook in our paper cited below.</abstract><abstract xml:lang="de">This dataset consists of three types of 'short-text' content:  &lt;br&gt; &lt;br&gt; 1. social media posts (tweets),  &lt;br&gt; 2. psychological survey items, and   &lt;br&gt; 3. synthetic adversarial modifications of the former two categories.   &lt;br&gt; &lt;br&gt; The tweet data can be further divided into three separate datasets based on their source:   &lt;br&gt; &lt;br&gt; 1.1 the hostile sexism dataset,  &lt;br&gt; 1.2 the benevolent sexism dataset, and   &lt;br&gt; 1.3 the callme sexism dataset.   &lt;br&gt; &lt;br&gt; 1.1 and 1.2 are pre-existing datasets obtained from Waseem, Z., &amp; Hovy, D. (2016) and Jha, A., &amp; Mamidi, R. (2017) that we re-annotated (see our paper and data statement for further information). The rationale for including these datasets specifically is that they feature a variety of sexist expressions in real conversational (social media) settings. In particular, they feature expressions that range from overtly antagonizing the minority gender through negative stereotypes (1.1) to leveraging positive stereotypes to subtly dismiss it as less capable and fragile (1.2).  &lt;br&gt; &lt;br&gt; The callme sexism dataset (1.3) was collected by us based on the presence of the phrase 'call me sexist but' in tweets. 
The rationale behind this choice of query was that several Twitter users post potentially sexist comments and signal this by using the phrase, which arguably serves as a disclaimer for sexist opinions.   &lt;br&gt; &lt;br&gt; The survey items (2) are taken from attitudinal surveys that are designed to measure sexist attitudes and gender bias in participants. We provide a detailed account of our selection procedure in our paper.  &lt;br&gt; &lt;br&gt; Finally, the adversarial examples (3) were generated by crowdworkers on Amazon Mechanical Turk, who made minimal changes to tweets and scale items in order to turn sexist examples into non-sexist ones. We hope that these examples will help us control for typical confounds in non-sexist data (e.g., topic, civility) and lead to datasets with fewer biases, and consequently allow us to train more robust machine learning models. We only asked crowdworkers to turn sexist examples into non-sexist ones, and not vice versa, for ethical reasons.  &lt;br&gt; &lt;br&gt; The dataset is annotated to capture cases where text is sexist because of its content (what the speaker believes) or its phrasing (the speaker's choice of words). We explain the rationale for this codebook in our paper cited below.</abstract>
      <sumDscr>
        <universe xml:lang="en" clusion="I"/>
      </sumDscr>
    </stdyInfo>
    <method>
      <dataColl>
      </dataColl>
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Free access (with registration) - The research data can be downloaded by registered users.
CC BY-NC-SA 4.0: Attribution - NonCommercial - ShareAlike (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.de)</restrctn><restrctn xml:lang="de">Freier Zugang (mit Registrierung) - Die Forschungsdaten können von allen registrierten Nutzerinnen und Nutzern heruntergeladen werden.
CC BY-NC-SA 4.0: Namensnennung - Nicht kommerziell – Weitergabe unter gleichen Bedingungen  (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.de)</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat>
    </othrStdyMat>
  </stdyDscr>
  <fileDscr>
  </fileDscr>
</codeBook>
      </metadata>
      <about>
        <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
    <originDescription harvestDate="2025-06-17T03:15:23Z" altered="true">
      <baseURL>http://dbkapps.gesis.org/dbkoai/oai.asp</baseURL>
      <identifier>oai:dbk.gesis.org:SDN/10.7802_2251</identifier>
      <datestamp>2023-03-11</datestamp>
      <metadataNamespace>ddi:codebook:2_5</metadataNamespace>
    </originDescription>
</provenance>
      </about>
    </record>
  </GetRecord>
</OAI-PMH>