<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='/oai/static/oai2.xsl' ?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-21T11:08:55Z</responseDate>
  <request identifier="5051d5d83d0807fc171bd2635caeb9029905134e25125a5e635a3024d06a4773" metadataPrefix="oai_ddi25" verb="GetRecord">https://datacatalogue.cessda.eu/oai-pmh/v0/oai</request>
  <GetRecord>
    <record>
    <header>
      <identifier>5051d5d83d0807fc171bd2635caeb9029905134e25125a5e635a3024d06a4773</identifier>
      <datestamp>2025-06-17T03:25:49Z</datestamp>
      <setSpec>language:sv</setSpec><setSpec>openaire_data</setSpec>
    </header>
      <metadata>
        <codeBook xmlns="ddi:codebook:2_5" version="2.5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
    <docDscr>
      <citation>
        <titlStmt>
          <titl xml:lang="sv">Tokeniserad produktinformation för centralt godkända läkemedel inom EU (extraherad 2022-05-03)</titl>
        </titlStmt>
        <prodStmt>
          <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer><producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
        </prodStmt>
        <holdings xml:lang="en" URI="https://doi.org/10.57804/ggrw-hr06">Landing page</holdings>
      </citation>
    </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv">Tokeniserad produktinformation för centralt godkända läkemedel inom EU (extraherad 2022-05-03)</titl>
        <parTitl xml:lang="en">Tokenized product information for centrally approved medicines within EU (extracted May 3, 2022)</parTitl>
        <IDNo xml:lang="en" agency="SND">2022-157-1-1</IDNo><IDNo xml:lang="en" agency="DOI">https://doi.org/10.57804/ggrw-hr06</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty affiliation="Uppsala University" xml:lang="en">Westman, Gabriel</AuthEnty><AuthEnty affiliation="Uppsala universitet" xml:lang="sv">Westman, Gabriel</AuthEnty>
      </rspStmt>
      <prodStmt>
        <prodDate xml:lang="en"/>
      </prodStmt>
      <distStmt>
        <distrbtr abbr="SND" URI="https://snd.gu.se" xml:lang="en">Swedish National Data Service</distrbtr><distrbtr abbr="SND" URI="https://snd.gu.se" xml:lang="sv">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2022-09-29">2022-09-29</distDate>
      </distStmt>
      <verStmt>
      </verStmt>
      <holdings xml:lang="en" URI="https://doi.org/10.57804/ggrw-hr06">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject>
        <keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D007254">Information Science</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D007254">Informationsvetenskap</keyword><keyword xml:lang="en" vocab="ELSST" vocabURI="https://elsst.cessda.eu/id/da8c6947-2999-41bf-914b-9360015e85c6">linguistics</keyword><keyword xml:lang="sv" vocab="ELSST" vocabURI="https://elsst.cessda.eu/id/da8c6947-2999-41bf-914b-9360015e85c6">lingvistik</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D001185">Artificial Intelligence</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D001185">Artificiell intelligens</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D010604">Pharmacy</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D010604">Farmaci</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D006281">Health Occupations</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D006281">Vårdyrken</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D000465">Algorithms</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D000465">Algoritmer</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D003205">Computing Methodologies</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D003205">Dataanalys</keyword><keyword xml:lang="en" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D055641">Mathematical Concepts</keyword><keyword xml:lang="sv" vocab="MeSH" vocabURI="http://id.nlm.nih.gov/mesh/D055641">Matematiska begrepp</keyword>
        <topcClas xml:lang="en" vocab="Standard för svensk indelning av forskningsämnen 2011">Computer and Information Science</topcClas><topcClas xml:lang="sv" vocab="Standard för svensk indelning av forskningsämnen 2011">Data- och informationsvetenskap (Datateknik)</topcClas><topcClas xml:lang="en" vocab="Standard för svensk indelning av forskningsämnen 2011">Basic Medicine</topcClas><topcClas xml:lang="sv" vocab="Standard för svensk indelning av forskningsämnen 2011">Medicinska och farmaceutiska grundvetenskaper</topcClas><topcClas xml:lang="en" vocab="Standard för svensk indelning av forskningsämnen 2011">Natural Sciences</topcClas><topcClas xml:lang="sv" vocab="Standard för svensk indelning av forskningsämnen 2011">Naturvetenskap</topcClas><topcClas xml:lang="en" vocab="Standard för svensk indelning av forskningsämnen 2011">Medical and Health Sciences</topcClas><topcClas xml:lang="sv" vocab="Standard för svensk indelning av forskningsämnen 2011">Medicin och hälsovetenskap</topcClas>
      </subject>
      <abstract xml:lang="en">The text corpus was compiled on May 3, 2022, by scripted downloading of all available English-language product information files for all centrally approved medicinal products within the EU from the European Medicines Agency website. Package Leaflet (PL) and Summary of Product Characteristics (SmPC) documents were used for each medicinal product, excluding duplicate documents for medicinal products with more than one strength or pharmaceutical preparation. The PDF files were scraped using the pdfplumber package, version 0.6.1, in Python 3.8.10 to extract all text except page numbering, headers, and footers. Line breaks and special characters (excluding punctuation characters) were removed, and punctuation was added to sentences where it was missing (such as headings) to avoid false aggregation. All paragraphs were tokenized at the sentence level using the Natural Language Toolkit (NLTK) version 3.7 tokenizer. This database contains sentence-level tokenized product information from all centrally approved medicinal products within the EU (May 3, 2022), including Summary of Product Characteristics (SmPC) and Package Leaflet (PL) documents. A total of 1258 medicinal products were initially included, of which 5 were subsequently excluded due to document compatibility issues. From the remaining products, a total of 783 K sentences were extracted from PL and SmPC documents.</abstract><abstract xml:lang="sv">Tokeniserad produktinformation för centralt godkända läkemedel inom EU. Se engelsk beskrivning för detaljer om hur data kompilerats och bearbetats. Databasen innehåller tokeniserad produktinformation på meningsnivå, extraherad från alla centralt godkända läkemedel inom EU (2022-05-03). Se engelskspråkig beskrivning för ytterligare detaljer.</abstract>
      <sumDscr>
      </sumDscr>
    </stdyInfo>
    <method>
      <dataColl>
      </dataColl>
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through SND. Data are freely accessible.</restrctn><restrctn xml:lang="sv">Åtkomst till data via SND. Data är fritt tillgängliga.</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat>
    </othrStdyMat>
  </stdyDscr>
  <fileDscr>
  </fileDscr>
</codeBook>
      </metadata>
      <about>
        <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance http://www.openarchives.org/OAI/2.0/provenance.xsd">
    <originDescription harvestDate="2025-06-17T03:25:49Z" altered="true">
      <baseURL>https://snd.gu.se</baseURL>
      <identifier>2022-157-1</identifier>
      <datestamp>2022-10-25T14:30:13Z</datestamp>
      <metadataNamespace>ddi:codebook:2_5</metadataNamespace>
    </originDescription>
</provenance>
      </about>
    </record>
  </GetRecord>
</OAI-PMH>