Artificial Intelligence Large Language Models
in Digital Humanities



Christof Schöch
Trier Center for Digital Humanities
Trier University, Germany

CODHES-2024
Conference on Digital Humanities
and Environmental Sustainability

31 Oct 2024

Introduction







My background

  • My Education
    • Master’s Degree: French, English, Psychology (Freiburg U)
    • Ph.D.: Eighteenth-Century French Literature (Kassel and Paris)
    • Postdoc: Scholarly Digital Editions, Research Infrastructure, Text and Data Mining (Würzburg U)
  • Professor of Digital Humanities (2017–, Trier U)
    • Primary field: Computational Literary Studies
    • Co-editor of the Journal of Computational Literary Studies (JCLS)
    • Co-director, Trier Center for Digital Humanities
    • Coordinator of a Master of Science in Digital Humanities
    • President of the Alliance of Digital Humanities Organizations (ADHO, 2023-2024)

Overview

  1. What do we mean when we say “AI”?
  2. AI LLMs in DH today
  3. Use cases of AI LLMs in DH
    1. Multilingual stylometry
    2. Direct speech recognition
    3. Historical wine labels
  4. Conclusion: affordances and concerns

What do we mean when we say AI?

AI between hype and doom

AI > ML > DL (e.g. Chollet)

CS > ML > DL > LLMs | AI

AI LLMs in DH today

Progress is impressive

LLMs are now everywhere in DH

  • Papers at many venues routinely use them
    • Digital Humanities Conference
    • Computational Humanities Research
    • Journal of Computational Literary Studies
  • A wide range of issues is addressed using LLMs
    • NLP tasks supporting DH research
    • Complex annotation tasks
    • Text encoding / information extraction
    • Text and image analysis
    • Support when programming

But there are also problematic issues

  • Explainability for ML/AI
  • Reproducibility / Open Science
  • Sustainability

Use Case: Multilingual Stylometry








High-profile cases of stylometric authorship attribution

William Shakespeare:
Craig and Kinney (2009)

Molière and Corneille:
Cafiero and Camps (2019)

Elena Ferrante:
Tuzzi and Cortelazzo (2018)

Galbraith / Rowling:
Juola (2015)

Multilingual stylometry?

             translation ↦
↧ original   eng       fra       ukr       hun
eng          eng-eng   eng-fra   eng-ukr   eng-hun
fra          fra-eng   fra-fra   fra-ukr   fra-hun
ukr          ukr-eng   ukr-fra   ukr-ukr   ukr-hun
hun          hun-eng   hun-fra   hun-ukr   hun-hun


  • Using corpora from the European Literary Text Collection (ELTeC)
  • 4 different languages, 30 novels each, translation into the other three languages
  • 360 translations in total (120 novels × 3 target languages each) => use DeepL to perform machine translation
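The experimental grid above can be sketched in a few lines of Python. This is a minimal sketch of the bookkeeping only; the language codes and corpus size are taken from the slides, everything else is illustrative.

```python
from itertools import product

# Languages used in the ELTeC-based experiment and novels per language
languages = ["eng", "fra", "ukr", "hun"]
novels_per_language = 30

# Every original language is translated into the three other languages,
# i.e. one "orig-target" cell per off-diagonal entry of the matrix above.
pairs = [(orig, target)
         for orig, target in product(languages, repeat=2)
         if orig != target]

# Total number of machine translations to produce with DeepL
total_translations = len(pairs) * novels_per_language

print(pairs[:3])           # [('eng', 'fra'), ('eng', 'ukr'), ('eng', 'hun')]
print(total_translations)  # 360
```

The diagonal cells (eng-eng etc.) are the untranslated originals, which is why they are excluded from the translation count.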

Multilingual Stylometry: Overview

Lessons learned

  • Affordances
    • Using DeepL was a game-changer for this research
    • Interesting new insights into stylometric methods
    • Impossible without using LLM-based machine translation
  • Limitations
    • The translations are only of “acceptable” quality
    • The main part of the research is just “normal” ML
    • Each translation pair performed by one “agent”: unrealistic

Use Case: Direct Speech Recognition








A typical CLS problem: Speech and Thought Representation

  • Fundamental distinction for analysis of fictional narrative
    • Narrator vs. character speech (and thoughts)
    • Direct, indirect, free indirect character speech
  • For many reasons, it might be relevant to separate the two
    • How much has the author given an individual voice to each character?
    • Is there a correlation between subgenres and proportion of character speech?

Our first attempts at this (2016)

  • Corpus of French novels ca. 1840 to 1920
  • Focus on direct character speech (sentence-level)
  • Traditional ML approach
    • Feature engineering: 80 features!
    • Train and test a classifier (SVM)
    • Resulting F1-Score: 0.924
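The feature-engineering + SVM workflow of the 2016 study can be sketched as follows. This is a hedged toy example: the original study used 80 hand-crafted features; the two features and the French sentences below are illustrative assumptions, not the study's actual feature set.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two toy stand-ins for hand-crafted features (the 2016 study used 80):
# quotation-mark count and exclamation-mark count per sentence.
def features(sentence: str) -> list[int]:
    return [sentence.count("«") + sentence.count("»") + sentence.count('"'),
            sentence.count("!")]

train_sentences = [
    "« Venez ici ! »",                        # direct character speech
    "« Non, jamais ! »",                      # direct character speech
    "Il marchait lentement vers la ville.",   # narrator text
    "La nuit tombait sur Paris.",             # narrator text
]
train_labels = [1, 1, 0, 0]  # 1 = direct speech, 0 = narrator

# Train an SVM classifier on the engineered features
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
clf.fit([features(s) for s in train_sentences], train_labels)

print(clf.predict([features("« Partons ! »")]))  # -> [1], direct speech
```

The key point, picked up again in the "Lessons learned" below, is that with this approach the feature weights of the trained SVM remain directly inspectable.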

LLM-based approach (2021)

  • Würzburg/Mannheim team
  • Corpus of German narrative texts
  • Focus on various types of character speech (token-level)
  • LLM-based approach
    • Pre-trained LLMs
      (BERT and Flair)
    • Fine-tuning with extensive training data
    • Resulting F1-Scores: ~0.87

Approach using SetFit (2024)

  • Focus on direct character speech (sentence-based)
  • Using SetFit (Sentence Transformers Fine-Tuning)
    • BERT-based sentence transformers
    • Few-shot learning: small amount of training data
    • Resulting F1-Scores: 0.89-0.93 (depending on scenario)

Lessons learned

  • Progress is indeed impressive!
    • Not so much in performance (remains similar)
    • But in time/effort needed: days instead of months/years
    • Notably: no hand-crafted features required anymore
  • A big concern: Explainability
    • Insight into direct speech through feature weights
    • Trivial in a traditional ML approach
    • Much more involved in a DL/LLM approach

Use Case: Historical Wine Labels








Collections of historical wine labels

Knowledge Graph about the wine labels

  • Knowledge Graph using Wikibase (Linked Open Data paradigm)
  • One focus: mapping of vineyards, wine makers, villages, labels
  • Another focus: Objects depicted on the labels
  • Goal: Use LLMs to annotate our collection and feed the KG
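Feeding LLM annotations into a Wikibase instance ultimately means producing statement payloads in Wikibase's JSON shape. The sketch below builds one such "depicts"-style claim; the property ID P180 and the QID Q42 are placeholder assumptions, not the project's actual Wikibase identifiers.

```python
import json

def depicts_claim(property_id: str, object_qid: str) -> dict:
    """One 'depicts'-style statement pointing at a controlled-vocabulary item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "value": {"entity-type": "item", "id": object_qid},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }

# Hypothetical item for one wine label; labels and IDs are placeholders.
payload = {
    "labels": {"de": {"language": "de", "value": "Weinetikett (Beispiel)"}},
    "claims": {"P180": [depicts_claim("P180", "Q42")]},
}
print(json.dumps(payload, indent=2))
```

In practice such a payload would be sent through the MediaWiki API's wbeditentity action; the point here is only the target data shape the LLM output has to be mapped onto.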

First tests with ChatGPT (Nov. 2023)

  • ChatGPT 4 with ADA (Advanced Data Analysis) plugin
  • Prompt: “What objects, things, animals, text and design elements do you recognize in this image?”
  • Very impressive OCR and object detection
  • Some interesting inferences (“likely the Mosel”)
  • But: unstructured prose, difficult to work with

Using a Custom GPT (Oct. 2024, ChatGPT 4o)

  • Changes
    • Detailed instructions (more precise prompt)
    • Controlled vocabulary of depicted objects (~100 items)
    • JSON output as default
  • Improvements
    • Less conversational, more useful results
    • Items mentioned are part of the vocabulary
  • Persisting issues
    • Instructions applied inconsistently
    • QIDs are incorrect
    • And: our results are not reproducible
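Because the instructions are applied inconsistently, the model's JSON output has to be checked against the controlled vocabulary before anything is fed into the knowledge graph. A minimal sketch of such a check, assuming a hypothetical output format with an "objects" array and a three-item stand-in for the project's ~100-item vocabulary:

```python
import json

# Placeholder stand-in for the project's controlled vocabulary (~100 items)
CONTROLLED_VOCABULARY = {"grape", "vineyard", "coat of arms"}

def check_annotation(raw_json: str) -> tuple[list[str], list[str]]:
    """Split reported objects into in-vocabulary and out-of-vocabulary terms."""
    data = json.loads(raw_json)
    objects = data.get("objects", [])
    in_vocab = [o for o in objects if o in CONTROLLED_VOCABULARY]
    out_of_vocab = [o for o in objects if o not in CONTROLLED_VOCABULARY]
    return in_vocab, out_of_vocab

# Hypothetical Custom GPT reply
reply = '{"objects": ["grape", "castle", "coat of arms"]}'
print(check_annotation(reply))  # (['grape', 'coat of arms'], ['castle'])
```

Out-of-vocabulary terms can then be flagged for manual review rather than silently entering the Wikibase.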

Lessons learned

  • Affordances
    • Compared to systems from about 5 years ago, progress is impressive
    • Accessibility is very high: Easy to create Custom GPT
    • A good way to feed information into our Wikibase
  • Limitations
    • Constraining ChatGPT to formalized knowledge is hard
    • Obtaining the same performance with open, desktop solutions is hard
  • Way forward
    • Use ChatGPT for free prose descriptions of labels
    • Use traditional “object detection” to feed the KG

Conclusion: Advantages and Challenges of LLMs in DH

Advantages: Impressive progress!

  • Higher performance in some tasks
  • Lower development time on other tasks
  • No hand-crafted features required anymore
  • Better representation of meaning in context
  • Context spans will continue to grow: sentences, paragraphs, novels
  • Promising approach:
    • DH: Using LLMs for Information Extraction that feeds into Knowledge Graphs;
    • NLP: Use Knowledge Graphs (e.g. via KG embeddings) to feed factual knowledge into LLMs and improve IE

Challenges

  • Fairness: Very high hardware requirements (= high cost)
  • Sustainability: Very high energy needs (use LLM only when needed)
  • Explainability: Much harder in DL/LLMs than in ML
  • Open Science: Most LLMs not open source, not reproducible, not transparent

Final words: learn from the past

  • Big data in DH (~2010)
  • Topic Modeling in DH (~2012)
  • Word Embeddings in DH (~2015)
  • ChatGPT in DH (~2023)

Many thanks! 😺




References


Brunner, Annelen, Ngoc Duyen Tanja Tu, Lukas Weimer, and Fotis Jannidis. 2020. “To BERT or Not to BERT: Comparing Contextual Embeddings in a Deep Learning Architecture for the Automatic Recognition of Four Types of Speech, Thought and Writing Representation.” In SwissText/KONVENS. https://www.academia.edu/download/101289662/paper5.pdf.
Cafiero, Florian, and Jean-Baptiste Camps. 2019. “Why Molière Most Likely Did Write His Plays.” Science Advances 5 (11): eaax5489. https://doi.org/10.1126/sciadv.aax5489.
Craig, Hugh, and Arthur F. Kinney. 2009. Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press.
Juola, Patrick. 2015. “The Rowling Case: A Proposed Standard Protocol for Authorship Attribution.” Digital Scholarship in the Humanities 30 (suppl. 1): 100–113. https://doi.org/10.1093/llc/fqv040.
Schöch, Christof, Julia Dudar, Evgeniia Fileva, and Artjoms Šeļa. 2024. “Multilingual Stylometry: The Influence of Language on the Performance of Authorship Attribution Using Corpora from the European Literary Text Collection.” In Proceedings of the Computational Humanities Research Conference 2024. Aarhus: CEUR. https://doi.org/10.5281/zenodo.13995613.
Schöch, Christof, Daniel Schlör, Stefanie Popp, Annelen Brunner, Ulrike Henny, and José Calvo Tello. 2016. “Straight Talk! Automatic Recognition of Direct Speech in Nineteenth-Century French Novels.” In Digital Humanities: Conference Abstracts, 346–53. Kraków: Jagiellonian University & Pedagogical University. http://dh2016.adho.org/abstracts/31.
Tuzzi, Arjuna, and Michele A. Cortelazzo, eds. 2018. Drawing Elena Ferrante’s Profile: Workshop Proceedings. Padova: Padova UP.
Weis, Joëlle, and Christof Schöch. 2024. “Vom Perler Hasenberg zur Lehmener Würzlay: Weinetiketten digital erschließen.” In Digital ist besser? Sammlungsforschung im digitalen Zeitalter, edited by Katharina Günther and Stefan Alschner. Göttingen: Wallstein. https://www.wallstein-verlag.de/9783835356153-002.html.