Computational Literary Studies: History, Practices, Opportunities, Requirements and Challenges




Prof. Dr. Christof Schöch
Trier University, Germany

Institute for Georgian Literature – Tbilisi State University – Georgia

13 Mar 2025

Introduction / Overview

Overview

  • Context: Digital Humanities
  • Computational Literary Studies: History, Methods, Projects
  • Computational Literary Studies: Requirements, Benefits, Challenges
  • Conclusion

Context: Digital Humanities

My Profile

  • French Studies (e.g. La Description double)
  • Scholarly Digital Editing (e.g. Essai sur le récit)
  • Computational Literary Studies (e.g. CLiGS group)
  • Digital Humanities (at Trier University)

Digital Humanities at Trier University

  • Kompetenzzentrum – Trier Center for Digital Humanities (since 1998)
  • Department for Computational Linguistics and Digital Humanities (since 2012)
  • Digital History (several domains, e.g. Medieval Jewish History, Enlightenment)

The Trier Center for Digital Humanities

  • Founded in 1998, now a central research unit
  • About 35 team members, 30 parallel projects
  • Three main research areas, many topics

Digital Humanities in German-speaking areas

  • DHd-Verband (scholarly association, since 2012)
  • Annual DHd conference (500–600 participants, since 2014)
  • NFDI consortia (Culture, History, Objects, Text)
  • Professorships, centers, study programs

Computational Literary Studies: History, Methods, Projects

Origins: Stylometric Authorship Attribution

  • Mendenhall, “The Characteristic Curves of Composition”, 1887
  • Mosteller and Wallace, The Federalist Papers, 1963/1964
  • John Burrows, “Delta”, 2002
  • Maciej Eder, Jan Rybicki, Mike Kestemont: “stylo” for R (2012–)
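The idea behind Burrows's Delta can be illustrated with a minimal sketch. It is a toy version written for this overview, not the implementation used in “stylo”: the function names, the whitespace-tokenized input and the use of the population standard deviation are assumptions made here for illustration. Real studies typically use several hundred most frequent words rather than three.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def burrows_delta(texts, n_words=3):
    """Return {(name_a, name_b): delta} for every pair of texts.

    texts: dict mapping a text name to its list of lowercase tokens
    (hypothetical input format chosen for this sketch).
    """
    # 1. Select the n most frequent words of the whole corpus.
    corpus_counts = Counter(tok for toks in texts.values() for tok in toks)
    vocabulary = [word for word, _ in corpus_counts.most_common(n_words)]

    # 2. Compute relative frequencies of these words in each text.
    freqs = {name: relative_frequencies(toks, vocabulary)
             for name, toks in texts.items()}

    # 3. Standardize each word's frequencies across texts (z-scores).
    columns = list(zip(*freqs.values()))
    means = [mean(col) for col in columns]
    # Assumption: fall back to 1.0 when a word has zero variance.
    stds = [pstdev(col) or 1.0 for col in columns]
    z = {name: [(f - m) / s for f, m, s in zip(row, means, stds)]
         for name, row in freqs.items()}

    # 4. Delta is the mean absolute difference between two z-score profiles.
    names = sorted(z)
    return {(a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}


# Toy corpus: A and B share the same function-word profile, C does not.
toy_texts = {
    "A": "the cat and the dog and the bird".split(),
    "B": "the fox and the owl and the hen".split(),
    "C": "of of of the sea of of and".split(),
}
toy_deltas = burrows_delta(toy_texts)
```

A smaller Delta value means that two texts use the most frequent words in a more similar way; in attribution, a disputed text is assigned to the candidate author whose texts yield the smallest distance.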

Diversification: A Plurality of Methods

  • Stylometric Authorship Attribution
  • Stylometry beyond authorship (genre, period, gender)
  • Topic Modeling (thematic analysis)
  • Sentiment Analysis (positive / negative attitude)
  • Network Analysis (for drama, correspondences, novels)
  • Contrastive Analysis (keyness)
  • Character Speech Recognition (in novels)
  • Literary Mapping / spatial analysis
  • Linked Open Data
  • Machine Learning (clustering, classification)
  • Large Language Models (BERT and GPTs)

Opening up paths…

Consolidation: Computational Literary Studies

Principles and Opportunities of CLS (and DH)

  • Open Science (e.g. Survey of Methods, 2023)
  • Decanonization (e.g. Corpus for Beyond Words)
  • Multilingualism (e.g. ELTeC: 21 languages)
  • Sustainability (e.g. LOD and TEI in Mining and Modeling Text)
  • Exploration (e.g. wine label project)

Computational Literary Studies: Requirements and Challenges

Digitization and Corpus Building

  • With digital methods, we can scale up analysis, but…
  • No analysis is possible without digital corpora
  • More texts are good; more metadata is even better
  • Challenges:
    • Digitize the canon and the archive
    • OCR models for historical languages
    • Metadata scarcity
    • Diversity paradox
    • Sustainability (formats, accessibility)

Challenge: The Diversity Paradox in ELTeC

English ELTeC corpus

Romanian ELTeC corpus

Tools and Methods

  • Most relevant tools are readily available, but…
  • Challenges:
    • Competent use of tools requires training (e.g. stylo)
    • Some tools require language-specific NLP resources (tokenization, lemmatization, POS-tagging, BERT models, etc.; cf. Topic Modeling)
    • Tools and methods need to be trustworthy: evaluation (e.g. Beyond Words)

Infrastructure (in a wide sense)

  • Infrastructure is much more than just tools
    • Corpora with metadata
    • Tools and research environments (such as FuD)
    • Competency building capacity (e.g. workshops)
    • Community building capacity (see below)
  • Challenges
    • Resources
    • Time / patience

Steps in community building

  1. “Stammtisch” (regular social meeting)
  2. Network (e.g. mailing list, Signal group, etc.)
  3. Events (e.g. lectures, training)
  4. Center (consolidation, locally)
  5. Association (national scope)

Conclusion

  • Computational Literary Studies is a well-established area of DH
  • Today, it uses a diverse set of methods and tools
  • There are a number of opportunities and challenges when establishing CLS for Georgian Literature (to be discussed…)



Thank you! | დიდი მადლობა [didi madɫoba]
