Towards Computational Comparative
Literary Studies

Addressing the Challenges of Multilingualism




Christof Schöch
(Trier University, Germany)

KEASTWEST Conference 2024
Dongguk University, Seoul, South Korea

25 May 2024

Introduction

Multilingualism and me

Computational Comparative Literary Studies?

  • What could CCLS be?
    • Literary Studies
    • but: Computational (using digital data and methods)
    • and: Comparative (transnational, transmedial)
  • Many challenges for conversion
    • requires multiple areas of expertise
    • significant challenges of multilingualism

Three attempts at CCLS

  1. Corpus Building: The Diversity Paradox
  2. Data Modeling: Linked Open Data
  3. Text Analysis: Multilingual Stylometry

(1) Corpus Building:
The Diversity Paradox







The COST Action ‘Distant Reading for European Literary History’

The ‘European Literary Text Collection’ (ELTeC)

A closer look: corpus composition in ELTeC

          English ELTeC corpus

          Romanian ELTeC corpus

The Diversity Paradox

  • ELTeC design goals: enable meaningful cross-language investigations
    • Balance with respect to key text characteristics
      (text length, author gender, prestige)
    • Inclusivity with respect to language-based literary traditions
  • Consequence: the ‘diversity paradox’
    • If the criteria are too loose, balance is compromised
      (many, but invalid, corpora)
    • If the criteria are too strict, inclusivity is compromised
      (valid, but few, corpora)
    • In both cases, meaningful cross-language investigations are impossible
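The trade-off above can be made concrete with a small sketch. It assumes a much simplified notion of "balance": a corpus is admitted only if its composition stays within a tolerance of fixed targets. The targets, the tolerance values, the corpus names and all figures are hypothetical and invented for illustration; they are not ELTeC's actual criteria or data.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    language: str
    share_female_authors: float  # proportion of novels by female authors
    share_short_novels: float    # proportion of novels in the shortest length class


# Hypothetical corpus metadata (NOT actual ELTeC figures).
CORPORA = [
    Corpus("lang-a", 0.50, 0.33),
    Corpus("lang-b", 0.35, 0.30),
    Corpus("lang-c", 0.12, 0.55),
    Corpus("lang-d", 0.05, 0.70),
]

# Hypothetical balance targets (assumption, not the ELTeC specification).
TARGET_FEMALE = 0.50
TARGET_SHORT = 0.33


def admitted(corpora, tolerance):
    """Return the languages whose corpora deviate from both targets
    by at most `tolerance`."""
    return [
        c.language
        for c in corpora
        if abs(c.share_female_authors - TARGET_FEMALE) <= tolerance
        and abs(c.share_short_novels - TARGET_SHORT) <= tolerance
    ]


# Strict criteria admit few corpora; loose criteria admit many,
# but the admitted corpora are no longer comparable.
for tolerance in (0.05, 0.20, 0.50):
    print(tolerance, admitted(CORPORA, tolerance))
```

With these toy numbers, the strictest tolerance admits a single corpus and the loosest admits all four, including corpora whose composition differs widely from the targets.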

(2) Data Modeling:
Linked Open Data







The project ‘Mining and Modeling Text’

Linked Open Data: Simple Statements

Linked Open Data: Multilingualism

MiMoText Base: Query for themes in novels
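The three ideas above (simple statements, multilingual labels, querying for themes) can be illustrated with a minimal in-memory sketch. It assumes the general Linked Open Data pattern of subject-predicate-object statements over language-independent identifiers, with labels attached per language. All identifiers (`Q1`, `P_about`, ...), labels and helper functions are hypothetical; they are not taken from MiMoText Base, and a real query would typically use SPARQL against the project's endpoint.

```python
# Statements as (subject, predicate, object) triples.
# Identifiers are language-independent; all of them are hypothetical.
STATEMENTS = [
    ("Q1", "P_about", "Q10"),
    ("Q1", "P_about", "Q11"),
    ("Q2", "P_about", "Q10"),
]

# Labels per identifier and language (hypothetical examples).
LABELS = {
    "Q1": {"en": "Novel A", "fr": "Roman A"},
    "Q2": {"en": "Novel B", "fr": "Roman B"},
    "Q10": {"en": "love", "fr": "amour", "de": "Liebe"},
    "Q11": {"en": "travel", "fr": "voyage", "de": "Reise"},
    "P_about": {"en": "about", "fr": "a pour thème", "de": "handelt von"},
}


def label(identifier, lang, fallback="en"):
    """Return the label in `lang`, else in the fallback language,
    else the bare identifier."""
    names = LABELS.get(identifier, {})
    return names.get(lang) or names.get(fallback) or identifier


def novels_about(theme_id):
    """Return the subjects of all 'about' statements with the given theme."""
    return [s for s, p, o in STATEMENTS if p == "P_about" and o == theme_id]


def render(statement, lang):
    """Render one statement as a readable string in the given language."""
    return " ".join(label(part, lang) for part in statement)


# The query runs on identifiers; the language only matters for display.
for novel in novels_about("Q10"):
    print(label(novel, "fr"), "|", label(novel, "en"))
print(render(("Q1", "P_about", "Q10"), "fr"))
```

The design point is that the statements never change when a language is added: only the label table grows.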

(3) Text Analysis:
Multilingual Stylometry







High-profile cases of stylometric authorship attribution

William Shakespeare:
Craig and Kinney (2009)

Molière and Corneille:
Cafiero and Camps (2019)

Elena Ferrante:
Tuzzi and Cortelazzo (2018)

Galbraith / Rowling:
Juola (2015)

Multilingual stylometry?


                translation ↦
↧ original      fra        eng        hun        ukr
fra             fra-fra    fra-eng    fra-hun    fra-ukr
eng             eng-fra    eng-eng    eng-hun    eng-ukr
hun             hun-fra    hun-eng    hun-hun    hun-ukr
ukr             ukr-fra    ukr-eng    ukr-hun    ukr-ukr


  • Using corpora from the European Literary Text Collection (ELTeC)
  • Translated entirely into the other languages using DeepL
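A rough sketch of this experimental design follows. The first function enumerates the original-by-translation conditions for the four language codes. The second computes a simplified Burrows's Delta, a widely used stylometric distance, over toy token lists. The slides do not state which distance measure the study uses, so Delta is an assumption here, as are the function names, the parameter `n_mfw` and the toy texts.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

LANGS = ["fra", "eng", "hun", "ukr"]


def condition_grid(langs):
    """Map each original language to its row of 'original-target' codes.
    The diagonal (e.g. 'fra-fra') is the untranslated original."""
    return {orig: [f"{orig}-{target}" for target in langs] for orig in langs}


def burrows_delta(texts, n_mfw=3):
    """Simplified Burrows's Delta between all pairs of texts.

    texts: dict mapping a text name to its list of tokens.
    n_mfw: number of most frequent words used as features.
    """
    pooled = Counter(tok for toks in texts.values() for tok in toks)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]

    # Relative frequency of each feature word in each text.
    freqs = {}
    for name, toks in texts.items():
        counts = Counter(toks)
        freqs[name] = [counts[w] / len(toks) for w in vocab]

    # z-score each word across texts (guard against zero deviation).
    columns = list(zip(*freqs.values()))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]
    z = {
        name: [(f - m) / s for f, m, s in zip(fs, mus, sds)]
        for name, fs in freqs.items()
    }

    # Delta = mean absolute difference of z-scores.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for a, b in combinations(sorted(z), 2)
    }


# Toy texts, invented for illustration only.
TOY_TEXTS = {
    "text_1": "the cat and the dog and the bird".split(),
    "text_2": "the cat and the dog and the bird".split(),
    "text_3": "a cat or a dog or a bird or a fish".split(),
}

for orig, row in condition_grid(LANGS).items():
    print(orig, row)
print(burrows_delta(TOY_TEXTS))
```

In a setup like the one described, the same attribution procedure would be run once per cell of the grid, so that results for a corpus in its original language can be compared with results for its machine-translated versions.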

Some first results

More information: Dudar et al. (in progress).

Full interactive showcase

Conclusion

Take-home message

  • Good, multilingual corpora are rare and hard to build
  • Linked Open Data is a huge opportunity for multilingual data modeling
  • Text analysis is still primarily multilingual rather than cross-lingual
    (but multilingual LLMs are in the process of changing that)

Lessons learned

  • Multilingual research is multicultural research
  • ‘Computational Comparative Literary Studies’ requires multiple competencies
  • Nobody can learn everything: we need interdisciplinary collaboration
  • Let’s learn from each other: Computational and Comparative Literary Studies




Thank you for your kind attention!

References

Cafiero, Florian, and Jean-Baptiste Camps. 2019. “Why Molière Most Likely Did Write His Plays.” Science Advances 5 (11): eaax5489. https://doi.org/10.1126/sciadv.aax5489.
Craig, Hugh, and Arthur F. Kinney. 2009. Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press.
Dudar, Julia, Evgeniia Fileva, Artjoms Šeļa, and Christof Schöch. In progress. “Multilingual Stylometry: The Influence of Corpus Composition and Language on the Performance of Authorship Attribution Using Corpora from the European Literary Text Collection (ELTeC).” Venue to be confirmed.
Juola, Patrick. 2015. “The Rowling Case: A Proposed Standard Protocol for Authorship Attribution.” Digital Scholarship in the Humanities 30 (suppl. 1): 100–113. https://doi.org/10.1093/llc/fqv040.
Schöch, Christof, Maria Hinzmann, Julia Röttgermann, Katharina Dietz, and Anne Klee. 2022. “Smart Modelling for Literary History.” International Journal of Humanities and Arts Computing 16 (1): 78–93. https://doi.org/10.3366/ijhac.2022.0278.
Schöch, Christof, Roxana Patras, Tomaž Erjavec, and Diana Santos. 2021. “Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives.” Modern Languages Open, no. 1: 25. https://doi.org/10.3828/mlo.v0i0.364.
Tuzzi, Arjuna, and Michele A. Cortelazzo, eds. 2018. Drawing Elena Ferrante’s Profile: Workshop Proceedings. Padova: Padova UP.