Towards Computational Comparative Literary Studies

Addressing the Challenges of Multilingualism




Christof Schöch
(Trier University, Germany)

KEASTWEST Conference 2024
Dongguk University, Seoul, South Korea

25 May 2024

Introduction

Multilingualism and me

Computational Comparative Literary Studies?

  • Literary Studies,
  • but: Computational
  • and: Comparative

Overview

  • Challenges of Multilingualism
    • Corpus Building: The Diversity Paradox
    • Data Modeling: Linked Open Data
    • Text Analysis: Multilingual Stylometry
  • Conclusion

(1) Corpus Building:
The Diversity Paradox







The COST Action ‘Distant Reading for European Literary History’

The ‘European Literary Text Collection’ (ELTeC)

A closer look: corpus composition in ELTeC

ELTeC English

ELTeC Romanian

The Diversity Paradox

  • ELTeC design goals: enable meaningful cross-language investigations
    • Balance with respect to key text characteristics (text length, author gender, prestige)
    • Inclusivity with respect to language-based literary traditions
  • Consequence: the ‘diversity paradox’
    • If the criteria are too loose, balance is compromised (many, but invalid, corpora)
    • If the criteria are too strict, inclusivity is compromised (valid, but few, corpora)
    • In both cases, meaningful cross-language investigations are impossible
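The trade-off can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the collection names, the numbers, the two criteria and their thresholds are invented and do not reflect actual ELTeC data or the actual ELTeC selection rules.

```python
# Hypothetical per-language collections. All figures are invented for
# illustration and are NOT taken from ELTeC.
collections = {
    "lang_a": {"novels": 100, "female_share": 0.50},
    "lang_b": {"novels": 100, "female_share": 0.30},
    "lang_c": {"novels": 100, "female_share": 0.10},
    "lang_d": {"novels": 60, "female_share": 0.05},
}


def admitted(collections, min_novels, min_female_share):
    """Return the names of collections that satisfy both (hypothetical) criteria."""
    return sorted(
        name
        for name, c in collections.items()
        if c["novels"] >= min_novels and c["female_share"] >= min_female_share
    )


# Strict criteria: balanced collections only, but few languages remain.
strict = admitted(collections, min_novels=100, min_female_share=0.50)

# Loose criteria: every language is included, but the collections are no
# longer comparable with respect to size or author gender.
loose = admitted(collections, min_novels=50, min_female_share=0.0)

print("strict:", strict)
print("loose:", loose)
```

With the strict thresholds only one of the four toy collections remains; with the loose thresholds all four are admitted, but they differ strongly in composition.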

(2) Data Modeling:
Linked Open Data







The project ‘Mining and Modeling Text’

Linked Open Data: Simple Statements (S-P-O)
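As a minimal sketch of what subject-predicate-object statements look like in practice, the following represents a handful of statements as plain tuples and runs a simple pattern match over them. The identifiers (`ex:novel_1`, `ex:about`, `ex:theme_travel` and so on) are hypothetical placeholders, not identifiers from MiMoText or any other knowledge base, and a real Linked Open Data setup would use an RDF store and SPARQL rather than a Python list.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One simple statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; all identifiers are invented for illustration.
statements = [
    Statement("ex:novel_1", "ex:hasAuthor", "ex:author_1"),
    Statement("ex:novel_1", "ex:about", "ex:theme_travel"),
    Statement("ex:novel_2", "ex:hasAuthor", "ex:author_2"),
    Statement("ex:novel_2", "ex:about", "ex:theme_travel"),
    Statement("ex:novel_2", "ex:about", "ex:theme_love"),
]


def subjects_with(statements, predicate, obj):
    """Return all subjects that have the given predicate-object pair."""
    return sorted(
        {s.subject for s in statements if s.predicate == predicate and s.obj == obj}
    )


# "Which novels are about travel?" expressed as a pattern over statements.
travel_novels = subjects_with(statements, "ex:about", "ex:theme_travel")
print(travel_novels)
```

Because every statement has the same three-part shape regardless of the language of the underlying text, the same pattern can in principle be applied to statements about novels from different literary traditions.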

MiMoText Base: Query for themes in novels

(3) Text Analysis:
Multilingual Stylometry







High-profile stylometry cases

William Shakespeare:
Craig and Kinney (2009)

Molière and Corneille:
Cafiero and Camps (2019)

Elena Ferrante:
Tuzzi and Cortelazzo (2018)

Galbraith / Rowling:
Juola (2015)

Multilingual stylometry?

Some early results

More information: Dudar et al. (in progress).
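For readers unfamiliar with stylometric authorship attribution, the sketch below shows one widely used distance measure, Burrows's Delta: texts are compared via the z-scored relative frequencies of the most frequent words. This is an illustration of the general technique only. The slides do not state which measure was used in the experiments, and the toy texts, their labels and the parameter `n_features` are assumptions. Because word frequencies are language-specific, such a comparison is run within one language at a time.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(text):
    """Relative frequency of each whitespace-separated, lowercased token."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def z_score_profiles(texts, n_features=5):
    """Z-score the relative frequencies of the most frequent words per text."""
    freqs = {name: relative_frequencies(t) for name, t in texts.items()}
    totals = Counter()
    for profile in freqs.values():
        totals.update(profile)  # sum relative frequencies across texts
    features = [word for word, _ in totals.most_common(n_features)]
    profiles = {name: [] for name in texts}
    for word in features:
        values = [freqs[name].get(word, 0.0) for name in texts]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # a word used identically everywhere carries no signal
        for name, value in zip(texts, values):
            profiles[name].append((value - mu) / sigma)
    return profiles


def delta(a, b):
    """Burrows's Delta: mean absolute difference of two z-score profiles."""
    return mean(abs(x - y) for x, y in zip(a, b))


# Toy, single-language texts invented for illustration.
texts = {
    "a1": "the sea and the ship and the wind",
    "a2": "the wind and the sea and the storm",
    "b1": "of love and of loss of time of night",
}
profiles = z_score_profiles(texts)
d_same = delta(profiles["a1"], profiles["a2"])
d_diff = delta(profiles["a1"], profiles["b1"])
print(d_same < d_diff)
```

In this toy setup the two texts labeled `a1` and `a2` end up closer to each other than `a1` is to `b1`. Real experiments use full novels and hundreds or thousands of features.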

Conclusion

Summary of findings

  • Good, multilingual corpora are rare (and hard to build)
  • Linked Open Data is a huge opportunity for multilingual data modeling
  • Text analysis is still primarily multilingual rather than cross-lingual

Lessons learned

  • Multilingual research is multicultural research
  • Computational Comparative Literary Studies requires multiple competencies
  • Nobody can learn everything: we need interdisciplinary collaboration
  • Let’s learn from each other: CLS from CL, and CL from CLS




Thank you for your kind attention!

References

Cafiero, Florian, and Jean-Baptiste Camps. 2019. “Why Molière Most Likely Did Write His Plays.” Science Advances 5 (11): eaax5489. https://doi.org/10.1126/sciadv.aax5489.
Craig, Hugh, and Arthur F. Kinney. 2009. Shakespeare, Computers, and the Mystery of Authorship. Cambridge University Press.
Dudar, Julia, Evgeniia Fileva, Artjoms Šeļa, and Christof Schöch. In progress. “Multilingual Stylometry: The Influence of Corpus Composition and Language on the Performance of Authorship Attribution Using Corpora from the European Literary Text Collection (ELTeC).” Venue to be confirmed.
Juola, Patrick. 2015. “The Rowling Case: A Proposed Standard Protocol for Authorship Attribution.” Digital Scholarship in the Humanities 30 (suppl. 1): 100–113. https://doi.org/10.1093/llc/fqv040.
Schöch, Christof, Maria Hinzmann, Julia Röttgermann, Katharina Dietz, and Anne Klee. 2022. “Smart Modelling for Literary History.” International Journal of Humanities and Arts Computing 16 (1): 78–93. https://doi.org/10.3366/ijhac.2022.0278.
Schöch, Christof, Roxana Patras, Tomaž Erjavec, and Diana Santos. 2021. “Creating the European Literary Text Collection (ELTeC): Challenges and Perspectives.” Modern Languages Open, no. 1: 25. https://doi.org/10.3828/mlo.v0i0.364.
Tuzzi, Arjuna, and Michele A. Cortelazzo, eds. 2018. Drawing Elena Ferrante’s Profile: Workshop Proceedings. Padova: Padova UP.