Keyness in Computational Literary Studies: History, Definitions and Evaluation

Christof Schöch
With Keli Du, Julia Dudar, Julia Röttgermann, Julian Schröter
(Trier University, Germany)

Untangling Associations: Advances in keyword and collocation analysis
Université Paul Valéry, Montpellier, 22 Sept. 2023.


  • 1 – History: Keyness in CLS
  • 2 – Definitions: Keyness / Distinctiveness
  • 3 – Evaluation: Classification Task
  • 4 – Conclusion: Findings and Outlook

History: Keyness in CLS

Burrows’ Zeta in Authorship Attribution

  • The key publication is: John Burrows, “All the way through” (Burrows 2007)
  • He proposed to use Zeta in the context of authorship attribution
  • Zeta is calculated as the difference of the document frequencies of a feature in two contrasting sets of documents, where the documents are segments of full texts.
  • Zeta: “A simple measure of [an author’s] consistency in the use of each word-type.” (=> dispersion)
  • Focuses “on a single author and seek[s] to identify which of many texts are most likely to be his or hers.” (=> authorship attribution)
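
As a rough illustration of the calculation described above, the following Python sketch computes Zeta as the difference between the proportion of segments in a target group and in a comparison group that contain a word at least once. The function names, the segment length of 4 tokens and the toy sentences are assumptions made for this example only; they are not taken from Burrows (2007) or from any particular implementation.

```python
def segment(tokens, size):
    """Split a token list into consecutive segments of `size` tokens.

    A final remainder shorter than `size` is dropped (an assumption of this sketch).
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def document_proportion(word, segments):
    """Share of segments in which `word` occurs at least once."""
    if not segments:
        return 0.0
    return sum(1 for seg in segments if word in seg) / len(segments)


def zeta(word, target_segments, comparison_segments):
    """Difference of the two document proportions; ranges from -1 to 1."""
    return (document_proportion(word, target_segments)
            - document_proportion(word, comparison_segments))


# Toy data (hypothetical), segmented into 4-token units
target = segment(
    "the sea was calm and the sea was wide and the ship sailed on".split(), 4)
comparison = segment(
    "the city was loud and the street was full of people and noise".split(), 4)

zeta_sea = zeta("sea", target, comparison)  # content word, target group only
zeta_the = zeta("the", target, comparison)  # function word, both groups
```

In this toy case, "sea" receives a higher score than "the", because it occurs in most target segments and in no comparison segment, whereas "the" is spread across both groups.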

Zeta for Authorship Attribution: Shakespeare (Craig and Kinney 2009)

Further uses and discussion of Keyness

  • Uses and discussion of Zeta:
  • Uses of other keyness measures:
    • using AntConc (log-likelihood)
    • or TXM (spécificité)

Keyness for gender (Weidman and O’Sullivan 2018)

Scatterplot of segments by male and female authors, by percentage of markers and anti-markers, for three literary periods.

Keyness for Genre (Schöch 2018)

PCA plot using 50 Zeta-based keywords. Comedies in red, tragedies in blue, tragi-comedies in green.

Zeta and Company (Schöch et al. 2018)

  • Project funded by the German Research Foundation (DFG, 2020-2023)
  • Domain of application: popular subgenres of the 20th-century French novel
  • Inspirations: John Burrows (Burrows 2007), Jeffrey Lijffijt (Lijffijt et al. 2014), MOTIFS (e.g. Kraif and Tutin 2017), Phraséorom (e.g. Diwersy et al. 2021), dispersion (Gries 2008)
  • Fundamental aim: Enable scholars in CLS to make educated choices about what keyness measure to use
  • Also: Bridge the gap between CCL and CLS
  • Activities: modeling, implementing, evaluating and using statistical measures for comparing two groups of texts.

Definitions: Keyness / Distinctiveness

Traditional definition of keyness

  • Purely quantitative sense: A keyword is “a word which occurs with unusual frequency […] [in a document or corpus] by comparison with a reference corpus”. (Scott 1997)
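
One common way to quantify "unusual frequency" against a reference corpus is the log-likelihood ratio. The sketch below uses the two-term form that is widespread in corpus linguistics; it is offered as an illustration under that assumption, not as the procedure Scott (1997) prescribes. The function name and the example counts are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood ratio (G2) for one word in two corpora.

    freq_*: occurrences of the word; size_*: total tokens in each corpus.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally frequent in both corpora
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: same relative frequency vs. a threefold difference
same_rate = log_likelihood(10, 1000, 10, 1000)
higher_rate = log_likelihood(30, 1000, 10, 1000)
```

The score grows with the strength of the frequency contrast but carries no sign, so the direction (overuse or underuse in the target corpus) has to be read from the relative frequencies.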

What is Distinctiveness? (Schröter et al. 2021)

  • (A) Logical vs. statistical sense
    • Purely logical: A feature is distinctive of corpus A if its presence in a document D is a necessary and sufficient condition for D to belong to A and not to B.
    • Statistical: A feature is distinctive of corpus A if it is true that, the higher its keyness in document D, the higher the probability that D is an instance of A and not of B.
  • (B) Salient vs. agnostic
    • Salient: A feature is distinctive iff it is noticed by readers (because it confirms or violates their expectations)
    • Agnostic: A feature can be distinctive without being salient in the above-mentioned sense.
  • (C) Qualitative vs. no qualitative content
    • Qualitative content: A feature is distinctive iff it expresses e.g. aboutness or stylistic character (=> interpretability)
    • No qualitative content: A feature can be key regardless of qualitative content (=> discriminatory power)

Measures in Zeta and Company (Du et al. 2022)

Evaluation: Classification Task

Evaluation Task: Genre Classification (Du, Dudar, and Schöch 2022)

  • Downstream classification task: “How reliably can a machine learning classifier, based on words identified using a given measure of distinctiveness, identify the subgenre of a novel when provided only with a short segment of that novel?”
  • Basic setup
    • 4 classifiers
    • Different numbers of keywords (N)
    • Textual units are 5000-word segments
    • 10-fold cross-validation (90/10 split of segments)
    • Baseline: random selection of N words
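
The data-handling side of this setup can be sketched in plain Python: a 10-fold split over segment indices, a random baseline that draws N words from the vocabulary, and a relative-frequency feature vector per segment. All function names, the fixed seed and the toy inputs are assumptions made for this illustration; the classifiers themselves, and any stratification by subgenre or by novel, are left out.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train, test) index lists; each item is in a test fold exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def random_baseline(vocabulary, n, seed=0):
    """Baseline feature set: N words drawn at random from the vocabulary."""
    return random.Random(seed).sample(sorted(vocabulary), n)


def feature_vector(segment_tokens, keywords):
    """Relative frequency of each keyword in one segment."""
    length = len(segment_tokens) or 1  # guard against empty segments
    return [segment_tokens.count(word) / length for word in keywords]


# 100 hypothetical segments give ten 90/10 splits
splits = list(k_fold_splits(100, k=10))
baseline_words = random_baseline({"mer", "ville", "nuit", "police", "amour"}, 3)
vector = feature_vector("a b a c".split(), ["a", "d"])
```

In an evaluation run, the keyword list produced by each measure of distinctiveness would take the place of `baseline_words`, and the resulting vectors would be passed to a classifier for each train/test split.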

Results #1 (Du, Dudar, and Schöch 2022)

Classification performance on the French corpus (1980s) with four classifiers, depending on the measure of distinctiveness and the setting of N.

Results #2 (Du, Dudar, and Schöch 2022)

Distribution of classification performance on the 1980s French corpus with N = 10 using Multinomial Naive Bayes

Conclusion: Findings and Outlook

What have we found out so far?

  • Definition
    • Keyness or distinctiveness, as a concept, can be defined in different ways
    • A match between a certain understanding of distinctiveness and a specific statistical operationalization can be established using a suitable method of evaluation.
  • Evaluation
    • Dispersion-based keyness measures show the best performance in a subgenre classification task, especially when the number of features is small
    • Such measures also tend to select medium-frequency words that are highly interpretable (=> salient, qualitative)

What are the next steps?

  • Perform further experiments, using synthetic texts and test tokens with pre-determined frequency- and/or dispersion-based contrasts
  • Perform an application study that aims to match keywords to generic traits derived from research on popular subgenres (qualitative reference for qualitative understanding of distinctiveness)
  • Add measures: dispersion + LLR (Egbert and Biber 2019), measure based on DPnofreq (Gries 2021), LRC (Evert 2022)
  • Move on to more complex features: multi-word expressions and semantic features (next project phase, 2024-2026)
  • Find a strategy for handling a multi-dimensional approach to keyness (multiple meanings, multiple measures), e.g. along the lines proposed by Gries (2019)

