Libération des données (“Data liberation”)

An Open Science Approach to the Curation and Analysis of the XVIIIe siècle: bibliographie



Christof Schöch
Trier Center for Digital Humanities, Germany

Digital Humanities Conference 2024
Washington, DC, USA – George Mason University

08 Aug 2024

Introduction

Overview

  • The XVIIIe siècle: bibliographie
  • An Open Science approach
  • Data conversion
  • Descriptive analysis
  • Patterns of collaboration
  • Limitations and next steps

The XVIIIe siècle: bibliographie

What is the XVIIIe siècle: bibliographie?

  • Published by Canadian scholar Benoît Melançon
  • Covers research on the eighteenth century
  • Has appeared in monthly instalments since 1992 (!)
  • Now includes 72,000 references
  • All data for 1992-2022 was published in bulk in early 2023

Compared to the MLA International Bibliography

                      BIB18       MLA
Items, 1992-2022     64,400   102,000
Items, 1900-2024     72,200   180,000
Items in French         74%      ~10%
Items in English        21%      ~70%

An Open Science approach

Why Open Science?

  • Open Science is science done right (Tennant 2019)
  • Making data freely available, but not in standard formats, goes only halfway
  • The goals don’t justify the means 🤷

How is this Open Science?

  • Open data: Data, code, and visuals in a GitHub repository
  • Open source: Only freely available software
  • Open process: Only algorithmic (scripted) approaches – no Gephi 😢
  • Open access: Published using Jupyter notebooks and Quarto

Data conversion

Data formats involved

  • Benoît Melançon
    • Data entry via a web form
    • Managed in a FileMaker database
    • Monthly exports to HTML for display on the website
    • 2023: Bulk export to CSV
  • My work
    • Conversion from CSV to BibTeX with Python, for import into Zotero
    • Manual and semi-automatic corrections
    • Exports from Zotero: BibTeX, XML-RDF, BibJSON
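The CSV-to-BibTeX step can be sketched in Python roughly as follows. This is a minimal illustration, not the conversion script actually used for the project: the column names (`type`, `author`, `title`, `year`, `publisher`, `journal`), the `;` separator between author names, the type mapping, and the citation-key scheme are all hypothetical assumptions about the CSV export.

```python
import csv
import io
import re
import unicodedata

# Hypothetical mapping from type labels in the CSV to BibTeX entry types.
TYPE_MAP = {"article": "article", "livre": "book", "chapitre": "incollection"}

# Hypothetical mapping from CSV columns to BibTeX fields (authors handled separately).
FIELD_MAP = {"title": "title", "year": "year", "publisher": "publisher", "journal": "journal"}


def make_key(authors: str, year: str, n: int) -> str:
    """Build a citation key from the first author's surname, the year and a running number."""
    surname = authors.split(";")[0].split(",")[0].strip() or "anon"
    # Decompose accented letters and drop the combining marks: "Melançon" -> "Melancon".
    ascii_name = unicodedata.normalize("NFKD", surname).encode("ascii", "ignore").decode("ascii")
    ascii_name = re.sub(r"[^A-Za-z0-9]", "", ascii_name).lower()
    return f"{ascii_name}{year}_{n}"


def row_to_bibtex(row: dict, n: int) -> str:
    """Convert one CSV row (a dict) into a BibTeX entry string."""
    def get(col: str) -> str:
        # DictReader yields None for missing trailing cells; treat that as empty.
        return (row.get(col) or "").strip()

    entry_type = TYPE_MAP.get(get("type").lower(), "misc")
    fields = {}
    authors = get("author")
    if authors:
        # Assumed: names separated by ";" in the CSV; BibTeX expects " and ".
        fields["author"] = " and ".join(a.strip() for a in authors.split(";") if a.strip())
    for col, field in FIELD_MAP.items():
        if get(col):  # skip empty cells
            fields[field] = get(col)
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items())
    return f"@{entry_type}{{{make_key(authors, get('year'), n)},\n{body}\n}}"


def csv_to_bibtex(csv_text: str) -> str:
    """Convert a whole CSV export (as text) into BibTeX entries separated by blank lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join(row_to_bibtex(row, i) for i, row in enumerate(reader, start=1))
```

A real conversion would also need to escape LaTeX special characters (such as `&` or `%`) and cover many more document types and fields; the sketch only shows the overall shape of a row-by-row mapping.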

Descriptive analysis

Number of entries per year

Source: Site de Benoît Melançon
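Counting entries per year is a simple frequency count. The sketch below assumes, hypothetically, that records are Python dicts with a `year` field whose value starts with a four-digit year (for example, as loaded from one of the Zotero exports). Years without any entry are kept with a count of 0, so that gaps stay visible in a plot.

```python
from collections import Counter


def entries_per_year(records, field="year"):
    """Count records per year, skipping missing or non-numeric year values."""
    counts = Counter()
    for rec in records:
        value = str(rec.get(field) or "").strip()
        # Keep only a leading four-digit year: "1996" and "1996-1997" both count as 1996.
        if len(value) >= 4 and value[:4].isdigit():
            counts[int(value[:4])] += 1
    if not counts:
        return []
    # Return (year, count) pairs in chronological order, filling gaps with 0.
    first, last = min(counts), max(counts)
    return [(year, counts[year]) for year in range(first, last + 1)]
```

For example, `entries_per_year([{"year": "1996"}, {"year": "1998"}])` returns `[(1996, 1), (1997, 0), (1998, 1)]`.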

Prevalence of publication types

Most frequently mentioned scholars

Most prevalent languages

Most prevalent publishers
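The rankings of publication types, languages, publishers, and scholars all reduce to counting values in one field. Here is a minimal sketch, assuming hypothetical dict records. The optional `sep` argument handles fields that hold several values in one string, such as an author list separated by `;`. That separator is an assumption about the data, not its documented format.

```python
from collections import Counter


def top_values(records, field, n=10, sep=None):
    """Return the n most frequent values of `field` as (value, count) pairs.

    If `sep` is given, the field is treated as multi-valued (e.g. several
    names in one string) and each value is counted separately.
    """
    counts = Counter()
    for rec in records:
        raw = str(rec.get(field) or "").strip()
        if not raw:
            continue  # skip records where the field is empty or missing
        values = raw.split(sep) if sep else [raw]
        counts.update(v.strip() for v in values if v.strip())
    return counts.most_common(n)
```

The same function then serves for single-valued fields (`top_values(records, "language")`) and multi-valued ones (`top_values(records, "author", sep=";")`).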

Words frequently mentioned in titles
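Title words can be counted after lowercasing, tokenizing, and removing stopwords. The sketch below uses a deliberately tiny, hypothetical French/English stopword list and a simple regular-expression tokenizer. A real analysis would need a fuller stopword list and possibly lemmatization.

```python
import re
from collections import Counter

# Hypothetical, illustrative stopword list (French and English function words).
STOPWORDS = {"de", "la", "le", "les", "des", "du", "et", "en", "à", "au", "aux",
             "un", "une", "l", "d", "the", "of", "and", "in", "a", "an", "on", "to"}


def title_word_counts(titles, n=10, stopwords=STOPWORDS):
    """Return the n most frequent title words as (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # \w+ matches runs of Unicode letters/digits, so "l'Encyclopédie"
        # yields "l" and "encyclopédie"; lowercasing merges case variants.
        tokens = re.findall(r"\w+", title.lower())
        # Drop stopwords and pure numbers (e.g. years in titles).
        counts.update(t for t in tokens if t not in stopwords and not t.isdigit())
    return counts.most_common(n)
```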

Patterns of collaboration

Co-authors: percentages and network

Co-editors: percentages and network
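Both collaboration slides can plausibly be derived from the same computation: the share of items listing more than one name, and a weighted edge list in which each pair of co-authors (or co-editors) is linked once per shared item. The field names and the `;` separator are hypothetical. The share is computed here among items with at least one name in the field, which is one reasonable definition and not necessarily the one behind the charts.

```python
from collections import Counter
from itertools import combinations


def collaboration_stats(records, field="author", sep=";"):
    """Return (share of multi-name records, Counter of name pairs).

    Works the same way for authors or editors; only the field name changes.
    """
    edges = Counter()
    with_field = multi = 0
    for rec in records:
        # Deduplicate and sort names so each pair has one canonical order.
        names = sorted({s.strip() for s in str(rec.get(field) or "").split(sep) if s.strip()})
        if not names:
            continue
        with_field += 1
        if len(names) > 1:
            multi += 1
            # Each unordered pair is one edge; repeated collaborations add weight.
            edges.update(combinations(names, 2))
    share = multi / with_field if with_field else 0.0
    return share, edges
```

The resulting edge list could be written to CSV or passed to a network library such as networkx for layout and drawing, which keeps the whole process scripted rather than done by hand in a GUI tool.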

Limitations

Some limitations

  • Data curation: self-selective; does this introduce biases?
  • Data model (Zotero):
    • Chapters from the same edited volume not modeled as a “part-of” relationship
    • Monograph vs. edited volume vs. scholarly edition not clearly distinguished
    • Gender of authors and editors not modeled explicitly
  • Zotero as a tool: unwieldy for a collection of 64,000 items
  • Data quality: duplicates, languages, missing abstracts

Many thanks! 😺




References


Melançon, Benoît. 1992-2024. “XVIIIe siècle: bibliographie.” Site de Benoît Melançon. https://benoitmelancon.quebec/xviii/biblio.tdm.html.
———. 2023. “Libération des données.” Site de Benoît Melançon. https://benoitmelancon.quebec/xviii/donnees_biblios_1_550.html.
Tennant, Jon. 2019. “Open Science: Just Science Done Right?” Figshare. https://figshare.com/articles/presentation/Open_Science_Just_science_done_right_/9759353.