How to Transcribe a Million Manuscripts with eScriptorium
Lecture by Peter Stokes
Given online by the Schoenberg Institute for Manuscript Studies on December 1, 2023
Abstract: Recent advances in machine learning, combined with the availability of millions of images of manuscript pages, mean that we are now able to produce automatic transcriptions of medieval and other manuscripts, with over 99% accuracy in the right circumstances. This is extremely promising and opens up many new possibilities, but – as with any new approach – it naturally raises challenges and questions as well. Perhaps the first question is how we can best make use of this opportunity: in other words, how to read a million manuscripts. At the same time, machine learning and other “big data” approaches also raise questions about representation, since by definition they only work for scripts and languages that are already available in large quantities, whereas rare or historical languages that have fewer resources are ignored all the more.
This talk will address these questions in the context of kraken and eScriptorium, a pair of tools for automatic transcription of handwritten and printed documents, especially those in rare and historical scripts. Development of both tools is led by the Digital Humanities team in the lab “Archéologie et Philologie d’Orient et d’Occident” at the École Pratique des Hautes Études – Université PSL, in Paris.
Peter Stokes is a research professor in digital and computational humanities applied to historical writing at Université PSL in Paris.
Top Image: Frauenfeld, Kantonsbibliothek Thurgau, Y 24, f. 17 – Burgundian Breviary (https://www.e-codices.ch/en/list/one/kbt/y024)