Preparing multi-layered visualisations of Old Babylonian cuneiform tablets for a machine learning OCR training model towards automated sign recognition
Information Technology (it). Vol. itit-2023-0063. 2024, pp. 1-15
Year of publication: 2024
Publication type: Journal article
Language: English
DOI/URN: 10.1515/itit-2023-0063
Verified | Library
Abstract
The CUNE-IIIF-ORM project aims to train an Artificial Intelligence Optical Character Recognition (AI-OCR) model that can automatically locate and identify cuneiform signs on photorealistic representations of Old Babylonian texts (c. 2000–1600 B.C.E.). To train the model, c. 200 documentary clay tablets have been selected. They are manually annotated by specialist cuneiformists on a set of 12 still raster images generated from interactive Multi-Light Reflectance images. This image set includes visualisations with varying light angles and simplifications based on the depth information of the signs impressed in the surface. In the Cuneur Cuneiform Annotator, a GitLab-based web application, the identified cuneiform signs are annotated with polygons and enriched with metadata. This methodology builds a high-quality annotated training corpus of approximately 20,000 cropped signs (i.e. 240,000 visualisations), each with its Unicode code point and conventional sign name. It will act as a multi-layered core dataset for the further development and fine-tuning of a machine learning OCR model for the Old Babylonian cuneiform script.
This paper discusses how the physical nature of handwritten, inscribed Old Babylonian documentary clay tablets complicates the annotation and metadata tasks, and how these challenges have been addressed within the CUNE-IIIF-ORM project to build an effective training corpus for a machine learning OCR model.
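The corpus figures in the abstract (about 20,000 annotated signs, 12 visualisations each, 240,000 crops) can be illustrated with a minimal Python sketch. It is not the project's actual data model: the `SignAnnotation` class, its field names, and the `crop_count` helper are hypothetical, and the sketch assumes that one polygon per sign is reused across all 12 registered images of a tablet.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass

# From the abstract: each tablet is rendered as 12 still raster images.
VIEWS_PER_TABLET = 12


@dataclass
class SignAnnotation:
    """Hypothetical record for one annotated cuneiform sign."""

    polygon: list[tuple[float, float]]  # outline in image pixel coordinates
    codepoint: int                      # Unicode code point of the sign
    sign_name: str                      # conventional sign name

    def bounding_box(self) -> tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def crop_count(n_signs: int, views: int = VIEWS_PER_TABLET) -> int:
    """Number of cropped images when every sign is cut from every view."""
    return n_signs * views


# Illustrative values only; the coordinates are made up.
example = SignAnnotation(
    polygon=[(10.0, 12.0), (42.0, 12.0), (42.0, 40.0), (10.0, 40.0)],
    codepoint=0x12000,
    sign_name="A",
)

print(example.bounding_box())
print(unicodedata.name(chr(example.codepoint)))
print(crop_count(20_000))
```

Under these assumptions, `crop_count(20_000)` reproduces the 240,000 visualisations quoted in the abstract, and the bounding box of each polygon is what a cropping step would cut from each of the 12 images.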
Authors
Classification
DFG subject area:
Ancient Cultures
DDC subject group:
Computer science