Preparing multi-layered visualisations of Old Babylonian cuneiform tablets for a machine learning OCR training model towards automated sign recognition

Information Technology (it). Vol. itit-2023-0063. 2024, pp. 1-15

Year of publication: 2024

Publication type: Journal article

Language: English

DOI/URN: 10.1515/itit-2023-0063


Abstract


The CUNE-IIIF-ORM project aims to train an Artificial Intelligence Optical Character Recognition (AI-OCR) model that can automatically locate and identify cuneiform signs on photorealistic representations of Old Babylonian texts (c. 2000–1600 B.C.E.). To train the model, c. 200 documentary clay tablets have been selected. Specialist cuneiformists annotate them manually on a set of 12 still raster images generated from interactive Multi-Light Reflectance images. This image set includes visualisations with varying light angles and simplifications based on the depth information of the signs impressed in the surface. In the Cuneur Cuneiform Annotator, a GitLab-based web application, the identified cuneiform signs are annotated with polygons and enriched with metadata. This methodology builds a qualitative annotated training corpus of approximately 20,000 cropped signs (i.e. 240,000 visualisations), all with their Unicode codepoint and conventional sign name. It will act as a multi-layered core dataset for the further development and fine-tuning of a machine learning OCR training model for the Old Babylonian cuneiform script.
This paper discusses how the physical nature of handwritten, inscribed Old Babylonian documentary clay tablets poses challenges for the annotation and metadata tasks, and how these challenges have been addressed within the CUNE-IIIF-ORM project to achieve an effective corpus for training a machine learning OCR model.
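
The corpus structure described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper or from the Cuneur Cuneiform Annotator: the class `SignAnnotation`, its field names, the helper `visualisation_count`, and the sample tablet identifier are all hypothetical and serve only to show how one polygon annotation with a Unicode codepoint and sign name multiplies across the 12 visualisations per tablet.

```python
from dataclasses import dataclass
from typing import List, Tuple
import unicodedata

# From the abstract: each tablet is rendered as 12 still raster images.
VISUALISATIONS_PER_TABLET = 12


@dataclass(frozen=True)
class SignAnnotation:
    """One annotated sign (hypothetical schema, not the project's format)."""
    tablet_id: str                            # illustrative identifier
    polygon: Tuple[Tuple[float, float], ...]  # (x, y) vertices in pixels
    codepoint: int                            # Unicode codepoint of the sign
    sign_name: str                            # conventional sign name

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Axis-aligned box (min_x, min_y, max_x, max_y) around the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return (min(xs), min(ys), max(xs), max(ys))

    def unicode_name(self) -> str:
        """Official Unicode character name for the stored codepoint."""
        return unicodedata.name(chr(self.codepoint), "UNKNOWN")


def visualisation_count(annotations: List[SignAnnotation],
                        per_tablet: int = VISUALISATIONS_PER_TABLET) -> int:
    """Each annotated sign yields one crop per visualisation."""
    return len(annotations) * per_tablet


# Illustrative sample: the sign A (U+12000) outlined by a four-point polygon.
sample = SignAnnotation(
    tablet_id="example-tablet",
    polygon=((10.0, 20.0), (42.0, 18.0), (45.0, 60.0), (12.0, 63.0)),
    codepoint=0x12000,
    sign_name="A",
)
```

Under these assumptions, the abstract's figures follow directly: approximately 20,000 annotated signs, each cropped from 12 visualisations, give 20,000 × 12 = 240,000 image crops.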

  • handwritten text recognition
  • optical character recognition
  • data reuse
  • machine learning training data
  • cuneiform
  • Old Babylonian

Authors


Hameeuw, Hendrik (Author)
De Graef, Katrien (Author)
Ryberg Smidt, Gustav (Author)
Goddeeris, Anne (Author)
Kumar Thirukokaranam Chandrasekar, Krishna (Author)

Classification


DFG subject area:
Ancient Cultures

DDC subject group:
Computer science
