Short description of the project

At the Digital Humanities Centre, we created the first Hungarian handwritten text recognition (HTR) model. The project is based on the correspondence of József Kiss, a Hungarian poet and newspaper editor and an important figure in Hungarian literary history. The training data consists of his professional and personal letters, which we are continuously publishing online at: https://dhupla.hu/collection/kiss-jozsef-levelezes.

This poster presents a case study that reviewed and tested two basic strategies that Transkribus offers: the first is to train with a Base Model, and the second is to add the material as Ground Truth (GT). The two procedures work differently and therefore produce different results. Models trained with a Base Model delivered good results in fewer iterations, but after 150 epochs they overfit and the error rate increases. Base Models can significantly improve results on a relatively small corpus of mixed handwriting. However, if the Base Model is buried too deep, its efficiency decreases. The models trained with Ground Truth, by contrast, need more iterations. They produced worse results at the beginning, and only after 150 epochs did their error rates start to fall below 10%. For correspondence in mixed handwriting, incorporating new material as GT gives better results in the long run.
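To make the error-rate figures easier to interpret, here is a minimal Python sketch of how a character error rate could be computed for a single transcribed line. It assumes that the reported values are character error rates, that is, the edit distance between the recognised text and the Ground Truth divided by the length of the Ground Truth. The function names and the sample strings are hypothetical and are not part of Transkribus or of our workflow.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Count the insertions, deletions and substitutions needed to turn
    `hypothesis` into `reference` (classic dynamic-programming edit distance)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                       # delete from reference
                current[j - 1] + 1,                    # insert into reference
                previous[j - 1] + substitution_cost,   # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of Ground Truth characters."""
    if not reference:
        raise ValueError("reference (Ground Truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def best_epoch(error_by_epoch: dict[int, float]) -> int:
    """Return the epoch with the lowest error rate, e.g. to pick a
    checkpoint before the model starts to overfit."""
    return min(error_by_epoch, key=error_by_epoch.get)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the corpus.
    ground_truth = "Kedves Baratom"
    recognised = "Kedvcs Baratom"
    print(f"CER: {character_error_rate(ground_truth, recognised):.2%}")
```

Under this definition, an error rate below 10% means that fewer than one in ten Ground Truth characters needs correcting. A helper such as `best_epoch` shows one way to choose a stopping point when the error rate begins to rise again in later epochs.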

About the corpus

About the HTR model

https://readcoop.eu/model/hungarian/