The DH-LAB at the National Library of Norway is developing an open-source optical character recognition (OCR) engine for North Saami, an under-resourced indigenous minority language recognized by the Norwegian state. The engine is trained with Tesseract by means of cross-lingual model transfer. Evaluated on a held-out portion of the ground truth, the model reaches a bag-of-words F1 measure of 0.98. It will be the first freely available OCR engine for North Saami.
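To make the evaluation metric concrete, the following is a minimal sketch of one common way to compute a bag-of-words F1 score between a ground-truth transcription and OCR output. The abstract does not specify the exact definition used, so the whitespace tokenization, the multiset (count-aware) overlap, and the function name `bag_of_words_f1` are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words_f1(reference: str, hypothesis: str) -> float:
    """Bag-of-words F1 between a ground-truth text and an OCR output.

    Assumption: tokens are whitespace-separated and word order is ignored;
    repeated words are counted (multiset overlap).
    """
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())

    # Multiset intersection: each word is matched at most as often as it
    # occurs in both texts.
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp_counts.values())  # share of OCR words that are correct
    recall = overlap / sum(ref_counts.values())     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical token sequences, not taken from the project's data.
    ground_truth = "w1 w2 w3 w4"
    ocr_output = "w1 w2 w3 wX"
    print(bag_of_words_f1(ground_truth, ocr_output))
```

Because word order is ignored, this measure rewards recovering the right words regardless of segmentation or line-order errors, which is why it is often reported alongside character-level error rates for OCR.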
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
475 works by 1078 authors indexed
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
Series: ADHO (15)