Research Center for Humanities and Social Sciences - Academia Sinica, Department of Computer Science and Information Engineering - National Central University
Department of Computer Science and Information Engineering - National Central University
Research Center for Humanities and Social Sciences - Academia Sinica
Institute of History and Philology - Academia Sinica
One important task of historical research in digital humanities (DH) is identifying person names in historical texts. This task can be divided into two subtasks: person named entity recognition (PNER) and person named entity disambiguation (PNED). PNED links each person named entity (PNE) mention to a specific person profile in a reference knowledge base. The main challenge for machine-learning-based PNED is the lack of annotated data. We design an automatic approach to labeling the training data. We choose the Ming Shilu as our target historical text and use the Ming-Qing Archives Name Authority Database as our reference knowledge base, which contains 14,070 government officials who lived during the Ming dynasty. Our BERT-based model reaches an accuracy of 90.1%, which indicates that our approach can generate high-quality labeled data for the PNED task on Chinese historical texts. In the general setting, which includes trivial instances, the accuracy is even higher (~98%).
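To make the PNED task concrete, the following is a minimal sketch and not the authors' system. It assumes a toy knowledge base with hypothetical profiles (`Profile`, `KB`, and the placeholder names are all invented for illustration), looks up candidate profiles by exact name match, and assumes that a "trivial" instance is a mention with exactly one candidate; the abstract does not define the term. The `accuracy` helper shows how a figure such as the reported percentages could be computed from predicted and gold profile IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A hypothetical person profile in a reference knowledge base."""
    person_id: str
    name: str


# Toy knowledge base with placeholder names (not real database entries).
KB = [
    Profile("P001", "NAME_A"),
    Profile("P002", "NAME_A"),  # same name, different person: ambiguous
    Profile("P003", "NAME_B"),
]


def candidates(mention, kb):
    """Return every profile whose name exactly matches the mention."""
    return [p for p in kb if p.name == mention]


def is_trivial(mention, kb):
    """Assumption: a mention is trivial when it has exactly one candidate."""
    return len(candidates(mention, kb)) == 1


def accuracy(predicted_ids, gold_ids):
    """Fraction of mentions linked to the correct profile ID."""
    if not gold_ids or len(predicted_ids) != len(gold_ids):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, g in zip(predicted_ids, gold_ids) if p == g)
    return correct / len(gold_ids)
```

In this sketch, `NAME_A` needs disambiguation from context because two profiles share the name, whereas `NAME_B` can be linked directly. A real system would replace the exact-match lookup and would score candidates with a trained model.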
In review
Hosted at Carleton University, Université d'Ottawa (University of Ottawa)
Ottawa, Ontario, Canada
July 20, 2020 - July 25, 2020
475 works by 1078 authors indexed
Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.
Conference website: https://dh2020.adho.org/
References: https://dh2020.adho.org/abstracts/
Series: ADHO (15)
Organizers: ADHO