Digital Approaches to Name Disambiguation of Chinese Historical Figures

lightning talk
Authorship
  1. Lik Hang Tsui

    City University of Hong Kong

Work text

When integrating biographical data extracted from more than 2,000 local gazetteers into the China Biographical Database (CBDB), we need to identify and link records that refer to the same person, that is, to “disambiguate” them. Traditional Chinese naming customs make this especially challenging for the gazetteer dataset, which contains approximately 120,000 records and 90,000 unique names of imperial government officials. In addition, useful variables are missing from many entries in these gazetteers. My presentation analyzes solutions for disambiguating identical personal names in Chinese script. First, we identified individuals who repeatedly held official posts in the same locality. Then, we cross-tabulated the overlap of content across multiple gazetteers. Finally, we corroborated the remaining data against external datasets, e.g., the CGED-Q of the Lee-Campbell research group. In this way we have disambiguated 51,000 personal names with optimal precision. Such a task is only feasible when done digitally. The techniques explored in this study will also be useful for disambiguation and Named Entity Recognition in other large-scale unstructured data in non-Latin scripts.
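
The three steps described above can be illustrated with a minimal sketch. This is not the project's actual pipeline: the `Record` fields, the function names, the matching rules (exact name plus locality, exact name plus post), and the toy records are all assumptions made for illustration, and the sample names are invented rather than drawn from CBDB, the gazetteers, or CGED-Q.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Record:
    """Hypothetical shape of one gazetteer entry (not the CBDB schema)."""
    record_id: str
    name: str       # personal name in Chinese script
    locality: str   # place where the post was held
    gazetteer: str  # identifier of the source gazetteer
    post: str       # office title


def group_by_name_and_locality(records):
    """Step 1 (assumed heuristic): records sharing a name and a locality
    are candidates for being the same person."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.name, r.locality)].append(r)
    # Keep only keys with more than one record, i.e. repeated appearances.
    return {key: rs for key, rs in groups.items() if len(rs) > 1}


def gazetteer_overlap(records):
    """Step 2: cross-tabulate which names each pair of gazetteers shares."""
    names_by_gazetteer = defaultdict(set)
    for r in records:
        names_by_gazetteer[r.gazetteer].add(r.name)
    overlap = {}
    for a, b in combinations(sorted(names_by_gazetteer), 2):
        shared = names_by_gazetteer[a] & names_by_gazetteer[b]
        if shared:
            overlap[(a, b)] = shared
    return overlap


def corroborate(records, external):
    """Step 3 (assumed rule): return ids of records whose (name, post)
    pair also appears in an external dataset."""
    return [r.record_id for r in records if (r.name, r.post) in external]


# Invented toy data, for illustration only.
records = [
    Record("r1", "王明", "蘇州", "G1", "知縣"),
    Record("r2", "王明", "蘇州", "G2", "知縣"),
    Record("r3", "王明", "成都", "G3", "知府"),
    Record("r4", "李華", "蘇州", "G1", "教諭"),
]
external = {("王明", "知府")}

candidates = group_by_name_and_locality(records)
overlap = gazetteer_overlap(records)
confirmed = corroborate(records, external)
```

In this toy run, `r1` and `r2` are grouped as one candidate person because they share a name and a locality, while `r3` carries the same name but is left for the later steps. A real workflow would likely need fuzzier matching (variant characters, courtesy names, date ranges) than the exact comparisons assumed here.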


Conference Info

In review

ADHO - 2020
"carrefours / intersections"

Hosted at Carleton University, Université d'Ottawa (University of Ottawa)

Ottawa, Ontario, Canada

July 20, 2020 - July 25, 2020

475 works by 1078 authors indexed

Conference cancelled due to coronavirus. Online conference held at https://hcommons.org/groups/dh2020/. Data for this conference were initially prepared and cleaned by May Ning.

Conference website: https://dh2020.adho.org/

References: https://dh2020.adho.org/abstracts/

Series: ADHO (15)

Organizers: ADHO