Information Platform for Linked Data of Regional Historical Materials and its Agent Name Finding Process

paper, specified "short paper"
Authorship
  1. 1. Akihiro Kameda

    National Museum of Japanese History, Japan

  2. 2. Makoto Goto

    National Museum of Japanese History, Japan

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Overview

This presentation describes the construction of a system and the analysis and maintenance of data for the advanced use of the inventory of regional historical resources, especially using interactive annotation of agent names. We are driving a project for the inheritance and preservation of regional historical resources. In order to achieve the objectives of this project, we have developed a data infrastructure for advanced use of the inventories of regional historical resources. In particular, we aimed to create a system in which computers help people to discover information, rather than the conventional system in which people search and browse directly. Specifically, we resolved orthographical variants and integrated values, constructed identifiers and URIs, and described the provenance and components of resources. As a result, we were able to provide a Linked Data for regional historical resources, and we found the design of appropriate information infrastructure and its data generation process. In regional historical resources, there are many people and companies described. Some people can be associated with clans and positioned in family tree diagrams, other people are nameless and their detailed profiles are unknown. Those agent names and relationships among them in the archive of regional historical resources characterize the archive itself. If we only focus on some famous people already known in other documents, it is efficient to bring the dictionary of those names including alternative names and find those names in the archive. However, some names which are quite frequently used in the archive and not so much known in other documents are worth being analyzed and described. So, we extract the candidate names from the archive, list up the information of famous people from external resources such as encyclopedias of history, and extend the list as the archive specific directory based on frequency and co-occurrence analyses.

Platform: khirin-c

We have a series of systems named
khirin(Knowledgebase of Historical Resources in Institutes). The system described in this presentation is named
khirin-c, which collaborate with a IIIF based image presentation system
khirin-i.
Khirin-c stores four types of URIs: (1)dataset-wise, (2)type-wise, (3)accompanying information of type 1 , and (4)external URIs. We can import data from some format including the RDF 1.1 Turtle

https://www.w3.org/TR/turtle/
, which we most frequently use. Moreover,
khirin-c has web-based GUI interface and we can edit the data online. To set the relationships among the dataset and options of HTML rendering, we use GUI interface mainly. To edit the data, we do it offline so that we can maintain the data and track the changes and versions.

Agent Name Finding Process

In this presentation, we focus on describing how we organized information about people and organizations to support their reading and understanding. First, we extract candidate names using Entity Names Recognition software GiNZA

https://github.com/megagonlabs/ginza
. In the cleansing stage, for the data received in tabular form, we set tentative identifiers, clustered the values using OpenRefine

https://openrefine.org/
, and used Wikidata to expand and identify the information. To support the deciphering of the data, we provided some interfaces which can narrowed down the list by each person or organization related to each item and visualized the network based on the sender-recipient relation of letters.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website: https://dh2022.adho.org/

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO