Text and Pictures in Japanese Historical Documents

poster / demo / art installation
Authorship
  1. Takaaki Okamoto

    Ritsumeikan University

Work text

Majoring in Japanese history, I have been conducting
research focused on handwriting in historical
documents. By searching a great number of historical
documents for those written in the same hand,
I collate documents written by the
same person and analyze them by asking not only
what is written, but also why this specific person wrote
this specific document, and why this particular text is
included in this specific textual body.
Since comparing and analyzing handwriting reveals
what text-only analysis cannot, historians and palaeographers
have long argued for the importance of this kind of
study. Despite this argument, however, the study of
handwriting has made insufficient progress. A major
reason is that researchers have
not been provided with a suitable environment to help
them carry out the endlessly repeated tasks of searching for
the same letters and characters across enormous amounts of
text in enormous numbers of documents, and then comparing
them. This paper proposes a computer system that
provides such an environment for historians, and discusses
its possible applications to pictorial materials. Please note that
what I am aiming for here is not automatic identification
of handwriting by computer. Rather, the computer is to
support the researcher’s identification by organizing, searching,
and displaying data.
In order to identify somebody’s handwriting, I first
have to find the letters and characters that appear both
in a ‘standard’ document written by that person and in the
document to be compared with it. Since looking for all
comparable letters and characters is a fairly exhaustive
task, I use computers to sort and organize information
about which letters are located where
in which document. On this basis, I have been
studying methods for searching and displaying characters
or strings of characters.
There are two kinds of information about where a character
is located in a document. One is logical information,
which expresses the location in
terms such as page, line, and column. Examples
of this kind are book indexes and a computer’s
full-text search. The other is to point out the location
visually, as if someone brought you a book,
opened the page, and pointed to exactly the
place you are looking for.
In the system I propose here, I separate the characters of the
text from one another and put them into a relational database
in which each character is treated as one record.
To each character I assign these two kinds of location
information. One is its location in the logical structure
of the text, that is, its page, line,
and column numbers; the other is its physical location,
expressed as coordinates on the digital image. This
assignment, carried out manually, makes it possible not
only to reconstruct the text by assembling the characters,
but also to specify the location of each character in the digital
image of the document. When character data and image
data are linked through coordinates, we can create
a character catalogue by cutting characters out of the document
images. We can also search for a character or a combination
of characters and highlight the matches in the digital image,
as if someone brought you the book, opened the page,
and pointed out the exact character you want, as
Fig. 1 shows.
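
The one-record-per-character structure described above can be
sketched as follows. This is only a minimal illustration, not the
system itself: it assumes SQLite through Python's standard sqlite3
module, and the table name glyph, the column names, the document
identifiers, and the sample coordinates are all hypothetical. It also
assumes that "column" means the position of a character within its
line and that each record holds exactly one character.

```python
import sqlite3


def build_db(records):
    """Create an in-memory table holding one row per character."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE glyph (
               doc TEXT, page INTEGER, line INTEGER, col INTEGER, -- logical location
               ch TEXT,                                           -- the character
               x INTEGER, y INTEGER, w INTEGER, h INTEGER         -- box on the image
           )"""
    )
    con.executemany("INSERT INTO glyph VALUES (?,?,?,?,?,?,?,?,?)", records)
    return con


def line_text(con, doc, page, line):
    """Reconstruct the text of one line by assembling its characters."""
    rows = con.execute(
        "SELECT ch FROM glyph WHERE doc=? AND page=? AND line=? ORDER BY col",
        (doc, page, line),
    ).fetchall()
    return "".join(r[0] for r in rows)


def find_string(con, query):
    """Find a string within lines; return its logical location and image boxes."""
    hits = []
    lines = con.execute(
        "SELECT DISTINCT doc, page, line FROM glyph ORDER BY doc, page, line"
    ).fetchall()
    for doc, page, line in lines:
        rows = con.execute(
            "SELECT col, ch, x, y, w, h FROM glyph "
            "WHERE doc=? AND page=? AND line=? ORDER BY col",
            (doc, page, line),
        ).fetchall()
        text = "".join(r[1] for r in rows)
        start = text.find(query)
        while start != -1:
            # One (x, y, w, h) box per matched character, ready for highlighting.
            boxes = [tuple(r[2:]) for r in rows[start:start + len(query)]]
            hits.append((doc, page, line, rows[start][0], boxes))
            start = text.find(query, start + 1)
    return hits


# Hypothetical sample data: (doc, page, line, col, ch, x, y, w, h)
sample = [
    ("docA", 1, 1, 1, "御", 100, 50, 40, 40),
    ("docA", 1, 1, 2, "教", 100, 95, 40, 42),
    ("docA", 1, 1, 3, "書", 100, 140, 40, 45),
    ("docB", 1, 2, 1, "書", 300, 60, 38, 44),
]
con = build_db(sample)
print(line_text(con, "docA", 1, 1))
for hit in find_string(con, "書"):
    print(hit)
```

Each hit carries both kinds of location at once: the page, line, and
column for citing the passage, and the boxes for cutting the
characters out of the image or highlighting them on it.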
I belong to the Japanese Culture Research Group, a part
of the Global COE (Center of Excellence) program Digital
Humanities Center for Japanese Arts and Cultures, Ritsumeikan
University, and the Group focuses more on
ukiyo-e and other visual materials than on textual ones
such as archival documents. For this reason, I am now
working to systematize information on ‘what image
is where in what material’ for the digitized images that the
university’s Art Research Center has been accumulating.
This system works like attaching tags with
notes to pictures, but with computerized tags
rather than paper ones. So, what is the point of using the
computer here?
First, among the many merits of this procedure,
we can create other content from every piece of information
about what is where. For instance, if we place
a mark on the publisher’s seal in an ukiyo-e print and
input its data, we can not only search for and display the seal
in the database, but also write a program that creates a
list of publishers’ seals by cutting out the image regions
that contain them, laying them out together with the linked
data, and generating the result as a PDF or in other formats.
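
The step of cutting out marked regions to build such a list can be
sketched as follows. This is a simplified illustration under stated
assumptions: the Tag fields, the label "publisher_seal", and the
material identifiers are hypothetical, and a nested list of numbers
stands in for the pixels of a digitized image so that the example
needs no imaging library. Layout and PDF generation are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A computerized tag: what is where in what material (hypothetical fields)."""
    material: str  # identifier of the digitized item
    label: str     # kind of element marked, e.g. "publisher_seal"
    note: str      # free-text note attached to the tag
    x: int         # left edge of the region on the image
    y: int         # top edge of the region on the image
    w: int         # region width
    h: int         # region height


def crop(pixels, tag):
    """Cut the tagged region out of an image given as a list of pixel rows."""
    return [row[tag.x:tag.x + tag.w] for row in pixels[tag.y:tag.y + tag.h]]


def build_catalogue(images, tags, label):
    """Collect (material, note, cropped region) for every tag with this label."""
    entries = []
    for tag in tags:
        if tag.label == label and tag.material in images:
            entries.append((tag.material, tag.note, crop(images[tag.material], tag)))
    return entries


# Stand-in 4x4 "image": the pixel at row r, column c holds r * 10 + c.
images = {"print-001": [[r * 10 + c for c in range(4)] for r in range(4)]}
tags = [
    Tag("print-001", "publisher_seal", "seal, lower left", x=0, y=2, w=2, h=2),
    Tag("print-001", "signature", "artist signature", x=3, y=0, w=1, h=2),
]
for material, note, region in build_catalogue(images, tags, "publisher_seal"):
    print(material, note, region)
```

Because the catalogue is derived from the tags, adding or correcting a
tag changes the generated list without any manual re-cropping.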
Secondly, using computerized tags means that we
can express data on what is where in what material as
URLs. Imagine a situation in which one researcher
wants to point another researcher to a part of a
picture in the collection of the Art Research Center archives.
He might attach the whole image, or only a part of
it, to an e-mail with a detailed explanation. Instead of going
to such trouble, when the data are available on the web he would only
need to create the data on ‘what is where in what material’
and send the URL. The receiver would then open
the URL in his browser, inspect the picture with the
tag attached to it, and read the notes.
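
One way such a URL could carry ‘what is where in what material’ is
sketched below. The sketch makes several assumptions that do not come
from the system itself: the base address https://example.org/viewer
is a placeholder, and the parameter names material, xywh, and note
are hypothetical. Only Python's standard urllib.parse module is used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://example.org/viewer"  # placeholder address, not a real service


def region_url(material, x, y, w, h, note=""):
    """Encode a material, a rectangular region on its image, and a note as a URL."""
    query = urlencode(
        {"material": material, "xywh": f"{x},{y},{w},{h}", "note": note}
    )
    return f"{BASE}?{query}"


def parse_region_url(url):
    """Recover the material, region, and note that a viewer would need."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    x, y, w, h = (int(v) for v in params["xywh"][0].split(","))
    return {
        "material": params["material"][0],
        "region": (x, y, w, h),
        "note": params["note"][0],
    }


url = region_url("print-001", 120, 80, 60, 60, "publisher's seal")
print(url)
print(parse_region_url(url))
```

The sender only has to create the tag and pass on the resulting
address; the receiving side decodes the same parameters to display the
picture with the region marked and the note beside it.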
Until now, computers have mainly been used in the humanities
as a means of creating databases. Starting with
catalogues, we can now examine both the full texts and the images
of textual materials. While catalogues and full
texts, and catalogues and images, are respectively
linked, we have seen little progress in linking
the full texts (or, in the case of pictorial materials,
data on various elements of the picture) with the images.
Since this system makes such linking possible and individuals can use
it on their own, researchers can organize the
full texts and images they have collected as research materials
by linking them. Besides such personal use, the system
can be developed into one for multiple users. I
believe that by systematizing data on ‘what is where in
what material,’ we can suggest further possibilities for
applying computers in the humanities, and that this will make a
significant contribution not only to the study of handwriting,
history, and palaeography but also to the humanities in
general.


Conference Info

Complete

ADHO - 2009

Hosted at University of Maryland, College Park

College Park, Maryland, United States

June 20, 2009 - June 25, 2009

176 works by 303 authors indexed

Series: ADHO (4)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None