Towards a Structured Description of the Taisho Tripitaka: How to show contributors

paper, specified "short paper"
  1. 1. Yoichiro Watanabe

    University of Tokyo

  2. 2. Kiyonori Nagasaki

    International Institute for Digital Humanities

  3. 3. Hyunjin Park

    Graduate school of University of Tokyo

  4. 4. Yifan Wang

    Graduate school of University of Tokyo, International Institute for Digital Humanities

  5. 5. Tomohiro Murase

    Council for the Promotion of Tripitaka Research

  6. 6. Masayoshi Watanabe

    Jodo Shu Research Institute

  7. 7. Norimichi Yajima

    Waseda University

  8. 8. Yoshihiro Sato

    Graduate school of University of Tokyo

  9. 9. Yui Sakuma

    Graduate school of University of Tokyo

  10. 10. Xinxing Yu

    Graduate school of University of Tokyo

  11. 11. Shumpei Katakura

    Graduate school of University of Tokyo

  12. 12. Masahiro Shimoda

    University of Tokyo

  13. 13. Ikki Ohmukai

    University of Tokyo

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.

We, the SAT Daizōkyō Text Database Committee, maintain a database of the e-texts of the Buddhist scriptures contained in the Taisho Tripitaka, which contains 2,920 Buddhist scriptures and over 100 million Chinese characters, and have made them available to the public in searchable form [1]. However, these text data are based on tagged e-texts designed in 1990’s, and are not yet structured enough. For this reason, we are in the process of converting e-texts of the Taisho Tripitaka into the TEI-compliant format.
While undertaking such a large-scale renovation, it is imperative that the information in the Taisho-Tripitaka be reproduced in electronic form to the greatest extent possible without loss of information. This is because for more than one 100 years, the Taisho-Tripitaka has been the foundation for Buddhist studies to this day, and distributing e-texts that differs from the published Taisho-Tripitaka would cause unnecessary confusion in research. Today, it is rare for Buddhist scholars to conduct research without the use of e- texts, and the platform for research seems to be shifting from the Taisho-Tripitaka as published texts to the Taisho-Tripitaka as e-texts. However, the status of the Taisho-Tripitaka as a standard text for reference in Buddhist studies has not changed. It would be beneficial for researchers to have e-texts that are as accurate as possible so that the basis for discussion can be shared even when e-texts did not exist.
However, in some cases interpretation is unavoidable in markup. How to distinguish between the preservation of the edition itself and the contribution of the person in charge of the work is one of the most important issues for our project, and I will discuss this point in this presentation.
As a precedent work, CBETA (Chinese Buddhist Electronic Text Association) which publishes e-texts of Buddhist scriptures including Taisho-Tripitaka, has added its own punctuation marks which are not written in the Taisho-Tripitaka itself [2]. While these are certainly useful in many cases when reading the text, there might be a possibility that any arbitrary interpretation of the person who edited the e-text may enter uncritically.
Our research group has decided to adopt a markup policy that takes care to describe as much information as possible about how the text is actually written in the Taisho-Tripitaka, while making it possible to identify the parts that are the responsibility of the person who marked up the text. In addition, even if it is clearly a typographical error by an editor of the Taisho-Tripitaka, we try to retain the original description, such as , and , rather than correcting it without notification. I show an example [fig1], where the information of variant readings in the Taisho-Tripitaka is indicated. In general, lemma, i.e., in the should be the same as the string written in the text body. However, if we compare them, we will notice that the Chinese character “盖” in is slightly different from the character “蓋” in the text body. In fact, these two characters are just variants, not semantically different, but it should be noted that different characters are used here. In such a case, we mark it up as shown in this figure to describe the difference between the two characters.

[Fig 1] See [3]
In addition, if a text is structurally analyzed by previous research, it would be possible to markup the text to reflect the content of the research. In this case, tag in indicating the research used as the basis for the markup is provided, and the @resp attribute is also used to distinguish it from the original Taisho-Tripitaka structure such as paragraphing.

[Fig 2] See [4]
[Fig 2] is one Zen text. There are seemingly no line break or paragraph changes. However, Zen texts are well-structured in general and if read by a specialized researcher, the structure can be understood as I have shown in the red dashed line. We can mark up this part as following [Fig3] with @resp.

[Fig 3] Marked up text of [Fig2]
In this way, it is possible to save the descriptions of Taisho-Tripitaka and reflect the interpretations of the workers at the same time. In addition, our group has introduced GitHub, which allows us to keep detailed records of the actual work of workers and their contributions.

[1] The SAT Daizōkyō Text Database,

[2] CBETA (Chinese Buddhist Electronic Text Association),

Taishō Shinshū Daizōkyō, Junjiro Takakusu and Kaigyoku Watanabe (eds), Tokyo: Taisho Issaikyo Kankokai, 1924-1934, vol. 14, p. 48.

Taishō Shinshū Daizōkyō, Junjiro Takakusu and Kaigyoku Watanabe (eds), Tokyo: Taisho Issaikyo Kankokai, vol. 82, p. 344.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2022
"Responding to Asian Diversity"

Tokyo, Japan

July 25, 2022 - July 29, 2022

361 works by 945 authors indexed

Held in Tokyo and remote (hybrid) on account of COVID-19

Conference website:

Contributors: Scott B. Weingart, James Cummings

Series: ADHO (16)

Organizers: ADHO