A Database of Islamic Scientific Manuscripts — Challenges of Past and Future

paper, specified "short paper"
Authorship
  1. Robert Casties

    Max Planck Institute for the History of Science / Institution Max Planck Institut für Wissenschaftsgeschichte

Work text


The ISMI project
The Islamic Scientific Manuscript Initiative (ISMI) project was founded in 2005 to make accessible information on all Islamic manuscripts in the exact sciences (astronomy, mathematics, optics, mathematical geography, music, mechanics, and related disciplines), whether in Arabic, Persian, Turkish, or other languages, from the 9th to the 19th century.

The ISMI project limits itself to “scientific” manuscripts, but it tries to encompass all such manuscripts worldwide, regardless of their current location, and to record as much information about them as is available, including reader and ownership marks, annotations, and illustrations. This makes it possible to learn more about structures and practices of knowledge in the Islamicate world (Ragep and Ragep, 2008).

The database
The database of the ISMI project is a cooperation between the Max Planck Institute for the History of Science and the Institute of Islamic Studies at McGill University in Montreal. It has been built up over more than ten years, starting from an early personal database project by the scholars involved and extended with corrected information from catalog works like MAMS (Matvievskaya and Rosenfeld, 1983) and with research by scholars inside and outside the project. It currently contains information about over 4700 texts in 15000 witnesses in 8000 codices, 2500 persons, and an accompanying secondary bibliography of 2700 titles, and it is constantly being extended.

The database development started in 2006 with a new data model based on the idea of a network of flexible objects and relations. Objects can have arbitrary attributes, and the relations between objects are themselves treated like objects and can carry attributes of their own.

Figure 1: Part of the current ISMI data model showing relations between text, witness, person and codex objects.

The basic objects in the data model are, for example, the TEXT, which is abstract, the WITNESS, which is a concrete material manuscript, and the PERSON (real or imaginary). These objects are connected by relations like is_exemplar_of, which connects a text and its witnesses, and was_created_by, which connects a text and a person as its author (see Figure 1). The same person can at the same time be connected to other witnesses as a copyist or as a dedicatee. This very flexible data model was regularly modified and extended to accommodate changes and refinements developed in close cooperation with the scholars entering the data, as their understanding of the source material and of the technical possibilities of the system changed. Examples of these unique additions are the possibility to record misattributions of authorship and misidentifications of witnesses in existing literature, as well as documented reading events and changes of ownership.
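The idea of objects and relations that both carry attributes can be illustrated with a minimal Python sketch. It is not the OpenMind implementation. The relation names is_exemplar_of and was_created_by come from the text above, but their direction, the relation name was_copied_by, the attribute names, and all records are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object such as a TEXT, WITNESS or PERSON with free-form attributes."""
    kind: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A directed, named link that carries its own attributes, like a node."""
    name: str
    source: Node
    target: Node
    attrs: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.relations = []

    def relate(self, name, source, target, **attrs):
        relation = Relation(name, source, target, attrs)
        self.relations.append(relation)
        return relation

    def sources(self, name, target):
        """Nodes that point at `target` through relations called `name`."""
        return [r.source for r in self.relations
                if r.name == name and r.target is target]

    def targets(self, name, source):
        """Nodes that `source` points at through relations called `name`."""
        return [r.target for r in self.relations
                if r.name == name and r.source is source]


# Placeholder records, not taken from the ISMI database.
text = Node("TEXT", {"label": "Example text"})
author = Node("PERSON", {"label": "Example author"})
copyist = Node("PERSON", {"label": "Example copyist"})
witness_a = Node("WITNESS", {"label": "Example witness A"})
witness_b = Node("WITNESS", {"label": "Example witness B"})

graph = Graph()
graph.relate("was_created_by", text, author)
graph.relate("is_exemplar_of", witness_a, text)
graph.relate("is_exemplar_of", witness_b, text)
# "was_copied_by" and its "note" attribute are hypothetical names.
graph.relate("was_copied_by", witness_a, copyist, note="stated in colophon")

witnesses = graph.sources("is_exemplar_of", text)
```

Because a relation is an object in its own right, a new kind of statement (for example a misattribution) can be added as another relation name with its own attributes, without changing the existing node types.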

This concept of a network of objects with flexible relations, an attribute graph, exists today in database products like Neo4J, but those were not available in 2006, which led to the development of a custom database called "OpenMind". The database software is open source, written in Java, and uses a conventional SQL database backend and a web-based frontend.
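One common way to keep such an attribute graph in a conventional SQL backend is a small set of generic tables for nodes, relations, and attributes. The following sketch uses Python's built-in sqlite3 module. The table layout, column names, and records are assumptions for illustration and do not describe the actual OpenMind schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    source_id INTEGER NOT NULL REFERENCES node(id),
    target_id INTEGER NOT NULL REFERENCES node(id)
);
-- One row per attribute; owner_type tells whether owner_id is a node or a relation.
CREATE TABLE attribute (
    owner_type TEXT NOT NULL CHECK (owner_type IN ('node', 'relation')),
    owner_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    value TEXT
);
"""


def _set_attrs(db, owner_type, owner_id, attrs):
    db.executemany(
        "INSERT INTO attribute (owner_type, owner_id, name, value) VALUES (?, ?, ?, ?)",
        [(owner_type, owner_id, name, value) for name, value in attrs.items()],
    )


def add_node(db, kind, **attrs):
    node_id = db.execute("INSERT INTO node (kind) VALUES (?)", (kind,)).lastrowid
    _set_attrs(db, "node", node_id, attrs)
    return node_id


def add_relation(db, name, source_id, target_id, **attrs):
    relation_id = db.execute(
        "INSERT INTO relation (name, source_id, target_id) VALUES (?, ?, ?)",
        (name, source_id, target_id),
    ).lastrowid
    _set_attrs(db, "relation", relation_id, attrs)
    return relation_id


def witness_labels(db, text_id):
    """Labels of all witnesses linked to a text by is_exemplar_of."""
    rows = db.execute(
        """
        SELECT a.value
        FROM relation r
        JOIN attribute a ON a.owner_type = 'node' AND a.owner_id = r.source_id
        WHERE r.name = 'is_exemplar_of' AND r.target_id = ? AND a.name = 'label'
        ORDER BY r.id
        """,
        (text_id,),
    )
    return [row[0] for row in rows]


# Placeholder records, not taken from the ISMI database.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
text_id = add_node(db, "TEXT", label="Example text")
first = add_node(db, "WITNESS", label="Example witness A")
second = add_node(db, "WITNESS", label="Example witness B")
add_relation(db, "is_exemplar_of", first, text_id)
add_relation(db, "is_exemplar_of", second, text_id, note="placeholder attribute")
labels = witness_labels(db, text_id)
```

The layout keeps the schema fixed while the set of object types, relation names, and attributes stays open. The price is that even simple questions need joins across the generic tables, which is part of what native graph databases avoid.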

A first version of a public website, presenting a limited set of 130 digitized codices from the Staatsbibliothek Berlin, was published in 2015.

Towards new standards
The current database system OpenMind was a custom development that was necessary at the time of its creation, but it has not aged well and burdens the future development of the project with limited flexibility and high software maintenance costs. The data model was also not based on existing ontologies, due to a lack of usable tools at the time. Both features were acceptable during the development of the project, but they pose a problem for the continued maintenance of the project and the reusability of its data.
Currently both software and data are being migrated to new standard tools in two phases:
In the first phase data is still entered in the legacy OpenMind backend, but there is a new public web frontend based on the Drupal CMS that is fed by an XML export from the legacy backend. The XML data is also fed into a Neo4J graph database for additional queries and visualisations. This is the architecture for the beta launch in September 2018 and the public launch at the end of November 2018.
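The step from an XML export to a graph database can be sketched as follows. The XML layout shown here (export, node, attr, and relation elements) is hypothetical, since the real export format is not described in this paper, and the property name ismi_id is likewise an assumption. The sketch only produces Cypher statements with parameters and does not talk to a Neo4J server.

```python
import xml.etree.ElementTree as ET

# Hypothetical export layout with placeholder records.
SAMPLE = """<export>
  <node id="1" kind="TEXT"><attr name="label">Example text</attr></node>
  <node id="2" kind="WITNESS"><attr name="label">Example witness</attr></node>
  <relation name="is_exemplar_of" source="2" target="1"/>
</export>"""


def _checked(name):
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so they are validated before being placed into the query text.
    if not (name or "").isidentifier():
        raise ValueError(f"unsafe name in export: {name!r}")
    return name


def cypher_statements(xml_text):
    """Return (query, parameters) pairs that would load the export."""
    root = ET.fromstring(xml_text)
    statements = []
    for node in root.iter("node"):
        kind = _checked(node.get("kind"))
        attrs = {a.get("name"): (a.text or "") for a in node.iter("attr")}
        statements.append((
            f"MERGE (n:{kind} {{ismi_id: $id}}) SET n += $attrs",
            {"id": node.get("id"), "attrs": attrs},
        ))
    for rel in root.iter("relation"):
        name = _checked(rel.get("name"))
        statements.append((
            "MATCH (a {ismi_id: $source}), (b {ismi_id: $target}) "
            f"MERGE (a)-[:{name}]->(b)",
            {"source": rel.get("source"), "target": rel.get("target")},
        ))
    return statements


statements = cypher_statements(SAMPLE)
```

Using MERGE instead of CREATE means the same export can be loaded repeatedly without duplicating nodes or relationships, which suits a setup where the legacy backend remains the place of data entry.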
In the second phase the data model will be migrated to the CIDOC-CRM reference ontology using the FRBRoo model and other extensions. All data is being converted to RDF following the new data model, and a frontend based on the ResearchSpace software with a triple store backend is being created for data entry and specialized queries and visualisations. This process is currently under way.
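What a conversion to RDF could look like for a single object is sketched below as N-Triples output written with plain Python. The mapping of PERSON, TEXT, and WITNESS to the classes E21_Person, F2_Expression, and F4_Manifestation_Singleton is an illustrative guess, not the project's actual mapping. The namespace URIs should be checked against the published ontologies, and the http://example.org/ismi/ namespace and all records are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
FRBROO = "http://iflastandards.info/ns/fr/frbr/frbroo/"  # assumed namespace
ISMI = "http://example.org/ismi/"  # hypothetical project namespace

# Illustrative mapping only; the real model may differ.
CLASS_FOR_KIND = {
    "PERSON": CRM + "E21_Person",
    "TEXT": FRBROO + "F2_Expression",
    "WITNESS": FRBROO + "F4_Manifestation_Singleton",
}


def literal(value):
    """Quote a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def node_iri(kind, node_id):
    return f"{ISMI}{kind.lower()}/{node_id}"


def node_triples(kind, node_id, label):
    """Type and label statements for one object."""
    subject = node_iri(kind, node_id)
    return [
        f"<{subject}> <{RDF_TYPE}> <{CLASS_FOR_KIND[kind]}> .",
        f"<{subject}> <{RDFS_LABEL}> {literal(label)} .",
    ]


def relation_triple(name, source, target):
    """Keep a legacy relation as a predicate in the hypothetical namespace."""
    return f"<{node_iri(*source)}> <{ISMI}relation/{name}> <{node_iri(*target)}> ."


# Placeholder records, not taken from the ISMI database.
lines = node_triples("PERSON", 1, "Example author")
lines += node_triples("WITNESS", 2, "Example witness")
lines.append(relation_triple("is_exemplar_of", ("WITNESS", 2), ("TEXT", 3)))
```

In a full migration the legacy relation names would themselves be replaced by properties of the target ontology. The sketch leaves them in a project namespace because that mapping is not given here.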

The new ISMI website
The new public website presents data on 650 persons (selected chronologically following MAMS), 2300 texts, 6900 witnesses and related objects, representing authors from before 1350 CE. The website will be public starting at the end of November 2018. Additional data publications are in preparation.
The new web frontend provides browsable lists of all major object types (persons, texts/works, witnesses, codices, places,…) as well as a general search and searches for specific object types. All objects on the pages are linked which makes it easy to get from a person to all their works and their witnesses as well as to the commentaries on the titles and their supercommentaries.
The search has a simple normalization for Arabic and a special normalization for romanized Arabic and is specially tuned to be very forgiving for differences in spelling especially for Arabic names. Feedback for the search and navigation during the beta test phase was very positive.
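A forgiving search of this kind usually works by reducing both the query and the indexed names to a simplified key before comparing them. The following sketch shows one plausible pair of normalizations. The specific rules (which letters are folded, dropping the article "al") are assumptions for illustration and are not the rules used on the ISMI website.

```python
import unicodedata

# Fold common Arabic letter variants; drop the tatweel (kashida) stretch mark.
ARABIC_FOLDS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
    "\u0640": None,      # tatweel
})

# Signs used in romanization for ayn and hamza, plus plain apostrophes.
ROMAN_DROPS = str.maketrans("", "", "\u02BF\u02BE\u2018\u2019'`")


def strip_marks(text):
    """Remove nonspacing marks (vowel signs, shadda, Latin diacritics)."""
    return "".join(c for c in text if unicodedata.category(c) != "Mn")


def normalize_arabic(text):
    return strip_marks(text.translate(ARABIC_FOLDS))


def normalize_romanized(text):
    # Decompose first so that letters like "ū" split into "u" plus a mark.
    plain = strip_marks(unicodedata.normalize("NFD", text)).translate(ROMAN_DROPS)
    words = "".join(c if c.isalnum() else " " for c in plain.casefold()).split()
    # Dropping the article "al" is a choice made for this sketch.
    return " ".join(w for w in words if w != "al")


key = normalize_romanized("Naṣīr al-Dīn al-Ṭūsī")
```

With these rules the spellings "Naṣīr al-Dīn al-Ṭūsī" and "Nasir al-Din al-Tusi" produce the same key, so a query typed without diacritics still matches the scholarly transliteration.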
The website also currently shows 104 freely available digitized codices using the IIIF image standard and the Diva.js viewer (see Figure 2). Most of the codices were scanned by the MPIWG in cooperation with the Staatsbibliothek Berlin, but some exemplars from the Gallica project of the Bibliothèque nationale de France and from the Qatar Digital Library are also present. They demonstrate the potential of public IIIF image sources in an area that has been plagued in the past by proprietary data silos and restrictive access conditions, which made global electronic manuscript databases nearly impossible. We hope to expand the number of scanned codices in the future.
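Part of what makes IIIF sources interchangeable is that every image server answers requests of the same shape, identifier/region/size/rotation/quality.format, as defined by the IIIF Image API. The sketch below builds such URLs. The server address and the image identifier are hypothetical and do not refer to the ISMI, Staatsbibliothek, Gallica, or Qatar Digital Library services.

```python
from urllib.parse import quote


def image_url(base, identifier, region="full", size="max",
              rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL for one image."""
    # The identifier is percent-encoded so that slashes inside it
    # are not mistaken for path separators.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def info_url(base, identifier):
    """URL of the info.json document describing the image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier.
BASE = "https://iiif.example.org/image"
thumbnail = image_url(BASE, "codex/page 1", size="600,")
description = info_url(BASE, "codex/page 1")
```

A viewer can read info.json to learn the image dimensions and then request only the regions and sizes it needs, regardless of which library hosts the scan.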

Figure 2: Display of scanned manuscript (Codex Petermann I 671, Staatsbibliothek Berlin)

The experimental “ISMI Lab” section of the site offers access to the “Query Builder” tool, which allows users to construct custom queries to the database based on objects, attributes and relations, and to a full Neo4J graph database console with access to all published data (see Figure 3). These additional tools are very powerful but require some technical expertise and familiarity with the ISMI data model. There is some documentation, but this section is more of an experimental offer meant to get in contact with interested scholars, in the hope that interesting queries and research questions can be exchanged and that new, easier to use tools can be developed in the future.
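How a query assembled from objects, attributes, and relations could translate into Cypher for the graph console is sketched below. This is not the Query Builder's code. The assumption that object types appear as node labels and relation names as relationship types in the published graph, as well as the attribute name label, are hypothetical.

```python
def build_query(source_kind, relation, target_kind, attribute=None):
    """Assemble a Cypher query: sources linked to targets, optionally filtered."""
    for name in (source_kind, relation, target_kind, attribute):
        # Names become part of the query text, so they are validated first.
        if name is not None and not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    query = f"MATCH (s:{source_kind})-[:{relation}]->(t:{target_kind})"
    if attribute is not None:
        # The search value itself is passed separately as the $value parameter.
        query += f" WHERE t.{attribute} CONTAINS $value"
    return query + " RETURN s"


# All witnesses of texts whose (hypothetical) label attribute contains a search term.
query = build_query("WITNESS", "is_exemplar_of", "TEXT", attribute="label")
```

The resulting string could be pasted into a Neo4J console together with a value for $value, which is one way queries worked out with the builder can be exchanged between scholars.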

Figure 3: Experimental Neo4J console showing partial graph of commentary relations.

A never ending project?
The history of the project in the last ten years has shown the difficulties of developing and maintaining a project of this complexity – organisationally, in terms of hardware, software, and scholarly support. We think this project shows the potential for a unifying manuscript database that is not limited to singular collections and presents the continually updated and expanded current knowledge of scholars in the field. We hope that scholars in the future will not have to figure out errors in decades-old printed catalogues individually again and again but that they can participate in a common database and share and enhance their individual findings. The collaborative phase of the ISMI database is only beginning and we would like to start the discussion now. We think we have laid the technical foundations to make the database maintainable and adaptable and the data shareable and linkable but the long term value of a shared resource lies in its users and its contributors.

Bibliography
Ragep, Jamil F., and Sally P. Ragep. The Islamic Scientific Manuscript Initiative (ISMI) Towards a Sociology of the Exact Sciences in Islam. In A Shared Legacy: Islamic Science East and West. Homage to Professor J. M. Millàs Vallicrosa, edited by Emilia Calvo, Mercè Comes, Roser Puig, and Monica Rius, 15–21. Barcelona: University of Barcelona, 2008.

Matvievskaya, G. P., and B. A. Rosenfeld. Matematiki i astronomi musulmanskogo srednevekovya i ikh trudi (VIII-XVII vv.) [Mathematicians and Astronomers of the Muslim Middle Ages and Their Works (VIII-XVII centuries)]. 3 vols. Moscow: Nauka, 1983. Later extended as Boris A. Rosenfeld and Ekmeleddin İhsanoğlu, Mathematicians, astronomers and other scholars of Islamic civilization and their works (7th-19th c.).
