Log Analysis Method towards Understanding Detailed IIIF Image Usage

paper, specified "long paper"
Authorship
  1. 1. Chifumi Nishioka

    Kyoto University

  2. 2. Kiyonori Nagasaki

    University of Tokyo

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Introduction
It is important for libraries and museums to analyze and understand how digital collections and their contents have been used for many reasons, e.g., accountability for stakeholders. Results of analysis can be used to improve digital collections (Hughes, 2011). The usage analysis is conducted along with two steps: (1) Selection of a measurement: A measurement that suits for the purpose of the usage analysis is chosen. Then, the measurement is calculated based on data such as server logs. So far, measurements such as the number of accesses to materials (e.g., manuscripts) and images have been widely employed (Jones et al., 2000). (2) Visualization of the result: The result of the usage analysis is visualized to facilitate users to understand. Charts (e.g., line charts) have been used.
In these years, a lot of libraries and museums have adopted IIIF (International Image Interoperability Framework) (Snydman et al., 2015), which promotes mutual use of images. IIIF defines a couple of APIs to enable interoperable use. In IIIF-compatible digital collections, images are fetched via IIIF Image API, whose syntax is defined as:
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
Every time an image is zoomed and panned on an image viewer, the image is called via IIIF Image APIs with varying values of the region. Thus, it is possible to investigate the detailed image usage by examining which regions have been requested. In this paper, we show a method to analyze image usage and to visualize the analysis result. Specifically, we employ the number of accesses to each pixel as a measurement and visualize by heat maps. Since a pixel is the smallest unit of an image, we enable a fine-grained analysis different from previous studies (Warwick et al., 2008; Jones et al., 2000).

Method
This section describes how to measure and visualize the detailed usage of images on IIIF-compatible digital collections (Section 2.1) and how to display the visualized result (Section 2.2).

Measurement and visualization
The method is comprised of following two steps:

Measure the number of accesses to each pixel: For each image, an H×W matrix, where all elements are 0, is generated. H and W are height and width of the image in pixel. Each element of a matrix corresponds to each pixel. The height and width of images are retrieved by info.json provided by IIIF Image API. Subsequently, requested images and regions are acquired by parsing logs of IIIF Image API. Based on requested regions, the number of accesses to each pixel is counted and recorded to the matrices.
Generate heat maps: After counting the number of accesses to each pixel, the result is outputted as a heat map. The RGB value of each pixel is calculated considering the minimum and maximum values of the number of accesses to pixel in an image.

In order to save the memory for the analysis, we count the number of accesses in the unit of N×N pixels, instead of counting the number of accesses to each pixel. In addition, we output heat maps in (H/N)×(W/N) pixels, instead of the same size with the target image, in order to save the storage.

Display of heat maps
Generated heat maps are displayed over corresponding target images, in order to enable users (e.g., administrator of a digital collection) to understand image usage. IIIF Presentation API enables overlay images by specifying two images on a page. Mirador, a popular viewer among IIIF community, implements a function of overlay display, as shown in Figure 1. One can manipulate visibility and opacity for each image in the left side panel.

Figure 1. Overlay display of a heat map on its target image using Mirador. Photograph courtesy of the Main Library, Kyoto University -

Konjaku monogatarishuu

Example and Improvement
This section illustrates examples of analysis results and improvements for the analysis method.

Analysis considering probabilities to be accessed
Figure 2 illustrates a typical heat map that represents image usage. The number of accesses close to the center is higher than others. This tendency has been observed in other images. The reason is that pixels close to the center have a higher probability to be accessed. In order to treat each pixel equitably, it is necessary to adjust the number of accesses according to the probability.

Figure 2. Typical heat map. Photograph courtesy of the Main Library, Kyoto University –

Yashiki-zu (design drawing of a mansion) from Nakai Collection

We compute the probability to be accessed for a pixel that is a and b pixels from the midpoint of each side of the image as:

Then, let c(w,h,a,b) be the number of accesses to a pixel that is a and b pixels from the midpoint of each side of the image. Then, the number of accesses can be adjusted as:

is the probability to be accessed for the center of the image divided by the probability to be accessed for the point, which is a and b pixels from the midpoint of each side. We take logarithm in order to mitigate influence from the adjustment. As a result, the number of accesses for Figure 2 is adjusted as shown in Figure 3.

Figure 3. Heat map where probabilities to be accessed are considered for Figure 2

Referrer of images
As exemplified in Figure 4, we observe images, in which accesses are concentrated in specific regions. Looking into referrers of access logs, it turns out that these regions are referenced by
IIIF Curation Platform. Since IIIF enables mutual use, regions and images have more opportunities to be referenced from other platforms. Referrers show motivation and background behind accesses. If the web site that the referrer indicates is completely disclosed, it is possible to present a link to the web site on a viewer as annotations. So that users can discover regions and images that are highly relevant.

Figure 4. Example where specific regions get many accesses. Photograph courtesy of the Main Library, Kyoto University -

The story of Benkei, a tragic warrior

Applications
This section lists possible applications of the result of the usage analysis.

Collaborative research platform: The data model used in IIIF follows
Web Annotation Data Model. Therefore, IIIF facilitates to share not only images but also information accompanying images (e.g., annotations such as transcriptions). For this reason, IIIF-compatible collaborative research platforms have been developed (Sato and Ota, 2017). By presenting heat maps, researchers can understand which regions have not been examined by collaborators and work for these regions.

Transcription Platform: In the decades, a lot of transcription projects have been launched (Terras, 2016). Transcribers zoom and pan images during generating transcriptions. If a platform is compatible with IIIF, it is possible to verify a pattern such as whether there is a difference in transcription performance (e.g., accuracy) between regions being zoomed and those being not zoomed. If there is a pattern, we can use it to facilitate verification process for transcriptions.

Selection of thumbnails: In many cases, images on the first page of materials are used as thumbnails. However, the first image does not necessarily represent the material. We may select the most-viewed region of images in the material as a thumbnail, which can be revealed by the analysis method.

Challenges
Visualization of access logs is not a problem, if anonymization is conducted. However, anonymization does not make sense in some cases. For instance, for images that are only accessed by researchers in a specific field, colleagues can easily guess who accessed images and regions. Even if anonymization is complete, a series of activities of a researcher on images might reveal his/her viewpoint that would be a key issue of his/her academic outcome. In this case, his/her priority right of discovery may be infringed. Therefore, careful consideration will be necessary to exploit the analysis result as a service. In the future, we would like to consider some guidelines based on the above-mentioned challenges as well as existing policies such as privacy protection.

Bibliography

Hughes, L.
(2011).
Evaluating and Measuring the Value, Use and Impact of Digital Collections
. Facet.

Jones, S., Cunningham, S. J., McNab, R. and Boddie, S.
(2000). A transaction log analysis of a digital library.
International Journal on Digital Libraries
,
3
(2): 152–69 doi:10.1007/s007999900022.

Sato M. and Ota I.
(2017). Collaboration system based on crowdsourcing with Mirador - Proposal of a system to support analysis and theory in collaborative research of humanities.
SIG Technical Reports Computer and Humanities
,
2017
-
CH
-
114
(7): 1–6.

Snydman, S., Sanderson, R. and Cramer, T.
(2015). The International Image Interoperability Framework (IIIF): A community & technology approach for web-based images.
Proceedings of the Archiving Conference
. Society for Imaging Science and Technology, pp. 16–21.

Terras, M.
(2016). Crowdsourcing in the Digital Humanities.
A New Companion to Digital Humanities
: 420–39.

Warwick, C., Terras, M., Huntington, P. and Pappa, N.
(2008). If You Build It Will They Come? The LAIRAH Study: Quantifying the Use of Online Resources in the Arts and Humanities through Statistical Analysis of User Log Data.
Literary and Linguistic Computing
,
23
(1): 85–102 doi:10.1093/llc/fqm045.

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

In review

ADHO - 2019
"Complexities"

Hosted at Utrecht University

Utrecht, Netherlands

July 9, 2019 - July 12, 2019

436 works by 1162 authors indexed

Series: ADHO (14)

Organizers: ADHO