DIVADIAWI - A Web-based Interface for Semi-automatic Labeling of Historical Document Images
Hao Wei, University of Fribourg, Switzerland, hao.wei@unifr.ch
Kai Chen, University of Fribourg, Switzerland, kai.chen@unifr.ch
Mathias Seuret, University of Fribourg, Switzerland, mathias.seuret@unifr.ch
Marcel Würsch, University of Fribourg, Switzerland, marcel.wuersch@unifr.ch
Marcus Liwicki, University of Fribourg, Switzerland, marcus.eichenberger-liwicki@unifr.ch
Rolf Ingold, University of Fribourg, Switzerland, rolf.ingold@unifr.ch
2014-12-19T13:50:00Z
Paul Arthur, University of Western Sydney
Locked Bag 1797
Penrith NSW 2751
Australia
Paul Arthur
Converted from a Word document
DHConvalidator
Paper
Long Paper
Historical document images
web interface
semi-automatic labeling
image processing
historical studies
interface and user experience design
xml
internet / world wide web
visualisation
English
According to recent panel discussions at the Historical Document Imaging and Processing (HIP 2013) 1 and Document Analysis Systems (DAS 2014) 2 workshops, historical documents are among the most challenging types of documents for automatic processing, owing to the vast variety of challenges they pose to document image analysis (DIA) systems. In the pipeline of automatic DIA, layout analysis is an important prerequisite for further stages such as optical character recognition and information retrieval. Layout analysis aims at splitting a document image into regions of interest and, in particular, at distinguishing text lines from other regions.
For automatic layout analysis methods, an essential prerequisite is the ground truth (GT), i.e., labels (text line, decoration, etc.) annotated by human experts for the corresponding regions. An example of a historical document image 3 and its ground truth are shown in Figure 1. The importance of GT is twofold: first, automatic methods can learn the knowledge represented by the GT from the experts and then predict the layout of unseen images; second, the GT is used to assess the performance of an automatic method by measuring the accuracy of its predictions against the GT. However, the generation of GT is time-consuming.
We propose a novel web-based interface called DIVADIAWI to assist the user in semi-automatically generating ground truth for large numbers of historical documents. The name DIVADIAWI stands for the Web-based Interface for Document Image Analysis of the DIVA group (Document, Image and Voice Analysis). 4
DIVADIAWI incorporates two parts: automatic processing and manual editing. In the automatic processing part, the system automatically draws polygons representing text lines, for which manual generation of GT is quite time-consuming. In parallel or sequentially, the user can manually edit the polygons by modifying them or directly drawing new polygons.
DIVADIAWI greatly accelerates the generation of GT and robustly generates GT for historical document images of a diverse nature.
Figure 1. (a) An image example and (b) its ground truth. In (b), the cyan, blue, red, green, and purple polygons represent the regional contours of the page, text block, text lines, decorations, and comments, respectively.
State of the Art
In recent years, several tools for GT generation have been developed. Aletheia (Clausner et al., 2011) is an advanced system for generating GT for printed materials such as newspapers. It aids the user with a top-down method using split and shrink tools, as well as a bottom-up method for grouping elements together. WebGT (Biller et al., 2013) is a web-based system for GT generation for historical document images; it provides a semi-automatic strategy enabling instant interaction between the user and the system. The Java-based DIVADIA tool (Chen et al., 2015) was developed by our group. It provides a user-friendly interface for generating GT and has proven effective and flexible in a real GT generation task. Other state-of-the-art tools include AGORA (Ramel et al., 2006), PixLabeler (Saund et al., 2009), TRUEVIZ (Lee and Kanungo, 2003), and GEDI (Doermann et al., 2010).
Compared to the state of the art, the novelty of DIVADIAWI is twofold. First, the high level of automation greatly reduces the human effort: the user needs to make only a few simple modifications after the automatic processing. Second, DIVADIAWI does not require the user to install any software; it works in all modern browsers, e.g., Firefox, Chrome, and Internet Explorer.
System Overview
Ground Truth Representation
As shown in Figure 1(b), we define five types of regions of interest: text line, text block, decoration, comment, and page (Chen et al., 2015). The text line is the type of region where the main text is written. The text block incorporates the text lines and the document background between them. The decoration represents decorative elements such as figures, drop capitals, and decorative initials. The comment represents annotations and text inserted in the margins. Finally, the page outlines only the document part within the scanned image. The contours of the five region types are represented by polygons. All information, including the polygons, the document name, and the name of the author who generated the GT, is saved in XML files in the PAGE format (Pletschacher and Antonacopoulos, 2010), a widely used image representation framework for layout analysis. Furthermore, the GT can also be exported to the widely used TEI-P5 format for manuscript description. 5
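To make this representation concrete, the following is a minimal Python sketch of how region polygons could be serialised to a PAGE-style XML file. The element names follow the general structure of the PAGE format (PcGts, Page, TextRegion, TextLine, Coords); attribute details, namespaces, the schema version, and all function names are simplified assumptions, not DIVADIAWI's exact output.
# Minimal sketch of serialising region polygons to a PAGE-style XML file.
# Element names follow the general PAGE structure; details are simplified assumptions.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def polygon_to_points(polygon):
    """Serialise [(x, y), ...] as the space-separated 'x,y' list used by PAGE Coords."""
    return " ".join(f"{x},{y}" for x, y in polygon)

def write_page_xml(image_name, width, height, author, text_line_polygons, out_path):
    root = ET.Element("PcGts")
    metadata = ET.SubElement(root, "Metadata")
    ET.SubElement(metadata, "Creator").text = author          # who generated the GT
    ET.SubElement(metadata, "Created").text = datetime.now(timezone.utc).isoformat()
    page = ET.SubElement(root, "Page", imageFilename=image_name,
                         imageWidth=str(width), imageHeight=str(height))
    region = ET.SubElement(page, "TextRegion", id="block_1")  # the enclosing text block
    for i, polygon in enumerate(text_line_polygons, start=1):
        line = ET.SubElement(region, "TextLine", id=f"line_{i}")
        ET.SubElement(line, "Coords", points=polygon_to_points(polygon))
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)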
System Workflow
DIVADIAWI is now publicly accessible at http://diuf.unifr.ch/diva/divadiawi/. A screenshot of DIVADIAWI is shown in Figure 2. DIVADIAWI integrates automatic processing and manual editing. The workflow is illustrated in Figure 3 and detailed below.
Among the five region types, the regions of text lines take up most of the working time for GT generation. 6 The automatic processing of DIVADIAWI was developed to accelerate the GT generation specifically for text lines. The user manually draws a rectangular polygon representing a text block that contains some text lines. As soon as the rectangular polygon is drawn, the automatic processing starts. We use Gabor filters, which are widely used for text extraction (Jain and Farrokhnia, 1991; Raju et al., 2004), to detect regions of text lines within the rectangle. Once detected, the text lines are represented by polygons, which are then drawn. The automatic processing performs well on our datasets: most regions of text lines are properly outlined. However, there are still some minor mistakes. In small areas where two adjacent text lines intersect or are very close to each other, the two text lines are in some cases wrongly grouped into a single line. In addition, the automatic processing fails to detect some lightly written strokes near the boundary of a text line. Thus, manual modification follows.
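As an illustration of this step, the following Python sketch shows one possible way to detect text lines inside a user-drawn rectangle with Gabor filters, using scikit-image and a horizontal projection profile. It is a simplified sketch of the general technique, not DIVADIAWI's implementation: it returns row intervals rather than polygons, and the frequency, orientations, and threshold are placeholder parameters.
# Illustrative sketch of Gabor-based text line detection inside a user-drawn rectangle.
# Returns row intervals rather than polygons; parameter values are assumptions.
import numpy as np
from skimage.filters import gabor

def detect_text_lines(block, frequency=0.15, threshold=0.5):
    """Return (top, bottom) row intervals of text lines in a grayscale block image."""
    # Accumulate Gabor filter responses over several orientations;
    # written areas give strong responses at stroke-like frequencies.
    energy = np.zeros(block.shape, dtype=float)
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(block, frequency=frequency, theta=theta)
        energy += np.sqrt(real ** 2 + imag ** 2)
    # Horizontal projection profile of the text-energy map.
    profile = energy.sum(axis=1)
    is_text_row = profile > threshold * profile.mean()
    # Group consecutive "text" rows into line intervals.
    lines, start = [], None
    for row, is_text in enumerate(is_text_row):
        if is_text and start is None:
            start = row
        elif not is_text and start is not None:
            lines.append((start, row - 1))
            start = None
    if start is not None:
        lines.append((start, len(is_text_row) - 1))
    return lines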
The manual editing has two modes: manual drawing and manual modification. Since the automatic processing specifically deals with text lines, the user needs to manually draw the regions of the page, text blocks, decorations, and comments. The user can select the shape (polygon or rectangle) and the region type to draw. The drawn polygons and rectangles have different colors depending on their region types. After drawing, the user can modify polygons or rectangles by dragging a vertex to a new position. A new vertex can also be added to a polygon, and a polygon can be deleted. Finally, an XML file storing the GT of the page is generated.
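The editing operations described above can be summarised with a small, hypothetical data-structure sketch; the class and method names are illustrative and do not correspond to DIVADIAWI's actual client-side code.
# Hypothetical sketch of the editing operations; names are illustrative only.
class AnnotationPolygon:
    def __init__(self, region_type, vertices):
        self.region_type = region_type   # e.g. "text_line", "decoration", "comment"
        self.vertices = list(vertices)   # [(x, y), ...] in image coordinates

    def move_vertex(self, index, x, y):
        """Drag an existing vertex to a new position."""
        self.vertices[index] = (x, y)

    def insert_vertex(self, index, x, y):
        """Add a new vertex before position `index`."""
        self.vertices.insert(index, (x, y))

class PageAnnotation:
    """All drawn polygons of one page, later exported to a PAGE XML file."""
    def __init__(self):
        self.polygons = []

    def add(self, polygon):
        self.polygons.append(polygon)

    def delete(self, polygon):
        self.polygons.remove(polygon)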
Figure 2. A screenshot of DIVADIAWI.
Figure 3. System workflow of DIVADIAWI.
Figure 4. The process of drawing polygons of text lines by automatic processing and manual modification. (a) A part of the original image. (b) Rectangular polygon of a text block drawn manually by the user. (c) Polygons of text lines obtained by automatic processing. (d) Final result obtained by manual modification.
Evaluation
The IAM Historical Document Database (IAM-HistDB) 7 is used for the evaluation. The database contains a variety of historical document images, which are in color or grayscale, single-column or double-column, and date from different centuries. DIVADIAWI works well on the database; thanks to the automatic processing, it greatly accelerates the GT generation. We show an example of drawing polygons of text lines on a color image 8 in Figure 4. Most pixels within regions of text lines are included in the polygons obtained by automatic processing (see Figure 4[c]). However, the result of this step is still not perfect. For example, the bottom part of the large character 'G' in the first text line is very close to the second text line, so the automatic processing fails to separate them. Such problems can be solved by manually dragging the problematic vertices of the polygons to proper positions (see Figure 4[d]). In general, with the automatic processing, the user needs to make only a few simple manual modifications for each text line, which is much less effort than completely manual editing from scratch.
The System Usability Scale (SUS; Brooke, 1996), a measurement for assessing the global usability of a system, is used to assess DIVADIAWI. Researchers from both the humanities and computer science are invited to participate in the assessment. Quantitative evaluation details will be provided in the full paper.
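For reference, SUS scores can be computed with the standard scoring rule from Brooke (1996): odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score between 0 and 100. A minimal Python sketch follows; the example responses are placeholders, not data from our assessment.
# Sketch of the standard SUS scoring rule (Brooke, 1996); example answers are placeholders.
def sus_score(responses):
    """responses: the ten answers of one participant on a 1-5 scale, in questionnaire order."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly ten items")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)   # odd items: r - 1, even items: 5 - r
                for i, r in enumerate(responses, start=1))
    return total * 2.5                               # scale the 0-40 sum to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))     # -> 85.0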
Conclusions and Future Work
We propose DIVADIAWI, a web-based interface for GT generation for historical document images. Thanks to the integration of automatic processing and manual editing, DIVADIAWI efficiently accelerates the GT generation. It works robustly on different kinds of historical document images from the IAM-HistDB.
In the future we will keep improving DIVADIAWI. For example, the automatic processing could be redesigned to further decrease errors, and techniques to automatically outline other region types, e.g., page and text block, could also be applied.
Notes
1. Workshop on Historical Document Imaging and Processing, 2013, http://www.cvc.uab.es/~vfrinken/hip2013/.
2. 11th IAPR International Workshop on Document Analysis Systems, 2014, http://das2014.sciencesconf.org/.
3. Saint Gall DB: Cod. Sang. 562, Abbey Library of St. Gall (SG30).
4. http://diuf.unifr.ch/main/diva/.
5. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html.
6. In a process of manual GT generation of 100 document pages, about 80% of the time was spent on text lines.
7. http://www.iam.unibe.ch/fki/databases/iam-historical-document-database.
8. Parzival DB: Cod. 857, Abbey Library of St. Gall (PAR23).
Bibliography
Biller, O., Asi, A., Kedem, K., El-Sana, J. and Dinstein, I. (2013). WebGT: An Interactive Web-based System for Historical Document Ground Truth Generation. 12th International Conference on Document Analysis and Recognition, pp. 305–8.
Brooke, J. (1996). SUS: A Quick and Dirty Usability Scale. In Jordan, P. W., Weerdmeester, B., Thomas, A. and McClelland, I. L. (eds), Usability Evaluation in Industry. Taylor and Francis.
Chen, K., Seuret, M., Wei, H., Liwicki, M., Hennebert, J. and Ingold, R. (2015). Ground Truth Model, Tool, and Dataset for Layout Analysis of Historical Documents. Proceedings of SPIE Document Recognition and Retrieval XXII, vol. 9402.
Clausner, C., Pletschacher, S. and Antonacopoulos, A. (2011). Aletheia: An Advanced Document Layout and Text Ground-Truthing System for Production Environments. International Conference on Document Analysis and Recognition, pp. 48–52.
Doermann, D., Zotkina, E. and Li, H. (2010). GEDI: A Groundtruthing Environment for Document Images. Ninth IAPR International Workshop on Document Analysis Systems (DAS 2010).
Jain, A. K. and Farrokhnia, F. (1991). Unsupervised Texture Segmentation Using Gabor Filters. Pattern Recognition, 24(12): 1167–86.
Lee, C. and Kanungo, T. (2003). The Architecture of TRUEVIZ: A Groundtruth/Metadata Editing and Visualizing Toolkit. Pattern Recognition, 36(3): 811–25.
Pletschacher, S. and Antonacopoulos, A. (2010). The PAGE (Page Analysis and Ground-truth Elements) Format Framework. International Conference on Pattern Recognition, pp. 257–60.
Raju, S. S., Pati, P. B. and Ramakrishnan, A. G. (2004). Gabor Filter Based Block Energy Analysis for Text Extraction from Digital Document Images. First International Workshop on Document Image Analysis for Libraries, pp. 233–43.
Ramel, J. Y., Busson, S. and Demonet, M. L. (2006). AGORA: The Interactive Document Image Analysis Tool of the BVH Project. Second International Conference on Document Image Analysis for Libraries, pp. 145–55.
Saund, E., Lin, J. and Sarkar, P. (2009). PixLabeler: User Interface for Pixel-Level Labeling of Elements in Document Images. International Conference on Document Analysis and Recognition, pp. 646–50.
Hosted at Western Sydney University
Sydney, Australia
June 29, 2015 - July 3, 2015
280 works by 609 authors indexed
Conference website: https://web.archive.org/web/20190121165412/http://dh2015.org/
Attendance: 469 https://web.archive.org/web/20190422031340/http://dh2015.org/wp-content/uploads/2015/06/DH2015-Attendees.pdf
Series: ADHO (10)
Organizers: ADHO