A Newsletters Service based on XML-TEI

  1. Alejandro Bia

    Libraries - University of Alicante

  2. Irene Garrigós

    University of Alicante

  3. Jaime Gómez

    University of Alicante



In this paper we describe the production and dissemination models of a newsletter service. We first describe the context of the problem, one of the largest Web-based digital libraries of Spanish works, and the spirit and objectives of its interactive and dissemination services. Among these, the newsletter service is an information dissemination service that provides useful information to readers. We then discuss the production model of the newsletters, based on XML-TEI (2) and XSLT, and the personalizable dissemination model we are now implementing. The dissemination model combines adaptive and adaptable personalization techniques: it ranks news according to navigation-inferred preferences and then filters them according to a user-given profile.

Communication services of the digital library

One of the goals of the Miguel de Cervantes Digital Library (MCDL) is to act as a communication channel for the academic community. To this end we have implemented a number of communication services, and we try to maintain permanent communication with our readers.

Electronic publishing makes it possible to reach every corner of the world and opens up new paths for research and communication. The Web-based news service of the Miguel de Cervantes Digital Library comprises five digital library newsletters and one monthly journal, all managed and produced with XML-TEI and XSLT technology. A given piece of news or article may appear in several newsletters and/or in the journal, and the publications have different periodicities (some are published quarterly and some monthly). News and articles come from different sources, which generally coincide with the departments or units of our digital library. A general editor reviews the articles and news, decides where each must appear, and also manages the distribution lists.

A simple solution for production

Each of the five newsletters and the journal is delivered in several optional output formats, and all of them are managed and produced with XML-TEI and XSLT technology (see figure 1).

First, we considered the possibility of developing a management system based on a database. Then, while reviewing the requirements, we realized that a much simpler solution using an XML editor, XML-TEI encoding and XSLT transformations was possible.

Figure F1
Fig. 1: Newsletter and journal production workflow.

A database-based system would be better for editing and maintaining news, but news items need no maintenance: once they are published, they are no longer edited. They can still be searched, but this service can be provided by an XML search engine without the need for a DBMS (database management system).
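As a rough illustration of why searching does not require a DBMS, the Python sketch below scans XML news documents with the standard library's ElementTree and returns the issue and heading of every piece of news that contains a query string. The authors do not describe their XML searcher; the search_news function, the issue names and the div1/head layout (modelled on the sample in the appendix) are assumptions, and the input is assumed to be UTF-8 text without undeclared entities such as &oacute;.

```python
import xml.etree.ElementTree as ET


def search_news(documents, query):
    """Return (issue, heading) pairs for pieces of news containing query.

    documents maps a hypothetical issue name to that issue's XML source.
    """
    needle = query.casefold()  # case-insensitive match
    hits = []
    for issue, source in documents.items():
        root = ET.fromstring(source)
        # Assumed layout: each piece of news is a <div1> with a <head> child.
        for div in root.iter("div1"):
            text = " ".join(div.itertext())
            if needle in text.casefold():
                hits.append((issue, (div.findtext("head") or "").strip()))
    return hits


issues = {
    "2004-05": (
        "<body>"
        "<div1><head>New PhD theses</head><p>A thesis on history.</p></div1>"
        "<div1><head>Chat</head><p>Next chat session.</p></div1>"
        "</body>"
    ),
}
print(search_news(issues, "history"))
```

A real deployment would index the files rather than rescan them per query, but the point stands: the data never has to leave its XML form.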

With this simple solution no system programming was needed; only the XSLT transformations had to be written. In addition, it takes advantage of the same TEI XML markup scheme and the same processing technology we use to produce our digital books. The main differences are that the same piece of news may appear in several publications and that their publication dates are interrelated.

We used XML-TEI to mark up each piece of news. For the newsletters and the journal we use a subset of the tagset we currently use for books, so we developed a small DTD for this purpose.

We had to develop several XSL transformation scripts to produce the different output formats required. Newsletters are generated both in plain text (for traditional mail readers) and in HTML for those who prefer a richer format.
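The authors generate both formats with XSLT stylesheets, which are not reproduced in the paper. As a stand-in, the following Python sketch uses ElementTree instead of XSLT to show the same single-source idea: one XML news item rendered as wrapped plain text for mail readers and as escaped HTML. The element names follow the appendix sample; the function names and the English sample text are hypothetical.

```python
import html
import textwrap
import xml.etree.ElementTree as ET

NEWS = """<div1><head type="main">Multimedia editions</head>
<p>We have published the multimedia edition of a film script.</p></div1>"""


def paragraphs(div):
    # Collapse internal whitespace in each <p>, much like XSLT normalize-space().
    return [" ".join("".join(p.itertext()).split()) for p in div.findall("p")]


def to_text(div, width=72):
    """Plain-text rendering: underlined heading, then wrapped paragraphs."""
    head = div.findtext("head", "").strip()
    lines = [head, "=" * len(head), ""]
    for para in paragraphs(div):
        lines.extend(textwrap.wrap(para, width))
        lines.append("")
    return "\n".join(lines)


def to_html(div):
    """HTML rendering: heading and paragraphs, with special characters escaped."""
    head = html.escape(div.findtext("head", "").strip())
    body = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs(div))
    return f"<h2>{head}</h2>{body}"


div = ET.fromstring(NEWS)
print(to_text(div))
print(to_html(div))
```

Keeping both renderers driven by the same parsed tree mirrors the benefit the authors get from XSLT: format changes never touch the source news.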

How it works

Once the monthly news file is complete and has been reviewed, we enter the automatic phase in which the different output formats are generated. The final HTML output is obtained from a double transformation of the XML-TEI file (see figure 3). First, an XSL transformation processes the monthly XML file to generate a single HTML file with special formatting marks embedded. Then a parsing program of our own design, called MakeBook, transforms that file into a digital book, generating one file for the table of contents and one file for each section of the journal. Notes are extracted from the main file and placed in small external files, leaving hyperlinks to these files in their place. MakeBook was also designed to add page headers and footers to each section, including buttons that implement a navigation pattern called "indexed guided tour" [2]. This navigation pattern provides both a central index or table of contents and buttons to move back and forth between the sections of the journal (hence the "guided tour" metaphor). Buttons to jump up and down between article headings with a single mouse click are also provided. The result is an electronic journal that is simply a set of web pages interconnected in a ring-star topology: a bidirectional ring of links to navigate the sections, plus a central table-of-contents page with bidirectional links to each section (see figure 2).
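The link structure of the ring-star topology can be expressed in a few lines. This Python sketch is not MakeBook, whose code is not described; it only computes the previous, next and table-of-contents links for each section page, assuming that the ring wraps around from the last section to the first. File names such as s1.html and index.html are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    prev: str  # previous section in the ring
    next: str  # next section in the ring
    toc: str   # link back to the central table of contents


def ring_star(sections, toc="index.html"):
    """Build the links of an indexed guided tour.

    Sections form a bidirectional ring (assumed to wrap around), and every
    section links back to the table of contents, which links to all sections.
    """
    n = len(sections)
    pages = [
        Page(name=s, prev=sections[(i - 1) % n], next=sections[(i + 1) % n], toc=toc)
        for i, s in enumerate(sections)
    ]
    toc_links = list(sections)  # the star: the TOC points to every section
    return toc_links, pages


toc_links, pages = ring_star(["s1.html", "s2.html", "s3.html"])
for page in pages:
    print(page)
```

Because each section carries both ring links and the TOC link, a reader can either read the journal sequentially or jump back to the index from anywhere.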

A set of templates is used for formatting and providing navigation functionality to the generated HTML files. The purpose of using templates is to give a uniform appearance to all the monthly journals, as well as to facilitate the maintenance, so that format changes can be applied easily and evenly to the whole set (e.g., section headers and footers, background colors and textures and navigation buttons can be changed through these templates).

Figure F2
Fig. 2: Navigation topology of the journal.

Figure F3
Fig. 3: Generation of the HTML publication format for the journal.

The proposed solution for dissemination

The Web Engineering Group of the Department of Computer Languages and Information Systems at the University of Alicante has developed a method, OO-H [1], and an accompanying software tool (VisualWADE) to assist the design of language-independent Web applications. This software, based on standards for the object-oriented analysis and design of information systems such as UML, OCL and XML, provides an environment for modelling personalized and device-independent user interfaces. In this project, VisualWADE was used to model the navigation and personalization aspects of the application.

Personalization: ranking and filtering of news

The user model we have implemented for newsletters automatically and transparently incorporates information gathered from user navigation (the adaptive part). In addition, users can set up filtering restrictions and customization preferences when registering for the service (the adaptable part). The resulting user profile is thus based both on implicit interest in certain digital library sections (gathered during navigation) and on explicit preferences.

News items are classified by category and subject-matter. Categories are: new publications (new digital resources), future publications, new sections, chat announcements, call for papers, suggestions from our departments, letters from readers, visits of important people, contests, and the remainder are classified as general news. Subjects are derived from the actual thematic structure of the DL: each thematic section or subcollection generates a subject-matter, for instance: Latin-American literature, humanities research, history, children's literature, theatre, interactive services of the DL, computers and humanities, critical studies, tribute to hispanists, Argentine Academy of Letters, PhD theses, movies, magazines/journals, recently printed books, law, and many more. This allows for very fine granularity.

An algorithm for ranking news preferences

Every time the user clicks on an entry of the newsletter table of contents (see figure 4), the Web page jumps to a single piece of news, and the server increments the corresponding category and subject-matter counters by one. Only one category, but multiple subjects, can be assigned to a piece of news. Relative access frequencies can be computed for categories and for subjects. News can then be given a ranking value for a given user and a given news-reading session, calculated as the sum of the frequencies of the subjects assigned to the piece of news, multiplied by the frequency of its category. For instance, if a user has an access frequency of 0.3 for the "new-publications" category, and of 0.1 for the "history" and 0.2 for the "PhD-theses" subjects, then a piece of news announcing the publication of a PhD thesis on history will have a weight of (0.1 + 0.2) × 0.3 = 0.09, and will be ranked accordingly.
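A minimal Python sketch of this ranking, assuming that "relative access frequency" means a counter's value divided by the total of all counters of the same kind (the paper does not state the normalization). The functions record_click, relative and rank are illustrative names, not part of the authors' system; the first example reproduces the worked example in the text.

```python
from collections import Counter


def record_click(category_clicks, subject_clicks, category, subjects):
    """One click on a TOC entry: +1 for the item's category and each of its subjects."""
    category_clicks[category] += 1
    subject_clicks.update(subjects)


def relative(counter):
    """Assumed normalization: each count divided by the total of that counter."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()} if total else {}


def rank(cat_freq, subj_freq, category, subjects):
    """Sum of the item's subject frequencies, multiplied by its category frequency."""
    return sum(subj_freq.get(s, 0.0) for s in subjects) * cat_freq.get(category, 0.0)


# Worked example from the text, with the frequencies given directly.
score = rank({"new-publications": 0.3},
             {"history": 0.1, "PhD-theses": 0.2},
             "new-publications", ["history", "PhD-theses"])
print(round(score, 2))  # 0.09

# Frequencies can also be built from raw clicks.
cats, subjs = Counter(), Counter()
record_click(cats, subjs, "new-publications", ["history", "PhD-theses"])
record_click(cats, subjs, "chat", ["theatre"])
print(relative(cats))  # {'new-publications': 0.5, 'chat': 0.5}
```

Since the category frequency multiplies the whole subject sum, an item in a category the user never opens scores zero regardless of its subjects.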

A profile for filtering news

On registration, users can specify Boolean constraints on categories and subjects, stating which ones should be filtered out and which should be displayed. Users can modify this profile later.
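One way to represent such a profile, sketched in Python under stated assumptions: each category or subject maps to True (display) or False (filter out), unlisted entries are displayed, and an item passes only if its category and all of its subjects are allowed. The paper does not specify how constraints on several subjects combine, so that rule and the Profile class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # True = display, False = filter out; unlisted entries default to display.
    categories: dict = field(default_factory=dict)
    subjects: dict = field(default_factory=dict)

    def accepts(self, category, subjects):
        """Assumed rule: show an item only if its category and all its subjects are allowed."""
        return self.categories.get(category, True) and all(
            self.subjects.get(s, True) for s in subjects
        )


profile = Profile(categories={"contests": False}, subjects={"movies": False})
print(profile.accepts("new-publications", ["history"]))        # True
print(profile.accepts("contests", ["history"]))                # False
print(profile.accepts("new-publications", ["movies", "law"]))  # False
```

Editing the profile is then just flipping an entry, for example profile.subjects["movies"] = True.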

Personalization at work

Newsletters are accessed through a monthly index, in which news items are ordered first by category and then by subject, according to the dynamically computed ranking. Not all ranked news items appear, however: they are filtered according to the user's explicit profile. The first N (3) ranked entries that pass the filter are shown openly, and the rest are collapsed under a "more news" button.
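Putting ranking and filtering together, the following Python sketch scores each item with the category-times-subject-frequency formula described earlier, sorts by score, drops items rejected by the profile, and splits the survivors into the first N shown entries and the collapsed remainder. The tuple format, the personalize function and the sample data are hypothetical, and sorting by a single score simplifies the category-then-subject ordering described in the text.

```python
def personalize(news, cat_freq, subj_freq, allowed, n):
    """Rank, filter and split news into (shown, collapsed) title lists.

    news: list of (title, category, subjects) tuples (hypothetical format).
    allowed: predicate implementing the user's explicit profile.
    n: the user-given number of entries shown openly.
    """
    def score(item):
        _, category, subjects = item
        return sum(subj_freq.get(s, 0.0) for s in subjects) * cat_freq.get(category, 0.0)

    ranked = sorted(news, key=score, reverse=True)  # stable: ties keep issue order
    kept = [title for title, category, subjects in ranked if allowed(category, subjects)]
    return kept[:n], kept[n:]


def no_contests(category, subjects):
    # Hypothetical profile: the user has switched off the "contests" category.
    return category != "contests"


news = [
    ("Chat with an author", "chat", ["theatre"]),
    ("Thesis on history", "new-publications", ["history", "PhD-theses"]),
    ("Film contest", "contests", ["movies"]),
    ("New law section", "new-sections", ["law"]),
    ("Hispanist tribute", "new-publications", ["tribute to hispanists"]),
]
cat_freq = {"new-publications": 0.3, "new-sections": 0.2, "chat": 0.1}
subj_freq = {"history": 0.1, "PhD-theses": 0.2, "law": 0.4, "theatre": 0.1}

shown, collapsed = personalize(news, cat_freq, subj_freq, no_contests, n=2)
print(shown)      # ['Thesis on history', 'New law section']
print(collapsed)  # ['Chat with an author', 'Hispanist tribute']
```

Filtering after ranking, as the text describes, means the profile never changes scores; it only hides items, so N always counts visible entries.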

Figure F4
Figure 4: Table of contents of a personalized newsletter.

OO-H Method Overview

Personalization properties are captured at the navigation/presentation level and are reflected in their corresponding conceptual models by means of a set of association rules. The design and generation of the navigation logic is specified in two parts: a stable part, independent of the personalization properties, and a variable part that supports the processing of these rules. Finally, a rules engine provides the context to interpret the generated rules at execution time (see figure 5).
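OO-H's actual rule language and engine are not detailed here, so the following Python sketch only illustrates the architectural split: a stable navigation function that never changes, and a small engine that evaluates event-condition-action rules at run time. The Rule and RuleEngine classes, the click_news event and the tracking opt-out flag are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                                # event name the rule reacts to
    condition: Callable[[dict, dict], bool]   # (user state, event data) -> bool
    action: Callable[[dict, dict], None]      # updates the user state


class RuleEngine:
    def __init__(self):
        self.rules: list[Rule] = []

    def fire(self, event: str, user: dict, data: dict) -> None:
        # Evaluate every rule bound to this event; apply actions whose condition holds.
        for rule in self.rules:
            if rule.event == event and rule.condition(user, data):
                rule.action(user, data)


def open_news(engine, user, item):
    # Stable part: the navigation step itself is always the same...
    engine.fire("click_news", user, item)  # ...the variable part is delegated to rules.
    return item["title"]


def bump_counters(user, item):
    user.setdefault("categories", Counter())[item["category"]] += 1
    user.setdefault("subjects", Counter()).update(item["subjects"])


engine = RuleEngine()
engine.rules.append(Rule("click_news", lambda user, item: user.get("tracking", True), bump_counters))

user = {}
open_news(engine, user, {"title": "Thesis on history", "category": "new-publications",
                         "subjects": ["history", "PhD-theses"]})
print(user["categories"], user["subjects"])
```

New personalization behaviour is added by appending rules, without editing open_news.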

Figure F5
Figure 5: Navigation logic of OO-H to support dynamic personalization.


This solution for producing the newsletters and the journal of a digital library saves a significant amount of time: we can now produce the newsletters and the journal in less time and with less effort than was previously required to produce the newsletters alone by hand. The output formats are uniform, regular and less error-prone; previous newsletters were less uniform and showed some rendering errors. Compared with the other production models of our digital library (for digital text books and for digital facsimile books), this model differs in three ways:

  1. The production does not begin with scanning: this model deals with born-digital material.
  2. This model includes many more output formats.
  3. This model is tied to the fixed periodicity typical of this kind of publication.

The general newsletter, which is the most requested one, currently has 14,000 subscribers worldwide.

Concerning the dissemination model for these newsletters, we have increased the granularity by offering more detailed personalization options, which allow us to rank news based on preferences gathered from user navigation (observation model). The solution was designed following the OO-H method and using the VisualWADE tool, a technology that can significantly increase productivity when developing Web applications.

With this effort, the MCDL strives to fulfil its objective of spreading research knowledge to the global academic community through the Web. Our aim is to offer a better service by optimizing searches of digital resources, reducing waiting times for digital publication, promoting dynamic scientific research communication and developing efficient preservation strategies. Day-to-day experience leads to the continuous integration of new ideas. Our goal is not merely to publish research work, but to build a rich and open communication channel for the global scientific community. The newsletter service described here plays a key role in this communication channel.

Appendix: News in XML format

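The following is a simplified sample piece of news in XML-TEI format. The attributes of the div1 element indicate in which newsletters or journal the piece of news should be included. (In English, the sample announces the multimedia edition of the screenplay of "La Corte de Faraón" (1985), by Rafael Azcona and José Luis García Sánchez, described as a key film for approaching, from a humorous perspective, the phenomenon of theatre censorship under Franco, together with a variety of materials on this entertaining film and the operetta of the same name.)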

<div1 general="yes" AmLat="no" children="no" history="no">
<head type="main">2.4-Ediciones Multimedia: </head>
<p>-Hemos publicado la edici&oacute;n multimedia del gui&oacute;n de "La Corte de
Fara&oacute;n" (1985), de Rafael Azcona y Jos&eacute; Luis
Garc&iacute;a S&aacute;nchez. Se trata de una pel&iacute;cula clave
para acercarse desde una perspectiva humor&iacute;stica al
fen&oacute;meno de la censura teatral durante el franquismo. Puede
encontrarse, adem&aacute;s, una serie de materiales diversos sobre este
divertido film y la opereta hom&oacute;nima, con los que es posible conocer
algunos de sus detalles m&aacute;s relevantes.</p>
<p> <xref doc="http://cervantesvirtual.com/086.pdf?incr=1"> </xref></p>
</div1>
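To show how these attributes route one source file to several newsletters, the Python sketch below selects, for each newsletter name, the headings of the div1 elements whose attribute is "yes". The sample uses invented English headings and no character entities, because the &oacute;-style entities in the original depend on declarations in the authors' DTD and would not parse with ElementTree on their own; news_for is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ISSUE = """<body>
  <div1 general="yes" AmLat="no" children="no" history="no"><head>Multimedia editions</head></div1>
  <div1 general="no" AmLat="yes" children="no" history="yes"><head>Colonial chronicles</head></div1>
  <div1 general="yes" AmLat="no" children="yes" history="no"><head>Storytelling contest</head></div1>
</body>"""


def news_for(root, newsletter):
    """Headings of the pieces of news whose attribute for this newsletter is "yes"."""
    return [div.findtext("head") for div in root.iter("div1") if div.get(newsletter) == "yes"]


root = ET.fromstring(ISSUE)
for name in ("general", "AmLat", "children", "history"):
    print(name, news_for(root, name))
```

A piece of news flagged for several newsletters is simply selected several times, which is how one monthly file can feed every publication.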

References

[1] Cristina Cachero. OO-H: Una Extensión a los Métodos OO para el Modelado y Generación Automática de Interfaces Hipermediales [OO-H: An Extension to OO Methods for the Modelling and Automatic Generation of Hypermedia Interfaces]. PhD thesis, Dept. of Computer Languages and Information Systems, University of Alicante, 2 December 2002. Supervised by Dr. Jaime Gómez Ortega and Dr. Oscar Pastor López.
[2] A. Ginige and S. Murugesan. Web Engineering: An Introduction. IEEE Multimedia, Special Issue on Web Engineering, pp. 14-18, 04 2001.

Notes

1. This paper has been partially supported by the Spanish Ministry of Science and Technology (Ministerio de Ciencia y Tecnología de España), project TIC2001-3530-C02-02.
2. XML stands for eXtensible Markup Language, and TEI is an XML vocabulary developed by the TEI Consortium. TEI stands for Text Encoding Initiative.
3. N is a user-given parameter.


Conference Info



Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC

  • Keywords: None
  • Language: English
  • Topics: None