Semantically connecting text fragments - Text-Text-Link-Editor

poster / demo / art installation
Authorship
  1. Thomas Selig

    Fachhochschule Worms (Worms University of Applied Sciences)

  2. Marc Wilhelm Küster

    Fachhochschule Worms (Worms University of Applied Sciences)

  3. Eric Sean Conner

    Fachhochschule Worms (Worms University of Applied Sciences)

Work text

Selig, Thomas, Fachhochschule Worms, Germany, selig@ztt.fh-worms.de
Küster, Marc W., Fachhochschule Worms, Germany, kuester@fh-worms.de
Conner, Eric Sean, Fachhochschule Worms, Germany, conner@fh-worms.de
Text-Text-Link-Editor (TTLE) is a tool designed to allow researchers to link arbitrary text fragments across document boundaries. The tool's architecture was developed with the goal of creating a generic, easy-to-use tool that can support various disciplines of text research, e.g. for annotating texts or for presenting an original text together with its translation.

TTLE is being developed as part of the TextGrid project and will be integrated into TextGridLab. Being part of TextGridLab allows TTLE to benefit from the services TextGrid offers, such as user management, storage, search and metadata.

The authors are currently not aware of existing tools offering a set of functionalities similar to those planned for TTLE. They do, however, collaborate closely with the Centre for e-Research (CeRch) at King's College London on its development, to maximize synergy. They would also welcome information about other ongoing work on similar tools.

Because scholars usually have different access privileges to the various documents they want to work with, TTLE will offer three ways to select text fragments, depending on the accessibility of the source document. If the source document is writable for the user, tags marking the start and end of a selection will be inserted into the document. If a document is immutable for all users, document offsets for the start and end of the selection will be calculated; this method, too, allows any text fragment to be selected.
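
The first two selection methods can be illustrated with a minimal Python sketch. The access levels, the `select` dispatcher and the use of TEI `<anchor>` elements as start/end markers are all hypothetical choices for illustration, not TTLE's actual markup or API; the third case (documents writable only by others) is deliberately left out here.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union


class Access(Enum):
    WRITABLE = auto()            # the current user may modify the document
    IMMUTABLE = auto()           # nobody may modify the document
    READ_ONLY_FOR_USER = auto()  # others may modify it, this user may not


@dataclass(frozen=True)
class OffsetLocator:
    doc_id: str
    start: int  # offset of the first selected character
    end: int    # offset one past the last selected character


def mark_with_tags(text: str, start: int, end: int, sel_id: str) -> str:
    """Insert hypothetical start/end anchor tags around text[start:end]."""
    start_tag = f'<anchor xml:id="{sel_id}-start"/>'
    end_tag = f'<anchor xml:id="{sel_id}-end"/>'
    # Insert the end tag first so that the start offset stays valid.
    text = text[:end] + end_tag + text[end:]
    return text[:start] + start_tag + text[start:]


def select(doc_id: str, text: str, start: int, end: int,
           access: Access, sel_id: str) -> Union[str, OffsetLocator]:
    if not 0 <= start <= end <= len(text):
        raise ValueError("selection out of range")
    if access is Access.WRITABLE:
        # Writable: the selection is stored in the document itself.
        return mark_with_tags(text, start, end, sel_id)
    if access is Access.IMMUTABLE:
        # Immutable: plain offsets remain valid because the text never changes.
        return OffsetLocator(doc_id, start, end)
    # Documents writable only by others need existing IDs or a private copy.
    raise NotImplementedError("not covered by this sketch")
```

For example, selecting "principio" (characters 3 to 12) in "In principio erat verbum" yields the text with two anchors inserted when the document is writable, and an `OffsetLocator` when it is immutable.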

Figure 1
Documents that are read-only for the TTLE user but writable for other users are more difficult to handle. If tags in such documents have unique identifiers, these can be targeted as selections by TTLE. If no unique identifiers are available, or the available ones do not match the user's needs, a copy of the document can be created for the TTLE user. This selection mechanism still requires TTLE to validate all links regularly, since any writable document, and with it any linked text fragment, can change. TTLE will therefore store hashes of all text fragments referenced by links and inform the user of any changes.
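
The hash-based change detection could look roughly like the following Python sketch. SHA-256 is an assumed choice of hash function, and `resolve` is a hypothetical callback standing in for however TTLE fetches the current text of a fragment from TextGrid.

```python
import hashlib
from dataclasses import dataclass


def fragment_hash(fragment: str) -> str:
    """SHA-256 hex digest of a fragment's UTF-8 text (hash choice is assumed)."""
    return hashlib.sha256(fragment.encode("utf-8")).hexdigest()


@dataclass
class LinkedFragment:
    doc_id: str
    fragment_id: str  # e.g. an existing unique identifier in the source document
    stored_hash: str  # hash recorded when the link was created


def changed_fragments(fragments, resolve):
    """Return the fragments whose current text no longer matches the stored hash.

    `resolve(doc_id, fragment_id)` is a hypothetical callback returning the
    fragment's current text, or None if the fragment can no longer be found.
    """
    changed = []
    for frag in fragments:
        current = resolve(frag.doc_id, frag.fragment_id)
        if current is None or fragment_hash(current) != frag.stored_hash:
            changed.append(frag)
    return changed
```

Running `changed_fragments` periodically over all links and reporting the non-empty results to the link owner matches the regular validation described above; storing only hashes keeps the check cheap without duplicating the linked texts.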

Any number of text fragments can be linked together. Links can either be free form or be assigned a specific type. Link types are pre-defined and belong to a specific person, who can share them with selected projects. Because link types are personal, link types created by other people are automatically hidden, which prevents the user from being confused by hundreds of similar link types. At the same time, this concept enables working groups to use the same link types without everyone having to create their own set.

Text can be added to every link. As free-form link types do not define any specifications, only a single text block can be added to each free-form link. Pre-defined link types, in contrast, can carry an XML schema fragment against which the text attached to any link of that type must validate. Because TextGrid is a distributed, collaborative system, all link types are stored in a centralized triple store and exposed to the web through open interfaces. This online storage also makes it easy to assign groups of link types to the projects the owner wants to share them with. For future stages of the project, enhanced handling of link types is foreseen: grouping link types, handing over link type groups to other people and making link type groups publicly available to form a text-linking community. These plans will be kept in mind while implementing the current stage.
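
The ownership and text rules for link types can be sketched in Python as follows. This assumes that a type shared with a project becomes visible to that project's members, and models a free-form link as a link without a type. The Python standard library has no XML Schema validator, so `accepts_text` only checks well-formedness as a placeholder for real schema validation; all names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class LinkType:
    name: str
    owner: str
    shared_with: FrozenSet[str] = frozenset()  # project IDs the owner shared it with
    schema: Optional[str] = None               # XML schema fragment, if any


def visible_link_types(all_types: Iterable[LinkType], user: str,
                       user_projects: Iterable[str]) -> List[LinkType]:
    """The user's own link types plus those shared with one of their projects."""
    projects = set(user_projects)
    return [t for t in all_types
            if t.owner == user or not t.shared_with.isdisjoint(projects)]


def accepts_text(link_type: Optional[LinkType], text: str) -> bool:
    """Check the text attached to a link; link_type None means free form."""
    if link_type is None or link_type.schema is None:
        # Free form (or a type without schema, an assumption):
        # a single plain text block, no structure imposed.
        return isinstance(text, str)
    # Placeholder only: checks well-formedness, not validity against
    # link_type.schema, which would need a schema-aware XML library.
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

Filtering by owner and project keeps each user's list short, while sharing a type with a project gives the whole working group one common vocabulary.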

As mentioned earlier, any text fragments can be linked together, but other targets can be included in a link as well. One possible target is an external URI. This makes it possible to reference sources not directly accessible to TTLE, and also to include non-text objects in links, for example references to persons using their FOAF (http://www.foaf-project.org/) or DBpedia (http://dbpedia.org/) identifiers. Identical URIs are treated as identical objects by TTLE. Another possible target for a link is another link, which allows links to be grouped and link groups to be commented on. For example, you can create links in multiple versions of a document showing that part A is followed by part B, and then create a link referring to all of these links, stating that part A always comes before part B except in document version X. Links to a single target are also possible, offering an easy way to comment on text passages when using the free-form link type.
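
The three kinds of link targets can be modelled as a small recursive data structure. The following Python sketch is illustrative only (the class names are hypothetical): frozen dataclasses give value equality, so two `UriTarget`s with the same URI compare equal, mirroring the rule that identical URIs denote identical objects, and a `Link` may itself appear as a target.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class FragmentTarget:
    doc_id: str
    locator: str  # tag ID or offset range, depending on the selection method


@dataclass(frozen=True)
class UriTarget:
    uri: str  # equal URIs compare equal, so they denote the same object


@dataclass(frozen=True)
class Link:
    link_id: str
    link_type: str
    targets: Tuple["Target", ...]  # one or more targets


Target = Union[FragmentTarget, UriTarget, Link]


def referenced_uris(link: Link) -> Set[str]:
    """Collect external URIs, following links that target other links."""
    found: Set[str] = set()
    for target in link.targets:
        if isinstance(target, UriTarget):
            found.add(target.uri)
        elif isinstance(target, Link):
            found |= referenced_uris(target)
    return found
```

A link grouping two version-specific links that both point to the same DBpedia identifier therefore reports that identifier only once, and a comment is simply a `Link` with a one-element `targets` tuple.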

All links created will be stored in a dedicated link document, a TEI-compliant XML file. Each link target will be represented there together with a locator specifying the selected text according to the chosen selection method (see above). The link type and any additional data it requires will also be stored in this file. To make the file more useful to working groups, it will also record the user who created each link and the user who last changed it. To ensure easy access to linked data, the links stored in a TTLE file will be sortable by link type and target document.
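
A link document of this kind could be generated with Python's `xml.etree.ElementTree`, as in the sketch below. TEI does define `<linkGrp>` and `<link>` with `@type` and `@target`, but the layout shown here is an assumption rather than TTLE's actual file format: the creator and last editor are stored in a hypothetical `ttle` namespace, and the `docId#locator` target format is likewise assumed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TTLE_NS = "urn:example:ttle"  # hypothetical namespace for non-TEI metadata

ET.register_namespace("", TEI_NS)
ET.register_namespace("ttle", TTLE_NS)


@dataclass
class StoredLink:
    link_id: str
    link_type: str
    targets: List[str]  # "docId#locator" strings (format is an assumption)
    created_by: str
    modified_by: str


def sort_links(links: List[StoredLink]) -> List[StoredLink]:
    """Order by link type, then by the document of the first target."""
    return sorted(links, key=lambda link: (link.link_type,
                                           link.targets[0].split("#")[0]))


def to_tei(links: List[StoredLink]) -> str:
    """Serialize the links as one TEI <linkGrp> with one <link> per link."""
    grp = ET.Element(f"{{{TEI_NS}}}linkGrp")
    for link in sort_links(links):
        ET.SubElement(grp, f"{{{TEI_NS}}}link", {
            XML_ID: link.link_id,
            "type": link.link_type,
            "target": " ".join(link.targets),
            f"{{{TTLE_NS}}}createdBy": link.created_by,
            f"{{{TTLE_NS}}}modifiedBy": link.modified_by,
        })
    return ET.tostring(grp, encoding="unicode")
```

Sorting at serialization time is one option; sorting could equally be done on demand when the file is displayed.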

Figure 2
To work with linked texts, TTLE will offer different views, always comparing two documents directly in two viewports. If a link involves more than two documents, each of them can easily be selected and displayed in one of the two viewports. At a later stage of the project, custom-defined XSL transformations are planned to adapt the displayed data even further to the needs of the researcher.
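
The two-viewport behaviour can be captured by a small piece of view state. The following Python sketch is hypothetical (class and method names are not from TTLE): by default it shows the first two documents of a link side by side and lets the user swap any other document of the same link into either viewport.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ComparisonView:
    """Hypothetical two-viewport state for the documents of one link."""
    documents: List[str]  # all documents targeted by the link
    left: Optional[str] = None
    right: Optional[str] = None

    def __post_init__(self) -> None:
        # Default: show the first two documents side by side.
        if len(self.documents) >= 1:
            self.left = self.documents[0]
        if len(self.documents) >= 2:
            self.right = self.documents[1]

    def show(self, doc: str, side: str) -> None:
        """Display one of the link's documents in the given viewport."""
        if doc not in self.documents:
            raise ValueError(f"{doc} is not part of this link")
        if side == "left":
            self.left = doc
        elif side == "right":
            self.right = doc
        else:
            raise ValueError("side must be 'left' or 'right'")
```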

TTLE is developed with open interfaces in mind. One such interface enables other applications to propose links for specific documents. To demonstrate this, a separate web service will be implemented that scans specified documents for similar text fragments and returns a link proposal to TTLE, offering to link the passages found to be similar.
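
The abstract does not say how the proposal service measures similarity. As a stand-in, the following Python sketch splits two documents into sentences with a naive regular expression and uses `difflib.SequenceMatcher.ratio()` from the standard library; both the splitting rule and the 0.8 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def split_fragments(text):
    """Naive sentence split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def propose_links(doc_a, doc_b, threshold=0.8):
    """Return (index_in_a, index_in_b, score) for sufficiently similar sentences."""
    proposals = []
    for i, a in enumerate(split_fragments(doc_a)):
        for j, b in enumerate(split_fragments(doc_b)):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                proposals.append((i, j, round(score, 2)))
    return proposals
```

Comparing "The king is dead. Long live the king." with "Long live the king! Something else entirely." proposes a single link between the second sentence of the first text and the first sentence of the second. The pairwise comparison is quadratic in the number of sentences, so a real service would likely need indexing for large corpora.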

Several prototype implementations have been created to help select the technology best suited to this project. A web-based solution was found to fit best. This web application will be integrated into TextGridLab using the Eclipse browser component, combining access to all existing TextGrid tools with the rendering capabilities of modern browsers and the ease of maintenance of a web-deployable application. Basic functionality is currently implemented, and a working prototype of the application is expected for May 2012.

Plans to enhance TTLE further after May 2012 have already been made. The following features were identified as particularly important and, subject to funding, are foreseen to be implemented starting in June 2012:

Scientists often present the results of their research on their own web portals. To integrate TTLE results stored in TextGrid easily into these portals, an API allowing external applications to access links and the linked text fragments directly would be useful. This could be provided either by an export function within TextGridLab or by a web service interface to the data stored in the Grid.
A map-like visualization of all linked documents in a project can give a good overview of the whole project, while displaying all documents connected by a specific link type can offer a very good view of a document's development. TTLE should support both types of visualization.
Predefined link types are an essential element of TTLE, so a comfortable editor for managing them would be helpful. Important functionality would include grouping link types, offering link types and type groups to other people, making link types and type groups publicly available, allowing modifications to link types that are in use and transforming specific link types into others.
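
For the planned map-like overview, one possible data preparation step is to collapse all links into weighted document-to-document edges. The Python sketch below assumes each link is given as the list of document IDs it targets, and emits Graphviz DOT text (an undirected graph whose edge `label` and `penwidth` reflect the number of links) as one conceivable rendering path; the graph name `project` is arbitrary.

```python
from collections import Counter
from itertools import combinations


def document_edges(links):
    """Count how many links connect each pair of documents.

    `links` is an iterable of lists of document IDs targeted by each link.
    """
    edges = Counter()
    for docs in links:
        for a, b in combinations(sorted(set(docs)), 2):
            edges[(a, b)] += 1
    return edges


def to_dot(edges):
    """Render the weighted edges as an undirected Graphviz graph."""
    lines = ["graph project {"]
    for (a, b), n in sorted(edges.items()):
        lines.append(f'  "{a}" -- "{b}" [label="{n}", penwidth={n}];')
    lines.append("}")
    return "\n".join(lines)
```

Links whose targets all lie in one document produce no edge, so the map shows only cross-document connections.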


Conference Info


ADHO - 2012
"Digital Diversity: Cultures, languages and methods"

Hosted at Universität Hamburg (University of Hamburg)

Hamburg, Germany

July 16, 2012 - July 22, 2012

196 works by 477 authors indexed

Conference website: http://www.dh2012.uni-hamburg.de/

Series: ADHO (7)

Organizers: ADHO

Tags
  • Keywords: None
  • Language: English
  • Topics: None