Recommendations of the TEI Task Force on SGML to XML Migration

poster / demo / art installation
Authorship
  1. Christine Ruotolo

    Libraries - University of Virginia

Work text

The TEI Task Force on SGML to XML Migration was convened in May 2002 and charged with developing recommendations for migrating existing TEI resources from SGML to XML. The Task Force comprised representatives from projects with significant holdings of TEI SGML data, along with selected technical experts and the TEI editors, and has worked for the past 18 months to diagnose the problems and document the methods and tools needed to migrate legacy TEI data to XML.

Migrating TEI resources from SGML to XML provides a number of benefits. Many projects have been working with the same SGML DTD for many years and may need to re-examine it. Migration provides an opportunity to revisit DTDs and encoding practices that were developed to facilitate searching or display in a particular SGML-based system but are no longer necessary in an XML-based system. It also creates an opportunity to re-parse the data and fix errors.

One of the most compelling reasons for a project or individual to consider migrating data is the scarcity of SGML-aware software and tools and the relative abundance of XML-based tools. Indeed, as XML becomes the industry standard, there is a real danger that SGML-aware software will no longer be supported. SGML also lacks a suite of related standards that allow full exploitation of the encoded data. XML, on the other hand, is accompanied by a number of related standards and specifications, such as XPath, XSLT, XML Schema, XPointer, XLink, and XQuery.

However, although XML is a subset of SGML, migration is not a trivial process, especially in the case of large holdings of legacy data. Such a process demands the consideration of many technical and strategic issues. A smooth transition from SGML to XML is especially important for those working with TEI, since future releases of the TEI Guidelines will no longer be SGML-compliant. The first release of the Guidelines, the SGML-based TEI P3 (1994), has now been superseded by the XML-based TEI P4 (2002), which still maintains backward compatibility with P3, and hence with SGML. Thus the conversion from P3 to P4 is relatively straightforward, while the ongoing development of P5, the next generation of the Guidelines, will render P3 increasingly obsolete. TEI P5 will be XML-based and will not ensure backward compatibility, so a P3 to P5 migration may be substantially more difficult than P3 to P4. Therefore, having TEI P4-conformant XML texts will make a later migration to P5 much simpler, should one become necessary.

The TEI Task Force on SGML to XML Migration has substantially completed its work and plans to finalize its recommendations by the end of 2003. The primary deliverables of the Task Force are two reports: "Strategic Considerations in Migration of TEI Documents from SGML to XML" and the "Practical Guide to Migration of TEI Documents from SGML to XML." The first report, intended for administrators and project managers, emphasizes the planning and decision-making involved in data migration, while the second report describes the mechanics of conversion in greater detail and is written primarily for the technical staff who will implement the conversion. The specific recommendations in the technical report are augmented by a set of Migration Case Study Reports that discuss sample migration efforts undertaken by members of the Task Force. The samples represent a broad variety of languages and encoding practices and describe solutions to particular migration challenges, such as converting SDATA entities and complex DTD extensions. They include the MULTEXT-East Multilingual Corpus, the Corpus of Middle English Prose and Verse, the Japanese Text Initiative, the Women Writers Project, the Thomas MacGreevy Archive, Documenting the American South, the Victorian Women Writers Project, and the Thesaurus Musicarum Italicarum. All documentation relating to the Task Force is available through its Activities page on the TEI website: http://www.tei-c.org/Activities/MI/.

This poster session will summarize the recommendations of the Task Force, discuss how the recommendations have been received by the TEI user community, and describe possibilities for future work on data migration issues.


Conference Info

Complete

ACH/ALLC / ACH/ICCH / ALLC/EADH - 2004

Hosted at Göteborg University (Gothenburg)

Gothenburg, Sweden

June 11, 2004 - June 16, 2004

105 works by 152 authors indexed

Series: ACH/ICCH (24), ALLC/EADH (31), ACH/ALLC (16)

Organizers: ACH, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None