What Every Digital Humanities Scholar Should Know about Unicode: Considerations on When to Propose a Character for Unicode and When to Rely on Markup

poster / demo / art installation
Authorship
  1. Deborah Anderson

    Department of Linguistics - University of California Berkeley

Work text

The international character encoding standard Unicode gives scholars a means of encoding their texts in a widely supported standard. It is the default for the World Wide Web and plays a prominent role in the P5 version of the TEI Guidelines (Sperberg-McQueen and Burnard 2005). Yet even with over 97,000 characters defined, Unicode still lacks characters needed in various specialized fields, including characters for Byzantine Greek and Latin epigraphy, as well as several historic and modern minority scripts (Anderson 2003). Indeed, the TEI Guidelines acknowledge the difficulty of covering the full gamut of textual materials, noting that “there will always be a need to encode documents which use non-standard characters [i.e., which are not in Unicode] and glyphs, particularly but not exclusively in historical material” (chapter 4, Sperberg-McQueen and Burnard 2005). In chapter 25, P5 goes on to provide guidance on encoding letters and symbols not in Unicode by means of a “gaiji” module.
The trend seems to be away from proposing new characters for inclusion in the Unicode Standard; at least, the Unicode Technical Committee has received relatively few requests from digital humanities projects since 2004, with the exception of a proposal by medievalists (Everson, Haugen, et al. 2005). Since characters are not being proposed, projects must instead be relying on markup with entities, on the newly proposed “gaiji” mechanism, on a font solution (i.e., the Private Use Area or a proprietary font with non-standard encodings), or on a combination of these.
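To make the interoperability problem with Private Use Area solutions concrete, the following Python sketch (an illustration added here, not part of the original poster) compares what generic software can learn about a standardized letter with what it can learn about a Private Use code point. Treating U+E000 as a project's private glyph is a hypothetical assumption; the sketch uses only Python's standard `unicodedata` module, and the helper name `describe` is likewise hypothetical.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Report properties that off-the-shelf software can rely on for one code point."""
    category = unicodedata.category(ch)
    return {
        "code_point": f"U+{ord(ch):04X}",
        # Private Use code points have no standard character name, so fall back to None.
        "name": unicodedata.name(ch, None),
        # General category 'Co' marks a Private Use code point.
        "category": category,
        "is_private_use": category == "Co",
        # Case mappings are what allow case-insensitive search to match both forms.
        "uppercase": ch.upper(),
    }


# A standardized letter: LATIN SMALL LETTER THORN (U+00FE).
thorn = describe("\u00fe")
# Hypothetical: a project-specific glyph assigned to the first Private Use code point.
pua = describe("\ue000")

for info in (thorn, pua):
    print(info)
```

The standardized letter comes with a name, a letter category, and an uppercase mapping (U+00DE); the Private Use code point has no name and maps only to itself, so search and case-insensitive matching cannot treat it as a letter unless every program is told about the project's private conventions.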
However, if texts are going to be exchanged electronically and ultimately made available to future generations of students and scholars (for example, through large-scale digital projects such as the Open Content Alliance, or as part of online teaching materials), it may be advisable in the long run for scholars to seriously consider proposing the characters to Unicode, if they are eligible. This poster will address the practical considerations digital humanists should weigh when deciding whether to pursue or forgo standardizing the characters in their texts.
A primary consideration is the time and effort required to propose a character formally: the process takes two to five years and requires an advocate to prepare the proposal, be available to answer questions, and stay involved throughout.
Other issues to consider:
* Is there broad consensus from the user community in support of the character? (Deep divisions amongst scholars will discourage the standards committees from approving a character.)
* Even if a character is identified as needed, it will not necessarily be approved. Scholars may need to be flexible in working with the standards committees and open to compromise.
* Some characters are unlikely to be encoded: precomposed forms, decorations that do not appear to have semantic content, idiosyncratic letters and marks, and ligatures. Color as a feature of a character is also outside the realm of Unicode.
* Even after a character is approved, fonts need to be created and, in the case of complex rendering of a character (or script), users may need to wait for upgrades to rendering engines (e.g., Uniscribe) before the character can be displayed properly.
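The reluctance to encode precomposed forms is largely explained by how Unicode already handles them: a letter with a diacritic can be written as a base letter followed by a combining mark, and normalization treats precomposed and decomposed spellings as canonically equivalent. The Python sketch below illustrates this with the standard `unicodedata` module; the helper name `canonically_equal` is hypothetical, and the q-plus-macron sequence is an arbitrary example rather than a character from the poster.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Return True if two strings are canonically equivalent (compared in NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (two code points)

# The spellings differ code point by code point...
print(precomposed == decomposed)                   # False
# ...but normalization makes them compare equal.
print(canonically_equal(precomposed, decomposed))  # True

# Illustrative: a letter-plus-mark combination spelled entirely with existing
# characters, without requiring a single precomposed code point.
sequence = "q\u0304"
print([unicodedata.name(c) for c in sequence])
```

Because any base letter can be combined with existing combining marks in this way, a proposal for a new precomposed letter usually adds nothing that a sequence cannot already express, while a new precomposed code point would complicate normalization.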
Although the onus falls on digital humanists to demonstrate the need for the requested characters to the standards committees, the rewards of standardization can outweigh the obstacles: because a standardized character is part of the standard, users can expect decent rendering and display behavior from off-the-shelf software. Standardization also makes the characters searchable by search engines, thereby making the textual materials available to a wider audience.
The poster will include a few examples of successful character encoding proposals, which can serve as useful models for humanities text encoders who decide to propose new characters.

References
Anderson, Deborah. “Unicode and Historic Scripts.” Ariadne, Issue 37, 30 October 2003. URL: http://www.ariadne.ac.uk/issue37/anderson/intro.html
Everson, Michael, Odd Einar Haugen, et al. “N2957: Preliminary Proposal to Add Medievalist Characters to the UCS.” 2005. URL: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2957.pdf
Sperberg-McQueen, C. M., and Lou Burnard. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Revised and re-edited. Oxford, Providence, Charlottesville, and Bergen, 2005. URL: http://www.tei-c.org/release/doc/tei-p5-doc/html/
The Unicode Consortium. The Unicode Standard, Version 4.0.1, defined by The Unicode Standard, Version 4.0 (Reading, MA: Addison-Wesley, 2003; ISBN 0-321-18578-1), as amended by Unicode 4.0.1. URL: http://www.unicode.org/versions/Unicode4.0.1


Conference Info


ACH/ALLC / ACH/ICCH / ADHO / ALLC/EADH - 2006

Hosted at Université Paris-Sorbonne, Paris IV (Paris-Sorbonne University)

Paris, France

July 5, 2006 - July 9, 2006

151 works by 245 authors indexed

The effort to establish ADHO began in Tuebingen, at the ALLC/ACH conference in 2002: a Steering Committee was appointed at the ALLC/ACH meeting in 2004, in Gothenburg, Sweden. At the 2005 meeting in Victoria, the executive committees of the ACH and ALLC approved the governance and conference protocols and nominated their first representatives to the ‘official’ ADHO Steering Committee and various ADHO standing committees. The 2006 conference was the first Digital Humanities conference.

Conference website: http://www.allc-ach2006.colloques.paris-sorbonne.fr/

Series: ACH/ICCH (26), ACH/ALLC (18), ALLC/EADH (33), ADHO (1)

Organizers: ACH, ADHO, ALLC

Tags
  • Keywords: None
  • Language: English
  • Topics: None