Bridging the Divide: Supporting Minority and Historic Scripts in Fonts: Problems and Recommendations

paper, specified "short paper"
Authorship
  1. 1. Deborah Anderson

    University of California Berkeley

Work text
This plain text was ingested for the purpose of full-text search, not to preserve original formatting or readability. For the most complete copy, refer to the original conference program.


Introduction
Today, users of many modern minority and historic scripts in Unicode are not able to reliably send text electronically, because Unicode-enabled fonts and software are not available.

Especially true for scripts in Unicode versions 6.0 to 9.0 (2010 – 2016), where over 40% of the scripts have no fonts. (Unicode version 10.0 was released in June 2017, so support in fonts would not yet be expected). The Google Noto project aims to provide fonts for all approved scripts, but release of fonts is only up to fonts for Unicode version 6.2, released in 2012.

In addition, some communities have access to Unicode fonts, but the fonts aren’t used, because they do not provide features deemed necessary, such as positioning of characters (e.g., Egyptian Hieroglyphs
[Richmond and Glass, 2016]) or variant glyphs (e.g., Old Italic [Anderson, 2017]). Instead, images are used, which are not searchable or, alternatively, “hacked” fonts are employed, which require each person to have the same, non-standard font to send text. Keyboards or other input mechanisms are also not available for many of these same scripts. As a result, the promise that Unicode will “enable people around the world to use computers in any language” (Unicode Consortium, 2018a), does not yet ring true for some communities.

This short paper will highlight font-related problems with specific examples and will provide suggestions on how to address them.
Problems

Creating a Unicode-enabled font for a language is often not a simple task, especially when the script for the language includes combining marks (which require correct positioning), or if the script has special rendering behavior, such as the consonant clusters found in South Asian scripts (Evans, 2017).

Font creation is made more challenging when typographic details on the script (and language) are not available. Since many recently approved scripts in Unicode are not well known, information on the typography is not readily available. Unfortunately, fine details are often not included in Unicode proposals for the scripts.

Interaction with the user community is critical in developing a suitable font, but some communities are difficult to contact. In addition, there can be differing views on the preferred shapes of glyphs. For a set of 51 Tamil numbers and fractions, for example, the community took 8 years to come to agreement on the preferred representative shapes. Specific cases will be cited, based on the author’s experience, including discussion of how to connect user communities with font providers.

Technical Issue: Glyph Variants

For some script users, access to glyph variants is important. This is true, for example, for the Old Italic Unicode block which unified several related alphabets of Italy, dating from approximately the 8 until 1c BCE. In Old Italic, the glyph in a particular alphabet may vary from that shown in the Unicode Standard.

The Old Italic block was encoded with the understanding that different fonts would be used for the different languages and alphabets (Unicode Consortium, 2017). How should the two forms of Faliscan (above) be handled in the same font then? How should a pan-Old Italic font handle the different alphabets (which use the same code points)?
This paper will describe the pros and cons of different options available, including use of:

Code points in Unicode’s Private Use Area (with the caveat that these code points would not be reliable for general interchange) (Unicode Consortium, 2018c).
A Unicode variation sequence, when a distinction needs to be captured in plain-text (Unicode Consortium, 2018d).
An OpenType font feature, such as character variants, stylistic alternates, stylistic sets, or localized forms (Microsoft Typography, 2018).
Language-specific fonts (i.e., Faliscan1 and Faliscan2 fonts for the two forms above).

Suggested Solutions

Incorporate font creation as a part of the overall script encoding effort, such as: including a font item in the budget to pay for a font designer to develop a font; provide information on how to create a font for users; fund a font-creation workshop within the community.

Encourage user communities to submit a list of the basic repertoire of characters and auxiliary characters to the Common Locale Data Repository (Unicode Consortium, 2018b), since this information is used for by font and software developers worldwide. In addition, provide information on the shapes of the needed letters and variants, citing reference works (i.e., a book or website) on a publicly accessible webpage.

For handling glyph variants, short-term and long-terms approaches should be considered:

If a given variant is deemed by users to be necessary in plain-text, submit a Unicode proposal
If OpenType features are used in a font, lobby software vendors to provide better support for the features in applications (as support for some features is still spotty [4])
For the short-term, PUA or separate fonts may be necessary.

For font designers:

Use language tags from ISO 639 (SIL International, 2017), BCP 47 (Phillips and Davis, 2009), and OpenType language/script tags (Microsoft Typography, 2017a; Microsoft Typography, 2017b) in the font internals. If a language (or script) is missing a tag, a new tag should be registered. According to Roozbeh Pournader, an expert at implementation of fonts, these tags are the way the fonts communicate with other software today.

Encourage users to review the glyphs in alpha versions of any forthcoming or any released Noto fonts, and submit comments to the Noto project (Google.com, n.d.).

Conclusion
Access to a Unicode font is critical for users of lesser-used scripts, in order to participate more fully in the digital world. Unicode fonts make the user’s text interchangeable, discoverable, and able to be preserved for the long-term in a stable format. Recognition of font-related issues is a small step towards addressing the problem. Input from the audience will be encouraged in order to identify other potential approaches.

Funding

This work was supported by the National Endowment for the Humanities [grant number PR-253360-17].

Bibliography

Anderson, D. (2017). Dealing with Variants in Historic Scripts. Presentation at 41
st Internationalization and Unicode Conference, Santa Clara, California, October, 2017.

Evans, L. (2017). Beyond Unicode Proposals: Encoding Characters and Scripts is Not Enough! Presentation at 41
st Internationalization and Unicode Conference, Santa Clara, California, October 2017.

Google.com. (n.d.). Google Noto Fonts.
https://www.google.com/get/noto/ (accessed April 17, 2018).

Microsoft Typography. (2017a). Language system tags.
https://www.microsoft.com/typography/otspec/languagetags.htm (accessed April 17, 2018).

Microsoft Typography. (2017b). Script tags.
https://www.microsoft.com/typography/otspec/scripttags.htm (accessed April 17, 2018).

Microsoft Typography. (2018). OpenType® specification
.
https://www.microsoft.com/en-us/Typography/OpenTypeSpecification.aspx (accessed April 17, 2018).

Phillips, A., and Davis, M. (2009). Tags for Identifying Languages.
https://tools.ietf.org/html/bcp47 (accessed April 17, 2018).

Richmond, B. and Glass, A. (2016). Proposal to encode three control characters for Egyptian Hieroglyphs. Proposal submitted to the Unicode Technical Committee.
http://www.unicode.org/L2/L2016/16018r-three-for-egyptian.pdf (accessed April 17, 2018).

SIL International. (2017). ISO 639-3: ISO 639 Code Tables.
http://www-01.sil.org/iso639-3/codes.asp (accessed April 17, 2018).

Unicode Consortium. (2017). Old Italic. In: Unicode Consortium,
The Unicode Standard, Version 10.0.0. Mountain View, CA: The Unicode Consortium, 349-351. http://www.unicode.org/versions/Unicode10.0.0/ (accessed 24 April 2018).

Unicode Consortium. (2018a). The Unicode Consortium website.
http://unicode.org/
[accessed April 17, 2018).

Unicode Consortium. (2018b). CLDR - Unicode Common Locale Data Repository. Unicode Consortium website. http://cldr.unicode.org (accessed April 17, 2018).

Unicode Consortium. (2018c). Private-Use Characters, Noncharacters & Sentinels FAQ. Unicode Consortium website.
http://www.unicode.org/faq/private_use.html (accessed April 17, 2018).

Unicode Consortium. (2018d). Variation Sequences. Unicode Consortium website.

http://www.unicode.org/faq/vs.html (accessed April 17, 2018).

If this content appears in violation of your intellectual property rights, or you see errors or omissions, please reach out to Scott B. Weingart to discuss removing or amending the materials.

Conference Info

Complete

ADHO / EHD - 2018
"Puentes/Bridges"

Hosted at El Colegio de México, Universidad Nacional Autónoma de México (UNAM) (National Autonomous University of Mexico)

Mexico City, Mexico

June 26, 2018 - June 29, 2018

340 works by 859 authors indexed

Conference website: https://dh2018.adho.org/

Series: ADHO (13), EHD (4)

Organizers: ADHO