Glyph Names and Current Implementations

[ Document version 1.1. Last updated January 31, 2003 ]
  1. Introduction
  2. Where and why glyph names are used
  3. Why the prefix "u" is not yet recommended for glyphs which are encoded in the Unicode BMP
  4. Length and character set limitations on names
  5. Adobe Type Manager and kern pair filtering
  6. Document changes
1. Introduction

This article is strongly time-dated, as it contains comments about current implementations. Please bear this in mind if reading it much after October 2002.

2. Where and why glyph names are used

Following the naming conventions of the article "Unicode and Glyph Names" will currently enable copying text and searching text in PDF (Adobe Portable Document Format) documents under a wider variety of circumstances than having no names, or names that do not follow these conventions. In the era of the Internet, where many documents must be searchable in order to be useful, this is very important.

Many PDF files are made from PostScript printer files when the original fonts referenced by the document are not available, so the embedded font data must be used. In this case, the Unicode cmap table of an OpenType font is not available, and the only clue that the PDF maker may have about the semantics of a glyph is the name of the glyph.

Even when the original font file is present, glyphs which do not match characters in the Unicode specification may still be usefully connected to a Unicode character by the glyph name. For example, naming a decorative variant of "t" as "t.alt" allows a PDF producing program to note that "t.alt" carries the same semantics as "t", for searching.
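As a minimal sketch of the "t.alt" example above, the following Python function recovers the base name that carries the semantics by dropping everything from the first period onward. The function name base_glyph_name is hypothetical and is not taken from any Adobe product; it only illustrates the naming convention.

```python
def base_glyph_name(glyph_name: str) -> str:
    """Return the part of a glyph name that precedes the first period.

    Names that begin with a period (such as ".notdef") have no base part,
    so they are returned unchanged.
    """
    if glyph_name.startswith("."):
        return glyph_name
    base, _separator, _suffix = glyph_name.partition(".")
    return base


# "t.alt" and "t" reduce to the same base name, so a search for "t"
# could treat both glyphs as the same character.
same_semantics = base_glyph_name("t.alt") == base_glyph_name("t")
```

A PDF producer would still need to map the base name to a Unicode value; this sketch covers only the suffix handling.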

In the future, it is expected that more products than Adobe's suite will support text string searches by the Unicode semantics of a glyph, and that the scope of usefulness of these rules will become much wider.

For OpenType/TTF fonts, where the font may be missing glyph names altogether, the presence in the font file cmap table of a correct Unicode value for a glyph will still cause it to be correctly identified in most cases. However, for glyphs that do not have a Unicode character representation, the comments above apply.

3. Why the prefix "u" is not yet recommended for glyphs which are encoded in the Unicode BMP

The prefix "u" is not supported by Acrobat 4 and 5. It will be supported by Acrobat 6 and later, which is also when support for Unicode characters outside the Basic Multilingual Plane (BMP) will be introduced. The AGL names, names with the prefix "uni", and the "." and "_" parsing rules are already understood by Acrobat 4 and 5.
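The following Python sketch illustrates the kind of parsing described above: drop any suffix after the first period, split the remainder on underscores, and resolve each component either through the AGL or through a "uni" name. It is an illustration under stated assumptions, not Acrobat's implementation. AGL_EXCERPT is a three-entry stand-in for the real Adobe Glyph List, the function names are hypothetical, and the "uni" form is assumed to be followed by groups of four uppercase hexadecimal digits as described in "Unicode and Glyph Names". The "u" prefix is deliberately left unhandled, so such names come back unresolved.

```python
import re
from typing import Optional

# Tiny stand-in for the Adobe Glyph List (AGL); the real list is far larger.
AGL_EXCERPT = {"f": "f", "i": "i", "t": "t"}

# Assumed form: "uni" followed by one or more groups of four uppercase hex digits.
_UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)")


def component_to_text(component: str) -> Optional[str]:
    """Map one underscore-separated component to text, or None if unresolved."""
    if component in AGL_EXCERPT:
        return AGL_EXCERPT[component]
    match = _UNI_NAME.fullmatch(component)
    if match is None:
        return None  # includes names with the "u" prefix, not handled here
    digits = match.group(1)
    code_points = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
    # Surrogate code points are not characters, so reject them.
    if any(0xD800 <= cp <= 0xDFFF for cp in code_points):
        return None
    return "".join(chr(cp) for cp in code_points)


def glyph_name_to_text(glyph_name: str) -> Optional[str]:
    """Apply the "." rule, then the "_" rule, then resolve each component."""
    base = glyph_name.split(".", 1)[0]
    texts = [component_to_text(part) for part in base.split("_")]
    if any(text is None for text in texts):
        return None
    return "".join(texts)
```

With this sketch, "t.alt" resolves to "t" and "f_i" resolves to "fi", while "u0074" returns None because the "u" prefix is not recognized.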

4. Length and character set limitations on names

Glyphs from Western OpenType/CFF and OpenType/TTF fonts must still be referenced in many workflow cases as name-keyed font data. As a result, glyph names are still subject to the length and character code limitations that are imposed by the Type 1 specification and the implementation of existing installations of PostScript interpreters. These both require the following:

A glyph name may be up to 31 characters in length and must consist entirely of characters from the following set:

  1. A-Z
  2. a-z
  3. 0-9
  4. . (period)
  5. _ (underscore)

and must not start with a digit or period. The only exception is the special glyph name ".notdef".
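These rules can be checked mechanically. The Python function below is a minimal sketch of such a check; the name is_valid_glyph_name is hypothetical, and the sketch assumes that an empty name is also invalid, which the rules above do not state explicitly.

```python
import re

# First character: a letter or underscore (not a digit or period).
# Up to 30 further characters from A-Z, a-z, 0-9, period, and underscore,
# for a maximum total length of 31.
_GLYPH_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._]{0,30}")


def is_valid_glyph_name(name: str) -> bool:
    """Return True if the name meets the length and character set limits."""
    if name == ".notdef":
        return True  # the single permitted exception to the leading-period rule
    return _GLYPH_NAME.fullmatch(name) is not None
```

For instance, is_valid_glyph_name("a.alt") returns True, and is_valid_glyph_name("a-b") returns False because the hyphen is outside the permitted set.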

"twocents", "a1", and "_" are valid glyph names. "2cents" and ".twocents" are not.

5. Adobe Type Manager and kern pair filtering

For many programs, support for kerning in OpenType fonts is provided to a limited degree by the Windows and Macintosh Adobe Type Managers. This limitation arises because most programs that are not OpenType-aware assume that all the kern pairs in a font can reasonably fit in a single table, and that there will be no more than a few thousand kern pairs. Providing more kern pairs than this causes such programs to crash. With the class-based kerning supported by OpenType layout, even a font with only 220 glyphs will usually exceed the limit, if well-kerned. In order to allow use of OpenType fonts without crashing many current programs, the Adobe Type Manager programs support kerning via the legacy operating system calls by first fully expanding the class-based kerning to a list of single glyph name pairs, and then filtering this list through a hard-coded list of glyph names. If the glyph name on either side of the kern pair is not in the filter list, then the kern pair is omitted. The kern filtering list for Windows 95 and Mac OS 9 is here. The kern filtering list for Windows NT and 2000 is here.
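The expand-then-filter behavior described above can be sketched in Python as follows. This is an illustration only, not Adobe Type Manager code: the data layout, the function names, the class names, the kern value, and the contents of the filter set are all hypothetical, and the real filter lists are the ones referred to in the paragraph above.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

KernPairs = Dict[Tuple[str, str], int]


def expand_class_kerning(
    classes: Dict[str, List[str]],
    class_kerning: Dict[Tuple[str, str], int],
) -> KernPairs:
    """Expand class-based kerning into individual glyph name pairs."""
    pairs: KernPairs = {}
    for (left_class, right_class), value in class_kerning.items():
        for left, right in product(classes[left_class], classes[right_class]):
            pairs[(left, right)] = value
    return pairs


def filter_kern_pairs(pairs: KernPairs, filter_list: Set[str]) -> KernPairs:
    """Keep a pair only if the glyph names on both sides are in the filter list."""
    return {
        (left, right): value
        for (left, right), value in pairs.items()
        if left in filter_list and right in filter_list
    }


# Hypothetical example data.
classes = {"A_like": ["A", "Aacute", "A.alt"], "V_like": ["V", "W"]}
class_kerning = {("A_like", "V_like"): -80}
filter_list = {"A", "Aacute", "V", "W"}  # "A.alt" is not in the list

expanded = expand_class_kerning(classes, class_kerning)
kept = filter_kern_pairs(expanded, filter_list)
```

Here one class pair expands to six glyph pairs, and the filter keeps four of them; both pairs involving "A.alt" are omitted. The example also shows why the pair count grows so quickly: each class pair multiplies out to the product of the two class sizes.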

6. Document changes

[31 January 2003] Added kern filter lists.

[4 November 2002] First version.