The Glyph Substitution Table
The Glyph Substitution table (GSUB) contains information for substituting glyphs to render the scripts and language systems supported in a font. Many language systems require glyph substitutes. For example, in the Arabic script, the glyph shape that depicts a particular character varies according to its position in a word or text string (see Figure 1). In other language systems, glyph substitutes are aesthetic options for the user, such as the use of ligature glyphs in the English language (see Figure 2).
Figure 1. Isolated, initial, medial, and final forms of the Arabic character HAH
Figure 2. Two Latin glyphs and their associated ligature

Overview

Many fonts use limited character encoding standards that map glyphs to characters one-to-one, assigning a glyph to each character code value in a font. Under such a mapping, multiple character codes cannot be mapped to a single glyph, as ligature glyphs require, and multiple glyphs cannot be mapped to a single character code, as decomposing a ligature into its component glyphs requires. To supply glyph substitutes, font developers must assign different character codes to the glyphs, or they must create additional fonts or character sets. To access these glyphs, users must bear the burden of switching between character codes, character sets, or fonts.

Substituting Glyphs with OpenType

The OpenType GSUB table fully supports glyph substitution. To access glyph substitutes, GSUB maps from the glyph index or indices defined in a cmap table to the glyph index or indices of the glyph substitutes. For example, if a font has three alternative forms of an ampersand glyph, the cmap table associates the ampersand's character code with only one of these glyphs. GSUB then references the indices of the other ampersand glyphs through this one index.

The text-processing client uses the GSUB data to manage glyph substitution actions. GSUB identifies the glyphs that are input to and output from each glyph substitution action, specifies how and where the client uses glyph substitutes, and regulates the order of glyph substitution operations. Any number of substitutions can be defined for each script or language system represented in a font.

The GSUB table supports six types of glyph substitutions that are widely used in international typography: single, multiple, alternate, ligature, contextual, and chaining contextual substitution.
Table Organization

The GSUB table begins with a header that defines offsets to a ScriptList, a FeatureList, and a LookupList (see Figure 3). For a detailed discussion of ScriptLists, FeatureLists, and LookupLists, see the chapter OpenType Common Table Formats.

Figure 3. High-level organization of GSUB table

This organization helps text-processing clients locate the features and lookups that apply to a particular script or language system. To access GSUB information, a client starts from the ScriptList entry for the current script and language system. It then follows that language system's feature indices into the FeatureList, and finally follows each selected feature's lookup indices into the LookupList.
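The ScriptList → FeatureList → LookupList walk described above can be sketched in Python. The sketch rests on two assumptions. First, the three lists are modeled as plain dictionaries and lists, with hypothetical names (`ScriptList`, `FeatureList`, `lookups_for`), rather than parsed from binary data. Second, a missing language system falls back to the script's default language system. Required features are ignored for brevity.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory model of the GSUB lists (not a binary parser):
# script tag -> {language tag, or None for the default LangSys -> feature indices}
ScriptList = Dict[str, Dict[Optional[str], List[int]]]
# feature index -> (feature tag, LookupList indices)
FeatureList = List[Tuple[str, List[int]]]


def lookups_for(script_list: ScriptList, feature_list: FeatureList,
                script: str, lang: Optional[str], wanted: Set[str]) -> List[int]:
    """Return the LookupList indices to run for the wanted features."""
    lang_systems = script_list.get(script)
    if lang_systems is None:
        return []                                   # script not supported by the font
    # Prefer the specific language system; fall back to the script's default.
    feature_indices = lang_systems.get(lang, lang_systems.get(None, []))
    lookup_indices: Set[int] = set()
    for index in feature_indices:
        tag, lookups = feature_list[index]
        if tag in wanted:
            lookup_indices.update(lookups)
    # Lookups run in LookupList order, regardless of the order of the features.
    return sorted(lookup_indices)
```

The result is sorted because, as the text below explains, lookups are applied in LookupList order and not in the order in which the features were requested.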
Lookup data is defined in one or more subtables that define the specific conditions, type, and results of a substitution action used to implement a feature. All subtables in a lookup must be of the same LookupType, as listed in the LookupType Enumeration table.

LookupType Enumeration table for glyph substitution

| Value | Type | Description |
| 1 | Single | Replace one glyph with one glyph |
| 2 | Multiple | Replace one glyph with more than one glyph |
| 3 | Alternate | Replace one glyph with one of many glyphs |
| 4 | Ligature | Replace multiple glyphs with one glyph |
| 5 | Context | Replace one or more glyphs in context |
| 6 | Chaining Context | Replace one or more glyphs in chained context |
| 7 | Extension Substitution | Extension mechanism for other substitutions (i.e. this excludes the Extension type substitution itself) |
| 8 | Reverse chaining context single | Applied in reverse order, replace single glyph in chaining context |
| 9+ | Reserved | For future use |
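The enumeration above, together with the lookup-application order described in the next paragraph, can be sketched in Python. In this sketch each lookup runs over the whole glyph string before the next one starts, and the glyphs that took part in a match are skipped. It assumes each lookup is a hypothetical callable that returns the rewritten glyph list and a resume position, or `None` when nothing matched. It is not a real shaping-engine API, and it ignores the reverse order used by LookupType 8.

```python
from enum import IntEnum
from typing import Callable, List, Optional, Tuple


class GsubLookupType(IntEnum):
    SINGLE = 1
    MULTIPLE = 2
    ALTERNATE = 3
    LIGATURE = 4
    CONTEXT = 5
    CHAINING_CONTEXT = 6
    EXTENSION = 7
    REVERSE_CHAINING_SINGLE = 8


# Hypothetical lookup model: try to match at `pos`; return (new glyphs, position
# to resume from), or None when the lookup does not apply at `pos`.
Lookup = Callable[[List[int], int], Optional[Tuple[List[int], int]]]


def apply_lookups(glyphs: List[int], lookups: List[Lookup]) -> List[int]:
    glyphs = list(glyphs)
    for lookup in lookups:                  # one lookup at a time, in LookupList order
        pos = 0
        while pos < len(glyphs):            # ...over every glyph in the string
            result = lookup(glyphs, pos)
            if result is None:
                pos += 1                    # no match here: try the next glyph
            else:
                glyphs, next_pos = result
                # Skip the glyphs that participated; always make progress.
                pos = max(next_pos, pos + 1)
    return glyphs
```

Reordering the list passed to `apply_lookups` changes the result. This is why a series of dependent substitutions needs separate lookups placed in the right LookupList order.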
Each LookupType subtable has one or more formats. The "best" format depends on the type of substitution and the resulting storage efficiency. When glyph information is best presented in more than one format, a single lookup may define more than one subtable, as long as all the subtables are for the same LookupType. For example, within a given lookup, a glyph index array format may best represent one set of target glyphs, whereas a glyph index range format may be better for another set.

A series of substitution operations on the same glyph or string requires multiple lookups, one for each separate action. Each lookup is given a different array number in the LookupList table and is applied in the LookupList order.

During text processing, a client applies a lookup to each glyph in the string before moving to the next lookup. A lookup is finished for a glyph after the client locates the target glyph or glyph context and performs a substitution, if specified. To move to the "next" glyph, the client will typically skip all the glyphs that participated in the lookup operation: glyphs that were substituted as well as any other glyphs that formed a context for the operation. In the case of chained contextual lookups, glyphs comprising backtrack and lookahead sequences may participate in more than one context.

The rest of this chapter describes the GSUB header and the subtables defined for each GSUB LookupType. Examples at the end of this chapter illustrate each of the five LookupTypes, including the three formats available for contextual substitutions.

GSUB Header

The GSUB table begins with a header that contains a version number for the table (Version) and offsets to three tables: ScriptList, FeatureList, and LookupList. For descriptions of each of these tables, see the chapter OpenType Common Table Formats. Example 1 at the end of this chapter shows a GSUB Header table definition.

GSUB Header
| Type | Name | Description |
| Fixed | Version | Version of the GSUB table, initially set to 0x00010000 |
| Offset | ScriptList | Offset to ScriptList table, from beginning of GSUB table |
| Offset | FeatureList | Offset to FeatureList table, from beginning of GSUB table |
| Offset | LookupList | Offset to LookupList table, from beginning of GSUB table |
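A minimal Python sketch of reading the header with the standard `struct` module follows. It assumes two things: the Fixed version is read as two big-endian uint16 halves (0x00010000 gives 1 and 0), and each Offset is a 16-bit unsigned value. `GsubHeader` and `parse_gsub_header` are hypothetical names used for illustration.

```python
import struct
from typing import NamedTuple


class GsubHeader(NamedTuple):
    major_version: int
    minor_version: int
    script_list_offset: int    # all three offsets count from the start of GSUB
    feature_list_offset: int
    lookup_list_offset: int


def parse_gsub_header(gsub: bytes) -> GsubHeader:
    """Parse the 10-byte GSUB header (OpenType data is big-endian)."""
    if len(gsub) < 10:
        raise ValueError("GSUB table too short for its header")
    # Version (Fixed) as two uint16 halves, then three assumed 16-bit offsets.
    header = GsubHeader(*struct.unpack_from(">5H", gsub, 0))
    if header.major_version != 1:
        raise ValueError(f"unsupported GSUB major version {header.major_version}")
    return header
```

A client would then slice `gsub[header.lookup_list_offset:]` (and likewise for the other two lists) to reach each list.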
LookupType 1: Single Substitution Subtable

Single substitution (SingleSubst) subtables tell a client to replace a single glyph with another glyph. The subtables can be either of two formats. Both formats require two distinct sets of glyph indices: one that defines input glyphs (specified in the Coverage table), and one that defines the output glyphs. Format 1 requires less space than Format 2, but it is less flexible.

Single Substitution Format 1

Format 1 calculates the indices of the output glyphs, which are not explicitly defined in the subtable. To calculate an output glyph index, Format 1 adds a constant delta value to the input glyph index. For the substitutions to occur properly, the glyph indices in the input and output ranges must be in the same order. This format does not use the Coverage Index that is returned from the Coverage table.

The SingleSubstFormat1 subtable begins with a format identifier (SubstFormat) of 1. An offset references a Coverage table that specifies the indices of the input glyphs. DeltaGlyphID is the constant value added to each input glyph index to calculate the index of the corresponding output glyph. Example 2 at the end of this chapter uses Format 1 to replace standard numerals with lining numerals.

SingleSubstFormat1 subtable: Calculated output glyph indices
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| int16 | DeltaGlyphID | Add to original GlyphID to get substitute GlyphID |
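Format 1 can be sketched in Python as a coverage test plus one addition. The sketch simplifies the Coverage table to a sorted Python list of glyph IDs. It also assumes that the addition wraps modulo 65536, as the current OpenType specification states, so that a negative DeltaGlyphID stays a valid glyph ID. `coverage_index` and `single_subst_format1` are hypothetical helper names.

```python
from bisect import bisect_left
from typing import List, Optional


def coverage_index(coverage: List[int], glyph: int) -> Optional[int]:
    """Coverage Index of `glyph`, modeling the Coverage table as a sorted list."""
    i = bisect_left(coverage, glyph)
    return i if i < len(coverage) and coverage[i] == glyph else None


def single_subst_format1(coverage: List[int], delta_glyph_id: int, glyph: int) -> int:
    """Apply a SingleSubstFormat1 subtable to one glyph."""
    if coverage_index(coverage, glyph) is None:
        return glyph                       # not covered: the glyph is left unchanged
    # Only "covered or not" matters; the Coverage Index value itself is unused.
    return (glyph + delta_glyph_id) % 0x10000
```

Because the same delta applies to every covered glyph, Format 1 only works when every output glyph sits the same distance from its input glyph.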
Single Substitution Format 2

Format 2 is more flexible than Format 1, but requires more space. It provides an array of output glyph indices (Substitute) explicitly matched to the input glyph indices specified in the Coverage table. The SingleSubstFormat2 subtable specifies a format identifier (SubstFormat), an offset to a Coverage table that defines the input glyph indices, a count of output glyph indices in the Substitute array (GlyphCount), and a list of the output glyph indices in the Substitute array (Substitute). The Substitute array must contain the same number of glyph indices as the Coverage table. To locate the corresponding output glyph index in the Substitute array, this format uses the Coverage Index returned from the Coverage table. Example 3 at the end of this chapter uses Format 2 to substitute vertically oriented glyphs for horizontally oriented glyphs.

SingleSubstFormat2 subtable: Specified output glyph indices
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 2 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | GlyphCount | Number of GlyphIDs in the Substitute array |
| GlyphID | Substitute[GlyphCount] | Array of substitute GlyphIDs, ordered by Coverage Index |
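The trade-off between the two formats can be sketched in Python. A font compiler might choose the compact Format 1 only when every input/output pair shares one delta, and fall back to Format 2 otherwise. `build_single_subst` and `single_subst_format2` are hypothetical names, and the Coverage table is again modeled as a sorted list of glyph IDs.

```python
from typing import Dict, List, Tuple, Union


def build_single_subst(mapping: Dict[int, int]) -> Tuple[int, List[int], Union[int, List[int]]]:
    """Choose a SingleSubst format for {input glyph: output glyph}.

    Returns (format, coverage, payload). The payload is DeltaGlyphID for
    Format 1, or the Substitute array (ordered by Coverage Index) for Format 2.
    """
    coverage = sorted(mapping)                       # Coverage lists inputs in ID order
    deltas = {mapping[g] - g for g in coverage}
    if len(deltas) == 1:
        return 1, coverage, deltas.pop()             # one shared delta: compact Format 1
    return 2, coverage, [mapping[g] for g in coverage]


def single_subst_format2(coverage: List[int], substitute: List[int], glyph: int) -> int:
    """Apply SingleSubstFormat2: Substitute[i] replaces the glyph at Coverage Index i."""
    if len(substitute) != len(coverage):
        raise ValueError("Substitute array must match the Coverage table size")
    if glyph not in coverage:
        return glyph                                 # not covered: leave unchanged
    return substitute[coverage.index(glyph)]
```

For example, `{10: 20, 11: 21, 12: 22}` collapses to Format 1 with a delta of 10. Scattered outputs such as `{10: 40, 11: 35, 12: 22}` need the explicit Substitute array.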
LookupType 2: Multiple Substitution Subtable

A Multiple Substitution (MultipleSubst) subtable replaces a single glyph with more than one glyph, as when multiple glyphs replace a single ligature. The subtable has a single format: MultipleSubstFormat1. The subtable specifies a format identifier (SubstFormat), an offset to a Coverage table that defines the input glyph indices, a count of offsets in the Sequence array (SequenceCount), and an array of offsets to Sequence tables that define the output glyph indices (Sequence). The Sequence table offsets are ordered by the Coverage Index of the input glyphs.

For each input glyph listed in the Coverage table, a Sequence table defines the output glyphs. Each Sequence table contains a count of the glyphs in the output glyph sequence (GlyphCount) and an array of output glyph indices (Substitute). Note: The order of the output glyph indices depends on the writing direction of the text. For text written left to right, the left-most glyph will be the first glyph in the sequence. Conversely, for text written right to left, the right-most glyph will be first.

The use of multiple substitution for deletion of an input glyph is prohibited: GlyphCount should always be greater than 0. Example 4 at the end of this chapter shows how to replace a single ligature with three glyphs.

MultipleSubstFormat1 subtable: Multiple output glyphs
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | SequenceCount | Number of Sequence table offsets in the Sequence array |
| Offset | Sequence[SequenceCount] | Array of offsets to Sequence tables, from beginning of Substitution table, ordered by Coverage Index |

Sequence table

| Type | Name | Description |
| uint16 | GlyphCount | Number of GlyphIDs in the Substitute array. This should always be greater than 0. |
| GlyphID | Substitute[GlyphCount] | String of GlyphIDs to substitute |
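A Python sketch of applying a MultipleSubstFormat1 subtable to a glyph string follows. It assumes the Coverage table is a list of glyph IDs and that each Sequence table is a list of output glyph IDs at the same Coverage Index. It also rejects an empty Sequence, since deletion is prohibited. `multiple_subst` is a hypothetical name.

```python
from typing import List


def multiple_subst(coverage: List[int], sequences: List[List[int]],
                   glyphs: List[int]) -> List[int]:
    """Apply a MultipleSubstFormat1 subtable across a glyph string.

    sequences[i] is the Substitute array of the Sequence table for Coverage Index i.
    """
    if len(sequences) != len(coverage):
        raise ValueError("SequenceCount must equal the number of covered glyphs")
    if any(len(seq) == 0 for seq in sequences):
        raise ValueError("GlyphCount must be greater than 0: deletion is not allowed")
    out: List[int] = []
    for g in glyphs:
        if g in coverage:
            out.extend(sequences[coverage.index(g)])   # one glyph becomes several
        else:
            out.append(g)
    return out
```

With a hypothetical "ffi" ligature glyph 200 decomposing to `[70, 70, 73]`, the string `[5, 200, 6]` becomes `[5, 70, 70, 73, 6]`.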
LookupType 3: Alternate Substitution Subtable

An Alternate Substitution (AlternateSubst) subtable identifies any number of aesthetic alternatives from which a user can choose a glyph variant to replace the input glyph. For example, if a font contains four variants of the ampersand symbol, the cmap table will specify the index of one of the four glyphs as the default glyph index, and an AlternateSubst subtable will list the indices of the other three glyphs as alternatives. A text-processing client would then have the option of replacing the default glyph with any of the three alternatives.

The subtable has one format: AlternateSubstFormat1. The subtable contains a format identifier (SubstFormat), an offset to a Coverage table containing the indices of glyphs with alternative forms (Coverage), a count of offsets to AlternateSet tables (AlternateSetCount), and an array of offsets to AlternateSet tables (AlternateSet). For each covered glyph, an AlternateSet table contains a count of the alternative glyphs (GlyphCount) and an array of their glyph indices (Alternate). Because all the glyphs are functionally equivalent, they can be in any order in the array. Example 5 at the end of this chapter shows how to replace the default ampersand glyph with alternative glyphs.

AlternateSubstFormat1 subtable: Alternative output glyphs
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | AlternateSetCount | Number of AlternateSet tables |
| Offset | AlternateSet[AlternateSetCount] | Array of offsets to AlternateSet tables, from beginning of Substitution table, ordered by Coverage Index |

AlternateSet table

| Type | Name | Description |
| uint16 | GlyphCount | Number of GlyphIDs in the Alternate array |
| GlyphID | Alternate[GlyphCount] | Array of alternate GlyphIDs, in arbitrary order |
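Unlike the other substitution types, an alternate substitution needs an outside choice. This Python sketch assumes that choice arrives as a hypothetical 0-based `choice` index supplied by the user interface. An out-of-range choice keeps the default glyph. `alternate_subst` is a hypothetical name.

```python
from typing import List


def alternate_subst(coverage: List[int], alternate_sets: List[List[int]],
                    glyph: int, choice: int) -> int:
    """Return the user's chosen alternate for `glyph` (AlternateSubstFormat1).

    `choice` is an assumed 0-based index into the glyph's Alternate array.
    The array order carries no meaning beyond identifying an entry.
    """
    if glyph not in coverage:
        return glyph                       # no alternates defined for this glyph
    alternates = alternate_sets[coverage.index(glyph)]
    if not 0 <= choice < len(alternates):
        return glyph                       # invalid choice: keep the default glyph
    return alternates[choice]
```

For a default ampersand glyph 9 with alternates `[300, 301, 302]`, choice 1 yields 301.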
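LookupType 4: Ligature Substitution Subtable

A Ligature Substitution (LigatureSubst) subtable replaces a string of glyphs with a single ligature glyph. The subtable has one format: LigatureSubstFormat1. It contains a format identifier (SubstFormat), an offset to a Coverage table that lists the first glyph of each ligature's component string (Coverage), a count of LigatureSet tables (LigSetCount), and an array of offsets to LigatureSet tables ordered by Coverage Index (LigatureSet).

LigatureSubstFormat1 subtable: All ligature substitutions in a script

| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | LigSetCount | Number of LigatureSet tables |
| Offset | LigatureSet[LigSetCount] | Array of offsets to LigatureSet tables, from beginning of Substitution table, ordered by Coverage Index |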
A LigatureSet table, one for each covered glyph, specifies all the ligature strings that begin with the covered glyph. For example, if the Coverage table lists the glyph index for a lowercase "f," then a LigatureSet table will define the "ffl," "fl," "ffi," "fi," and "ff" ligatures. If the Coverage table also lists the glyph index for a lowercase "e," then a different LigatureSet table will define the "etc" ligature. A LigatureSet table consists of a count of the ligatures that begin with the covered glyph (LigatureCount) and an array of offsets to Ligature tables, which define the glyphs in each ligature (Ligature). The order in the Ligature offset array defines the preference for using the ligatures. For example, if the "ffl" ligature is preferable to the "ff" ligature, then the Ligature array would list the offset to the "ffl" Ligature table before the offset to the "ff" Ligature table.

LigatureSet table: All ligatures beginning with the same glyph
| Type | Name | Description |
| uint16 | LigatureCount | Number of Ligature tables |
| Offset | Ligature[LigatureCount] | Array of offsets to Ligature tables, from beginning of LigatureSet table, ordered by preference |
For each ligature in the set, a Ligature table specifies the GlyphID of the output ligature glyph (LigGlyph); a count of the total number of component glyphs in the ligature, including the first component (CompCount); and an array of GlyphIDs for the components (Component). The array starts with the second component glyph (array index = 1) in the ligature because the first component glyph is specified in the Coverage table. Note: The Component array lists GlyphIDs according to the writing direction of the text. For text written right to left, the right-most glyph will be first. Conversely, for text written left to right, the left-most glyph will be first. Example 6 at the end of this chapter shows how to replace a string of glyphs with a single ligature.

Ligature table: Glyph components for one ligature
| Type | Name | Description |
| GlyphID | LigGlyph | GlyphID of ligature to substitute |
| uint16 | CompCount | Number of components in the ligature |
| GlyphID | Component[CompCount - 1] | Array of component GlyphIDs, starting with the second component, ordered in writing direction |
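Preference order in a LigatureSet can be made concrete with a Python sketch. It models the LigatureSets as a hypothetical dictionary from the covered first glyph to a preference-ordered list of `(LigGlyph, Component)` pairs, where Component starts with the second glyph as in the table above. The first ligature that matches wins.

```python
from typing import Dict, List, Tuple

# Hypothetical model: covered first glyph -> LigatureSet, a preference-ordered
# list of (LigGlyph, Component array starting with the second component).
LigatureSets = Dict[int, List[Tuple[int, List[int]]]]


def ligature_subst(lig_sets: LigatureSets, glyphs: List[int]) -> List[int]:
    out: List[int] = []
    pos = 0
    while pos < len(glyphs):
        first = glyphs[pos]
        for lig_glyph, components in lig_sets.get(first, []):
            comp_count = 1 + len(components)           # CompCount includes the first glyph
            if glyphs[pos + 1:pos + comp_count] == components:
                out.append(lig_glyph)                  # first matching ligature wins
                pos += comp_count
                break
        else:
            out.append(first)                          # no ligature starts here
            pos += 1
    return out
```

With hypothetical IDs f=70 and l=76, listing "ffl" before "ff" turns `[70, 70, 76]` into the single "ffl" glyph. Listing "ff" first produces the "ff" glyph followed by a stray "l", which is the outcome the preference ordering is meant to prevent.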
LookupType 5: Contextual Substitution Subtable

A Contextual Substitution (ContextSubst) subtable defines the most powerful type of glyph substitution lookup: it describes glyph substitutions in context that replace one or more glyphs within a certain pattern of glyphs. ContextSubst subtables can be any of three formats that define a context in terms of a specific sequence of glyphs, glyph classes, or glyph sets. Each format can describe one or more input glyph sequences and one or more substitutions for each sequence. All three formats of ContextSubst subtables specify substitution data in a SubstLookupRecord. A description of that record follows.

SubstLookupRecord
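| Type | Name | Description |
| uint16 | SequenceIndex | Index into current glyph sequence, first glyph = 0 |
| uint16 | LookupListIndex | Lookup to apply to that position, zero-based index into the LookupList |

Context Substitution Format 1: Simple Context Glyph Substitution

Format 1 defines the context for a glyph substitution as a particular sequence of glyphs. The Coverage table lists the first glyph of each context. For each covered glyph, one SubRuleSet table, ordered by Coverage Index, groups the SubRule tables for all contexts that begin with that glyph.

ContextSubstFormat1 subtable: Simple context definition

| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | SubRuleSetCount | Number of SubRuleSet tables; must equal GlyphCount in Coverage table |
| Offset | SubRuleSet[SubRuleSetCount] | Array of offsets to SubRuleSet tables, from beginning of Substitution table, ordered by Coverage Index |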
A SubRuleSet table consists of an array of offsets to SubRule tables (SubRule), ordered by preference, and a count of the SubRule tables defined in the set (SubRuleCount). The order in the SubRule array can be critical. Consider two contexts, <abc> and <abcd>. If <abc> is first in the SubRule array, the <abc> rule changes all instances of <abc> in the text, including those that are part of <abcd>, so the <abcd> rule never applies. If <abcd> comes first in the array, however, <abcd> sequences are changed by the <abcd> rule, and the <abc> rule applies only to <abc> sequences that are not followed by d.

SubRuleSet table: All contexts beginning with the same glyph
| Type | Name | Description |
| uint16 | SubRuleCount | Number of SubRule tables |
| Offset | SubRule[SubRuleCount] | Array of offsets to SubRule tables, from beginning of SubRuleSet table, ordered by preference |
A SubRule table consists of a count of the glyphs to be matched in the input context sequence (GlyphCount), including the first glyph in the sequence, and an array of glyph indices that describe the context (Input). The Coverage table specifies the index of the first glyph in the context, and the Input array begins with the second glyph (array index = 1) in the context sequence. Note: The Input array lists the indices in the order the corresponding glyphs appear in the text. For text written from right to left, the right-most glyph will be first; conversely, for text written from left to right, the left-most glyph will be first.

A SubRule table also contains a count of the substitutions to be performed on the input glyph sequence (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord). Each record specifies a position in the input glyph sequence and a LookupListIndex to the substitution lookup that is applied at that position. The array should list records in design order, that is, the order in which the lookups should be applied to the entire glyph sequence.

SubRule table: One simple context definition
| Type | Name | Description |
| uint16 | GlyphCount | Total number of glyphs in input glyph sequence, including the first glyph |
| uint16 | SubstCount | Number of SubstLookupRecords |
| GlyphID | Input[GlyphCount - 1] | Array of input GlyphIDs, starting with the second glyph |
| struct | SubstLookupRecord[SubstCount] | Array of SubstLookupRecords, in design order |
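The Format 1 matching steps and the effect of SubRule order can be sketched in Python. The sketch assumes a hypothetical model in which `rule_sets` maps each covered first glyph to its preference-ordered SubRules, and a SubstLookupRecord is a `(SequenceIndex, LookupListIndex)` pair. For simplicity, the nested lookups are modeled as single-glyph functions keyed by LookupListIndex, so the string length never changes. Real nested lookups can be of any type.

```python
from typing import Callable, Dict, List, Tuple

# SubRule model: (Input array starting with the second glyph, SubstLookupRecords)
SubRule = Tuple[List[int], List[Tuple[int, int]]]


def context_subst_format1(rule_sets: Dict[int, List[SubRule]],
                          lookups: Dict[int, Callable[[int], int]],
                          glyphs: List[int]) -> List[int]:
    glyphs = list(glyphs)
    pos = 0
    while pos < len(glyphs):
        for rest, records in rule_sets.get(glyphs[pos], []):  # preference order
            glyph_count = 1 + len(rest)               # GlyphCount includes the first glyph
            if glyphs[pos + 1:pos + glyph_count] == rest:
                for seq_index, lookup_index in records:   # design order
                    target = pos + seq_index
                    glyphs[target] = lookups[lookup_index](glyphs[target])
                pos += glyph_count                    # resume after the matched context
                break
        else:
            pos += 1                                  # no context starts here
    return glyphs
```

With a=1, b=2, c=3, d=4, an <abc> rule that alters c and an <abcd> rule that alters d behave as the SubRuleSet discussion describes. With <abc> listed first, the d in `[1, 2, 3, 4]` is never touched. With <abcd> listed first, d is changed, while `[1, 2, 3, 5]` still falls through to the <abc> rule.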
Example 7 at the end of the chapter shows how to use the ContextSubstFormat1 subtable to replace a sequence of three glyphs with a sequence preferred for the French language system.

Context Substitution Format 2: Class-based Context Glyph Substitution

Format 2, a more flexible format than Format 1, describes class-based context substitution. For this format, a specific integer, called a class value, must be assigned to each glyph component in all context glyph sequences. Contexts are then defined as sequences of glyph class values. More than one context may be defined at a time.

For example, suppose that a swash capital glyph should replace each uppercase letter glyph that is preceded by a space glyph and followed by a lowercase letter glyph (a glyph sequence of space, uppercase, lowercase). The set of uppercase glyphs would constitute one glyph class (Class 1), the set of lowercase glyphs would constitute a second class (Class 2), and the space glyph would constitute a third class (Class 3). The input context might be specified with a context rule (called a SubClassRule) that describes "the set of glyph strings that form a sequence of three glyph classes, one glyph from Class 3, followed by one glyph from Class 1, followed by one glyph from Class 2."

Each ContextSubstFormat2 subtable contains an offset to a class definition table (ClassDef), which defines the glyph class values of all input contexts. Generally, a unique ClassDef table will be declared in each instance of the ContextSubstFormat2 table that is included in a font, even though several Format 2 tables could share ClassDef tables. Class assignments are fixed (the same for each position in the context), and classes are exclusive (a glyph cannot be in more than one class at a time). The output glyphs that replace the glyphs in the context sequences do not need class values because they are specified elsewhere by GlyphID.

The ContextSubstFormat2 subtable also contains a format identifier (SubstFormat) and defines an offset to a Coverage table (Coverage).
For this format, the Coverage table lists indices for the complete set of unique glyphs (not glyph classes) that may appear as the first glyph of any class-based context. In other words, the Coverage table contains the list of glyph indices for all the glyphs in all classes that may be first in any of the context class sequences. For example, if the contexts begin with a Class 1 or Class 2 glyph, then the Coverage table will list the indices of all Class 1 and Class 2 glyphs.

A ContextSubstFormat2 subtable also defines an array of offsets to the SubClassSet tables (SubClassSet) and a count of the SubClassSet tables (SubClassSetCnt). The array contains one offset for each class (including Class 0) in the ClassDef table. In the array, the class value defines an offset's index position, and the SubClassSet offsets are ordered by ascending class value (from 0 to SubClassSetCnt - 1). For example, the first SubClassSet listed in the array contains all contexts beginning with Class 0 glyphs, the second SubClassSet contains all contexts beginning with Class 1 glyphs, and so on. If no contexts begin with a particular class (that is, if a SubClassSet contains no SubClassRule tables), then the offset to that particular SubClassSet in the SubClassSet array will be set to NULL.

ContextSubstFormat2 subtable: Class-based context glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 2 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| Offset | ClassDef | Offset to glyph ClassDef table, from beginning of Substitution table |
| uint16 | SubClassSetCnt | Number of SubClassSet tables |
| Offset | SubClassSet[SubClassSetCnt] | Array of offsets to SubClassSet tables, from beginning of Substitution table, ordered by class; may be NULL |
Each context is defined in a SubClassRule table, and all SubClassRules that specify contexts beginning with the same class value are grouped in a SubClassSet table. Consequently, the SubClassSet containing a context identifies a context's first class component. Each SubClassSet table consists of a count of the SubClassRule tables defined in the SubClassSet (SubClassRuleCnt) and an array of offsets to SubClassRule tables (SubClassRule). The SubClassRule tables are ordered by preference in the SubClassRule array of the SubClassSet.

SubClassSet subtable
| Type | Name | Description |
| uint16 | SubClassRuleCnt | Number of SubClassRule tables |
| Offset | SubClassRule[SubClassRuleCnt] | Array of offsets to SubClassRule tables, from beginning of SubClassSet, ordered by preference |
For each context, a SubClassRule table contains a count of the glyph classes in the context sequence (GlyphCount), including the first class. A Class array lists the classes, beginning with the second class (array index = 1), that follow the first class in the context. Note: Text order depends on the writing direction of the text. For text written from right to left, the right-most class will be first. Conversely, for text written from left to right, the left-most class will be first.

The values specified in the Class array are the values defined in the ClassDef table. For example, a context consisting of the sequence "Class 2, Class 7, Class 5, Class 0" will produce a Class array of 7,5,0. The first class in the sequence, Class 2, is identified in the ContextSubstFormat2 table by the SubClassSet array index of the corresponding SubClassSet.

A SubClassRule also contains a count of the substitutions to be performed on the context (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord) that supply the substitution data. For each position in the context that requires a substitution, a SubstLookupRecord specifies a LookupList index and a position in the input glyph sequence where the lookup is applied. The SubstLookupRecord array lists SubstLookupRecords in design order, that is, the order in which lookups should be applied to the entire glyph sequence.

SubClassRule table: Context definition for one class
| Type | Name | Description |
| uint16 | GlyphCount | Total number of classes specified for the context in the rule, including the first class |
| uint16 | SubstCount | Number of SubstLookupRecords |
| uint16 | Class[GlyphCount - 1] | Array of classes, beginning with the second class, to be matched to the input glyph class sequence |
| struct | SubstLookupRecord[SubstCount] | Array of substitution lookups, in design order |
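The swash-capital example from Format 2 can be sketched in Python. The sketch makes three assumptions. The ClassDef table is a dictionary in which unlisted glyphs fall into Class 0. The SubClassSet array holds `None` for a NULL offset. Nested lookups are single-glyph functions keyed by LookupListIndex. The names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# SubClassRule model: (Class array starting with the second class, SubstLookupRecords)
SubClassRule = Tuple[List[int], List[Tuple[int, int]]]


def context_subst_format2(coverage: List[int], class_def: Dict[int, int],
                          sub_class_sets: List[Optional[List[SubClassRule]]],
                          lookups: Dict[int, Callable[[int], int]],
                          glyphs: List[int]) -> List[int]:
    def glyph_class(g: int) -> int:
        return class_def.get(g, 0)                    # unlisted glyphs are Class 0

    glyphs = list(glyphs)
    pos = 0
    while pos < len(glyphs):
        first = glyphs[pos]
        rules: List[SubClassRule] = []
        if first in coverage and glyph_class(first) < len(sub_class_sets):
            # The SubClassSet is selected by the class of the first glyph.
            rules = sub_class_sets[glyph_class(first)] or []   # NULL offset: no rules
        for rest, records in rules:                   # preference order
            glyph_count = 1 + len(rest)
            window = glyphs[pos + 1:pos + glyph_count]
            if [glyph_class(g) for g in window] == rest:
                for seq_index, lookup_index in records:   # design order
                    target = pos + seq_index
                    glyphs[target] = lookups[lookup_index](glyphs[target])
                pos += glyph_count
                break
        else:
            pos += 1
    return glyphs
```

Because matching compares classes rather than glyph IDs, one SubClassRule (Class 3, then 1, then 2) covers every space, uppercase, lowercase combination. In the tests, Class 3 is space 32, Class 1 is uppercase 65 to 90, and Class 2 is lowercase 97 to 122.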
Example 8 at the end of this chapter uses Format 2 to substitute Arabic mark glyphs for base glyphs of different heights.

Context Substitution Format 3: Coverage-based Context Glyph Substitution

Format 3, coverage-based context substitution, defines a context rule as a sequence of Coverage tables. Each position in the sequence may define a different Coverage table for the set of glyphs that matches the context pattern. With Format 3, the glyph sets defined in the different Coverage tables may intersect. Format 2, by contrast, specifies fixed class assignments (identical for each position in the context sequence) and exclusive classes (a glyph cannot be in more than one class at a time).

For example, consider an input context that contains a lowercase glyph (position 0), followed by an uppercase glyph (position 1), either a lowercase or numeral glyph (position 2), and then either a lowercase or uppercase vowel (position 3). This context requires four Coverage tables, one for each position. The position 0 table lists the lowercase glyphs. The position 1 table lists the uppercase glyphs. The position 2 table lists the lowercase and numeral glyphs, a superset of the position 0 set. The position 3 table lists the lowercase and uppercase vowels, which also appear in the position 0 and position 1 sets, respectively.
Unlike Formats 1 and 2, this format defines only one context rule at a time. It consists of a format identifier (SubstFormat), a count of the glyphs in the sequence to be matched (GlyphCount), and an array of Coverage offsets that describe the input context sequence (Coverage). Note: The order of the Coverage tables listed in the Coverage array must follow the writing direction. For text written from right to left, the right-most glyph will be first. Conversely, for text written from left to right, the left-most glyph will be first.

The subtable also contains a count of the substitutions to be performed on the input Coverage sequence (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord) in design order, that is, the order in which lookups should be applied to the entire glyph sequence. Example 9 at the end of this chapter substitutes swash glyphs for two out of three glyphs in a sequence.

ContextSubstFormat3 subtable: Coverage-based context glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 3 |
| uint16 | GlyphCount | Number of glyphs in the input glyph sequence |
| uint16 | SubstCount | Number of SubstLookupRecords |
| Offset | Coverage[GlyphCount] | Array of offsets to Coverage tables, from beginning of Substitution table, in glyph sequence order |
| struct | SubstLookupRecord[SubstCount] | Array of SubstLookupRecords, in design order |
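The four-position example from Format 3 can be sketched in Python. The sketch models each Coverage table as a Python set, so the sets are free to overlap, and it applies one rule with its SubstLookupRecords. Nested lookups are again assumed to be single-glyph functions keyed by LookupListIndex, and `context_subst_format3` is a hypothetical name.

```python
from typing import Callable, Dict, List, Set, Tuple


def context_subst_format3(coverages: List[Set[int]],
                          records: List[Tuple[int, int]],
                          lookups: Dict[int, Callable[[int], int]],
                          glyphs: List[int]) -> List[int]:
    """Apply one ContextSubstFormat3 rule: coverages[k] must contain glyph k."""
    glyphs = list(glyphs)
    glyph_count = len(coverages)                      # GlyphCount
    pos = 0
    while pos + glyph_count <= len(glyphs):
        if all(glyphs[pos + k] in coverages[k] for k in range(glyph_count)):
            for seq_index, lookup_index in records:   # design order
                target = pos + seq_index
                glyphs[target] = lookups[lookup_index](glyphs[target])
            pos += glyph_count                        # skip the matched context
        else:
            pos += 1
    return glyphs
```

In the tests, glyph IDs are assumed to equal character codes, and the rule applies a hypothetical swash lookup to the uppercase glyph at position 1.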
LookupType 6: Chaining Contextual Substitution Subtable

A Chaining Contextual Substitution subtable (ChainContextSubst) describes glyph substitutions in context with an ability to look back and/or look ahead in the sequence of glyphs. The design of the Chaining Contextual Substitution subtable is parallel to that of the Contextual Substitution subtable, including the availability of three formats for handling sequences of glyphs, glyph classes, or glyph sets. Each format can describe one or more backtrack, input, and lookahead sequences and one or more substitutions for each sequence.

Chaining Context Substitution Format 1: Simple Chaining Context Glyph Substitution

Format 1 defines the context for a glyph substitution as a particular sequence of glyphs. For example, a context could be <xyz>, <holiday>, <!?*#@>, or any other glyph sequence. Within a context sequence, Format 1 identifies particular glyph positions (not glyph indices) as the targets for specific substitutions. When a text-processing client locates a context in a string of text, it finds the lookup data for a targeted position and makes a substitution by applying the lookup data at that location.

To specify the context, the Coverage table lists the first glyph in the input sequence, and the ChainSubRule subtable defines the rest. Once a covered glyph is found at position i, the client reads the corresponding ChainSubRuleSet table and examines each ChainSubRule table in the set to determine whether it matches the surrounding glyphs in the text. In the simplest of cases, there is a match if the string <backtrack sequence>+<input sequence>+<lookahead sequence> matches the glyphs starting at position i - BacktrackGlyphCount in the text. LookupFlag values affect backtrack and lookahead sequences.

To clarify the ordering of the glyph arrays for input, backtrack, and lookahead sequences, the following illustration is provided. The input sequence match begins at position i. The backtrack sequence is ordered beginning at i - 1 and increases in offset value as one moves away from i. The lookahead sequence begins after the input sequence and increases in logical order.
| Logical order | a | b | c | d | e | f | g | h | i | j |
| Start of input match | | | | | i | | | | | |
| Input sequence | | | | | 0 | 1 | | | | |
| Backtrack sequence | 3 | 2 | 1 | 0 | | | | | | |
| Lookahead sequence | | | | | | | 0 | 1 | 2 | 3 |
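The array ordering in the illustration can be checked with a short Python sketch. The backtrack array is stored nearest-first, so element 0 is compared with the glyph at i - 1. The input and lookahead arrays run forward in logical order. To keep the sketch short, it takes the full input sequence, including the covered first glyph, and it ignores LookupFlag glyph skipping. `chain_rule_matches` is a hypothetical name.

```python
from typing import List


def chain_rule_matches(glyphs: List[int], i: int, backtrack: List[int],
                       input_seq: List[int], lookahead: List[int]) -> bool:
    """Test a chaining context whose input match starts at position i."""
    if i - len(backtrack) < 0 or i + len(input_seq) + len(lookahead) > len(glyphs):
        return False                                  # context runs off the string
    # backtrack[0] pairs with glyphs[i - 1], backtrack[1] with glyphs[i - 2], ...
    if any(glyphs[i - 1 - k] != g for k, g in enumerate(backtrack)):
        return False
    if glyphs[i:i + len(input_seq)] != input_seq:
        return False
    start = i + len(input_seq)                        # lookahead follows the input
    return glyphs[start:start + len(lookahead)] == lookahead
```

Applied to the illustration (glyphs a through j, with i at e), the rule matches only when the backtrack array is written as d, c, b, a. Writing it in logical order (a, b, c, d) fails.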
If there is a match, then the client finds the target glyph positions for substitutions and completes the substitutions. As in the ContextSubstFormat1 subtable, these lookups are required to operate within the range of text from the covered glyph to the end of the input sequence. No substitutions can be defined for the backtracking sequence or the lookahead sequence. Once the substitutions are complete, the client should move to the glyph position immediately following the matched input sequence and resume the lookup process from there.

A single ChainContextSubstFormat1 subtable may define more than one context glyph sequence. If different context sequences begin with the same glyph, then the Coverage table should list the glyph only once because all glyphs in the table must be unique. For example, if three contexts each start with an "s" and two start with a "t," then the Coverage table will list one "s" and one "t."

All of the ChainSubRule tables defining contexts that begin with the same first glyph are grouped together and defined in a ChainSubRuleSet table. For example, the ChainSubRule tables that define the three contexts that begin with an "s" are grouped in one ChainSubRuleSet table, and the ChainSubRule tables that define the two contexts that begin with a "t" are grouped in a second ChainSubRuleSet table. Each glyph listed in the Coverage table must have a ChainSubRuleSet table defining all the ChainSubRule tables that apply to a covered glyph.

A ChainContextSubstFormat1 subtable contains a format identifier (SubstFormat), an offset to a Coverage table (Coverage), a count of defined ChainSubRuleSets (ChainSubRuleSetCount), and an array of offsets to the ChainSubRuleSet tables (ChainSubRuleSet). As mentioned, one ChainSubRuleSet table must be defined for each glyph listed in the Coverage table. In the ChainSubRuleSet array, the ChainSubRuleSet table offsets are ordered in the Coverage Index order. The first ChainSubRuleSet in the array applies to the first GlyphID listed in the Coverage table, the second ChainSubRuleSet in the array applies to the second GlyphID listed in the Coverage table, and so on.

ChainContextSubstFormat1 subtable: Simple context glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | ChainSubRuleSetCount | Number of ChainSubRuleSet tables; must equal GlyphCount in Coverage table |
| Offset | ChainSubRuleSet[ChainSubRuleSetCount] | Array of offsets to ChainSubRuleSet tables, from beginning of Substitution table, ordered by Coverage Index |
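The Format 1 procedure (find the Coverage Index of the current glyph, pick the ChainSubRuleSet at that index, then try its ChainSubRule tables in order) can be sketched in Python. This is a minimal sketch under stated assumptions, not a shaping engine: glyphs are plain integer GlyphIDs, the Coverage table is modelled as a Python list whose positions are Coverage Indices, lookup flags (such as skipping marks) are ignored, and the names ChainSubRule, rule_matches, and find_rule are hypothetical. Following the backtrack illustration for LookupType 6, the backtrack array is compared outward from the glyph just before the input sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ChainSubRule:
    backtrack: List[int]   # backtrack[0] is the glyph just before the input sequence
    input_tail: List[int]  # Input array: input glyphs after the first (covered) glyph
    lookahead: List[int]   # glyphs after the input sequence, in text order
    records: List[Tuple[int, int]] = field(default_factory=list)  # (SequenceIndex, LookupListIndex)

@dataclass
class ChainContextSubstFormat1:
    coverage: List[int]                  # covered GlyphIDs; list position = Coverage Index
    rule_sets: List[List[ChainSubRule]]  # one ChainSubRuleSet per covered glyph, same order

def rule_matches(glyphs: List[int], i: int, rule: ChainSubRule) -> bool:
    """True if rule matches with its first input glyph at position i."""
    n_in = 1 + len(rule.input_tail)
    # The whole context (backtrack + input + lookahead) must fit in the glyph run.
    if i < len(rule.backtrack) or i + n_in + len(rule.lookahead) > len(glyphs):
        return False
    # Backtrack is compared outward: backtrack[k] against glyphs[i - 1 - k].
    if any(glyphs[i - 1 - k] != g for k, g in enumerate(rule.backtrack)):
        return False
    if glyphs[i + 1 : i + n_in] != rule.input_tail:
        return False
    return glyphs[i + n_in : i + n_in + len(rule.lookahead)] == rule.lookahead

def find_rule(subtable: ChainContextSubstFormat1, glyphs: List[int], i: int) -> Optional[ChainSubRule]:
    """Return the first matching ChainSubRule for the glyph at position i, or None."""
    if glyphs[i] not in subtable.coverage:
        return None
    cov_index = subtable.coverage.index(glyphs[i])
    # Rules in a ChainSubRuleSet are tried in preference order; the first match wins.
    for rule in subtable.rule_sets[cov_index]:
        if rule_matches(glyphs, i, rule):
            return rule
    return None
```

Because the rule set is chosen by Coverage Index, a glyph that is not in the Coverage table is rejected before any ChainSubRule is examined.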
A ChainSubRuleSet table consists of an array of offsets to ChainSubRule tables (ChainSubRule), ordered by preference, and a count of the ChainSubRule tables defined in the set (ChainSubRuleCount). The order in the ChainSubRule array can be critical. Consider two contexts, <abc> and <abcd>. If <abc> is first in the ChainSubRule array, all instances of <abc> in the text, including all instances of <abcd>, will be changed by the <abc> rule, so the <abcd> rule never applies. If <abcd> comes first in the array, however, <abcd> sequences are changed by the <abcd> rule, and the <abc> rule still applies to instances of <abc> that are not followed by d.
ChainSubRuleSet table: All contexts beginning with the same glyph
| Type | Name | Description |
| uint16 | ChainSubRuleCount | Number of ChainSubRule tables |
| Offset | ChainSubRule[ChainSubRuleCount] | Array of offsets to ChainSubRule tables, from beginning of ChainSubRuleSet table, ordered by preference |
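The effect of preference order in the <abc>/<abcd> example can be reproduced in a few lines. In this minimal sketch each character stands in for one glyph, only input sequences are modelled (no backtrack or lookahead), and pick_rule is a hypothetical helper.

```python
from typing import List, Optional

def pick_rule(text: str, i: int, rule_set: List[str]) -> Optional[str]:
    """Return the first rule (in array order) whose input sequence matches at i."""
    for rule in rule_set:
        # Rules are tried strictly in array order; the first complete match wins.
        if text.startswith(rule, i):
            return rule
    return None

short_first = ["abc", "abcd"]   # <abc> listed before <abcd>
long_first = ["abcd", "abc"]    # <abcd> listed before <abc>

print(pick_rule("abcd", 0, short_first))  # abc: the <abcd> rule is never reached
print(pick_rule("abcd", 0, long_first))   # abcd
print(pick_rule("abce", 0, long_first))   # abc
```

Listing longer, more specific contexts first is therefore the usual way to keep a shorter context from shadowing them.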
A ChainSubRule table consists of a count of the glyphs to be matched in each of the backtrack, input, and lookahead sequences (the input count includes the first glyph) and an array of glyph indices that describes each portion of the context. The Coverage table specifies the index of the first glyph of the input sequence, so the Input array begins with the second glyph (array index = 1) of the input sequence; the Backtrack and LookAhead arrays list every glyph in their sequences.
Note: The Input and LookAhead arrays list the indices in the order the corresponding glyphs appear in the text. For text written from right to left, the right-most glyph will be first; conversely, for text written from left to right, the left-most glyph will be first. The Backtrack array runs the other way: its first entry is the glyph immediately preceding the input sequence, and later entries move toward the logical beginning of the text.
A ChainSubRule table also contains a count of the substitutions to be performed on the input glyph sequence (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord). Each record specifies a position in the input glyph sequence and a LookupListIndex to the substitution lookup that is applied at that position. The array should list records in design order, that is, the order in which the lookups should be applied to the entire glyph sequence.
ChainSubRule subtable
| Type | Name | Description |
| uint16 | BacktrackGlyphCount | Total number of glyphs in the backtrack sequence (number of glyphs to be matched before the first glyph) |
| GlyphID | Backtrack[BacktrackGlyphCount] | Array of backtracking GlyphIDs (to be matched before the input sequence) |
| uint16 | InputGlyphCount | Total number of glyphs in the input sequence (includes the first glyph) |
| GlyphID | Input[InputGlyphCount - 1] | Array of input GlyphIDs (starting with the second glyph) |
| uint16 | LookaheadGlyphCount | Total number of glyphs in the lookahead sequence (number of glyphs to be matched after the input sequence) |
| GlyphID | LookAhead[LookaheadGlyphCount] | Array of lookahead GlyphIDs (to be matched after the input sequence) |
| uint16 | SubstCount | Number of SubstLookupRecords |
| struct | SubstLookupRecord[SubstCount] | Array of SubstLookupRecords (in design order) |
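Once a ChainSubRule matches, its SubstLookupRecords are applied in design order, each at a position counted from the first glyph of the input sequence. The sketch below illustrates why that order matters. It assumes, for simplicity, that every referenced lookup is a single substitution modelled as a Python dict from GlyphID to GlyphID; real lookups can be of any GSUB type, and lookups that change the glyph count (multiple or ligature substitution) would shift later positions, which this sketch does not handle. The name apply_records is hypothetical.

```python
from typing import Dict, List, Tuple

def apply_records(glyphs: List[int], input_start: int,
                  records: List[Tuple[int, int]],
                  lookup_list: List[Dict[int, int]]) -> List[int]:
    """Apply (SequenceIndex, LookupListIndex) records to a matched context.

    Simplification: each lookup is a single substitution (a dict), so the
    sequence length never changes.
    """
    out = list(glyphs)
    for seq_index, lookup_index in records:        # design order
        pos = input_start + seq_index              # SequenceIndex 0 = first input glyph
        lookup = lookup_list[lookup_index]
        out[pos] = lookup.get(out[pos], out[pos])  # glyphs the lookup does not cover stay as-is
    return out

lookups = [{2: 20}, {3: 30, 20: 200}]
print(apply_records([1, 2, 3, 4], 1, [(0, 0), (1, 1), (0, 1)], lookups))  # [1, 200, 30, 4]
print(apply_records([1, 2, 3, 4], 1, [(0, 1), (1, 1), (0, 0)], lookups))  # [1, 20, 30, 4]
```

In the first call the second lookup sees the output of the first at position 0 of the input sequence; reversing the records changes the result.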
ChainContextSubstFormat2 subtable: Class-based chaining context glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 2 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| Offset | BacktrackClassDef | Offset to glyph ClassDef table containing backtrack sequence data, from beginning of Substitution table |
| Offset | InputClassDef | Offset to glyph ClassDef table containing input sequence data, from beginning of Substitution table |
| Offset | LookaheadClassDef | Offset to glyph ClassDef table containing lookahead sequence data, from beginning of Substitution table |
| uint16 | ChainSubClassSetCnt | Number of ChainSubClassSet tables |
| Offset | ChainSubClassSet[ChainSubClassSetCnt] | Array of offsets to ChainSubClassSet tables, from beginning of Substitution table, ordered by input class; may be NULL |
Each context is defined in a ChainSubClassRule table, and all ChainSubClassRules that specify contexts beginning with the same class value are grouped in a ChainSubClassSet table. Consequently, the ChainSubClassSet containing a context identifies a context's first class component. Each ChainSubClassSet table consists of a count of the ChainSubClassRule tables defined in the ChainSubClassSet (ChainSubClassRuleCnt) and an array of offsets to ChainSubClassRule tables (ChainSubClassRule). The ChainSubClassRule tables are ordered by preference in the ChainSubClassRule array of the ChainSubClassSet.
ChainSubClassSet subtable
| Type | Name | Description |
| uint16 | ChainSubClassRuleCnt | Number of ChainSubClassRule tables |
| Offset | ChainSubClassRule[ChainSubClassRuleCnt] | Array of offsets to ChainSubClassRule tables, from beginning of ChainSubClassSet, ordered by preference |
For each context, a ChainSubClassRule table contains a count of the classes in each of the backtrack, input, and lookahead sequences (BacktrackGlyphCount, InputGlyphCount, and LookaheadGlyphCount); the input count includes the first class. The Input array lists the input classes beginning with the second class (array index = 1); the Backtrack and LookAhead arrays list every class in their sequences.
Note: Text order depends on the writing direction of the text. For text written from right to left, the right-most class will be first. Conversely, for text written from left to right, the left-most class will be first. The Backtrack array, however, starts with the class of the glyph immediately preceding the input sequence and proceeds toward the logical beginning of the text.
The values in each array are the class values defined in the corresponding ClassDef table: BacktrackClassDef for the Backtrack array, InputClassDef for the Input array, and LookaheadClassDef for the LookAhead array. The first class of the input sequence is identified in the ChainContextSubstFormat2 table by the ChainSubClassSet array index of the corresponding ChainSubClassSet.
A ChainSubClassRule also contains a count of the substitutions to be performed on the context (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord) that supply the substitution data. For each position in the context that requires a substitution, a SubstLookupRecord specifies a LookupList index and a position in the input glyph sequence where the lookup is applied. The SubstLookupRecord array lists SubstLookupRecords in design order, that is, the order in which lookups should be applied to the entire glyph sequence.
ChainSubClassRule table: Chaining context definition for one class
| Type | Name | Description |
| uint16 | BacktrackGlyphCount | Total number of classes in the backtrack sequence (number of classes to be matched before the first class) |
| uint16 | Backtrack[BacktrackGlyphCount] | Array of backtracking classes (to be matched before the input sequence) |
| uint16 | InputGlyphCount | Total number of classes in the input sequence (includes the first class) |
| uint16 | Input[InputGlyphCount - 1] | Array of input classes (starting with the second class; to be matched with the input glyph sequence) |
| uint16 | LookaheadGlyphCount | Total number of classes in the lookahead sequence (number of classes to be matched after the input sequence) |
| uint16 | LookAhead[LookaheadGlyphCount] | Array of lookahead classes (to be matched after the input sequence) |
| uint16 | SubstCount | Number of SubstLookupRecords |
| struct | SubstLookupRecord[SubstCount] | Array of SubstLookupRecords (in design order) |
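Class-based matching differs from Format 1 mainly in that each position is tested against a class from its own ClassDef table, so the same glyph can carry different class values in the backtrack, input, and lookahead sequences. The sketch below assumes ClassDef tables modelled as Python dicts (glyphs not listed fall into class 0), the Coverage table as a set, and a None entry standing for a NULL ChainSubClassSet offset; lookup flags are ignored, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

@dataclass
class ChainSubClassRule:
    backtrack: List[int]   # BacktrackClassDef classes; backtrack[0] is the glyph before the input
    input_tail: List[int]  # InputClassDef classes, starting with the second input glyph
    lookahead: List[int]   # LookaheadClassDef classes, in text order

@dataclass
class ChainContextSubstFormat2:
    coverage: Set[int]                 # glyphs that can start a context
    backtrack_classes: Dict[int, int]  # BacktrackClassDef: GlyphID -> class
    input_classes: Dict[int, int]      # InputClassDef
    lookahead_classes: Dict[int, int]  # LookaheadClassDef
    class_sets: List[Optional[List[ChainSubClassRule]]]  # indexed by input class; None = NULL offset

def class_of(class_def: Dict[int, int], glyph: int) -> int:
    # Glyphs not listed in a ClassDef table belong to class 0.
    return class_def.get(glyph, 0)

def seq_matches(class_def: Dict[int, int], glyphs: List[int],
                positions: Iterable[int], classes: List[int]) -> bool:
    return all(class_of(class_def, glyphs[p]) == c for p, c in zip(positions, classes))

def find_class_rule(st: ChainContextSubstFormat2, glyphs: List[int], i: int) -> Optional[ChainSubClassRule]:
    if glyphs[i] not in st.coverage:
        return None
    first = class_of(st.input_classes, glyphs[i])  # selects the ChainSubClassSet
    rules = st.class_sets[first] if first < len(st.class_sets) else None
    for rule in rules or []:                       # preference order
        n_in = 1 + len(rule.input_tail)
        if i < len(rule.backtrack) or i + n_in + len(rule.lookahead) > len(glyphs):
            continue
        back_pos = [i - 1 - k for k in range(len(rule.backtrack))]
        in_pos = range(i + 1, i + n_in)
        ahead_pos = range(i + n_in, i + n_in + len(rule.lookahead))
        if (seq_matches(st.backtrack_classes, glyphs, back_pos, rule.backtrack)
                and seq_matches(st.input_classes, glyphs, in_pos, rule.input_tail)
                and seq_matches(st.lookahead_classes, glyphs, ahead_pos, rule.lookahead)):
            return rule
    return None
```

For instance, with glyph 12 in input class 2 but lookahead class 3, a rule such as `ChainSubClassRule([], [], [3])` matches glyph 12 only when it follows the input glyph.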
Chaining Context Substitution Format 3: Coverage-based Chaining Context Glyph Substitution
Format 3 defines a chaining context rule as a sequence of Coverage tables. Each position in the sequence may define a different Coverage table for the set of glyphs that matches the context pattern. With Format 3, the glyph sets defined in the different Coverage tables may intersect, unlike Format 2, which specifies fixed class assignments (identical for each position in the backtrack, input, or lookahead sequence) and exclusive classes (a glyph cannot be in more than one class at a time).
Note: The order of the Coverage tables listed in the Coverage arrays must follow the writing direction. For text written from right to left, the right-most glyph will be first. Conversely, for text written from left to right, the left-most glyph will be first. The exception is the backtrack array, whose first Coverage table applies to the glyph immediately preceding the input sequence.
The subtable also contains a count of the substitutions to be performed on the input Coverage sequence (SubstCount) and an array of SubstLookupRecords (SubstLookupRecord) in design order, that is, the order in which lookups should be applied to the entire glyph sequence. (SubstLookupRecords are described below.)
ChainContextSubstFormat3 subtable: Coverage-based chaining context glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 3 |
| uint16 | BacktrackGlyphCount | Number of glyphs in the backtracking sequence |
| Offset | Coverage[BacktrackGlyphCount] | Array of offsets to coverage tables in backtracking sequence, in glyph sequence order |
| uint16 | InputGlyphCount | Number of glyphs in input sequence |
| Offset | Coverage[InputGlyphCount] | Array of offsets to coverage tables in input sequence, in glyph sequence order |
| uint16 | LookaheadGlyphCount | Number of glyphs in lookahead sequence |
| Offset | Coverage[LookaheadGlyphCount] | Array of offsets to coverage tables in lookahead sequence, in glyph sequence order |
| uint16 | SubstCount | Number of SubstLookupRecords |
| struct | SubstLookupRecord[SubstCount] | Array of SubstLookupRecords, in design order |
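In Format 3 there is no separate Coverage field and no rule set: the input Coverage array itself includes the first glyph, and every position is tested against its own Coverage table. The sketch below models each Coverage table as a Python set of GlyphIDs (an assumption for illustration) and ignores lookup flags; format3_matches is a hypothetical name. Reusing one set at several positions shows the overlapping coverage that Format 2 classes cannot express.

```python
from typing import List, Set

def format3_matches(glyphs: List[int], i: int,
                    backtrack_cov: List[Set[int]],
                    input_cov: List[Set[int]],
                    lookahead_cov: List[Set[int]]) -> bool:
    """True if the Format 3 context matches with its first input glyph at position i.

    input_cov includes the first input glyph; backtrack_cov[0] tests glyphs[i - 1].
    """
    n_in = len(input_cov)
    if i < len(backtrack_cov) or i + n_in + len(lookahead_cov) > len(glyphs):
        return False
    return (all(glyphs[i - 1 - k] in cov for k, cov in enumerate(backtrack_cov))
            and all(glyphs[i + k] in cov for k, cov in enumerate(input_cov))
            and all(glyphs[i + n_in + k] in cov for k, cov in enumerate(lookahead_cov)))

digits = {1, 2, 3}  # hypothetical GlyphIDs; the same set is used at two positions
print(format3_matches([1, 2], 0, [], [digits], [digits]))  # True
print(format3_matches([1, 9], 0, [], [digits], [digits]))  # False
```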
ExtensionSubstFormat1 subtable
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier. Set to 1. |
| uint16 | ExtensionLookupType | Lookup type of the subtable referenced by ExtensionOffset (that is, the extension subtable). |
| uint32 | ExtensionOffset | Offset to the extension subtable, of lookup type ExtensionLookupType, relative to the start of the ExtensionSubstFormat1 subtable. |
ExtensionLookupType must be set to any lookup type other than 7. All subtables in a LookupType 7 lookup must have the same ExtensionLookupType. All offsets in the extension subtables are set in the usual way, i.e. relative to the extension subtables themselves.
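Resolving an ExtensionSubstFormat1 subtable amounts to reading its 8 bytes and adding the 32-bit ExtensionOffset to the subtable's own position. The sketch below uses Python's standard struct module on raw big-endian bytes; it assumes the caller has already located each subtable (subtable_offsets) from the Lookup table, and the function names are hypothetical rather than part of any font library.

```python
import struct
from typing import List, Tuple

def read_extension_subst(data: bytes, subtable_offset: int) -> Tuple[int, int]:
    """Return (ExtensionLookupType, absolute offset of the wrapped subtable)."""
    # Big-endian uint16 SubstFormat, uint16 ExtensionLookupType, uint32 ExtensionOffset.
    fmt, ext_type, ext_offset = struct.unpack_from(">HHI", data, subtable_offset)
    if fmt != 1:
        raise ValueError(f"unsupported ExtensionSubst format {fmt}")
    if ext_type == 7:
        raise ValueError("ExtensionLookupType must not be 7")
    # ExtensionOffset is relative to the start of this ExtensionSubstFormat1 subtable.
    return ext_type, subtable_offset + ext_offset

def read_extension_lookup(data: bytes, subtable_offsets: List[int]) -> List[Tuple[int, int]]:
    """Resolve every subtable of a LookupType 7 lookup and check they share one type."""
    resolved = [read_extension_subst(data, off) for off in subtable_offsets]
    if len({ext_type for ext_type, _ in resolved}) > 1:
        raise ValueError("all subtables must share one ExtensionLookupType")
    return resolved
```

The 32-bit offset is what lets the wrapped subtable sit more than 64 KB away, beyond the reach of the 16-bit Offset fields used elsewhere in GSUB.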
Substitution Lookup Record
All contextual substitution subtables specify the substitution data in a Substitution Lookup Record (SubstLookupRecord). Each record contains a SequenceIndex, which indicates the position where the substitution will occur in the glyph sequence, counted from the first glyph of the input sequence (index 0). In addition, a LookupListIndex identifies the lookup to be applied at the glyph position specified by the SequenceIndex. The contextual substitution subtables defined in Examples 7, 8, and 9 at the end of this chapter show SubstLookupRecords.
LookupType 8: Reverse Chaining Contextual Single Substitution Subtable
The Reverse Chaining Contextual Single Substitution subtable (ReverseChainSingleSubst) describes single glyph substitutions in context, with the ability to look back and/or look ahead in the sequence of glyphs. The major difference between this and the other lookup types is that the input glyph sequence is processed from end to start. Compared to Chaining Contextual Substitution, this lookup type is restricted to a coverage-based subtable format: the input sequence may contain only a single glyph, and only a single substitution is allowed on that glyph. The substitution rule is integrated into the subtable format rather than referenced through SubstLookupRecords.
This lookup type is designed specifically for Arabic script writing styles such as nastaliq, where the shape of a glyph is determined by the following glyph, beginning at the last glyph of the "joor", or set of connected glyphs. An example of this lookup type is defined in Example 10 at the end of this chapter.
Reverse Chaining Contextual Single Substitution Format 1: Coverage-based Reverse Chaining Contextual Single Glyph Substitution
Format 1 defines a chaining context rule as a sequence of Coverage tables. Each position in the sequence may define a different Coverage table for the set of glyphs that matches the context pattern. With Format 1, the glyph sets defined in the different Coverage tables may intersect.
Note: Despite reverse-order processing, the Coverage tables in the Coverage arrays must be listed in logical order (following the writing direction). The backtrack sequence is as illustrated in the LookupType 6: Chaining Contextual Substitution subtable. The input sequence is one glyph located at i in the logical string. The backtrack begins at i - 1 and increases in offset value as one moves toward the logical beginning of the string. The lookahead sequence begins at i + 1 and increases in offset value as one moves toward the logical end of the string. In the reverse chaining process, i begins at the logical end of the string and moves toward the beginning.
The subtable contains a Coverage table for the input glyph, Coverage table arrays for the backtrack and lookahead sequences, a count of the output glyph indices in the Substitute array (GlyphCount), and a list of the output glyph indices (Substitute array). The Substitute array must contain the same number of glyph indices as the Coverage table. To locate the corresponding output glyph index in the Substitute array, this format uses the Coverage Index returned from the Coverage table.
ReverseChainSingleSubstFormat1 subtable: Coverage-based reverse chaining contextual single glyph substitution
| Type | Name | Description |
| uint16 | SubstFormat | Format identifier: format = 1 |
| Offset | Coverage | Offset to Coverage table, from beginning of Substitution table |
| uint16 | BacktrackGlyphCount | Number of glyphs in the backtracking sequence |
| Offset | Coverage[BacktrackGlyphCount] | Array of offsets to coverage tables in backtracking sequence, in glyph sequence order |
| uint16 | LookaheadGlyphCount | Number of glyphs in lookahead sequence |
| Offset | Coverage[LookaheadGlyphCount] | Array of offsets to coverage tables in lookahead sequence, in glyph sequence order |
| uint16 | GlyphCount | Number of GlyphIDs in the Substitute array |
| GlyphID | Substitute[GlyphCount] | Array of substitute GlyphIDs, ordered by Coverage Index |
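Reverse processing means that when position i is tested, every glyph after it has already been visited, so its lookahead context reflects earlier substitutions. The sketch below applies one ReverseChainSingleSubstFormat1 subtable under simplifying assumptions: the input Coverage table is a sorted list of GlyphIDs (searched with the standard bisect module), backtrack and lookahead Coverage tables are Python sets, lookup flags are ignored, and the glyph values and function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Set

def coverage_index(coverage: List[int], glyph: int) -> int:
    """Coverage Index of glyph in a sorted list of GlyphIDs, or -1 if not covered."""
    k = bisect_left(coverage, glyph)
    return k if k < len(coverage) and coverage[k] == glyph else -1

def apply_reverse_chain(glyphs: List[int], coverage: List[int], substitute: List[int],
                        backtrack_cov: List[Set[int]], lookahead_cov: List[Set[int]]) -> List[int]:
    """Apply one ReverseChainSingleSubstFormat1 subtable, from the logical end to the start."""
    if len(substitute) != len(coverage):
        raise ValueError("Substitute must have one GlyphID per covered glyph")
    out = list(glyphs)
    for i in range(len(out) - 1, -1, -1):
        k = coverage_index(coverage, out[i])
        if k < 0:
            continue
        if i < len(backtrack_cov) or i + 1 + len(lookahead_cov) > len(out):
            continue
        if not all(out[i - 1 - j] in cov for j, cov in enumerate(backtrack_cov)):
            continue
        if not all(out[i + 1 + j] in cov for j, cov in enumerate(lookahead_cov)):
            continue
        # Lookahead glyphs were visited earlier in this loop and may already be substituted.
        out[i] = substitute[k]
    return out

# Hypothetical glyphs: 1 = default form, 2 = alternate form, 9 = last glyph of the joor.
print(apply_reverse_chain([1, 1, 1, 9], [1], [2], [], [{2, 9}]))  # [2, 2, 2, 9]
```

The change cascades from the end of the joor toward its start. A hypothetical forward pass of the same rule would produce [1, 1, 2, 9], because the earlier glyphs would still see unsubstituted lookahead glyphs.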




