Shaperglot - Test font files for OpenType language support
Shaperglot is a library and a utility for testing a font’s language support. You give it a font, and it tells you what languages are supported and to what degree.
Most other libraries to check for language support (for example, Rosetta’s wonderful hyperglot library) do this by looking at the Unicode codepoints that the font supports. Shaperglot takes a different approach.
What’s wrong with the Unicode codepoint coverage approach?
For many common languages, it’s possible to check that the language is supported just by looking at the Unicode coverage. For example, to support English, you need the 26 lowercase and uppercase letters of the Latin alphabet.
However, for the majority of scripts around the world, covering the codepoints needed is not enough to say that a font really supports a particular language. For correct language support, the font must also behave in a particular way.
Take the case of Arabic as an example. A font might contain glyphs which cover
all the codepoints in the Arabic block (0x600-0x6FF). But the font only supports
Arabic if it implements joining rules for the init, medi and fina features.
To say that a font supports Devanagari, it needs to implement conjuncts (exactly
which conjuncts must be included before we can say the font “supports”
Devanagari is debated…) and half forms, as well as contain a languagesystem
statement which triggers Indic reordering.
Even within the Latin script, a font only supports a language such as Turkish if its casing behaviour respects the dotless / dotted I distinction; a font only supports Navajo if its ogonek anchoring is different to the anchoring used in Polish; and so on.
But there’s a further problem with testing language support by codepoint coverage: it encourages designers to “fill in the blanks” to get to support, rather than necessarily engage with the textual requirements of particular languages.
Testing for behaviour, not coverage
Shaperglot therefore determines language support not just on codepoint coverage, but also by examining how the font behaves when confronted with certain character sequences.
The trick is to do this in a way which is not prescriptive. We know that there are many different ways of implementing language support within a font, and that design and other considerations will factor into precisely how a font is constructed. Shaperglot presents the font with different strings, and makes sure that “something interesting happened” - without necessarily specifying what.
In the case of Arabic, we need to know that the init feature is present, and that
when we shape some Arabic glyphs, the output with init turned on is different
to the output with init turned off. We don’t care what’s different; we only
care that something has happened. (Yes, this still makes it possible to trick shaperglot into reporting support for a language which is not correctly implemented, but at that point, it’s probably less effort to actually implement it…)
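The on/off comparison described above can be sketched in a few lines. This is a minimal illustration, not Shaperglot's own implementation: feature_changes_output and toy_shaper are hypothetical names, and toy_shaper is a stand-in that fakes an init substitution so the example runs without a font file. A real check would call a shaping engine such as HarfBuzz in its place.

```python
from typing import Callable, Dict, List

# A shaper takes text plus feature on/off settings and returns glyph names.
Shaper = Callable[[str, Dict[str, bool]], List[str]]


def feature_changes_output(shape: Shaper, text: str, feature: str) -> bool:
    """Return True if toggling `feature` changes the shaped output at all."""
    with_feature = shape(text, {feature: True})
    without_feature = shape(text, {feature: False})
    # We do not care what differs, only that something does.
    return with_feature != without_feature


def toy_shaper(text: str, features: Dict[str, bool]) -> List[str]:
    """Hypothetical stand-in for a real shaper: fakes an 'init' substitution."""
    glyphs = [f"uni{ord(ch):04X}" for ch in text]
    if glyphs and features.get("init", True):
        glyphs[0] += ".init"
    return glyphs


# Two ARABIC LETTER BEH codepoints as sample input.
print(feature_changes_output(toy_shaper, "\u0628\u0628", "init"))  # True
print(feature_changes_output(toy_shaper, "\u0628\u0628", "fina"))  # False
```

The second call returns False because the toy shaper does nothing for fina, which is the situation a behavioural check is meant to catch.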
Shaperglot includes the following kinds of test:
Certain codepoints were mapped to base or mark glyphs.
A named feature was present.
A named feature changed the output glyphs.
A mark glyph was attached to a base glyph or composed into a precomposed glyph (but not left unattached).
Certain glyphs in the output were different to one another.
Languagesystems were defined in the font.
…
Using Shaperglot
Shaperglot consists of multiple components:
Shaperglot Web interface
The easiest way to use Shaperglot as an end-user or font developer is through the web interface. This allows you to drag and drop a font to analyze its language coverage. This is entirely client-side, and all fonts remain on your computer. Nothing is uploaded.
Shaperglot command line tools
The next most user-friendly way to use Shaperglot is at the command line. You can install the latest version with:
cargo install --git https://github.com/googlefonts/shaperglot
This will provide you with a new tool called shaperglot. Its subcommands include:
- shaperglot check <font> <language> <language>... checks whether a font supports the given language IDs.
- shaperglot report <font> reports all languages supported by the font.
- shaperglot describe <language> explains what needs to be done for a font to support a given language ID.
$ shaperglot describe Nuer
The font MUST support the following Nuer bases and marks: 'a', 'A', 'ä', 'Ä', 'a̱', 'A̱', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'e̱', 'E̱', 'ɛ', 'Ɛ', 'ɛ̈', 'Ɛ̈', 'ɛ̱', 'Ɛ̱', 'ɛ̱̈', 'Ɛ̱̈', 'f', 'F', 'g', 'G', 'ɣ', 'Ɣ', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'i̱', 'I̱', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'ŋ', 'Ŋ', 'o', 'O', 'ö', 'Ö', 'o̱', 'O̱', 'ɔ', 'Ɔ', 'ɔ̈', 'Ɔ̈', 'ɔ̱', 'Ɔ̱', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z', '◌̈', '◌̱'
The font SHOULD support the following auxiliary orthography codepoints: 'ʈ', 'Ʈ'
Latin letters should form small caps when the smcp feature is enabled
Shaperglot Rust library
See the documentation on https://docs.rs/shaperglot/latest
Shaperglot Python library
The Python library wraps the Rust library using PyO3. This new PyO3 implementation
broadly follows the same API as the original 0.x Python implementation, but all
imports are at the top level (from shaperglot import Checker, etc.). The PyO3
version is available as a pre-release from PyPI.
Python Library Documentation: https://shaperglot.readthedocs.io/en/latest/
Library usage
Reading the code of the CLI tool is a good way to understand how to use the library. However, the most common use case - checking a font for language support - looks like this:
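from shaperglot import Checker, Languages

filename = "MyFont.ttf"  # Placeholder: path to the font file to test
langs = Languages()  # Load the language database
checker = Checker(filename)  # Create a checker context for the font
supported = []
for lang_id in langs.keys():  # keys() lists every language ID in the database
    language = langs[lang_id]  # Look up the Language object by its ID
    if checker.check(language).score > 80:  # More than 80% is good enough in practice
        supported.append(lang_id)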
Running checks and getting results
- class shaperglot.Checker(filename)
The context for running font language support checks
This is the main entry point to shaperglot; it is used to load a font and run checks against it.
- check(lang)
Run a check against the font
- Parameters:
lang – A Language object obtained from the Languages directory.
- Returns:
A Reporter object with the results of checking the font for language coverage.
- class shaperglot.Reporter
The result of testing a font for support of a particular language
The Reporter object can be iterated on to return a list of CheckResult objects.
- fails
Failing checks
This returns CheckResult objects for all checks which failed.
- fixes_required
Number of fixes required to add support.
- is_nearly_success(fixes)
Whether the font can be easily fixed to support the language.
The audience of this method is the designer of the font, not the user of the font. It returns True if the font requires fewer than fixes fixes to support the language, where fixes is the argument passed in.
- is_success
Whether the font fully supports the language
This returns True if the font fully supports the language. Note that “fully” is a relatively high standard. For practical usage, a score of more than 80% is good enough.
- is_unknown
Whether the language support could not be determined
If the languages database does not contain enough information about a language to determine whether or not a font supports it - for example, if there are no base characters defined - then the support level will be “indeterminate”, and this will return True.
- score
The score of the font for the language
Returns how supported the language is, as a percentage. Shaperglot is calibrated so that a score of 80% is adequate for everyday use. However, language support can often be improved - for example, by supporting optional auxiliary glyphs, adding small caps support, and so on.
- support_level
The support level of the font for the language
- Returns a string describing the support level; one of:
“none”: No support at all; the checker hit a “stop now” condition, usually caused by a missing mandatory base
“complete”: Nothing can be done to improve this font’s language support.
“supported”: There were no FAILs or WARNs, but some optional SKIPs which suggest possible improvements
“incomplete”: The support is incomplete, but usable; i.e. there were WARNs, but no FAILs
“unsupported”: The language is not usable; i.e. there were FAILs
“indeterminate”: The language support could not be determined, usually due to an incomplete language definition.
- to_summary_string(language)
The summary of the font’s support for the language
Returns a summary of the font’s support for the language, in the form of a string suitable for display to the user. e.g.:
"Font fully supports en_Latn (English): 95%"
- unique_fixes()
The set of unique fixes which need to be made to add support.
The audience of this method is the designer of the font, not the user of the font. This returns a dictionary of fixes required, where the key is the area of support and the value is the set of fixes required.
- warns
Warnings
This returns CheckResult objects for all checks which returned a warning status.
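As a rough sketch of how the Reporter attributes above might be combined, the helper below sorts a result into a coarse bucket. The function name triage, its bucket labels, and the default of five fixes are all assumptions made for illustration; only is_unknown, is_success, score and is_nearly_success(fixes) come from the documentation above. A stub object stands in for a real Reporter so the example runs without a font file.

```python
from types import SimpleNamespace


def triage(reporter, max_fixes=5, threshold=80):
    """Classify a Reporter-like object into a coarse, hypothetical bucket."""
    if reporter.is_unknown:
        return "unknown"  # the language definition was too sparse to judge
    if reporter.is_success or reporter.score > threshold:
        return "usable"  # fully supported, or good enough for everyday use
    if reporter.is_nearly_success(max_fixes):
        return "nearly"  # a font designer could close the gap with few fixes
    return "unsupported"


# Stand-in for a real Reporter (values invented for the demonstration).
fake = SimpleNamespace(
    is_unknown=False,
    is_success=False,
    score=62.5,
    is_nearly_success=lambda fixes: 3 < fixes,  # pretend 3 fixes are required
)
print(triage(fake))  # nearly
```

With the real library, the same helper would presumably be called as triage(checker.check(language)).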
- class shaperglot.CheckResult
The result of running a check
Remembering that determining language support is made up of multiple checks which are added together, the result of an individual check could tell us, for example, that all base characters are present, or that some are missing; that some auxiliary characters are missing; that shaping expectations were not met for a particular combination, and so on.
Looking in CheckResults can give us a lower-level indication of what is needed for support to be added for a particular language; for a higher-level overview (“is this language supported or not?”), look at the Reporter object.
- is_success
Whether the check was successful
- message
The message of the check result
- problems
The problems found during the check
These “problems” are aimed towards font designers, to guide them towards adding support for a particular language.
- Returns:
A list of problems found during the check
- Return type:
List[Problem]
- status_code
The result of the check
- Returns:
The result of the check - one of “PASS”, “WARN”, “FAIL”, “SKIP” or “STOP”
- Return type:
str
- class shaperglot.Problem
A problem found during a check
- check_name
The name of the check that found the problem
- code
A status code (e.g. bases-missing)
- context
The context of the problem
- Returns:
A dictionary of additional information about the problem
- Return type:
dict
- message
A textual description of the problem
- terminal
Whether the problem is terminal
Some problems are so bad that there’s no point testing for any more language coverage. (Imagine checking a font for Arabic support which is missing the letter BEH. Once you’ve determined that, there’s not much point checking if it supports correct shaping behaviour.)
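To show how the CheckResult and Problem attributes fit together, here is a minimal sketch that walks a Reporter and gathers every problem from the checks that did not pass. The helper name collect_problems is hypothetical, and the stub data (including the check name and message text) is invented so the example runs without a font; only the attribute names and the bases-missing code are taken from the documentation above.

```python
from types import SimpleNamespace


def collect_problems(reporter):
    """Return one tuple per problem found in non-passing checks."""
    rows = []
    for result in reporter:  # iterating a Reporter yields CheckResult objects
        if result.status_code == "PASS":
            continue
        for problem in result.problems:
            rows.append((
                result.status_code,
                problem.check_name,
                problem.code,
                problem.message,
                problem.terminal,
            ))
    return rows


# Stand-in for a real Reporter: a plain list of CheckResult-like stubs.
fake_reporter = [
    SimpleNamespace(status_code="PASS", problems=[]),
    SimpleNamespace(
        status_code="FAIL",
        problems=[SimpleNamespace(
            check_name="orthography",         # invented check name
            code="bases-missing",
            message="placeholder message",    # invented message text
            terminal=True,
        )],
    ),
]

for row in collect_problems(fake_reporter):
    print(row)
```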
Handling languages
- class shaperglot.Languages
The language database
Instantiating a Languages object loads the database and fills it with checks. The database can be used like a Python dictionary, with the language ID as the key. Language IDs are made up of an ISO 639-3 language code, an underscore, and an ISO 15924 script code. (e.g. en_Latn for English in the Latin script.)
- disambiguate(lang)
Try to find a matching language ID given an ID or name
This will try to find a language ID that matches the given string; it will return a list of candidate language IDs. For example, if you provide “en”, it will return “en_Latn” and “en_Cyrl” if those are in the database. Otherwise, it will look for a matching name - if you provide “english”, it will return “en_Latn”.
- Parameters:
lang (str) – The language ID or name to search for
- Returns:
A list of candidate language IDs
- Return type:
List[str]
- keys()
Get a list of all language IDs in the database
- Returns:
A list of all language IDs in the database
- Return type:
List[str]
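Because disambiguate() returns a list of candidates, calling code has to decide what to do when there are none or several. The sketch below shows one possible policy; resolve_language_id is a hypothetical helper, and StubLanguages is a stand-in with canned answers (mirroring the examples above) so the code runs without the real database.

```python
def resolve_language_id(langs, query):
    """Return the single language ID matching `query`, or raise ValueError."""
    candidates = langs.disambiguate(query)
    if not candidates:
        raise ValueError(f"No language matches {query!r}")
    if len(candidates) > 1:
        raise ValueError(f"{query!r} is ambiguous: {', '.join(candidates)}")
    return candidates[0]


class StubLanguages:
    """Stand-in for shaperglot.Languages with canned disambiguation results."""

    _canned = {"english": ["en_Latn"], "en": ["en_Latn", "en_Cyrl"]}

    def disambiguate(self, lang):
        return self._canned.get(lang, [])


print(resolve_language_id(StubLanguages(), "english"))  # en_Latn
```

Raising on ambiguity is only one choice; a command line tool might instead list the candidates and ask the user to pick.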
- class shaperglot.Language
A language in the database
For backwards compatibility, this can be used as a dictionary in a very limited way; the following keys are supported:
name: The name of the language
language: The language code
autonym: The autonym of the language (name of the language in the language itself)
- auxiliaries
Auxiliary characters
Auxiliary characters are not required but are recommended for support. The most common case for these is for borrowed words which are occasionally used within the language. For example, the letter é is not a required character to support the English language, but the word “café” is used in English and includes the letter é, so é is an auxiliary character.
- Returns:
A list of auxiliary characters
- Return type:
List[str]
- bases
Base characters needed for support
- Returns:
A list of base characters needed for support
- Return type:
List[str]
- marks
Marks needed for support
- Returns:
A list of marks needed for support
- Return type:
List[str]
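The Language attributes can be read together to get a quick picture of an orthography. The following is a small sketch under stated assumptions: orthography_summary is a hypothetical helper, and StubLanguage imitates the documented interface (dictionary access for name, plus the bases, marks and auxiliaries attributes) with a deliberately truncated character list rather than real database contents.

```python
def orthography_summary(language):
    """Summarise how many characters of each kind a language defines."""
    return (
        f"{language['name']}: "
        f"{len(language.bases)} bases, "
        f"{len(language.marks)} marks, "
        f"{len(language.auxiliaries)} auxiliary characters"
    )


class StubLanguage(dict):
    """Stand-in for shaperglot.Language (dict keys plus list attributes)."""

    def __init__(self, name, bases, marks, auxiliaries):
        super().__init__(name=name)
        self.bases = bases
        self.marks = marks
        self.auxiliaries = auxiliaries


# Truncated sample data, not the real English definition.
stub = StubLanguage("English", ["a", "b", "c"], [], ["é"])
print(orthography_summary(stub))  # English: 3 bases, 0 marks, 1 auxiliary characters
```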
Low level check information
- class shaperglot.Check
A check to be executed
This is a high-level check which is looking for a particular piece of behaviour in a font. It may be made up of multiple “implementations” which are the actual code that is run to check for the behaviour. For example, an orthography check will first check bases, then marks, then auxiliary codepoints. The implementations for this check would be “given this list of bases, ensure the font has coverage for all of them”, and so on.
- description
A human-readable description of the check
- Returns:
A string describing the check
- implementations
An array of human-readable descriptions for what the check does.
- Returns:
An array of strings describing the check
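A Check's two attributes lend themselves to a simple indented listing. The sketch below assumes a Check object is already in hand (how to obtain one is not covered here); format_check is a hypothetical helper and the stub's text is invented placeholder wording based on the orthography example above.

```python
from types import SimpleNamespace


def format_check(check):
    """Render a check's description with its implementations indented below."""
    lines = [check.description]
    lines.extend(f"  - {impl}" for impl in check.implementations)
    return "\n".join(lines)


# Stand-in for a real Check; the strings are placeholders.
stub_check = SimpleNamespace(
    description="Orthography check",
    implementations=[
        "ensure the font covers all bases",
        "ensure the font covers all marks",
    ],
)
print(format_check(stub_check))
```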