Announcing the Allsorts font parser and shaping engine implemented in Rust

Today YesLogic is open-sourcing the Allsorts font parser, shaping engine, and subsetter for OpenType, WOFF, and WOFF2 under the Apache 2.0 license. Allsorts was extracted from the Prince HTML to PDF typesetting and layout tool and is implemented in Rust.

Prince is a tool that converts HTML and CSS into well laid out PDFs. Our customers use it to produce books, restaurant menus, invoices, business cards, and much more. Two years ago YesLogic set out to improve the font parsing and shaping in Prince.

Font shaping is the process of laying out the glyphs of a font in order to represent some input text. Rasterisation of the glyphs is a separate process. Font shaping for Latin text is quite simple. For some scripts, like those used by Indic languages, it is quite complex and requires reordering and substituting the glyphs in each syllable to produce the final output. There are only three main font shaping engines in use today: DirectWrite on Windows, CoreText on macOS and iOS, and HarfBuzz on open-source operating systems and in some web browsers. Of these, only HarfBuzz is open source.
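To make the idea of shaping concrete, here is a deliberately tiny sketch of the simple Latin case: map each character to a glyph, then substitute a ligature where the font defines one. This is not the Allsorts API. `ToyFont`, `toy_font`, and the glyph IDs are hypothetical names and values invented for illustration, and a real shaper works from the font's GSUB and GPOS tables rather than hand-built maps.

```rust
use std::collections::HashMap;

/// Glyph identifier. OpenType glyph IDs are 16-bit; the values used here are made up.
type GlyphId = u16;

/// A toy "font": a character-to-glyph map plus pairwise ligature substitutions.
struct ToyFont {
    cmap: HashMap<char, GlyphId>,
    ligatures: HashMap<(GlyphId, GlyphId), GlyphId>,
}

impl ToyFont {
    /// Turn input text into a sequence of glyph IDs.
    fn shape(&self, text: &str) -> Vec<GlyphId> {
        // Step 1: map every character to a glyph. Glyph 0 is the "missing glyph".
        let glyphs: Vec<GlyphId> = text
            .chars()
            .map(|c| self.cmap.get(&c).copied().unwrap_or(0))
            .collect();

        // Step 2: scan left to right, replacing known glyph pairs with a ligature.
        let mut out = Vec::with_capacity(glyphs.len());
        let mut i = 0;
        while i < glyphs.len() {
            if i + 1 < glyphs.len() {
                if let Some(&lig) = self.ligatures.get(&(glyphs[i], glyphs[i + 1])) {
                    out.push(lig);
                    i += 2;
                    continue;
                }
            }
            out.push(glyphs[i]);
            i += 1;
        }
        out
    }
}

/// Build a hypothetical font with glyphs for 'f', 'i', 'n' and an "fi" ligature.
fn toy_font() -> ToyFont {
    let cmap = HashMap::from([('f', 1), ('i', 2), ('n', 3)]);
    let ligatures = HashMap::from([((1, 2), 10)]);
    ToyFont { cmap, ligatures }
}
```

With this toy font, shaping "fin" yields the ligature glyph followed by the glyph for 'n'. Complex scripts add steps such as syllable detection and glyph reordering on top of this basic pipeline.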

Unfortunately there is no complete specification of how to perform font shaping for complex scripts, so determining the desired behaviour often comes down to observing what the other font shaping engines do and implementing that. In the hope of improving this situation we have been supporting Nathan Willis in an effort to document the OpenType shaping behaviour.

Prince is mostly written in the Mercury logic programming language, but when it came to building the new font parsing and shaping engine we chose Rust. The reasons include a growing community and ecosystem, good interoperability with C (Mercury compiles to C), strong memory safety guarantees, high performance, minimal runtime, and cross-platform support (we build Prince binaries for FreeBSD, Linux, macOS, and Windows).

Rust’s memory safety was especially important as it helps avoid programming errors when parsing binary data (such as fonts) and performing shaping. These types of errors have been a source of security issues in other implementations [1, 2, 3, 4].
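As an illustration of the kind of error this guards against, the sketch below reads the OpenType table directory with every access bounds-checked, so a truncated or malicious file produces an error value instead of an out-of-bounds read. It is a minimal example written for this post, not code from Allsorts. The names `ParseError`, `read_array`, `TableRecord`, and `read_table_records` are hypothetical. The layout it assumes is the standard one: a 12-byte header with the table count at byte 4, followed by 16-byte records holding a tag, checksum, offset, and length.

```rust
/// Error returned instead of reading past the end of the data.
#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedEof,
}

/// Copy `N` bytes starting at `offset`, failing cleanly if the data is too short.
fn read_array<const N: usize>(data: &[u8], offset: usize) -> Result<[u8; N], ParseError> {
    let end = offset.checked_add(N).ok_or(ParseError::UnexpectedEof)?;
    let slice = data.get(offset..end).ok_or(ParseError::UnexpectedEof)?;
    let mut out = [0u8; N];
    out.copy_from_slice(slice);
    Ok(out)
}

/// Read a big-endian u16 (OpenType stores all integers big-endian).
fn read_u16_be(data: &[u8], offset: usize) -> Result<u16, ParseError> {
    Ok(u16::from_be_bytes(read_array::<2>(data, offset)?))
}

/// Read a big-endian u32.
fn read_u32_be(data: &[u8], offset: usize) -> Result<u32, ParseError> {
    Ok(u32::from_be_bytes(read_array::<4>(data, offset)?))
}

/// One entry of the table directory (the checksum field is skipped here).
#[derive(Debug, PartialEq)]
struct TableRecord {
    tag: [u8; 4],
    offset: u32,
    length: u32,
}

/// Parse the table records that follow the 12-byte header.
fn read_table_records(data: &[u8]) -> Result<Vec<TableRecord>, ParseError> {
    // numTables lives at byte 4, after the 4-byte sfnt version.
    let num_tables = read_u16_be(data, 4)? as usize;
    let mut records = Vec::new();
    for i in 0..num_tables {
        // Each record is 16 bytes: tag, checksum, offset, length.
        let base = 12 + i * 16;
        records.push(TableRecord {
            tag: read_array::<4>(data, base)?,
            offset: read_u32_be(data, base + 8)?,
            length: read_u32_be(data, base + 12)?,
        });
    }
    Ok(records)
}
```

A file that claims more tables than it contains simply returns `Err(ParseError::UnexpectedEof)`. A careful parser would go on to validate each record's offset and length against the file size before using them.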

Allsorts (a reference to typographic sorts, and a licorice confectionery) is the result of this two-year effort. It has been extracted from Prince and open sourced under the Apache 2.0 license. The initial release has been published to crates.io. We’ve also published some example tools that use Allsorts to dump font information, shape text, and subset fonts.

The support for Indic scripts in Allsorts was based on Nathan’s new specification. We tested it by comparing Allsorts output to HarfBuzz on a large corpus of real-world text in each of the supported Indic languages.

Allsorts supports parsing OpenType (TrueType and CFF variants), WOFF, and WOFF2 fonts. It can also serialise and subset these formats in a manner suitable for inclusion in PDF documents. It supports shaping of many scripts, including Chinese, Japanese, Korean, Indic scripts, and Latin. For Indic scripts, the Indic2 font layout is supported.
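These container formats can be told apart by the first four bytes of the file. The sketch below shows that check using the signatures defined by the OpenType, WOFF, and WOFF2 specifications. It is an illustration only and does not reflect how Allsorts itself is structured. `FontContainer` and `sniff_container` are hypothetical names.

```rust
/// Font container kinds recognised by this sketch.
#[derive(Debug, PartialEq)]
enum FontContainer {
    OpenTypeTrueType,   // sfnt version 0x00010000: TrueType outlines
    OpenTypeCff,        // "OTTO": CFF outlines
    TrueTypeCollection, // "ttcf": several fonts in one file
    Woff,               // "wOFF"
    Woff2,              // "wOF2"
}

// Four-byte signatures, interpreted as big-endian integers.
const TRUETYPE: u32 = 0x0001_0000;
const OTTO: u32 = u32::from_be_bytes(*b"OTTO");
const TTCF: u32 = u32::from_be_bytes(*b"ttcf");
const WOFF: u32 = u32::from_be_bytes(*b"wOFF");
const WOF2: u32 = u32::from_be_bytes(*b"wOF2");

/// Identify the container from its first four bytes.
/// Returns `None` if the data is too short or the signature is unknown.
fn sniff_container(data: &[u8]) -> Option<FontContainer> {
    let b = data.get(..4)?;
    let magic = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
    match magic {
        TRUETYPE => Some(FontContainer::OpenTypeTrueType),
        OTTO => Some(FontContainer::OpenTypeCff),
        TTCF => Some(FontContainer::TrueTypeCollection),
        WOFF => Some(FontContainer::Woff),
        WOF2 => Some(FontContainer::Woff2),
        _ => None,
    }
}
```

A parser would typically dispatch on this result: read the table directory directly for OpenType, or decompress the tables first for WOFF and WOFF2.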

Font parsing in Allsorts uses an approach designed to be amenable to machine generation by our work-in-progress declarative data definition language. The DDL aims to allow formally specifying binary data formats such as fonts. For more information on this work, see Brendan’s talk at the Melbourne Compose Conference earlier this year.

We’re using Allsorts to power all font parsing, shaping of supported scripts, and subsetting in Prince 13. Allsorts is faster than the previous Mercury/C font parsing and shaping implementation used in Prince. We have not yet done a detailed performance comparison to other font shaping engines as we are still focusing on correctness before beginning performance optimisation.

Being a new project, Allsorts does have some limitations: the crate is only lightly documented, and it doesn’t yet support shaping Arabic (Prince still uses its own Mercury implementation for this), Khmer, Tibetan, or Mongolian.

We’re excited to publish this release and it’s our hope that Allsorts sees use in many different applications in the future. In the near term we look forward to integrating it more into the ecosystem of font handling in Rust and hope that it becomes a popular choice for pure Rust font handling and shaping.


