Changes to Symbol Fonts for Cairo Graphics Devices



The symbol font

When drawing text in R graphics, we can specify the font “family” to use, e.g., a generic family like "sans" or a specific family like "Helvetica", and we can specify the font “face” to use, e.g., plain, bold, or italic. R graphics provides four standard font faces (plain, bold, italic, and bold-italic) and one special font face that R calls “symbol”. The following code and output demonstrate the different font faces.

library(grid)
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"), 
          y=5:1/6, gp=gpar(fontface=1:5))

The first four font faces are just variations on the current font family, which by default is a sans-serif font, but the symbol font face is really a separate font altogether.

Historically, the symbol font face has been useful as a way to access Greek letters and a set of mathematical symbols. For example, the character ‘m’ in font face 5 is the Greek letter ‘mu’.

grid.text("m", gp=gpar(fontface=5))

This feature is less useful than it used to be because, with the advent of Unicode and fonts that cover a very broad range of characters, we can now access special symbols with the standard fonts, as shown below. Note the lack of fontface in the code below, but also note that the resulting mu is in a different font to the one above.

grid.text("\u03BC")

However, the symbol font is still useful in R because it is used in the “plotmath” facility for drawing mathematical equations, like the example below.

grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))

Selecting an alternative symbol font

On some graphics devices, it is possible to select an alternative symbol font. For example, on the pdf() device, we can use the function Type1Font() to define a new font family, including a new symbol font. The following code and output show the default "sans" font definition for the pdf() device (on Linux; note the "Symbol.afm" value in the metrics component of the output).

pdfFonts("sans")
## $sans
## $family
## [1] "Helvetica"
## 
## $metrics
## [1] "Helvetica.afm"            
## [2] "Helvetica-Bold.afm"       
## [3] "Helvetica-Oblique.afm"    
## [4] "Helvetica-BoldOblique.afm"
## [5] "Symbol.afm"               
## 
## $encoding
## [1] "default"
## 
## attr(,"class")
## [1] "Type1Font"

The next code defines a new font that uses the same main font (Helvetica), but selects a Computer Modern (TeX) font for the symbol font.

CMitalic <- Type1Font("ComputerModern2",
                      c("Helvetica.afm", "Helvetica-Bold.afm",   
                        "Helvetica-Oblique.afm", "Helvetica-BoldOblique.afm",
                        "./cairo-symbolfamily-files/cmsyase.afm"))

We can use that new font, with its new symbol font, to produce the same mathematical equation as before, but with a different font used for the symbols.

pdf("cairo-symbolfamily-files/CMitalic.pdf", family=CMitalic, height=1)
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
dev.off()
embedFonts("cairo-symbolfamily-files/CMitalic.pdf", 
           outfile="cairo-symbolfamily-files/CMitalic-embedded.pdf", 
           fontpaths=file.path(getwd(), "cairo-symbolfamily-files"))

Cairo graphics devices

R has several graphics devices that are based on the Cairo Graphics system, e.g., png(type="cairo") and cairo_pdf(). One of the benefits of these devices is that it is very easy to specify a font for drawing text. All we have to do is give the name of a font and Cairo Graphics does all of the work to map that font name to a font on our system. There is no mucking around setting up a Type 1 font definition like on the pdf() device.

For example, if a font called “Linux Biolinum Keyboard O” is installed on our system, we can simply use that font name when we draw text.

grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"), 
          y=5:1/6, 
          gp=gpar(fontface=1:5, 
                  fontfamily="Linux Biolinum Keyboard O"))

However, in the output above, we can see that the symbol font looks exactly like the symbol font in the first example. That is because it is exactly the same symbol font and the problem is, or was, that on Cairo Graphics devices the user is, or was, unable to change that default symbol font.

Fedora 31 to the rescue?

That inconvenience on Cairo Graphics devices - the inability to select an alternative symbol font - took a much more dramatic turn with the release of (the Linux distribution) Fedora 31.

Fedora 31 updated its Cairo Graphics system so that it no longer supported Type 1 fonts, and the effect of that change was deleterious for, for example, plotmath output in R.

(Examples from now on are either on an Ubuntu 16.04 system or a Fedora 31 system; both systems are created using Docker images from the R-Hub project. The Docker images, pmur002/ubuntu-gcc-devel, pmur002/fedora-gcc-devel-problem and pmur002/fedora-gcc-devel-solution are available from DockerHub.)

The following output shows the full set of symbols that R makes use of from the symbol font. This is run on an Ubuntu 16.04 system (an older Linux distribution) and shows the intended result.

##  [1] "Ubuntu 16.04.6 LTS"

The next output shows what this set of symbols looks like on a Fedora 31 system. This is obviously a poorer result.

##  [1] "Fedora 31 (Container Image)"

The essence of the problem is that, on Cairo Graphics devices, the symbol font is hard-coded as the font name “symbol”. On both Linux systems, this results in a Type 1 font (as indicated by the .pfb suffix on the file name in the Ubuntu output below and the .t1 suffix on the file name in the Fedora output).

##  [1] "Ubuntu 16.04.6 LTS"
##  s050000l.pfb: "Standard Symbols L" "Regular"
##  [1] "Fedora 31 (Container Image)"
##  StandardSymbolsPS.t1: "Standard Symbols PS" "Regular"

The lack of support for this Type 1 font on Fedora 31 is evident in the missing symbols all over the plot above.

A new symbolfamily argument on Cairo Graphics devices

The first step in solving the Fedora 31 problem is to allow the user to select an alternative symbol font on Cairo Graphics devices. This means that, in R 4.0.0, the following functions all accept a new symbolfamily argument: x11(), png(), jpeg(), tiff(), bmp(), svg(), cairo_pdf(), and cairo_ps().

As with the family argument to those functions, the symbolfamily argument can be just the name of an installed font and Cairo will take care of the rest. For example, the following code creates a Cairo Graphics png() device with "NimbusSans" as the symbol font and that produces a much better result on Fedora 31.

png(type="cairo", symbolfamily="NimbusSans")
##  [1] "Fedora 31 (Container Image)"

The following output shows that this works better because the "NimbusSans" font specification resolves to an OpenType (TrueType) font (as indicated by the .otf suffix).

##  [1] "Fedora 31 (Container Image)"
##  NimbusSans-Regular.otf: "Nimbus Sans" "Regular"

A new cairoSymbolFont() function for Cairo Graphics devices

The "NimbusSans" result shown above (for Fedora 31) still has some missing symbols. This reveals another peculiarity of how R generates plotmath output on Cairo Graphics devices.

Internally, plotmath works with a (single-byte) Adobe Symbol Encoding (ASE); each Greek character or mathematical symbol corresponds to a number between 0 and 255 (actually, only 32 to 254 are used and there are a number of unused numbers in that range as well). Cairo Graphics devices accept Unicode text in a (multi-byte) UTF-8 encoding, so R has to convert numbers between 32 and 254 into Unicode code points. For example, the number 34 in ASE is the /universal or “for all” symbol, which gets mapped to the code point U+2200.
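The conversion described above amounts to a table lookup. The following sketch illustrates the idea with a small hand-written table; the names ase_to_unicode and ase_to_utf8 are hypothetical, and the table holds only a few entries rather than R's full internal table.

```r
# Hypothetical, partial lookup table from ASE numbers to Unicode code points.
# Only a few entries are shown; this is not R's internal conversion table.
ase_to_unicode <- c("34"  = 0x2200,  # /universal, "for all"
                    "109" = 0x03BC,  # /mu    (the character 'm' in ASE)
                    "112" = 0x03C0,  # /pi    (the character 'p' in ASE)
                    "115" = 0x03C3)  # /sigma (the character 's' in ASE)

# Convert a vector of ASE numbers to UTF-8 strings (NA if not in the table)
ase_to_utf8 <- function(ase) {
    cp <- ase_to_unicode[as.character(ase)]
    out <- rep(NA_character_, length(cp))
    ok <- !is.na(cp)
    out[ok] <- intToUtf8(cp[ok], multiple = TRUE)
    out
}

ase_to_utf8(c(34, 109))
```

The result is a character vector of UTF-8 strings, which is the form of text that a Cairo Graphics device expects.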

R uses a conversion table from The Unicode Consortium to perform the conversion, but this includes some conversions to Unicode code points that lie in a range called the Private Use Area (PUA). For example, the number 230 in ASE is the /parenlefttp or “left parenthesis top” symbol, which gets mapped to the code point U+F8EB.

The problem with code points in the PUA is that they are private(!) - they are not universally agreed on - and this means that they are usually not implemented even by fonts that attempt to cover a broad range of Unicode. That is why there are missing symbols in the "NimbusSans" result.

There is a new cairoSymbolFont() function in R 4.0.0 that provides a solution for this problem by allowing users to specify that a symbol font does not make use of the PUA. In that case, the Cairo Graphics device will make use of an alternative mapping from ASE to Unicode that does not make use of the PUA. For example, with the alternative mapping, the number 230 in ASE maps to U+239B (Left Parenthesis Upper Hook).
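To make the difference between the two mappings concrete, the following sketch checks whether a code point falls in the PUA of the Basic Multilingual Plane (U+E000 to U+F8FF). The helper is_pua() is hypothetical and is not part of R.

```r
# Hypothetical helper: is a code point in the Private Use Area of the
# Basic Multilingual Plane (U+E000 to U+F8FF)?
is_pua <- function(codepoint) {
    codepoint >= 0xE000 & codepoint <= 0xF8FF
}

# ASE 230 (/parenlefttp) under the two mappings described in the text
with_pua    <- 0xF8EB   # default mapping
without_pua <- 0x239B   # alternative mapping (Left Parenthesis Upper Hook)

is_pua(with_pua)      # TRUE
is_pua(without_pua)   # FALSE
```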

The following code demonstrates how this function can be used. We again specify that the symbol font is "NimbusSans", but we also specify that the font does not use the PUA. The resulting table of symbols is now complete.

png(type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE))
##  [1] "Fedora 31 (Container Image)"
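As a fuller sketch, the following code draws the earlier plotmath equation on a Cairo Graphics png() device with a symbol font that avoids the PUA. The temporary output file, the device dimensions, and the choice of "sans" as the symbol font are assumptions for illustration; the guard skips the drawing on versions of R before 4.0.0 or on builds without Cairo support.

```r
library(grid)

# Write to a temporary file (an assumption; any writable path would do)
outfile <- tempfile(fileext = ".png")

# symbolfamily and cairoSymbolFont() require R >= 4.0.0 and Cairo support
if (getRversion() >= "4.0.0" && capabilities("cairo")) {
    png(outfile, type = "cairo", width = 400, height = 100,
        symbolfamily = cairoSymbolFont("sans", usePUA = FALSE))
    grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                               plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
    dev.off()
}
```

How the symbols look depends on which installed font "sans" resolves to on the system.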

Additional components in grSoftVersion() output

The last step in resolving the Fedora 31 problem is to make sure that the default symbolfamily setting for Cairo Graphics devices is appropriate for different Linux distributions (and other platforms). For example, for backward compatibility, the default symbolfamily remains "symbol" on Ubuntu 16.04, but the default becomes cairoSymbolFont("sans", usePUA=FALSE) on Fedora 31.

In order to help with setting up these defaults, the value returned by grSoftVersion() has two new components in R 4.0.0: "cairoFT" and "pango". The "pango" component is "" if Cairo is not using Pango; otherwise it is the Pango version in use (as a character value). The "cairoFT" component is "yes" if Cairo is using FreeType (plus FontConfig) directly, rather than via Pango, and "" otherwise.

##  [1] "Ubuntu 16.04.6 LTS"
##                     cairo                  cairoFT                    pango 
##                 "1.14.6"                       ""                 "1.38.1" 
##                   libpng                     jpeg                  libtiff 
##                 "1.2.54"                    "8.0" "LIBTIFF, Version 4.0.6"
##  [1] "Fedora 31 (Container Image)"
##                      cairo                   cairoFT                     pango 
##                  "1.16.0"                        ""                  "1.44.7" 
##                    libpng                      jpeg                   libtiff 
##                  "1.6.37"                     "6.2" "LIBTIFF, Version 4.0.10"

A Pango version of "1.44" or above triggers the change to cairoSymbolFont("sans", usePUA=FALSE).
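The version test can be sketched with compareVersion(). The helper pango_needs_no_pua() is hypothetical; it illustrates the rule stated above and is not the code that R uses internally.

```r
# Hypothetical helper: TRUE if the Pango version string is "1.44" or above.
# An empty string means that Cairo is not using Pango at all.
pango_needs_no_pua <- function(pango) {
    nzchar(pango) && compareVersion(pango, "1.44") >= 0
}

pango_needs_no_pua("1.38.1")  # the Ubuntu 16.04 value shown above
pango_needs_no_pua("1.44.7")  # the Fedora 31 value shown above
pango_needs_no_pua("")        # Cairo without Pango
```

On R 4.0.0 or later, the value for the running system is available as grSoftVersion()["pango"].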

Alternative symbol fonts

Although the symbol table above is complete, the symbols provided are from the Nimbus Sans font and, consequently, are consistent with that font’s style. The new symbolfamily argument allows us to explore other options. For example, on Fedora, we can choose to use the OpenSymbol font, as shown below.

dnf install libreoffice-opensymbol-fonts
png(type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE))
##  [1] "Fedora 31 (Container Image)"

Windows and macOS

Cairo Graphics devices are also available on Windows and macOS and the symbolfamily argument and the cairoSymbolFont() function are available on those platforms as well, although the default symbolfamily can be different.

A single-byte locale on Windows presents a special case because, instead of converting from ASE to UTF-8, R pretends that the ASE numbers are in a Latin1 encoding and converts from Latin1 to UTF-8. This conversion works for the default "Symbol" font, but does not for most other fonts. In this case, if the symbolfamily is not "Symbol" the Cairo Graphics devices switch back to the normal ASE to UTF-8 conversion (with or without PUA).
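The following sketch shows what the Latin1 route does to a single ASE number, using iconv() to stand in for the conversion (an assumption; the real conversion happens in C code). The ASE number 230 comes out as the code point U+00E6, which is a Latin letter rather than a piece of a parenthesis, so the result is presumably only useful with a font, like "Symbol", that places its symbol glyphs at those positions.

```r
# Treat the single ASE byte 230 as if it were Latin1 text and convert to UTF-8
ase_byte <- as.raw(230)
as_latin1 <- iconv(rawToChar(ase_byte), from = "latin1", to = "UTF-8")

# The resulting code point is just the byte value (U+00E6),
# not U+F8EB (with PUA) or U+239B (without PUA)
utf8ToInt(as_latin1)
```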

Alternative symbol fonts that are known to provide reasonable coverage on those platforms are: "Apple Symbols" on macOS and "Cambria Math" on Windows (both with usePUA=FALSE).

R API changes

The Cairo Graphics devices receive UTF-8 text from the graphics engine, but as described above, that text may need further transformation, for example, to avoid the Unicode PUA. Those transformations occur in C code and are provided by functions in the R API so that other graphics devices can make use of them. For example, the ‘Cairo’ package, which has always allowed the user to select a symbol font, from R 4.0.0 will now also offer the option to not use the PUA.

One existing function has been modified:
Rf_AdobeSymbol2utf8() has an additional Rboolean usePUA argument to control whether the Unicode PUA is used.

Three new functions have been added:
Rf_utf8toAdobeSymbol() converts from UTF-8 to ASE, assuming that the UTF-8 was generated using the PUA.
Rf_utf8Toutf8NoPUA() converts from UTF-8 with PUA to UTF-8 without PUA.
Rf_utf8ToLatin1AdobeSymbol2utf8() converts from UTF-8 that has come from ASE that was treated as Latin1 and then back to UTF-8 (with or without PUA).

Reproducibility

All of the materials required to rebuild this blog are available on GitHub.

Acknowledgements

Thanks to Gavin Simpson for the original bug report, Iñaki Ucar and Nicolas Mailhot for assistance with diagnosing the problem and designing the solution, and Brian Ripley, Simon Urbanek, and Gabriel Becker for assistance with testing the new features.