---
title: "Changes to Symbol Fonts for Cairo Graphics Devices"
author: "Paul Murrell"
date: 2020-04-17
categories: ["Internals"]
tags: ["grid, units"]
abstract: In R 4.0.0, Cairo-based graphics devices will allow the user to select a symbol font. That is not as straightforward as it sounds.
---

```{r setup, include=FALSE}
options(width=40)
```

```{r eval=FALSE, include=FALSE, results="hide"}
## Run three containers, one with R-devel on Ubuntu,
## one with R-devel on Fedora BEFORE fix,
## one with R-devel on Fedora AFTER fix
## These are all based on (different versions of) R-Hub docker images
## (e.g., rhub/ubuntu-gcc-devel)
system(paste0("docker run -t -d --rm ",
              "--name R-devel-ubuntu ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/ubuntu-gcc-devel"))
system(paste0("docker run -t -d --rm ",
              "--name R-devel-fedora-problem ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/fedora-gcc-devel-problem"))
system(paste0("docker run -t -d --rm ",
              "--name R-devel-fedora-solution ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/fedora-gcc-devel-solution"))
```

## The symbol font

When drawing text in R graphics, we can specify the font "family" to use, e.g., a generic family like `"sans"` or a specific family like `"Helvetica"`, and we can specify the font "face" to use, e.g., plain, **bold**, or *italic*.

R graphics provides four standard font faces, plain, bold, italic, bold-italic, and one special font face that R calls "symbol". The following code and output demonstrate the different font faces.

```{r fig.height=2}
library(grid)
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"),
          y=5:1/6, gp=gpar(fontface=1:5))
```

The first four font faces are just variations on the current font family, which by default is a sans-serif font, but the symbol font face is really a separate font altogether.

Historically, the symbol font face has been useful as a way to access greek letters and a set of mathematical symbols. For example, the character 'm' in font face 5 is the greek letter 'mu'.

```{r fig.height=.5}
grid.text("m", gp=gpar(fontface=5))
```

This feature is less useful than it used to be because, with the advent of Unicode and fonts that cover a very broad range of characters, we can now access special symbols with the standard fonts, as shown below (note the lack of `fontface` in the code below, but also note that the resulting mu is in a different font to the one above).

```{r fig.height=.5}
grid.text("\u03BC")
```

However, the symbol font is still useful in R because it is used in the "plotmath" facility for drawing mathematical equations, like the example below.

```{r fig.height=1}
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
```

## Selecting an alternative symbol font

On some graphics devices, it is possible to select an alternative symbol font. For example, on the `pdf()` device, we can use the function `Type1Font()` to define a new font family, including a new symbol font.

The following code and output show the default `"sans"` font definition for the `pdf()` device (on Linux; note the `"Symbol.afm"` value in the `metrics` component of the output).

```{r}
pdfFonts("sans")
```

The next code defines a new font that uses the same main font (Helvetica), but selects a Computer Modern (TeX) font for the symbol font.

```{r eval=FALSE}
CMitalic <- Type1Font("ComputerModern2",
                      c("Helvetica.afm", "Helvetica-Bold.afm",
                        "Helvetica-Oblique.afm", "Helvetica-BoldOblique.afm",
                        "./cairo-symbolfamily-files/cmsyase.afm"))
```

We can use that new font, with its new symbol font, to produce the same mathematical equation as before, but with a different font used for the symbols.

```{r eval=FALSE, fig.height=1, results="hide"}
pdf("cairo-symbolfamily-files/CMitalic.pdf", family=CMitalic, height=1)
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
dev.off()
embedFonts("cairo-symbolfamily-files/CMitalic.pdf",
           outfile="cairo-symbolfamily-files/CMitalic-embedded.pdf",
           fontpaths=file.path(getwd(), "cairo-symbolfamily-files"))
```

```{r eval=FALSE, echo=FALSE, results="hide"}
system("convert -density 192 cairo-symbolfamily-files/CMitalic-embedded.pdf cairo-symbolfamily-files/CMitalic.png")
```

![](/Blog/public/post/cairo-symbolfamily-files/CMitalic.png){ width=672px }

## Cairo graphics devices

R has several graphics devices that are based on the Cairo Graphics system, e.g., `png(type="cairo")` and `cairo_pdf()`. One of the benefits of these devices is that it is very easy to specify a font for drawing text. All we have to do is give the name of a font and Cairo Graphics does all of the work to map that font name to a font on our system. There is no mucking around setting up a Type 1 font definition like on the `pdf()` device.

For example, if a font called "Linux Biolinum Keyboard O" is installed on our system, we can simply use that font name when we draw text.

```{r eval=FALSE, fig.height=2}
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"),
          y=5:1/6,
          gp=gpar(fontface=1:5, fontfamily="Linux Biolinum Keyboard O"))
```

![](/Blog/public/post/cairo-symbolfamily-files/biolinum-keyboard.png){ width=672px }

However, in the output above, we can see that the symbol font looks exactly like the symbol font in the first example. That is because it is exactly the same symbol font and the problem is, or was, that on Cairo Graphics devices the user is, or was, unable to change that default symbol font.

## Fedora 31 to the rescue?

That inconvenience on Cairo Graphics devices - the inability to select an alternative symbol font - took a much more dramatic turn with the release of (the Linux distribution) Fedora 31. Fedora 31 updated its Cairo Graphics system so that it no longer supported Type 1 fonts and the effect of that change was deleterious on, for example, plotmath output in R.

*(Examples from now on are either on an Ubuntu 16.04 system or a Fedora 31 system; both systems are created using Docker images from the [R-Hub](https://github.com/r-hub/rhub-linux-builders) project. The Docker images, `pmur002/ubuntu-gcc-devel`, `pmur002/fedora-gcc-devel-problem` and `pmur002/fedora-gcc-devel-solution` are available from DockerHub.)*

The following output shows the full set of symbols that R makes use of from the symbol font. This is run on an Ubuntu 16.04 system (an older Linux distribution) and shows the intended result.

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/ubuntu-osVersion
```

```{r echo=FALSE}
cat('
[1] "Ubuntu 16.04.6 LTS"
')
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/ubuntu-test-chars.png"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/ubuntu-test-chars.png)

The next output shows what this set of symbols looks like on a Fedora 31 system. This is obviously a poorer result.

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-problem-osVersion
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-problem /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-test-chars.png"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-test-chars.png)

The essence of the problem is that, on Cairo Graphics devices, the symbol font is hard-coded as the font name "symbol". On both Linux systems, this results in a Type 1 font (as indicated by the `.pfb` suffix on the file name in the Ubuntu output below and the `.t1` suffix on the file name in the Fedora output).

```{r echo=FALSE}
cat('
[1] "Ubuntu 16.04.6 LTS"
')
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu fc-match symbol > cairo-symbolfamily-files/ubuntu-symbol
```

```{r echo=FALSE}
cat('
s050000l.pfb: "Standard Symbols L" "Regular"
')
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem fc-match symbol > cairo-symbolfamily-files/fedora-symbol
```

```{r echo=FALSE}
cat('
StandardSymbolsPS.t1: "Standard Symbols PS" "Regular"
')
```

The lack of support for this Type 1 font on Fedora 31 is evident in the missing symbols all over the plot above.

## A new `symbolfamily` argument on Cairo Graphics devices

The first step in solving the Fedora 31 problem is to allow the user to select an alternative symbol font on Cairo Graphics devices. This means that, in R 4.0.0, the following functions all accept a new `symbolfamily` argument: `x11()`, `png()`, `jpeg()`, `tiff()`, `bmp()`, `svg()`, `cairo_pdf()`, and `cairo_ps()`.
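
Because the `symbolfamily` argument only exists from R 4.0.0, a script that also has to run on older versions of R may want to check for the feature first. The following is only a sketch of one way to do that: the helper names `has_symbol_font_support()` and `cairo_device_args()` are hypothetical (they are not part of R), and the check relies on `cairoSymbolFont()` being present in the 'grDevices' namespace from R 4.0.0.

```r
## Hypothetical helper (not part of R): TRUE if the running R provides
## cairoSymbolFont(), which was added to 'grDevices' in R 4.0.0
has_symbol_font_support <- function() {
    exists("cairoSymbolFont", envir = asNamespace("grDevices"),
           inherits = FALSE)
}

## Hypothetical helper (not part of R): build the argument list for a
## Cairo-based device, adding 'symbolfamily' only when it is supported
cairo_device_args <- function(file, symbolfamily = NULL) {
    args <- list(file, type = "cairo")
    if (!is.null(symbolfamily) && has_symbol_font_support()) {
        args$symbolfamily <- symbolfamily
    }
    args
}

args <- cairo_device_args("plot.png", symbolfamily = "NimbusSans")
## do.call(png, args) would then open the device
```

No device is opened by this code; it just assembles the arguments, so it is safe to run on any version of R.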

As with the `family` argument to those functions, the `symbolfamily` argument can be just the name of an installed font and Cairo will take care of the rest. For example, the following code creates a Cairo Graphics `png()` device with `"NimbusSans"` as the symbol font and that produces a much better result on Fedora 31.

```{r eval=FALSE}
png(type="cairo", symbolfamily="NimbusSans")
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-solution-osVersion
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-nimbus.png", type="cairo", symbolfamily="NimbusSans"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-nimbus.png)

The following output shows that this works better because the `"NimbusSans"` font specification resolves to an OpenType (TrueType) font (as indicated by the `.otf` suffix).

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem fc-match NimbusSans > cairo-symbolfamily-files/fedora-nimbus
```

```{r echo=FALSE}
cat('
NimbusSans-Regular.otf: "Nimbus Sans" "Regular"
')
```

## A new `cairoSymbolFont()` function for Cairo Graphics devices

The `"NimbusSans"` result shown above (for Fedora 31) still has some missing symbols. This reveals another peculiarity of how R generates plotmath output on Cairo Graphics devices.

Internally, plotmath works with a (single-byte) Adobe Symbol Encoding (ASE); each greek character or mathematical symbol corresponds to a number between 0 and 255 (actually, only 32 to 254 are used and there are a number of unused numbers in that range as well).
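
The difference between a single-byte encoding and Unicode can be seen with the greek letter mu from earlier in this post. The following sketch uses only base R functions; the variable names are just for illustration. In ASE, mu shares its number with the character 'm', whereas in Unicode mu has its own code point (U+03BC), which takes two bytes in UTF-8.

```r
## In the symbol font, mu is selected by the same number as "m"
ase_code <- utf8ToInt("m")

## In Unicode, mu is the code point U+03BC ...
mu <- intToUtf8(0x03BC)

## ... which is encoded as more than one byte in UTF-8
utf8_bytes <- charToRaw(mu)

ase_code
utf8_bytes
```

The first value is a single number that fits in one byte; the second is a pair of bytes.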

Cairo Graphics devices accept Unicode text in a (multi-byte) UTF-8 encoding, so R has to convert numbers between 32 and 254 into Unicode code points. For example, the number 34 in ASE is the `/universal` or "for all" symbol, which gets mapped to the code point U+2200.

R uses [a conversion table from The Unicode Consortium](http://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt) to perform the conversion, but this includes some conversions to Unicode code points that lie in a range called the Private Use Area (PUA). For example, the number 230 in ASE is the `/parenlefttp` or "left parenthesis top" symbol, which gets mapped to the code point U+F8EB.

The problem with code points in the PUA is that they are private(!) - they are not universally agreed on - and this means that they are usually not implemented even by fonts that attempt to cover a broad range of Unicode. That is why there are missing symbols in the `"NimbusSans"` result.

There is a new `cairoSymbolFont()` function in R 4.0.0 that provides a solution for this problem by allowing users to specify that a symbol font does not make use of the PUA. In that case, the Cairo Graphics device will make use of an alternative mapping from ASE to Unicode that does not make use of the PUA. For example, with the alternative mapping, the number 230 in ASE maps to U+239B (Left Parenthesis Upper Hook).

The following code demonstrates how this function can be used. We again specify that the symbol font is `"NimbusSans"`, but we also specify that the font does not use the PUA. The resulting table of symbols is now complete.
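
As an aside, before the device-level code, the two mappings described above can be sketched as a tiny lookup table in R. This is only an illustration, not the C code that R actually uses: the table `ase_map` and the functions `in_pua()` and `ase_to_codepoint()` are hypothetical names, the table holds just the two ASE numbers mentioned in this post, and it assumes that number 34 maps to U+2200 in both mappings (because U+2200 is not in the PUA). The PUA range used here is the Basic Multilingual Plane range U+E000 to U+F8FF.

```r
## Two ASE numbers from the text, with their code points under
## the original mapping ('pua') and the alternative mapping ('nopua')
ase_map <- data.frame(
    ase   = c(34L, 230L),
    name  = c("universal", "parenlefttp"),
    pua   = c(0x2200L, 0xF8EBL),
    nopua = c(0x2200L, 0x239BL)   # 34 assumed unchanged
)

## Is a code point in the (BMP) Private Use Area?
in_pua <- function(cp) cp >= 0xE000L & cp <= 0xF8FFL

## Look up the code point for an ASE number in this toy table
ase_to_codepoint <- function(ase, usePUA = TRUE) {
    row <- match(ase, ase_map$ase)
    if (usePUA) ase_map$pua[row] else ase_map$nopua[row]
}

with_pua    <- ase_to_codepoint(230L, usePUA = TRUE)
without_pua <- ase_to_codepoint(230L, usePUA = FALSE)

sprintf("U+%04X", c(with_pua, without_pua))
in_pua(c(with_pua, without_pua))
```

With the original mapping, number 230 lands inside the PUA; with the alternative mapping it does not, which is the property that a font like Nimbus Sans needs.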

```{r eval=FALSE}
png(type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE))
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-nimbus-noPUA.png", type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE)); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-nimbus-noPUA.png)

## Additional components in `grSoftVersion()` output

The last step in resolving the Fedora 31 problem is to make sure that the default `symbolfamily` setting for Cairo Graphics devices is appropriate for different Linux distributions (and other platforms). For example, for backward compatibility, the default `symbolfamily` remains `"symbol"` on Ubuntu 16.04, but the default becomes `cairoSymbolFont("sans", usePUA=FALSE)` on Fedora 31.

In order to help with setting up these defaults, the value returned by `grSoftVersion()` has two new components in R 4.0.0: `"cairoFT"` and `"pango"`. The latter is either `""` if Cairo is not using Pango, or it is the Pango version in use (as a character value). The former is either `"yes"` if Cairo is using FreeType (plus FontConfig), or `""` otherwise.
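
To make the role of the `"pango"` component concrete, here is a rough sketch of a decision based on it. This is not the logic that R itself uses (which may take other information into account); `default_symbol_family()` is a hypothetical function, it returns a descriptive label rather than a real `symbolfamily` value so that it runs on any version of R, and the 1.44 threshold is the Pango version that this post gives below as the trigger for the new default.

```r
## Hypothetical sketch: describe which default the Pango version suggests
default_symbol_family <- function(pango) {
    ## "" means that Cairo is not using Pango
    if (nzchar(pango) && package_version(pango) >= "1.44") {
        "sans (usePUA = FALSE)"
    } else {
        "symbol"
    }
}

## Read the Pango version, allowing for older versions of R
## where the "pango" component does not exist
soft <- grDevices::grSoftVersion()
pango <- if ("pango" %in% names(soft)) soft[["pango"]] else ""

default_symbol_family(pango)
```

For the two Pango versions that appear in the output for the Docker images, `default_symbol_family("1.38.1")` gives the `"symbol"` label and `default_symbol_family("1.44.7")` gives the `"sans"` label.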

```{r echo=FALSE}
cat('
[1] "Ubuntu 16.04.6 LTS"
')
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'grSoftVersion()' > cairo-symbolfamily-files/ubuntu-grSoftVersion
```

```{r echo=FALSE}
cat('
                   cairo                  cairoFT                    pango 
                "1.14.6"                       ""                 "1.38.1" 
                  libpng                     jpeg                  libtiff 
                "1.2.54"                    "8.0" "LIBTIFF, Version 4.0.6" 
')
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'grSoftVersion()' > cairo-symbolfamily-files/fedora-grSoftVersion
```

```{r echo=FALSE}
cat('
                    cairo                   cairoFT                     pango 
                 "1.16.0"                        ""                  "1.44.7" 
                   libpng                      jpeg                   libtiff 
                 "1.6.37"                     "6.2" "LIBTIFF, Version 4.0.10" 
')
```

A Pango version of `"1.44"` or above triggers the change to `cairoSymbolFont("sans", usePUA=FALSE)`.

## Alternative symbol fonts

Although the symbol table above is complete, the symbols provided are from the Nimbus Sans font and, consequently, are consistent with that font's style. The new `symbolfamily` argument allows us to explore other options. For example, on Fedora, we can choose to use the OpenSymbol font, as shown below.

```{bash eval=FALSE}
dnf install libreoffice-opensymbol-fonts
```

```{r eval=FALSE}
png(type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE))
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-solution-osVersion
```

```{r echo=FALSE}
cat('
[1] "Fedora 31 (Container Image)"
')
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution dnf -y install libreoffice-opensymbol-fonts
```

```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-opensymbol.png", type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE)); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-opensymbol.png)

## Windows and macOS

Cairo Graphics devices are also available on Windows and macOS and the `symbolfamily` argument and the `cairoSymbolFont()` function are available on those platforms as well, although the default `symbolfamily` can be different.

A single-byte locale on Windows presents a special case because, instead of converting from ASE to UTF-8, R pretends that the ASE numbers are in a Latin1 encoding and converts from Latin1 to UTF-8. This conversion works for the default `"Symbol"` font, but does not for most other fonts. In this case, if the `symbolfamily` is not `"Symbol"`, the Cairo Graphics devices switch back to the normal ASE to UTF-8 conversion (with or without PUA).

Alternative symbol fonts that are known to provide reasonable coverage on those platforms are: `"Apple Symbols"` on macOS and `"Cambria Math"` on Windows (both with `usePUA=FALSE`).

## R API changes

The Cairo Graphics devices receive UTF-8 text from the graphics engine, but as described above, that text may need further transformation, for example, to avoid the Unicode PUA.
Those transformations occur in C code and are provided by functions in the R API so that other graphics devices can make use of them. For example, the 'Cairo' package, which has always allowed the user to select a symbol font, from R 4.0.0 will now also offer the option to not use the PUA.

One existing function has been modified:\
`Rf_AdobeSymbol2utf8()` has an additional Rboolean `usePUA` argument to control whether the Unicode PUA is used.

Three new functions have been added:\
`Rf_utf8toAdobeSymbol()` converts from UTF-8 to ASE, assuming that the UTF-8 was generated using the PUA.\
`Rf_utf8Toutf8NoPUA()` converts from UTF-8 with PUA to UTF-8 without PUA.\
`Rf_utf8ToLatin1AdobeSymbol2utf8()` converts from UTF-8 that has come from ASE that was treated as Latin1 and then back to UTF-8 (with or without PUA).

## Reproducibility

All of the materials required to rebuild this blog are available on GitHub.

## Acknowledgements

Thanks to Gavin Simpson for the original bug report, Iñaki Ucar and Nicolas Mailhot for assistance with diagnosing the problem and designing the solution, and Brian Ripley, Simon Urbanek, and Gabriel Becker for assistance with testing the new features.

```{bash eval=FALSE, include=FALSE}
## Eval this to clean up containers
## Shut down three containers
docker kill R-devel-ubuntu
docker kill R-devel-fedora-problem
docker kill R-devel-fedora-solution
```