---
title: "Changes to Symbol Fonts for Cairo Graphics Devices"
author: "Paul Murrell"
date: 2020-04-17
categories: ["Internals"]
tags: ["grid, units"]
abstract: In R 4.0.0, Cairo-based graphics devices will allow the user to select
  a symbol font.  That is not as straightforward as it sounds.
---

```{r setup, include=FALSE}
options(width=40)
```
```{r eval=FALSE, include=FALSE, results="hide"}
## Run three containers, one with R-devel on Ubuntu, 
## one with R-devel on Fedora BEFORE fix,
## one with R-devel on Fedora AFTER fix
## These are all based on (different versions of) R-Hub docker images
## (e.g., rhub/ubuntu-gcc-devel)
system(paste0("docker run -t -d --rm ",
              "--name R-devel-ubuntu ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/ubuntu-gcc-devel"))
system(paste0("docker run -t -d --rm ",
              "--name R-devel-fedora-problem ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/fedora-gcc-devel-problem"))
system(paste0("docker run -t -d --rm ",
              "--name R-devel-fedora-solution ",
              "--net=host ",
              "-v ", getwd(), ":/home/work ",
              "-w /home/work ",
              "pmur002/fedora-gcc-devel-solution"))
```

## The symbol font

When drawing text in R graphics, we can specify the font "family" to use,
e.g., a generic family like `"sans"` or a specific family like `"Helvetica"`,
and we can specify the font "face" to use, e.g., plain, **bold**, or *italic*.
R graphics provides four standard font faces,
plain, bold, italic, bold-italic, and one special font face that R 
calls "symbol".  The following code and output 
demonstrate the different font
faces.

```{r fig.height=2}
library(grid)
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"), 
          y=5:1/6, gp=gpar(fontface=1:5))
```

The first four font faces are just variations on the current font family,
which by default is a sans-serif font, but the symbol font face is really
a separate font altogether.  

Historically, the symbol font face has been useful as a way to access
greek letters and a set of mathematical symbols.  For example, the 
character 'm' in font face 5 is the greek letter 'mu'.

```{r fig.height=.5}
grid.text("m", gp=gpar(fontface=5))
```

This feature
 is less useful than it used to be because, with the advent of Unicode and
fonts that cover a very broad range of characters, 
we can now access special symbols with the standard fonts, as shown below
(note the lack of `fontface` in the code below, but also note that the
resulting mu is in a different font to the one above).

```{r fig.height=.5}
grid.text("\u03BC")
```

However, the symbol font is still useful in R because it is used 
in the "plotmath" facility for drawing mathematical
equations, like the example below.

```{r fig.height=1}
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
```

## Selecting an alternative symbol font

On some graphics devices, it is possible to select an alternative 
symbol font.  For example, on the `pdf()` device,
we can use the functions `Type1Font()` to
define a new font family, including a new symbol font.
The following code and output shows the default `"sans"` font definition
for the `pdf()` device
(on Linux; note the `"Symbol.afm"` value in the `metrics` component 
of the output).

```{r}
pdfFonts("sans")
```

The next code defines a new font that uses the same main font (Helvetica),
but selects a Computer Modern (TeX) font
for the symbol font.

```{r eval=FALSE}
CMitalic <- Type1Font("ComputerModern2",
                      c("Helvetica.afm", "Helvetica-Bold.afm",   
                        "Helvetica-Oblique.afm", "Helvetica-BoldOblique.afm",
                        "./cairo-symbolfamily-files/cmsyase.afm"))
```

We can use that new font, with its new symbol font, to produce the
same mathematical equation as before, but with a different font used
for the symbols.

```{r eval=FALSE, fig.height=1, results="hide"}
pdf("cairo-symbolfamily-files/CMitalic.pdf", family=CMitalic, height=1)
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
                           plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
dev.off()
embedFonts("cairo-symbolfamily-files/CMitalic.pdf", 
           outfile="cairo-symbolfamily-files/CMitalic-embedded.pdf", 
           fontpaths=file.path(getwd(), "cairo-symbolfamily-files"))
```

```{r eval=FALSE, echo=FALSE, results="hide"}
system("convert -density 192 cairo-symbolfamily-files/CMitalic-embedded.pdf cairo-symbolfamily-files/CMitalic.png")
```

![](/Blog/public/post/cairo-symbolfamily-files/CMitalic.png){ width=672px }

## Cairo graphics devices

R has several graphics devices that are based on the Cairo Graphics system,
e.g., `png(type="cairo")` and `cairo_pdf()`.
One of the benefits of these devices is that it is very easy to specify
a font for drawing text.  All we have to do is give the name of a font
and Cairo Graphics does all of the work to map that font name to a font on 
our system.  There is no mucking around setting up a Type 1 font definition
like on the `pdf()` device.

For example, if a font called "Linux Biolinum Keyboard O" is installed on our system,
we can simply use that font name when we draw text.

```{r eval=FALSE, fig.height=2}
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"), 
          y=5:1/6, 
          gp=gpar(fontface=1:5, 
                  fontfamily="Linux Biolinum Keyboard O"))
```

![](/Blog/public/post/cairo-symbolfamily-files/biolinum-keyboard.png){ width=672px }

However, in the output above, we can see that the symbol font looks 
exactly like the symbol font in the first example.  That is because
it is exactly the same symbol font and the problem is, or was, that
on Cairo Graphics devices the user is, or was, unable to change that
default symbol font.

## Fedora 31 to the rescue ?

That inconvenience on Cairo Graphics devices - the inability to select
an alternative symbol font - took a much more dramatic turn with the release of
(the Linux distribution) Fedora 31.

Fedora 31 updated its Cairo Graphics system so that it no longer 
supported Type 1 fonts and the effect of that change was deleterius on, 
for example, plotmath output in R.

*(Examples from now on are either on an Ubuntu 16.04 system or a 
Fedora 31 system;  both systems are created using Docker images from
the [R-Hub](https://github.com/r-hub/rhub-linux-builders) project.
The Docker images, `pmur002/ubuntu-gcc-devel`, 
`pmur002/fedora-gcc-devel-problem` and `pmur002/fedora-gcc-devel-solution`
are available from DockerHub.)*

The following output shows the full set of symbols that R makes use
of from the symbol font.  This is run on an Ubuntu 16.04 system (an older Linux
distribution) 
and shows the intended result.

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/ubuntu-osVersion
```
```{r echo=FALSE}
cat(' [1] "Ubuntu 16.04.6 LTS" ')
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/ubuntu-test-chars.png"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/ubuntu-test-chars.png)

The next output shows what this set of symbols looks like on a Fedora 31 system.
This is obviously a poorer result.

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-problem-osVersion
```
```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-problem /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-test-chars.png"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-test-chars.png)

The essence of the problem is that, on Cairo Graphics devices, the 
symbol font is hard-coded as the font name "symbol".  On both Linux
systems, this results in a Type 1 font (as indicated by the `.pfb` suffix
on the file name in the Ubuntu output below and the `.t1` suffix 
on the file name in the Fedora output).

```{r echo=FALSE}
cat(' [1] "Ubuntu 16.04.6 LTS" ')
```
```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu fc-match symbol > cairo-symbolfamily-files/ubuntu-symbol
```
```{r echo=FALSE}
cat(' s050000l.pfb: "Standard Symbols L" "Regular" ')
```

```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem fc-match symbol > cairo-symbolfamily-files/fedora-symbol
```
```{r echo=FALSE}
cat(' StandardSymbolsPS.t1: "Standard Symbols PS" "Regular" ')
```

The lack of support for this Type 1 font on Fedora 31 is evident in the 
missing symbols all over the plot above.

## A new `symbolfamily` argument on Cairo Graphics devices

The first step in solving the Fedora 31 problem is to allow 
the user to select an alternative symbol font on Cairo Graphics devices.
This means that, in R 4.0.0,
 the following functions all accept a new `symbolfamily`
argument:  `x11()`, `png()`, `jpeg()`, `tiff()`, `bmp()`, `svg()`,
`cairo_pdf()`, and `cairo_ps()`.

As with the `family` argument to those functions, the `symbolfamily`
argument can be just the name of an installed font and Cairo will take
care of the rest.  For example, the following code creates a Cairo
Graphics `png()` device with `"NimbusSans"` as the symbol font and that
produces a much better result on Fedora 31.

```{r eval=FALSE}
png(type="cairo", symbolfamily="NimbusSans")
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-solution-osVersion
```
```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-nimbus.png", type="cairo", symbolfamily="NimbusSans"); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-nimbus.png)

The following output shows that the reason this works better is because 
the `"NimbusSans"` font specification resolves to an OpenType (TrueType) font
(as indicated by the `.otf` suffix).

```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-problem fc-match NimbusSans > cairo-symbolfamily-files/fedora-nimbus
```
```{r echo=FALSE}
cat(' NimbusSans-Regular.otf: "Nimbus Sans" "Regular" ')
```

## A new `cairoSymbolFont()` function for Cairo Graphics devices

The `"NimbusSans"` result shown above (for Fedora 31) 
still has some missing symbols.  This reveals another peculiarity of
how R generates plotmath output on Cairo Graphics devices.  

Internally, plotmath works with a (single-byte) Adobe Symbol Encoding (ASE);
each greek character or mathematical symbol corresponds to a number 
between 0 and 255 (actually, only 32 to 254 are used and there are a number
of unused numbers in that range as well).
Cairo Graphics devices accept Unicode text in a (multi-byte) UTF-8 encoding, 
so R has to
convert numbers between 32 and 254 into Unicode code points.
For example, the number 34 in ASE is the `/universal` or "for all"
symbol, which gets mapped to the code point U+2200.

R uses [a conversion table from The Unicode Consortium](http://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt)
to perform the conversion, but this includes some conversions to 
Unicode code points that lie in a range called the Private Use Area (PUA).
For example, the number 230 in ASE is the `/parenlefttp` or "left parenthesis
top" symbol, which gets mapped to the code point U+F8EB.

The problem with code points in the PUA is that they are private(!) - they
are not universally agreed on - and this
means that they are usually not implemented even by fonts that attempt to
cover a broad range of Unicode.  That is why there are missing
symbols in the `"NimbusSans"` result.

There is a new `cairoSymbolFont()` function in R 4.0.0
that provides a solution for this problem
by allowing users to specify that a symbol font does not make use of the PUA.
In that case, the Cairo Graphics device will make use of an alternative 
mapping from ASE to Unicode that does not make use of the PUA.
For example, with the alternative mapping, the number 230 in ASE maps
to U+239B (Left Parenthesis Upper Hook).

The following code demonstrates how this function can be used.
We again specify that the symbol font is `"NimbusSans"`, but we also
specify that the font does not use the PUA.  The resulting table of 
symbols is now complete.

```{r eval=FALSE}
png(type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE))
```

```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-nimbus-noPUA.png", type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE)); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-nimbus-noPUA.png)


## Additional components in `grSoftVersion()` output

The last step in resolving the Fedora 31 problem is to make sure that
the default `symbolfamily` setting for Cairo Graphics devices is
appropriate for different Linux distributions (and other platforms).
For example, for backward compatibility, the default `symbolfamily` remains
`"symbol"` on Ubuntu 16.04, but the default becomes
`cairoSymbolFont("sans", usePUA=FALSE)` on Fedora 31.

In order to help with setting up these defaults, the value 
returned by the `grSoftVersion()` has two new components in R 4.0.0:
`"cairoFT"` and `"pango"`.  The latter is either `""` if Cairo is not using
Pango, or it is the Pango version in use (as a character value).
The former is either `"yes"` if Cairo is using FreeType (plus FontConfig),
or `""` otherwise. 

```{r echo=FALSE}
cat(' [1] "Ubuntu 16.04.6 LTS" ')
```
```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-ubuntu /opt/R-devel/bin/Rscript -e 'grSoftVersion()' > cairo-symbolfamily-files/ubuntu-grSoftVersion
```
```{r echo=FALSE}
cat('                    cairo                  cairoFT                    pango 
                "1.14.6"                       ""                 "1.38.1" 
                  libpng                     jpeg                  libtiff 
                "1.2.54"                    "8.0" "LIBTIFF, Version 4.0.6"  ')
```

```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'grSoftVersion()' > cairo-symbolfamily-files/fedora-grSoftVersion
```
```{r echo=FALSE}
cat('                     cairo                   cairoFT                     pango 
                 "1.16.0"                        ""                  "1.44.7" 
                   libpng                      jpeg                   libtiff 
                 "1.6.37"                     "6.2" "LIBTIFF, Version 4.0.10"  ')
```

A Pango version of `"1.44"` or above triggers the change to 
`cairoSymbolFont("sans", usePUA=FALSE)`.

## Alternative symbol fonts

Although the symbol table above is complete, the symbols provided are
from the Nimbus Sans font and, consequently, are consistent with
that font's style.  The new `symbolfamily` argument allows us to explore
other options.  For example, on Fedora, we can choose to use the
OpenSymbol font, as shown below.

```{bash eval=FALSE}
dnf install libreoffice-opensymbol-fonts
```
```{r eval=FALSE}
png(type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE))
```

```{bash eval=FALSE, echo=FALSE}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'osVersion' > cairo-symbolfamily-files/fedora-solution-osVersion
```
```{r echo=FALSE}
cat(' [1] "Fedora 31 (Container Image)" ')
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution dnf -y install libreoffice-opensymbol-fonts
```
```{bash eval=FALSE, echo=FALSE, results="hide"}
docker exec R-devel-fedora-solution /opt/R-devel/bin/Rscript -e 'png("cairo-symbolfamily-files/fedora-opensymbol.png", type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE)); source("cairo-symbolfamily-files/testChars.R"); TestChars()'
```

![](/Blog/public/post/cairo-symbolfamily-files/fedora-opensymbol.png)

## Windows and macOS

Cairo Graphics devices are also available on Windows and macOS
and the `symbolfamily` argument and the `cairoSymbolFont()` function
are available on those platforms as well, although the default
`symbolfamily` can be different.

A single-byte locale on Windows presents a special case because,
instead of converting from ASE to UTF-8, R pretends that
the ASE numbers are in a Latin1 encoding and converts from Latin1 to UTF-8.
This conversion works for the default `"Symbol"` font, but does not 
for most other fonts.  In this case, if the `symbolfamily` is not 
`"Symbol"` the Cairo Graphics devices switch back to the normal ASE to UTF-8 
conversion (with or without PUA).

Alternative symbols fonts that are known to provide reasonable coverage
 on those platforms
are:  `"Apply Symbols"` on macOS and `"Cambria Math"` on Window
(both with `usePUA=FALSE`).

## R API changes

The Cairo Graphics devices receive UTF-8 text from the graphics engine,
but as described above, that text may need further transformation, for example,
to avoid the Unicode PUA.  Those transformations occur in C code and
are provided by functions in the R API so that other graphics devices
can make use of them.  For example, the 'Cairo' package,
which has always allowed the user to select a symbol font,
from R 4.0.0 will now also offer the option to not use the PUA.

One existing function has been modified:\
`Rf_AdobeSymbol2utf8()`, has an additional Rboolean
`usePUA` argument to control whether the Unicode PUA is used.

Three new functions have been added:\
`Rf_utf8toAdobeSymbol()` converts from UTF-8 to ASE, assuming that the 
UTF-8 was generated using the PUA.\
`Rf_utf8Toutf8NoPUA()` converts from UTF-8 with PUA to UTF-8 without PUA.\
`Rf_utf8ToLatin1AdobeSymbol2utf8()` converts from UTF-8 that has come from
ASE that was treated as Latin1 and then back to UTF-8 (with or without PUA).

## Reproducibility

All of the materials required to rebuild this blog are available on
<a href="https://github.com/pmur002/cairo-symbolfamily-blog">github</a>.

## Acknowledgements

Thanks to Gavin Simpson for the original bug report,
IƱaki Ucar and Nicolas Mailhot for assistance with diagnosing
the problem and designing the solution, and
Brian Ripley, Simon Urbanek, and Gabriel Becker for assistance
with testing the new features.

```{bash eval=FALSE, include=FALSE}
## Eval this to clean up containers
## Shut down two containers
docker kill R-devel-ubuntu
docker kill R-devel-fedora-problem
docker kill R-devel-fedora-solution
```