---
title: "Changes to Symbol Fonts for Cairo Graphics Devices"
author: "Paul Murrell"
date: 2020-04-17
categories: ["Internals"]
tags: ["grid, units"]
abstract: In R 4.0.0, Cairo-based graphics devices will allow the user to select a symbol font. That is not as straightforward as it sounds.
---
When drawing text in R graphics, we can specify the font “family” to use, e.g., a generic family like "sans" or a specific family like "Helvetica", and we can specify the font “face” to use, e.g., plain, bold, or italic. R graphics provides four standard font faces, plain, bold, italic, and bold-italic, and one special font face that R calls “symbol”. The following code and output demonstrate the different font faces.
library(grid)
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"),
y=5:1/6, gp=gpar(fontface=1:5))
The first four font faces are just variations on the current font family, which by default is a sans-serif font, but the symbol font face is really a separate font altogether.
Historically, the symbol font face has been useful as a way to access Greek letters and a set of mathematical symbols. For example, the character ‘m’ in font face 5 is the Greek letter ‘mu’.
grid.text("m", gp=gpar(fontface=5))
This feature is less useful than it used to be because, with the advent of Unicode and fonts that cover a very broad range of characters, we can now access special symbols with the standard fonts, as shown below (note the lack of fontface in the code below, but also note that the resulting mu is in a different font to the one above).
grid.text("\u03BC")
However, the symbol font is still useful in R because it is used in the “plotmath” facility for drawing mathematical equations, like the example below.
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
On some graphics devices, it is possible to select an alternative symbol font. For example, on the pdf() device, we can use the Type1Font() function to define a new font family, including a new symbol font.
The following code and output show the default "sans" font definition for the pdf() device (on Linux; note the "Symbol.afm" value in the metrics component of the output).
pdfFonts("sans")
## $sans
## $family
## [1] "Helvetica"
##
## $metrics
## [1] "Helvetica.afm"
## [2] "Helvetica-Bold.afm"
## [3] "Helvetica-Oblique.afm"
## [4] "Helvetica-BoldOblique.afm"
## [5] "Symbol.afm"
##
## $encoding
## [1] "default"
##
## attr(,"class")
## [1] "Type1Font"
The next code defines a new font that uses the same main font (Helvetica), but selects a Computer Modern (TeX) font for the symbol font.
CMitalic <- Type1Font("ComputerModern2",
c("Helvetica.afm", "Helvetica-Bold.afm",
"Helvetica-Oblique.afm", "Helvetica-BoldOblique.afm",
"./cairo-symbolfamily-files/cmsyase.afm"))
We can use that new font, with its new symbol font, to produce the same mathematical equation as before, but with a different font used for the symbols.
pdf("cairo-symbolfamily-files/CMitalic.pdf", family=CMitalic, height=1)
grid.text(expression(paste(frac(1, sigma*sqrt(2*pi)), " ",
plain(e)^{frac(-(x-mu)^2, 2*sigma^2)})))
dev.off()
embedFonts("cairo-symbolfamily-files/CMitalic.pdf",
outfile="cairo-symbolfamily-files/CMitalic-embedded.pdf",
fontpaths=file.path(getwd(), "cairo-symbolfamily-files"))
R has several graphics devices that are based on the Cairo Graphics system, e.g., png(type="cairo") and cairo_pdf(). One of the benefits of these devices is that it is very easy to specify a font for drawing text. All we have to do is give the name of a font and Cairo Graphics does all of the work to map that font name to a font on our system. There is no mucking around setting up a Type 1 font definition like on the pdf() device.
For example, if a font called “Linux Biolinum Keyboard O” is installed on our system, we can simply use that font name when we draw text.
grid.text(c("plain", "bold", "italic", "bold-italic", "symbol"),
y=5:1/6,
gp=gpar(fontface=1:5,
fontfamily="Linux Biolinum Keyboard O"))
However, in the output above, we can see that the symbol font looks exactly like the symbol font in the first example. That is because it is exactly the same symbol font and the problem is, or was, that on Cairo Graphics devices the user is, or was, unable to change that default symbol font.
That inconvenience on Cairo Graphics devices - the inability to select an alternative symbol font - took a much more dramatic turn with the release of (the Linux distribution) Fedora 31.
Fedora 31 updated its Cairo Graphics system so that it no longer supported Type 1 fonts, and that change had a deleterious effect on, for example, plotmath output in R.
(Examples from now on are either on an Ubuntu 16.04 system or a Fedora 31 system; both systems are created using Docker images from the R-Hub project. The Docker images, pmur002/ubuntu-gcc-devel, pmur002/fedora-gcc-devel-problem, and pmur002/fedora-gcc-devel-solution, are available from DockerHub.)
The following output shows the full set of symbols that R makes use of from the symbol font. This is run on an Ubuntu 16.04 system (an older Linux distribution) and shows the intended result.
## [1] "Ubuntu 16.04.6 LTS"
The next output shows what this set of symbols looks like on a Fedora 31 system. This is obviously a poorer result.
## [1] "Fedora 31 (Container Image)"
The essence of the problem is that, on Cairo Graphics devices, the symbol font is hard-coded as the font name “symbol”. On both Linux systems, this results in a Type 1 font (as indicated by the .pfb suffix on the file name in the Ubuntu output below and the .t1 suffix on the file name in the Fedora output).
## [1] "Ubuntu 16.04.6 LTS"
## s050000l.pfb: "Standard Symbols L" "Regular"
## [1] "Fedora 31 (Container Image)"
## StandardSymbolsPS.t1: "Standard Symbols PS" "Regular"
The lack of support for this Type 1 font on Fedora 31 is evident in the missing symbols all over the plot above.
symbolfamily argument on Cairo Graphics devices
The first step in solving the Fedora 31 problem is to allow the user to select an alternative symbol font on Cairo Graphics devices. This means that, in R 4.0.0, the following functions all accept a new symbolfamily argument: x11(), png(), jpeg(), tiff(), bmp(), svg(), cairo_pdf(), and cairo_ps().
As with the family argument to those functions, the symbolfamily argument can be just the name of an installed font and Cairo will take care of the rest. For example, the following code creates a Cairo Graphics png() device with "NimbusSans" as the symbol font and that produces a much better result on Fedora 31.
png(type="cairo", symbolfamily="NimbusSans")
## [1] "Fedora 31 (Container Image)"
The following output shows that this works better because the "NimbusSans" font specification resolves to an OpenType (TrueType) font (as indicated by the .otf suffix).
## [1] "Fedora 31 (Container Image)"
## NimbusSans-Regular.otf: "Nimbus Sans" "Regular"
cairoSymbolFont() function for Cairo Graphics devices
The "NimbusSans" result shown above (for Fedora 31) still has some missing symbols. This reveals another peculiarity of how R generates plotmath output on Cairo Graphics devices.
Internally, plotmath works with a (single-byte) Adobe Symbol Encoding (ASE); each Greek character or mathematical symbol corresponds to a number between 0 and 255 (actually, only 32 to 254 are used and there are a number of unused numbers in that range as well).
Cairo Graphics devices accept Unicode text in a (multi-byte) UTF-8 encoding, so R has to convert numbers between 32 and 254 into Unicode code points. For example, the number 34 in ASE is the /universal or “for all” symbol, which gets mapped to the code point U+2200.
R uses a conversion table from The Unicode Consortium to perform the conversion, but this includes some conversions to Unicode code points that lie in a range called the Private Use Area (PUA). For example, the number 230 in ASE is the /parenlefttp or “left parenthesis top” symbol, which gets mapped to the code point U+F8EB.
The problem with code points in the PUA is that they are private(!) - they are not universally agreed on - and this means that they are usually not implemented even by fonts that attempt to cover a broad range of Unicode. That is why there are missing symbols in the "NimbusSans" result.
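To make the PUA issue concrete, here is a minimal sketch in R (it is not the C code that R uses internally). The helper inPUA() is a hypothetical name introduced only for this illustration; it relies on the fact that the Private Use Area in Unicode's Basic Multilingual Plane spans U+E000 to U+F8FF. The two code points tested are the ones mentioned above.

```r
# Hypothetical helper (not part of R): is a code point in the
# Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF)?
inPUA <- function(codepoint) {
    codepoint >= 0xE000 & codepoint <= 0xF8FF
}

# ASE 34 ("for all") maps to U+2200;
# ASE 230 ("left parenthesis top") maps to U+F8EB
codepoints <- c(universal=0x2200, parenlefttp=0xF8EB)
inPUA(codepoints)

# A code point becomes the UTF-8 text that a Cairo device receives
forall <- intToUtf8(0x2200)
utf8ToInt(forall) == 0x2200
```

Only the second code point falls inside the PUA, so only that symbol depends on a font having chosen to put a glyph at a private position.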
There is a new cairoSymbolFont() function in R 4.0.0 that provides a solution for this problem by allowing users to specify that a symbol font does not make use of the PUA. In that case, the Cairo Graphics device will make use of an alternative mapping from ASE to Unicode that does not make use of the PUA. For example, with the alternative mapping, the number 230 in ASE maps to U+239B (Left Parenthesis Upper Hook).
The following code demonstrates how this function can be used. We again specify that the symbol font is "NimbusSans", but we also specify that the font does not use the PUA. The resulting table of symbols is now complete.
png(type="cairo", symbolfamily=cairoSymbolFont("NimbusSans", usePUA=FALSE))
## [1] "Fedora 31 (Container Image)"
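The effect of usePUA can be pictured as a lookup table with two alternative columns. The following sketch is only an illustration: the table contains just the two ASE codes discussed above, the names aseTable and aseToUnicode() are hypothetical (they are not part of R), and it assumes that a code whose default mapping already lies outside the PUA is left unchanged. The real conversion happens in C code and covers the whole symbol set.

```r
# Hypothetical two-row excerpt of an ASE-to-Unicode table,
# using only the code points mentioned in the text
aseTable <- data.frame(ase   = c(34, 230),
                       pua   = c(0x2200, 0xF8EB),
                       nopua = c(0x2200, 0x239B))

# Hypothetical helper: map ASE codes to UTF-8 strings,
# with or without the Private Use Area
aseToUnicode <- function(ase, usePUA=TRUE) {
    row <- match(ase, aseTable$ase)
    codepoint <- if (usePUA) aseTable$pua[row] else aseTable$nopua[row]
    intToUtf8(codepoint, multiple=TRUE)
}

# ASE 230 differs between the two mappings; ASE 34 does not
utf8ToInt(aseToUnicode(230, usePUA=TRUE)) == 0xF8EB
utf8ToInt(aseToUnicode(230, usePUA=FALSE)) == 0x239B
identical(aseToUnicode(34, usePUA=TRUE), aseToUnicode(34, usePUA=FALSE))
```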
grSoftVersion() output
The last step in resolving the Fedora 31 problem is to make sure that the default symbolfamily setting for Cairo Graphics devices is appropriate for different Linux distributions (and other platforms). For example, for backward compatibility, the default symbolfamily remains "symbol" on Ubuntu 16.04, but the default becomes cairoSymbolFont("sans", usePUA=FALSE) on Fedora 31.
In order to help with setting up these defaults, the value returned by grSoftVersion() has two new components in R 4.0.0: "cairoFT" and "pango". The latter is either "" if Cairo is not using Pango, or it is the Pango version in use (as a character value). The former is either "yes" if Cairo is using FreeType (plus FontConfig), or "" otherwise.
## [1] "Ubuntu 16.04.6 LTS"
##                    cairo                  cairoFT                    pango
##                 "1.14.6"                       ""                 "1.38.1"
##                   libpng                     jpeg                  libtiff
##                 "1.2.54"                    "8.0" "LIBTIFF, Version 4.0.6"
## [1] "Fedora 31 (Container Image)"
##                     cairo                   cairoFT                     pango
##                  "1.16.0"                        ""                  "1.44.7"
##                    libpng                      jpeg                   libtiff
##                  "1.6.37"                     "6.2" "LIBTIFF, Version 4.0.10"
A Pango version of "1.44" or above triggers the change to cairoSymbolFont("sans", usePUA=FALSE).
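The version rule just described can be sketched in R as follows. This is an illustration of the rule as stated here, not the code that R uses to choose its defaults (which may take other things into account); the helper name pangoNeedsNoPUA() is hypothetical.

```r
# Hypothetical helper: does a Pango version string (as reported in the
# "pango" component of grSoftVersion()) call for the non-PUA default?
pangoNeedsNoPUA <- function(pango) {
    !is.na(pango) && nzchar(pango) && numeric_version(pango) >= "1.44"
}

# The two Pango versions from the output above
pangoNeedsNoPUA("1.38.1")   # Ubuntu 16.04
pangoNeedsNoPUA("1.44.7")   # Fedora 31
# An empty string means that Cairo is not using Pango
pangoNeedsNoPUA("")

# Apply the check to the current R session (the "pango" component
# is only present from R 4.0.0; a missing value gives FALSE)
pangoNeedsNoPUA(unname(grSoftVersion()["pango"]))
```

Comparing with numeric_version() rather than comparing strings avoids mistakes such as treating "1.100" as smaller than "1.44".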
Although the symbol table above is complete, the symbols provided are from the Nimbus Sans font and, consequently, are consistent with that font’s style. The new symbolfamily argument allows us to explore other options. For example, on Fedora, we can choose to use the OpenSymbol font, as shown below.
dnf install libreoffice-opensymbol-fonts
png(type="cairo", symbolfamily=cairoSymbolFont("OpenSymbol", usePUA=FALSE))
## [1] "Fedora 31 (Container Image)"
Cairo Graphics devices are also available on Windows and macOS and the symbolfamily argument and the cairoSymbolFont() function are available on those platforms as well, although the default symbolfamily can be different.
A single-byte locale on Windows presents a special case because, instead of converting from ASE to UTF-8, R pretends that the ASE numbers are in a Latin1 encoding and converts from Latin1 to UTF-8. This conversion works for the default "Symbol" font, but not for most other fonts. In this case, if the symbolfamily is not "Symbol", the Cairo Graphics devices switch back to the normal ASE to UTF-8 conversion (with or without PUA).
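The Latin1 step can be mimicked in R code to see what it produces. This is only a sketch of the idea, using base R's iconv() rather than the C code inside R, and it reuses ASE number 230 from the earlier examples.

```r
# ASE 230 ("left parenthesis top") as a single byte
aseByte <- rawToChar(as.raw(230))

# Pretend that the byte is Latin1 and convert it to UTF-8
asLatin1 <- iconv(aseByte, from="latin1", to="UTF-8")

# The result is U+00E6 (the Latin1 character at position 230),
# rather than U+F8EB or U+239B
utf8ToInt(asLatin1) == 0xE6
```

The code point that results is simply the original ASE number, so the intended symbol only appears if the font happens to place its symbols at those positions.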
Alternative symbol fonts that are known to provide reasonable coverage on those platforms are: "Apple Symbols" on macOS and "Cambria Math" on Windows (both with usePUA=FALSE).
The Cairo Graphics devices receive UTF-8 text from the graphics engine, but as described above, that text may need further transformation, for example, to avoid the Unicode PUA. Those transformations occur in C code and are provided by functions in the R API so that other graphics devices can make use of them. For example, the ‘Cairo’ package, which has always allowed the user to select a symbol font, from R 4.0.0 will now also offer the option to not use the PUA.
One existing function has been modified:
Rf_AdobeSymbol2utf8() has an additional Rboolean usePUA argument to control whether the Unicode PUA is used.
Three new functions have been added:
Rf_utf8toAdobeSymbol() converts from UTF-8 to ASE, assuming that the UTF-8 was generated using the PUA.
Rf_utf8Toutf8NoPUA() converts from UTF-8 with PUA to UTF-8 without PUA.
Rf_utf8ToLatin1AdobeSymbol2utf8() converts from UTF-8 that has come from ASE that was treated as Latin1 and then back to UTF-8 (with or without PUA).
All of the materials required to rebuild this blog are available on github.
Thanks to Gavin Simpson for the original bug report, Iñaki Ucar and Nicolas Mailhot for assistance with diagnosing the problem and designing the solution, and Brian Ripley, Simon Urbanek, and Gabriel Becker for assistance with testing the new features.