---
title: "Upcoming Changes in R 4.2 on Windows"
author: "Tomas Kalibera, Uwe Ligges, Kurt Hornik, Simon Urbanek, Deepayan Sarkar, Luke Tierney, Martin Maechler"
date: 2021-12-07
categories: ["User-visible Behavior", "Windows"]
tags: ["UTF-8", "UCRT", "encodings", "CRAN"]
---

R 4.2 for Windows will support UTF-8 as native encoding, which will be a major improvement in encoding support, allowing Windows R users to work with international text and data.

This new feature will require at least Windows 10 (version 1903) on desktop systems, Windows Server 2022 on long-term support server systems or Windows Server 1903 from the semi-annual channel. Older Windows systems will be able to run R, but with the same limitations in the encoding support as in R 4.1 and earlier.
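The limitation on older systems can be illustrated with a small Python sketch (not R code). It assumes, as an example, that the native encoding is a single-byte Windows code page such as Windows-1252 (cp1252, Western European) or Windows-1250 (cp1250, Central European). The helper `representable` and the sample strings are hypothetical. A code page can hold the text of its own region, but a string mixing Czech and CJK text can only be represented as a whole in UTF-8.

```python
def representable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


czech = "Žluťoučký kůň"   # Czech sample text
cjk = "統計"               # Chinese/Japanese sample text
mixed = czech + ", " + cjk

for enc in ("cp1252", "cp1250", "utf-8"):
    print(f"{enc:7} czech={representable(czech, enc)} mixed={representable(mixed, enc)}")
```

With UTF-8 as the native encoding, the choice of code page no longer limits which characters R can handle natively.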

As part of this change, R will require UCRT as the new C runtime for Windows. This means that on desktop systems older than Windows 10 and on server systems older than Windows Server 2016, UCRT will have to be installed before installing R. MSVCRT, the older C runtime, will no longer be supported. R 4.2 will also drop support for 32-bit builds on Windows.

A new compiler toolchain, Rtools42, will be used for building Windows binaries of R and R packages from source. All code will have to be rebuilt with the new toolchain.

Nothing will change for R 4.1.x, not even for the upcoming minor revisions. They will still use the current encoding support and be built using Rtools4 in 32-bit and 64-bit versions for MSVCRT.

Nothing will change for R end users before R 4.2; they don’t have to do anything special now. Only users of R-devel on Windows will be affected before the release.

The change will, however, require cooperation from some package authors. Most authors will not have to do anything, as fewer than 1% of CRAN packages need attention, but authors of packages using native (C, C++ or Fortran) code should read the following. Some of them will have to update their packages, but in most cases they can use patches created by Tomas Kalibera and/or get further advice from Tomas Kalibera or Uwe Ligges. We thank the package authors who have already been working with us on the change for their cooperation.

Current state

So far, R-devel snapshot binary builds and binary builds of R packages on CRAN have been built using Rtools4 (GCC 8) and used MSVCRT as the C runtime. Thanks to Jeroen Ooms for putting together and maintaining Rtools4 and the binary builds of R. MSVCRT does not allow using UTF-8 as native encoding.

There is a separate setup (“ucrt3”), created and maintained by Tomas Kalibera, with a new toolchain Rtools42 (GCC 10), patched R-devel snapshot binary builds, patched CRAN package binary builds, patched Bioconductor package binary builds (only those needed by CRAN), and compatible builds of JAGS and of the Tcl/Tk bundle. Automated R-devel binary builds and CRAN package checks have been provided since March 2021, with results linked from CRAN. More information is provided in Howto: UTF-8 as native encoding in R on Windows. To help package authors test and fix their packages, “ucrt3” R-devel builds automatically apply patches created by Tomas Kalibera to some packages at installation time (patches have been created for over 100 CRAN packages and several Bioconductor packages). Since March 2021, authors have been invited to adopt these patches, and there are features, described in Howto: UTF-8 as native encoding in R on Windows, that allow packages to do so while still supporting the current R releases.

Switching to UCRT

The “ucrt3” system is ready to be merged into the CRAN systems and the R-devel source code. The process is planned as follows and may take some time, as this change is more challenging than previous toolchain upgrades.

The CRAN systems, being extended by Uwe Ligges, are almost ready as well: binary packages are already built and used for CRAN package checks (results are already available on CRAN pages) and for checks via the Win-builder service.

On Monday, December 13, CRAN will switch the incoming checks on Windows to what is now “ucrt3”. At the same time, the R-devel source code will be patched with the “ucrt3” patches. From that point on, R-devel will assume 64-bit UCRT and will no longer support MSVCRT or 32-bit targets, and CRAN will start building R-devel snapshot binaries with Rtools42. The switch should take anywhere from a few hours to several days. During this short period, it might be difficult to build R-devel from source, install binary packages in R-devel or submit packages to CRAN.

The best course of action for package authors and users using R-devel on Windows will be to uninstall R-devel, uninstall old Rtools, delete the old package libraries, and install the new versions from scratch. Those who build R-devel from source will have to run distclean.

After the switch, R-devel will automatically apply patches to CRAN and required Bioconductor packages at installation time, as “ucrt3” does now. This feature will be used temporarily to give package authors more time to fix their packages. Eventually, patching a package at installation time may be turned into a warning and the patches may be removed.

The switch is being coordinated with the Bioconductor team, who will eventually restore full support for Bioconductor packages after R-devel and CRAN have switched, but it is expected that getting everything synchronized might take a few days.

Preparing for the switch

Authors of packages failing the UCRT checks and of packages with installation-time patches are invited to start adapting their packages now. The check results have been available on the CRAN results page of each package since March (e.g. Matrix), and the corresponding check flavors are now r-devel-windows-x86_64-new-TK and r-devel-windows-x86_64-new-UL. Differences between the two are mainly caused by different system setups, and package authors should primarily care about the latter, run by Uwe Ligges, as this is the setup that will be used after the switch.

Package authors may now use this setup to check their packages via the Win-builder service run by Uwe Ligges. About a month ago, Gabor Csardi also installed the “ucrt3” system on R-hub, where it can be used for building and checking packages. It is also available for use with github actions (see Github ucrt3 release and actions), which can be combined with the github actions provided by Simon Urbanek to check packages on different platforms.

Before installing “ucrt3” locally, it is safest to uninstall any previous installation of Rtools and R-devel, and to delete the R-devel package libraries. The installers are available for “ucrt3” R-devel and Rtools42. See Howto: UTF-8 as native encoding in R on Windows for detailed installation instructions and advice for package authors.

Most of the required package changes stemmed from packages downloading incompatible pre-compiled libraries at installation time. Rtools42 includes libraries for almost all CRAN packages, and these can and should be used instead. Using libraries from the toolchain ensures that they are built in a compatible way, makes the source packages more transparent, removes download issues (which are perhaps rare for individual users but not so in CRAN operations) and makes bigger changes to the toolchain, like this one, much easier.

Outlook for further developments in the toolchain

Currently, the compiler toolchain and libraries are cross-compiled on Linux (using MXE). We could eventually support building the full R installer on Linux, as most of R can already be cross-compiled, and it would be nice to eventually cross-compile most or all R packages as well. This could make some operations faster, easier to automate and easier to replicate for some uses. Perhaps unsurprisingly, some build systems used by software primarily developed for Unix (which is the vast majority of the software R packages use) run much faster on Linux. Cross-compilation is used, for instance, by Julia and Octave (the latter even using MXE). Cross-compilation could even run in WSL on Windows, although it remains to be seen how big the benefit would be there.

The distribution of the libraries for packages in a single chunk (or two, the “base” and “full” versions) is subject to an ongoing debate, with differing opinions. Should distribution in smaller chunks (sets of libraries, or even individual libraries) become necessary, it should use a package manager external to R packages and be integrated with the toolchain bundle/Rtools, so that the libraries can be rebuilt automatically and changed together with the compiler toolchain. MXE itself can build .deb packages of the individual libraries, so that would be one option.

Recent changes

This is a result of ongoing work presented in earlier blog posts from May 2020, July 2020 and March 2021.

The progress in “ucrt3” since March 2021 includes:

Rtools42

The bundle includes Msys2 for build tools (e.g. make) and the new compiler toolchain (for UCRT) and libraries. The installer is almost the same as in Rtools4 and re-uses its code.

The differences are as follows. Rtools42 contains libraries for almost all CRAN packages, which removes the need to download pre-built libraries at package installation time. Rtools42 is one step easier to install: one does not have to put the compilers on the PATH. Rtools42 no longer has a special implementation of tar. Rtools42 only has a 64-bit compiler toolchain.

Github actions

“ucrt3” is also available on github for use with github actions. One may download the R builds and the toolchain from github, using the provided actions, and check packages with it.

For the purpose of github actions, there is also a new “base” distribution of the toolchain, which includes only the compiler toolchain and the libraries needed to build base R. This is already enough to build many R packages as well, and it is smaller than the “full” distribution.

Automatic construction of linking orders

Scripts are now available to help package authors automatically find or suggest which libraries to link to their packages and in what order, with contributions from Deepayan Sarkar. More details are in Howto: UTF-8 as native encoding in R on Windows.
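At its core, choosing a linking order is a topological sort. With static libraries, a linker such as GNU ld scans archives left to right and only pulls in objects for symbols that are still undefined, so each library must appear before the libraries it depends on. The Python sketch below illustrates that principle only; it is not how the ucrt3 scripts are implemented, the function name `link_order` is hypothetical, and the dependency map is a hand-written example rather than the actual dependencies in Rtools42.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def link_order(deps: dict[str, set[str]]) -> list[str]:
    """Return -l flags so that every library precedes its dependencies."""
    # TopologicalSorter treats each value as a set of predecessors, so
    # static_order() yields every dependency before the libraries using it.
    order = list(TopologicalSorter(deps).static_order())
    # The static linker resolves symbols left to right, so dependents must
    # come first: reverse the topological order.
    return ["-l" + lib for lib in reversed(order)]


# Hypothetical dependency map: library -> libraries it depends on.
deps = {
    "gdal": {"proj", "tiff"},
    "proj": {"sqlite3", "tiff"},
    "tiff": {"z"},
}

print(" ".join(link_order(deps)))
```

Circular dependencies make `static_order()` raise `graphlib.CycleError`; with GNU ld such cases are usually handled by repeating a library on the command line or by grouping libraries with `--start-group`/`--end-group`.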

Improved automation

The automated package checks and builds are run inside a docker container (on Windows and Linux). This verifies that the installation of all external software is really automated, and hence done in a way that is recorded. It also shows that probably all CRAN packages can be checked in a Windows docker container, that is, without some of the Windows GUI APIs and without the GUI itself. This was not at all clear at the beginning, and it is a property worth preserving. The scripts used for “ucrt3” are available here.

Improved coverage

Even though it is a moving target, the check results are already comparable with CRAN checks on other platforms, including the existing Windows checks. The toolchain has been upgraded to newer GCC and MinGW-w64, and a number of libraries (geospatial and others) were upgraded as well. Inevitably, some package check issues arose because differences between the setups of the check systems exposed problems in packages that had not been discovered before.

Reduced size

The size of the toolchain has been reduced by about 1G by removing unnecessary executables, which MXE builds by default but CRAN packages do not use. This was a significant saving because static linking makes the executables large. The compressed size has also been reduced by using a better compression tool and by re-ordering the files before compression, which increases the chances that the compression algorithm finds common parts. Now the “base” tarball of the compiler toolchain and libraries takes about 100M compressed and 1000M uncompressed, the “full” tarball takes 360M compressed and 2.7G uncompressed, and the Rtools42 installer takes about 360M (the EXE file with internal compression) and 3G after installation.
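Why file order matters can be shown with a minimal Python sketch. It uses zlib purely as an assumed stand-in, since the post does not name the compression tool used for Rtools42. DEFLATE can only refer back to data within a 32 KiB window, so a duplicate file is compressed well only if its earlier copy is still within reach. Tools with larger windows behave the same way at larger distances. The two random "files" below are hypothetical test data.

```python
import random
import zlib


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random (hence practically incompressible) bytes."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


rng = random.Random(42)
lib_a = random_bytes(rng, 20_000)   # a "file" that occurs twice in the tree
other = random_bytes(rng, 65_536)   # an unrelated file larger than the window

# Original order: the duplicate is separated by more than 32 KiB of other data,
# so DEFLATE cannot reference the first copy.
scattered = zlib.compress(lib_a + other + lib_a, 9)
# Re-ordered: the duplicates are adjacent, so the second copy becomes back-references.
grouped = zlib.compress(lib_a + lib_a + other, 9)

print(f"scattered: {len(scattered)} bytes, grouped: {len(grouped)} bytes")
```

Grouping similar files, for example by name or extension, is one possible heuristic for achieving this; the post does not say which ordering was actually used.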