Future versions of R will (optionally) support internal, and probably user-level, threads, so it is desirable that C code accessed from R also be thread-safe. By this we essentially mean that two different streams of execution can run concurrently, executing the same code but each with its own local variables. This document is intended to provide guidelines for those writing C code to be used with R to make it thread-safe. It is the first in a series of documents on threads that we will attempt to provide.
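A minimal sketch of the distinction, using hypothetical functions that are not part of R: the first keeps its running count in a static variable, so every stream of execution shares (and can clobber) it; the second keeps the count in a local variable, so each call, and hence each thread, works on its own copy on its own stack.

```c
/* Not thread-safe: every caller shares this static counter, so two
   threads running this code at once would reset and bump the same int. */
static int shared_count = 0;

int count_chars_shared(const char *s, char c)
{
    shared_count = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            shared_count++;
    return shared_count;
}

/* Thread-safe: the counter is local, so concurrent calls executing the
   same code each have their own copy. */
int count_chars_local(const char *s, char c)
{
    int count = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            count++;
    return count;
}
```

Called from a single thread the two functions behave identically; the difference only shows up when calls overlap, which is exactly the situation threads introduce.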
There are 6 static variables defined in list.c:
static SEXP ans;
static int UniqueNames;
static int IncludeFunctions;
static int StoreValues;
static int ItemCounts;
static int MaxCount;
These naturally form a group of related variables that are used in association with each other. In many respects, a class would be an obvious way to group them. We will use C's equivalent of this, which is a struct, and gather these variables into a single variable.
typedef struct {
    SEXP ans;
    int UniqueNames;
    int IncludeFunctions;
    int StoreValues;
    int ItemCounts;
    int MaxCount;
} NameWalkerData;
We can start our changes by declaring a global variable which is an instance of this structure.
static NameWalkerData GlobalNameData;
static NameWalkerData *nameData = &GlobalNameData;
GlobalNameData is an instance of this structure. For reasons that will become clearer later on, we will want to refer to the fields in this instance of the structure via a pointer. Hence we define nameData as a pointer to a NameWalkerData instance and set it to point to GlobalNameData.
Of course, GlobalNameData is a global/static variable and so will not be thread-safe. We have simply reduced the number of globals from 6 to 1 (or 2, counting the pointer). We will remove this global variable later, but for now it lets us focus on the changes to the code that uses the original 6 variables.
We should note that initializing a static pointer with the address of another static variable, as we do here, may not work with all compilers; but we will remove this code later and are using it only for purposes of explanation.
If we recompile with these two changes (defining the structure and declaring an instance of it) and remove the declarations of the original 6 variables, we will obviously get numerous error messages about those variables not being defined. We can use these errors to step through the code (e.g. using something like Emacs' facilities for jumping to the location of each compilation error).
We have several different ways to go about changing the code, and the approach one chooses depends on how much time one wants to put in [2].
switch(TYPEOF(s)) {
case SYMSXP:
    if(ItemCounts < MaxCount) {
becomes
switch(TYPEOF(s)) {
case SYMSXP:
    if(nameData->ItemCounts < nameData->MaxCount) {
We have used the first approach, and the resulting code can be seen in step1.html.
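To see the pattern of the first approach end to end outside R, here is a self-contained toy sketch; it is not the code in step1.html. Node (a hypothetical stand-in for an R expression), the cut-down NameWalkerData, and the driver all_names() are assumptions made so that the example compiles without R's headers. Every former static is now reached as nameData->field through one global pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for an R expression (not R's SEXP). A symbol has a
   name; a call has n elements, elts[0] being the function and the rest its
   arguments, loosely mirroring the layout of a LANGSXP. */
typedef struct Node {
    const char *name;    /* symbol name; NULL for a call */
    int n;               /* number of elements in a call */
    struct Node **elts;  /* elements of a call */
} Node;

/* Cut-down, hypothetical NameWalkerData (no StoreValues or UniqueNames);
   ans is a plain buffer instead of an R character vector. */
typedef struct {
    const char **ans;    /* caller-supplied buffer with room for MaxCount names */
    int IncludeFunctions;
    int ItemCounts;
    int MaxCount;
} NameWalkerData;

/* Step 1: the separate statics are replaced by one global instance,
   reached through a global pointer. Still not thread-safe. */
static NameWalkerData GlobalNameData;
static NameWalkerData *nameData = &GlobalNameData;

static void namewalk(const Node *s)
{
    if (s->name != NULL) {               /* a symbol: store it if there is room */
        if (nameData->ItemCounts < nameData->MaxCount)
            nameData->ans[nameData->ItemCounts++] = s->name;
        return;
    }
    /* a call: skip elts[0] (the function) unless IncludeFunctions is set */
    for (int i = nameData->IncludeFunctions ? 0 : 1; i < s->n; i++)
        namewalk(s->elts[i]);
}

/* Hypothetical driver in the role of do_allnames(); returns the number
   of names stored in out. */
int all_names(const Node *expr, const char **out, int max, int functions)
{
    nameData->ans = out;
    nameData->IncludeFunctions = functions;
    nameData->ItemCounts = 0;
    nameData->MaxCount = max;
    namewalk(expr);
    return nameData->ItemCounts;
}
```

For the expression f(x, g(y)), calling all_names() with functions set stores f, x, g, y; without it, only x and y.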
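Next, we declare a NameWalkerData instance local to do_allnames() and make the global pointer nameData point to it:
SEXP do_allnames(SEXP call, SEXP op, SEXP args, SEXP env)
{
    SEXP expr;
    int i, savecount;
    NameWalkerData localData;
    nameData = &localData;
    checkArity(op, args);
    expr = CAR(args);
The code at the end of this step is in step2.html.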
So how can we get rid of this global variable? At this point in the process, it is easy to see that we can simply pass the instance of NameWalkerData as an argument to namewalk() from do_allnames() and in all recursive calls to namewalk(). We remove the global variable nameData, declare it instead as a local variable in do_allnames(), and set it to point to localData within that routine. Then we modify the declaration of namewalk() to take an additional argument of type NameWalkerData * and make certain to name this parameter nameData. This means we do not have to change any of the references to the fields that we introduced in step 1. Recompiling at this point will identify every place where namewalk() is called without the new argument, and we can change these calls to pass nameData (the local variable in do_allnames(), or the parameter within namewalk() itself).
The code at the end of this step is in step3.html.
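A toy sketch of where this step ends up follows; it is not the code in step3.html. Node is a hypothetical stand-in for SEXP, NameWalkerData is cut down to four fields, and all_names() is a hypothetical driver playing the role of do_allnames(). The state lives in localData on the driver's stack and is handed down through a parameter named nameData, so the nameData->field references in namewalk() are unchanged.

```c
#include <stddef.h>

/* Hypothetical stand-in for an R expression (not R's SEXP): a symbol has
   a name; a call has n elements, elts[0] being the function. */
typedef struct Node {
    const char *name;    /* symbol name; NULL for a call */
    int n;               /* number of elements in a call */
    struct Node **elts;  /* elements of a call */
} Node;

/* Cut-down, hypothetical NameWalkerData. */
typedef struct {
    const char **ans;    /* caller-supplied buffer with room for MaxCount names */
    int IncludeFunctions;
    int ItemCounts;
    int MaxCount;
} NameWalkerData;

/* No globals: the state arrives as a parameter called nameData. */
static void namewalk(const Node *s, NameWalkerData *nameData)
{
    if (s->name != NULL) {               /* a symbol: store it if there is room */
        if (nameData->ItemCounts < nameData->MaxCount)
            nameData->ans[nameData->ItemCounts++] = s->name;
        return;
    }
    /* a call: skip elts[0] (the function) unless IncludeFunctions is set */
    for (int i = nameData->IncludeFunctions ? 0 : 1; i < s->n; i++)
        namewalk(s->elts[i], nameData);  /* pass the same state down */
}

/* Hypothetical driver: nameData is now local and points at localData. */
int all_names(const Node *expr, const char **out, int max, int functions)
{
    NameWalkerData localData;
    NameWalkerData *nameData = &localData;

    nameData->ans = out;
    nameData->IncludeFunctions = functions;
    nameData->ItemCounts = 0;
    nameData->MaxCount = max;
    namewalk(expr, nameData);
    return nameData->ItemCounts;
}
```

Since each call to all_names() gets its own localData, two threads could call it at the same time without interfering, provided they do not share an output buffer.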
The simplest mechanism for finding global (i.e. non-local) variables is to compile the C code and run the nm utility on the resulting object file [3].