JSON, null and NA

Duncan Temple Lang

University of California at Davis

Department of Statistics


Limitations of JSON regarding the meaning of null

JavaScript Object Notation (JSON) is a convenient format for representing data and for transferring it between applications. It is widely used in Web services and many other contexts, so it is useful for R to be able to import and export data in this format. Unfortunately, JSON is a little too simple and cannot faithfully represent all of the types and values in R. Most notably, it has no way to represent NA, Inf or NaN. Typically, these values are written in JSON as null. However, null is also used to represent a null object, so it is ambiguous how we should interpret null in JSON: we don't know whether it stands for NA, NaN, Inf or NULL in R. This many-to-one mapping results in a loss of information.
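To make the many-to-one mapping concrete, here is a minimal sketch in base R only (it does not call toJSON() or fromJSON()). The helper describe() and the variable names are made up for illustration. It shows that R can tell the four values apart, whereas a single null token cannot.

```r
# Four distinct R values that JSON would all have to write as null.
values <- list(NA, NaN, Inf, NULL)

# Base R predicates tell them apart; JSON has no equivalent distinction.
# describe() is a hypothetical helper, not part of any package.
describe <- function(v) {
  if (is.null(v)) "NULL"
  else if (is.nan(v)) "NaN"        # test NaN first: is.na(NaN) is also TRUE
  else if (is.na(v)) "NA"
  else if (is.infinite(v)) "Inf"
  else "ordinary value"
}

labels <- vapply(values, describe, character(1))
print(labels)

# Collapsing every value to the same token, as null does, leaves only one
# distinct representation, so the original value cannot be recovered.
encoded <- rep("null", length(values))
print(length(unique(labels)))
print(length(unique(encoded)))
```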

In spite of the shortcomings of the format, we can still work with JSON. However, the conversion of null values to R, and of NA values from R, is neither automatic nor uniquely defined. For that reason, the caller must control how these are mapped. We provide some mechanisms to do this in the fromJSON() and toJSON() functions.

When converting R objects to JSON via toJSON(), one can specify how to map NA values to JSON. One provides a value for the parameter .na to control this. For example, suppose we want to transform the R list

x = list(1, 2, NA, c(TRUE, NA, FALSE))

to JSON and want NA values to map to null. We can achieve this with

toJSON(x, .na = "null")

In some applications, we represent a missing value with a fixed number that is unlikely to occur in actual data, e.g. -99999. We can map NA values to such a number with

toJSON(list(1, 2, list(NA)), .na = -99999)
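A sentinel such as -99999 has to be turned back into NA by whoever reads the data. The following is a minimal sketch of that reverse step in base R. The vector is typed in by hand as a stand-in for numeric data read back from JSON, and the variable names are hypothetical.

```r
# Hypothetical numeric data as it might look after a round trip in which
# NA was written as the sentinel -99999 (the value used in the text above).
sentinel <- -99999
values <- c(1, 2, sentinel, 4.5, sentinel)

# The sentinel is only safe if it cannot occur as a genuine observation,
# so the plausible range of the real data should be checked first.
restored <- values
restored[restored == sentinel] <- NA

print(restored)
print(sum(is.na(restored)))
```

The trade-off is that the convention lives outside the data: both the writer and the reader have to agree on the sentinel value.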

Now consider round-tripping NA, e.g.

o = toJSON(NA)
o
# [1] "[ null ]"
fromJSON(o)
# [[1]]
# NULL

The NA has come back as NULL, so we have lost information.

We can correct this loss of information by specifying how to map null values in JSON to R values. We use the nullValue parameter of fromJSON() to do this:

fromJSON( toJSON ( NA ), nullValue = NA)

Again, as the caller of fromJSON() (and also of toJSON()), we are providing the information about how to transfer the null value from JSON to R. Only we know what it means in this case. If we knew that the null corresponded to Inf, we could specify that:

fromJSON("[null]", nullValue = Inf)

Where this mechanism breaks down is when we have multiple null values in our JSON content and they map to different R values, e.g. NULL, NA and NaN. The nullValue parameter is a global replacement for all null entries in the JSON. To process these null entries adaptively, in a context-specific manner, we have to use a customized parser. We can do this by providing an R function as the callback handler for the JSON parser.
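The handler interface is not shown here. As a simpler illustration of what "context-specific" means, the sketch below post-processes an already parsed list in base R. This is a different technique from the callback handler described above. The parsed list is written by hand as a stand-in for parser output in which every null arrived as NULL. The field names, the policy list and resolve_nulls() are all hypothetical.

```r
# Hypothetical parsed result: every JSON null arrived as NULL.
parsed <- list(temperature = NULL, ratio = NULL, comment = NULL, count = 3)

# Hypothetical per-field policy: what a null means depends on the field.
# Fields not listed here keep their NULL.
null_policy <- list(temperature = NA_real_, ratio = NaN)

# Replace NULL entries of a named list according to the policy for that name.
resolve_nulls <- function(x, policy) {
  for (i in seq_along(x)) {
    field <- names(x)[i]
    if (is.null(x[[i]]) && !is.null(field) && field %in% names(policy)) {
      x[[i]] <- policy[[field]]
    }
  }
  x
}

resolved <- resolve_nulls(parsed, null_policy)
str(resolved)
```

This sketch only handles one level of a named list. Nested structures would need a recursive walk, and deciding during parsing rather than afterwards is what the callback handler is for.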