--- title: "When you think `class(.) == *`, think again!" slug: "when-you-think-class.-think-again" author: "Martin Maechler" date: 2019-11-09 categories: ["User-visible Behavior", "Safe Programming"] tags: ["class", "S3", "inheritance", "OO"] --- ```{r global_options, include=FALSE} knitr::opts_chunk$set(collapse=TRUE) ``` ## Historical relict: R `matrix` is not an `array` In a recent discussion on the `R-devel` mailing list, in a thread started on July 8, [head.matrix can return 1000s of columns -- limit to n or add new argument?](https://stat.ethz.ch/pipermail/r-devel/2019-July/078143.html) Michael Chirico and then Gabe Becker where proposing to generalize the `head()` and `tail()` utility functions, and Gabe noted that current (pre R-4.x.y) [`head()` would not treat `array` specially](https://stat.ethz.ch/pipermail/r-devel/2019-October/078606.html). [I've replied, noting](https://stat.ethz.ch/pipermail/r-devel/2019-October/078610.html) that R currently typically needs both a `matrix` and an `array` method: > Note however the following historical quirk : ```{r array-matrix-aint, eval=FALSE} sapply(setNames(,1:5), function(K) inherits(array(7, dim=1:K), "array")) ``` ((As I hope this will change, I explicitely put the current R 3.x.y result rather than evaluating the above R chunk: )) ``` 1 2 3 4 5 TRUE FALSE TRUE TRUE TRUE ``` Note that `matrix` objects are not `array` s in that (inheritance) sense, even though --- many useRs may not be aware of --- ```{r array-makes-matrix} identical( matrix(47, 2,3), # NB " n, n+1 " is slightly special array (47, 2:3)) ``` all matrices can equivalently constructed by `array(.)` though slightly more clumsily in the case of `matrix(*, byrow=TRUE)`. Note that because of that, base R itself has three functions where the `matrix` and the `array` methods are identical, as I wrote in the post: _The consequence of that is that currently, "often" `foo.matrix` is just a copy of `foo.array` in the case the latter exists, with `base` examples of foo in {unique, duplicated, anyDuplicated} ._ ```{r matrix-array-identical} for(e in expression(unique, duplicated, anyDuplicated)) { # `e` is a `symbol` f.m <- get(paste(e, "matrix", sep=".")) f.a <- get(paste(e, "array", sep=".")) stopifnot(is.function(f.m), identical(f.m, f.a)) } ``` ## In R 4.0.0, will a `matrix()` be an `"array"`? In that same post, I've also asked > Is this something we should consider changing for R 4.0.0 -- to > have it TRUE also for 2d-arrays aka matrix objects ?? In the mean time, I've tentatively answered _"yes"_ to my own question, and started investigating some of the consequences. From what I found, in too eager (unit) tests, some even written by myself, I was reminded that I had wanted to teach more people about an underlying related issue where we've seen many unsafe useR's use R unsafely: ## If you think `class(.) == *`, think again:            Rather `inherits(., *)` .... or `is(., *)` Most non-beginning R users are aware of __inheritance__ between classes, and even more generally that R objects, at least conceptually, are of more than one "kind". E.g, `pi` is both `"numeric"` and `"double"` or `1:2` is both `integer` and `numeric`. They may know that time-date objects come in two forms: The [`?DateTimeClasses` (or `?POSIXt`) help page](https://stat.ethz.ch/R-manual/R-devel/library/base/html/DateTimeClasses.html) describes `POSIXct` and `POSIXlt` and says > `"POSIXct"` is more convenient for including in data frames, and > `"POSIXlt"` is closer to human-readable forms. A virtual class > `"POSIXt"` exists from which both of the classes inherit ... and for example ```{r Sys.time-2-classes} class(tm <- Sys.time()) ``` shows that `class(.)` is of length two here, something breaking a `if(class(x) == "....") ..` call. ### Formal Classes: `S4` R's _formal_ class system, called _`S4`_ (implemented mainly in the standard R package `methods`) provides functionality and tools to implement rich class inheritance structures, made use of heavily in [package `Matrix`](https://cran.R-project.org/package=Matrix), or in the [Bioconductor project](https://bioconductor.org) with it's 1800+ R "software" packages. Bioconductor even builds on core packages providing much used S4 classes, e.g., [Biostrings](https://bioconductor.org/packages/Biostrings/), [S4Vectors](https://bioconductor.org/packages/S4Vectors/), [XVector](https://bioconductor.org/packages/XVector/), [IRanges](https://bioconductor.org/packages/IRanges/), and [GenomicRanges](https://bioconductor.org/packages/GenomicRanges/). See also [Common Bioconductor Methods and Classes](https://bioconductor.org/developers/how-to/commonMethodsAndClasses/). Within the formal S4 class system, where extension and inheritance are important and often widely used, an expression such as ```{r call-eq-1, eval=FALSE} if (class(obj) == "matrix") { ..... } # *bad* - do not copy ! ``` is particularly unuseful, as `obj` could well be of a class that _extends_ matrix, and S4 using programmeRs learn early to rather use ```{r rather-is-1, eval=FALSE} if (is(obj, "matrix")) { ..... } # *good* !!! ``` Note that the Bioconductor guidelines for package developers have warned about the misuse of `class(.) == *` , see the section [R Code and Best Practices](https://bioconductor.org/developers/package-guidelines/#rcode) ### Informal "Classical" Classes: `S3` R was created as _dialect_ or implementation of S, see [Wikipedia's R History](https://en.wikipedia.org/wiki/R_(programming_language)#History), and for S, the _"White Book"_ (Chambers & Hastie, 1992) introduced a convenient relatively simple object orientation (OO), later coined _`S3`_ because the white book introduced _S version 3_ (where the blue book described _S version 2_, and the green book _S version 4_, i.e., `S4`). The white book also introduced formulas, data frames, etc, and in some cases also the idea that some S objects could be _particular_ cases of a given class, and in that sense _extend_ that class. Examples, in R, too, have been multivariate time series (`"mts"`) extending (simple) time series (`"ts"`), or multivariate or generalized linear models (`"mlm"` or `"glm"`) extending normal linear models `"lm"`. ### The "Workaround": `class(.)[1]` So, some more experienced and careful programmers have been replacing `class(x)` by `class(x)[1]` (or `class(x)[1L]`) in such comparisons, e.g., in a good and widely lauded useR! 2018 talk. In some cases, this is good enough, and it is also what R's `data.class(.)` function does (among other), or the (user hidden) `methods:::.class1(.)`. However, programmeRs should be aware that this is just a workaround and leads to their working _incorrectly_ in cases where typical S3 inheritance is used: In some situtation it is very natural to slightly modify or extend a function `fitme()` whose result is of class `"fitme"`, typically by writing `fitmeMore()`, say, whose value would be of class `c("fMore", "fitme")` such that almost all "fitme" methods would continue to work, but the author of `fitmeMore()` would additionally provide a `print()` method, i.e., provide method function `print.fMore()`. *But* if other users work with `class(.)[1]` and have provided code for the case `class(.)[1] == "fitme"` that code would *wrongly* not apply to the new `"fMore"` objects. The only correct solution is to work with `inherits(., "fitme")` as that would apply to all objects it should. In a much depended on CRAN package, the following line (slightly obfuscated) which should efficiently determine list entries of a certain class ```{r vapply-class, eval=FALSE} isC <- vapply(args, class, "") == "__my_class__" ``` was found (and notified to the package maintainer) to need correction to ```{r vapply-inherits, eval=FALSE} isC <- vapply(args, inherits, TRUE, what = "__my_class__") ``` ### Summary: **Instead `class(x) == "foo"`, you should use `inherits(x, "foo")`     or maybe alternatively `is(x, "foo")`** #### Corollary: ```{r switch-class-1, eval=FALSE} switch(class(x)[1], "class_1" = { ..... }, "class_2" = { ..... }, ......., ......., "class_10" = { ..... }, stop(" ... invalid class:", class(x))) ``` may look clean, but is is almost always *not good enough*, as it is (typically) wrong, e.g., when `class(x)` is `c("class_7", "class_2")`. ## References * R Core Team (2019). R Help pages: * For S3, [`class` or `inherits`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html) * For S4, e.g., [Basic use of S4 Methods and Classes](https://stat.ethz.ch/R-manual/R-devel/library/methods/html/Introduction.html), and [`is`](https://stat.ethz.ch/R-manual/R-devel/library/methods/html/is.html). * Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S Language_ (the _blue book_, introducing S version 2 (`S2`)); Wadsworth & Brooks/Cole. * Chambers, J. M. and Hastie, T. J. eds (1992) _Statistical Models in S_ (*the* _white book_, introducing S version 3 (`S3`); Chapman & Hall, London. * Chambers, John M. (1998) _Programming with Data_ (*the* _green book_, for `S4` original); Springer. * Chambers, John M. (2008) _Software for Data Analysis: Programming with R_ (`S4` etc for R); Springer.