Historical relict: R matrix is not an array
In a recent discussion on the R-devel mailing list, in a thread started on July 8 titled "head.matrix can return 1000s of columns – limit to n or add new argument?", Michael Chirico and then Gabe Becker were proposing to generalize the head() and tail() utility functions, and Gabe noted that the current (pre R-4.x.y) head() would not treat arrays specially.
I’ve replied, noting that R currently typically needs both a matrix and an array method:
Note however the following historical quirk:
sapply(setNames(,1:5),
function(K) inherits(array(7, dim=1:K), "array"))
((As I hope this will change, I explicitly show the current R 3.x.y result rather than evaluating the above R chunk:))
1 2 3 4 5
TRUE FALSE TRUE TRUE TRUE
Note that matrix objects are not arrays in that (inheritance) sense, even though (and many useRs may not be aware of this)
identical(
matrix(47, 2,3), # NB " n, n+1 " is slightly special
array (47, 2:3))
## [1] TRUE
all matrices can equivalently be constructed by array(.), though slightly more clumsily in the case of matrix(*, byrow=TRUE).
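The byrow=TRUE remark can be checked directly. This small sketch builds the same 2 x 3 matrix once with matrix(*, byrow=TRUE) and once via array(.), which always fills column by column and therefore needs a transposing t() afterwards:

```r
## matrix(*, byrow = TRUE) fills the entries row by row:
m.byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

## array() always fills column by column, so build the 3 x 2
## "transposed" array first and then transpose it with t():
a.byrow <- t(array(1:6, dim = c(3, 2)))

identical(m.byrow, a.byrow)  # TRUE
```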
Note that because of that, base R itself has three functions where the matrix and the array methods are identical, as I wrote in the post:
The consequence of that is that currently, “often” foo.matrix is just a copy of foo.array in the case the latter exists, with base examples of foo in {unique, duplicated, anyDuplicated}.
for(e in expression(unique, duplicated, anyDuplicated)) { # `e` is a `symbol`
f.m <- get(paste(e, "matrix", sep="."))
f.a <- get(paste(e, "array", sep="."))
stopifnot(is.function(f.m),
identical(f.m, f.a))
}
In R 4.0.0, will a matrix() be an "array"?
In that same post, I’ve also asked:
Is this something we should consider changing for R 4.0.0 – to have it TRUE also for 2d-arrays aka matrix objects ??
In the meantime, I’ve tentatively answered “yes” to my own question and started investigating some of the consequences. What I found in overly eager (unit) tests, some even written by myself, reminded me that I had wanted to teach more people about a related underlying issue, where we’ve seen many useRs use R unsafely:
If you think class(.) == *, think again: Rather inherits(., *) … or is(., *)
Most non-beginning R users are aware of inheritance between classes,
and even more generally that R objects, at least conceptually, are of more than one “kind”.
E.g., pi is both "numeric" and "double", or 1:2 is both "integer" and "numeric".
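These several “kinds” can be inspected with class(), typeof() and the base is.*() predicates; a minimal sketch, where the comments show what R returns:

```r
## pi: class "numeric", internal storage type "double"
class(pi)        # "numeric"
typeof(pi)       # "double"
is.numeric(pi)   # TRUE
is.double(pi)    # TRUE

## 1:2: class "integer", yet also numeric
class(1:2)       # "integer"
is.integer(1:2)  # TRUE
is.numeric(1:2)  # TRUE
```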
They may know that date-time objects come in two forms: the ?DateTimeClasses (or ?POSIXt) help page describes POSIXct and POSIXlt and says "POSIXct" is more convenient for including in data frames, and "POSIXlt" is closer to human-readable forms. A virtual class "POSIXt" exists from which both of the classes inherit … and, for example,
class(tm <- Sys.time())
## [1] "POSIXct" "POSIXt"
shows that class(.) is of length two here, something that breaks an if(class(x) == "....") .. call.
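To make this concrete, here is a minimal sketch: comparing the length-two class vector with a single string yields a length-two logical, which if() cannot use as a condition (older R versions warn and silently use the first element, R 4.2.0 and later signal an error), whereas inherits() always returns a single TRUE or FALSE:

```r
tm <- Sys.time()

class(tm)                # "POSIXct" "POSIXt"
class(tm) == "POSIXct"   # TRUE FALSE : length two, not a valid if() condition

inherits(tm, "POSIXct")  # TRUE
inherits(tm, "POSIXt")   # TRUE : the virtual parent class is found, too
inherits(tm, "POSIXlt")  # FALSE
```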
Formal Classes: S4
R’s formal class system, called S4 (implemented mainly in the standard R package methods), provides functionality and tools to implement rich class inheritance structures, made use of heavily in package Matrix, or in the Bioconductor project with its 1800+ R “software” packages.
Bioconductor even builds on core packages providing much-used S4 classes, e.g., Biostrings, S4Vectors, XVector, IRanges, and GenomicRanges.
See also Common Bioconductor Methods and Classes.
Within the formal S4 class system, where extension and inheritance are important and often widely used, an expression such as
if (class(obj) == "matrix") { ..... } # *bad* - do not copy !
is particularly unhelpful, as obj could well be of a class that extends matrix, and S4-using programmeRs learn early to use instead
if (is(obj, "matrix")) { ..... } # *good* !!!
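As a sketch of why is() is the right tool here, one can define a small S4 class that extends "matrix". The class name "myMatrix" and its slot "note" are hypothetical, made up only for this illustration; the methods package is assumed to be available (it is attached by default):

```r
library(methods)

## hypothetical S4 class extending the basic class "matrix"
setClass("myMatrix", contains = "matrix", slots = c(note = "character"))

## the unnamed matrix argument becomes the data part of the new object
obj <- new("myMatrix", matrix(1:4, 2, 2), note = "just a test")

class(obj)                   # "myMatrix" (with a "package" attribute)
any(class(obj) == "matrix")  # FALSE : the naive test misses it
is(obj, "matrix")            # TRUE  : S4 inheritance is respected
```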
Note that the Bioconductor guidelines for package developers have warned about the misuse of class(.) == *; see the section “R Code and Best Practices”.
Informal “Classical” Classes: S3
R was created as a dialect or implementation of S (see Wikipedia’s R History), and for S, the “White Book” (Chambers & Hastie, 1992) introduced a convenient, relatively simple object orientation (OO), later dubbed S3 because the white book introduced S version 3 (whereas the blue book described S version 2, and the green book S version 4, i.e., S4).
The white book also introduced formulas, data frames, etc., and in some cases also the idea that some S objects could be particular cases of a given class, and in that sense extend that class. Examples, in R too, are multivariate time series ("mts") extending (simple) time series ("ts"), or multivariate or generalized linear models ("mlm" or "glm") extending normal linear models ("lm").
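These S3 extensions can be observed directly with the built-in cars data: a glm() fit and an lm() fit with a matrix response both carry "lm" as a secondary class, so a test on the full class vector via inherits() recognizes them as linear-model objects, while == does not give a single answer:

```r
fit.g <- glm(dist ~ speed, data = cars)           # gaussian family by default
fit.m <- lm(cbind(dist, speed) ~ 1, data = cars)  # two responses: multivariate lm

class(fit.g)           # "glm" "lm"
class(fit.m)           # "mlm" "lm"

class(fit.g) == "lm"   # FALSE TRUE
inherits(fit.g, "lm")  # TRUE
inherits(fit.m, "lm")  # TRUE

## a multivariate time series extends "ts" in the same way
inherits(ts(matrix(1:6, 3, 2)), "ts")  # TRUE
```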
The “Workaround”: class(.)[1]
So, some more experienced and careful programmers have been replacing class(x) by class(x)[1] (or class(x)[1L]) in such comparisons, e.g., in a good and widely lauded useR! 2018 talk.
In some cases this is good enough, and it is also what R’s data.class(.) function does (among other things), as does the (user-hidden) methods:::.class1(.).
However, programmeRs should be aware that this is just a workaround, and code relying on it works incorrectly in cases where typical S3 inheritance is used. In some situations it is very natural to slightly modify or extend a function fitme() whose result is of class "fitme", typically by writing, say, fitmeMore(), whose value would be of class c("fMore", "fitme"), such that almost all “fitme” methods would continue to work, but the author of fitmeMore() would additionally provide a print() method, i.e., the method function print.fMore().
But if other users work with class(.)[1] and have provided code for the case class(.)[1] == "fitme", that code would wrongly not apply to the new "fMore" objects. The only correct solution is to work with inherits(., "fitme"), as that applies to all the objects it should.
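The scenario can be sketched in a few lines of R. Note that fitme(), fitmeMore(), print.fMore() and the two helpers is_fitme_bad() and is_fitme_good() are all hypothetical, written only for this illustration:

```r
## hypothetical fitting function returning an S3 object of class "fitme"
fitme <- function(x) structure(list(mean = mean(x)), class = "fitme")

## hypothetical extension: same result plus an extra component,
## now of class c("fMore", "fitme")
fitmeMore <- function(x) {
    r <- fitme(x)
    r$sd <- sd(x)
    class(r) <- c("fMore", class(r))
    r
}

## the additional print() method provided by the author of fitmeMore()
print.fMore <- function(x, ...) {
    cat("fMore fit: mean =", x$mean, ", sd =", x$sd, "\n")
    invisible(x)
}

## third-party code: once with the class(.)[1] workaround, once with inherits()
is_fitme_bad  <- function(obj) class(obj)[1] == "fitme"
is_fitme_good <- function(obj) inherits(obj, "fitme")

f1 <- fitme(c(1, 2, 6))
f2 <- fitmeMore(c(1, 2, 6))
c(bad = is_fitme_bad(f1), good = is_fitme_good(f1))  # both TRUE
c(bad = is_fitme_bad(f2), good = is_fitme_good(f2))  # bad: FALSE, good: TRUE
```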
In a much-depended-on CRAN package, the following line (slightly obfuscated), which should efficiently determine the list entries of a certain class,
isC <- vapply(args, class, "") == "__my_class__"
was found (and reported to the package maintainer) to need correction to
isC <- vapply(args, inherits, TRUE, what = "__my_class__")
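A small, made-up list shows both problems with the original line: an entry of a subclass is missed, and vapply() even signals an error as soon as one entry has a class vector of length two. The class names "__my_class__" and "sub" are placeholders:

```r
args <- list(structure(1, class = "__my_class__"),
             structure(2, class = c("sub", "__my_class__")),  # a subclass
             3)                                               # plain numeric

## original version: vapply() requires every class(.) result to have length 1
res <- tryCatch(vapply(args, class, "") == "__my_class__",
                error = function(e) e)
inherits(res, "error")  # TRUE

## corrected version: one TRUE/FALSE per list entry, subclasses included
isC <- vapply(args, inherits, TRUE, what = "__my_class__")
isC  # TRUE TRUE FALSE
```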
Summary:
Instead of class(x) == "foo", you should use inherits(x, "foo") or, alternatively, is(x, "foo").
Corollary:
switch(class(x)[1],
"class_1" = { ..... },
"class_2" = { ..... },
.......,
.......,
"class_10" = { ..... },
stop(" ... invalid class:", class(x)))
may look clean, but it is almost never good enough, as it is (typically) wrong, e.g., when class(x) is c("class_7", "class_2").
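One way to avoid the switch(class(x)[1], ...) trap, sketched here as an assumption rather than as the post’s own recommendation, is to let S3 dispatch walk the whole class vector via UseMethod(). The generic handle() and its methods are hypothetical:

```r
## hypothetical generic: dispatch considers *all* of class(x), in order
handle <- function(x, ...) UseMethod("handle")
handle.class_1 <- function(x, ...) "class_1 branch"
handle.class_2 <- function(x, ...) "class_2 branch"
handle.default <- function(x, ...)
    stop("invalid class: ", paste(class(x), collapse = ", "))

x <- structure(list(), class = c("class_7", "class_2"))

## switch() on class(x)[1] only sees "class_7" and falls through to stop():
r1 <- tryCatch(switch(class(x)[1],
                      "class_1" = "class_1 branch",
                      "class_2" = "class_2 branch",
                      stop("invalid class: ", paste(class(x), collapse = ", "))),
               error = function(e) conditionMessage(e))
r1         # "invalid class: class_7, class_2"

## S3 dispatch finds the inherited "class_2" method instead:
handle(x)  # "class_2 branch"
```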
References
R Core Team (2019). R help pages: for S3, class or inherits; for S4, e.g., Basic use of S4 Methods and Classes, and is.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language (the blue book, introducing S version 2 (S2)); Wadsworth & Brooks/Cole.
Chambers, J. M. and Hastie, T. J., eds. (1992). Statistical Models in S (the white book, introducing S version 3 (S3)); Chapman & Hall, London.
Chambers, J. M. (1998). Programming with Data (the green book, for S4 original); Springer.
Chambers, J. M. (2008). Software for Data Analysis: Programming with R (S4 etc. for R); Springer.