Methods play a different role in the S language than they do in other major programming languages. Analogies with other languages can be useful, but only if they are re-interpreted keeping the different underlying philosophy in mind. Two distinctions are particularly important: S is a functional language, and its use covers a spectrum from casual interaction to large-scale programming.
The most commonly encountered languages emphasizing methods follow what is generally called the Object-Oriented Programming (OOP) model; more precisely, the languages follow a class-based organization of software. For the most part, programming is organized around the definition of classes of objects. Methods are invoked on an object. In S-style notation,
x$plot(y)
invokes the plot method on the object x, passing it an argument y.
The languages are class-based in that a method is determined based only on the class of the object; generally, no other properties of x are relevant in determining the method to be invoked. (There have been some instance-based languages in the past, but as far as I know none of these are in serious use now.)
The essential organizing category of these languages is the class. In Java, for example, essentially all programming is done by defining classes. In particular, aside from possible inheritance, there is no necessary connection between the methods called plot defined for two different classes.
The OOP model is useful in many contexts. It can be added to the S language model as a specialized package, with many useful applications (see the Omegahat OOP package for an experiment in this direction). However, OOP computations must be a specialized addition to the basic model.
The basic organizing principle of programming in the S language is the function definition. Programmers spend most of their time writing function definitions, and the S evaluator spends most of its time evaluating calls to functions. Other program-organizing concepts have evolved as important adjuncts or extensions to function definitions. Packages or libraries allow sets of functions to be grouped together meaningfully. Formal classes and methods organize information about objects and functions in a more explicit and distributed way.
But the model (or at least my model) is that most programming in the S language evolves from a user/programmer wanting to do something with data. The something to be done is expressed as one or more functions, which typically start out simple and then evolve. Methods arise as part of the evolution: the definition of functions very often depends on the properties of the objects passed as arguments, and method and class definitions are often the best way to encapsulate such properties.
The distinction between function-based and class-based methods has implications for many aspects of the language. For example, function-based methods need to be integrated with the user's understanding of the function they specialize, which has implications for argument definition (see below).
On the other hand, nearly all discussions of the S language, even those emphasizing programming, are written against the assumed background of S expressions being evaluated interactively, typically in response to something a user types. Programming in the S language aims to extend the computations available for such interaction. Unlike Dylan, for example, S has no concept of a "complete program". Instances of the S evaluator are processes that go on indefinitely, waiting for expressions to evaluate.
A corollary to the importance of interaction is that, from the user/programmer's perspective, interaction forms a continuum. At one end are expressions so simple that they are typed straight off without pause (or, perhaps, hidden behind a graphical interface). But even a fairly simple user interface supplements this with the ability to recall previous expressions, edit them (for example, by cut-and-paste), and navigate around the text of an expression before evaluating it. The recent history of the interactive session becomes an informal programming environment.
Moving from this stage to defining functions is the major leap from interaction to programming. But simple function definitions can be entered directly from the command line, and editing a small source file that will be parsed immediately is only a little less interactive.
Many of the innovations throughout the history of the language have tried to help the user's evolution from simple interaction to increasingly extensive programming. Both the early, informal method definitions and the formal class/method mechanisms are best seen from this perspective. An implication for the design of the language, and in particular for that of the method mechanism, is that we should avoid making the user do a large amount of programming in order to add a conceptually simple extension to what exists.
This section discusses questions related to the formal arguments for generic functions and how these can be coordinated with the design of methods.
As the term method suggests, a method is a definition of how a function call should be evaluated, as determined by the classes of the actual arguments. It is part of the current API for formal methods that argument matching takes place at the call to the (generic) function: Arguments are not re-matched after the method is selected. Therefore, the arguments of the method are treated as identical to those of the generic function. Not re-matching arguments is moderately important for efficiency of method dispatch, but more fundamentally, any general departure from this requirement would confuse the semantics of method dispatch. (If an argument that was used to select the method is then re-matched to a different formal argument, is the method selection still valid and meaningful?)
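A minimal sketch of what "matching at the call to the generic" means, assuming a current R with the methods package attached. The generic describePair and its two methods are hypothetical; each method has exactly the generic's formal arguments, and the call supplies x and y by name in reverse order.

```r
library(methods)

# Hypothetical generic whose signature is (x, y); "..." is not dispatched on.
setGeneric("describePair",
           function(x, y, ...) standardGeneric("describePair"))

# Each method has exactly the generic's formal arguments.
setMethod("describePair", signature("numeric", "character"),
          function(x, y, ...) "numeric x, character y")
setMethod("describePair", signature("character", "numeric"),
          function(x, y, ...) "character x, numeric y")

# Matching happens at the call to the generic: the named arguments below
# are bound to x and y first, and dispatch then uses those bindings.
describePair(y = "a", x = 1)   # selects the (numeric, character) method
describePair("a", 1)           # selects the (character, numeric) method
```

Because the method receives the arguments exactly as the generic matched them, the class used to select the method is guaranteed to describe the object the method sees as x (or y).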
This model for method dispatch has implications for choosing formal arguments for generic functions, and for the design of methods. With a little care, the designer of methods can have full flexibility in dealing with arguments.
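From the viewpoint of someone designing a method for a particular generic function, possible formal arguments may fall into three categories:
1. arguments to the generic that are meaningful for this method;
2. arguments to the generic that are not meaningful for this method;
3. arguments that are not in the generic but are needed by this method.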
The first category raises no problems. The second and third can be handled in a reasonably convenient way as well, but need some consideration. And just which arguments are important enough to be included in the generic (i.e., should they fall in the second or third category) will always be open to discussion.
Two examples will illustrate the issues. The function plot is defined in the R methods package to have arguments:
plot(x, y, ...)
The arguments x and y represent the datasets providing values to plot on the x- and y-axis, respectively. Both arguments are included in the generic, since it may well be useful to define methods based on either dataset. Some methods, however, will be defined for only a single object. At the same time, the definition of the generic function implies that additional arguments are not relevant in dispatching methods for plot. Individual methods can, however, have specific needs for additional arguments.
As a second example, consider the "[" operator. The formal arguments for this operator in the R methods package are:
x[i, j, ..., drop]
Here, there are three arguments included in the generic that might reasonably be part of a method signature: the object x from which a subset is extracted (or in which it is replaced), and the first and second subscripts, i and j. (Including these enables methods to be defined separately, for example, for text-based and numerical subsetting, or for other more specialized subsetting situations.) Once again, for many methods the second subscript will not be meaningful, but matrix and matrix-like objects are so central to applications of the S language that we need to enable methods for subsetting such objects.
The drop argument, on the other hand, is specific to matrix- or array-like objects and is not likely to be useful in method selection. If we were starting over, this argument would more naturally fall into our third category, but it is there (for now) as part of the traditional S language definition.
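The text-based versus numerical subsetting mentioned above can be sketched as follows, assuming a current R with the methods package. The class namedValues and its slots are hypothetical; the two methods are selected by the class of i, and each keeps the full argument list of the "[" generic.

```r
library(methods)

# The "[" generic in current R has the argument list described in the text.
formalArgs(getGeneric("["))   # "x" "i" "j" "..." "drop"

# Hypothetical class: numeric values with a character label for each.
setClass("namedValues", slots = c(values = "numeric", labels = "character"))

# Separate methods for text-based and numerical subsetting, selected by i.
setMethod("[", signature(x = "namedValues", i = "character"),
          function(x, i, j, ..., drop = TRUE) x@values[match(i, x@labels)])
setMethod("[", signature(x = "namedValues", i = "numeric"),
          function(x, i, j, ..., drop = TRUE) x@values[i])

nv <- new("namedValues", values = c(1.5, 2.5, 3.5), labels = c("a", "b", "c"))
nv["b"]       # via the character method
nv[c(1, 3)]   # via the numeric method
```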
The suggested approaches to handling conceptual differences in function and method arguments are as follows.
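1. For an argument in the generic that is not meaningful for the method, include the argument in the method's signature with class "missing". The corresponding method will then never be selected if the call includes the meaningless argument.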
The point for the method designer to keep in mind is the distinction between including the argument with class "missing" and omitting the argument from the signature. The latter implies that any object may appear as this argument (it corresponds formally to class "ANY" in the signature), which is not correct if the argument is not meaningful.
The R implementation of methods provides an equivalent way of specifying that an argument should not be included. If the method definition is a function whose formal arguments are a subset of those of the generic (in the same order), then the setMethod function infers that the omitted formal arguments are to have class "missing" in the signature. While this corresponds to a notion of conforming arguments in the Dylan language, it is pretty much just syntactic sugar in S.
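The difference between declaring an argument "missing" and leaving it out of the signature can be seen in a short sketch, assuming a current R with the methods package. The generic summarizeData and both methods are hypothetical; the sketch writes the "missing" class explicitly rather than relying on the omission shortcut just described.

```r
library(methods)

setGeneric("summarizeData",
           function(x, y, ...) standardGeneric("summarizeData"))

# y is meaningless here: declare it "missing", so a call that supplies y
# will not select this method.
setMethod("summarizeData", signature(x = "numeric", y = "missing"),
          function(x, y, ...) paste("mean of x:", mean(x)))

# y left out of the signature: formally class "ANY", so any y is accepted
# (and, in this method, silently ignored).
setMethod("summarizeData", signature(x = "character"),
          function(x, y, ...) paste(length(x), "strings"))

summarizeData(c(1, 2, 3))            # the (numeric, missing) method
summarizeData(c("a", "b"), y = 99)   # the (character, ANY) method
# No method applies to (numeric, numeric), so dispatch signals an error.
tryCatch(summarizeData(c(1, 2, 3), y = 4),
         error = function(e) "no applicable method")
```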
2. For an argument that the method needs but that is not in the generic, the generic must include ... as a formal argument to allow the arguments to be passed down. Again, this is not an extension to the language, just making use of existing features.
For example, suppose a method is defined for subsetting objects maintained in some particular remote database, and suppose we want an argument copy to the method that says whether to copy the subset or create a remote reference for it. There is no copy argument to the generic, and it's reasonable to say that the concept isn't sufficiently general to justify redefining the generic. The suggestion is to define a function, say subsetRemote, with the appropriate argument list, and to call that function as the method:
subsetRemote <- function(x, i, copy = TRUE)
{ .... }
setMethod("[", "remoteObject",
function(x, i, ...) subsetRemote(x, i, ...)
)
With this definition, an expression like
newObj <- myObj[sample(length(myObj), 1000), copy = FALSE]
would invoke the method, assuming myObj had a suitable class extending "remoteObject".
A few points of detail are relevant. First, notice that while the formal method uses the ... argument from the generic, the actual function defining the method (subsetRemote) does not, with the result that invalid argument names will be detected. Second, the requirement that there be no argument re-matching means that the special arguments must pass through ..., which in turn means that copy must be supplied by name (otherwise it would match j).
As a third point, notice that we've used the R feature of omitting the j argument from the method definition, forcing that argument to be missing.
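A minimal runnable sketch of the same pattern, assuming a current R with the methods package. The class remoteObject (which here keeps its "remote" data in an ordinary slot), the body of subsetRemote, and the list returned for copy = FALSE are all hypothetical stand-ins. Unlike the definition above, the sketch spells out the full argument list of the "[" generic (x, i, j, ..., drop) rather than relying on the omission shortcut.

```r
library(methods)

# Hypothetical class standing in for an object kept in a remote database;
# the "remote" data is just a local numeric vector here.
setClass("remoteObject", slots = c(data = "numeric"))

# Hypothetical helper with the argument list the method really wants,
# including `copy`, which the "[" generic does not have.
subsetRemote <- function(x, i, copy = TRUE) {
  if (copy) {
    x@data[i]                    # return a local copy of the values
  } else {
    list(ref = x, index = i)     # stand-in for a remote reference
  }
}

# The method keeps the generic's formals; `copy` can only arrive via `...`.
setMethod("[", "remoteObject",
          function(x, i, j, ..., drop = TRUE) subsetRemote(x, i, ...))

obj <- new("remoteObject", data = c(10, 20, 30, 40))
obj[2:3]                        # copy = TRUE by default
ref <- obj[2:3, copy = FALSE]   # named, so it passes through `...`
obj[2:3, FALSE]                 # unnamed FALSE is matched to j and ignored
```

The last call shows the pitfall described above: supplied positionally, the intended copy value is bound to j, and the method silently uses the default copy = TRUE.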
There is an objection to this mechanism: as written, it clutters up the name space with the additional function (subsetRemote in this case). The proposals for namespaces in R would be helpful here.
It is possible to embed the new function in the method itself, at the
cost of less readable code and some (trivial) extra computation:
setMethod("[", "remoteObject",
function(x, i, ...) {
subsetRemote <- function(x, i, copy = TRUE)
{ .... }
subsetRemote(x, i, ...)
}
)
We could provide a convenience mechanism for this feature as we did for the missing arguments. For example, setMethod could interpret arguments not found in the generic to create a function in the form shown. (The current Splus implementation does something similar.) On the whole, such a mechanism seems a little dangerous.