The chapter in the ``green book'' (Programming with Data; Springer, 1998) on documentation assumes S documentation objects and SGML-based documentation; until some further work is done, this is not directly usable.
The current implementation of special documentation in R is based on a
mapping from type-topic pairs into a single string, the value of
topicName(type, topic).
This corresponds to the binary version of the ? operator
(Programming with Data, page 358):
class ? track
looks for topic topicName(class, track). At the moment
the actual topic would be "track-class". The string
convention is used in two places: in the \alias entry; and in the
name of the documentation file, "track-class.Rd".
The use of "_" as the separator is based on the notion
that it's unlikely to be part of a class or method name and is valid
as part of a file name, on Unix/Linux and Windows at least.
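The mapping from a type-topic pair to a single string can be illustrated with a small sketch. The function topic_string below is a hypothetical stand-in written for this note, not the function R itself uses; its default separator is an assumption chosen to reproduce the "track-class" example.

```r
# Hypothetical helper: combine a documentation type and a topic into one string.
# The default separator "-" is an assumption matching the "track-class" example.
topic_string <- function(type, topic, sep = "-") {
  paste(topic, type, sep = sep)
}

# The string serves both as the alias and as the stem of the file name.
alias_name <- topic_string("class", "track")
file_name <- paste0(alias_name, ".Rd")
```

Keeping the separator as an argument makes it easy to experiment with a different convention without touching the callers.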
The planned sequence for generating class and method documentation,
using the current Rd format, is similar to that for functions.
The programmer implements a class or some methods for a function, and
then creates a shell of the documentation by the corresponding prompt
function:
promptClass("track")
promptMethods("plot")
The output of these would currently go to files "track-class.Rd"
and "plot-methods.Rd" respectively.
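As a minimal sketch of this sequence, the following defines a small class and one method and then generates both documentation shells. The slot names, the plot method body, and the use of a temporary directory are illustrative assumptions; only promptClass and promptMethods come from the text.

```r
library(methods)

# An illustrative class and a plot method for it (slot names are assumptions).
setClass("track", representation(x = "numeric", y = "numeric"))
setMethod("plot", signature(x = "track", y = "missing"),
          function(x, y, ...) plot(x@x, x@y, ...))

# Write the shells to a temporary directory instead of the working directory.
out_dir <- tempdir()
class_file <- file.path(out_dir, "track-class.Rd")
methods_file <- file.path(out_dir, "plot-methods.Rd")

promptClass("track", filename = class_file)
promptMethods("plot", filename = methods_file)

# The generated class shell can then be inspected and edited by hand.
class_shell <- readLines(class_file)
```

The generated files are only shells: the descriptive text still has to be filled in by the programmer.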
The current documentation for both classes and methods conforms to the
existing Rd format. In particular, the special sections, such as that
for documenting slots of a class, are handled by the standard \section
command. Eventually, it would be better to have
documentation files specialized to classes and methods, allowing
easier checks of the documentation against the currently defined
objects.
Having specialized documentation would improve the prompt functions as
well, since they currently have much too specific an idea of how the
documentation is to be rendered.
But good specialized documentation really requires the much-desired change in the
overall documentation strategy to XML or a similar format.
Another issue for consideration is the organization of class
and method documentation in relation to an overall project.
The general notion behind current software (e.g., promptMethods) is
that the programmer may want either to
document a function and its methods together, or to document
separately some additional methods.
If I own a package that contains the definition of a function, say
plotData, along with some methods for that function, I
may want to document function and methods together. If you then
define another package that requires mine and in that package add some
more methods for plotData, the likely scenario is that
you will document your methods separately, with a link to my
documentation of the function itself.
None of this is enforced, but the addTo argument to
promptMethods is designed to set up either version.
On the other hand, promptClass also generates a list of
the locally-defined methods involving the class. If I followed the
design style above, I would likely put a link to the
documentation of the corresponding function into the entries in the
class documentation. Again, nothing is enforced, and it should be
possible to have the other philosophy: put the method documentation
into the relevant class documentation file. Philosophically, this is
more an OOP approach than a function-based approach.
(See also the discussion of class extensions.)
This is essentially a request for a new type of object in R (unless there is a solution within the current implementation not so far uncovered).
An important subset of formal classes either extend one of
the basic vector types, such as "character", or
extend the notion of a structure in S (as would, for
example, a formal definition of matrix or time-series as a class).
For these classes, one would like to inherit much of the behavior of
S3 for vectors and structures.
The present implementation satisfies this fairly well, as far as
current experience has tested.
The prototype of objects in these classes is an object of one of the
``basic classes''.
Most existing non-class-based code for such objects should continue to
work for classes that extend basic classes.
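A short sketch of a class that extends a basic vector type follows. The class name "label" is a hypothetical example; the point is only that ordinary vector code keeps working on its objects.

```r
library(methods)

# A hypothetical class that extends the basic type "character".
setClass("label", contains = "character")
lab <- new("label", c("a", "bb"))

# Existing non-class-based code applies to the object unchanged.
n_elements <- length(lab)
n_chars <- nchar(lab)
</gr_insert_placeholder>
```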
Other formal classes are defined only in terms of their slots, and they should not behave like vectors, unless the designer of the class defines appropriate methods. Nothing in the class definition says that objects from such classes should inherit primitive functions for subsetting, arithmetic, etc.
The API concepts corresponding to this are fairly simple:
If a class extends class vector or class structure, it likewise
should have methods pre-defined for the vector-like operations.
(There are a couple of issues here about what happens with S3-like class/attribute structures. The behavior is mostly, but perhaps not entirely, what we would want. It may be that the case of a formally defined class will need to be detected in the current base implementation and dealt with specially. But this seems likely to be fairly easy.)
If a class does not extend class vector directly or
indirectly, it should not permit vector-like operations
unless these are given explicit definitions.
The current implementation of the methods package does not implement the last item. Vector operations return results (generally meaningless) if applied to non-vector classes. The reason is that the prototype object from the class is a vector (specifically, an empty list).
What one would like is an object for which all vector-like operations
are generally undefined.
For example, if t1 is an
object from some non-vector class, for which no subsetting methods are
defined, an error should result in situations like:
t1[[1]]
Error in t1[[1]] : object is not subsettable
In the current implementation, the operation falls through to an
operation on an empty list, which may or may not fail.
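The desired behavior can be checked with a sketch like the one below. It assumes a recent version of R, in which objects from slot-only classes have their own internal type, so the subsetting attempt is expected to signal an error; under the implementation described in this note the same call might instead fall through to the list operation. The class and slot names are illustrative.

```r
library(methods)

# A slot-only class with no subsetting methods (names are illustrative).
setClass("track", representation(x = "numeric", y = "numeric"))
t1 <- new("track", x = c(1, 2, 3), y = c(2, 4, 6))

# Record whether the vector-like operation is rejected.
outcome <- tryCatch({
  t1[[1]]
  "no error"
}, error = function(e) "error")
```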
There are non-vector data types in R, but it appears that none of them
are suitable as a prototype.
The two candidates that might be plausible are
environment and closure. But environments
are references and do not get duplicated. So ordinary assignment,
y <- x, will not work according to S semantics. Changes to
y will be reflected in x.
Closures are partially duplicated, but attempts to use them as
prototypes so far seem to fail (in less obvious ways). In addition,
they are only partially non-vectors: subsetting fails but
length does not.
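Both observations are easy to reproduce. The sketch below uses only base R; the variable names are arbitrary.

```r
# Environments are references: assignment does not copy them.
x <- new.env()
assign("value", 1, envir = x)
y <- x
assign("value", 2, envir = y)
value_in_x <- get("value", envir = x)  # the change made through y is seen in x

# Closures are only partially non-vectors.
f <- function() NULL
f_length <- length(f)                  # length() succeeds on a closure
f_subset <- tryCatch({
  f[[1]]
  "no error"
}, error = function(e) "error")        # subsetting a closure fails
```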
Objects of class "classRepEnvironment" did not themselves come from
a formally defined class; instead, properties in these objects were
dealt with in a special way.
It is fairly obvious that the model for a language should apply as uniformly as possible. Exceptions tend to be made for efficiency of basic computations, for better or worse, but otherwise we would like as few special cases as possible. Such uniformity applies particularly for an implementation of the S language, in which much of the computation on the language can be done in the language itself.
The revision in the previous section makes it possible to implement class objects as a true class. Aside from the philosophical desirability, having a true class opens up the possibility of extending that class later on. The bootstrap process for generating the methods package also simplifies substantially: we basically just need to create the initial definition of the ``class class'' and other computations are largely within the model.
The first step in implementing this is simple, given the general model. A method is allowed if its argument list conforms to the arguments of the generic in the following sense:
A method conforms to its generic function if the formal arguments of
the method are a subset of those in the generic, appearing in the same
order, and if the omitted arguments in the method do not appear in the
signature associated with the method. In this case, the method's
signature is treated as if it were augmented by class "missing"
for each of the omitted arguments.
Note that with this definition, the extension to conforming arguments only affects method
specification, not method dispatch.
The plan is to modify the rule for setting methods so that conforming, but not identical, argument lists are allowed (there will probably continue to be a message noting the extended interpretation of the signature). Non-conforming method definitions will likely become errors, rather than the current warning (but this should get some discussion).
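The conformance rule stated above can be written out as a small stand-alone check. The function conform_signature is a hypothetical illustration, not code from the methods package; it represents a signature as a named character vector whose names are generic arguments, and it returns NULL when the method does not conform.

```r
# Hypothetical sketch of the conformance rule.
# generic_args, method_args: character vectors of formal argument names.
# signature: named character vector of classes, named by generic argument.
conform_signature <- function(generic_args, method_args, signature) {
  pos <- match(method_args, generic_args)
  # Method arguments must be a subset of the generic's, in the same order.
  if (anyNA(pos) || is.unsorted(pos, strictly = TRUE)) return(NULL)
  omitted <- setdiff(generic_args, method_args)
  # Omitted arguments must not appear in the signature.
  if (any(omitted %in% names(signature))) return(NULL)
  # Augment the signature with "missing" for each omitted argument.
  signature[omitted] <- "missing"
  signature[intersect(generic_args, names(signature))]
}

augmented <- conform_signature(c("x", "y"), "x", c(x = "numeric"))
reordered <- conform_signature(c("x", "y"), c("y", "x"), c(x = "numeric"))
in_signature <- conform_signature(c("x", "y"), "x",
                                  c(x = "numeric", y = "character"))
```

Because the augmentation happens when the method is specified, dispatch itself needs no change: the stored signature simply names class "missing" for the omitted arguments.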
R saves an image only of the global environment. Therefore, all information about classes must reside in this environment, in order to be saved and reloaded correctly in a future session. But relationships between classes may involve classes whose definition is not (and should not be) local to the current global environment.
The following is the simplest example:
setClass("numbersAndStuff") ## a virtual class
setIs("numeric", "numbersAndStuff")
The purpose here is to create a virtual class of which various actual
classes will be subclasses. Slots declared to be "numbersAndStuff"
are then constrained to be one of those classes.
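The effect on slots can be sketched as follows. Note the swapped-in technique: the sketch uses setClassUnion rather than the setClass and setIs calls shown above, on the assumption that a class union is the usual way to declare such a virtual superclass of basic classes in recent versions of R. The class "holder" and its slot are hypothetical.

```r
library(methods)

# Assumption: a class union stands in for the setClass/setIs pair in the text.
setClassUnion("numbersAndStuff", c("numeric", "character"))

# A hypothetical class with a slot constrained to the virtual class.
setClass("holder", representation(value = "numbersAndStuff"))

h1 <- new("holder", value = 3.14)   # numeric is a member
h2 <- new("holder", value = "pi")   # character is a member

# A list is not a member, so constructing the object should fail.
rejected <- tryCatch({
  new("holder", value = list(1))
  FALSE
}, error = function(e) TRUE)
```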
The definition of class "numeric" resides (currently) in
the methods package.
Further, the user is not allowed to redefine this class, for fairly
obvious reasons.
The issue then is how (and where) to store the metadata recording the
call to setIs.
Previous versions (R 1.4.1 and earlier) of the package stored the information in the class
metadata object for "numeric", regardless of where that
object resides.
That approach produces incorrect results if the user saves the current
global environment and then restores it.
The setIs information will not be restored, since
it was assigned in the environment of the methods package.
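There appear to be two approaches that alleviate the difficulty:
store the link in a separate extension metadata object for "numeric" in the global environment;
or store it as subclass information in the local definition of "numbersAndStuff".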
The call to setIs defines a link in the graph of
inter-class relations. Essentially, the prior approach always stored
an ``upward-pointing'' link. The two alternatives respectively create
a new kind of metadata to store the link information, and store the
information (upward or downward) in whichever class is local.
There are some issues with either proposed solution. Overall the first approach is
slightly more general (and perhaps more efficient; see comments
below), but is a major complication of the metadata structure and has
some issues about class versions.
The second approach is not quite as general, but is a relatively modest change to
the current behavior.
For that reason, we will follow the second approach for now.
There are some issues with the second approach, however.
Generally, though not always, one or the other of the two classes in
the setIs call will be local.
If not, using subclass information clearly doesn't solve the basic
problem. Should we prohibit setIs calls if the local
package or environment does not own either class definition?
For the mechanism to work, however, we must be prepared to modify cached class definitions. The first solution makes it possible to search explicitly for all the extensions of a given class: they must reside in the class's own definition or in one of the extension meta-data objects for that class. But the second mechanism leaves the information about some extensions in the definition of another class.
In our example, suppose class "numeric" is needed before
class "numbersAndStuff" has been encountered.
The methods package caches all the available information about class
"numeric" when it is first needed (not caching the
information would be a major performance penalty).
But there is no practical way to find all the relevant subclass information at the time the
class definition of "numeric" is needed: one would have
to examine the subclass information for all classes currently visible.
So the cached (and nominally complete) definition of class
"numeric" fails to say that it extends "numbersAndStuff".
To avoid errors from caching subclasses, one needs to insert new link
information into the cached version, once that information is available.
In the example, when we do need the definition of class
"numbersAndStuff", the code that completes that class's
definition must insert extends information into all the subclasses in the definition (and
their subclasses as well). Does this ``downward completion'' resolve
all the potential errors? It seems to, in the sense that before we
can need the information that "numeric" extends
"numbersAndStuff", surely we must have encountered
class "numbersAndStuff", and the methods package is
designed to complete a class definition as soon as that class is
needed. This is, however, a distinctly heuristic argument, waiting
for counter-examples!
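The idea of downward completion can be made concrete with a toy model. Everything in the sketch is hypothetical: the stored list stands in for class metadata, the cache environment for the cached definitions, and complete_class and push_down are names invented here; none of it is the methods package's actual code.

```r
# Toy metadata: the link is stored only as subclass information
# in the definition of "numbersAndStuff".
stored <- list(
  numeric = list(extends = character(), subclasses = character()),
  numbersAndStuff = list(extends = character(), subclasses = "numeric")
)
cache <- new.env()

# Insert superclass information into a subclass and, recursively,
# into that subclass's own subclasses.
push_down <- function(sub, supers) {
  def <- complete_class(sub)                 # make sure it is cached
  def$extends <- union(def$extends, supers)
  assign(sub, def, envir = cache)
  for (s in def$subclasses) push_down(s, supers)
}

# Complete (and cache) a class definition the first time it is needed.
complete_class <- function(name) {
  if (exists(name, envir = cache, inherits = FALSE)) {
    return(get(name, envir = cache))
  }
  def <- stored[[name]]
  assign(name, def, envir = cache)
  for (sub in def$subclasses) push_down(sub, c(name, def$extends))
  get(name, envir = cache)
}

# "numeric" is needed first, so its cached definition lacks the link.
before <- complete_class("numeric")$extends
# Completing "numbersAndStuff" then repairs the cached "numeric".
invisible(complete_class("numbersAndStuff"))
after <- get("numeric", envir = cache)$extends
```

The sketch shows only the happy path of the heuristic argument: the link becomes visible in the cached subclass as soon as the superclass is completed, and not before.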
The first approach, the use of extension metadata objects, would be a change in the
organization of the metadata. It certainly affects more aspects of
the package than using subclass information: the process of
collecting extension information or subclass information is now
decentralized. To implement the solution generally, the code that
completes a class definition would need to search for all the
extension meta-data objects for that class in the current search list,
and merge the result into the cached definition. Intuitively, it
seems both desirable and straightforward to make the link information
fully symmetric in this approach. That is, a setIs call
(or the same implication within a class definition) would provide
extends information for both the classes: upward-pointing for the
first and downward-pointing for the second. It's a straightforward
example of doubly-linked list information, and once we agree to store
links explicitly, why not be complete? Having this information seems
likely to make class completion somewhat more efficient.
Separating the link information from the rest of the class definition does further fragment the metadata. In principle there isn't anything radically different needed to keep the information up to date in the two possible solutions. But can link information exist in a package that owns neither of the class definitions? How could we know if the information is correct? (So we don't really escape the similar problem for the other solution.)