get, assign, etc.
The functions get, exists, and assign all take a second argument that refers, generally, to where the action of the function should take place. The functions objects and rm also take similar arguments.
From the user's view, where can be a workspace image, an attached package, an environment (maybe of a currently active call or maybe a special environment). In the future we will likely want it to represent other things as well, for example when we're interfacing to database software.
The functions should allow any meaningful object and should then behave consistently, up to differences in what they are doing.
Right now, that's not entirely the case. For example, to supply an environment as an argument, you must use a separate envir= argument.
The following list suggests some changes for this and related problems. Items 1 and 2 are being pushed for immediate action; the remainder seem desirable, but are either not quite as back-compatible or else depend on other changes.
The proposed changes for 1 and 2 have been added to the development branch.
There's really quite a clean concept operating here: whatever is specified as an argument is essentially coerced to be an R environment. Unfortunately, it's done a different way depending on what the user supplies and on which function is called. Numbers are passed to pos.to.env. Character strings are first explicitly matched (by some code that relies on lazy evaluation to work). And the simplest case, when the user supplies an environment, requires the actual call to be different.
That last problem causes messy, inefficient, and error-prone programming in higher-level functions that try to treat environments and other databases uniformly.
The proposed solution is to replace the calls to pos.to.env by calls to a new function, as.environment, which would work uniformly for the various cases.
At the same time, this eliminates the ad hoc code dealing with character string arguments, turning the body of get, exists, and assign just into .Internal calls. The claim is that this change is back-compatible with existing code.
For efficiency, the current as.environment implementation should be replaced by a version in C.
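As a small illustration of the uniform coercion described above, the following sketch calls as.environment as it exists in base R today with each of the three kinds of argument. The variable names are made up for the example, and the behavior shown is that of current R, not necessarily of the implementation under discussion here.

```r
# A position on the search list: 1 is the global environment.
from_position <- as.environment(1)

# A name on the search list, given as a character string.
from_name <- as.environment("package:base")

# An environment is returned unchanged.
e <- new.env()
from_env <- as.environment(e)
```

With a single coercion function of this kind, higher-level code need not distinguish between positions, names, and environments before calling get, exists, or assign.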
In principle, the first argument to objects is just another where argument. The current implementation has 3 related arguments, name, pos and envir, and the treatment of the first one is, well, strange.
    if (!missing(name)) {
        if (!is.numeric(name) || name != (pos <- as.integer(name))) {
            name <- substitute(name)
            if (!is.character(name))
                name <- deparse(name)
            pos <- match(name, search())
        }
        envir <- pos.to.env(pos)
    }
I _think_ the intention here is to allow the names on the search list to be supplied with or without quotes, so "package:base" or package:base would both work.
If so: Alas, it won't fly. The culprit is that expression is.numeric(name), which will have to evaluate name. Lots of luck with package:base! Generally, you can't put a substitute(x) call into anything conditional on the value of x and not expect to go down in flames.
Because the substitute is used unless the evaluation produces a numeric, one can't supply an expression that evaluates to a string or an environment.
The proposed modified version retains the basic idea, but evaluates name in a try() expression and performs the substitute only if the try fails. This should (usually!) even work for package:base.
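The try-then-substitute idea can be sketched as follows. This is only an illustration under stated assumptions, not the code of the modified objects: the helper name where_env is hypothetical, and it relies on as.environment as found in current R.

```r
# Hypothetical helper: resolve `name` to an environment, accepting
# a number, a string, an environment, or an unquoted search-list name.
where_env <- function(name) {
    # First try to evaluate the argument in the ordinary way.
    value <- try(name, silent = TRUE)
    if (inherits(value, "try-error")) {
        # Evaluation failed (e.g. for package:base), so fall back
        # to the unevaluated expression and turn it into a string.
        expr <- substitute(name)
        value <- if (is.character(expr)) expr
                 else paste(deparse(expr), collapse = "")
    }
    as.environment(value)
}

base_unquoted <- where_env(package:base)
base_quoted   <- where_env("package:base")
```

The "usually" in the text still applies: if objects named package and base happen to be visible and package:base evaluates without error, the evaluated value is used instead of the name.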
The argument lists of get, exists, and assign aren't consistent with each other, or with S-Plus. (The S-Plus arguments aren't entirely consistent either, but closer.)
Two issues are whether it's where (as in exists and in S-Plus) or pos (as in the other functions) and whether a separate frame argument is allowed (yes in exists and in S-Plus, no elsewhere). Where frame is allowed, it's equivalent to supplying the environment sys.frame(frame), but the semantics are a bit confusing.
3.(a) For the first issue, it would be nice to use where throughout, but there is clearly a serious compatibility issue.
3.(b) For the second, one could add a frame argument and treat it consistently by making the default expression for the environment be (if(missing(frame)) as.environment(where) else sys.frame(frame)). This is what the revised version of exists does.
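The default expression in 3.(b) can be tried out in a small wrapper. This is a sketch only: exists_in is a hypothetical name, the default where = 1 is chosen just for the example, and the wrapper delegates to the exists of current R rather than reproducing the revised version mentioned above.

```r
# Hypothetical wrapper with both `where` and `frame` arguments.
# The environment defaults to the expression suggested in 3.(b).
exists_in <- function(x, where = 1, frame,
                      envir = if (missing(frame)) as.environment(where)
                              else sys.frame(frame),
                      inherits = TRUE) {
    exists(x, envir = envir, inherits = inherits)
}

# Using `frame`: look in the frame of the calling function.
probe <- function() {
    hidden <- 42
    n <- sys.nframe()  # frame number of this call to probe()
    exists_in("hidden", frame = n, inherits = FALSE)
}
found_by_frame <- probe()

# Using `where`: look in an explicitly supplied environment.
db <- new.env()
assign("tbl", 1, envir = db)
found_by_where <- exists_in("tbl", where = db, inherits = FALSE)
```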
Right now, rm and remove are identical. That doesn't seem too useful, and it would be mildly more convenient if remove expected a character vector, as it does in S-Plus. It would just let one say remove(objects(....)) rather than remove(list = objects(....)). (On the other hand, at least both rm and remove in R take a position argument, which is more consistent than S-Plus, where remove does but rm doesn't.)
What's really going on with the where argument is that we want to make the various functions behave correctly for the database defined by where.
In other words, we would like get and the other functions to have methods based on the where argument.
We can't do that directly until S4-style methods are introduced, since for most of the functions the corresponding argument is not the first one.
As an interim solution, we could make the as.environment function mentioned in item 1 into an S3-style generic. If it became a primitive (as pos.to.env is now) we could dispatch it from the C code if efficiency is a big deal here (which I doubt).
The interim solution is not as good, though, for future work, because in some cases (imagine various database interfaces that let you attach a database table) you don't want to create an intermediate R environment object, but rather to go off to the appropriate interface directly for each of the functions. For that, you really do want methods for get, exists, etc.
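The interim S3-style approach might look roughly like the sketch below. Everything named here is hypothetical: as_env stands in for a generic as.environment, and toy_db is an invented class that plays the role of a database interface.

```r
# Hypothetical S3 generic standing in for a generic as.environment.
as_env <- function(where, ...) UseMethod("as_env")

# Environments pass through; positions and names use the search list.
as_env.environment <- function(where, ...) where
as_env.numeric     <- function(where, ...) as.environment(where)
as_env.character   <- function(where, ...) as.environment(where)

# An invented "database" class: its tables are copied into a new
# environment, which is exactly the intermediate object that a
# real database interface would prefer to avoid.
as_env.toy_db <- function(where, ...) {
    list2env(where$tables, parent = emptyenv())
}

db <- structure(list(tables = list(prices = c(1.5, 2.5))),
                class = "toy_db")
price_table <- get("prices", envir = as_env(db))
```

The toy_db method shows the limitation noted above: the data must be materialized as an R environment before get can see it, whereas a method for get itself could query the database directly.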
pos and the inherits argument.
In terms of the current implementation, this amounts to what pos.to.env(-1) means. From the implementation and the error message, it means "the environment of the parent of the call to pos.to.env". The documentation of pos.to.env, on the other hand, claims you have to give a positive integer as an argument.
The actual semantics are mostly OK but the situation is a bit subtle.
When no where or pos argument is given to exists or get, the intended semantics is "search for the name in this function call and then in the search list". What seems confusing, if not downright inconsistent, is that the -1 argument behaves rather differently depending on the function.
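The intended default for get and exists can be seen in a short example. The function and variable names are invented for the illustration, and the behavior shown is that of current R.

```r
# With no where/pos argument, the lookup starts in the calling function.
lookup_local <- function() {
    local_only <- "from the call"
    get("local_only")
}

# A different call does not see that local variable ...
lookup_elsewhere <- function() exists("local_only")

# ... but objects on the search list, such as sum, are still found.
lookup_search_list <- function() exists("sum")

local_value <- lookup_local()
```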
For example, the functions rm and assign have optional inherits arguments also. If inherits is TRUE, then both functions search back through parents to match names.
Since both functions are "destructive" to the environments they work in, I don't think it's a great idea to have them swinging back into, for example, the global environment from inside a function call.
For rm this is inconsistent with S-Plus, and rather nervous-making. For assign, it seems bizarre: why would the existence of an object with the same name affect the behavior of assign? (At least inherits is FALSE by default in these functions.)
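The effect of inherits = TRUE on the two destructive functions can be demonstrated with a pair of nested environments. The environment and variable names are made up for the example; the behavior is that of assign and rm in current R.

```r
outer_env <- new.env(parent = emptyenv())
inner_env <- new.env(parent = outer_env)
assign("x", "original", envir = outer_env)

# Because an x already exists in the parent, the assignment is
# made there instead of in inner_env.
assign("x", "changed", envir = inner_env, inherits = TRUE)
outer_value_after_assign <- get("x", envir = outer_env)
inner_has_x <- exists("x", envir = inner_env, inherits = FALSE)

# Likewise, rm reaches back into the parent to remove x.
rm("x", envir = inner_env, inherits = TRUE)
outer_has_x_after_rm <- exists("x", envir = outer_env, inherits = FALSE)
```

Had no x existed in outer_env, the same assign call would have behaved differently, which is the dependence on existing objects that the text above finds bizarre.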