THE R TASK LIST
``Somebody, somebody has to, you see ...''
The Cat in the Hat Comes Back.
----------------------------------------------------------------------
TASK: Multiple Graphics Device Drivers
STATUS: Open
FROM: Everyone
R needs to have multiple active device drivers and a means for
copying pictures from one device to another, etc. etc.
[ This is a medium-sized task. It would be most useful to ]
[ do this in conjunction with moving to an event driven model. ]
[ Greg Warnes has written some code which maintains a device ]
[ "display list". How much memory this might devour in the ]
[ multiple device case is an open question. There is also ]
[ the question of what to do about the graphics parameters. ]
[ Should each device maintain a complete "par" state, or ]
[ should some parameters (like col, lty, font ...) be global? ]
[ Could a user have any memory of the last values in effect ]
[ for a driver which had been idle for a while? ]
[ This is just about to hit the top of the list. ]
----------------------------------------------------------------------
TASK: complex gamma and log gamma function not implemented
STATUS: Open
FROM: R@stat.auckland.ac.nz
[ This is quite low priority. Complain if you need it. ]
[ The Fullerton library has complex gamma function code. ]
----------------------------------------------------------------------
TASK: solution of complex linear systems
STATUS: Open
FROM: R@stat.auckland.ac.nz
[ Really just a matter of grabbing the correct linpack code. ]
[ How general do we want to be here ... ]
----------------------------------------------------------------------
TASK: "nlm" documentation inaccuracies
STATUS: Open
FROM: jlindsey@luc.ac.be
The help for nlm is still called minimize although the
contents have been updated. As well, when an illegal
value is fed to nlm, the error message contains msg
instead of print.level.
[ The documentation looks ok. The function needs to be ]
[ rewritten so that it uses derivative information. ]
----------------------------------------------------------------------
TASK: "data.entry" problems
STATUS: Open
FROM: p.dalgaard@kubism.ku.dk
the as.character problem in de() - probably better to fix even
though it does make lists out of frames.
there's no way to change a data value to NA in data.entry, etc.
... earlier message ...
(Peter Dalgaard) data.entry et al do not seem to have been
adjusted for the new data frame structure. This is actually
a problem where a list is passed where a vector of character
strings is expected. To fix it change
snames <- substitute(list(...))[-1]
to
snames <- as.character(substitute(list(...))[-1])
However, there needs to be a look at the de... code. When
a data frame is edited it is returned as a list. This can
be cured with judicious use of "data.frame".
[ The indicated change has been made, but other changes ]
[ are needed. ]
----------------------------------------------------------------------
TASK: "x11" printcmd
STATUS: Open
FROM: maechler@stat.math.ethz.ch
There is in theory a "printcmd" argument to x11, which
is ignored. Make it do something.
----------------------------------------------------------------------
TASK: "source" requires a terminating newline on EOF
STATUS: Open
FROM: Kurt.Hornik@ci.tuwien.ac.at
source() fails in many cases where a file has no final
newline. (R&R, sorry for being ridiculously nasty about
things that don't work for files without a final newline.
I have Emacs' next-line-add-newlines set to nil ...)
This seems to be a problem with parse() in src/main/source.c
in combo with the code in gram.y ...
I know this is NOT something to quickly fix over the weekend.
Please simply put it into your PROJECTS file.
[ This is actually a syntax error according to the R grammar ]
[ but maybe we can do something. ]
----------------------------------------------------------------------
TASK: help file ALIAS() and LINK() constructions
STATUS: Open
FROM: R@stat.auckland.ac.nz
How do we know which file to LINK to? There needs to be a step
which fills in the file name on the basis of all ALIAS
declarations.
[ A preprocessing step is needed. First we build a table ]
[ of aliases and corresponding file names. Then we pass ]
[ through the files building the correct LINK references. ]
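The two passes described in the note might be sketched as follows.
The help.files contents and the helper names are hypothetical; only
the ALIAS(...) and LINK(...) constructions come from the task itself.

```r
# Hypothetical help sources held in memory: file name -> lines of text.
help.files <- list(
  "paste.doc" = c("ALIAS(paste)", "See LINK(cat) for output."),
  "cat.doc"   = c("ALIAS(cat)", "ALIAS(print.cat)", "See LINK(paste).")
)

# Extract the arguments of every KEY(...) construct in a set of lines.
extract.args <- function(lines, key) {
  pattern <- paste0(key, "\\(([^)]*)\\)")
  hits <- regmatches(lines, gregexpr(pattern, lines))
  sub(pattern, "\\1", unlist(hits))
}

# Pass 1: build the alias -> file name table.
alias.table <- unlist(lapply(names(help.files), function(f) {
  a <- extract.args(help.files[[f]], "ALIAS")
  setNames(rep(f, length(a)), a)
}))

# Pass 2: resolve each LINK(...) against the table.
link.targets <- lapply(help.files, function(lines) {
  alias.table[extract.args(lines, "LINK")]
})
```

A real preprocessor would read the help files from disk and would
have to decide what to do with a LINK whose alias is not in the table.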
----------------------------------------------------------------------
TASK: "paste" problem
STATUS: Open
FROM: maechler@stat.math.ethz.ch
in S,
paste(....., collapse = string)
always returns ONE string (a character vector of length 1),
according to documentation and several examples.
in R, this is not true:
R> paste(rep(" ",0), collapse="...") #anything for collapse
character(0)
S> paste(rep(" ",0), collapse="...") #anything for collapse
[1] ""
Again, I think R is more logical than S here, but it was decided
that in minor cases, compatibility comes first...
[ Low priority. Complain if you really need it. ]
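Until the two agree, a wrapper along these lines would give the S
result regardless of what paste() itself returns. The name paste.s
is hypothetical.

```r
# Hypothetical wrapper: always return a single string when collapsing,
# even when the inputs have length zero (the S behaviour described above).
paste.s <- function(..., sep = " ", collapse) {
  out <- paste(..., sep = sep, collapse = collapse)
  if (length(out) == 0) "" else out
}

empty  <- paste.s(rep(" ", 0), collapse = "...")
joined <- paste.s(c("a", "b"), collapse = "+")
```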
----------------------------------------------------------------------
TASK: missing functionality - modelling
STATUS: Open
FROM: maechler@stat.math.ethz.ch
aov, print.aov, summary.aov,... (!)
which I really missed for teaching a few months ago.
[ We'll get to this - it actually should be fun. ]
----------------------------------------------------------------------
TASK: warnings option
STATUS: Open
FROM: maechler@stat.math.ethz.ch
which reminds me that we/I also would like something similar to S's
options(warn = k)
k= 0 : [default] print warnings
k= -1 : do nothing (maybe append warnings to some temp-file)
k= 1 : produce an error ('warning' becomes 'stop').
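The three levels requested above can be imitated with condition
handlers. This is only a sketch: run.at.warn.level is a hypothetical
name, it assumes an R that provides withCallingHandlers(), and the
numbering follows the request in this entry rather than any existing
options(warn =) setting.

```r
# Hypothetical helper: evaluate expr under the warning level k.
#   k =  0 : leave warnings alone (they are reported as usual)
#   k = -1 : silently discard warnings
#   k =  1 : turn the first warning into an error
run.at.warn.level <- function(expr, k = 0) {
  withCallingHandlers(expr, warning = function(w) {
    if (k < 0) invokeRestart("muffleWarning")
    if (k > 0) stop(conditionMessage(w), call. = FALSE)
  })
}

quiet  <- run.at.warn.level(as.numeric("x"), k = -1)
strict <- try(run.at.warn.level(as.numeric("x"), k = 1), silent = TRUE)
```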
----------------------------------------------------------------------
TASK: R has no stderr
STATUS: Open
FROM: Friedrich.Leisch@ci.tuwien.ac.at
When I invoke R like
R 2>errlog
I would expect error messages to go to the file errlog
instead of the screen.
[ We don't have standard error. This is problematic on ]
[ platforms other than Unix. ]
----------------------------------------------------------------------
TASK: "print.default" fix
STATUS: Open
FROM: la-jassine@aix.pacwan.net
When you fix print.default, please also add prefix=
----------------------------------------------------------------------
TASK: "print.default" fix
STATUS: Open
FROM: jlindsey@luc.ac.be
print.default in S has an option, right=T, but R does not
----------------------------------------------------------------------
TASK: "postscript" fix
STATUS: Open
FROM: la-jassine@aix.pacwan.net
postscript() also needs the options onefile, print.it, and
append (even if they are not supported yet it would be nice if
the arguments could be accepted and ignored).
[ I added these as arguments, but they have no effect. ]
----------------------------------------------------------------------
TASK: task scheduling
STATUS: Open
FROM: gwhite@cabot.bio.dfo.ca
More generally, the range of things that can be done in R would
be greater if there was a simple scheduling mechanism. Is
there a way to have a specific function invoked just before the
command prompt returns after a function? Such a function could
be used to run save(...) or check for various external cues
(update of a file's timestamp) to control an analysis.
I doubt it would make sense to have full context switching in
R, but perhaps save() could be done in a way that would allow
it to be used even in a long calculation under some timer
control. I expect the user would need to provide a list of the
data objects that need to be saved.
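As a sketch of the "function invoked just before the prompt returns"
idea: this assumes an R that provides addTaskCallback(), which runs a
function after each completed top-level command. The watch.file
helper is hypothetical.

```r
# Hypothetical callback factory: after each completed top-level command,
# report when the watched file's timestamp has changed.
watch.file <- function(path) {
  last <- file.info(path)$mtime
  function(expr, value, ok, visible) {
    now <- file.info(path)$mtime
    if (!identical(now, last)) {
      cat("file changed:", path, "\n")
      last <<- now
    }
    TRUE   # returning TRUE keeps the callback registered
  }
}

watched <- tempfile()
file.create(watched)
id <- addTaskCallback(watch.file(watched))
# ... interactive work happens here ...
removed <- removeTaskCallback(id)
```

This covers the "external cue" case only; it does nothing during a
long calculation, which is the harder half of the request.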
----------------------------------------------------------------------
TASK: Inf numerics
STATUS: Open
FROM: plummer@iarc.fr
Could we have an Inf object in R? I would find it useful.
[ Sigh. I wish we had designed this in. ]
[ It will be a pain to ADD. ]
----------------------------------------------------------------------
TASK: Auto-save
STATUS: Open
FROM:
> BTW: How about putting auto-save-workspace on the task list?
> Or just a manual save.work() currently, you can lose quite a
> bit of work to an unexpected segfault. (And q()+restart is
> cumbersome, esp. if you need to reattach subsetted dataframes, etc.)
Perhaps call it save.image() instead and use
save(list = ls(), file = ".RData")
as was suggested some time ago?
(Whatever the result is, it needs to go in the FAQ, which goes to
great lengths about how data under R can get lost when a crash occurs,
but does not say how to save them ...)
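A minimal sketch of the suggested function, using the save.work name
from the quoted message. The envir argument is an addition for
illustration.

```r
# Sketch of save.work(): write every object in `envir` to `file`.
# ls() is given the environment explicitly, because a bare ls() inside
# a function would list the function's own local variables instead.
save.work <- function(file = ".RData", envir = .GlobalEnv) {
  save(list = ls(envir = envir, all.names = TRUE), file = file, envir = envir)
  invisible(file)
}

# Usage with a scratch environment and a temporary file:
e <- new.env()
assign("x", 1:3, envir = e)
f <- save.work(tempfile(), envir = e)
restored <- new.env()
load(f, envir = restored)
```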
----------------------------------------------------------------------
TASK: "chisquare.test" problem
STATUS: Open
FROM:
Can you change the explicit "cat" statement in the
"chisquare.test" function which insists on writing to the
screen even when the output is redirected to a variable? (Using
"htest" class as in "t.test" function.)
[ Should we switch to the library one. ]
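For reference, returning an "htest" object rather than calling cat()
could look roughly like this. gof.test is a hypothetical name and the
test shown (equal cell probabilities) is only an illustration, not
the existing chisquare.test code.

```r
# Sketch of a goodness-of-fit test against equal cell probabilities
# that returns an "htest" object instead of writing to the screen.
gof.test <- function(x) {
  dname <- deparse(substitute(x))
  expected <- rep(sum(x) / length(x), length(x))
  stat <- sum((x - expected)^2 / expected)
  df <- length(x) - 1
  structure(
    list(statistic = c("X-squared" = stat),
         parameter = c(df = df),
         p.value   = pchisq(stat, df, lower.tail = FALSE),
         method    = "Chi-square goodness-of-fit test",
         data.name = dname),
    class = "htest")
}

counts <- c(18, 22, 20, 40)
result <- gof.test(counts)   # nothing is printed by the assignment
```

Printing is then left to the print method for the class, and the
components (result$p.value and so on) are available to the caller.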
----------------------------------------------------------------------
TASK: Graphics inconsistencies
STATUS: Openish
FROM: Bill.Venables@adelaide.edu.au
While transferring some old S-code I came across some minor
inconsistencies between R and S that are probably more nuisance
value than they would take to fix. I report them here for
reference, (but not in any campaigning mood, of course...)
1. No frame() command in R and so no graceful way to clear a
plotting screen. (Or is there?)
[ Added ]
2. There is a dev.off() function, but no other dev.xxx functions.
(The dev.xxx group are S-PLUS and not vanilla S, by the way.)
There is no graphics.off() function.
[ Long-term project ]
3. If dfr is a data frame with components "x", "y" and some
others then points(dfr) uses dfr as an xy-list in S but not in
R. If there is some non-numeric component it actually fails
in R. This may be S being a bit inconsistent, but the
behaviour is different.
[ Fixed? ]
4. The plotting marks are a bit gappy in R and even the ones
that are there do not correspond to their S counterparts.
Here is a little function to make a wall chart showing the
gaps:
[ We now have all the S symbols and a new set of R ones. ]
show.marks <- function()
{
    if(!exists(".Device") || is.null(.Device)) x11()
    plot(1, type="n", axes=F, xlab="", ylab="")
    oldpar <- par()
    par(usr = c(-0.01, 5.01, -0.01, 5.01), pty = "s")
    for(i in 0:18) {
        x <- 1/2 + (i %% 5)
        y <- 4.5 - (1/2 + (i %/% 5))
        points(x + 1/5, y - 1/5, pch = i, cex = 3)
        text(x - 1/5, y + 1/5, i, adj = 0.5, cex = 1.5)
    }
    abline(h = 1:5 - 0.5, lty = 1)
    segments(0:5, rep(0.5, 5), 0:5, rep(4.5, 5))
    par(oldpar)
    invisible()
}
5. In S you may extend a list by assigning to a new component.
For example if lis has components "x" and "y", only, you can
extend it by assigning to lis$z, lis["z"] or lis[, "z"] (the
last if it is also a data frame). In R only the first of
these works; the others give a "subscript out of bounds"
error. (This may have been discussed while I was not paying
attention, in which case I apologize.)
[ Fixed in 0.50. ]
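A small check for item 5, assuming an R in which the fix is in
place. The object names are illustrative.

```r
lis <- list(x = 1, y = 2)
lis$z <- 3             # the form reported as already working
lis["w"] <- 4          # assignment by name through single brackets
names(lis)

dfr <- data.frame(x = 1:2, y = 3:4)
dfr[, "z"] <- c(5, 6)  # matrix-style assignment of a new column
names(dfr)
```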
----------------------------------------------------------------------
TASK: Function pointer access
STATUS: Open
FROM:
I want to report two problems with the Fortran code of R.
1) Configure does not adapt GETSYMBOLS.in if the Fortran Compiler
does not add underscores to the symbol names.
2) There is a name conflict if the Fortran Compiler does not add
underscores because there exists a Fortran function FMIN and a
C function fmin(). Thus the name of the Fortran FMIN should be
changed.
[ This is fixed I think. ]
Currently I am rewriting my robust location-scale code in C. I
intend to make this new code available as a library once a
standard for such libraries has been agreed upon. As I would
like to allow prospective users to experiment with private
psi/chi functions I need access to the hash table of available
function pointers. Is it possible that you insert a function
into dotcode.c that contains the code fragment from lines 482
to 495 and returns a function pointer?
----------------------------------------------------------------------
TASK: Partial string matching
STATUS: Open
FROM:
Is there an existing partial string match function which could
be used in place of pstrmatch in subset.c???
If not can pstrmatch take on the functions of all partial match
functions?
----------------------------------------------------------------------
Post 0.49 Additions
----------------------------------------------------------------------
TASK: Name Attributes on Calls
STATUS: Closed (almost)
FROM:
A call with tagged arguments is something like a list, the tags
can be used to access elements, but the names attribute is absent,
until the call is coerced to a list. (Attempting to set the names()
causes evaluation. Changing "list" to "blipblop" causes an 'Error:
couldn't find function "blipblop"' at that point.)
> j<-substitute(list(a=1, b=2))
> j
list(a = 1, b = 2)
> j$b
[1] 2
> names(j)
NULL
> names(j)<-NULL
> j
[[1]]
[1] 1
[[2]]
[1] 2
[At least under SunOS this is fixed. RG]
[However, 'names(j) <- NULL' has no effect in R, but does in S. MM]
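The behaviour being asked for, sketched under the assumption that
tags on a call are visible through names() without coercion:

```r
j <- quote(list(a = 1, b = 2))   # an unevaluated call with tagged arguments
tags      <- names(j)            # the function position has an empty name
list.tags <- names(as.list(j))   # the list view carries the same tags
b.value   <- as.list(j)$b
```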
----------------------------------------------------------------------
TASK: String NAs Via the Back Door.
STATUS: Open
FROM:
Ok, the right solution seems to be names(as.list(j)), but then we run
into some other fun with NA's... Shouldn't the real NA print without
quotes?
> ch[1]<-paste("N","A",sep="")
> is.na(ch)
[1] FALSE FALSE FALSE
> ch
[1] "NA" "a" "b"
> ch[1]=="NA"
[1] TRUE
> ch[1]<-"NA"
> is.na(ch)
[1] TRUE FALSE FALSE
[ We need a real NA. At present there is confusion between ]
[ the string "NA" and the NA value for strings. One solution ]
[ would be to use R_NilValue to indicate the missing string ]
[ value, and let NA be just an ordinary string in all cases. ]
[ This would be incompatible with S, but still an improvement. ]
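With a real missing-string value the transcript above would come out
as follows. The sketch assumes an R that provides NA_character_ as a
value distinct from the two-character string "NA".

```r
ch <- c("x", "a", "b")
ch[1] <- paste("N", "A", sep = "")   # the two-character string "NA"
before <- is.na(ch)                   # no element is missing

ch[1] <- NA_character_                # the missing string value
after <- is.na(ch)                    # only the first element is missing
```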
----------------------------------------------------------------------
TASK: Directory Structure
STATUS: Open
FROM: + Friedrich + Paul Gilbert
> Regarding the location of data for libraries it might be easier if
> everything for one library is included in one subdirectory. At least
> it would certainly be easier to clean-up, which I like to do every few
> years. Thus the code file, data, and any compiled code would be in
> one subdirectory under $RHOME/library.
Like
library//
library//data
library//exec (scripts and or binaries which
only make sense for the add-on)
library//funs
library//help
library//html
library//objs (*.so)
???
> I realize this means a small change to the way libraries are now
> found, but in the end I think it would be much cleaner.
I think the changes would not be too hard, and we need to do something
about the directory structure anyway.
Actually, I think if R&R ok'ed something like that, Fritz and I would
take a look.
(In a way, I NEED to do something like that anyway, because I promised
it for making an official Debian package ...)
Would it mean that we also employ the S library/section concept?
----------------------------------------------------------------------
TASK: Startup Processing
STATUS: Open
FROM:
The x11() window can be a nuisance to have popping up at startup (esp.
on small screens) when you're not working with graphics. However,
currently you can't get rid of it without modifying the systemwide
Rprofile.
Current logic is:
    Run $RHOME/library/Rprofile
    if ./.Rprofile exists
        run it
    else if $HOME/.Rprofile exists
        run that
    endif
I think it should be
    Run $RHOME/library/Rsetup
    if ./.Rprofile exists
        run it
    else if $HOME/.Rprofile exists
        run that
    else if $RHOME/library/Rprofile exists
        run that
    endif
i.e. essential system initialisation goes in Rsetup, the rest in
Rprofile, which can be overridden by the user. Currently, the line
if(interactive()) x11()
is the candidate to move from one to the other. BTW, it really should read
if(interactive() && getenv("DISPLAY")!="") x11()
[BTW2: getenv() implemented using system()? is that really necessary?]
>>
I more or less agree, BUT:
I'd like (in the future) to have the system-wide Rprofile searched in a
site-specific location as well (similar to Emacs, following the idea of
keeping the distribution and the site-specific things apart).
So it would be
system-wide Rsetup (which should basically be platform-specific
stuff, cause otherwise it could go into base as well?)
if .Rprofile exists run it else
if ~/.Rprofile exists run it else
if Rprofile exists on the default library search path, run it
and that search path could e.g. specify all `library' trees with a
compile-time default of
~/lib/R:/usr/local/lib/R/site:/usr/local/lib/R/${version}
and settable at run time via e.g. the environment variable R_PATH.
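The lookup order proposed in the reply could be sketched in R itself.
find.profile is a hypothetical name, R_PATH is the variable suggested
above, and the colon-separated path assumes a Unix-style system.

```r
# Hypothetical helper: return the first profile that exists, searching
# the working directory, then the home directory, then each tree on a
# colon-separated library search path.
find.profile <- function(wd = getwd(),
                         home = Sys.getenv("HOME"),
                         path = Sys.getenv("R_PATH")) {
  trees <- if (nzchar(path)) strsplit(path, ":", fixed = TRUE)[[1]] else character(0)
  candidates <- c(file.path(wd, ".Rprofile"),
                  file.path(home, ".Rprofile"),
                  file.path(trees, "Rprofile"))
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[1] else NULL
}
```

The caller would source() the returned file after the system-wide
Rsetup has run, and do nothing when the result is NULL.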
----------------------------------------------------------------------
TASK: Old Unfixed Problems
STATUS: Open
FROM:
I noticed the following problems (all already reported, but not in
TASKS).
* TASKS.OLD has
Btw, here's another way to produce a segfault with admittedly
nonsense code:
R> x <- 1:5
R> dimnames(x)[1,2] <- NULL
Segmentation fault
[ Hmmm. This seems to have gone away. I get the error ]
[ message "Error: incorrect number of subscripts on array". ]
[ Verified by Rossini ...]
On my Linux system, I still get the segfault. Perhaps others could
check that?
* File permissions in data should be 644.
* In src/unix/system.c, one `Rdata' should be `RData' (d -> D).
* The documentation for the noncentral chisquare distribution is not
quite correct. (rnchisq does not exist, the existing functions have x,
df and the noncentrality parameter as args, and the density should be
pnchisq(x, df, lambda)
= exp(-lambda / 2)
* sum_{r=0}^\infty \frac{lambda^r}{2^r r!} pchisq(x, df + 2r)
(semiTeX notation only, sorry).
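The series can be checked numerically. The sketch below reads the
formula as the distribution function, truncates the sum at rmax
terms (an assumption that is adequate for moderate lambda), and uses
a hypothetical function name.

```r
# Noncentral chi-square distribution function as a Poisson mixture of
# central chi-square distribution functions.
# dpois(r, lambda / 2) equals exp(-lambda / 2) * lambda^r / (2^r * r!).
pnchisq.series <- function(x, df, lambda, rmax = 200) {
  r <- 0:rmax
  sum(dpois(r, lambda / 2) * pchisq(x, df + 2 * r))
}

p0 <- pnchisq.series(3, df = 4, lambda = 0)   # reduces to the central case
p2 <- pnchisq.series(3, df = 4, lambda = 2)
```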
----------------------------------------------------------------------
TASK: New Problems
STATUS: Open
FROM:
New minor remarks:
* The documentation for `image' still has the old order z, x, y.
* Perhaps one should add `par(ask = T)' in the image demo?
* Perhaps one should save the original value of par() at the beginning
of the graphics demo, and restore that at its end (s.t. typically asking
is turned off again).
----------------------------------------------------------------------
TASK: Multiplatform Support
STATUS: Open
FROM:
I've modified the "$RHOME/bin/R" and "$RHOME/cmd/filename" so that you
can use the same directories for multiple machines. That is, machines
running various flavors of UNIX can access the same directories.
The modified structure adds the directories
$RHOME/bin/$OSTYPE/
$RHOME/lib/$OSTYPE/
to hold the machine specific binaries.
For instance, here the $RHOME directory contains two subdirectories,
$RHOME/bin/solaris/
$RHOME/bin/sunos4/
which each hold the appropriate R.binary file.
These two modified functions assume that the environment variable $OSTYPE
is appropriately set, as is done automatically by the shell tcsh. If it
is not set, the directory names collapse to the original values,
$RHOME/bin/ and $RHOME/lib/
To use them, create the appropriate directories and place the correct
binaries therein. ( Note that the makefiles will not do this
automatically!) Then replace $RHOME/bin/R and $RHOME/cmd/filename with
the modified ones.
----------------------------------------------------------------------
TASK: Platform Independence
STATUS: Open
FROM: Friedrich.Leisch@ci.tuwien.ac.at
IMHO we should definitely have platform-dirs for everything that's
possibly platform-dependent ... resulting in something like
//
e.g. for R code and
///
e.g. for exec and dynload-objects.
for exec there's a problem though, as some exec's are
shell/perl/whatever-scripts and *should* work on any platform
...
----------------------------------------------------------------------
TASK: Poly
STATUS: Open
FROM:
PS1. There was also `poly' function in your snapshot WORK tree
... do you already have a final version of that?
----------------------------------------------------------------------
TASK: Naming with Numeric Values and "unlist"
STATUS: Open
FROM:
R> l <- list("11" = 1:5)
R> l
$11
[1] 1 2 3 4 5
R> unlist(l)
111 112 113 114 115
  1   2   3   4   5
[ Bug or feature ? ]
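Whichever it is, the ambiguous names can be avoided by hand. A
sketch, assuming an R that provides lengths() and setNames():

```r
l <- list("11" = 1:5)

# Drop the generated names altogether ...
plain <- unlist(l, use.names = FALSE)

# ... or build unambiguous ones with an explicit separator.
n <- lengths(l)
dotted <- setNames(unlist(l, use.names = FALSE),
                   paste(rep(names(l), n), sequence(n), sep = "."))
```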
----------------------------------------------------------------------
TASK: all.names needed
STATUS: Open
FROM:
I could not find the all.names function in R so I created the
enclosed. Comments, criticisms, or changes to a one-liner by creating
nested anonymous functions are welcome. I'll try to work out a
corresponding all.vars function.
### $Id: TASKS,v 1.1 1997/09/18 04:36:42 r Exp $
### Some replacement functions that are missing in R
### Determine all the names (symbols) occurring in an object.
### This is probably grossly inefficient.
all.names <-
function (x)
{
    if (mode(x) == "symbol")
        return(as.character(x))
    if (length(x) == 0)
        return(NULL)
    if (is.recursive(x))
        return(unlist(lapply(as.list(x), all.names)))
    character(0)
}
----------------------------------------------------------------------
TASK: "sys.function" problem
STATUS: Open
FROM:
I attempted to create a recursive anonymous function to be called
within another function. You may want to stop reading for a bit and
consider how that would be done. That is, how do you recursively call
a function that has never been assigned a name?
OK, you're back. You probably came up with a better solution than I
did but I used (sys.function())(arg) to do the recursion. The piece
of code looks like
flist <- (function(x) {
    if (mode(x) == "call") {
        if (x[[1]] == as.name("/"))
            return(c(sys.function()(x[[2]]), sys.function()(x[[3]])))
        if (x[[1]] == as.name("(")) # for R
            return(sys.function()(x[[2]]))
    }
    if (mode(x) == "(") return(sys.function()(x[[2]])) # for S
    list(x)
})(getGroupsFormula(data, form, ...)[[2]])
## I know it's horribly obscure. Blame Bill Venables for teaching me this.
Regrettably, it doesn't work in R. Using the debugger one finds that
sys.function() returns the function being called the first time
through but the second time through it returns NULL. Is this a bug or
a feature?
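A reduced test case may help whoever looks at this. It assumes that
sys.function() should return the running function at every depth,
which is what the report expects, and that Recall() is available as
an alternative.

```r
# Anonymous recursion via sys.function(), the pattern used above.
fact1 <- (function(n) if (n <= 1) 1 else n * sys.function()(n - 1))(5)

# Recall() expresses the same recursion without naming the function.
fact2 <- (function(n) if (n <= 1) 1 else n * Recall(n - 1))(5)
```

If the second form works where the first fails, the problem is in
sys.function() rather than in recursion on anonymous functions.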
----------------------------------------------------------------------
TASK: Matrix multiply problems
STATUS: Open
FROM:
Both of these used to work and seem useful and harmless:
R> matrix(1,ncol=1)%*%c(1,2)
Error in matrix(1, ncol = 1) %*% c(1, 2) : non-conformable arguments
R> matrix(1,ncol=1)*(1:2)
Error: dim<- length of dims do not match the length of object
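Until this is settled, both intentions can be written explicitly so
that they do not depend on how a 1 x 1 matrix is promoted or
recycled. A sketch:

```r
m <- matrix(1, ncol = 1)

# Make the intended 1 x 2 result explicit by supplying a row vector.
outer.row <- m %*% t(c(1, 2))

# Drop the dim attribute before recycling against a longer vector.
scaled <- as.vector(m) * (1:2)
```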
----------------------------------------------------------------------
TASK: "update" comments and fixes
STATUS: Open
FROM:
1. To make update() work with a new formula for glms, change the
first line of the glm() function from
call <- sys.call()
to
call<-match.call()
(this means that the formula component of the returned call is
labelled so that update can find it)
2. update.lm doesn't do anything with its weights= argument
Add
if (!missing(weights))
call$weights<-substitute(weights)
Similarly, to get update to work properly on glms you need a lot
more of these if statements (see update.glm at the end of the message).
3. update.lm evaluates its arguments in the wrong frame.
It creates a modified version of the original call and evaluates
it in sys.frame(sys.parent()). If update.lm is called directly
this is correct, but if it is called via update() the correct
frame is sys.frame(sys.parent(2)). Worse still, if it is called
by NextMethod() from another update.foo() the correct frame is
still higher up the list.
My solution (a bit ugly) is to move up the list of enclosing calls
checking at each stage to see if the call is NextMethod, update or an
update method. It can be seen at the end of update.glm at the bottom of
this message, and something of this sort needs to be added to other update
methods.
update.glm<-function (glm.obj, formula, data, weights, subset,
                      na.action, offset, family, x)
{
    call <- glm.obj$call
    if (!missing(formula))
        call$formula <- update.formula(call$formula, formula)
    if (!missing(data))
        call$data <- substitute(data)
    if (!missing(subset))
        call$subset <- substitute(subset)
    if (!missing(na.action))
        call$na.action <- substitute(na.action)
    if (!missing(weights))
        call$weights <- substitute(weights)
    if (!missing(offset))
        call$offset <- substitute(offset)
    if (!missing(family))
        call$family <- substitute(family)
    if (!missing(x))
        call$x <- substitute(x)
    notparent <- c("NextMethod", "update", methods(update))
    for (i in 1:(1+sys.parent())) {
        parent <- sys.call(-i)[[1]]
        if (is.null(parent))
            break
        if (is.na(match(as.character(parent), notparent)))
            break
    }
    eval(call, sys.frame(-i))
}
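The difference behind point 1 can be seen in isolation. The two
functions below are illustrative stand-ins, not the glm code.

```r
f.sys   <- function(formula, data) sys.call()
f.match <- function(formula, data) match.call()

sys.names   <- names(f.sys(y ~ x, d))     # arguments stay unlabelled
match.names <- names(f.match(y ~ x, d))   # arguments labelled by formal name
found       <- f.match(y ~ x, d)$formula  # so the formula is found by name
```

With sys.call() the formula is only reachable by position, which is
why update cannot find it when the caller did not write formula=.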
----------------------------------------------------------------------
TASK: Wisdom
STATUS: Open
FROM:
Some of the "eternal truths" about the S language are:
- every object has a mode obtainable by mode(object)
- every object has a length obtainable by length(object)
- every object can be coerced to a list of the same length
One can imagine that code that messes around with functions and
other expressions in R will break fairly quickly when these
conditions do not hold. I don't know how much work would be
involved in patching over these differences between R and S
but I suspect it would not be a trivial undertaking.
----------------------------------------------------------------------
TASK: frametools
STATUS: Open
FROM:
The following three functions are designed to make manipulation of
dataframes easier. I won't write detailed docs just now, but if you
follow the example below, you should get the general picture. Comments
are welcome, esp. re. naming conventions.
Note that these functions are definitely not portable to S because
they rely on R's scoping rules. Not that difficult to fix, though: The
nm vector and the "parsing" functions need to get assigned to
(evaluation) frame 1 (the "expression frame" of S), and preferably
removed at exit.
data(airquality)
aq<-airquality[1:10,]
select.frame(aq,Ozone:Temp)
subset.frame(aq,Ozone>20)
modify.frame(aq,ratio=Ozone/Temp)
Notice that in modify.frame(), any *new* variable must appear as a
tag, not as the result of an assignment, i.e.:
modify.frame(aq,Ozone<-log(Ozone)) works as expected
modify.frame(aq,lOzone<-log(Ozone)) does not.
This is mainly because it was tricky to figure out what part of a left
hand side constitutes a new variable to be created (note that indexing
could be involved). So assignments to non-existing variables just
create them as local variables within the function. Making a virtue
out of necessity, that might actually be considered a feature...
----------------------------------------
"select.frame" <-
function (dfr, ...)
{
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        e
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        n <- match(as.character(e), nm)
        if (is.na(n))
            e
        else n
    }
    nm <- names(dfr)
    e <- substitute(c(...))
    dfr[, eval(subst.expr(e))]
}
"modify.frame" <-
function (dfr, ...)
{
    nm <- names(dfr)
    e <- substitute(list(...))
    if (length(e) < 2)
        return(dfr)
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        substitute(e)
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        if (is.na(n <- match(as.character(e), nm)))
            if (is.atomic(e))
                e
            else substitute(e)
        else substitute(dfr[, n])
    }
    tags <- names(as.list(e))
    for (i in 2:length(e)) {
        ee <- subst.expr(e[[i]])
        r <- eval(ee)
        if (!is.na(tags[i])) {
            if (is.na(n <- match(as.character(tags[i]),
                                 nm))) {
                n <- length(nm) + 1
                dfr[[n]] <- numeric(nrow(dfr))
                names(dfr)[n] <- tags[i]
                nm <- names(dfr)
            }
            dfr[[tags[i]]][] <- r
        }
    }
    dfr
}
"subset.frame" <-
function (dfr, expr)
{
    nm <- names(dfr)
    e <- substitute(expr)
    subst.call <- function(e) {
        if (length(e) > 1)
            for (i in 2:length(e)) e[[i]] <- subst.expr(e[[i]])
        e
    }
    subst.expr <- function(e) {
        if (is.call(e))
            subst.call(e)
        else match.expr(e)
    }
    match.expr <- function(e) {
        if (is.na(n <- match(as.character(e), nm)))
            e
        else dfr[, n]
    }
    r <- eval(subst.expr(e))
    r <- r & !is.na(r)
    dfr[r, ]
}
----------------------------------------------------------------------
TASK: General Problems
STATUS: Open
FROM:
1. A gentle reminder that the default has not been changed for saving
.RData in batch mode (as was promised).
2. The degrees of freedom for the null deviance in glm are wrong when
some observations are weighted out. This can give silly answers, for
example when applying anova. The number of weighted out observations
should be subtracted, as in other df calculations.
3. The null deviance itself is wrong in glm when an offset is used. It
can be smaller than that when variables are added to the model!
4. R gave a segmentation fault when I tried to fit a model with 49
factor levels in glm (using R -v4). All these glm problems were with
poisson.
5. R still does not read my environmental variables to set memory
size.
Suggestions:
1. d, p, q, and r functions for inverse Gauss and Laplace
distributions.
2. Add a fifth function for continuous distributions, the hazard
function, h. For example, ht <- function(...) dt(...)/(1-pt(...))
is the Student t hazard function.
For writing likelihood functions, these would be much faster in C than
R and some such as Weibull can be simplified.
3. Add the five functions for three parameter distributions such as
generalized F, extreme value, etc., Box-Cox,... (I have the densities,
cumulative, and hazard as R functions.)
4. Philippe Lambert and I have d and p functions working in R for the
four-parameter stable family by inverting the characteristic function
with a Fourier transform (requires C code). S-plus only has the r
function for stables.
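As an illustration of suggestion 2, the Student t hazard can be
written with explicit arguments. This is a sketch in R, not the
proposed C implementation, and it assumes pt() accepts a lower.tail
argument.

```r
# Hazard function for Student's t, following the ht definition above.
# pt(..., lower.tail = FALSE) computes the survivor function directly,
# which avoids cancellation in 1 - pt(...) far in the upper tail.
ht <- function(x, df) dt(x, df) / pt(x, df, lower.tail = FALSE)

h0 <- ht(0, df = 5)
```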
----------------------------------------------------------------------
TASK: Generic Print
STATUS: Open
FROM: Paul Gilbert
I have always thought that typing the name of an object generated
a call to the print method for the object, however, (in 0.49)
I redefined the generic print method as
print <- function(x, ...)
{
    if (is.tframe(x)) UseMethod("print.tframe")
    else UseMethod("print")
}
Now I have an object z which returns TRUE to is.tframe(z) and
> class(z)
[1] "ts" "tframe"
Then
> print(z)
[1] 1981.50 2006.25 4.00
But
> z
Error: comparison is possible only for vector types
> traceback()
[1] "c(\"print.ts(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
[2] "c(\"print(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
This is generating a call to the class method print.ts
rather than to print.tframe.ts as is done when I use
print(z). If my understanding that typing the name of an
object should generate a call to the print method for the
object then this is a bug. Otherwise, could someone please
explain to me what it does. Thanks.
----------------------------------------------------------------------
TASK: getenv()
STATUS: Open
FROM: Paul Gilbert
Here are two small problems I've pointed out before, but still
seem to be in 0.49.
1/ getenv() should return everything, not complain about a missing item.
----------------------------------------------------------------------
TASK: summary.default
STATUS: Open
FROM: Paul Gilbert
2/ In summary.default
...
sumry[i, 2] <- if (is.object(ii))
class(ii)
should be changed to
...
sumry[i, 2] <- if (is.object(ii))
paste(class(ii), collapse=" ")
so that it works with lists of lists. (This fix was supposed to be
added to Splus 4.)
----------------------------------------------------------------------
TASK: Time Series Problems
STATUS: Open
FROM:
Here are four problems with ts:
1/ ts matrix subscripting should support drop=F:
> z<- matrix(1:10,5,2)
> z <-ts(z)
> z[,1,drop=F]
Error in [.ts(z, , 1, drop = F) : unused argument to function
2/ == and other comparisons with non-ts matrices should work:
> z <- matrix( 1:10,5,2)
> ts(z)
Time-Series:
Start = c(1, 1)
End = c(5, 1)
Frequency = 1
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> z == ts(z)
Error: invalid time series parameters specified
3/ The generic functions start and end need default methods to return a
result for matrices as previously and in S. The following seems to work.
start.default <- function (x) start(ts(x))
end.default <- function (x) end(ts(x))
4/ In the function start.ts (and in end.ts) ts[1] in the last line
is not defined. Perhaps I am missing something?
start.ts
function (x)
{
ts.eps <- .Options$ts.eps
if (is.null(ts.eps))
ts.eps <- 1e-06
tsp <- attr(as.ts(x), "tsp")
is <- tsp[1] * tsp[3]
if (abs(is - round(is)) < ts.eps) {
is <- floor(tsp[1])
fs <- floor(tsp[3] * (tsp[1] - is) + 0.001)
c(is, fs + 1)
}
else ts[1]
}
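A sketch of what item 4/ may be pointing at, under the assumption
that the undefined ts[1] was meant to be tsp[1]. The name
start_sketch is hypothetical, chosen so that start() is not
masked; ts.eps is passed as an argument here instead of being
read from .Options.

```r
# start.ts with the undefined `ts[1]` replaced by `tsp[1]` (an assumption).
start_sketch <- function(x, ts.eps = 1e-06) {
  tsp <- attr(as.ts(x), "tsp")   # c(start, end, frequency)
  is <- tsp[1] * tsp[3]
  if (abs(is - round(is)) < ts.eps) {
    is <- floor(tsp[1])
    fs <- floor(tsp[3] * (tsp[1] - is) + 0.001)
    c(is, fs + 1)                # (period, position within the period)
  } else {
    tsp[1]                       # start does not fall on the frequency grid
  }
}

z <- ts(1:10, start = c(1990, 2), frequency = 4)
print(start_sketch(z))
print(start_sketch(matrix(1:10, 5, 2)))   # plain matrix, as in item 3/
```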
----------------------------------------------------------------------
TASK: Recycling problems
STATUS: Open
FROM: Paul Gilbert
In R 0.49 comparison of logical matrices with & and | seems
to sometimes generate false warning messages that "longer
object length is not a multiple of shorter object length".
I have not been able to isolate the exact circumstances.
----------------------------------------------------------------------
TASK: Generic "write" function
STATUS: Open
FROM:
Following my posting of a write.table() function, Martin
suggested that one could have a generic write() function
and special methods for e.g. time series, data frames, etc.
Well, a month has passed since ...
What does everyone think? Is it a good idea, or would
write.table() be enough? If we think that it is not enough,
which arguments should the write methods typically allow?
What about
write.xxx (x, # object
file = # filename, default stdout
append = # obvious
sep = # obvious
eol = # end of line char
...)
???
On the other hand, it seems clear that something like
write.table() is nice, and what it should do. But what
about classes other than data.frame?
Note that S has a write(.) function which would be our
write.default(.)
your write.table would be our
write.data.frame
The only addition would be a 'write.matrix' which would be 'like'
write.data.frame, the only problem being that 'matrix' is not a class
(yet).
[Note that in S4, everything has a class;
I'm voting for matrices to have a class in R ..]
write.default could 'despatch' to write.matrix if x is a matrix.
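One way the proposal could look, as a rough sketch only. The
generic is called write_obj here so that the existing write() is
left alone; that name, the method bodies and the choice to
delegate to write.table() are all assumptions made for
illustration, not an agreed design.

```r
# Hypothetical generic with the arguments listed in the proposal.
write_obj <- function(x, file = "", ...) UseMethod("write_obj")

# Fallback: write the elements on one line, roughly like S's write().
write_obj.default <- function(x, file = "", append = FALSE,
                              sep = " ", eol = "\n", ...) {
  cat(paste(x, collapse = sep), eol, file = file, sep = "", append = append)
  invisible(x)
}

# Data frames: delegate to write.table().
write_obj.data.frame <- function(x, file = "", append = FALSE,
                                 sep = " ", eol = "\n", ...) {
  write.table(x, file = file, append = append, sep = sep, eol = eol,
              quote = FALSE, row.names = FALSE, ...)
  invisible(x)
}

# Matrices: reuse the data frame method.
write_obj.matrix <- function(x, ...) write_obj.data.frame(as.data.frame(x), ...)

df <- data.frame(a = 1:2, b = c("x", "y"))
out <- tempfile()
write_obj(df, file = out, sep = ",")
print(readLines(out))
```

The sketch relies on current R dispatching on the implicit class
of a matrix, so write_obj.default does not have to test
is.matrix(x) itself.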
----------------------------------------------------------------------
TASK: Comparison with NA and Zero-Length Vectors
STATUS: Open
FROM: +
Thomas: Any comparison with NULL generates an error
Error: comparison is possible only for vector types
whereas in S(-PLUS) it gives NA, which seems more sensible.
Along similar lines, comparison with a length 0 vector
returns logical(0) in R but NA in S.
Martin: Isn't logical(0) more logical than NA ?
I agree that it would be best (convenience)
if 'NULL==1' returned the same as 'numeric(0)==1'.
At the moment, I don't see why compatibility with S should be
important here:
if( NULL == anything)
or, e.g., if( numeric(0) == numeric(0) )
give an error anyway, i.e., you have to test for length 0 _anyway_
in the cases where one comparison argument may have zero length.
Thomas: I didn't (previously) make any comment on this --
I only said that NA was more logical than an error message.
However, the advantage of returning NA is that NA | TRUE
is TRUE, NA & FALSE is FALSE, which doesn't happen with
logical(0). Also, from a compatibility point of view one
of them is tested with is.na(), the other with length(),
so it can matter which one you use. Of course no-one should
deliberately write code where it matters, but these things
happen.
It seems in fact that logical(0) | TRUE causes R to freeze
(R0.49, sparc solaris).
Robert: Well, we thought
logical(0) & T should return logical(0)
logical(0) | T should return logical(0)
already we have NA | T returns T and NA & T returns NA
----------------------------------------------------------------------
TASK: Modules
STATUS: Open
FROM:
I came across a paper on scheme module design that may be
under consideration for rnrs -- I'm a bit hazy on that. At
any rate, it is at http://www.cs.princeton.edu/~blume/modules.dvi.
I haven't read it carefully yet, but it is fairly heavily
influenced by SML but doesn't go too far overboard (well
maybe a bit).
----------------------------------------------------------------------
May 1.
----------------------------------------------------------------------
TASK: "abline" incompatibility
STATUS: Closed; Fixed uncertain why.... (Aug 6, 97).
FROM:
I found a slight difference in behavior between R and S.
In R-0.49:
> a
[1] 12 23 22 34 44 54 55 70 78
> plot(a)
> abline(lsfit(seq(1,len=length(a)), a))
Error: no applicable method for "coefficients"
In S (from AT&T '92) the fitted line is drawn without error.
So I think a function needs to be defined as follows:
coefficients.default <- function(x) x$coef
----------------------------------------------------------------------
TASK: Legend problems
STATUS: Open
FROM:
When legend is used, the box around it has the line-type
of the last call to lines or plot instead of solid always.
----------------------------------------------------------------------
TASK: "rnorm" change
STATUS: Open
FROM: Paul Gilbert
For some reason I cannot determine, the function rnorm
seems to be returning different values in R 0.49 than it
did in R 0.16.1 (in Linux ELF). The function runif is
unchanged.
[ I believe I changed the underlying generator. ]
[ I was worried about behavior in the extreme tails. ]
[ Should we change back again? ]
----------------------------------------------------------------------
TASK: "formula" problems
STATUS: Open
FROM:
Several bugs (no solutions, yet). These might be well known.
1) If one does, e.g., mymod <- lm(y ~ x); formula(mymod)
then one does not get back the formula (one gets,
Error: invalid formula)
CLOSED (Aug 6, 97, RG).
2) if x is of mode numeric, then the model formula
mymod <- lm(y ~ x + x^2)
is not processed as S would do it. The model is fit
ignoring the x^2 term, however mymod$call includes the x^2
term. This seems to be a bug (or maybe feature) in applying
model formulae operators to numeric quantities. I expect
(from experience with S) that x^2 will be interpreted as
a math operator. Whatever the right thing to do is, it
needs to be documented.
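For reference, a small sketch of how the two readings of x^2
differ in a current R installation (whether R 0.49 behaves the
same way is not claimed here). Inside a formula ^ is a formula
operator, so x^2 collapses to x; wrapping the term in I() asks
for the arithmetic square. The data are made up.

```r
x <- 1:10
y <- 3 + 2 * x + 0.5 * x^2 + rep(c(-0.1, 0.1), 5)  # made-up response

fit_formula <- lm(y ~ x + x^2)     # ^ as formula operator: x^2 reduces to x
fit_arith   <- lm(y ~ x + I(x^2))  # I() protects the arithmetic meaning

print(names(coef(fit_formula)))
print(names(coef(fit_arith)))
```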
----------------------------------------------------------------------
TASK: formula problems
STATUS: Open
FROM:
Mike Meyer writes:
> Several bugs (no solutions, yet). These might be well known.
> 1) If one does, e.g., mymod <- lm(y ~ x); formula(mymod)
> then one does not get back the formula (one gets, Error: invalid formula)
Yep. Seems that we need a
formula.lm<-function(x)formula(x$terms)
> 2) if x is of mode numeric, then the model formula
> mymod <- lm(y ~ x + x^2)
> is not processed as S would do it. The model is fit ignoring the x^2 term,
We had that topic a while back. I think it was concluded
that it is a feature, because mixing model formulas and
arithmetic ditto is bad practice. (I don't have any strong
feeling about this, personally. As long as R won't introduce
those awful Helmert contrasts as default...)
----------------------------------------------------------------------
TASK: formula problems
STATUS: Open
FROM:
Peter Dalgaard writes:
> >
> > 2) if x is of mode numeric, then the model formula
> > mymod <- lm(y ~ x + x^2)
> > is not processed as S would do it. The model is fit[ted]
> > ignoring the x^2 term...
>
> We had that topic a while back. I think it was concluded that
> it is a feature, because mixing model formulas and arithmetic
> ditto is bad practice.
I don't recall we did, but in any case I'd like to re-open it.
There is an anomaly in the way : and ^ terms are handled in the
sense that the logical and useful thing is obvious but does not
happen. Let me give an example. Suppose a and b are factors, x
and y are not.
A term such as (a + b + x + y)^2 should be expanded out binomial
fashion, coefficients stripped away and the remaining products
treated as : products. Then S copes with terms like a:a, a:b and
a:x fine, even x:y is handled by having it generate a column of
xy-products, as it should.
But a term such as x:x does not generate a column of x-squares,
it is merely removed as it would be if it were a factor. This is
a complete anomaly, and one that I don't think would be hard or
dangerous for R to rectify. Indeed it would be very useful to
generate a complete second degree regression in three variables
using y ~ (1 + x1 + x2 + x3)^2. As it is now it generates linear
and product terms only and omits the powers. Go figure.
> (I don't have any strong feeling about this, personally. As
> long as R won't introduce those awful Helmert contrasts as
> default...)
Ah, the Helmert contrasts bête noire. For ANOVA the contrast
matrix used is mostly irrelevant. For regression models I agree,
treatment contrasts would be generally more easily interpreted.
I presume the reason they were used at all is because if you have
equal replication of everything the Helmert contrasts give you a
model matrix with orthogonal columns, so all estimates are
uncorrelated. Whenever do you get equal replication, though?
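The omission of the powers described above can be inspected
directly. The sketch below uses terms() as found in current R;
the variable names are placeholders and no data are needed. The
last formula, which adds the squares by hand through I(), is one
possible workaround and not something proposed in the message.

```r
tt <- terms(y ~ (x1 + x2 + x3)^2)
labels <- attr(tt, "term.labels")
print(labels)   # linear terms and pairwise products, no squares

# Possible workaround: add the squares explicitly.
full <- terms(y ~ (x1 + x2 + x3)^2 + I(x1^2) + I(x2^2) + I(x3^2))
print(attr(full, "term.labels"))
```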
----------------------------------------------------------------------
TASK: formula problems
STATUS: Open
FROM:
Bill Venables writes:
> A term such as (a + b + x + y)^2 should be expanded out binomial
> fashion, coefficients stripped away and the remaining products
> treated as : products. Then S copes with terms like a:a, a:b and
> a:x fine, even x:y is handled by having it generate a column of
> xy-products, as it should.
I tend to agree.
> Ah, the Helmert contrasts bête noire. For ANOVA the contrast
> matrix used is mostly irrelevant. For regression models I agree,
> treatment contrasts would be generally more easily interpreted.
Understatement of the year... Last time I bumped into them, it took me
and a colleague more than an hour to figure out how to interpret the
regression coefficients, and, I may add, the solution was *not* what
the white book said it was (it's not just one level minus the average
of the preceding, the parameter is also scaled by the reciprocal of
the level number). [There's a split-second solution -- see below --
but we sort of didn't think of it at the time...]
> I presume the reason they were used at all is because if you have
> equal replication of everything the Helmert contrasts give you a
> model matrix with orthogonal columns, so all estimates are
> uncorrelated. Whenever do you get equal replication, though?
Hardly ever. Actually, I thought that the point was not so much
orthogonality, but the successive testing (A=B, A=B=C, A=B=C=D,...).
However that is just plainly wrong outside of balanced ANOVA's.
And, even in that case, once the first two levels differ, the rest
of the coefficients lose all meaning.
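Both remarks above can be checked numerically. The sketch below
uses contr.helmert() from current R and four made-up level means
m; it shows that the Helmert columns are mutually orthogonal and
that each coefficient is the level mean minus the average of the
preceding means, divided by the level number.

```r
H <- contr.helmert(4)
print(crossprod(H))            # off-diagonal entries are zero

m <- c(1, 4, 2, 9)             # made-up means for four levels
beta <- solve(cbind(1, H), m)  # intercept, then the three Helmert coefficients

# Level j+1 minus the mean of levels 1..j, scaled by 1/(j+1).
by_hand <- sapply(1:3, function(j) (m[j + 1] - mean(m[1:j])) / (j + 1))
print(cbind(solved = as.numeric(beta[-1]), by_hand = by_hand))
```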
----------------------------------------------------------------------
TASK: formula problems
STATUS: Open
FROM:
We also need to fix formula.default. At the moment it only
looks for x$formula. Other standard places to keep a
formula are x$call$formula and x$terms. How about
formula.default<-function (x)
{
if (!is.null(x$formula))
return(eval(x$formula))
if (!is.null(x$call$formula))
return(eval(x$call$formula))
if (!is.null(x$terms))
return(x$terms)
switch(typeof(x), NULL = structure(NULL, class = "formula"),
character = formula(eval(parse(text = x)[[1]])),
call = eval(x), stop("invalid formula"))
}
One disadvantage to extracting the formula from $terms
instead of $call$formula is that in S a terms object is
not a formula. On the other hand it doesn't really matter
as long as people use the formula() function.
----------------------------------------------------------------------
TASK: formula problems
STATUS: Open
FROM:
Peter Dalgaard writes:
> Bill Venables writes:
> > Ah, the Helmert contrasts bête noire. For ANOVA the contrast
> > matrix used is mostly irrelevant. For regression models I agree,
> > treatment contrasts would be generally more easily interpreted.
> Understatement of the year... Last time I bumped into them, it took me
> and a colleague more than an hour to figure out how to interpret the
> regression coefficients, and, I may add, the solution was *not* what
> the white book said it was (it's not just one level minus the average
> of the preceding, the parameter is also scaled by the reciprocal of
> the level number). [There's a split-second solution -- see below --
> but we sort of didn't think of it at the time...]
A few weeks ago I gave a fairly detailed discussion of how to
relate contrast matrices and their interpretation in s-news. I
could re-issue it or post it to people if that was their wish.
There is also to be an extended discussion of the subject in V&R2
due out in July, with a further elaboration to appear (real soon
now...) in the online complements.
> > I presume the reason they were used at all is because if you have
> > equal replication of everything the Helmert contrasts give you a
> > model matrix with orthogonal columns, so all estimates are
> > uncorrelated. Whenever do you get equal replication, though?
>
> Hardly ever. Actually, I thought that the point was not so much
> orthogonality, but the successive testing (A=B, A=B=C, A=B=C=D,...).
> However that is just plainly wrong outside of balanced ANOVA's.
> And, even in that case, once the first two levels differ, the rest
> of the coefficients lose all meaning.
Indeed. That's why I tended to discount that possibility myself.
Here is a contrast matrix generator I sometimes prefer to use
that corresponds to testing A=B, B=C, C=D, ... Of course the
contrasts are not mutually orthogonal. How it works is left as a
little puzzle. (This function works in S. I haven't tested it
in R, but it should work if lower.tri() is available.)
contr.sdif <- function(n, contrasts = T)
{
# contrasts generator giving `successive difference' contrasts.
if(is.numeric(n) && length(n) == 1) {
if(n %% 1 || n < 2)
stop("invalid number of levels")
lab <- as.character(seq(n))
}
else {
lab <- as.character(n)
n <- length(n)
if(n < 2)
stop("invalid number of levels")
}
if(contrasts) {
contr <- col(matrix(nrow = n, ncol = n - 1))
upper.tri <- !lower.tri(contr)
contr[upper.tri] <- contr[upper.tri] - n
structure(contr/n, dimnames = list(lab, paste(
lab[-1], lab[ - n], sep = "-")))
}
else structure(diag(n), dimnames = list(lab, lab))
}
> contr.sdif(4)
2-1 3-2 4-3
1 -0.75 -0.5 -0.25
2 0.25 -0.5 -0.25
3 0.25 0.5 -0.25
4 0.25 0.5 0.75
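A quick numerical check of the "successive difference" claim,
for readers who would rather verify than solve the puzzle. The
function sdif_sketch is a hypothetical, label-free restatement of
the contrast part of contr.sdif above, and m holds made-up level
means.

```r
# Compact restatement of the contrast matrix built by contr.sdif.
sdif_sketch <- function(n) {
  contr <- col(matrix(nrow = n, ncol = n - 1))
  upper <- !lower.tri(contr)
  contr[upper] <- contr[upper] - n
  contr / n
}

m <- c(1, 4, 2, 9)                          # made-up level means
beta <- solve(cbind(1, sdif_sketch(4)), m)  # intercept, then three contrasts

print(as.numeric(beta[-1]))   # coefficients of the contrast columns
print(diff(m))                # differences between adjacent level means
```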
----------------------------------------------------------------------
TASK: startup processing
STATUS: Open
FROM:
2) Again, along the lines of something that S does that is
actually useful. In S you can set the S_FIRST environment
variable and have this used as the equivalent of the R
.Rprofile file. Might it be a good idea to allow an
R_FIRST environment variable as well. That way I could
set user specific preferences that apply no matter what
directory I am working in.
----------------------------------------------------------------------
TASK: Function Argument Naming
STATUS: Open
FROM:
There is a problem with 'default argument evaluation'
when I use an existing function name as argument name :
sintest <- function(x, y = 2, sin= sin(pi/4))
{
## Purpose: Test of "default argument evaluation"
## -------- Fails for R-0.49. Martin Maechler, Date: 9 May 97.
c(x=x, y=y, sin=sin)
}
## R-0.49:
R> sintest(1)
##> Error in sintest(1) : recursive default argument reference
## S-plus 3.4 (being 100% ok):
S> sintest(1)
x y sin
1 2 0.7071068
Warning messages:
looking for function "sin", ignored local non-function in: sintest(1)
-------------------------------------------------------
The following shows bugs, both in R and S:
sintest2 <- function(x ,y = 2)
{
## Purpose: Test of "default argument evaluation"
## -------- Fails for S-plus 3.4. Martin Maechler, Date: 9 May 97.
c(x=x, y=y, sin=sin)
}
R> sintest2(1)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
--------------- is almost okay,
the buglet being that the names have been dropped from the list.
But watch this:
S> sintest2(1)
function(x = 1, y = 2, sin.x)
sin2 = .Internal(sin(x), "do_math", T, 109)
--- returning a function
((now we see, why S's way of treating functions as
lists sometimes badly sucks)).
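A possible way around the first failure in a current R
installation is to qualify the function in the default, so that
the lookup cannot land on the argument itself. The :: operator is
an assumption about the R version in use (it is not part of
R 0.49), and sintest3 is a hypothetical name.

```r
# The default refers to base::sin, not to the argument named `sin`.
sintest3 <- function(x, y = 2, sin = base::sin(pi / 4)) {
  c(x = x, y = y, sin = sin)
}

print(sintest3(1))
```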
----------------------------------------------------------------------
TASK: Function argument naming
STATUS: Open
FROM:
Martin Maechler wrote:
[ Stuff above. ]
For better or worse, S and R allow default expressions to
contain references to variables that are (or rather may be)
created in the function body, so (in R and Splus)
> x<-1
> f<-function(a,b=x) { if (a) x<-2; b}
> f()
Error: Argument "a" is missing, with no default
> f(T)
[1] 2
> f(F)
[1] 1
More traditional lexical scoping would make the reference
to x in the default always be global, but lots of code
would break. I think we're stuck with this behavior as a
corollary to the way S wants default arguments to work.
Actually S is a bit inconsistent in its error message --
if you have a non-function argument it gives the same
message as R,
> g<-function(x=x) x
> g()
Error in g(): Recursive occurrence of default argument "x"
Dumped
Also in R's lexical scoping you probably do want the argument
name to shadow any outer definitions if you want to be able
to define default arguments that are recursive functions,
e.g.
> g<-function(n, nfac=function(x) {
if (x <= 1) 1 else nfac(x-1)*x }) nfac(n);
> g(6)
[1] 720
----------------------------------------------------------------------
TASK: Adding List Elements by Name
STATUS: Open
FROM:
This works in Splus:
> x<-list()
> x[["f"]]<-1
> zz<-"g"
> x[[zz]]<-2
In R both variants fail unless the name is already on the
list. The first one can be replaced by x$f, but there
seems to be no substitute for the other one (oh yes I found
one, but it's not fit to print!). This comes up if you e.g.
want to create a variable in a data frame with a name given
by a character string.
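For comparison, the same assignments as they behave in a current
R installation (an assumption about the version in use; this is
not a statement about R 0.49), including the data frame use case
mentioned above. The names df and new_name are made up.

```r
x <- list()
x[["f"]] <- 1
zz <- "g"
x[[zz]] <- 2
print(names(x))

# Creating a data frame column whose name is held in a string.
df <- data.frame(a = 1:3)
new_name <- "b"
df[[new_name]] <- df$a * 2
print(df)
```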
----------------------------------------------------------------------
TASK: Bug in "approx"
STATUS: Open
FROM:
When the function approx is called with the argument rule=2, one gets
the error message
Error: NAs in foreign function call (arg 6)
Besides, the meaning of rule=1 or rule=2 is opposite to that described
in the help text and used in S-plus.
For example, in R:
R> approx(1:10,2:11,xout=5:15,rule=1)
$x
[1] 5 6 7 8 9 10 11 12 13 14 15
$y
[1] 6 7 8 9 10 11 11 11 11 11 11
R> approx(1:10,2:11,xout=5:15,rule=2)
Error: NAs in foreign function call (arg 6)
but in S-plus:
> approx(1:10,2:11,xout=5:15,rule=1)
$x:
[1] 5 6 7 8 9 10 11 12 13 14 15
$y:
[1] 6 7 8 9 10 11 NA NA NA NA NA
> approx(1:10,2:11,xout=5:15,rule=2)
$x:
[1] 5 6 7 8 9 10 11 12 13 14 15
$y:
[1] 6 7 8 9 10 11 11 11 11 11 11
The reason for this bug can be found in the last lines of the code of
approx:
if (rule == 1) {
low <- y[1]
high <- y[length(x)]
}
else if (rule == 2) {
low <- NA
high <- low
}
else stop("invalid extrapolation rule in approx")
y <- .C("approx", as.double(x), as.double(y), length(x),
xout = as.double(xout), length(xout), as.double(low),
as.double(high))$xout
return(list(x = xout, y = y))
If (rule == 2) the values of low and high are set to NA. Immediately
afterwards, the foreign function "approx" is called with these values,
leading to the error
Error: NAs in foreign function call (arg 6)
To obtain the same behavior as in S-plus (and as in the help-text) the
commands for (rule == 1) and (rule == 2) have to be exchanged.
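The exchange can be sketched as follows. The helper
extrap_limits is hypothetical and only isolates the two
branches. Note that after the exchange rule 1 is the branch that
produces NA, so the NA check in the foreign function call would
presumably still need attention. The last two lines show what
approx() returns in a current R installation for the same input.

```r
# The two branches with their bodies exchanged.
extrap_limits <- function(rule, x, y) {
  if (rule == 1) {
    c(low = NA_real_, high = NA_real_)   # rule 1: NA outside the data range
  } else if (rule == 2) {
    c(low = y[1], high = y[length(x)])   # rule 2: hold the end values
  } else {
    stop("invalid extrapolation rule in approx")
  }
}

print(extrap_limits(1, 1:10, 2:11))
print(extrap_limits(2, 1:10, 2:11))

# For comparison, approx() as shipped with current R:
print(approx(1:10, 2:11, xout = 5:15, rule = 1)$y)
print(approx(1:10, 2:11, xout = 5:15, rule = 2)$y)
```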
----------------------------------------------------------------------
TASK: Names and unlisting (bug/feature)
STATUS: Open
FROM: hornik@ci.tuwien.ac.at
R> l <- list("11" = 1:5)
R> l
$11
[1] 1 2 3 4 5
R> unlist(l)
111 112 113 114 115
1 2 3 4 5
I ran into this weekend ...
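A short sketch of what happens and of one way to sidestep it,
checked against current R only: unlist() builds element names by
pasting the list name and the position together, which is
ambiguous when the list name is itself a number. Passing
use.names = FALSE drops the names.

```r
l <- list("11" = 1:5)

print(names(unlist(l)))              # list name and position pasted together
print(unlist(l, use.names = FALSE))  # values only, no names
```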
----------------------------------------------------------------------
TASK: "all.names" function needed
STATUS: Open
FROM:
I could not find the all.names function in R so I created the
enclosed. Comments, criticisms, or changes to a one-liner by creating
nested anonymous functions are welcome. I'll try to work out a
corresponding all.vars function.
### $Id: TASKS,v 1.1 1997/09/18 04:36:42 r Exp $
### Some replacement functions that are missing in R
### Determine all the names (symbols) occurring in an object.
### This is probably grossly inefficient.
all.names <-
function (x)
{
if (mode(x) == "symbol")
return(as.character(x))
if (length(x) == 0)
return(NULL)
if (is.recursive(x))
return(unlist(lapply(as.list(x), all.names)))
character(0)
}
### Local variables:
### mode: R
### End:
And from Martin:
Doug,
your 'all.names' function
[wow, I didn't even know it in S..]
seems to have been written with S in your mind;
you are exactly demonstrating some of the 'fine' differences between R & S
>1> all.names <- function (x)
>2> {
>3> if (mode(x) == "symbol") return(as.character(x))
>4> if (length(x) == 0) return(NULL)
>5> if (is.recursive(x)) return(unlist(lapply(as.list(x), all.names)))
>6> character(0)
>7> }
1) length(x) is not always defined in R; e.g. it is NOT for functions.
--> Delete line 4
2) functions are NOT lists and cannot be coerced to lists,
which makes line 5 fail for function objects.
As a matter of fact, I once was also a bit interested in this.
At that time, 'args' did not yet exist, and I wanted to abuse 'as.list' for
functions.
I was told that this is 'bad' (functions have nothing to do with
lists; in R, functions can have a defining environment going with
them...) and 'args' is now provided which helps my most immediate
need.
In short: I don't think you can define an all.names(.)
function which works with function arguments, in the
current version of R.
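For readers on a current R installation, all.names() and
all.vars() are available as built-in functions; the sketch below
shows them on a quoted expression and on the body of a function.
This says nothing about R 0.49, and the expression e and the
function f are made up.

```r
e <- quote(sin(x + y) / z)
print(all.names(e))   # operators and function names included
print(all.vars(e))    # variables only

f <- function(a, b = 2) a * b + a
print(all.names(body(f)))
```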
----------------------------------------------------------------------
TASK: "sys.function" problem
STATUS: Open
FROM: +
This was either an attempt to get an early lead in some
future obfuscating R contest or a way of getting around
the different scoping rules of R and S.
I attempted to create a recursive anonymous function to be
called within another function. You may want to stop
reading for a bit and consider how that would be done.
That is, how do you recursively call a function that has
never been assigned a name?
OK, you're back. You probably came up with a better solution
than I did but I used (sys.function())(arg) to do the
recursion. The piece of code looks like
flist <- (function(x) {
if (mode(x) == "call") {
if (x[[1]] == as.name("/"))
return(c(sys.function()(x[[2]]), sys.function()(x[[3]])))
if (x[[1]] == as.name("(")) # for R
return(sys.function()(x[[2]]))
}
if (mode(x) == "(") return(sys.function()(x[[2]])) # for S
list(x)
})(getGroupsFormula(data, form, ...)[[2]])
## I know it's horribly obscure.
## Blame Bill Venables for teaching me this.
Regrettably, it doesn't work in R. Using the debugger one
finds that sys.function() returns the function being called
the first time through but the second time through it
returns NULL. Is this a bug or a feature?
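A minimal version of the same idea, as it behaves in a current R
installation (no claim is made about R 0.49). The first form
uses Recall(), which exists for this purpose; the second uses
sys.function() as in the code above. The factorial is only a
stand-in for the real recursion.

```r
# Recursive anonymous function via Recall().
fact_recall <- (function(n) if (n <= 1) 1 else n * Recall(n - 1))(5)

# The same recursion via sys.function().
fact_sysfun <- (function(n) if (n <= 1) 1 else n * sys.function()(n - 1))(5)

print(c(fact_recall, fact_sysfun))
```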
----------------------------------------------------------------------
TASK: Matrix multiply problems
STATUS: Open
FROM:
Both of these used to work and seem useful and harmless:
R> matrix(1,ncol=1)%*%c(1,2)
Error in matrix(1, ncol = 1) %*% c(1, 2) : non-conformable arguments
R> matrix(1,ncol=1)*(1:2)
Error: dim<- length of dims do not match the length of object
----------------------------------------------------------------------
TASK: "write" function
STATUS: Open?
FROM:
Following my posting of a write.table() function, Martin
suggested that one could have a generic write() function
and special methods for e.g. time series, data frames, etc.
Well, a month has passed since ...
What does everyone think? Is it a good idea, or would
write.table() be enough? If we think that it is not enough,
which arguments should the write methods typically allow?
What about
write.xxx (x, # object
file = # filename, default stdout
append = # obvious
sep = # obvious
eol = # end of line char
...)
???
On the other hand, it seems clear that something like
write.table() is nice, and what it should do. But what
about classes other than data.frame?
Martin Maechler:
Note that S has a write(.) function which would be our
write.default(.)
your write.table would be our
write.data.frame
The only addition would be a 'write.matrix' which would be 'like'
write.data.frame, the only problem being that 'matrix' is not a
class (yet). [Note that in S4, everything has a class;
I'm voting for matrices to have a class in R ..]
write.default could 'despatch' to write.matrix if x is a matrix.
----------------------------------------------------------------------
TASK: "ls.print" problem
STATUS: Closed, Aug 6, 97 RG.
FROM:
ls.print produces an error that I don't seem to be able to
trace. Output of the commands as follows: (hyeung is a
24x2 matrix of data)
-------------------------------------------------
> summary(hyeung)
x.1 x.2
Min. : 28.0 Min. : 10.0
1st Qu.: 72.0 1st Qu.: 87.5
Median : 86.5 Median : 92.5
Mean : 81.0 Mean : 82.5
3rd Qu.: 97.0 3rd Qu.:100.0
Max. :100.0 Max. :100.0
> summary(lsfit(hyeung[,1],hyeung[,2]))
Length Class Mode
coef 2 -none- numeric
residuals 24 -none- numeric
intercept 1 -none- logical
qr 6 -none- list
> ls.print(lsfit(hyeung[,1],hyeung[,2]))
trace: ls.print(lsfit(hyeung[, 1], hyeung[, 2]))
Error: missing value in ``n1 : n2''
----------------------------------------------------------------------
TASK: Comparisons with zero length things
STATUS: Open
FROM:
Thomas:
Any comparison with NULL generates an error
Error: comparison is possible only for vector types
whereas in S(-PLUS) it gives NA, which seems more sensible.
Along similar lines, comparison with a length 0 vector returns
logical(0) in R but NA in S.
Martin:
Isn't logical(0) more logical than NA ?
I agree that it would be best (convenience)
if 'NULL==1' returned the same as 'numeric(0)==1'.
At the moment, I don't see why compatibility with S should be
important here:
if( NULL == anything)
or, e.g., if( numeric(0) == numeric(0) )
give an error anyway, i.e., you have to test for length 0
_anyway_ in the cases where one comparison argument may
have zero length.
Thomas:
I didn't (previously) make any comment on this -- I only
said that NA was more logical than an error message.
However, the advantage of returning NA is that NA | TRUE
is TRUE, NA & FALSE is FALSE, which doesn't happen with
logical(0). Also, from a compatibility point of view one
of them is tested with is.na(), the other with length(),
so it can matter which one you use. Of course no-one should
deliberately write code where it matters, but these things
happen.
It seems in fact that logical(0) | TRUE causes R to freeze
(R0.49, sparc solaris).
Robert:
Well, we thought
logical(0) & T should return logical(0)
logical(0) | T should return logical(0)
already we have NA | T returns T
and NA & T returns NA
Martin:
Ok, given the above argument, returning NA is logical, too.
However, I'd also argue that
logical(0) | TRUE -> TRUE
logical(0) & FALSE -> FALSE
logical(0) & TRUE -> logical(0)
logical(0) | FALSE -> logical(0)
ThLu> It seems in fact that logical(0) | TRUE causes R to freeze
ThLu> (R0.49, sparc solaris).
Yes:
> logical(0) | TRUE
Warning in logical(0) | TRUE : longer object length
is not a multiple of shorter object length
Floating exception
~~~~~~~~~~~~~~~~~~ [and 'core' dump]
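Whichever result is chosen, code that may meet a zero-length
operand has to guard for it. The helper below is a hypothetical
sketch of such a guard that returns NA, the value Thomas argues
for; the last three lines restate the NA rules quoted in the
discussion.

```r
# Hypothetical helper: NA when either operand has length zero.
na_if_empty_equal <- function(a, b) {
  if (length(a) == 0 || length(b) == 0) return(NA)
  a == b
}

print(na_if_empty_equal(NULL, 1))
print(na_if_empty_equal(numeric(0), numeric(0)))
print(na_if_empty_equal(1:3, 2))

print(NA | TRUE)    # TRUE
print(NA & FALSE)   # FALSE
print(NA & TRUE)    # NA
```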
----------------------------------------------------------------------
TASK: Misc
STATUS: Open
FROM:
Here are two small problems I've pointed out before, but
still seem to be in 0.49.
1/ getenv() should return everything, not complain about a missing item.
2/ In summary.default
...
sumry[i, 2] <- if (is.object(ii))
class(ii)
should be changed to
...
sumry[i, 2] <- if (is.object(ii))
paste(class(ii), collapse=" ")
so that it works with lists of lists.
(This fix was supposed to be added to Splus 4.)
----------------------------------------------------------------------
TASK: Method lookup for "print"
STATUS: Open
FROM:
I have always thought that typing the name of an object
generated a call to the print method for the object, however,
(in 0.49) I redefined the generic print method as
print <- function(x, ...)
{if (is.tframe(x)) UseMethod("print.tframe")
else UseMethod("print")
}
Now I have an object z which returns TRUE to is.tframe(z) and
> class(z)
[1] "ts" "tframe"
Then
> print(z)
[1] 1981.50 2006.25 4.00
But
> z
Error: comparison is possible only for vector types
> traceback()
[1] "c(\"print.ts(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
[2] "c(\"print(structure(c(1981.5, 2006.25, 4), class = c(\\\"ts\\\",
\\\"tframe\\\"\", "
This is generating a call to the class method print.ts
rather than to print.tframe.ts as is done when I use
print(z). If my understanding that typing the name of an
object should generate a call to the print method for the
object then this is a bug. Otherwise, could someone please
explain to me what it does. Thanks.
----------------------------------------------------------------------
TASK: Time Series Problems
STATUS: Open
FROM:
Here are four problems with ts:
1/ ts matrix subscripting should support drop=F:
> z<- matrix(1:10,5,2)
> z <-ts(z)
> z[,1,drop=F]
Error in [.ts(z, , 1, drop = F) : unused argument to function
2/ == and other comparisons with non-ts matrices should work:
> z <- matrix( 1:10,5,2)
> ts(z)
Time-Series:
Start = c(1, 1)
End = c(5, 1)
Frequency = 1
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> z == ts(z)
Error: invalid time series parameters specified
>
3/ The generic functions start and end need default methods
to return a result for matrices as previously and in S.
The following seems to work.
start.default <- function (x) start(ts(x))
end.default <- function (x) end(ts(x))
4/ In the function start.ts (and in end.ts) ts[1] in the
last line is not defined. Perhaps I am missing something?
start.ts
function (x)
{
ts.eps <- .Options$ts.eps
if (is.null(ts.eps))
ts.eps <- 1e-06
tsp <- attr(as.ts(x), "tsp")
is <- tsp[1] * tsp[3]
if (abs(is - round(is)) < ts.eps) {
is <- floor(tsp[1])
fs <- floor(tsp[3] * (tsp[1] - is) + 0.001)
c(is, fs + 1)
}
else ts[1]
}
----------------------------------------------------------------------
TASK: False warnings
STATUS: Open
FROM:
In R 0.49 comparison of logical matrices with & and | seems
to sometimes generate false warning messages that "longer
object length is not a multiple of shorter object length".
I have not been able to isolate the exact circumstances.
----------------------------------------------------------------------
TASK: ISO-latin1 characters
STATUS: Open
FROM:
There seems to be a problem in print.default with some
ISO-latin1 characters (the chars AFTER ASCII in western
Europe...) if they appear in strings. (no problem if they
are part of a function comment, see below).
Some of the characters lead to 4 character Hex-codes being
printed instead: "û" ## ^u prints as "0xFB"
If you use the funny characters in comments of functions,
they are stored and printed properly.
HOWEVER: In a few rare cases, the strings are not even
PARSED properly; the line 'ISOdiv <- ..' below gives a
SYNTAX error.
The following code shows the symptoms :
-- ONLY if the e-mail between here and your place is
8-bit clean! -- (else: get it
ftp://ftp.stat.math.ethz.ch/U/maechler/R/string-test.R )
frenchquotes <- "«...»" ## <<...>>
frenchquotes
Umlaute <- "äöü ÄÖÜ" # = "a "o "u "A "O "U
Umlaute #- only the last one is not printed properly...
A.accents <- "àáâãäåæ ÀÁÂÃÄÅÆ" # `a 'a ^a "a oa ae `A 'A ^A "A oA AE
A.accents
EI.accents <- "ÈÉÊËÌÍÎÏ èéêëìíîï"
EI.accents
O.accents <- "ÒÓÔÕÖØòóôõöø"
O.accents
U.accents <- "ÙÚÛÜÝùúûüý"
U.accents
ISO24x <- "¡¢£¤¥¦§ ¨©ª«¬®¯" #octal 241..257
ISO26x <- "°±²³´µ¶· ¸¹º»¼½¾¿" #octal 260..277
##--- THIS IS a Problem: It gives a SYNTAX error !
ISOdiv <- "×÷ Ðð Ññ Þþ ßÿ"
##-- One of these characters even was producing the same as 'q()' !!
aa_ function(x) {
x^2
##- frenchquotes <- "«...»" ## <<...>>
##- Umlaute <- "äöü ÄÖÜ" # = "a "o "u "A "O "U
##- A.accents <- "àáâãäåæ ÀÁÂÃÄÅÆ" # `a 'a ^a "a oa ae `A 'A ^A "A oA AE
##- EI.accents <- "ÈÉÊËÌÍÎÏ èéêëìíîï"
##- O.accents <- "ÒÓÔÕÖØòóôõöø"
##- U.accents <- "ÙÚÛÜÝùúûüý"
##-
##- ISO24x <- "¡¢£¤¥¦§ ¨©ª«¬®¯" #octal 241..257
##- ISO26x <- "°±²³´µ¶· ¸¹º»¼½¾¿" #octal 260..277
##- ISOdiv <- "×÷ Ðð" ##-- OMITTED further: SYNTAX error !!
}
aa
----------------------------------------------------------------------
TASK: String length problems
STATUS: Closed ?
FROM:
This is not a cat(.) but a string storing/parsing problem:
nchar("\n^L\n") # gives 2 instead of 3 (^L is a literal formfeed)
[ Hmmm. Was this typed to readline I wonder? There it ]
[ seems that ^L must be escaped with ^V. Using the ANSI ]
[ \f will now produce a literal formfeed. Indeed, using ]
[ any of the ANSI C escapes will work. ]
[ However, a literal '^L' (emacs C-q C-l) typed inside a ]
[ string is still dropped: ]
> "\n^L\n"
[1] "\n\n"
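The ANSI escape route mentioned in the note can be checked directly. A small sketch, assuming a current R in which \f is parsed as a formfeed:

```r
# A formfeed written with the ANSI C escape \f is kept as a real character.
s <- "\n\f\n"
len <- nchar(s)                      # counts newline, formfeed, newline
bytes <- as.integer(charToRaw(s))    # byte values of the stored characters
```

Here bytes lets one see whether the formfeed (byte 12) is actually stored between the two newlines (byte 10).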
----------------------------------------------------------------------
TASK: Frontend
STATUS: Open
FROM:
Some time ago there was the suggestion to add a PLATFORM
subdir level for bin (and eventually the library subdirs
with `binaries'), and the idea to have the shell wrapper
automagically call the right binary.
I mentioned that one might be able to use the shell variables
OSTYPE and HOSTTYPE for that, noticing however that e.g. on
my Debian Linux/GNU/ix86 the values differ between shells:
            bash     tcsh
OSTYPE      Linux    linux
HOSTTYPE    i386     i386-linux
Hmm ... It seems (a colleague just checked that) that these
variables are not POSIX either, and hence I'd say rather
useless for our purpose.
In the absence of a reliable run-time possibility to
determine the current platform, it seems to be natural to
use `platform' as obtained at compile-time for possibly
distinguishing the various binaries etc, and leave it at
the discretion of the sysadmin to ensure that the R script
in the path calls the right binary.
If I am missing something obvious, please let me know.
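The compile-time platform string is available from inside R, so a platform-specific binary directory can at least be named consistently. A rough sketch in current R; the directory layout and the name platform_bin_dir are assumptions made for illustration, not an existing convention.

```r
# Hypothetical helper: path of a per-platform subdirectory under a bin directory.
platform_bin_dir <- function(bin_root) {
  # R.version$platform is fixed when R is configured and built,
  # so it does not depend on shell variables such as OSTYPE.
  file.path(bin_root, R.version$platform)
}

bin_dir <- platform_bin_dir("bin")
```

The shell wrapper would still have to be pointed at the right directory by the sysadmin, as described above; this only shows where the compile-time name could come from.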
----------------------------------------------------------------------
TASK: Resetting Graphical Parameters
STATUS: Open
FROM:
BY THE WAY:
It would be nice to be able to say
par(reset = TRUE)
(or similar) for resetting all the graphical parameters to their
(device-dependent) default values.
[ This will require a little work. Perhaps the easiest thing ]
[ to do is to add a new device driver call "reset". This would ]
[ be best left to the multiple active device driver project. ]
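Until something like par(reset = TRUE) exists, one workaround is to save the settable parameters right after a device is opened and restore them later. A sketch, assuming a current R where par(no.readonly = TRUE) is available; pdf(NULL) is used here only so that no file is written.

```r
pdf(NULL)                             # open a device that writes no file
defaults <- par(no.readonly = TRUE)   # settable parameters at device start

par(col = "red", lty = 2)             # change some parameters
changed_col <- par("col")

par(defaults)                         # restore the saved values
restored_col <- par("col")
restored_lty <- par("lty")

invisible(dev.off())                  # close the device again
```

This restores only what was saved for that one device; it is not a substitute for a driver-level "reset".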
----------------------------------------------------------------------
TASK: .Options not working in all cases
STATUS: Open
FROM:
The .Options vector had been introduced a while ago after my
suggestion (see Ross's E-mail below). .Options$digits is used
by default in several print methods (eg print.lm), however,
deparse(.) e.g., uses options()$width, and not .Options$width.
Another problem is that .Options
is still not in the documentation (on-line help).
Before one could add it there, we'd need ``the specs''.
I think the (at least my) idea was that
options(.) queries or sets elements in the .Options list
and all functions -- including the internal ones -- use .Options.
As far as I know, this is what S does.
Currently, this is NOT the case in R.
Ross said a while ago:
>>> From: Ross Ihaka
>>> Date: Wed, 11 Dec 1996 17:10:59 +1300 (NZDT)
>>> To: Martin Maechler
>>> Cc: R-testers mailing list
>>> Subject: R-alpha: options() and .Options -- ?
Ross> Martin Maechler writes:
>> This is not a bug report, rather than some remarks as a
>> "request for comments":
>>
>> It is clear that options( foo = bar )
>> sets the option and also updates the builtin() .Options list :
>>
>> > options(myopt = pi)
>> > .Options$my
>> [1] 3.14159265
>>
>> In S-plus, it was (is) possible to use .Options locally in a function
>> frame in order to just affect some options during evaluation of that
>> function.
Ross> I have made some changes so that such local assignments to .Options
Ross> will work. The down side is that such assignments will also work at
Ross> top level with the changes shadowing the real system options.
Ross> This also may be ok. It would have the advantage that options would
Ross> then be preserved from session to session. Is this a good idea or a
Ross> bad idea?
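The local-override behaviour discussed in this exchange can be approximated with options() alone. A sketch in current R that changes an option for the duration of one function call and restores it on exit; it does not assign to .Options, and the function name format_short is hypothetical.

```r
# Hypothetical example: format a number with a temporary 'digits' setting.
format_short <- function(x) {
  old <- options(digits = 3)   # options() returns the previous values
  on.exit(options(old))        # restore them when the function exits
  format(x)
}

before <- getOption("digits")
short <- format_short(pi)      # formatted with 3 significant digits
after <- getOption("digits")   # same as 'before' once the call has returned

options(myopt = pi)            # user-defined options can be set ...
my <- getOption("myopt")       # ... and queried by their full name
```

Because the change is undone by on.exit(), nothing is shadowed at top level, which sidesteps the down side Ross describes.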
----------------------------------------------------------------------
TASK:
STATUS:
FROM:
----------------------------------------------------------------------