-- This is private to me, Martin Maechler, the package maintainer.
-- A much smaller ``public TODO'' is part of the file ./README
-- Things done are moved to ./DONE-MM
~~~~~~~~   -----------   ~~~~~~~~~

 o clusGap() : now have "original" in addition to PCA-rotation with the
   'scaleH0' string.  In addition should provide 'H0gen = a RNG *function*':
   Chris Fields (in March 2012) had proposed to alternatively allow
   uniform on an n-simplex instead of an n-cube.
   (~/R/MM/Pkg-ex/cluster/clusGap-ChrisField-thoughts)

 o clusGap() : print() method: note that Tibshirani et al. proposed
   (see their large data example !) a different method than the one we
   implement currently  --> provide both(!)

 o clara() to work with *daisy*-like distances, not just L2 & L1.

 o clara() - bug?: tests/clara.Rout.save.~13~ (and ~12~ , ~11~ ...)
   gives a clearly better result for
       clara(ru4, k=3, met="manhattan", sampsize=4)
   than current (later) clara.Rout files...
   -> bug since 'Mar 11 2004' (= cluster-1.[89].1 for R 1.[89]) ??
   FIXME ?!?!?!

 o plot.silhouette() : if an observation's width is == 0.,
   draw a small stripe instead of nothing at all

 o clara(): Should be possible to "re"start with a GIVEN "best sample"

 o pam() and clara(): Should be possible to "re"start with GIVEN medoids.
   Now possible for pam(), not for clara();
   --> "synchronize" src/pam.c and src/clara.c,
       in particular bswap() vs bswap2() !!

 o pam() and clara(): With NA's, medoids "often" contain NAs even when
   there are only few NAs.
   ==> use a modified d(.,.) which makes NAs "bad" somehow.

 o pamila(): Major smart idea: Do save the d(i,j)  i=1,..k  j=1,..n
   only between *medoids* and everything else -- speedup(?) -> optional

 o R/agnes.q , R/diana.q and R/pam.q  have an almost identical clause

    if(data.class(x) != "dissimilarity") {
        if(!is.numeric(x) || is.na(sizeDiss(x)))
            stop("x is not of class dissimilarity and can not be converted to this class."
                 )
        ## convert input vector to class "dissimilarity"
        class(x) <- ..dClass
        attr(x, "Size") <- sizeDiss(x)
        attr(x, "Metric") <- "unspecified"
    }

   which can be modularized out into a NAMESPACE-local  fixupDiss()  function
   [ agnes() and diana() have even more in common --> namespace-local functions!
     see also "8b)" below!]

 o diana() {divisive hierarchical}:
   Should allow ---early stopping--- (for speed and size)
   -- simultaneously, could think of ``diss() on the fly'' instead of
      a diss() matrix, but see ./src/NOTES-MM (and "pamila" below) !

~~~~~~~~~~~~~~

Dec. 2002:

 o clara(ruspini, 4)  BUG in clara.c (see below)
   -- worked fine in cluster-1.5.2 (with clara.f!)
   -- gives error in cluster-1.6.1 [and later]
                     ==== AARGH
   (the problem is *not* an integer/double one, here!)

   Status 28.Dec.2002:
   - The August-2002 Fortran code doesn't seem to have a problem
         ==> ~/R/Pkgs/TMP/cluster/
   - The F2C code (called via .Fortran()) seems the same
         ==> ~/R/Pkgs/T_F2C/cluster/
   - A very slight change of the F2C code (using .C()) has one problem
     but not all of the "modern" C version
         ==> ~/R/Pkgs/T_F2C-2/cluster/

   Fixed most of the above 2002-12-28 _late_  -- still one small problem!
   But it seems clear this was even in early clara.f
   (at least, the final result is the same for that example)

   src/clara.c << needs more

 o diana(ruspini) --> ok (again)

 o bannerplot() is now `standalone' and has a help page, man/bannerplot.Rd .
   HOWEVER its "details" are found in man/plot.agnes.Rd (and ???) instead
   --> centralize this info (and keep short ref.s in the man/plot.*)

 o agnes() and hclust() should be merged {and based on C, not Fortran}

 o agnes() for large objects needs TWICE the time of hclust();
   both need MUCH MORE time than hcluster() in pkg 'amap',
   which is said to be the same as 'hclust' but only malloc()ating
   the "huge" dissimilarities inside C.
   --> translate agnes, i.e.
       src/twins.f to C

July 2002:

 o Idea for new functionality : e.g.,
   pamila() := PAM In Large Application
   should not *save* dissimilarities but rather re-compute them on the fly
   --> save huge storage
   ==> should give identical results but be faster for larger n,
       or at least feasible for n = 10'000 or so where it currently ain't.

June 2002:

 mona() : I think it should be possible to write an as.hclust.mona() or
          as.twins.mona() method and hence also draw a dendrogram of a
          mona object.

Jan. 2002:

 clusellipses() ``like part of clusplot'' for *adding* ellipses to a plot;
 maybe do this with  "add = TRUE, plotchar = !add, labels = 0"

May 23, 2001 / Jan.2002 :
-------------------------
I found problems with missing values / NAs treatment :

 o Also, I'm not sure if the NAs are dealt with sensibly in clara() :
   The result changes too much with very few NAs

 o --> look at all the subroutine dysta*() s  in src/*.f
   Clean these up and merge them into one single routine!
   Aug.02: partly done -- fanny() is different from the others.
   In the future: When "mva" will have a C API, use dist()'s C function!

7) Get rid of the many \section{GENERATION}, {METHODS} and {INHERITANCE}
   sections in man/*.Rd
   -- make sure that info is available, at least partially, otherwise.

6a) The \references{} mostly contain the same things.
    man/plot.agnes.Rd has some of them nicely.
    Collect in a few places (*.Rd files), and refer to these  {partly ok}
 b) Similarly for the \section{BACKGROUND} which appears in quite a few
    *.Rd files.
    --> done partly ( ./ChangeLog 2002-01-24 )

8b) Think about "merging" the plot.agnes and plot.diana methods.

------------------------------------------------

older TODO (were in ./README_MM which is now eliminated)
====

3) daisy() for the case of mixed variables should allow
   a weight vector (of length p = #vars) for up- or downweighting variables.
   daisy() really should accept the other methods mva's dist() does _and_
   it should use dist's C API -- but we have no C API for package code, ARRGH!

4) Eliminate the many Fortran (g77 -Wall) warnings of the form
   >> mona.f:101: warning: `jma' might be used uninitialized in this function
   ---------

9) man/daisy.Rd should mention 'Gower (1971)' ;
   mention that Kaufman & Rousseeuw *generalize* this;
   and probably show the full formula from Kauf+Rouss p.35

10) Implement the plot for "fuzzy cluster membership" of section 5.4,
    from Kauf+Rouss p. 195 ff :
    I.e. PCA of the membership matrix for the points + "the pure clusters"
    <---> Export as.membership() and toCrisp() in ./R/fanny.q
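
Sketch for the fixupDiss() item above (the clause shared by R/agnes.q,
R/diana.q and R/pam.q).  This is only a minimal illustration, not the
package's code: the helper's signature is hypothetical, and the definitions
of ..dClass and sizeDiss() are stand-ins assumed here so that the example
is self-contained.

```r
## Stand-ins for the package internals (ASSUMED here for illustration only)
..dClass <- c("dissimilarity", "dist")

## Return n such that length(d) == n(n-1)/2, or NA if there is no such integer
sizeDiss <- function(d) {
    discr <- 1 + 8 * length(d)
    sqrtdiscr <- round(sqrt(discr))
    if (sqrtdiscr^2 == discr) (1 + sqrtdiscr) / 2 else NA
}

## Hypothetical namespace-local helper: make sure 'x' is a "dissimilarity"
fixupDiss <- function(x) {
    if (data.class(x) != "dissimilarity") {
        if (!is.numeric(x) || is.na(n <- sizeDiss(x)))
            stop("x is not of class dissimilarity and can not be converted to this class.")
        ## convert input vector to class "dissimilarity"
        class(x) <- ..dClass
        attr(x, "Size") <- n
        attr(x, "Metric") <- "unspecified"
    }
    x
}
```

With something like this, agnes(), diana() and pam() could each replace the
repeated clause by a single  x <- fixupDiss(x)  call; whether the helper
should also take care of the remaining common code of agnes() and diana()
is left open.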