Private Maintainer -- TODO				-*- markdown -*-

A much smaller _public TODO_ is part of the file `./README`
Things done are moved to `./DONE-MM`

### `clusGap()`

* currently fails [in `svd()`] when `x` has any NA's

* clusGap() : now have `"original"` in addition to PCA-rotation with
   `scaleH0` string.  In addition should provide `H0gen = a RNG *function*`:
   Chris Fields (in March 2012) had proposed to alternatively allow uniform
   on an n-simplex instead of an n-cube.
    ->  `~/R/MM/Pkg-ex/cluster/clusGap-ChrisField-thoughts`

   ==> Master Thesis of Emmanuel Profumo (~/Betreute-Arbeiten/Profumo_Emmanuel/)
   has a __generalized__ clusGap()
   --> added but not yet {exported+documented} >>> R/clusGapGen.R
   NB: See more in
   ~/R/MM/Pkg-ex/cluster/clusGap/clusGapGen/            notably
   ~/R/MM/Pkg-ex/cluster/clusGap/clusGapGen/README.md
   --------------------------------------------------

* clusGap `print()` method: note that Tibshirani et al. proposed
  (see their large data example!) a different method than the one we
  currently implement --> provide both(!)

### `clara()`

* clara() to work with *daisy*-like distances, not just L2 & L1.

   -- Possibility: using stats package C code
	   `~/R/D/r-devel/R/src/library/stats/src/distance.c`
      does not have Gower, but a few more kinds of distance methods.

* clara() - bug?: tests/clara.Rout.save.~13~ (and ~12~ , ~11~ ...)
		  gives a clearly better result
		  clara(ru4, k=3, met="manhattan", sampsize=4)
		  than current (later) clara.Rout  files...
     -> bug since 'Mar 11 2004' (= cluster-1.[89].1 for R 1.[89]) ??
     FIXME ?!?!?!

* It should be possible to _re_start `clara()` with a **given** "best sample"

* Have introduced  as.data.frame(<silhouette>) via `S3method()` in `NAMESPACE` +
  a copy (from base) of `as.data.frame.matrix`.
  Nicer (but less backward compatible ==> need rev.dep checks!)
  would have been to change the *class* to
  `c("silhouette", class(matrix()))` such that all matrix (and array)
  methods would work with silhouette.

* `silhouette.clara()`

	- silhouette(*, full=TRUE):
	 Allow the option of *NOT* passing a full-size, length n(n-1)/2 dist object but computing
	 `daisy()`-like distances *on the fly*  inside `src/sildist.c`
     ==> still needs O(n^2) CPU-effort, but not O(n^2) memory
	  { which is not even *possible*, e.g., for n=70'000 }

    - Consider `silhouette(*, full = 0.50)` to compute the silhouette for a (random ?)
	  subset of {0.50 * n} of the observations

    - `plot.silhouette()` : if an observation's width is == 0., draw a small stripe
	  instead of nothing at all
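The `full=TRUE` idea above (distances computed on the fly instead of a stored dist object) could look roughly like the following. This is only a sketch under assumptions: `sil_on_the_fly()` and `dist_ij()` are hypothetical names, not functions in `src/sildist.c`; the data are assumed to be a column-major n x p matrix without NAs; and Manhattan distance stands in for a general `daisy()`-like distance.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Manhattan (L1) distance between rows i and j of an n x p matrix stored
 * column-major, as R does.  Computed on demand, never stored. */
static double dist_ij(const double *x, int n, int p, int i, int j)
{
    double s = 0.;
    for (int v = 0; v < p; v++) {
        double d = x[i + (size_t)v * n] - x[j + (size_t)v * n];
        s += (d < 0.) ? -d : d;
    }
    return s;
}

/* Hypothetical helper: silhouette widths without a dist object.
 * clus[i] is in 0..k-1, sil[] receives the n widths.
 * Returns 0 on success, -1 if allocation fails.
 * Cost: O(n^2 p) time, but only O(k) extra memory. */
int sil_on_the_fly(const double *x, int n, int p,
                   const int *clus, int k, double *sil)
{
    assert(n > 0 && p > 0 && k > 0);
    int *size = calloc((size_t)k, sizeof *size);
    double *sum = malloc((size_t)k * sizeof *sum);
    if (!size || !sum) { free(size); free(sum); return -1; }

    for (int i = 0; i < n; i++)
        size[clus[i]]++;

    for (int i = 0; i < n; i++) {
        /* sum[c] = total distance from i to the members of cluster c */
        for (int c = 0; c < k; c++) sum[c] = 0.;
        for (int j = 0; j < n; j++)
            if (j != i) sum[clus[j]] += dist_ij(x, n, p, i, j);

        int ci = clus[i];
        sil[i] = 0.;                 /* convention for singletons and k = 1 */
        if (size[ci] < 2) continue;

        double a = sum[ci] / (size[ci] - 1);   /* own-cluster average */
        double b = DBL_MAX;                    /* nearest other cluster */
        for (int c = 0; c < k; c++)
            if (c != ci && size[c] > 0 && sum[c] / size[c] < b)
                b = sum[c] / size[c];
        if (b == DBL_MAX) continue;  /* no other non-empty cluster */

        double m = (a > b) ? a : b;
        if (m > 0.) sil[i] = (b - a) / m;
    }
    free(size);
    free(sum);
    return 0;
}
```

The inner loop over `j` is where the O(n^2) CPU effort remains; the same loop restricted to a random subset of the `i` would give the `full = 0.50` variant.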

### `pam()` (and also `clara()`)

* clara() got metric="jaccard" (donated in 2018) ==> copy to pam(), too !!
  		well: actually that is *BUGGY* 

* pam() and clara(): Should be possible to "re"start with GIVEN medoids.
		     now possible for pam(), not for clara();
		     --> "synchronize" src/pam.c and src/clara.c, in
		     particular bswap() vs bswap2() !!

* pam() and clara(): With NA's, medoids "often" contain NAs even when there
		     are only few NAs. ==> use a modified d(.,.) which makes
		     NAs "bad" somehow.

* pamila(): Major smart idea: Do save the d(i,j) i=1,..k  j=1,..n
	    only between *medoids* and everything else -- speedup(?) -> optional
* R/agnes.q , R/diana.q  and R/pam.q  have an almost identical clause

	if(data.class(x) != "dissimilarity") {
	    if(!is.numeric(x) || is.na(sizeDiss(x)))
		stop("x is not of class dissimilarity and can not be converted to this class." )
	    ## convert input vector to class "dissimilarity"
	    class(x) <- ..dClass
	    attr(x, "Size") <- sizeDiss(x)
	    attr(x, "Metric") <- "unspecified"
	}

  which can be modularized out into a NAMESPACE-local fixupDiss() function

  [ agnes() and diana() even more in common --> namespace-local functions!
    see also  "8b)" below!]

### `diana()`  {divisive hierarchical}

* Should allow __early stopping__
  (for speed and size) -- simultaneously, could think of
  _`diss()` on the fly_ instead of diss() matrix,
  but see `./src/NOTES-MM`  (and `pamila()`) !


Dec. 2002:

  o	clara(ruspini, 4)  BUG  in clara.c (see below)
	-- worked fine in cluster-1.5.2 (with clara.f!)
	-- gives error in   "     1.6.1 [and later]   ==== AARGH
	(the problem is *not* an integer/double one, here!)
    Status 28.Dec.2002:
     - The August-2002 fortran code doesn't seem to have a problem
	 ==> ~/R/Pkgs/TMP/cluster/
     - The F2C code (called via .Fortran()) seems the same
	 ==> ~/R/Pkgs/T_F2C/cluster/
     - A very slight change of the F2C code (using .C())
       has one problem but not all of the "modern" C version
	 ==> ~/R/Pkgs/T_F2C-2/cluster/

    Fixed most of the above 2002-12-28 _late_ -- still one small problem!
    but it seems clear this was even in early clara.f (at least, final
    result is the same for that example)  src/clara.c << needs more

  o	diana(ruspini) --> ok (again)

  o     bannerplot() is now `standalone' and has a help, man/bannerplot.Rd .
	HOWEVER its "details" are found in  man/plot.agnes.Rd (and ???)
        instead --> centralize this info (and keep short ref.s in the man/plot.*)
  o	agnes() and hclust() should be merged {and based on C, not Fortran}
  o	agnes() for large objects needs TWICE the time of hclust();
	both need MUCH MORE time than hcluster() in pkg 'amap',
	which is said to be the same as 'hclust' but only
	malloc()ating the "huge" dissimilarities inside C.
	--> translate agnes, i.e.  src/twins.f  to C


July 2002:

  o	Idea for new functionality :
	e.g., pamila() := PAM In Large Application
	should not *save* dissimilarities but rather re-compute them on
	the fly --> save huge storage
	==> should give identical results but be faster for larger n,
	or at least feasible for n = 10'000 or so where it currently isn't.
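A minimal sketch of the pamila() idea, assuming a column-major n x p data matrix without NAs and Manhattan distance: one assignment pass that recomputes every medoid-to-observation distance on demand rather than reading it from a stored dissimilarity vector. `assign_to_medoids()` and `d_fly()` are hypothetical names, not existing functions in `src/pam.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Manhattan distance between rows i and j of a column-major n x p matrix,
 * recomputed every time it is needed. */
static double d_fly(const double *x, int n, int p, int i, int j)
{
    double s = 0.;
    for (int v = 0; v < p; v++) {
        double d = x[i + (size_t)v * n] - x[j + (size_t)v * n];
        s += (d < 0.) ? -d : d;
    }
    return s;
}

/* Hypothetical pamila()-style step: assign each observation to its nearest
 * medoid and return the total cost (sum of distances to nearest medoids).
 * med[] holds k row indices (0-based); nearest[i] receives an index into
 * med[].  Time O(n k p), no storage for dissimilarities at all. */
double assign_to_medoids(const double *x, int n, int p,
                         const int *med, int k, int *nearest)
{
    assert(n > 0 && p > 0 && k > 0);
    double total = 0.;
    for (int i = 0; i < n; i++) {
        int best = 0;
        double dbest = d_fly(x, n, p, i, med[0]);
        for (int m = 1; m < k; m++) {
            double d = d_fly(x, n, p, i, med[m]);
            if (d < dbest) { dbest = d; best = m; }
        }
        nearest[i] = best;
        total += dbest;
    }
    return total;
}
```

A full swap phase would call such a pass many times, so whether this ends up faster or merely feasible for large n would have to be measured; caching only the k x n medoid distances is the obvious middle ground.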


June 2002:

  mona() :  I think it should be possible to write an
	    as.hclust.mona()  or as.twins.mona() method
	    and hence also draw a dendrogram of a mona object.

Jan. 2002:

  clusellipses() ``like part of clusplot'' for *adding* ellipses to plot;
		 maybe do this with "add = TRUE, plotchar = !add, labels = 0"


May 23, 2001 / Jan.2002 :
-------------------------
  I found problems with missing values /  NAs treatment :

  o Also, I'm not sure if the NAs are dealt with sensibly
    in clara() :  The result changes too much with very few NAs

  o --> look at all the  subroutine dysta*() s in  src/*.f
    Clean these up and merge them into a single one!
	 Aug.02: partly done -- fanny() is different from the others.
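For discussion, one possible shape of a single merged, NA-aware distance routine. Everything here is an assumption rather than the behaviour of the existing dysta*() code: NaN stands in for R's NA (real package code would test with R's `ISNAN()`), missing coordinates are skipped with the usual rescaling by p / (number of usable coordinates), and `na_penalty` is a hypothetical knob for making NAs "bad".

```c
#include <assert.h>
#include <stddef.h>

/* Manhattan distance between rows i and j of a column-major n x p matrix,
 * where NaN marks a missing value.  Coordinates missing in either row are
 * skipped and the sum is rescaled by p / npres.  na_penalty (hypothetical)
 * is added once per skipped coordinate, so rows with many missing values
 * look far from everything and are less likely to become medoids.
 * *ok is set to 0 when the two rows share no usable coordinate. */
double dist_na(const double *x, int n, int p, int i, int j,
               double na_penalty, int *ok)
{
    assert(n > 0 && p > 0);
    double s = 0.;
    int npres = 0;               /* number of coordinates present in both */
    for (int v = 0; v < p; v++) {
        double xi = x[i + (size_t)v * n], xj = x[j + (size_t)v * n];
        if (xi != xi || xj != xj)    /* NaN compares unequal to itself */
            continue;
        double d = xi - xj;
        s += (d < 0.) ? -d : d;
        npres++;
    }
    *ok = (npres > 0);
    if (!*ok) return 0.;
    return s * p / npres + na_penalty * (p - npres);
}
```

With `na_penalty = 0` this reduces to plain pairwise-complete rescaling; a positive value is one way to test whether the clara() results become less sensitive to a few NAs.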

  In the future: When "mva" will have a C API, use  dist()'s C function!


7) Get rid of the many  \section{GENERATION}, {METHODS} and {INHERITANCE}
   sections  in man/*.Rd
   -- make sure that info is available, at least partially otherwise.

6a) The \references{} mostly contain the same things.
   man/plot.agnes.Rd has some of them nicely.
   Collect in a few places (*.Rd files), and refer to these  {partly ok}

 b) Similarly for the \section{BACKGROUND} which appears in quite a few
   *.Rd files.

    --> done partly ( ./ChangeLog 2002-01-24 )

8b) Think about "merging" the plot.agnes and plot.diana methods.

------------------------------------------------

Older TODO
==========
		     (were in `README_MM` which is now eliminated)

 3) daisy() for the case of mixed variables should allow
    a weight vector (of length p = #vars) for up- or downweighting variables.
    daisy() really should accept the other methods mva's dist() does _and_
    it should use dist's C API -- but we have no C API for package code, ARRGH!

 4) Eliminate the many Fortran (g77 -Wall) warnings of the form
    >> mona.f:101: warning: `jma' might be used uninitialized in this function

---------

9) man/daisy.Rd  should mention 'Gower (1971)' ;
   mention that Kaufman & Rousseeuw *generalize* this;
   and probably show the full formula from Kauf+Rouss p.35

10) Implement the plot for "fuzzy cluster membership" of section 5.4,
    from Kauf+Rouss p. 195 ff :
    I.e. PCA of the membership matrix for the points + "the pure clusters"
   <---> Export  as.membership() and toCrisp()  in  ./R/fanny.q