
18 Jun 2013

Octave clustering demo part 5: hierarchical clustering

[This post is part of the Octave clustering demo series]

So, if you are reading this you have probably already unpacked the tgz file containing the demo and read the previous article about k-medoids clustering. If you want to check out the older Octave clustering articles, here they are: part 0 - part 1 - part 2 - part 3.

Run the hierarchicalDemo script (if launched without input parameters, it will open the "crescents.mat" data file by default). The script will plot the dataset, then try to perform clustering, first with k-means and then with an agglomerative hierarchical clustering algorithm. Run the algorithm on all the available datasets (note that you will need to modify the code to make it work with different numbers of clusters).
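If you are curious about what the agglomerative part of the demo does under the hood, here is a minimal single-linkage sketch in Octave; names and structure are illustrative and may well differ from the hclust function shipped with the package:

    % Minimal agglomerative clustering sketch with single linkage
    % (illustrative only, not the package's hclust).
    function labels = singleLinkageSketch(X, k)
      n = size(X, 1);
      labels = (1:n)';                      % start with one cluster per point
      sq = sum(X .^ 2, 2);                  % pairwise distances, no toolbox needed
      D = sqrt(max(sq + sq' - 2 * (X * X'), 0));
      D(1:n+1:end) = Inf;                   % ignore self-distances
      while numel(unique(labels)) > k
        [~, idx] = min(D(:));               % globally closest pair of points
        [i, j] = ind2sub([n n], idx);
        D(i, j) = Inf;                      % never pick this pair again
        D(j, i) = Inf;
        if labels(i) != labels(j)           % merge the clusters they belong to
          labels(labels == labels(j)) = labels(i);
        end
      end
    end

Since every merge is driven by a single closest pair of points, two clusters connected by a thin bridge of points will happily be joined, which is worth keeping in mind for the questions below.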

Questions:

  1. How does k-means perform on the non-globular datasets (crescents, circles01, and circles02)? What is the reason for this behavior? How does the hierarchical algorithm perform instead?
  2. Now let us play with the "agglomerative" dataset. Here some information is missing: you know neither the number of clusters in advance nor the ground truth about the points (i.e. which class they belong to). First of all, run the hclust function with different numbers of clusters and plot the results. Are you satisfied with them? What do you think is the reason for this "chaining" phenomenon?
  3. Complete the code in hclust.m so that it also calculates complete and average linkage, then run the algorithm again. How do the clustering results change? Plot and comment on them (a sketch of the three linkage criteria follows this list).
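As a starting point for question 3, here is a hedged sketch of how the three linkage criteria differ; clusterDist is a hypothetical helper, not the actual interface of hclust.m:

    % Hypothetical helper: distance between two clusters A and B (one point
    % per row) under the three classic linkage criteria.
    function d = clusterDist(A, B, linkage)
      sqA = sum(A .^ 2, 2);
      sqB = sum(B .^ 2, 2);
      D = sqrt(max(sqA + sqB' - 2 * (A * B'), 0));  % all cross-cluster distances
      switch linkage
        case "single"      % closest pair: prone to chaining
          d = min(D(:));
        case "complete"    % farthest pair: favors compact clusters
          d = max(D(:));
        case "average"     % mean over all cross-cluster pairs
          d = mean(D(:));
      end
    end

Complete and average linkage look at all the cross-cluster pairs (or the worst one) instead of the single best one, which is why they tend to break the chains that single linkage produces.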
18 Jun 2013

Octave clustering demo part 4: k-Medoids

[This post is part of the Octave clustering demo series]

What follows is the description of the first of two new Octave clustering demos. If you attended the PAMI class this year, you probably know what I am talking about. If instead you are new to these demos, please (1) check the page linked above and (2) set up your Octave environment (feel free to download the old package and play with the old demos too ;-)). Then download this new package.

Run the kMedoidsDemo script (if launched without input parameters, it will open the "noisyBlobs01.mat" data file by default). The script will plot the dataset, then try to perform clustering, first with k-means and then with k-medoids. Run the experiment several times (at least 4-5) for each of the three noisyBlobs datasets (you can pass the dataset path and name as a parameter to the kMedoidsDemo function).
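For reference, here is a minimal k-medoids sketch in Octave based on Voronoi iteration; it only illustrates the idea and is not the implementation shipped with the package:

    % Illustrative k-medoids via Voronoi iteration (assumed, not the
    % package's code). X: n-by-d data matrix, k: number of clusters.
    function [labels, medoids] = kMedoidsSketch(X, k)
      n = size(X, 1);
      sq = sum(X .^ 2, 2);
      D = sqrt(max(sq + sq' - 2 * (X * X'), 0));  % full distance matrix
      medoids = randperm(n)(1:k);                 % random initial medoids
      changed = true;
      while changed
        [~, labels] = min(D(:, medoids), [], 2);  % assign points to nearest medoid
        changed = false;
        for c = 1:k
          members = find(labels == c);
          if isempty(members), continue; end
          % new medoid: the member minimizing total distance to its cluster
          [~, best] = min(sum(D(members, members), 1));
          if medoids(c) != members(best)
            medoids(c) = members(best);
            changed = true;
          end
        end
      end
    end

The key design choice is that medoids must be actual data points, while k-means centroids are means that noise can drag around freely; keep this difference in mind while answering the questions below.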

Questions:

  1. Comment on the "worst case" results you get from k-means: what happens, and how does noise influence this behavior?
  2. Compare the results you get with k-means to the ones you get with k-medoids: how does k-medoids deal with the presence of noise?
  3. What would happen if the noise points were much farther from the original clusters? Try to take a few (1-2) of them and move them very far from the rest of the data (see the sketch after this list)… what results do you get from k-means and k-medoids?
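For question 3, one quick way to create such extreme outliers is to overwrite a couple of points by hand; note that the variable name X below is an assumption about how the .mat files store the points:

    % Assumed layout: the .mat file defines a matrix X of 2-D points.
    load("noisyBlobs01.mat");
    far = 20 * max(abs(X(:)));   % a coordinate well outside the data range
    X(1, :) = [ far,  far];      % turn two existing points into
    X(2, :) = [-far, -far];      % extreme outliers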

Hints:

  • the demo stops at each step and waits for you to press a key. You can disable this feature by setting the "interactive" variable to zero at the beginning of the script;
  • there are two different k-means implementations you can run: feel free to try both, just by uncommenting the respective code.