mala::home Davide “+mala” Eynard’s website


Octave clustering demo part 0: introduction and setup

As promised to my PAMI students, I have prepared a couple of demos to help get a better grasp of how different clustering algorithms work. Hoping they will be useful to somebody else (and sure that this way I will not lose them so easily ;-)), I have decided to post more information about them here.

All the demos are packed in a single file, which can be downloaded from the course page together with the slides and other material (or, if you only want the demo file, follow this direct link). Once unpacked, you will find the following:

  • data/*.mat (some example datasets)
  • accuracy.m (calculates clustering accuracy)
  • adjacency_weighted.m (calculates the weighted adjacency matrix used by spectral clustering)
  • kMeansDemo*.m (k-means demos)
  • L2_distance.m (calculates L2 distance between two vectors/matrices)
  • laplacian.m (builds the Laplacian used by spectral clustering)
  • myKmeans.m (performs k-means clustering)
  • plotClustering.m (given clustered data and ground truth, plots clustering results)
  • repKmeans.m (automatically repeats k-means ntimes and keeps the result with lowest SSE)
  • spectralDemo.m (spectral clustering demo)
  • SSE.m (given clustered data and centroids, calculates total SSE)
  • uAveragedAccuracy.m (another way to calculate clustering accuracy)
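To give an idea of what the distance and SSE helpers in the list compute, here is a minimal sketch. The names pairwise_sq_dist and sse_sketch are made up for this example and are not the code in L2_distance.m or SSE.m. It assumes one data point per row and cluster labels numbered from 1 to k.

```octave
1;  % marks this file as a script, so functions can be defined inline

% Hypothetical helpers for illustration only (not the files shipped with the demos).
function D = pairwise_sq_dist(X, C)
  % Squared Euclidean distance between every row of X (n-by-d) and every
  % row of C (k-by-d), using ||x - c||^2 = ||x||^2 + ||c||^2 - 2*x*c'.
  D = sum(X.^2, 2) + sum(C.^2, 2)' - 2 * (X * C');
  D = max(D, 0);   % clip tiny negative values caused by round-off
end

function total = sse_sketch(X, C, labels)
  % Sum of squared distances from each point to its assigned centroid.
  diffs = X - C(labels, :);
  total = sum(diffs(:) .^ 2);
end

X = [0 0; 0 2; 10 0; 10 2];       % four toy points
C = [0 1; 10 1];                  % two toy centroids
D = pairwise_sq_dist(X, C);
[~, labels] = min(D, [], 2);      % assign each point to its nearest centroid
total = sse_sketch(X, C, labels)  % every point is at distance 1 from its centroid
```

Since k-means only converges to a local minimum of the SSE, comparing this value across several random restarts is the idea behind keeping the run with the lowest SSE.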

To run the code just open Octave, cd into the spectralClustering directory, and call the different functions. The demo files are executed in the following ways:

    - or -

where dataset is one of the files in the data directory, nn is the number of nearest neighbors used to build the adjacency matrix, and t is the parameter of the Gaussian kernel used to calculate similarities in the weighted adjacency matrix (0 can be used for auto-tuning and usually works well enough). For example:
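To make the roles of nn and t more concrete, here is a rough sketch of a nearest-neighbor adjacency matrix with Gaussian weights. It is not the code in adjacency_weighted.m: the function name knn_gaussian_adjacency, the kernel form exp(-d^2/t), and the symmetrization step are all assumptions of mine for this example, and the auto-tuning you get with t = 0 is not reproduced (the sketch needs t > 0).

```octave
1;  % marks this file as a script, so functions can be defined inline

% Hypothetical sketch, not the adjacency_weighted.m shipped with the demos.
function W = knn_gaussian_adjacency(X, nn, t)
  % X: n-by-d data (one point per row), nn: neighbors per point,
  % t: kernel width, assumed > 0 here.
  n = size(X, 1);
  sq = sum(X.^2, 2);
  D = max(sq + sq' - 2 * (X * X'), 0);   % squared pairwise distances
  D(1:n+1:end) = Inf;                    % a point is never its own neighbor
  W = zeros(n);
  for i = 1:n
    [~, idx] = sort(D(i, :));
    nbrs = idx(1:nn);                    % the nn closest points to point i
    W(i, nbrs) = exp(-D(i, nbrs) / t);   % assumed kernel: exp(-d^2 / t)
  end
  W = max(W, W');                        % keep an edge if either end chose it
end

X = [0 0; 1 0; 0 1; 10 10; 11 10];       % two well-separated toy groups
W = knn_gaussian_adjacency(X, 1, 1);
disp(W);
```

With a small nn the two groups stay disconnected (all weights between them are zero), which is what spectral clustering exploits. A larger nn adds more edges and can end up linking groups that should stay apart.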

    - or -

Good values of nn for these examples range between 10 and 40... I will let you experiment to find which ones work better for each dataset ;-). Now feel free to play with the demos or read the following tutorials:

Comments (2)
  1. How can we evaluate the accuracy of k-means and spectral clustering results?
    Can you give some advice on how to compute the rate?
    Thank you

    • Hi Birk! Accuracy is an external evaluation measure, which means you first need some ground truth (i.e. the “correct” grouping of your data) to evaluate how good your clustering results are. Then after you run your clustering algorithm you can compare the labels you get with the correct ones. For this comparison you can use different evaluation measures, such as (micro-averaged) accuracy or normalized mutual information. You can find the code for accuracies in my previous posts on clustering evaluation (e.g. here and here), while you can find code for NMI easily on the Web.
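As a rough illustration of comparing cluster labels with ground truth, here is a minimal sketch of one common variant: each cluster is mapped to the most frequent true class among its members, and the fraction of points that end up with the right class is returned (this is often called purity). The function name majority_vote_accuracy is made up for this example, and it is not necessarily what accuracy.m or uAveragedAccuracy.m compute.

```octave
1;  % marks this file as a script, so functions can be defined inline

% Hypothetical sketch of a majority-vote accuracy, for illustration only.
function acc = majority_vote_accuracy(labels, truth)
  % labels: cluster id of each point, truth: correct class of each point.
  correct = 0;
  ids = unique(labels);
  for k = 1:numel(ids)
    members = truth(labels == ids(k));               % true classes inside cluster k
    correct = correct + sum(members == mode(members));
  end
  acc = correct / numel(labels);
end

truth  = [1 1 1 2 2 2 3 3];
labels = [2 2 2 1 1 3 3 3];   % cluster ids are arbitrary, only the grouping matters
acc = majority_vote_accuracy(labels, truth)
```

Here seven of the eight points fall in a cluster whose majority class matches their own. Keep in mind that this measure rewards having many small clusters, so it only makes sense when comparing results with the same number of clusters.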
