Octave clustering demo part 6: (more) evaluation
[This post is part of the Octave clustering demo series]
This new clustering demo goes deeper into evaluation, showing some techniques to assess the clustering tendency of a dataset, find visual clues about its clusters, and get a hint about the "best" number of clusters. If you attended the PAMI class this year you probably know what I am talking about. If instead you are new to these demos, please (1) check the page linked above and (2) set up your Octave environment (feel free to download the old package and play with the old demos too ;-)). Then download this new package.
If you want to check out the older Octave clustering articles, here they are: part 0 - part 1 - part 2 - part 3 - part 4 - part 5. I strongly suggest running at least parts 0-3 before this demo, as they provide all the basics you need to get the best out of this exercise.
Note that some of the functions available in previous packages are also present in this one. While it is fine to run the previous examples with those functions, make sure you use the most up-to-date versions for the current experiments, as I have debugged and optimized them for this specific demo.
Run the evaluationDemo1 script. This will generate a random dataset and walk you through a sequence of tests to evaluate its clustering tendency, ultimately performing clustering on it. As randomly generated clusters can be more or less nasty to deal with, I suggest running the demo a few times to see how it behaves in general. Note that part of the code is missing and you will have to provide it for the demo to be complete.
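If you want to play with clustering tendency outside the demo, here is a minimal sketch of one standard test, the Hopkins statistic. This is not necessarily the test used in the package, and the function name hopkins_sketch is just for illustration:

```octave
function H = hopkins_sketch(X, m)
  % Hypothetical helper, not part of the demo package.
  [n, d] = size(X);
  if nargin < 2, m = round(0.1 * n); end   % default probe sample size

  % m probe points drawn uniformly within the bounding box of the data
  lo = min(X); hi = max(X);
  U = repmat(lo, m, 1) + rand(m, d) .* repmat(hi - lo, m, 1);

  % m points drawn from the dataset itself, without replacement
  idx = randperm(n);
  S = X(idx(1:m), :);

  u = zeros(m, 1);   % distance from each probe to its nearest data point
  w = zeros(m, 1);   % distance from each sampled point to its nearest neighbor
  for i = 1:m
    du = sqrt(sum((X - repmat(U(i, :), n, 1)) .^ 2, 2));
    u(i) = min(du);
    dw = sort(sqrt(sum((X - repmat(S(i, :), n, 1)) .^ 2, 2)));
    w(i) = dw(2);    % dw(1) is the point itself (distance 0)
  end

  % H close to 1: clustered data; H around 0.5: uniform noise
  H = sum(u) / (sum(u) + sum(w));
end
```

Intuitively, if the data contains clusters, uniformly sampled probe points tend to fall farther from the data than the data points fall from each other, pushing H towards 1.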
When you feel confident enough with evaluating the random dataset, run the evaluationDemo2 script. This one is a little more interactive, asking you to choose, among three datasets, the one with the most "interesting" content, and requiring you to write small pieces of code to complete your evaluation.
At the end of your experiment, answer the following questions:
- comment on the different executions of evaluationDemo1. How sensitive are the SSE elbow and the distance matrix plot to "bad" overlapping clusters (a minimal sketch of both checks follows this list)? Does the presence of overlapping clusters affect the clustering tendency test? How do you think it would be possible (and, in this case, would it be meaningful) to distinguish between two completely overlapping clusters?
- comment on the different executions of evaluationDemo2. Which is the "interesting" dataset? Why? Is the SSE elbow method useful for automatically detecting the number of clusters? Why? What additional information does the distance matrix convey? Is spectral clustering better than k-means? Did you happen to find parameters that give better accuracy than the default ones?
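For reference, here is a minimal sketch of the two visual checks mentioned above, assuming X is an n-by-d data matrix and that kmeans, pdist and squareform are available (in Octave they come with the statistics package); the demo scripts may implement them differently:

```octave
% 1) SSE elbow: run k-means for increasing k and plot the total
%    within-cluster sum of squared errors; a "knee" in the curve
%    hints at a good number of clusters. Results vary with the
%    random initialization of k-means.
maxK = 10;
sse = zeros(maxK, 1);
for k = 1:maxK
  [labels, centers] = kmeans(X, k);
  sse(k) = sum(sum((X - centers(labels, :)) .^ 2));
end
figure; plot(1:maxK, sse, '-o');
xlabel('number of clusters k'); ylabel('SSE');

% 2) Distance matrix plot: show the pairwise distances with rows and
%    columns sorted by cluster label; well-separated clusters show up
%    as dark blocks along the diagonal.
labels = kmeans(X, 3);             % 3 is just an example value
[sorted, order] = sort(labels);
D = squareform(pdist(X));
figure; imagesc(D(order, order)); colorbar;
title('pairwise distances, points sorted by cluster');
```

Sorting the rows and columns of the distance matrix by cluster label is what makes the block structure visible; on an unsorted matrix the same information is there but impossible to see.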
If you are a PAMI student, please write your answers in a PDF file, motivating them and providing all the material (images, code, numerical results) needed to support them.
Hints:
- the demo stops at each step waiting for you to press a key. You can disable this feature by setting the "interactive" variable to zero at the beginning of the script;
- the second demo file has some "return" commands purposely left in the code to stop the execution at given points (typically when you need to add some code before proceeding, or to increase the suspense ;-));
- I tested the code on Octave (3.6.4) and MATLAB (2011a) and it runs on both. If you still have problems please let me know ASAP and we will sort them out.
On Cracking
Some time ago I cracked my first Mac app. Overall it was a nice experience and reminded me of good old times. Here are some comments about it:
- it was the first commercial app (not counting MATLAB, which I use for work) that I actually found useful after a year on a Mac. I think that is good, because it means open-source software still satisfies most of my needs (or is it bad, because it means I am becoming a lazy hipster now?)
- the tutorial by fG! has been invaluable to me, especially for quickly finding the tools of the trade. I recommend it to anyone who wants to start reversing on the Mac
- I am not half bad, after all: I managed to do it with the trial version of Hopper, so deadlisting only plus a time limit, which added some spice to the game (I know everyone is thinking about Swordfish now... but no, there was no bj in the meanwhile ;-))
- cracking is still pure fun, especially when you find that the protection is hidden in functions whose names were purposely chosen to mislead you (no, I won't share more details, see below)
- I immediately bought the app: it was cheaper than going to the cinema and cracking it was more entertaining than the average blockbuster movie, plus I am left with a great program to use... That's what I call a bargain!
I still do not agree with Daniel Jalkut, the developer of MarsEdit: I think he wasted time on a trivial protection to sell closed-source code he should have shared freely (as in freedom). But don't misunderstand me... who am I to judge what somebody should or should not do? The only reason I say this is that MarsEdit is a cool program (which, btw, I am using right now) and, while it is worth all the money I paid, not seeing it open sourced is a real pity. But I respect Daniel's position and I think his work deserved to be supported.
I know not all of you see it this way, and I probably would have thought about it differently too, years ago. One thing, however, has never changed: cracking/reversing is so much more than getting software for free, and if you stop there you are missing most of the fun ;-)