Display an image using k representative colours, where k is less than the number of colours in the original image.
python driver.py <k-output-colours> {kmeans, kmeans++} /path/to/image.jpg
example: python driver.py 6 kmeans++ ./park.jpg
Basic usage with default inputs: the algorithm defaults to 'kmeans' and the image to './park.jpg'.
python driver.py <k-output-colours>
example: python driver.py 6
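To make the idea of "k representative colours" concrete, here is a minimal sketch of the clustering loop on a handful of RGB tuples. It is not the code in driver.py: the function names `sq_dist`, `kmeans` and `recolour`, the `max_loops` cap, and the convergence test (stop when no pixel changes cluster) are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(pixels, initial_means, max_loops=100):
    """Lloyd's algorithm on RGB tuples; returns (means, labels, loops)."""
    means = [tuple(m) for m in initial_means]
    labels = None
    for loop in range(1, max_loops + 1):
        # Assignment step: attach every pixel to its nearest mean.
        new_labels = [
            min(range(len(means)), key=lambda j: sq_dist(p, means[j]))
            for p in pixels
        ]
        if new_labels == labels:  # no pixel changed cluster: converged
            return means, labels, loop
        labels = new_labels
        # Update step: move each mean to the average colour of its pixels.
        for j in range(len(means)):
            members = [p for p, label in zip(pixels, labels) if label == j]
            if members:  # leave a mean in place if its cluster is empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels, max_loops


def recolour(means, labels):
    """Replace every pixel with the (rounded) mean colour of its cluster."""
    return [tuple(round(c) for c in means[label]) for label in labels]


# Toy "image": two reddish and two bluish pixels, reduced to k=2 colours.
pixels = [(250, 10, 10), (240, 20, 20), (10, 10, 250), (20, 20, 240)]
means, labels, loops = kmeans(pixels, [pixels[0], pixels[2]])
output = recolour(means, labels)
print(loops, output)
```

The loop count returned here is the kind of figure reported as "loops until convergence" in the comparisons below, although the driver may count iterations differently.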
In the park picture below, we can see that the trees are very brightly coloured and distinct from the rest of the scenery.
With k=6, we see that Kmeans colour clustering does not really distinguish the trees from the buildings, whereas Kmeans++ is able to capture this contrast. This may be due to the image having comparatively few orange/red pixels (the colour of the trees) relative to blue/grey pixels (the colour of the buildings, sky and water), combined with the way Kmeans and Kmeans++ initialize their means.
Since there are far fewer orange/red pixels than blue/grey pixels, the uniformly random initial means selected by Kmeans are unlikely to be orange/red. Even in later iterations, the orange/red pixels are so few that they are outweighed within whichever cluster they fall into, so it is difficult for them to pull their closest mean towards the red end of the spectrum.
In comparison, Kmeans++ gives more weight to selecting means that are significantly different from the means chosen so far. Even though the first selected mean is likely to come from the grey/blue pixels, there is a much higher chance of an orange/red mean being selected afterwards. This would explain why Kmeans++ separates the red/orange trees from everything else.
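The difference between the two seeding strategies can be sketched on synthetic data. The snippet below is an illustration under stated assumptions, not the driver's implementation: the helper names, the 495:5 ratio of blue/grey to orange pixels, and the base colours are all made up, and Kmeans++ is taken to mean the usual scheme where each new mean is drawn with probability proportional to its squared distance from the nearest mean already chosen.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_uniform(pixels, k, rng):
    """Plain Kmeans seeding: k pixels chosen uniformly at random."""
    return rng.sample(pixels, k)


def init_plus_plus(pixels, k, rng):
    """Kmeans++ seeding: favour pixels far from the means chosen so far."""
    means = [rng.choice(pixels)]
    # d2[i] is the squared distance from pixel i to its nearest chosen mean.
    d2 = [sq_dist(p, means[0]) for p in pixels]
    while len(means) < k:
        new_mean = rng.choices(pixels, weights=d2, k=1)[0]
        means.append(new_mean)
        d2 = [min(d, sq_dist(p, new_mean)) for d, p in zip(d2, pixels)]
    return means


def jitter(rng, base, spread=10):
    """Hypothetical helper: a colour near `base` with small random noise."""
    return tuple(c + rng.randint(-spread, spread) for c in base)


def has_orange(means):
    """True if any mean is one of the synthetic orange pixels (high red)."""
    return any(m[0] > 200 for m in means)


rng = random.Random(0)
# Synthetic "image": 495 blue/grey pixels and only 5 orange ones.
pixels = [jitter(rng, (90, 110, 160)) for _ in range(495)]
pixels += [jitter(rng, (230, 120, 30)) for _ in range(5)]

trials = 100
uniform_hits = sum(has_orange(init_uniform(pixels, 6, rng)) for _ in range(trials))
plus_plus_hits = sum(has_orange(init_plus_plus(pixels, 6, rng)) for _ in range(trials))
print(f"seeds containing an orange mean: uniform {uniform_hits}/{trials}, "
      f"kmeans++ {plus_plus_hits}/{trials}")
```

On data shaped like this, the Kmeans++ seeding should include an orange mean far more often than uniform seeding, which is consistent with the behaviour seen in the park picture.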
Also note that Kmeans took 99 loops until convergence, while Kmeans++ took 53 loops.
My photograph of George Wainborn Park, Vancouver, Canada. |
Kmeans, where k=6. | Kmeans++, where k=6. |
In the Lego pictures below, where k=4, we can see that Kmeans and Kmeans++ produce very similar outputs. This is due in part to Lego's discrete colour palette, which makes the colour clusters more clear-cut.
Note that for the output below, Kmeans took 26 loops to converge, while Kmeans++ took 17 loops.
My photograph of a Lego living room. |
Kmeans, where k=4. | Kmeans++, where k=4. |
- Run Kmeans and Kmeans++ multiple times and compare their convergence times
- Compare the level of variance in the means of Kmeans and Kmeans++