"LEARNING SIGNAL" IN THE K-MEANS ALGORITHM
One very powerful way to think about any iterative machine learning algorithm is the idea of the "learning signal." The idea here is that if the iterative procedure is to be effective, the values of the parameters in the next iteration should be a little "better" than they currently are. In order for the parameters to improve, there has to be some sort of push or pull in the right direction. A good example of a learning signal might be some measure of the "error"— larger error might be a signal that we are going in the wrong direction, and a reduction in the error might be a signal that we are going in the right direction. In the case of the K-means algorithm, the parameters we update are the means of each cluster, and the error that we would like to minimize is the distance between each observation and its corresponding mean. Each observation therefore "pulls" the mean of the cluster toward it. Consider an iteration where we will add one observation, q, to the kth cluster. If we let the number of observations assigned to the kth cluster be nk = Zik, the new
mean for the kth cluster once the new observation had been added is exactly
where I have added the qth observation to all the others previously in the cluster and increased the denominator (the number of points in the average) by one. So the change in the mean due to the additional observation being assigned to the cluster is
which is a vector in the direction of the difference between the new observation and the mean. Similarly for an observation being removed from a cluster, we have
and the change is
which is also a vector in the direction of the difference between the observation and the mean. We can add up all of these effects, and write the change in mean in one iteration using clever indicator variable notation:
If you can understand this formula, you are a master of indicator variables and are ready to tackle machine learning literature. Notice how only the observations whose cluster assignments change will be included in the sum (all others will be multiplied by 0).
Anyway, the point of this formula was to show that the change in mean in the next iteration will be a weighted sum of the observations that have been added and removed from the cluster, along the direction of the vector between the observation and the mean. This can be thought of as the "learning signal": The new observations that have arrived have pulled the mean in their direction, and the observations that have left the cluster have given up pulling (or pushed away) the cluster mean. The observations are telling the mean that it's not quite in the right place to have the smallest error.