11.1.1 Expectation Maximization

The expectation maximization (EM) algorithm can be used for clustering. Given the data, EM learns a theory that specifies how each example should be classified and how to predict feature values for each class. The general idea is to start with a random theory or randomly classified data and to repeat the following two steps until convergence on a coherent theory:

E: Classify the data using the current theory.

M: Generate the best theory using the current classification of the data.

The E step generates the expected classification for each example. The M step generates the most likely theory given the classified data, which is the problem of supervised learning. Because EM is an iterative algorithm, it can get stuck in local optima; different initializations can lead to different theories being found.
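The two-step loop above can be sketched for a simple case: soft EM clustering of one-dimensional data under a mixture of Gaussians. This is an illustrative sketch, not a definitive implementation; the function name `em_gaussian_mixture` and all parameter choices (number of iterations, initialization from random data points) are assumptions for this example.

```python
import math
import random

def em_gaussian_mixture(data, k=2, iters=50, seed=0):
    """Soft EM for a 1-D mixture of k Gaussians (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initialization: different seeds may converge to different
    # local optima, affecting the theory (parameters) that is found.
    means = rng.sample(list(data), k)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E step: compute the expected classification of each example
        # (responsibilities) under the current theory.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v))
                    / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M step: refit the parameters to the (soft) classified data --
        # a supervised learning problem with weighted examples.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, data)) / nj, 1e-6)
    return means, variances, weights
```

Running this on data drawn from two well-separated clusters typically recovers means near the true cluster centers, although, as noted above, a poor initialization can leave the algorithm in a local optimum.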

The next two sections consider two instances of the EM algorithm.