LIKELIHOOD FOR GAUSSIAN DATA
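To make the likelihood specific, we have to choose the model. If we assume that each observation is described by a Gaussian distribution, we have two parameters: the mean (μ) and the standard deviation (σ). For n independent observations X_1, ..., X_n, the likelihood is the product of the Gaussian densities of the individual observations:

$$L = P(X \mid \theta) = \prod_{i=1}^{n} N(X_i \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}$$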
Admittedly, the formula looks complicated. But in fact, this is a very simple likelihood function. I’ll illustrate this likelihood function by directly computing it for an example (Table 4.1).
Notice that in this table I have chosen values for the parameters, μ and σ, which is necessary to calculate the likelihood. We will see momentarily that these parameters are not the maximum likelihood estimates (the parameters that maximize the likelihood), but rather just illustrative values. The likelihood (for these parameters) is simply the product of the five values in the right column of the table.
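As a minimal sketch of the calculation in Table 4.1, the Python below evaluates the Gaussian density for each of the five observations and multiplies the results. The function names gaussian_pdf and likelihood are my own illustrative choices, not something defined elsewhere in the text; the data and the parameter values (mean 6.5, standard deviation 1.5) are taken from the table.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density N(x | mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(data, mu, sigma):
    """Likelihood of independent observations: product of their densities."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in data)

# The five observations and the illustrative parameters from Table 4.1.
data = [5.2, 9.1, 8.2, 7.3, 7.8]
mu, sigma = 6.5, 1.5

# Right-hand column of the table: one density per observation.
densities = [gaussian_pdf(x, mu, sigma) for x in data]
for i, (x, p) in enumerate(zip(data, densities), start=1):
    print(i, x, f"{p:.5f}")

# Bottom row of the table: the product of the five densities.
L = likelihood(data, mu, sigma)
print("L =", L)
```

Run as written, this should give a likelihood of roughly 0.0000638, in agreement with the bottom row of the table.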
TABLE 4.1 Calculating the Likelihood for Five Observations under a Gaussian Model

Observation (i)    Value (X_i)    P(X_i|θ) = N(X_i|μ = 6.5, σ = 1.5)
1                  5.2            0.18269
2                  9.1            0.059212
3                  8.2            0.13993
4                  7.3            0.23070
5                  7.8            0.18269
                                  L = 0.000063798

Note: Each row of the table corresponds to a single observation (1-5).

It should be clear that as the dataset gets larger, the likelihood tends to get smaller and smaller, but always is still greater than zero. Because the likelihood is a function of all the parameters, even in the simple case of the Gaussian, the likelihood is still a function of two parameters (plotted in Figure 4.1) and represents a surface in the parameter space. To make this figure, I simply calculated the likelihood (just as in Table 4.1) for a large number of pairs of mean and standard deviation parameters. The maximum likelihood can be read off (approximately) from this graph as the place where the likelihood surface has its peak. This is a totally reasonable way to calculate the likelihood if you have a model with one or two parameters. You can see that the standard deviation parameter I chose for this table (1.5) is close to the value that maximizes the likelihood, but the value I chose for the mean (6.5) is probably too small: the maximum looks to occur when the mean is around 7.5.

FIGURE 4.1 Numerical evaluation of the likelihood function. The surface shows the likelihood as a function of the two parameters (the mean and standard deviation).
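A surface like the one in Figure 4.1 could be produced along the following lines. This is only a sketch: the grid ranges (means from 5.0 to 10.0, standard deviations from 0.5 to 3.0) and the step of 0.1 are assumptions made for illustration, not the values actually used to draw the figure.

```python
import math

def likelihood(data, mu, sigma):
    """Gaussian likelihood: product of N(x | mu, sigma) over the data."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.prod(
        math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm for x in data
    )

data = [5.2, 9.1, 8.2, 7.3, 7.8]  # observations from Table 4.1

# Assumed grid (not from the text): 51 means and 26 standard deviations.
mu_grid = [5.0 + 0.1 * k for k in range(51)]      # 5.0, 5.1, ..., 10.0
sigma_grid = [0.5 + 0.1 * k for k in range(26)]   # 0.5, 0.6, ..., 3.0

# surface[a][b] is the likelihood at (mu_grid[a], sigma_grid[b]).
surface = [[likelihood(data, m, s) for s in sigma_grid] for m in mu_grid]

# Read off the (approximate) maximum as the highest point on the grid.
best_L, best_mu, best_sigma = max(
    (surface[a][b], mu_grid[a], sigma_grid[b])
    for a in range(len(mu_grid))
    for b in range(len(sigma_grid))
)
print(f"peak: L = {best_L:.3g} at mean = {best_mu:.1f}, sd = {best_sigma:.1f}")
```

On this particular grid the largest value is found at a mean of 7.5 and a standard deviation of 1.3, and it is several times larger than the likelihood computed for the illustrative parameters (6.5 and 1.5).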
Figure 4.1 also illustrates two potential problems with numerical methods for maximizing the likelihood. The first is that if your model has many parameters, drawing a graph becomes hard and, more importantly, the parameter space might become difficult to explore: you might think you have a maximum in one region, but you might miss another peak somewhere else in the high-dimensional space. The second problem is that you have to choose individual points in the parameter space at which to numerically evaluate the likelihood, so there might always be another set of parameters between the points you evaluated that has a slightly higher likelihood. Numerical optimization of objective functions in machine learning is a major area of current research, and we will only scratch the surface in this book.
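The second problem can be seen directly with the five observations of Table 4.1. The sketch below searches the same region of parameter space twice, once on a coarse grid and once on a fine one. Both grids (steps of 1.0 and 0.5 for the coarse one, 0.01 for the fine one) and the helper grid_search are hypothetical choices made for illustration.

```python
import math

def likelihood(data, mu, sigma):
    """Gaussian likelihood: product of N(x | mu, sigma) over the data."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.prod(
        math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm for x in data
    )

def grid_search(data, mu_values, sigma_values):
    """Return (best likelihood, mean, sd) over every pair on the grid."""
    return max(
        (likelihood(data, m, s), m, s)
        for m in mu_values
        for s in sigma_values
    )

data = [5.2, 9.1, 8.2, 7.3, 7.8]  # observations from Table 4.1

# Coarse grid: means 5, 6, ..., 10 and standard deviations 0.5, 1.0, ..., 3.0.
coarse = grid_search(
    data,
    [5.0 + 1.0 * k for k in range(6)],
    [0.5 + 0.5 * k for k in range(6)],
)

# Fine grid over the same region, in steps of 0.01 for both parameters.
fine = grid_search(
    data,
    [5.0 + 0.01 * k for k in range(501)],
    [0.5 + 0.01 * k for k in range(251)],
)

print("coarse: L = %.3g at mean = %.2f, sd = %.2f" % coarse)
print("fine:   L = %.3g at mean = %.2f, sd = %.2f" % fine)
```

The coarse grid settles on a mean of 8.0 and a standard deviation of 1.5, while the fine grid finds a higher likelihood near a mean of 7.52 and a standard deviation of 1.30, a point that falls between the coarse grid points. Nothing guarantees that a still finer grid would not do slightly better again.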