Parameter Estimation and Multivariate Statistics
Having reviewed basic statistics in the previous chapters, we now turn to the statistical background we need to understand the probability models that underlie simple statistical modeling techniques and some of the probabilistic machine learning methods. Although not all data analysis techniques and machine learning methods are based on statistical models, statistical models unify many of the basic machine learning techniques that we will encounter later in the book.
FITTING A MODEL TO DATA: OBJECTIVE FUNCTIONS AND PARAMETER ESTIMATION
Given a distribution or “model” for data, the next step is to “fit” the model to the data. Typical probability distributions will have unknown parameters (numbers that change the shape of the distribution). The technical term for the procedure of finding the values of the unknown parameters of a probability distribution from data is “estimation.” During estimation, one seeks to find the parameters that make the model “fit” the data “best.” If this all sounds a bit subjective, that’s because it is. In order to proceed, we have to provide some kind of objective (mathematical) definition of what it means to fit the data best. The function that quantifies how well the model fits the data is called the “objective function.” Typically, statisticians will try to find “estimators” for parameters that maximize (or minimize) an objective function, and statisticians will disagree about which estimators or objective functions are the best.
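To make the idea of an objective function concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the toy data set, the two objective functions (sum of squared deviations and sum of absolute deviations), and the helper name grid_minimize are all made up for this example. The "model" has a single unknown parameter, theta, that is supposed to summarize where the data lie.

```python
# Toy illustration: two different objective functions, one unknown parameter.
# The data and all function names here are hypothetical, chosen for this sketch.

def sum_squared_error(theta, data):
    """Objective 1: sum of squared deviations of the data from theta."""
    return sum((x - theta) ** 2 for x in data)


def sum_absolute_error(theta, data):
    """Objective 2: sum of absolute deviations of the data from theta."""
    return sum(abs(x - theta) for x in data)


def grid_minimize(objective, data, candidates):
    """Return the candidate value of theta that gives the smallest objective."""
    return min(candidates, key=lambda theta: objective(theta, data))


# Five observations, one of which is far from the others.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate parameter values 0.0, 0.5, 1.0, ..., 100.0.
candidates = [i * 0.5 for i in range(201)]

best_sq = grid_minimize(sum_squared_error, data, candidates)
best_abs = grid_minimize(sum_absolute_error, data, candidates)

print(best_sq, best_abs)
```

For these data, the squared-error objective picks 22.0 (the mean of the data), while the absolute-error objective picks 3.0 (the median). Neither answer is wrong. The two objective functions simply encode different definitions of what it means to fit the data best, which is one reason statisticians can disagree.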
In the case of the (univariate) Gaussian distribution, the parameters are called the “mean” and the “standard deviation,” often written as μ (mu) and σ (sigma). In using the Gaussian distribution as a model for some data, one seeks to find values of mu and sigma that fit the data. As we shall see, what we normally think of as the “average” turns out to be an “estimator” for the μ parameter of the Gaussian.
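As a rough numerical preview, the sketch below simulates Gaussian data and checks that the ordinary average behaves like a sensible estimator for mu. Several things here are assumptions made for the illustration rather than anything established so far in the text: the simulated data (true mu of 5 and true sigma of 2), the choice of the Gaussian log-likelihood as the objective function, and the function name gaussian_log_likelihood. The estimate of sigma divides by n rather than n - 1, which is the version that maximizes this particular objective.

```python
import math
import random


def gaussian_log_likelihood(mu, sigma, data):
    """Log of the Gaussian probability density, summed over all data points."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))


# Simulated data: 1000 draws from a Gaussian with mu = 5 and sigma = 2
# (hypothetical values chosen for this sketch).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(1000)]

# The ordinary average, used as an estimator for mu.
mu_hat = sum(data) / len(data)

# Root-mean-square deviation from the average, used as an estimator for sigma.
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# Objective function at the estimates and at two perturbed parameter settings.
ll_at_estimate = gaussian_log_likelihood(mu_hat, sigma_hat, data)
ll_shifted_mu = gaussian_log_likelihood(mu_hat + 0.5, sigma_hat, data)
ll_scaled_sigma = gaussian_log_likelihood(mu_hat, sigma_hat * 1.5, data)

print("mu_hat =", mu_hat, "sigma_hat =", sigma_hat)
print("objective is highest at the estimates:",
      ll_at_estimate > ll_shifted_mu and ll_at_estimate > ll_scaled_sigma)
```

The estimates land close to the values used to simulate the data, and moving either parameter away from its estimate lowers the objective function. This is a spot check at two perturbed settings, not a proof.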