# Laplace's Bayesian Analysis (1774-1781)

Pierre-Simon Laplace was the pre-eminent mathematician of the second half of the eighteenth century. He made particularly important contributions in the fields of astronomy and celestial mechanics. Over his lifetime, he also sporadically committed significant time and energy to the study of probability and statistics. He did more than anyone else of his time, Bayes included, to develop the mathematics of Bayesian statistical inference. Laplace’s employment of the Bayesian approach and his use of the uniform prior distribution doubtless did much to add credence to the logic of the Bayesian approach. Writing in the 1950s, Sir Ronald Fisher noted:

The superb pre-eminence of Laplace as a mathematical analyst undoubtedly inclined mathematicians for nearly fifty years to the view that the logical approach adopted by him had removed all doubts as to the applicability in practice of Bayes’ theorem ... In spite of the high prestige of all that flowed from Laplace’s pen, and the great ability and industry of his expositors, it is yet surprising that the doubts which such a process of reasoning from ignorance must engender should begin to find explicit expression only in the second half of the nineteenth century, and then with caution.^{[1]}

Laplace produced two important papers on Bayesian statistical inference in the two decades following the publication of Bayes’ seminal paper: in 1774, ‘Memoir on the Probability of Causes of Events’^{[2]} was published in the journals of the French Royal Academy of Sciences; and ‘Memoir on Probabilities’^{[3]} was presented to the Academy in 1780 and published in 1781. Bayes’ paper did not become known in continental Europe until around 1780—the first Laplace paper was published without any knowledge of Bayes’ developments. But Laplace developed a remarkably similar line of attack to the problem of statistical inference. He introduced his 1774 paper with the following statement:

If an event can be produced by a number *n* of different causes, then the probabilities of these causes given the event are to each other as the probabilities of the event given the causes, and the probability of the existence of each of these is equal to the probability of the event given that cause, divided by the sum of all the probabilities of the event given each of these causes.

The above statement is Bayes’ Theorem with a uniform prior distribution!^{[4]} Laplace had independently developed the Bayesian framework for ‘inverting’ probability statements about the behaviour of a sample given a population into statements about a population given sample data via the assumption of a uniform prior distribution. Unlike Bayes, however, who presented a proof that was detailed to the point of philosophical obscurity, Laplace presented this principle without any proof or even justification. Bayes had thought deeply and philosophically about the fundamentals of probability and inference. Laplace, on the other hand, was a mathematician whose primary interest was in applying his renowned analytical capabilities to the mathematical problems that probability presented.
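Laplace’s 1774 principle can be sketched in a few lines: with a uniform prior over the *n* possible causes, the posterior probability of each cause given the event is simply its likelihood, normalised by the sum of all the likelihoods. The three likelihood values below are illustrative only.

```python
# A minimal sketch of Laplace's principle: under a uniform prior over
# the causes, posterior probabilities are likelihoods normalised by
# their sum -- "the probabilities of these causes given the event are
# to each other as the probabilities of the event given the causes".

def posterior_from_uniform_prior(likelihoods):
    """Posterior over causes, assuming every cause is equally likely a priori."""
    total = sum(likelihoods)
    return [p / total for p in likelihoods]

# P(event | cause_i) for three hypothetical causes
likelihoods = [0.6, 0.3, 0.1]
posterior = posterior_from_uniform_prior(likelihoods)
print(posterior)  # each posterior term is proportional to its likelihood
```

Note that the prior never appears explicitly: assuming it uniform makes it cancel from numerator and denominator, which is exactly why Laplace could state the principle without mentioning prior probabilities at all.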

Laplace considered the same binomial problem as Bayes. That is, given a sample of *n* binary observations of which *x* are ‘successes’, what is the probability distribution of the population probability of success, *θ*? He obtained the same result as Bayes for the posterior distribution of *θ*. We noted above that Bayes was unable to find an analytical solution, or indeed a good approximation, for the integral when *n* and (n-x) were large. Laplace, however, used his superior mathematical skills to manipulate the integral so as to provide a much more accurate numerical solution than the crude limits that Bayes (and Price) had been able to find. The posterior probability distribution that could be inferred from a finite sample of binomial observations could now be quantified quickly and accurately.
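The posterior in question is easy to evaluate numerically today: with *x* successes in *n* trials and a flat prior on the success probability, the posterior density is proportional to θ^x (1-θ)^(n-x) (in modern terms, a Beta(x+1, n-x+1) distribution). The crude grid integration below is a stand-in for Laplace’s far more refined analytical approximations; the values of n and x are illustrative.

```python
# Binomial posterior under a uniform prior, evaluated by midpoint-rule
# quadrature on a fine grid. This brute-force normalisation replaces the
# integral that Bayes could only bound and Laplace approximated analytically.

def binomial_posterior(n, x, grid_size=100_000):
    """Return (grid of theta values, normalised posterior weights)."""
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    weights = [t**x * (1 - t)**(n - x) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]

n, x = 20, 14
thetas, post = binomial_posterior(n, x)
mean = sum(t * w for t, w in zip(thetas, post))
# The posterior mean recovers Laplace's (x + 1) / (n + 2) rule of succession
print(round(mean, 4), round((x + 1) / (n + 2), 4))
```

The check at the end uses a known closed-form property of this posterior (its mean is (x+1)/(n+2), Laplace’s ‘rule of succession’) to confirm that the numerical normalisation is accurate.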

In his papers of 1774 and 1781, Laplace took a number of further important steps beyond Bayes. Bayes’ paper had presented how to generate a posterior probability distribution for the probability of the ‘success’ of a binary event, based on a sample of *n* observations and the assumption of a uniform prior distribution. But he had not explicitly considered how to use this posterior distribution to infer a single ‘best estimate’ for the success probability. Some measure of central tendency of the posterior distribution would be a natural candidate for the estimate, but Bayes never explicitly discussed this topic. Laplace addressed this directly, and suggested two possible approaches for determining the best estimate: the value that makes it equally likely that the ‘true’ value will be larger or smaller according to the posterior distribution, i.e. the posterior median; and the value that minimises the probability-weighted sum of the absolute differences between the possible true values and the best estimate, where the probability weights were obtained from the posterior distribution. Laplace then proved that these two approaches were mathematically identical and hence would always produce the same value for the best estimate.
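Laplace’s equivalence result can be verified numerically on a discretised posterior: the grid point that minimises the probability-weighted sum of absolute deviations coincides with the posterior median. The Beta(3, 2)-shaped posterior below is purely illustrative.

```python
# Numerical check of Laplace's equivalence: on a discrete posterior, the
# minimiser of the expected absolute deviation is the posterior median.

def discrete_median(values, weights):
    """Smallest value at which cumulative probability reaches one half."""
    cum = 0.0
    for v, w in zip(values, weights):
        cum += w
        if cum >= 0.5:
            return v

def min_abs_loss(values, weights):
    """Value minimising the probability-weighted sum of absolute deviations."""
    def loss(e):
        return sum(w * abs(v - e) for v, w in zip(values, weights))
    return min(values, key=loss)

# An illustrative posterior: unnormalised Beta(3, 2) density on a grid
grid = [(i + 0.5) / 400 for i in range(400)]
dens = [t**2 * (1 - t) for t in grid]
total = sum(dens)
weights = [d / total for d in dens]

print(discrete_median(grid, weights), min_abs_loss(grid, weights))
```

The agreement is no accident: the expected absolute loss is piecewise linear in the estimate, and its slope changes sign exactly where the cumulative probability crosses one half, which is the median.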

Laplace also moved beyond Bayes in terms of what he applied the Bayesian prior/posterior framework to. Bayes’ paper had focused on binary events (success or failure; a ball landing to the right or left of another). Laplace moved on to variables that could take a continuum of values. His main motivation for this was found in astronomical observation. Astronomers of the time found that observations of an object in the sky, such as the planets Jupiter or Saturn, were not perfectly consistent with each other—some empirical observation error would inevitably arise from the physical, manual process of observing planetary positions in the sky. How should these observations be combined to find the best estimate of the position of the planet? This problem had bedevilled some of the greatest mathematicians of the age, including Euler, who could not find a satisfactory way of ‘solving’ a system of inconsistent linear equations.^{[5]}

Laplace wished to find a best estimate for some continuous variable *V,* based on a limited sample of observations, say, v_{1}, ... v_{n}. He tackled this using the Bayesian framework—that is, by obtaining the posterior distribution produced by assuming a uniform prior distribution for *V*, and then integrating the conditional probabilities of observing v_{1}, ... v_{n} over all possible values of *V*. Laplace’s best estimate would then be the median of this posterior distribution. In Bayes’ specification of the problem, the conditional probabilities that he had to sum across were already defined: because he was considering binary events, the sum of observations had a binomial distribution. Laplace’s more general problem meant that he had to specify the conditional probability distribution for observing v_{1}, ... v_{n} for a given value of *V*. But how could he determine a universally useful form of distribution for these conditional probabilities?

In order to tackle this generic specification of the sampling distribution, he made a subtle change of tack. Instead of considering the conditional probabilities for v_{1}, ... v_{n}, he considered the probabilities of the *differences* or observation *errors* between v_{i} and *V:* e_{i} = v_{i} - *V.* This simple change immediately allowed some intuitive general criteria to be established for the shape of the sampling probability distribution. In his 1774 paper, Laplace specified three criteria: the error probability distribution should be symmetric around zero (as the errors should be as likely above as below); the error probability should tend to zero as the error tends to infinity in both directions (small errors should be more probable than large errors); and, of course, the area under the error probability distribution must integrate to one. This still left an unlimited number of possible functions that could be specified for the error probability distribution and at the time no good reason existed to choose a particular one amongst them. He considered some specific error distribution choices, but was unable to find approaches that were amenable to analytical solution for large sample sizes and that did not involve the introduction of additional arbitrary parameters. A ‘best estimate’ of a population parameter as a function of sample data could not be mathematically defined without an explicit error distribution. Laplace’s journey to a general solution to statistical inference had reached a dead end.
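The three 1774 criteria can be checked numerically for any candidate error density. The sketch below uses the double-exponential curve φ(e) = (m/2)·exp(-m|e|), one of the specific forms Laplace is generally credited with considering; its scale parameter m is exactly the kind of additional arbitrary parameter the passage above notes he wished to avoid.

```python
import math

# A candidate error density: the double-exponential curve, with an
# arbitrary scale parameter m (illustrative value m = 1).
def phi(e, m=1.0):
    return 0.5 * m * math.exp(-m * abs(e))

# Criterion 1: symmetric about zero (errors equally likely above and below)
symmetric = phi(0.7) == phi(-0.7)

# Criterion 2: tends to zero as the error grows in either direction
vanishes = phi(50.0) < 1e-12 and phi(-50.0) < 1e-12

# Criterion 3: total probability integrates to one (midpoint rule)
step = 0.001
area = sum(phi(-30 + (i + 0.5) * step) * step for i in range(60_000))

print(symmetric, vanishes, round(area, 4))
```

As the passage explains, passing these checks is cheap: infinitely many curves satisfy all three criteria, which is precisely why the criteria alone could not single out a workable error distribution.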