# WHY ARE THERE MATHEMATICAL CALCULATIONS IN THE BOOK?

Although most molecular biologists don’t (and don’t want to) do mathematical derivations of the type that I present in this book, I have included quite a few of these calculations in the early chapters. There are several reasons for this. First of all, the machine learning methods presented here are mostly based on probabilistic models. This means that the methods described here really are mathematical things, and I don’t want to “hide” the mathematical “guts” of these methods. One purpose of this book is to empower biologists to unpack the algorithms and mathematical notation buried in the methods sections of most of the sophisticated primary research papers in the top journals today. Another purpose is that I hope, after seeing the worked example derivations for the classic models in this book, some ambitious students will take the plunge and learn to derive their own probabilistic machine learning models. This is another empowering skill, as it frees students from the confines of the prepackaged software that everyone else is using. Finally, there are students out there for whom doing some calculus and linear algebra will actually be fun! I hope these students enjoy the calculations here. Although calculus and basic linear algebra are requirements for medical school and graduate school in the life sciences, students rarely get to use them.

I’m aware that the mathematical parts of this book will be unfamiliar to many biology students. I have tried to include very basic introductory material to help students feel confident interpreting and attacking equations. This brings me to an important point: although I don’t assume any prior knowledge of statistics, I do assume that readers are familiar with multivariate calculus and have some exposure to linear algebra (although I do review the latter briefly). But don’t worry if you are a little rusty and don’t remember, for example, what a partial derivative is; a quick visit to Wikipedia might be all you need.

**A PRACTICAL GUIDE TO ATTACKING A MATHEMATICAL FORMULA**

For readers who are not used to (or are afraid of) mathematical formulas, the first thing to understand is that, unlike the text of this book, where I try to explain things as directly as possible, the mathematical formulas work differently. Mathematical knowledge has been suggested to be a different kind of knowledge, in that it reveals itself to each of us as we come to "understand" the formulas (interested readers can refer to Heidegger on this point). The upshot is that to be understood, formulas must be contemplated quite aggressively; hence they are not really read so much as "attacked." If you are victorious, you can expect a good formula to yield a surprising nugget of mathematical truth. Unlike normal reading, which is usually done alone (and in one's head), the formulas in this book are best attacked out loud, rewritten by hand, and in groups of two or three.

When confronted with a formula, the first step is to make sure you know what the point of the formula is: What do the symbols mean? Is it an equation (two formulas separated by an equals sign)? If so, what kind of a thing is supposed to be equal to what? The next step is to try to imagine what the symbols "really" are. For example, if the big "sigma" (which means a sum) appears, try to imagine some examples of the numbers that are in the sum. Write out a few terms if you can. Similarly, if there are variables (e.g., x), try to make sure you can imagine the numbers (or whatever) that x is trying to represent. If there are functions, try to imagine their shapes. Once you feel like you have some understanding of what the formula is trying to say, a great way to fully appreciate it is to try using it in a few cases and see if what you get makes sense. What happens as certain symbols reach their limits (e.g., become very large or very small)?

For example, let's consider the Poisson distribution:

$$
P(X \mid \lambda) = \frac{e^{-\lambda} \lambda^{X}}{X!}, \qquad \lambda > 0, \quad X \in \{0, 1, 2, \ldots\}
$$

First of all, there are actually three formulas here: the main one on the left, and two small ones on the right. Let's start with the small ones. The first part is a requirement that $\lambda$ is a positive number. The other part tells us what $X$ is. I have used fancy "set" notation that says "$X$ is a member of the set that contains the numbers 0, 1, 2 and onwards until infinity." This means $X$ can take on one of those numbers.

The main formula is an equation (it has an equals sign) and it is a function. You can tell this because there is a letter with parentheses next to it, and the parentheses are around symbols that reappear on the right. The function is named with a big "P" in this case, and there's a "|" symbol inside the parentheses. As we will discuss in Chapter 2, from seeing these two together, you can guess that the "P" stands for probability, and the "|" symbol refers to conditional probability. So the formula is giving an equation for the conditional probability of $X$ given $\lambda$. Since we've guessed that the equation is a probability distribution, we know that $X$ is a random variable, again discussed in Chapter 2, but for our purposes now, it's something that can be a number.
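One way to "attack" a formula like this is to type it in and try a few values. The following is a minimal sketch, assuming the standard Poisson formula $P(X \mid \lambda) = e^{-\lambda} \lambda^{X} / X!$; the function name `poisson_pmf` and the argument name `lam` (standing in for $\lambda$) are illustrative choices, not names used elsewhere in this book.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Value of exp(-lam) * lam**x / x! for a count x and a rate lam.

    `lam` plays the role of the parameter lambda in the formula
    (the name is an assumption made for this sketch).
    """
    if lam <= 0:
        raise ValueError("lam must be greater than 0")
    if x < 0:
        raise ValueError("x must be one of 0, 1, 2, ...")
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Try the formula for a few values of X with lam fixed at 2.
for x in range(6):
    print(x, round(poisson_pmf(x, 2.0), 4))

# A sanity check: the probabilities over many values of X should add up to
# (very nearly) 1.
print(sum(poisson_pmf(x, 2.0) for x in range(50)))
```

Changing `lam` and rerunning the loop is a quick way to see how the shape of the function depends on the parameter.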

Okay, so the formula is a function that gives the probability of $X$. So what does the function look like? First, we see an "e" to the power of negative $\lambda$. $e$ is just a special number (a fundamental constant, around 2.7) and $\lambda$ is another positive number whose value is set to be something greater than 0. Any number greater than 1 raised to a negative power gets very small as the size of the exponent gets big, and goes to 1 when the exponent goes to 0. So this first part is just a number that doesn't depend on $X$. On the bottom, there's an $X!$. The factorial sign means $a! = a \times (a - 1) \times (a - 2) \times \cdots \times 2 \times 1$, which will get big "very" fast as $X$ gets big. However, there's also a $\lambda^{X}$, which will also get very big, very fast if $\lambda$ is more than 1. If $\lambda$ is less than 1, $\lambda^{X}$ will get very small, very fast as $X$ gets big. In fact, if $\lambda$ is less than 1, the $X!$ will dominate the formula, and the probability will simply get smaller and smaller as $X$ gets bigger (Figure 1.1, left panel). As $\lambda$ approaches 0, the formula approaches 1 for $X = 0$ (because any number to the power of 0 is still 1, and $0!$ is defined to be 1) and 0 for everything else (because a number approaching zero raised to any positive power also approaches 0, so the formula will have a 0 in it, no matter which of those values $X$ takes). Not too interesting. If $\lambda$ is more than 1, things get a bit more interesting, as there will be a competition between $\lambda^{X}$ and $X!$. The $e^{-\lambda}$ term will just get smaller. It turns out that factorials grow faster than exponentials (Figure 1.1, right panel), so the bottom will always end up bigger than the top, but this is not something that would be obvious, and for intermediate values of $X$, the exponential might be bigger (e.g., $3! = 6 < 2^{3} = 8$).
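The competition between the top and the bottom of the Poisson formula can be checked numerically. The sketch below tabulates $\lambda^{X}$ against $X!$ for $\lambda = 4$ (the value used in the right panel of Figure 1.1) and finds the first $X$ at which the factorial overtakes the exponential. The helper name `first_x_where_factorial_wins` and the search limit `x_max` are hypothetical, chosen only for this illustration.

```python
import math


def first_x_where_factorial_wins(lam: float, x_max: int = 200):
    """Smallest x >= 1 with x! > lam**x, or None if none is found up to x_max.

    x_max is an arbitrary cutoff for this sketch; very large lam or x_max
    could overflow floating point.
    """
    for x in range(1, x_max + 1):
        if math.factorial(x) > lam ** x:
            return x
    return None


lam = 4.0  # same value as the right panel of Figure 1.1

# Write out the first few terms of the competition.
for x in range(13):
    top = lam ** x
    bottom = math.factorial(x)
    print(f"x={x:2d}  lam^x={top:12.0f}  x!={bottom:12d}  ratio={top / bottom:8.3f}")

print(first_x_where_factorial_wins(4.0))  # 9, since 8! = 40320 < 4**8 = 65536 but 9! = 362880 > 4**9 = 262144
print(first_x_where_factorial_wins(2.0))  # 4, consistent with 3! = 6 < 2**3 = 8
```

For a rate below 1 (for example `first_x_where_factorial_wins(0.5)`), the factorial is already ahead at $X = 1$, which matches the observation that the probability only shrinks as $X$ grows in that case.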

FIGURE 1.1 Graphs illustrating some things about the formula for the Poisson distribution. The left panel shows the value of the formula for different choices of $\lambda$. On the right is the competition between $\lambda^{X}$ and $X!$ for $\lambda = 4$. Note that the y-axis is in log scale.

Another interesting thing to note about this formula is that for $X = 0$ the formula is always just $e^{-\lambda}$ and for $X = 1$, it's always $\lambda e^{-\lambda}$. These are equal when $\lambda = 1$, which means that the probability of seeing 0 is equal to the probability of seeing 1 only when $\lambda = 1$, and that probability turns out to be $1/e$.
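The comparison of the first two values of the Poisson formula can also be tried in a few cases. This is a small sketch under the same assumption as before, namely that the formula is $e^{-\lambda} \lambda^{X} / X!$; `poisson_pmf` and `lam` are illustrative names.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """exp(-lam) * lam**x / x!, with lam standing in for lambda."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Compare the probability of seeing 0 with the probability of seeing 1
# for a rate below 1, equal to 1, and above 1.
for lam in (0.5, 1.0, 2.0):
    p0 = poisson_pmf(0, lam)  # always exp(-lam)
    p1 = poisson_pmf(1, lam)  # always lam * exp(-lam)
    print(f"lam={lam}: P(0)={p0:.4f}  P(1)={p1:.4f}  P(1)/P(0)={p1 / p0:.1f}")

# At lam = 1 both values coincide with 1/e.
print(math.isclose(poisson_pmf(0, 1.0), 1 / math.e))
```

The ratio of the two probabilities is just $\lambda$, which is why they can only be equal when $\lambda = 1$.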

So I went a bit overboard there, and you probably shouldn't contemplate that much when you encounter a new formula—those are, in fact, thoughts I've had about the Poisson distribution over many years. But I hope this gives you some sense of the kinds of things you can think about when you see a formula.