WHAT WON'T THIS BOOK COVER?
Despite several clear requests to include them, I have resisted putting R, Python, MATLAB®, Perl, or other code examples in the book. There are two major reasons for this. First, the syntax, packages, and specific implementations of data analysis methods change rapidly, much faster than the foundational statistical and computational concepts that I hope readers will learn from this book. Omitting specific code examples will help prevent the book from becoming obsolete by the time it is published. Second, because the packages for scientific data analysis evolve rapidly, figuring out how to use them (based on the accompanying user manuals) is a key skill for students. This is something that I believe has to be learned through experience and experimentation (as the Perl mantra goes, “TMTOWTDI: there’s more than one way to do it”), and while code examples might speed up research in the short term, reliance on them hinders the self-teaching process.
Sadly, I can’t begin to cover all the beautiful examples of statistical modeling and machine learning in molecular biology in this book. Rather, I want to help people understand these techniques better so they can go forth and produce more of these beautiful examples. The work cited here represents a few of the formative papers that I’ve encountered over the years and should not be considered a review of current literature. In focusing the book on applications of clustering, regression, and classification, I’ve really only managed to cover the “basics” of machine learning. Although I touch on them briefly, hidden Markov models (HMMs), Bayesian networks, and deep learning are more “advanced” models widely used in genomics and bioinformatics that I haven’t managed to cover in depth here. Luckily, there are more advanced textbooks (mentioned later) that cover these topics in more appropriate detail.
The book assumes a strong background in molecular biology. I won’t review DNA, RNA, proteins, etc., or the increasingly sophisticated, systematic experimental techniques used to interrogate them. In teaching this material to graduate students, I’ve come to realize that not all molecular biology students will be familiar with all types of complex datasets, so I will do my best to introduce them briefly. However, readers may need to familiarize themselves with some of the molecular biology examples discussed.