Section III Regression
Of the major topics in machine learning and statistical modeling, regression is probably the one that needs the least introduction. Regression is concerned with finding quantitative relationships between two or more sets of observations. For example, regression is a powerful way to test for and model the relationship between genotype and phenotype, which remains one of the most fundamental problems in modern biology. In this chapter, we’ll also see how regression (and some of its extensions) can be used to explore the relationship between mRNA and protein abundance at the genome scale.
Regression was probably the first statistical technique: Gauss was trying to model the errors in regression when he proposed his eponymous distribution, although the name “regression” was apparently coined later, by the nineteenth-century geneticist Francis Galton. As we have done in previous chapters, we’ll first spend considerable effort reviewing traditional regression, which is probably at least somewhat familiar to most readers. However, we will see that with the advent of numerical methods for local regression, generalized linear models, and regularization, regression is now among the most powerful and flexible machine learning methods.