# Introduction

- Background
- Probability Background
- Probability Distributions for Observations
- Gaussian Distribution
- Uniform Distribution
- Laplace Distribution
- Cauchy Distribution
- Logistic Distribution
- Exponential Distribution
- Location and Scale Families
- Sampling Distributions
- Binomial Distribution
- χ²-distribution
- T-distribution
- F-distribution

Preface

This book is intended to accompany a one-semester MS-level course in nonparametric statistics. Prerequisites for the course are calculus through multivariate Taylor series, elementary matrix algebra including matrix inversion, and a first course in frequentist statistical methods including some basic probability. Most of the techniques described in this book apply to data with only minimal restrictions placed on their probability distributions, but the performance of these techniques, and the performance of analogous parametric procedures, depend on these probability distributions. The first chapter below reviews probability distributions. It also reviews some objectives of standard frequentist analyses. Chapters covering methods that have elementary parametric counterparts begin by reviewing those counterparts. These introductions are intended to give a common terminology for later comparisons with new methods, and are not intended to reflect the richness of standard statistical analysis, or to substitute for a dedicated study of these techniques.

Computational Tools and Data Sources

Conceptual developments in this text are intended to be independent of the computational tools used in practice, but analyses used to illustrate techniques developed in this book will be performed using the program R (R Core Team, 2018). This program may be downloaded for free from https://cran.r-project.org/. This course will depend heavily on the R package MultNonParam. This package is part of the standard R repository CRAN, and is installed by typing inside of R:

```r
install.packages("MultNonParam")
library(MultNonParam)
```

Other packages will be called as needed; if your system does not have these installed, install them as above, substituting the package name for MultNonParam.

Calculations of a more heuristic nature, and not intended for routine analysis, are performed using the package NonparametricHeuristic. Since this package is so tightly tied to the presentation in this book, and hence of less general interest, it is hosted on GitHub, and is installed via

```r
library(devtools)
install_github("kolassa-dev/NonparametricHeuristic")
```

and, once installed, loaded into R using the library command.

An appendix gives guidance on performing some of these calculations using SAS.

Errata and other materials will be posted at http://stat.rutgers.edu/home/kolassa/NonparametricBook as they become available.

Acknowledgments

In addition to works referenced in the following chapters, I consulted Stigler (1986) and Hald (1998) for early bibliographical references. Bibliographic trails have been tracked through documentation of the software packages R and SAS, and bibliography resources from JSTOR, CiteULike, Project Euclid, and various publishers’ web sites have been used to construct the bibliography. I am grateful to the Rutgers Physical Sciences librarian Melanie Miller, the Rutgers Department of Statistics Administrative Assistant Lisa Curtin, and the work study students that she supervised, and the Rutgers Interlibrary Loan staff for assistance in locating reference material.

I am grateful to my students at both Rutgers and the University of Rochester to whom I taught this material over the years. Halley Constantino, Jianning Yang, and Peng Zhang used a preliminary version of this manuscript and were generous with their suggestions for improvements. My experience teaching this material helped me to select material for this volume, and to determine its level and scope. I consulted various textbooks during this time, including those of Hettmansperger and McKean (2011), Hettmansperger (1984), and Higgins (2004).

I thank my family for their patience with me during preparation of this volume. I thank my editor and proofreader for their important contributions.

I dedicate this volume to my wife, Yodit.

# Background

Statistics is the solution to an inverse problem: given the outcome from a random process, the statistician infers aspects of the underlying probabilistic structure that generated the data. This chapter reviews some elementary aspects of probability, and then reviews some classical tools for inference about a distribution’s location parameter.

## Probability Background

This section first reviews some important elementary probability distributions, and then reviews a tool for embedding a probability distribution into a larger family that allows for the distribution to be recentered and rescaled. Most statistical techniques described in this volume are best suited to continuous distributions, and so all of these examples of plausible data sources are continuous.

### Probability Distributions for Observations

Some common probability distributions are shown in Figure 1.1. The continuous distributions described below might plausibly give rise to a data set of independent observations. The methods in this volume are intended for statistical inference on a data set without knowledge of the family from which it came. The behavior of various statistical procedures, including both standard parametric analyses and the nonparametric techniques forming the subject of this volume, may depend on the distribution generating the data, and knowledge of these families will be used to explore this behavior.

#### Gaussian Distribution

The normal distribution, or Gaussian distribution, has density

$$f_G(x) = \frac{\exp\left(-(x-\mu)^2/(2\sigma^2)\right)}{\sigma\sqrt{2\pi}}.$$

FIGURE 1.1: Comparison of Three Densities

The parameter $\mu$ is both the expectation and the median, and $\sigma$ is the standard deviation. The Gaussian cumulative distribution function is

$$F_G(x) = \int_{-\infty}^{x} \frac{\exp\left(-(y-\mu)^2/(2\sigma^2)\right)}{\sigma\sqrt{2\pi}}\,dy.$$

There is no closed form for this integral. This distribution is symmetric about $\mu$; that is, $f_G(x) = f_G(2\mu - x)$ and $F_G(x) = 1 - F_G(2\mu - x)$. The specific member of this family of distributions with $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution or standard Gaussian distribution. The standard Gaussian cumulative distribution function is denoted by $\Phi(x)$. The distribution in its generality will be denoted by $\mathfrak{G}(\mu, \sigma^2)$.

This Gaussian distribution may be extended to higher dimensions; a multivariate Gaussian random variable, or multivariate normal random variable, $X$ in a space of dimension $d$ has a density of the form

$$\exp\left(-(x-\mu)^\top \Upsilon^{-1} (x-\mu)/2\right) \det(\Upsilon)^{-1/2} (2\pi)^{-d/2}.$$

Here $\mu$ is the expectation and $\Upsilon$ is the variance-covariance matrix $E\left[(X-\mu)(X-\mu)^\top\right]$.
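The symmetry relations and the multivariate density formula above can be checked numerically in base R. This is a sketch under illustrative assumptions: the helper `dmvnorm_formula` is hypothetical (defined here, not taken from any package), the parameter values are arbitrary, and a diagonal $\Upsilon$ is used so the answer can be compared with a product of univariate `dnorm` values.

```r
# Symmetry of the Gaussian about mu: f(x) = f(2*mu - x) and F(x) = 1 - F(2*mu - x)
mu <- 1.5; sigma <- 2; x <- c(-3, 0, 2.2, 7)
sym_density <- all.equal(dnorm(x, mu, sigma), dnorm(2 * mu - x, mu, sigma))
sym_cdf <- all.equal(pnorm(x, mu, sigma), 1 - pnorm(2 * mu - x, mu, sigma))

# Hypothetical helper: multivariate Gaussian density written directly from the formula
dmvnorm_formula <- function(x, mu, Upsilon) {
  d <- length(x)
  r <- x - mu
  q <- drop(t(r) %*% solve(Upsilon, r))  # (x - mu)^T Upsilon^{-1} (x - mu)
  exp(-q / 2) * det(Upsilon)^(-1 / 2) * (2 * pi)^(-d / 2)
}

# With a diagonal variance matrix the components are independent,
# so the joint density must equal a product of univariate densities
m <- c(0, 1, -1); v <- c(1, 4, 0.25); point <- c(0.3, 2, -0.5)
joint <- dmvnorm_formula(point, m, diag(v))
product <- prod(dnorm(point, m, sqrt(v)))
print(c(joint = joint, product = product))
```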

#### Uniform Distribution

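The uniform distribution has density

$$f_U(x) = \begin{cases} 1/\lambda & \text{for } x \in (\theta - \lambda/2, \theta + \lambda/2) \\ 0 & \text{otherwise.} \end{cases}$$

The cumulative distribution function of this distribution is

$$F_U(x) = \begin{cases} 0 & \text{for } x \le \theta - \lambda/2 \\ 1/2 + (x - \theta)/\lambda & \text{for } x \in (\theta - \lambda/2, \theta + \lambda/2) \\ 1 & \text{for } x \ge \theta + \lambda/2. \end{cases}$$

Again, the expectation and median for this distribution are both $\theta$, and the distribution is symmetric about $\theta$. The standard deviation is $\lambda/\sqrt{12}$. A common canonical member of this family is the distribution uniform on $[0,1]$, with $\theta = 1/2$ and $\lambda = 1$. The distribution in its generality will be denoted by $\mathfrak{U}(\theta, \lambda)$.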
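R parametrizes the uniform distribution by its endpoints rather than by center and width. The sketch below maps the $(\theta, \lambda)$ parametrization onto R's `dunif` and `punif` through two hypothetical helpers, `dunif_tl` and `punif_tl` (names chosen here for illustration), and checks the median and the standard deviation $\lambda/\sqrt{12}$ for arbitrary illustrative values.

```r
# Hypothetical helpers mapping (theta, lambda) onto R's (min, max) parametrization
dunif_tl <- function(x, theta, lambda) dunif(x, theta - lambda / 2, theta + lambda / 2)
punif_tl <- function(x, theta, lambda) punif(x, theta - lambda / 2, theta + lambda / 2)

theta <- 3; lambda <- 2
# The median: the CDF equals 1/2 at theta
med_cdf <- punif_tl(theta, theta, lambda)
# Variance by numerical integration over the support, compared with lambda^2 / 12
v <- integrate(function(x) (x - theta)^2 * dunif_tl(x, theta, lambda),
               theta - lambda / 2, theta + lambda / 2)$value
print(c(sd_numeric = sqrt(v), sd_formula = lambda / sqrt(12)))
```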

#### Laplace Distribution

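The double exponential distribution or Laplace distribution has density

$$f_{La}(x) = \frac{1}{\sigma\sqrt{2}} \exp\left(-\sqrt{2}\,|x - \theta|/\sigma\right).$$

The cumulative distribution function for this distribution is

$$F_{La}(x) = \begin{cases} \exp\left(\sqrt{2}(x - \theta)/\sigma\right)/2 & \text{for } x \le \theta \\ 1 - \exp\left(-\sqrt{2}(x - \theta)/\sigma\right)/2 & \text{for } x > \theta. \end{cases}$$

As before, the expectation and median of this distribution are both $\theta$. The standard deviation of this distribution is $\sigma$. The distribution is symmetric about $\theta$. A canonical member of this family is the one with $\theta = 0$ and $\sigma = 1$. The distribution in its generality will be denoted by $\mathfrak{La}(\theta, \sigma^2)$.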
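Base R has no Laplace density or distribution function, so the sketch below defines two hypothetical helpers, `dlaplace_sd` and `plaplace_sd`, in the parametrization used here (with $\sigma$ the standard deviation). It checks numerically that the density integrates to one, that its standard deviation is $\sigma$, and that the closed-form CDF agrees with the integrated density. Integration is split at $\theta$, where the density has a kink.

```r
# Hypothetical helpers; sigma is the standard deviation, as in the text
dlaplace_sd <- function(x, theta = 0, sigma = 1) {
  exp(-sqrt(2) * abs(x - theta) / sigma) / (sqrt(2) * sigma)
}
plaplace_sd <- function(x, theta = 0, sigma = 1) {
  z <- sqrt(2) * (x - theta) / sigma
  ifelse(z <= 0, exp(z) / 2, 1 - exp(-z) / 2)
}

theta <- 1; sigma <- 1.7
f <- function(x) dlaplace_sd(x, theta, sigma)
# Integrate separately on each side of the kink at theta
left <- function(g) integrate(g, -Inf, theta, rel.tol = 1e-8)$value
total <- left(f) + integrate(f, theta, Inf, rel.tol = 1e-8)$value
sq <- function(x) (x - theta)^2 * f(x)
variance <- left(sq) + integrate(sq, theta, Inf, rel.tol = 1e-8)$value
# Closed-form CDF at 2.5 versus the integrated density up to 2.5
cdf_integrated <- left(f) + integrate(f, theta, 2.5, rel.tol = 1e-8)$value
print(c(total = total, sd = sqrt(variance),
        cdf = plaplace_sd(2.5, theta, sigma), cdf_integrated = cdf_integrated))
```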

#### Cauchy Distribution

Consider the family of distributions

$$f_C(x) = \frac{1}{\pi\gamma\left(1 + (x - \theta)^2/\gamma^2\right)}, \tag{1.1}$$

with $\theta$ real and $\gamma$ positive. The cumulative distribution function for such a distribution is

$$F_C(x) = \frac{1}{2} + \frac{\arctan\left((x - \theta)/\gamma\right)}{\pi}.$$

This distribution is symmetric about its median $\theta$, but, unlike the Gaussian, uniform, and Laplace examples, has neither an expectation nor a variance; the quantity $\gamma$ represents not a standard deviation but a more general scaling parameter. Its upper and lower quartiles are $\theta \pm \gamma$, and so the interquartile range is $2\gamma$. This distribution is continuous. The family member with $\theta = 0$ and $\gamma = 1$ is the Cauchy distribution (and, when necessary to distinguish it from other members of (1.1), will be called standard Cauchy), and a member with $\gamma = 1$ but $\theta \neq 0$ is called a non-central Cauchy distribution.

An interesting and important property of the Cauchy relates to the distribution of sums of independent and identical copies of members of this family. If $X$ and $Y$ are independent standard Cauchy, then $Z = (X + Y)/2$ is standard Cauchy. One may see this by first noting that

$$P[Z \le z] = \int_{-\infty}^{\infty} \int_{-\infty}^{2z - y} f_C(x) f_C(y)\,dx\,dy,$$

and hence, differentiating with respect to $z$, that $f_Z(z) = \int_{-\infty}^{\infty} 2 f_C(2z - y) f_C(y)\,dy$. Dwass (1985) evaluates this integral using partial fractions. Alternatively, this fact might be verified using characteristic functions.
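Short of the partial-fraction argument, the convolution integral can be evaluated numerically and compared with the standard Cauchy density. The sketch below does this with base R's `integrate` and `dcauchy` at a few illustrative points, and also confirms the quartiles $\theta \pm \gamma$ for one arbitrary choice of location and scale.

```r
# Density of Z = (X + Y)/2 for independent standard Cauchy X and Y,
# from f_Z(z) = integral of 2 f_C(2z - y) f_C(y) dy
f_Z <- function(z) {
  integrate(function(y) 2 * dcauchy(2 * z - y) * dcauchy(y),
            -Inf, Inf, rel.tol = 1e-8)$value
}
zs <- c(-3, -1, 0, 0.5, 2)
fz <- sapply(zs, f_Z)
print(cbind(z = zs, convolution = fz, standard_cauchy = dcauchy(zs)))

# Quartiles of a Cauchy with location 2 and scale 3 should be 2 - 3 and 2 + 3
q <- qcauchy(c(0.25, 0.75), location = 2, scale = 3)
```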

#### Logistic Distribution

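The logistic distribution has density

$$f_{Lo}(x) = \frac{\exp\left(-(x - \theta)/\sigma\right)}{\sigma\left(1 + \exp\left(-(x - \theta)/\sigma\right)\right)^2}.$$

This distribution is symmetric about $\theta$, and has expectation and variance $\theta$ and $\pi^2\sigma^2/3$ respectively. The cumulative distribution function of this distribution is

$$F_{Lo}(x) = \frac{1}{1 + \exp\left(-(x - \theta)/\sigma\right)}.$$

The distribution in its generality will be denoted by $\mathfrak{Lo}(\theta, \sigma^2)$.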
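R's `dlogis` and `plogis` take a location and a scale that correspond to $\theta$ and $\sigma$ here. The sketch below, with arbitrary illustrative values, compares the closed-form CDF with `plogis` and checks the variance $\pi^2\sigma^2/3$ by numerical integration.

```r
theta <- 0.5; sigma <- 1.3
x <- c(-2, 0.5, 3)
# Closed-form CDF from the text, compared with R's plogis(q, location, scale)
cdf_formula <- 1 / (1 + exp(-(x - theta) / sigma))
cdf_r <- plogis(x, location = theta, scale = sigma)
# Variance by numerical integration, compared with pi^2 sigma^2 / 3
variance <- integrate(function(y) (y - theta)^2 * dlogis(y, theta, sigma),
                      -Inf, Inf, rel.tol = 1e-8)$value
print(c(numeric = variance, formula = pi^2 * sigma^2 / 3))
```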

#### Exponential Distribution

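The exponential distribution has density

$$f_E(x) = \begin{cases} \exp(-x) & \text{for } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$

The cumulative distribution function of this distribution is

$$F_E(x) = \begin{cases} 1 - \exp(-x) & \text{for } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$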

The expectation is 1, and the median is log(2). The inequality of these values is an indication of the asymmetry of the distribution. The standard deviation is 1. The distribution will be denoted by

### Location and Scale Families

Most of the distributions above either are, or can naturally be extended to, a family of distributions, called a location-scale family, by allowing an unknown location constant to shift the center of the distribution, and a second unknown scale constant to shrink or expand the scale. That is, suppose that $X$ has density $f(x)$ and cumulative distribution function $F(x)$. Then, for $b > 0$, $Y = a + bX$ has density $f((y - a)/b)/b$ and cumulative distribution function $F((y - a)/b)$. If $X$ has a standard distribution with location 0 and scale 1, then $Y$ has location $a$ and scale $b$.
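The transformation rule can be checked directly in base R. In this sketch the standard logistic plays the role of $X$, and the values of $a$ and $b$ are purely illustrative; R's location and scale arguments should reproduce the shifted and rescaled density and CDF.

```r
# Y = a + b X with b > 0: f_Y(y) = f((y - a)/b)/b and F_Y(y) = F((y - a)/b)
a <- -1; b <- 2.5; y <- c(-4, -1, 0, 3)
# Standard logistic X (location 0, scale 1), shifted by a and rescaled by b
dens_transformed <- dlogis((y - a) / b) / b
cdf_transformed <- plogis((y - a) / b)
print(rbind(transformed = dens_transformed,
            location_scale = dlogis(y, location = a, scale = b)))
```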

This extension is not natural for the exponential distribution, because the lower endpoint of the support of the distribution often is fixed at zero by the structure of the application in which it is used.

### Sampling Distributions

The distributions presented above in §1.1.1 represent mechanisms for generating observations that might potentially be analyzed nonparametrically. Distributions in this subsection will be used in this volume primarily as sampling distributions, or approximate sampling distributions, of test statistics.

#### Binomial Distribution

The binomial distribution will be used here not to model data plausibly arising from it, but to construct the first of the nonparametric tests considered below. This distribution is supported on $\{0, 1, \ldots, n\}$ for some integer $n$, and has an additional parameter $\pi \in [0,1]$. Its probability mass function is $\binom{n}{x}\pi^x(1-\pi)^{n-x}$, and its cumulative distribution function is

$$F_B(x) = \sum_{y=0}^{x} \binom{n}{y} \pi^y (1-\pi)^{n-y}.$$

Curiously, the binomial cumulative distribution function can be expressed in terms of the cumulative distribution function of the $F$ distribution, to be discussed below. The expectation is $n\pi$, and the variance is $n\pi(1-\pi)$. The median does not have a closed-form expression. This distribution is symmetric only if $\pi = 1/2$. The distribution will be denoted by $\mathfrak{Bin}(n, \pi)$.
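One standard form of the binomial and $F$ connection, stated here without proof (it follows from the beta-integral representation of binomial tail sums), is that for integer $x < n$, $P[X \le x] = P[F > (n - x)\pi/((x+1)(1-\pi))]$, where $F$ has $2(x+1)$ and $2(n-x)$ degrees of freedom. The sketch below checks this numerically for one illustrative choice of $n$ and $\pi$, and also recovers $n\pi$ and $n\pi(1-\pi)$ from the probability mass function.

```r
n <- 12; p <- 0.3; x <- 0:(n - 1)
lhs <- pbinom(x, size = n, prob = p)
# F cutoff for each x, with 2(x + 1) and 2(n - x) degrees of freedom
cutoff <- (n - x) * p / ((x + 1) * (1 - p))
rhs <- pf(cutoff, df1 = 2 * (x + 1), df2 = 2 * (n - x), lower.tail = FALSE)
print(cbind(x = x, binomial = lhs, via_F = rhs))

# Expectation and variance computed from the probability mass function
support <- 0:n
pmf <- dbinom(support, size = n, prob = p)
mean_pmf <- sum(support * pmf)
var_pmf <- sum((support - mean_pmf)^2 * pmf)
```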

The multinomial distribution extends the binomial distribution to the distribution of counts of objects independently classified into more than two categories according to certain probabilities.

#### χ²-distribution

If $X_1, \ldots, X_k$ are independent random variables, each with a standard Gaussian distribution (that is, Gaussian with expectation zero and variance one), then the distribution of the sum of their squares is called the chi-square distribution, and is denoted $\chi^2_k$. Here the index $k$ is called the degrees of freedom.
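Two small cases of this definition can be checked against R's `pchisq`. For $k = 1$, $P[X_1^2 \le x] = P[-\sqrt{x} \le X_1 \le \sqrt{x}] = 2\Phi(\sqrt{x}) - 1$; for $k = 2$, the chi-square distribution function has the closed form $1 - \exp(-x/2)$. The evaluation points below are illustrative.

```r
x <- c(0.1, 1, 2.5, 6)
# k = 1: P[X_1^2 <= x] = 2 Phi(sqrt(x)) - 1
one_df <- 2 * pnorm(sqrt(x)) - 1
# k = 2: closed-form CDF 1 - exp(-x/2)
two_df <- 1 - exp(-x / 2)
print(rbind(one_df, pchisq(x, df = 1), two_df, pchisq(x, df = 2)))
```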

Distributions of quadratic forms of correlated summands sometimes have a $\chi^2$ distribution as well. If $Y$ has a multivariate Gaussian distribution with dimension $k$, expectation 0 and variance matrix $\Upsilon$, and if $\Upsilon$ has an inverse, then

$$Y^\top \Upsilon^{-1} Y \sim \chi^2_k.$$

One can see this by noting that $\Upsilon$ may be written as $\Theta\Theta^\top$. Then $X = \Theta^{-1}Y$ is multivariate Gaussian with expectation 0 and variance matrix $\Theta^{-1}\Theta\Theta^\top\Theta^{-1\top} = I$, where $I$ is the identity matrix with $k$ rows and columns. Then $X$ is a vector of independent standard Gaussian variables, and $Y^\top\Upsilon^{-1}Y = X^\top X$.

Furthermore, still assuming that $X_1, \ldots, X_k$ are independent Gaussian variables with variance one, but now allowing $X_j$ to have expectation $\delta_j$, the distribution of

$$W = \sum_{j=1}^{k} X_j^2$$

is called a non-central chi-square distribution (Johnson et al., 1995). The density and distribution function of this distribution are complicated. The most important property of this distribution is that it depends on $\delta_1, \ldots, \delta_k$ only through $\xi = \sum_{j=1}^{k} \delta_j^2$; this quantity is known as the non-centrality parameter, and the distribution of $W$ will be denoted by $\chi^2_k(\xi)$. This dependence on nonzero expectations only through the simple non-centrality parameter may be seen by calculating the moment generating function of this distribution.
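The moment generating function calculation can be sketched as follows, assuming each $X_j$ is Gaussian with expectation $\delta_j$ and variance one, and taking $t < 1/2$. Completing the square in the exponent gives, for each summand,

$$E\left[\exp(tX_j^2)\right] = \int_{-\infty}^{\infty} \frac{\exp\left(tx^2 - (x - \delta_j)^2/2\right)}{\sqrt{2\pi}}\,dx = (1 - 2t)^{-1/2} \exp\left(\frac{\delta_j^2\, t}{1 - 2t}\right),$$

and, by independence,

$$E\left[\exp(tW)\right] = \prod_{j=1}^{k} E\left[\exp(tX_j^2)\right] = (1 - 2t)^{-k/2} \exp\left(\frac{\xi\, t}{1 - 2t}\right),$$

which involves $\delta_1, \ldots, \delta_k$ only through $\xi$.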

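If $Y$ has a multivariate Gaussian distribution with dimension $k$, expectation 0 and variance matrix $\Upsilon$, and if $\Upsilon$ has an inverse, then, for a fixed vector $\mu$, $X = \Theta^{-1}(Y - \mu)$ is multivariate Gaussian with expectation $-\Theta^{-1}\mu$ and variance matrix $I$. Hence

$$(Y - \mu)^\top \Upsilon^{-1} (Y - \mu) \sim \chi^2_k\left(\mu^\top \Upsilon^{-1} \mu\right).$$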
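The factorization $\Upsilon = \Theta\Theta^\top$ can be taken to be the Cholesky factorization; that choice is an assumption of this sketch, since any such $\Theta$ works. R's `chol` returns an upper-triangular $R$ with $R^\top R = \Upsilon$, so $\Theta = R^\top$. The code checks, for an illustrative $\Upsilon$, $\mu$, and observed $y$, that $\Theta^{-1}\Upsilon\Theta^{-1\top} = I$, that the quadratic form equals $x^\top x$, and that the squared length of $E[X] = -\Theta^{-1}\mu$ equals $\mu^\top\Upsilon^{-1}\mu$.

```r
# An illustrative 3 x 3 positive definite variance matrix
Upsilon <- matrix(c(4,   1,   0.5,
                    1,   3,   0.2,
                    0.5, 0.2, 2), nrow = 3, byrow = TRUE)
# chol() gives upper-triangular R with t(R) %*% R = Upsilon, so Theta = t(R)
Theta <- t(chol(Upsilon))
Theta_inv <- solve(Theta)
# Variance of X = Theta^{-1}(Y - mu): Theta^{-1} Upsilon Theta^{-T} should be I
var_X <- Theta_inv %*% Upsilon %*% t(Theta_inv)

mu <- c(1, -2, 0.5); y <- c(0.3, 1.1, -0.7)
x <- Theta_inv %*% (y - mu)
quad_form <- drop(t(y - mu) %*% solve(Upsilon, y - mu))
# Non-centrality: squared length of -Theta^{-1} mu versus mu^T Upsilon^{-1} mu
xi_direct <- sum((Theta_inv %*% mu)^2)
xi_formula <- drop(t(mu) %*% solve(Upsilon, mu))
print(c(quad_form = quad_form, xTx = sum(x^2), xi_direct = xi_direct, xi_formula = xi_formula))
```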

#### T-distribution

When $U$ has a standard Gaussian distribution, $V$ has a $\chi^2_k$ distribution, and $U$ and $V$ are independent, then $T = U/\sqrt{V/k}$ has a distribution called Student’s t distribution, denoted here by $\mathfrak{T}_k$, with $k$ called the degrees of freedom.
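Conditioning on $V$ turns this definition into a one-dimensional integral: since $\sqrt{V/k} > 0$, $P[T \le t] = E\left[\Phi\left(t\sqrt{V/k}\right)\right]$. The sketch below evaluates that integral with base R's `integrate`, `pnorm`, and `dchisq`, and compares it with `pt`; the helper name `cdf_by_conditioning` and the choice $k = 5$ are illustrative.

```r
# Hypothetical helper: P[T <= t] = integral of Phi(t sqrt(v/k)) times the chi-square_k density
cdf_by_conditioning <- function(t, k) {
  integrate(function(v) pnorm(t * sqrt(v / k)) * dchisq(v, df = k),
            0, Inf, rel.tol = 1e-8)$value
}
k <- 5; tvals <- c(-2, -0.5, 0, 1, 3)
via_conditioning <- sapply(tvals, cdf_by_conditioning, k = k)
print(cbind(t = tvals, conditioning = via_conditioning, pt = pt(tvals, df = k)))
```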

#### F-distribution

When $U$ and $V$ are independent random variables, with $\chi^2_k$ and $\chi^2_m$ distributions respectively, then $F = (U/k)/(V/m)$ is said to have an F distribution with $k$ and $m$ degrees of freedom; denote this distribution by $\mathfrak{F}_{k,m}$. If a variable $T$ has a $\mathfrak{T}_m$ distribution, then $T^2$ has an $\mathfrak{F}_{1,m}$ distribution.
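The last statement can be checked numerically. For $t > 0$, $T^2 \le t^2$ exactly when $-t \le T \le t$, and by symmetry of the t distribution this probability is $2P[T \le t] - 1$. The sketch compares this with R's `pf` for an illustrative $m = 7$.

```r
m <- 7; tvals <- c(0.2, 1, 2.5)
# P[T^2 <= t^2] = P[-t <= T <= t] = 2 P[T <= t] - 1 for t > 0
via_t <- 2 * pt(tvals, df = m) - 1
via_F <- pf(tvals^2, df1 = 1, df2 = m)
print(cbind(t = tvals, via_t = via_t, via_F = via_F))
```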