# APPENDIX D. The Impact of Chance Variability in Simple Random Sampling

In Chapter Five, we discussed the possibility of using a random selection process as a cost- and time-saving method for reducing the number of applicants to process. However, one such method, a simple random sample, has a disadvantage: chance alone will likely produce a sample that does not exactly match the demographic profile of the pool from which it was drawn. This appendix illustrates how much the demographic profile of a simple random sample can be expected to vary.

## Overview of the Reasoning for Selecting a Random Sample

Historically, far more applicants have taken and passed the written test than can be admitted to the next phase of the process. One option for selecting candidates from among those who passed the written test is to draw a simple random sample from the qualified pool of applicants. The sample size would be set to a number that the system can reasonably accommodate. On average, the diversity of the sample would reflect the diversity of the pool of applicants. That is, the expected proportion of such a sample that is non-Hispanic white and the expected proportion that is male are each equal to the respective proportions in the qualified pool of applicants.^{[1]} Suppose that a given qualified pool of applicants is 50 percent white^{[2]} and that a large number of random samples are drawn from it; the average percentage of whites across all the samples would be approximately 50 percent. However, the percentage of whites in any one random sample may naturally deviate from 50 percent, and, of course, only a single sample would be drawn when using random sampling to determine which qualified applicants move to the next phase. Below, we investigate how far individual random samples can be expected to deviate from the percentages of whites and males in the qualified applicant pool.
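The distinction between the long-run average and a single draw can be illustrated with a small simulation. This is a minimal sketch, not the procedure used in this appendix: the pool size of 2,000, the number of repeated draws, and the function name `simulate_sample_shares` are all hypothetical choices made for illustration.

```python
import random

def simulate_sample_shares(pool_size, share, sample_size, n_draws, seed=0):
    """Draw repeated simple random samples (without replacement) from a pool
    and return the share of group members in each sample."""
    rng = random.Random(seed)
    n_members = round(pool_size * share)
    # 1 marks a member of the group of interest, 0 marks everyone else.
    pool = [1] * n_members + [0] * (pool_size - n_members)
    return [sum(rng.sample(pool, sample_size)) / sample_size
            for _ in range(n_draws)]

# Hypothetical pool: 2,000 qualified applicants, 50 percent in the group.
shares = simulate_sample_shares(pool_size=2000, share=0.50,
                                sample_size=300, n_draws=2000)
mean_share = sum(shares) / len(shares)

print(f"average share across samples: {mean_share:.3f}")
print(f"range of single-sample shares: {min(shares):.3f} to {max(shares):.3f}")
```

Under these assumptions, the average across draws should sit very close to 0.50, while individual samples spread several percentage points to either side.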

The 2013 applicant cohort taking and passing the written test was 50.34 percent white and 94.30 percent male. We use these as baseline percentages for the qualified applicant pool in a hypothetical future random sample and consider random samples of qualified applicants ranging in size from 300 to 1,000. Figure D.1 displays a series of probability intervals for the proportion of whites in a random sample from an applicant pool that is 50.34 percent white. The blue lines on the graph represent an 80 percent probability interval for each sample size indicated on the horizontal axis; 80 percent of the time, the percentage of whites in a sample will fall between the upper and lower blue lines in the graph. Ten percent of the time, the percentage of whites in the sample will be above the upper blue line, and 10 percent of the time it will be below the lower blue line. For example, for a sample size of 300 (the leftmost endpoints on the graph), the 80 percent probability interval is [46.6%, 54.0%]. With an individual sample of size 300 from a pool that is 50.34 percent white, there is a 1-in-10 chance that white representation in the sample will exceed 54.0 percent and, similarly, a 1-in-10 chance that it will fall below 46.6 percent. Figure D.1 also displays 50 percent (black line), 90 percent (green line), and 98 percent (red line) probability intervals.
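The text does not state how the intervals were calculated. As a rough check, the sketch below assumes the normal approximation to the sampling distribution of a proportion and ignores any finite-population correction; the function name `probability_interval` is illustrative only.

```python
from statistics import NormalDist

def probability_interval(p, n, coverage):
    """Central probability interval for a sample proportion, assuming the
    normal approximation with standard error sqrt(p * (1 - p) / n)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    half_width = z * (p * (1 - p) / n) ** 0.5
    return p - half_width, p + half_width

# Pool that is 50.34 percent white, sample of 300, 80 percent interval.
low, high = probability_interval(p=0.5034, n=300, coverage=0.80)
print(f"{low:.1%} to {high:.1%}")  # 46.6% to 54.0%
```

Under this assumption, the result agrees with the interval of [46.6%, 54.0%] quoted above for a sample of 300.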

**Figure D.1**

**Probability Intervals for the Percentage of Whites in a Random Sample from an Applicant Pool That Is 50.34 Percent White**

White representation in a simple random sample from an applicant pool that is 50.34 percent white has a 1-in-4 chance of falling above the black line, a 1-in-20 chance of falling above the green line, and a 1-in-100 chance of falling above the red line, with a similar chance of falling below the lower line for each color. As seen in the graph, the width of the probability intervals shrinks as the sample size increases, implying that the chance of drawing a sample that deviates from applicant pool representation by a particular amount gets smaller as the sample grows. However, at sample sizes closer to 1,000, each additional increase in sample size narrows the intervals less than it does at smaller sample sizes.
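The diminishing improvement follows from how interval width scales with sample size. As a sketch, assuming the normal approximation (the text does not state the method behind the figures), the half-width $h$ of an interval for pool proportion $p$, sample size $n$, and standard normal quantile $z$ is

$$
h(n) = z\sqrt{\frac{p(1-p)}{n}}, \qquad \frac{h(1000)}{h(300)} = \sqrt{\frac{300}{1000}} \approx 0.55 .
$$

Because $h$ is proportional to $1/\sqrt{n}$, halving the width requires quadrupling the sample size, so more than tripling the sample from 300 to 1,000 narrows each interval by only about 45 percent.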

**Figure D.2**

**Probability Intervals for the Percentage of Males in a Random Sample from an Applicant Pool That Is 94.30 Percent Male**

Across the full range of sample sizes examined (300 to 1,000), the chance of having white representation in the sample more than five percentage points above the representation of 50.34 percent in the applicant pool is always less than 1 in 20, and for sample sizes of 542 and above, that chance is less than 1 in 100. The same probabilities hold for deviations of five percentage points below applicant pool representation.
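These thresholds can be approximated with a short calculation. The sketch below again assumes the normal approximation without a finite-population correction, which may differ from the method behind the figures; `tail_probability` and `min_sample_size` are hypothetical helper names.

```python
from math import ceil
from statistics import NormalDist

def tail_probability(p, n, deviation):
    """Approximate chance that a sample proportion lands more than
    `deviation` above the pool proportion p (one-sided)."""
    standard_error = (p * (1 - p) / n) ** 0.5
    return 1 - NormalDist().cdf(deviation / standard_error)

def min_sample_size(p, deviation, tail_prob):
    """Smallest n for which the one-sided tail probability is below tail_prob."""
    z = NormalDist().inv_cdf(1 - tail_prob)
    return ceil(p * (1 - p) * (z / deviation) ** 2)

# Chance of exceeding the pool share by five points with the smallest sample.
print(f"{tail_probability(0.5034, 300, 0.05):.3f}")

# Sample size at which that chance drops below 1 in 100.
print(min_sample_size(0.5034, 0.05, 0.01))  # 542
```

Under this approximation, the chance at a sample size of 300 is already below 1 in 20, and the 1-in-100 threshold is reached at 542, matching the values reported above.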

Figure D.2 presents analogous probability intervals for the percentage of males in a simple random sample from an applicant pool that is 94.30 percent male. Because the percentage of the applicant pool that is male is close to 100 percent, the probability intervals are narrower in this case.^{[3]} Consequently, the chance of drawing a sample in which male representation falls even three percentage points below applicant pool representation is less than 1 in 100 for sample sizes of 323 and above, with similar chances of a deviation of three percentage points above.
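The dependence of interval width on the pool percentage can be seen from the standard error of a sample proportion, which is largest at 50 percent and shrinks toward 0 or 100 percent. The comparison below is a minimal sketch; the function name `standard_error` and the intermediate pool shares of 30 and 70 percent are illustrative assumptions. Note that a normal approximation built on this standard error is rougher for percentages near 100, so it may not exactly reproduce the sample-size threshold quoted above for males.

```python
def standard_error(p, n):
    """Standard error of a sample proportion for pool share p and sample size n."""
    return (p * (1 - p) / n) ** 0.5

n = 300
se_by_share = {p: standard_error(p, n) for p in (0.30, 0.5034, 0.70, 0.9430)}

for p, se in se_by_share.items():
    print(f"pool share {p:.2%}: standard error {se:.4f}")

# Ratio of the spread for the male share to the spread for the white share.
ratio = se_by_share[0.9430] / se_by_share[0.5034]
print(f"ratio: {ratio:.2f}")
```

At the same sample size, the spread for a 94.30 percent share is a bit less than half the spread for a 50.34 percent share, and shares of 30 and 70 percent give the same spread because they are equally far from 50 percent.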

- [1] In this section, we focus on the percentages of non-Hispanic whites and males selected. However, the rationale presented applies to all races/ethnicities and both genders, as well as any other subgroup represented in the sample.
- [2] Throughout, “white” is intended to mean non-Hispanic white only.
- [3] In general, such intervals would be largest when representation in the qualified applicant pool is 50 percent and become smaller as the representation moves away from 50 percent in either direction.