# Probability Sampling

Probability sampling techniques allow an investigator to specify the probability that a participant will be selected from a population. With probability sampling, all elements (e.g., individuals, skilled living facilities) in the target population have some opportunity of being included in a sample, and the probability of being included in the sample is known for each element in the population. Use of probability sampling techniques increases the likelihood that the sample included in a study is representative of the target population. There are four basic types of probability sampling techniques: random sampling, systematic sampling, stratified sampling, and cluster sampling. We briefly review each of these techniques in the following sections.

## Random Sampling

Random sampling refers to a procedure whereby each member of the population has an equal probability of being included in the sample. Thus, the probability that a given person is selected on any single draw is 1/N (N = size of the population); for example, if the population has 100 people, the probability of a given person being selected is .01, or 1%. The procedures for selecting a random sample can be as simple as drawing names from a hat if the population is small, or using a table of random numbers if the population is larger. The advantage of this approach is that the findings of a study can be generalized to a population with computable estimates of error. A disadvantage, however, is that the population might be spread out geographically (e.g., across different cities), which creates problems with feasibility. For example, assume the target population for an intervention study was spousal caregivers of patients with breast cancer in the United States. Clearly, this population would be spread across the 50 states, which would make it difficult to contact, recruit, and enroll a random sample of caregivers. Also, as noted, the sampling frame for a population may be incomplete or difficult to obtain. For these reasons, behavioral intervention researchers rarely engage in pure random sampling.
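As a rough illustration of the procedure described above, the following Python sketch draws a simple random sample from a small sampling frame. The participant IDs, the frame size of 100, the sample size of 10, and the function name `simple_random_sample` are all hypothetical choices made for this example. A seeded pseudorandom generator stands in for the hat or the table of random numbers.

```python
import random


def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct elements so that every element has the same chance of inclusion."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(sampling_frame), n)


# Hypothetical sampling frame: 100 participant IDs (P001 ... P100).
frame = [f"P{i:03d}" for i in range(1, 101)]

# Draw a sample of 10 people from the frame.
sample = simple_random_sample(frame, 10, seed=42)

# Probability that a given person is chosen on one draw from the full frame: 1/N.
single_draw_probability = 1 / len(frame)

# Probability that a given person ends up in a sample of size n: n/N.
inclusion_probability = len(sample) / len(frame)
```

With a frame of 100 people, the single-draw probability is .01, and the chance of appearing somewhere in a sample of 10 is .10. In practice, the difficult part is usually assembling a complete sampling frame rather than performing the draw itself.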

An issue that is often discussed in the context of random sampling is the distinction between *sampling with replacement* and *sampling without replacement,* which are two methods used to select a random sample. Though not immediately relevant to behavioral intervention research, it is good to have basic familiarity with the concepts. Sampling with replacement means that, once a person is selected for a sample, he or she is put back into the population and could be sampled again. Sampling without replacement means that, once a person is selected, he or she is not put back into the population for resampling (Frerichs, 2008). In general, sampling with replacement is a “more random” sample because on every draw individuals have the same probability of being selected for inclusion in the sample. Referring back to our earlier example, with a population of 100, the first person who is selected has a 1% probability of being selected. If that person is returned to the population, each person again has a 1% chance of being selected on the second draw. That is not the case if the first person is not put back into the population: each of the remaining 99 people then has a 1/99 (about 1.01%) chance on the second draw, so the probability of selection changes from one draw to the next. This distinction has relevance to statistical theories of sampling. However, in behavioral intervention research, the most common method of sampling is without replacement, because we typically do not want to resample the same individuals. As noted, however, it is important to be aware of the distinction.
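The difference between the two methods can be made concrete with a short Python sketch. The function names and the population of 100 numbered people are assumptions introduced for illustration. The helper `draw_probabilities` reports, for each successive draw, the chance that a specific person who has not yet been removed is the one selected.

```python
import random
from fractions import Fraction


def sample_with_replacement(population, k, seed=None):
    """Each selected person is returned to the population, so repeats are possible."""
    rng = random.Random(seed)
    return rng.choices(population, k=k)


def sample_without_replacement(population, k, seed=None):
    """Each selected person is removed from the population, so no one repeats."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def draw_probabilities(pop_size, draws, replacement):
    """Per-draw selection probability for a person still available to be drawn."""
    return [
        Fraction(1, pop_size if replacement else pop_size - i)
        for i in range(draws)
    ]


# Hypothetical population of 100 people, labeled 1 through 100.
population = list(range(1, 101))

with_repl = sample_with_replacement(population, 5, seed=1)
without_repl = sample_without_replacement(population, 5, seed=1)

# With replacement the probability stays at 1/100 on every draw;
# without replacement it shifts to 1/99, then 1/98, as people are removed.
probs_with = draw_probabilities(100, 3, replacement=True)
probs_without = draw_probabilities(100, 3, replacement=False)
```

Exact fractions are used here instead of floating-point numbers so that the small change from 1/100 to 1/99 is visible without rounding.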

A cautionary note is that random sampling is not to be confused with random assignment of participants to a treatment condition. In random assignment, individuals are first selected on the basis of the study inclusion/exclusion criteria and then assigned, using random methods, to the different treatment conditions. This helps to ensure that the groups are as similar as possible at the outset; thus, if group differences on an outcome of interest are found, the differences are more likely due to treatment than to differences in the composition of the groups.
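To make the contrast with random sampling concrete, here is a minimal Python sketch of random assignment. It assumes a hypothetical list of 20 participants who have already met the inclusion/exclusion criteria and two illustrative conditions, "intervention" and "control". The shuffle-then-alternate scheme shown is only one simple way to obtain equal group sizes; actual trials often use other schemes, such as blocked or stratified randomization.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle enrolled participants, then deal them out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    assignment = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        assignment[condition].append(person)
    return assignment


# Hypothetical participants who already satisfy the inclusion/exclusion criteria.
enrolled = [f"P{i:02d}" for i in range(1, 21)]

groups = randomly_assign(enrolled, ["intervention", "control"], seed=7)
```

Note that every enrolled participant ends up in exactly one group. Nothing here selects people from a wider population; that selection step is what random sampling addresses.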