# Sample Size

The size of a sample is also a critical consideration when developing and evaluating an intervention. Size will vary depending upon the phase along the pipeline. For example, the sample size needed for a small proof-of-concept study, or for testing the feasibility or usability of a component of an intervention, will differ from that needed for a large efficacy trial. The size of a sample must be considered early in the planning process of any evaluation, in particular when the goal is to establish the efficacy/effectiveness of an intervention or to examine the comparative effectiveness of two interventions. This typically involves a comparison between two or more groups, that is, treatment versus control or treatment A versus treatment B. In both cases, one can derive precise and accurate conclusions only with an appropriate sample size.

Whereas statistical power may not be a concern in the developmental phases of an intervention, it is extremely important in the evaluation phases when comparing two or more groups. The size of the sample influences the statistical power of the study, that is, the extent to which the study can detect differences between groups. Power is a function of the criterion established for statistical significance (alpha level), the difference that exists between the groups (effect size), and the sample size (Kazdin, 1994). These four concepts (power, alpha, effect size, and sample size) are interrelated in the sense that, when three of them are known, the remaining one can be determined. To determine the sample size needed for a study, decisions must therefore be made regarding the other three parameters: alpha, power, and effect size.
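The interdependence of the four quantities can be sketched numerically. The following is a minimal illustration only, not the procedure used by Kazdin (1994) or the tables in Cohen (1988). It assumes a two-sided comparison of two equal-sized groups, an effect size expressed as a standardized mean difference, and a normal approximation. The function names `n_per_group` and `achieved_power` are hypothetical. Because published tables depend on the effect size index and the statistical test used, values from this approximation will not necessarily match them.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided, two-group comparison.

    Assumption: effect_size is a standardized mean difference and the
    test statistic is treated as normally distributed.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power given effect size, per-group n, and alpha."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


# Fix three quantities and solve for the fourth.
print(n_per_group(0.5, alpha=0.05, power=0.80))   # 63
print(n_per_group(0.5, alpha=0.05, power=0.75))   # 56 (lower power, fewer participants)
print(round(achieved_power(0.5, n=63), 3))
```

Under these assumptions, lowering the desired power or assuming a larger effect reduces the required sample, and a smaller anticipated effect increases it.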

Building on an example from Kazdin (1994), assume you are interested in determining the needed sample size for a study evaluating the effectiveness of two different psychosocial interventions for family caregivers, and your primary outcome is caregiver burden as measured by the Zarit Burden Interview (Zarit, Reever, & Bach-Peterson, 1980). The chosen alpha for the study is .05, the power is .80, and the estimated effect size is .40; based on available tables (Cohen, 1988), the needed sample size is 40 participants per group, or a total of 80 caregivers. There are also computer programs available to conduct power analyses. If the required sample size is not feasible owing to the availability of participants or budgetary/staffing issues, the alpha level can be varied or the power can be reduced slightly (e.g., to .75). Estimates for an effect size can be obtained from prior research studies, the literature, or meta-analyses. It is generally recommended to select a conservative estimate of an effect size (Kazdin, 1994). Power analyses must also account for any planned subgroup analyses to make sure that the comparisons of interest will be sufficiently sensitive to detect differences if, in fact, they exist. It is also important to plan for attrition. Estimates of 15% to 20% are typically used, but rates can vary widely depending upon the targeted population. For example, trials involving caregivers of individuals with dementia can have attrition rates as high as 50% over a short time frame (e.g., 6 months) because of the vulnerabilities of this population and the high risk for hospitalizations and death. Thus, in the previous example, to achieve sufficient power after accounting for an estimated attrition of 20%, a sample of 96 participants (80 increased by 20%) would be needed.
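The attrition adjustment in this example is simple arithmetic and can be sketched as follows. The helper names are hypothetical. The first function reproduces the figure in the text by increasing the required sample by the attrition percentage. The second shows a different convention that is not taken from the text, included only for comparison: dividing by the expected retention rate, which yields a somewhat larger number.

```python
def inflate_for_attrition(n_required, attrition_pct):
    """Increase the required sample by the attrition percentage (rounded up).

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return -(-n_required * (100 + attrition_pct) // 100)


def enroll_to_retain(n_required, attrition_pct):
    """Alternative convention (an assumption, not from the text):
    enroll enough participants that n_required remain after attrition."""
    return -(-n_required * 100 // (100 - attrition_pct))


per_group = 40
total_required = 2 * per_group                     # 80 caregivers
print(inflate_for_attrition(total_required, 20))   # 96, as in the example
print(enroll_to_retain(total_required, 20))        # 100 under the alternative convention
```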

It is important to derive sample size estimates prior to the beginning of a study to ensure that the study is sufficiently powered. For most funding agencies, power calculations are an important element of the proposed methodology on which an investigator will be evaluated. Power calculations are also now required by many refereed journals when reporting a randomized intervention trial. Understanding the required number of participants is also important with respect to planning the study recruitment strategy, budget, staffing requirements, and timeline. It may also allow an investigator to make any necessary adjustments to the study design, although this is not particularly desirable. For example, if, upon entry into the field, the required sample size is not obtainable, a decision might be made to reduce the number of experimental groups or to accept that effects will be more difficult to detect.

For example, assume that an investigator is interested in examining the impact of computer gaming on the cognitive functioning of adults by comparing computer gaming with crossword puzzles. The initial plan may be to examine this across three age groups: younger, middle-aged, and older. Thus, the study design would be a 2 (gaming vs. crossword puzzles) × 3 (age group) design. However, a power analysis indicates that the required sample size to detect an effect size of .75 at an alpha level of .05 is 35 per cell; thus, a total sample size of 210 participants (35 in each of the 6 cells) would be needed to achieve the desired power. After accounting for attrition (20%), the actual number would be 252 participants, or 42 per cell. The investigator might determine that it would not be feasible to recruit this number of participants and thus could decide to eliminate the younger age group, provided that inclusion of this group is not critical to the goals of the research.
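The cell arithmetic in this example can be laid out explicitly. The sketch below is illustrative only: the function name `design_totals` and the factor labels are hypothetical, and the totals for the reduced design are a direct extension of the same arithmetic rather than figures from the text. It also assumes that the per-cell requirement of 35 stays the same after a group is dropped, which a new power analysis would need to confirm.

```python
from itertools import product


def design_totals(factors, n_per_cell, attrition_pct):
    """Count the cells of a factorial design and total the sample needed."""
    cells = list(product(*factors.values()))
    # Per-cell enrollment after inflating for attrition, rounded up.
    enrolled_per_cell = -(-n_per_cell * (100 + attrition_pct) // 100)
    return {
        "cells": len(cells),
        "required": n_per_cell * len(cells),
        "enrolled_per_cell": enrolled_per_cell,
        "enrolled": enrolled_per_cell * len(cells),
    }


full = {
    "activity": ["gaming", "crossword"],
    "age_group": ["younger", "middle-aged", "older"],
}
reduced = {
    "activity": ["gaming", "crossword"],
    "age_group": ["middle-aged", "older"],
}

print(design_totals(full, n_per_cell=35, attrition_pct=20))
# {'cells': 6, 'required': 210, 'enrolled_per_cell': 42, 'enrolled': 252}
print(design_totals(reduced, n_per_cell=35, attrition_pct=20))
# {'cells': 4, 'required': 140, 'enrolled_per_cell': 42, 'enrolled': 168}
```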

Obtaining the appropriate number of participants in a study is a critical aspect of behavioral intervention research. At the efficacy/effectiveness phase, a small sample size may lead to a false negative, or Type II error (failing to reject the null hypothesis that there is no difference between study groups when a difference does exist), and there is a risk that an effective intervention may not be recognized. Of course, a very large sample size is also not recommended, as it is costly, can waste resources, and can unnecessarily increase the duration of a study. Sometimes, a sampling procedure involves *oversampling*, in which individuals with a particular characteristic are sampled at a higher rate than their share of the population. This strategy is used to help ensure that the study will have sufficient data for a particular group or subgroups. For example, it may be the case that an investigator is recruiting from a geographic region where a particular ethnic/racial group represents a small portion of the population. In this case, an investigator might *oversample* individuals from this ethnic/racial group to ensure that this group is sufficiently represented in the sample.
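The logic of oversampling can be illustrated with a small calculation. All numbers below are hypothetical and chosen only for illustration (a subgroup making up 5% of the population, a target of 40 subgroup members, and an oversampled share of 25%), and the function name is an assumption.

```python
def total_needed(target_subgroup_n, subgroup_share_pct):
    """Total sample needed so that the subgroup is expected to reach its target,
    given the percentage of the sample drawn from that subgroup (rounded up)."""
    return -(-target_subgroup_n * 100 // subgroup_share_pct)


# Proportional sampling: the subgroup's share of the sample equals its
# (hypothetical) 5% share of the population.
print(total_needed(40, 5))    # 800

# Oversampling: deliberately recruit so the subgroup makes up 25% of the sample.
print(total_needed(40, 25))   # 160
```

In this hypothetical case, oversampling reaches the same subgroup count with a far smaller overall sample, at the cost of a sample whose composition no longer mirrors the population.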

Suffice it to say that, at the study design phase, there is often a tension among feasibility issues, cost constraints, and considerations of sample size and composition. All of these issues need careful consideration before trial implementation, as sampling decisions have significant implications for recruitment efforts (Chapter 10) and for the internal and external validity of the study. In the following section, we discuss various sampling methods.