SELECTION OF RELIABLE AND VALID STUDY VARIABLES
One of the most fundamental decisions that investigators must make in a prospective study is to select and carefully distinguish between independent variables (IVs) and dependent variables (DVs). This process is guided by both the nature of the research questions being posed and the previous scientific literature in the area. In a classical experimental design, the IV often represents a variable or factor that the investigator can manipulate, such as whether participants are randomly assigned to receive Treatment A, Treatment B, or some type of control condition. IVs may also be factors that cannot be experimentally manipulated, such as gender or ethnic/language group.
In contrast, DVs are the outcomes that one measures to answer the scientific questions, or that are expected to show change as a consequence of exposure to a treatment or intervention. Proper selection of DVs is essential in comparing the results of a study to previous literature in the field. It is also essential that DVs be both reliable (can be measured with consistency) and valid (the measure captures what it was intended to measure). Identification of IVs and DVs can occur early in constructing an intervention, as discussed in Chapter 3 in the prephase of discovery.
Equally important is the need to employ measures that have adequate reliability and validity (Campbell, Stanley, & Gage, 1963; Trochim, 2000). A common problem in the literature is the use of measures that may not have sufficiently high test-retest reliability (stability of measurement over time) or interrater reliability (agreement among different raters using the measure on the same study participant, established by a coefficient of agreement such as a weighted kappa). “Validity” refers to the degree to which an instrument measures what it is supposed to measure. Many instruments may have face validity based on item content, or even content validity as designated by expert consensus opinion, but this is not a substitute for concurrent validity (examining the association of the proposed measure with established valid measures in the field), factorial validity (the proposed measures load on a predicted construct using traditional factor analytic techniques or linear structural equation modeling [SEM]), or discriminant validity (the proposed measure discriminates among well-defined groups identified by an accepted “gold standard,” using techniques such as discriminant function analysis [DFA], logistic regression, or receiver operating characteristic [ROC] curve analysis).
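As a rough illustration of the two reliability coefficients described above, the following Python sketch computes a weighted kappa for two raters and a Pearson correlation as a simple index of test-retest stability. The function names, the integer coding of rating categories, the choice of linear or quadratic weights, and the sample data are all assumptions made for illustration; they are not drawn from any cited instrument or study.

```python
from math import sqrt


def weighted_kappa(rater_a, rater_b, n_categories, weighting="quadratic"):
    """Cohen's weighted kappa for two raters scoring the same participants.

    Ratings are assumed to be integers coded 0 .. n_categories - 1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and equal in length")
    if n_categories < 2:
        raise ValueError("at least two rating categories are required")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    power = 2 if weighting == "quadratic" else 1
    n = len(rater_a)
    k = n_categories

    # Observed proportion of participants in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Larger gaps between categories are penalized more heavily.
            weight = (abs(i - j) / (k - 1)) ** power
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row[i] * col[j]

    if disagreement_expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagreement_observed / disagreement_expected


def pearson_r(time_1, time_2):
    """Pearson correlation between scores from two measurement occasions."""
    if len(time_1) != len(time_2) or len(time_1) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_1 = sum(time_1) / len(time_1)
    mean_2 = sum(time_2) / len(time_2)
    cross = sum((x - mean_1) * (y - mean_2) for x, y in zip(time_1, time_2))
    ss_1 = sum((x - mean_1) ** 2 for x in time_1)
    ss_2 = sum((y - mean_2) ** 2 for y in time_2)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / sqrt(ss_1 * ss_2)


if __name__ == "__main__":
    # Hypothetical severity ratings (0 = none, 1 = mild, 2 = severe).
    clinician_1 = [0, 1, 2, 2, 1, 0, 1, 2]
    clinician_2 = [0, 1, 2, 1, 1, 0, 2, 2]
    print("weighted kappa:", weighted_kappa(clinician_1, clinician_2, 3))

    # Hypothetical scale scores for the same people two weeks apart.
    baseline = [12, 18, 7, 22, 15]
    retest = [13, 17, 9, 21, 14]
    print("test-retest r:", pearson_r(baseline, retest))
```

A kappa of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. In practice an established statistical package would normally be preferred over hand-written code such as this.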
Careful selection of measures that have high levels of reliability and validity can greatly enhance the internal validity of a study, or, in other words, heighten the ability to conclude that outcomes are a consequence of an intervention rather than of confounding factors. However, equally important is external validity, which is the generalizability of a measure or finding to a real-world population (Rothwell, 2005). This construct is of critical importance since the goal of inferential statistics is to generalize from a given sample to a population (see Chapter 9 on sampling). One limitation of much of the current research is that participants in intervention studies may be randomly assigned to groups, but the participant pool may not adequately reflect the population as a whole. Research participants are often brighter and more highly motivated than the population as a whole, and may differ from it in other important characteristics. Further, in double-blind drug trials or trials of nonpharmacological interventions, inclusion and exclusion criteria may not reflect real clinical populations, which may have many more comorbid conditions than the sample included in an initial test of an intervention.
Another issue related to measurement choice is that there is an unfortunate tendency for some investigators to venerate or reify a measure based on the name of a scale or its historical usage. For example, the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977) is often used as a measure of depression in caregiver research. However, the actual diagnosis of depression requires an extensive structured interview by a well-trained clinician using standard diagnostic criteria such as the DSM-5. The CES-D can be described as a self-report scale of depressive symptoms, but it may indicate depression in those without a clinical diagnosis of depression or fail to identify true depression when a person refuses to disclose symptoms or underreports them. Further, measures of depression such as the CES-D may not be specific to depression, but may reflect anxiety or generalized psychological distress. As a result, there is a potential world of difference between a measure of reported depressive symptoms and the actual presence of clinical depression (Breslau, 1985).
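The gap between a symptom score and a clinical diagnosis can be made concrete by cross-classifying the two. The sketch below, in Python, counts how often a self-report cutoff agrees with a clinician's diagnosis and reports sensitivity and specificity. The function name, the cutoff of 16, and every score and diagnosis in the example are hypothetical values chosen only for illustration, not data from any study.

```python
def screening_accuracy(scores, diagnosed, cutoff):
    """Compare a symptom-scale cutoff with a clinical diagnosis.

    scores    -- self-report scale totals, one per participant
    diagnosed -- True where a clinician diagnosed depression
    cutoff    -- scores at or above this value count as screen-positive
    """
    if len(scores) != len(diagnosed):
        raise ValueError("scores and diagnoses must be equal in length")

    true_pos = false_pos = true_neg = false_neg = 0
    for score, has_diagnosis in zip(scores, diagnosed):
        screen_positive = score >= cutoff
        if screen_positive and has_diagnosis:
            true_pos += 1
        elif screen_positive:
            false_pos += 1   # elevated symptoms, no clinical diagnosis
        elif has_diagnosis:
            false_neg += 1   # clinical depression missed by the scale
        else:
            true_neg += 1

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero when a group is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(true_pos, true_pos + false_neg),
        "specificity": ratio(true_neg, true_neg + false_pos),
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


if __name__ == "__main__":
    # Hypothetical scale totals and interview-based diagnoses.
    scale_totals = [20, 18, 10, 5, 17, 12]
    interview_dx = [True, False, True, False, True, False]
    print(screening_accuracy(scale_totals, interview_dx, cutoff=16))
```

In this made-up sample, one participant scores above the cutoff without a diagnosis and one diagnosed participant scores below it, which are the two kinds of mismatch described above.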
Another issue to consider for intervention studies with long-term follow-up is ensuring that the same construct is being measured across occasions and groups, a property referred to as longitudinal or measurement invariance (Schaie, Maitland, Willis, & Intrieri, 1998). This issue is especially important in studies where the scales used to assess a construct are changed or a new scale is added to refine the measurement of the construct. Other threats to validity include the effects of history, reactivity to testing, statistical regression, experimental mortality or attrition, and developmental processes (Schaie, 1988).