Validity is also an important factor in the selection of outcome measures. At a global level, it is defined as the extent to which a measure assesses what it is intended to measure, that is, whether the measure actually captures the outcome of interest. It is important to recognize that the validity of a measure varies according to the population of interest. For example, a vocabulary test that is based on items related to current pop culture may not be a valid instrument for older adults or individuals from other cultures. As with reliability, there are various types of validity.
Face validity refers to the extent to which an item appears to measure what it is intended to measure. Of course, this is a subjective assessment. Importantly, this should be based on the perspective not only of the researcher but also of the intended responders.
Content validity refers to the extent to which a measure includes all of the items or taps all of the issues that are important to the construct or outcome being assessed; in other words, it is the degree to which a measure accurately and comprehensively captures an outcome of interest. For example, a measure intended to broadly assess personality traits would have low content validity if it contained only items related to extroversion. This aspect of validity is particularly important if one is developing a new measure for a study. For example, in the PRISM trial, it was necessary to develop a measure that evaluated participants’ computer proficiency. In this case, it was important that the items in the measure were current in terms of today’s computer technology and also captured all of the aspects of proficiency (see Boot et al., 2015).
Concurrent validity refers to the degree to which a measure is correlated with or related to another established measure or indicator that taps the same construct. One might examine the correlation or degree of relationship between a scale assessing depression and clinical ratings of depression.
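As a minimal sketch of how such a relationship might be quantified, the Python snippet below computes a Pearson correlation between scores on a self-report scale and clinician ratings. The function name `pearson_r` and the two short score lists are hypothetical and purely illustrative; they are not data from any study, and other indices of association may be more appropriate depending on the measurement scale.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical, made-up scores for six participants (illustration only)
scale_scores = [4, 9, 12, 15, 20, 23]      # self-report depression scale
clinician_ratings = [1, 2, 2, 3, 4, 4]     # clinical rating of depression

r = pearson_r(scale_scores, clinician_ratings)
print(f"r = {r:.2f}")
```

A high positive correlation with the established indicator would be taken as evidence of concurrent validity for the new scale.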
Predictive validity is the extent to which performance on a measure of interest is related to a later performance that the measure was designed to predict. A classic example is the degree to which performance on the SAT (a test taken by high school students) predicts future performance in college. Another is the degree to which performance on a cognitive measure predicts an individual’s later development of an adverse outcome such as dementia.
Discriminative validity is an extremely important type of validity and refers to the extent to which a measure can discriminate among groups or individuals who vary on some dimension. An example might be a measure of cognition or functional performance that discriminates between cognitively unimpaired older adults and older adults with mild dementia. It is usually discussed according to two dimensions: sensitivity and specificity. Sensitivity refers to the ability of a measure to correctly identify true cases (e.g., cognitively impaired) of some dimension, whereas specificity refers to the ability to correctly identify nonaffected cases. It is important to remember that sensitivity and specificity are properties of a particular test. The utility of a test to discriminate in clinical populations, however, also depends on the base rate of the particular disorder in the population. This gives rise to the terms positive predictive power (the probability that an individual with a positive test result truly has the condition) and negative predictive power (the probability that an individual with a negative test result truly does not have it). Unfortunately, very low prevalence in a population (e.g., the number of completed suicides, number of patients with a rare illness) will result in poor positive predictive power for low-occurrence outcomes despite excellent sensitivity and specificity of a test. In general, low base rates result in lower positive predictive values, whereas higher base rates result in lower negative predictive values. However, for the purposes of selecting outcome measures, techniques such as logistic regression, discriminant function analyses, and examination of the sensitivities and specificities obtained at different cutoff scores along a receiver operating characteristic (ROC) curve will provide an investigator with the best means of determining discriminative validity.
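The dependence of predictive power on base rate can be illustrated with a short calculation. The Python sketch below applies Bayes' rule to a test's sensitivity, specificity, and the base rate of the condition; the function name `predictive_values` and the example values (sensitivity and specificity of 0.95) are assumptions chosen for illustration and do not describe any particular instrument.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (positive predictive value, negative predictive value).

    All three arguments are proportions; base_rate must lie strictly
    between 0 and 1 so that both affected and unaffected cases exist.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")

    # Expected proportions of the population in each cell of the 2 x 2 table
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)

    # Predictive values are undefined if no one tests positive (or negative)
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv


# Same hypothetical test (sensitivity = specificity = 0.95) at three base rates
for rate in (0.01, 0.50, 0.90):
    ppv, npv = predictive_values(0.95, 0.95, rate)
    print(f"base rate {rate:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumed values, a base rate of 1% yields a positive predictive value of only about 0.16 even though the test itself is highly accurate, whereas a base rate of 90% yields a negative predictive value of about 0.68.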
Ecological validity is generally thought of as the ability to generalize results to natural or real-world situations and depends on capturing the critical elements of environments, tasks, and behaviors. In the context of outcome measurement, ecological validity refers to the extent to which measures capture the relevant features of tasks and environments. For example, within the realm of cognition, there is a concern that, although standard neuropsychological measures provide important information regarding an individual’s cognitive abilities, they have low ecological validity in the sense that they do not provide information about functioning in everyday activities. In this regard, our group has developed a battery of computer-based simulations of common everyday activities such as use of an ATM, refilling a prescription, using a ticket kiosk, and medication management. Preliminary data with diverse older adult populations suggest that the measures have test-retest reliability, face validity, and discriminant validity (Czaja, Harvey, & Loewenstein, 2014; see Chapter 15 for more discussion of this topic).