As with validity and interpretability, responsiveness too needs a construct theory. Responsiveness refers to an instrument’s sensitivity to change, although just what kind of change a responsive instrument should be able to detect is somewhat controversial: candidates include clinically important changes, changes due to treatment effects, and changes in the true value of the underlying construct (Terwee et al. 2003). But regardless of the kind of change an instrument is meant to identify, the development of a measure requires information about the appropriate distance between units of change. Take a simple example. In the USA, infants are weighed in grams because weighing in kilograms is not sensitive enough; the extra grams that an infant weighs can make a difference to their prognosis. But older children and adults are typically weighed to the nearest kilogram because the extra grams are negligible to most of their health outcomes. This decision is in part theory driven: it draws on (1) our theoretical understanding of mass, (2) the role body mass plays in our understanding of health outcomes, and (3) the application of this theoretical understanding to different populations, e.g., infants and adults.

What level of sensitivity to a change in quality of life, health status, or mobility, for instance, should PROMs employ? The sensitivity of a scale in the context of PROMs is determined by the number and kinds of questions posed to respondents. For example, single-item scales (scales that ask only one question) are limited in sensitivity since they must divide rich variables (e.g., spasticity) into only a few levels (Hobart et al. 2007). But just how finely should we divide a variable? In part, the answer to this question can be understood statistically. Questions that are considered “too close” to one another will have overlapping standard errors, but this statistic can be manipulated by increasing the sample size, i.e., the greater the sample size, the smaller the standard error around the item estimates.^{[1]} By increasing the sample size, one can increase the precision of the measure.
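
The statistical point can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: it uses the familiar standard error of a mean (standard deviation divided by the square root of the sample size) as a stand-in for the standard error around an item estimate, which in actual PROM calibration would be computed from a fitted measurement model. The item locations, the standard deviation, and the function names are all hypothetical.

```python
import math

# Hypothetical values, chosen only for illustration: two item estimates
# 0.3 units apart, each with a respondent-level standard deviation of 1.0.
ITEM_A = 0.0
ITEM_B = 0.3
SD = 1.0


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def bands_overlap(est_a: float, est_b: float, se_a: float, se_b: float) -> bool:
    """True if the bands (estimate +/- 1 SE) around the two estimates overlap."""
    return abs(est_a - est_b) <= se_a + se_b


def overlap_by_sample_size(sizes):
    """Map each sample size to whether the two items' error bands overlap."""
    results = {}
    for n in sizes:
        se = standard_error(SD, n)
        results[n] = bands_overlap(ITEM_A, ITEM_B, se, se)
    return results


if __name__ == "__main__":
    for n, overlaps in overlap_by_sample_size([25, 100, 400]).items():
        print(f"n={n:4d}  SE={standard_error(SD, n):.3f}  overlap={overlaps}")
```

With these made-up numbers, the two items are statistically indistinguishable at 25 respondents and distinguishable at 100 and 400, although nothing about the items themselves has changed.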

But as with questions about the appropriate sensitivity of measures of body mass, the responsiveness of a PROM requires a theory of the construct being measured, how that construct relates to other areas of interest, and how our theoretical understanding relates to different populations. If we are trying to establish the effectiveness of a new drug using a PROM as one of the endpoints, then we need some theory that provides a representation of the measurement interaction in the context of the patient cohort as well as an understanding about how the construct in question relates to the condition or illness that is targeted by the new drug. These theoretical considerations cannot be achieved with statistics alone. Indeed, as I will argue below, determining the correct responsiveness of a scale must include considerations of value, in particular harms and benefits.

[1] Standard errors are a way of telling from a statistical perspective if x is significantly different from y. If standard errors overlap, this tells us that, in the case of PROMs, two items are similar enough to be indistinguishable.