# Mediation Analysis

## Introduction to Mediation Analysis

There are many ways to hypothesize the causal relationship between an outcome and other variables. For example, in the SAFI study discussed earlier, we may hypothesize that the association between vasopressors and mortality is influenced by a mediator. The concept of mediation has been used in the social science and psychology literature for many decades (e.g., Rucker et al. 2011). Mediation analysis attempts to characterize how the exposure or treatment affects an intermediate variable, and how the affected intermediate variable in turn influences the outcome. This provides insight into the biological mechanisms or pathways by which an exposure or treatment affects an outcome. In our SAFI example, it is possible that vasopressors influence patient outcome through a series of processes involving different intermediate stages. One could build and test such a mediation model, but this is beyond the scope of this chapter.

Like the methods above, mediation methods can be illustrated via the so-called "counterfactual" framework for causal inference, which is now widely used in formal methodological work in causal inference, statistics, economics, and other fields. The framework provides a formal notation for describing causation. In this section, we will not use the formal counterfactual notation to introduce mediation analysis. Instead, we describe regression-based mediation intuitively. For the interested reader, several texts give more details of the counterfactual framework.

## The Product Method

The product method was greatly influenced by the work of Baron and Kenny (1986). Their causal diagram (Figure 7.2) is a minimalist figure that illustrates the relationship between the cause, the mediator, and the outcome, in which A denotes the exposure variable, M denotes the mediator, and Y denotes the outcome variable. Let C denote the other baseline covariates. Consider the case in which the outcome and the mediator are both continuous. The following regression models could help explain the effect "with the mediator" and "without the mediator".
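Written out under the usual linear specification, the two models could take the following form. This is a sketch rather than the original typesetting: the coefficient symbols ($\beta$ for the mediator model, $\theta$ for the outcome model) and the error terms $\varepsilon_M$, $\varepsilon_Y$ are assumed notation, and the equation numbers are chosen to match the references to models 7.3 and 7.4 in the text.

$$
\begin{aligned}
M &= \beta_0 + \beta_1 A + \beta_2^{\top} C + \varepsilon_M && (7.3)\\
Y &= \theta_0 + \theta_1 A + \theta_2 M + \theta_4^{\top} C + \varepsilon_Y && (7.4)
\end{aligned}
$$

Here $\beta_1$ is the exposure coefficient in the mediator model, $\theta_1$ is the exposure coefficient in the outcome model, and $\theta_2$ is the mediator coefficient in the outcome model. The index 3 is skipped in the outcome model because, by a common convention, $\theta_3$ is reserved for an exposure-mediator interaction term, which is absent here.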

FIGURE 7.2: Mediation model.

The exposure coefficient in model 7.4, $\theta_1$, can be used to assess the direct effect, which measures the influence of the cause A on the outcome that does not pass through the mediator. Here the effects of the mediator and the other covariates are controlled for by including them together in the same regression model. The direct effect can be understood as the treatment effect on the outcome at a specific level of the mediator. Alternatively, one can treat the estimated effect $\hat{\theta}_1$ on the outcome as the effect that is direct relative to the intermediate of interest, since there will likely be other intermediates or mechanisms that account for other aspects of the effect of the cause on the outcome (T. VanderWeele 2015). The effect of the cause on the outcome through the intermediate is the indirect effect. By this definition, $\beta_1\theta_2$ is the estimator used to assess the indirect effect: the coefficient of the exposure in model 7.3, $\beta_1$, times the coefficient of the mediator in model 7.4, $\theta_2$.

## The Difference Method

The difference method has been widely applied in epidemiology and the biomedical sciences. We will continue to use two regression models for an explanation. Consider the following regression models:
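Under the same linear specification, the pair of models could be written as follows. As before, this is a sketch in assumed notation: $\phi$ denotes the coefficients of the outcome model fitted without the mediator, $\theta$ those of the outcome model fitted with it, and the equation numbers are chosen to match the references to models 7.5 and 7.6 in the text. Model 7.6 has the same form as the outcome model 7.4 used in the product method.

$$
\begin{aligned}
Y &= \phi_0 + \phi_1 A + \phi_2^{\top} C + \varepsilon && (7.5)\\
Y &= \theta_0 + \theta_1 A + \theta_2 M + \theta_4^{\top} C + \varepsilon_Y && (7.6)
\end{aligned}
$$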

If the exposure coefficient of model 7.5 without the mediator, $\phi_1$, drops substantially when compared with the exposure coefficient of model 7.4 with the mediator, $\theta_1$, this may indicate mediation, because the mediator M appears to account for part of the effect of the exposure on the outcome (T. J. VanderWeele 2016). The difference $\phi_1 - \theta_1$ is commonly referred to as a mediated or indirect effect. As in the product method, the exposure coefficient in model 7.6, $\theta_1$, is interpreted as the direct effect.

For the two methods, when the outcome and the mediator are continuous and the linear regression models are fitted by ordinary least squares, the estimated direct and indirect effects coincide. In other scenarios, for example when the outcome model is fitted by logistic regression, the product method and the difference method generally give different estimates of the effects.
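The agreement of the two methods under ordinary least squares can be checked numerically. The following is a minimal, dependency-free sketch on simulated data, not an analysis of the SAFI study: the data-generating coefficients (0.8, 0.4, 0.6, and so on), the sample size, and the helper `ols` are all illustrative assumptions. It fits the mediator model (M on A and C), the outcome model with the mediator (Y on A, M, and C), and the outcome model without the mediator (Y on A and C), then compares the product and difference estimates of the indirect effect.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(aug[r][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for r in range(p):
            if r != k:
                f = aug[r][k] / aug[k][k]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[k])]
    return [aug[r][p] / aug[r][r] for r in range(p)]


# Simulated data (hypothetical): C confounder, A exposure, M mediator, Y outcome.
rng = random.Random(0)
n = 2000
C = [rng.gauss(0, 1) for _ in range(n)]
A = [0.5 * c + rng.gauss(0, 1) for c in C]
M = [0.8 * a + 0.3 * c + rng.gauss(0, 1) for a, c in zip(A, C)]
Y = [0.4 * a + 0.6 * m + 0.2 * c + rng.gauss(0, 1) for a, m, c in zip(A, M, C)]

beta = ols([A, C], M)       # mediator model:                 M ~ A + C
theta = ols([A, M, C], Y)   # outcome model with mediator:    Y ~ A + M + C
phi = ols([A, C], Y)        # outcome model without mediator: Y ~ A + C

direct = theta[1]                        # exposure coefficient, mediator included
indirect_product = beta[1] * theta[2]    # product method
indirect_difference = phi[1] - theta[1]  # difference method

print("direct effect:                ", direct)
print("indirect effect (product):    ", indirect_product)
print("indirect effect (difference): ", indirect_difference)
```

The two indirect-effect estimates agree up to floating-point error. This is an algebraic property of least squares when all three models are fitted on the same sample with the same covariates, so it does not depend on the simulated values chosen here.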

## Other Considerations

To ensure that the conclusions drawn from a mediation analysis are reliable, four strong assumptions must be satisfied, all of which concern confounding. In particular, it is essential to justify that the confounding factors have been adjusted for in the mediation analysis. For the details of the assumptions, the reader can refer to T. VanderWeele (2015) and T. J. VanderWeele (2016). Since the assumptions are strong and are easily violated in practice, sensitivity analysis is necessary to examine the robustness of the conclusions when some of the assumptions are not met. Commonly used sensitivity analysis techniques for mediation can be found in the related literature (Imai, Keele, and Tingley 2010, Hafeman 2011, T. J. VanderWeele 2016).

When an interaction between the exposure and the mediator is considered, the formulae for the estimated direct and indirect effects change, and so does their interpretation. Fortunately, much work has been done to account for interaction in mediation analysis. The interested reader is referred to T. VanderWeele (2015) and T. J. VanderWeele (2016).
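As an illustration of how the formulae change, consider a linear mediator model together with a linear outcome model that includes an exposure-mediator product term. The notation is assumed for this sketch: $\beta$ and $\theta$ denote the coefficients of the mediator and outcome models, $\theta_3$ is the interaction coefficient, and $a^{*}$ and $a$ are the two exposure levels being compared.

$$
\begin{aligned}
E[M \mid A = a, C = c] &= \beta_0 + \beta_1 a + \beta_2^{\top} c\\
E[Y \mid A = a, M = m, C = c] &= \theta_0 + \theta_1 a + \theta_2 m + \theta_3\, a m + \theta_4^{\top} c
\end{aligned}
$$

Under the confounding assumptions discussed above, the expressions commonly given in the literature cited here for a change in exposure from $a^{*}$ to $a$ are:

$$
\begin{aligned}
\text{controlled direct effect at } M = m: &\quad (\theta_1 + \theta_3 m)(a - a^{*})\\
\text{natural direct effect}: &\quad \bigl(\theta_1 + \theta_3(\beta_0 + \beta_1 a^{*} + \beta_2^{\top} c)\bigr)(a - a^{*})\\
\text{natural indirect effect}: &\quad (\theta_2 \beta_1 + \theta_3 \beta_1 a)(a - a^{*})
\end{aligned}
$$

When $\theta_3 = 0$, these reduce, per unit change in exposure, to the direct effect $\theta_1$ and the indirect effect $\beta_1 \theta_2$ of the product method. With $\theta_3 \neq 0$, the direct effect depends on the level of the mediator and the indirect effect depends on the exposure level, which is why the interpretation changes.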

In this section, we discuss mediation analysis only in the context of ordinary linear regression models. The ideas of the direct effect and the indirect effect also generalize to other statistical models, such as logistic regression and survival models. Readers can consult VanderWeele and Vansteelandt (2010) for further explanation.