 # Instrumental Variables Networks for Treatment Effect Estimation in the Presence of Unmeasured Confounders

## Instrumental Variables Frameworks

The instrumental variables (IV) framework is a popular class of methods for estimating treatment effects in the presence of unmeasured confounders (Awan et al. 2019). Consider the trio for treatment effect estimation in Figure 7.3. The goal is to estimate the causal effect of a treatment T on an outcome Y. However, both the treatment T and the outcome Y can be affected by an unobserved confounder U. To remove the effect of the confounder U, we use an instrumental variable Z. The only observed variables are T, Y, and Z. The following three assumptions are essential to the success of the IV approach.

1. The instrumental variable Z is associated with the treatment T.

2. The instrumental variable Z is independent of the outcome Y given the treatment T and all confounders U, i.e., Y ⊥ Z | (T, U).

3. The instrumental variable Z is independent of factors (measured and unmeasured) that confound the treatment–outcome relationship, i.e., Z ⊥ U.

Any violation of these three assumptions will lead to the failure of the IV approach. The variational autoencoder (VAE) and principal component methods for dimension reduction can be used to find instrumental variables. However, instrumental variables are problem-dependent. For example, they are commonly used to estimate exposure-outcome relationships in EHR studies, where a proxy that serves as a semi-randomized variable (such as hospital volume) can be discerned from the records. The selection of appropriate instrumental variables is key to the success of treatment effect estimation in the presence of unmeasured confounders; further details are beyond the scope of this chapter.
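To make the role of the three assumptions concrete, the following small simulation sketches the trio of Figure 7.3 under an assumed linear data-generating process (all variable names and coefficients are illustrative, not from the chapter): a naive regression of Y on T is biased by the unmeasured confounder U, while the instrument recovers the true effect.

```python
import numpy as np

# Hypothetical simulation of the trio: an unmeasured confounder U drives
# both treatment T and outcome Y, while the instrument Z affects Y only
# through T (assumptions 1-3 hold by construction).
rng = np.random.default_rng(0)
n = 200_000
true_effect = 2.0

U = rng.normal(size=n)                       # unmeasured confounder
Z = rng.normal(size=n)                       # instrument, independent of U
T = 0.8 * Z + U + rng.normal(size=n)         # assumption 1: Z associated with T
Y = true_effect * T + 3.0 * U + rng.normal(size=n)

# Naive OLS slope of Y on T is biased by the confounder U.
naive = np.cov(Y, T)[0, 1] / np.var(T)

# IV (Wald) estimate: cov(Y, Z) / cov(T, Z) removes the confounding.
iv = np.cov(Y, Z)[0, 1] / np.cov(T, Z)[0, 1]

print(f"naive OLS: {naive:.2f}, IV: {iv:.2f}, truth: {true_effect}")
```

If the exclusion restriction were violated (e.g., Z entered the equation for Y directly), the IV estimate would be biased as well; the simulation makes it easy to check this by modifying the data-generating process.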

## Two-Stage Least Squares Methods with Linear Models

The methods for IV analysis include two-stage least squares (2SLS) (Angrist, Graddy, and Imbens 2000), the generalized method of moments (Hansen 1982; Bennett, Kallus, and Tobias 2019), the ratio of coefficients method (Burgess, Small, and Thompson 2017), and likelihood-based methods (Hayashi 2000). In this chapter, we focus on the most popular method for IV analysis, 2SLS. The 2SLS method consists of two regression stages: the first-stage regression of the treatment variable on the IVs, and the second-stage regression of the outcome on the fitted values of the treatment from the first stage.

### Simple Linear Models

Consider the causal model in Figure 7.3.

FIGURE 7.3 Trio for treatment effect estimation and instrumental variable.

Here T represents a treatment, Y is the outcome, U is the confounding variable, and G is an instrumental variable. We assume that the instrumental variable G is independent of the confounding variable U and is associated with the treatment T. The goal of the IV approach is to estimate the causal effect of the treatment T on the outcome Y. The structural equations are

$$T = \beta_{T|G}\, G + U + \varepsilon_1, \tag{7.7}$$

$$Y = \beta_{Y|T}\, T + U + \varepsilon_2, \tag{7.8}$$

where the errors $\varepsilon_1, \varepsilon_2$ are independent of $G$ and $U$.

### Covariance Analysis

Without loss of generality, we can assume that Var(G) = 1 and Var(T) = 1. By the assumption that the confounding variable U and the instrumental variable G are independent, we conclude that cov(G, U) = 0 (Figure 7.3). Taking the covariance with G on both sides of equation (7.7) yields

$$\operatorname{cov}(T, G) = \beta_{T|G}. \tag{7.9}$$

Similarly, taking the covariance with G on both sides of equation (7.8), we obtain

$$\operatorname{cov}(Y, G) = \beta_{Y|T}\operatorname{cov}(T, G),$$

which implies that

$$\beta_{Y|T} = \frac{\operatorname{cov}(Y, G)}{\operatorname{cov}(T, G)}. \tag{7.10}$$

Define the regression of Y on G:

$$Y = \beta_{Y|G}\, G + e. \tag{7.11}$$

Taking the covariance with G, we obtain

$$\operatorname{cov}(Y, G) = \beta_{Y|G}. \tag{7.12}$$

Substituting equations (7.9) and (7.12) into equation (7.10) leads to

$$\beta_{Y|T} = \frac{\beta_{Y|G}}{\beta_{T|G}}.$$

The causal effect of the treatment T on the outcome Y is estimated by $\hat{\beta}_{Y|T} = \hat{\beta}_{Y|G} / \hat{\beta}_{T|G}$.
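The ratio estimator can be checked numerically. The sketch below simulates the linear model of equations (7.7) and (7.8) (coefficient values are illustrative) and recovers β_{Y|T} as the ratio of the two regression coefficients:

```python
import numpy as np

# Illustrative check of the ratio estimator beta_{Y|T} = beta_{Y|G} / beta_{T|G}.
rng = np.random.default_rng(1)
n = 500_000
b_tg, b_yt = 0.7, 1.5                  # true beta_{T|G} and beta_{Y|T}

G = rng.normal(size=n)                 # instrument, Var(G) ~ 1
U = rng.normal(size=n)                 # unmeasured confounder
T = b_tg * G + U + rng.normal(size=n)  # equation (7.7)
Y = b_yt * T + U + rng.normal(size=n)  # equation (7.8)

# Simple regressions on G (with Var(G) = 1, the slope equals the covariance).
beta_tg_hat = np.cov(T, G)[0, 1] / np.var(G)   # estimates beta_{T|G}
beta_yg_hat = np.cov(Y, G)[0, 1] / np.var(G)   # estimates beta_{Y|G}

beta_yt_hat = beta_yg_hat / beta_tg_hat        # ratio estimator
print(f"estimated causal effect: {beta_yt_hat:.3f} (truth {b_yt})")
```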

### Generalized Least Squares Estimator

The structural equations for the IV model in equations (7.7) and (7.8) are

$$T = \beta_{T|G}\, G + U + \varepsilon_1, \tag{7.13}$$

$$Y = \beta_{Y|T}\, T + U + \varepsilon_2, \tag{7.14}$$

where $\varepsilon_1, \varepsilon_2$ are uncorrelated with $U$. Let

$$e_1 = U + \varepsilon_1, \tag{7.15}$$

$$e_2 = U + \varepsilon_2. \tag{7.16}$$

Substituting equations (7.15) and (7.16) into equations (7.13) and (7.14), we obtain

$$-T + \beta_{T|G}\, G + e_1 = 0, \tag{7.17}$$

$$-Y + \beta_{Y|T}\, T + e_2 = 0, \tag{7.18}$$

where $Y, T$ are endogenous variables (their values are determined or influenced by other variables in the system) and are denoted by $W = [Y \;\; T]$; $G$ is an exogenous variable (a variable that is not affected by the other variables); and $e_1, e_2$ are uncorrelated with $G$. Equations (7.17) and (7.18) can be rewritten in matrix form as

$$W\Gamma + GB + E = 0, \tag{7.19}$$

where

$$\Gamma = \begin{bmatrix} -1 & 0 \\ \beta_{Y|T} & -1 \end{bmatrix}, \quad B = [\,0 \;\; \beta_{T|G}\,], \quad E = [\,e_2 \;\; e_1\,].$$

Multiplying both sides of equation (7.19) by $\Gamma^{-1}$, we obtain $W + GB\Gamma^{-1} + E\Gamma^{-1} = 0$, which can be reduced to

$$W = G\Pi + V, \tag{7.20}$$

where $\Pi = -B\Gamma^{-1} = [\,\beta_{Y|T}\beta_{T|G} \;\; \beta_{T|G}\,]$ and $V = -E\Gamma^{-1}$.

Using the least squares method to solve the optimization problem (7.20), we obtain

$$\hat{\Pi} = (G^T G)^{-1} G^T W. \tag{7.21}$$

Using the relation $\Pi\Gamma = -B$, we obtain $(G^T G)^{-1} G^T W \Gamma = -B$, which implies

$$G^T W \Gamma = -G^T G B. \tag{7.22}$$

Expanding equation (7.22), we obtain

$$G^T[\,-Y + \beta_{Y|T} T \;\; -T\,] = G^T[\,0 \;\; -G\beta_{T|G}\,].$$

Therefore, we have

$$G^T Y = \beta_{Y|T}\, G^T T, \tag{7.23}$$

$$G^T T = \beta_{T|G}\, G^T G. \tag{7.24}$$

Equations (7.23) and (7.24) can be rewritten as

$$\begin{bmatrix} G^T Y \\ G^T T \end{bmatrix} = \begin{bmatrix} G^T T & 0 \\ 0 & G^T G \end{bmatrix} \begin{bmatrix} \beta_{Y|T} \\ \beta_{T|G} \end{bmatrix}. \tag{7.25}$$

Equations (7.17) and (7.18) can also be rewritten as

$$w = Z\theta + e, \tag{7.26}$$

where

$$w = \begin{bmatrix} Y \\ T \end{bmatrix}, \quad Z = \begin{bmatrix} T & 0 \\ 0 & G \end{bmatrix}, \quad \theta = \begin{bmatrix} \beta_{Y|T} \\ \beta_{T|G} \end{bmatrix}, \quad e = \begin{bmatrix} e_2 \\ e_1 \end{bmatrix}.$$

Multiplying both sides of equation (7.26) by $\operatorname{diag}(G^T, G^T)$, we obtain

$$\begin{bmatrix} G^T Y \\ G^T T \end{bmatrix} = \begin{bmatrix} G^T T & 0 \\ 0 & G^T G \end{bmatrix} \theta + \begin{bmatrix} G^T e_2 \\ G^T e_1 \end{bmatrix}. \tag{7.27}$$

Note that the covariance matrix of the transformed errors is

$$\Lambda = \operatorname{cov}\!\left(\begin{bmatrix} G^T e_2 \\ G^T e_1 \end{bmatrix}\right), \tag{7.28}$$

which in general is not proportional to the identity matrix. Using weighted least squares methods, we obtain the generalized least squares estimator:

$$\hat{\theta} = \left(\tilde{Z}^T \Lambda^{-1} \tilde{Z}\right)^{-1} \tilde{Z}^T \Lambda^{-1} \tilde{w}, \tag{7.29}$$

where $\tilde{Z}$ and $\tilde{w}$ denote the design matrix and the response vector on the right- and left-hand sides of equation (7.27), respectively. If we assume $\Lambda = \sigma^2 I$, then equation (7.29) reduces to

$$\hat{\theta} = \left(\tilde{Z}^T \tilde{Z}\right)^{-1} \tilde{Z}^T \tilde{w}. \tag{7.30}$$

### Two-Stage Least Squares Method

We now study the relationship between the generalized least squares estimator and the two-stage least squares estimator.

Let

$$\hat{T} = G\hat{\beta}_{T|G}, \tag{7.31}$$

where $\hat{\beta}_{T|G} = (G^T G)^{-1} G^T T$ is the least squares estimator from the first-stage regression. Equation (7.31) can be reduced to

$$\hat{T} = G(G^T G)^{-1} G^T T. \tag{7.32}$$

Consider the two-stage least squares estimator:

### Stage 1

Solving the regression

$$T = \beta_{T|G}\, G + e_1, \tag{7.33}$$

we obtain

$$\hat{\beta}_{T|G} = (G^T G)^{-1} G^T T \tag{7.34}$$

and $\hat{T} = G\hat{\beta}_{T|G}$.

### Stage 2

Solving the regression

$$Y = \beta_{Y|T}\, \hat{T} + e_2,$$

we obtain

$$\hat{\beta}_{Y|T} = (\hat{T}^T \hat{T})^{-1} \hat{T}^T Y. \tag{7.35}$$

Substituting equation (7.32) into equation (7.35), we obtain

$$\hat{\beta}_{Y|T} = \left(T^T G (G^T G)^{-1} G^T T\right)^{-1} T^T G (G^T G)^{-1} G^T Y, \tag{7.36}$$

which is the same as equation (7.30).
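The two stages and the closed-form expression can be verified numerically. The following sketch (illustrative data with two instruments; all names and coefficients are hypothetical) runs both stages explicitly and compares the result with the closed-form estimator:

```python
import numpy as np

# Sketch of 2SLS for the linear IV model; G holds two instruments.
rng = np.random.default_rng(2)
n = 100_000
U = rng.normal(size=n)                             # unmeasured confounder
G = rng.normal(size=(n, 2))                        # two instruments
T = G @ np.array([0.6, 0.4]) + U + rng.normal(size=n)
Y = 1.5 * T + U + rng.normal(size=n)               # true effect 1.5
T = T.reshape(-1, 1)

# Stage 1: regress T on G and form the fitted values T_hat = G beta_{T|G}.
beta_tg = np.linalg.solve(G.T @ G, G.T @ T)        # equation (7.34)
T_hat = G @ beta_tg

# Stage 2: regress Y on T_hat.
beta_yt = np.linalg.solve(T_hat.T @ T_hat, T_hat.T @ Y)   # equation (7.35)

# Closed form (7.36): (T'G(G'G)^{-1}G'T)^{-1} T'G(G'G)^{-1}G'Y.
PG_T = G @ np.linalg.solve(G.T @ G, G.T @ T)       # projection of T on G
closed = np.linalg.solve(T.T @ PG_T, PG_T.T @ Y)

print(beta_yt.ravel(), closed.ravel())
```

The two estimates agree up to floating-point error because the projection matrix onto the column space of G is symmetric and idempotent, which is exactly the algebra behind equation (7.36).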

## Nonlinear Models for the Two-Stage Least Squares Approach

Here we consider nonlinear models for the 2SLS approach to treatment effect estimation, where a covariate vector X is included (Figure 7.4). We introduce the following nonlinear structural equation model with additive latent errors that include the confounders (Mohamed et al. 2019; Hartford et al. 2017):

$$Y = g(T, X) + \varepsilon, \tag{7.37}$$

where Y is the outcome variable, T is the treatment variable, X is a vector of covariates, $g(\cdot,\cdot)$ is a nonlinear function, and $\varepsilon$ is the error that includes the unobserved confounders. We assume that $E[\varepsilon] = 0$. We also assume that the treatment variable is endogenous and is correlated with the error, i.e., $E[\varepsilon \mid X, T] \neq 0$ and $E[\varepsilon T \mid X] \neq 0$.

Define the potential outcome prediction function (Hartford et al. 2017)

$$h(T, X) = g(T, X) + E[\varepsilon \mid X]. \tag{7.38}$$

Recall that we assume that the treatment variable is correlated with the error, i.e.,

$$E[\varepsilon \mid X, T] \neq E[\varepsilon \mid X], \tag{7.39}$$

which implies that

$$E[Y \mid T, X] = g(T, X) + E[\varepsilon \mid T, X] \neq h(T, X). \tag{7.40}$$

FIGURE 7.4

A general graphical model for IV and treatment effect estimation.

Since $E[\varepsilon \mid X, T] \neq 0$ and is unknown, equation (7.40) shows that $g(T, X)$ cannot be directly estimated from $E[Y \mid T, X]$, and neither can $h(T, X)$. The two unknown quantities $g(T, X)$ and $E[\varepsilon \mid T, X]$ require two equations to solve for them. Similar to 2SLS in the linear models, we introduce instrumental variables that allow us to establish the two equations. The previous core assumptions for the IV approach need to be modified to take the covariates into consideration. The three new assumptions (Hartford et al. 2017) are:

ASSUMPTION 1 (Relevance): The distribution F(T | X, Z) of the treatment T, given X and Z, is not constant in Z, i.e., F(T | X, Z) depends on Z.

ASSUMPTION 2: Z ⊥ Y | (X, T, ε).

ASSUMPTION 3 (Unconfounded Instrument): The instrumental variable Z is independent of the error, given the covariates X, i.e., Z ⊥ ε | X.

Assumption 2 implies

$$E[Y \mid X, Z] = E\big[g(T, X) + \varepsilon \mid X, Z\big]. \tag{7.41}$$

It follows from equation (7.37) that

$$E[Y \mid X, Z] = E[g(T, X) \mid X, Z] + E[\varepsilon \mid X, Z]. \tag{7.42}$$

By the definition of the conditional distribution, we obtain

$$E[g(T, X) \mid X, Z] = \int g(t, X)\, dF(t \mid X, Z). \tag{7.43}$$

By Assumption 3, we obtain

$$E[\varepsilon \mid X, Z] = E[\varepsilon \mid X]. \tag{7.44}$$

Combining equations (7.42)–(7.44), we obtain

$$E[Y \mid X, Z] = \int g(t, X)\, dF(t \mid X, Z) + E[\varepsilon \mid X] = \int h(t, X)\, dF(t \mid X, Z). \tag{7.45}$$

To solve equation (7.45) for $h(T, X)$, we solve the following optimization problem:

$$\hat{h} = \arg\min_{h \in \mathcal{H}} \sum_{i=1}^{n} \left(Y_i - \int h(t, X_i)\, dF(t \mid X_i, Z_i)\right)^2, \tag{7.46}$$

where $\mathcal{H}$ is a function space.

To remove the correlation between the treatment and the error (confounder), we divide the estimation into two stages. In the first stage, we estimate the conditional distribution F(T | X, Z) of the treatment T, given the IV Z and the covariates X. Then, after replacing F(T | X, Z) with its estimate F̂(T | X, Z), we solve the optimization problem (7.46).
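As a simplified illustration of the first stage, the sketch below models F(T | X, Z) as a Gaussian whose mean is linear in X and Z, a stand-in for the mixture-density networks or CGANs used in practice (all names and coefficients are hypothetical):

```python
import numpy as np

# Stage 1 sketch: estimate F(T | X, Z) with a simple Gaussian model
# (mean linear in X and Z, constant variance).  This is a simplified
# stand-in for the flexible conditional-density estimators the chapter cites.
rng = np.random.default_rng(3)
n = 50_000
X = rng.normal(size=n)
Z = rng.normal(size=n)
T = 1.0 + 0.5 * X + 0.8 * Z + rng.normal(scale=0.7, size=n)

# Fit E[T | X, Z] by least squares.
D = np.column_stack([np.ones(n), X, Z])
coef = np.linalg.lstsq(D, T, rcond=None)[0]
resid = T - D @ coef
sigma = resid.std()

# F_hat(T | x, z) = Normal(mean(x, z), sigma^2); draw samples from it.
def sample_treatment(x, z, size, rng=rng):
    mean = coef[0] + coef[1] * x + coef[2] * z
    return rng.normal(loc=mean, scale=sigma, size=size)

draws = sample_treatment(x=0.0, z=1.0, size=10_000)
print(draws.mean(), sigma)
```

These draws are exactly what is needed to approximate the integral in the objective (7.46) by Monte Carlo averaging in the second stage.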

Under the linear model discussed in Section 7.4.2:

$$Y = \beta_T T + \beta_X X + \varepsilon_Y, \tag{7.47}$$

$$T = \tau Z + \gamma X + \varepsilon_T, \tag{7.48}$$

where we assume that $E[\varepsilon_Y \mid X, Z] = 0$ and $E[\varepsilon_T \mid X, Z] = 0$.

Under the linear model, we obtain

$$\int h(t, X)\, dF(t \mid X, Z) = \beta_T E[T \mid X, Z] + \beta_X X. \tag{7.49}$$

Substituting equation (7.48) into equation (7.49), we obtain

$$\int h(t, X)\, dF(t \mid X, Z) = \beta_T(\tau Z + \gamma X) + \beta_X X. \tag{7.50}$$

Therefore, we first use least squares to solve the regression problem (7.48), which yields the estimators $\hat{\tau}$ and $\hat{\gamma}$. Then, we substitute $\hat{\tau}$ and $\hat{\gamma}$ into equation (7.50). Finally, we substitute equation (7.50) into the objective function of the optimization problem (7.46) and use least squares to solve the resulting optimization problem. The solution to this optimization problem replicates the results in Section 7.4.2.
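The linear special case can be sketched end to end: stage 1 fits the regression (7.48), and stage 2 performs least squares on the plug-in objective, as in equation (7.50) (coefficient values and names are illustrative):

```python
import numpy as np

# Linear special case of the two-stage procedure with covariates.
rng = np.random.default_rng(4)
n = 200_000
beta_T, beta_X = 2.0, -1.0                 # true coefficients in (7.47)

X = rng.normal(size=n)
Z = rng.normal(size=n)
eps = rng.normal(size=n)                   # latent confounder
T = 0.9 * Z + 0.4 * X + eps + rng.normal(size=n)            # (7.48), endogenous
Y = beta_T * T + beta_X * X + 2.0 * eps + rng.normal(size=n)  # (7.47)

# Stage 1: least squares for tau and gamma in (7.48).
D1 = np.column_stack([Z, X])
tau_gamma = np.linalg.lstsq(D1, T, rcond=None)[0]
T_fit = D1 @ tau_gamma                     # plug-in fitted treatment

# Stage 2: least squares of Y on (T_fit, X), the plug-in objective (7.46).
D2 = np.column_stack([T_fit, X])
est = np.linalg.lstsq(D2, Y, rcond=None)[0]
print(f"beta_T ~ {est[0]:.2f}, beta_X ~ {est[1]:.2f}")
```

Because the fitted treatment T_fit is a function of Z and X only, it is uncorrelated with the latent confounder, so the second stage recovers both coefficients consistently.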

We now introduce some principles for solving the optimization problem (7.46) with nonlinear models. Let

$$L(\theta) = \sum_{i=1}^{n} \left(Y_i - \int h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i)\right)^2, \tag{7.51}$$

where the nonlinear function $h(T, X, \theta)$ can be implemented by deep neural networks and $\theta$ denotes the parameters of the neural networks. Gradient-based algorithms are often used to solve the optimization problem (7.51). The gradient of $L(\theta)$ is

$$\nabla_\theta L(\theta) = -2\sum_{i=1}^{n} \left(Y_i - \int h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i)\right) \int \nabla_\theta h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i). \tag{7.52}$$

It can be calculated by Monte Carlo integration. The details are beyond the scope of this chapter; we refer the readers to Mohamed et al. (2019).
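The Monte Carlo approximation of this gradient can be sketched for a toy parametric h. Note that an unbiased gradient estimate requires two independent sets of treatment samples for the two integrals (Hartford et al. 2017); this simplified sketch reuses one set:

```python
import numpy as np

# Monte Carlo sketch of the gradient in (7.52) for a toy parametric model
# h(t, x, theta) = theta0 + theta1 * t + theta2 * x.  In practice h is a
# neural network; the integral over F_hat(t | x, z) is approximated by
# averaging over samples t_s drawn from the fitted treatment distribution.
rng = np.random.default_rng(5)

def h(t, x, theta):
    return theta[0] + theta[1] * t + theta[2] * x

def mc_loss_and_grad(theta, Y, X, t_samples):
    """t_samples[i] holds draws from F_hat(t | X_i, Z_i)."""
    n = len(Y)
    grad = np.zeros(3)
    loss = 0.0
    for i in range(n):
        ts = t_samples[i]
        h_bar = h(ts, X[i], theta).mean()   # Monte Carlo estimate of the integral
        resid = Y[i] - h_bar
        loss += resid ** 2
        # gradient of h with respect to theta, averaged over the draws
        g = np.array([1.0, ts.mean(), X[i]])
        grad += -2.0 * resid * g
    return loss / n, grad / n

# Toy data: 100 observations, 50 treatment draws each.
Y = rng.normal(size=100); X = rng.normal(size=100)
t_samples = [rng.normal(size=50) for _ in range(100)]
loss, grad = mc_loss_and_grad(np.zeros(3), Y, X, t_samples)
print(loss, grad)
```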

The conditional distribution $F(T \mid X_i, Z_i)$ can be generated by CGANs (Section 7.2.3.4), where X and Z are the condition variables and T is the target variable. The generated samples (T, X, Z) are then substituted into equation (7.52) to calculate the gradient. After the solution $\hat{h}(T, X)$ is calculated, we can estimate the treatment effect as follows.