# Instrumental Variables Networks for Treatment Effect Estimation in the Presence of Unmeasured Confounders

## Instrumental Variables Frameworks

The instrumental variables (IV) framework is a popular class of methods for the estimation of treatment effects in the presence of unmeasured confounders (Awan et al. 2019). Consider a trio for treatment effect estimation (Figure 7.3). The goal is to estimate a causal effect of a treatment T on an outcome Y. However, both the treatment T and the outcome Y can be affected by an unobserved confounder *U*. To remove the effect of the confounder *U*, we use an instrumental variable Z. The only observed variables are T, Y, and Z. The following three assumptions are essential to the success of the IV approach.

1. The instrument variable Z is associated with the treatment T.
2. The instrument variable Z is independent of the outcome Y given the treatment *T* and all confounders *U*, i.e., Y ⫫ Z | (T, *U*).
3. The instrument variable Z is independent of factors (measured and unmeasured) that confound the treatment-outcome relationship, i.e., Z ⫫ *U*.

Any violation of the three assumptions will lead to the failure of the instrument approach. The variational autoencoder (VAE) and principal component methods for dimension reduction can be used for finding instrument variables. However, the instrumental variables are problem-dependent. For example, they are commonly used to estimate exposure-outcome relationships in EHR studies, where a proxy that serves as a semi-randomized variable (such as hospital volume) can be discerned from the records. The selection of appropriate instrument variables is key to the success of treatment effect estimation in the presence of unmeasured confounders, and additional details are beyond the scope of this chapter.

## Two-Stage Least Square Methods with Linear Models

The methods for IV analysis include two-stage least squares (2SLS) (Angrist, Graddy, and Imbens 2000), the generalized method of moments (Hansen 1982; Bennett, Kallus, and Tobias 2019), the ratio of coefficients method (Burgess, Small, and Thompson 2017), and likelihood-based methods (Hayashi 2000). In this chapter, we focus on the most popular 2SLS method for IV analysis. The 2SLS method consists of two regression stages: the first-stage regression of the treatment variable on the IVs, and the second-stage regression of the potential outcome on the fitted values of the treatment from the first stage.

**Simple Linear Models**

Consider the causal model in Figure 7.3.

FIGURE 7.3

Trio for treatment effect estimation and instrument variable.

where T represents a treatment, Y is the outcome, *U* is the confounding variable, and G is an instrument variable. We assume that the instrument variable G is independent of the confounding variable *U* and is associated with the treatment T. The goal of the IV approach is the estimation of a causal effect of the treatment T on the outcome Y. The structural equations of the model are

$$T = \beta_{T|G} G + \beta_{T|U} U + \varepsilon_T, \quad (7.7)$$

$$Y = \beta_{Y|T} T + \beta_{Y|U} U + \varepsilon_Y, \quad (7.8)$$

where the errors $\varepsilon_T$ and $\varepsilon_Y$ are uncorrelated with G and *U*.

**Covariance Analysis**

Without loss of generality, we can assume that *Var(G)* = 1 and *Var(T)* = 1. By the assumption that the confounding variable *U* and the instrument variable G are independent (Figure 7.3), we conclude *cov(G, U)* = 0. Taking covariance with G on both sides of equation (7.7) yields

$$\operatorname{cov}(T, G) = \beta_{T|G}\operatorname{Var}(G) = \beta_{T|G}. \quad (7.9)$$

Similarly, taking covariance with G on both sides of equation (7.8), we obtain $\operatorname{cov}(Y, G) = \beta_{Y|T}\operatorname{cov}(T, G)$, which implies that

$$\beta_{Y|T} = \frac{\operatorname{cov}(Y, G)}{\operatorname{cov}(T, G)}. \quad (7.10)$$

Define the regression of Y on G:

$$Y = \beta_{Y|G} G + \varepsilon, \quad (7.11)$$

where $\varepsilon$ is uncorrelated with G.

Taking covariance with G, we obtain

$$\operatorname{cov}(Y, G) = \beta_{Y|G}\operatorname{Var}(G) = \beta_{Y|G}. \quad (7.12)$$

Substituting equations (7.9) and (7.12) into equation (7.10) leads to

$$\beta_{Y|T} = \frac{\beta_{Y|G}}{\beta_{T|G}}.$$

The causal effect of the treatment *T* on the outcome Y is estimated by

$$\hat{\beta}_{Y|T} = \frac{\hat{\beta}_{Y|G}}{\hat{\beta}_{T|G}}.$$
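The ratio estimator $\hat{\beta}_{Y|G}/\hat{\beta}_{T|G}$ can be checked on simulated data. The sketch below uses an illustrative linear data-generating process (all coefficients are assumed for the demonstration, with true causal effect 2.0) and compares the naive regression of Y on T, which is biased by the unmeasured confounder U, with the IV ratio estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated trio with an unmeasured confounder U (assumed data-generating
# process for illustration; all coefficients are arbitrary choices).
U = rng.normal(size=n)                        # unmeasured confounder
G = rng.normal(size=n)                        # instrument, independent of U
T = 0.8 * G + 1.0 * U + rng.normal(size=n)    # treatment, beta_{T|G} = 0.8
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)    # outcome, causal effect = 2.0

# Naive regression coefficient of Y on T is biased upward by U.
beta_naive = np.cov(Y, T)[0, 1] / np.var(T)

# Ratio estimator: beta_{Y|T} = beta_{Y|G} / beta_{T|G}.
beta_YG = np.cov(Y, G)[0, 1] / np.var(G)
beta_TG = np.cov(T, G)[0, 1] / np.var(G)
beta_iv = beta_YG / beta_TG

print(f"naive: {beta_naive:.3f}, IV ratio: {beta_iv:.3f}")
```

With a sample this large, the ratio estimate lands close to the true effect 2.0 while the naive estimate stays visibly biased.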

*Generalized Least Square Estimator*

Structural equations for the IV model in equations (7.7) and (7.8) are

$$Y = \beta_{Y|T} T + \beta_{Y|U} U + \varepsilon_1, \quad (7.13)$$

$$T = \beta_{T|G} G + \beta_{T|U} U + \varepsilon_2, \quad (7.14)$$

where $\varepsilon_1, \varepsilon_2$ are uncorrelated with *U* and G. Let

$$e_1 = \beta_{Y|U} U + \varepsilon_1, \quad (7.15)$$

$$e_2 = \beta_{T|U} U + \varepsilon_2. \quad (7.16)$$

Substituting equations (7.15) and (7.16) into equations (7.13) and (7.14), we obtain

$$Y = \beta_{Y|T} T + e_1, \quad (7.17)$$

$$T = \beta_{T|G} G + e_2, \quad (7.18)$$

where Y, T are endogenous variables (their values are determined or influenced by other variables in the system) and are collected into $W = [Y \;\; T]$; G is an exogenous variable (a variable that is not affected by the other variables in the system); and $e_1, e_2$ are uncorrelated with G.

Then, Equations (7.17) and (7.18) can be rewritten in a matrix form as

$$W\Gamma + GB + E = 0, \quad (7.19)$$

where

$$\Gamma = \begin{bmatrix} -1 & 0 \\ \beta_{Y|T} & -1 \end{bmatrix}, \quad B = [0 \;\; \beta_{T|G}], \quad E = [e_1 \;\; e_2].$$

Multiplying by $\Gamma^{-1}$ on both sides of Equation (7.19), we obtain $W + GB\Gamma^{-1} + E\Gamma^{-1} = 0$, which can be reduced to

$$W = G\Pi + V, \quad (7.20)$$

where $\Pi = -B\Gamma^{-1} = [\beta_{Y|T}\beta_{T|G} \;\; \beta_{T|G}]$ and $V = -E\Gamma^{-1}$.

Using least square methods to solve the regression problem (7.20), we obtain

$$\hat{\Pi} = (G^T G)^{-1} G^T W. \quad (7.21)$$

Using the relation $\Pi\Gamma = -B$, we obtain $(G^T G)^{-1} G^T W \Gamma = -B$, which implies

$$G^T W \Gamma = -G^T G B. \quad (7.22)$$

Expanding Equation (7.22), we obtain $G^T[-Y + \beta_{Y|T}T \;\; -T] = [0 \;\; -G^T G \beta_{T|G}]$. Therefore, we have

$$G^T Y = G^T T \beta_{Y|T}, \quad (7.23)$$

$$G^T T = G^T G \beta_{T|G}. \quad (7.24)$$

Equations (7.17) and (7.18) can be rewritten in a stacked form as

$$w = Z\beta + e, \quad (7.26)$$

where

$$w = \begin{bmatrix} Y \\ T \end{bmatrix}, \quad Z = \begin{bmatrix} T & 0 \\ 0 & G \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_{Y|T} \\ \beta_{T|G} \end{bmatrix}, \quad e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}.$$

Multiplying both sides of Equation (7.26) by $\tilde{G}^T$, where $\tilde{G} = \operatorname{diag}(G, G)$, we obtain

$$\tilde{G}^T w = \tilde{G}^T Z \beta + \tilde{G}^T e. \quad (7.27)$$

Note that the covariance matrix of the transformed error is

$$\Lambda = \operatorname{var}(\tilde{G}^T e) = E\left[\tilde{G}^T e e^T \tilde{G}\right]. \quad (7.28)$$

Using weighted least square methods, we obtain the generalized least square estimator:

$$\hat{\beta} = \left(Z^T \tilde{G} \Lambda^{-1} \tilde{G}^T Z\right)^{-1} Z^T \tilde{G} \Lambda^{-1} \tilde{G}^T w. \quad (7.29)$$

If we assume $\Lambda = \sigma^2 I$, then Equation (7.29) is reduced to

$$\hat{\beta} = \left(Z^T \tilde{G} \tilde{G}^T Z\right)^{-1} Z^T \tilde{G} \tilde{G}^T w. \quad (7.30)$$

*Two-Stage Least Square Method*

Now we study the relationship between the generalized least square estimator and two-stage least square estimator.

Let

$$\hat{T} = G \hat{\beta}_{T|G}, \quad (7.31)$$

where

$$\hat{\beta}_{T|G} = (G^T G)^{-1} G^T T.$$

Equation (7.31) can be reduced to

$$\hat{T} = G (G^T G)^{-1} G^T T = P_G T, \quad (7.32)$$

where $P_G = G (G^T G)^{-1} G^T$ is the projection matrix onto the column space of G.

Consider the two-stage least square estimator.

Stage 1. Regress the treatment T on the instrument G:

$$T = G \beta_{T|G} + e_2. \quad (7.33)$$

Solving the regression, we obtain

$$\hat{\beta}_{T|G} = (G^T G)^{-1} G^T T \quad \text{and} \quad \hat{T} = G \hat{\beta}_{T|G}. \quad (7.34)$$
Stage 2. Regress the outcome Y on the fitted values $\hat{T}$:

$$Y = \hat{T} \beta_{Y|T} + e_1.$$

Solving the regression, we obtain

$$\hat{\beta}_{Y|T} = \left(\hat{T}^T \hat{T}\right)^{-1} \hat{T}^T Y. \quad (7.35)$$

Substituting Equation (7.32) into Equation (7.35), we obtain

$$\hat{\beta}_{Y|T} = \left(T^T P_G T\right)^{-1} T^T P_G Y, \quad (7.36)$$

which is the same as Equation (7.30).
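The equivalence between the explicit two-stage procedure and the closed-form expression (7.36) can be verified numerically. A minimal sketch, assuming simulated confounded data with a single instrument and true effect $\beta_{Y|T} = 2.0$ (the data-generating coefficients are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative confounded data-generating process (assumed for the demo).
U = rng.normal(size=n)
G = rng.normal(size=n)
T = 0.8 * G + U + rng.normal(size=n)
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)

# Stage 1: regress T on G and form fitted values T_hat = G (G'G)^{-1} G'T.
beta_TG = (G @ T) / (G @ G)
T_hat = G * beta_TG

# Stage 2: regress Y on the fitted values T_hat, Equation (7.35).
beta_2sls = (T_hat @ Y) / (T_hat @ T_hat)

# Closed form (7.36): (T'G G'T)^{-1} T'G G'Y, since G'G is a scalar here.
beta_closed = ((T @ G) * (G @ Y)) / ((T @ G) * (G @ T))

print(beta_2sls, beta_closed)
```

The two numbers agree exactly (up to floating-point rounding), and both approach the true effect 2.0 as the sample grows.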

**Nonlinear Models for Two-Stage Least Squares Approach**

Here we consider nonlinear models for the 2SLS approach to treatment effect estimation, where a covariate vector X is included (Figure 7.4). We introduce the following nonlinear structural equation model with additive latent errors that include the confounders (Mohamed et al. 2019; Hartford et al. 2017):

$$Y = g(T, X) + e, \quad (7.37)$$

where Y is the outcome variable, T is the treatment variable, X is a vector of covariates, *g(.,.)* is a nonlinear function, and *e* is the error including the unobserved confounders. We assume that E[*e*] = 0. We also assume that the treatment variable is endogenous and is correlated with the error, i.e., E[*e* | X, *T*] ≠ 0 and *E[eT* | X] ≠ 0.

Define the potential outcome prediction function (Hartford et al. 2017)

$$h(T, X) = g(T, X) + E[e \mid X]. \quad (7.38)$$

Recall that we assume that the treatment variable is correlated with the error, i.e.,

$$E[e \mid X, T] \neq 0, \quad (7.39)$$

which implies that

$$E[Y \mid T, X] = g(T, X) + E[e \mid X, T]. \quad (7.40)$$

FIGURE 7.4

A general graphical model for IV and treatment effect estimation.

Since *E[e* | X, *T]* ≠ 0 and is unknown, Equation (7.40) shows that E[Y | T, X] cannot be directly estimated and cannot be estimated by *h*(*T*, *X*). The two unknown quantities *g(T, X)* and E[*e* | *T, X*] require two equations to solve for them. Similar to 2SLS in the linear models, we introduce instrument variables that allow the establishing of two equations. The previous core assumptions for the IV approach need to be modified to take the covariates into consideration. The new three assumptions (Hartford et al. 2017) are

ASSUMPTION 1 (Relevance): The distribution F(T | X, Z) of the treatment T, given X and Z, is not constant in Z, i.e., F(T | X, Z) depends on Z.

ASSUMPTION 2 (Exclusion): Z ⫫ Y | (X, *T, e*).

ASSUMPTION 3 (Unconfounded Instrument): The instrument variable Z is independent of the error, given the covariates X, i.e., Z ⫫ *e* | X. Assumption 3 implies

$$E[e \mid X, Z] = E[e \mid X]. \quad (7.41)$$

It follows from Equation (7.37) that

$$E[Y \mid X, Z] = E[g(T, X) \mid X, Z] + E[e \mid X, Z]. \quad (7.42)$$

By definition of the conditional distribution, we obtain

$$E[g(T, X) \mid X, Z] = \int g(t, X)\, dF(t \mid X, Z). \quad (7.43)$$

By Assumption 3, we obtain

$$E[e \mid X, Z] = E[e \mid X]. \quad (7.44)$$

Combining Equations (7.42)-(7.44), we obtain

$$E[Y \mid X, Z] = \int g(t, X)\, dF(t \mid X, Z) + E[e \mid X] = \int h(t, X)\, dF(t \mid X, Z). \quad (7.45)$$

To solve Equation (7.45) for *h(T,* X), we solve the following optimization problem:

$$\hat{h} = \arg\min_{h \in H} \sum_{i=1}^{n} \left( Y_i - \int h(t, X_i)\, dF(t \mid X_i, Z_i) \right)^2, \quad (7.46)$$

where *H* is a function space.

To remove the correlation between the treatment and the error (confounder), we divide the estimation into two stages. At the first stage, we estimate the conditional distribution *F(T* | X, Z) of the treatment *T*, given the IV Z and covariates X. Then, after replacing F(T | X, Z) by its estimate F̂(T | X, Z), we solve the optimization problem (7.46).

Under the linear model discussed in Section 7.4.2,

$$Y = \beta T + \alpha^T X + e_Y, \quad (7.47)$$

$$T = \tau^T Z + \gamma^T X + e_T, \quad (7.48)$$

where we assume that $E[e_Y \mid X, Z] = 0$ and $E[e_T \mid X, Z] = 0$.

Under the linear model, we obtain

$$E[Y \mid X, Z] = \beta E[T \mid X, Z] + \alpha^T X. \quad (7.49)$$

Substituting Equation (7.48) into Equation (7.49), we obtain

$$E[Y \mid X, Z] = \beta\left(\tau^T Z + \gamma^T X\right) + \alpha^T X. \quad (7.50)$$

Therefore, we first use least square methods to solve the regression problem (7.48), which results in the estimators $\hat{\tau}$ and $\hat{\gamma}$. Then, we substitute $\hat{\tau}$ and $\hat{\gamma}$ into Equation (7.50). Finally, we substitute Equation (7.50) into the objective function in the optimization problem (7.46) and use least square methods to solve the resulting optimization problem. The solution to the optimization problem replicates the results in Section 7.4.2.
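This two-stage recipe for the linear model with covariates can be sketched numerically. The data-generating process below is hypothetical (all coefficients assumed, true treatment effect $\beta = 2.0$), with variable names mirroring Equations (7.47) and (7.48):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Illustrative linear model with covariate X, instrument Z, and a shared
# unmeasured confounder U entering both errors (assumed for the demo).
U = rng.normal(size=n)
X = rng.normal(size=n)
Z = rng.normal(size=n)
T = 0.7 * Z + 0.5 * X + U + rng.normal(size=n)        # Equation (7.48)
Y = 2.0 * T - 1.0 * X + 1.5 * U + rng.normal(size=n)  # Equation (7.47)

def ols(M, y):
    """Least-squares coefficients (M'M)^{-1} M'y."""
    return np.linalg.solve(M.T @ M, M.T @ y)

ones = np.ones(n)
# Stage 1: regress T on (1, Z, X) to estimate tau, gamma and form T_hat.
M1 = np.column_stack([ones, Z, X])
T_hat = M1 @ ols(M1, T)

# Stage 2: regress Y on (1, T_hat, X); the T_hat coefficient estimates beta.
M2 = np.column_stack([ones, T_hat, X])
beta_hat = ols(M2, Y)[1]
print(f"beta_hat = {beta_hat:.3f}")
```

Despite the confounder U entering both equations, the second-stage coefficient on the fitted treatment recovers the true effect 2.0 up to sampling error.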

Now we introduce some principles for solving the optimization problem (7.46) with nonlinear models. Let

$$L(\theta) = \sum_{i=1}^{n} \left( Y_i - \int h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i) \right)^2, \quad (7.51)$$

where the nonlinear function *h*(T, X, *θ*) can be implemented by deep neural networks and *θ* are the parameters of the neural networks. Gradient-based algorithms are often used to solve the optimization problem (7.51). The gradient of *L(θ)* is

$$\nabla_\theta L(\theta) = -2 \sum_{i=1}^{n} \left( Y_i - \int h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i) \right) \int \nabla_\theta h(t, X_i, \theta)\, d\hat{F}(t \mid X_i, Z_i). \quad (7.52)$$

It can be calculated by Monte Carlo integration. The details are beyond the scope of this chapter. We refer the readers to Mohamed et al. (2019).
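As a toy illustration of the two-stage scheme with Monte Carlo integration, the sketch below substitutes a simple Gaussian first stage for the CGAN/conditional-density machinery and a linear-in-parameters h for a neural network, so it is a simplified stand-in for the method of Hartford et al. (2017) rather than that method itself; all data-generating coefficients are assumed and there are no covariates X:

```python
import numpy as np

rng = np.random.default_rng(3)
n, S = 5_000, 100          # observations, Monte Carlo draws per observation

# Illustrative confounded data with a nonlinear g (assumed DGP; the true
# counterfactual prediction function is h(t) = sin(2t) + 2t).
U = rng.normal(size=n)
Z = rng.normal(size=n)
T = 0.8 * Z + 0.5 * U + 0.3 * rng.normal(size=n)
Y = np.sin(2 * T) + 2.0 * T + U + 0.3 * rng.normal(size=n)

# Stage 1: estimate F(T|Z) as N(tau*Z, sigma^2) by least squares -- a
# Gaussian stand-in for the CGAN first stage described in the text.
tau = (Z @ T) / (Z @ Z)
sigma = np.std(T - tau * Z)

# Stage 2: h(t, theta) = theta[0]*sin(2t) + theta[1]*t, fitted by gradient
# descent on the Monte Carlo approximations of (7.51) and (7.52).
t_draws = tau * Z[:, None] + sigma * rng.normal(size=(n, S))  # draws from F_hat
phi_bar = np.stack([np.sin(2 * t_draws).mean(axis=1),         # MC integral of
                    t_draws.mean(axis=1)], axis=1)            # each feature

theta = np.zeros(2)
for _ in range(1000):
    resid = Y - phi_bar @ theta           # y_i - integral of h(t, theta) dF_hat
    grad = -2.0 * phi_bar.T @ resid / n   # Monte Carlo gradient of L(theta)
    theta -= 0.2 * grad

print(theta)   # approaches roughly the true coefficients (1.0, 2.0)
```

Replacing the fixed feature map with a neural network, and the Gaussian first stage with a learned conditional distribution, yields the full nonlinear procedure; the Monte Carlo averaging over draws from F̂ is the step that removes the confounding.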

The conditional distribution F(T | X_i, Z_i) can be generated by CGANs (Section 7.2.3.4), where X and Z are condition variables and T is a target variable. The generated *(T, X, Z)* will be substituted into Equation (7.52) to calculate the gradient. After the solution ĥ(*T*, *X*) is calculated, we can estimate the treatment effect as follows.