Propensity Score Weighting
Weighting is another popular approach to propensity score adjustment. The idea originates in the survey sampling literature, specifically the Horvitz-Thompson estimator: the inverse of the propensity score can be viewed as a sampling weight for selecting subjects into treatment. It has been shown that the PATE can be estimated without bias using propensity score weighting, with the weights defined as follows.
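In one common notation, which is an assumption here rather than notation fixed by the text (N is the sample size, Z_i the treatment indicator, Y_i the observed outcome, and e(X_i) the propensity score given covariates X_i), the weighting estimator of the PATE can be written as:

```latex
\hat{\tau}_{\mathrm{PATE}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \left[ \frac{Z_i Y_i}{e(X_i)} - \frac{(1 - Z_i)\, Y_i}{1 - e(X_i)} \right]
```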
For treated subjects, the weight is the inverse of the propensity score; for control subjects, the weight is the inverse of one minus the propensity score. The weights create a pseudo-population in which all covariates are balanced at the population level. Confounding by those covariates is then no longer an issue in causal effect estimation.
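The weighting scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the simulated data, the single binary confounder x, the true effect of 2.0, and all function names are hypothetical choices made for the example. The propensity score is estimated as the treated fraction within each level of x.

```python
import random


def simulate(n=20000, seed=0):
    """Toy observational study with one binary confounder x.

    Assumed for illustration: the true treatment effect is 2.0 for everyone.
    Returns a list of (x, z, y) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e = 0.8 if x == 1 else 0.2            # true propensity score
        z = 1 if rng.random() < e else 0      # treatment depends on x
        y = 1.0 + 2.0 * z + 3.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, z, y))
    return data


def estimate_propensity(data):
    """Estimate e(x) = P(Z=1 | X=x) as the treated fraction at each level of x."""
    counts = {}
    for x, z, _ in data:
        n_x, t_x = counts.get(x, (0, 0))
        counts[x] = (n_x + 1, t_x + z)
    return {x: t_x / n_x for x, (n_x, t_x) in counts.items()}


def ipw_pate(data, e_hat):
    """Inverse probability weighting: treated get 1/e, controls get 1/(1-e)."""
    total = 0.0
    for x, z, y in data:
        e = e_hat[x]
        total += z * y / e - (1 - z) * y / (1 - e)
    return total / len(data)


def naive_difference(data):
    """Unadjusted difference in mean outcomes, for comparison."""
    treated = [y for _, z, y in data if z == 1]
    control = [y for _, z, y in data if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


data = simulate()
e_hat = estimate_propensity(data)
naive = naive_difference(data)
ipw = ipw_pate(data, e_hat)
print(f"naive difference: {naive:.3f}")
print(f"IPW estimate:     {ipw:.3f}")
```

In this toy setup the unadjusted difference is pulled away from the true effect because x drives both treatment and outcome, while the weighted estimate should land close to 2.0.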
From a practical perspective, the propensity score is not known in advance and researchers must estimate it. Whether parametric or nonparametric methods are used, there is always a chance that the propensity score is not estimated correctly. This is less of an issue for matching, because the estimated propensity score itself does not enter the estimator: as long as the matching achieves the desired balance, it serves the design purpose. The situation is different for weighting, where the estimated propensity score is part of the estimator, so a misspecified propensity score model can introduce substantial bias.
To overcome this, researchers have developed a so-called "doubly robust" estimation strategy, which combines the propensity score with a regression model to improve performance (Bang and Robins 2005). A regression model for the outcome is introduced to guard against potential misspecification of the propensity score model. Such a model is known as a structural model, to distinguish it from a conventional regression model, because it models the potential outcomes rather than the observed ones. A common type of structural model is the marginal structural model, which focuses on the marginal causal effect (Robins et al. 2000). The advantage of the doubly robust strategy is that it has two chances to yield a consistent estimate of the causal effect: (1) when the outcome model is incorrectly specified but the propensity score model is correctly specified, or (2) when the propensity score model is incorrectly specified but the outcome model is correctly specified. When both models are correct, it yields an efficient causal effect estimator (Tan 2007).
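The "two chances" property can be illustrated with the augmented inverse probability weighting (AIPW) form, one common doubly robust estimator; it is used here as an example and is not necessarily the specific estimator of the works cited above. The sketch below is built on assumptions: a simulated data set with a single binary confounder x, a true effect of 2.0, hypothetical function names, and "misspecified" working models that simply ignore x.

```python
import random


def simulate(n=20000, seed=0):
    """Toy data with one binary confounder x; true effect assumed to be 2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e = 0.8 if x == 1 else 0.2
        z = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * z + 3.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, z, y))
    return data


def fraction_treated(data, level=None):
    """Treated fraction overall (level=None) or within one level of x."""
    zs = [z for x, z, _ in data if level is None or x == level]
    return sum(zs) / len(zs)


def outcome_mean(data, arm, level=None):
    """Mean outcome in one treatment arm, overall or within one level of x."""
    ys = [y for x, z, y in data if z == arm and (level is None or x == level)]
    return sum(ys) / len(ys)


def aipw(data, e, m1, m0):
    """AIPW estimate; e, m1, m0 map each level of x to a fitted value."""
    total = 0.0
    for x, z, y in data:
        total += (m1[x] - m0[x]
                  + z * (y - m1[x]) / e[x]
                  - (1 - z) * (y - m0[x]) / (1 - e[x]))
    return total / len(data)


data = simulate()
levels = (0, 1)

# Correctly specified working models: saturated in the confounder x.
e_ok = {v: fraction_treated(data, v) for v in levels}
m1_ok = {v: outcome_mean(data, 1, v) for v in levels}
m0_ok = {v: outcome_mean(data, 0, v) for v in levels}

# Deliberately misspecified working models: they ignore x altogether.
e_bad = {v: fraction_treated(data) for v in levels}
m1_bad = {v: outcome_mean(data, 1) for v in levels}
m0_bad = {v: outcome_mean(data, 0) for v in levels}

both_ok = aipw(data, e_ok, m1_ok, m0_ok)
ps_only = aipw(data, e_ok, m1_bad, m0_bad)        # outcome model wrong
outcome_only = aipw(data, e_bad, m1_ok, m0_ok)    # propensity model wrong
neither = aipw(data, e_bad, m1_bad, m0_bad)       # both wrong

for name, value in [("both correct", both_ok),
                    ("only propensity correct", ps_only),
                    ("only outcome correct", outcome_only),
                    ("neither correct", neither)]:
    print(f"{name:>24}: {value:.3f}")
```

In this setup the first three estimates should sit near the assumed true effect of 2.0, while the last one inherits the full confounding bias, since no protection remains once both working models are wrong.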
Propensity Score Stratification
Stratification groups subjects with similar propensity score values, following the idea of block randomization. If the propensity score values are reasonably homogeneous within each stratum, confounding bias can be effectively removed by first estimating the causal effect within each stratum and then combining the estimates across all strata. Stratification is easier to implement than matching or weighting, because it requires neither special matching algorithms nor computational procedures for handling weights. Depending on the sample size, researchers may divide the data into 5 or 10 groups (or even more if the sample size allows). One key step is to make sure that each stratum contains enough treated and control subjects to support a valid estimate of the stratum-specific causal effect.
One major issue is that propensity score stratification by itself is often not sufficient to remove confounding. Grouping subjects into only a few strata may leave residual confounding within each stratum, so additional covariate adjustment is highly recommended (Imbens and Rubin 2015).
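The stratification procedure, and the residual confounding that coarse strata can leave behind, can be sketched as follows. Everything specific in this example is an assumption made for illustration: the simulated data, the true effect of 2.0, the use of the known propensity score in place of an estimated one, the equal-sized strata, and the hypothetical `min_per_arm` threshold for the number of treated and control subjects per stratum.

```python
import math
import random


def simulate(n=20000, seed=1):
    """Toy data with a continuous confounder x; true effect assumed to be 2.0.

    Returns (propensity, z, y) tuples. The true propensity score is used
    directly here; in practice it would have to be estimated.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        e = 1.0 / (1.0 + math.exp(-1.5 * x))
        z = 1 if rng.random() < e else 0
        y = 1.0 + 2.0 * z + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((e, z, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def stratified_estimate(rows, n_strata=5, min_per_arm=10):
    """Size-weighted average of within-stratum mean differences."""
    ordered = sorted(rows, key=lambda r: r[0])   # sort by propensity score
    n = len(ordered)
    estimate = 0.0
    for k in range(n_strata):
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        treated = [y for _, z, y in stratum if z == 1]
        control = [y for _, z, y in stratum if z == 0]
        # Each stratum needs enough subjects in both arms.
        if min(len(treated), len(control)) < min_per_arm:
            raise ValueError(f"stratum {k} has too few treated or control subjects")
        estimate += (len(stratum) / n) * (mean(treated) - mean(control))
    return estimate


rows = simulate()
naive = (mean([y for _, z, y in rows if z == 1])
         - mean([y for _, z, y in rows if z == 0]))
est5 = stratified_estimate(rows, n_strata=5)
est10 = stratified_estimate(rows, n_strata=10)
print(f"naive difference: {naive:.3f}")
print(f"5 strata:         {est5:.3f}")
print(f"10 strata:        {est10:.3f}")
```

In this simulation, five strata should remove most of the confounding bias but not all of it, and ten strata should reduce it further, which is consistent with the recommendation to add covariate adjustment within strata.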
Overall, all three strategies have been widely applied. Matching follows the randomization design principle most closely, which makes it especially suitable for non-model-based inference. Weighting combines more naturally with models and, in that combination, enjoys the doubly robust property. Stratification can be viewed as a coarsened matching design or as a special weighting strategy. Its main advantage is simplicity, which makes it accessible to those with a limited statistical background.