# Propensity Score Matching

Matching refers to grouping subjects into small matched sets so that each set contains both treated and untreated subjects who are similar in terms of the matching variables. Matching on propensity scores closely resembles a block-randomized design, in which treatment assignment is random within each block. Propensity score matching is generally considered advantageous in the following respects. (1) It is more robust, in the sense that it balances the covariate distributions between the treated and untreated groups nonparametrically and does not rely on a parametric outcome model. (2) It resembles a randomized design, which is easy to explain to a general audience. (3) It is more objective, in the sense that causal inference is conducted only after good matches have been established, and the outcome variable never enters the matching process. (4) Matching-based sensitivity analysis is well developed for assessing the impact of hidden bias.

Ideally, we would match exactly on every covariate to remove observed confounding. This is often impossible in practice, especially when there are many covariates. The propensity score is considered a dimension-reduction tool: it is a scalar, and matching on it can remove all bias related to treatment assignment under the ignorability assumption (Rosenbaum and Rubin 1983a). To implement matching successfully for causal inference, we need to carefully consider the matching design and algorithm, assess covariate balance after matching, select inferential procedures compatible with the matching structure, and conduct a sensitivity analysis. We briefly go over each of these in this subsection, except sensitivity analysis, which is covered in the next section. For a more detailed description of matching, readers may refer to Stuart (2010).
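To illustrate the dimension-reduction idea, the following Python sketch estimates propensity scores with a logistic regression. This is a common choice of model, though not the only one. The model is fitted by plain gradient ascent so that the example needs only the standard library. The names `fit_propensity` and `propensity_scores`, the learning rate, the iteration count, and the toy data are hypothetical choices made for illustration. In practice one would fit the model with an established statistical package.

```python
import math


def _sigmoid(eta):
    # Logistic function: maps a linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))


def fit_propensity(X, z, lr=0.1, n_iter=5000):
    """Fit P(Z = 1 | x) with a logistic model by batch gradient ascent on the
    average log-likelihood. X is a list of covariate rows, z a list of 0/1
    treatment indicators. Returns [intercept, slope_1, ..., slope_p]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = zi - _sigmoid(eta)  # score contribution of subject i
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Reduce each covariate vector to one number: its estimated propensity score."""
    return [_sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]


# Hypothetical toy data with one covariate; treated subjects tend to have larger x.
X = [[0.0], [1.0], [2.0], [3.0], [0.5], [1.5], [2.5], [3.5]]
z = [0, 0, 1, 1, 0, 1, 0, 1]
beta = fit_propensity(X, z)
scores = propensity_scores(X, beta)
```

Each covariate vector is reduced to a single estimated score, and subsequent matching operates on that scalar. At the maximum-likelihood fit of a logistic model with an intercept, the estimated scores sum to the number of treated subjects, which offers a quick sanity check.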

## Matching Design

Matching groups subjects into well-structured matched sets, and this structure may play a role in determining the statistical inference procedure. Each set must contain subjects from both the treated and control groups for comparison purposes. Matching design refers to the structure generated by the matching process. There are three general types of design: bipartite matching, non-bipartite matching, and poly-matching (Figure 7.1).
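The requirement that every matched set contain both treated and control subjects can be written as a simple check. The sketch below makes two hypothetical assumptions. First, matched sets are lists of subject indices, and treatment status is a parallel list of 0/1 flags. Second, no subject may belong to more than one set, as in matching without replacement.

```python
def check_matched_sets(matched_sets, treated):
    """Return True if every matched set has at least one treated and at least
    one control subject, and no subject appears in more than one set.
    matched_sets: list of lists of subject indices; treated: list of 0/1 flags."""
    seen = set()
    for s in matched_sets:
        flags = [treated[i] for i in s]
        if not any(flags) or all(flags):
            return False  # set lacks a treated or a control subject
        if seen.intersection(s):
            return False  # a subject is reused across sets
        seen.update(s)
    return True


treated = [1, 0, 1, 0, 0]
ok = check_matched_sets([[0, 1], [2, 3, 4]], treated)  # a pair and a 1-2 set -> True
```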

Bipartite matching is the most commonly used design. It is used when there are two well-defined treatment groups (i.e., treated vs. control), and matching is always conducted between the two groups. If one treated subject is matched to one control subject, this is known as a pair match. If one treated subject is matched to a fixed number *k* (*k* > 1) of control subjects, often to improve the efficiency of estimation, it is known as a 1-*k* match. A variation of 1-*k* matching is variable matching, in which one treated subject can be matched to multiple controls but the ratio is not fixed in advance, which allows better matching quality. To further improve matching quality, a full matching design may be considered, which allows one treated subject to be matched to multiple controls or one control subject to be matched to multiple treated subjects (Rosenbaum 1991).

FIGURE 7.1

Three types of matching designs
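A minimal sketch of bipartite 1-*k* matching is greedy nearest-neighbour matching on the estimated propensity score, without replacement and with an optional caliper. Greedy matching is used here only because it is simple. It is not necessarily the algorithm used in the cited works, and optimal matching algorithms can produce better overall matches. The function name, the caliper parameter, and the convention of processing treated subjects in decreasing order of score are illustrative assumptions.

```python
def greedy_match(scores, treated, k=1, caliper=None):
    """Greedy nearest-neighbour 1-k matching on a scalar score, without
    replacement. Returns a list of (treated_index, [control_indices])."""
    controls = sorted(i for i, t in enumerate(treated) if not t)
    # Hypothetical convention: match treated subjects with the highest scores
    # first, since they usually have the fewest comparable controls.
    order = sorted((i for i, t in enumerate(treated) if t),
                   key=lambda i: scores[i], reverse=True)
    matches = []
    for i in order:
        chosen = []
        available = list(controls)
        for _ in range(k):
            if not available:
                break
            j = min(available, key=lambda c: abs(scores[c] - scores[i]))
            if caliper is not None and abs(scores[j] - scores[i]) > caliper:
                break  # nearest remaining control lies outside the caliper
            chosen.append(j)
            available.remove(j)
        if len(chosen) == k:  # keep only complete 1-k sets
            matches.append((i, chosen))
            controls = available  # chosen controls leave the pool
    return matches
```

With `k=1` this produces a pair match. A treated subject for whom fewer than *k* acceptable controls remain is left unmatched, so every retained set keeps the fixed 1-*k* ratio.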

When there are multiple exposure groups, or no two clearly defined groups, non-bipartite matching is used to form pairs between any two groups where appropriate. For example, it can create matched pairs across multiple dose-level groups that maximize the dose difference within each pair, to explore potential treatment effects. It may also be used with longitudinal data to find the best matches across different time points. Interested readers may refer to Lu et al. (2011) for further details.
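Non-bipartite matching can be posed as finding a minimum-distance perfect pairing of all subjects, with no division into treated and control groups. The sketch below solves this exactly by dynamic programming over subsets. That approach is feasible only for a handful of subjects, and practical work uses dedicated optimal non-bipartite matching algorithms. The distance function is a hypothetical choice meant to favour pairs that are similar in a scalar covariate or score but far apart in dose. It divides the covariate difference by the dose difference and is infinite when the doses are equal. It is not taken from Lu et al. (2011).

```python
import math
from functools import lru_cache


def nonbipartite_match(x, dose):
    """Exact minimum-total-distance pairing of all subjects (even count only).
    Returns (total_distance, pairs); total is inf if no valid pairing exists."""
    n = len(x)
    if n % 2:
        raise ValueError("this sketch needs an even number of subjects")

    def dist(i, j):
        # Hypothetical distance: similar x and large dose gap give a small value.
        if dose[i] == dose[j]:
            return math.inf
        return abs(x[i] - x[j]) / abs(dose[i] - dose[j])

    @lru_cache(maxsize=None)
    def best(mask):
        # mask is a bitmask of subjects still unmatched.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest-index unmatched subject
        rest = mask & ~(1 << i)
        result = (math.inf, ())
        for j in range(n):
            if rest >> j & 1:
                cost, pairs = best(rest & ~(1 << j))
                cost += dist(i, j)
                if cost < result[0]:
                    result = (cost, ((i, j),) + pairs)
        return result

    return best((1 << n) - 1)
```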

Poly-matching is a relatively new design that creates matched sets containing one subject from each group when there are more than two exposure groups. It provides a simple, clean structure for statistical inference among multiple treatment groups, as it closely mimics a block-randomized design. An example of matching with three exposure groups, namely triplet matching, can be found in Nattino et al. (2019).
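For three equal-sized exposure groups, triplet matching can be illustrated by exhaustive search. The search tries every way of assigning one subject from each group to a triplet and keeps the assignment with the smallest total within-triplet distance. This brute-force sketch is feasible only for tiny groups and is not the algorithm of Nattino et al. (2019). Summarizing each subject by a single scalar score and using the sum of pairwise absolute differences as the distance are both assumptions made for illustration.

```python
import math
from itertools import permutations


def triplet_match(a, b, c):
    """a, b, c: scalar scores for three exposure groups of equal size.
    Returns (total_distance, [(i, j, k), ...]) minimising the summed distance."""
    n = len(a)
    if not (len(b) == len(c) == n):
        raise ValueError("this sketch assumes equal group sizes")

    def cost(i, j, k):
        # Sum of pairwise absolute differences within one triplet.
        return abs(a[i] - b[j]) + abs(a[i] - c[k]) + abs(b[j] - c[k])

    best_cost, best = math.inf, None
    for pb in permutations(range(n)):  # which b-subject joins a-subject i
        for pc in permutations(range(n)):  # which c-subject joins a-subject i
            total = sum(cost(i, pb[i], pc[i]) for i in range(n))
            if total < best_cost:
                best_cost, best = total, [(i, pb[i], pc[i]) for i in range(n)]
    return best_cost, best
```

The search grows as (*n*!)^2, which is why dedicated algorithms are needed for realistic sample sizes.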