Experimental and quasi-experimental research designs in economics
In this section, I analyze four case studies of research methods that allow for uncovering relations invariant under interventions. The number of economic studies that use experimental or quasi-experimental research design (design-based econometrics) is rapidly growing (Hamermesh 2013; Meyer 1995). The following two studies exemplify quasi-experimental research design. Doyle’s (2007) study of the influence of foster care on children’s wellbeing and income uses an instrumental-variable design to construct a quasi-experiment (Section 6.2.1). Pop-Eleches’ (2006) analysis of the introduction of an abortion ban in Romania employs a regression-discontinuity design to study its influence on children’s socioeconomic wellbeing. The latter two studies instantiate two experimental approaches. Hussam et al. (2008) conducted a laboratory financial market experiment to study how the experience of market participants influences the emergence of bubbles. Finally, I analyze a case of the gold standard of causal evidence: Dupas and Robinson (2013) conducted a randomized field experiment to address the question of why the saving propensity among the poor is low. In the following Section 6.3, I analyze what types of policymaking are justified by this evidence and deal with the problem of extrapolation.
Instrumental variable (IV) estimation as a quasi-experiment
Joseph Doyle (2007) employed instrumental-variable estimation to assess the effects of foster care on children’s wellbeing. On the basis of the study, Doyle (2007, p. 1583) concluded that “the results suggest that children on the margin of placement tend to have better outcomes when they remain at home, especially older children.” Contrary to other research methods, economists using quasi-experimental research designs do not refrain from explicit causal language: the discussion of causal effects (not associations or determinants, as in the case of some econometric studies, cf. Chapters 2–3) is present throughout the whole paper.
Before Doyle’s study, knowledge of the effects of foster care on children’s wellbeing, health, and income had been limited. Direct comparisons of the wellbeing of children experiencing foster care and those growing up in their families lead to misleading results, since children taken from their homes (due to family problems) are already at risk of experiencing socioeconomic difficulties in adulthood. A direct comparison of the two groups is therefore uninformative because of endogeneity and selection bias: being placed in foster care is caused by children’s characteristics (i.e., behavior and family background) that also influence their wellbeing in adulthood. Furthermore, Doyle (2007) pointed out that no long-term data are available for children who have been investigated for abuse or neglect but have remained with their families.
To estimate the effects of foster care on children, Doyle (2007) used a feature of the foster care system that provides quasi-randomization. Specifically, child protection investigators are assigned to children on a rotational basis in a way that equalizes the workload of each investigator. Furthermore, decisions about removing children from their families and assigning them to foster care depend on an investigator’s commonsense judgment rather than explicit rules (p. 1588), and therefore they differ considerably. Some investigators of child abuse cases tend to place children in foster care, while others decide to leave children with their biological parents. The quasi-random assignment of cases (children) to investigators allows for estimating the difference in outcomes (the local average treatment effect, or LATE).
Given that LATE is the difference in outcomes between children who have received the treatment and those who would not have received it even though they are eligible (e.g., abused sufficiently to justify placement in foster care), the definition of causality accepted implicitly by Doyle seems to be connected with a counterfactual formulation of the manipulationist theory. To proceed, Doyle (2007) constructed an instrumental variable denoting the tendency of an investigator to place children in foster care. The tendency is defined as the ratio of children placed in foster care by a given investigator divided by the average ratio for all investigators. The IV is (statistically) independent of the characteristics of children, which was tested by means of a linear regression model. This allows for estimating the marginal treatment effect (MTE). MTE denotes the benefit (or harm) from treatment for individuals located precisely at the threshold of treatment (placement in foster care). This, in turn, indicates that the study allows uncovering what action brings about the effect of improved children’s wellbeing. As Doyle (2007, p. 1589) put it, “the results will consider the effect of assignment to different types of case managers, categorized by their rate of foster care placement, on long-term child outcomes.”
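The logic of this identification strategy can be illustrated with a stylized simulation (all magnitudes and functional forms below are hypothetical, not taken from Doyle’s paper): an unobserved confounder drives both placement and outcomes, so a naive regression is biased, while using the investigator’s placement tendency as an instrument recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder: case severity affects both placement and outcomes.
severity = rng.normal(size=n)
# Instrument: the assigned investigator's tendency to place children in care,
# quasi-randomly assigned and hence independent of severity.
strictness = rng.normal(size=n)
# Placement depends on the investigator's strictness and on case severity.
placement = (0.8 * strictness + severity + rng.normal(size=n) > 0).astype(float)
# True causal effect of placement is -0.5; severity also worsens outcomes.
outcome = -0.5 * placement - 1.0 * severity + rng.normal(size=n)

# Naive regression slope: biased because severity is omitted.
cov_dy = np.cov(placement, outcome)
naive = cov_dy[0, 1] / cov_dy[0, 0]

# IV (Wald) estimator: cov(Z, Y) / cov(Z, D), exploiting that the
# instrument is independent of the confounder.
iv = np.cov(strictness, outcome)[0, 1] / np.cov(strictness, placement)[0, 1]

print(naive, iv)  # naive is far more negative than -0.5; iv is close to -0.5
```

The sketch only mimics the structure of the design; Doyle’s actual estimation involves covariates and a binary-outcome setting.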
What is vital for the policy implications of Doyle’s (2007) quasi-experimental study is that the IV estimation allowed for estimating the marginal treatment effect; i.e., the effect of being placed in foster care for children who are on the border between being left at their family home and being taken away. While the author conducted a robustness check that allows for concluding that this result is
representative of a larger group of children (those who are not precisely at the threshold), the effect may be different for those who are, according to contemporary standards, unlikely to be put in foster care. On the other hand, children who suffer severely in their family homes may benefit from foster care (despite the estimated negative effect of the intervention) due to the grave consequences of staying at home (e.g., homicide, serious physical abuse, etc.). Furthermore, the study used data on all children investigated for abuse in Illinois in the chosen years and therefore allowed for estimating effects of the intervention (placement in foster care) that are representative of that population. Doyle (2007) observed some geographical heterogeneity in the sample (children in some counties are more likely to be put in foster care), suggesting that the population is not homogeneous; this may undermine extrapolating the result to other states (cf. Section 6.3).
The relata of Doyle’s causal claim are variables, of which two types occur. First, digital variables can be interpreted as representing either events (such as becoming a teen mother) or features of instances (e.g., being Hispanic or white). Second, variables stand for measurable values (e.g., income). Overall, the variables denote attributes of instances or characteristics of children and investigators, and hence the relata seem to be features of phenomena.
Whether the causal claim can be trusted to be implementation neutral depends on whether the instrument allows for ‘as-if’ random allocation between control and treatment groups. As Joshua Angrist et al. (1996, p. 454) put it, “the strong assumptions [are] required for a causal interpretation of IV estimates”. If these assumptions are not fulfilled, then the usual criticism of econometric models applies (cf. Section 3.3). However, given the successful quasi-randomization, Doyle’s (2007) evidence justifies the belief that the relation is invariant under intervention.
Natural experiments: regression discontinuity design
While IV estimation, exemplified previously, is a popular method of analyzing cross-sectional data, time-series analysis has been dominated by regression-discontinuity design (RDD). This approach is based on the comparison of data from the periods just before and just after an intervention. The difference between the two groups mimicking control and treatment groups is considered to result from the intervention under consideration. A slightly different approach is to measure differences in trends before and after interventions, which is known as the difference-in-differences design. Even though these quasi-experimental studies are representative of design studies in macroeconomics, I have chosen the paper of Cristian Pop-Eleches (2006), focusing on the influence of an abortion ban on children’s educational attainment and labor market outcomes, because this study is an excellent example of how quasi-experimental research design can suffer from confounding.
Pop-Eleches (2006) studied the influence of the abortion ban introduced in Romania in 1966 on children’s economic success and educational attainment. His main conclusion is that “children born after the ban on abortions had worse educational and labor market achievements as adults” (p. 744). However, a straightforward application of the regression-discontinuity design, based on the comparison of pre-intervention and post-intervention samples (children born in the period of a few months before and after the ban was introduced), leads to the opposite conclusion. Such a simple analysis relies on estimating the following linear regression (Pop-Eleches 2006, p. 753):
OUTCOME_i = α0 + α1 · after_i + ε_i

where:
OUTCOME_i = digital variables measuring educational attainment and labor market outcomes for adults
α0 = the probability of succeeding for children not affected by the abortion ban
α1 = the influence of being born after the ban on the probability of succeeding
after_i = a digital variable indicating that the i-th child was affected by the abortion ban
ε_i = the error term
Surprisingly, such a simple design of the natural experiment led to results opposite to the expected outcomes of the abortion ban. The estimates (a positive α1) showed that, on average, children born after the abortion ban were more likely to have better education and a higher position in the labor market. This result opposes previous studies of the influence of the liberalization of abortion law in the United States and other countries, which have reported an improvement of children’s situation after the increase in the number of abortions (e.g., Levine et al. 1996; Koch et al. 2012). However, adding further explanatory variables to the regression (i.e., observable characteristics of children such as familial and economic background) leads to the reversal of the preliminary finding:
OUTCOME_i = β0 + β1 · after_i + β2 · X_i + ε_i

where:
X_i = the vector of variables characterizing the i-th child’s background
β0 = the constant
β1 = the influence of being born after the ban on the probability of succeeding
β2 = the influence of child characteristics on the probability of success
When the regression controlling for confounders is estimated, the abortion ban has a negative and not a positive effect; i.e., β1 < 0. Pop-Eleches (2006) explains the reversal of the average treatment effect by the fact that educated women were the group most affected by the ban (because abortions before 1966 had been most frequent among the members of this group). Given that children born to higher-educated women have a higher chance of finishing their education, and given the unequal influence of the abortion ban on different social groups, the simple regression (not controlling for confounding) delivers results contrary to the actual influence of the abortion ban on socioeconomic outcomes.
The results of Pop-Eleches (2006) show how the use of a quasi-experimental research design (RDD, in this case) may produce spurious results when there are confounders that bias the estimate and create non-random assignment to treatment and control groups. It follows that ascribing an observed effect to an intervention requires careful examination of whether any confounding factors may be present. If there are no confounders, or their effects are taken into account, then the regression-discontinuity design allows for uncovering a relation that is invariant under intervention. Taking into account that the abortion ban is a human action and the effects follow from that action, a version of Menzies and Price’s (1993) agency theory of causality generalized to a probabilistic context seems to be a good candidate for the view implicitly accepted by economists using RDD.
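The sign reversal that Pop-Eleches documents can be reproduced in a toy simulation (all magnitudes below are invented for illustration): if children of higher-educated mothers are overrepresented among post-ban births, and those children succeed more often, a comparison that ignores maternal education shows a positive ‘effect’ of the ban even though the within-group effect is negative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical setup: half of the mothers are higher-educated.
educated = rng.random(n) < 0.5
# Educated mothers are the group most affected by the ban, so their
# children are overrepresented among post-ban births.
p_after = np.where(educated, 0.8, 0.2)
after = rng.random(n) < p_after
# Success probability: education helps (+0.3); the ban itself hurts (-0.1).
p_success = 0.5 + 0.3 * educated - 0.1 * after
success = rng.random(n) < p_success

# Simple comparison with no controls: confounded, wrong (positive) sign.
simple = success[after].mean() - success[~after].mean()

# Controlling for education: average the within-group contrasts.
adjusted = np.mean([
    success[after & g].mean() - success[~after & g].mean()
    for g in (educated, ~educated)
])

print(simple, adjusted)  # simple is positive, adjusted is negative
```

The stratified contrast plays the role of the control variables X_i in the second regression: once maternal background is held fixed, the estimated effect of the ban turns negative.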
However, the design only allows for uncovering the average treatment effect. As in the previous example, predicting the effects of the intervention for each potentially affected child is impossible. This hints that economists using quasi-experimental research designs accept a version of the manipulationist definition focusing on type-level relations.
Furthermore, the analysis of the effects of the abortion ban indicates that even evidence supporting manipulationist claims cannot be extrapolated to other contexts without caution, because other causal factors can play a role. In the case of abortion legislation, opposite effects can be observed for the American and Romanian populations as long as one does not control for confounders. The liberalization of abortion law in the United States reduced crime rates and had a positive impact on the socioeconomic outcomes of affected children because the procedure is mostly used by mothers who face social and economic difficulties themselves. By contrast, the Romanian ban on abortion affected mostly mothers living in cities.
Introducing statistical control of confounders requires knowledge of the factors that influence the effects of an intervention. Given that our knowledge of other important factors may be limited (some confounders remain unknown) or mistaken (controlling for ‘confounders’ that are not causally related but only correlated with outcomes may lead to spurious causal claims), quasi-experimental research designs deliver results less reliable than the gold standard of causal inference. This is because randomization allows, at least in principle, for the construction of control and treatment groups influenced by confounders in the same way, so that any difference can only result from the intervention.
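A minimal sketch of why randomization handles this problem (again with hypothetical numbers): when treatment is assigned by coin flip, any confounder is balanced across groups by construction, so the simple difference in means recovers the true effect without any knowledge of the confounders.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

confounder = rng.normal(size=n)   # need not be observed, or even known
treated = rng.random(n) < 0.5     # randomized assignment, independent of it
# True treatment effect is 1.0; the confounder strongly affects outcomes.
outcome = 1.0 * treated + 2.0 * confounder + rng.normal(size=n)

# Difference in means is unbiased for the true effect despite the confounder.
effect = outcome[treated].mean() - outcome[~treated].mean()
print(effect)  # close to 1.0
```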
Laboratory experiments

According to orthodox neoclassical economics, prices on financial markets reflect the fundamental value of assets. While this view has faced considerable criticism, especially after the 2007–2008 financial crisis (e.g., Krugman 2009),
it persists despite contradicting empirical evidence. One phenomenon in disagreement with market efficiency is bubbles. The volatility of stock prices cannot be explained by changes in real values such as predicted stock returns (West 1988; Vogel and Werner 2015), and this evidence indicates that bubbles do happen. Unfortunately, the high uncertainty of financial predictions and the epistemic inaccessibility of the decision-making processes of market participants make it impossible to definitively describe a price trend as a bubble before it bursts. It follows that studying the development of bubbling markets in the world has serious epistemic limitations.
For this reason, economists construct asset markets in a laboratory and hope that changing the conditions under which market participants make their decisions, and observing the effects, will shed light on how these markets work and why they diverge from the ideal of an efficient market. The study of Reshmaan Hussam et al. (2008) is a representative example of this type of laboratory experiment. Game-theoretic experiments are another common type (Maziarz 2018). In their case, economists construct game settings (e.g., the prisoner’s dilemma or the ultimatum game) to test predictions of the rational expectations model and observe how actual decisions diverge from this ideal.
Hussam et al. (2008) conducted a series of laboratory market experiments aimed at assessing how the learning and experience of market participants influence price bubbles. As in the quasi-experimental studies considered previously, the analysis involves explicitly causal talk throughout the paper. The two causal claims formulated by the authors state that “[e]xperience reduces the amplitude of a bubble significantly” and “[e]xperience significantly reduced the duration of a bubble” (Hussam et al. 2008, pp. 933–934).
To obtain these results, the economists conducted canonical asset market experiments on a group of undergraduate students. The setting involves an asset that lasts for 15 trading sessions and pays a random dividend at each session (the distribution of dividends is constant throughout all 15 sessions). Hence, the fundamental value of the asset should start at 15 times the expected value of the dividend and decline linearly. At the start of each experiment, participants in a given round received one of three portfolios including cash and the asset in different proportions. In order to manipulate the level of experience of the participants, Hussam et al. (2008) divided the participating students into two groups. The first group took part in two 15-session rounds. These were learning sessions aimed at obtaining experienced students, and their results were not taken into account.
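The fundamental value schedule of such a declining-dividend asset can be written out explicitly; the expected dividend used below is a made-up placeholder, not the figure used by Hussam et al.:

```python
# Fundamental value of an asset that pays a random dividend with a constant
# expected value in each remaining session: FV_t = (T - t + 1) * E[d],
# for sessions t = 1, ..., T.
def fundamental_values(sessions=15, expected_dividend=0.24):
    return [(sessions - t + 1) * expected_dividend
            for t in range(1, sessions + 1)]

fv = fundamental_values()
# Starts at 15 * E[d] and declines linearly to E[d] in the final session.
print(fv[0], fv[-1])
```

Any sustained deviation of the trading price from this linearly declining schedule is what the bubble measures discussed below are meant to capture.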
Later, these experienced students were randomly divided so that they participated in trading sessions with either inexperienced students or students who had completed one or two rounds. Furthermore, Hussam et al. (2008) modified the variance of the dividend and the cash value owned by participants (the rekindle treatment). These two interventions have been shown in other experiments to promote bubbles (e.g., Lei et al. 2001). The purpose of these modifications was to test the robustness of the results. To assess the degree to which asset markets in
each round bubbled, Hussam et al. (2008, p. 932) developed several measures of bubbling, such as the amplitude of prices and the duration of periods in which the asset was traded at price levels different from its fundamental value. The results indicated that the experience of experimentees has a negative effect on bubbles, reducing their amplitude and duration. The effects of experience and of the rekindle treatment were similar in size. This suggests that bubbles can appear despite the experience of traders because of changes in the market environment such as dividend payoffs.
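These measures can be given a simplified operational form. The definitions below are schematic versions in the spirit of the amplitude and duration measures used in the experimental asset-market literature, not Hussam et al.’s exact formulas:

```python
import numpy as np

def amplitude(prices, fundamentals):
    """Range of price deviations from fundamental value, normalized by the
    initial fundamental value."""
    dev = (prices - fundamentals) / fundamentals[0]
    return dev.max() - dev.min()

def duration(prices, fundamentals):
    """Length of the longest run of consecutive sessions in which the asset
    traded above its fundamental value."""
    best = run = 0
    for p, f in zip(prices, fundamentals):
        run = run + 1 if p > f else 0
        best = max(best, run)
    return best

# Toy 15-session market: fundamental value declines linearly while a
# bubble inflates and bursts mid-way (all numbers are illustrative).
fv = np.arange(15, 0, -1) * 10.0
bubble = np.array([0, 0, 20, 30, 40, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0], float)
prices = fv + bubble
print(amplitude(prices, fv), duration(prices, fv))
```

On this toy series, experience-reducing treatments would show up as smaller values of both measures across rounds.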
What does it mean that experience reduces bubbles on the asset market? What is the definition of causality presupposed by Hussam et al. (2008)? Reiss (2019, pp. 3113–3114, emphasis in original) argued that “[a]n ideal randomised trial implements an ideal intervention” but “no real randomised experiment implements an ideal intervention.” The reasons, according to Reiss, are unequally distributed confounders among treatment and control groups, failures of blinding, and different dropout rates between treatment and control groups. Donald Gillies (2018, Appendix 1) extended the list by arguing that treatment interventions affect outcomes by paths other than the variable directly targeted (e.g., by the placebo effect, which influences the measured outcome, i.e., disease, but not by the mechanism targeted by a drug under test). The interventions in laboratory experiments also do not fulfill Woodward’s highly technical definition of intervention. Let me exemplify this view with the case of Hussam et al.’s (2008) modifications of the experience level of experimentees. Specifically, Woodward (2005, p. 98) requires the intervention I to act “as a switch for all the other variables that cause C.” In other words, an intervention on the level of experience should screen off all other factors shaping the asset market experience of students. This condition is definitely not fulfilled. Obviously, students may have gained some experience from taking Finance 101, investing their savings in financial markets, or merely reading about the canonical asset market experiments that are a standard setting for studying market (in)efficiency in the laboratory. Therefore, Hussam et al.’s (2008) action shaping the level of experience of student participants is not an ‘intervention’ in the Woodwardian sense.
However, what the laboratory market experiment does allow for discovering is the relation between the treatments (modifying the level of experience and the rekindle treatment) and the outcomes of interest. In other words, the laboratory asset market allowed the inference that raising the level of experience of market participants reduces the magnitude and duration of market bubbles. Given this, and considering that the treatments in laboratory experiments in economics (and, in fact, in all other experimental sciences) lie within the scope of human capabilities, Menzies and Price’s (1993) agency theory delivers a good candidate for the definition presupposed by experimenters. The relata are variables standing for features of phenomena. Under the action of the experimenters, some features (the level of experience of market participants, the dividend distribution) are modified, and these changes lead to changes in the propensity of the market to bubble. These actions are meant to bring about more efficient market pricing of the asset.
Unfortunately, such causal claims are true only within the laboratory. The epistemic situation of these claims can be compared to that of theoretical models. In Chapter 5, I argued that mechanistic knowledge of one mechanism is insufficient to predict the effects of interventions. Given that theoretical models of economic mechanisms isolate (and idealize) single mechanisms, and that many mechanisms operate in the world at the same place and time, predictions based on a model of one mechanism, despite being true within the model world, will prove false in actual reality due to the presence of external influences. This view, if correct, makes testing the accuracy of economic models of mechanisms by comparing predictions to econometric results impossible (cf. Maziarz 2019). However, the example of Hussam et al.’s (2008) laboratory experiment shows that the verification of the accuracy of mechanistic models can proceed by constructing artificial settings in the laboratory. In this way, economists can construct an artificial laboratory market as it is described by the assumptions of a theoretical model and test whether the decisions undertaken by economic agents are in agreement with the predictions deduced from the model under consideration. The similarity of laboratory experiments and theoretical models also implies that the ‘experimental closure’ (the isolation of one mechanism from external factors) has severe and detrimental effects on the use of laboratory results for policymaking (cf. Section 6.3).
Randomized field experiments
Randomized field experiments are not susceptible to this criticism because they test whether a treatment brings about an expected outcome in the field and not under experimental closure. While the topics studied by means of randomized field experiments range from those strictly within the scope of economics, such as the effects of basic income on the behavior of recipients in labor markets, to the intersection of economics and medicine (e.g., the effects of anti-mosquito bed nets), the majority of studies using this design aim at testing the effectiveness of policies meant to improve economic growth. The experiments are usually conducted in developing countries. This practice reduces the costs of experimentation, but it also limits the scope for extrapolating the results (cf. Section 6.3). The field experiment of Pascaline Dupas and Jonathan Robinson (2013) is a representative example of this type of research.
Dupas and Robinson (2013, p. 163) conclude from their trial that “simple informal savings technologies can substantially increase investment in preventive health and reduce vulnerability to health shocks.” Contrary to clinical trials in medicine, which randomly allocate participants between treatment and control groups, Dupas and Robinson randomized local savings organizations (rotating savings and credit associations, or ROSCAs) in one of the Kenyan administrative regions. These organizations were divided into five groups, four of which received different treatments while one became a control group.14 The four treatment groups received the following treatments. Participants belonging to the first group received a metal box (a piggy bank) delivering space to keep money
at home in a reasonably safe place. The second group received the same piggy bank box but not the key to it, so that the money could be accessed only after reaching their saving goal. The third group saved money into a ‘health pot’ managed by the ROSCA, which bought health products on behalf of (randomly chosen) recipients of the funding. The fourth group got access to a health savings account. All the treatments proved effective and raised the ratio of people holding health savings from over 60% to over 90% over six- and 12-month periods.
Dupas and Robinson (2013) also characterized the treatments in terms of ‘technologies’ and related them to features such as access to storage or social commitment. This allowed them to study econometrically the influence of participants’ characteristics, with a view to relating the effectiveness of different treatments to these characteristics. Given that this aspect of Dupas and Robinson’s (2013) study heavily overlaps with observational research, I exclude it from the analysis. Therefore, the main conclusion of the experimental design is that delivering tools such as saving boxes or savings accounts raises poor people’s propensity to save.
This conclusion is a type-level causal claim. It allows the inference that introducing one of the treatments will raise the saving propensity of a targeted population (under the assumption that the population sufficiently resembles the people living in rural Kenya), but it does not allow the inference that each recipient of the treatment will benefit from it. As in the previous cases of experimental research, the relata of the causal claim are variables standing for features of phenomena. The treatment used by Dupas and Robinson (2013) is in disagreement with Woodward’s (2005 [2003], p. 98) third axiom describing the ideal intervention I. According to Woodward, interventions should influence the effect E by no path other than through C. The discussion of the role of health savings, applied to both control and treatment groups, may, at least in principle, affect E, i.e., the health savings propensity of the poor, without changing C. Given this, the experimental design of Dupas and Robinson (2013) does not allow for uncovering the results of interventions in the Woodwardian sense, but it nevertheless allows for discovering that delivering saving tools is an effective means to raise the level of health savings among the poor in rural Kenya; therefore, Menzies and Price’s (1993) formulation of agency theory can be indicated as the concept of causality implicitly presupposed by the authors. The question of whether this relation, invariant under intervention, can be applied to other settings remains open; I address it in what follows.