How the inferential statistical process operates in Frequentist statistics
Significance testing is akin to a judicial system in which the defendant is presumed innocent until proven otherwise. This presumption of innocence is captured by the null hypothesis. With respect to the TMS intervention, the null hypothesis states that the new TMS intervention for depression is no better than the appropriate placebo in reducing symptomatology (i.e., the new intervention is assumed “innocent” in the sense that it has had no effect). The jury does not know whether the defendant is innocent or guilty; it only assumes innocence. The jury then needs to gather empirical evidence and use it in conjunction with a decision procedure before reaching a verdict. Therefore, to test the null hypothesis, empirical studies are performed so that data (i.e., the evidence) can be considered and evaluated to reach a verdict.
In our case, this corresponds to conducting empirical studies in which data are collected from a sample of people treated with the TMS method and from a separate sample of people receiving the placebo treatment, and the means of the two samples are compared. The probability of obtaining certain results given the null hypothesis is then calculated, and a decision procedure is used to decide whether the TMS treatment differs from the placebo. More formally, the aim is to calculate P(data | a theory and a decision procedure).
The question, then, is when and on what basis the jury should declare the TMS intervention guilty of affecting depression, given the evidence being examined. Intuitively, this happens when the observed behaviour is rather unlikely to be seen in innocent people (i.e., innocent people may well display this behaviour, but quite rarely). Hence, if the obtained data are unlikely to occur in samples obtained when the null hypothesis is true, a guilty verdict is returned. A criterion is therefore needed to decide when the observed data are unlikely enough, given the null hypothesis, to warrant a guilty verdict (i.e., a rejection of the null hypothesis). The convention is that a guilty verdict is returned when data at least as extreme as those observed would be obtained 5% or less of the time if the null hypothesis were true.
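The decision rule described above can be sketched in code. The following is a minimal illustration rather than a prescribed method: it uses a permutation test (one of several ways to approximate the probability in question), the symptom-reduction scores are invented for the example, and the names permutation_p_value and verdict are hypothetical. Under the null hypothesis the group labels carry no information, so the scores are repeatedly reshuffled between the two groups to see how often a mean difference at least as large as the observed one arises.

```python
import random


def group_mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_p_value(treated, placebo, n_shuffles=10_000, seed=0):
    """Approximate P(difference at least as extreme as observed | null).

    Assumption: if the null hypothesis is true, group labels are
    interchangeable, so shuffling them mimics studies run under the null.
    """
    rng = random.Random(seed)
    observed = abs(group_mean(treated) - group_mean(placebo))
    pooled = list(treated) + list(placebo)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(group_mean(pooled[:n_treated]) - group_mean(pooled[n_treated:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


def verdict(p_value, alpha=0.05):
    """Apply the conventional 5% criterion to a p-value."""
    return "reject null" if p_value <= alpha else "do not reject null"


# Invented scores for illustration only (higher = larger symptom reduction).
tms_scores = [12, 15, 11, 14, 16, 13, 15, 17]
placebo_scores = [8, 9, 11, 7, 10, 9, 8, 10]

p = permutation_p_value(tms_scores, placebo_scores)
print(p, verdict(p))
```

With these invented scores the two groups barely overlap, so the estimated probability falls well below the 5% criterion and the null hypothesis is rejected. Two groups with identical scores would instead give a probability of 1.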
According to the Frequentist approach, to calculate P(data | a theory and a decision procedure), we need to know the long-term relative frequency (i.e., the objective probability) with which certain data could be obtained given a theory and a decision procedure. In principle, this involves running an infinite number of identical studies and then calculating the relative frequency with which some data are obtained given a particular theory. In Frequentist statistics, the theory at the core of significance testing is the null hypothesis (i.e., the defendant is presumed innocent unless proven otherwise).
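The idea of a long-term relative frequency can be approximated by simulation. The sketch below rests on assumptions not made in the text: scores in both groups are drawn from the same normal distribution (so the null hypothesis is true by construction), the standard deviation is fixed at 1, and a large but finite number of simulated studies stands in for the infinite series. The function names are hypothetical.

```python
import random


def simulate_null_differences(n_per_group, n_studies=10_000, sd=1.0, seed=1):
    """Run many identical studies in which the null hypothesis is true.

    Both groups are sampled from the same normal distribution (an assumption
    of this sketch), and each study records treated mean minus placebo mean.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_studies):
        treated = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        placebo = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        diffs.append(sum(treated) / n_per_group - sum(placebo) / n_per_group)
    return diffs


def relative_frequency_at_least(diffs, observed_diff):
    """Share of simulated studies whose difference is at least as extreme."""
    hits = sum(1 for d in diffs if abs(d) >= abs(observed_diff))
    return hits / len(diffs)


null_diffs = simulate_null_differences(n_per_group=20)

# A hypothetical observed difference of 0.62 is about 1.96 standard errors
# here, because the standard error is sqrt(2 / 20), roughly 0.316.
freq = relative_frequency_at_least(null_diffs, observed_diff=0.62)
print(freq)
```

Under these assumptions the relative frequency for a difference of 0.62 comes out close to 5%, which is the point at which the conventional criterion would just return a guilty verdict. Larger observed differences occur less often in the simulated studies.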