Mechanics
Applying the chi-square test is quite simple. The following paragraphs outline the process, which provides a template for practical applications of this statistical method.
Types of Data
The chi-square test is a nonparametric test, also called a distribution-free test. Nonparametric tests can be used when any one of the following conditions pertains to the data (McHugh, 2013):
 1. The level of measurement of all the variables is nominal or ordinal.
 2. The original data were measured at an interval or ratio scale, but violate one of the following assumptions of a parametric test:
a. The distribution of the data is seriously skewed, violating the assumption that the dependent variable is approximately normally distributed.
b. The data violate the assumption of equal variance or homoscedasticity.
c. For any of a number of reasons, the continuous data were collapsed into a small number of categories, and thus the data are no longer interval or ratio. For example, the number of vehicles in a household could be used in a chi-square test if you categorize households into groups (e.g., no vehicle or any number of vehicles).
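The collapsing step described in item c can be sketched in a few lines of Python. This is only an illustration: the vehicle counts and the names `vehicles_per_household` and `collapse` are hypothetical and do not come from the survey discussed in this chapter.

```python
from collections import Counter

# Hypothetical vehicle counts per household (illustrative values only,
# not taken from the survey used in this chapter).
vehicles_per_household = [0, 2, 1, 0, 3, 1, 0, 2]


def collapse(count):
    """Map a ratio-scale vehicle count to one of two nominal categories."""
    return "no vehicle" if count == 0 else "any vehicle"


# After collapsing, the data are category frequencies suitable for a
# chi-square test rather than interval or ratio measurements.
categories = [collapse(v) for v in vehicles_per_household]
frequencies = Counter(categories)
print(frequencies)
```

The resulting frequencies, not the original counts, are what would enter a contingency table.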
Assumptions
As with any statistic, there are requirements for its appropriate use, which are called assumptions of the statistic. The assumptions of the chi-square include:
1. The data in the cells are frequencies, or counts of cases. Categorical data may be displayed in a contingency table. For example, Table 8.1 shows counts of cases regarding two variables—trip modes and housing types.
Table 8.1 Contingency Table (counts of survey respondents)
Housing type: 1 = single-family detached; 0 = others
Mode: 1 = walk, 2 = bike, 3 = transit, 4 = auto, 0 = others

Mode      Housing type 0    Housing type 1      Total
0                    591             3,303      3,894
1                  1,979             6,424      8,403
2                    272             1,470      1,742
3                    963             1,424      2,387
4                 14,087           101,856    115,943
Total             17,892           114,477    132,369
 2. The levels (or categories) of the variables are mutually exclusive. That is, a particular subject fits into one and only one level of each of the variables. You can’t ride both transit and an automobile at the same time. In Table 8.1, if you asked respondents “What is your means of transportation to go to school or work?” and allowed them to select multiple modes used in an entire trip (e.g., both walk and transit), you could not apply the chi-square test to assess the statistical differences among modes.
 3. Each subject may contribute data to one and only one cell in the $\chi^2$. If, for example, the same subjects are tested over time such that the comparisons are of the same subjects at Time 1, Time 2, Time 3, etc., then $\chi^2$ may not be used. This kind of data is called paired samples.
 4. The expected value of the cell should be five or more in at least 80 percent of the cells, and no cell should have an expected value of less than one (Yates, Moore, & McCabe, 1999). This assumption is most likely to be met if the sample size equals at least the number of cells multiplied by five. You will see what the expected value means later.
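The fourth assumption can be checked mechanically once a contingency table is in hand. The following is a minimal sketch using the counts in Table 8.1, assuming the expected value of a cell is its row total times its column total divided by the sample size (the definition given later in this chapter). The variable names are illustrative.

```python
# Observed counts from Table 8.1: rows are modes 0-4,
# columns are housing types 0 and 1.
observed = [
    [591, 3303],
    [1979, 6424],
    [272, 1470],
    [963, 1424],
    [14087, 101856],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count per cell: row total * column total / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]
cells = [e for row in expected for e in row]

# Rule of thumb from the text: at least 80 percent of cells with an
# expected count of five or more, and no cell below one.
share_at_least_five = sum(e >= 5 for e in cells) / len(cells)
assumption_met = share_at_least_five >= 0.8 and min(cells) >= 1

print(f"smallest expected count: {min(cells):.1f}")
print(f"assumption met: {assumption_met}")
```

With a sample this large, every expected count is far above five, so the assumption is comfortably met.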
Hypothesis in Chi-Square Test
In this step, we create both a null hypothesis and an alternative hypothesis. The null hypothesis is that there is no difference in the proportion of occurrences in each category; that is, the variables are unrelated. The alternative hypothesis, or research hypothesis, is that there is a difference in the proportion of occurrences in each category. If you are interested in the relationship between two variables, you can word the alternative hypothesis as a statement that there is a statistically significant relationship between the variables.
Going back to our example, our null hypothesis would be that there is no difference in the proportion of occurrences in each category, and so the two variables, trip modes and housing types, are not related. Our alternative hypothesis is the exact opposite: the relationship between trip modes and housing types is statistically significant. If the p-value is lower than 0.05, there is less than a 5 percent chance of observing differences this large if the two variables were in fact unrelated. In other words, there is a difference in the proportion of occurrences in each category. In this case, the null hypothesis is rejected in favor of the alternative hypothesis.
Calculate the Test Statistic
The chi-square test statistic is obtained by contrasting the observed frequencies with the expected frequencies. The expected frequencies represent the number of observations that would be found in each cell if the null hypothesis were true or, in other words, if the categorical variables were unrelated. The chi-square equation is shown here.
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
In this equation, $\chi^2$ is chi-square, $o$ is the observed frequency of each category, and $e$ is the expected frequency, or the number of observations that would be found if the null hypothesis were true.
The expected value of each cell is calculated as follows.
$$e = \frac{M_{R} \times M_{C}}{n}$$
In this equation for the expected value, $M_{R}$ represents the row marginal value, or the sum of the cell's row, $M_{C}$ represents the column marginal value, or the sum of the cell's column, and $n$ is the total sample size. For cell 1 in Table 8.1 (mode 0, housing type 0), the math is as follows: (3,894 × 17,892) / 132,369 = 526.3. Table 8.2 provides the results of this calculation for each cell.
Once the expected values have been calculated, the cell $\chi^2$ values are calculated with the chi-square formula. They are then summed to obtain the $\chi^2$ statistic for the table.
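The calculation just described can be traced step by step in code. The sketch below uses the observed counts from Table 8.1, computes each expected count as row total times column total divided by n, and sums the squared differences divided by the expected count. The variable names are illustrative, and this is a plain-Python sketch rather than the output of any particular statistical package.

```python
# Observed counts from Table 8.1: rows are modes 0-4,
# columns are housing types 0 and 1.
observed = [
    [591, 3303],
    [1979, 6424],
    [272, 1470],
    [963, 1424],
    [14087, 101856],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under the null hypothesis of no association.
        e = row_totals[i] * col_totals[j] / n
        # Cell contribution: squared difference scaled by the expected count.
        chi_square += (o - e) ** 2 / e

print(f"chi-square statistic: {chi_square:.1f}")
```

The largest contributions come from the cells for modes 1 and 3 in housing type 0, where the observed counts are far above the expected counts.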
Why square the difference between observed and expected frequencies? Squaring removes the minus signs, so that the cell values sum to a measure reflecting the aggregate degree of difference that actually exists between the observed and expected patterns of frequencies. A large value of the $\chi^2$ statistic therefore does not support the null hypothesis and leads to its rejection. A small value of the $\chi^2$ statistic, by contrast, indicates a high probability of the result occurring by chance, and you fail to find evidence of an association between the two variables.
Table 8.2 Observed Versus Expected Counts
Housing type: 1 = single-family detached; 0 = others
Mode: 1 = walk, 2 = bike, 3 = transit, 4 = auto, 0 = others

Mode     Measure           Housing type 0    Housing type 1        Total
0        Count                        591             3,303        3,894
         Expected Count             526.3           3,367.7      3,894.0
1        Count                      1,979             6,424        8,403
         Expected Count           1,135.8           7,267.2      8,403.0
2        Count                        272             1,470        1,742
         Expected Count             235.5           1,506.5      1,742.0
3        Count                        963             1,424        2,387
         Expected Count             322.6           2,064.4      2,387.0
4        Count                     14,087           101,856      115,943
         Expected Count          15,671.7         100,271.3    115,943.0
Total    Count                     17,892           114,477      132,369
         Expected Count          17,892.0         114,477.0    132,369.0
In the end, you have computed a chi-square test statistic of a certain value. This test statistic is compared to a critical value of chi-square to determine the level of statistical significance. Figure 8.1 shows chi-square frequency distributions for different degrees of freedom (df). The area under each curve to the right of a given value represents the probability of getting a chi-square value that great or greater by random chance. You can see, for example, that for 2 or 3 df, you will almost never get a chi-square value greater than 10 by chance, while for 10 df, you may get a chi-square value greater than 10 by chance more than 5 percent of the time. For 5 df, it is not obvious how much area lies to the right of 10, and you would have to consult a chi-square table to determine the level of statistical significance. In fact, the critical value of chi-square for 5 df at the 0.05 significance level is 11.07. If you computed a chi-square value of 10, you could not reject the null hypothesis, since you could get a value this great by chance more than 5 percent of the time.
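The right-tail areas discussed here can also be computed directly rather than read from a figure or table. The sketch below is one possible approach using only the Python standard library. It relies on the standard recurrence for the upper incomplete gamma function, which is an implementation choice made for this illustration and not a method described in the text. The function name `chi2_right_tail` is hypothetical.

```python
import math


def chi2_right_tail(x, df):
    """Probability that a chi-square variable with `df` degrees of
    freedom exceeds `x` (valid for positive integer df and x >= 0)."""
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)              # exact tail area for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))   # exact tail area for df = 1
    # Raise the degrees of freedom two at a time until a == df / 2.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Tail areas to the right of 10 for the cases mentioned in the text.
for df in (2, 3, 5, 10):
    print(f"df = {df:2d}: P(chi-square > 10) = {chi2_right_tail(10, df):.4f}")
```

For 5 df the area to the right of 10 comes out above 0.05, which agrees with the statement that a chi-square value of 10 falls short of the critical value of 11.07.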
Determine the Degrees of Freedom and a Critical Value
We are getting close to drawing some conclusions; however, we cannot interpret the test statistic without considering the degrees of freedom (df).
The chi-square table requires the df in order to determine the significance level of the statistic. The df for a $\chi^2$ table is calculated with the formula:
$$df = (r - 1) \times (c - 1)$$
where $r$ is the number of rows and $c$ is the number of columns. For the preceding example, the 5 × 2 table has (5 − 1) × (2 − 1) = 4 degrees of freedom. That is, the degrees of freedom equal the product of each variable's number of categories minus one.
Figure 8.1 Chi-Square Distributions for Different Degrees of Freedom
In this final step of our analysis, we take all of the information obtained in earlier steps and pull it together to draw a conclusion. We assess the test statistic against the critical value at our chosen level of significance (usually 0.05) and either reject or fail to reject our null hypothesis. For Table 8.2, the computed chi-square statistic is 2,394, far above the critical value of 9.49 for 4 df, and we can reject the null hypothesis at the 0.05 level or beyond (way beyond). That is, we can say with 95 percent or greater confidence that travel mode varies by housing type, or equivalently, that the two variables are related to one another.
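The decision rule can be written out as a short sketch. The critical values in the lookup below are the usual 0.05-level entries from a standard chi-square table and are supplied here as an assumption, since the text itself only quotes the value for 5 df. The names `CRITICAL_005` and `decide` are hypothetical.

```python
# Critical values of chi-square at the 0.05 significance level, copied
# from a standard chi-square table (assumed, not computed here).
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def decide(chi_square, rows, columns):
    """Return the degrees of freedom and whether to reject the null."""
    df = (rows - 1) * (columns - 1)
    reject = chi_square > CRITICAL_005[df]
    return df, reject


# Table 8.2: a 5 x 2 table with a computed statistic of 2,394.
df, reject_null = decide(2394, rows=5, columns=2)
print(f"df = {df}, reject null hypothesis: {reject_null}")
```

The lookup only covers tables with up to 5 df. A larger table would need further entries from a chi-square table.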
Strength Test for the Chi-Square
The researcher’s work is not quite done yet. Finding a significant difference merely means that, if the two variables were not related, there would be less than a 5 percent chance of observing differences this large. But recall that statistical significance is not equivalent to practical significance. Statistical significance depends on both the strength of the relationship between two variables and the number of cases. Because the numerator in the chi-square formula is squared, you will likely get statistically significant values when you have a large sample, even if the association between variables is weak.
For the chi-square, the most commonly used strength tests are phi and Cramer’s V. Both depend only on the strength of the relationship between two categorical variables in a contingency table.
Phi is used with 2 × 2 contingency tables and Cramer’s V is used with larger tables. Phi and Cramer’s V assume values between 0 and 1. Phi removes the effect of sample size by dividing chi-square by n (the sample size) and taking the square root. V removes the effect of sample size by dividing chi-square by the product of n and m (where m is the smaller of [rows − 1] or [columns − 1]) and taking the square root. Since phi and V have known distributions, statistical software packages can give us the significance level of the computed phi or V value.
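As a worked illustration, the strength measure can be computed for the travel mode example using the chi-square statistic of 2,394 and the sample size of 132,369 reported earlier. This is a minimal sketch and the function name `cramers_v` is illustrative. For a 2 × 2 table m equals 1, so the same function returns phi.

```python
import math


def cramers_v(chi_square, n, rows, columns):
    """Square root of chi-square divided by n * m, where m is the
    smaller of (rows - 1) and (columns - 1)."""
    m = min(rows - 1, columns - 1)
    return math.sqrt(chi_square / (n * m))


# Travel mode (5 categories) by housing type (2 categories).
v = cramers_v(2394, 132369, rows=5, columns=2)
print(f"Cramer's V = {v:.3f}")
```

The result is roughly 0.13, which shows how a very large chi-square statistic from a very large sample can still correspond to a fairly weak association.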