Root-cause analysis uses the simple analytical tools and methods just discussed. More advanced methods are also useful, and for some analyses they are essential. These include distribution fitting (because statistical tests assume a specific probability distribution), tests of means and medians for continuously distributed data, tests of proportions when the data are pass/fail, and other advanced tools and methods.

Hypothesis testing formulates a statistical statement dependent on a practical question. The statement is also associated with a specific statistical test method that has specific assumptions. Figure 9.22 shows nine sequential steps to set up a statistical hypothesis.

FIGURE 9.22
Hypothesis testing overview.

In all hypothesis tests, the null hypothesis (H0) is a statement of equality, and the alternative hypothesis (Ha) is a statement of not equal, less than, or greater than. A test method accompanies the practical question, and it has a test statistic. The magnitude of the test statistic is evaluated relative to its statistical significance. The larger the magnitude of the test statistic, the smaller the area to the right (for this discussion) of the critical value. This area is called the probability value p. It is the probability of incorrectly stating that the null hypothesis is false, i.e., of incorrectly rejecting it in favor of the alternative hypothesis.

TABLE 9.15

Statistical Risk

                Your Decision: Reject H0                  Your Decision: Don't Reject H0
Reality
H0 True         Type I error, P(Type I error) = α         Correct decision
H0 False        Correct decision                          Type II error, P(Type II error) = β

Table 9.15 describes statistical risk. Alpha risk is the risk of rejecting a null hypothesis that is in fact true, whereas beta risk is the risk of failing to reject a null hypothesis that is in fact false. As an example, if a null hypothesis is true but we reject it, then we have made a Type I decision error. Alternatively, if a null hypothesis is false but we do not reject it, then we have made a Type II error. Statistical risk occurs because samples are used to estimate population parameters such as a mean or variance.

Each statistical test is based on an underlying probability distribution and the assumptions of its test statistic. We confirm distribution assumptions using a goodness-of-fit analysis, which checks whether a sample follows a specific probability distribution. This is itself a hypothesis test, with a null hypothesis that the sample data follow the presumed distribution. A common assumption of most analyses is that the sample is drawn from a normal distribution. This assumption should be verified using a goodness-of-fit test. The null hypothesis can be rejected with 1 − p confidence of not making a Type I error, i.e., the error of stating that the sample data are not from a normally distributed population when in fact they are.

In Figure 9.23, sample data representing monthly demand show an Anderson-Darling goodness-of-fit normality test. The p value indicates we could reject the null hypothesis with only 1 − 0.123 = 87.7% confidence of not making a Type I error. It should be noted that by convention we usually set a critical probability of p = 0.05 (or 1 − p = 95% confidence) of not making a Type I error. In the current example, because the p value of 0.123 is greater than 0.05, we do not reject the null hypothesis and assume the sample was drawn from a normal distribution. We can now use statistical tests that require normality.
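As an illustrative sketch of this kind of normality check (using Python's scipy library and simulated demand data, not the actual data behind Figure 9.23), note that scipy's Anderson-Darling routine reports critical values rather than a p value, so the Shapiro-Wilk test is shown as well for a direct p value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical monthly demand data, simulated for illustration only.
demand = rng.normal(loc=1000, scale=50, size=36)

# Anderson-Darling: reject normality at a given significance level if the
# statistic exceeds the corresponding critical value.
result = stats.anderson(demand, dist="norm")
print(result.statistic)
print(dict(zip(result.significance_level, result.critical_values)))

# Shapiro-Wilk returns a direct p value; p > 0.05 means we do not reject
# the null hypothesis that the sample is drawn from a normal distribution.
stat, p = stats.shapiro(demand)
print(round(p, 3))
```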

Tests on means and medians are used when data are continuous and the questions are relative to central location. Tests of means include one-sample t-tests, two-sample t-tests, and one-way analysis of variance (ANOVA) tests. The one-sample t-test answers a simple question: "Is a sample mean equal to a constant?"

FIGURE 9.23
Distribution fitting.

A two-sample t-test answers the question, "Are two means equal?" Figure 9.24 shows an example of the two-sample t-test for the question, "Is the mean monthly demand equal to the mean monthly units shipped?" The calculated test statistic is 17.65, and the p value is close to 0. This implies we should reject the assumption of equal means and conclude that the samples differ at a statistically significant level (i.e., with 1 − p confidence of not making a Type I error).
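A two-sample comparison of this kind can be sketched with scipy; the demand and shipment figures below are simulated stand-ins, not the data behind Figure 9.24:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical monthly demand vs. monthly units shipped (illustrative only).
demand = rng.normal(loc=1000, scale=40, size=24)
shipped = rng.normal(loc=900, scale=40, size=24)

# Two-sample t-test; H0: the two population means are equal.
t_stat, p_value = stats.ttest_ind(demand, shipped)
print(round(t_stat, 2), p_value)
# A p value below 0.05 means we reject H0 and conclude the means differ.
```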

A one-way ANOVA test answers the question, "Are these k sample means equal?" In all three statistical tests, an assumption is that the sub-groups are drawn from a normal distribution. The one-way ANOVA also assumes that the variances of the k sub-groups are equal. If the sample distributions are continuous but highly skewed, non-parametric tests of medians are used to compare the central location of sub-groups. The one-sample Wilcoxon test compares a sample median to a test median, whereas Mood's median test or a Mann-Whitney test compares two sample medians to each other, and a Kruskal-Wallis test compares several sample medians. In all comparative tests, when the p value associated with the calculated test statistic is less than 0.05 (or 5%), we can reject the null hypothesis of assumed equality with at least 95% confidence of not making a Type I error and state that there is a difference in central location.

FIGURE 9.24
Box plot of monthly demand and shipments.

Tests of proportions answer practical questions related to differences between proportions. As an example, a one-sample proportion test answers the question, "Is the sample proportion equal to a known test proportion?" A two-sample proportion test answers the question, "Are these two sample proportions equal?" The underlying assumption in proportion tests is that the test statistic follows a binomial distribution because each observation is a success or a failure (i.e., discrete).
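These proportion tests can be sketched with scipy; the counts below are hypothetical, and the two-sample case is shown with a 2x2 chi-square test of homogeneity (an alternative to the classical two-proportion z-test):

```python
from scipy import stats

# One-sample proportion test; H0: the true defect proportion equals 0.05.
# Hypothetical sample: 12 defective invoices out of 150 inspected.
res = stats.binomtest(12, n=150, p=0.05)
print(round(res.pvalue, 3))

# Two-sample comparison via a 2x2 chi-square test of homogeneity.
table = [[12, 138],   # sample 1: defective, acceptable
         [5, 145]]    # sample 2: defective, acceptable
chi2, p, dof, expected = stats.chi2_contingency(table)
print(dof, round(p, 3))
```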

A contingency table answers a practical question: "Are two variables related to each other based on an observed count or frequency?" In Figure 9.25, the null hypothesis states that the observed counts of defective invoices are the same regardless of the type of form used or the shift using a form.

FIGURE 9.25
Contingency tables.

The observed counts shown in Figure 9.25 are 5, 8, and 20 for Form A and shifts 1, 2, and 3, respectively. The observed counts for Form B were 20, 30, and 25 for shifts 1, 2, and 3, respectively. The calculated or expected counts (rounded) are 7.6, 11.5, and 13.8 for Form A and shifts 1, 2, and 3, respectively. The expected counts for Form B are 17.4, 26.4, and 31.3 for shifts 1, 2, and 3, respectively. Contingency tables help answer the question, "Are the observed counts close enough to the expected counts to be considered a random pattern?" If the p value of the calculated test statistic is less than 0.05 (or 5%), the null hypothesis with its assumption of equality is rejected and we conclude the counts differ by the type of form or shift, with 1 − p confidence of not making a Type I error.
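The same chi-square analysis can be reproduced with scipy using the observed counts quoted above (the expected counts it computes agree, up to rounding, with those quoted in the text):

```python
from scipy import stats

# Observed defective-invoice counts from Figure 9.25:
# rows = Form A, Form B; columns = shifts 1, 2, 3.
observed = [[5, 8, 20],
            [20, 30, 25]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(expected.round(1))   # close to the rounded expected counts in the text
print(round(chi2, 2), dof, round(p, 3))
# p < 0.05, so the null hypothesis of independence is rejected: defect
# counts differ by form and shift.
```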

Equal variance tests answer the practical question, "Are the variances of two or more sub-groups equal?" The null hypothesis is that the sub-group variances are equal. If the sub-groups are normally distributed, then the more sensitive Bartlett test can be used for the analysis. However, if one or more sub-groups are not normally distributed, then the Levene test (non-parametric assumption) should be used for the analysis. In Figure 9.26, we see that the p value associated with the Bartlett test is 0.166, which exceeds 0.05. Based on this high p value, we conclude the sub-group variances are equal.

FIGURE 9.26
Test for equal variances for monthly shipments (units).

Equal variance tests are also used to determine if an assumption of equal sub-group variance is satisfied prior to using tests such as two-sample t-tests and one-way ANOVA that require this assumption.
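Both equal-variance tests are available in scipy; here is a sketch with simulated sub-groups of monthly shipments (illustrative data, not those behind Figure 9.26):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical monthly shipments for three sub-groups, equal true variance.
g1 = rng.normal(500, 30, 20)
g2 = rng.normal(520, 30, 20)
g3 = rng.normal(510, 30, 20)

# Bartlett test (assumes normal sub-groups); H0: equal variances.
b_stat, b_p = stats.bartlett(g1, g2, g3)
# Levene test (robust when sub-groups are not normal).
l_stat, l_p = stats.levene(g1, g2, g3)
print(round(b_p, 3), round(l_p, 3))
# p > 0.05 would mean we do not reject H0 of equal variances.
```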

One-way ANOVA tests answer a practical question: "Are the means of the sub-groups equal?" The null hypothesis is that the sample means are equal. The assumptions necessary to use this test are that the sub-groups are normally distributed and have equal variance. In the example shown in Figure 9.27, the mean monthly shipments of four machines are compared to each other. The null hypothesis is that the machines have the same mean number of shipments. The high p value of 0.591 indicates that we fail to reject the null hypothesis and conclude that the mean numbers of shipments of the machines are equal. If the assumptions for the one-way ANOVA are not met, then a non-parametric test such as the Kruskal-Wallis test can be used to test the null hypothesis that the sample medians are equal.

FIGURE 9.27
Box plot of monthly shipments by machine type.

FIGURE 9.28
Multi-Vari chart for monthly shipments by industry, machine type, and price level.
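A sketch of both tests with scipy, using simulated shipments for four machines (illustrative data, not those behind Figure 9.27):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical monthly shipments for four machines with equal true means.
machines = [rng.normal(200, 15, 12) for _ in range(4)]

# One-way ANOVA; H0: all four sub-group means are equal.
f_stat, p_anova = stats.f_oneway(*machines)

# Non-parametric alternative when ANOVA assumptions fail; H0: equal medians.
h_stat, p_kw = stats.kruskal(*machines)
print(round(p_anova, 3), round(p_kw, 3))
# A high p value means we fail to reject the null hypothesis of equality.
```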

A Multi-Vari chart is a sophisticated graphical tool comparing several independent variables or factors to a continuous dependent variable or Y. An example is shown in Figure 9.28, where the dependent variable is monthly shipments. The chart shows how the level of a dependent variable changes when the levels of several independent variables change. The independent variables shown in Figure 9.28 include machine type, price level, and industry. The highest variation in shipments is associated with industry.

Whereas a scatter plot displays the relationship between two continuous variables without calculating a model, a correlation analysis quantifies the strength of the linear relationship between two continuous variables from samples. In a correlation analysis, the simple correlation coefficient r varies between −1 and +1. A value of r = −1 indicates a perfect negative correlation: as one variable increases, the second decreases. A value of r = +1 indicates a perfect positive correlation: as one variable increases, the second variable increases. In Figure 9.29, several continuous variables are compared pairwise, and simple linear correlation coefficients are estimated for each pairwise comparison. A p value is also calculated for the null hypothesis: "There is no linear correlation between the variables."

In the example, the correlation between warranty and rework is 0.629, a moderate positive correlation, and the associated p value is 0, which is lower than our critical value of 0.05 (or 5%). As a result, we reject the null hypothesis of no correlation and conclude that the variables are linearly correlated to each other with at least 95% confidence of not making a Type I decision error. The algorithm's p calculation is based in part on the sample size. In contrast, warranty and margin% have a correlation coefficient of 0.99 but a p value of 0.094, indicating no statistically significant linear correlation at a 95% confidence level.

FIGURE 9.29
Correlation.
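A pairwise correlation of this kind, with its p value for the null hypothesis of no linear correlation, can be sketched with scipy; the rework and warranty data below are simulated, loosely echoing the relationship described in the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical data: warranty units loosely driven by rework units.
rework = rng.normal(2000, 300, 30)
warranty = 370 + 0.09 * rework + rng.normal(0, 20, 30)

# Pearson correlation; H0: no linear correlation between the variables.
r, p = stats.pearsonr(rework, warranty)
print(round(r, 3), p)
# p < 0.05 -> reject H0 and conclude the variables are linearly correlated.
```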

These tools and methods help build a regression model by identifying potentially important variables that explain the level and variation in the KPOV (or Y). As we work through the analyze phase, we ask questions such as: Does prior information suggest a tentative model? How are data being collected? What are the independent variables? Then, when we create the regression model, we check its assumptions, look for influential observations (i.e., outliers), and validate it using confirmatory experiments. Recall that we have used various terms for the dependent variable, including CT characteristic, Y, and KPOV. For the independent variables, we have used terms like X or KPIV. Regression explains relationships between the dependent variable and one or more independent variables. The model is linear in its coefficients, so that the terms can be added together, if independent, to predict the dependent variable. But the X terms may have a first- or second-order effect on Y; in other words, they may enter the model as X or X2. In more advanced models, the independent or dependent variables may be either continuous or discrete. In all these models, a one-unit change in X changes Y by the coefficient of X, and these terms are added to the constant of the equation, assuming the Xs are independent of each other.

FIGURE 9.30
Simple linear regression.

We begin the regression discussion with simple linear regression as an analytical tool used in the analyze phase of the DMAIC methodology. Figure 9.30 shows a simple example of monthly warranty units versus the number of monthly rework units. The assumption is that as the number of reworked units increases, there will be leakage to customers. Based on the analysis shown in Figure 9.30, there appears to be a positive correlation between the numbers of reworked units and warranty units. But there is noise in the analysis because the R2 adjusted value is just 39.9%. This means the model explains only 39.9% of the variation in Y, whereas we require an R2 adjusted value of 90% or higher.
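A simple linear regression of this form can be sketched with scipy. The data below are simulated around the equation reported in the text (Warranty = 367.6 + 0.09265 Rework), so the fit recovers coefficients near those values; they are not the actual Figure 9.30 data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Hypothetical rework counts and warranty units simulated around the
# reported equation Warranty = 367.6 + 0.09265 * Rework, plus noise.
rework = rng.uniform(1000, 4000, 36)
warranty = 367.6 + 0.09265 * rework + rng.normal(0, 60, 36)

fit = stats.linregress(rework, warranty)
print(round(fit.intercept, 1), round(fit.slope, 4))  # near 367.6 and 0.09265
print(round(fit.rvalue**2, 3))                       # R-squared of the fit
```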

The analysis also calculates a regression equation: Warranty = 367.6 + 0.09265 Rework. This means that, on average, a one-unit increase in rework increases warranty units by 0.09265 units; the constant 367.6 is the predicted warranty level when rework is zero. The line through the sample data is the equation. The data vary within the confidence intervals (CI) 95% of the time, and individual values vary within the

TABLE 9.16

Model Assumptions: A Residual Is a Difference between the Predicted Y from the Regression Equation and the Actual Y

Predicted Y    Actual Y    Residual
10             8           2
8              10          -2
10             10          0

Assumption                            How to Verify
Independence of the residuals         Plot the residuals versus the time sequence of observation to verify randomness, and use an appropriate test (e.g., Durbin-Watson)
Normality of the residuals            Run a normal probability plot of the residuals
Constant variance of the residuals    Plot the residuals versus the fitted values and check for a constant spread

prediction intervals (PI) 95% of the time. Recall that, if there are outliers, they should be investigated and eliminated if possible to create a better model. In addition to the outlier investigation, the residuals of the model should be analyzed. A residual is the difference between the predicted Y from the regression equation and the actual Y. The larger the differences, the poorer the regression model and the farther the sample data are from the regression line. The closer the residuals are to zero, the higher the R2 statistic. R2 is the ratio of the variation explained by our model to the total variation of the dataset. Table 9.16 describes how residuals are calculated and the rules that help ensure a good model.

Figure 9.31 shows the equations for the R2 and R2 adjusted statistics. The total variation is calculated from the squared differences between each data point and the mean of all the data. The sum of squares of error (SSE) is the variation the model fails to explain at each observed value. For the simple linear regression model, the sum of squares of total variation (SST) = sum of squares of variation explained by the model (SSM) + SSE. SST, SSM, and SSE are each calculated from sums of squared deviations. When we move from simple linear regression having one independent variable to multiple linear regression with several independent variables, R2 is adjusted because the addition of any independent variable to the regression model, however irrelevant, will cause a decrease in the SSE term and create a marginal increase in R2. The correction term balances the effect of adding irrelevant independent variables to the model against the required increases in sample size and reduction in the SSE.

FIGURE 9.31
What is R2? SSE = sum of squares for error; SST = sum of squares for total.
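The R2 and R2 adjusted calculations can be worked through directly from these sums of squares; the predicted and actual Y values below are hypothetical, chosen so the arithmetic comes out cleanly:

```python
import numpy as np

# Hypothetical actual and model-predicted values of Y.
y = np.array([10.0, 8.0, 10.0, 12.0, 9.0, 11.0])
y_hat = np.array([9.5, 8.4, 10.2, 11.5, 9.3, 11.1])

sse = np.sum((y - y_hat) ** 2)        # variation not explained by the model
sst = np.sum((y - y.mean()) ** 2)     # total variation of the dataset
r2 = 1 - sse / sst

n, k = len(y), 1                      # sample size, number of predictors
r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
print(round(r2, 3), round(r2_adj, 3))   # -> 0.92 0.9
```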

Figure 9.32 shows some common regression-based models. In this chapter, we discuss multiple linear regression (MLR) models using several independent variables. MLR models explain variation of a dependent variable by using a least squares algorithm. The algorithm fits an equation or line through a dataset such that the sum of squared deviations from every data point to the fitted line is minimal relative to any other line that could be fit through the same dataset. We will also discuss several other statistical tests that indicate how good the fitted line is at explaining the variation of the dependent variable.

MLR requires that the parameters of a model be linear so they can be estimated with a least squares algorithm. Referring to Figure 9.32, we see additional assumptions that must be met: the residuals should be normally and independently distributed around zero with constant variance. Recall that a residual is the difference between the model's fitted value (Yfitted) and its observed value (Yobserved) for each experiment or observation of a dataset. An experimental observation consists of a Y or dependent variable (e.g., monthly sales) and the levels or values of each independent variable (i.e., the Xs); these are used to build a predictive MLR equation represented as Y = f(X) = β0 + β1X1 + β2X2 + ... + βkXk. As an example, if the MLR model's fitted value for monthly sales is $100,000 but the actual observed value was $90,000, then the residual would be $10,000. The larger the residual, the poorer the fit of an MLR equation.
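A least squares fit of such an MLR equation can be sketched with numpy; the data here are simulated from known coefficients (an assumption for illustration), so the fit should recover values near them:

```python
import numpy as np

rng = np.random.default_rng(21)
n = 50
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
# Hypothetical true model: Y = 5 + 2*X1 - 3*X2 + noise.
y = 5 + 2 * x1 - 3 * x2 + rng.normal(0, 0.5, n)

# Least squares: minimize the sum of squared residuals over the coefficients.
X = np.column_stack([np.ones(n), x1, x2])   # column of ones -> intercept β0
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))        # estimates near [5, 2, -3]

residuals = y - X @ beta
print(round(residuals.mean(), 6))   # least squares residuals average near zero
```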

In addition to the MLR model, there are several specialized regression models used for specific situations. Several of these are shown in Figure 9.32.

FIGURE 9.32
Regression models.

In the MLR model, the parameters or coefficients are linearly related to the dependent variable Y, although the form of the independent variables can be quadratic (e.g., X2) or may contain other higher-order terms. Figure 9.32 also shows other types of regression models used to explain the variation of a dependent variable Y when the assumptions required for an MLR model cannot be met. As an example, non-linear regression is used if the estimated parameters of the equation are not linear in the model, such as when a parameter appears inside an exponential function of the form Y = β0 + β1e^X. In this situation, a transformation of the non-linear equation might be useful for building a linear relationship Y = f(X), but this may not always be possible. The Cochrane-Orcutt method is another regression approach for when the MLR assumptions are not satisfied; in this application, the residuals are correlated over time and are not independent. An MLR analysis requires that the serial correlation information contained in the residual pattern be incorporated back into the MLR model as a term to explain the variation of the dependent variable more adequately; the Cochrane-Orcutt method can be used directly for this purpose. If independent variables are correlated with each other (as determined using a variance inflation factor), ridge regression can be used to build a regression model. A robust regression method is used if MLR assumptions are not met due to severe outliers in a dataset. Finally, if the dependent variable is discrete, logistic regression can be used to build a model. There are three major types of logistic regression models, depending on how the data of the dependent variable are structured: binary (pass/fail), ordinal (1, 2, 3, 4, etc.), and nominal (red, white, blue, etc.).

Several statistics are associated with MLR models. Figure 9.31 showed the R2 statistic and its adjusted version, R2 adjusted. These statistics measure the percentage of variation of the dependent variable explained by an MLR equation. As an example, an R2 statistic of 0.9 implies 90% of the variation of Y is explained by the MLR equation. It is apparent that high R2 statistics imply a good model fit to a dataset. R2 adjusted adjusts R2 downward to account for the number of independent variables incorporated into the MLR model relative to the sample size used to build it. This is because R2 can be increased by simply adding variables to the MLR model, even if they do not explain changes in Y.

A second important test is the Durbin-Watson (DW) statistic, which measures serial correlation of the model's residuals (assuming data are ordered by time), as shown in Figure 9.33. One assumption of an MLR model is that its residuals are not serially correlated over time. Figure 9.33 shows it is possible to have positive or negative serial correlation in a fitted model. In actual practice, the calculated Durbin-Watson test statistic is compared to critical values in a statistical table. These critical values are determined considering sample size, the number of independent variables in the model, and the Type I error required by the test. As a rule, an ideal range of the Durbin-Watson test statistic is between 1.5 and 2.5.

FIGURE 9.33
Durbin-Watson test for serial correlation. A formal statistical test for serial correlation of the residuals is based on the Durbin-Watson test. For no serial correlation, the test statistic d ≈ 2.0; for positive serial correlation, d < 2.0; and for negative serial correlation, d > 2.0. If d is in the range of 1.5 to 2.5, do not suspect statistically significant serial correlation. Note: the d statistic varies by sample size.

Serial correlation requires that serially correlated data points or experimental observations be removed from the MLR model, or that additional terms be incorporated into the MLR model to capture the effect of serial correlation of the residuals. Alternatively, the Cochrane-Orcutt method can be used to build the MLR model.
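The Durbin-Watson statistic itself is simple to compute: d = Σ(e_t − e_{t−1})² / Σe_t² over the time-ordered residuals. The sketch below contrasts simulated white-noise residuals (d near 2) with simulated positively autocorrelated residuals (d well below 2):

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals.
    d near 2 -> no serial correlation; d < 1.5 suggests positive serial
    correlation; d > 2.5 suggests negative serial correlation."""
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(9)

# Independent (white-noise) residuals: expect d near 2.0.
white = rng.normal(0, 1, 200)
d_white = durbin_watson(white)
print(round(d_white, 2))

# Positively autocorrelated residuals (AR(1) with phi = 0.8): expect d << 2.
ar = np.zeros(200)
for t in range(1, 200):
    ar[t] = 0.8 * ar[t - 1] + rng.normal(0, 1)
d_ar = durbin_watson(ar)
print(round(d_ar, 2))
```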

Figure 9.34 summarizes the analytical strategies for building models with serial correlation. Serial correlation is in fact quite common in situations where time is relevant (e.g., forecasting). A useful approach is to add a lagged dependent variable to the model. A failure to account for residual serial correlation will create either an inefficient model (without lagged dependent variable terms in the model) or an inconsistent model (with lagged dependent variable terms in the model). Inefficient implies the estimated coefficients of the model are accurate on average, but they have a large variance (i.e., a larger sample will be required to precisely estimate them). Inconsistent means the estimated parameters will not correspond to the true values and will change from sample to sample.

A third useful statistic is the variance inflation factor (VIF), which measures the degree of correlation between independent variables. Highly correlated independent variables, if left in the final model, will result in inefficient estimates of the independent variable coefficients. This will require a larger than necessary sample to precisely estimate them, because changes in one independent variable will be confused with changes in the other independent variables with which it is correlated.

FIGURE 9.34
Correlated residuals. Inefficient = produces a larger variance of the parameter estimate, and a larger sample size is required to reject the null hypothesis that the parameter = 0. Inconsistent = the estimated values of the parameter will not correspond to the true value.
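The VIF for an independent variable is 1/(1 − R²), where R² comes from regressing that variable on all the other independent variables; a VIF near 1 indicates little collinearity, while large values (a common rule of thumb is above 5 or 10) indicate a problem. A sketch with simulated data, where one pair of predictors is deliberately made nearly collinear:

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: regress X[:, j] on the other
    columns and return 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(13)
x1 = rng.normal(0, 1, 100)
x2 = rng.normal(0, 1, 100)           # independent of x1: VIF near 1
x3 = x1 + rng.normal(0, 0.1, 100)    # nearly collinear with x1: large VIF
X = np.column_stack([x1, x2, x3])

print(round(vif(X, 0), 1), round(vif(X, 1), 1), round(vif(X, 2), 1))
# The collinear pair (x1, x3) shows badly inflated VIFs; x2 stays near 1.
```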

Table 9.17 shows how to interpret the suitability of a regression model using the R2 adjusted statistic and residual patterns, and how to adjust the model to better estimate the dependent variable (or Y). A low R2 adjusted statistic and non-random residual patterns may be caused by poor data collection (including incorrect independent variables) or by the wrong model format (i.e., perhaps the relationship is not a straight line but a curve, which is better explained using X2 terms). Or perhaps additional independent variables should be added to the model to provide a better fit to the data. Table 9.17 is a useful tool for evaluating how well an MLR model fits a dataset using its R2 value and residual pattern. R2 is a simple ratio of the variation explained by an MLR model (SSM) to the total variation of the dataset as measured by squared deviations from the average of all the data points (SST).

Table 9.17 also shows a slightly modified version of the R2 statistic in terms of the SSE: R2 = 1 − (SSE/SST). SSE is the variation of the dependent variable not explained by an MLR model. The total variation in a dataset equals the variation explained by the MLR model plus the variation not explained by it (i.e., SST = SSM + SSE). The higher the SSE term, the poorer the MLR model fit, and vice versa. If a model fits the dataset poorly, its residual pattern should first be analyzed for clues as to how best to modify the model to fit the dataset more exactly (i.e., to explain more of the variation of the dependent variable). As an example, in Table 9.17,

TABLE 9.17

Interpreting Model Residuals

SS Total = SS Model + SS Error. The larger the residuals, the higher the SS Error.

             Random, Low Variation         Random, High Variation           Non-Random Pattern
             of Residuals                  of Residuals                     of Residuals
R2 Low       N/A                           Poor model; incorrect KPIVs;     Poor model; add terms to the model;
                                           measurement accuracy             transform KPIVs, KPOVs, or both
R2 High      Good model; correct KPIVs     N/A                              Good model; add terms? transform?

What is R2 adjusted? The addition of any independent variable to the regression model, however irrelevant, will cause a decrease in the SSE term, producing a marginal increase in R2. The correction term balances the effect of the addition of independent variables to the model against the required increases in sample size:

R2 = 1 − SSE/SST

R2 adjusted = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]

where k = number of independent variables in the regression model and n = sample size.

where R2 is low (i.e., < 0.90), the residuals are large. One or more of the potential causes discussed above might be operative (e.g., there could be measurement errors in the data collection). If there is a non-random residual pattern, it may be possible to transform the dependent or independent variables to obtain a more exact model fit. Another option might be to add an additional term to the MLR model. As an example, if a quadratic pattern is observed in the residuals, it might make sense to add an X2 term. This assumes serial correlation has been eliminated and the independent variables are not collinear (i.e., their variance inflation factors are equal to or close to 1).
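The effect of adding such a quadratic term can be sketched numerically: fitting a straight line to simulated data with genuine curvature leaves a non-random residual pattern and a lower R2, while adding an X2 term improves the fit (the data and coefficients below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(17)
x = np.linspace(0, 10, 60)
# Hypothetical data with genuine curvature: Y = 3 + 2X + 0.5X^2 + noise.
y = 3 + 2 * x + 0.5 * x ** 2 + rng.normal(0, 1, 60)

def fit_r2(X, y):
    """Least squares fit; return R^2 = 1 - SSE/SST."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

ones = np.ones_like(x)
r2_linear = fit_r2(np.column_stack([ones, x]), y)          # straight line
r2_quad = fit_r2(np.column_stack([ones, x, x ** 2]), y)    # add the X^2 term
print(round(r2_linear, 4), round(r2_quad, 4))
# The model with the quadratic term explains more of the variation in Y.
```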