# Four Step Data Analysis, Different Hypothesis Tests

Generally, sample statistics include:

quantitative data:

• - mean, variance, standard deviation, median, quartiles, range, ...
• - mean difference, ratio of medians

discrete data, failure-time data:

• - proportion, percentage (depending on time)
• - difference between proportions, numbers needed to treat, odds ratios, relative risks, hazard ratios, ...
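As an illustration of the discrete-data statistics listed above, the following minimal sketch computes proportions, their difference, the number needed to treat, the relative risk, and the odds ratio from a two-group table. The function name `discrete_statistics` and the counts are hypothetical and chosen only for illustration. The sketch assumes that both proportions lie strictly between 0 and 1.

```python
def discrete_statistics(events_a, total_a, events_b, total_b):
    """Summary statistics for a two-group comparison of event counts.

    Assumes both groups have a proportion strictly between 0 and 1,
    otherwise the relative risk or odds ratio is undefined.
    """
    p_a = events_a / total_a          # proportion with event in group A
    p_b = events_b / total_b          # proportion with event in group B
    diff = p_a - p_b                  # difference between proportions
    relative_risk = p_a / p_b
    odds_ratio = (p_a / (1 - p_a)) / (p_b / (1 - p_b))
    # Number needed to treat: reciprocal of the absolute risk difference.
    nnt = float("inf") if diff == 0 else 1 / abs(diff)
    return {
        "p_a": p_a,
        "p_b": p_b,
        "difference": diff,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "nnt": nnt,
    }


# Hypothetical counts: 10 of 100 events under A, 20 of 100 under B.
stats = discrete_statistics(10, 100, 20, 100)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Hazard ratios are left out here, because they require time-to-event data rather than simple counts.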

The current chapter focuses on discrete data and failure-time data. The analysis of quantitative data has been covered in Chap. 6. Discrete data can answer many questions in trials, such as those given below.

How large is the response rate?

How many patients have side-effects?

How many patients were alive (after 5 years)?

Is the response rate under treatment A larger than under B?

Are there more “side-effects” after than before treatment?

What is the optimal dose?

Study design (among others):

• - trials, cohorts, case-control studies
• - cross-sectional vs follow-up measurements

Data type (among others):

• - quantities, binary, categorical, ordinal variables
• - censored variables

The required data analysis depends on (1) the study design and (2) the type of data. A proper data analysis is often said to consist of four steps:

step 1 summarize the data

• - calculate statistics

step 2 provide the reliability of the statistics

• - standard error (se), confidence interval (ci)

step 3 hypothesis testing

• - p-values, significance level

step 4 regression analysis

• - (causal) association, confounder correction, prediction, explained variation, ...
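Steps 1 to 3 can be sketched for the comparison of two proportions. The example below is a minimal sketch under stated assumptions, not a prescribed method: it uses a normal approximation (Wald interval with unpooled standard error, Z-test with pooled standard error), which is only reasonable for sufficiently large samples. The function name `analyse_two_proportions` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def analyse_two_proportions(x1, n1, x2, n2, alpha=0.05):
    """Steps 1-3 for two independent proportions (normal approximation)."""
    # Step 1: summarize the data.
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Step 2: reliability of the statistic (standard error, confidence interval).
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Step 3: hypothesis test of "no difference" (two-sided Z-test,
    # standard error pooled under the null hypothesis).
    pooled = (x1 + x2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {"difference": diff, "se": se, "ci": ci, "z": z, "p_value": p_value}


# Hypothetical counts: 10 of 100 responders under A, 20 of 100 under B.
result = analyse_two_proportions(10, 100, 20, 100)
print(f"difference = {result['difference']:.3f}, se = {result['se']:.3f}")
print(f"95% ci = ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
print(f"z = {result['z']:.2f}, p = {result['p_value']:.4f}")
```

For small samples or proportions close to 0 or 1, an exact method would be a safer choice than this approximation.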

The fourth step, regression analysis, will be the subject of Chap. 7 and will not be addressed here. Randomized controlled trials generally work with representative random samples from a target population. The ultimate conclusion of a trial is relevant to the sample, but even more so to the target population of the trial, as explained in the graph below.

This somewhat peculiar situation of trials explains many of the analysis steps taken.

| Data type | 1 sample, 1 measurement | 2 samples, 1 measurement | >2 samples, 1 measurement |
|---|---|---|---|
| Quantitative | one sample t-test / Wilcoxon test | unpaired t-test / Mann-Whitney test | ANOVA, Kruskal-Wallis test |
| Discrete | Z- or chi-squared test | Z- or chi-squared test | chi-squared test |
| Censored | (Kaplan-Meier) | logrank test | logrank test |

| Data type | 1 sample, 2 measurements | 1 sample, >2 measurements | >1 samples, >1 measurement |
|---|---|---|---|
| Quantitative | paired t-test / Wilcoxon test | mm ANOVA / Friedman test | mm ANOVA |
| Discrete | McNemar test | Cochran’s Q test | r.e. logistic regression |
| Censored | stratified logrank test | stratified logrank test | frailty models |

The overview above lists relevant tests for data analysis, including those for discrete data (ANOVA = analysis of variance, r.e. = random effects). Many hypothesis tests are possible, and each has its own place in statistical data analysis. In this chapter the most relevant procedures will be explained with examples from practice.
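The overview of tests can also be read as a lookup from data type and design to a suggested procedure. The following sketch encodes it as a nested dictionary. The names `TEST_OVERVIEW` and `suggest_test` are hypothetical, and the entries are copied from the overview as printed, including the abbreviation "mm ANOVA", which the text does not expand.

```python
# Overview of tests by data type and design, as listed in the text.
TEST_OVERVIEW = {
    "quantitative": {
        "1 sample, 1 measurement": "one sample t-test / Wilcoxon test",
        "2 samples, 1 measurement": "unpaired t-test / Mann-Whitney test",
        ">2 samples, 1 measurement": "ANOVA, Kruskal-Wallis test",
        "1 sample, 2 measurements": "paired t-test / Wilcoxon test",
        "1 sample, >2 measurements": "mm ANOVA / Friedman test",
        ">1 samples, >1 measurement": "mm ANOVA",
    },
    "discrete": {
        "1 sample, 1 measurement": "Z- or chi-squared test",
        "2 samples, 1 measurement": "Z- or chi-squared test",
        ">2 samples, 1 measurement": "chi-squared test",
        "1 sample, 2 measurements": "McNemar test",
        "1 sample, >2 measurements": "Cochran's Q test",
        ">1 samples, >1 measurement": "r.e. logistic regression",
    },
    "censored": {
        "1 sample, 1 measurement": "(Kaplan-Meier)",
        "2 samples, 1 measurement": "logrank test",
        ">2 samples, 1 measurement": "logrank test",
        "1 sample, 2 measurements": "stratified logrank test",
        "1 sample, >2 measurements": "stratified logrank test",
        ">1 samples, >1 measurement": "frailty models",
    },
}


def suggest_test(data_type, design):
    """Return the procedure listed for a data type and design.

    Raises KeyError if the combination is not part of the overview.
    """
    return TEST_OVERVIEW[data_type.lower()][design]


# Example: paired binary outcomes (before vs after treatment in one sample).
print(suggest_test("discrete", "1 sample, 2 measurements"))
```

Such a lookup only reproduces the overview. The choice of a test in practice also depends on the assumptions each test makes about the data.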