# Statistics Is for Providing Quality Criteria for Diagnostic Tests, Validity, Reproducibility, and Precision of Qualitative Tests

## Validity of Qualitative Tests

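The table below shows healthy and unhealthy subjects cross-classified by test result, with a = true positives, b = false positives, c = false negatives, and d = true negatives. The b and c subjects, false positives and false negatives respectively, are the problem of diagnostic tests.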

Sensitivity = a/(a+c) = (true positives)/(true positives + false negatives)

Specificity = d/(b+d) = (true negatives)/(true negatives + false positives)

Overall validity = (a+d)/(a+b+c+d)
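As a minimal sketch, the three criteria can be computed directly from the four cell counts. The function name `diagnostic_validity` and the counts used below are hypothetical and serve only as illustration; they are not taken from the text.

```python
def diagnostic_validity(a, b, c, d):
    """Return (sensitivity, specificity, overall validity) of a 2 x 2 table.

    a = true positives,  b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)            # diseased subjects correctly detected
    specificity = d / (b + d)            # healthy subjects correctly cleared
    overall = (a + d) / (a + b + c + d)  # all correct results over all subjects
    return sensitivity, specificity, overall


# Hypothetical counts, chosen only for illustration
sens, spec, overall = diagnostic_validity(a=40, b=10, c=20, d=30)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} overall={overall:.2f}")
```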

Example. Patients assessed for pneumonia consist of 2 Gaussian groups, healthy and diseased: the x-axis shows the individual ESRs, the y-axis how often each value occurs. Various “normal values” can be considered.

If the “normal value” is an ESR (erythrocyte sedimentation rate) of 43 mm, then, according to the test, all patients to the right of 43 mm on the x-axis are diseased. You will miss many diseased patients: this test has a low sensitivity. Your beta will be large, and your alpha will be small.

If the “normal value” is an ESR of 32 mm, then, according to the test, all patients to the right of 32 mm on the x-axis are diseased. You will wrongly label many healthy patients as diseased: the test has a low specificity. Your beta will be small, and your alpha will be large.

(beta=type II error of finding no effect, where there is one, alpha=type I error of finding an effect, where there is none)
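The trade-off between alpha and beta at the two cut-offs can be sketched numerically. The text gives no means or standard deviations for the two Gaussian groups, so the parameters below (healthy: mean 30 mm, SD 5 mm; diseased: mean 46 mm, SD 5 mm) are assumptions made only for illustration, as is the helper name `error_rates`.

```python
from statistics import NormalDist

# Assumed ESR distributions in mm; these parameters are NOT from the text
healthy = NormalDist(mu=30, sigma=5)
diseased = NormalDist(mu=46, sigma=5)


def error_rates(cutoff):
    """Return (alpha, beta) when an ESR at or above `cutoff` counts as diseased."""
    alpha = 1 - healthy.cdf(cutoff)  # healthy subjects called diseased
    beta = diseased.cdf(cutoff)      # diseased subjects called healthy
    return alpha, beta


for cutoff in (43, 32):
    alpha, beta = error_rates(cutoff)
    print(f"cut-off {cutoff} mm: alpha={alpha:.3f} beta={beta:.3f} "
          f"sensitivity={1 - beta:.3f} specificity={1 - alpha:.3f}")
```

Under these assumptions the high cut-off gives a small alpha and a large beta, and the low cut-off the reverse, in line with the two cases described above.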

Now, which cut-off value or “normal” value is best? You want to miss few diagnoses and to raise few false alarms, and thus wish to have both a high sensitivity and a high specificity. ROC (receiver operating characteristic) curves are helpful for finding out. Calculate the corresponding sensitivity and specificity for several normal values. Then draw a curve with the sensitivities on the y-axis and the specificities or, rather, (1-specificities) = proportion of false positives among the healthy on the x-axis. A perfect diagnostic test will reach the top of the y-axis (100 % sensitivity, 100 % specificity), but, unfortunately, this never happens in practice.

An example is given in the graph below. With an ESR (erythrocyte sedimentation rate) of 38 mm, the shortest distance to the top of the y-axis is obtained.
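The search for the cut-off closest to the top of the y-axis can be sketched as follows. The two Gaussian distributions (healthy: mean 30 mm, SD 5 mm; diseased: mean 46 mm, SD 5 mm) and the scanned range of cut-offs are assumptions for illustration, and `roc_point` and `distance_to_corner` are hypothetical helper names.

```python
from math import hypot
from statistics import NormalDist

# Assumed ESR distributions in mm; these parameters are NOT from the text
healthy = NormalDist(mu=30, sigma=5)
diseased = NormalDist(mu=46, sigma=5)


def roc_point(cutoff):
    """Return (1 - specificity, sensitivity) for a given cut-off."""
    sensitivity = 1 - diseased.cdf(cutoff)  # diseased at or above the cut-off
    specificity = healthy.cdf(cutoff)       # healthy below the cut-off
    return 1 - specificity, sensitivity


def distance_to_corner(cutoff):
    """Distance from the ROC point to the top of the y-axis, the point (0, 1)."""
    x, y = roc_point(cutoff)
    return hypot(x, 1 - y)


cutoffs = range(25, 51)  # candidate "normal values" of 25 to 50 mm
best = min(cutoffs, key=distance_to_corner)
print(f"best cut-off: {best} mm, distance {distance_to_corner(best):.3f}")
```

With these assumed parameters the scan selects 38 mm, but that follows from the made-up distributions, not from the patient data behind the graph.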

ROC curves are very popular, but they have limitations:

1. Sometimes more than a single shortest distance from the top of the y-axis is observed.
2. A curve close to the diagonal may occur and indicates a poor test, because sensitivity and specificity together will never exceed approximately 100 %, e.g., 45 % and 55 %. A sensitivity or specificity close to 50 % gives a result similar to that of gambling, like tossing a coin. Such a test is poor, because it can be replaced with gambling.
3. Comparing 2 curves to find the better of 2 diagnostic tests is called c-statistics, and it is based on the areas under the curves. The problem is that the curves often cross, with intervals where one test performs better than the other and vice versa.
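The last two points can be illustrated with a small sketch that estimates the area under an ROC curve with the trapezoidal rule. The function name `auc_trapezoid` and all ROC points below are hypothetical; a curve on the diagonal yields an area of 0.5, the result expected from tossing a coin.

```python
def auc_trapezoid(points):
    """Area under an ROC curve given as (1 - specificity, sensitivity) points.

    The end points (0, 0) and (1, 1) are added before integrating.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area


# Hypothetical ROC points, chosen only for illustration
test_a = [(0.1, 0.6), (0.3, 0.8), (0.6, 0.9)]
test_b = [(0.1, 0.3), (0.3, 0.85), (0.6, 0.95)]  # crosses test_a
coin_toss = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75)]

for name, pts in [("A", test_a), ("B", test_b), ("coin", coin_toss)]:
    print(f"test {name}: area under the curve = {auc_trapezoid(pts):.3f}")
```

In this made-up example test A is better at low false positive rates and test B at higher ones, so a single area per test hides the interval in which each test is preferable.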

## Reproducibility of Qualitative Tests

Cohen’s kappas are traditionally used for assessing the reproducibility or reliability of a qualitative diagnostic test. An example is given. A lab test is performed in 30 patients. All patients are tested twice.

If the test were not reproducible at all, you would expect the same outcome twice in 15 patients (half of the time the same outcome). We do, however, find the same outcome twice in 21 patients.

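Kappa = (observed agreement - chance agreement)/(maximal agreement - chance agreement) = (21 - 15)/(30 - 15) = 0.4. A kappa of 0.4 is better than not reproducible at all: 0 means very poor, 1 excellent reliability.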
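The kappa of the example can be checked with a few lines of code. This is a minimal sketch: the function name `kappa_from_agreement` is hypothetical, and the chance agreement of 15 is taken from the text as given rather than derived from the marginal totals of a 2 x 2 table.

```python
def kappa_from_agreement(observed, expected, total):
    """Kappa from counts of agreeing pairs.

    observed = pairs with the same outcome twice,
    expected = pairs agreeing by chance alone,
    total    = all pairs (the maximal possible agreement).
    """
    return (observed - expected) / (total - expected)


# Numbers from the example: 30 patients, 21 agreements, 15 expected by chance
kappa = kappa_from_agreement(observed=21, expected=15, total=30)
print(f"kappa = {kappa:.1f}")
```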