I like to start visualizing interval data by running stem-and-leaf plots and box plots. Figure 20.3 shows a stem-and-leaf plot for the variable MFRATIO in table 20.8.

Inspection of table 20.8 confirms that the lowest value for this variable is 85 men per 100 women (in Latvia). The “stem” in the stem-and-leaf plot is 85, and the 0 next to it is the “leaf.” There are two cases of 93 men per 100 women (Poland and Uruguay), so there are two 0s after the 93 stem. The M in figure 20.3 stands for the median, or the 50th percentile (it’s 98), and the Hs indicate the lower hinge and the upper hinge, or the 25th and the 75th percentiles. The two cases below 89 and the three cases above 107 are technically considered outliers. This variable looks like it’s pretty normally distributed, with a hump of cases in the middle of the distribution and tails going off in both directions.
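To make the stem-and-leaf logic concrete, here is a minimal Python sketch. It assumes the stem is the whole-number part of each value and the leaf is the tenths digit, which matches the description above (a value of 93 shows up as stem 93 with leaf 0). The function name `stem_and_leaf` is hypothetical, and the sample contains only the three values named in the text, not the full MFRATIO column from table 20.8.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build text rows for a stem-and-leaf display.

    Assumption: stem = whole-number part, leaf = tenths digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # e.g. 93 -> 930
        stem, leaf = divmod(tenths, 10)   # 930 -> stem 93, leaf 0
        rows[stem].append(str(leaf))
    # One row per stem, with one leaf per case
    return [f"{stem:>4} | {''.join(leaves)}" for stem, leaves in sorted(rows.items())]


# Only the three cases named in the text: Latvia, Poland, and Uruguay
sample = [85, 93, 93]
for row in stem_and_leaf(sample):
    print(row)
```

Each case contributes exactly one leaf, so the length of a row shows how many cases share that stem.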

Distributions Again

We can check whether the variable is, in fact, normally distributed or skewed by examining the shape of its distribution. That’s where box plots come in.
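The numbers that a box plot draws can be computed directly. The sketch below is one way to do it, under two assumptions that the text does not spell out: the hinges are taken as the medians of the lower and upper halves of the sorted data (each half including the overall median when the number of cases is odd), and a case counts as an outlier when it lies more than 1.5 hinge-spreads beyond a hinge. Statistical packages differ on these conventions, so results may not match a given program exactly. The function name `box_plot_summary` and the sample values are hypothetical.

```python
from statistics import median


def box_plot_summary(values):
    """Median, hinges, and outliers under the assumed 1.5-spread rule."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2                    # halves share the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    spread = upper_hinge - lower_hinge     # distance between the hinges
    low_fence = lower_hinge - 1.5 * spread
    high_fence = upper_hinge + 1.5 * spread
    return {
        "median": median(xs),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


# Hypothetical values for illustration only (not from table 20.8)
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 50]))
```

If the median sits close to the middle of the two hinges and the outliers are balanced on both sides, the distribution is roughly symmetric; a median crowded toward one hinge suggests skew.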

Just as a refresher, back in chapter 6, on sampling theory, we looked at some basic shapes of distributions: skewed to the right, skewed to the left, bimodal, and normal. I have shown them again in figure 20.4, but this time notice the mode, the median, and the mean:

1. the mode, the median, and the mean are the same in the normal distribution;

2. the mean is pulled to the left of the median in negatively skewed distributions;

3. the mean is pulled to the right of the median in positively skewed distributions; and

4. some distributions are bimodal, or even multimodal. In these cases, you need to calculate two or more means and medians because the global mean and median give you a distorted understanding of the distribution.
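The four points above suggest a quick numerical check: compare the mean with the median, and look at the mode or modes. The following is a rough sketch, not a formal test of normality. The function name `describe_shape` and the sample values are hypothetical, and a mean equal to the median shows only that the data are not obviously skewed, not that they are normal.

```python
from statistics import mean, median, multimode


def describe_shape(values):
    """Compare mean and median, and list the most frequent value(s)."""
    m = mean(values)
    md = median(values)
    if m > md:
        skew = "positive"   # mean pulled to the right of the median
    elif m < md:
        skew = "negative"   # mean pulled to the left of the median
    else:
        skew = "none"       # mean and median coincide
    return {"mean": m, "median": md, "modes": multimode(values), "skew": skew}


# Hypothetical values for illustration only
print(describe_shape([1, 2, 2, 3, 10]))   # one large value pulls the mean up
print(describe_shape([10, 2, 3, 3, 1]))
```

With real, continuous data the mean and median will rarely be exactly equal, so in practice you would look at how far apart they are relative to the spread of the variable. More than one entry in `modes` is only a hint of a bimodal or multimodal distribution; a plot is the better guide.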