Geography 360:
Standard deviation, skewness, kurtosis
To better explain the coefficient of variation, consider the following data taken from the unemployment statistics provided in class. The data were reported as thousands of people who report being unemployed. Below, I have taken the values for every fifth state in the alphabetical listing. The first column gives the value for each state rounded to the nearest 100,000; the second column gives the data as reported, in thousands. You would expect the variability within each data set to be comparable because both columns represent the same values. How do the calculated standard deviations compare? Why is the value for the second column so much higher? What happens when the standard deviation is divided by the mean?
| # reported unemployed | 100,000s | 1,000s |
|---|---|---|
| | 1 | 126.1 |
| | 2 | 158.3 |
| | 3 | 263 |
| | 3 | 333.4 |
| | 1 | 113.2 |
| | 0 | 13.2 |
| | 0 | 11.2 |
| Mean | 2.5 | 248.37 |
| Standard deviation | 3.32 | 323.65 |
| Coefficient of Variation | 1.33 | 1.30 |
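The effect of dividing by the mean can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the seven values shown in the table (so its mean and standard deviation will not reproduce the summary rows), it re-expresses them in hundreds of thousands without rounding, and it assumes the population form of the standard deviation. The variable and function names are made up for the example.

```python
from statistics import fmean, pstdev

# The seven values shown in the table, in thousands of people.
thousands = [126.1, 158.3, 263.0, 333.4, 113.2, 13.2, 11.2]

# Assumption: the same values re-expressed in hundreds of thousands,
# without the rounding used in the first column of the table.
hundred_thousands = [x / 100 for x in thousands]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return pstdev(values) / fmean(values)


# The standard deviation carries the units of the data, so it changes
# by the same factor of 100 as the data themselves.
sd_ratio = pstdev(thousands) / pstdev(hundred_thousands)

# The coefficient of variation divides the units out.
cv_thousands = coefficient_of_variation(thousands)
cv_hundred_thousands = coefficient_of_variation(hundred_thousands)

print(f"ratio of standard deviations: {sd_ratio:.1f}")
print(f"C.V. in 1,000s:   {cv_thousands:.3f}")
print(f"C.V. in 100,000s: {cv_hundred_thousands:.3f}")
```

The two coefficients of variation agree because the change of units multiplies the standard deviation and the mean by the same factor. The small difference between 1.33 and 1.30 in the table would then come from the rounding in the first column rather than from the units.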
Skewness: There are various ways to calculate skewness, that is, to quantify how far a frequency distribution departs from symmetry. Each method emphasizes a particular aspect of skewness, and values should only be compared if they were calculated the same way. All of these procedures effectively emphasize outliers, the values which deviate greatly from a symmetrical frequency distribution. Because the mean is affected to a greater extent by outliers than the median is, subtracting the median from the mean (mean - median) is a simple way to quantify skewness. Other calculations do something similar by cubing the deviation between each value and the mean, so that large deviations dominate and keep their sign. A distribution with positive skewness is skewed to the right: its outliers lie above the mean. A distribution with negative skewness is skewed to the left: its outliers lie below the mean.
Draw the frequency distribution for columns 1-5 (because of the small number of values, grouping them in categories with a range of five will make this easier to understand visually). When comparing 1 to 2 and 3 to 4, note the impact of adding a second outlier similar in value to the one already present.
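Both approaches to skewness can be tried on distributions 1 and 3 from the table that follows. The sketch below assumes the cubed-deviation measure is the population moment form (the mean cubed deviation divided by the cube of the population standard deviation); the handout does not name its formula, so treat that as an assumption. The function name is hypothetical.

```python
from statistics import fmean, median


def moment_skewness(values):
    """Mean cubed deviation divided by the population variance to the 3/2 power."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # mean cubed deviation
    return m3 / m2 ** 1.5


# Values copied from columns 1 and 3 of the table below.
dist_1 = [18, 21, 23, 25, 25, 27, 30, 50, 112]
dist_3 = [23, 69, 75, 78, 87, 89, 92, 99, 112]

for name, values in (("1", dist_1), ("3", dist_3)):
    simple = fmean(values) - median(values)  # mean - median
    print(f"distribution {name}: mean - median = {simple:.2f}, "
          f"moment skewness = {moment_skewness(values):.2f}")
```

Under these assumptions both measures agree in sign: positive for distribution 1, whose outlier (112) lies above the mean, and negative for distribution 3, whose outlier (23) lies below it. The moment values should also match the Skewness row of the table to two decimals.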
Kurtosis, like skewness, describes the shape of the frequency distribution: it quantifies how peaked the distribution is. Consider distributions 5-7, treated as you did in the previous exercise (or adjust the category ranges as seems appropriate). How do these compare visually?
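If you want to check a hand-drawn frequency distribution, the grouping can be tallied with a short script. This is a minimal sketch, assuming categories of width five that start at multiples of five (15-19, 20-24, and so on); the handout leaves the exact category boundaries to you, and the function name is hypothetical. The values are those of distribution 1 in the table below.

```python
from collections import Counter


def frequency_table(values, width=5):
    """Count values per category; each category spans `width` whole numbers."""
    counts = Counter((x // width) * width for x in values)
    first, last = min(counts), max(counts)
    # Include empty categories so gaps before an outlier stay visible.
    return [(start, counts[start]) for start in range(first, last + width, width)]


# Distribution 1 from the table below.
dist_1 = [18, 21, 23, 25, 25, 27, 30, 50, 112]

table = frequency_table(dist_1)
for start, count in table:
    print(f"{start:3d}-{start + 4:3d} | {'#' * count}")
```

Keeping the empty categories in the output shows how far the value 112 sits from the rest of the distribution, which is the feature the skewness calculation picks up.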
| Distribution | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| | 18 | 18 | 23 | 22 | 23 | 15 | 23 |
| | 21 | 21 | 69 | 23 | 26 | 42 | 62 |
| | 23 | 23 | 75 | 69 | 40 | 55 | 66 |
| | 25 | 25 | 78 | 75 | 53 | 66 | 69 |
| | 25 | 25 | 87 | 78 | 56 | 70 | 69 |
| | 27 | 27 | 89 | 87 | 58 | 70 | 69 |
| | 30 | 30 | 92 | 89 | 61 | 74 | 69 |
| | 50 | 50 | 99 | 92 | 74 | 85 | 72 |
| | 112 | 112 | 112 | 99 | 88 | 98 | 76 |
| | | 112 | | 112 | 91 | 125 | 115 |
| Mean | 36.78 | 44.3 | 80.44 | 74.6 | 57 | 70 | 69 |
| Std Dev | 27.98 | 34.84 | 23.7 | 28.51 | 22.01 | 28.46 | 20.85 |
| C.V. | 0.76 | 0.79 | 0.29 | 0.38 | 0.39 | 0.41 | 0.3 |
| Variance | 782.62 | 1,213.61 | 561.8 | 813.04 | 484.6 | 810 | 434.8 |
| Skewness | 2.08 | 1.32 | -1.26 | -0.89 | 0 | 0 | 0 |
| Kurtosis | 2.86 | -0.05 | 1.24 | -0.4 | -1 | -0.01 | 1.74 |
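The Kurtosis row for distributions 5-7 can be checked with the sketch below. It assumes the values are excess kurtosis computed from population moments (the mean fourth-power deviation divided by the squared population variance, minus 3, so that a normal distribution scores 0). The handout does not state its formula, so this is an assumption, and the function name is hypothetical.

```python
def excess_kurtosis(values):
    """Mean fourth-power deviation over squared population variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # mean fourth-power deviation
    return m4 / m2 ** 2 - 3


# Distributions 5-7, copied from the table above.
distributions = {
    5: [23, 26, 40, 53, 56, 58, 61, 74, 88, 91],
    6: [15, 42, 55, 66, 70, 70, 74, 85, 98, 125],
    7: [23, 62, 66, 69, 69, 69, 69, 72, 76, 115],
}

results = {name: excess_kurtosis(values) for name, values in distributions.items()}
for name, value in results.items():
    print(f"distribution {name}: excess kurtosis = {value:.2f}")
```

All three distributions are symmetric about their means, so their skewness is 0 and only the kurtosis separates them. Distribution 7, with most values piled at 69 and two distant extremes, gives the largest value; distribution 5, spread fairly evenly, gives the smallest.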
|
|
Worktable for Calculating Mean, Standard Deviation, Coefficient of Variation, Skewness, Kurtosis

| Observation i | $X_i$ | $X_i - \mu$ | $(X_i - \mu)^2$ | $\mu^2$ | $(X_i - \mu)^3$ | $(X_i - \mu)^4$ |
|---|---|---|---|---|---|---|
| 1 | | | | | | |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |
| 8 | | | | | | |
| 9 | | | | | | |
| 10 | | | | | | |
| 11 | | | | | | |
| 12 | | | | | | |
| 13 | | | | | | |
| 14 | | | | | | |
| 15 | | | | | | |
| Sum of Column | | | | | | |
Please see me for sample data to use with the above table.
N =
$\mu = \frac{1}{N}\sum X_i$ =
$\sigma$ =
Coefficient of variation =
skewness =
kurtosis =
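The worktable can also be filled in by script as a check on hand calculations. The sketch below is an illustration only: since the sample data are handed out separately, it uses distribution 5 from the earlier table as a stand-in, and it assumes population formulas (dividing by N) and excess kurtosis (subtracting 3), which is an assumption about the intended formulas. The fourth-power column is included because kurtosis requires it.

```python
# Stand-in sample: distribution 5 from the earlier table (an assumption;
# the handout's own sample data are provided separately).
values = [23, 26, 40, 53, 56, 58, 61, 74, 88, 91]

n = len(values)
mean = sum(values) / n  # mu = (1/N) * sum of Xi

# One worktable row per observation:
# (i, Xi, Xi - mu, (Xi - mu)^2, (Xi - mu)^3, (Xi - mu)^4)
rows = [
    (i, x, x - mean, (x - mean) ** 2, (x - mean) ** 3, (x - mean) ** 4)
    for i, x in enumerate(values, start=1)
]

# "Sum of Column" entries for the powered deviations.
sum_sq = sum(row[3] for row in rows)
sum_cube = sum(row[4] for row in rows)
sum_fourth = sum(row[5] for row in rows)

variance = sum_sq / n
std_dev = variance ** 0.5
cv = std_dev / mean
skewness = (sum_cube / n) / std_dev ** 3
kurtosis = (sum_fourth / n) / std_dev ** 4 - 3  # excess kurtosis

for row in rows:
    print("{:2d} {:6.1f} {:8.2f} {:10.2f} {:12.2f} {:14.2f}".format(*row))
print(f"N = {n}, mean = {mean:.2f}, std dev = {std_dev:.2f}, C.V. = {cv:.2f}")
print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")
```

With this stand-in sample, the results can be compared against the Mean, Std Dev, C.V., Variance, Skewness, and Kurtosis entries for distribution 5 in the earlier table.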