Geography 360, Fall 2003
Exam II
Answer key
1. Essay question – the following are the main principles I expected to be included in your essays.
A. Explanation of basis for Central Limit Theorem: the distribution of all possible sample means around the population mean approaches a normal distribution
if the population is not normally distributed, the sample must be large if the distribution of sample means is to approach normal
mean of sample means is equal to population mean
standard error is standard deviation of population (estimated by sample) divided by square root of sample size
B. Application to confidence intervals:
because sample means are normally distributed, one is able to make statements regarding the expected percentage of possible sample means to have a value within a given range
comparing to the standard normal curve, we can examine the range determined by the standard error calculated from known or estimated standard deviation at a given percentage of confidence
C. Application to sample size
A large sample should improve the accuracy of values estimated on the basis of the sample
By setting a maximum acceptable error (the margin of error) and a confidence level (1 - alpha), we can calculate the sample size needed to estimate a population mean, total or proportion with that level of confidence
D. A larger standard error (standard deviation) increases the size of the confidence interval (its range)
a larger sample size reduces the standard error, because a larger sample is less likely to include only values above or only values below the mean.
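The relationships outlined in this essay can be checked numerically. The following is a minimal Python sketch and not part of the original exam: the mean of 50, the standard deviation of 10 and the maximum error of 2 are hypothetical values chosen only for illustration, and the interval assumes the normal (Z) curve rather than the t distribution.

```python
import math
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_interval(mean, sd, n, confidence=0.95):
    """Two-sided Z-based confidence interval around a sample mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error(sd, n)
    return mean - margin, mean + margin


def required_sample_size(sd, max_error, confidence=0.95):
    """Smallest n whose margin of error does not exceed max_error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd / max_error) ** 2)


# Hypothetical values: sample mean 50, standard deviation 10.
low_25, high_25 = confidence_interval(50, 10, 25)
low_100, high_100 = confidence_interval(50, 10, 100)
print("interval width, n = 25: ", round(high_25 - low_25, 2))
print("interval width, n = 100:", round(high_100 - low_100, 2))
print("n needed for max error 2:", required_sample_size(10, 2))
```

Quadrupling the sample size halves the standard error, so the interval for n = 100 is half as wide as the interval for n = 25.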
2. Sampling
You should have identified and discussed the means of attaining a random, systematic, or stratified sample using two of the following methods: points (for example, individual seat numbers), quadrats (block of 20 seats from different sections), transects (establishing two endpoints and lines - most feasibly aisles between sections). You also needed to justify your use of purely random or stratified sampling, especially given differences in behavior between student and other sections.
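The three sampling designs can be sketched for the point method (individual seat numbers). This is an illustrative Python sketch only: the 1,000 seats, the split into a 300-seat student section and a 700-seat general section, the sample size of 50 and the helper names are all hypothetical, not taken from the exam.

```python
import random


def simple_random_sample(units, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(units, k)


def systematic_sample(units, k, rng):
    """Random start, then every step-th unit."""
    step = len(units) // k
    start = rng.randrange(step)
    return units[start::step][:k]


def stratified_sample(strata, k, rng):
    """Random sample within each stratum, proportional to stratum size."""
    total = sum(len(units) for units in strata.values())
    return {
        name: rng.sample(units, round(k * len(units) / total))
        for name, units in strata.items()
    }


rng = random.Random(360)  # fixed seed so the sketch is repeatable

# Hypothetical seating: seats 1-300 are students, 301-1000 are general.
seats = list(range(1, 1001))
strata = {"student": seats[:300], "general": seats[300:]}

random_seats = simple_random_sample(seats, 50, rng)
systematic_seats = systematic_sample(seats, 50, rng)
stratified_seats = stratified_sample(strata, 50, rng)

print(len(random_seats), len(systematic_seats))
print({name: len(chosen) for name, chosen in stratified_seats.items()})
```

The stratified design guarantees that the student section is represented in proportion to its size, which a purely random draw does not.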
3. Probability
1. Need to use normal distribution because we are considering a continuous distribution. We must assume that gauge readings are normally distributed.
2. Can use binomial distribution, because we are considering whether or not sufficient water flow is available (yes or no). We assume that the probability remains constant over time and that annual gauge readings are independent of each other; that is, a drought in one year does not increase the chance of insufficient flow the following year.
3. Should use Poisson distribution, because we are considering the frequency of an event within a given time frame. We assume random occurrence of events.
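For reference, each of the three distributions can be evaluated with a few lines of Python. This is only a sketch: the exam's actual figures are not reproduced here, so the mean flow of 500, the standard deviation of 100, the yearly success probability of 0.8 and the event rate of 2 per period are hypothetical placeholders.

```python
import math
from statistics import NormalDist


def normal_prob_below(x, mean, sd):
    """P(X < x) for a continuous, normally distributed variable."""
    return NormalDist(mean, sd).cdf(x)


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent yes/no trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """P(exactly k events in a period with the given average rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Hypothetical inputs, for illustration only.
print(round(normal_prob_below(400, 500, 100), 4))  # reading below 400
print(round(binomial_pmf(8, 10, 0.8), 4))          # sufficient flow in 8 of 10 years
print(round(poisson_pmf(0, 2), 4))                 # no events in one period
```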
4. Z or t test/score
I expect you to recognize that the test statistic measures the extent of difference (in this case between sample and population mean) relative to the standard error (of the mean). When testing for significance, we acknowledge that the (means) are not equal, but we wish to determine how likely it would be to get such a difference (or greater) given the characteristics of the population or sample distribution. Thus, the greater the difference in the numerator, the greater is the percentage of possible sample mean values lying below the actual sample mean. The extent of the difference is mitigated (lessened) by a larger standard error - the more variance in the data, the more likely that the sample mean is influenced by outliers. The influence of the standard error is, in turn, mitigated by the size of the sample - the more scores we include in our sample, the more representative of the population we expect it to be, so the difference in the numerator becomes more relevant.
5. Inferential testing
a) Because the data have been collected on two occasions from the same sites, we are dealing with a matched pairs test. Furthermore, the geographer’s observation that the distribution of plants is not normal, but skewed according to microclimatic influences, requires the use of non-parametric procedures. Thus, the correct test is the Wilcoxon matched pairs signed rank test. Null hypothesis: the ranked matched-pair differences are equal (total rank of negative pairs is equal to total rank of positive pairs). Alternative hypothesis (because the researcher believes the population will decline) is one-tailed: the ranked matched-pair differences are positive, if subtracting the second sample from the first (otherwise the opposite is true).
sample 1   sample 2   difference   rank
50         38         12           12
17         16         1            1.5
5          9          -4           4
11         3          8            10.5
18         23         -5           5.5
38         30         8            10.5
23         18         5            5.5
4          11         -7           8.5
35         22         13           13
10         13         -3           3
30         14         16           15
31         16         15           14
6          12         -6           7
23         22         1            1.5
8          15         -7           8.5
Zw = ((12+1.5+10.5+10.5+5.5+13+15+14+1.5) - (15*16/4)) / sqrt((15*16*31)/24)
= (83.5 - 60) / 17.61 = 1.33
The 95% confidence level (one-tailed) is at Z = 1.65, so she cannot reject the null hypothesis. The p-value is .0918, or close to a 10% chance of Type I error if the null hypothesis is rejected. We accepted either rejection or acceptance of the null hypothesis, provided there was sufficient justification (did your explanation convince us that you understand what the test statistic has shown?).
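The ranking and the Z calculation can be reproduced with the Python sketch below. It is an illustration, not the required method of working: it uses the normal approximation shown in this key, and the function names are my own. Because the key rounds Z to 1.33 before looking up the table, the unrounded p-value from the code differs slightly in the third decimal place.

```python
import math
from statistics import NormalDist

sample_1 = [50, 17, 5, 11, 18, 38, 23, 4, 35, 10, 30, 31, 6, 23, 8]
sample_2 = [38, 16, 9, 3, 23, 30, 18, 11, 22, 13, 14, 16, 12, 22, 15]


def average_ranks(values):
    """Rank values from smallest to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_z(first, second):
    """Sum of positive ranks and its Z score under the null hypothesis."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, (t_plus - mean_t) / sd_t


t_plus, z = wilcoxon_z(sample_1, sample_2)
p_one_tailed = 1 - NormalDist().cdf(z)
print(t_plus, round(z, 2), round(p_one_tailed, 3))
```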
b) We are comparing a sample mean with a population mean, so we use the one sample difference of means test. The sample is smaller than 30, so we must compare to the t distribution. The null hypothesis is that the population mean is equal to the sample mean. The alternative hypothesis is one-sided: the population mean is greater than the sample mean. Using the given mean and standard deviation,
t = (110,117 - 120,000) / (21,496/sqrt(9)) = -9,883 / 7,165 = -1.38 at 8 degrees of freedom (n-1)
The p-value is .0995
With increased number in sample without change in mean or standard deviation:
t = (110,117 - 120,000) / (21,496/sqrt(14)) = -9,883 / 5,745 = -1.72 at 13 degrees of freedom
The p-value is .0565. You would be more confident in rejecting the null hypothesis because you are more certain that the sample represents the neighborhood (larger sample) and thus the difference from the city mean is more likely significant as well.
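The effect of sample size on the statistic can be seen in a short Python sketch that uses the figures given in this answer. Only the t statistic and degrees of freedom are computed; the p-value still has to come from a t table or statistical software, since the Python standard library has no t distribution. The function name is my own.

```python
import math


def one_sample_t(sample_mean, pop_mean, sd, n):
    """t statistic and degrees of freedom for a one-sample test of means."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error, n - 1


for n in (9, 14):
    t, df = one_sample_t(110_117, 120_000, 21_496, n)
    print(f"n = {n}: t = {t:.2f} at {df} degrees of freedom")
```

The mean and standard deviation are unchanged between the two runs, so the larger magnitude of t at n = 14 comes entirely from the smaller standard error.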
c) The physical geographer is comparing two sample means to determine if they are derived from the same population, thus he needs to use the two sample difference of means test. Given that we will assume similar variances, he estimates the standard error of the difference using the pooled variance estimate (PVE) and we compare to the t-table at (n1+n2-2) = 23 degrees of freedom. Because he has no basis to suggest what the difference in water flow for the terraces was, he uses a two-sided hypothesis. Null hypothesis is that the means estimated by the samples are equal. The alternative hypothesis is that they are not equal.
t = (3.7 - 8.9) / (sqrt(((1.68^2)(15-1) + (3.07^2)(10-1)) / (15+10-2)) * sqrt((1/15)+(1/10)))
= -5.2 / (2.325*0.408) = -5.48
p-value is < 0.001
Students should justify rejection of null hypothesis.
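The pooled variance calculation is easy to get wrong by hand, so a Python sketch of it follows. It uses the means, standard deviations and sample sizes given in this answer, assumes equal variances as stated, and carries intermediate values unrounded, which gives a statistic of about -5.5. The function name is my own.

```python
import math


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Two-sample t statistic using the pooled variance estimate."""
    df = n_1 + n_2 - 2
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df
    standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n_1 + 1 / n_2)
    return (mean_1 - mean_2) / standard_error, df


t, df = pooled_t(3.7, 1.68, 15, 8.9, 3.07, 10)
print(f"t = {t:.1f} at {df} degrees of freedom")
```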
d) We are comparing the proportion of one course response to the proportion of the population response. Use a one-sample difference of proportion test. Null hypothesis is that the estimated proportion is equal to the population proportion. Alternative hypothesis is either that the estimated proportion is not equal to the population proportion (two-tailed) or that the estimated proportion is less than the population proportion (one-tailed).
Z for response 1 = (0.35 - 0.33) / sqrt(0.35*0.65/20) = 0.02 / 0.11 = 0.18
Z for response 2 = (0.61 - 0.33) / sqrt(0.61*0.39/18) = 0.28 / 0.11 = 2.55
Z for total response = (0.47 - 0.33) / sqrt(0.47*0.53/38) = 0.14 / 0.08 = 1.75
The students may either calculate for the responses separately and comment on the deterioration of evaluations, or calculate for the total and discuss the combined effect. Response 1 has a p-value of 0.43 for one-tailed, 0.86 for two-tailed, and the null hypothesis is accepted (the teacher stays). Response 2 has a p-value of .0054 (one-tailed), .0108 (two-tailed), and the null hypothesis is rejected (teacher goes, or does this reflect students, etc…). Total response has a p-value of 0.0401 (one-tailed – reject null hypothesis) and 0.0802 (two-tailed – less likely to reject). Again, various interpretations are possible given sufficient justification.
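The three Z scores and their p-values can be reproduced with the Python sketch below. It follows the working in this key in two respects that are choices rather than requirements: the standard error is computed from the sample proportion, and it is rounded to two decimals before dividing (the se_digits argument, a name of my own). Without that rounding the Z values come out somewhat smaller.

```python
import math
from statistics import NormalDist


def proportion_z(p_hat, p_0, n, se_digits=None):
    """Z score for a one-sample difference of proportions test."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    if se_digits is not None:
        standard_error = round(standard_error, se_digits)
    return (p_hat - p_0) / standard_error


def upper_tail(z):
    """One-tailed p-value: area of the standard normal curve above z."""
    return 1 - NormalDist().cdf(z)


cases = {
    "response 1": (0.35, 20),
    "response 2": (0.61, 18),
    "total": (0.47, 38),
}
for label, (p_hat, n) in cases.items():
    z = round(proportion_z(p_hat, 0.33, n, se_digits=2), 2)
    one_tailed = upper_tail(z)
    print(f"{label}: Z = {z:.2f}, one-tailed p = {one_tailed:.4f}, "
          f"two-tailed p = {2 * one_tailed:.4f}")
```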