Presentation on theme: "Chapter 4. Exercise 1 The 95% CI represents a range of values that contains a population parameter with a 0.95 probability. This range is determined by."— Presentation transcript:

1 Chapter 4

2 Exercise 1 The 95% CI represents a range of values that contains a population parameter with a 0.95 probability. This range is determined by the values that correspond to the 0.025 and 0.975 quantiles of the sampling distribution of the sample statistic.

3 Exercise 2 c is the 1 − α/2 quantile of the standard normal distribution. From Table 1 or the R function qnorm: for a CI of 0.80, the 0.90 quantile is 1.281; for a CI of 0.92, the 0.96 quantile is 1.750; for a CI of 0.98, the 0.99 quantile is 2.326.
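These quantiles can also be reproduced without R; a minimal sketch using Python's standard library, with NormalDist.inv_cdf playing the role of qnorm:

```python
from statistics import NormalDist

# Standard normal quantiles (the Python equivalent of R's qnorm)
z = NormalDist()  # mean 0, sd 1
print(round(z.inv_cdf(0.90), 3))  # 0.90 quantile, for an 80% CI
print(round(z.inv_cdf(0.96), 3))  # 0.96 quantile, for a 92% CI
print(round(z.inv_cdf(0.99), 3))  # 0.99 quantile, for a 98% CI
```

Note that inv_cdf carries more decimal places than Table 1, so the third decimal may differ slightly from a truncated table value.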

4 Exercise 3 From Table 1:

5 Exercise 4 From Table 1 :

6 Exercise 5 μ=1200, σ=25, n=36. For a CI of 95%: The 95% CI for μ does not contain 1200, so the claim seems unreasonable.

7 Exercise 6

8 Exercise 7 Random sampling requires: 1. That all observations are sampled from the same distribution. 2. That the sampled observations are independent, meaning that the probability of sampling a given observation does not alter the probability of sampling another. (Note: this is not the same as equal probability.)

9 Exercise 8 The sampling distribution is centered on the population mean μ, so it will be 9. The variance of the sampling distribution is given by σ²/n. In this case:

10 Exercise 9 x: 1 2 3 4; P(x): 0.2 0.1 0.5 0.2. So μ = Σ x·P(x) = 2.7 and σ² = Σ (x − μ)²·P(x) = 1.01.
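The population mean and variance implied by this probability table can be checked directly; a short Python sketch (the results match the 2.7 and 1.01 used in the next two exercises):

```python
x = [1, 2, 3, 4]
p = [0.2, 0.1, 0.5, 0.2]

# Population mean: mu = sum of x * P(x)
mu = sum(xi * pi for xi, pi in zip(x, p))
# Population variance: sigma^2 = sum of (x - mu)^2 * P(x)
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))
print(round(mu, 2), round(var, 2))  # 2.7 1.01
```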

11 Exercise 10 The expected value of the sample mean equals the population mean, so if you average 1000 sample means the grand average should approximately equal μ, in this case, 2.7.
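That claim is easy to check by simulation; a sketch using the distribution from Exercise 9 (the per-sample size of 25 is an arbitrary choice here, not from the exercise):

```python
import random
from statistics import mean

random.seed(1)  # reproducible run
x = [1, 2, 3, 4]
p = [0.2, 0.1, 0.5, 0.2]

# Draw 1000 samples of size 25 and average their sample means
means = [mean(random.choices(x, weights=p, k=25)) for _ in range(1000)]
print(round(mean(means), 2))  # grand average lands close to mu = 2.7
```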

12 Exercise 11 Based on the same principle, the expected value of the sample variance equals the population variance, so the average of 1000 sample variances should approximately equal σ², in this case 1.01.

13 Exercise 12 a=c(2,6,10,1,15,22,11,29) n=8 var(a) [1] 94.28571 The variance of the sample mean is estimated by s²/n = 94.28571/8 ≈ 11.79, and the standard error is estimated by s/√n ≈ 3.43.
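The same estimates can be reproduced in Python; a sketch where statistics.variance matches R's var (both use the n − 1 divisor):

```python
from math import sqrt
from statistics import variance

a = [2, 6, 10, 1, 15, 22, 11, 29]
n = len(a)  # 8

s2 = variance(a)     # sample variance, same as R's var(a)
var_mean = s2 / n    # estimated variance of the sample mean
se = sqrt(var_mean)  # estimated standard error of the mean
print(round(s2, 5), round(var_mean, 2), round(se, 2))  # 94.28571 11.79 3.43
```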

14 Exercise 13 The estimate of μ in this case would be based on a single observation, 32. With a single observation it is not possible to estimate the standard error, because there is no variance in the sample. As the sample size increases, the variance of the sampling distribution (the squared standard error) decreases; note that n is in the denominator of the standard error. Lower variance in the sampling distribution means a smaller standard error, i.e., less error in the sample estimates.

15 Exercise 14 b=c(450,12,52,80,600,93,43,59,1000,102,98,43) n=12 var(b) [1] 93663.52 Squared SE = 93663.52/12 ≈ 7805.29

16 Exercise 15 b=c(450,12,52,80,600,93,43,59,1000,102,98,43) > out(b) $out.val [1] 450 600 1000 These outliers substantially inflate the standard error, as they inflate the variance.

17 Exercise 16 c=c(6,3,34,21,34,65,23,54,23) n=9 var(c) [1] 413.9444 The squared SE is 413.9444/9 ≈ 45.99.

18 Exercise 17 No. An accurate estimate of the standard error requires independence among sampled observations.

19 Exercise 18 The variance of the mixed normal is 10.9, so the squared standard error for a sample of 25 would be 10.9/25 = 0.436, compared to 1/25 = 0.04 under the standard normal. This means that under small departures from normality, the squared standard error can inflate more than 10-fold. The inflation greatly increases error and the length of CIs.
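The 10.9 figure follows from the usual contaminated (mixed) normal: with probability 0.9 sample from N(0, 1) and with probability 0.1 from N(0, 10), i.e. standard deviation 10. A quick arithmetic check under that assumed 90/10 mixture:

```python
# Variance of a zero-mean normal mixture: sum of p_i * sigma_i^2
p_contam = 0.1  # assumed 10% contamination, as in the standard mixed normal
var_mixed = (1 - p_contam) * 1 ** 2 + p_contam * 10 ** 2

n = 25
print(round(var_mixed, 1))      # 10.9, the mixed-normal variance
print(round(var_mixed / n, 3))  # 0.436, squared SE under the mixed normal
print(round(1 / n, 2))          # 0.04, squared SE under the standard normal
```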

20 Exercise 19 When sampling from a non-normal distribution, the sampling distribution of the mean no longer conforms to the probabilities of the normal curve. In other words, the sampling distribution is no longer normal, so the SE cannot be used accurately to determine probabilities and CIs.

21 Exercise 20 μ=30, σ=2, n=16, so SE=2/4=0.5. Determine Z and consult Table 1, or use R: pnorm(29,30,2/sqrt(16)) [1] 0.02275013 pnorm(30.5,30,2/sqrt(16)) [1] 0.8413447, so 1-0.841=0.159 pnorm(31,30,2/sqrt(16)) [1] 0.9772499, so 0.977-0.023=0.954
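The pnorm calls translate directly to Python's NormalDist; a sketch for this exercise's sampling distribution (mean 30, SE 0.5):

```python
from statistics import NormalDist

# Sampling distribution of the mean: N(30, 2/sqrt(16)) = N(30, 0.5)
xbar = NormalDist(mu=30, sigma=2 / 16 ** 0.5)

p29 = xbar.cdf(29)     # P(Xbar <= 29),   like pnorm(29, 30, 0.5)
p305 = xbar.cdf(30.5)  # P(Xbar <= 30.5), like pnorm(30.5, 30, 0.5)
p31 = xbar.cdf(31)     # P(Xbar <= 31),   like pnorm(31, 30, 0.5)
print(round(p29, 4))        # 0.0228
print(round(1 - p305, 3))   # 0.159 = P(Xbar > 30.5)
print(round(p31 - p29, 3))  # 0.954 = P(29 < Xbar < 31)
```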

22 Exercise 21 μ=30, σ=5, n=25, so SE=5/5=1. Determine Z and consult Table 1, or use R. a. pnorm(4,5,1) [1] 0.1586553 b. pnorm(7,5,1) [1] 0.9772499; 1-0.977=0.023 c. pnorm(3,5,1) [1] 0.02275013; 0.977-0.023=0.954.

23 Exercise 22 μ=100000, σ=10000, n=16, so SE=10000/4=2500 From Table 1, P ≈ 0.0228. Using R: pnorm(95000,100000,10000/sqrt(16)) [1] 0.02275013

24 Exercise 23 μ=100000, σ=10000, n=16, so SE=10000/4=2500 Compute z scores for each value and consult Table 1. Or use R: pnorm(97500,100000,10000/sqrt(16)) [1] 0.1586553 pnorm(102500,100000,10000/sqrt(16)) [1] 0.8413447.

25 Exercise 24 μ=750, σ=100, n=9, so SE=100/3=33.333 Compute z scores for each value and consult Table 1. Or use R: pnorm(700,750,100/sqrt(9)) [1] 0.0668072 pnorm(800,750,100/sqrt(9)) [1] 0.9331928, so 0.933-0.067=0.866

26 Exercise 25 μ=36, σ=5, n=16, so SE=5/4. Use Table 1 (for part c, z = (34 − 36)/1.25 = −1.6) or R: a. pnorm(37,36,5/4) [1] 0.7881446 b. pnorm(33,36,5/4) [1] 0.008197536; 1-0.008=0.992 c. pnorm(34,36,5/4) [1] 0.05479929 d. 0.788-0.055=0.733

27 Exercise 26 μ=25, σ=3, n=25, so SE=3/5 a. pnorm(24,25,3/5) [1] 0.04779035 b. pnorm(26,25,3/5) [1] 0.9522096 c. 1-0.048=0.952 d. 0.952-0.048=0.904

28 Exercise 27 Heavy-tailed distributions generally yield long CIs for the mean because their large variance inflates the SE. The central limit theorem does not remedy this problem.

29 Exercise 28 Light-tailed, symmetric distributions provide relatively accurate probability coverage for CIs even with small sample sizes. The central limit theorem works relatively well in this case.

30 Exercise 29 c is the 1 − α/2 quantile of a T distribution with n − 1 degrees of freedom. Look up c in Table 4, the 0.975 quantile with 9 df, or use R: qt(0.975,9) [1] 2.262157 a. b. c.

31 Exercise 30 c is the 1 − α/2 quantile of a T distribution with n − 1 degrees of freedom. Look up c in Table 4, the 0.99 quantile with 9 df, or use R: qt(0.99,9) [1] 2.821438 a. b. c.

32 Exercise 31 x=c(77,87,88,114,151,210,219,246,253,262,296,299,306,376,428,515,666,1310,2611) The R function t.test(x) returns: One Sample t-test data: x t = 3.2848, df = 18, p-value = 0.004117 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 161.5030 734.7075 sample estimates: mean of x 448.1053

33 Exercise 32 y=c(5,12,23,24,18,9,18,11,36,15) The R function t.test(y) returns: One Sample t-test data: y t = 6.042, df = 9, p-value = 0.0001924 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 10.69766 23.50234 sample estimates: mean of x 17.1

34 Exercise 33 Heavy-tailed distributions inflate the standard error in a manner that changes the cumulative probabilities of the T distribution. In this situation, the new T quantiles correspond to values that differ from T under normality. The inflation of the SE, due to the larger frequency of extreme values in the tails, leads to very long CIs that far exceed the stated nominal probability coverage under normality. For example, the intended 95% CI will yield a range that in reality covers over 99% of the distribution. When distributions are skewed, T becomes skewed and off-center (the mean and median are no longer 0, due to the dependency that is created between the mean and SD), with values that do not correspond to the quantiles in Table 4. This results in highly inaccurate probability coverage for CIs.

35 Exercise 34 When the variance is estimated from the empirical sample in a light-tailed, skewed distribution, the T distribution markedly departs from Student's T (becoming skewed and no longer centered around 0), so probability coverage is no longer accurate.

36 Exercise 35 c corresponds to the 0.975 quantile of a T distribution with n − 2g − 1 df, where g = 0.2n rounded down. a. df = 24 − 2(4) − 1 = 15 qt(0.975,15) [1] 2.13145 b. df = 36 − 2(7) − 1 = 21 qt(0.975,21) [1] 2.079614 c. df = 12 − 2(2) − 1 = 7 qt(0.975,7) [1] 2.364624
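The degrees-of-freedom arithmetic (g = 0.2n rounded down, df = n − 2g − 1) can be sketched in Python; the qt values themselves still come from Table 4 or R:

```python
from math import floor

def trimmed_df(n, trim=0.2):
    """Degrees of freedom for a 20% trimmed mean: n - 2g - 1, with g = floor(trim * n)."""
    g = floor(trim * n)
    return n - 2 * g - 1

print(trimmed_df(24), trimmed_df(36), trimmed_df(12))  # 15 21 7
```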

37 Exercise 36 c corresponds to the 0.99 quantile of a T distribution with n − 2g − 1 df, where g = 0.2n rounded down. a. qt(0.99,15) [1] 2.60248 b. qt(0.99,21) [1] 2.517648 c. qt(0.99,7) [1] 2.997952

38 Exercise 37 x=c(77,87,88,114,151,210,219,246,253,262,296, 299,306,376,428,515,666,1310,2611) The R function trimci(x) returns $ci [1] 160.3913 404.9933

39 Exercise 38 With trimmed means the CI is 244.6 long; with means it is 573.2, which is 2.34 times longer. The mean has a larger standard error, resulting in a longer CI.

40 Exercise 39 m=c(56,106,174,207,219,237,313,365,458,497,515,529,557,615,625,645,973,1065,3215) For the mean: t.test(m) 266.6441, 930.3033 For the trimmed mean: trimci(m) 293.5976, 595.9409 Checking for outliers: out(m) $out.val [1] 3215 The CI for the trimmed mean is far shorter than the CI for the mean because the outlier (3215) inflates the SE. In the case of the trimmed mean, it is trimmed. Other values in the data set may have a similar effect.

41 Exercise 40 Under normality, the sample mean has the smallest standard error, so it is the only candidate for being ideal. But as we have seen, other estimators have a smaller standard error than the mean in other situations, so an optimal estimator does not exist across the board.

42 Exercise 41 No, because what often appears to be normal is not normal. In addition, there are robust estimators that compare relatively well (although not quite as well) to the mean under normality but perform far better in situations that depart even mildly from normality. In other words, under normality the difference is small; under non-normality it can be very large.

43 Exercise 42 c=c(250,220,281,247,230,209,240,160,370,274,210,204,243,251,190,200,130,150,177,475,221,350,224,163,272,236,200,171,98) CI for the mean: t.test(c) 95 percent confidence interval: 200.7457 257.5991 CI for the trimmed mean: trimci(c) [1] 196.6734 244.9056

44 Exercise 43 An outlier analysis reveals 4 outliers: out(c) $out.val [1] 370 475 350 98 These increase the length of the CI for the mean. They are trimmed with the trimmed-mean CI.

45 Exercise 44 Even if the two measures are identical, outliers can largely inflate the CI based on means, rendering the outcome less informative.

46 Exercise 45 In this case we have 16 successes in 16 trials. The R function: binomci(16,16, alpha=0.01) $ci [1] 0.7498942 1.0000000

47 Exercise 46 In this case we have 0 successes in 200000 trials. The R function: binomci(0,200000) $ci [1] 0.000000e+00 1.497855e-05
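For these two boundary cases the reported endpoints follow simple closed forms that reproduce the printed output: a lower bound of α^(1/n) when all n trials succeed, and an upper bound of 1 − α^(1/n) when none do. A sketch under that assumption (the closed forms are inferred from the printed intervals, not from binomci's source):

```python
# All successes: binomci(16, 16, alpha=0.01) -> lower bound alpha**(1/n)
alpha1, n1 = 0.01, 16
lower = alpha1 ** (1 / n1)
print(round(lower, 7))  # ~0.7498942, matching the printed CI

# Zero successes: binomci(0, 200000) -> upper bound 1 - alpha**(1/n), alpha = 0.05
alpha2, n2 = 0.05, 200000
upper = 1 - alpha2 ** (1 / n2)
print(f"{upper:.6e}")  # ~1.497855e-05, matching the printed CI
```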

48 Exercise 47 val=0 for(i in 1:5000) val[i]=median(rbinom(25,6,0.9)) splot(val) This is an example of how the sampling distribution of the median can largely depart from the expected bell curve due to tied values. Each of the 5000 samples has many tied values because there are 25 trials in every sample and only 7 possible outcomes, so values are bound to repeat.
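A Python version of the same simulation; binomial draws are built as sums of Bernoulli trials so only the standard library is needed, and a frequency table stands in for the splot plot:

```python
import random
from collections import Counter
from statistics import median

random.seed(2)  # reproducible run

def rbinom1(size, prob):
    # One Binomial(size, prob) draw as a sum of Bernoulli trials
    return sum(random.random() < prob for _ in range(size))

# Sampling distribution of the median of 25 draws from Binomial(6, 0.9)
vals = [median(rbinom1(6, 0.9) for _ in range(25)) for _ in range(5000)]
print(Counter(vals))  # the mass piles up on a couple of tied values, not a bell curve
```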

