Sample Means & Proportions


1 Sample Means & Proportions
Week 7 Sample Means & Proportions

2 Variability of Summary Statistics
Variability in the shape of the distn of a sample.
Variability in summary statistics: mean, median, st devn, upper quartile, …
Summary statistics have distributions.

3 Parameters and statistics
Parameter: describes the underlying population; constant; Greek letter (e.g. $\mu$, $\sigma$, $\pi$, …); value unknown in practice.
Summary statistic: random; Roman letter (e.g. m, s, p, …).
We hope the statistic will tell us about the corresponding parameter.

4 Distn of sample vs Sampling distn of statistic
Values in a single random sample have a distribution.
A single sample gives a single value for the statistic.
The sample-to-sample variability of a statistic is its sampling distribution.

5 Means
Unknown population mean, $\mu$.
Sample mean, $\bar{X}$, has a distribution: its sampling distribution.
Usually $\bar{x} \ne \mu$, but a single sample mean, $\bar{x}$, gives us information about $\mu$.

6 Sampling distribution of mean
If sample size, n, increases:
Spread of distn of sample stays (approx) the same.
Spread of sampling distn of mean gets smaller.
$\bar{x}$ is likely to be closer to $\mu$, so $\bar{x}$ becomes a better estimate of $\mu$.

7 Sampling distribution of mean
Population with mean $\mu$, st devn $\sigma$.
Random sample (n independent values).
Sample mean, $\bar{X}$, has a sampling distn with:
Mean $= \mu$
St devn $= \sigma/\sqrt{n}$
(We will deal later with the problem that $\mu$ and $\sigma$ are unknown in practice.)

8 Weight loss
Estimate mean weight loss for those attending a clinic for 10 weeks.
Random sample of n = 25 people; sample mean, $\bar{x}$.
How accurate? Let’s see, for a population distn of weight loss with mean $\mu$ = 8 lb.

9 Some samples
Four random samples of n = 25 people:
Mean = 8.32 pounds, st devn = 4.74 pounds
Mean = 8.48 pounds, st devn = 5.27 pounds
Mean = 7.16 pounds, st devn = 5.93 pounds
N.B. In all samples, $\bar{x} \ne \mu$.

10 Sampling distribution
Means from simulation of 400 samples.
Theory: mean $= \mu = 8$ lb, $\text{s.d.}(\bar{X}) = \sigma/\sqrt{25} = 1$ lb.
(How does this compare to the simulation? To the popn distn?)
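The slide's simulation can be reproduced roughly as below. This is a sketch under stated assumptions: the population shape is not given here, so the code assumes a normal population with $\mu$ = 8 lb and $\sigma$ = 5 lb (the $\sigma$ consistent with $\text{s.d.}(\bar{X})$ = 1 lb at n = 25). `simulate_means` is a hypothetical helper, and n = 100 is included for comparison with slide 12.

```python
import random
import statistics

# Assumed population: normal, mean 8 lb, st devn 5 lb (not stated on the slides).
MU, SIGMA = 8.0, 5.0
REPS = 400  # number of simulated samples, as on the slide

def simulate_means(n, reps, rng):
    """Hypothetical helper: draw `reps` samples of size n and return their means."""
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(2024)  # fixed seed so the run is reproducible
results = {}
for n in (25, 100):
    means = simulate_means(n, REPS, rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:3d}: simulated mean {results[n][0]:.2f} lb, "
          f"simulated s.d. {results[n][1]:.2f} lb, "
          f"theory s.d. {SIGMA / n ** 0.5:.2f} lb")
```

The simulated s.d. of the 400 means should sit close to $\sigma/\sqrt{n}$, and it roughly halves when n goes from 25 to 100.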

11 Errors in estimation
[Figure panels: population; sampling distribution of mean]
Sampling distribution of mean: mean $= \mu = 8$ lb, $\text{s.d.}(\bar{X}) = 1$ lb.
From the 70-95-100 rule, $\bar{x}$ will almost certainly be within 8 ± 3 lb.
Even if we didn't know $\mu$, $\bar{x}$ is unlikely to be more than 3 lb in error.

12 Increasing sample size, n
If we sample n = 100 people instead of 25: $\text{s.d.}(\bar{X}) = \sigma/\sqrt{100} = 0.5$ lb, half the value for n = 25.
Larger samples → more accurate estimates.

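13 Central Limit Theorem
If population is normal ($\mu$, $\sigma$): $\bar{X}$ is exactly normal ($\mu$, $\sigma/\sqrt{n}$).
If popn is non-normal with ($\mu$, $\sigma$) but n is large: $\bar{X}$ is approx normal ($\mu$, $\sigma/\sqrt{n}$).
Guideline: n > 30 is large enough, even if the popn is very non-normal.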
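A quick check of the Central Limit Theorem is to sample from a clearly skewed population and see whether the sample means behave like normal ($\mu$, $\sigma/\sqrt{n}$). The exponential population with mean 8, the sample size n = 40 and the 2000 repetitions are all hypothetical choices for illustration.

```python
import math
import random
import statistics

# Hypothetical skewed population: exponential with mean 8 (so st devn is also 8).
POP_MEAN = POP_SD = 8.0
N = 40        # sample size above the n > 30 guideline
REPS = 2000

rng = random.Random(11)
means = [statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(N))
         for _ in range(REPS)]

se = POP_SD / math.sqrt(N)  # theory: s.d. of sample mean = sigma / sqrt(n)
# If X-bar is approx normal, about 95% of means fall within 1.96 s.e. of mu.
coverage = sum(abs(m - POP_MEAN) <= 1.96 * se for m in means) / REPS
print(f"mean of means {statistics.fmean(means):.2f}, "
      f"s.d. of means {statistics.stdev(means):.3f} (theory {se:.3f}), "
      f"within 1.96 s.e.: {coverage:.3f}")
```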

14 Other summary statistics
E.g. lower quartile, proportion, correlation.
Usually not normal distns.
A formula for the standard devn of the sampling distn is sometimes available.
Sampling distn is usually close to normal if n is large.

15 Lottery problem
Pennsylvania Cash 5 lottery: 5 numbers selected from 1-39.
Pick birthdays of family members (none in 32-39).
P(highest selected is 32 or over)?
Statistic: H = highest of 5 random numbers (without replacement).

16 Lottery simulation
Theory? Fairly hard.
Simulation: generated 5 numbers (without replacement) 1560 times.
Highest number > 31 in about 72% of repetitions.
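One way to get an exact answer to compare with the simulation is the complement: H ≤ 31 exactly when all 5 numbers come from 1-31, so P(H ≥ 32) = 1 − C(31, 5)/C(39, 5) ≈ 0.705. The sketch below computes that and repeats the slide's 1560-draw simulation; the seed is arbitrary.

```python
import math
import random

# Exact: H <= 31 only if all 5 numbers are drawn from 1..31.
exact = 1 - math.comb(31, 5) / math.comb(39, 5)

# Simulation: 5 numbers from 1..39 without replacement, repeated 1560 times.
rng = random.Random(7)
REPS = 1560
hits = sum(max(rng.sample(range(1, 40), 5)) >= 32 for _ in range(REPS))
simulated = hits / REPS

print(f"exact P(H >= 32) = {exact:.4f}, simulated = {simulated:.3f}")
```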

17 Normal distributions
Family of distributions (populations).
Shape depends only on parameters $\mu$ (mean) & $\sigma$ (st devn).
All have the same symmetric ‘bell shape’.
Example shown: $\mu$ = 65 inches, $\sigma$ = 2.7 inches.

18 Importance of normal distn
A reasonable model for many data sets.
Transformed data are often approx normal.
Sample means (and many other statistics) are approx normal.

19 Standard normal distribution
Z ~ Normal ($\mu = 0$, $\sigma = 1$)
[Figure: standard normal density from −3 to 3; Prob(Z < z*) is the area to the left of z*]

20 Probabilities for normal (0, 1)
Check from tables:
$P(Z \le -3.00) = 0.0013$
$P(Z \le -2.59) = 0.0048$
$P(Z \le 1.31) = 0.9049$
$P(Z \le 2.00) = 0.9772$
$P(Z \le -4.75) \approx 0.000001$

21 Probability Z > 1.31
$P(Z > 1.31) = 1 - P(Z \le 1.31)$
$= 1 - 0.9049 = 0.0951$

22 Prob (Z between −2.59 and 1.31)
$P(-2.59 \le Z \le 1.31)$
$= P(Z \le 1.31) - P(Z \le -2.59) = 0.9049 - 0.0048 = 0.9001$
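The table look-ups on slides 20-22 can be checked with Python's standard-library `statistics.NormalDist`, whose `cdf` gives $P(Z \le z)$. This is just a cross-check of the table values, not a replacement for reading the tables.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Lower-tail probabilities P(Z <= z), as read from tables.
for z in (-3.00, -2.59, 1.31, 2.00, -4.75):
    print(f"P(Z <= {z:5.2f}) = {Z.cdf(z):.6f}")

upper = 1 - Z.cdf(1.31)                 # P(Z > 1.31) via the complement
between = Z.cdf(1.31) - Z.cdf(-2.59)    # P(-2.59 <= Z <= 1.31) as a difference
print(f"P(Z > 1.31) = {upper:.4f}")
print(f"P(-2.59 <= Z <= 1.31) = {between:.4f}")
```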

23 Standard devns from mean
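Normal ($\mu$, $\sigma$): [Figure: normal density with distances from $\mu$ marked in standard deviations]
Heights of students: $\mu$ = 65 inches, $\sigma$ = 2.7 inches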

24 Probability and area
X ~ normal ($\mu = 65$, $\sigma = 2.7$)
$P(X \le 67.7)$ = area under the density to the left of 67.7

25 Probability and area (cont.)
Normal ($\mu$, $\sigma$): 70-95-100 rule
P(X within $\sigma$ of $\mu$) = approx 70%
P(X within $2\sigma$ of $\mu$) = approx 95%
P(X within $3\sigma$ of $\mu$) = approx 100%
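The 70-95-100 rule rounds the exact normal probabilities. Because any normal variable can be standardized, P(X within kσ of μ) equals P(−k ≤ Z ≤ k), which this sketch computes with `statistics.NormalDist`.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# P(X within k sigma of mu) = P(-k <= Z <= k) for any normal X.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"P(X within {k} sigma of mu) = {p:.4f}")
```

The exact values are about 68.3%, 95.4% and 99.7%, which the slide rounds to 70%, 95% and 100%.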

26 Finding approx probabilities
Ht of college woman, X ~ normal ($\mu = 65$, $\sigma = 2.7$)
Prob ($X \le 62$)?
Sketch normal density and estimate area: $P(X \le 62)$ = area, about 1/8.

27 Translate question from X to Z
X ~ Normal ($\mu$, $\sigma$). Find $P(X \le x^*)$.
Translate to a z-score, $z^* = (x^* - \mu)/\sigma$; then $P(X \le x^*) = P(Z \le z^*)$, where Z ~ Normal ($\mu = 0$, $\sigma = 1$).

28 Finding probabilities
Prob (height of randomly selected college woman ≤ 62)?
$z^* = (62 - 65)/2.7 = -1.11$, and $P(Z \le -1.11) = 0.1335$: about 13%.

29 Prob (X > value)
Ht of college woman, X ~ normal ($\mu = 65$, $\sigma = 2.7$)
Prob (X > 68 inches)?
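The translate-to-Z recipe can be sketched as follows for both height questions. The slide leaves P(X > 68) as an exercise; since 68 and 62 are both 3 inches from the mean, symmetry makes the two answers equal, which the code confirms. Working directly with `NormalDist(65, 2.7)` gives the same result as standardizing first.

```python
from statistics import NormalDist

MU, SIGMA = 65.0, 2.7          # heights of college women, inches
height = NormalDist(MU, SIGMA)
Z = NormalDist()               # standard normal

# P(X <= 62): translate to a z-score first, then use the standard normal.
z_low = (62 - MU) / SIGMA      # about -1.11
p_low = Z.cdf(z_low)

# P(X > 68): complement of the lower-tail area.
p_high = 1 - height.cdf(68)

print(f"z = {z_low:.2f}, P(X <= 62) = {p_low:.4f}, P(X > 68) = {p_high:.4f}")
```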

30 Finding upper quartile
Blood pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile?
Step 1: Solve for the z-score. Closest z* with area 0.75 below it (tables): z* = 0.67.
Step 2: Calculate $x = z^*\sigma + \mu$: x = (0.67)(10) + 120 = 126.7, or about 127.
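The two steps map directly onto `NormalDist.inv_cdf`, the inverse of the cumulative probability. The sketch follows the slide's two-step route and checks it against asking the blood-pressure distribution for its 75th percentile directly.

```python
from statistics import NormalDist

MU, SIGMA = 120.0, 10.0

# Step 1: z* with area 0.75 below it under the standard normal.
z_star = NormalDist().inv_cdf(0.75)      # about 0.674 (tables give 0.67)

# Step 2: convert back to the original scale, x = z* sigma + mu.
x_two_step = z_star * SIGMA + MU

# Same answer in one call on the blood-pressure distribution itself.
x_direct = NormalDist(MU, SIGMA).inv_cdf(0.75)

print(f"z* = {z_star:.3f}, 75th percentile = {x_direct:.1f}")
```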

31 Probabilities about means
Blood pressure ~ normal ($\mu = 120$, $\sigma = 10$). 8 people given drug.
If drug does not affect blood pressure, find P(average blood pressure > 130).

32 $P(\bar{X} > 130)$?
$\bar{X}$ ~ normal ($\mu = 120$, $\sigma/\sqrt{n} = 10/\sqrt{8} = 3.54$), with n = 8
prob = 0.0023
Very little chance!
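The only change from a single blood pressure is that the mean of 8 values has st devn $10/\sqrt{8}$ rather than 10. A minimal sketch of the calculation:

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 120.0, 10.0, 8

se = SIGMA / math.sqrt(N)                    # s.d. of the sample mean, about 3.54
p = 1 - NormalDist(MU, se).cdf(130)          # P(X-bar > 130)
z = (130 - MU) / se                          # about 2.83 standard errors above mu

print(f"s.e. = {se:.2f}, z = {z:.2f}, P(mean > 130) = {p:.4f}")
```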

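33 Distribution of sum
X ~ distn with ($\mu$, $\sigma$); aX ~ distn with ($a\mu$, $a\sigma$) for a > 0, e.g. miles to kilometers.
Sum of n independent values ~ distn with ($n\mu$, $\sqrt{n}\,\sigma$).
Central Limit Theorem implies the sum is approx normal.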

34 Probabilities about sum
Profit in 1 day ~ normal (mean $300, st devn $200)
Prob(total profit in week < $1,000)?
Total = …   Prob = …
Assumes independence between days.
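The weekly total is a sum of daily profits, so (assuming independent days) it is normal with mean n × 300 and st devn √n × 200 dollars. The slide does not say how many trading days make up the "week", so this sketch shows both a 7-day and a 5-day week; the function name `prob_total_below` is hypothetical.

```python
import math
from statistics import NormalDist

def prob_total_below(threshold, days, mu=300.0, sigma=200.0):
    """Hypothetical helper: P(sum of `days` independent daily profits < threshold)."""
    # Sum of n independent values: mean n*mu, st devn sqrt(n)*sigma.
    total = NormalDist(days * mu, math.sqrt(days) * sigma)
    return total.cdf(threshold)

# Assumption: week length is not given on the slide, so show both cases.
p7 = prob_total_below(1000, days=7)
p5 = prob_total_below(1000, days=5)
print(f"7-day week: {p7:.4f}; 5-day week: {p5:.4f}")
```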

35 Categorical data
Most important parameter is $\pi$ = Prob (success).
Corresponding summary statistic is p = Proportion (success).
N.B. Textbook uses p and $\hat{p}$ instead.

36 Number of successes
Easiest to deal with the count of successes before the proportion. If…
1. n “trials” (fixed beforehand).
2. Only “success” or “failure” possible for each trial.
3. Outcomes are independent.
4. Prob (success), $\pi$, remains the same for all trials; Prob (failure) is $1 - \pi$.
then X = number of successes ~ binomial (n, $\pi$).

37 Examples

38 Binomial Probabilities
$P(X = k) = \binom{n}{k} \pi^k (1 - \pi)^{n-k}$ for k = 0, 1, 2, …, n
You won’t need to use this!!
Prob (win game) = 0.2. Plays of game are independent.
What is Prob (wins 2 out of 3 games)? That is, what is P(X = 2)?
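Although the formula is not needed for the course, it is short to evaluate. A minimal sketch for the game example, with X ~ binomial (n = 3, π = 0.2); `binom_pmf` is a hypothetical helper built on `math.comb`.

```python
import math

def binom_pmf(k, n, pi):
    """Hypothetical helper: P(X = k) for X ~ binomial(n, pi)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

# Wins 2 out of 3 games: 3 arrangements, each with prob 0.2**2 * 0.8.
p_two = binom_pmf(2, 3, 0.2)
print(f"P(X = 2) = {p_two:.3f}")   # P(X = 2) = 0.096
```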

39 Mean & st devn of Binomial
For a binomial (n, $\pi$): mean $= n\pi$, st devn $= \sqrt{n\pi(1 - \pi)}$

40 Extraterrestrial Life?
50% of a large population would say “yes” if asked, “Do you believe there is extraterrestrial life?”
Sample of n = 100
X = # “yes” ~ binomial (n = 100, $\pi = 0.5$)

41 Extraterrestrial Life?
Sample of n = 100
X = # “yes” ~ binomial (n = 100, $\pi = 0.5$): mean = 50, st devn = 5
70-95-100 rule of thumb for # “yes”:
About 95% chance of between 40 & 60
Almost certainly between 35 & 65
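The rule-of-thumb ranges come from mean ± 2 and ± 3 st devns. This sketch computes the mean and st devn from the binomial formulas and, as a check, sums the exact binomial probabilities over those ranges; `binom_pmf` is the same kind of hypothetical helper as above.

```python
import math

def binom_pmf(k, n, pi):
    """Hypothetical helper: P(X = k) for X ~ binomial(n, pi)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

n, pi = 100, 0.5
mean = n * pi                          # 50
sd = math.sqrt(n * pi * (1 - pi))      # 5

# Exact probabilities for mean +/- 2 sd and mean +/- 3 sd.
p_40_60 = sum(binom_pmf(k, n, pi) for k in range(40, 61))
p_35_65 = sum(binom_pmf(k, n, pi) for k in range(35, 66))
print(f"mean {mean}, sd {sd}, P(40..60) = {p_40_60:.4f}, P(35..65) = {p_35_65:.4f}")
```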

42 Normal approx to binomial
If X is binomial (n, $\pi$), and n is large, then X is also approximately normal, with mean $n\pi$ and st devn $\sqrt{n\pi(1 - \pi)}$.
Conditions: Both $n\pi$ and $n(1 - \pi)$ are at least 10.
(Justified by Central Limit Theorem)

43 Number of H in 30 Flips
X = # heads in n = 30 flips of a fair coin
X ~ binomial (n = 30, $\pi = 0.5$)
Bell-shaped & approx normal.

44 Opinion poll
n = 500 adults; 240 agreed with statement.
If $\pi = 0.5$ of all adults agree, what is $P(X \le 240)$?
X is approx normal with mean $n\pi = 250$ and st devn $\sqrt{500 \times 0.5 \times 0.5} = 11.2$.
Not unlikely to see 48% or less, even if 50% in population agree.
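A sketch of the normal approximation for this poll, with the exact binomial tail added only as a cross-check (the slide itself uses the normal approximation). With π = 0.5 every outcome of 500 answers is equally likely, so the exact tail is a count of combinations divided by 2^500.

```python
import math
from statistics import NormalDist

n, pi, observed = 500, 0.5, 240
mean = n * pi                         # 250
sd = math.sqrt(n * pi * (1 - pi))     # about 11.18

# Normal approximation to P(X <= 240).
p_normal = NormalDist(mean, sd).cdf(observed)

# Exact binomial tail for comparison: sum of C(500, k) / 2**500 for k <= 240.
p_exact = sum(math.comb(n, k) for k in range(observed + 1)) / 2 ** n

print(f"normal approx {p_normal:.3f}, exact binomial {p_exact:.3f}")
```

Both values are roughly 0.2, so seeing 48% or fewer agree is quite plausible even when π = 0.5.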

45 Sample Proportion
Suppose (unknown to us) 40% of a population carry the gene for a disease ($\pi = 0.40$).
Random sample of 25 people; X = # with gene.
X ~ binomial (n = 25, $\pi = 0.4$)
p = proportion with gene = X/25

46 Distn of sample proportion
X ~ binomial (n, $\pi$), so p = X/n has mean $\pi$ and st devn $\sqrt{\pi(1 - \pi)/n}$
Large n: p is approx normal ($n\pi \ge 10$ & $n(1 - \pi) \ge 10$)

47 Examples
Election Polls: to estimate proportion who favor a candidate; units = all voters.
Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
Testing ESP: to estimate probability a person can successfully guess which of 5 symbols is on a hidden card; repeatable situation = a guess.

48 Public opinion poll
Suppose 40% of all voters favor Candidate A.
Pollsters sample n = 2400 voters.
Propn voting for A is approx normal with mean 0.40 and st devn $\sqrt{0.4 \times 0.6/2400} = 0.01$.
Simulation 400 times & theory.

49 Probability from normal approx
If 40% of voters favor Candidate A, and n = 2400 are sampled:
Sample proportion, p, is almost certain to be between 0.37 and 0.43.
Prob 0.95 of p being between 0.38 and 0.42.
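The "simulation 400 times & theory" comparison can be sketched as below: each simulated poll counts how many of 2400 independent voters favor A, each with probability 0.4. The seed is arbitrary; the theory side uses the normal approximation with st devn $\sqrt{\pi(1-\pi)/n} = 0.01$, so 0.38 to 0.42 is ± 2 st devns and 0.37 to 0.43 is ± 3.

```python
import math
import random
import statistics
from statistics import NormalDist

PI, N, REPS = 0.40, 2400, 400

# Theory: p approx normal with mean pi and st devn sqrt(pi(1 - pi)/n) = 0.01.
se = math.sqrt(PI * (1 - PI) / N)
model = NormalDist(PI, se)
p_within_2se = model.cdf(0.42) - model.cdf(0.38)
p_within_3se = model.cdf(0.43) - model.cdf(0.37)

# Simulation: 400 polls, each of 2400 independent voters.
rng = random.Random(5)
props = [sum(rng.random() < PI for _ in range(N)) / N for _ in range(REPS)]
frac_38_42 = sum(0.38 <= p <= 0.42 for p in props) / REPS

print(f"theory: P(0.38..0.42) = {p_within_2se:.4f}, P(0.37..0.43) = {p_within_3se:.4f}")
print(f"simulation: s.d. of p = {statistics.stdev(props):.4f}, "
      f"fraction in 0.38..0.42 = {frac_38_42:.3f}")
```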

