Exercise 19: Sample Size
Part One: Explore how sample size affects the distribution of sample proportions. This was done by first taking 20 random samples with n=10 and then taking 20 random samples with n=40. Each random sample was then summarized by its sample statistic (p-hat).
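The sampling procedure described above can be sketched in Python. This is only an illustration, not the Minitab session behind these slides: the 222/223 split of the 445 students, the seed, and the function name sample_proportions are all assumptions chosen to give a population proportion near 0.5.

```python
import random
import statistics

def sample_proportions(population, n, num_samples, rng):
    """Draw num_samples simple random samples of size n (without replacement)
    and return the sample proportion of 1s (p-hat) for each sample."""
    return [sum(rng.sample(population, n)) / n for _ in range(num_samples)]

# Hypothetical population: 445 students, roughly half coded 1 (lives on campus).
population = [1] * 222 + [0] * 223
rng = random.Random(19)  # fixed seed (arbitrary) so the run is repeatable

phat_10 = sample_proportions(population, 10, 20, rng)
phat_40 = sample_proportions(population, 40, 20, rng)

for label, phats in (("n=10", phat_10), ("n=40", phat_40)):
    print(label,
          "mean:", round(statistics.mean(phats), 4),
          "st. dev.:", round(statistics.stdev(phats), 4))
```

Rerunning with different seeds shows how much the 20 p-hat values move around from one set of samples to the next.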
Tally for Discrete Variable: Live (Live, Count, Percent; off, on; N= 445, *= 1). This verifies that the proportions of students living on campus and off campus are each approximately 50%. This is the population proportion (p).
Mean, Shape & Standard Deviation What would you expect if 20 random samples of 10 were taken? What would you expect if 20 random samples of 40 were taken?
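As a rough guide to what to expect, the standard formulas for a sample proportion give a mean of p and a standard deviation of sqrt(p(1-p)/n). The sketch below evaluates them for both sample sizes, assuming p = 0.5 from the tally; the function name phat_mean_sd is made up for this example.

```python
import math

def phat_mean_sd(p, n):
    """Return the theoretical mean and standard deviation of p-hat."""
    return p, math.sqrt(p * (1 - p) / n)

p = 0.5  # approximate population proportion for Live, from the tally
for n in (10, 40):
    mean, sd = phat_mean_sd(p, n)
    print(f"n={n}: expected mean={mean:.3f}, expected st. dev.={sd:.3f}")
```

Quadrupling the sample size from 10 to 40 halves the expected standard deviation, from about 0.158 to about 0.079.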
Results from 20 samples where n=10 resulting in phatlive…
Descriptive Statistics: phatlive=10. Columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum
Let’s Look At A Stem Plot Stem-and-leaf of phatlive=10 (N = 20) Leaf Unit =
Sample Proportions… What are the center, spread and shape for these sample proportions? Center= mean= = phat Spread= st.dev= Shape= np and n(1-p) are each about 5, which is less than 10, so the guidelines for normality are not met. However, as shown in the stem plot, the results appear relatively normal because the population proportions are perfectly balanced at 0.5 and 0.5.
What if the sample size increases… Results from 20 samples where n=40 resulting in phatlive…
Descriptive Statistics: phatlive=40. Columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum
Stem-plot for phatlive=40 N = 20 Leaf Unit =
Sample Proportions for phatlive=40 What are the center, spread and shape for these sample proportions? Center= mean= 0.4562 Spread= st. dev.= 0.0611 Shape= np and n(1-p) are both greater than 10, so the normality guideline is satisfied.
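The normality guideline used on these slides (np and n(1-p) both at least 10) can be checked with a few lines of Python. This is a minimal sketch assuming p = 0.5 for the Live variable; the function name normality_guideline_met is hypothetical.

```python
def normality_guideline_met(p, n, threshold=10):
    """Return True when both n*p and n*(1-p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

p = 0.5  # approximate population proportion for Live, from the tally
for n in (10, 40):
    print(f"n={n}: np={n * p:.0f}, n(1-p)={n * (1 - p):.0f}, "
          f"guideline met: {normality_guideline_met(p, n)}")
```

With p = 0.5 the check fails for n=10 (both counts are 5) and passes for n=40 (both counts are 20).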
Let’s compare them simultaneously Descriptive Statistics: phatlive=40, phatlive=10. Columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum. How do their centers, spreads and shapes compare?
Box-plots
What does this mean? The mean for n=40 is more consistent with the population proportion. The spread is smaller for n=40. The shape is more normal for n=40.
As outlined in Chapter 6: A random variable X for the count of sampled individuals in the category of interest is binomial with parameters n and p if… 1. There is a fixed sample size n. 2. Each selection is independent of the others. 3. Each individual sampled takes just two possible values. 4. The probability of each individual falling in the category of interest is always p.
However… The second condition isn’t really met when sampling without replacement. But as long as the population is at least 10n, approximate independence can still be assumed. Since the population is greater than 400, both sample sizes of 10 and 40 follow this rule.
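The 10n rule of thumb above is simple arithmetic, sketched here in Python. The population size 445 comes from the Live tally; the function name approx_independent is an assumption made for illustration.

```python
def approx_independent(population_size, n):
    """Rule of thumb: treat selections as roughly independent when the
    population is at least 10 times the sample size."""
    return population_size >= 10 * n

population_size = 445  # N from the Live tally
for n in (10, 40):
    print(f"n={n}: 10n={10 * n}, rule satisfied: "
          f"{approx_independent(population_size, n)}")
```

For n=40 the margin is thin (445 against a required 400), so a somewhat larger sample size would no longer satisfy the rule.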
Part 2: Explore how population shape affects the distribution of sample proportions. First, 20 random samples of 10 were taken and then 20 random samples of 40 were taken. The results were compared.
Handedness Tally for Discrete Variables: Handed (Handed, Count, Percent; ambid, left, right; N= 446). The population is very skewed with respect to ambidexterity, since only approximately 3% of the population is ambidextrous vs. 97% who are not.
For Handedness n=10 Descriptive Statistics: phathandedn=10. Columns: Variable, N, N*, Mean, SE Mean, StDev, Min., Q1, Median, Q3, Max
Stem-plot n=10 Stem-and-leaf of phathandedn=10 N = 20 Leaf Unit =
What does this data show? Center= mean= Spread= st. dev.= 0.0073 Shape= not normal, because the guidelines of np and n(1-p) each being at least 10 are not met (with p of approximately 0.03, np is only about 0.3).
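One way to see why the shape is not normal is to ask how often a sample contains no ambidextrous students at all. The sketch below treats the count as binomial, as in the Chapter 6 conditions, and assumes p = 0.03 from the handedness tally; binomial_pmf is a helper written for this example.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.03  # approximate proportion of ambidextrous students, from the tally
for n in (10, 40):
    print(f"n={n}: P(p-hat = 0) = {binomial_pmf(0, n, p):.3f}")
```

Under these assumptions roughly 74% of samples of size 10 give p-hat = 0, compared with roughly 30% for samples of size 40, so the n=10 distribution piles up at zero with a tail to the right.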
Handedness n=40 Descriptive Statistics: phathandedn=40. Columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum
Stem-plot n=40 Stem-and-leaf of phathandedn=40 N = 20 Leaf Unit =
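What does this mean? Center= mean= Spread= st. dev.= Shape= more normal than for n=10, but the normality guideline is still not met: with p of approximately 0.03, np is only about 40(0.03) = 1.2, which is less than 10, even though n(1-p) is greater than 10.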
Let’s compare them… Descriptive Statistics: phathandedn=40, phathandedn=10. Columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum
Let’s compare them…
What does it mean? By increasing the sample size, the box plot became less skewed. There was less spread and there were fewer outliers. The center remained at approximately 0.03. The shape became more normal.
Overall: Live seemed to be more normal than handedness. This was because the population was not skewed for the live variable as it was for handedness. In both situations, n=40 caused the distributions to be more normal.