65 Is there a familiar pattern to the variability of $\bar{x}$? As the sample size becomes larger, the distribution of the sample mean becomes closer to a normal distribution, regardless of the population from which the sample is drawn. The central limit theorem (so named by Pólya in 1920) is a very important theorem, which states that for large samples the distribution of the sample mean is approximately normal.
66 Central Limit Theorem: If a sufficiently large random sample (as a rule of thumb, n > 30) is drawn from a population with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean will have the following characteristics:
1. an approximately normal distribution, regardless of the distribution of the underlying population.
2. $\mu_{\bar{x}} = \mu$
3. $\sigma_{\bar{x}}^2 = \sigma^2 / n$
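67 Example 7: Suppose the random variable X has a mean of 50 and a standard deviation of 10. Calculate the mean and the standard deviation of the sample mean (the standard error) for each of the following sample sizes. (Assume the population is infinite.)
a. n = 40
b. n = 55
c. n = 100
d. What happens to the size of the standard deviation of the sample mean (the standard error) as the sample size increases?
68 Example 7 - Solution: We are given that X has $\mu = 50$ and $\sigma = 10$ and that the population is infinite, so $\mu_{\bar{x}} = \mu = 50$ and $SE = \sigma_{\bar{x}} = \sigma / \sqrt{n}$.
a. $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10 / \sqrt{40} \approx 1.581$
b. $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10 / \sqrt{55} \approx 1.348$
69 Example 7 - Solution:
c. $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10 / \sqrt{100} = 1$
d. It decreases, reflecting the additional information provided by a larger sample size.
Summary: n = 40: $\sigma_{\bar{x}} \approx 1.581$; n = 55: $\sigma_{\bar{x}} \approx 1.348$; n = 100: $\sigma_{\bar{x}} = 1$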
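The standard errors in Example 7 can be checked with a few lines of Python. This is a minimal sketch that assumes the sample sizes listed in the question (40, 55, 100); the function name `standard_error` is an illustrative choice, not something from the slides.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for an infinite population: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10  # population standard deviation given in Example 7

# Sample sizes taken from the Example 7 question (assumed: 40, 55, 100).
results = {n: standard_error(sigma, n) for n in (40, 55, 100)}

for n, se in results.items():
    print(f"n = {n:3d}: standard error = {se:.3f}")
```

The printed values shrink as n grows, which is the point of part d.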
70 Importance of the Central Limit Theorem: The most important feature of this theorem is that it can be applied to any population with a finite mean and variance. Because the theorem makes no assumption about the shape of the population distribution, it is widely applicable and is one of the cornerstones of statistical inference.
71 Central Limit Theorem and Sample Size: The main practical restriction of the theorem is that the sample size must be sufficiently large for the theorem to be applicable. Even if the distribution of the population deviates substantially from the normal distribution, a sample size of 30 will usually be sufficiently large to produce a sampling distribution for $\bar{x}$ that is approximately normal.
72 Distribution Shapes: [Figures: population distribution and distribution of the sample mean for large samples, shown for a bimodal population and an exponential population]
73 Distribution Shapes: [Figures: population distribution and distribution of the sample mean for large samples, shown for a normal population and a uniform population]
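A small simulation can illustrate the pattern these slides describe. The sketch below assumes an exponential population with rate 1 (so mean 1 and standard deviation 1), a sample size of 40, and 2000 repeated samples; all of these values, and the fixed seed, are illustrative choices rather than anything taken from the slides.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 40                    # size of each sample (assumed)
replications = 2000       # number of samples drawn (assumed)

# Draw many samples from a strongly skewed (exponential) population
# and record the mean of each sample.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n) = {1 / math.sqrt(n):.3f})")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.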
74 Example 8: Suppose a sample of size 40 is drawn from a population that has a mean of 276 and a variance of 81. What is the probability that the mean of the sample will be less than 273?
75 Example 8 - Solution: We are given that a sample of size n = 40 is drawn from a population that has $\mu = 276$ and $\sigma^2 = 81$. By the CLT, $\bar{x}$ has an approximately normal distribution with $\mu_{\bar{x}} = 276$ and $\sigma_{\bar{x}} = \sqrt{81/40} \approx 1.423$.
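The remaining step of Example 8 is to convert 273 to a z-value and look up the tail area. As a rough check, the sketch below does the same calculation with `statistics.NormalDist` from the Python standard library instead of a printed table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 276        # population mean
variance = 81   # population variance
n = 40          # sample size

# Standard error of the sample mean: sqrt(sigma^2 / n).
se = math.sqrt(variance / n)

# Approximate sampling distribution of the sample mean under the CLT.
sampling_dist = NormalDist(mu=mu, sigma=se)

z = (273 - mu) / se
p_below_273 = sampling_dist.cdf(273)

print(f"standard error = {se:.3f}")
print(f"z = {z:.2f}")
print(f"P(sample mean < 273) = {p_below_273:.4f}")
```

A table lookup with z rounded to two decimals may differ from this result in the fourth decimal place.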
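78 Example 9: Suppose there is a normally distributed population with a mean of 100 and a standard deviation of 10. If $\bar{x}$ is the average of a sample of 50, find the following probabilities.
a. $P(\bar{x} \le 103)$
b. $P(\bar{x} \ge 96)$
c. $P(95 \le \bar{x} \le 103)$
79 Example 9 - Solution: We are given that X has a normal distribution with $\mu = 100$ and $\sigma = 10$ and n = 50. By the CLT, $\bar{x}$ has a normal distribution with $\mu_{\bar{x}} = 100$ and $\sigma_{\bar{x}} = 10 / \sqrt{50} \approx 1.414$.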
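80 Example 9 - Solution:
a. $P(\bar{x} \le 103) = P\left(z \le \frac{103 - 100}{1.414}\right) = P(z \le 2.12) = .5 + P(0 < z < 2.12) = .5 + .4830 = .9830$
b. $P(\bar{x} \ge 96) = P\left(z \ge \frac{96 - 100}{1.414}\right) = P(z \ge -2.83) = .5 + P(-2.83 < z < 0) = .5 + .4977 = .9977$
81 Example 9 - Solution:
c. $P(95 \le \bar{x} \le 103) = P\left(\frac{95 - 100}{1.414} \le z \le \frac{103 - 100}{1.414}\right) = P(-3.54 \le z \le 2.12) = P(-3.54 < z < 0) + P(0 < z < 2.12) = .4998 + .4830 = .9828$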
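The three probabilities in Example 9 can be cross-checked without a table. This sketch assumes the events are $\bar{x} \le 103$, $\bar{x} \ge 96$, and $95 \le \bar{x} \le 103$; the bound 96 in part b is inferred from the z-value of -2.83 in the solution, so treat it as an assumption.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 10, 50

# Sampling distribution of the sample mean: normal with
# mean mu and standard deviation sigma / sqrt(n).
xbar = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

p_a = xbar.cdf(103)                 # P(xbar <= 103)
p_b = 1 - xbar.cdf(96)              # P(xbar >= 96); 96 is an inferred bound
p_c = xbar.cdf(103) - xbar.cdf(95)  # P(95 <= xbar <= 103)

print(f"a. {p_a:.4f}")
print(f"b. {p_b:.4f}")
print(f"c. {p_c:.4f}")
```

Because the code does not round z to two decimals, its answers can differ slightly from table-based values in the last digit.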
82 Example 10: A travel agency conducted a survey of the prices charged by ocean cruise ship lines and determined they were approximately normally distributed with a mean of $110 per day and a standard deviation of $20 per day.
83 Example 10 - Questions:
1. If an ocean cruise ship line is chosen at random, what is the probability that it will charge less than $99 per day?
2. What is the probability that the average charge for a randomly selected sample of 35 ocean cruise ship lines will be less than $99 per day?
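The two questions in Example 10 differ only in which distribution is used: the population distribution for a single cruise line, and the sampling distribution of the mean for a sample of 35. The following is a minimal sketch of that contrast using the standard library; the variable names are illustrative and this is not the worked solution from the slides.

```python
import math
from statistics import NormalDist

mu, sigma = 110, 20   # mean and standard deviation of daily price
n = 35                # sample size for question 2

# Question 1: one cruise line, so use the population distribution.
single_line = NormalDist(mu=mu, sigma=sigma)
p_single = single_line.cdf(99)

# Question 2: mean of 35 cruise lines, so use the standard error sigma / sqrt(n).
sample_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
p_mean = sample_mean.cdf(99)

print(f"P(X < 99)             = {p_single:.4f}")
print(f"P(sample mean < 99)   = {p_mean:.4f}")
```

The second probability is far smaller because the sample mean varies much less than an individual price.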
87 Proportions: There are many instances in which the variable of interest is a proportion. Examples:
A marketing researcher may be interested in what proportion of persons on a mailing list will buy their product.
A college is concerned with the fraction of freshmen that will be in academic difficulty after the first year.
88 Population Proportions and Sample Proportions: Population proportions must be estimated just like population means. The sample proportion is a reasonable estimate of the population proportion. Sample proportions vary depending on the selected samples.
89 Symbols: The symbols used to represent the population and sample proportions are
$p$ - population proportion,
$\hat{p}$ - sample proportion.
90 How do you determine a sample proportion? When calculating a proportion, the number in the sample that possesses the characteristic of interest goes in the numerator, and the size of the sample is placed in the denominator: $\hat{p} = x / n$, where x is the number in the sample possessing the characteristic of interest and n is the sample size.
91 What is the central value of $\hat{p}$? The expected value (mean) of the sample proportion is the population proportion: $E(\hat{p}) = p$. Since the expected value of the estimator is equal to p, $\hat{p}$ is an unbiased estimator of p.
92 What is the variance of $\hat{p}$? The variance of $\hat{p}$ is given by $\sigma_{\hat{p}}^2 = p(1 - p)/n$. If the population proportion is unknown (which is usually the case), p can be estimated by $\hat{p}$, and the variance of the sample proportion is estimated as $s_{\hat{p}}^2 = \hat{p}(1 - \hat{p})/n$.
93 Is there a familiar pattern to the variability of $\hat{p}$? The sampling distribution of $\hat{p}$ approaches normality as n becomes sufficiently large. The sample size is generally considered “sufficiently large” if $np \ge 5$ and $n(1 - p) \ge 5$. [Figure: sampling distribution of $\hat{p}$]
94 Sampling Distribution of the Sample Proportion: If the population is infinite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics:
1. an approximately normal distribution.
2. $\mu_{\hat{p}} = p$
3. $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
95 Sampling Distribution of the Sample Proportion: If the population is finite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics:
1. an approximately normal distribution.
2. $\mu_{\hat{p}} = p$
3. $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$
where N is the size of the population.
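The infinite-population and finite-population cases differ only by the finite population correction factor $\sqrt{(N - n)/(N - 1)}$. A minimal sketch of both, assuming a hypothetical helper named `proportion_se` and made-up values p = 0.35, n = 75, N = 1000 chosen purely for illustration:

```python
import math

def proportion_se(p, n, population_size=None):
    """Standard error of the sample proportion.

    Uses sqrt(p(1-p)/n). When population_size (N) is supplied, the result
    is multiplied by the finite population correction sqrt((N-n)/(N-1)).
    """
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

# Hypothetical values, not taken from the slides.
p, n, N = 0.35, 75, 1000

se_infinite = proportion_se(p, n)
se_finite = proportion_se(p, n, population_size=N)

print(f"infinite population: {se_infinite:.4f}")
print(f"finite population (N = {N}): {se_finite:.4f}")
```

The correction always shrinks the standard error, and it reaches zero when the sample is the whole population (n = N).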
96 Since $\hat{p}$ is a good estimator of p ... Can limits be established for the error in estimation? Since the sampling distribution of $\hat{p}$ is known, probabilities for various errors of estimation can be determined.
97 Example 11: A random sample of 100 employees of a large steel company has 30 females and 70 males.
1. Find the sample proportion of female employees.
2. Find the sample proportion of male employees.
99 Example 12: Suppose that the true proportion of Americans over 25 years old that have a 4-year college degree is .35. Find the mean and the standard deviation of the sample proportion for samples of the following sizes.
a. n = 38
b. n = 52
c. n = 75
d. What happens to the size of the standard deviation of the sample proportion as the sample size increases?
101 Example 12 - Solution:
c. $\mu_{\hat{p}} = .35$, $\sigma_{\hat{p}} = \sqrt{(.35)(.65)/75} \approx .0551$
d. It decreases, reflecting the additional information provided by the larger sample size.
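The standard deviations for all three sample sizes in Example 12 follow from $\sqrt{p(1 - p)/n}$ with p = .35. A short sketch that tabulates them (the dictionary name `standard_errors` is an illustrative choice):

```python
import math

p = 0.35  # true population proportion from Example 12

# Standard deviation of the sample proportion for each sample size.
standard_errors = {n: math.sqrt(p * (1 - p) / n) for n in (38, 52, 75)}

for n, se in standard_errors.items():
    print(f"n = {n}: mean = {p:.2f}, standard deviation = {se:.4f}")
```

The mean of the sample proportion stays at .35 for every n; only the spread changes.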
102 Example 13: Suppose that the true population proportion is p = .30. What is the probability that the sample proportion of a sample of size 30 will be less than .20?
103 Example 13 - Solution: $\hat{p}$ has an approximately normal distribution because np = (30)(.3) = 9 and n(1 - p) = (30)(.7) = 21 are both greater than or equal to 5.
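104 Example 13 - Answer: $z = (0.20 - 0.30)/0.08367 = -1.195172$, rounded to $-1.20$, where $0.08367 \approx \sqrt{(.3)(.7)/30}$ is the standard error of $\hat{p}$. The area from 0 to 1.20 in Table A is .3849, so the tail area is .5 - .3849 = .1151; this is the area in the left tail, $P(\hat{p} < .20)$.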
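The Example 13 calculation can be reproduced in code, including the sample-size check. This is a sketch using `statistics.NormalDist` rather than Table A, so z is not rounded to two decimals and the result differs slightly from the table-based tail area.

```python
import math
from statistics import NormalDist

p, n = 0.30, 30
threshold = 0.20

# Rule-of-thumb check for the normal approximation: np >= 5 and n(1-p) >= 5.
large_enough = n * p >= 5 and n * (1 - p) >= 5

se = math.sqrt(p * (1 - p) / n)        # standard error of the sample proportion
z = (threshold - p) / se               # z-value for a sample proportion of .20
p_below = NormalDist().cdf(z)          # left-tail area under the standard normal

print(f"normal approximation reasonable: {large_enough}")
print(f"standard error = {se:.5f}, z = {z:.3f}")
print(f"P(sample proportion < .20) = {p_below:.4f}")
```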
105 Example 14: The property manager of a large office building would like to make the building smoke free; however, he does not want to upset too many of his customers. He decides to randomly select 50 of the workers in the building and ask them whether or not they smoke. If the sample proportion of workers who smoke is less than .30, the property manager will make the building smoke free.
106 Example 14:
1. Find the probability that the property manager will make the building smoke free when the true proportion of smokers is .50.
2. Find the probability that the property manager will not make the building smoke free when the true proportion of smokers is .20.
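107 Example 14 - Solution 1: Because np = (50)(.50) = 25 and n(1 - p) = (50)(.50) = 25 are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with $\mu_{\hat{p}} = .50$ and $\sigma_{\hat{p}} = \sqrt{(.50)(.50)/50} \approx .0707$.
108 Example 14 - Solution 1: The property manager will make the building smoke free if $\hat{p}$ is less than .30. $P(\hat{p} < .30) = P\left(z < \frac{.30 - .50}{.0707}\right) = P(z < -2.83) = .5 - P(-2.83 < z < 0) = .5 - .4977 = .0023$
109 Example 14 - Solution 2: Because np = (50)(.20) = 10 and n(1 - p) = (50)(.80) = 40 are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with $\mu_{\hat{p}} = .20$ and $\sigma_{\hat{p}} = \sqrt{(.20)(.80)/50} \approx .0566$.
110 Example 14 - Solution 2: The property manager will not make the building smoke free if $\hat{p}$ is at least .30. $P(\hat{p} \ge .30) = P\left(z \ge \frac{.30 - .20}{.0566}\right) = P(z \ge 1.77) = .5 - P(0 < z < 1.77) = .5 - .4616 = .0384$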
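Both parts of Example 14 apply the same decision rule (go smoke free if the sample proportion is below .30) under different true proportions. The sketch below wraps that in a hypothetical helper, `prob_sample_proportion_below`, which relies on the normal approximation; the name and structure are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_below(cutoff, p, n):
    """Approximate P(p_hat < cutoff) using the normal approximation.

    Assumes np >= 5 and n(1-p) >= 5 so the approximation is reasonable.
    """
    se = math.sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=se).cdf(cutoff)

n, cutoff = 50, 0.30

# Part 1: building goes smoke free although half the workers smoke.
p_smoke_free_given_50 = prob_sample_proportion_below(cutoff, 0.50, n)

# Part 2: building does NOT go smoke free although only 20% smoke.
p_not_smoke_free_given_20 = 1 - prob_sample_proportion_below(cutoff, 0.20, n)

print(f"part 1: {p_smoke_free_given_50:.4f}")
print(f"part 2: {p_not_smoke_free_given_20:.4f}")
```

Small differences from the table-based answers come from rounding z to two decimals in the hand calculation.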
112 Probability Samples: Probability samples enable an analyst to determine the probable errors that an estimator might generate. They allow the analyst a known degree of confidence in their estimation. All statistical inference relies on probability sampling.
113 Types of Probability Samples: Cluster sampling involves dividing the population into clusters and randomly selecting a sample of clusters to represent the population. In stratified sampling, the population is divided into strata, which are sub-populations. A stratum can be defined by any identifiable characteristic that can be used to classify the population. If the population consisted of people, then strata could be defined by sex, income, political party, religion, education, race, or location.
114 Pros and Cons of Cluster Sampling: Cluster sampling can be as effective as simple random sampling if the clusters are as heterogeneous as the population; however, clusters are almost never as diverse as the population. Smaller cluster sizes will result in more representative samples. Cluster sampling simplifies the task of constructing the sampling frame, since the initial frame is composed only of clusters.
115 Stratified Sampling: Stratified sampling can provide greater accuracy if the population is heterogeneous and sub-populations can be identified that are relatively homogeneous.
116 Non-probability Samples: Non-probability samples are a convenient means of obtaining sample data. If data from a non-probability sample are used to estimate a population parameter, there is no statistical theory that helps define the potential error of the estimate, and hence no statement about the estimate’s reliability can be made.
117 Types of Non-probability Samples: A judgment sample is a sample in which sample values are selected by an expert in the field. A convenience sample is a convenient group of observations. One of the worst forms of non-probability samples is the voluntary or self-selected sample.
118 Almost Random Samples: The systematic sample does not clearly belong to either probability or non-probability samples. In a systematic sample, every kth member of the population is included in the sample. Note: If there is some pattern in the sampling frame that corresponds to the sampling pattern, an unrepresentative sample may result.
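Systematic sampling is easy to express in code. The sketch below assumes a random starting position among the first k members, which is a common convention but is not stated on the slide; the function name `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Take every kth member of the frame, starting at a random offset in [0, k)."""
    start = rng.randrange(k)   # random start is an assumption, not from the slide
    return frame[start::k]

rng = random.Random(7)
frame = list(range(1, 101))    # hypothetical sampling frame: units numbered 1..100

sample = systematic_sample(frame, k=10, rng=rng)
print(sample)
```

If the frame itself repeats with a period of k (for example, every tenth record is a supervisor), every selected unit would share that trait, which is the risk the slide's note warns about.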
119 Example 15 (a - c): A social researcher in Florida wants to determine the average number of children per family in the state.
a. What is the population of interest?
b. What variable will be measured?
c. What level of measurement is the variable of interest?
120 Example 15 (a - c) Solution:
a. Population - families in the state of Florida
b. Variable measured - number of children per family
c. Level of measurement - ratio
121 Example 15 (d):
d. What are the steps that would be necessary for each of the following sampling methods?
1. Simple random sampling
2. Cluster sampling
3. Stratified sampling
122 Example 15 (d) Solution 1. Simple Random Sample - List all families in the state of Florida (perhaps from a census, phone books, tax returns, etc.). Assign sequential numbers to all of the families (1 to N). Select n random numbers between 1 and N from a random number table (or generate these). Select the families corresponding to the random numbers.
123 Example 15 (d) Solution 2. Cluster Sampling - For example, take a map and divide the state of Florida into 1000 regions. Number the regions from 1 to 1000. Select n random numbers between 1 and 1000. Select the n regions corresponding to the random numbers. Survey every family in the regions indicated by the random numbers.
124 Example 15 (d) Solution 3. Stratified Sampling - For example, separate all families in the state by income level. Number each family within the income level. Select, say, 100 random numbers for each income level. Select the 100 families for each income level indicated by the random numbers.
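The three procedures in Example 15 (d) can be sketched on a small made-up sampling frame. Everything below is hypothetical: the 600 family records, the 20 regions, the three income levels, and the function names are illustrative assumptions, not data or methods from the slides.

```python
import random
from collections import defaultdict

rng = random.Random(0)

# Hypothetical frame: (family_id, region, income_level).
families = [(i, i % 20, ("low", "middle", "high")[i % 3]) for i in range(600)]

def simple_random_sample(frame, n, rng):
    """Select n units at random, without replacement, from the whole frame."""
    return rng.sample(frame, n)

def cluster_sample(frame, n_clusters, rng, cluster_of):
    """Randomly select whole clusters and survey every unit inside them."""
    clusters = defaultdict(list)
    for unit in frame:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

def stratified_sample(frame, n_per_stratum, rng, stratum_of):
    """Select the same number of units at random from each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

srs = simple_random_sample(families, 50, rng)
clustered = cluster_sample(families, 3, rng, cluster_of=lambda f: f[1])
stratified = stratified_sample(families, 10, rng, stratum_of=lambda f: f[2])

print(len(srs), len(clustered), len(stratified))
```

Note that cluster sampling only needs a list of regions up front, while the other two methods need a complete list of families, which is why it tends to be cheaper to carry out.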
125 Example 15 (e): What sampling method do you believe would be most cost effective?
126 Example 15 (e) Solution: The most cost effective method would be cluster sampling.
127 Example 16: A biology professor is interested in the proportion of students at his college who are pre-med majors. In his next class he asks the students who are pre-med majors to raise their hands. Fifty percent of the students raise their hands.
128 Example 16:
1. What type of sampling technique was used for this survey?
2. What types of bias may be present in the responses?
3. Is 50% a reasonable point estimate of the proportion of students at the college who are pre-med majors? Explain.
129 Example 16 - Solution:
1. Convenience
2. If the biology course is a required course for all majors, then there may be a larger proportion of freshmen and sophomores in the class than in the college population as a whole.
130 Example 16 - Solution:
2. If the biology course is not a required course for all majors, then there may be a larger proportion of students in the class who are in majors that require the course than in the college population as a whole.
3. No, for the reasons cited in part 2.