Is there a familiar pattern to the variability of $\bar{x}$? As the sample size becomes larger, the distribution of the sample mean becomes closer to a normal distribution, regardless of the population from which the sample is drawn. The central limit theorem (a name introduced by Pólya in 1920) is a very important theorem which states that, for sufficiently large samples, the distribution of the sample mean is approximately normal.
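Central Limit Theorem If a sufficiently large random sample (as a rule of thumb, $n \ge 30$) is drawn from a population with mean $\mu$ and variance $\sigma^2$, the distribution of the sample mean $\bar{x}$ will have the following characteristics: 1. an approximately normal distribution regardless of the distribution of the underlying population; 2. mean $\mu_{\bar{x}} = \mu$; 3. variance $\sigma^2_{\bar{x}} = \sigma^2/n$, i.e. standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.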
Example 7 Suppose the random variable X has a mean of 50 and a standard deviation of 10. Calculate the mean and the standard deviation of the sample mean (the standard error) for each of the following sample sizes (assume the population is infinite): a. n = 40 b. n = 55 c. n = 100 d. What happens to the standard deviation of the sample mean (the standard error) as the sample size increases?
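Example 7 - Solution We are given that X has $\mu = 50$ and $\sigma = 10$ and the population is infinite, so $\mu_{\bar{x}} = \mu = 50$ for every n and $SE = \sigma_{\bar{x}} = \sigma/\sqrt{n}$. a. $n = 40$: $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10/\sqrt{40} \approx 1.581$ b. $n = 55$: $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10/\sqrt{55} \approx 1.348$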
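Example 7 - Solution c. $n = 100$: $\mu_{\bar{x}} = 50$, $\sigma_{\bar{x}} = 10/\sqrt{100} = 1$ d. It decreases, reflecting the additional information provided by a larger sample size. Summary: n = 40: $\sigma_{\bar{x}} \approx 1.581$; n = 55: $\sigma_{\bar{x}} \approx 1.348$; n = 100: $\sigma_{\bar{x}} = 1$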
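The arithmetic in Example 7 can be reproduced with a few lines of Python. This is a sketch: `standard_error` is a hypothetical helper name, and the formula $\sigma/\sqrt{n}$ assumes an infinite population, as the example states.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for an infinite population: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma = 50, 10  # population mean and standard deviation from Example 7
for n in (40, 55, 100):
    # The mean of x-bar equals mu for every n; only the spread shrinks with n.
    print(f"n = {n:3d}: mean = {mu}, SE = {standard_error(sigma, n):.4f}")
```

Because the standard error falls with $\sqrt{n}$ rather than n, quadrupling the sample size only halves it.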
Importance of the Central Limit Theorem The most important feature of this theorem is that it can be applied to almost any population. Because the theorem makes no assumption about the shape of the population distribution (it requires only that the population variance be finite), it is widely applicable and is one of the cornerstones of statistical inference.
Central Limit Theorem and Sample Size The only restrictive feature of the theorem is that the sample size must be sufficiently large for the theorem to be applicable. Even if the distribution of the population deviates substantially from the normal distribution, a sample size of 30 will usually be large enough to produce a sampling distribution for $\bar{x}$ that is approximately normal.
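The claim that n = 30 usually suffices can be checked by simulation. The sketch below is an illustration only: it assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed shape), and `simulate_sample_means` and `skewness` are hypothetical helper names. It draws many samples of size 30 and summarizes their means.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(xs):
    """Moment-based skewness: 0 for a symmetric shape such as the normal."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

means = simulate_sample_means(n=30, reps=20_000)
# CLT predictions: mean of x-bar = 1, SD of x-bar = 1/sqrt(30), shape close to normal
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3),
      "vs", round(1 / math.sqrt(30), 3))
print("skewness of means:   ", round(skewness(means), 2), "(population skewness is 2)")
```

The skewness of the means should land near $2/\sqrt{30} \approx 0.37$, far closer to the normal value of 0 than the population's 2, and it keeps shrinking as n grows.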
Distribution Shapes [Figure: bimodal, exponential, normal, and uniform population distributions, each shown beside the distribution of the sample mean for large samples]
Example 8 Suppose a sample of size 40 is drawn from a population that has a mean of 276 and a variance of 81. What is the probability that the mean of the sample will be less than 273?
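Example 8 - Solution We are given that a sample of size n = 40 is drawn from a population that has $\mu = 276$ and $\sigma^2 = 81$ (so $\sigma = 9$). By the CLT, $\bar{x}$ has an approximately normal distribution with $\mu_{\bar{x}} = 276$ and $\sigma_{\bar{x}} = 9/\sqrt{40} \approx 1.423$.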
Example 8 - Solution $P(\bar{x} < 273) = P\left(z < \frac{273 - 276}{9/\sqrt{40}}\right) = P(z < -2.11) = .5 - P(-2.11 < z < 0) = .5 - .4826 = .0174$
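Instead of a z table, Python's standard-library `statistics.NormalDist` (Python 3.8+) can evaluate the normal CDF directly. A minimal sketch of Example 8, assuming the CLT approximation $N(\mu, \sigma/\sqrt{n})$ for $\bar{x}$; `prob_mean_below` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def prob_mean_below(a, mu, sigma, n):
    """P(x-bar < a) under the CLT normal approximation N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(a)

p = prob_mean_below(273, mu=276, sigma=math.sqrt(81), n=40)
print(round(p, 4))
```

This gives about .0175; the table answer .0174 differs slightly only because z was rounded to -2.11 before the table lookup.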
Example 9 Suppose there is a normally distributed population with a mean of 100 and a standard deviation of 10. If $\bar{x}$ is the average of a sample of 50, find the following probabilities. a. $P(\bar{x} \le 103)$ b. $P(\bar{x} \ge 96)$ c. $P(95 \le \bar{x} \le 103)$
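Example 9 - Solution We are given that X has a normal distribution with $\mu = 100$ and $\sigma = 10$, and n = 50. Because the population itself is normal, $\bar{x}$ has a normal distribution (for any n) with $\mu_{\bar{x}} = 100$ and $\sigma_{\bar{x}} = 10/\sqrt{50} \approx 1.414$.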
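Example 9 - Solution a. $P(\bar{x} \le 103) = P\left(z \le \frac{103 - 100}{10/\sqrt{50}}\right) = P(z \le 2.12) = .5 + P(0 < z < 2.12) = .5 + .4830 = .9830$ b. $P(\bar{x} \ge 96) = P\left(z \ge \frac{96 - 100}{10/\sqrt{50}}\right) = P(z \ge -2.83) = .5 + P(-2.83 < z < 0) = .5 + .4977 = .9977$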
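Example 9 - Solution c. $P(95 \le \bar{x} \le 103) = P\left(\frac{95 - 100}{10/\sqrt{50}} \le z \le \frac{103 - 100}{10/\sqrt{50}}\right) = P(-3.54 \le z \le 2.12) = P(-3.54 < z < 0) + P(0 < z < 2.12) = .4998 + .4830 = .9828$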
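The three parts of Example 9 can be computed directly from the distribution of $\bar{x}$, without rounding z. A short sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# x-bar is exactly normal here because the population itself is normal.
xbar = NormalDist(mu=100, sigma=10 / math.sqrt(50))

p_a = xbar.cdf(103)                  # a. P(x-bar <= 103)
p_b = 1 - xbar.cdf(96)               # b. P(x-bar >= 96)
p_c = xbar.cdf(103) - xbar.cdf(95)   # c. P(95 <= x-bar <= 103)
print(f"a: {p_a:.4f}  b: {p_b:.4f}  c: {p_c:.4f}")
```

For a continuous distribution, $\le$ and $<$ give the same probability, so the CDF difference in part c needs no adjustment at the endpoints.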
Example 10 A travel agency conducted a survey of the prices charged by ocean cruise ship lines and determined they were approximately normally distributed with a mean of $110 per day and a standard deviation of $20 per day.
Example 10 - Questions 1. If an ocean cruise ship line is chosen at random, what is the probability that it charges less than $99 per day? 2. What is the probability that the average charge for a randomly selected sample of 35 ocean cruise ship lines will be less than $99 per day?
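Example 10 - Solution 1. Because the charges are approximately normal, $P(X < 99) = P\left(z < \frac{99 - 110}{20}\right) = P(z < -0.55) = .5 - P(-0.55 < z < 0) = .5 - .2088 = .2912$ 2. By the CLT, $\bar{x}$ has an approximately normal distribution with $\mu_{\bar{x}} = 110$ and $\sigma_{\bar{x}} = 20/\sqrt{35} \approx 3.381$. $P(\bar{x} < 99) = P\left(z < \frac{99 - 110}{20/\sqrt{35}}\right) = P(z < -3.25) = .5 - P(-3.25 < z < 0) = .5 - .4994 = .0006$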
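The contrast between the two questions of Example 10 (one cruise line versus the average of 35) is easy to see side by side. A sketch assuming, as the example does, that daily charges are normal with mean $110 and standard deviation $20.

```python
import math
from statistics import NormalDist

mu, sigma, n = 110, 20, 35

# Question 1: a single cruise line, X ~ N(110, 20)
p_single = NormalDist(mu, sigma).cdf(99)
# Question 2: the mean of 35 lines, x-bar ~ N(110, 20 / sqrt(35))
p_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(99)
print(f"P(X < 99) = {p_single:.4f}, P(x-bar < 99) = {p_mean:.4f}")
```

A single line charging under $99 is fairly common, but a sample average that low is very unlikely, because averaging shrinks the spread by a factor of $\sqrt{35}$.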
The Distribution of the Sample Proportion
Proportions There are many instances in which the variable of interest is a proportion. Examples: –A marketing researcher may be interested in what proportion of persons on a mailing list will buy their product. –A college is concerned with the fraction of freshmen that will be in academic difficulty after the first year.
Population Proportions and Sample Proportions Population proportions must be estimated just like population means. The sample proportion is a reasonable estimate of the population proportion. Sample proportions vary depending on the selected samples.
Symbols The symbols used to represent the population and sample proportions are p - population proportion, $\hat{p}$ - sample proportion.
How do you determine a sample proportion? When calculating a proportion, the number in the sample that possesses the characteristic of interest goes in the numerator, and the size of the sample is placed in the denominator: $\hat{p} = x/n$, where x is the number in the sample possessing the characteristic of interest and n is the sample size.
What is the central value of $\hat{p}$? The expected value (mean) of the sample proportion is the population proportion: $E(\hat{p}) = p$. Since the expected value of the estimator equals p, $\hat{p}$ is an unbiased estimator of p.
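What is the variance of $\hat{p}$? The variance of $\hat{p}$ is given by $\sigma^2_{\hat{p}} = \frac{p(1-p)}{n}$. If the population proportion is unknown (which is usually the case), p can be estimated by $\hat{p}$, and the variance of the sample proportion is estimated as $s^2_{\hat{p}} = \frac{\hat{p}(1-\hat{p})}{n}$.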
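Both properties of $\hat{p}$ (unbiasedness and variance $p(1-p)/n$) can be checked by simulation. This is a sketch under assumed values p = .35 and n = 50, chosen only for illustration; `simulate_p_hat` is a hypothetical helper that treats each sampled unit as an independent Bernoulli(p) trial, i.e. an infinite population.

```python
import random
import statistics

def simulate_p_hat(p, n, reps, seed=2):
    """Return `reps` sample proportions, each computed from n Bernoulli(p) trials."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.35, 50   # assumed illustrative values
p_hats = simulate_p_hat(p, n, reps=20_000)
print("mean of p-hat:", round(statistics.fmean(p_hats), 4))      # close to p
print("var of p-hat: ", round(statistics.pvariance(p_hats), 5),  # close to p(1-p)/n
      "vs", round(p * (1 - p) / n, 5))
# Plug-in estimate of the variance from a single sample: p-hat(1 - p-hat)/n
one = p_hats[0]
print("estimate from one sample:", round(one * (1 - one) / n, 5))
```

In practice only one sample is available, which is why the last line's plug-in estimate is the quantity actually used.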
Is there a familiar pattern to the variability of $\hat{p}$? The sampling distribution of $\hat{p}$ approaches normality as n becomes sufficiently large. The sample size is generally considered "sufficiently large" if $np \ge 5$ and $n(1-p) \ge 5$. [Figure: sampling distribution of $\hat{p}$]
Sampling Distribution of the Sample Proportion If the population is infinite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics: 1. an approximately normal distribution; 2. mean $\mu_{\hat{p}} = p$; 3. variance $\sigma^2_{\hat{p}} = \frac{p(1-p)}{n}$.
Sampling Distribution of the Sample Proportion If the population is finite and the sample is sufficiently large, the distribution of $\hat{p}$ has the following characteristics: 1. an approximately normal distribution; 2. mean $\mu_{\hat{p}} = p$; 3. variance $\sigma^2_{\hat{p}} = \frac{p(1-p)}{n} \cdot \frac{N-n}{N-1}$, where N is the size of the population.
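The only difference between the infinite- and finite-population cases is the finite population correction factor $(N-n)/(N-1)$ on the variance. A minimal sketch; `se_p_hat` is a hypothetical helper, and p = .30, n = 100, N = 500 are illustrative values not taken from the text.

```python
import math

def se_p_hat(p, n, N=None):
    """Standard deviation of the sample proportion.
    N=None means an infinite population; otherwise multiply by the
    finite population correction sqrt((N - n) / (N - 1))."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(se_p_hat(0.30, 100), 4))          # infinite population
print(round(se_p_hat(0.30, 100, N=500), 4))   # hypothetical population of N = 500
```

The correction always shrinks the spread, and it drives it to zero when n = N, since a census has no sampling error.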
Since $\hat{p}$ is a good estimator of p... Can limits be established for the error in estimation? Because the sampling distribution of $\hat{p}$ is known, the probabilities of estimation errors of various sizes can be determined.
Example 11 A random sample of 100 employees of a large steel company has 30 females and 70 males. 1. Find the sample proportion of female employees. 2. Find the sample proportion of male employees.
Example 11 - Solution 1. $\hat{p}_{\text{female}} = 30/100 = .30$ 2. $\hat{p}_{\text{male}} = 70/100 = .70$
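The definition $\hat{p} = x/n$ translates directly into code. A minimal sketch applied to Example 11; `sample_proportion` is a hypothetical helper, and the range check simply guards against a count larger than the sample.

```python
def sample_proportion(x, n):
    """p-hat = x / n, where x counts sample members with the characteristic."""
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    return x / n

females, males = 30, 70
n = females + males
print(sample_proportion(females, n), sample_proportion(males, n))
```

Because every employee is either female or male here, the two proportions are complements of each other.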
Example 12 Suppose that the true proportion of Americans over 25 years old who have a 4-year college degree is .35. Find the mean and the standard deviation of the sample proportion for samples of the following sizes. a. n = 38 b. n = 52 c. n = 75 d. What happens to the size of the standard deviation of the sample proportion as the sample size increases?
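Example 12 - Solution For every sample size, $\mu_{\hat{p}} = p = .35$ and $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. a. n = 38: $\mu_{\hat{p}} = .35$, $\sigma_{\hat{p}} = \sqrt{(.35)(.65)/38} \approx .0774$ b. n = 52: $\mu_{\hat{p}} = .35$, $\sigma_{\hat{p}} = \sqrt{(.35)(.65)/52} \approx .0661$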
Example 12 - Solution c. n = 75: $\mu_{\hat{p}} = .35$, $\sigma_{\hat{p}} = \sqrt{(.35)(.65)/75} \approx .0551$ d. It decreases, reflecting the additional information provided by the larger sample size.
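The Example 12 values follow from $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$, assuming an infinite population. A short sketch; `sd_p_hat` is a hypothetical helper name.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion, infinite population."""
    return math.sqrt(p * (1 - p) / n)

p = 0.35
for n in (38, 52, 75):
    # The mean of p-hat stays at p; only the standard deviation changes with n.
    print(f"n = {n}: mean = {p}, SD = {sd_p_hat(p, n):.4f}")
```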
Example 13 Suppose that the true population proportion is p = .30. What is the probability that the sample proportion of a sample of size 30 will be less than .20?
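Example 13 - Solution $\hat{p}$ has an approximately normal distribution because np = (30)(.3) = 9 and n(1 - p) = (30)(.7) = 21 are both greater than or equal to 5. Its mean is $\mu_{\hat{p}} = .30$ and its standard deviation is $\sigma_{\hat{p}} = \sqrt{(.30)(.70)/30} \approx .0837$.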
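Example 13 - Solution $z = \frac{.20 - .30}{.0837} \approx -1.195$, rounded to -1.20. The area from 0 to 1.20 in Table A is .3849. Tail area = .5 - .3849 = .1151; this is the area in the left tail, so $P(\hat{p} < .20) \approx .1151$.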
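Example 13 can be checked end to end, including the $np \ge 5$, $n(1-p) \ge 5$ rule of thumb, with the standard normal from `statistics.NormalDist`. A sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, cutoff = 0.30, 30, 0.20
# Rule of thumb from the text: np >= 5 and n(1 - p) >= 5
assert n * p >= 5 and n * (1 - p) >= 5
se = math.sqrt(p * (1 - p) / n)
z = (cutoff - p) / se
prob = NormalDist().cdf(z)        # left-tail area of the standard normal
print(f"z = {z:.2f}, P(p-hat < {cutoff}) = {prob:.4f}")
```

Without rounding z to two decimals, the probability comes out near .116 rather than the table's .1151; the difference is purely a rounding effect.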
Example 14 The property manager of a large office building would like to make the building smoke free; however, he does not want to upset too many of his customers. He decides to randomly select 50 of the workers in the building and ask them whether or not they smoke. If the sample proportion of workers who smoke is less than .30, the property manager will make the building smoke free.
Example 14 - Questions 1. Find the probability that the property manager will make the building smoke free when the true proportion of smokers is .50. 2. Find the probability that the property manager will not make the building smoke free when the true proportion of smokers is .20.
Example 14 - Solution 1. Because np = (50)(.50) = 25 and n(1-p) = (50)(.50) = 25 are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with $\mu_{\hat{p}} = .50$ and $\sigma_{\hat{p}} = \sqrt{(.50)(.50)/50} \approx .0707$.
Example 14 - Solution The property manager will make the building smoke free if $\hat{p}$ is less than .30. $P(\hat{p} < .30) = P\left(z < \frac{.30 - .50}{.0707}\right) = P(z < -2.83) = .5 - P(-2.83 < z < 0) = .5 - .4977 = .0023$
Example 14 - Solution 2. Because np = (50)(.20) = 10 and n(1-p) = (50)(.80) = 40 are both greater than or equal to 5, we can assume that $\hat{p}$ has an approximately normal distribution with $\mu_{\hat{p}} = .20$ and $\sigma_{\hat{p}} = \sqrt{(.20)(.80)/50} \approx .0566$.
Example 14 - Solution The property manager will not make the building smoke free if $\hat{p}$ is .30 or greater. $P(\hat{p} \ge .30) = P\left(z \ge \frac{.30 - .20}{.0566}\right) = P(z \ge 1.77) = .5 - P(0 < z < 1.77) = .5 - .4616 = .0384$
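Both parts of Example 14 use the same decision rule (smoke free if $\hat{p} < .30$) under two different true proportions. A sketch using the normal approximation from the text; `p_hat_dist` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def p_hat_dist(p, n):
    """Normal approximation to the sampling distribution of p-hat."""
    return NormalDist(p, math.sqrt(p * (1 - p) / n))

n, cutoff = 50, 0.30
# Part 1: true smoking rate .50; the building goes smoke free if p-hat < .30
p_smoke_free = p_hat_dist(0.50, n).cdf(cutoff)
# Part 2: true smoking rate .20; the building is not made smoke free if p-hat >= .30
p_not_smoke_free = 1 - p_hat_dist(0.20, n).cdf(cutoff)
print(f"{p_smoke_free:.4f} {p_not_smoke_free:.4f}")
```

Both probabilities are small, so with n = 50 the rule rarely makes the "wrong" decision at either of these true proportions.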
Other Forms of Sampling
Probability Samples Probability samples enable an analyst to determine the probable errors that an estimator might generate. They allow the analyst a known degree of confidence in their estimation. All statistical inference relies on probability sampling.
Types of Probability Samples Cluster sampling involves dividing the population into clusters, and randomly selecting a sample of clusters to represent the population. In stratified sampling, the population is divided into strata, which are sub-populations. A stratum can be defined by any identifiable characteristic that can be used to classify the population. If the population consisted of people, the strata could be defined by sex, income, political party, religion, education, race, or location.
Pros and Cons of Cluster Sampling Cluster sampling can be as effective as simple random sampling if the clusters are as heterogeneous as the population; however, clusters are almost never as diverse as the population. Smaller cluster sizes will result in more representative samples. Cluster sampling simplifies the task of constructing the sampling frame, since the initial frame is composed only of clusters.
Stratified Sampling Stratified sampling can provide greater accuracy if the population is heterogeneous, and sub-populations of the population can be identified that are relatively homogeneous.
Non-probability Samples Non-probability samples are a convenient means of obtaining sample data. If data from a non-probability sample are used to estimate a population parameter, there is no statistical theory that helps define the potential error of the estimate, and hence no statement about the estimate's reliability can be made.
Types of Non-probability Samples A judgment sample is a sample in which the sample values are selected by an expert in the field. A convenience sample is a group of observations chosen because they are easy to obtain. One of the worst forms of non-probability sample is the voluntary, or self-selected, sample.
Almost Random Samples The systematic sample does not clearly belong to either probability or non-probability samples. In a systematic sample, every kth member of the population is included in the sample. Note: If there is some pattern in the sampling frame that corresponds to the sampling pattern, an unrepresentative sample may result.
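A systematic sample is simple to draw from a numbered frame. The sketch below is one common variant, assuming k = N // n and a random starting point in the first k units; `systematic_sample` and the 1000-unit frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, with k = len(frame) // n.
    The random start in [0, k) is the only random element of the design."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size larger than the frame")
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]

frame = list(range(1, 1001))   # hypothetical frame of 1000 numbered units
sample = systematic_sample(frame, n=50, seed=3)
print(sample[:5])
```

Only the start is random, so if the frame repeats a pattern every k units (or every divisor of k), every selected unit can fall on the same point of that pattern, which is exactly the risk the note above describes.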
Example 15 (a - c) A social researcher in Florida wants to determine the average number of children per family in the state. a. What is the population of interest? b. What variable will be measured? c. What level of measurement is the variable of interest?
Example 15 (a - c) Solution a. Population - families in the state of Florida b. Variable measured - number of children per family c. Level of measurement - ratio
Example 15 (d) d. What are the steps that would be necessary for each of the following sampling methods: 1. Simple random sampling 2. Cluster sampling 3. Stratified sampling
Example 15 (d) Solution 1. Simple Random Sample - –List all families in the state of Florida (perhaps from a census, phone books, tax returns, etc.). –Assign sequential numbers to all of the families (1 to N). –Select n random numbers between 1 and N from a random number table (or generate these). –Select the families corresponding to the random numbers.
Example 15 (d) Solution 2. Cluster Sampling - –e.g. Take a map and divide the state of Florida into 1000 regions. –Number the regions from 1 to 1000. –Select n random numbers between 1 and 1000. –Select the n regions corresponding to the random numbers. –Survey every family in the regions indicated by the random numbers.
Example 15 (d) Solution 3. Stratified Sampling - –e.g. Separate all families in the state by income level. –Number each family within the income level. –Select e.g. 100 random numbers for each income level. –Select the 100 families for each income level indicated by the random numbers.
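The three procedures in Example 15 (d) can be sketched on a made-up sampling frame. Everything here is hypothetical: 20,000 families, each randomly assigned to one of 1000 regions and one of three income levels, with sample sizes of 200 families, 10 regions, and 100 families per income level.

```python
import random
from collections import defaultdict

rng = random.Random(4)
# Hypothetical frame: (family_id, region 1..1000, income_level)
families = [(i, rng.randint(1, 1000), rng.choice(["low", "middle", "high"]))
            for i in range(1, 20_001)]

# 1. Simple random sample: n families drawn without replacement from the full list
srs = rng.sample(families, 200)

# 2. Cluster sample: draw n regions, then survey every family in those regions
chosen_regions = set(rng.sample(range(1, 1001), 10))
cluster = [f for f in families if f[1] in chosen_regions]

# 3. Stratified sample: 100 families drawn at random within each income level
strata = defaultdict(list)
for f in families:
    strata[f[2]].append(f)
stratified = [f for level in sorted(strata) for f in rng.sample(strata[level], 100)]

print(len(srs), len(cluster), len(stratified))
```

The cluster sample's size depends on how many families live in the chosen regions, while the other two designs fix the sample size in advance; the cluster design only needs a list of regions, not of every family, which is where its cost advantage comes from.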
Example 15 (e) What sampling method do you believe would be most cost effective?
Example 15 (e) Solution The most cost effective method would be cluster sampling.
Example 16 A biology professor is interested in the proportion of students at his college who are pre-med. majors. In his next class he asks the students who are pre-med. majors to raise their hands. Fifty percent of the students raise their hands.
Example 16 - Questions 1. What type of sampling technique was used for this survey? 2. What type of biases may be present in the responses? 3. Is 50% a reasonable point estimate of the proportion of students at the college who are pre-med. majors? Explain.
Example 16 - Solution 1. Convenience 2. If the Biology course is a required course for all majors, then there may be a larger proportion of freshmen and sophomores in the class than in the college population as a whole.
Example 16 - Solution 2. If the Biology course is not a required course for all majors, then there may be a larger proportion of students in the class who are in majors which require the course, than in the college population as a whole. 3. No. For the reasons cited in part 2.