1
**Sampling distributions**

Lecture 10: Sampling distributions. Dr. Richard Bußmann

2
**Distribution of sample proportions**

QTM1310/ Sharpe

To learn more about the variability within a sample or population, we have to be somewhat creative and use our imagination. We will probably never know the value of the true proportion of an event in the population. However, the true proportion is important to us, so we'll give it a label, $p$, for "true proportion."

3
**How to imagine (in statistics)…**

Most statisticians would be delusional to suppose they could ever truly know the true characteristics of a population. You are in a similar situation. Unless you can perform a complete census of all available data, you still need statistics to help you create a picture that is reasonably close to reality. Recall our Mao Zedong from class 1. Unless you have all the necessary data, you won't be able to paint a "true" picture of Chairman Mao, but with some data you can probably come up with the blue Mao Zedong.

4
**How to imagine (in statistics)…**

Once you have the blue Mao Zedong, you can add some assumptions to your data. Most people's skin colour is not blue. Most Chinese people's skin colour is fairly light. You then state these assumptions and adjust the data to give Chairman Mao a more reasonable skin colour. Note that this makes you the JUDGE of what is reasonable and what is not. This is one great power that statisticians wield!

5
**How to imagine (in statistics)…**

Suppose you unearth a couple of dinosaur bones. You are given a sample of information, while some information is missing from the sample. How do you "imagine" the missing results? You can simulate them. For example, once you know the thickness of the dinosaur's legs, you can calculate its weight, from which in turn you can make assumptions about the size of the other bones, and so on. Once you have (imagined) sufficient data, you can use it to come up with a model and learn more.

6
**Simulation**

A simulation uses a computer to pretend to draw random samples from some population of values over and over. A simulation can help us understand how sample proportions vary due to random sampling.

7
**Simulation**

When we have only two possible outcomes for an event, we label one of them "success" and the other "failure." In a simulation, we set the true proportion of successes to a known value, draw random samples, and then record the sample proportion of successes, which we denote by $\hat{p}$, for each sample. Even though $\hat{p}$ changes from sample to sample, it does so in a way that we can model and understand.

8
**Simulation**

The histogram on this slide shows the distribution of 2000 sample values of $\hat{p}$. The $\hat{p}$ values come from simulated samples (of size 1000) drawn from a population in which the true population proportion, $p$, is 0.21. Variability from one sample to another is expected, but the way in which the proportions vary shows us how the proportions of real samples would vary.
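The simulation described on this slide can be sketched in a few lines of Python. This is only an illustration using the slide's numbers (2000 samples of size 1000, true proportion 0.21). The function name, the seed, and the use of Python's standard `random` module are assumptions, not part of the lecture.

```python
import random
import statistics

def simulate_sample_proportions(p=0.21, n=1000, num_samples=2000, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

p_hats = simulate_sample_proportions()
print("mean of the sample proportions:", round(statistics.mean(p_hats), 4))
print("SD of the sample proportions:  ", round(statistics.stdev(p_hats), 4))
```

The mean of the simulated proportions should land close to 0.21, and their spread is what the sampling distribution describes.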

9
**Proportion's sampling distributions**

The distribution of proportions over many independent samples from the same population is called the sampling distribution of the proportions. For distributions that are bell-shaped and centred at the true proportion, $p$, we can use the sample size $n$ to find the standard deviation of the sampling distribution:

$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$
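As a quick check of the formula, here is a minimal sketch that evaluates the standard deviation for the simulation values used earlier ($p = 0.21$, $n = 1000$). The function name `sd_proportion` is hypothetical.

```python
import math

def sd_proportion(p, n):
    """Standard deviation of the sampling distribution of a proportion: sqrt(p*q/n)."""
    q = 1 - p  # proportion of failures
    return math.sqrt(p * q / n)

print(round(sd_proportion(0.21, 1000), 4))  # 0.0129
```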

10
**Proportion's sampling distributions**

Remember that the difference between sample proportions, referred to as sampling error, is not really an error. It is just the variability you'd expect to see from one sample to another. A better term might be sampling variability. Hence, to discover how variable a sample proportion is, we need to know only the proportion and the size of the sample.

11
**Proportion's sampling distributions**

The particular Normal model, $N\left(p, \left(\sqrt{\frac{pq}{n}}\right)^{2}\right)$, with mean $p$ and standard deviation $\sqrt{pq/n}$, is a sampling distribution model for the sample proportion. It won't work for all situations, but it works for most situations that you'll encounter in practice.

12
**The Normal model for the sample proportion**

Let's consider $N\left(p, \left(\sqrt{\frac{pq}{n}}\right)^{2}\right)$, where

$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$

$n$ is in the denominator of $SD(\hat{p})$; hence the larger $n$, the smaller (and better) $SD(\hat{p})$. The smaller $SD(\hat{p})$, the more accurate the conclusions we can draw from the model.

13
**Proportion's sampling distributions**

In the above equation, $n$ is the sample size and $q$ is the proportion of failures ($q = 1 - p$). (We use $\hat{q}$ for its observed value in a sample.) Note that $n$ comes at a price. The bigger your sample size, the more costs are involved in its collection. Nonetheless, you need a sample size that is big enough to make valid inferences from it.

14
**Proportion's sampling distributions**

The sampling distribution model for $\hat{p}$ is valuable because we don't need to actually draw many samples and accumulate all those sample proportions, or even to simulate them, and because we can calculate what fraction of the distribution will be found in any region of the distribution.

15
**Quality of the normal model**

Samples of size 1 or 2 just aren't going to work very well, but the distributions of proportions of many larger samples have histograms that are remarkably close to a Normal model. The model becomes a better and better representation of the distribution of the sample proportions as the sample size gets bigger. Given that the standard deviation $\sqrt{pq/n}$ shrinks as $n$ grows, this shouldn't be surprising any more.

16
**Assumptions**

Independence Assumption: The sampled values must be independent of each other.

Sample Size Assumption: The sample size, $n$, must be large enough.

17
**Conditions**

Randomization Condition: If you have a survey, your sample should be a simple random sample of the population. If some other sampling design was used, be sure the sampling method was not biased and that the data are representative of the population. For example, if your data come from a (medical) experiment, subjects should have been randomly assigned to treatments. Likewise, if you want to survey Chinese people's preference for meat, don't just survey people in Xinjiang.

18
**Conditions**

10% Condition: If sampling has not been made with replacement, then the sample size, $n$, must be no larger than 10% of the population.

Success/Failure Condition: The sample size must be big enough so that both the number of "successes," $n \cdot p$, and the number of "failures," $n \cdot q$, are expected to be at least 10.

So what if one of these expected counts is only 8? That's your decision. You could argue that 8 is the new 10; the threshold is a rule of thumb and open to interpretation. Make sure, though, that your data are still sufficiently representative of the population.

19
**Example**

Information on a box of seeds states that the germination rate is 92%. Are conditions met to answer the question, "What is the probability that more than 95% of the 160 seeds in the box will germinate?"

Independence: It is reasonable to assume the seeds will germinate independently from each other.

Randomization: The sample of seeds can be considered a random sample from all seeds from this producer.

20
**Example**

10% Condition: The packet is less than 10% of all seeds manufactured.

Success/Failure Condition:

$$np = 0.92 \times 160 = 147.2 > 10 \qquad nq = 0.08 \times 160 = 12.8 > 10$$
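The Success/Failure check for the seed packet can be written out as a small sketch. The function name and the threshold parameter are hypothetical; the threshold of 10 is the rule of thumb stated in the conditions.

```python
def success_failure_counts(p, n):
    """Return the expected number of successes (n*p) and failures (n*q)."""
    q = 1 - p
    return n * p, n * q

expected_successes, expected_failures = success_failure_counts(p=0.92, n=160)
threshold = 10  # rule-of-thumb minimum for both counts
conditions_met = expected_successes >= threshold and expected_failures >= threshold

print("expected successes:", round(expected_successes, 1))
print("expected failures: ", round(expected_failures, 1))
print("Success/Failure Condition met:", conditions_met)
```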

21
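**Example**

Information on a packet of seeds claims that the germination rate is 92%. What is the probability that more than 95% of the 160 seeds in the packet will germinate?

$$\hat{p} \sim N\left(0.92, \sqrt{\frac{0.92 \times 0.08}{160}}\right) = N(0.92, 0.021)$$

$$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{0.95 - 0.92}{0.021} = 1.428$$

Look up the probability for this z-value in the Normal table. (With the unrounded $SD(\hat{p}) \approx 0.0214$, the z-value is about 1.40.)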
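Instead of a table lookup, the tail probability can be computed directly. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available, and it uses the unrounded standard deviation, so its z-value is slightly smaller than one computed from a standard deviation rounded to 0.021.

```python
import math
from statistics import NormalDist

p, n = 0.92, 160          # claimed germination rate and packet size
threshold = 0.95          # we ask for P(p_hat > 0.95)

sd = math.sqrt(p * (1 - p) / n)      # SD of the sampling distribution of p_hat
z = (threshold - p) / sd             # standardize the threshold
prob = 1 - NormalDist().cdf(z)       # upper-tail probability of the standard Normal

print("SD(p_hat):", round(sd, 4))
print("z:", round(z, 2))
print("P(p_hat > 0.95):", round(prob, 3))
```

Under the Normal model, the probability comes out at roughly 8%.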

22
**Simulating the sampling distribution of a mean**

Here are the results of 10,000 simulated tosses of one fair die. This is a uniform distribution.

23
**Simulating the sampling distribution of a mean**

Here are the results of 10,000 simulated tosses of two fair dice, averaging the numbers. This is a triangular distribution.

24
**Simulating the sampling distribution of a mean**

Here's a histogram of the averages for 10,000 tosses of five dice. As the sample size (number of dice) gets larger, each sample average tends to be closer to the population mean. The shape of the distribution is becoming bell-shaped. In fact, it's approaching the Normal model.
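The dice experiments on these slides can be reproduced with a short simulation. This is a sketch only: the function name and the fixed seed are assumptions, and it prints summary statistics rather than drawing the histograms.

```python
import random
import statistics

def simulate_dice_means(num_dice, num_tosses=10_000, seed=0):
    """Toss num_dice fair dice num_tosses times and return the average of each toss."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_tosses):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        means.append(total / num_dice)
    return means

results = {k: simulate_dice_means(k) for k in (1, 2, 5)}
for k, means in results.items():
    print(f"{k} dice: mean = {statistics.mean(means):.3f}, "
          f"SD = {statistics.stdev(means):.3f}")
```

The means stay near 3.5 in every case, while the spread of the averages shrinks as more dice are averaged.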

25
**The central limit theorem**

Central Limit Theorem (CLT): The sampling distribution of any mean approaches a Normal model as the sample size grows. This is true regardless of the shape of the population distribution! However, if the population distribution is very skewed, it may take a sample size of dozens or even hundreds of observations for the Normal model to work well.

26
**The central limit theorem**

Now we have two distributions to deal with: the real-world distribution of the sample, and the math-world sampling distribution of the statistic. Don't confuse the two. The Central Limit Theorem doesn't talk about the distribution of the data from the sample. It talks about the sample means and sample proportions of many different random samples drawn from the same population: the mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation.

27
**Sampling distribution of the mean**

Which would be more surprising: having one person in your Statistics class who is over 2 meters tall, or having the mean height of 100 students taking the course be over 2 meters? The first event is fairly rare, but finding a class of 100 whose mean height is over 2 meters will only happen in a basketball academy (where students study statistics). Means have smaller standard deviations than individuals.

28
**Sampling distribution of the mean**

The Normal model for the sampling distribution of the mean has a standard deviation equal to

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ is the standard deviation of the population. To emphasize that this is a standard deviation parameter of the sampling distribution model for the sample mean, $\bar{y}$, we write $SD(\bar{y})$ or $\sigma(\bar{y})$.

29
**Sampling distribution of the mean**

When a random sample is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean, $\bar{y}$, has a sampling distribution with the same mean $\mu$, but whose standard deviation is $\frac{\sigma}{\sqrt{n}}$. We write this as $\sigma(\bar{y}) = SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$. No matter what population the random sample comes from, the shape of the sampling distribution is approximately Normal as long as the sample size is large enough. The larger the sample size, the more closely the Normal distribution approximates the sampling distribution of the mean.

30
**sampling distribution models**

We now have two closely related sampling distribution models. Which one we use depends on which kind of data we have. For categorical data, we calculate a sample proportion, $\hat{p}$. Its sampling distribution follows a Normal model with a mean at the population proportion, $p$, and a standard deviation

$$SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$

31
**sampling distribution models**

When we have quantitative data, we calculate a sample mean, $\bar{y}$. Its sampling distribution follows a Normal model with a mean at the population mean, $\mu$, and a standard deviation

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

32
**Assumptions**

Independence Assumption: The sampled values must be independent of each other.

Sample Size Assumption: The sample size, $n$, must be large enough.

33
**Conditions**

10% Condition: If sampling has not been made with replacement, then the sample size, $n$, must be no larger than 10% of the population.

Randomization Condition: Data must be sampled randomly, or sampling distributions make no sense.

34
**Example**

According to recent studies, cholesterol levels in healthy U.S. adults average about 215 mg/dL with a standard deviation of about 30 mg/dL and are roughly symmetric and unimodal. If a random sample of 42 healthy U.S. adults is taken, are conditions met to use the Normal model?

Randomization:

10% Condition:

Large Enough Sample Condition:

35
**Example**

Cholesterol levels in adults average about 215 mg/dL, with an SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken. Are conditions met to use the Normal model?

Randomization: The sample is random.

10% Condition: These 42 healthy U.S. adults are less than 10% of the population of healthy U.S. adults.

Large Enough Sample Condition: Cholesterol levels are roughly symmetric and unimodal, so a sample size of 42 is sufficient. (Had the distribution been skewed, a larger sample size might have been needed.)

36
**Example**

Cholesterol levels in adults average about 215 mg/dL, with an SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken.

What would the mean of the sampling distribution be?

What would the standard deviation of the sampling distribution be?

37
**Example**

Cholesterol levels in adults average about 215 mg/dL, with an SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken.

What would the mean of the sampling distribution be?

$$\mu(\bar{y}) = \mu = 215$$

What would the standard deviation of the sampling distribution be?

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{42}} = 4.629$$

38
**Example**

Cholesterol levels in adults average about 215 mg/dL, with an SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken.

What is the probability that the average cholesterol level will be greater than 220?

$$z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{220 - 215}{4.629} = 1.08$$

Find the probability using the tables!
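For readers who prefer software to tables, the same calculation can be sketched as follows. It assumes Python 3.8 or later for `statistics.NormalDist`, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 215, 30, 42     # population mean, population SD, sample size
cutoff = 220                   # we ask for P(sample mean > 220)

sd_mean = sigma / math.sqrt(n)     # SD of the sampling distribution of the mean
z = (cutoff - mu) / sd_mean        # standardize the cutoff
prob = 1 - NormalDist().cdf(z)     # upper-tail probability

print("SD of the sample mean:", round(sd_mean, 3))
print("z:", round(z, 2))
print("P(mean > 220):", round(prob, 2))
```

The result is a probability of about 14% that the sample mean exceeds 220 mg/dL.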

39
**Diminishing returns w.r.t sample size**

The standard deviation of the sampling distribution declines only with the square root of the sample size:

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$$

The square root limits how much a sample can tell us about the population: to halve the standard deviation, we need four times as many observations. This is an example of the Law of Diminishing Returns. Apart from this course, this is a nice concept to know something about…
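A short sketch makes the diminishing returns visible. It reuses the population SD of 30 from the cholesterol example; the sample sizes and the function name are chosen purely for illustration.

```python
import math

def sd_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 30  # population SD, taken from the cholesterol example
for n in (25, 100, 400, 1600):
    # Each step quadruples n but only halves the standard deviation.
    print(f"n = {n:5d}  SD of the mean = {sd_of_mean(sigma, n):.2f}")
```

Each fourfold increase in sample size, with the extra cost that implies, buys only a halving of the standard deviation.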

40
**Example**

The mean weight of boxes shipped by a company is 12 lbs, with a standard deviation of 4 lbs. Boxes are shipped in pallets of 10 boxes. The shipper has a limit of 150 lbs for such shipments. What's the probability that a pallet will exceed that limit?

Asking for the probability that the total weight of a sample of 10 boxes exceeds 150 lbs is the same as asking for the probability that the mean weight exceeds 15 lbs.

41
**Example**

First we'll check the conditions. We will assume that the 10 boxes on the pallet are a random sample from the population of boxes and that their weights are mutually independent. Also, 10 boxes are surely less than 10% of the population of boxes shipped by the company.

42
**Example**

Under these conditions, the CLT says that the sampling distribution of $\bar{y}$ approximately follows a Normal distribution with mean 12 and standard deviation

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{10}} = 1.26$$

$$z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{15 - 12}{1.26} = 2.38$$

$$P(\bar{y} > 15) = P(z > 2.38) = 0.0087$$

So the chance that the shipper will reject a pallet is less than 1%.
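The pallet calculation can be checked in the same way. This sketch assumes Python 3.8 or later for `statistics.NormalDist`. Because it does not round the standard deviation to 1.26 before dividing, its z-value and probability differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

mu, sigma = 12, 4        # mean and SD of a single box, in lbs
n = 10                   # boxes per pallet
limit_total = 150        # shipper's weight limit for a pallet, in lbs

limit_mean = limit_total / n           # total > 150 lbs is the same as mean > 15 lbs
sd_mean = sigma / math.sqrt(n)         # SD of the sampling distribution of the mean
z = (limit_mean - mu) / sd_mean
prob = 1 - NormalDist().cdf(z)

print("SD of the mean weight:", round(sd_mean, 2))
print("z:", round(z, 2))
print("P(pallet exceeds the limit) is below 1%:", prob < 0.01)
```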

43
**Sampling distribution models**

Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error (SE). For a sample proportion, $\hat{p}$, the standard error is:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

For the sample mean, $\bar{y}$, the standard error is:

$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
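Both standard errors can be estimated from sample data alone, as in this minimal sketch. The function names are hypothetical, and the box weights and the survey counts are made-up numbers used only to exercise the formulas.

```python
import math
import statistics

def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

def se_mean(sample):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Made-up data for illustration only.
weights = [11.2, 13.5, 9.8, 12.1, 14.0, 10.4, 12.9, 11.7]
print("SE of the mean weight:", round(se_mean(weights), 3))

# Made-up survey: 42 successes observed among 200 respondents.
print("SE of the proportion: ", round(se_proportion(42 / 200, 200), 4))
```

Here `statistics.stdev` uses the sample (n − 1) formula, which matches the usual definition of $s$.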

44
**Random quantities and their facts**

The proportion and the mean are random quantities. We can't know what our statistic will be, because it comes from a random sample. The two basic truths about sampling distributions are:

Sampling distributions arise because samples vary.

Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

45
**A graphic summary**

To keep track of how the concepts we've seen relate to one another, we can draw a diagram. We start with a population model, and label the mean of this model $\mu$ and its standard deviation $\sigma$. We draw one real sample of size $n$ and show its histogram and summary statistics. We imagine many other samples (dotted lines).

46
**A graphic summary**

We imagine gathering all the means into a histogram. The CLT tells us we can model the shape of this histogram with a Normal model. The mean of this Normal model is $\mu$, and the standard deviation is

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} \approx \frac{s_1}{\sqrt{n}}$$

47
**A graphic summary**

When we don't know $\sigma$, we estimate it with the standard deviation of the one real sample, $s_1$. That gives us the standard error:

$$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} \approx \frac{s_1}{\sqrt{n}}$$

48
**Summary**

Don't confuse the sampling distribution with the distribution of the sample.

Beware of observations that are not independent.

Watch out for small samples from skewed populations.

49
**Summary**

Model the variation in statistics from sample to sample with a sampling distribution. The Central Limit Theorem tells us that the sampling distributions of both the sample proportion and the sample mean are approximately Normal when the sample is large enough.

Understand that, usually, the mean of a sampling distribution is the value of the parameter estimated. For the sampling distribution of $\hat{p}$, the mean is $p$. For the sampling distribution of $\bar{y}$, the mean is $\mu$.

50
**Summary**

Interpret the standard deviation of a sampling distribution. The standard deviation of a sampling model is the most important information about it.

The standard deviation of the sampling distribution of a proportion is $\sqrt{\frac{pq}{n}}$, with $q = 1 - p$.

The standard deviation of the sampling distribution of a mean is $\frac{\sigma}{\sqrt{n}}$, with $\sigma$ the population SD.

51
**Summary**

Understand that the Central Limit Theorem is a limit theorem. The sampling distribution of the mean becomes Normal, no matter what the underlying distribution of the data is. The CLT says that this happens in the limit, as the sample size grows (and grows, all the way to $\infty$, if necessary). The Normal model applies sooner when sampling from a unimodal, symmetric population and more gradually when the population is very non-Normal.
