Published by Donna Oddy. Modified about 1 year ago.

1
LECTURE 10 Sampling distributions Dr. Richard Bußmann

2
DISTRIBUTION OF SAMPLE PROPORTIONS

3
HOW TO IMAGINE (IN STATISTICS)… Most statisticians would be delusional to suppose they could ever know the true characteristics of a population. You are in a similar situation. Unless you can perform a complete census of all available data, you still need statistics to help you create a picture that is reasonably close to reality. Recall our Mao Zedong from class 1. Unless you have all the necessary data, you won’t be able to paint a “true” picture of Chairman Mao, but with some data you can probably come up with the blue Mao Zedong.

4
HOW TO IMAGINE (IN STATISTICS)… Once you have the blue Mao Zedong, you can add some assumptions to your data: 1. Most people’s skin colour is not blue. 2. Most Chinese people’s skin colour is fairly light. You then state these assumptions and adjust the data to give Chairman Mao a more reasonable skin colour. Note that this makes you the JUDGE of what is reasonable and what is not. This is one great power that statisticians wield!

5
HOW TO IMAGINE (IN STATISTICS)… Suppose you unearth a couple of bones of a dinosaur. In effect, you are given a sample of information, while the rest of the information is missing. How do you “imagine” the missing results? You can simulate them. For example, once you know the thickness of the dinosaur’s legs, you can estimate its weight, from which in turn you can make assumptions about the size of the other bones, and so on. Once you have (imagined) sufficient data, you can use it to come up with a model and learn more.

6
SIMULATION A simulation uses a computer to pretend to draw random samples from some population of values over and over. A simulation can help us understand how sample proportions vary due to random sampling.

7
SIMULATION

8
Variability from one sample to another is expected, but the way in which the simulated proportions vary shows us how the proportions of real samples would vary.
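
The simulation idea above can be sketched in a few lines of Python. This is only an illustration under assumed values: the true proportion (0.5), the sample size (100) and the number of samples (2,000) are hypothetical and do not come from the slides, and the function name `simulate_sample_proportions` is made up for this sketch.

```python
import random


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Pretend to draw num_samples random samples of size n from a population
    in which a fraction p are 'successes'; return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each draw is a success with probability p, independently of the others.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical values, chosen only for illustration.
props = simulate_sample_proportions(p=0.5, n=100, num_samples=2000)
print("smallest sample proportion:", min(props))
print("largest sample proportion: ", max(props))
print("average sample proportion: ", sum(props) / len(props))
```

The individual proportions differ from sample to sample, but they cluster around the true proportion, which is the pattern the slide describes.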

9
PROPORTION’S SAMPLING DISTRIBUTIONS

10
Remember that the difference between sample proportions, referred to as sampling error, is not really an error. Sampling error is just the variability you’d expect to see from one sample to another. A better term might be sampling variability. To discover how variable a sample proportion is, we need to know only the proportion and the size of the sample.
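
The surviving slide text does not show the formula, but the standard result for independent draws is that the standard deviation of a sample proportion is the square root of p(1 − p)/n. A minimal sketch, with the helper name `proportion_sd` and the example values being assumptions for illustration:

```python
import math


def proportion_sd(p, n):
    """Standard deviation of the sample proportion for samples of size n
    drawn independently from a population with true proportion p."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical proportions, all with the same sample size of 100.
for p in (0.1, 0.5, 0.9):
    print(f"p = {p}, n = 100: SD = {proportion_sd(p, 100):.4f}")
```

Only p and n enter the calculation, which is why knowing the proportion and the sample size is enough.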

11
PROPORTION’S SAMPLING DISTRIBUTIONS

12

13
PROPORTION’S SAMPLING DISTRIBUTIONS

14
PROPORTION’S SAMPLING DISTRIBUTIONS

15
QUALITY OF THE NORMAL MODEL

16
ASSUMPTIONS Independence Assumption: The sampled values must be independent of each other. Sample Size Assumption: The sample size, n, must be large enough.

17
CONDITIONS Randomization Condition: If you have a survey, your sample should be a simple random sample of the population. If some other sampling design was used, be sure the sampling method was not biased and that the data are representative of the population. For example, if your data come from a (medical) experiment, subjects should have been randomly assigned to treatments. Likewise, if you want to survey Chinese people’s preference for meat, don’t just survey people in Xinjiang.

18
CONDITIONS

19
EXAMPLE Information on a box of seeds states that the germination rate is 92%. Are conditions met to answer the question, “What is the probability that more than 95% of the 160 seeds in the box will germinate?” Independence: It is reasonable to assume the seeds will germinate independently of each other. Randomization: The sample of seeds can be considered a random sample from all seeds from this producer.

20
EXAMPLE

21
EXAMPLE
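
The worked numbers for the seed example did not survive in the slide text, so the following is a sketch of how the question could be answered with a Normal model, not a reproduction of the original solution. The check for at least 10 expected successes and failures is a common rule of thumb added here as an assumption.

```python
import math
from statistics import NormalDist

p = 0.92  # germination rate stated on the box
n = 160   # number of seeds in the box

# Rule-of-thumb check (an assumption, not in the surviving slide text):
# expect at least 10 germinating and at least 10 non-germinating seeds.
expected_successes = n * p        # about 147.2
expected_failures = n * (1 - p)   # about 12.8

# Normal model for the sample proportion: mean p, SD sqrt(p(1-p)/n).
sd = math.sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd)

z = (0.95 - p) / sd
prob_more_than_95 = 1 - model.cdf(0.95)

print(f"expected successes: {expected_successes:.1f}")
print(f"expected failures:  {expected_failures:.1f}")
print(f"SD of the sample proportion: {sd:.4f}")
print(f"z-score of 0.95: {z:.2f}")
print(f"P(more than 95% germinate): {prob_more_than_95:.3f}")
```

Under these assumptions the probability comes out at roughly 8%, so a box in which more than 95% of the seeds germinate would be uncommon but not extraordinary.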

22
SIMULATING THE SAMPLING DISTRIBUTION OF A MEAN Here are the results of a simulated 10,000 tosses of one fair die. This is a uniform distribution.

23
SIMULATING THE SAMPLING DISTRIBUTION OF A MEAN Here are the results of a simulated 10,000 tosses of two fair dice, averaging the numbers. This is a triangular distribution.

24
SIMULATING THE SAMPLING DISTRIBUTION OF A MEAN Here’s a histogram of the averages for 10,000 tosses of five dice. As the sample size (number of dice) gets larger, each sample average tends to become closer to the population mean. The shape of the distribution is becoming bell-shaped. In fact, it’s approaching the Normal model.
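
Since the original histograms are not reproduced here, the dice experiment can be re-run with a short script. This is a sketch, not the code behind the slides; the function names and the text-based histogram are choices made for this illustration.

```python
import random
from collections import Counter


def simulate_dice_means(num_dice, num_tosses=10_000, seed=0):
    """Toss num_dice fair dice num_tosses times; return the average of each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_tosses)
    ]


def text_histogram(values, width=50):
    """Print one row of '#' per distinct value, scaled to the most common value."""
    counts = Counter(values)
    biggest = max(counts.values())
    for value in sorted(counts):
        bar = "#" * round(width * counts[value] / biggest)
        print(f"{value:5.2f} {bar}")


for num_dice in (1, 2, 5):
    print(f"\nAverages of {num_dice} dice, 10,000 tosses:")
    text_histogram(simulate_dice_means(num_dice))
```

With one die the rows are roughly equal in length, with two dice they form a triangle, and with five dice the outline starts to look bell-shaped.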

25
THE CENTRAL LIMIT THEOREM Central Limit Theorem (CLT): The sampling distribution of any mean becomes more nearly Normal as the sample size grows, provided the observations are independent and the population has a finite variance. This is true regardless of the shape of the population distribution! However, if the population distribution is very skewed, it may take a sample size of dozens or even hundreds of observations for the Normal model to work well.

26
THE CENTRAL LIMIT THEOREM Now we have two distributions to deal with: the real-world distribution of the sample, and the math-world sampling distribution of the statistic. Don’t confuse the two. The Central Limit Theorem doesn’t talk about the distribution of the data from the sample. It talks about the sample means and sample proportions of many different random samples drawn from the same population: The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation.
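
To see the effect for a skewed population, one can simulate sample means from an exponential distribution and measure how lopsided their distribution is. The exponential population, the sample sizes and the skewness measure below are all assumptions chosen for illustration; a skewness near 0 indicates a symmetric shape like the Normal model.

```python
import random
import statistics


def skewness(values):
    """Average cubed z-score: about 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(n, num_samples=2000, seed=0):
    """Means of num_samples random samples of size n from a right-skewed
    (exponential, rate 1) population."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


for n in (1, 10, 100):
    print(f"n = {n:3d}: skewness of the sample means = {skewness(sample_means(n)):.2f}")
```

The skewness of the sample means shrinks as n grows, even though the population itself stays just as skewed. That is the distinction the slide draws between the distribution of the data and the sampling distribution of the mean.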

27
SAMPLING DISTRIBUTION OF THE MEAN Which would be more surprising: having one person in your Statistics class who is over 2 meters tall, or having the mean height of 100 students taking the course be over 2 meters? The first event is fairly rare, but a class of 100 whose mean height is over 2 meters will only be found in a basketball academy (where students study statistics). Means have smaller standard deviations than individuals.

28
SAMPLING DISTRIBUTION OF THE MEAN

29
SAMPLING DISTRIBUTION OF THE MEAN

30
SAMPLING DISTRIBUTION MODELS

31
SAMPLING DISTRIBUTION MODELS

32
ASSUMPTIONS Independence Assumption: The sampled values must be independent of each other. Sample Size Assumption: The sample size, n, must be large enough.

33
CONDITIONS

34
EXAMPLE According to recent studies, cholesterol levels in healthy U.S. adults average about 215 mg/dL with a standard deviation of about 30 mg/dL and are roughly symmetric and unimodal. If a random sample of 42 healthy U.S. adults is taken, are the conditions met to use the Normal model for the sample mean? Randomization: 10% Condition: Large Enough Sample Condition:

35
EXAMPLE Cholesterol levels in adults average about 215 mg/dL, with SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken. Are conditions met to use the Normal model? Randomization: The sample is random. 10% Condition: These 42 healthy U.S. adults are less than 10% of the population of healthy U.S. adults. Large Enough Sample Condition: Cholesterol levels are roughly symmetric and unimodal, so a sample size of 42 is sufficient. (Had the distribution been skewed, a larger sample size might have been needed.)

36
EXAMPLE Cholesterol levels in adults average about 215 mg/dL, with SD of 30 mg/dL. Levels are symmetric and unimodal. A random sample of 42 adults is taken. Are conditions met to use the Normal model? What would the mean of the sampling distribution be? What would the standard deviation of the sampling distribution be?

37
EXAMPLE

38
EXAMPLE
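
The answers to the two questions about the cholesterol example are not in the surviving slide text. A sketch of the standard calculation, assuming the usual result that the sampling distribution of the mean is centered at the population mean with standard deviation σ/√n:

```python
import math

population_mean = 215.0  # mg/dL, from the example
population_sd = 30.0     # mg/dL, from the example
n = 42                   # sample size, from the example

# The sampling distribution of the mean is centered at the population mean.
mean_of_sample_means = population_mean

# Its standard deviation is the population SD divided by sqrt(n).
sd_of_sample_means = population_sd / math.sqrt(n)

print(f"mean of the sampling distribution: {mean_of_sample_means:.0f} mg/dL")
print(f"SD of the sampling distribution:   {sd_of_sample_means:.2f} mg/dL")
```

The standard deviation of the sample mean comes out a little under 5 mg/dL, far smaller than the 30 mg/dL spread of individual cholesterol levels.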

39
DIMINISHING RETURNS W.R.T SAMPLE SIZE
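
Only the title of this slide survives. It presumably refers to the fact that the standard deviation of a sample mean shrinks with the square root of n, so each extra observation helps less than the one before. A small sketch under that assumption, reusing the 30 mg/dL standard deviation from the cholesterol example; the helper name `sd_of_mean` and the sample sizes are illustrative.

```python
import math


def sd_of_mean(sigma, n):
    """Standard deviation of the mean of n independent observations
    from a population with standard deviation sigma."""
    return sigma / math.sqrt(n)


sigma = 30.0  # mg/dL, borrowed from the cholesterol example
for n in (42, 168, 672):  # each sample size is four times the previous one
    print(f"n = {n:3d}: SD of the sample mean = {sd_of_mean(sigma, n):.2f} mg/dL")
```

Quadrupling the sample size only halves the standard deviation of the mean, which is the sense in which larger samples give diminishing returns.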

40
EXAMPLE The mean weight of boxes shipped by a company is 12 lbs, with a standard deviation of 4 lbs. Boxes are shipped in pallets of 10 boxes. The shipper has a limit of 150 lbs for such shipments. What’s the probability that a pallet will exceed that limit? Asking for the probability that the total weight of a sample of 10 boxes exceeds 150 lbs is the same as asking for the probability that the mean weight exceeds 15 lbs.

41
EXAMPLE First we’ll check the conditions. We will assume that 1. the 10 boxes on the pallet are a random sample from the population of boxes and 2. their weights are mutually independent. Also, 10 boxes are surely less than 10% of the population of boxes shipped by the company.

42
EXAMPLE
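
The worked solution for the shipping example is missing from the slide text. A sketch of the calculation with a Normal model for the mean weight follows; it assumes that box weights are not strongly skewed, since a sample of only 10 boxes is small.

```python
import math
from statistics import NormalDist

mean_weight = 12.0      # lbs per box, from the example
sd_weight = 4.0         # lbs per box, from the example
boxes_per_pallet = 10   # from the example
limit_total = 150.0     # lbs per shipment, from the example

# Exceeding 150 lbs in total is the same as the mean box weight exceeding 15 lbs.
limit_mean = limit_total / boxes_per_pallet

# Normal model for the mean of 10 boxes: mean 12, SD 4 / sqrt(10).
sd_of_mean = sd_weight / math.sqrt(boxes_per_pallet)
model = NormalDist(mu=mean_weight, sigma=sd_of_mean)

z = (limit_mean - mean_weight) / sd_of_mean
prob_over_limit = 1 - model.cdf(limit_mean)

print(f"SD of the mean box weight: {sd_of_mean:.3f} lbs")
print(f"z-score of the limit: {z:.2f}")
print(f"P(pallet exceeds 150 lbs): {prob_over_limit:.4f}")
```

Under these assumptions the probability is a little under 1%, so an overweight pallet should be rare.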

43
SAMPLING DISTRIBUTION MODELS

44
RANDOM QUANTITIES AND THEIR FACTS The proportion and the mean are random quantities. We can’t know what our statistic will be, because it comes from a random sample. The two basic truths about sampling distributions are: 1. Sampling distributions arise because samples vary. 2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.

45
A GRAPHIC SUMMARY

46
A GRAPHIC SUMMARY

47
A GRAPHIC SUMMARY

48
SUMMARY Don’t confuse the sampling distribution with the distribution of the sample. Beware of observations that are not independent. Watch out for small samples from skewed populations.

49
SUMMARY

50
SUMMARY

51
SUMMARY
