The eternal tension in statistics.... Between what you really really want (the population) but can never get to...


1 The eternal tension in statistics...

2 Between what you really really want (the population) but can never get to...

3 So you have to make do with the sample: you can estimate the population and make educated guesses,

4 but the bottom line is: “you can never have the population.”

5 The population: An investigator usually wants to generalize about a class of individuals or things (the population). For example: in forecasting the results of elections, population = voters; for the Furniture.com class group, population = all potential users.

6 Parameters: Usually there are some numerical facts about the population that you want to estimate. Statistic: You can do that by measuring the same aspect in the sample (Descriptive Statistics). Depending on the accuracy of measurement and the representativeness of your sample, you can make inferences about the population (Inferential Statistics).
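
Here is a minimal Python sketch that makes the parameter/statistic distinction concrete. The population is invented for illustration: 10,000 hypothetical user ages between 18 and 80. In a real study the parameter would be unknown, and only the statistic would be computed.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 potential users (invented for illustration).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(10_000)]

# Parameter: a numerical fact about the whole population (usually unknowable in practice).
parameter = statistics.mean(population)

# Statistic: the same measurement taken on a sample (descriptive statistics).
sample = rng.sample(population, 200)
statistic = statistics.mean(sample)

# Inference: use the statistic as an estimate of the parameter (inferential statistics).
print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```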

7 One person’s sample is another person’s population – IS 271 students are a sample of the larger student population of UC Berkeley – IS 271 students could be the population for some other study

8 The 1936 election: the Literary Digest poll. Candidates: Democrat F. D. Roosevelt and Republican Alfred Landon. The Literary Digest: had called the winner in every election since 1916. Its prediction: Roosevelt will get 43%. It polled 2.4 million people!

9 The election results (Roosevelt’s share of the vote, in percent):
The election result: 62
The Digest prediction of the election result: 43
Gallup’s prediction of the Digest prediction: 44
Gallup’s prediction of the election result: 56

10 Why the Digest went wrong: how they picked their sample. Selection bias: a systematic tendency on the part of the sampling procedure to exclude one kind of person or another from the sample. Sample size: when a selection procedure is biased, making the sample larger does not help; it just repeats the mistake on a larger scale.
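
A small simulation illustrates why a bigger sample does not fix a biased selection procedure. The population below is hypothetical. It is built so that 62% of voters back candidate A overall, but the voters the biased sampling frame can reach back A at only 40%. Samples from the biased frame settle near 40% no matter how large they get. Samples from the whole population settle near 62%.

```python
import random

# Hypothetical population (numbers invented for illustration only):
# 30,000 voters the sampling frame can reach, 40% of whom back candidate A,
# and 70,000 it cannot reach, about 71.4% of whom back candidate A.
reachable = [1] * 12_000 + [0] * 18_000
unreachable = [1] * 50_000 + [0] * 20_000
population = reachable + unreachable

true_share = sum(population) / len(population)  # 0.62

def estimate(frame, n, rng):
    """Share backing A in a simple random sample of size n drawn from frame."""
    sample = rng.sample(frame, n)
    return sum(sample) / n

rng = random.Random(1936)
for n in (100, 1_000, 10_000, 25_000):
    biased = estimate(reachable, n, rng)
    unbiased = estimate(population, n, rng)
    print(f"n={n:>6}: biased frame {biased:.3f}, "
          f"whole population {unbiased:.3f}, truth {true_share:.3f}")
```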

11 How they picked their sample: Non-response bias: non-respondents differ from respondents – they did not respond, whereas respondents did! – Lower-income and upper-income people tend not to respond, so the middle class is over-represented. – Correcting for non-response bias: one can give more weight to people who were available but hard to get.

12 For example: predicting elections – Non-voters: Gallup uses a few questions to predict whether people will vote at all; the election forecast is based only on those likely to vote. – Undecided: asks people which way they are leaning as of today. – Non-response bias: one can give more weight to people who were available but hard to get. – Ratio estimation: look at the sample obtained and compare it to the population; if there are too many educated people, give them less weight. – Interviewer bias: build redundancy into the questionnaire to check for consistency; also reinterview a small sample to check for consistency.
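
The reweighting idea behind ratio estimation can be sketched as follows. The same idea underlies giving more weight to hard-to-get respondents. This sketch uses a simplified weighting-by-group-share scheme, often called post-stratification, and does not reproduce Gallup's actual procedure. The education shares and poll counts are invented.

```python
# Hypothetical numbers, invented for illustration only.
# Share of each education group in the population (e.g. from a census).
population_share = {"college": 0.30, "no_college": 0.70}

# Poll results: respondents per group and how many of them back candidate A.
respondents = {"college": 600, "no_college": 400}
support_a = {"college": 240, "no_college": 280}

n = sum(respondents.values())

# Unweighted estimate: college graduates are over-represented (60% vs 30%).
unweighted = sum(support_a.values()) / n

# Weight = population share / sample share, so over-represented groups count less.
weights = {g: population_share[g] / (respondents[g] / n) for g in respondents}
weighted = sum(weights[g] * support_a[g] for g in respondents) / n

print(f"unweighted: {unweighted:.3f}")  # unweighted: 0.520
print(f"weighted:   {weighted:.3f}")    # weighted:   0.610
```

College respondents get a weight of 0.5 and the others a weight of 1.75. The weighted figure equals 0.30 × 40% + 0.70 × 70%, which is the estimate you would get if the sample had the population's education mix.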

13 Distribution of brown M&M’s: Yellow 20%, Orange 10%, Blue 10%, Green 10%, Red 20%, Brown 30%

14 The distribution of the population

15 Sample 1

16 Sample 2

17 Sample 3

18 The population and 5 samples (figure; panels labeled Sample 1, Sample 2, Sample 3)
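
Slides 14 to 18 compare the population with individual samples. The sketch below draws such samples, with two assumptions beyond the slides: a hypothetical sample size of 50 candies, and draws made with replacement. It uses the color shares from slide 13 and shows how the percentage of brown candies varies from sample to sample around the population's 30%.

```python
import random
from collections import Counter

# Color distribution of the M&M's population, taken from slide 13.
COLOR_SHARES = {"yellow": 0.20, "orange": 0.10, "blue": 0.10,
                "green": 0.10, "red": 0.20, "brown": 0.30}

def draw_sample(n, rng):
    """Draw n candies at random (with replacement) and count each color."""
    colors = list(COLOR_SHARES)
    weights = list(COLOR_SHARES.values())
    return Counter(rng.choices(colors, weights=weights, k=n))

rng = random.Random(13)
SAMPLE_SIZE = 50  # hypothetical bag size
for i in range(1, 6):
    counts = draw_sample(SAMPLE_SIZE, rng)
    pct_brown = 100 * counts["brown"] / SAMPLE_SIZE
    print(f"Sample {i}: {counts['brown']} brown ({pct_brown:.0f}%), population: 30%")
```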

19 How much is each sample going to deviate from the population? (How big is the chance error for each sample likely to be?) Computation of the standard error: SE = √(number of samples) × SD. Example with the values 9, 7, 6, 9, 11, 12: Mean = 9, Standard Deviation ≈ 2.1, Standard Error ≈ √6 × 2.1 ≈ 5.1
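
The arithmetic on this slide can be checked with a short script. The script assumes the garbled symbol in the slide's formula is a square-root sign, giving SE = √(number of draws) × SD, which is the square-root law for the SE of a sum. It also assumes the SD is computed by dividing by the number of values. Under those assumptions the SD comes out near 2.1 and the SE near 5.1. The SE of the average, SD / √(number of draws), is shown as well.

```python
import math
import statistics

values = [9, 7, 6, 9, 11, 12]  # the numbers listed on the slide

mean = statistics.mean(values)          # 9
sd = statistics.pstdev(values)          # root-mean-square deviation (divide by n): ~2.08
se_sum = math.sqrt(len(values)) * sd    # SE of the sum = sqrt(number of draws) x SD: ~5.10
se_mean = sd / math.sqrt(len(values))   # SE of the average = SD / sqrt(number of draws): ~0.85

print(mean, round(sd, 2), round(se_sum, 2), round(se_mean, 2))
```

Dividing by n − 1 instead (`statistics.stdev`) gives an SD of about 2.28. The choice matters little for large samples.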

20 Why is knowing the chance error important? It allows us to estimate the accuracy of our estimates and to judge whether we are justified in using inferential statistics. It allows us to make inferences about the population.

21 If there is a lot of spread in the data, the SD is big and it will be hard to predict how accurate the sample will be, so the standard error will be big as well. Standard Deviation (SD) and Standard Error (SE): SD refers to a list of numbers: how far are most numbers from the mean? SE refers to the variability of samples: how much is each sample's result likely to vary?
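
A simulation shows the SD/SE distinction. It assumes the M&M population of slide 13 (30% brown) and hypothetical samples of 100 candies. The sketch computes the SD of the population as a 0-1 box, √(p(1 − p)). It then compares the SE predicted by SD / √n with the spread actually observed across 2,000 simulated sample proportions.

```python
import math
import random
import statistics

P_BROWN = 0.30      # share of brown M&M's in the population (slide 13)
SAMPLE_SIZE = 100   # hypothetical sample size
N_SAMPLES = 2_000   # number of repeated samples to simulate

# SD of the population: a 0-1 "box" (1 = brown) has SD = sqrt(p * (1 - p)).
sd_box = math.sqrt(P_BROWN * (1 - P_BROWN))

# SE of the sample proportion predicted by the square-root law.
se_formula = sd_box / math.sqrt(SAMPLE_SIZE)

rng = random.Random(21)
proportions = []
for _ in range(N_SAMPLES):
    brown = sum(rng.random() < P_BROWN for _ in range(SAMPLE_SIZE))
    proportions.append(brown / SAMPLE_SIZE)

# SE observed directly: the spread of the sample results themselves.
se_observed = statistics.pstdev(proportions)

print(f"SD of the population (0-1 box): {sd_box:.3f}")
print(f"SE from formula:    {se_formula:.4f}")
print(f"SE from simulation: {se_observed:.4f}")
```

The SD (about 0.46) describes individual candies. The SE (about 0.046, or 4.6 percentage points) describes how much a sample percentage is likely to be off.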

22 Should the sample for Texas be larger than that for Rhode Island?

23 Surprisingly: no. Analogy: suppose you took a drop of liquid for analysis. If the liquid is well mixed, it would not matter whether the drop came from a small or a large bottle, i.e. whether the sample is 1% or 0.1% of the population. The statistical rationale: the accuracy of a sample depends on the size of the sample and on the standard deviation, not on the size of the population. Example: election of 1992, % of voters who chose Clinton: 46% of voters in New Mexico (SD ≈ .50); 37% of voters in Texas (SD ≈ .48). Therefore the accuracy of samples of the same size in Texas and New Mexico will be similar.
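
The claim can be checked numerically. The sketch uses the 1992 Clinton shares from the slide. The sample size of 2,500 and the voter counts for each state are hypothetical round numbers. It computes the SD of each state's 0-1 box and the SE of the sample percentage with the finite-population correction, √((N − n)/(N − 1)). That correction is nearly 1 whenever the sample is a small fraction of the population.

```python
import math

def sd_of_share(p):
    """SD of a 0-1 box in which a fraction p of tickets are 1s (voted for the candidate)."""
    return math.sqrt(p * (1 - p))

def se_of_share(p, n, population_size):
    """SE (in percentage points) of a sample percentage for a simple random sample
    of n drawn without replacement, including the finite-population correction."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return 100 * correction * sd_of_share(p) / math.sqrt(n)

n = 2_500  # hypothetical sample size, the same in both states
# Hypothetical voter counts, chosen only to differ by an order of magnitude.
states = {"New Mexico": (0.46, 500_000), "Texas": (0.37, 6_000_000)}

for name, (p, voters) in states.items():
    print(f"{name}: SD = {sd_of_share(p):.2f}, "
          f"SE = {se_of_share(p, n, voters):.2f} percentage points")
```

Both SEs come out at about one percentage point, even though the hypothetical Texas electorate is twelve times larger.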

24 Types of samples: The convenience sample: the most conveniently available elementary units are chosen from the population. The judgement sample: units are chosen according to the judgement of someone who is familiar with the relevant characteristics of the population. The random sample: units are chosen randomly, each with a known probability.

25 Quota sampling: each interviewer is assigned a fixed quota of subjects fitting certain demographic characteristics. Within the quota, the choice of subjects is a judgement sample. – Problems: the quotas might not be representative, and judgement sampling leaves room for the interviewer's bias.

26 Types of random sample: Simple random sample: every unit of the population has an equal chance of being chosen (more precisely, every possible sample of the given size is equally likely). Systematic random sample: one unit is chosen at random, and additional elementary units are taken at evenly spaced intervals until the desired number of units is obtained.
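
Both kinds of random sample can be written with Python's standard `random` module. In this sketch the population is a hypothetical list of 1,000 numbered units. For the systematic sample, the interval k is assumed to be the population size divided by the desired sample size, with the random start chosen within the first interval.

```python
import random

def simple_random_sample(units, n, rng):
    """Every possible set of n units is equally likely to be chosen."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random start within the first interval, then take every k-th unit."""
    k = len(units) // n          # sampling interval
    start = rng.randrange(k)     # random starting point in [0, k)
    return [units[start + i * k] for i in range(n)]

population = list(range(1, 1001))  # hypothetical population of 1,000 numbered units
rng = random.Random(26)
print(sorted(simple_random_sample(population, 10, rng)))
print(systematic_sample(population, 10, rng))
```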

27 The stratified random sample: obtained by independently selecting a separate simple random sample from each population stratum. A population can be divided into different groups (strata) based on some characteristic or variable, like income or education. The cluster sample: obtained by selecting clusters from the population by simple random sampling. The sample comprises a census of each randomly selected cluster. For example, a cluster may be something like a village, a school, or a state.
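
Here is a minimal sketch of stratified and cluster sampling on hypothetical data: 600 people, each with an income band (the stratifying variable) and one of 12 schools (the clusters). The stratified sample takes an independent simple random sample within each income band. The cluster sample picks three schools at random and keeps everyone in them.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Independently draw a simple random sample of n_per_stratum from each stratum."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    return {s: rng.sample(members, n_per_stratum) for s, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters by simple random sampling, then take every unit
    in each chosen cluster (a census of the cluster)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical data: 600 people, each with an income band and a school.
rng = random.Random(27)
people = [{"id": i, "income": rng.choice(["low", "middle", "high"]),
           "school": f"school_{i % 12}"} for i in range(600)]

by_income = stratified_sample(people, lambda p: p["income"], 5, rng)

schools = defaultdict(list)
for p in people:
    schools[p["school"]].append(p["id"])
by_school = cluster_sample(schools, 3, rng)

for band, members in by_income.items():
    print(band, [p["id"] for p in members])
print({school: len(ids) for school, ids in by_school.items()})
```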

