Professor B. Jones University of California, Davis The Scientific Study of Politics (POL 51)


Today Sampling Plans Survey Research

Populations Key Concepts Population Defined by the research “All U.S. citizens age 18 or older.” All democratic countries Counties in the United States Characteristics of a Population Bounded and definable If you can’t define the population, you probably don’t have a well formed research question!

Populations vs. Samples Populations are often unattainable TOO BIG (U.S. population) Very Costly to Obtain May not be necessary The beauty of statistical theory Samples Simply Defined: a subset of the population chosen in some manner How you choose is the important question!

Moving Parts of a Sample Units of Analysis J is the population i is a member of J Then i is a “sample element” Sampling Frames The actual source of the data Literary Digest Poll (1936) “Dewey Defeats Truman” (1948) Exit Polls

More Moving Parts Sampling Unit Could be same as sample element (Unit of Analysis) But it could be collections of elements (cluster, stratified sampling) Sampling Plan Random? Nonrandom?

Kinds of Samples Simple Random Sample Major Characteristic: Every sample element has an equiprobable chance of selection. If done properly, maximizes the likelihood of a representative sample. What if your assumptions of randomness go badly? Nonrandom samples (often) produce nonrepresentative surveys.

Why Randomness is Goodness Nonprobability Sampling Probability of “getting into” the sample is unknown All bets are off; inference most likely impossible Highly unreliable! Simple Random Sampling Every sample element has the same probability of being selected: Pr(selection)=1/N In practice, not always easy to guarantee or achieve An Example of a Bad Assumption

Some Data

More Data

Getting Probability Samples Wrong

Draft Lottery Simple random sampling did not exist. Its absence had profound consequences. Avg. Lottery Number Jan.-June: 206 Avg. Lottery Number July-Dec.: 161 Avg. Deaths Jan.-June: 111 Avg. Deaths July-Dec.: 159 Differences highly significant. Randomness should have ensured an equal chance of draft, invariant to birth date. It didn’t. By analogy, suppose college admissions were based on this kind of lottery…

How to Achieve Randomness Random number generation Modern computers are really good at this. Assign sample elements a number Generate a random numbers table Use a decision rule upon which to select sample. The Key: sampled units are randomly drawn. Why Important? Randomness helps ensure REPRESENTATIVENESS! Absent this, all bets are off: Convenience Polls Push Polls Person-on-the-Street Interviews
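
The steps on this slide (number the elements, generate random numbers, apply a decision rule) can be sketched in base R. This is only an illustration, not the course's code: the frame size `N`, the sample size `n`, and the seed are hypothetical values chosen for the example.

```r
# Hypothetical sampling frame: N elements, each assigned an ID number
N <- 1000                 # assumed frame size (illustrative)
n <- 50                   # assumed sample size (illustrative)
frame_ids <- seq_len(N)

# Fix the seed so the draw can be replicated exactly
set.seed(2024)            # arbitrary seed, not from the slides

# Decision rule: attach a random number to every element and
# select the n elements with the smallest random numbers
random_numbers <- runif(N)
selected <- frame_ids[order(random_numbers)[seq_len(n)]]

# The same design in one step with R's built-in sampler
# (a different draw, but the same selection probabilities)
selected_direct <- sample(frame_ids, size = n, replace = FALSE)
```

Either way, the sampled units are chosen by the random numbers rather than by the researcher, which is the point of the slide.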

Populations and Samples A population is any well-defined set of units of analysis. The population is determined largely by the research question; the population should be consistent through all parts of a research project. A sample is a subset of a population. Samples are drawn through a systematic procedure called a sampling method. Sample statistics measure characteristics of the sample to estimate the value of population parameters that describe the characteristics of a population.

Populations and Samples

A population would be the first choice for analysis. Resources and feasibility usually preclude analysis of population data. Most research uses samples.

Probability Samples The goal in sampling is to create a sample that is identical to the population in all characteristics except size. Any difference between a population and a sample is defined as bias. Bias leads to inaccurate conclusions about the population.

Probability Samples Probability samples: Each element in the population has a known probability of inclusion in the sample. Probability samples are a better choice than nonprobability samples, when possible, because they are more likely to be representative and unbiased.

Probability Samples Simple random sample: Each element and combination of elements in a population have an equal chance of selection. Selection can be driven by a lottery, a random number generator, or any other method that guarantees an equal chance of selection.
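
One way to see the "equal chance of selection" property is to draw many simple random samples and count how often each element is included. The sketch below assumes a tiny hypothetical population (`N = 10`, `n = 3`); under simple random sampling without replacement each element should appear in about `n / N` of the samples.

```r
# Hypothetical small population so inclusion frequencies are easy to inspect
N <- 10        # assumed population size
n <- 3         # assumed sample size
reps <- 20000  # number of repeated draws

set.seed(314)  # arbitrary seed for replication

# Each column is one simple random sample of n element IDs
draws <- replicate(reps, sample.int(N, n))

# Share of samples that include each element; all should be near n / N
inclusion_freq <- tabulate(draws, nbins = N) / reps
round(inclusion_freq, 3)
```

If some elements showed up far more often than others, the selection mechanism would not be a simple random sample.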

Probability Samples Systematic sample: Generated by selecting elements from a list of the population at a predetermined interval. Start point for selection must be chosen at random or the list must be randomized; otherwise, the sample will not be as representative.
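
A systematic sample with a random start can be sketched as follows. The list length `N`, sample size `n`, and seed are assumptions made for the illustration; the interval `k` is simply `N` divided by `n`.

```r
# Hypothetical population list of N elements
N <- 1000                       # assumed list length
n <- 50                         # assumed sample size
k <- N %/% n                    # sampling interval: every k-th element

set.seed(77)                    # arbitrary seed for replication
start <- sample(seq_len(k), 1)  # random start between 1 and k

# Take every k-th element beginning at the random start
systematic_ids <- seq(from = start, by = k, length.out = n)
head(systematic_ids)
```

Only the start point is random here, so any periodic pattern in the list that lines up with `k` would carry straight into the sample; that is why the slide asks for a random start or a randomized list.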

Probability Samples Stratified sample: Drawn from a population that has been subdivided into two or more strata based on a single characteristic. Elements are selected from each stratum in proportion to the stratum’s representation in the entire population.

Probability Samples Disproportionate stratified sample: Elements are drawn disproportionately from the strata. Used to over-represent a group that, due to its small size in the population, would not likely make up a large enough percentage of the sample to allow for quality inferences.

Probability Samples Cluster samples: Group elements for an initial sampling frame (50 states). Samples drawn from increasingly narrow groups (counties, then cities, then blocks) until the final sample of elements is drawn from the smallest group (individuals living in each household).
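
A two-stage version of the cluster idea can be sketched in base R. Everything in the frame below is hypothetical: 50 clusters standing in for states, 200 individuals per cluster, 5 clusters drawn at stage one and 20 individuals per chosen cluster at stage two.

```r
# Hypothetical frame: 50 clusters, each with 200 individuals
set.seed(11)  # arbitrary seed for replication
frame <- data.frame(
  cluster = rep(1:50, each = 200),
  person  = seq_len(50 * 200)
)

# Stage 1: randomly select 5 clusters
chosen_clusters <- sample(unique(frame$cluster), size = 5)

# Stage 2: within each chosen cluster, randomly select 20 individuals
in_chosen <- frame[frame$cluster %in% chosen_clusters, ]
rows_by_cluster <- split(seq_len(nrow(in_chosen)), in_chosen$cluster)
picked_rows <- unlist(lapply(rows_by_cluster, function(idx) {
  idx[sample.int(length(idx), 20)]
}))
cluster_sample <- in_chosen[picked_rows, ]

# 20 individuals from each of the 5 selected clusters
table(cluster_sample$cluster)
```

A design with more stages (counties, then cities, then blocks) would repeat the same select-then-subsample step at each level.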

Nonprobability Samples Nonprobability samples: Each element in the population has an unknown probability of inclusion in the sample. These sampling techniques, while less representative, are used to collect data when probability samples are not feasible.

Nonprobability Samples Purposive samples: Used to study a diverse and limited number of observations. Case studies.

Nonprobability Samples Convenience samples: Include elements that are easy or convenient for the investigator; for example, college students in samples collected on college campuses.

Nonprobability Samples Quota sample: Elements are chosen for inclusion in a nonprobabilistic manner (usually in a purposive or convenient manner) in proportion to their representation in the population.

Nonprobability Samples Snowball sample: Relies on elements in the target population to identify other elements in the population for inclusion in the sample. Particularly useful when studying hard-to-locate or identify populations.

A Population and Some “Samples” A “Population” Striations represent “attitudes” Some “Samples”

Sampling comes to life in…R!!! Suppose we have a population of roughly 100,000 (the four groups below sum to 99,000). And in that population, we have 4 groups. Group 1: 13,000 (13 percent) Group 2: 12,000 (12 percent) Group 3: 4,000 (4 percent) Group 4: 70,000 (70 percent) Racial/Ethnic Characteristics in the US: US Census White (69.13 percent) Black (12.06 percent) Hispanic (12.55 percent) Asian (3.6 percent) Some R Code

R
#Creating a population consisting of 4 groups
#(the group sizes sum to 99,000, not 100,000)
set.seed( )
population<- rep(1:4,c(13000, 12000, 4000, 70000))
#Tabulating the population (ctab requires package catspec)
#(percents are not whole numbers because the total is 99,000: e.g. 13000/99000 = 13.13 percent)
ctab(table(population))
[ctab output for population: Count and Total % by group; values not preserved in the transcript]

Sampling What do we expect from random sampling? That each sample reproduces the population proportions. Let’s consider SIMPLE RANDOM SAMPLES. Also, let’s consider small samples (size 100) …which is a 0.1 percent sample (a sampling fraction of about .001).

R: 3 samples of n=100
#Three Simple Random Samples without Replacement; n=100, which is a 0.1 percent sample
#The set.seed command ensures I can exactly replicate the simulations
set.seed(15233)
srs1<-sample(population, size=100, replace=FALSE)
ctab(table(srs1))
set.seed( )
srs2<-sample(population, size=100, replace=FALSE)
ctab(table(srs2))
set.seed(5255)
srs3<-sample(population, size=100, replace=FALSE)
ctab(table(srs3))

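R: Sample Results
> set.seed(15233)
> srs1<-sample(population, size=100, replace=FALSE)
> ctab(table(srs1))
[ctab output for srs1: Count and Total % by group; values not preserved in the transcript]
> set.seed( )
> srs2<-sample(population, size=100, replace=FALSE)
> ctab(table(srs2))
[ctab output for srs2: Count and Total % by group; values not preserved in the transcript]
> set.seed(5255)
> srs3<-sample(population, size=100, replace=FALSE)
> ctab(table(srs3))
[ctab output for srs3: Count and Total % by group; values not preserved in the transcript]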

Implications? Small samples? Variability in proportion of groups. Why does this occur? Let’s understand stratification. What does it do? You’re sampling within strata. Suppose we know the population proportions?

R: Identifying Strata and then Sampling from them.
#Stratified Sampling
#Creating the Groupings
strata1<- rep(1,c(13000))
strata2<- rep(1,c(12000))
strata3<- rep(1,c(4000))
strata4<- rep(1,c(70000))
#Sampling by strata
#Selecting observations proportional to known population values: Proportionate Sampling
set.seed( )
srs4<-sample(strata1, size=13, replace=FALSE)
ctab(table(srs4))
set.seed( )
srs5<-sample(strata2, size=12, replace=FALSE)
ctab(table(srs5))
set.seed(33325)
srs6<-sample(strata3, size=4, replace=FALSE)
ctab(table(srs6))
set.seed( )
srs7<-sample(strata4, size=70, replace=FALSE)
ctab(table(srs7))

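R: Results? Proportional Sampling w/small samples.
> srs4<-sample(strata1, size=13, replace=FALSE)
> ctab(table(srs4))
[ctab output for srs4: Count and Total %; values not preserved in the transcript]
>
> set.seed( )
> srs5<-sample(strata2, size=12, replace=FALSE)
> ctab(table(srs5))
[ctab output for srs5: Count and Total %; values not preserved in the transcript]
>
> set.seed(33325)
> srs6<-sample(strata3, size=4, replace=FALSE)
> ctab(table(srs6))
[ctab output for srs6: Count and Total %; values not preserved in the transcript]
>
> set.seed( )
> srs7<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs7))
[ctab output for srs7: Count and Total %; values not preserved in the transcript]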

Proportionate Sampling What do we see? If we know the proportions of the relevant stratification variable(s)… Then sample from the groups. SMALL SAMPLES can reproduce certain characteristics of the population. But of course, it is probabilistic.
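
The contrast between simple random and proportionate stratified sampling can be made concrete with a short simulation. This is a sketch in base R (using `table` rather than `ctab`), with an arbitrary seed and 500 repetitions as assumptions; the group sizes and the 13/12/4/70 allocation are the ones used on the slides.

```r
# Population from the slides: four groups (sizes sum to 99,000)
population <- rep(1:4, c(13000, 12000, 4000, 70000))

set.seed(101)  # arbitrary seed, not from the slides

# Number of Group 3 members in each of 500 simple random samples of n = 100
srs_counts <- replicate(500, sum(sample(population, 100) == 3))

# Proportionate stratified sample: a fixed number of draws per stratum
alloc <- c(13, 12, 4, 70)  # allocation used on the slides
idx_by_group <- split(seq_along(population), population)
strat_idx <- unlist(Map(function(idx, m) idx[sample.int(length(idx), m)],
                        idx_by_group, alloc))
strat_counts <- table(population[strat_idx])

range(srs_counts)  # varies from sample to sample
strat_counts       # fixed by design: 13, 12, 4, 70
```

The group counts in the stratified sample are fixed by the allocation; what remains random is which members of each group are selected.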

Disproportionate Sampling Why? “Oversampling” may be of interest when research centers on small pockets in the population. Race is often an issue in this context.

R: Disproportionate Sampling
> #Sampling by strata
> #Selecting observations disproportionately to known population values: Disproportionate Sampling
> #"Oversampling by Race"
> set.seed( )
> srs8<-sample(strata1, size=24, replace=FALSE)
> ctab(table(srs8))
[ctab output for srs8: Count and Total %; values not preserved in the transcript]
>
> set.seed( )
> srs9<-sample(strata2, size=22, replace=FALSE)
> ctab(table(srs9))
[ctab output for srs9: Count and Total %; values not preserved in the transcript]
>
> set.seed(103325)
> srs10<-sample(strata3, size=14, replace=FALSE)
> ctab(table(srs10))
[ctab output for srs10: Count and Total %; values not preserved in the transcript]
>
> set.seed(11534)
> srs11<-sample(strata4, size=70, replace=FALSE)
> ctab(table(srs11))
[ctab output for srs11: Count and Total %; values not preserved in the transcript]

Disproportionate Samples What did I ask R to do? I “oversampled” for some groups. Again, understand why we, as researchers, might want to do this.

Side-trip: Sample Sizes Who is happy with a 0.1 percent SRS? On the other hand… What do we get from a stratified sample? Suppose we increase n in a SRS? It’s R time!

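R: SRS with a 1 percent sample
> #Sample Size=1000
>
> set.seed( )
> srs1<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs1))
[ctab output for srs1: Count and Total % by group; values not preserved in the transcript]
>
> set.seed( )
> srs2<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs2))
[ctab output for srs2: Count and Total % by group; values not preserved in the transcript]
>
> set.seed(52909)
> srs3<-sample(population, size=1000, replace=FALSE)
> ctab(table(srs3))
[ctab output for srs3: Count and Total % by group; values not preserved in the transcript]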

Implications? Sample Size MATTERS What do we see? Note, again, what stratification “buys” us. The issues with stratification? Another R example (code posted on website)
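
Why sample size matters can be seen from the usual approximation for the standard error of a sample proportion under simple random sampling, sqrt(p(1 - p)/n). The sketch below applies it to the four group shares from the slides at n = 100 and n = 1000; it ignores the finite population correction, which is an assumption of this illustration and is negligible at these sampling fractions.

```r
# Approximate standard error of a sample proportion under simple random sampling
# (finite population correction ignored; an assumption of this sketch)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

# Group shares as stated on the slides
p_groups <- c(group1 = 0.13, group2 = 0.12, group3 = 0.04, group4 = 0.70)

se_100  <- se_prop(p_groups, 100)
se_1000 <- se_prop(p_groups, 1000)

round(rbind(n100 = se_100, n1000 = se_1000), 4)
```

A tenfold increase in n shrinks each standard error by a factor of sqrt(10), roughly 3.2, so the gain from a larger sample is real but less than proportional to its cost.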

R We again have 4 sample elements
> set.seed(52352)
> urn<-sample(c(1,2,3,4),size=1000, replace=TRUE)
>
> ctab(table(urn))
[ctab output for urn ("My Population"): Count and Total % by group; values not preserved in the transcript]

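R version of a person-on-the-street interview
> #Convenience Sample: What shows up
>
> con<-urn[1:10]; con
[1] [first 10 values of urn; not preserved in the transcript]
>
> ctab(table(con))
[ctab output for con: Count and Total % by group; values not preserved in the transcript]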

R and Samples, redux What do we find? Very unreliable sample: we oversample some groups, undersample others. Useless data more than likely. What do you imagine happens when we increase the sample sizes?

R and SRS with samples of size N
#Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000
set.seed(562)
s1<-sample(urn, 10, replace=FALSE)
ctab(table(s1))
set.seed(58862)
s1a<-sample(urn, 50, replace=FALSE)
ctab(table(s1a))
set.seed(562657)
s1b<-sample(urn, 75, replace=FALSE)
ctab(table(s1b))
set.seed(58862)
s2<-sample(urn, 100, replace=FALSE)
ctab(table(s2))
set.seed(58862)
s3<-sample(urn, 200, replace=FALSE)
ctab(table(s3))
set.seed(10562)
s4<-sample(urn, 250, replace=FALSE)
ctab(table(s4))
set.seed(22562)
s5<-sample(urn, 900, replace=FALSE)
ctab(table(s5))
set.seed(56882)
s6<-sample(urn, 1000, replace=FALSE)
ctab(table(s6))

Sampling and Sample Size
> /*Sample: Sizes 10, 50, 75, 100, 200, 250, 900, 1000*/
Error: unexpected '/' in "/"
>
> set.seed(562)
> s1<-sample(urn, 10, replace=FALSE)
> ctab(table(s1))
[ctab output for s1: Count and Total % by group; values not preserved in the transcript]
>
> set.seed(58862)
> s1a<-sample(urn, 50, replace=FALSE)
> ctab(table(s1a))
[ctab output for s1a: Count and Total % by group; values not preserved in the transcript]

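Sample Sizes
> set.seed(562657)
> s1b<-sample(urn, 75, replace=FALSE)
> ctab(table(s1b))
[ctab output for s1b: Count and Total % by group; values not preserved in the transcript]
>
> set.seed(58862)
> s2<-sample(urn, 100, replace=FALSE)
> ctab(table(s2))
[ctab output for s2: Count and Total % by group; values not preserved in the transcript]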

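Sample Size
> set.seed(58862)
> s3<-sample(urn, 200, replace=FALSE)
> ctab(table(s3))
[ctab output for s3: Count and Total % by group; values not preserved in the transcript]
>
> set.seed(10562)
> s4<-sample(urn, 250, replace=FALSE)
> ctab(table(s4))
[ctab output for s4: Count and Total % by group; values not preserved in the transcript]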

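> set.seed(22562)
> s5<-sample(urn, 900, replace=FALSE)
> ctab(table(s5))
[ctab output for s5: Count and Total % by group; values not preserved in the transcript]
>
> set.seed(56882)
> s6<-sample(urn, 1000, replace=FALSE)
> ctab(table(s6))
[ctab output for s6: Count and Total % by group; values not preserved in the transcript]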

R: What did we learn? Sample size seems to have some impact here. But there are trade-offs.

Important Moving Parts Randomness (covered!) Sampling Frame Random sampling from a bad sampling frame produces bad samples. Sample Size What is your intuition about sample sizes? Must they always be large? Not necessarily so…although…

Bad Sampling Person-on-the-Street Interviews What do these imply? Small samples and inherently nonrandom Likely poor inference. Other examples? Not all non-random samples are necessarily bad Purposive Samples