Survey Methodology EPID 626 Sampling, Part II Manya Magnus, Ph.D. Fall 2001.

Lecture overview Comments about Assignment I More sampling techniques Sampling error Sample sizes

Comments about Assignment I Late policy Location of mailbox Randomization vs. random selection Validity, reliability Sampling frames Physician responses = “gold standard”? Research questions vs. survey questions Registering for class

Comments about Assignment I Grading Looked for completeness in answering questions, care in discussion of survey, effort, basically correct information, not just cut-n-paste, synthesis. Questions about grade: email manyadm@tulane.edu

Comments about Assignment I Grading:
– ++ 90-100%
– + 80-89%
– 70-79%
– - 60-69%
– -- <60%
– 0 not turned in

Random digit dialing (1) Delineate the geographic boundaries of the sampling area Identify all of the exchanges used in the geographic area Identify the distribution of prefixes within the sampling area –Example: There may be 8 exchanges, but you may find that 3 of them are used for nearly two-thirds of residential lines.

Random digit dialing (2) You may stratify based on the distribution of prefixes –E.g., take more samples from the 3 exchanges that account for the most residential lines Try to identify vacuous suffixes –These are suffixes not yet assigned, or assigned in large groups to a business –Suffixes are usually considered in blocks of 100, e.g., 0000-0099, 0100-0199

Random digit dialing (3) May randomly select the four-digit suffixes –E.g., use a random-numbers table Alternatively, you may use a plus-one approach –When you reach a residence, use that number as a seed and add a fixed number (e.g., one or two) to it to get the next number in the sample
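A minimal Python sketch of the steps in Random digit dialing (1)-(3), assuming a hypothetical set of eight exchanges in which three carry about two-thirds of residential lines, and a hypothetical list of vacuous 100-blocks. The prefixes, weights, and blocks are invented for illustration, and `rdd_number` and `plus_one` are hypothetical helper names, not part of any calling software.

```python
import random

# Hypothetical exchanges (prefixes) in the sampling area, weighted by the
# assumed share of residential lines each one serves (3 exchanges ~ two-thirds).
PREFIX_WEIGHTS = {"861": 0.25, "862": 0.22, "865": 0.20,
                  "866": 0.07, "891": 0.07, "895": 0.07, "897": 0.06, "899": 0.06}

# Hypothetical 100-blocks (first two digits of the suffix) believed to be vacuous:
# unassigned, or assigned wholesale to a business.
VACUOUS_BLOCKS = {"00", "01", "99"}

def rdd_number(rng: random.Random) -> str:
    """Draw one number: a prefix chosen in proportion to its weight, plus a random suffix."""
    prefixes = list(PREFIX_WEIGHTS)
    prefix = rng.choices(prefixes, weights=[PREFIX_WEIGHTS[p] for p in prefixes])[0]
    while True:
        suffix = f"{rng.randrange(10_000):04d}"   # random four-digit suffix
        if suffix[:2] not in VACUOUS_BLOCKS:      # skip 100-blocks known to be vacuous
            return f"{prefix}-{suffix}"

def plus_one(seed_number: str, step: int = 1) -> str:
    """Plus-one approach: add a fixed step to a number that reached a residence."""
    prefix, suffix = seed_number.split("-")
    return f"{prefix}-{(int(suffix) + step) % 10_000:04d}"

rng = random.Random(626)
sample = [rdd_number(rng) for _ in range(5)]
print(sample)
print(plus_one("861-4327"))   # 861-4328
```

Drawing prefixes with these weights is one simple way to take more numbers from the busiest exchanges; a real design would use actual counts of residential lines.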

Random digit dialing (4) Provides a nonzero chance of reaching any household within a sampling area that has a telephone line regardless of whether the number is listed Is the probability of reaching every household equal? –No. Households with more than one phone line will have a greater probability than households with one phone line. –Adjust for unequal probability by weighting
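The weighting adjustment can be sketched as follows, assuming each respondent reports how many residential lines reach the household. Since a household with k lines has roughly k times the chance of being dialed, each interview gets a weight of 1/k. The records and helper names below are hypothetical.

```python
# Hypothetical completed interviews: number of residential phone lines reaching
# the household, and the answer to a yes/no question (1 = yes, 0 = no).
respondents = [
    {"lines": 1, "answer": 1},
    {"lines": 2, "answer": 0},
    {"lines": 1, "answer": 0},
    {"lines": 3, "answer": 1},
    {"lines": 1, "answer": 1},
]

def design_weight(lines: int) -> float:
    """A household with k lines has roughly k times the selection chance, so weight by 1/k."""
    return 1.0 / lines

def weighted_proportion(records) -> float:
    """Weighted share of 'yes' answers."""
    total = sum(design_weight(r["lines"]) for r in records)
    yes = sum(design_weight(r["lines"]) * r["answer"] for r in records)
    return yes / total

unweighted = sum(r["answer"] for r in respondents) / len(respondents)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted_proportion(respondents):.3f}")
```

Here the weighted estimate (14/23, about 0.609) differs slightly from the unweighted 0.600 because the three-line household is down-weighted.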

Random Digit Dialing (5) Advantages: Inexpensive and easy to do Disadvantages: 1. Large number of unfruitful calls 2. Will exclude individuals without phones 3. May be difficult to ascertain geographic area

Sampling distributions The central limit theorem: If repeated random samples are drawn from a population, the estimates of a particular statistic (say, a mean) will be approximately normally distributed around the true population value As sample size increases, the distribution becomes increasingly close to normal

This variation around the true value is the sampling error—it stems from the fact that, by chance, samples may differ from the population as a whole.

The larger the sample size and the smaller the variance of what is being measured, the more tightly the sample estimates will “bunch” around the true population value, and the more precise the sample-based estimate will be.
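A small simulation, treating the population as very large and assuming a hypothetical true proportion of 0.40, shows the sampling distribution tightening as the sample size grows. The constants are assumptions chosen only for illustration.

```python
import random
import statistics

TRUE_P = 0.40          # hypothetical true population proportion
N_REPLICATES = 2_000   # independent samples drawn for each sample size

def sample_proportion(n: int, rng: random.Random) -> float:
    """Proportion of 'yes' answers in one simple random sample of size n."""
    return sum(rng.random() < TRUE_P for _ in range(n)) / n

rng = random.Random(2001)
for n in (25, 100, 400):
    estimates = [sample_proportion(n, rng) for _ in range(N_REPLICATES)]
    print(f"n={n:4d}  mean of estimates={statistics.mean(estimates):.3f}  "
          f"spread (SD)={statistics.stdev(estimates):.3f}")
```

The spread of the estimates should come out close to $\sqrt{p(1-p)/n}$, roughly halving each time n is quadrupled, while the mean of the estimates stays near 0.40.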

Example (1) (adapted from Babbie) Survey at TUSPHTM Approval of new Lundi Gras holiday Dichotomous outcome: approve/disapprove Survey population—aggregation of students Sampling frame—student list Random sample of students; representative sample of student body

Example (2) (adapted from Babbie) Extremes and all combinations in between are possible: from 100% approve to 100% disapprove, including 1% approve/99% disapprove, etc. First random sample: 48% approve, 52% disapprove Second random sample: 20% approve, 80% disapprove And so forth

Example (3) (adapted from Babbie) What results from this exercise is a distribution of samples, or a sampling distribution. As more independent random samples are selected, the sample statistics obtained will be distributed around the true population value in a known way.

Example (4) (adapted from Babbie) They will be clustered about the true value within a certain range. The range is given by the standard error. We do not know if the value in our sample is within the range, just that if many similar samples were taken in the same fashion, X% would fall within the specified range; this one may or may not.

Example (5) (adapted from Babbie) Probability theory says that about 68% of sample estimates will fall within one standard error (the standard deviation of the sampling distribution) of the parameter, and about 95% will fall within two standard errors of the parameter The wider the range, the greater the confidence that it contains the parameter
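To see the 68%/95% rule empirically, the sketch below draws many samples for an approve/disapprove question like the Lundi Gras example, assuming a large hypothetical population in which 30% truly approve and samples of 1,000 respondents, and counts how often the sample estimate lands within one and two standard errors of the true value.

```python
import math
import random

TRUE_P = 0.30       # hypothetical true share approving
N = 1_000           # hypothetical sample size
REPLICATES = 2_000  # number of independent samples

se = math.sqrt(TRUE_P * (1 - TRUE_P) / N)   # standard error of the sample proportion
rng = random.Random(626)

within_1se = within_2se = 0
for _ in range(REPLICATES):
    p_hat = sum(rng.random() < TRUE_P for _ in range(N)) / N
    distance = abs(p_hat - TRUE_P)
    within_1se += distance <= se        # True counts as 1
    within_2se += distance <= 2 * se

print(f"within 1 SE: {within_1se / REPLICATES:.1%}")
print(f"within 2 SE: {within_2se / REPLICATES:.1%}")
```

The two shares should come out near 68% and 95%; any single sample, like the one actually collected, may or may not be among those that fall inside the range.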

Note difference between standard errors & standard deviations

Standard error of a mean

The standard deviation of the distribution of sample estimates of the mean that would be formed if an infinite number of samples of a given size were drawn.
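For a simple random sample from a large population, the standard error of a mean is usually written as below, where $\sigma$ is the population standard deviation (estimated in practice by the sample standard deviation $s$) and $n$ is the sample size; the finite population correction is ignored here.

$$
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}
$$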

Proportions A proportion is the mean of a two-value (binomial) variable. The variance of that variable is $p(1-p)$, so the standard error of a sample proportion is $\sqrt{p(1-p)/n}$.

Table 2.1 Confidence Ranges for Variability Attributable to Sampling Trends If sample size=75 and p=0.20,

Confidence intervals In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level? In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes?

In a survey of 100 respondents, 20% say yes. What is the confidence interval for a 95% confidence level? SE = $\sqrt{0.20 \times 0.80/100} = 0.04$, so the interval is ±2 SE = ±8 percentage points. 95% CI = (12%, 28%)

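In a survey of 250 respondents, 10% say yes. What is the confidence interval for a 95% confidence level? What if 50% said yes? SE = $\sqrt{0.10 \times 0.90/250} \approx 0.019$, so the interval is about ±3.8 percentage points. 95% CI is about (6.2%, 13.8%). If 50% said yes, SE = $\sqrt{0.50 \times 0.50/250} \approx 0.032$, the interval is about ±6.3 percentage points, and the 95% CI is about (43.7%, 56.3%).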
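The arithmetic above can be checked with a short Python sketch. It assumes the normal approximation and uses z = 2 to match the "two standard errors" rule used here (1.96 would give slightly narrower intervals); `proportion_ci` is a hypothetical helper name.

```python
import math

def proportion_ci(p: float, n: int, z: float = 2.0) -> tuple[float, float, float]:
    """Return (half_width, lower, upper) for a sample proportion p from n respondents.

    Normal approximation: p +/- z * sqrt(p(1-p)/n).
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return half_width, p - half_width, p + half_width

for p, n in [(0.20, 100), (0.10, 250), (0.50, 250)]:
    hw, lo, hi = proportion_ci(p, n)
    print(f"n={n}, p={p:.0%}: ±{hw:.1%}  95% CI = ({lo:.1%}, {hi:.1%})")
```

With z = 2 this reproduces the half-widths of 8.0, 3.8 and 6.3 percentage points worked out above.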

Sampling error and sampling strategy For SRS, sampling error is estimated by the standard error Systematic sampling –If not stratified, sampling error is essentially the same as in SRS. –If stratified, sampling errors are lower than those of an SRS of the same size for variables that differ (on average) by stratum, provided rates of selection are constant across strata.

Sampling error and sampling strategy (2) Unequal rates of selection decrease sampling error for the oversampled groups. However, they will generally produce sampling errors for the whole sample that are higher than those of an SRS of the same size for variables that differ by stratum.
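One rough way to put a number on the whole-sample cost of oversampling is Kish's approximate design effect due to unequal weighting, $n\sum w^2 / (\sum w)^2$, which corresponds to an effective sample size of $(\sum w)^2 / \sum w^2$. It ignores any gain from the stratification itself, so treat it as a sketch. The design below (a group making up 10% of the population, oversampled to half of 1,000 interviews) is hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size under unequal weighting: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Hypothetical design: group A is 10% of the population, group B 90%,
# but each group contributes 500 interviews (A is oversampled).
pop_share = {"A": 0.10, "B": 0.90}
n_per_group = {"A": 500, "B": 500}
n_total = sum(n_per_group.values())

weights = []
for group, n_g in n_per_group.items():
    w = pop_share[group] / (n_g / n_total)   # relative design weight for this group
    weights.extend([w] * n_g)

n_eff = kish_effective_n(weights)
print(f"nominal n = {n_total}, effective n ≈ {n_eff:.0f}, design effect ≈ {n_total / n_eff:.2f}")
```

Here the 1,000 interviews behave roughly like 610 SRS interviews for whole-sample estimates, while group A gets 500 interviews instead of the roughly 100 a proportionate design would give it.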

Sampling error and sampling strategy (3) Cluster samples produce sampling errors that are higher than those of an SRS of the same size for variables that are more homogeneous within clusters than in the population as a whole. You must look at the nature of the clusters to evaluate the effect on sampling error.
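The effect of clustering is often summarized with Kish's approximation $\mathrm{deff} = 1 + (m-1)\rho$, where $m$ is the average number of interviews per cluster and $\rho$ is the intraclass correlation (how much more alike people within a cluster are than people in the population as a whole). The cluster sizes and $\rho$ values below are hypothetical.

```python
def cluster_design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for a cluster sample: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc

n = 1_200   # hypothetical total number of interviews
for m, rho in [(10, 0.01), (10, 0.10), (40, 0.10)]:
    deff = cluster_design_effect(m, rho)
    print(f"m={m:2d}, rho={rho:.2f}: deff={deff:.2f}, SRS-equivalent n ≈ {n / deff:.0f}")
```

Even a modest $\rho$ matters when clusters are large: with 40 interviews per cluster and $\rho$ = 0.10, 1,200 interviews carry about as much information as 245 from an SRS.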

Caveats Sampling error is by no means the only source of error. Non-sampling error, bias, errors resulting from incorrect specification of the sampling frame, etc., are also sources of error. The latter are often more insidious because they are seldom quantifiable. A total survey approach is useful in this regard.

Sample size (1) Very important to consider before undertaking a study Consult a biostatistician Many resources are available: textbooks, spreadsheets, statistical programs, EpiInfo, etc. Never feel bad asking for assistance

Sample size (2) What not to do 1. Sample size does not depend on the fraction of the population that is sampled, nor on the size of the population you want to describe. 2. Sample size should not be decided solely on the basis of what others have done previously. 3. Sample size should not be based on the desired level of precision for just one estimate.
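Point 1 can be illustrated with the familiar formula for estimating a proportion within a margin of error, $n_0 = z^2 p(1-p)/e^2$, plus the usual finite population correction $n = n_0 / (1 + (n_0 - 1)/N)$. The choice of p = 0.5 and a ±5-point margin is an assumption for illustration, and `n_for_margin` is a hypothetical helper; this is not a recipe for choosing a study's sample size.

```python
import math
from typing import Optional

def n_for_margin(p: float, margin: float, z: float = 1.96,
                 population: Optional[int] = None) -> int:
    """SRS size needed to estimate a proportion near p within +/- margin.

    n0 = z^2 p(1-p) / margin^2; if a population size N is given, apply the
    finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

for N in (1_000, 10_000, 1_000_000, 100_000_000):
    print(f"population {N:>11,}: n = {n_for_margin(0.5, 0.05, population=N)}")
print(f"ignoring population size: n = {n_for_margin(0.5, 0.05)}")
```

The required n is 278 for a population of 1,000, 370 for 10,000, and 385 for a population of a million or a hundred million: once the population is large, its size (and the sampling fraction) barely matters.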

Sample size (3) What to do –Develop an analysis plan –Consider the desired precision of estimates for subgroups –Consider the research questions –Consider affordability –Consider feasibility –And, to some extent, consider previous studies

Sample size (5) Parameters required to calculate sample size: –Null hypothesis: what precisely are you asking/testing? –$\alpha$ [Pr(type I error)] –$\beta$ [Pr(type II error)], usually expressed as $1-\beta$ = power –What difference between groups do you want to detect? (e.g., $\mu_1 - \mu_2$) –What is a good estimate of the variance in the population?
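With these parameters in hand, a common normal-approximation formula for comparing two means is $n = 2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2 / (\mu_1 - \mu_2)^2$ per group. The sketch below uses Python's `statistics.NormalDist` for the normal quantiles; the difference of 5 and standard deviation of 10 are hypothetical planning values, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided, two-sample comparison of means (normal approximation).

    delta = mu1 - mu2, sigma = assumed common standard deviation, power = 1 - beta.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # z_{1 - alpha/2}
    z_beta = z.inv_cdf(power)             # z_{1 - beta}
    return math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

# Hypothetical planning values: detect a difference of 5 when the SD is 10.
print(n_per_group(delta=5, sigma=10))                # alpha 0.05, power 0.80
print(n_per_group(delta=5, sigma=10, power=0.90))    # alpha 0.05, power 0.90
```

With $\alpha$ = 0.05 this gives 63 per group for 80% power and 85 per group for 90% power; halving the variance or doubling the difference to be detected shrinks the requirement sharply.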

Sample size (6) How sample size works—some examples

Sample size (7)  sample size,  power Group A  Group B 

Sample size (8)  sample size,  power A:      B:     

Sample size (9)  variability,  power A:      B:     

Sample size (10)  variability,  power A:      B:     

Non-response (1) Very big issue Source of non-sampling error Can lead to bias and uninterpretable results Undermines the whole point of a probability sample, yet is unavoidable

Non-response (2) An issue in probability as well as nonprobability samples Exists on many levels

Non-response (3) Whole sample: reached / not reached

Non-response (4) Reached: can participate / cannot participate

Non-response (5) Reached: enrolled / refused

Non-response (6) Participated: answered individual question / did not answer individual question
