Sampling MICS3 Regional Workshop Survey Design
MICS Sample Design MICS is a complex survey (multi-stage, stratified). MICS is a worldwide program, so consistency and comparability are important issues. We will discuss only a few of the highlights, including: Sample size determination; Stratification and sample allocation; Number of Primary Sampling Units and cluster sizes; Use of an existing sample or a new sample; A few special topics
Sample Size for MICS Most important feature of MICS with respect to survey costs. We will discuss: DETERMINANTS – factors, constraints INDICATORS to use FORMULA to calculate sample size
Determinants of Sample Size (Factors and Constraints) Sample size (households) depends on many factors: Expected size estimate of indicators Expected size estimate of target population(s) Average household size Margin of error wanted Level of confidence wanted Design effect (increase in sample error due to use of cluster survey instead of simple random sample) Expected non-response rate Number of clusters or PSUs Cluster size (number of households per sample cluster) Number of sub-national areas for separate estimates (domains) Survey budget and implementing capability
MICS Recommendations on Sample Size Determinants
FACTOR: RECOMMENDATION
1. Expected size estimate of indicators: (next slide)
2. Expected size estimate of target population: 12-23 mos [3%]
3. Average household size: 6 persons
4. Relative margin of error wanted: 12% of coverage rate
5. Level of confidence wanted: 95 percent
6. Design effect in cluster surveys: 1.5
7. Expected non-response rate: 10 percent
8. Number of clusters or PSUs - minimum: [ ]
9. Cluster size: [15-35]
10. Number of estimation domains wanted: [5 or fewer]
11. Survey budget: (country specific)
For items 2, 3, 6, 7 use available country data (recent survey or census); if not available, use value above.
Indicators for Sample Size Determination Sample size is different for each MICS indicator. Must choose a key indicator, since only one sample size can be used in MICS. Recommendations for choosing key indicator: Choose from among main indicators of interest in your country. Choose the one which will yield largest sample size. Usually for a single-year age group, and Usually DPT, measles, polio or tuberculosis immunization - or birth weight below 2.5 kg Exceptions: Do not choose infant or maternal mortality rates as the key indicators. Do not choose a low coverage indicator that is desirably low (such as malnutrition prevalence). Do not choose breast-feeding indicators for 4-month age groups.
Checklist for Target Group and Indicator To decide on the appropriate target group and indicator for determining your sample size: 1. Pick children 12-23 months old, the target population that comprises the smallest percentage of the total population (probably about 3 percent). 2. For that target group, pick the lowest from among the following coverage rates: - DPT immunization level - Measles immunization level - Polio immunization level - Tuberculosis immunization level 3. Do not pick a desirably low coverage indicator that is already acceptably low.
Formula for Sample Size Different formula than MICS2000 MICS2005 formula emphasizes relative margin of error* instead of 5% absolute error (high coverage indicator) or 3% for low coverage indicator. Less confusing Does not depend on high or low coverage * The Relative Margin of Error is the percentage of tolerable difference that the estimated proportion can differ from its true value with a given confidence level. It determines the relative length of the confidence interval.
Formula
$$n = \frac{4\,(r)\,(1 - r)\,(\mathit{deff})\,(1.1)}{(0.12\,r)^{2}\,(p)\,(\text{ave-size})}$$
where n is the required sample size, expressed as number of households, for the KEY indicator, 4 is factor to achieve 95 percent level of confidence, r is anticipated prevalence (coverage) rate for key indicator, 1.1 is factor to raise sample size by 10 percent for potential nonresponse, deff is shortened symbol for design effect, 0.12r is margin of error to be tolerated, defined as 12 percent of r (12 percent thus represents the relative sampling error of r), p is proportion of total population that smallest group comprises, and ave-size is average household size. You may use the table on the next page instead of formula if all conditions are satisfied for that table in your country.
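The formula can be written as a small function. This is a minimal sketch: the function and parameter names are illustrative and do not come from the MICS manual, and the defaults are the recommended values from the determinants slide (deff 1.5, 10 percent non-response, 12 percent relative margin of error).

```python
def mics_sample_size(r, p, avg_hh_size, deff=1.5, nonresponse_factor=1.1,
                     rel_margin=0.12, confidence_factor=4.0):
    """Required number of households for the key indicator.

    r                  anticipated coverage rate of the key indicator
    p                  proportion of the population in the smallest target group
    avg_hh_size        average household size
    deff               design effect
    nonresponse_factor inflation for expected non-response (1.1 = 10 percent)
    rel_margin         relative margin of error (0.12 = 12 percent of r)
    confidence_factor  4 corresponds to the 95 percent confidence level
    """
    numerator = confidence_factor * r * (1 - r) * deff * nonresponse_factor
    denominator = (rel_margin * r) ** 2 * p * avg_hh_size
    return numerator / denominator


# 30 percent coverage, 3 percent target population, 6 persons per household
n = mics_sample_size(r=0.30, p=0.03, avg_hh_size=6)
print(round(n))
```

With these inputs the function returns about 5,941 households, the same figure as the table lookup in Example 1.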
Sample Size (Households) Calculation for Proportion Estimation Using Smallest Target Population
Example 1 Target group: Children 12 to 23 months old Percent of population:3 percent Key indicator: DPT immunization coverage Prevalence (Coverage): 30 percent Deff: No information Non-response: No information Average household size: 6 Checking table => n = 5941
Checklist for Use of Sample Size formula The formula to determine your sample size:
$$n = \frac{4\,(r)\,(1 - r)\,(f)\,(1.1)}{(0.12\,r)^{2}\,(p)\,(n_h)}$$
Use it if any (one or more) of the following applies in your country:
1) p - the proportion of one-year-old children is other than 3%
2) n_h - the average household size is less than 4.5 persons or greater than 6.5
3) r - the coverage rate of your key indicator is under 20 or over 40 percent
4) f - the sample design effect for your key indicator is different from 1.5, according to accepted estimates from other surveys in your country
5) your anticipated non-response rate is more or less than 10 percent.
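The checklist can be expressed as a simple check that returns the reasons, if any, for using the formula rather than the table. This is a sketch only; the function name, argument names and messages are made up for illustration.

```python
import math


def reasons_to_use_formula(p, avg_hh_size, r, deff, nonresponse_rate):
    """Return the checklist conditions that apply; an empty list means
    the lookup table can be used instead of the formula."""
    reasons = []
    if not math.isclose(p, 0.03):
        reasons.append("proportion of one-year-old children is other than 3%")
    if avg_hh_size < 4.5 or avg_hh_size > 6.5:
        reasons.append("average household size is outside 4.5 to 6.5")
    if r < 0.20 or r > 0.40:
        reasons.append("coverage rate is under 20 or over 40 percent")
    if not math.isclose(deff, 1.5):
        reasons.append("design effect differs from 1.5")
    if not math.isclose(nonresponse_rate, 0.10):
        reasons.append("non-response rate differs from 10 percent")
    return reasons


# A country matching all table conditions: no reason to use the formula
print(reasons_to_use_formula(p=0.03, avg_hh_size=6, r=0.30,
                             deff=1.5, nonresponse_rate=0.10))
```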
Example 2 Target group: Children 12 to 23 months old Percent of population: 3.5 percent Key indicator: DPT immunization coverage Prevalence (Coverage): 25 percent Deff: 1.6 Non-response adjustment = 1.05 (response rate 95%) Average household size: 6
$$n = \frac{4\,(0.25)\,(0.75)\,(1.6)\,(1.05)}{(0.12 \times 0.25)^{2}\,(0.035)\,(6)} = \frac{1.26}{0.000189} \approx 6667$$
Stratification & Sample Allocation Stratification is the process of grouping similar PSUs into sub-groups (strata). Effects: better precision, flexible design, coverage of small sub-populations (or oversampling). How to do stratification? (region) X (residence type). Sample allocation: proportional, power allocation, or equal size allocation (if budget is too tight). Implicit stratification: sort the sampling frame according to certain characteristics such as region, urban-rural residence, sub-region, district, etc., then select a PPS sample. There is no single rule for stratification; it depends on the country situation.
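Implicit stratification followed by systematic PPS selection can be sketched as follows. The frame layout (a list of dictionaries with "region", "residence" and "households" fields) and the function name are assumptions made for this illustration, not a prescribed MICS data format.

```python
import random


def systematic_pps(frame, n_select, sort_keys, size_key, seed=None):
    """Sort the frame (implicit stratification), then select n_select units
    systematically with probability proportional to size."""
    rng = random.Random(seed)
    ordered = sorted(frame, key=lambda u: tuple(u[k] for k in sort_keys))
    total = sum(u[size_key] for u in ordered)
    interval = total / n_select
    start = rng.random() * interval          # random start in [0, interval)
    targets = [start + i * interval for i in range(n_select)]

    selected, cumulative, t = [], 0.0, 0
    for unit in ordered:
        cumulative += unit[size_key]
        # A unit is hit once for every target falling inside its size range,
        # so a unit larger than the interval can be selected more than once.
        while t < n_select and targets[t] < cumulative:
            selected.append(unit)
            t += 1
    return selected


# Hypothetical frame of census enumeration areas (EAs)
frame = [
    {"ea": 1, "region": "North", "residence": "urban", "households": 120},
    {"ea": 2, "region": "North", "residence": "rural", "households": 80},
    {"ea": 3, "region": "South", "residence": "urban", "households": 150},
    {"ea": 4, "region": "South", "residence": "rural", "households": 90},
    {"ea": 5, "region": "North", "residence": "urban", "households": 200},
    {"ea": 6, "region": "South", "residence": "rural", "households": 60},
    {"ea": 7, "region": "North", "residence": "rural", "households": 110},
    {"ea": 8, "region": "South", "residence": "urban", "households": 140},
]
sample = systematic_pps(frame, n_select=3,
                        sort_keys=("region", "residence"),
                        size_key="households", seed=1)
print([u["ea"] for u in sample])
```

Because the frame is sorted before selection, the systematic pass spreads the sample across regions and residence types without forming explicit strata.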
Number of PSUs and Cluster Size Survey costs depend not only on the number of households but also on their distribution among Primary Sampling Units (PSUs). In general, the more PSUs the better for reliability, but the greater the cost (usually travel costs). We recommend 300 to 400 PSUs or more. The number of PSUs also depends on cluster size. Cluster size should be as small as practical for reliability. Example: 8000 households selected in 400 PSUs of 20 households each is a much more reliable sample than 200 PSUs of 40 each, but more expensive.
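The reliability difference in the example can be illustrated with the commonly used approximation deff = 1 + (m - 1) * rho, where m is the cluster size and rho is the intra-cluster correlation. Neither this approximation nor the value of rho appears in the slides; rho = 0.05 below is a hypothetical value chosen only to show the direction of the effect.

```python
def design_effect(cluster_size, rho):
    """Approximate design effect: deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * rho


def effective_sample_size(n_psus, cluster_size, rho):
    """Number of households divided by the design effect."""
    n_households = n_psus * cluster_size
    return n_households / design_effect(cluster_size, rho)


RHO = 0.05  # hypothetical intra-cluster correlation, for illustration only

for n_psus, cluster_size in [(400, 20), (200, 40)]:
    deff = design_effect(cluster_size, RHO)
    n_eff = effective_sample_size(n_psus, cluster_size, RHO)
    print(f"{n_psus} PSUs x {cluster_size} households: "
          f"deff = {deff:.2f}, effective n = {n_eff:.0f}")
```

Both designs contain 8000 households, but under this assumption the design with smaller clusters has the lower design effect and therefore the larger effective sample size.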
MICS Sampling Option 1 USE AN EXISTING SAMPLE Piggy-back MICS onto DHS or other survey if timely and feasible. Or, use sample from a previous survey and re-interview households for MICS. Or, use old survey sample EAs and construct new listing of households to select for MICS. Old sample must be probability-based, national in scope. Possibilities: DHS, other national health survey, recent labour force or household expenditure surveys. Important: design parameters must be known (such as selection probability, stratification, etc.)
OPTION 1 - USE OF AN EXISTING SAMPLE, continued Advantages of old sample - cost savings - maps available for interviewers - design rigor - simplicity Limitations of old sample - burden on respondents - sample design may need modification * sample size * sub-national coverage * number of PSUs or clusters => Balance between loss and gain
MICS Sampling Option 2 USE NEW SAMPLE WITH HOUSEHOLD LISTING OPERATION Design new MICS sample based on prototype Two stages with census as frame (see comprehensive discussion in Chapter 4 on frame construction and up-dating old frames) Use of implicit stratification, systematic selection of census EAs at first stage with pps Create standard segments (DHS approach) List households in selected segments Select households systematically from list Interview only the selected households, no replacement will be allowed
OPTION 2 - NEW SAMPLE WITH HOUSEHOLD LISTING, continued Advantages of option 2 - simple design - probability-based - if possible self-weighting (national level) Limitations of option 2 - expense of listing households - time necessary to list households [Example, sample size of 5000 households may need to households to be listed.]
DHS Method - Option 2 Create standard segments. Divide census population in each EA by 500 to determine number of standard segments. Map sketch segments in each EA. Choose 1 segment at random. List households in selected segment only (instead of entire EA). Purpose is to reduce listing workload to a manageable size.
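The segment arithmetic can be sketched as below. The slide only says to divide the census population by 500; rounding to the nearest whole number and using at least one segment are assumptions of this sketch, as are the function names.

```python
import math
import random


def standard_segments(census_population, segment_size=500):
    """Number of standard segments in an EA (rounded half up, at least 1)."""
    return max(1, math.floor(census_population / segment_size + 0.5))


def choose_segment(n_segments, rng=random):
    """Pick one segment at random; segments are numbered from 1."""
    return rng.randint(1, n_segments)


# Hypothetical EA with a census population of 1,600 persons
n_seg = standard_segments(1600)      # 1600 / 500 = 3.2, so 3 segments
chosen = choose_segment(n_seg)
print(n_seg, chosen)
```

Only the households in the chosen segment are then listed, which cuts the listing workload to roughly one segment per sampled EA.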
MICS Sampling Option 3 USE NEW SAMPLE WITHOUT HOUSEHOLD LISTING OPERATION (Modified Segment, or Cluster, Design) Design new MICS sample based on prototype. Two stages with census as frame Use of implicit stratification, systematic selection of census EAs at first stage with pps Pre-determine number of segments based on desired cluster size. Map sketch segments in each EA. Choose 1 segment at random. Interview all households in selected segment
OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued Illustration: Suppose desired cluster size is 20 households. Suppose first sample EA contains 112 census households (according to frame). Divide 112 by 20 = 5.6 (round to 6). Map sketch exactly 6 segments based on canvass of EA. Select one segment at random. Interview all households (no matter how many are currently in the selected segment).
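The illustration works out as follows in code. The function name is illustrative, and the rule of rounding to the nearest whole segment follows the slide's "round to 6".

```python
import math
import random


def number_of_segments(census_households, desired_cluster_size):
    """Segments to sketch in an EA so each holds about the desired cluster size."""
    return max(1, math.floor(census_households / desired_cluster_size + 0.5))


census_households = 112
desired_cluster_size = 20

segments = number_of_segments(census_households, desired_cluster_size)
chosen = random.randint(1, segments)              # one segment at random
expected_households = census_households / segments

print(f"{segments} segments, segment {chosen} chosen, "
      f"about {expected_households:.1f} households expected per segment")
```

The actual number of interviews depends on how many households are found in the chosen segment at the time of the survey, which is why sample size is harder to control under this option.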
OPTION 3 - NEW SAMPLE WITHOUT HOUSEHOLD LISTING, continued Advantages of option 3 avoids listing completely probability-based self-weighting (national level) Limitations of option 3 less reliable than option 2 (households are clustered together in compact segments) segmentation itself can be time-consuming and complicated difficult to control sample size
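One way to see why this design is approximately self-weighting, as a sketch under assumptions not stated on the slide: suppose $a$ EAs are selected with probability proportional to their census household count $M_i$, the frame total is $M$, the desired cluster size is $c$, and EA $i$ is divided into $s_i \approx M_i / c$ segments (ignoring rounding). Then the selection probability of any household in EA $i$ is

$$P(\text{household}) = \frac{a\,M_i}{M} \cdot \frac{1}{s_i} \approx \frac{a\,M_i}{M} \cdot \frac{c}{M_i} = \frac{a\,c}{M}$$

The EA size cancels, so every household has about the same chance of selection. Rounding the number of segments makes this only approximate.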
Special Topics Sub-national estimates, domains Water and sanitation estimates Survey weighting, sampling errors Other – sample frame construction, selection techniques Country examples
Sub-national Estimates, Domains Number of separate areas (domains) for which separate, equally reliable estimates are wanted affects sample size. If, say, 5 regional estimates are wanted, then, theoretically, sample should be increased by factor of 5. Must be careful therefore in producing separate estimates for domains. Either limit number of domains to avoid large increase in sample size, Or be prepared to accept domain estimates with much higher sampling errors than national.
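The trade-off can be quantified with a short sketch. It assumes equal-sized domains with the same coverage rate and design effect as the national sample, so the margin of error scales with one over the square root of the sample size; the function names are illustrative.

```python
import math


def total_sample_for_equal_precision(national_n, n_domains):
    """Households needed if every domain must match national precision."""
    return national_n * n_domains


def domain_relative_margin(national_margin, n_domains):
    """Margin of error per domain if the national sample is simply split
    equally among the domains."""
    return national_margin * math.sqrt(n_domains)


national_n = 5941        # households for national estimates
n_domains = 5

print(total_sample_for_equal_precision(national_n, n_domains))
print(round(domain_relative_margin(0.12, n_domains), 3))
```

Under these assumptions, five equally reliable regional estimates need about five times the national sample, while keeping the national sample size raises each region's relative margin of error from 12 percent to roughly 27 percent.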
Water and Sanitation Estimates These are an important component of MICS. Sampling errors will be high, however (extremely high in some cases). MICS sample is designed primarily for person variables rather than household variables such as water/sanitation. Sample design effects for water and sanitation indicators will be much higher than for other indicators. Consequently, sampling reliability is very low. Estimates can nevertheless be useful to estimate trends in water/sanitation if previous surveys exist upon which to make comparison.
Survey Weighting and Sampling Errors All analysis based on survey data must apply survey weights in order to prevent biased results. Survey weighting is design-specific. Non-response must be taken into account. Formulas for calculating weights depend on the exact sample design used in each country.
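As a rough illustration only, a household weight in a simple two-stage design can be computed as the inverse of the overall selection probability, adjusted for non-response within the cluster. The actual formulas depend on the country's design, and the function name and inputs below are hypothetical.

```python
def household_weight(p_ea, p_household_within_ea, n_selected, n_responded):
    """Weight for responding households in one sample cluster.

    p_ea                   first-stage selection probability of the EA
    p_household_within_ea  second-stage selection probability within the EA
    n_selected             households selected in the cluster
    n_responded            households that completed the interview
    """
    design_weight = 1.0 / (p_ea * p_household_within_ea)
    nonresponse_adjustment = n_selected / n_responded
    return design_weight * nonresponse_adjustment


# Hypothetical cluster: EA chosen with probability 0.02, 1 in 10 households
# selected within it, 18 of 20 selected households interviewed
w = household_weight(0.02, 0.10, n_selected=20, n_responded=18)
print(round(w, 1))
```

Each responding household in this hypothetical cluster stands for the households that were not selected and, through the adjustment, for the selected households that did not respond.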
Sampling Error Estimation Calculation of sampling errors is necessary to evaluate the reliability of survey estimates. Should be done for important indicators. Methodology is complex and design-specific. There are several options for sampling error calculations: May use existing software (Clusters, WesVar, CenVar, PCCarp, etc.). The latest version of SPSS is currently being evaluated to see whether its new sampling error routines are appropriate for MICS3 surveys. Routines in CSPro can be used. Or use a simple variance spreadsheet that will be available on the MICS website.
Sampling Error Estimation, continued With spreadsheet, only necessary to enter: Survey weights for each cluster Unweighted indicator estimate for each cluster Sampling error automatically calculated Confidence limits, design effect automatically calculated
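A calculation of this kind can be sketched from the same two inputs, one weight and one indicator estimate per cluster. This is not the MICS spreadsheet itself: it is a generic between-cluster (ultimate cluster) variance estimate that assumes a single stratum, clusters treated as if sampled with replacement, and roughly equal numbers of respondents per cluster.

```python
import math


def cluster_sampling_error(weights, cluster_estimates, z=2.0):
    """Weighted estimate, standard error and confidence limits computed
    from cluster-level weights and cluster-level indicator estimates."""
    k = len(weights)
    total_weight = sum(weights)
    estimate = sum(w * p for w, p in zip(weights, cluster_estimates)) / total_weight

    # Linearized residual of each cluster around the overall estimate
    residuals = [w * (p - estimate) / total_weight
                 for w, p in zip(weights, cluster_estimates)]
    variance = k / (k - 1) * sum(res ** 2 for res in residuals)
    se = math.sqrt(variance)

    # z = 2 matches the factor 4 (= 2 squared) used for 95 percent confidence
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical data for four clusters
weights = [1.0, 1.0, 1.0, 1.0]
cluster_estimates = [0.2, 0.4, 0.6, 0.8]
estimate, se, limits = cluster_sampling_error(weights, cluster_estimates)
print(round(estimate, 3), round(se, 3), [round(x, 3) for x in limits])
```

With equal weights this reduces to the standard error of the mean of the cluster estimates. A design effect could be added by comparing this variance with the simple random sampling variance for the same number of respondents, which requires the total sample size as an extra input.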
Other Topics Other key information to be included in the MICS3 manual for the sampling statistician to review: Sample frame construction When new sample is used for MICS Especially important if frame is old Selection techniques Details of systematic sampling PPS sampling (probability proportionate to size) Country examples from MICS2000 Papua New Guinea, Lebanon, Angola