# Multiple Indicator Cluster Surveys Data Processing Workshop

Sample Weights MICS Data Processing Workshop

What are sample weights?
Sample weight: a statistical correction factor used to correct for imperfections in the sample that might lead to bias:
- Unequal probabilities of selection
- Non-response

Constant sampling weight: self-weighting sample

Self-weighting sample
Constant sampling weight: self-weighting sample
- Stratum level (e.g., urban and rural within region)
- National level: overall self-weighting sample (almost nonexistent in household surveys)

Self-weighting sample
Advantages:
- Equally representative for every unit
- Reduced sampling errors

Disadvantages:
- Difficult for survey management (e.g., distributing the workload) because the sample take varies by PSU
- Difficult to control the expected sample size

Self-weighting sample
Disadvantages:
- Self-weighting is not exact because the sample takes are rounded, and this introduces bias into the survey estimates

In most MICS surveys, if not all, samples are not self-weighting. Therefore, sample weights must be used when reporting national estimates.

Example - Sample Weights
For example, the weights for the North and West regions of Popstan:
- North region: 10,000 / 500 = 20
- West region: 10,000 / 250 = 40

In the North region, each selected household represents 20 households in that region; in the West, the figure is 40. Overall, every household selected in Popstan represents about 26.7 households (20,000 / 750).
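The arithmetic on this slide can be reproduced in a few lines. This is only a sketch: the household counts are the ones given for Popstan, while the variable names are illustrative and not part of the MICS templates.

```python
# Popstan example: households in each region and households selected.
# Figures are taken from the slide; names are illustrative assumptions.
population = {"North": 10_000, "West": 10_000}
sampled = {"North": 500, "West": 250}

# Weight for a region = households in the region / households selected there.
design_weights = {region: population[region] / sampled[region] for region in population}

# Overall figure: total households / total households selected.
overall = sum(population.values()) / sum(sampled.values())

print(design_weights)
print(round(overall, 2))
```

The overall figure works out to roughly 26.67, which lies between the two regional weights because North contributes twice as many sampled households as West.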

Example - Sample Weights
In other words, relative to a proportional selection (375 households from each region), more households have been selected from North and fewer from West. This has to be “compensated” for by using sample weights during analysis to re-calibrate the sample to the national level.

Example - Sample Weights
In our example, let’s assume that:
- 25 percent of households in North use improved water sources
- 75 percent of households in West use improved water sources

If the sample was selected proportionally (375 households from each region), then our survey estimate would be ((375 * 0.25) + (375 * 0.75)) / 750 = 0.50

Example - Sample Weights
If we do not weight, then our national estimate will be ((500 * 0.25) + (250 * 0.75)) / 750 = 0.417, because we have over-sampled a region (North) where use of improved water sources is lower. We need to calculate sample weights to “correct” this situation.
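To see the size of the distortion, the unweighted estimate can be compared with the estimate from a proportional sample. The sketch below uses the slide's figures; the helper `unweighted_estimate` is a hypothetical name introduced for illustration.

```python
# Share of households using improved water sources, per region (from the slide).
coverage = {"North": 0.25, "West": 0.75}

def unweighted_estimate(n_by_region, p_by_region):
    """Pool all sampled households and ignore how they were selected."""
    users = sum(n_by_region[r] * p_by_region[r] for r in n_by_region)
    return users / sum(n_by_region.values())

# Actual allocation: North over-sampled relative to West.
actual = unweighted_estimate({"North": 500, "West": 250}, coverage)

# Proportional allocation: 375 households from each region.
proportional = unweighted_estimate({"North": 375, "West": 375}, coverage)

print(round(actual, 3), proportional)
```

The gap between the two values is the bias that the sample weights are meant to remove.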

Example - Sample Weights
If we assigned a weight of 20 to each household in North, and 40 to each household in West, this would do the trick:

((500 * 20 * 0.25) + (250 * 40 * 0.75)) / ((500 * 20) + (250 * 40)) = 0.50

Example - Sample Weights
This is fine, but SPSS tables would show 20,000 households as the denominator. We do not want this, so we normalize the weights: we calibrate them so that the average of the weights in the data set is equal to 1.

Example - Sample Weights
The normalized weight for the North region is calculated as (10000/500)/(20000/750) = 0.75, and for the West region as (10000/250)/(20000/750) = 1.5. When we calculate the national use of improved water sources by using normalized weights:

((500 * 0.75 * 0.25) + (250 * 1.5 * 0.75)) / ((500 * 0.75) + (250 * 1.5)) = 375 / 750 = 0.50
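The normalization can be checked numerically. The sketch below, with illustrative variable names, divides each regional weight by the overall ratio (20,000 / 750) and confirms three things: the weighted sample size stays at 750, the average weight is 1, and the estimate is the same as with the unnormalized weights.

```python
population = {"North": 10_000, "West": 10_000}
sampled = {"North": 500, "West": 250}
coverage = {"North": 0.25, "West": 0.75}

# Unnormalized weights (20 and 40) and the overall ratio (20,000 / 750).
design = {r: population[r] / sampled[r] for r in sampled}
overall = sum(population.values()) / sum(sampled.values())

# Normalized weight = regional weight / overall ratio.
normalized = {r: design[r] / overall for r in design}

def weighted_estimate(weights):
    """Weighted share of households using improved water sources."""
    numerator = sum(sampled[r] * weights[r] * coverage[r] for r in sampled)
    denominator = sum(sampled[r] * weights[r] for r in sampled)
    return numerator / denominator

# Weighted number of households and average weight per sampled household.
weighted_n = sum(sampled[r] * normalized[r] for r in sampled)
mean_weight = weighted_n / sum(sampled.values())

print({r: round(w, 2) for r, w in normalized.items()})
print(round(weighted_n, 2), round(mean_weight, 2))
print(round(weighted_estimate(design), 2), round(weighted_estimate(normalized), 2))
```

Dividing every weight by the same constant leaves the ratio in the estimate unchanged, which is why normalization affects only the denominators shown in tables and not the estimates themselves.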

Sample weights
Based on the design of the sample, there are two (common) approaches to calculating weights:
- Each cluster has a unique sample weight (weights.xls)
- Each stratum has a unique sample weight (weights_alt.xls)

We have templates for both. You will need to work with your sampling expert to see which one you will use.

Sample Weights Objects
- weights.xls: spreadsheet that calculates weights
- weights_table.sps: SPSS program that provides input data for the spreadsheet
- weights.sps: SPSS program that defines the structure of the spreadsheet’s output
- weights_merge.sps: SPSS program that merges weights onto the MICS data files

Calculating sample weights
The spreadsheet weights.xls is used to calculate the sample weights. It has two worksheets, calculations and output:
- The calculations worksheet performs the calculations
- The output worksheet contains only the sample weights and a list of cluster numbers, a format useful for reading the data into SPSS

Weights calculation template

- weights_table.sps produces the data needed for calculating the sample weights
- weights_merge.sps adds the appropriate sample weights to the analysis files

Steps in calculating sample weights
The process of calculating sample weights and adding them to your analysis files can be broken down into six steps

Steps in calculating sample weights
Adjust the number of rows in the calculations and output worksheets so that there is one row per cluster in your survey. After you have added or deleted rows, be sure to check that doing so did not affect the totals row in the calculations worksheet

Steps in calculating sample weights
Enter required information for columns B to F and for columns H and I

Steps in calculating sample weights
Update the definition of strata (or domains) on lines 3 through 10 of the program weights_table.sps. The standard programs assume that strata are formed by all combinations of area (that is, urban and rural) and region, and that there are four regions. The program should be modified to reflect the strata or domains in use in your sample.

Steps in calculating sample weights
Execute the program weights_table.sps.

Steps in calculating sample weights
Copy the information in the table and paste it into the calculations worksheet of weights.xls. When you complete this step, weights.xls will automatically calculate the sample weights.

Steps in calculating sample weights
Execute the program weights_merge.sps. Once you have completed the sixth step, be sure to check the output list for error messages and to open the analysis files and confirm that the weights have been properly merged.
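The final confirmation amounts to checking that every record received a weight. The sketch below is not what weights_merge.sps does internally. It is a Python illustration of the check, using made-up records and hypothetical field names (`hh_id`, `cluster`, `weight`), and it assumes one weight per cluster.

```python
# Hypothetical cluster-level weights, in the shape of the output worksheet:
# one cluster number paired with one sample weight.
cluster_weights = {1: 0.75, 2: 0.75, 3: 1.5}

# Hypothetical household records; the last one belongs to a cluster
# that has no weight, to show what a failed merge looks like.
households = [
    {"hh_id": 101, "cluster": 1},
    {"hh_id": 102, "cluster": 1},
    {"hh_id": 103, "cluster": 3},
    {"hh_id": 104, "cluster": 4},
]

def merge_weights(records, weights):
    """Attach the cluster weight to each record and list records left without one."""
    merged, unmatched = [], []
    for record in records:
        weight = weights.get(record["cluster"])
        if weight is None:
            unmatched.append(record["hh_id"])
        merged.append({**record, "weight": weight})
    return merged, unmatched

merged, unmatched = merge_weights(households, cluster_weights)
print(unmatched)
```

A non-empty `unmatched` list would point to a cluster number that is missing from the weights, the kind of problem to look for when opening the analysis files after the merge.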