SADC Course in Statistics
Sampling weights: an appreciation (Session 19)

Learning Objectives
By the end of this session, you will be able to
– explain the role of sampling weights in estimating population parameters
– calculate sampling weights for very simple sampling designs
– appreciate that calculating sampling weights for complex survey designs is non-trivial and requires professional expertise

What is meant by sampling weights?
– Real surveys are generally multi-stage
– At each stage, the probabilities of selecting units at that stage are generally not equal
– When population parameters such as a mean or a proportion are to be estimated, results from lower levels need to be scaled up from the sample to the population
– This scaling-up factor, applied to each unit in the sample, is called its sampling weight

A simple example
– Suppose, for example, that a simple random sample of 500 households (HHs) in a rural district (with 7349 HHs in total) showed that 140 were living below the poverty line
– Hence the estimated total in the population living below the poverty line = (140/500)*7349 = 2058
– The data for each HH were a 0/1 variable, with 1 allocated if the HH was below the poverty line
– Multiplying this variable by 7349/500 = 14.7 and summing over the sample leads to the same answer
– i.e. the sampling weight for each HH = 14.7

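The arithmetic on this slide can be checked with a short Python sketch. The variable names are illustrative and are not part of the course material; only the three figures (7349, 500 and 140) come from the slide.

```python
# Simple random sample: every sampled household (HH) carries the same weight.
N_HOUSEHOLDS = 7349   # HHs in the district (from the slide)
SAMPLE_SIZE = 500     # HHs in the sample (from the slide)
BELOW_LINE = 140      # sampled HHs below the poverty line (from the slide)

# One 0/1 indicator per sampled HH: 1 = below the poverty line.
indicators = [1] * BELOW_LINE + [0] * (SAMPLE_SIZE - BELOW_LINE)

# Sampling weight = population size / sample size (about 14.7).
weight = N_HOUSEHOLDS / SAMPLE_SIZE

# Route 1: weight every indicator and sum over the sample.
total_from_weights = sum(weight * y for y in indicators)
# Route 2: scale the sample proportion up to the population.
total_from_proportion = (BELOW_LINE / SAMPLE_SIZE) * N_HOUSEHOLDS

print(round(weight, 1))
print(round(total_from_weights), round(total_from_proportion))
```

Both routes give the same estimate of about 2058 households, which is the point of the slide: in a simple random sample the weight is just a constant scaling factor.
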
Why are weights needed?
– The above was a trivial example with equal probabilities of selection
– In general, units in the sample have very different probabilities of selection, i.e. it is rare to get a self-weighting design
– To allow for unequal probabilities of selection, each unit is weighted by the reciprocal of its probability of selection
– Thus sampling weight = 1/(probability of selection)

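A minimal sketch of the reciprocal rule, assuming each sampled unit comes with a known selection probability. The function name `estimate_total` and the three sample values are hypothetical and chosen only to illustrate unequal weights; they are not from the course.

```python
def estimate_total(values, selection_probs):
    """Estimate a population total by weighting each sampled value
    by the reciprocal of its probability of selection."""
    if len(values) != len(selection_probs):
        raise ValueError("need one selection probability per value")
    total = 0.0
    for y, p in zip(values, selection_probs):
        if not 0 < p <= 1:
            raise ValueError("selection probabilities must lie in (0, 1]")
        total += y * (1 / p)   # sampling weight = 1 / probability
    return total


# Hypothetical sample of three units with unequal selection probabilities.
values = [4, 10, 3]
selection_probs = [0.5, 0.25, 0.125]
weights = [1 / p for p in selection_probs]

print(weights)                                   # [2.0, 4.0, 8.0]
print(estimate_total(values, selection_probs))   # 72.0
```

The unit that was least likely to be selected gets the largest weight, because it stands in for the most units in the population.
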
Weights in stratified sampling
– Consider the "To the Woods" example data set discussed in Session 10
– The mean numbers of large trees per plot were:
  – 97.875 in region 1, based on n1 = 8
  – 83.5 in region 2, based on n2 = 6
– Hence the total number of large trees in the forest can be computed as (96*97.875) + (72*83.5) = 15408
– So what are the sampling weights used for each unit (plot)?

Self-weighting again
– The sampling weights are the same for all plots, whether in region 1 or region 2. Why is this?
– What are the probabilities of selection here?
  – In region 1, each unit is selected with prob = 8/96 = 1/12
  – In region 2, each unit is selected with prob = 6/72 = 1/12
– Recall that a design where the probabilities of selection are equal for all selected units is called a self-weighting design
– So regarding the sample as a simple random sample should give us the correct mean

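The selection probabilities and the resulting sampling weights can be worked out as in the sketch below. It uses the stratum sizes (96 and 72 plots), the sample sizes (8 and 6) and the stratum means (97.875 and 83.5) quoted in the slides; the dictionary layout and key names are assumptions made for illustration.

```python
from fractions import Fraction

# Stratum sizes, sample sizes and sample means taken from the slides.
strata = {
    "region 1": {"N": 96, "n": 8, "sample_mean": 97.875},
    "region 2": {"N": 72, "n": 6, "sample_mean": 83.5},
}

estimated_total = 0.0
for name, s in strata.items():
    s["prob"] = Fraction(s["n"], s["N"])     # probability of selecting a plot
    s["weight"] = 1 / s["prob"]              # sampling weight = 1 / probability
    sample_sum = s["n"] * s["sample_mean"]   # sum of the sampled plot counts
    estimated_total += float(s["weight"]) * sample_sum
    print(name, "prob =", s["prob"], "weight =", s["weight"])

print("estimated total =", estimated_total)
```

Each sampled plot represents 12 plots in the forest, in both regions, which is why the design is self-weighting. Weighting the sample sums by 12 gives the same total, 15408, as the stratified formula.
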
Results for means
– The mean number of large trees, using the formula for stratified sampling, is [(96/168)*97.875] + [(72/168)*83.5] = 91.71
– Treating the 14 observations as if they were drawn as a simple random sample also gives 91.71
– The results for variances, however, differ:
  – Variance of stratified sample mean = 1.28
  – Variance of mean ignoring stratification = 2.18

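The agreement between the two means can be verified from the summary figures alone, as in the sketch below. It uses the stratum means 97.875 and 83.5 that appear in the calculation of the forest total; the variable names are illustrative. The two variances cannot be reproduced this way, because they depend on the plot-level data, which are not given in the slides.

```python
# Summary figures from the slides, keyed by region.
N = {"region 1": 96, "region 2": 72}          # plots in the population
n = {"region 1": 8, "region 2": 6}            # plots in the sample
ybar = {"region 1": 97.875, "region 2": 83.5}  # sample mean per region

N_total = sum(N.values())   # 168 plots in the forest
n_total = sum(n.values())   # 14 plots in the sample

# Stratified estimator: weight each stratum mean by its population share.
stratified_mean = sum(N[h] / N_total * ybar[h] for h in N)

# Mean of all 14 observations, ignoring the stratification.
pooled_mean = sum(n[h] * ybar[h] for h in n) / n_total

print(round(stratified_mean, 2), round(pooled_mean, 2))
```

The two means coincide because the sampling fraction is the same (1/12) in both regions, so each region's share of the sample equals its share of the population.
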
Results for means
– It is important to note that the weights used in computing a mean, i.e.
  – (96/168)*(1/8) = 1/14 for plots in region 1, and
  – (72/168)*(1/6) = 1/14 for plots in region 2,
  are not sampling weights
– Sampling weights refer to the multiplying factor used when estimating a total
– Essentially, they represent the number of elements in the population that an individual sampling unit represents

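The link between the two kinds of weight can be written out explicitly. The notation is an assumption introduced here for illustration: N_h and n_h are the numbers of plots in the population and in the sample for region h, N is the sum of the N_h, and y_hi is the count on sampled plot i of region h.

```latex
\[
\hat{T} = \sum_{h} \sum_{i=1}^{n_h} w_h \, y_{hi},
\qquad w_h = \frac{N_h}{n_h},
\]
\[
\bar{y}_{st}
  = \sum_{h} \frac{N_h}{N} \cdot \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}
  = \sum_{h} \sum_{i=1}^{n_h} \frac{w_h}{N} \, y_{hi}
  = \frac{\hat{T}}{N}.
\]
```

With w_h = 96/8 = 72/6 = 12 and N = 168, the weight applied to each plot when computing the mean is 12/168 = 1/14, which matches the figures on the slide. The mean weight is the sampling weight divided by the population size.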
Other uses of weights
– Weights are also used to deal with non-response and missing values
– If measurements are not available for every unit for some reason, the sampling weights may be re-computed to allow for this
– e.g. in conducting the Household Budget Survey 2000/2001 in Tanzania, not all of the rural areas planned in the sampling scheme were visited. As a result, sampling weights had to be re-calculated and used in the analysis

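One common way to re-compute weights is sketched below, under loudly stated assumptions: the units that were missed are treated as missing at random within their stratum, and the base weight is inflated by the ratio of planned to observed units. This is not necessarily the method used for the Tanzanian survey, which the slides do not describe. The function name and the numbers (a stratum of 96 units, 8 planned, 6 observed) are hypothetical.

```python
from fractions import Fraction


def adjusted_weight(base_weight, n_planned, n_observed):
    """Inflate a base sampling weight so that the observed units also
    stand in for the planned units that were not observed.
    Assumes the missed units are missing at random within the stratum."""
    if not 0 < n_observed <= n_planned:
        raise ValueError("need 0 < n_observed <= n_planned")
    return base_weight * Fraction(n_planned, n_observed)


# Hypothetical stratum: 96 units in the population, 8 planned, 6 observed.
base = Fraction(96, 8)                 # base weight = 12
adjusted = adjusted_weight(base, 8, 6)

print(base, adjusted)   # 12 16
print(adjusted * 6)     # 96: the observed units again represent the stratum
```

After the adjustment the six observed units together still represent all 96 units in the stratum, which is the property the re-calculated weights need to preserve.
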
Computation of weights
– The general approach is to find the probability of selecting a unit at every stage of the sample selection process
– e.g. in a 3-stage design, three sets of probabilities will result
– The probability of selecting each final-stage unit is then the product of these three probabilities
– The reciprocal of this probability is then the sampling weight

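The product rule can be sketched as follows for a hypothetical 3-stage design with equal-probability selection at each stage. The stage descriptions and counts are invented for illustration, and the function name `multistage_weight` is an assumption; real designs often use unequal probabilities at some stages, which makes the calculation harder.

```python
from fractions import Fraction


def multistage_weight(stage_probs):
    """Return (overall selection probability, sampling weight) for a
    final-stage unit, given its selection probability at each stage."""
    overall = Fraction(1)
    for p in stage_probs:
        if not 0 < p <= 1:
            raise ValueError("each stage probability must lie in (0, 1]")
        overall *= p                  # probabilities multiply across stages
    return overall, 1 / overall       # weight = reciprocal of the product


# Hypothetical design: 5 of 20 districts, then 4 of 16 villages in each
# selected district, then 10 of 50 households in each selected village.
stage_probs = [Fraction(5, 20), Fraction(4, 16), Fraction(10, 50)]
prob, weight = multistage_weight(stage_probs)

print(prob, weight)   # 1/80 80
```

In this made-up design each sampled household represents 80 households in the population.
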
Difficulties in computations
– Standard methods, as illustrated in textbooks on sampling, often do not apply in real surveys
– Complex sampling designs are common
– Computing the correct probabilities of selection can then be very challenging
– Professional assistance is usually needed to determine the correct sampling weights and to use them correctly in the analysis

Software for dealing with weights
– When analysing data from complex survey designs, it is important to check that the software can deal with sampling weights
– Packages such as Stata, SAS and Epi-info have facilities for dealing with sampling weights
– However, you need to be careful that the approaches used are appropriate for your own survey design
