Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sample size calculation and development of sampling plan

Similar presentations


Presentation on theme: "Sample size calculation and development of sampling plan"— Presentation transcript:

1 Sample size calculation and development of sampling plan
Real population value

2 Overview Bias versus sampling error Level of precision
Calculating sampling size Single survey using random sampling Single survey using two-stage cluster sampling Comparing two surveys Drawing the sample First stage Second stage Developing a sampling plan

3 Introduction Result from survey is never exactly the same as
the actual value in the population WHY? (Read text and question, then solicit answers from the participants, then:) We will explore the answer to this question in great detail.

4 Bias and sampling error
Point estimate from sample 45% True population value 50% Total difference = 5 percentage points This slide illustrates an example in which the true population prevalence of some outcome, such as anemia in children, is 50% (Reveal next item) However, the estimate calculated from the survey sample is only 45% The total difference between the true value in the population and the estimate from our survey is 5 percentage points This difference is due to two things: Bias causes part of the difference Sampling error also makes up part Sampling error Bias

5 Bias Results from: Enumerator/respondent bias
Incorrect measurements (anthropometric surveys) Selection of non-representative sample Likelihood of selection not equal for each sampling unit (Read bullet points then, Reveal and read text on bottom of slide)

6 Unlike bias, it can be predicted, calculated, and accounted for
Sampling error Sampling error Difference between survey result and population value due to random selection of sample Influenced by: Sample size Sampling method (Read bullet points then, Reveal and read text on bottom of slide) Unlike bias, it can be predicted, calculated, and accounted for

7 What is the sampling error and possible bias in this example?
A poll for Le Figaro newspaper showed Sarkozy, the nominee for President Jacques Chirac's governing party, drawing 28.5 percent of votes for the first round. Royal tallied 25 percent. But the margin of error in surveys of its size is plus or minus three percentage points. The poll was based on 570 interviews conducted by phone on 20 March 2010.

8 Confidence interval What does a 95% confidence interval mean?
If you repeat the same survey 100 times, with the same methodology and the same sample size, 95 of the surveys will have 95% confidence intervals which include the true value in the population. OR You are 95% sure that the true population value lies somewhere within the 95% confidence interval. (Read question and solicit answers from the participants, When ready, reveal and read first answer.) This can be expressed in another way which is a bit more intuitive: (Reveal and read second answer)

9 Confidence interval cont.
Point estimate from sample 45% Prevalence 0% 100% Sampling error as measured by 95% confidence interval Now we never do multiple surveys using the same method to measure the same outcome. We usually have only one survey for which we know the point estimate and the confidence interval, as shown here.

10 Confidence interval cont.
Let’s look at this concept graphically: Here is a graph of results of 100 surveys done using exactly the same method and sample size The true population prevalence is 50% Because of sampling error, these surveys will have different point estimates The greatest number of surveys will come up with results close to the true value 50% But some surveys will have a result farther from the true population prevalence The vertical lines show where 95% of the surveys fall Our survey with the result of 45% is within the range of 95% of the surveys Point estimate from sample 45% True population prevalence 50%

11 Confidence intervals as measure of precision
We can also visualize confidence intervals as a circle around a dart The dart represents the point estimate. The circle tells us how wide is the range in which we are 95% confident that the true value lies. The left dart has a large circle, meaning that we are 95% sure that the true population value lies only within a very large range Perhaps the 95% confidence interval is 25% to 49%, as shown on the line below. The right dart is in a much smaller circle. Here we are 95% sure that the true population value lies in a much smaller range In this case, perhaps the 95% confidence interval is 35% to 39%, as shown on the line below. Note that the point estimate for both darts is the same: 37% 25% 37% 49% 37% 35% 39%

12 Example 1: Small or large sample? Bias or no bias?
Small sample size without bias: But how does this relate to sample size? If you have a small sample size producing a lack of precision, but your measurement and sampling methods are good so there is little bias, repeated surveys would look like this. Each survey’s result is far from the true value in the population But if you averaged the results of repeated surveys, the average would be near the true value A small sample size produces imprecision, but if your methods are good, you have little inaccuracy. Real population value

13 Example 2: Small or large sample? Bias or no bias?
Large sample size without bias How about this situation? You have increased your sample size to reduce imprecision, so now the point estimates from repeated surveys are much closer together. You have used the same good methods so you have little bias As a result, each surveys point estimate is closer to the true population value Real population value

14 Example 3: Small or large sample? Bias or no bias?
Large sample size with bias: But now you have a problem. You have used a large sample size so that repeated surveys have good precision. This makes the results of repeated surveys very similar to one another But you have used bad methods and injected bias into your surveys This makes the point estimate of each survey far from the true population value Real population value

15 Precision versus bias Larger sample size increases precision
It does NOT guarantee absence of bias Bias may result in very incorrect estimate Quality control is more difficult the larger the sample size Therefore, you may be better off with smaller sample size, less precision, but much less bias. OK, another yellow border slide (Read bullet points)

16 Calculating sample size - Introduction
Sample size calculations can tell you how many sampling units you need to include in the survey to get some required level of precision (Read bullet points)

17 FS & nutrition surveys: sampling considerations
If a nutritional survey is conducted as part of CFSVA, additional considerations on sampling are required to reconcile the 2 surveys. Why? Food security analysis has less strict demands on the precision of a single indicator (convergence of evidence) → few % points difference in prevalence of FS is acceptable. A difference of a few % points in wasting prevalence can have considerable implications for programmes.

18 FS & nutrition surveys: sampling considerations (cont.)
The final HH sample size depends on the objective of collecting nutritional info. goal is to study the link between food security & nutrition → smaller sample sizes are sufficient goal is to provide accurate & precise prevalence on nutrition indicators → nutrition sample size has to be adequate (larger)

19 Nutrition survey sample size: 30*30 (?)
Why the 30*30 method was proposed and why is not necessary to compute the nutrition survey sample size? Proposed to ensure enough precision. It assumes that: Prevalence of the core indicator is 50% Desired precision is -/+ 5 percentage points DEEF = 2 15% of the HHs or kids will refuse

20 Single survey using simple random sampling
2 1 3

21 Calculate sample size – single survey using random sample
To estimate sample size, you need to know: Estimate of the prevalence of the key indicator (e.g. rate of stunting) Precision desired (for example: ± 5%) Level of confidence (always use 95%) Expected response rate Population For nutrition surveys: number of eligible individuals per household The last two items in the list are in grey because they are not really worth considering The total size of the population has little influence on the sample size if the total population is 10,000 or greater Because the population in which assessment surveys are done are usually larger than this, you do not usually need to take the population size into account The level of confidence is also not really a factor. We almost always use 95% confidence intervals If for some reason you want to use 90% or 99% or other confidence intervals, you can look up the correct formula in any statistics book.

22 Changing the estimated prevalence
This graph demonstrates how the estimated prevalence affects sample size This line shows the changes in sample size assuming that You are using 95% confidence intervals Want 95% confidence intervals of plus and minus 5 percentage points And that the population being surveyed is greater than 10,000 Note that as the prevalence of the condition gets closer to 50%, the sample size becomes greater to achieve the same level of precision

23 Changing the desired precision
This graph demonstrates how changing the desired precision affects sample size This line shows the changes in sample size assuming that You are using 95% confidence intervals The prevalence is 50% And that the population being surveyed is greater than 10,000 Note that as the precision increases (that is, as the confidence intervals become narrower), the required sample size rises. Below plus and minus 5 percentage points, the sample size increases very rapidly Because of this, it is rare that an emergency assessment surveys attempts to get precision greater than plus/minus 5 percentage points.

24 Changing the population size
This graph shows why the size of the population does not matter if it is greater than 10,000 Note that above 10,000 on the x-axis, the sample size changes little Below 10,000, the sample size decreases as the finite population correction is applied But carrying out surveys in such small populations is rare

25 NOTE: As long as the target population is more than a few thousand people, you do not need to consider it in the sample size. You do NOT generally need a larger sample size if the population is bigger. Here is another yellow border slide (Read text) This is an important point Many people do not understand the true relationship between population size and sample size They think that if the population is bigger, you must have a bigger sample size. Many times such thinking results in doing a survey with MUCH too big a sample size As we learned earlier, this increases the likelihood of bias It also wastes scarce resources In reality, we can use exactly the same sample size to measure the prevalence of anemia in women of child-bearing age in Lichtenstein and China.

26 Sample size formula (single survey using random sampling)
To calculate sample size for estimate of prevalence with 95% confidence limit N = x (P)(1-P) d2 1.96 = Z value for 95% confidence limits P = Estimated prevalence (e.g. 0.3 for 30%) d = Desired precision (e.g for ± 5%)

27 Steps for calculating the sample size
1. Decide on key indicator 2. Estimate prevalence of key indicator 3. Decide on precision required 4. Calculate initial sample size using formula 5. Adjust for individual non-response 6. Adjust for eligible members 7. Adjust for household non-response Final sample size

28 Step 2: Where do get information to make assumption about prevalence?
Prior surveys Qualitative estimates “Worst case” scenario: use prevalence of 50% (Read bullet points) As shown on the graph from a few slides ago, if you really have no idea what the prevalence may be, use 50%. This will give the largest sample size, ensuring that if the prevalence is different in the population, you will still have a large enough sample size.

29 Step 3: How do we decide on the needed precision?
One-time results for advocacy alone does not need much precision (0.10 good enough) Results that you will need to compare against in the future need greater precision (0.05 if program will have large impact) Results you will monitor frequently (e.g. year by year) need even greater precision (0.02) (Read bullet points)

30 Step 4: Calculate required sample size
Survey anemia in children under-5 Assume prevalence of anemia = 40% (0.40) Need precision of +/- 5% (0.05) N = x (P)(1-P) d2 1.96 = Z value for 95% confidence limits P = Estimated prevalence d = Desired precision OK, now your turn: Calculate the sample size for a survey measuring the prevalence of anemia in young children where the assume prevalence is 40% and you need a precision of plus/minus 5 percentage points. (Give participants about 5 minutes to calculate or until it appears that most have completed the calculation, Then reveal the answer. Discuss mistakes made by participants.)

31 Step 4 (cont.): Calculate required sample size
N = x (P)(1-P) d2 1.96 = Z value for 95% confidence limits P = Estimated prevalence d = Desired precision (for example, 0.05 for ± 5%) OK, now your turn: Calculate the sample size for a survey measuring the prevalence of anemia in young children where the assume prevalence is 40% and you need a precision of plus/minus 5 percentage points. (Give participants about 5 minutes to calculate or until it appears that most have completed the calculation, Then reveal the answer. Discuss mistakes made by participants.) P = x 0.4 x = 369 (1-P) = children d = 0.05 31

32 Step 5: Adjust sample size for individual non-response
Some children will refuse or be unavailable Therefore, inflate sample size to account for individual non-response Example: If 10% non-response (90% response): Sample size x 0.9 = new sample size 369 / = 410 children The formula gives us the number of children for whom we need to determine whether or not they have anemia But no survey has 100% response rate, we must adjust for individual non-response Some children (or their mothers) will refuse or be unavailable To meet our desired precision, we must actually collect data on 369 individual children As a result, we inflate the sample size to account for non-response If we expect that 10% of children will not give us any data, this is 10% individual non-response We could consider this 90% individual response Therefore, the number of selected children multiplied by 0.9 gives us the number of children from whom we expect to collect data To correct for non-response, we divide the calculated number of children for whom we need data (369) by the response rate to (0.9) get an adjusted sample size We would therefore randomly select 410 individuals from who to try to collect data

33 Step 6: Adjust sample size for number of eligible individuals
If the unit of analysis is the individual within the household there is a need to readjust the number of selected households Example: If 0.7 children per household: Sample size (children) / 0.7 = Number of HHs needed 410 = 586 HHs 0.7 But most surveys select households at random from the population, not individuals Remember our discussion of terminology If we are measuring anemia in young children, young child is the unit of analysis If we are selecting households at random from the population, the basic sampling unit is household Therefore, the unit of analysis is different from the basic sampling unit The sample size calculation so far gives us the number of individuals who must be included in the sample But we are selecting households, not individuals We must calculate the number of households to select to recruit the required number of individuals In our example, we divide the number of children we need to include in the sample by the average number of children per household to get the number of households we need in the sample to have the right number of individual children. We find out that each household in the population has, on average, 0.7 children < 5 years of age We divide the number of children <5 years of age we need to include in the sample (410) by the average number of children < 5 years of age per household (0.7) to get 586 households. This tells us that we need to have 586 households in the sample to have 410 children in the sample to collect hemoglobin data on 369 children.

34 Step 7: Adjust sample size for household non-response
Some households will refuse or be unavailable Therefore, inflate sample size to account for household non-response Example: If 15% non-response (85% response): Sample size x = new sample size 586 / = 689 HHs The formula gives us the number of children for whom we need to determine whether or not they have anemia But no survey has 100% response rate, we must adjust for individual non-response Some children (or their mothers) will refuse or be unavailable To meet our desired precision, we must actually collect data on 369 individual children As a result, we inflate the sample size to account for non-response If we expect that 10% of children will not give us any data, this is 10% individual non-response We could consider this 90% individual response Therefore, the number of selected children multiplied by 0.9 gives us the number of children from whom we expect to collect data To correct for non-response, we divide the calculated number of children for whom we need data (369) by the response rate to (0.9) get an adjusted sample size We would therefore randomly select 410 individuals from who to try to collect data Final sample size 34

35 Sample size for more than 1 key indicator
Decide what are the key indicators Calculate sample sizes for each individual key indicator Choose largest sample size required Example: Key indicator Children <5 Households Wasting 300 200 Stunting 600 400 % of HHs with poor food consumption - 350 But most surveys measure more than 1 outcome How to we calculate the sample size for the survey? (Read bullet points) 35

36 Single survey using two-stage cluster sampling
2 1 3

37 Introduction Cluster sampling results in loss of precision compared to simple random sampling. When calculating sample size, must increase sample size to obtain the same precision. When calculating confidence intervals during data analysis, must take into account cluster sampling. (Read bullet points)

38 Calculate sample size – single survey using two-stage cluster sampling
To estimate sample size, you need to know: Estimate of the prevalence of the key indicator (e.g. rate of stunting) Precision desired (for example: ± 5%) Level of confidence (always use 95%) Expected response rate Population For nutrition surveys: number of eligible individuals per household The last two items in the list are in grey because they are not really worth considering The total size of the population has little influence on the sample size if the total population is 10,000 or greater Because the population in which assessment surveys are done are usually larger than this, you do not usually need to take the population size into account The level of confidence is also not really a factor. We almost always use 95% confidence intervals If for some reason you want to use 90% or 99% or other confidence intervals, you can look up the correct formula in any statistics book. IN ADDITION TO THE ABOVE, THE DESIGN EFFECT

39 6. Calculate sample size Cluster surveys
To calculate sample size for estimate of prevalence with 95% confidence interval taking into account cluster sampling N = DEFF x x (P)(1-P) d2 DEFF = Design effect 1.96 = Z value for p = 0.05 or 95% confidence intervals P = Estimated prevalence d = Desired precision (for example, 0.05 for ± 5%) (Read first paragraph text) The formula is the same as the formula for simple random sampling, but now you must multiply the simple random sampling sample size by the design effect.

40 Design effect Design effect increases when To minimize design effect
Key indicators are highly geographically clustered (e.g. water source, access to health care) When number of clusters are decreased and size of clusters are increased To minimize design effect Include more clusters of smaller size Stratify sample into more homogeneous groups (Read bullet points)

41 Example: Prevalence of malaria in 6 villages
= child with malaria (Read bullet points) In the following slides, children with and without malaria will be represented by these pictures = child without malaria

42 Example 1: Malaria evenly spread throughout population
Prevalence = 50% Prevalence = 50% Prevalence = 50% Village E Village A Village C Village F Village D Village B This schematic shows 6 villages each with 4 children Malaria is distributed equally in all villages, or clusters, so that the prevalence of malaria is the same in each village (50%) This is the situation in which all parts of the population are about equal; there is little variation in your health outcome within the population

43 Example 2: Malaria unevenly spread throughout population (2 clusters)
Prevalence = 50% Prevalence = 100% Prevalence = 0% Village A Village C Village E Village B Village D Village F However, in this situation, malaria is spread unevenly throughout the population Some villages have no malaria, while some villages have lots of malaria Perhaps the villages on the left are in the mountains above the altitude at which malaria is transmitted. The villages on the right may be in the swampy lowlands where malaria is almost universal Now let’s see what happens if we select samples of 2 villages from among these 6 villages

44 Example 3: Malaria unevenly spread throughout population (3 clusters)
Prevalence = 83% Prevalence = 17% If we select 3 villages instead of 2, then the most extreme estimates of malaria prevalence are 17% and 83% If we randomly selected the orange sample of 3 villages, our estimate would be 17% If we randomly selected the purple sample of 3 villages, our estimate would be 83% These estimates are not as bad as the extreme estimates of 0% and 100% which are possible if only 2 villages are selected The variance is lower and the precision is now better.

45 6. Calculate sample size Cluster surveys
Unsafe Safe Some health and nutrition outcomes are usually very clustered One example is water source People in the same village often have the same water source Therefore, there are large differences between clusters CLICK TO REVEAL GREEN OVALS In this group of 6 villages, 3 villages have a covered borehold, a safe water source, as shown in green ovals CLICK TO REVEAL RED CIRCLES 3 villages get water from a stream or river, an unsafe water source, as shown in red circles You can see that if 2 villages with a borehold are randomly selected, the estimated prevalence of safe water source would be 100% If 2 villages getting water from a river or stream are randomly selected, the estimated prevalence of safe water source would b 0% This outcome is highly clustered The design effect for water source is often as high as 4 or 5 This means that to get the same precision as a survey using simple random sampling, you must select 4 or 5 times as many households if you are using cluster sampling

46 Where do you get design effect to calculate sample size?
Ideally prior surveys We often use ‘2’ as an estimate for a two-stage cluster sampling (comes from immunization coverage in rural Africa) (Read bullet points)

47 Example: Design effect for selected key indicators in Mongolia
assumed Iodated salt 4.5 Stunting in children 1.5 Nutritional status in mothers 1.3 Anemia in children 2.1 Anemia in mothers 1.9 Why is this so high? Here are the design effects calculated from the data in a survey in Mongolia. Why is the design effect so high for salt iodation?

48 Example: Design effect for selected key indicators in Mongolia (cont.)
Percent of salt positive for iodate, by province <33% iodine positive >66% iodine positive 33-66% iodine positive Ulaan baatar This map shows the province-specific prevalence of salt iodation in households selected for the survey. Note that salt iodation is relatively rare in the western provinces, as shown in blue Salt in these provinces was either imported from Kazakstan where no iodation was done or dug from the beds of dried salt lakes by individual households Iodation is relative common, in more than 2/3s of households, in the central provinces closest to the capital Salt in these provinces came from the main salt factory in the capital city Thus, clusters were very different from each other Clusters in the west had low prevalence Clusters near Ulaan Baatar had high prevalence This produced a high design effect

49 Example: Design effect for selected key indicators in Mongolia
If sample size for simple random sampling = 120 What is sample size for a 2-stage cluster survey if iodated salt is key indicator (DEFF 4.5)? DEFF = 4.5, therefore 120 x 4.5 = 540 (Ask the question “What is the sample size for a cluster survey?” and solicit answers from the participants. You may need to remind participants that the assumed design effect is 3. When ready reveal answer and discuss mistakes made by participants.)

50 How do we decide on how many clusters?
More clusters of smaller size results in smaller design effect But more clusters increases cost and time required Fewer than 30 clusters results in high design effect But >30 clusters doesn’t decrease design effect much (Read bullet points)

51 Comparing two surveys

52 Comparing two surveys - Introduction
When comparing 2 surveys Want to be sure any difference is not due only to sampling error Uses different formula To calculate sample size, make following assumptions: Estimated prevalence in survey 1 Decide difference between 2 surveys you want to be able to detect So far, we have calculated the sample size for only 1 survey to achieve a certain desired level of precision We can also calculate the sample size needed to compare 2 surveys (Read bullet points)

53 Comparing two surveys - Formula
Calculate sample size to compare prevalence estimates from 2 surveys; both surveys with equal sample sizes This is the formula to calculate sample size to compare prevalence estimates from 2 surveys, assuming both surveys will have the same sample size In this formula, you must make assumptions of the prevalence in both surveys But you can also make an assumption about the prevalence in one survey, then decide how much of a difference you want to be able to see with statistical significance Then calculate what the prevalence in the other survey must be

54 Comparing two surveys – Formula (cont.)
But no need to memorize these equations: Found in most statistics books (e.g. FANTA sampling guidelines) Use computer to calculate sample sizes, including EpiInfo, nutri-survey, etc. (Read bullet points)

55 Two-stage cluster survey – Drawing the sample – First stage

56 First stage – Cluster requirements
Size Smaller clusters make second stage sampling easier But not too small – each cluster should have the required minimum number of households Known boundaries Must be able to tell if individual households belong to a cluster or not Possible clusters: villages, blocks, enumeration areas (Read text and bullet points)

57 What does proportioal to population size (PPS) mean?
Larger clusters are more likely to be chosen than smaller clusters If choose PSUs probability proportional to size (PPS), probability of any single household or person in population being chosen is the same 50 HHs 10 HHs (Read bullet points)

58 How to select clusters Step 1: Make a list of all clusters
Step 2: Add a column with the population of each PSU Step 3: Add a column with the cumulative populations Step 4: Calculate the sampling interval as . Total population of all clusters . Number of clusters in the sample Step 5: Select a random number between 1 and the sampling interval – where this number lies in the cumulative population column is the first cluster Step 6: Add the sampling interval to the random number – this is the second cluster Step 7: Continue to add the sampling interval to select clusters

59 Example: Select 5 out of 17 EAs (PPS)
Enumeration area No of HHs Cum no of HHs A 250 B 90 340 C 100 440 D 130 570 E 75 645 F 490 1135 G 30 1165 H 280 1445 I 270 1715 J 40 1755 K 160 1915 L 285 2200 M 410 2610 N 200 2810 O 50 2860 P 460 3320 Q 25 3345 Sampling interval: 3345/5=669 Select random start (randomly one number between 1 and 669; e.g. 300) Add sampling interval 1. selected EA 300 2. selected EA 968 3. selected EA 1636 4. selected EA 2304 5. selected EA 2972

60 Two-stage cluster survey – Drawing the sample – Second stage

61 How would you select HHs within a cluster?
Option 1: Simple random sampling Option 2: Systematic random sampling Option 3: EPI method (bottle spinning) Option 4: Spatial sampling Option5: Segmentation method

62 Example: Option 1 Simple or systematic random sampling of households 1
HOUSEHOLD LIST 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Etc. 1 2 3 26 14 13 12 10 9 8 7 5 4 29 21 28 27 25 24 22 20 16 15 18 19 11 17 30 6 23 Perhaps the best way to select households is to find or make a list of all the eligible households in the village Number the households on the list Then select as many random numbers from the random number table as you need households This way, every household has the same likelihood of being selected But you must be careful If you use someone else’s list, you must be sure it is up-to-date and includes all households currently living in the village If it is not, you need to add missing households to the list before sampling If you make your own list, you must be sure no one is left off of it Be sure certain types of households are included, such as Households headed by women Households of ethnic minorities Households of lower class or caste Etc. 62

63 Example: Option 2 Systematic random sampling if HH list is not available 5 4 6 2 8 3 7 13 1 12 9 11 10 14 26 25 23 28 27 30 22 15 24 16 21 29 17 18 20 19 63

64 Example: Option 3 EPI (bottle spinning) method of selecting households
1 We will use this schematic map of a village to illustrate this method and the problems with it Each rectangle is a house, and the village has 30 houses In our survey, we want to randomly select 10 of these houses for our survey sample The method consists of the following steps: (Reveal the pink dot at the “center” of the village) Go to the center of the village. In this schematic, the center is the pink dot (Reveal the yellow arrow) Spin a pencil or bottle to select a random direction from the center. In this schematic, the yellow line is the randomly selected line has been randomly selected Then walk along this line counting houses which fall on this imaginary line (Click mouse to count houses) Number these houses 1 through whatever number of houses there are. In this schematic, there are three houses which lie on this line. Randomly select one of these houses. Let’s say we randomly select house #3. This is the first house to include in the survey sample from this PSU (Click the mouse to show movement to next house) When finished with data collection at this house, go to the next closest house and include it in the survey sample Continue selecting the next closest house until the correct number of houses have been completed in this cluster But this method has some serious problems 2 3

65 Example: Option 3 (limitations)
EPI (bottle spinning) method of selecting households 10 9 5 3 7 4 Yet another problem with the EPI method of selecting households After the first house is selected by choosing a random line the next houses are selected by which is the closest to the house which the survey team just completed. In most villages, like in our schematic, houses at the center of the village are closer together than houses at the periphery. Therefore, if the team moves to the next closest house, it will move toward the center of the village (Click mouse to indicate the first house) Let’s assume that the house on the line farthest from the center of the village is randomly selected as the first house Now after each house is completed, the next closest houses is selected The selection of the 10 houses for the survey sample might look something like this: (Mouse click to start house selection) As you can see, the general movement is toward the center of the village where houses are closer together But does this really produce a sample biased toward the center of the village? (Reveal circle in the center of the village) This circle includes or touches on the central 15 houses in this village Remember that this village contains 30 houses, so the circle includes the 50% of houses in this village which are closer to the center The circle includes 7 of the 10 selected households As a result, 70% of the sample consists of houses closer to the center of the village Therefore, the EPI selection method has produced a sample with a disproportionate number of houses closer to the center of the village We have demonstrated two ways in which the EPI method is more likely to select houses near the center of the village (Ask question “Who lives near the center of the village? Are they different from people living on the peripheray?” and solicit answers from the participants. When ready, suggest that perhaps richer people live nearer the center. Or in some places like the United States, poorer people often liver nearer the center of towns and cities.) As a result of these differences, the people and households selected for the survey sample will be different from the overall population. This is sampling bias – you select a non-representative sample because not everyone in the sampling universe has the same chance of selection. 8 6 2 1

66 Example: Option 4 Dart throw method (or spatial sampling) of selecting households Let’s discuss some other ways to select households Again, we have our village If we draw a map of the village, something like our schematic Then throw a dart at random at the map Wherever the dart lands we pick the closest household to the dart This type of spatial sampling is done frequently using a randomly selected compass coordinates Locating a randomly chosen latitude and longitude is now easy using GPS However, there is a big problem (Reveal first polygon) If the dart lands anywhere within this orange box, the orange house is selected (Reveal second polygon) If the dart lands anywhere within this yellow box, the yellow house will be selected You can immediately see that the orange house is much more likely to be selected than the yellow house In general, with spatial sampling, houses which are farther apart have a greater likelihood of being selected than houses which are closer together This is the opposite bias as seen with the EPI sampling method Now households in the center are LESS likely to be selected

67 Example: Option 5 Segmentation method of selecting households 1 2 3 6 4 Anothermethod of randomly selecting a sample is called segmentation The village is divided up into segments each containing approximately equal numbers of households Each segment is numbered Then regular random tecnniques are used to select as many segments as you need to get the correct number of households Then include all the households in each selected segement in the survey sample (Reveal segments) In this schematic, our village is divided into 6 segments each containing 5 households These segments are numbered 1 through 6 We need 10 households, so we select 2 random numbers between 1 and 6 from our random number table to select 2 segments All 5 households in each selected segment would then be recruited into the survey This is a frequently used sampling technique for large, nationwide surveys These surveys have resources to map out each selected PSU However, in more urgent surveys with fewer resources, this technique may not be feasible 5

68 How to select individuals within HHs?
In a household with more than 1 eligible child, include all in survey or only 1? In most cases, include all eligible persons in selected households Why? (Disadvantages of selecting 1 child) Requires additional sampling step Produces biased sample Requires weighted analysis Let’s discuss a couple of sampling questions which are frequently asked and frequently done incorrectly First, a household with more than 1 eligible child, should you include all the children in the survey or only 1 child Let’s first ask ourselves some other questions (Reveal next question) Do all eligible children have the same probability of selection for the survey sample? If the basic sampling unit is the household, as it usually is, what is the probability of selection for a single child in a household?” and solicit answers from the participants What is the probability of selection of an individual child if two eligible children live in the same household and you select only one?” (Solicit answers from the participants. When ready, explain that the answer to the first question is that the child’s selection probability is the same as the selection probability for the household And that if you pick only one of two children in a household, that child’s selection probability is only ½ the selection probability of the household.” ) Therefore, all children in the population do not have the same selection probability The selection probability depends on how many eligible children live in the household (Reveal and read the next set of bullet points)

69 Replacing households (?)
If household unavailable, replace with neighboring household? Recommendations Make an accurate estimate of household non-response when calculating sample size Do not replace missing households If necessary, select a few more households per cluster to use if household non-response is much greater than predicated The next issue is if a household is unavailable, should that household be replaced with a neighboring household (Reveal next set of bullet points and ask questions “Is the replacement household randomly selected?” and “Is the resulting sample more or less representative of all househholds?” Solicit answers from the participants When ready, suggest that no, the replacement household is NOT randomly selected It was selected because it happened to live near a randomly selected household Also, the sample with replacements would be less representative of the entire population. If a household is missing for certain reasons, replacing that household with one that is available may bias the sample For example, if that household is missing because the mother has taken the child to the hospital because of illness, replacing that child with a child from a household which is home may result in a sick child being replaced by a healthy child. This would bias the sample toward healthy children (Reveal and read next set of bullet points)

70 Developing a sampling plan

71 What information should a sampling plan contain
Objective(s) of assessment/survey Sampling methodology Sample size calculation (including assumptions) Description of how sample is selected (1. stage/ 2. stage) Work plan and data collection staff required Who is responsible for what?

72 Data collection staff and time required
Considerations: How long does one interview take? What are the walking distances from HH to HH? How many interviews can be conducted by one enumerator per day? What are the driving distances between enumerators For nutrition, 2 additional enumerators are required  Data collection should not take longer than 1 month

73 Resources for sampling
EFSA guidelines CFSVA guidelines CDC/WFP Manual: Measuring and Interpreting Malnutrition and Mortality SMART guidelines Software for emergency nutrition assessment: OpenEpi sample size calculator: Sampling guidelines for vulnerability analysis: EFSA: T-square method: FANTA Sampling Guide:

74 Thank you

75 Final exercise: Developing sampling plan for case study
Case study: You are requested to conduct a country-wide survey in Yemen with the objective to assess the level of HH food insecurity and child malnutrition representative at agro-zone level. Key indicators (prevalence from previous studies): HHs with poor consumption = 11.8% Children (6-59 months) wasted = 12.4% Children (6-59 months) stunted = 53.1% Other factors: On average, there is 1.2 children (6-59 months) in each HHs It is estimated that 10% of HHs are not available or refuse to respond Survey should take no longer than 4 weeks

76 Final exercise: Tasks Decide on appropriate sampling method
Calculate required sample size taking the key indicators into account Decide on required level of precision Use OpenEpi sample size calculator Estimate how many enumerators will be required if average travel time between clusters is one day and 1 enumerator can complete 5 questionnaires per day Prepare a sampling plan (bullet points)

77 OpenEpi Sample Size calculator


Download ppt "Sample size calculation and development of sampling plan"

Similar presentations


Ads by Google