Estimating the size and characteristics of MARPs using Network Scale-up Chris McCarty PHC6716 July 20, 2011.


1 Estimating the size and characteristics of MARPs using Network Scale-up Chris McCarty PHC6716 July 20, 2011

2 The Problem Certain populations are at high risk for contracting and spreading HIV Most At Risk Populations (MARPs) typically fall into one of three categories – Female Sex Workers – Men Who Have Sex With Men – IV Drug Users Members of all three populations engage in behavior that increases the chance of contracting HIV All three populations are difficult to measure directly

3 What is known about MARPS? Many studies have been done to estimate the prevalence of HIV among these populations, and to measure characteristics of the populations Representative samples are drawn to estimate the proportion of the population with HIV Unlike other sample surveys there is no known population size to which these proportions can be applied This means that the size of the problem remains largely unknown, particularly on a regional or local level

4 Methods to Estimate the Size of MARPs (http://data.unaids.org/pub/Manual/2003/20030701_gs_estpopulationsize_en.pdf) Methods that require a sample frame – Census: Counting all members – Enumeration: Counting members in a sample frame then scaling up – Population Survey: Draw a representative sample (similar to enumeration) Methods that do not require a sample frame – Capture-Recapture – Multiplier

5 Capture-Recapture Method originated in biology to estimate the size of fish and wildlife populations The method involves five steps: 1. “Capture” a sample of subjects 2. Tag them 3. Release them back into the population 4. “Re-capture” a sample at a later time 5. Estimate the population size based on the proportions With humans, “tagging” is sometimes done by providing a unique object Otherwise tagging is done by using information about the respondent (e.g. Social Security Number or other identifying characteristic)

6 Capture-Recapture (cont.) N = MC/R where: – N = Estimate of total population size – M = Total number captured and marked on the first visit – C = Total number captured on the second visit – R = Number captured on the first visit that were recaptured on the second visit Example: M = 200, C = 200, R = 10, then N = 200 × 200 / 10 = 4,000 Assumes a closed system without in- or out-migration More complex models allow for multiple sites
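The estimator on this slide is small enough to check directly. The following is a minimal Python sketch; the function name `capture_recapture_estimate` and the input checks are illustrative choices, not part of the presentation, and the sketch relies on the closed-population assumption stated above.

```python
def capture_recapture_estimate(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Estimate total population size as N = M * C / R."""
    if recaptured <= 0:
        raise ValueError("at least one recapture is needed to form the estimate")
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample size")
    return marked_first * caught_second / recaptured


# Slide example: M = 200, C = 200, R = 10
print(capture_recapture_estimate(200, 200, 10))  # 4000.0
```

A small R relative to M and C makes the estimate very sensitive to a single additional recapture, which is one practical reason the method is hard to apply over large geographies.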

7 Multiplier Relies on overlap of information between two sources: 1. Data on attendance by target population at an institution that serves them (e.g. a clinic) 2. Data from target population about their attendance Example – Clinic screened 3,500 sex workers in a two week period – A survey of 600 sex workers yielded 404 who said they had been screened – The multiplier = 600/404 = 1.49 – The population estimate = 3,500 x 1.49 = 5,215
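The multiplier arithmetic can be sketched in a few lines of Python. The function name `multiplier_estimate` is hypothetical. Note that the slide rounds the multiplier to 1.49 before multiplying, which gives 5,215; carrying the unrounded ratio gives a slightly smaller figure.

```python
def multiplier_estimate(institution_count: int, surveyed: int, surveyed_reporting_use: int) -> float:
    """Population estimate = institution count * (surveyed / surveyed who report attending)."""
    if surveyed_reporting_use <= 0:
        raise ValueError("need at least one surveyed member who reports attending")
    return institution_count * surveyed / surveyed_reporting_use


# Slide example: 3,500 screened at the clinic; 404 of 600 surveyed say they were screened
multiplier = 600 / 404
print(round(multiplier, 2))                         # 1.49
print(round(3500 * round(multiplier, 2)))           # 5215 (multiplier rounded first, as on the slide)
print(round(multiplier_estimate(3500, 600, 404)))   # 5198 (unrounded multiplier)
```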

8 Problems with these approaches All these methods require interviews with members of the target population The Census, Enumeration and Population Surveys require sample frames which are lacking for hidden or elusive populations The Capture-Recapture and Multiplier methods are difficult to do across large geographies

9 A note on RDS Respondent Driven Sampling (RDS) is a method to measure the characteristics of an elusive population (http://www.respondentdrivensampling.org/) This starts as a snowball sample where respondents recruit other respondents using coupons Each respondent must report the names of others they know in the population RDS requires completion of minimal chains without breaks (RDS does not work in disconnected populations) RDS is a weighting procedure that adjusts for the non-random procedure for collecting the data RDS will NOT give estimates of the size of a population

10 An Alternative: Network Scale-up This is a population-based survey approach that does not require a sample frame of the target population The method relies on asking respondents (not necessarily in the target population) about people they know in the target population – Not talking to the target population was politically unpopular in the Ukraine The method was developed over the past 20 years but has only recently gained recognition

11 Background on Network Scale-up The idea came from Russ Bernard after the 1985 earthquake in Mexico City Official reports estimated deaths at around 7,000 These estimates did not jibe with anecdotal reports from residents, many of whom knew someone who died in the earthquake, or with opposition newspapers, which put the death toll as high as 22,000 A study was conducted asking a random selection of 400 residents how many people they knew who had died – 23 percent said they knew someone who had A model was created to estimate what their personal network size would have to be to account for this high percentage – this suggested a much higher death rate Later reports by the Red Cross established the deaths at more than 25,000

12 This suggests a primitive model t = the size of a population (e.g. Mexico City) e = the size of some subpopulation within it (e.g. all those who died in the earthquake) m = the number of people a respondent knows in e c = personal network size Assumption: Everyone’s network in a society reflects the distribution of subpopulations in that society
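Written as an equation, the assumption says that the share of a respondent's network that falls in the subpopulation equals the subpopulation's share of the whole population. The block below is a reconstruction from the symbol definitions on this slide and the formulas c = (m*t)/e and e = (m/c)*t used later in the presentation; it is not a formula preserved in this transcript.

```latex
\frac{m}{c} = \frac{e}{t}
\quad\Longrightarrow\quad
c = \frac{m\,t}{e},
\qquad
e = \frac{m}{c}\,t
```

The first rearrangement recovers network size when e is known; the second recovers the subpopulation size when c is known.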

13 New model We can use reports about many populations of known size to back-estimate personal network size c Given an individual's c and their report m about an unknown population, we can then back-estimate e

14 What we did We conducted a series of telephone surveys For each respondent we asked how many people they knew in populations of known size We also asked how many people they knew in populations of unknown size with estimates from other sources We estimated the distribution of c and back-estimated e for each unknown subpopulation 1998 P. D. Killworth, E. C. Johnsen, C. McCarty, G. A. Shelley, and H. R. Bernard. A Social Network Approach to Estimating Seroprevalence in the United States. Social Networks 20:23–50. 1998 P. D. Killworth, C. McCarty, H. R. Bernard, G. A. Shelley, and E. C. Johnsen. Estimation of Seroprevalence, Rape and Homelessness in the U.S. Using a Social Network Approach. Evaluation Review 22:289–308.

15 Populations in survey
Known Population                               Average known
Native Americans                               3.5
Gave birth in past 12 months                   3.6
Women who adopted a child in past 12 months    0.3
Widow(er) under 65 years old                   3.2
On kidney dialysis                             0.6
Postal worker                                  2.2
Commercial pilot                               0.7
Member of Jaycees                              1.1
Diabetic                                       3.3
Opened a business in past 12 months            1.1
Have a twin brother or sister                  2.0
Licensed gun dealer                            0.5
Known Population                               Average known
Michael                                        4.8
Christina                                      1.3
Christopher                                    1.8
Jacqueline                                     0.7
James                                          3.4
Jennifer                                       2.3
Anthony                                        1.7
Kimberly                                       1.4
Robert                                         4.1
Stephanie                                      1.3
David                                          3.5
Nicole                                         1.1
Unknown Population                             Average known
HIV positive                                   0.7
Women raped in past 12 months                  0.2
Homeless                                       0.7

16 Estimates of unknowns RDD telephone survey of 1554 adults in the U.S. in 1994. – Seroprevalence: 800,000 ± 43,000 – Homeless: 526,000 ± 35,000 – Women raped in the last 12 months: 194,000 ± 21,000 These were all close to other estimates made with various enumeration or surveillance methods. 1998 P. D. Killworth, C. McCarty, H. R. Bernard, G. A. Shelley, and E. C. Johnsen. Estimation of Seroprevalence, Rape and Homelessness in the U.S. Using a Social Network Approach. Evaluation Review 22:289–308.

17 Estimates of c are reliable across multiple surveys Across seven surveys, we consistently find an average network size (c) of 290 (sd 232, median 231). And 290 is not an average of averages. It’s a repeated finding.

18 Is 290 an artifact of the method? We tested this in three ways: 1. Make the estimates using a different method. 2. Experiment with parameters and see if the outcome varies in expected ways. 3. Compare values of c across populations of known relative sizes.

19 Reliability I: Compare to a different method In one survey, we estimated c by asking people how many people they know in each of 16 relation categories and summing. We also used the known populations The summation method produced a mean c of 290.7, while the known population method produced a mean c of 290.8 McCarty, C., P. D. Killworth, H. R. Bernard, E. Johnsen, and G. A. Shelley. Comparing Two Methods for Estimating Network Size. Human Organization 60:38–39
Category                                       Average known
Immediate family                               3.5
Other birth family                             24.0
Family of spouse or significant other          12.3
Co-workers                                     35.6
People at work but don’t work with directly    62.1
Best friends/confidantes                       4.3
People known through hobbies/recreation        12.3
People from religious organization             43.4
People from other organization                 17.1
School relations                               18.3
Neighbors                                      12.8
Just friends                                   22.6
People known through others                    22.6
Childhood relations                            6.8
People who provide a service                   7.7
Other                                          3.9

20 Reliability II: Change the data We changed reported values at or above 5 to a value of 5 precisely. – The mean dropped to 206, a change of 29%. We set values of at least 5 to a uniformly distributed random value between 5 and 15. We repeated the random change (5 – 15), but only for large subpopulations (with >1 million). – The mean increased to 402, a change of 38% -- in the opposite direction.

21 Reliability III: Survey a population with an expected large network size We surveyed a national sample of 159 members of the clergy – people who are widely thought to have large networks. Mean c = 598 for the scale-up method Mean c = 948 for the summation method

22 So, 290 is not a coincidence 1. Two different methods of counting produce the same result. 2. Changing the data produces large changes in the results, and in the expected directions. 3. People who are widely thought to have large networks do have large networks.

23 Can we predict what we do know? We can test our model by seeing how well we do on the 29 populations of known size The overall result is encouraging, but we don’t estimate some populations well There is a tendency for people to overestimate small populations and underestimate large ones (over 3 million) The two largest populations are people who have a twin and diabetics, the two outliers in the upper left Without these two outliers, the correlation rises from r = .79 to r = .94

24 Another encouraging result Charles Kadushin ran a national survey to estimate the prevalence of crimes in 14 cities, large and small, in the U.S. He asked 17,000 people to report the number of people they knew who had been victims of six kinds of crime and the number of people they knew who used heroin regularly. 2006 C. Kadushin, P. D. Killworth, H. Russell Bernard, and A. Beveridge. Scale-up methods as applied to estimates of heroin use. Journal of Drug Issues 36:417-440.

25 Compromising assumptions Barrier Error – Everyone in t has an equal chance of knowing someone in e. Transmission Error – Everyone knows everything about everyone they know. Inaccurate recall – People don’t recall accurately the number of people they know in the subpopulations we ask them about.

26 Barrier Error Correlation between the mean number of Native Americans known and the percent of the state population that is Native American is 0.58, p = 0.0001.

27 Network social barriers Race (African Americans may know more diabetics than White people do.) Gender (Men may know more gun dealers than women do.) Even first names are associated with the barrier effect. We address the barrier effect by using a random, nationally representative sample of respondents

28 Transmission Error Study We recruited 30 members of one of the known populations used in the network scale-up method. We randomly selected male and female first names proportionate to the 1990 U.S. Census. For each of 25 hits, the respondent provided some information about the alter, including the alter’s phone number. Total=30x25=750. We contacted 220 of 750 named alters and asked them things about themselves and about ego.

29 Findings from the study We see from this table that it is much easier to know people in some populations than in others. It is much easier to know that someone is a kidney dialysis patient than it is to know that they are a diabetic. Diabetes is much less visible.
Population                  % who knew   % who did not know   Respondents   # of alters
Am. Ind.                    100          0                    2             12
Diabetic                    55           45                   6             44
Birth in last 12 mos.       93           7                    3             27
Gun dealer                  92           8                    1             12
Member of JC’s              58           42                   1             12
Dialysis                    88           12                   5             26
Business in last 12 mos.    75           25                   4             16
Postal worker               100          0                    1             10
Has twin                    88           12                   2             24
Widowed <65                 97           3                    4             38

30 Can we account for these errors? Can we use this kind of information to tweak the model? We tried to develop weightings for classes of characteristics about subpopulations … classes like “things that carry a strong stigma” and “things that carry a moderate stigma” and “things that just don’t come up in conversation.” While we found some signals like these, we don’t know how to know whether two populations require the same weighting. Matt Salganik has recently completed a study in Brazil attempting to refine these weights

31 Informant Inaccuracy We tried procedures to improve accuracy 1. Asking respondents to provide names for all the knowns they nominated 2. Asking respondents to report on knowns twice 3. Asking respondents on a scale of 1 to 5 how confident they were in their answers None of these procedures changed the results much

32 Countries where Network Scale-up has been implemented United States Mexico Ukraine Moldova Peru Brazil Thailand

33 How to conduct a network scale-up survey

34 Network scale-up begins like most surveys Define respondent population Choose sample frame Choose survey mode Choose sample size Design questionnaire (This is the part that’s different)

35 Selecting respondent population Respondent population is not the same as the population to be estimated (target population) – U.S. respondents to estimate homeless population – Urban population to estimate heroin users You must know the size of the respondent population Do transmission and barrier errors suggest using a respondent population with more ties to target population? This opportunity to do this research in multiple countries could help solve this problem

36 Choose sample frame The sample frame represents the respondent population For our work we used random digit dial telephone numbers For face-to-face a general population survey may rely on census or voter registration data

37 Choosing mode There are five survey modes – Face-to-face – Telephone (this is what we used) – Mail – Drop and collect – Web There is a large literature on mode effects in surveys For the populations of interest to UNAIDS a face-to-face or mixed mode makes sense

38 Choose sample size As with any survey, the sample size should be based on expected margins of error For this survey we have margins of error associated with network size Although estimates of network size are remarkably reliable, they have large standard deviations Our data suggest that a survey of 400 respondents would generate a margin of error of ±26 alters A survey of 1,000 would generate a margin of error of ±16 alters Keep in mind these are based on variance for U.S. respondents
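The two margins quoted on this slide can be approximately reproduced with the usual normal-approximation formula for the margin of error of a mean. This is a sketch under assumptions: a 95% confidence level (z = 1.96) and the standard deviation of 264.4 that the presentation reports for network size estimated from known populations. Whether the authors computed their figures this way is not stated in the slides.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample mean."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * sd / math.sqrt(n)


# Assumed input: SD of network size from the known-population method
SD_NETWORK_SIZE = 264.4

for n in (400, 1000):
    print(n, round(margin_of_error(SD_NETWORK_SIZE, n)))
# 400 26
# 1000 16
```

Because the margin shrinks with the square root of n, going from 400 to 1,000 respondents cuts it by a little over a third rather than by more than half.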

39 Design questionnaire Network scale-up questionnaire has three parts 1. Demographics used to estimate bias 2. Questions to estimate the number of alters the respondent knows in the target population 3. Questions to estimate network size (c) Parts 2 and 3 require a boundary definition of who is counted as a network alter

40 Alter boundary Definition of who is an alter can have enormous effects on the estimate Defining the alter boundary as 12 months will generate different network sizes than a boundary of two years Our definition: – You know them and they know you by sight or by name. You have had some form of contact with them in the past two years and you could contact them if you had to – Question: Should respondents be instructed to exclude those met on networking sites such as Facebook?

41 There are two ways to estimate c Scaling from known populations The summation method

42 Using known populations Select a set of known populations, the more the better Populations should vary in size and type – Limiting the study to populations related to health conditions, although plentiful, may introduce barrier error – Using only large populations (such as men or people over age 65) introduces a lot of estimation error – Using only small populations introduces error from very few hits – Known populations should be within 0.1% to 4% of the population (this may change as we learn more) The demographic characteristics of the known populations should match as closely as possible the demographic characteristics of the population upon which the known estimates are based Populations are often related to transmission and barrier effects In the past we assumed that by using populations of multiple sizes and types these effects are cancelled out

43 Examples of populations we used In the U.S. there are a variety of sources for known populations: – The U.S. Statistical Abstract – The U.S. Census – The FBI Crime Statistics Ideally collection of sub-population data will be recurring so that they can be used in subsequent years It is important that the data all reflect the same year (be aware that some population data lags) Known populations are very susceptible to transmission and barrier error

44 Relationship between number known and demographic characteristics

45 We experimented with names Census provides estimates of both first names and last names We experimented with both types and found problems with each The advantage of names is that they vary in size and are typically ascribed Countries and cultures vary in the way they use names They are prone to barrier error

46 Relationship between number known and demographic characteristics

47 Summation method We can estimate network size (c) directly by asking respondents to tell us how many people they know This is an unreasonable task unless it is broken into reasonable subtasks We use culturally relevant categories of relation types that are mutually exclusive and exhaustive These are small enough that respondents can estimate them reliably

48 Relation categories we used Immediate family Other birth family Family of spouse or significant other Co-workers People at work but don't work with directly Best friends/confidantes People known through hobbies/recreation People from religious organization People from other organization School relations Neighbors Just friends People known through others Childhood relations People who provide a service Other

49 Developing a protocol for discovering summation categories We assume that relation categories used to elicit estimates will be culturally relative – Different languages will require their own category names – The way people maintain people in their mind will almost certainly vary by culture Further research is needed to determine the best protocol for discovering these categories Summation categories must be mutually exclusive, exhaustive and small enough that respondents count rather than estimate

50 Approaches we are studying Our current categories emerged from a previous study about the ways people know each other This is not ideally suited to this study We are exploring using cultural consensus analysis or personal network structure to quickly develop these categories An empirical approach is to start with very large culturally relevant categories and use alter characteristics to split them when they are too large

51 Estimates of network size from two methods (scaling from known and summation) are very close Scaling from known populations – 290.8 (SD 264.4) Summation method – 290.7 (SD 258.8) We checked in multiple ways to see whether this was an artifact of the method It wasn’t

52 Advantages of the summation method It is quicker, taking about half the time or less than estimating from known sub-populations It should not be subject to transmission or barrier error It does not require finding known populations, which could be a problem in some countries

53 Disadvantages of summation method It cannot be verified statistically It may be easy for respondents to double count network alters as they are multiplex relations (such as co-worker and social contact) Network size calculated from scaling known populations can be checked by back-estimating each known with the other knowns
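The check described in the last point, back-estimating each known population from the others, can be sketched as a leave-one-out loop. Everything in this example is hypothetical: the function name, the population labels, the sizes, and the two respondents' reports are made up for illustration and are constructed so that reports are exactly proportional to population size.

```python
def back_estimate_knowns(reports, known_sizes, total_population):
    """For each known population, estimate its size using c derived from the other knowns."""
    results = {}
    for held_out in known_sizes:
        others = [name for name in known_sizes if name != held_out]
        e_others = sum(known_sizes[name] for name in others)
        # Per-respondent network size from the remaining knowns: c = m * t / e
        c_values = [
            sum(r[name] for name in others) * total_population / e_others
            for r in reports
        ]
        mean_c = sum(c_values) / len(c_values)
        # Mean number known in the held-out population, scaled up: e = (m / c) * t
        mean_m = sum(r[held_out] for r in reports) / len(reports)
        results[held_out] = mean_m / mean_c * total_population
    return results


# Hypothetical data: total population of 1,000,000 and three known populations
known_sizes = {"pop_a": 10_000, "pop_b": 20_000, "pop_c": 5_000}
reports = [
    {"pop_a": 2, "pop_b": 4, "pop_c": 1},  # consistent with a network of about 200
    {"pop_a": 4, "pop_b": 8, "pop_c": 2},  # consistent with a network of about 400
]
estimates = back_estimate_knowns(reports, known_sizes, 1_000_000)
for name, value in estimates.items():
    print(name, round(value), known_sizes[name])
```

With real survey data the back-estimates will not match the known sizes exactly; the size and direction of the gaps are what make this check informative.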

54 Modeling issues At this point in our work we are convinced that our estimates of network size are relatively reliable, but not absolutely reliable If my network is 300 then I am confident it is half as large as that of someone with a network of size 600 I am not confident that the network size is actually 300 This compromises our ability to estimate the absolute size of a population Again, the opportunity to replicate this method may yield solutions

55 How to generate scale-up estimates There are two steps – Estimate network size c – Use c with respondents’ estimates of unknown populations to scale-up to the size of the unknown in the population We will look at these steps separately

56 Step 1: Estimating c using summation method With the summation method you add up the estimates from each relation category to get a c value for each respondent The c used in the formula will be the average of those c values across all respondents
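A minimal Python sketch of this step follows. The function name, the three category labels, and the counts are hypothetical; a real questionnaire would use the full set of relation categories.

```python
def summation_network_size(category_counts):
    """Network size for one respondent: the sum of their counts across relation categories."""
    return sum(category_counts.values())


# Hypothetical responses from two respondents, using three categories for brevity
respondents = [
    {"immediate family": 4, "co-workers": 30, "neighbors": 10},
    {"immediate family": 3, "co-workers": 50, "neighbors": 15},
]

c_values = [summation_network_size(r) for r in respondents]
mean_c = sum(c_values) / len(c_values)
print(c_values, mean_c)  # [44, 68] 56.0
```

The sum is only a valid network size if the categories are mutually exclusive; an alter counted under two categories inflates c.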

57 Step 1: Estimating c using known populations This procedure requires three parameters t = the size of the population to which you are scaling up (this is the same for each respondent) e = the sum of the sizes of all the known populations you are using in the survey (this is the same for each respondent) m = the sum, for one respondent, of the number of people they report knowing in each known population c for each respondent is (m*t)/e The c used in the formula will be the average of those c values across all respondents
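The calculation c = (m*t)/e can be sketched as follows. The function name, population labels, sizes, and reports are all hypothetical values chosen to keep the arithmetic easy to follow.

```python
def network_size_from_knowns(reported_known, known_sizes, total_population):
    """Per-respondent network size: c = (m * t) / e."""
    m = sum(reported_known[name] for name in known_sizes)  # total reported across knowns
    e = sum(known_sizes.values())                          # total size of the knowns
    return m * total_population / e


# Hypothetical inputs
t = 1_000_000
known_sizes = {"pop_a": 10_000, "pop_b": 20_000}
respondents = [
    {"pop_a": 3, "pop_b": 6},
    {"pop_a": 2, "pop_b": 4},
]

c_values = [network_size_from_knowns(r, known_sizes, t) for r in respondents]
mean_c = sum(c_values) / len(c_values)
print(c_values, mean_c)  # [300.0, 200.0] 250.0
```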

58 Step 2: Applying c This step also requires three parameters t = the size of the population to which you are scaling up (this is the same for each respondent) c = the average c value, either from the scale-up or the summation method m = the average of all respondents’ estimates of the number of people they know in the unknown subpopulation The formula to estimate the size of the unknown subpopulation is e = (m/c)*t
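A short Python sketch of the scale-up step is given here. The function name and the input values are hypothetical and are not the survey figures reported earlier in the presentation.

```python
def scale_up_estimate(mean_known_in_target, mean_network_size, total_population):
    """Size of the unknown subpopulation: e = (m / c) * t."""
    if mean_network_size <= 0:
        raise ValueError("mean network size must be positive")
    return mean_known_in_target / mean_network_size * total_population


# Hypothetical inputs: respondents know 0.5 target members on average,
# mean network size is 250, and the respondent population is 1,000,000
print(round(scale_up_estimate(0.5, 250, 1_000_000)))  # 2000
```

Because e is proportional to 1/c, any systematic error in the absolute value of c carries straight through to the population estimate, which is the concern raised under modeling issues.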

