Social Epidemiology and Dental Public Health. Dr. Seyed Ebrahim Jabarifar, Associate Professor, Department of Community Dentistry, Isfahan University of Medical Sciences. Date: 1388 / 2010.
Aims and Objectives. Aim: by the end of the session you should be able to choose an appropriate sampling framework for your research. Objectives: by the end of the session you should be able to: recognize the difference between a sample and a population, and between a survey and a census; understand and take account of sampling error and bias; recognize sources of bias (and avoid them); recognize and use stratification, and know why it is used and what for; identify the (dis)advantages of random sampling, cluster sampling, multi-stage sampling and non-random sampling; understand the concepts of design effects and sampling framework.
Samples and populations. Sample: a portion of the whole population, which should be representative of the population of interest. Population: the set of individuals, items, or data from which a statistical sample is taken. Survey and census: a survey covers a sample; a census covers the whole population. Target (reference) population: the one of interest, e.g. all universities in London, all students in London. Study population: the one from which we draw our sample (may be more limited), e.g. HSE wants to measure the health of the population of England, but does not sample people living in institutions.
Sampling example. We want to know what the favorite food in England is. We can't ask everybody in England (the population), so we decide to ask (a sample of) 100 people. The answers depend on the characteristics of those 100 people, e.g. of 100 children aged 5-15, 80% said chips; of 100 people asked whilst they were eating in an Italian restaurant, 50% said pizza; of 100 commuters asked at the tube station at 9 am, 75% said corn flakes. We want 100 people who are representative. Successful sampling depends on where, when and how you do it.
Equal probability of selection method (EPSEM) sampling. EPSEM: probability samples where each observation in the population has the same known probability of being selected into the sample. EPSEM samples have certain desirable properties, e.g. simple formulas for computing means, standard deviations, and so on can be applied to estimate the respective parameters in the population. Random sampling removes selection bias; a known probability of being chosen is needed for statistical estimation.
How not to select an EPSEM sample. Haphazardly (not the same as randomly in statistical terminology): knocking on some doors, going to some streets but not others. By judgment alone: what the investigator thinks is representative or interesting. By accessibility (convenience sampling and volunteer sampling): the first 100 people that you successfully stop on the street; all patients who attend your drop-in clinic; the first 200 people who respond to an email sent around UCL; the first 50 people who volunteer (e.g. for payment) to be in your study. Quota and purposive sampling (judgment and accessibility): a predetermined (using judgment) number of individuals from subgroups, e.g. age groups or sex, taken from those easily available. Snowball / chain: the first interviewee is asked to suggest other interviewees, and so on.
Pros and cons. Bias: you might avoid poor quality housing, tower blocks, badly lit streets. Attendees at a clinic may be different from non-attendees. The people that respond to an email / letter / call may have a particular point of view. There is no way of measuring how representative the sample is, and sampling error cannot be calculated. Judgment / snowball: may be useful for hard-to-find groups or individuals. Convenience: cheap; may be good for exploratory research. Quota and purposive: similar to stratified; sometimes used in pilot studies; often used in market research.
Sampling: simple random sampling. Draw up a list of all units, number them 1 to N, and randomly sample them using random number tables, calculators, the RAND command in SPSS, or RAND and RANDBETWEEN in Excel. Lottery: suitable for small and geographically compact populations. Often used as the final method, e.g. to select houses within streets. Excerpt from a random number table: 68948438401857754149 89115861029137211 5694212197898363335 279817314446965776
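The "number 1 to N and pick at random" step can be sketched in Python as an alternative to random number tables. This is a minimal illustration, not part of the original slides: the function name, the population size of 1000 and the sample size of 100 are assumptions chosen for the example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size.

    Every unit has the same chance of selection (sampling without replacement).
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical example: 100 units out of a numbered list of 1000.
srs_units = simple_random_sample(population_size=1000, sample_size=100, seed=42)
print(srs_units[:10])
```

Fixing the seed is only for reproducibility of the illustration; in a real survey the draw would be documented rather than repeated.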
Alternative types of random sampling: systematic; cluster; stratified; random walk; multi-stage (may use a combination of several methods); PPS sampling (probability proportional to size).
Systematic sampling: selecting every kth unit, e.g. clinical records from a drawer, houses along a street, phone numbers on a page, postcodes from PAF. Example: sample size n = 100, population size 1000, so a 1 in 10 sample (100/1000) with sampling interval k = 10. Start randomly between 1 and 10, e.g. RAND gives a start of 5; then select units 5, 5+10=15, 15+10=25, 25+10=35, ... Not to be used if there is a cyclic pattern to the units on the list.
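As a rough sketch of the same procedure in Python (the function name and the assumption that the population size is a multiple of the sample size are illustrative, not from the slides): the interval is k = N / n, the start is random between 1 and k, and every kth unit after the start is taken.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    rng = random.Random(seed)
    k = population_size // sample_size   # sampling interval, e.g. 1000 // 100 = 10
    start = rng.randint(1, k)            # random start, inclusive of both ends
    return [start + i * k for i in range(sample_size)]

# Hypothetical example matching the slide: n = 100 from N = 1000.
systematic_units = systematic_sample(population_size=1000, sample_size=100, seed=1)
print(systematic_units[:5])
```

Only the start is random; once it is chosen the rest of the sample is fixed, which is why a cyclic pattern in the list can bias the result.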
Stratified random sampling: independent samples are drawn for each geographical or other region (the strata). A random sample is taken from each stratum; sampling schemes may be different in each stratum, e.g. urban / rural. E.g. SHS: 10,000 people from Scotland (population 5 million).
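A minimal sketch of stratified sampling in Python, assuming two made-up strata (urban and rural) with invented unit labels and sample sizes; none of these numbers come from the slides. Each stratum is sampled independently by simple random sampling.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> number of units to draw from it
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata: 800 urban units and 200 rural units.
strata = {
    "urban": [f"U{i}" for i in range(1, 801)],
    "rural": [f"R{i}" for i in range(1, 201)],
}
# Proportional allocation (10% of each stratum); other allocations are possible.
sizes = {"urban": 80, "rural": 20}
stratified_units = stratified_sample(strata, sizes, seed=7)
print({name: len(units) for name, units in stratified_units.items()})
```

If the sampling fraction differs between strata, the selection probabilities are no longer equal and weights would be needed in the analysis.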
Sampling framework. Depending on your study design, the sampling framework may or may not have been determined beforehand. Secondary analysis: analysis of previously collected data; a sampling framework will already exist. You may need to take this into account in the statistical analysis, e.g. the use of weights and stratifying variables. Primary analysis: if you design your data collection / collect your own data, you will need to design your sampling framework.
Sample design: the scheme for selecting the sampling units. Sampling unit: individual, household, university, etc. Sampling frame: the list of units in the population, e.g. adults in England from the electoral register, the list of patients registered at a GP, small areas in Wales from the census, villages in India, postcodes from PAF. Pitfalls: missing / under-represented groups, e.g. immigrants and homeless people; harder in developing countries where there is no (reliable) census.
Sampling error. The results from any sample are very unlikely to be equal to those from the whole population. Taking any sample will lead to two types of error: error = random error (chance) + systematic error (bias). Increasing the sample size reduces random error: n = 1000 is much better than n = 100. Larger samples can have more or less systematic error (they can still be biased).
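To give a feel for how random error shrinks with sample size, the sketch below uses the standard error of a proportion under simple random sampling, SE = sqrt(p(1-p)/n). The proportion p = 0.5 is an assumed value for illustration. The formula describes random error only; it says nothing about bias.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # assumed true proportion, chosen only for illustration
for n in (100, 1000):
    se = standard_error_of_proportion(p, n)
    # 1.96 * SE is the approximate half-width of a 95% confidence interval
    print(f"n={n}: SE={se:.4f}, approx. 95% margin=+/-{1.96 * se:.4f}")
```

Because the standard error falls with the square root of n, a tenfold increase in sample size cuts random error by a factor of about 3.2, not 10.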
Other sources of error. Non-response: not everyone responds to survey questions, and those who are unable or unwilling to do so may differ from those that respond. What are the limitations of surveys carried out at people's private homes during the daytime? Measurement error: when the question is worded badly, or the interviewer asks it in a certain way (e.g. using emphasis, or (inappropriate) probing or follow-up). E.g. "Are you really sure...?" "Are any of the girls in your class naughty, or is it just the boys?"
Cluster sampling. Units sampled are chosen in clusters, close to each other, e.g. households in the same street. The population is divided into clusters, and some of these are then chosen at random. Within each cluster, units are then chosen by simple random sampling or some other method. Ideally the clusters chosen should be dissimilar, so that the sample is as representative of the population as possible.
Two stage sampling. Used if there is no sampling frame available (e.g. in developing countries), or if face-to-face interviews are required and the population is geographically dispersed. Draw up a list of primary sampling units (PSUs): natural groups / clusters, e.g. streets. Select a random sample of clusters. Draw up a sampling frame of second stage units, e.g. households or individuals. Select a random sample of second stage units. Advantages: reduced cost and reduced work. Disadvantages: less precise, but may allow for a larger sample size by saving time and money.
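The two stages can be sketched in Python as follows. The streets, house labels and sample sizes are invented for the example, and the function name is hypothetical; the point is only that a frame of second stage units is needed for the selected clusters alone.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: random sample of clusters (PSUs).
    Stage 2: random sample of units within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical frame: 20 streets (PSUs), each with 30 houses (second stage units).
streets = {
    f"street_{s}": [f"street_{s}/house_{h}" for h in range(1, 31)]
    for s in range(1, 21)
}
two_stage_units = two_stage_sample(streets, n_clusters=5, n_per_cluster=6, seed=3)
for street, houses in two_stage_units.items():
    print(street, houses)
```

In this example all streets have the same number of houses, so every house has the same overall chance of selection; with unequal cluster sizes that would no longer hold.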
Multi stage sampling. Two stage is the simplest form. Three stage example, a patient satisfaction survey: (1) selection of a sample of health authorities (primary sampling unit, PSU); (2) selection of a number of GP practices within each HA (sampled using probability proportional to size, PPS); (3) systematic selection of patients from each GP register.
PPS sampling. Create a list of clusters, e.g. health authorities (in no particular order), with cumulative population size. Select a systematic sample (random start). Sampling interval = total cumulative population size ÷ the number of clusters to be selected. Select units within the cluster list based on random start + sampling interval. Large clusters may be selected more than once. Will be covered in the advanced statistics course.
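A minimal sketch of systematic PPS selection in Python, assuming the sampling interval is the total population divided by the number of clusters to be selected. The health authority names and population sizes are made up for illustration, and the function name is hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n_select, seed=None):
    """Select n_select clusters with probability proportional to size.

    sizes: dict mapping cluster name -> population size (listed in any order)
    """
    rng = random.Random(seed)
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] // n_select      # sampling interval
    start = rng.randint(1, interval)           # random start within first interval
    points = [start + i * interval for i in range(n_select)]
    # Each point falls inside the cumulative range of exactly one cluster.
    return [names[bisect_left(cumulative, point)] for point in points]

# Hypothetical health authorities with invented population sizes (total 500,000).
ha_sizes = {"HA_A": 50_000, "HA_B": 120_000, "HA_C": 30_000,
            "HA_D": 200_000, "HA_E": 100_000}
pps_selected = pps_systematic(ha_sizes, n_select=4, seed=11)
print(pps_selected)
```

Here the interval is 125,000, so a cluster larger than the interval (HA_D in this made-up list) is certain to be hit at least once and can appear twice in the selection.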
Implications of sampling design for analysis. Need to use weights if not using EPSEM. If not using simple random sampling, the sample design effects, or DEFTs, impact on the analysis. DEFTs need to be taken into account when determining sample size. DEFTs are related to standard errors.
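One common way to use these quantities in sample size planning is sketched below. It assumes the usual convention that DEFT is the ratio of the standard error under the actual design to the standard error under simple random sampling, so that the variance inflation (often written DEFF) is DEFT squared. The DEFT of 1.5 and the sample sizes are invented for illustration.

```python
import math

def design_effect(deft):
    """Variance inflation factor (DEFF) implied by a design factor (DEFT)."""
    return deft ** 2

def effective_sample_size(n, deft):
    """Size of a simple random sample giving the same precision as n under the design."""
    return n / design_effect(deft)

def required_sample_size(n_srs, deft):
    """Inflate a simple-random-sampling sample size to allow for the design."""
    return math.ceil(n_srs * design_effect(deft))

deft = 1.5    # assumed design factor, for illustration only
n_srs = 400   # assumed sample size needed under simple random sampling
print(required_sample_size(n_srs, deft))    # sample size needed under the complex design
print(effective_sample_size(900, deft))     # what 900 clustered interviews are "worth"
```

In practice DEFT varies between variables within the same survey, so a planning value is usually taken from earlier surveys with a similar design.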