Small Area Estimation (in survey research)

Knock! Knock!
Who's there? (without opening the door)
"The census taker."
"Go away - I don't want my senses taken."
"No, you don't understand, I just want to survey you."
"A statistical sample of one isn't valid -- go away."
"You aren't the only one."
"So you are bothering a whole bunch of people, go away."
"Look, you are unique and I don't want to miss you in the survey."
"How do you know I'm unique when you haven't surveyed me yet?"
"Ok, I don't know you are unique, but you might be."
"You mean you think I'm an oddball."
"No, maybe more like an outlier."
"Now you are calling me an out and out lier, go away."
"No, I mean you are far from the average Joe."
"I hope so, I'm Sally."
"Look Sally, we are trying to get population data, how many people live here?"
"Gosh, how would I know, I think there are about 15 thousand in Smugville."
"No, I mean in this house!"
"Oh, that's a question of a different nature."
"So, how many?"
"Sometimes one, sometimes two, sometimes four, now -- go away."
"No, I need a precise number."
"Ok, how about 1.34."
"How did you come up with that?"
"I live here sometimes during the week, my sister visits me on weekends, and my mother visits me every second week, my two cats are sometimes here, and my … and that's none of your business."
"Thanks, Sally, have a great day."
(census taker wrote -- "NO PERSONS LIVING HERE - UNOCCUPIED.")

What is SAE?
"Small area estimation is the collective term for several statistical techniques involving the estimation of parameters for small sub-populations, generally used when the sub-population of interest is included in a larger survey." - Wikipedia (the free encyclopedia)
Small area: a sub-population for which there is not enough sample to construct reliable estimates directly based on the survey sample
– small geographical area, such as LHA
– small domain, such as demographic subgroups
  • Area with small number of respondents – estimates with low precision (large standard error)
  • Area with no respondent – no estimate

Why SAE?
Growing demand for reliable small area statistics for policy analysis and planning purposes
– there is "increasing government concern with issues of distribution, equity and disparity"
– apportionment of government funds
– regional planning
Constraints of national surveys:
– not designed to produce reliable estimates at the small area level due to cost constraints
Limitation of administrative data sources:
– do not have the necessary information to provide the detailed statistics needed for small areas

How to do SAE?
"Borrow strength" from related or similar small areas through explicit or implicit models that connect the small areas via supplementary data
– combine data obtained from large scale surveys containing measures of interest with a set of covariates available for all small areas from other sources
Auxiliary information/covariates
– correlated with the measure of interest
– known for all small areas
– common source: census, administrative registries

SAE Methods
Simple approaches:
– Demographic methods
  • local estimation of population in postcensal years
  • latest census data + administrative registries (e.g., births, deaths)
– Synthetic estimation
  • derived from direct survey estimate of a large area
  • the small area is covered by the large area
  • assumption: the small areas have the same characteristics as the large area
  • potential bias
  • Indirect standardization
– Composite estimation
  • weighted average of the synthetic and survey direct estimates
  • balance the potential bias of a synthetic estimator and the instability of a direct estimator
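The composite estimator described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the function name, the numbers, and the way the weight is supplied are all assumptions. In practice the weight would be chosen from the relative reliability of the two estimates.

```python
from typing import Optional


def composite_estimate(direct: Optional[float], synthetic: float, weight: float) -> float:
    """Weighted average of a direct and a synthetic estimate for one small area.

    `weight` is the share given to the direct survey estimate
    (0 = all synthetic, 1 = all direct). Hypothetical helper for illustration.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    if direct is None:
        # Area with no respondents: there is no direct estimate,
        # so the synthetic estimate is all that is available.
        return synthetic
    return weight * direct + (1.0 - weight) * synthetic


# Illustrative numbers only: an unstable direct estimate of 30%
# pulled toward a synthetic estimate of 20%.
estimate = composite_estimate(direct=0.30, synthetic=0.20, weight=0.25)
print(round(estimate, 3))
```

A small weight leans on the synthetic estimate (more stable, possibly biased); a large weight leans on the direct estimate (unbiased under the survey design, but unstable when the area sample is small).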

SAE Methods
Multi-level modeling:
– using individual level covariates only
– combining individual and area-level covariates
– using area level covariates only
• the model-based SAE generated for a particular small area is the expected outcome for that area based on its characteristics as measured by the covariates.
• example of interpretation: given the characteristics of the local population we would expect approximately x% of adults within LHA X to smoke/be obese etc.
• enables us to provide information about the characteristics of all areas in the population, not just the sampled areas.

Indirect Standardization
Applying national (large area) direct survey estimates for each demographic class to area-level population counts to generate expected area estimates.
– intuitively appealing: the mean level of many variables in a population is highly related to the distribution of such demographic variables as age, sex and social class
– easy and inexpensive to apply
  • local level populations of demographic classes from the Census +
  • national estimates from survey
– assumes that the national rates for each subgroup apply uniformly across all areas, so that differences between areas are due solely to differences in their demographic composition
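The arithmetic of indirect standardization is short enough to show directly. The sketch below is illustrative only: the age classes, rates, and population counts are made-up numbers, and the function name is hypothetical.

```python
def indirect_standardization(national_rates, area_counts):
    """Expected rate for one small area.

    national_rates: survey rate for each demographic class (large area).
    area_counts:    census population count for each class in the small area.
    """
    total = sum(area_counts.values())
    if total == 0:
        raise ValueError("area has no population")
    # Expected number of cases if national class rates applied locally.
    expected_cases = sum(
        count * national_rates[group] for group, count in area_counts.items()
    )
    return expected_cases / total


# Made-up national smoking rates by age class and one area's census counts.
national_rates = {"18-44": 0.25, "45-64": 0.20, "65+": 0.10}
area_counts = {"18-44": 400, "45-64": 400, "65+": 200}

expected_rate = indirect_standardization(national_rates, area_counts)
print(round(expected_rate, 3))
```

Two areas with the same demographic mix always receive the same estimate under this approach, which is the uniformity assumption stated above.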

Models using individual level covariates only
Modeling the relationship between measure of interest and covariates on individual level based on survey data. Apply estimated model coefficients to covariates available as counts for all small areas (e.g. from the Census) to obtain expected area estimate for measure of interest.
Data requirement
– exact correspondence between the covariates used in the model and data available from the Census or other administrative data sources.
– restricts the choice of covariates in these models.
Within area clustering is ignored.

Models combining individual and area level covariates
Multi-level models incorporating random effects
– fixed effects of covariates + small area specific random effects
– taking into account the clustering within small area
  • suited to the clustered nature of social surveys
  • provides more accurate standard error estimates
– enabling exploration of the association of area differences with individual and area level characteristics
– stringent data requirements due to inclusion of individual level covariates

Models using area level covariates only
The model gives a constant predicted value for all individuals within an area - the predicted mean of the area.
– avoids the stringent data requirements
– relatively low cost
– a strong argument: controlling for differences in area level covariates is all that is needed for predicting area differences in the study variable.
– does not support subgroup estimates within each small area, such as gender-specific estimates
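As a rough illustration of an area-level model, the sketch below fits an ordinary least-squares line to the sampled areas and then predicts for every area, including one that was not sampled. It is a simplification under stated assumptions: a single covariate, no random effects, made-up data, and hypothetical names throughout.

```python
def fit_area_level_model(x, y):
    """Least-squares fit of y = intercept + slope * x across sampled areas."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Sampled areas: area -> (area-level covariate, direct survey estimate).
# Made-up values, e.g. share of low-income households vs. smoking rate.
sampled = {"A": (0.10, 0.15), "B": (0.20, 0.20), "C": (0.30, 0.25), "D": (0.40, 0.30)}

# The covariate is known for ALL areas, including unsampled area "E".
covariate_all_areas = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.40, "E": 0.50}

intercept, slope = fit_area_level_model(
    [x for x, _ in sampled.values()], [y for _, y in sampled.values()]
)

# One predicted mean per area; every individual in an area gets the same value.
predictions = {area: intercept + slope * x for area, x in covariate_all_areas.items()}
print({area: round(p, 3) for area, p in predictions.items()})
```

Because the prediction depends only on area-level covariates, it cannot be broken down into subgroup estimates within an area.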

Data requirements
– survey dataset: holds both the outcome variables (e.g. smoking status), as well as the individual level covariate data (e.g. age, sex, SEC).
– area-level covariate dataset: contains the estimation area level means for a set of covariates – usually census, administrative and registration data – along with the estimation area identifiers, and any higher-level area covariates and identifiers.
– analysis dataset: the survey and covariate datasets matched on estimation area identifier. The analysis dataset contains only the areas sampled in the survey. This dataset is used for modeling.
– implementation dataset: a dataset covering all areas (not just those sampled) to produce the final estimates. The implementation dataset will be at the lowest estimation area level, nested within higher-level geographic identifiers. This will allow the production of higher-level estimates by aggregating estimates for the component small areas.
– external validation dataset: relevant local and/or national surveys or other administrative sources to provide direct estimates of relevant outcomes to compare against the SAE.

Cautions
"Indirect estimators should be considered when better alternatives are not available, but only with appropriate caution and in conjunction with statistical research and evaluation efforts. Both producer and user must not forget that even after such efforts, indirect estimates may not be adequate for the intended purpose."

You Might Be a Statistician if...
– no one wants your job.
– you are right 95% of the time.
– you feel complete and sufficient.
– you found accountancy too exciting.
– you never have to say you are certain.
– you may not be normal but you are transformable.

Top ten reasons to be a statistician
1. Estimating parameters is easier than dealing with real life.
2. Statisticians are significant.
3. I always wanted to learn the entire Greek alphabet.
4. The probability a statistician major will get a job is >
5. If I flunk out I can always transfer to Engineering.
6. We do it with confidence, frequency, and variability.
7. You never have to be right - only close.
8. We're normal and everyone else is skewed.
9. The regression line looks better than the unemployment line.
10. No one knows what we do so we are always right.