List frames, area frames and administrative data: are they complementary or in competition? Elisabetta Carfagna, University of Bologna, Department of Statistics.

Presentation transcript:

List frames, area frames and administrative data: are they complementary or in competition?
Elisabetta Carfagna
University of Bologna, Department of Statistics
via Belle Arti, Bologna

Many different data on agriculture are available in the various countries of the world
Administrative data are common almost everywhere
In some countries, a specific data collection, based on list frames, area frames or both, is carried out to produce agricultural statistics
Rationalization is felt as a strong need:
– the various data are not comparable
– maintaining different data acquisition systems is very expensive
Analysis of the risks, advantages, disadvantages and requirements of using administrative data for statistical purposes
Proposal of some methods for combining list frames, area frames and administrative data to produce accurate agricultural statistics

Administrative data
Definitions, coverage and quality depend on administrative requirements
Acquisition is regulated by law: the data have to be collected whatever their cost
Their costs are very difficult to calculate
Administrative data relevant for agricultural statistics: taxation, social insurance and subsidies
Traditionally used for updating a list for sample surveys
Three factors suggest using administrative data more extensively, and even producing statistics through direct tabulation:
– increased ability to handle large sets of data
– capacity of some administrative departments to collect data through the web
– budget constraints

Administrative data versus sample surveys
Register: a complete list of the objects belonging to a defined object set, with identification variables that allow the register itself to be updated
– huge amount of data collected
– sometimes a purposive sample is controlled in order to apply sanctions
A statistical system based on a register allows:
– saving money
– reducing response burden
– producing figures for very detailed domains
– estimating transitions over time
Sample survey: the population is identified and decisions are taken about parameters and levels of accuracy, taking budget constraints into account
– much care devoted to data collection and quality control
– efficient sample designs for reducing sampling errors

Disadvantages of direct use of administrative data
The data have already been collected, so the information acquired is not exactly the information needed
The data are collected for purposes relevant to the respondent
– coverage problems: often the objects in the registers are partly statistical units of the population and partly something else
– study in Sweden: only 79% of farms have a one-to-one match with the IACS register (created for European agricultural subsidies), 6.4% have a one-to-many or many-to-many match and 14.6% have no match
– incompleteness of data inflates the risk of bias for some crops (in Sweden about 20%)
Unclear dynamics can be generated by controls
Comparability over time is affected by changes in the coverage level

Errors in administrative data
Direct tabulation is suggested if the sum of the values presented by all objects in the register is an unbiased estimator of the total of a variable
In practice the estimator is applied to data affected by errors
E.g. IACS declarations for a crop c are affected by:
– commission errors (some parcels declared as covered by crop c are covered by another crop, or their surface is inflated)
– omission errors (some parcels covered by crop c are not included in IACS declarations, or their declared surface is less than the true one)
If commission and omission errors compensate, the sum of declarations for crop c is an unbiased estimator of the total surface
IACS purposive sampling for detecting irregularities (2003, Italy, national level): durum wheat error of 3.5% of the controlled surface
Commission errors: 7.8% of the sum of declarations in Puglia and 8.4% in Sicily
Omission errors: 13.9% of the ITA Consorzio estimate in Puglia and 23.3% in Sicily
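
The compensation condition on this slide can be illustrated with simple arithmetic. The sketch below is only an illustration: it assumes that the sum of declarations equals the true surface plus the commission error minus the omission error, all measured in the same unit. The function names and the hectare values are hypothetical and are not figures from the study.

```python
def declared_total(true_area, commission, omission):
    """Sum of declarations for a crop: the true area, plus the area wrongly
    declared (commission), minus the area left undeclared (omission)."""
    return true_area + commission - omission


def relative_bias(true_area, commission, omission):
    """Relative bias of direct tabulation with respect to the true area."""
    return (declared_total(true_area, commission, omission) - true_area) / true_area


# Hypothetical values in hectares, chosen only for illustration.
true_area = 1000.0
balanced = relative_bias(true_area, commission=80.0, omission=80.0)
unbalanced = relative_bias(true_area, commission=80.0, omission=140.0)
print(f"errors that compensate:        {balanced:+.1%}")
print(f"errors that do not compensate: {unbalanced:+.1%}")
```

When the two errors are equal the bias is zero; as soon as omission exceeds commission, direct tabulation underestimates the surface.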

Alternatives to direct tabulation
One procedure for reducing the risk of bias due to under-coverage of registers and for avoiding double data acquisition is the following:
– sample farms from a complete and updated list and perform record linkage with the register to capture the data corresponding to the farms selected from the list
– if the register is unreliable for some variables, the related data have to be collected through interviews, as do data not found in the register because of record linkage difficulties
Combined use of various registers:
– improves the coverage of the population and data quality
– allows the socio-economic situation of rural households to be described
– does not solve all problems due to under-coverage and incorrect declarations
The statistical methodological work to be done is very heavy
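
The sample-then-link procedure might be sketched as follows. This is a minimal illustration, assuming simple random sampling from the list and exact matching on a shared farm identifier; real record linkage is usually much harder. The function name, the identifiers and the `wheat_ha` field are hypothetical.

```python
import random


def link_sample_to_register(list_frame, register, sample_size, seed=0):
    """Draw a simple random sample of farm ids from the list frame and look
    each one up in the register (exact match on the id, an assumption).

    Returns (linked, to_interview): register records found for sampled
    farms, and sampled farms that must be interviewed instead."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(list_frame), sample_size)
    linked = {}
    to_interview = []
    for farm_id in sample:
        record = register.get(farm_id)
        if record is None:
            to_interview.append(farm_id)  # no register data: collect by interview
        else:
            linked[farm_id] = record
    return linked, to_interview


# Hypothetical list frame and register, for illustration only.
list_frame = ["F01", "F02", "F03", "F04", "F05", "F06"]
register = {"F01": {"wheat_ha": 12.0}, "F03": {"wheat_ha": 7.5}, "F04": {"wheat_ha": 20.0}}

linked, to_interview = link_sample_to_register(list_frame, register, sample_size=4)
print("taken from register:", sorted(linked))
print("to be interviewed:  ", sorted(to_interview))
```

Because the sample comes from the complete list and not from the register, farms missing from the register still have a chance of selection, which is what limits the under-coverage bias.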

Calibration estimators
Probabilistic sample survey whose efficiency is improved by using register data as an auxiliary variable in calibration estimators
Improved efficiency makes it possible to reach the same precision while reducing sample size, survey costs and response burden
AGRIT 2000, IACS data as auxiliary variable in a regression estimator: CV reduced from 4.8% to 1.3% in Puglia and from 5.9% to 3.0% in Sicily (Landsat TM data reduced the CVs to 2.7% and 5.6%)
Advantages:
– register data included in the estimation procedure
– reduction of sample size, survey costs and respondent burden
– if the frame is complete and without duplications, no under-coverage
– data are collected for purely statistical purposes
Disadvantages:
– costs and respondent burden higher than in direct tabulation
– difficulty in producing reliable estimates for small domains
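
A regression estimator of the kind mentioned for AGRIT can be sketched as below. This is a textbook version under simple random sampling, which is an assumption: the actual AGRIT design and estimator are not described on the slide. Here `y` holds the survey values, `x` the register values for the same sampled units, and the register total `x_pop_total` is assumed known for the whole population; all names and numbers are hypothetical.

```python
from statistics import mean


def regression_estimate_total(y, x, x_pop_total, pop_size):
    """Regression estimator of a population total under simple random sampling.

    The sample mean of y is adjusted by the gap between the known population
    mean of the auxiliary variable x and its sample mean."""
    n = len(y)
    y_bar, x_bar = mean(y), mean(x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    slope = s_xy / s_xx  # least-squares slope of y on x
    x_pop_mean = x_pop_total / pop_size
    return pop_size * (y_bar + slope * (x_pop_mean - x_bar))


# Hypothetical sample: declared areas (x) and surveyed areas (y) in hectares.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(regression_estimate_total(y, x, x_pop_total=30.0, pop_size=10))
```

The stronger the linear relation between the register variable and the survey variable, the larger the variance reduction, which is consistent with the drop in CV reported on the slide.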

Combined use of different frames
When there are various incomplete registers and the information in their records is not reliable enough to be used directly for statistics, a sample survey has to be designed to collect information through interviews
Multiple frames approach:
– the registers are treated as multiple incomplete lists from which separate samples can be selected
– a two-stage estimator combines estimates calculated on non-overlapping sample units belonging to the different frames with estimates calculated on overlapping sample units
– it does not require record matching of the listing units of the different lists
– some two-stage estimators need identification of identical units only in the overlap samples; others have been developed for cases in which these units cannot be identified
Completeness assumption: every unit in the population of interest should belong to at least one of the frames
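
The idea of combining non-overlap and overlap estimates can be illustrated with a classical dual-frame estimator of the Hartley type. This is a stand-in chosen for simplicity and is not necessarily the estimator meant on the slide. It assumes two frames A and B, simple random sampling in each, and that every sampled unit can be classified as belonging to one frame only or to both; the mixing weight `p` and all names and values are hypothetical.

```python
def expanded_total(values, frame_size, sample_size):
    """Expansion estimate of a domain total from a simple random sample:
    each sampled unit represents frame_size / sample_size frame units."""
    return frame_size / sample_size * sum(values)


def dual_frame_total(only_a, overlap_from_a, n_a, size_a,
                     only_b, overlap_from_b, n_b, size_b, p=0.5):
    """Dual-frame estimate of a total.

    only_a / only_b: values of sampled units found in a single frame.
    overlap_from_a / overlap_from_b: values of sampled units found in both
    frames, as observed in the sample from A and from B respectively.
    The overlap domain is estimated twice and the two estimates are
    averaged with weights p and 1 - p."""
    y_a = expanded_total(only_a, size_a, n_a)
    y_ab_from_a = expanded_total(overlap_from_a, size_a, n_a)
    y_b = expanded_total(only_b, size_b, n_b)
    y_ab_from_b = expanded_total(overlap_from_b, size_b, n_b)
    return y_a + p * y_ab_from_a + (1 - p) * y_ab_from_b + y_b


# Hypothetical samples: 4 units from frame A (40 units), 3 from frame B (30 units).
total = dual_frame_total(only_a=[10.0, 20.0], overlap_from_a=[30.0, 40.0], n_a=4, size_a=40,
                         only_b=[5.0], overlap_from_b=[35.0, 35.0], n_b=3, size_b=30)
print(total)
```

Only the sampled units need to be classified by domain; the full lists are never matched record by record.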

Area frames
When completeness is not guaranteed by the combined use of different registers, an area frame should be adopted to avoid bias, since an area frame is always complete and remains useful for a long time
The completeness of area frames suggests their use in many cases:
– no other complete frame is available
– the existing list of sampling units changes very rapidly
– an existing frame is out of date
– the existing frame was obtained from a census with low coverage
– a multiple-purpose frame is needed for estimating many different variables (agricultural, environmental etc.)
Area frames allow objective estimates of characteristics that can be observed on the ground, without interviews
The materials used for the survey and the information collected help to reduce non-sampling errors in interviews and are a good basis for data imputation for non-respondents
Area sample survey materials are becoming cheaper and more accurate

Combining a list and an area frame
Disadvantages of area frames:
– cost of implementing the survey program
– need for many cartographic materials
– sensitivity to outliers and instability of estimates
– if the survey is conducted through interviews and respondents live far from the selected area unit, their identification may be difficult and expensive, and missing data tend to be relevant
Multiple frame sample survey design for avoiding instability of estimates and improving their precision:
– a list of very large operators and of operators that produce rare items
– if this list is short, it is generally easy to construct and update
– identification of the area sample units included in the list frame is needed to avoid upward bias of the estimates
– sample units belonging to the list and not to the area frame do not exist, and the intersection domain has the size of the list
– the approach is convenient if the list contains units with large values and the survey cost in the list is much lower than in the area frame
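
The need to identify area sample units that also appear on the list can be shown with a screening-type estimator: the list sample estimates the total of the listed operators, and the area sample contributes only the farms that are not on the list. This is a minimal sketch under simplifying assumptions (simple random sampling from the list, a single expansion factor for the area sample); the function name, identifiers and values are hypothetical.

```python
def screening_estimate_total(list_sample, list_size, area_sample, area_weight, list_ids):
    """List-plus-area estimate of a total.

    list_sample: {farm_id: value} for farms sampled from the list frame.
    area_sample: {farm_id: value} for farms found in the sampled area units.
    list_ids: ids of all farms on the list, used to screen the area sample."""
    list_total = list_size / len(list_sample) * sum(list_sample.values())
    non_overlap_total = area_weight * sum(
        value for farm_id, value in area_sample.items() if farm_id not in list_ids
    )
    return list_total + non_overlap_total


# Hypothetical data: a short list of large farms and a small area sample.
list_ids = {"L1", "L2", "L3", "L4"}
list_sample = {"L1": 500.0, "L3": 700.0}
area_sample = {"L2": 600.0, "A7": 10.0, "A9": 30.0}  # L2 is also on the list

screened = screening_estimate_total(list_sample, 4, area_sample, 50.0, list_ids)
unscreened = screening_estimate_total(list_sample, 4, area_sample, 50.0, set())
print(screened, unscreened)
```

Without screening, the large listed farm found in the area sample is counted in both components, which produces the upward bias mentioned above.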

Conclusions 1
Increased ability to handle large sets of data, the capacity of some administrative departments to collect data through the web, and budget constraints suggest using administrative data more extensively and even producing statistics through direct tabulation:
– reducing response burden
– producing figures for very detailed domains
– allowing estimation of transitions over time
However, there are problems for producing statistics: definitions, coverage, information acquired, aims of data collection and quality controls
Combined use of registers improves coverage and data quality and allows socio-economic conditions to be described:
– good identification variables and a sophisticated record linkage system are needed
– heavy statistical methodological work has to be done
– effect of imperfect matching

Conclusions 2
Sampling farms from a complete and updated list and performing record linkage with the register for capturing data
Probabilistic sample survey whose efficiency is improved by using register data as an auxiliary variable in calibration estimators
– improved efficiency makes it possible to reach the same precision while reducing sample size, survey costs and response burden
– register data included in the estimation process
– reduction of sample size, survey costs and respondent burden
– if the frame is complete and without duplications, no under-coverage
– data are collected for purely statistical purposes
Multiple frame approach
– does not require record matching of the listing units of the different lists
– when completeness is not guaranteed by the different registers, an area frame allows bias to be avoided
– a multiple frame sample survey design avoids the instability of estimates based on an area frame and improves their precision