Towards a Process Oriented View on Statistical Data Quality
Michaela Denk, Wilfried Grossmann

Presentation transcript:

Contents
- Approaches Towards Data Quality
- Example Data Integration
- A Generic Statistical Workflow Model
- Quality Assessment
- Conclusions

Approaches Towards Data Quality
- The usual approach to data quality is the Reporting View
- Define a number of so-called quality dimensions and evaluate the final product against criteria for these dimensions
  - Some frequently used dimensions: Accuracy, Relevance, Accessibility, Timeliness, Coherence, Comparability, ...

Approaches Towards Data Quality
- These dimensions are often broken down into sub-dimensions
  - Example, Accuracy: Sampling Effects, Representativity, Over-Coverage, Under-Coverage, Missing Values, Imputation Error, ...
- Such an approach is fine as long as data production follows a predefined scheme with limited degrees of freedom

Approaches Towards Data Quality
- If there are several different options for data production, such an approach is probably not the best one
- Compare the ideas of Total Quality Management (TQM) in industrial production: systematic treatment of the influence of the different production steps on the quality of the final product
- We need a Processing View on data quality: How is data quality influenced by production?

Approaches Towards Data Quality
- How can we arrive at a Processing View on data quality?
- We need a statistical workflow model
- We have to organize the processing information necessary for quality assessment in an appropriate way
  - Compare the (old) ideas of B. Sundgren about the capture of metadata

Approaches Towards Data Quality
- We have to know functions for assessing quality
  - Output_Quality = F(Input_Quality, Processing_Quality)
- Such functions have to be applied according to
  - The object we are interested in, e.g. a variable, a population or a classification
  - The quality aspect we are interested in
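The relation Output_Quality = F(Input_Quality, Processing_Quality) is stated on the slide without a concrete form for F. As a minimal sketch, one possible F is shown below. It rests on assumptions that are not in the presentation: quality is expressed as the probability that a value is correct, the error sources are independent, and a processing step never repairs a wrong input. The names `output_quality` and `chain_quality` are hypothetical.

```python
def output_quality(input_quality: float, processing_quality: float) -> float:
    """Toy instance of Output_Quality = F(Input_Quality, Processing_Quality).

    Assumption (not from the slides): both arguments are probabilities of a
    value being correct, errors are independent, and processing never
    repairs a wrong input value. Under these assumptions F is a product.
    """
    for q in (input_quality, processing_quality):
        if not 0.0 <= q <= 1.0:
            raise ValueError("quality must lie in [0, 1]")
    return input_quality * processing_quality


def chain_quality(input_quality: float, step_qualities) -> float:
    """Apply the toy F once per processing step, in order."""
    quality = input_quality
    for step_quality in step_qualities:
        quality = output_quality(quality, step_quality)
    return quality


if __name__ == "__main__":
    # Input that is 98% correct, passed through two imperfect steps.
    print(chain_quality(0.98, [0.95, 0.99]))
```

In practice F would depend on the object (variable, population, classification) and on the quality aspect, as the slide points out, so a single product rule is only a starting point.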

Example Data Integration
- Data integration occurs frequently in statistical data production, in particular when data are produced from administrative sources
- It uses a number of operations usually understood as data pre-processing
- Basic goal: combine information from two or more already existing data sets

Example Data Integration
- Example of a data integration dataflow
[Figure: dataflow Input → Integration → Post-alignment]

Example Data Integration
- Top level task description
- Match the datasets according to the matching key
- Align V1 (gender)
- Align V2 (status)
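The three top-level tasks above can be sketched in a few lines. The alignment rules used here are assumptions chosen for illustration, not the authors' method: unmatched records are dropped, V1 (gender) is kept only when both sources agree and is flagged otherwise, and V2 (status) is taken from the first source with a fallback to the second. The function name `integrate` and the field names `id`, `gender` and `status` are hypothetical.

```python
def integrate(source_a, source_b, key="id"):
    """Match two lists of records on `key`, then align V1 and V2."""
    index_b = {record[key]: record for record in source_b}
    merged = []
    for rec_a in source_a:
        rec_b = index_b.get(rec_a[key])
        if rec_b is None:
            continue  # assumption: unmatched records are dropped
        gender_a, gender_b = rec_a.get("gender"), rec_b.get("gender")
        status = rec_a.get("status")
        if status is None:
            status = rec_b.get("status")  # assumption: fall back to source B
        merged.append({
            key: rec_a[key],
            # V1: keep the value only if both sources agree, else flag it
            "gender": gender_a if gender_a == gender_b else None,
            "gender_conflict": gender_a != gender_b,
            # V2: value from source A, or from source B if A has none
            "status": status,
        })
    return merged


if __name__ == "__main__":
    register = [{"id": 1, "gender": "F", "status": "employed"},
                {"id": 2, "gender": "M", "status": None}]
    survey = [{"id": 1, "gender": "F", "status": None},
              {"id": 2, "gender": "F", "status": "student"}]
    for row in integrate(register, survey):
        print(row)
```

Each hard-coded rule in this sketch corresponds to one of the decisions listed on the next slide (method for identifying matches, handling ambiguities in V1, imputing V2).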

Example Data Integration
- Details, decisions to be made
- Are the datasets appropriate?
  - Quality of matching keys
  - Quality of data sources
- Method for identification of matches?
- Method for handling ambiguities in V1 (Gender)?
- Method for imputation of V2 (Status)?
- How is quality measured?
  - At the level of a summary measure?
  - At the level of a specific variable?
  - At the level of individual records?

Example Data Integration
- There are no generally accepted standard tools and methods for answering such questions
- Probably we have to compare a number of alternative approaches
- Apply the generic format for different datasets
- Try different statistical methods and models
- Use different methods for quality assessment
  - Traditional formulas
  - Simulation-based evaluation
  - Assessment by using strategic surveys
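Of the assessment methods listed above, simulation-based evaluation is the easiest to illustrate. The sketch below hides a share of known values at random, imputes them, and reports the average error over many runs. The imputation rule (most frequent observed value) and the function name `simulate_imputation_error` are assumptions made for this example, not something the presentation prescribes.

```python
import random
from collections import Counter


def simulate_imputation_error(values, missing_share=0.2, runs=200, seed=0):
    """Average misclassification rate of mode imputation under random missingness."""
    n_missing = max(1, round(missing_share * len(values)))
    if n_missing >= len(values):
        raise ValueError("at least one value must remain observed")
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    rates = []
    for _ in range(runs):
        hidden = set(rng.sample(range(len(values)), n_missing))
        observed = [v for i, v in enumerate(values) if i not in hidden]
        # Assumed imputation model: the most frequent observed value
        imputed = Counter(observed).most_common(1)[0][0]
        wrong = sum(values[i] != imputed for i in hidden)
        rates.append(wrong / n_missing)
    return sum(rates) / runs


if __name__ == "__main__":
    status = ["employed"] * 8 + ["unemployed"] * 2
    print(simulate_imputation_error(status))
```

Swapping in a different imputation model or a non-random missingness pattern only changes the two lines marked as assumptions, which makes it straightforward to compare alternative approaches on the same data.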

Example Data Integration
- Conclusion
- Different statistical methods may be an essential part of data production and quality assessment
- There is no longer such a clear distinction between “objective” data collection and statistical analysis
- Statistics generates added value beyond (administrative) accounting and IT

A Generic Statistical Workflow Model
- Statistical Workflow: a mixture of
  - Business Workflow (process oriented)
  - Scientific Workflow (data oriented)
- Quality evaluation is the main control element of the process
- We have to consider the workflow at two levels
  - Meta-level (control of the process)
  - Data-level (production of data)

A Generic Statistical Workflow Model
- Building blocks of the workflow model
- Transformations (basic data operations)
- Process components (tasks) defined by:
  - Task definition
  - Pre-Alignment
  - Feasibility Check
  - Main Transformation
  - Post-Alignment
  - Quality Evaluation
- Workflow (sequence of process components)
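The building blocks listed above map naturally onto a small data structure. The following sketch is one possible reading, assuming that each part of a process component can be modelled as a plain function and that quality evaluation returns a single number. The class `ProcessComponent`, its field names and `run_workflow` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class ProcessComponent:
    """One task of the workflow, following the six parts named on the slide."""
    task: str                             # task definition
    pre_align: Callable[[Any], Any]       # pre-alignment
    feasible: Callable[[Any], bool]       # feasibility check
    transform: Callable[[Any], Any]       # main transformation
    post_align: Callable[[Any], Any]      # post-alignment
    evaluate: Callable[[Any], float]      # quality evaluation

    def run(self, data: Any) -> Tuple[Any, float]:
        data = self.pre_align(data)
        if not self.feasible(data):
            raise ValueError(f"task {self.task!r} is not feasible for this input")
        data = self.post_align(self.transform(data))
        return data, self.evaluate(data)


def run_workflow(components: List[ProcessComponent], data: Any):
    """Run components in sequence and collect (task, quality) pairs."""
    report = []
    for component in components:
        data, quality = component.run(data)
        report.append((component.task, quality))
    return data, report


if __name__ == "__main__":
    dedupe = ProcessComponent(
        task="remove duplicate keys",
        pre_align=lambda keys: [k.strip() for k in keys],
        feasible=lambda keys: len(keys) > 0,
        transform=lambda keys: sorted(set(keys)),
        post_align=lambda keys: [k.upper() for k in keys],
        evaluate=lambda keys: 1.0,  # placeholder quality score
    )
    print(run_workflow([dedupe], [" a", "b ", "a"]))
```

The returned report is the data-level result plus a meta-level trace of quality scores, which mirrors the two levels of the workflow named on the previous slide.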

A Generic Statistical Workflow Model
- Example of a data integration component workflow

A Generic Statistical Workflow Model
- In order to understand how statistics influences the boxes and data quality, let us zoom into the box for post-alignment

Quality Assessment
- For quality assessment we need a detailed description of the changes in meta-information during the dataflow

Quality Assessment
- Example of meta-information flow in data integration
- Details for the register-based census are given in the presentation of Fiedler/Lenk in Session 26 (Thursday)

Quality Assessment
- Example: assessment of the accuracy of variables V1 (Gender) and V2 (Status) in the example

Quality Assessment
- V1 (Gender)
- Input
  - Coincidence of matching keys in both datasets
  - Matching of the variable Gender in both datasets
  - Beliefs about the quality of the variable in both sources
- Accuracy Assessment
  - It seems that models developed in decision analysis (calculus from belief networks) are appropriate
  - Alternatively we can use a strategic sample to check whether our prior beliefs are correct and our decision rule is confirmed by statistical arguments
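To make the belief-based assessment of V1 concrete, the sketch below combines the reports of two sources with Bayes' rule, which is the simplest case of a belief network (one hidden true value, two observed reports). It assumes a two-category coding, independent errors in the two sources, and known reliabilities; all of these are illustrative assumptions, and the function name `posterior_true_value` is hypothetical.

```python
def posterior_true_value(report_a, report_b, reliability_a, reliability_b,
                         prior=0.5, candidate="F"):
    """Probability that the true value equals `candidate`, given both reports.

    Assumptions (not from the slides): two categories, each source reports
    the true value with its own fixed reliability, and the sources err
    independently of each other.
    """
    def likelihood(report, reliability, truth_is_candidate):
        says_candidate = report == candidate
        # A source is right with probability `reliability`, wrong otherwise
        return reliability if says_candidate == truth_is_candidate else 1.0 - reliability

    weight_yes = (prior
                  * likelihood(report_a, reliability_a, True)
                  * likelihood(report_b, reliability_b, True))
    weight_no = ((1.0 - prior)
                 * likelihood(report_a, reliability_a, False)
                 * likelihood(report_b, reliability_b, False))
    return weight_yes / (weight_yes + weight_no)


if __name__ == "__main__":
    # Both sources agree: belief in the shared value is strengthened.
    print(posterior_true_value("F", "F", 0.9, 0.8))
    # Sources disagree: the more reliable source dominates.
    print(posterior_true_value("F", "M", 0.9, 0.8))
```

The reliabilities play the role of the prior beliefs about source quality; a strategic sample, as mentioned on the slide, would be one way to check them empirically.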

Quality Assessment
- V2 (Status)
- Input
  - Coincidence of matching keys in both datasets
  - Reliability of the model used for imputation
  - Measurement technique for quality of imputation
- Accuracy Assessment
  - In this case we can apply traditional statistical techniques like false classification rate, ROC curve, simulation
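The two traditional measures named above can be computed directly once imputed values are compared against known true values, for example on a validation sample. This is a minimal sketch in plain Python; it assumes binary labels (1 = positive class) for the ROC curve and that both classes occur, and the function names are hypothetical.

```python
def false_classification_rate(true_values, imputed_values):
    """Share of records whose imputed category differs from the true one."""
    if not true_values or len(true_values) != len(imputed_values):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(t != i for t, i in zip(true_values, imputed_values))
    return wrong / len(true_values)


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, one per score threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A record is classified as positive if its score reaches the threshold
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


if __name__ == "__main__":
    print(false_classification_rate(["a", "b", "a", "a"], ["a", "a", "a", "b"]))
    print(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The false classification rate gives a summary measure for the variable, while the ROC points show how the imputation model trades false positives against true positives as the decision threshold moves.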

Conclusions
- We have presented a model which allows a tighter coupling of quality assessment to the data production process
- Such a model seems useful if data production has more degrees of freedom
  - What data should be used?
  - What techniques should be used?
- The approach allows identification of the different factors influencing quality

Conclusions
- It allows the formulation of precise questions about possible alternatives and defines new issues for research in statistical data quality
- Hopefully it helps to better understand the added value generated by statistics

Thank you for your attention