Partial (donor) imputation with adjustments
Jeroen Pannekoek and Li-Chun Zhang
CBS-SSB STATISTICS NETHERLANDS – STATISTICS NORWAY
Work Session on Statistical Data Editing, Ljubljana, Slovenia, 9-11 May 2011

Contents
- The problem of inconsistent micro-data
- Simple solutions and their limitations
- More general approaches

Example

Variable             Response I   Response II   Donor values
x1: Profit                                           330
x2: Employees                         25              20
x3: Turnover main                                   1000
x4: Turnover other                                    30
x5: Turnover total      950          950            1030
x6: Wages                            550             500
x7: Other costs                                      200
x8: Total costs                                      700

Edit rules: x3 + x4 = x5;  x6 + x7 = x8;  x1 = x5 - x8

Simple solutions (for response pattern I)

Prorating:
- Edit 1: Turnover total = Profit + Total costs: 950 ≠ 330 + 700, so multiply the imputations by 950/(330 + 700) = 0.92.
- Edit 2: Total costs = Wages + Other costs: 0.92 × 700 ≠ 500 + 200, so multiply the right-hand-side values by 0.92 as well.

Ratio adjustment (ratio imputation) with R = Turnover total (observed) / Turnover total (donor). In this case this gives the same results as prorating, except that Employees, which does not appear in any edit rule, is also adjusted.
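The prorating steps above can be sketched in a few lines (the values follow the example; the `prorate` helper and its name are illustrative, not part of the paper's method):

```python
def prorate(values, target):
    """Scale a list of values so that their sum equals `target`."""
    factor = target / sum(values)
    return [v * factor for v in values], factor

# Edit 1: Turnover total (observed, 950) vs donor Profit + Total costs (330 + 700).
adjusted, factor = prorate([330.0, 700.0], 950.0)
adj_profit, adj_total = adjusted

# Edit 2: prorate Wages + Other costs (donor 500 + 200) to the adjusted Total costs;
# this applies the same factor 0.92 to the right-hand-side values.
rhs, _ = prorate([500.0, 200.0], adj_total)
adj_wages, adj_other = rhs
```

After these two steps both edits hold, but only for this nested, single-response-pattern situation; the later slides show why sequential single-edit fixes break down in general.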

Problems with single-constraint adjustments

Consider response pattern II. Edit violations:
- E1: Turnover ≠ Profit + Total costs
- E2: Total costs ≠ Wages + Other costs

Option:
1. Adjust Profit and Total costs to fit E1.
2. For the resulting value of Total costs, adjust Other costs to fit E2.

Problems:
- Order matters: we get a different solution if we apply the edits the other way around.
- The information on Wages is not used in adjusting Total costs.
- Infeasible solutions for the adjusted Total costs can occur (adjusted Total costs < Wages).
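The order dependence is easy to demonstrate. The sketch below adjusts, for each edit in turn, only the imputed variables in that edit, using a least-squares single-constraint adjustment as an illustrative rule (the record is the donor-imputed record for pattern II from the example):

```python
def adjust_to_edit(x, e, free):
    """Least-squares adjust the free entries of x so that sum(e[i]*x[i]) == 0."""
    r = sum(ei * xi for ei, xi in zip(e, x))
    norm = sum(e[i] ** 2 for i in free)
    return [xi - (r * e[i] / norm if i in free else 0.0) for i, xi in enumerate(x)]

# x1 Profit, x2 Employees, x3 Turnover main, x4 Turnover other,
# x5 Turnover total, x6 Wages, x7 Other costs, x8 Total costs.
x = [330.0, 25.0, 1000.0, 30.0, 950.0, 550.0, 200.0, 700.0]
E1 = [1, 0, 0, 0, -1, 0, 0, 1]    # x1 - x5 + x8 = 0
E2 = [0, 0, 0, 0, 0, 1, 1, -1]    # x6 + x7 - x8 = 0
free = {0, 2, 3, 6, 7}            # only imputed variables may change

a = adjust_to_edit(adjust_to_edit(x, E1, free), E2, free)  # E1 first, then E2
b = adjust_to_edit(adjust_to_edit(x, E2, free), E1, free)  # E2 first, then E1
# a and b are different records, and in each case the edit applied
# first is violated again by the second adjustment.
```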

Edit constraints as a system of equations

For the vector of values x, the constraints are Ex = 0 with

E1: x1 - x5 + x8 = 0   (Profit = Turnover total - Total costs)
E2: x6 + x7 - x8 = 0   (Total costs = Wages + Other costs)
E3: x3 + x4 - x5 = 0   (Turnover total = Turnover main + Turnover other)

that is,

         x1  x2  x3  x4  x5  x6  x7  x8
E1  = (   1   0   0   0  -1   0   0   1 )
E2  = (   0   0   0   0   0   1   1  -1 )
E3  = (   0   0   1   1  -1   0   0   0 )

Each row of E is a constraint and the columns correspond to the variables. Constraints E1 and E2 are linked because they have the variable x8 (Total costs) in common, and E1 and E3 are linked through x5 (Turnover total). E2 and E3 are also linked (through E1).

An optimization approach

Change the values of the imputed variables such that:
- the edit rules are satisfied;
- the change is as small as possible.

Formally, find an adjusted data vector x^A such that

  x^A = arg min D(x^A, x)  subject to  E x^A ≤ 0.

Writing E x^A ≤ 0 means that we consider both equality and inequality constraints.

Distance functions

Least squares (LS):                 Σ_i (x_i - x_i^A)²
Weighted least squares (WLS):       Σ_i w_i (x_i - x_i^A)²
Kullback-Leibler divergence (KL):   Σ_i x_i (ln x_i - ln x_i^A)
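The three distance functions translate directly into code (a small sketch; the function names are illustrative):

```python
import math

def ls(x, xa):
    """Least squares: sum_i (x_i - xa_i)^2."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, xa))

def wls(x, xa, w):
    """Weighted least squares: sum_i w_i (x_i - xa_i)^2."""
    return sum(wi * (xi - ai) ** 2 for xi, ai, wi in zip(x, xa, w))

def kl(x, xa):
    """Kullback-Leibler divergence: sum_i x_i (ln x_i - ln xa_i)."""
    return sum(xi * (math.log(xi) - math.log(ai)) for xi, ai in zip(x, xa))
```

Note that KL, unlike LS and WLS, is only defined for positive values, which fits adjustment of positive economic variables.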

Adjustment models (1/2)

Least squares (LS): D = Σ_i (x_i - x_i^A)² gives
  x_i^A = x_i + Σ_k e_ki α_k
Additive adjustments: the total adjustment for a variable is the sum of the adjustments to each of the constraints, with the same adjustment parameter α_k for all variables in constraint k.

Weighted least squares (WLS): D = Σ_i w_i (x_i - x_i^A)² gives
  x_i^A = x_i + (1/w_i) Σ_k e_ki α_k
Again additive adjustments, but the amount of adjustment varies according to the weights.

Adjustment models (2/2)

Kullback-Leibler divergence (KL): D = Σ_i x_i (ln x_i - ln x_i^A) gives
  x_i^A = x_i × Π_k exp(e_ki α_k)
With β_k = exp(α_k), the factor for constraint k can be written as β_k if e_ki = 1 and as 1/β_k if e_ki = -1. These are multiplicative adjustments: the total adjustment to a variable is the product of the adjustments to each constraint, with the same multiplicative adjustment parameter β_k for all variables in constraint k. It can be shown that for weights w_i = 1/x_i, KL ≈ WLS.

Algorithm

Simple iterative procedures exist to compute the adjustments for general convex distances:
- Adjust the x-vector to each constraint, one by one; each of these single-constraint adjustments is easy to perform.
- After all constraints have been visited, one iteration is completed. Repeat until convergence.

For sum-to-total constraints and the KL divergence this is equivalent to repeated prorating and to Iterative Proportional Fitting. But the scheme also handles more general constraints (differences, linear inequalities, interval constraints) as well as more general distances and confidence weights.
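For the least-squares distance and equality edits, each single-constraint adjustment is a least-squares projection onto that constraint, applied only to the imputed variables. A minimal sketch of the cyclic procedure follows (function names are illustrative; for affine constraints the cycle converges to the minimum-distance adjustment):

```python
def project(x, e, free):
    """Least-squares adjust the free entries of x so that sum(e[i]*x[i]) == 0."""
    r = sum(ei * xi for ei, xi in zip(e, x))
    norm = sum(e[i] ** 2 for i in free)
    return [xi - (r * e[i] / norm if i in free else 0.0) for i, xi in enumerate(x)]

def adjust(x, edits, free, iters=500):
    """Cycle through the single-constraint adjustments; one pass = one iteration."""
    for _ in range(iters):
        for e in edits:
            x = project(x, e, free)
    return x

# Donor-imputed record for response pattern I (only x5 = 950 observed):
# x1 Profit, x2 Employees, x3 Turnover main, x4 Turnover other,
# x5 Turnover total, x6 Wages, x7 Other costs, x8 Total costs.
x = [330.0, 20.0, 1000.0, 30.0, 950.0, 500.0, 200.0, 700.0]
edits = [
    [1, 0, 0, 0, -1, 0, 0, 1],   # x1 - x5 + x8 = 0
    [0, 0, 0, 0, 0, 1, 1, -1],   # x6 + x7 - x8 = 0
    [0, 0, 1, 1, -1, 0, 0, 0],   # x3 + x4 - x5 = 0
]
free = {0, 1, 2, 3, 5, 6, 7}     # all imputed variables; observed x5 stays fixed
xa = adjust(x, edits, free)      # xa satisfies all three edits
```

Employees (x2) has a zero coefficient in every edit, so this distance-based adjustment leaves it unchanged, in line with the earlier slide.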

The generalized ratio approach (1/2)

The methods so far adjust only variables that appear in edit constraints; the aim is only to satisfy the "hard" edits. Yet inconsistencies between imputed and observed values indicate a difference between the donor record and the receptor record. Therefore: adjust all donor values to better fit the receptor record. For response pattern I, with only Turnover total observed, all donor values were multiplied by the ratio Observed/Donor Turnover total, that is, rescaled by a measure of "size".

The generalized ratio approach (2/2)

As a generalisation, we propose the component-wise multiplicative adjustments
  x_i^A = x_i δ_i
The δ_i are determined by minimizing their variance subject to the adjusted record satisfying the edit constraints. The adjustments are as uniform as possible, as with ratio imputation, but all kinds of constraints can be satisfied.
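A simplified sketch of this idea: instead of the variance of the δ_i it minimizes Σ_i (δ_i - 1)², so the adjustments stay close to uniform, and it computes the δ_i by cyclically projecting the δ-vector onto each edit constraint (the constraints are linear in δ with coefficients e_ki x_i):

```python
def project_delta(delta, c, free):
    """Least-squares adjust the free delta_i so that sum(c[i]*delta[i]) == 0."""
    r = sum(ci * di for ci, di in zip(c, delta))
    norm = sum(c[i] ** 2 for i in free)
    return [di - (r * c[i] / norm if i in free else 0.0) for i, di in enumerate(delta)]

# Donor-imputed record for response pattern I; x5 = 950 is observed (delta fixed at 1).
x = [330.0, 20.0, 1000.0, 30.0, 950.0, 500.0, 200.0, 700.0]
edits = [
    [1, 0, 0, 0, -1, 0, 0, 1],   # x1 - x5 + x8 = 0
    [0, 0, 0, 0, 0, 1, 1, -1],   # x6 + x7 - x8 = 0
    [0, 0, 1, 1, -1, 0, 0, 0],   # x3 + x4 - x5 = 0
]
free = {0, 1, 2, 3, 5, 6, 7}

delta = [1.0] * 8
for _ in range(500):
    for e in edits:
        c = [ei * xi for ei, xi in zip(e, x)]   # constraint coefficients in delta-space
        delta = project_delta(delta, c, free)

xa = [xi * di for xi, di in zip(x, delta)]      # adjusted record, satisfies all edits
```

Minimizing the variance of the δ_i, as the slides propose, would in addition pull unconstrained components such as Employees toward the common adjustment level; the simplified objective above leaves them at δ_i = 1.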

Example revisited (response pattern II)

Variable             Imputed unadj.   LS adjust.   WLS / KL   Gen. ratio
x1: Profit
x2: Employees              25              25           25          25
x3: Turnover main
x4: Turnover other
x5: Turnover total        950             950          950         950
x6: Costs wages           550             550          550         550
x7: Costs other
x8: Costs total

Concluding remarks

- An optimization approach to solving inconsistency problems: simultaneous adjustment to all constraints.
- Generalizes prorating and ratio adjustment for single constraints.
- A minimum-distance approach that aims at consistency with minimum (optimal) adjustments.
- The generalized ratio approach aims to better preserve the structure of the imputed record, as in ratio imputation.