Linear Transformation of post-microaggregated data. Mi-Ja Woo, National Institute of Statistical Sciences.



Motivation: Example

Different distributions, but the same moments and estimates of regression coefficients. How about making D3 have the same mean and covariance?

1. Linear Transformation: Let D1 be the original p-dimensional data with mean E1 and covariance matrix S1. Let D2 be the post-microaggregated p-dimensional data with mean E2 and covariance matrix S2. Transform D2 into T(D2) = A·D2 + b such that E[T(D2)] = E1 and S[T(D2)] = S1.

How to compute A and b? Mathematically, A and b are obtained from the two conditions A·E2 + b = E1 and A·S2·A^T = S1. Use the singular value decomposition (SVD) to carry out the calculation.
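A minimal sketch of one way to compute such a transformation, assuming S2 is nonsingular. It uses the common choice A = S1^(1/2) · S2^(-1/2) and b = E1 − A·E2, which satisfies both moment conditions but is not necessarily the exact formula on the original slide. The function names `sym_power`, `linear_transform_params` and `apply_transform` are hypothetical, and the data are assumed to be stored with one record per row.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive-definite matrix, computed via the SVD.

    For such a matrix the SVD S = U diag(s) U^T coincides with its
    eigendecomposition, so S**power = U diag(s**power) U^T.
    """
    U, s, _ = np.linalg.svd(S)
    return (U * s**power) @ U.T

def linear_transform_params(D1, D2):
    """Return (A, b) so that A @ x + b maps the moments of D2 onto those of D1.

    Assumption: A = S1^(1/2) S2^(-1/2), b = E1 - A E2 (one valid solution).
    D1 and D2 are (n, p) arrays with one record per row.
    """
    E1, E2 = D1.mean(axis=0), D2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(D1, rowvar=False))
    S2 = np.atleast_2d(np.cov(D2, rowvar=False))
    A = sym_power(S1, 0.5) @ sym_power(S2, -0.5)
    b = E1 - A @ E2
    return A, b

def apply_transform(D, A, b):
    """Apply x -> A @ x + b to every row of D."""
    return D @ A.T + b
```

With this choice, A·S2·A^T = S1^(1/2) · S2^(-1/2) · S2 · S2^(-1/2) · S1^(1/2) = S1, so the sample mean and covariance of `apply_transform(D2, A, b)` match those of D1 up to floating-point error.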

NOTES: Linearly transformed masked data yield the same results as the original data for any analysis based on the mean and covariance. How about higher moments? There is no clear answer, because higher moments depend on the distribution of the data and not only on A, b, the mean and the covariance. We need data utility measures. Linear transformation does not preserve positivity. Can we improve the data utility of other SDL (statistical disclosure limitation) methods through linear transformation?
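The loss of positivity can be seen in a small univariate sketch. The four data values below are hypothetical, the microaggregation uses groups of two, and the scaling a = sd1/sd2 with shift b = e1 − a·e2 is the one-dimensional case of matching mean and variance.

```python
import numpy as np

# Hypothetical toy data: four positive values of a single variable.
d1 = np.array([0.1, 0.2, 0.3, 10.0])
# Microaggregation with groups of two: each value becomes its group mean.
d2 = np.array([0.15, 0.15, 5.15, 5.15])

# Univariate moment matching: scale by the ratio of standard deviations,
# then shift so that the means agree.
a = d1.std(ddof=1) / d2.std(ddof=1)
b = d1.mean() - a * d2.mean()
t = a * d2 + b

print(t.mean(), t.var(ddof=1))  # equal to d1.mean() and d1.var(ddof=1)
print(t.min())                  # negative, although d1 and d2 are all positive
```

Microaggregation shrinks the variance, so a is greater than 1 and the stretch around the mean pushes the smallest values below zero.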

Question: Other masked data?

Linear transformation with a positivity constraint: Partition X into X1 and X2. Transform X2 but not X1. After transforming X2, replace any remaining negative values with the minimum of the original data or with zero. The result lies between the non-transformed and the transformed microaggregated data. The utility of this method depends on how many negative values the transformed microaggregated data contain.

How to partition X?
1. Initially, transform X into Y = AX + b.
2. Sort Y in descending order.
3. Count how many records are negative, n′.
4. Partition Y into Y1 and Y2, where Y1 contains the 1st to (n′ + n*p)-th observations of Y and Y2 contains the rest.
5. Partition X into X1 and X2 corresponding to Y1 and Y2.
More observations are added to Y1 in order to reduce the possibility of getting negative values after transforming X2.
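The partition steps can be sketched as follows. Several details are assumptions rather than part of the slides: records are ranked by their smallest coordinate in ascending order so that records with negative values come first, the parameter `extra_frac` stands in for p in n*p and is read as a proportion of the n records, and the function name `partition_for_positivity` is hypothetical. The sketch only returns the two index sets and leaves the transformation of X2 to the caller.

```python
import numpy as np

def partition_for_positivity(X, A, b, extra_frac=0.05):
    """Split the rows of X into X1 (left untransformed) and X2 (to transform).

    X is an (n, p) array with one record per row. Returns two index arrays
    (idx1, idx2) such that X[idx1] plays the role of X1 and X[idx2] of X2.
    """
    n = X.shape[0]
    # Step 1: transform every record, Y = A x + b.
    Y = X @ A.T + b
    # Step 2 (assumed ordering): rank records by their smallest coordinate,
    # so records that contain a negative value come first.
    row_min = Y.min(axis=1)
    order = np.argsort(row_min, kind="stable")
    # Step 3: count the records with at least one negative value.
    n_neg = int(np.count_nonzero(row_min < 0))
    # Step 4: Y1 takes the first n_neg + n * extra_frac records.
    k = min(n, n_neg + int(np.ceil(extra_frac * n)))
    # Step 5: X1 and X2 are the rows of X matching Y1 and Y2.
    return order[:k], order[k:]
```

A larger `extra_frac` moves more borderline records into X1, which lowers the chance of negative values after X2 is transformed but leaves more of the data untransformed.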

Example: Here are eight different types of data. For most of the data types with sign violations, the procedure above improves utility. Since the result lies between the non-transformed and the transformed microaggregated data, it does not always improve the three data utility measures compared to the transformed microaggregated data. The improvement is largest for Non-symmetric Low Positive, followed by Non-symmetric High Positive and then Non-symmetric Low Negative.

END