Linear Transformation of post-microaggregated data
Mi-Ja Woo, National Institute of Statistical Sciences


Motivation: Example

These datasets have different distributions but the same moments (mean and covariance), and therefore the same estimates of regression coefficients. How about making D3 have the same mean and covariance?
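The claim that regression estimates depend only on the mean and covariance can be checked numerically. The sketch below uses toy data, and the helper names `ols_coefficients` and `match_mean_cov` are hypothetical. It builds a skewed dataset whose sample mean and covariance equal those of a Gaussian one, then fits ordinary least squares to both.

```python
import numpy as np

def ols_coefficients(data):
    """OLS of the last column on the other columns, with an intercept."""
    X = np.column_stack([np.ones(len(data)), data[:, :-1]])
    coef, *_ = np.linalg.lstsq(X, data[:, -1], rcond=None)
    return coef

def match_mean_cov(src, target):
    """Affine map of src whose sample mean and covariance equal target's.

    With Cholesky factors C_s = L_s L_s^T and C_t = L_t L_t^T, the matrix
    A = L_t L_s^{-1} satisfies A C_s A^T = C_t.
    """
    L_s = np.linalg.cholesky(np.cov(src, rowvar=False))
    L_t = np.linalg.cholesky(np.cov(target, rowvar=False))
    A = L_t @ np.linalg.inv(L_s)
    return (src - src.mean(axis=0)) @ A.T + target.mean(axis=0)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
gauss = np.column_stack([x, 1.0 + 2.0 * x + rng.normal(size=n)])  # roughly Gaussian
skewed = match_mean_cov(rng.exponential(size=(n, 2)), gauss)      # skewed, same moments

print(ols_coefficients(gauss))
print(ols_coefficients(skewed))  # same intercept and slope up to rounding
```

With an intercept, the OLS slope is Sxx^{-1} sxy and the intercept is ybar - slope * xbar. Any two datasets with equal sample means and covariances therefore give identical coefficients, whatever the shape of their distributions.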

1. Linear Transformation: Let D1 be the original p-dimensional data with mean E1 and covariance matrix S1. Let D2 be the post-microaggregated p-dimensional data with mean E2 and covariance matrix S2. Transform D2 into T(D2) = AD2 + b such that E[T(D2)] = E1 and S[T(D2)] = S1.
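Assuming T(x) = Ax + b is applied to each record x of D2, the two requirements translate into conditions on A and b. The derivation below picks the symmetric-square-root solution for A. This is one valid choice (any A = S1^{1/2} Q S2^{-1/2} with Q orthogonal also works), not necessarily the formula used in the talk, and it assumes S2 is nonsingular.

```latex
\begin{aligned}
T(x) &= A x + b, \qquad x \in D_2,\\
E[T(D_2)] &= A E_2 + b = E_1 \;\Rightarrow\; b = E_1 - A E_2,\\
S[T(D_2)] &= A S_2 A^{\top} = S_1,\\
A &= S_1^{1/2} S_2^{-1/2}
  \;\Rightarrow\; A S_2 A^{\top} = S_1^{1/2} S_2^{-1/2} S_2 S_2^{-1/2} S_1^{1/2} = S_1,\\
S^{1/2} &= U \Lambda^{1/2} U^{\top} \quad \text{for } S = U \Lambda U^{\top}.
\end{aligned}
```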

How to compute A and b? A and b must satisfy A E2 + b = E1 and A S2 A^T = S1, so b = E1 - A E2. Use the singular value decomposition (SVD) to calculate A.
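A minimal sketch of this computation in NumPy, assuming the symmetric-square-root choice A = S1^{1/2} S2^{-1/2} with both square roots obtained from an SVD. The microaggregation step is a hypothetical stand-in (sort by the first variable, average groups of k records), not the method used in the talk. All function names are illustrative.

```python
import numpy as np

def sym_sqrt(S, inverse=False):
    """Symmetric (inverse) square root of a positive definite matrix via SVD.

    For symmetric positive definite S the SVD is S = U diag(s) U^T (V = U),
    so S^{1/2} = U diag(sqrt(s)) U^T.
    """
    U, s, _ = np.linalg.svd(S)
    d = 1.0 / np.sqrt(s) if inverse else np.sqrt(s)
    return (U * d) @ U.T  # scale columns of U by d, then multiply by U^T

def linear_transform_params(D2, E1, S1):
    """A and b with A @ E2 + b = E1 and A @ S2 @ A.T = S1 (one valid A)."""
    E2 = D2.mean(axis=0)
    S2 = np.cov(D2, rowvar=False)
    A = sym_sqrt(S1) @ sym_sqrt(S2, inverse=True)
    b = E1 - A @ E2
    return A, b

def toy_microaggregate(D, k=3):
    """Hypothetical microaggregation: sort by the first variable, group k
    consecutive records, replace each record by its group mean."""
    order = np.argsort(D[:, 0])
    out = np.empty_like(D)
    for start in range(0, len(D), k):
        idx = order[start:start + k]
        out[idx] = D[idx].mean(axis=0)
    return out

rng = np.random.default_rng(1)
D1 = rng.gamma(shape=2.0, scale=1.5, size=(30, 2))  # original data, p = 2
D2 = toy_microaggregate(D1, k=3)                     # post-microaggregated data
A, b = linear_transform_params(D2, D1.mean(axis=0), np.cov(D1, rowvar=False))
T = D2 @ A.T + b                                     # T(D2), one row per record

print(np.allclose(T.mean(axis=0), D1.mean(axis=0)))
print(np.allclose(np.cov(T, rowvar=False), np.cov(D1, rowvar=False)))
```

Both covariances use `np.cov`'s default divisor n - 1, so the match is exact up to floating-point error. `np.linalg.eigh` would give the same square roots for symmetric matrices; SVD is used here only to follow the slide.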

NOTES: Linearly transformed masked data yield the same results as the original data for any analysis based only on the mean and covariance. What about higher moments? There is no clear answer, but higher moments depend on the distribution, not only on A, b, the mean and the covariance, so we need data utility measures. The linear transformation does not preserve positivity. Can we improve the data utility of other SDL (statistical disclosure limitation) methods through linear transformation?
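In one dimension both points can be seen exactly. An affine map with a positive slope leaves standardized skewness unchanged, so after matching the mean and variance of D1, T(D2) still carries the skewness of D2, not of D1. The toy sketch below uses made-up values: D2 groups the sorted values of D1 as {0, 0} and {0, 1, 9} and replaces each value by its group mean. It also shows a non-negative variable acquiring a negative value after the transformation.

```python
import numpy as np

def skewness(x):
    """Standardized third central moment (population version)."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

d1 = np.array([0.0, 0.0, 0.0, 1.0, 9.0])         # toy original data, non-negative
d2 = np.array([0.0, 0.0, 10 / 3, 10 / 3, 10 / 3])  # toy microaggregated version

# Univariate affine map matching the mean and (population) variance of d1.
a = d1.std() / d2.std()
b = d1.mean() - a * d2.mean()
t = a * d2 + b

print("mean, var of T(D2):", t.mean(), t.var(), "target:", d1.mean(), d1.var())
print("skewness D1, D2, T(D2):", skewness(d1), skewness(d2), skewness(t))
print("min of T(D2):", t.min())  # negative, although D1 has no negative values
```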

Question: What about other masked data?

Linear transformation with a positivity constraint: Partition X into X1 and X2. Transform X2 but not X1. After transforming X2, replace any remaining negative values with the minimum of the original data or with zero. The result lies between the non-transformed microaggregated data and the transformed microaggregated data. The utility of this method depends on how many negative values the transformed microaggregated data contain.
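A sketch of the transform-and-replace step, assuming X2 is transformed with the same A and b computed for the full data. That reading is an assumption, though it matches the result lying "between" the non-transformed and transformed data. The function name `transform_x2_only` and the boolean mask `keep` (True for records in X1) are hypothetical.

```python
import numpy as np

def transform_x2_only(X, keep, A, b, floor=0.0):
    """Apply Y = AX + b to the X2 rows only; X1 rows (keep == True) stay as-is.

    Any value still negative afterwards is replaced by `floor`: either 0.0 or a
    per-variable array such as the column minima of the original data.
    """
    out = np.array(X, dtype=float)  # work on a copy
    x2 = ~keep
    out[x2] = out[x2] @ A.T + b     # transform X2 records only
    floor_full = np.broadcast_to(np.asarray(floor, dtype=float), out.shape)
    neg = out < 0
    out[neg] = floor_full[neg]      # replace remaining negative values
    return out

# Toy usage: record 0 is in X1; records 1 and 2 form X2.
X = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 4.0]])
keep = np.array([True, False, False])
A = 2.0 * np.eye(2)
b = np.array([-2.0, -2.0])
result = transform_x2_only(X, keep, A, b, floor=0.0)
print(result)  # record 2 becomes [-1, 6] before the negative entry is replaced by 0
```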

How to partition X? The way to partition X:
1. Initially, transform X into Y = AX + b.
2. Sort Y in descending order.
3. Count the number of records with negative values, n′.
4. Partition Y into Y1 and Y2, where Y1 contains the 1st to the (n′ + n*p)-th observations of Y and Y2 contains the rest.
5. Partition X into X1 and X2 corresponding to Y1 and Y2.
Observations beyond the n′ negative records are added to Y1 to reduce the chance of getting negative values after transforming X2.
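The five steps can be sketched as below. Two assumptions are labeled here and in the comments. First, the ordering key for step 2 is a guess: records are sorted by how negative their smallest transformed value is, in descending order, so negative records come first and the extra records are those closest to going negative. Second, the extra count written n*p on the slide is passed in as a plain integer, `n_extra`. The function name is hypothetical.

```python
import numpy as np

def partition_records(X, A, b, n_extra):
    """Boolean mask over records: True = record goes to X1 (left untransformed)."""
    Y = X @ A.T + b                   # step 1: Y = AX + b, one row per record
    worst = Y.min(axis=1)             # smallest transformed value per record
    # Step 2 (assumed key): "how negative" = -worst, sorted descending,
    # which is the same as sorting worst ascending.
    order = np.argsort(worst, kind="stable")
    n_neg = int(np.count_nonzero(worst < 0))  # step 3: n' (records with a negative value)
    n1 = min(n_neg + n_extra, len(X))         # step 4: size of Y1 (n_extra stands in for n*p)
    keep = np.zeros(len(X), dtype=bool)
    keep[order[:n1]] = True                   # step 5: X1 matches Y1, X2 matches Y2
    return keep

X = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 4.0], [2.0, 2.0]])
A = 2.0 * np.eye(2)
b = np.array([-2.0, -2.0])
keep = partition_records(X, A, b, n_extra=1)
print(keep)  # record 2 is negative after transformation; record 0 is added as the extra one
```

The mask returned here is the `keep` argument a transform-only-X2 step would need. Ties in the ordering key are broken by original record order because of `kind="stable"`.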

Example: Here are eight different types of data. For most of the data sets that violate the sign constraint, the procedure above improves utility. Because the result lies between the non-transformed and the transformed microaggregated data, it does not always improve all three data utility measures relative to the transformed microaggregated data. The largest improvement is for Non-symmetric Low Positive, followed by Non-symmetric High Positive and then Non-symmetric Low Negative.

END