Preserving Positivity (and other Constraints?) in Released Microdata Alan Karr 12/2/05.


Problem
O = original microdata
  – Constrained to satisfy o_ij >= 0 for all data records i and (some or all) attributes j
Want masked data release M that
  – Satisfies same positivity constraints
  – Has high utility
  – Has low disclosure risk

Motivating Example
O = original data
M_1 = MicZ applied to O
  – Conceptually, data have shrunk
D = O - M_1
N = normally distributed noise with
  – Mean 0
  – Cov(N) = Cov(D)
M_2 = M_1 + N
  – Works well for utility and risk
  – Does not preserve positivity
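
A rough sketch of the construction on this slide may help make the positivity failure concrete. It is not the MicZ procedure itself: the `microaggregate` helper below is a hypothetical stand-in that sorts records on the first attribute and replaces each group of k records by its centroid. The names `covariance`, `cholesky`, and `add_matched_noise`, the exponential test data, and the small `jitter` term are likewise assumptions made for illustration.

```python
import math
import random

def microaggregate(records, k):
    """Hypothetical stand-in for MicZ: sort on attribute 0, replace groups of k by centroids."""
    p = len(records[0])
    order = sorted(range(len(records)), key=lambda i: records[i][0])
    masked = [None] * len(records)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        centroid = [sum(records[i][j] for i in group) / len(group) for j in range(p)]
        for i in group:
            masked[i] = centroid[:]
    return masked

def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def cholesky(a, jitter=1e-12):
    """Lower-triangular factor L with L L^T approximately equal to a (jitter guards zero variance)."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(s, 0.0) + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L

def add_matched_noise(original, k, rng):
    """Return (M_1, M_2) with M_2 = M_1 + N, N normal with mean 0 and Cov(N) = Cov(O - M_1)."""
    m1 = microaggregate(original, k)
    d = [[o - m for o, m in zip(ro, rm)] for ro, rm in zip(original, m1)]
    L = cholesky(covariance(d))
    p = len(L)
    m2 = []
    for row in m1:
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        noise = [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(p)]
        m2.append([x + e for x, e in zip(row, noise)])
    return m1, m2

rng = random.Random(0)
# Assumed test data: two positive, skewed attributes.
original = [[rng.expovariate(1.0), rng.expovariate(0.5)] for _ in range(200)]
m1, m2 = add_matched_noise(original, k=5, rng=rng)
negatives = sum(1 for row in m2 for x in row if x < 0)
print("negative entries in M_2:", negatives)
```

In this sketch M_1 stays nonnegative because each centroid is an average of nonnegative values, while nothing stops the normal noise from pushing entries of M_2 below zero.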

Which Methods Preserve Positivity?

Method           Yes/No           Comments
Mic*             Yes              Edge effects may be significant
Swapping         Yes
Additive noise   No, in general   Noise distributions that preserve positivity induce edge effects
Transformation   No, in general
Synthetic data   Possible         If model preserves positivity

The Problem with Mic*

Some (1/2, ¼, 1/ )-Wit Ideas
Transform data to remove constraints (e.g., take logs), do SDL on transformed data, and untransform
  – Microaggregation becomes weighted
  – How to choose noise distribution?
Use rejection sampling for additive noise
  – M+N | M+N >= 0
  – Does it restore full covariance?
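
The two ideas on this slide can be sketched for a single attribute as follows. This is only an illustration under stated assumptions: the function names `mask_via_logs` and `mask_by_rejection`, the noise scale `sigma`, and the `max_tries` cap are hypothetical, and the slide does not say which noise distribution to use.

```python
import math
import random

def mask_via_logs(values, sigma, rng):
    """Add N(0, sigma^2) noise on the log scale, then exponentiate.

    Requires strictly positive inputs; the output is always strictly positive.
    """
    return [math.exp(math.log(v) + rng.gauss(0.0, sigma)) for v in values]

def mask_by_rejection(values, sigma, rng, max_tries=10_000):
    """Draw N ~ N(0, sigma^2) and redraw until v + N >= 0 (sampling N | v + N >= 0)."""
    out = []
    for v in values:
        for _ in range(max_tries):
            candidate = v + rng.gauss(0.0, sigma)
            if candidate >= 0:
                out.append(candidate)
                break
        else:
            raise RuntimeError("no nonnegative draw found")
    return out

rng = random.Random(1)
masked_values = [0.05, 0.4, 1.0, 3.5, 12.0]   # assumed masked values M
via_logs = mask_via_logs(masked_values, sigma=0.3, rng=rng)
via_rejection = mask_by_rejection(masked_values, sigma=1.0, rng=rng)
```

Both routes keep the output nonnegative, but neither is free. Log-scale noise is multiplicative on the original scale and cannot handle exact zeros, and the accepted noise in the rejection scheme is a truncated normal whose mean is positive for records near zero, which is one way to see why the covariance question on the slide is open.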

More Ideas
Microaggregation with weights: move points near edges less
  – May be bad for risk
  – No longer preserves first moments
Linear transformation, if constraint satisfied (Mi-Ja)
  – M = {T(x): T(x) >= 0} {x: T(x) not >=0} ?????
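
One possible reading of these two ideas, sketched for illustration only: the weight function `x / (x + scale)`, the parameter `scale`, and both function names are assumptions, and the fallback rule (release T(x) when it is nonnegative, otherwise release x unchanged) is just one interpretation of the set expression above.

```python
def weighted_microaggregate(values, k, scale):
    """Move each value toward its group centroid, less so when the value is near zero.

    The weight w = x / (x + scale) is hypothetical: 0 at the edge, close to 1 far from it.
    Assumes values >= 0 and scale > 0.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            w = values[i] / (values[i] + scale)
            masked[i] = values[i] + w * (centroid - values[i])
    return masked

def transform_with_fallback(record, transform):
    """Release T(x) if every component is nonnegative, else release x unchanged."""
    t = transform(record)
    return t if all(x >= 0 for x in t) else list(record)

data = [0.0, 0.1, 0.2, 5.0, 6.0, 7.0]
masked = weighted_microaggregate(data, k=3, scale=1.0)

def shift(record):
    # Assumed example of a linear transformation T.
    return [x - 1.0 for x in record]

kept = transform_with_fallback([0.5, 2.0], shift)      # T(x) has a negative entry
moved = transform_with_fallback([3.0, 2.0], shift)     # T(x) is nonnegative
```

Because each masked value is a convex combination of a nonnegative value and a nonnegative centroid, positivity holds, but the group mean is generally not preserved. Records released unchanged by the fallback are also fully exposed, which ties in with the risk concern on the slide.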

Other Questions
More complex constraints
  – x_1 <= x_2 (time precedence; net income <= gross)
  – x_1 + x_2 <= x_3
  – Are all convexity-like constraints alike?
Multiple constraints
  – Methods need to be scalable
Soft constraints
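
The constraints listed here, along with positivity, can all be written as linear inequalities a·x <= b, which suggests a single check for a masked record. The sketch below is illustrative only; the `violations` function, the tuple layout, and the tolerance `tol` are assumptions rather than anything from the talk.

```python
def violations(record, constraints, tol=1e-9):
    """Return the names of the constraints a . x <= b that the record violates."""
    bad = []
    for name, coeffs, bound in constraints:
        lhs = sum(a * x for a, x in zip(coeffs, record))
        if lhs > bound + tol:
            bad.append(name)
    return bad

# Each constraint is (name, coefficients a, bound b), meaning a . x <= b.
constraints = [
    ("x1 >= 0", [-1, 0, 0], 0),
    ("x2 >= 0", [0, -1, 0], 0),
    ("x3 >= 0", [0, 0, -1], 0),
    ("x1 <= x2", [1, -1, 0], 0),
    ("x1 + x2 <= x3", [1, 1, -1], 0),
]

ok = violations([1.0, 2.0, 3.0], constraints)
order_broken = violations([2.0, 1.0, 5.0], constraints)
two_broken = violations([-1.0, 2.0, 0.5], constraints)
```

A soft-constraint variant could return the excess `lhs - bound` in place of the name, so that small violations are penalized rather than rejected outright.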