State Statistical Institute Berlin-Brandenburg Jörg Höhne / Julia HöningerResearch Data Centre Morpheus – Remote Data Access with a Quality Measure Joint.

Slides:



Advertisements
Similar presentations
Jörg Drechsler (Institute for Employment Research, Germany) NTTS 2009 Brussels, 20. February 2009 Disclosure Control in Business Data Experiences with.
Advertisements

PANEL DATA 1. Dummy Variable Regression 2. LSDV Estimator
Statistical Disclosure Control (SDC) at SURS Andreja Smukavec General Methodology and Standards Sector.
Dummy Variables and Interactions. Dummy Variables What is the the relationship between the % of non-Swiss residents (IV) and discretionary social spending.
Business microdata dissemination at Istat Daniela Ichim Luisa Franconi
HETEROSCEDASTICITY-CONSISTENT STANDARD ERRORS 1 Heteroscedasticity causes OLS standard errors to be biased is finite samples. However it can be demonstrated.
Collinearity. Symptoms of collinearity Collinearity between independent variables – High r 2 High vif of variables in model Variables significant in simple.
Using synthetic data to improve the accessibility of the SLS Susan Carsley, SLS Project Manager.
Multilevel Models 4 Sociology 8811, Class 26 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission.
1 Multiple Regression EPP 245/298 Statistical Analysis of Laboratory Data.
1 Review of Correlation A correlation coefficient measures the strength of a linear relation between two measurement variables. The measure is based on.
A trial of incentives to attend adult literacy classes Carole Torgerson, Greg Brooks, Jeremy Miles, David Torgerson Classes randomised to incentive or.
1 Regression and Calibration EPP 245 Statistical Analysis of Laboratory Data.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT This sequence describes the testing of a hypotheses relating to regression coefficients. It is.
Joint UNECE/Eurostat work session on statistical data confidentiality October 2013 Ottawa, Canada Improvement of access to European microdata Outcome.
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES This sequence explains how to extend the dummy variable technique to handle a qualitative explanatory.
Development of Remote Access Systems Tanvi Desai LSE Research Laboratory Data Manager Research Laboratory IASSIST 2008: Stanford.
UNECE Workshop on Confidentiality Manchester, December 2007 Comparing Fully and Partially Synthetic Data Sets for Statistical Disclosure Control.
MOLLA HUNEGNAW STATISTICIAN AFRICAN CENTRE FOR STATISTICS ECASTATS.UNECA.ORG Confidentiality and Anonymization of Microdata 1 United Nations Regional Seminar.
Session 4. Panel session: How useful is the notion of “circle of trust” concept ? A vision for the future. Maurice Brandt Destatis Germany 2ND EUROPEAN.
LT6: IV2 Sam Marden Question 1 & 2 We estimate the following demand equation ln(packpc) = b 0 + b 1 ln(avgprs) +u What do we require.
ABS Tablebuilder and DataAnalyser Session 7 UNECE Work Session on Statistical Data Confidentiality October 2013 Daniel Elazar
Microdata Simulation for Confidentiality of Tax Returns Using Quantile Regression and Hot Deck Jennifer Huckett Iowa State University June 20, 2007.
Returning to Consumption
FORMS OF COOPERATION BETWEEN NATIONAL STATISTICAL INSTITUTES AND DATA ARCHIVES Sebastian Kočar (ADP, UL) First Regional Workshop – Microdata Access in.
Dependent Samples: Hypothesis Test For Hypothesis tests for dependent samples, we 1.list the pairs of data in 2 columns (or rows), 2.take the difference.
© Federal Statistical Office, Research Data Centre, Maurice Brandt Folie 1 Analytical validity and confidentiality protection of anonymised longitudinal.
EDUC 200C Section 3 October 12, Goals Review correlation prediction formula Calculate z y ’ = r xy z x for a new data set Use formula to predict.
Session IV - Use of administrative data for data collection - Statistics Belgium Geneva, 31 October – 2 November.
What is the MPC?. Learning Objectives 1.Use linear regression to establish the relationship between two variables 2.Show that the line is the line of.
Access to microdata in Europe P resented by Michel Isnard – Insee DwB Training Course, Barcelona, Jan
Mara Cammarrota Italian National Institute of Statistics Development of Information System and Corporate Products, Information Management and Quality Assessment.
Daniel Beckler United States Department of Agriculture National Agricultural Statistics Service Timothy Mulcahy NORC at the University of Chicago Topic.
Creating Something from Nothing: Synthetic and Dummy files Bo Wandschneider University of Guelph Chuck Humphrey University of Alberta DLI Training: Ottawa,
Administrative procedures for microdata access at SURS October 2013.
Some aspects concerning analytical validity and disclosure risk of CART generated synthetic data Hans-Peter Hafner and Rainer Lenz Research Data Centre.
2008 NCHS Data Users’ Conference Omni Shoreham Hotel Washington, DC Wednesday, August 13, 2008.
WP 19 Assessment of Statistical Disclosure Control Methods for the 2001 UK Census Natalie Shlomo University of Southampton Office for National Statistics.
Anonymization of longitudinal surveys in the presence of outliers Hans-Peter Hafner HTW Saar – Saarland University of Applied Sciences
Statistical Methodology for the Automatic Confidentialisation of Remote Servers at the ABS Session 1 UNECE Work Session on Statistical Data Confidentiality.
Creating Something from Nothing: Working with Synthetic Files ACCOLEDS /DLI Training: December 2003 Chuck Humphrey University of Alberta.
1 1 Confidentiality protection of large frequency data cubes UNECE Workshop on Statistical Confidentiality Ottawa October 2013 Johan Heldal and Svetlana.
Access to microdata in the Netherlands: from a cold war to co-operation projects Eric Schulte Nordholt Senior researcher and project leader of the Census.
Survival Analysis approach in evaluating the efficacy of ARV treatment in HIV patients at the Dr GM Hospital in Tshwane, GP of S. Africa Marcus Motshwane.
© Statistisches Bundesamt, I/A Case study Federal Statistical Office Germany (Destatis) Joint UNECE/ EUROSTAT/ OECD Work Session on Statistical Metadata.
Estimation of the Probit Model From Anonymized Micro Data Gerd Ronning and Martin Rosemann Universität Tübingen & IAW Tübingen UNECE Work Session on Statistical.
STAT E100 Section Week 12- Regression. Course Review - Project due Dec 17 th, your TA. - Exam 2 make-up is Dec 5 th, practice tests have been updated.
1 Regression-based Approach for Calculating CBL Dr. Sunil Maheshwari Dominion Virginia Power.
1 CHANGES IN THE UNITS OF MEASUREMENT Suppose that the units of measurement of Y or X are changed. How will this affect the regression results? Intuitively,
Michelle Simard Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality Tarragona, Spain, November 23 rd, 2011 Progress on Real Time Remote.
Joint Eurostat Unece Worksession on Statistical Data Confidentiality 2011, Tarragona Initial analyses on comparable dissemination from the Essnet project.
Joint UNECE/Eurostat work session on statistical data confidentiality October 2015 Helsinki, Finland Circle of trust Maurice Brandt DESTATIS.
Managerial Economics & Decision Sciences Department tyler realty  old faithful  business analytics II Developed for © 2016 kellogg school of management.
RANDOM TESTING & PROTOTYPING Presented By: Presented By: Vipul Rastogi M.Tech (S.E.) SRMSCET Presented To: Dr. Himanshu Hora Chief Proctor SRMSCET.
Computer aided teaching of statistics: advantages and disadvantages
QM222 Class 9 Section A1 Coefficient statistics
Data Confidentiality and the Common Good.
assignment 7 solutions ► office networks ► super staffing
QM222 Class 11 Section A1 Multiple Regression
MODELING of SWIFT & Survey to Survey Imputation
PANEL DATA 1. Dummy Variable Regression 2. LSDV Estimator
QM222 Class 8 Section A1 Using categorical data in regression
Item 5.6 of the Agenda Remote access to confidential data for scientific purpose Jean-Marc Museux/ Aleksandra Bujnowska - Unit B2 Methodology and research.
Protecting Confidential Data
microdata.no Instant Access to Microdata
Federal Statistical Office Germany Research Data Centre
EPP 245 Statistical Analysis of Laboratory Data
SAFE – a method for anonymising the German Census
microdata.no Instant Access to Microdata
Presentation transcript:

State Statistical Institute Berlin-Brandenburg Jörg Höhne / Julia HöningerResearch Data Centre Morpheus – Remote Data Access with a Quality Measure Joint UNECE /Eurostat Work Session on Statistical Data Confidentiality 26 – 28 October 2011

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 2 Need for automatic output checking Current situation: remote processing Researchers want results fast Manual control time-consuming  In need for a system that can check results automatically  On the way to real remote access  Work part of the project „infinitE - an informational infrastructure for the E-Science Age“  funded by German Federal Ministry of Education and Research

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 3 Morpheus – general idea Users run calculations on anonymous data Additional to anonymous results a measure of quality Results (almost) in real time For final publication: Results on original data, checked manually

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 4 Morpheus – functionality Distance of anonymous to original result

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 5 Advantages for users Compared to current remote processing: Fast return of results Browsing in data set allowed (almost) all analysis permitted Compared to scientific use files: Measure of quality for all results Challenge: Reading output

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 6 Challenges for RDC staff Generate anonymous files for all data sets  Census data  Business data  Panel data Perturbative anonymisation methods or synthetic data  Reduction of manual work load for RDC staff if the anonymous data set is good enough

Höhne / Höninger State Statistical Institute Berlin-Brandenburg Morpheus – a first prototype First case study with Morpheus Statistical software Stata Anonymous file for business statistics Program from an actual user Works for output from SPSS and SAS. tabstat expshare2000, stats(N mean sd p25 p50 p75) variable | N mean sd p25 p50 p expshare2000 |

Höhne / Höninger State Statistical Institute Berlin-Brandenburg Morpheus - regression output. xtreg lnapro export pers perssq hc, fe r | Robust lnapro | Coef. Std. Err. t P>|t| [95% Conf export | pers | e E perssq | 1.37e e e E E E E-12 hc | _cons |

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 9 Disclosure risk of absolute deviation Risks can arise if direction of deviation is clear because of logical reasons: Variable must be non-negative, but distance is bigger than absolute value  Example: average number of employees = 1000  Adding stochastic noise to the quality 1500

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 10 Variants of modifying the distance stochastically result with anonymous data true result a) Unbiased point estimate b) Maximum distance

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 11 Remaining questions Remaining disclosure risks How to deal with distance 0? Testing acceptance among users  „use Morpheus or wait“ Independent of anonymisation technique IT server environment

Höhne / Höninger State Statistical Institute Berlin-Brandenburg 12 Thank you for your attention! Contact Dr. Jörg Höhne Julia Höninger