ABS Tablebuilder and DataAnalyser Session 7 UNECE Work Session on Statistical Data Confidentiality 28-30 October 2013 Daniel Elazar

Presentation transcript:

Traditional Framework for Analysis of Microdata
- Users' environment: Basic CURFs on CD-ROM
- Remote execution (RADL): remote access to Basic and Expanded CURFs for statistical analysis in SAS, SPSS and Stata
- On-site (ABSDL): access to Expanded or Specialist CURFs
- Special Data Service / consultancies

[Figure: ABS Analysis Services by "Market Segment", on a scale from Less Sophisticated to Most Sophisticated. Labels: Publication Output, Survey Table Builder, Analysis Service, CURFs, Remote Access Data Lab, ABS Data Lab, Special Data Service / Consultancies.]

Evaluation of Current Framework
Pluses
- Analysis of Confidentialised URFs on CD-ROM or via RADL
- RADL supports SAS, SPSS or Stata
- 'Free' coding suited to complex manipulations of data
- Variety of household survey datasets available for analysis
Minuses
- RADL protections not tight enough to enable analysis of more detailed data
- Limited to SAS, SPSS or Stata
- Very few Business CURFs
- Lengthy CURF creation process
- Metadata not searchable

[Figure: Future ABS Tabulation Environment and Future ABS Research Environment. Labels: MURF, Table Builder, Data Transforms, User selects technique (Tabular, Linear, Logistic, Probit, Multinomial), Confidentiality Filters (Filter 1 to Filter 5), Output Filter, Confidentialised Outputs, Output MURF.]

TableBuilder Functionality
Statistics: Counts, Estimates, Means, Quantiles
Table columns: Weighted, RSEs

TableBuilder Protections
- Perturbation: statistical noise added to values
- Custom ranges: min, max, min interval width
- Field exclusion rules: certain combinations of variables that increase identification risk are prohibited
- Additivity: restores additivity of inner cells to margins
- Sparsity checks: tables with too high a proportion of cells with a small number of contributors are not released
- RSEs: further adjusted; quality cutoff
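The sparsity check, perturbation and additivity steps listed above can be illustrated with a minimal Python sketch. It is not the ABS algorithm: the thresholds (`min_contributors`, `max_sparse_share`), the uniform integer noise and all function names are assumptions made for illustration, and additivity is obtained here simply by recomputing margins from the perturbed inner cells.

```python
import random


def is_too_sparse(cells, min_contributors=3, max_sparse_share=0.5):
    """Return True when too large a share of cells has few contributors.

    Both thresholds are illustrative assumptions, not ABS parameters.
    """
    sparse = sum(1 for n in cells.values() if n < min_contributors)
    return sparse / len(cells) > max_sparse_share


def perturb(cells, max_noise=2, seed=0):
    """Add small integer noise to every inner cell (assumed noise model)."""
    rng = random.Random(seed)
    return {key: max(0, n + rng.randint(-max_noise, max_noise))
            for key, n in cells.items()}


def margins(cells):
    """Recompute row and column totals so the table stays additive."""
    rows, cols = {}, {}
    for (row, col), n in cells.items():
        rows[row] = rows.get(row, 0) + n
        cols[col] = cols.get(col, 0) + n
    return rows, cols


def protect_table(cells):
    """Refuse sparse tables; otherwise return perturbed, additive output."""
    if is_too_sparse(cells):
        return None
    noisy = perturb(cells)
    row_totals, col_totals = margins(noisy)
    return {"cells": noisy, "row_totals": row_totals, "col_totals": col_totals}


# Hypothetical 2 x 2 table keyed by (row category, column category).
table = {("a", "x"): 10, ("a", "y"): 12, ("b", "x"): 8, ("b", "y"): 15}
released = protect_table(table)
```

Recomputing the margins keeps the example short; a production system could instead perturb margins independently and then adjust the inner cells to match them.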

DataAnalyser Functionality
Written in R; Full User Authentication; Audit System
- Exploratory Data Analysis: summary statistics (sums, counts); summary tables; graphics (side-by-side box plots); summary statistics (count); graphics
- Transformations / Derivations: logical derivations; categorical / dummy variables; category collapsing; Expression Editor for categorical variables; drop variables / records; Action List
- Analysis Procedures / Specifications: Robust Linear Regression; Binomial logistic; Probit; Multinomial; Poisson; Diagnostics; Weighted Analysis
- Outputs: R-squared; Pseudo R-squared; coefficients; standard errors; other diagnostics
- Output Formats: CSV
- Supporting components: storage of intermediate datasets; Workflow Control; Data Repository Interface; Metadata Handler

DataAnalyser Protections (additional to TableBuilder)
- Perturbation: statistical noise added to the regression score function
- Linear Robust: Huber-Mallows robustness incorporating perturbation for outliers and leverage points
- Hex Bin Plots: replace scatter plots
- Coverage- and scope-based perturbation: perturbation controlled by the specific units included in scope and the definition of scope
- Drop k units: one record is dropped for each category of each explanatory categorical variable
- Explanatory Only Variables: demographic variables not allowed in the response variable field
- Sparsity: regressions based on too few units are not released
- Leverage: regressions on data containing units with excessive leverage are not released
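The sparsity and leverage rules above amount to a release gate applied before any regression output is returned. The sketch below shows one way such a gate could look for a simple linear regression with a single explanatory variable, where the leverage of unit i is h_i = 1/n + (x_i - mean)^2 / Sxx. The cutoffs `min_units` and `leverage_limit` and the function names are assumptions for illustration; the slides do not state the values or method DataAnalyser uses.

```python
def leverages(x):
    """Leverage of each unit in a simple linear regression on x.

    Uses h_i = 1/n + (x_i - mean)^2 / Sxx, the diagonal of the hat matrix
    for a model with an intercept and one explanatory variable.
    """
    n = len(x)
    mean = sum(x) / n
    sxx = sum((v - mean) ** 2 for v in x)
    return [1 / n + (v - mean) ** 2 / sxx for v in x]


def may_release(x, min_units=30, leverage_limit=0.5):
    """Return True only if the regression passes both checks.

    min_units and leverage_limit are assumed cutoffs for this sketch.
    """
    if len(x) < min_units:
        return False  # sparsity: too few units
    if len(set(x)) == 1:
        return False  # no variation in x, so the slope is undefined
    return max(leverages(x)) <= leverage_limit  # leverage check


evenly_spread = list(range(40))       # no unit dominates the fit
one_outlier = [0] * 39 + [100]        # a single unit determines the slope
too_few = list(range(10))

ok_spread = may_release(evenly_spread)
ok_outlier = may_release(one_outlier)
ok_small = may_release(too_few)
```

In the second dataset the outlying unit has leverage close to 1, meaning the fitted line passes almost exactly through it, which is why releasing the coefficients would reveal information about that unit.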

Hex-bin plots
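A hex-bin plot shows the number of points falling in each hexagonal cell instead of plotting each unit record. A minimal sketch of the binning step follows, assuming regular hexagons whose centres lie on two offset rectangular lattices (labelled "A" and "B" here) and assigning each point to the nearest centre. The function name and the `size` parameter (horizontal spacing between centres on one lattice) are illustrative choices, not taken from DataAnalyser.

```python
import math
from collections import Counter


def hexbin_counts(points, size=1.0):
    """Count points per hexagonal cell.

    Centres lie on lattice A at (i*sx, j*sy) and lattice B at
    ((i+0.5)*sx, (j+0.5)*sy), with sy = sqrt(3)*sx. Assigning each point
    to the nearest centre yields regular hexagonal cells.
    """
    sx = size
    sy = size * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        # Nearest centre on lattice A.
        ia, ja = round(x / sx), round(y / sy)
        # Nearest centre on lattice B (offset by half a step).
        ib, jb = math.floor(x / sx), math.floor(y / sy)
        da = (x - ia * sx) ** 2 + (y - ja * sy) ** 2
        db = (x - (ib + 0.5) * sx) ** 2 + (y - (jb + 0.5) * sy) ** 2
        key = ("A", ia, ja) if da <= db else ("B", ib, jb)
        counts[key] += 1
    return counts


# Hypothetical data: three points near the origin, one near (0.5, 0.87).
points = [(0.1, 0.1), (-0.1, 0.05), (0.0, -0.1), (0.5, 0.87)]
bins = hexbin_counts(points)
```

Only the per-cell counts would be passed to the plotting layer, so the exact coordinates of individual records never appear in the output.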

Future Directions
1. Collaborations with other NSIs
2. Enhancements to TableBuilder and DataAnalyser:
   - hierarchical datasets
   - better performance with large datasets / high loads
   - linked datasets
   - sophisticated metadata handler
3. Conduct user consultation: more advanced functionality for DataAnalyser, e.g. multilevel models
4. Business data
5. Single ABS publication system (single source of truth; consistency of confidentialised outputs)
6. Measures of utility: information loss