Remote Analysis Server for Tabulation and Analysis of Data. Tarragona, October 2011. James Chipperfield and Frank Yu (presenter)

Presentation transcript:


Australian Bureau of Statistics Remote Execution Environment for Microdata (REEM)

The three main ideas
– Infrastructure designed to give users control over the outputs
– Disclosure protection of count tables by coherent perturbation of cell values
– Disclosure protection of model analysis outputs by perturbation of the score function

Current Research Environment for Users ABS Research Environment Remote Execution Environment for Microdata (REEM)

User customised tables
[Diagram labels]
– Standard table (Producer = NSO): Microdata, Create table, Confidentialise, Distribute
– Customised table (Producer = User): Microdata, User designs custom table, Send specifications, Extract, Populate, Automated table creation, Automatic confidentialisation, Deliver

User Desired Analysis Functionality

Use of metadata standard in architectural design
Why?
– Easy definition, discovery and reuse
How?
– REEM architecture separates the statistical process from the structured metadata describing the data on which the process acts
– Create, identify and store metadata throughout collection and processing using a standard metadata format (DDI)
– Microdata searchable on the ABS website
– Machine-to-machine discovery through a web service using SDMX

Disclosure Control for Count Tables
Extends the methodology for Census Table Builder (Fraser and Wooton 2005):
– Extended to include sample weights
Perturb each cell in such a way that:
– The perturbation has zero expected value: counts are unbiased over the perturbation distribution
– The perturbation has a fixed variance: this controls the size of the perturbation
– Perturbations of different cells are uncorrelated: this protects against differencing
– The perturbation is coherent across tables: when the same records contribute to a cell, the perturbation for that cell remains unchanged
– An additivity adjustment could be added to ensure internal cells add to margins
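
The coherence property above can be illustrated with a minimal sketch. It is not the ABS implementation: the record keys, the perturbation distribution, and the function names `cell_key`, `perturbation` and `perturbed_count` are all assumptions made for illustration, and sample weights are left out. The idea shown is that the perturbation is a deterministic function of the set of records in a cell, so the same records always receive the same perturbation in any table.

```python
import hashlib

# Hypothetical perturbation distribution (an assumption, not the ABS one).
# It is symmetric, so the mean is 0; its variance is fixed at 1.2.
PERTURBATIONS = [-2, -1, 0, 1, 2]
PROBABILITIES = [0.1, 0.2, 0.4, 0.2, 0.1]


def cell_key(record_keys):
    """Map the records in a cell to a number in [0, 1).

    Each record is assumed to carry a fixed integer key. The keys are
    summed, so the result depends only on which records are in the cell,
    not on their order or on the table being requested.
    """
    total = sum(record_keys) % 2**32
    digest = hashlib.sha256(total.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def perturbation(record_keys):
    """Look up the perturbation for a cell from its cell key."""
    u = cell_key(record_keys)
    cumulative = 0.0
    for value, p in zip(PERTURBATIONS, PROBABILITIES):
        cumulative += p
        if u < cumulative:
            return value
    return PERTURBATIONS[-1]  # guard against floating-point round-off


def perturbed_count(record_keys):
    """Unweighted cell count plus its coherent perturbation."""
    return len(record_keys) + perturbation(record_keys)


if __name__ == "__main__":
    cell_in_table_a = [104729, 1299709, 15485863]
    cell_in_table_b = [15485863, 104729, 1299709]  # same records, other table
    print(perturbed_count(cell_in_table_a) == perturbed_count(cell_in_table_b))
```

This sketch does not handle small cells (a perturbed count can fall below zero here) or the additivity adjustment; a production system would need explicit rules for both.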

Disclosure Control for Modelled Outputs
Based on methodology by Chipperfield et al.
Two levels of control are necessary:
– Parameters and inferences
– Diagnostics, e.g. residual plots, influence diagnostics
For parameters:
– Perturb the score function for estimating the model parameters
– Calculate the variance post-perturbation for inference purposes
For diagnostics and residual plots:
– Perturb diagnostics by an amount that depends on the maximum influence a record can have on the diagnostic measure
– Show box diagrams rather than plots
Introduce general restrictions and attack-specific restrictions
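
As a rough illustration of what perturbing a score function can mean, the sketch below fits a simple linear regression by solving the score equations with a small noise vector on the right-hand side instead of zero. This is an assumption-laden toy, not the method of Chipperfield et al.: the uniform noise, the `noise_scale` and `seed` parameters, and the function name `fit_with_perturbed_score` are hypothetical, and the post-perturbation variance calculation is not shown.

```python
import random


def fit_with_perturbed_score(x, y, noise_scale, seed):
    """Fit y = intercept + slope * x from a perturbed score equation.

    Ordinary least squares solves  X'(y - X b) = 0.
    Here we solve                  X'(y - X b) = e,
    where e = (e0, e1) is a small random perturbation (assumed uniform).
    """
    rng = random.Random(seed)
    e0 = rng.uniform(-noise_scale, noise_scale)
    e1 = rng.uniform(-noise_scale, noise_scale)

    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    # Perturbed normal equations: X'X b = X'y - e
    r0, r1 = sy - e0, sxy - e1
    det = n * sxx - sx * sx
    intercept = (sxx * r0 - sx * r1) / det
    slope = (n * r1 - sx * r0) / det
    return intercept, slope


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 5.0, 7.0]
    print(fit_with_perturbed_score(x, y, noise_scale=0.0, seed=1))  # plain OLS
    print(fit_with_perturbed_score(x, y, noise_scale=1.0, seed=1))  # perturbed
```

With `noise_scale=0.0` the function reduces to ordinary least squares; larger values move the released estimates further from the unperturbed fit.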

Perturbing Score Function