Remote Analysis Server for Tabulation and Analysis of Data Tarragonia, October 2011 James Chipperfield and Frank Yu (presenter)


1 Remote Analysis Server for Tabulation and Analysis of Data Tarragonia, October 2011 James Chipperfield and Frank Yu (presenter)

2 Australian Bureau of Statistics Remote Execution Environment for Microdata (REEM)

3 The three main ideas
– Infrastructure designed to give users control over the outputs
– Disclosure protection of count tables by coherent perturbation of cell values
– Disclosure protection of model analysis outputs by perturbation of the score function

4 Current Research Environment for Users
– ABS Research Environment
– Remote Execution Environment for Microdata (REEM)

5 User customised tables
[Diagram comparing two table-production workflows]
– Producer = NSO: Microdata, Create table, Confidentialise, Standard table, Distribute
– Producer = User: Microdata, User designs custom table, Automated table creation, Customised table, Deliver (step labels: Automatic confidentialisation, Send specifications, Populate, Extract)

6 User Desired Analysis Functionality

7 Use of metadata standard in architectural design
Why?
– Easy definition, discovery and reuse
How?
– REEM architecture separates the statistical process from the structured metadata describing the data on which the process acts
– Create, identify and store metadata throughout collection and processing using a standard metadata format (DDI)
– Microdata searchable on the ABS website
– Machine-to-machine discovery through a web service using SDMX

8 Disclosure Control for Count Tables
Extends the methodology for Census Table Builder (Fraser and Wooten 2005):
– Extended to include sample weights
Perturb each cell in such a way that:
– The perturbation has zero expected value: counts are unbiased over the perturbation distribution
– The perturbation has a fixed variance: this controls the size of the perturbation
– Perturbations of different cells are uncorrelated: this protects against differencing
– The perturbation is coherent across tables: when the same records contribute to a cell, the perturbation for that cell remains unchanged
– An additivity adjustment could be added to ensure internal cells add to margins
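The coherence property on slide 8 can be illustrated with a minimal Python sketch. It is not the ABS implementation: the set of perturbation values, the `secret` string, and the function names `cell_perturbation` and `perturbed_count` are all assumptions made for illustration, and sample weights and the additivity adjustment are not modelled. The idea shown is that the perturbation is derived deterministically from the set of contributing records, so the same records always receive the same perturbation in any table.

```python
import hashlib
import random

# Hypothetical perturbation values: symmetric around zero, so a uniform
# draw has expected value 0 and a fixed variance of 2.
PERTURBATION_VALUES = (-2, -1, 0, 1, 2)


def cell_perturbation(record_ids, secret="demo-secret"):
    """Return a perturbation that depends only on which records are in the cell."""
    # Sort the identifiers so the result does not depend on record order.
    canonical = ",".join(str(r) for r in sorted(record_ids))
    digest = hashlib.sha256(f"{secret}|{canonical}".encode("utf-8")).digest()
    # Seed a private generator from the hash: same records, same draw.
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choice(PERTURBATION_VALUES)


def perturbed_count(record_ids, secret="demo-secret"):
    """Unweighted cell count plus its coherent perturbation."""
    ids = set(record_ids)
    return len(ids) + cell_perturbation(ids, secret)


# The same three records appear in two different tables (in a different order):
table_a_cell = perturbed_count([101, 102, 103])
table_b_cell = perturbed_count([103, 101, 102])
print(table_a_cell == table_b_cell)  # the two published values agree
```

Because different record sets hash to unrelated seeds, the perturbations of different cells behave as if drawn independently in this sketch, which is the property that makes differencing two tables uninformative.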

9 Disclosure Control for Modelled Outputs
Based on the methodology of Chipperfield et al.
Two levels of control are necessary:
– Parameters and inferences
– Diagnostics, e.g. residual plots, influence diagnostics
For parameters:
– Perturb the score function used to estimate the model parameters
– Calculate the variance after perturbation for inference purposes
For diagnostics and residual plots:
– Perturb diagnostics by an amount that depends on the maximum influence a record can have on the diagnostic measure
– Show box diagrams rather than plots
Introduce general restrictions and attack-specific restrictions

10 Perturbing Score Function
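As a minimal sketch of what perturbing a score function can look like, the Python below uses simple linear regression, where the score is X'(y - Xb). This is an illustrative assumption, not the method of Chipperfield et al.: the Gaussian perturbation, the `sigma` parameter, and the function names `draw_perturbation` and `ols_with_perturbed_score` are hypothetical. Instead of solving score = 0, the estimator solves score + e = 0 for a small random vector e.

```python
import random


def draw_perturbation(seed, sigma=0.5):
    """Hypothetical zero-mean perturbation for the two score components."""
    rng = random.Random(seed)
    return (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def ols_with_perturbed_score(x, y, e=(0.0, 0.0)):
    """Solve X'(y - Xb) + e = 0 for b = (intercept, slope).

    With e = (0, 0) this is ordinary least squares.
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    # Perturbed normal equations: (X'X) b = X'y + e
    r0 = sy + e[0]
    r1 = sxy + e[1]
    det = n * sxx - sx * sx
    intercept = (sxx * r0 - sx * r1) / det
    slope = (n * r1 - sx * r0) / det
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(ols_with_perturbed_score(x, y))                        # unperturbed fit
print(ols_with_perturbed_score(x, y, draw_perturbation(1)))  # perturbed fit
```

In this linear case the perturbed estimate equals the unperturbed one plus (X'X)^-1 e, so a zero-mean e leaves the estimate unbiased over the perturbation distribution and adds (X'X)^-1 Var(e) (X'X)^-1 to its variance, which is the extra term the post-perturbation variance calculation on slide 9 would need to account for.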

