The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM Researcher Janika Konnu Manchester, United Kingdom 17-19 December.


1 The use of protected microdata in tabulation: case of SDC-methods microaggregation and PRAM
Researcher Janika Konnu
Manchester, United Kingdom, 17-19 December 2007

2 Outline
Tuesday 18 December 2007, Janika Konnu
- Data
- SDC methods
- Results
- Conclusions
- Forthcoming research

3 Data used in the study
The data on teachers were originally collected for administrative purposes. Only high school teachers (N = 7798) were included in our study. The data contained information about the teachers (age, gender, position, etc.) and about the schools they taught in (location of the school, number of students, etc.).

4 SDC methods: Microaggregation
First, the data are divided into groups of at least k observations, and the group averages are released instead of the original values of the variable. The MDAV algorithm was used for grouping: the algorithm finds the average observation with respect to the values and forms groups using the distance from this average observation. Grouping is the crucial point of this method: when each group contains the most similar observations, information loss is minimised. In our study, microaggregation was applied to categorical data, although it is intended for numerical data.
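The grouping step can be sketched in Python. This is a minimal, unoptimised version of the textbook MDAV (maximum distance to average vector) procedure, written for illustration only; it is not the μ-Argus implementation, and the function names `mdav_groups` and `microaggregate` as well as the sample ages are hypothetical. Records are assumed to be numeric vectors compared by Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> Tuple[float, ...]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return tuple(sum(r[j] for r in rows) / n for j in range(len(rows[0])))


def mdav_groups(data: Sequence[Vector], k: int) -> List[List[int]]:
    """Partition the row indices of `data` into MDAV groups.

    Every group has exactly k rows except the last, which has k..2k-1 rows
    (assuming len(data) >= k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    remaining = list(range(len(data)))
    groups: List[List[int]] = []

    def farthest_from(point: Vector) -> int:
        return max(remaining, key=lambda i: math.dist(data[i], point))

    def take_group(seed: int) -> None:
        # The seed and its k-1 nearest remaining neighbours; on distance ties
        # the seed itself sorts first, so it always ends up in its own group.
        ranked = sorted(remaining, key=lambda i: (math.dist(data[i], data[seed]), i != seed))
        group = ranked[:k]
        for i in group:
            remaining.remove(i)
        groups.append(group)

    while len(remaining) >= 3 * k:
        c = centroid([data[i] for i in remaining])
        r = farthest_from(c)        # record farthest from the average record
        take_group(r)
        s = farthest_from(data[r])  # record farthest from r among those left
        take_group(s)

    if len(remaining) >= 2 * k:
        c = centroid([data[i] for i in remaining])
        take_group(farthest_from(c))
    if remaining:
        groups.append(list(remaining))  # the leftover records form the last group
    return groups


def microaggregate(data: Sequence[Vector], k: int) -> List[Tuple[float, ...]]:
    """Replace every record by the centroid (average) of its MDAV group."""
    out: List[Tuple[float, ...]] = [()] * len(data)
    for group in mdav_groups(data, k):
        c = centroid([data[i] for i in group])
        for i in group:
            out[i] = c
    return out


if __name__ == "__main__":
    ages = [(25,), (27,), (30,), (45,), (47,), (60,), (61,), (62,)]  # hypothetical
    print(microaggregate(ages, 2))
    # [(26.0,), (26.0,), (37.5,), (37.5,), (53.5,), (53.5,), (61.5,), (61.5,)]
```

Because every record is replaced by the average of its own group, the column totals (and therefore the overall means) are unchanged; what is lost is the variation inside each group.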

5 SDC methods: The Post RAndomization Method (PRAM)
The method changes the values of a variable according to a probability matrix (a Markov matrix). When PRAM is applied, the data user must take the probability matrix into account in order to obtain correct results. In our study we tested the usefulness of PRAM when the probability matrix is not used in the analysis.
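A minimal sketch of the perturbation itself, in Python, assuming categories are coded 0..K-1 and that `P[k][l]` is the probability that a record whose true category is k is released as category l. The function name `pram`, the example matrix and the example codes are hypothetical; μ-Argus builds its own matrices. Each record's value is redrawn independently from the row of P that belongs to its own category:

```python
import random
from collections import Counter
from typing import List, Sequence


def pram(values: Sequence[int], P: List[List[float]], rng: random.Random) -> List[int]:
    """Release each category code v (0..K-1) as category l with probability P[v][l].

    P[k][l] is the probability that a record whose true category is k is
    released as category l; every row of P must sum to 1.
    """
    for row in P:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
    categories = list(range(len(P)))
    # Records are perturbed independently, each using the row of its own category.
    return [rng.choices(categories, weights=P[v])[0] for v in values]


if __name__ == "__main__":
    # Hypothetical matrix: keep the value with probability 0.9, else move it evenly.
    P = [[0.90, 0.05, 0.05],
         [0.05, 0.90, 0.05],
         [0.05, 0.05, 0.90]]
    original = [0] * 800 + [1] * 150 + [2] * 50  # hypothetical category codes
    released = pram(original, P, random.Random(2007))
    print(Counter(original), Counter(released))
```

The seeded `random.Random` only makes a run repeatable; the released frequencies still vary from seed to seed, which is exactly the uncertainty PRAM relies on.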

6 Empirical work: μ-Argus software
The software includes disclosure risk measurement and the following methods: global recoding, local suppression, top and bottom coding, PRAM, numerical microaggregation, numerical rank swapping and Sullivan masking. The software produces protected data if suppressions are allowed. In our case, only the SDC methods PRAM and numerical microaggregation were studied. No suppressions were made, because we needed information on the difference between the original and the protected data.

7 Results: Data protected by microaggregation
Group sizes used in the protection: 2, 5, 8, 10 and 15.
Microaggregation has no effect on the frequencies. Unfortunately, this implies that hardly any values change.
Conclusion: microaggregation does not give strong enough protection for categorical data.
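One possible way to see why frequencies of a categorical variable barely move: with only a few distinct codes, most groups consist of identical codes whose mean is the code itself, and only groups that straddle a category boundary receive a new value. To keep the sketch short it does not use MDAV; it sorts the values and cuts them into consecutive groups of k (the last group absorbing any remainder), a common univariate simplification. The function name and the counts are hypothetical, and this is an illustration rather than the analysis performed in the study.

```python
from collections import Counter
from typing import List, Sequence


def univariate_microaggregate(values: Sequence[float], k: int) -> List[float]:
    """Sort, cut into consecutive groups of k (the last group takes the
    remainder), and replace each value by the mean of its group."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(1, len(values) // k)
    out = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        mean = sum(values[i] for i in members) / len(members)
        for i in members:
            out[i] = mean
    return out


if __name__ == "__main__":
    codes = [1] * 50 + [2] * 30 + [3] * 15 + [4] * 5  # hypothetical category codes
    for k in (5, 8):
        protected = univariate_microaggregate(codes, k)
        changed = sum(a != b for a, b in zip(codes, protected))
        print(k, changed, Counter(protected) == Counter(codes))
    # 5 0 True
    # 8 20 False
```

With k = 5 every category count happens to be a multiple of k, so no group mixes two codes and nothing changes at all; with k = 8 only the two groups that straddle a boundary are altered.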

8 Results: Data protected by PRAM (no bandwidth)
Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40.
PRAM changes the values of variables, and in this way the data are protected. Unfortunately, PRAM causes problems when the frequencies of the categories differ greatly: the frequency of a large category keeps getting smaller, and the frequency of a small category keeps getting larger.
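The shrinking and growing follow from the expected released frequencies, E[T*_l] = sum over k of P[k][l] * T_k. The slide does not give the matrix, so the sketch below assumes that a value is kept with probability 1 - p and otherwise moved to any other category with equal probability; the function names and the counts 800, 150 and 50 are hypothetical. With p = 0.20 the expected counts become 660, 205 and 135: the large category loses far more records than it receives, while the small one gains.

```python
from typing import List, Sequence


def uniform_pram_matrix(K: int, p: float) -> List[List[float]]:
    """Assumed structure (K >= 2): keep a category with probability 1 - p,
    otherwise move it to one of the other K - 1 categories with equal probability."""
    return [[1 - p if k == l else p / (K - 1) for l in range(K)] for k in range(K)]


def expected_released_counts(counts: Sequence[float], P: List[List[float]]) -> List[float]:
    """Expected frequencies after PRAM: E[T*_l] = sum over k of P[k][l] * T_k."""
    K = len(counts)
    return [sum(P[k][l] * counts[k] for k in range(K)) for l in range(K)]


if __name__ == "__main__":
    counts = [800, 150, 50]  # hypothetical, strongly skewed frequencies
    for p in (0.05, 0.10, 0.20, 0.30, 0.40):
        expected = expected_released_counts(counts, uniform_pram_matrix(len(counts), p))
        print(f"p = {p:.2f}: {[round(x, 1) for x in expected]}")
```

The total is preserved, but as p grows the expected table is pulled towards equal frequencies, which is why a skewed variable is distorted most.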

9 Results: Data protected by PRAM (bandwidth is 2)
Changing probabilities: 0.05, 0.10, 0.20, 0.30 and 0.40.
Restricting the change of values cannot solve the problem caused by differences in frequencies. Our study shows that the frequencies of the categories next to the one with the largest frequency still grow too fast.
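The same expected-frequency calculation suggests why a band does not remove the problem. The slide does not say how μ-Argus turns its bandwidth setting into a matrix, so this sketch simply assumes that a changed value may only move, with equal probability, to a category at most `max_step` positions away; `max_step = 1`, the function names and the counts are hypothetical. With p = 0.20 the two categories next to the large middle one rise from 60 to an expected 136, because the records leaving the large category can only go to them.

```python
from typing import List, Sequence


def banded_pram_matrix(K: int, p: float, max_step: int) -> List[List[float]]:
    """Assumed structure: keep a category with probability 1 - p, otherwise move
    it with equal probability to a category at most `max_step` positions away."""
    P = []
    for k in range(K):
        neighbours = [l for l in range(K) if l != k and abs(l - k) <= max_step]
        row = [0.0] * K
        if not neighbours:  # a lone category can only stay where it is
            row[k] = 1.0
        else:
            row[k] = 1 - p
            for l in neighbours:
                row[l] = p / len(neighbours)
        P.append(row)
    return P


def expected_released_counts(counts: Sequence[float], P: List[List[float]]) -> List[float]:
    """Expected frequencies after PRAM: E[T*_l] = sum over k of P[k][l] * T_k."""
    K = len(counts)
    return [sum(P[k][l] * counts[k] for k in range(K)) for l in range(K)]


if __name__ == "__main__":
    counts = [40, 60, 800, 60, 40]  # hypothetical ordered categories, peak in the middle
    P = banded_pram_matrix(len(counts), 0.20, max_step=1)
    print([round(x, 1) for x in expected_released_counts(counts, P)])
    # [38.0, 136.0, 652.0, 136.0, 38.0]
```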

10 Results: Data protected by PRAM
(Figures: no bandwidth; bandwidth is 2)

11 Conclusion: Microaggregation
Microaggregation performs well with numerical data, but its application to categorical data needs more research. Data protected by microaggregation contain almost the same information as the original data. Can we be sure that microaggregation protects categorical data properly?

12 Conclusion: PRAM
PRAM seems to perform quite well in protecting data, but there are some issues to overcome. PRAM can protect data even with small changing probabilities, because its protection is based on uncertainty of identification. In this case our concern is information loss. Are the protected data useful if the probability matrix is not used?
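It may help to see what "taking the probability matrix into account" amounts to for frequency tables. Because each record is perturbed independently, the expected released frequencies satisfy E[T*] = Pᵀ T, where P[k][l] is the probability that true category k is released as l. If P is invertible, solving Pᵀ T̂ = T* gives the standard moment (unbiased) estimate of the original frequencies. A minimal pure-Python sketch, with hypothetical function names and numbers; in practice T̂ is noisy and can even be negative for small categories.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def corrected_counts(released: Sequence[float], P: List[List[float]]) -> List[float]:
    """Moment estimate of the original frequencies: solve P^T T = T*."""
    K = len(P)
    Pt = [[P[k][l] for k in range(K)] for l in range(K)]  # transpose of P
    return solve(Pt, released)


if __name__ == "__main__":
    # Hypothetical matrix: keep with probability 0.8, else move evenly.
    P = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    released = [660, 205, 135]  # hypothetical released frequencies
    print([round(x, 6) for x in corrected_counts(released, P)])  # [800.0, 150.0, 50.0]
```

Without this correction, analyses run directly on the released table see the distorted frequencies shown in the results slides.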

13 Forthcoming research
- Include more methods: rank swapping, noise adding
- Include disclosure risk measures
- Include more precise measurement of information loss

14 Thank You!

