Razi Mukatren Golan Salman 1 Workshop in information security Privacy in a Demographic Database Lecturer: Dr. Eran Tromer Teaching assistant: Mr. Nir Atias.

Advisor: Prof. Kobbi Nissim

Israel Central Bureau of Statistics (CBS, הלמ"ס):
1. Annually conducts a comprehensive survey that provides information on the welfare and living conditions of Israel's population.
2. The survey covers roughly 7,000 people.
3. The survey is called 'The Social Survey'.
4. It is very comprehensive (68 A4 pages) and contains many questions.
5. The results of the survey are published online on the CBS website.

Some of the questions that appeared in the survey:
- What was the subject of your studies toward a first degree (B.A.)?
- Do you own a dwelling?
- Are you satisfied, in general, with the area in which you live?
- Are you satisfied with the amount of parks, public gardens or greenery in the area in which you live?
- Did you study in a learning institution towards an academic degree?
- In what year did you complete your studies toward a third degree (Ph.D.)?
- When you speak, do you mix languages?
- Are you satisfied with your relationships with family members?

How the data is presented: the website lets users view the results broken down and filtered by different categories, with up to two filters and four variables per table.

The question we addressed in this project: given the data on the website, is it possible to retrieve all the answers of individual participants?
Consider the following table, created from this selection:
Filter A - Status / Widower
Filter B - Military service / Yes
Variable A - Sex
Variable B - Number of children
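To make the query model concrete, here is a minimal Python sketch that simulates the website locally. It assumes each respondent is a dict of attribute values and that a query returns respondent counts for every combination of the chosen variables among the records matching the filters. The attribute names (`status`, `military`, `sex`, `children`), the `crosstab` helper and the toy records are hypothetical; the real site exposes only the resulting tables, never the records themselves.

```python
from collections import Counter

# Hypothetical limits mirroring the slide: up to two filters, four variables.
MAX_FILTERS = 2
MAX_VARIABLES = 4


def crosstab(records, filters, variables):
    """Simulated website query (hypothetical helper).

    Keeps records matching every (attribute, value) pair in `filters`,
    then counts respondents per combination of `variables` values.
    """
    if len(filters) > MAX_FILTERS or len(variables) > MAX_VARIABLES:
        raise ValueError("query exceeds the interface limits")
    selected = [r for r in records
                if all(r[attr] == value for attr, value in filters.items())]
    return Counter(tuple(r[v] for v in variables) for r in selected)


# Invented toy records, not survey data.
records = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "no", "sex": "male", "children": 0},
    {"status": "married", "military": "yes", "sex": "female", "children": 1},
]

# The selection from the slide: Status/Widower, Military service/Yes,
# broken down by sex and number of children.
table = crosstab(records,
                 filters={"status": "widower", "military": "yes"},
                 variables=["sex", "children"])
for cell, count in table.items():
    print(cell, count)
```

Each printed line corresponds to one cell of the published table; the attacker sees only these counts.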

Identifying unique records: every cell with a count of 1 in a table built from two filters and two variables represents a single participant in the survey. Now we can extract this participant's record. How? Recall that the table on the previous slide was produced by choosing two filters and two variables. Now we create a table with the same two filters and the same two variables, plus one new variable. The unique cell describes a male widower who served in the army and has no children.
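The "every 1 is a participant" observation can be sketched under the same assumptions: a hypothetical `crosstab` function standing in for the website, and invented records. Cells whose count is exactly 1 are the ones that isolate a single respondent.

```python
from collections import Counter


def crosstab(records, filters, variables):
    """Hypothetical stand-in for the website's table query."""
    selected = [r for r in records
                if all(r[a] == v for a, v in filters.items())]
    return Counter(tuple(r[v] for v in variables) for r in selected)


def unique_cells(table):
    """Cells whose count is exactly 1: each one pins down a single respondent."""
    return [cell for cell, count in table.items() if count == 1]


# Invented toy records, not survey data.
records = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "no", "sex": "male", "children": 0},
    {"status": "married", "military": "yes", "sex": "female", "children": 1},
]

table = crosstab(records, {"status": "widower", "military": "yes"},
                 ["sex", "children"])
print(unique_cells(table))  # → [('male', 0)]
```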

Querying unique records: same as the previous table, but with a third variable added (employment status). From the previous table: a male widower who served in the army and has no children. From this table: employed. We now know 5 attributes of this participant.
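The refinement step can be sketched as follows, assuming the same hypothetical `crosstab` stand-in and invented records. Because the original cell holds exactly one respondent, the cells of the refined table that extend it must together hold exactly one respondent, and the added column of that cell is the participant's value. The `learn_attribute` helper and the `employed` attribute name are illustrative assumptions.

```python
from collections import Counter


def crosstab(records, filters, variables):
    """Hypothetical stand-in for the website's table query."""
    selected = [r for r in records
                if all(r[a] == v for a, v in filters.items())]
    return Counter(tuple(r[v] for v in variables) for r in selected)


def learn_attribute(records, filters, variables, cell, extra):
    """Learn the `extra` value of the respondent isolated by `cell`.

    Re-queries with the same filters and variables plus `extra`; the
    refined cells that extend `cell` must hold exactly one respondent.
    """
    refined = crosstab(records, filters, variables + [extra])
    hits = [(key[-1], n) for key, n in refined.items() if key[:-1] == cell]
    if sum(n for _, n in hits) != 1:
        raise ValueError("cell does not isolate a single respondent")
    return hits[0][0]


# Invented toy records, not survey data.
records = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0, "employed": "yes"},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2, "employed": "no"},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2, "employed": "yes"},
]

filters = {"status": "widower", "military": "yes"}
print(learn_attribute(records, filters, ["sex", "children"], ("male", 0), "employed"))
```

The refined query uses three variables, which stays within the interface limit of four.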

At this point we know 5 attributes. How can we get the others? Repeat the query, each time replacing the third variable with a different attribute.
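Repeating the refinement over every remaining attribute rebuilds the whole record. The sketch below assumes the hypothetical `crosstab` stand-in, the `learn_attribute` step from the previous slide, and invented attribute names (`religion`, `cars`); one query is issued per unknown attribute.

```python
from collections import Counter


def crosstab(records, filters, variables):
    """Hypothetical stand-in for the website's table query."""
    selected = [r for r in records
                if all(r[a] == v for a, v in filters.items())]
    return Counter(tuple(r[v] for v in variables) for r in selected)


def learn_attribute(records, filters, variables, cell, extra):
    """Read `extra` for the single respondent in `cell` via one refined query."""
    refined = crosstab(records, filters, variables + [extra])
    hits = [(key[-1], n) for key, n in refined.items() if key[:-1] == cell]
    if sum(n for _, n in hits) != 1:
        raise ValueError("cell does not isolate a single respondent")
    return hits[0][0]


def reconstruct(records, filters, variables, cell, attributes):
    """Rebuild the full record of the respondent isolated by `cell`,
    learning one additional attribute per query."""
    known = dict(filters)
    known.update(zip(variables, cell))
    for extra in attributes:
        if extra not in known:
            known[extra] = learn_attribute(records, filters, variables, cell, extra)
    return known


# Invented toy records, not survey data.
records = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0,
     "employed": "yes", "religion": "A", "cars": 1},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2,
     "employed": "no", "religion": "B", "cars": 0},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2,
     "employed": "yes", "religion": "A", "cars": 2},
]

ATTRIBUTES = ["status", "military", "sex", "children", "employed", "religion", "cars"]
print(reconstruct(records, {"status": "widower", "military": "yes"},
                  ["sex", "children"], ("male", 0), ATTRIBUTES))
```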

After extracting the records from the CBS website, we cannot compare them against any database to make sure they are correct. So how can we know that the records extracted from the CBS website reflect the real data?
- We created a random database of 7,064 records using the attribute distribution observed in the real database.
- We ran our algorithm on this synthetic database and extracted records from it.
- Because we generated the database ourselves, we could check that every extracted record is correct.
- The random samples helped us confirm that the algorithm works.
Now let's look at the real result.
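The validation idea can be sketched in Python. This assumes that each attribute is sampled independently from an invented marginal distribution (the slide does not say how the distribution was modeled, and these weights are not the CBS figures), and reuses the hypothetical `crosstab` stand-in. Because the synthetic database is the ground truth, every extracted record can be checked against it directly.

```python
import random
from collections import Counter


def crosstab(records, filters, variables):
    """Hypothetical stand-in for the website's table query."""
    selected = [r for r in records
                if all(r[a] == v for a, v in filters.items())]
    return Counter(tuple(r[v] for v in variables) for r in selected)


def synthetic_db(marginals, n, seed=0):
    """Draw n records, sampling each attribute independently from its
    marginal distribution {value: weight} (independence is an assumption)."""
    rng = random.Random(seed)
    return [{attr: rng.choices(list(dist), weights=list(dist.values()))[0]
             for attr, dist in marginals.items()}
            for _ in range(n)]


def extract_unique(records, filters, variables, attributes):
    """Rebuild every respondent isolated by one (filters, variables) table."""
    extracted = []
    for cell, count in crosstab(records, filters, variables).items():
        if count != 1:
            continue  # only count-1 cells pin down a single respondent
        rec = dict(filters)
        rec.update(zip(variables, cell))
        for extra in attributes:
            if extra in rec:
                continue
            refined = crosstab(records, filters, variables + [extra])
            # The cell held one person, so exactly one refined cell extends it.
            (value,) = [key[-1] for key in refined if key[:-1] == cell]
            rec[extra] = value
        extracted.append(rec)
    return extracted


# Invented marginals for illustration only.
MARGINALS = {
    "status": {"married": 60, "single": 25, "divorced": 10, "widower": 5},
    "military": {"yes": 55, "no": 45},
    "sex": {"male": 50, "female": 50},
    "children": {0: 30, 1: 20, 2: 30, 3: 15, 4: 5},
    "region": {"A": 40, "B": 30, "C": 20, "D": 7, "E": 3},
    "employed": {"yes": 65, "no": 35},
}

db = synthetic_db(MARGINALS, n=7064)
extracted = extract_unique(db, {"status": "widower", "military": "yes"},
                           ["sex", "children", "region"], list(MARGINALS))

# With synthetic data the ground truth is known, so every result can be checked.
print(len(extracted), "records extracted; all correct:",
      all(rec in db for rec in extracted))
```

Fixing the random seed makes the synthetic run reproducible, which helps when comparing versions of the extraction code.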

The results: from the 7,064 records in the real survey, we extracted 1,005 records. Why did we stop at 1,005? Each record contains the personal information of a person who was promised confidentiality. How can we be sure that the 1,005 records we retrieved indeed correspond to answers given by real respondents? Answer: we cannot be 100% sure at this time, but the fact that we managed to re-identify one participant and confirm her data is a good indication.

Head hunting: an attempt to find one of the people who took part in the survey, through:
- Friends and family
- Internet forums
- The data we extracted
- Facebook

At the same time, we examined the extracted records, looking for specific details that would help us find a survey participant. We then noticed a record with the following information:
1. A Muslim woman, 30 years old and single.
2. Monthly salary of over 21,000 ₪.
3. Lives in a small village in the north.
4. Commute time to work: an hour and a half.
Razi suspected he knew the woman. To confirm, he contacted her and asked whether she had previously participated in the CBS survey. She said yes.
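The search for a recognizable participant amounts to filtering the extracted records by facts an acquaintance already knows. A minimal sketch, assuming hypothetical attribute names and invented records, with each known fact expressed as a predicate so that ranges such as "salary over 21,000" fit naturally. A surviving candidate is only a lead; as in the slide, it still has to be confirmed with the person.

```python
def candidates(extracted, known_facts):
    """Return the extracted records consistent with every known fact.

    `known_facts` maps an attribute name to a predicate on its value.
    """
    return [rec for rec in extracted
            if all(pred(rec[attr]) for attr, pred in known_facts.items())]


# Invented extracted records (hypothetical attribute names and values).
extracted = [
    {"sex": "female", "age": 30, "marital": "single", "salary": 23000, "commute_min": 90},
    {"sex": "female", "age": 30, "marital": "single", "salary": 9000, "commute_min": 20},
    {"sex": "male", "age": 45, "marital": "married", "salary": 25000, "commute_min": 90},
]

# Background knowledge an acquaintance might have (assumed facts).
known_facts = {
    "sex": lambda v: v == "female",
    "age": lambda v: v == 30,
    "marital": lambda v: v == "single",
    "salary": lambda v: v > 21000,
}

matches = candidates(extracted, known_facts)
print(matches)
```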

She is a group manager at a leading high-tech company and lives in Kfar Iksal. On the morning of Yom Kippur eve, a CBS representative visited her house and emphasized that the survey is anonymous. When we showed her the information, she was unhappy that I knew her salary.
Some of the details we managed to find out about Tal were personal:
- Personal gross income
- Family gross income
- Health condition
- Military service
- Religion
- Personal feelings
- Number of cars in the family
- And more…

Conclusions:
1. The survey answers of individual participants can easily be retrieved with a bit of programming.
Note: we did not attack or analyze the security of the website implementation; we used only statistical and logical analysis of the published data.
2. Participants are unaware of the potential violation of their privacy.