Razi Mokatren Golan Salman Privacy in a Demographic Database.



Israel Central Bureau of Statistics (CBS, הלמ"ס)
1. A government unit that operates under the auspices of the Prime Minister's Office.
2. Conducts a comprehensive annual survey that provides information on the welfare and living conditions of Israel's population.
3. The survey covers about 7,000 people.
4. The survey results are published online on the CBS website.

How the data is presented: The website lets you view the results broken down and filtered by different categories, using up to two filters and four variables.

The question this project addresses: Given the data on the website, is it possible to reconstruct the answers of one of the participants? Consider the table produced by the following selection:
Filter A - Status / Widower
Filter B - Military service / Yes
Variable A - Sex
Variable B - Number of children
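A minimal sketch of what such a selection computes, assuming the website returns a count for each combination of variable values among the records that match the filters. The `crosstab` function, the field names, and the toy records are all hypothetical and are not CBS data or the CBS interface.

```python
from collections import Counter

# Hypothetical toy survey; field names and values are illustrative only.
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2},
    {"status": "married", "military": "yes", "sex": "male", "children": 0},
]

def crosstab(records, filters, variables):
    """Count the records matching `filters`, grouped by the values of `variables`."""
    matching = (r for r in records if all(r[k] == v for k, v in filters.items()))
    return Counter(tuple(r[v] for v in variables) for r in matching)

# Filter A: status = widower, Filter B: military service = yes,
# Variable A: sex, Variable B: number of children.
table = crosstab(RECORDS, {"status": "widower", "military": "yes"}, ["sex", "children"])
for cell, count in sorted(table.items()):
    print(cell, count)
```

In this toy table the cell (male, 0 children) has a count of 1, which is the situation the following slides exploit.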

We found the following rule: Every 1 in a table created by choosing two filters and two variables represents a single participant who took part in the survey. Now we can reconstruct that participant's record. How? Recall that we got the table on the previous slide by choosing two filters and two variables. Now we create a table with the same two filters and the same two variables, plus one new variable. What we know so far: a widower, served in the army, no children, male.
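The rule can be sketched as a scan for cells whose count is exactly 1. The table below is a made-up example of a two-filter, two-variable result; `singleton_cells` is a hypothetical helper name.

```python
# Hypothetical table returned for two filters and two variables (sex, children).
# Keys are cells, values are respondent counts; the numbers are made up.
table = {
    ("male", 0): 1,
    ("female", 2): 2,
    ("male", 3): 1,
}

def singleton_cells(table):
    """Return the cells whose count is exactly 1 (one respondent each)."""
    return [cell for cell, count in table.items() if count == 1]

print(singleton_cells(table))
```

Each cell returned this way, together with the two filter values, already fixes four attributes of one respondent.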

Consider the following table: it is the same as the previous one, with a third variable (employment status) added. From the previous table: a widower, served in the army, no children, male. From this table: employed. That makes 5 facts.

At this point we know 5 facts. How can we get the rest? In a loop, we swap in a different third variable each time. How can we make sure that the records are correct?
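The loop over the third variable might look like the sketch below. It assumes a query function that behaves like the website's tables; `crosstab`, `extract_record`, the field names, and the toy records are hypothetical.

```python
from collections import Counter

# Hypothetical toy survey; field names and values are illustrative only.
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0,
     "employment": "employed", "age_group": "65+"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2,
     "employment": "retired", "age_group": "65+"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2,
     "employment": "employed", "age_group": "55-64"},
]

def crosstab(records, filters, variables):
    """Stand-in for the website: counts per cell among records matching `filters`."""
    matching = (r for r in records if all(r[k] == v for k, v in filters.items()))
    return Counter(tuple(r[v] for v in variables) for r in matching)

def extract_record(records, filters, variables, known_cell, other_variables):
    """Rebuild the record of the single respondent counted in `known_cell`."""
    record = dict(filters)
    record.update(zip(variables, known_cell))
    for extra in other_variables:
        # Same filters and variables, plus one extra variable per query.
        table = crosstab(records, filters, variables + [extra])
        # The singleton's count of 1 can land in only one extended cell.
        matches = [cell for cell in table if cell[:len(variables)] == known_cell]
        record[extra] = matches[0][-1]
    return record

record = extract_record(
    RECORDS,
    {"status": "widower", "military": "yes"},
    ["sex", "children"],
    ("male", 0),
    ["employment", "age_group"],
)
print(record)
```

Each pass through the loop adds one attribute, so the number of queries grows linearly with the number of survey questions.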

After getting the records from the CBS website, we cannot compare them to any database to make sure they are correct. So how can we know that the extracted records reflect the real data?
- We created many random samples of size [...]. Our algorithm ran on those samples and extracted records from them.
- Because we hold each random sample in full, we can check that the records extracted from it are correct.
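A rough sketch of this validation, assuming the extraction only sees per-cell counts while the check uses the full sample. The attribute names, value ranges, sample size of 100, and all function names are assumptions made for illustration; the slide does not give the actual sample size.

```python
import itertools
import random
from collections import Counter

# Hypothetical attributes and value ranges for the synthetic survey.
FIELDS = {
    "status": ["single", "married", "widower"],
    "military": ["yes", "no"],
    "sex": ["male", "female"],
    "children": [0, 1, 2, 3],
    "employment": ["employed", "unemployed", "retired"],
}

def random_sample(size, rng):
    """Generate `size` random records; we keep them as ground truth."""
    return [{f: rng.choice(vals) for f, vals in FIELDS.items()} for _ in range(size)]

def crosstab(records, filters, variables):
    """Stand-in for the website: counts per cell among records matching `filters`."""
    matching = (r for r in records if all(r[k] == v for k, v in filters.items()))
    return Counter(tuple(r[v] for v in variables) for r in matching)

def extract_all(records, filter_names, variables, extra_variables):
    """Rebuild a record for every singleton cell, using only crosstab queries."""
    extracted = []
    for combo in itertools.product(*(FIELDS[f] for f in filter_names)):
        filters = dict(zip(filter_names, combo))
        base = crosstab(records, filters, variables)
        for cell, count in base.items():
            if count != 1:
                continue
            rec = dict(filters)
            rec.update(zip(variables, cell))
            for extra in extra_variables:
                table = crosstab(records, filters, variables + [extra])
                rec[extra] = next(c[-1] for c in table if c[:len(variables)] == cell)
            extracted.append(rec)
    return extracted

rng = random.Random(0)            # fixed seed so the run is repeatable
sample = random_sample(100, rng)  # assumed size, see the note above
extracted = extract_all(sample, ["status", "military"], ["sex", "children"], ["employment"])
correct = sum(rec in sample for rec in extracted)
print(f"extracted {len(extracted)} records, {correct} found in the sample")
```

If the extraction logic is sound, every extracted record appears in the sample it came from, which is the property the random samples were used to confirm.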

The results: The random samples helped us confirm that the algorithm works. Now for the real result: out of the 7064 records in the real survey, we managed to reconstruct XXX records. Each record contains personal information about a person who was promised confidentiality.

Once we realized that we could reconstruct the records, we moved on to the next goal: trying to find one of the people who took part in the survey of [...].
- Friends and family.
- Internet forums.
- The data we extracted.
- Facebook.

At the same time, we looked through the records for specific details that would help us find a survey participant. Then we noticed the following record:
1. A Muslim woman, 30 years old and single.
2. Monthly salary of over 21,000 ₪.
3. Lives in a small locality in the north (probably a village).
4. Commute time to work: an hour and a half.
Razi suspected he knew the woman. To make sure, he contacted her and asked whether she had participated in the CBS survey. She said yes.

Nada Shaladi, 30 years old, lives in Kpar Eicsel. On the morning of Yom Kippur eve, [...]: a CBS representative knocked on Nada's door. He emphasized that the survey is anonymous. When we presented Nada with the information we had, she couldn't believe it.

Some of the details we managed to find out about Nada:
Did you serve in the army? No.
What was your gross salary last month, before deductions, from all places of work (in NIS)? More than 21,000 NIS.
What is your religion? Muslim.
How long does it usually take you to get to your workplace? [...] minutes.

Returning to the question we set out to answer: Is it possible to reconstruct the answers of a particular person from the data shown on the website? This project proves that the answer is YES. What does this say about the security of the CBS website?

We found that the CBS website severely compromises the privacy of the survey participants.
1. Even though it is not immediate, a person who wants to find out personal details of a specific participant can easily do so.
2. Most of the participants are not aware that their personal data is exposed to everyone on the website.
3. It is not clear whether the CBS staff are aware of the failure.
