Theoretical Research about Privacy
Georgia State University
Reporter: Zaobo He
Seminar, 10/07/2015

Papers
Discussed paper: Tramèr, Florian; Huang, Zhicong; Ayday, Erman; Hubaux, Jean-Pierre: Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies. CCS'15.
Associated paper: Ninghui Li, Wahbeh H. Qardaji, Dong Su, Yi Wu, Weining Yang: Membership Privacy: A Unifying Framework for Privacy Definitions. CCS'13.

Background: What is Privacy?

Background: What is Privacy?
Privacy is the protection of an individual's personal information.
Privacy ≠ Confidentiality: confidentiality is a security problem; privacy is a distinct problem.

Background: Areas of Privacy
Anonymity
–Anonymous communication: e.g., the Tor software to defend against traffic analysis
Web privacy
–Understand/control what web sites collect and maintain regarding personal data
Mobile data privacy, e.g., location privacy
Privacy-preserving data usage

Background: Privacy-Preserving Data Sharing
The need to share data:
–For research purposes, e.g., social, medical, technological
–Mandated by laws and regulations, e.g., census
–For security/business decision making, e.g., network flow data for Internet-scale alert correlation
–For system testing before deployment
–…
However, publishing data may result in privacy violations.

GIC Incident [Sweeney 2002]
Group Insurance Commission (GIC, Massachusetts)
–Collected patient data for ~135,000 state employees
–Gave the data to researchers and sold it to industry
–The medical record of the former state governor was identified
(Diagram: patients 1, 2, …, n contribute records to the GIC, MA database.)

Name    | Age | Sex | Zip code | Disease
--------|-----|-----|----------|-----------
Bob     | 69  | M   | 47906    | Cancer
Carl    | 65  | M   | 47907    | Cancer
Daisy   | 52  | F   | 47902    | Flu
Emily   | 43  | F   | 46204    | Gastritis
Flora   | 42  | F   | 46208    | Hepatitis
Gabriel | 47  | F   | 46203    | Bronchitis

Re-identification occurs!
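To make the linkage concrete, here is a minimal sketch of the attack with invented toy records (not the actual GIC or voter data): the quasi-identifiers (zip, age, sex) act as a join key between the "de-identified" medical release and a public voter list.

```python
import pandas as pd

# Toy stand-ins for the two datasets in Sweeney's attack (all values invented).
medical = pd.DataFrame({  # "de-identified" GIC-style release
    "zip": ["47906", "47907", "47902"],
    "age": [69, 65, 52],
    "sex": ["M", "M", "F"],
    "disease": ["Cancer", "Cancer", "Flu"],
})
voters = pd.DataFrame({   # public voter registration list
    "name": ["Bob", "Carl", "Daisy"],
    "zip": ["47906", "47907", "47902"],
    "age": [69, 65, 52],
    "sex": ["M", "M", "F"],
})

# If a (zip, age, sex) combination is unique in both tables, the
# "anonymous" medical record gets its name back.
linked = medical.merge(voters, on=["zip", "age", "sex"])
print(linked[["name", "disease"]])
```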

AOL Data Release [NYTimes 2006]
In August 2006, AOL released the search keywords of 650,000 users over a 3-month period
–User IDs were replaced by random numbers
–3 days later, AOL pulled the data from public access
Sample queries of one user: "landscapers in Lilburn, GA", queries on the last name "Arnold", "homes sold in shadow lake subdivision Gwinnett County, GA", "num fingers", "60 single men", "dog that urinates on everything"
These queries led the NYT to Thelma Arnold, a 62-year-old widow who lives in Lilburn, GA, has three dogs, and frequently searches her friends' medical ailments.
Re-identification occurs!

Genome-Wide Association Study (GWAS) [Homer et al. 2008]
A typical study examines thousands of single-nucleotide polymorphism locations (SNPs) in a given population of patients for statistical links to a disease
From the aggregated statistics, one individual's genome, and knowledge of SNP frequencies in the background population, one can infer participation in the study
–The frequency of every SNP gives a very noisy signal of participation; combining thousands of such signals gives a high-confidence prediction
Membership disclosure occurs!
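A minimal sketch of Homer-style membership inference (my simplification: synthetic allele frequencies and a simplified distance statistic; Homer et al. work with real SNP data):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10_000                           # number of SNPs examined
pop = rng.uniform(0.1, 0.9, size=M)  # background allele frequencies

# A cohort of n study participants; the published aggregate is the
# per-SNP mean allele frequency (the "summary statistic").
n = 1000
cohort = rng.binomial(2, pop, size=(n, M)) / 2.0
study = cohort.mean(axis=0)

def membership_score(y, pop, study):
    # Per-SNP signal: is genome y closer to the study mixture than to the
    # background population? Each term is tiny and noisy; summing
    # thousands of terms yields a high-confidence decision.
    return np.sum(np.abs(y - pop) - np.abs(y - study))

participant = cohort[0]                        # genome of someone in the study
outsider = rng.binomial(2, pop, size=M) / 2.0  # genome from the background

print(membership_score(participant, pop, study))  # typically clearly positive
print(membership_score(outsider, pop, study))     # typically near zero
```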

Need for Data Privacy Research
Identification disclosure (GIC, AOL)
–Leaks the subject individual of one record
Attribute disclosure
–Leaks more precise information about the attribute values of some individual
Membership disclosure (GWAS)
–Leaks an individual's participation in the dataset
Research program: develop theory and techniques to anonymize data so that they can be beneficially used without privacy violations.
–How to define privacy for anonymized data?
–How to publish data to satisfy privacy while providing utility?

Background: Membership Privacy

Positive Membership-Privacy
(Diagram: a dataset T is queried; from the query output, the adversary tries to conclude that an entity t belongs to T.)

Positive Membership-Privacy
A mechanism A provides positive membership-privacy (PMP) if, after the adversary sees the output of A, its posterior belief that an entity belongs to the dataset is not significantly larger than its prior belief.
γ-positive membership-privacy under a family D of prior distributions is denoted (γ, D)-PMP.

Positive Membership-Privacy: definition
range(A): the set of possible values taken by A(T), for any dataset T
A satisfies (γ, D)-PMP if, for every prior distribution in the family D, every entity t, and every S ⊆ range(A):
Pr[t ∈ T | A(T) ∈ S] ≤ γ · Pr[t ∈ T]   (2)
Pr[t ∉ T | A(T) ∈ S] ≥ (1/γ) · Pr[t ∉ T]   (3)

Positive Membership-Privacy
Writing S for the event A(T) ∈ S and t for the event t ∈ T, Equation (2) can be written as Pr[t | S] ≤ γ · Pr[t].
Equation (2) by itself, however, may not offer sufficient protection when the prior belief Pr[t] is already quite large.
For example, setting γ = 1.2: if Pr[t] = 0.85, then (2) only bounds the posterior belief by Pr[t | S] ≤ 0.85 × 1.2 = 1.02, which is vacuous, since any probability is at most 1.

Positive Membership-Privacy
With the same shorthand, Equation (3) can be written as Pr[¬t | S] ≥ Pr[¬t] / γ.
In the above example, Pr[¬t | S] is lower-bounded by (1 − 0.85)/1.2 = 0.125, i.e., Pr[t | S] can increase from 0.85 to at most 1 − 0.125 = 0.875.

Positive Membership-Privacy
Equations (2) and (3) together are equivalent to the posterior bound
Pr[t | S] ≤ min{ γ · Pr[t], 1 − (1 − Pr[t])/γ }.
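The combined bound is easy to tabulate. The helper below is a small illustration of mine (not from the papers); it shows which of the two constraints binds as the prior grows:

```python
def pmp_posterior_bound(prior, gamma):
    # Upper bound on the posterior Pr[t | S] implied by gamma-PMP,
    # i.e., by Equations (2) and (3) combined.
    return min(gamma * prior, 1 - (1 - prior) / gamma)

for prior in (0.10, 0.50, 0.85):
    print(prior, round(pmp_posterior_bound(prior, 1.2), 3))
# 0.10 -> 0.12   (Equation (2) is the tighter bound for small priors)
# 0.50 -> 0.583  (Equation (3) takes over for larger priors)
# 0.85 -> 0.875  (matches the worked example above)
```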

Positive Membership-Privacy
The papers give efficient methods to guarantee PMP for various families of prior distributions.

Background: Method: Differential Privacy

Differential privacy
range(A): the set of possible values taken by A(T), for any dataset T
A mechanism A satisfies ε-differential privacy (ε-DP) if, for any two datasets T and T' differing in a single entity and any S ⊆ range(A):
Pr[A(T) ∈ S] ≤ e^ε · Pr[A(T') ∈ S]
(Diagram: a user asks the database x1, …, xn "Tell me f(D)" and receives f(D) + noise.)
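The canonical way to realize the "f(D) + noise" picture is the Laplace mechanism. A minimal sketch (the counting query is my illustrative example, not one from the talk):

```python
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(true_value, sensitivity, epsilon):
    # Release f(D) + Lap(sensitivity / epsilon): this is epsilon-DP for a
    # query whose output changes by at most `sensitivity` when one record
    # is added to or removed from the database.
    return true_value + rng.laplace(scale=sensitivity / epsilon)

# Example: a counting query ("number of patients with the disease");
# adding or removing one patient changes the count by at most 1.
print(laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5))
```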

Differential privacy
Positive membership privacy is analogous to differential privacy.

Relationship between PMP and Differential Privacy

Relationship between PMP and Differential Privacy
A mechanism satisfies ε-DP if and only if it satisfies e^ε-PMP under the family of mutually independent prior distributions [Li et al., CCS'13].

Is differential privacy the gospel of data privacy?
The answer is "No".

Differential privacy
An adversary cannot tell with high confidence whether an entity t is part of a dataset or not, even if the adversary has complete knowledge of:
–t's data
–all the other entities in the dataset

Two observations
Differential privacy means that one cannot distinguish between D ∪ {t} and D, given precise knowledge of D and t.
–In practice, it seems unlikely for an adversary to have such high certainty about all entities.
For reasonably small values of ε, the medical utility is essentially null under DP.
Hence: relax the adversarial setting of DP, with the goal of achieving higher utility.

Problem formalization
Trade-off between utility and privacy: relax the differential privacy mechanism by considering a reasonable amount of background knowledge held by the adversary.

Background: Relaxation of differential privacy (PMP for bounded priors)

PMP for bounded priors
The adversary assumed by DP has very strong prior knowledge of the dataset.
A weaker adversary has less background knowledge.
Goals: 1) strong PMP; 2) less data perturbation.

PMP for bounded priors
(Diagram: the threat model.)

Positive Membership-Privacy
Core idea: relax the adversary's prior.
Method: restrict ourselves to adversaries whose prior belief about uncertain entities is bounded away from 0 and 1.

Positive Membership-Privacy
We get γ(t) < γ for all entities with 0 < Pr[t] < 1: γ-PMP actually gives a privacy guarantee stronger than the bounds (2) and (3) for all priors bounded away from 0 and 1.

Two observations
A mechanism satisfying (ln γ)-DP provides γ-PMP against arbitrary independent priors; against the weaker bounded-prior adversary it provides γ'-PMP with γ' < γ.
–As γ' < γ: if we consider a weaker adversarial model, our privacy guarantee increases.
–Conversely, for a fixed privacy level, the relaxed adversarial model can be handled with a weaker level of DP, and hence less data perturbation.

Selecting a level of DP
For a specific PMP problem, we can select the weakest level of DP that still provides the required guarantee.
Example:
–Assuming PMP parameter γ = 2 against an adversary with arbitrary priors, (ln 2)-DP provides the necessary privacy.
–If γ = 2 and the adversary's prior for uncertain entities is bounded by 1/2, then the weaker (ln 3)-DP already provides the necessary privacy.
We need less data perturbation and thus improve utility.
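The two numbers in this example can be reproduced with a simple Bayes computation, sketched below under the assumption of mutually independent priors with the adversary's prior for an uncertain entity fixed at `prior` (my simplification; see the papers for the formal theorems):

```python
import math

def pmp_gamma(epsilon, prior):
    # Worst-case ratio between posterior and prior beliefs achieved by an
    # epsilon-DP mechanism against an adversary whose prior for an
    # uncertain entity equals `prior` (simple Bayes bound: the likelihood
    # ratio of any output is between exp(-epsilon) and exp(epsilon)).
    e = math.exp(epsilon)
    bound_2 = e / (prior * e + (1 - prior))  # constraint from Equation (2)
    bound_3 = prior * e + (1 - prior)        # constraint from Equation (3)
    return max(bound_2, bound_3)

print(pmp_gamma(math.log(2), 1e-9))  # ~2.0: near-arbitrary prior needs (ln 2)-DP for 2-PMP
print(pmp_gamma(math.log(3), 0.5))   # 2.0: a prior of 1/2 already gets 2-PMP from (ln 3)-DP
```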

Thank You!