
1 The world’s libraries. Connected. Theoretical Research about Privacy. Georgia State University. Reporter: Zaobo He. Seminar on 10/07/2015.

2 Papers
Discussed paper: Tramèr, Florian; Huang, Zhicong; Ayday, Erman; Hubaux, Jean-Pierre: Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies. CCS '15.
Associated paper: Li, Ninghui; Qardaji, Wahbeh H.; Su, Dong; Wu, Yi; Yang, Weining: Membership Privacy: A Unifying Framework for Privacy Definitions. CCS '13.

3 Background: What is Privacy?

4 Background: What is Privacy?
Privacy is the protection of an individual's personal information.
Privacy is not the same as confidentiality: confidentiality is a security problem, while controlling the use of personal information is a privacy problem.

5 Background: Areas of Privacy
Anonymity: anonymous communication, e.g., the Tor software to defend against traffic analysis.
Web privacy: understand/control what personal data web sites collect and maintain.
Mobile data privacy, e.g., location privacy.
Privacy-preserving data usage.

6 Background: Privacy-Preserving Data Sharing
The need to share data:
- For research purposes, e.g., social, medical, technological.
- Mandated by laws and regulations, e.g., census.
- For security/business decision making, e.g., network flow data for Internet-scale alert correlation.
- For system testing before deployment.
However, publishing data may result in privacy violations.

7 GIC Incident [Sweeney 2002]
Group Insurance Commission (GIC, Massachusetts):
- Collected patient data for ~135,000 state employees.
- Gave the data to researchers and sold it to industry.
- The medical record of a former state governor was identified.
The released table had names removed, but the remaining quasi-identifiers could be linked back to named individuals:

Name     Age  Sex  Zip code  Disease
Bob      69   M    47906     Cancer
Carl     65   M    47907     Cancer
Daisy    52   F    47902     Flu
Emily    43   F    46204     Gastritis
Flora    42   F    46208     Hepatitis
Gabriel  47   F    46203     Bronchitis

Re-identification occurs!
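The re-identification above is just a join on quasi-identifiers. A minimal sketch of such a linkage attack (all records invented, not the real GIC data):

```python
# Toy sketch of a linkage attack: join a "de-identified" medical table
# with a public voter roll on the quasi-identifiers (ZIP, age, sex).
medical = [  # (zip, age, sex, disease): names removed before release
    ("47906", 69, "M", "Cancer"),
    ("47902", 52, "F", "Flu"),
]
voters = [  # (name, zip, age, sex): public voter registration records
    ("Bob", "47906", 69, "M"),
    ("Daisy", "47902", 52, "F"),
]

def link(medical, voters):
    """Return (name, disease) pairs for medical records whose
    quasi-identifiers match exactly one voter record."""
    hits = []
    for zip_code, age, sex, disease in medical:
        matches = [name for name, z, a, s in voters
                   if (z, a, s) == (zip_code, age, sex)]
        if len(matches) == 1:  # a unique match re-identifies the record
            hits.append((matches[0], disease))
    return hits

print(link(medical, voters))  # [('Bob', 'Cancer'), ('Daisy', 'Flu')]
```

Sweeney's point was that (ZIP, birth date, sex) alone uniquely identify a large fraction of the US population, so such unique matches are common.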

8 AOL Data Release [NYTimes 2006]
In August 2006, AOL released the search queries of 650,000 users over a 3-month period:
- User IDs were replaced by random numbers.
- 3 days later, AOL pulled the data from public access.
Queries of AOL searcher #4417749 included: "landscapers in Lilburn, GA", searches on the last name "Arnold", "homes sold in shadow lake subdivision Gwinnett County, GA", "num fingers", "60 single men", "dog that urinates on everything".
The New York Times traced the account to Thelma Arnold, a 62-year-old widow who lives in Lilburn, GA, has three dogs, and frequently searched her friends' medical ailments.
Re-identification occurs!

9 Genome-Wide Association Study (GWAS) [Homer et al. 2008]
A typical study examines thousands of single-nucleotide polymorphism (SNP) locations in a given population of patients for statistical links to a disease.
From the aggregated statistics, one individual's genome, and knowledge of SNP frequencies in the background population, one can infer participation in the study:
- The frequency of each SNP gives a very noisy signal of participation; combining thousands of such signals gives a high-confidence prediction.
Membership disclosure occurs!
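The "many noisy signals" argument can be sketched numerically. The following is a toy simulation in the spirit of Homer et al.'s test statistic, with invented allele frequencies and cohort sizes, not real genomic data:

```python
import random

random.seed(0)
M, N = 5000, 50  # number of SNPs and cohort size (toy values)

# Hypothetical background allele frequencies for each SNP.
ref = [random.uniform(0.1, 0.9) for _ in range(M)]

# A study cohort of N genomes drawn from the background frequencies;
# the study releases only the per-SNP aggregate frequencies.
cohort = [[1 if random.random() < p else 0 for p in ref] for _ in range(N)]
study = [sum(g[j] for g in cohort) / N for j in range(M)]

def score(genome):
    # Homer-style statistic: how much closer is this genome to the study
    # frequencies than to the background? Each SNP contributes a very
    # noisy signal; summing thousands of them separates members from
    # non-members with high confidence.
    return sum(abs(genome[j] - ref[j]) - abs(genome[j] - study[j])
               for j in range(M))

member = cohort[0]
outsider = [1 if random.random() < p else 0 for p in ref]
print(score(member) > score(outsider))  # members typically score higher
```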

10 Need for Data Privacy Research
Identification Disclosure (GIC, AOL): leaks which individual is the subject of a record.
Attribute Disclosure: leaks more precise information about the attribute values of some individual.
Membership Disclosure (GWAS): leaks an individual's participation in the dataset.
Research program: develop theory and techniques to anonymize data so that they can be beneficially used without privacy violations.
- How to define privacy for anonymized data?
- How to publish data that satisfies privacy while providing utility?

11 Background: Membership Privacy

12 Positive Membership-Privacy
Setting: a mechanism A receives a query over a dataset T and releases an output. Positive membership privacy protects against an adversary who, from that output, concludes "t belongs to T".

13 Positive Membership-Privacy
A mechanism A provides positive membership-privacy if, after the adversary sees the output of A, its posterior belief that an entity belongs to the dataset is not significantly larger than its prior belief.
γ-positive membership-privacy under a family D of prior distributions requires, for every prior in D, every entity t, and every S ⊆ range(A):
  Pr[t ∈ T | A(T) ∈ S] ≤ γ · Pr[t ∈ T]      (2)
  Pr[t ∉ T | A(T) ∈ S] ≥ Pr[t ∉ T] / γ      (3)

14 Positive Membership-Privacy: Notation
range(A): the set of possible values taken by A(T), for any dataset T.

15 Positive Membership-Privacy
Writing S for the event A(T) ∈ S and t for the event t ∈ T, Equation (2) reads Pr[t | S] ≤ γ · Pr[t].
Equation (2) by itself, however, may not offer sufficient protection when the prior belief Pr[t] is already quite large.
For example, with γ = 1.2 and Pr[t] = 0.85, (2) only bounds the posterior by Pr[t | S] ≤ 0.85 · 1.2 = 1.02, which exceeds 1 and so constrains nothing.

16 Positive Membership-Privacy
Writing S for the event A(T) ∈ S and ¬t for the event t ∉ T, Equation (3) reads Pr[¬t | S] ≥ Pr[¬t] / γ.
In the above example, Pr[¬t | S] is lower-bounded by (1 − 0.85)/1.2 = 0.125, i.e., Pr[t | S] can increase from 0.85 to at most 0.875.
Equation (3) therefore still protects large priors where (2) is vacuous.
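The arithmetic of the two bounds from the example can be checked directly:

```python
gamma, prior = 1.2, 0.85

# Bound (2): multiplicative cap on the posterior. Here it exceeds 1,
# so it gives no protection for this large prior.
bound_2 = gamma * prior
print(round(bound_2, 6))        # 1.02

# Bound (3): lower-bounds Pr[not t | S], which in turn caps Pr[t | S].
posterior_cap = 1 - (1 - prior) / gamma
print(round(posterior_cap, 6))  # 0.875
```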

17 Positive Membership-Privacy
Equations (2) and (3) together are equivalent to the single posterior bound:
  Pr[t | S] ≤ min( γ · Pr[t], 1 − (1 − Pr[t]) / γ )

18 Positive Membership-Privacy
There are efficient methods to guarantee PMP for various families of prior distributions.

19 Background: Method: Differential Privacy

20 Differential Privacy
Setting: a database x1, …, xn; the user asks "tell me f(D)" and receives the perturbed answer f(D) + noise.
range(A): the set of possible values taken by A(T), for any dataset T.
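The "f(D) + noise" release can be sketched with the standard Laplace mechanism (names and parameters below are illustrative; a counting query has sensitivity 1, so noise of scale 1/ε suffices):

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_count(database, predicate, epsilon, rng=random):
    """Release a counting query under eps-DP: a count has sensitivity 1,
    so Laplace noise with scale 1/eps suffices."""
    true_count = sum(1 for row in database if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

random.seed(42)
db = list(range(100))
noisy = dp_count(db, lambda x: x < 40, epsilon=1.0)
print(abs(noisy - 40) < 20)  # the noisy count stays close to the true 40
```

Smaller ε means more noise: privacy improves, but the answer the user receives becomes less useful, which is exactly the utility/privacy trade-off discussed later.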

21 Differential Privacy
A mechanism A satisfies ε-differential privacy if, for any two datasets T and T′ differing in a single entity and any S ⊆ range(A):
  Pr[A(T) ∈ S] ≤ e^ε · Pr[A(T′) ∈ S]
Positive membership privacy is analogous to differential privacy.

22 Relationship between PMP and Differential Privacy

23 Relationship between PMP and Differential Privacy (figure slide)

24 Relationship between PMP and Differential Privacy (figure slide, cont.)

25 Is differential privacy the last word in data privacy? The answer is "No".

26 Differential Privacy: the Adversarial Model
An adversary cannot tell with high confidence whether an entity t is part of a dataset, even if the adversary has complete knowledge of:
- t's data
- all the other entities in the dataset

27 Two Observations
Differential privacy means that one cannot distinguish between D ∪ {t} and D, even with precise knowledge of D and t.
1. In practice, it seems unlikely for an adversary to have such high certainty about all entities.
2. For reasonably small values of ε, the medical utility is essentially null under DP.
Idea: relax the adversarial setting of DP, with the goal of achieving higher utility.

28 Problem Formalization
Trading off utility and privacy: relax the differential-privacy mechanism by assuming a reasonable amount of background knowledge held by the adversary.

29 Background: Relaxation of Differential Privacy: PMP for Bounded Priors

30 Relaxing the Adversarial Model
An adversary with full prior knowledge of the dataset is very strong; a weaker adversary has less background knowledge.
Goals: (1) strong PMP; (2) less data perturbation.

31 The Threat Model

32 Positive Membership-Privacy with Bounded Priors
Core idea: relax the adversary's prior.
Method: restrict attention to adversaries whose prior belief about uncertain entities is bounded away from 0 and 1.

33 Positive Membership-Privacy
We get γ(t) < γ for all entities with 0 < Pr[t] < 1.
γ-PMP therefore gives a privacy guarantee stronger than the bounds (2) and (3) for all priors bounded away from 0 and 1.

34 Positive Membership-Privacy (figure slide)

35 Two Observations
Satisfying (ln γ′)-DP also yields γ-PMP for the bounded-prior family, with γ < γ′:
- For a fixed mechanism, the weaker adversarial model gives a stronger privacy guarantee (a smaller γ).
- For a fixed privacy level γ, the relaxed adversarial model requires less data perturbation: (ln γ′)-DP is a weaker level of DP than (ln γ)-DP.

36 Two Observations (figure slide)

37 Selecting a Level of DP
For a specific PMP problem, we can select the weakest level of DP that still provides the required PMP guarantee.
Example:
- Assuming PMP parameter γ = 2 and a given bound on the adversary's prior, (ln 2)-DP provides the necessary privacy.
- If γ = 2 and the adversary's prior is bounded more tightly, the weaker (ln 3)-DP already provides the necessary privacy.
We need less data perturbation and thus improve utility.
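A sketch of how a prior bound buys a weaker (larger-ε) DP level for the same γ. The formula below is derived from the posterior-odds bound Pr[t | S]/Pr[¬t | S] ≤ e^ε · Pr[t]/Pr[¬t] implied by ε-DP for an independent entity; the concrete prior bounds (0 and 1/4) are illustrative assumptions, not values taken from the paper:

```python
import math

def pmp_gamma(eps, prior_lb):
    """Worst-case posterior/prior ratio gamma achieved by eps-DP against
    an adversary whose prior Pr[t] is at least prior_lb. The ratio
    e^eps / (1 + (e^eps - 1) * p) is decreasing in the prior p, so the
    worst case is attained at p = prior_lb."""
    e = math.exp(eps)
    return e / (1 + (e - 1) * prior_lb)

# With no lower bound on the prior, gamma = 2 requires (ln 2)-DP:
print(round(pmp_gamma(math.log(2), 0.0), 6))   # 2.0
# If the prior is at least 1/4, the weaker (ln 3)-DP already gives gamma = 2:
print(round(pmp_gamma(math.log(3), 0.25), 6))  # 2.0
```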

38 Thank You!

