Presentation is loading. Please wait.

Presentation is loading. Please wait.

Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise.

Similar presentations


Presentation on theme: "Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise."— Presentation transcript:

1 Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise

2 Database of information about individuals E.g. Medical history, Census data, Customer info. Need to guarantee confidentiality of individual entries Want to make deductions about the database; learn large scale trends. E.g. Learn that a drug V increases likelihood of heart disease Do not leak info about individual patients Setting

3 Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Curator Analyst Message

4 Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Curator Analyst Summary Message

5 Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Thesis: The interactive approach is the right way to give good accuracy for a given level of privacy Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information. Message

6 Mathematical model of database and queries Attacks Somewhat accurate answers to all queries lead to privacy leakage. (Fourier analysis) [Y] (extends [DiNi]). Somewhat accurate answers to a fraction of queries lead to privacy leakage. (Linear programming / Polynomial interpolation) [DMT,DY] Study of privacy leads to a variety of mathematical challenges! Plan

7 [Dinur-Nissim] Simple Model (easily justifiable) Database: n -bit binary vector x Query: vector a True answer: Dot product ax Response is ax + e = True Answer + Noise Privacy Leakage: Attacker learns a certain bit of x. Blatant Non-Privacy: Attacker learns n − o ( n ) bits of x. Model

8 Theorem: If a curator adds o(√n) noise to every response; then an attacker can ask n questions, perform O(n log n) computation and recover n-o(n) bits of the database. Put database records in one-to-one correspondence with elements of a group. Think of a database as a function D from to {0,1}. Choose queries to ask for Fourier coefficients of D. Noisy Fourier coefficients approximately determine the Boolean function D! (Parseval identity). Fourier attack

9 Theorem: If a curator adds o(√n) noise to 0.773 fraction of responses; then an attacker can ask O(n) questions, perform O(n 3 ) computation and recover n-o(n) bits of the database. Arbitrarily large error on arbitrary and unknown 0.239 fraction on answers. Linear programming attack

10 Ask O(n) random +1/-1 questions Obtain y = Ax + e, where e is the error vector A natural approach to recover x from y: Solve: min |e'| 0 such that y=Ax'+e‘, x' in R n (hard!) Solve a linear program [D, CT, MT]: min |e'| 1 such that y=Ax'+e' x' in R n Ax ' y Linear programming attack

11 Model: Questions have O(c) large coefficients Theorem: If a curator adds o(c) noise to 0.501 fraction of responses; then an attacker can ask c questions, perform O(c 4 ) computation and reliably recover any particular bit of the database. Arbitrarily large error on arbitrary and unknown 0.499 fraction on answers. Polynomial interpolation attack

12 Assume c is prime. Think of the space of queries as a linear space. To obtain a reliable answer to query x = (1,0, …, 0), draw a degree two curve through x. Ask all queries that correspond to points on the curve. Use polynomial interpolation to carefully combine the answers. x q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 Polynomial interpolation attack

13 Privacy has a Price There is no safe way to avoid increasing the noise as the number of queries increases Applies to Non-Interactive Setting Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack. Cannot just output a noisy table. Implications

14 Non-interactive approach has inherent limitations Interactive approach works Can also publish a summary, as long as its clear which stats are accurate, and which ones are not. Future directions: Fewer queries Understand what can and what cannot be done privately


Download ppt "Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise."

Similar presentations


Ads by Google