Differentially Private Aggregation of Distributed Time-Series
Vibhor Rastogi (University of Washington), Suman Nath (Microsoft Research)

Participatory Data Mining
An untrusted aggregator queries data held by individual users (Alice, Bob, Charlie, Delta), e.g., their web histories (google.com, yahoo.com, espn.com, facebook.com, private.com, ...).
Example query: how many people visit google.com and then visit yahoo.com on day i?
Data sources: web history, medical info, GPS traces.

Participatory Data Mining
The same setting applies to health data: each user logs weekly measurements (e.g., Week 10 – Weight: 120, Cholesterol: 60).
Example query: how many people weigh > 200 pounds and have cholesterol > 80 in week i?
Data sources: web history, medical info, GPS traces.

Participatory Data Mining
Example queries over GPS traces: how many people take a particular route at 5 PM? At 5:15 PM? ...
Sharing such data with an untrusted aggregator raises privacy concerns!
Goal: enable an untrusted aggregator to query users' time-series data (web history, medical info, GPS traces) with formal privacy guarantees.

Current State-of-the-art
A traffic analyzer asks: how many people were at 148th Street & 36th Ave at 5 PM?

Current State-of-the-art
Alice, Bob, Charlie, and Delta report their yes/no answers to a trusted server, which answers the traffic analyzer's query: how many people were at 148th Street & 36th Ave at 5 PM?

Current State-of-the-art
The trusted server computes the actual answer (2) and releases a noisy answer (e.g., 3.6).
– Formal privacy is achieved if the right amount of noise is added.
– The noise is still small for a single query.

Current State-of-the-art
For a sequence of queries (how many people at 148th Street & 36th Ave at 5 PM? at 5:15 PM? at 7 AM? ...), this approach runs into two main challenges:
1. Noise in each answer = O(# of queries)
2. A trusted server is required

Outline
– Background: Differential Privacy
– Challenge #1: Sequence of Queries
– Challenge #2: No Trusted Server
– Experimental Evaluation

Background: Differential Privacy [Dwork 06]
Privacy algorithm [Dwork 06]: a query such as "How many people in Building 99 at 5?" with true answer 56 is answered as 56 + Laplace noise.
Differential privacy [Dwork 06]: the output should be indistinguishable whether or not any one user's records are present, e.g., the table (Name, Location, Time) containing Alice's rows (Alice, 148th & 36th, 5:00; Alice, 148th & 38th, 5:15; ...) vs. the same table without them.
For a sequence of queries q_1, q_2, ..., q_N, Laplace random noise is added to each query; in the worst case the noise has to increase linearly with N.
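A minimal sketch of the Laplace mechanism for a single count query (assuming sensitivity 1; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Standard Laplace mechanism: add Laplace noise with scale
    sensitivity/epsilon to the true count of a single query."""
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: 2 people were at the intersection at 5 PM; with epsilon = 1 the
# released answer might be something like 3.6.
print(laplace_count(2, epsilon=1.0))
```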

Outline Background: Differential Privacy Challenge #1: Sequence of Queries Challenge #2: No Trusted Server Experimental Evaluation

Answering a Sequence of Queries
q_1 = # of people at 148th & SR 520 at 5:00 PM
q_2 = # of people at 148th & SR 520 at 5:15 PM
...
q_N = # of people at 148th & SR 520 at 1:25 AM
(The queries run over the users' location traces, e.g., Alice at 148th & SR 520 at 5 PM, 5:15 PM, ..., 1:25 AM.)
The standard algorithm [Dwork et al. 06] results in Θ(N) noise per answer – too large for long sequences!
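One way to see the Θ(N) behaviour: under sequential composition the total privacy budget ε is split across the N queries, so each answer carries Laplace noise of scale roughly N/ε. A minimal sketch, assuming an even budget split (names are illustrative):

```python
import numpy as np

def answer_sequence(true_counts, epsilon):
    """Answer N count queries with total budget epsilon by splitting it evenly:
    each query gets epsilon/N, so its Laplace noise scale is N/epsilon."""
    N = len(true_counts)
    per_query_scale = N / epsilon
    return [c + np.random.laplace(scale=per_query_scale) for c in true_counts]

# With N = 2000 snapshots and epsilon = 1, each answer carries noise of scale
# 2000 -- far larger than a typical count, which is exactly the problem above.
```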

Solution: Compress the Sequence
Reduce the effective N by compressing the sequence.
DFT-based compression (NOT private by itself):
– Discrete Fourier Transform (DFT): q_1, ..., q_N → f_1, ..., f_N
– Keep the first k coefficients and zero out the rest: f_1, ..., f_k, 0, ..., 0
– Inverse DFT: f_1, ..., f_k, 0, ..., 0 → q'_1, ..., q'_N
q'_i has some error compared to q_i:
– the error is small if q_i has a periodic nature
– k/N is the compression ratio
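A minimal sketch of this (non-private) compression step using numpy's real-valued FFT; the function name and the synthetic series are illustrative:

```python
import numpy as np

def dft_compress(q, k):
    """Lossy DFT compression (not private by itself): keep the k lowest-frequency
    Fourier coefficients of the series q, zero the rest, and reconstruct."""
    f = np.fft.rfft(q)                  # q_1..q_N -> Fourier coefficients
    f[k:] = 0                           # keep f_1..f_k, drop the rest
    return np.fft.irfft(f, n=len(q))    # reconstructed series q'_1..q'_N

# A roughly periodic series is reconstructed well from only k = 20 coefficients:
N, k = 2000, 20
q = 50 + 10 * np.sin(2 * np.pi * np.arange(N) / 200)
q_approx = dft_compress(q, k)
print(np.max(np.abs(q - q_approx)))     # small reconstruction error
```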

DFT-based Compression – Examples
[Figure: original series q_i vs. reconstruction q'_i, k = 20, N = 2000]
[Figure: original series q_i vs. reconstruction q'_i (x-axis: Day #), k = 10, N = 2000]

Our DFT-based Perturbation Algorithm
Our algorithm:
1. DFT: q_1, ..., q_N → f_1, ..., f_k (keep only the first k coefficients)
2. Perturb: f_i' = f_i + noise
3. Inverse DFT: f_1', ..., f_k', 0, 0, ..., 0 → q'_1, ..., q'_N
Perturbation error drops from O(N) to O(k) – an improvement of k/N; the additional compression error is often quite small.
Main result: strong differential privacy is achieved; error in q_i' = O(k) + compression error.
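A minimal sketch of the three steps above (the noise scale `lam` is a placeholder; in the paper it is calibrated to the sensitivity of the retained Fourier coefficients and the privacy budget, so this illustrates the structure, not the exact mechanism):

```python
import numpy as np

def fourier_perturbation(q, k, lam):
    """DFT -> perturb the k retained coefficients with Laplace noise -> inverse DFT.
    `lam` stands in for the properly calibrated noise scale."""
    f = np.fft.rfft(q)
    f[:k] += (np.random.laplace(scale=lam, size=k)
              + 1j * np.random.laplace(scale=lam, size=k))   # perturb f_1..f_k
    f[k:] = 0                                                 # drop the rest
    return np.fft.irfft(f, n=len(q))                          # noisy series q'_1..q'_N

# Per-query error now grows with k (plus compression error) instead of with N.
```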

Outline
– Background: Differential Privacy
– Challenge #1: Sequence of Queries
– Challenge #2: No Trusted Server
– Experimental Evaluation

No Trusted Server
There is no known efficient technique for generating distributed Laplace noise directly.
– Laplace noise can be expressed as a combination of Gaussian noise.
– Gaussian noise can be generated distributedly.
– Each user's individual noise share is only about (total noise)/m, too small to protect that user on its own.
– Cryptographic techniques are used to hide individual data despite the small individual noise.
Distributed Paillier cryptosystem:
– Homomorphic encryption: encrypted data can be added.
– Threshold decryption: private keys are distributed among the users; decryption requires a threshold number of users.
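One standard identity behind "Laplace noise = combination of Gaussian noise": if G1, ..., G4 are independent zero-mean Gaussians with variance λ, then G1·G2 + G3·G4 is Laplace-distributed with scale λ. The sketch below only checks this identity numerically; the paper's distributed construction (where each of the m users contributes Gaussian shares) differs in its details, and the names are illustrative:

```python
import numpy as np

def laplace_from_gaussians(lam, size=1):
    """Build Laplace(scale=lam) noise from four independent Gaussians:
    if G1..G4 ~ N(0, lam) (variance lam), then G1*G2 + G3*G4 ~ Laplace(0, lam).
    In a distributed setting each user would only contribute Gaussian shares,
    so no single party knows the final noise value."""
    g = np.random.normal(scale=np.sqrt(lam), size=(4, size))
    return g[0] * g[1] + g[2] * g[3]

# Sanity check: a Laplace(scale=lam) variable has standard deviation sqrt(2)*lam.
lam = 5.0
sample = laplace_from_gaussians(lam, size=200_000)
print(sample.std(), np.sqrt(2) * lam)   # the two numbers should be close
```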

Basic Protocol
The traffic analyzer asks Alice, Bob, Charlie, and Delta: how many people were at 148th Street & 36th Ave at 5 PM?

Basic Protocol (Contd.)
Each user holds a bit (1, 0, 0, 1), adds a small noise share, and encrypts it: E(1+noise), E(0+noise), ..., E(1+noise).
The aggregator performs addition over the encrypted data, exploiting the homomorphic property: E(sum) = E(user_1) * E(user_2) * ...
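A toy, non-threshold Paillier sketch of this homomorphic-addition step (tiny primes and a single private key, for illustration only; the real protocol uses a distributed Paillier cryptosystem with threshold decryption):

```python
import math, random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny primes (illustration only)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                       # simple standard choice of generator
    mu = pow(lam, -1, n)            # works because L(g^lam mod n^2) = lam for g = n+1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # r must be coprime with n
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return (((pow(c, lam, n * n) - 1) // n) * mu) % n

pub, priv = keygen()
# Each user encrypts her bit (noise shares omitted in this toy example).
cts = [encrypt(pub, b) for b in (1, 0, 0, 1)]
c_sum = math.prod(cts) % (pub[0] ** 2)   # E(sum) = E(u1) * E(u2) * ... mod n^2
print(decrypt(pub, priv, c_sum))         # -> 2, without decrypting any single user's bit
```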

Basic Protocol (Contd.)
The aggregator computes E(sum) and sends it back to the users.

Basic Protocol (Contd.)
Each user partially decrypts E(sum) using her key share: D_1[E(sum)], D_2[E(sum)], D_3[E(sum)], D_4[E(sum)].
Finally, the partial decryptions are combined, exploiting the threshold property: Sum = D_1[E(sum)] * D_2[E(sum)] * ...

One Tricky Challenge
During the protocol, the encrypted aggregate E(sum) is sent back to the users – but the third-party agent can be malicious and send E(Alice's data) instead.
The users' partial decryptions D_1[E(Alice's data)], ..., D_4[E(Alice's data)] would then reveal Alice's data: Alice's data is breached.

Outline
– Challenge #1: Handling Correlations
– Challenge #2: No Trusted Server
– Experimental Evaluation
– Conclusion

Experimental Evaluation
Implemented both solutions on a 2.8 GHz Intel Pentium PC with 1 GB RAM.
Evaluated:
– accuracy improvement from Fourier perturbation
– performance overhead of distributed noise addition

Fourier Perturbation: Real Datasets
[Figure: GPS traces (source: Predestination [Krum et al. 08]) – Fourier-based vs. standard [Dwork et al. 06]]
[Figure: weight data (source: hackers.com) – Fourier-based vs. standard [Dwork et al. 06]]

Distributed Noise Addition: Performance Overhead
[Figure: computation overhead]
Space overhead:
– 0.5 Kb for each user
– 0.5 Kb/user for the aggregator

Conclusion
Participatory data mining applications require:
– accurately answering sequences of queries
– distributed noise addition
We saw a solution based on:
– Fourier compression & perturbation
– cryptographic protocols

Backup slides

Current State-of-the-art
The trusted server asks each user: at 5 PM, were you at 148th Street & 36th Ave? At 5:15 PM? At 7 AM? ... Users answer yes/no; the server releases noisy counts (actual answer = 2, noisy answer = 3.6).
– Formal privacy is achieved if the right amount of noise is added.
– The noise is still small for a single query.
Two main challenges:
1. Noise in each answer = O(# of queries)
2. A trusted server is required

Two Main Challenges
Challenge #1: Correlations in Time-Series Data

Name    Age    Location       Time
Alice   25     Building 99    5 PM
Alice   25     36th Street    5:02 PM
Alice   25     Building 112   5:03 PM
Bob     32     Building 99    5:35 PM

Lots of tuple correlations (e.g., Alice's route Building 99 → 36th Street → Building 112)! Current privacy techniques can't handle tuple correlations!

Two Main Challenges
Challenge #2: No Trusted Server
Without a trusted server, users add noise individually to their answers (e.g., to "Were you at the 520 bridge at 5 PM?"), and the total error grows with the # of users!