Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System
Rui Chen, Concordia University
Benjamin C. M. Fung, Concordia University
Bipin C. Desai, Concordia University
Nériah M. Sossou, Société de transport de Montréal

Outline
• Introduction
• Related Work
• Preliminaries
• Sanitization Algorithm
• Experimental Results
• Conclusion

The STM Story
• The Société de transport de Montréal (STM) is the public transit agency in the Montreal area.
• Its smart card automated fare collection system generates and collects a huge volume of transit data every day.
• Transit data needs to be shared for many reasons.

Transit Data
• Transit data, a kind of sequential data, consists of sequences of time-ordered locations.
[Figure: a station in the STM network]

Privacy Threats
Alice visited L4 and then L1

Privacy Threats (cont.)
Alice also visited L2 …

Differential Privacy [1]
For any two databases D and D′ differing in a single record, and any possible output D*:
Pr[M(D) = D*] ≤ exp(ε) × Pr[M(D′) = D*]
(The probability is taken over the randomness of the mechanism M.)

Technical Challenges
• Suppose there are 1,000 stations in the STM network.
• Suppose the maximum number of stations visited by a passenger is 20.
• Traditional differentially private mechanisms are data-independent: they must consider every possible sequence of up to 20 stations, roughly 1,000^20 = 10^60 sequences. Computationally infeasible!
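The infeasibility claim above is easy to check with a back-of-the-envelope computation (a sketch using the slide's hypothetical numbers of 1,000 stations and sequences of up to 20 stations):

```python
# Size of the output domain a data-independent mechanism must consider:
# all non-empty sequences of length <= 20 over 1,000 stations.
num_stations = 1_000
max_len = 20

num_sequences = sum(num_stations ** i for i in range(1, max_len + 1))

print(f"about {num_sequences:.2e} sequences")  # on the order of 10^60
```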

Contributions
• The first practical solution for publishing real-life sequential data under differential privacy
• A study of the real-life transit data sharing scenario at the STM
• The use of a hybrid-granularity prefix tree for data-dependent publication and an efficient implementation based on a statistical process
• Enforcement of two sets of consistency constraints
• Seamless extension to trajectory data


Related Work
• Abul et al. [2] achieve (k, δ)-anonymity by space translation.
• Terrovitis and Mamoulis [3] limit an adversary's confidence in inferring the presence of a location by global suppression.
• Yarovoy et al. [4] k-anonymize a moving object database (MOD) by treating timestamps as quasi-identifiers.
• Chen et al. [5] achieve the (K, C)_L-privacy model by local suppression.
Is it possible to employ a much stronger privacy model while achieving desirable utility?


Laplace Mechanism [1]
The mechanism returns f(D) + Lap(Δf/ε), where the Laplace noise has density Pr(x) = ε/(2Δf) · exp(−|x|·ε/Δf).
• ε: privacy parameter (privacy budget)
• Δf: global sensitivity (e.g., the maximum change of f due to the change of a single record)
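A minimal sketch of the Laplace mechanism in Python (not the STM implementation; `sample_laplace` and `laplace_mechanism` are illustrative names, with the noise drawn by inverse-CDF sampling):

```python
import math
import random

def sample_laplace(scale: float) -> float:
    """Draw one variate from Lap(scale) by inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release true_answer + Lap(sensitivity / epsilon).
    That noise has density (epsilon / (2 * sensitivity)) *
    exp(-|x| * epsilon / sensitivity), the eps/(2*Delta f) factor above."""
    return true_answer + sample_laplace(sensitivity / epsilon)

# A count query has sensitivity 1: adding or removing one passenger
# changes the count by at most 1.
noisy_count = laplace_mechanism(true_answer=120.0, sensitivity=1.0, epsilon=0.5)
```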

Composition Properties
• Sequential composition: mechanisms run on the same data with budgets ε_i together give (∑_i ε_i)-differential privacy.
• Parallel composition: mechanisms run on disjoint subsets of the data give max(ε_i)-differential privacy.
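The two rules amount to a tiny budget-accounting exercise, sketched below (illustrative helper names, not part of the paper):

```python
def sequential_budget(epsilons):
    """Total privacy cost of mechanisms run on the SAME data."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Total privacy cost of mechanisms run on DISJOINT subsets of the data."""
    return max(epsilons)

# Four queries with budget 0.25 each:
assert sequential_budget([0.25] * 4) == 1.0   # same data: budgets add up
assert parallel_budget([0.25] * 4) == 0.25    # disjoint data: only the max counts
```

This is why the noisy prefix tree later in the deck can give each tree level only ε/h: sibling nodes on one level cover disjoint records (parallel composition), while the h levels compose sequentially.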

Prefix Tree
• A simple but effective way to explore the entire output domain
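As a sketch of the idea (assuming a sequence database represented as lists of station labels, with each tree node identified by its prefix; `prefix_counts` is an illustrative name):

```python
from collections import defaultdict

def prefix_counts(database):
    """For every non-empty prefix, count the sequences that start with it.
    Each key is a node of the (implicit) prefix tree; its value is the
    number of records passing through that node."""
    counts = defaultdict(int)
    for seq in database:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return counts

# Toy database using the station labels from the privacy-threat slides.
db = [["L4", "L1"], ["L4", "L1", "L2"], ["L4", "L3"]]
tree = prefix_counts(db)
# 3 trips start at L4; 2 of them continue to L1.
assert tree[("L4",)] == 3 and tree[("L4", "L1")] == 2
```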

Utility Requirements
• Count query: e.g., how many passengers have visited both Guy-Concordia and McGill stations?
• Frequent sequential pattern mining: e.g., what are the most popular sequences of stations being visited?
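The first utility target can be sketched as a simple predicate count over the trip database (station names are from the slide's example; `count_query` and the extra `Berri-UQAM`/`Peel` labels are illustrative):

```python
def count_query(database, stations):
    """How many passengers visited every station in `stations`?
    (Set semantics: the order of visits is ignored.)"""
    wanted = set(stations)
    return sum(1 for trip in database if wanted <= set(trip))

trips = [
    ["Guy-Concordia", "McGill"],
    ["McGill", "Berri-UQAM"],
    ["Guy-Concordia", "Peel", "McGill"],
]
assert count_query(trips, ["Guy-Concordia", "McGill"]) == 2
```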


Sanitization Algorithm
Complexity:

Noisy Prefix Tree
• Each level consists of two sub-levels with different location granularities
• Each level receives ε/h privacy budget
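A sketch of the per-level budgeting on top of a plain prefix-count dictionary (the uniform ε/h split is the scheme on the slide; the sampler and function names are illustrative, and this omits the hybrid-granularity refinement):

```python
import math
import random

def sample_laplace(scale: float) -> float:
    """Inverse-CDF sampling of a Laplace variate with the given scale."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_prefix_counts(counts, epsilon, h):
    """Add Lap(h/epsilon) noise to every node count. Nodes on one level
    partition the data (parallel composition), so each level needs only
    epsilon/h; the h levels then compose sequentially to epsilon total."""
    eps_per_level = epsilon / h
    scale = 1.0 / eps_per_level  # count queries have sensitivity 1
    return {prefix: c + sample_laplace(scale) for prefix, c in counts.items()}
```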

Efficient Implementation
• Separately handle empty and non-empty nodes

Hybrid-Granularity
• For an empty node on level i, we reduce noise by a factor of

Consistency Constraints
• For any root-to-leaf path p, c(v_i) ≤ c(v_{i+1}), where v_i is a child of v_{i+1}.
• For each node v, its count must be consistent with the counts of its children.
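The path constraint can be restored by post-processing the noisy counts, which costs no extra privacy budget. Below is a simple top-down clip over the prefix-keyed dictionary (a sketch of the idea, not the paper's exact estimator):

```python
def enforce_path_monotonicity(noisy_counts):
    """Clip each node's noisy count so it never exceeds its parent's,
    restoring c(v_i) <= c(v_{i+1}) along every root-to-leaf path.
    Keys are prefix tuples; sorting by length visits parents first."""
    fixed = {}
    for prefix in sorted(noisy_counts, key=len):
        c = noisy_counts[prefix]
        parent = prefix[:-1]
        if parent in fixed:
            c = min(c, fixed[parent])
        fixed[prefix] = c
    return fixed

noisy = {("L4",): 5.2, ("L4", "L1"): 7.9, ("L4", "L1", "L2"): 6.0}
fixed = enforce_path_monotonicity(noisy)
assert fixed[("L4", "L1")] == 5.2 and fixed[("L4", "L1", "L2")] == 5.2
```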

Consistency Enforcement
• Constraint Type Ⅰ [6]
• Constraint Type Ⅱ


STM Datasets
• Real-life STM datasets are used for evaluation:

  Dataset   |D|     |L|   max|S|   avg|S|
  Metro     847,…   …     …        …
  Bus       778,…   …     …        …

Average Relative Error vs. ε
[Figure]

Average Relative Error vs. ε (cont.)
[Figure]

Average Relative Error vs. h
[Figure]

Average Relative Error vs. h (cont.)
[Figure]

Utility vs. k
[Table: true positives TP (M/B) and false positives/false drops FP (FD) (M/B) of the Simple and Hybrid approaches at varying k, where M/B denotes the Metro/Bus datasets]

Utility vs. ε
[Table: TP (M/B) and FP (FD) (M/B) of the Simple and Hybrid approaches at varying ε]

Utility vs. h
[Table: TP (M/B) and FP (FD) (M/B) of the Simple and Hybrid approaches at varying h]

Scalability
[Figure]


Conclusion
• It is possible to publish useful transit data (sequential data) under differential privacy.
• Generally, a data-dependent solution outperforms a data-independent solution.
• It is worth exploring the utility of released data for more complex data analysis tasks.
• It is important to educate transport service practitioners.

References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[2] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, 2008.
[3] M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In MDM, 2008.
[4] R. Yarovoy, F. Bonchi, L. V. S. Lakshmanan, and W. H. Wang. Anonymizing moving objects: How to hide a MOB in a crowd? In EDBT, 2009.
[5] R. Chen, B. C. M. Fung, N. Mohammed, and B. C. Desai. Privacy-preserving trajectory data publishing by local suppression. Information Sciences, in press.
[6] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 2010.

Q&A
Thank You Very Much

Back-up Slides

Detailed Algorithm