
Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System Rui Chen, Concordia University Benjamin C. M. Fung,


Rui Chen, Concordia University
Benjamin C. M. Fung, Concordia University
Bipin C. Desai, Concordia University
Nériah M. Sossou, Société de transport de Montréal

Outline
• Introduction
• Related Work
• Preliminaries
• Sanitization Algorithm
• Experimental Results
• Conclusion

The STM Story
• The Société de transport de Montréal (STM) is the public transit agency of the Montreal area.
• Its smart card automated fare collection system generates and collects a huge volume of transit data every day.
• Transit data needs to be shared for many reasons.

Transit Data
• Transit data, a kind of sequential data, consists of sequences of time-ordered locations.
[Figure: a station in the STM network]

Privacy Threats
• Alice visited L4 and then L1.

Privacy Threats (continued)
• Alice also visited L2 …
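To make the threat concrete, here is a minimal Python sketch of the linkage the two slides describe, run on a hypothetical four-record database (the records, their ids r1 to r4, and which record belongs to Alice are invented for illustration). Knowing only that Alice visited L4 and then L1, the adversary finds a single matching record and reads off her remaining visits.

```python
# Hypothetical toy transit database: record id -> time-ordered locations.
# These records are invented for illustration; they are not STM data.
records = {
    "r1": ["L4", "L1", "L2"],  # assumed to be Alice's record
    "r2": ["L1", "L4", "L3"],
    "r3": ["L3", "L2"],
    "r4": ["L4", "L3"],
}


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `x in it` consumes the iterator up to the first match of x,
    # so each later pattern item must occur after the earlier ones.
    return all(x in it for x in pattern)


# Adversary's background knowledge: Alice visited L4 and then L1.
known = ["L4", "L1"]
matches = [rid for rid, seq in records.items() if is_subsequence(known, seq)]
print(matches)

# A unique match re-identifies the record and exposes the other visits.
if len(matches) == 1:
    inferred = set(records[matches[0]]) - set(known)
    print(inferred)
```

Only r1 contains L4 followed later by L1, so the adversary learns that Alice also visited L2.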

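Differential Privacy [1]
A randomized mechanism M satisfies ε-differential privacy if, for any two databases D and D′ that differ in at most one record and for any output D*,
$\Pr_M[M(D) = D^*] \le \exp(\varepsilon) \times \Pr_M[M(D') = D^*]$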

Technical Challenges
• Suppose there are 1,000 stations in the STM network.
• Suppose the maximum number of stations visited by a passenger is 20.
• Traditional differentially private mechanisms are data-independent: they would have to consider a computationally infeasible number of possible sequences!
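The scale of the problem can be checked with a short calculation. Assuming every sequence has between 1 and 20 stops and that any of the 1,000 stations may appear at any position (repeat visits allowed), the number of candidate sequences is $\sum_{i=1}^{20} 1000^i \approx 1.001 \times 10^{60}$. The helper name below is illustrative.

```python
# Count distinct location sequences of length 1..max_len over num_stations
# locations, assuming any station may appear at any position (repeats allowed).
# A data-independent mechanism would have to consider every one of them.
def num_possible_sequences(num_stations, max_len):
    return sum(num_stations ** i for i in range(1, max_len + 1))


total = num_possible_sequences(1000, 20)
print(f"{total:.3e}")  # 1.001e+60
```

Python integers are arbitrary precision, so the sum is exact; the printout only rounds it for display.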

Contributions
• The first practical solution for publishing real-life sequential data under differential privacy
• A study of the real-life transit data sharing scenario at the STM
• The use of a hybrid-granularity prefix tree for data-dependent publication, and an efficient implementation based on a statistical process
• Enforcement of two sets of consistency constraints
• Seamless extension to trajectory data

Related Work
• Abul et al. [2] achieve (k, δ)-anonymity by space translation.
• Terrovitis and Mamoulis [3] limit an adversary's confidence in inferring the presence of a location by global suppression.
• Yarovoy et al. [4] k-anonymize a moving object database (MOD) by treating timestamps as quasi-identifiers.
• Chen et al. [5] achieve the (K, C)_L-privacy model by local suppression.
Is it possible to employ a much stronger privacy model while achieving desirable utility?

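Laplace Mechanism [1]
• For a numeric query f, the mechanism releases $M(D) = f(D) + \mathrm{Lap}(\Delta f / \varepsilon)$, i.e., f(D) plus noise drawn from the Laplace distribution with density $p(x) = \frac{\varepsilon}{2\Delta f} \exp\left(-\frac{\varepsilon |x|}{\Delta f}\right)$, which peaks at $\varepsilon / (2\Delta f)$.
• ε: privacy parameter (privacy budget)
• Δf: global sensitivity, i.e., the maximum change of f due to the change of a single record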
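A minimal Python sketch of the Laplace mechanism for a single count query, using only the standard library. The function names are illustrative; Laplace noise with scale b = Δf/ε is sampled as the difference of two independent exponential variables with mean b, which has exactly the Lap(b) distribution. The last loop checks the defining inequality numerically for two neighbouring counts.

```python
import math
import random


def laplace_pdf(x, sensitivity, epsilon):
    """Density of Lap(sensitivity / epsilon) at x; its peak is epsilon / (2 * sensitivity)."""
    b = sensitivity / epsilon
    return math.exp(-abs(x) / b) / (2 * b)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Lap(sensitivity / epsilon).

    Lap(b) noise is the difference of two independent exponential
    variables with mean b (random.expovariate takes the rate 1/b).
    """
    b = sensitivity / epsilon
    return true_value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


rng = random.Random(42)
# A count query has global sensitivity 1: adding or removing one
# passenger's record changes the answer by at most 1.
print(laplace_mechanism(120, sensitivity=1, epsilon=0.5, rng=rng))

# For neighbouring true answers 120 and 121, the density of any output y
# differs by at most a factor of exp(epsilon) = exp(0.5).
for y in (100.0, 120.5, 150.0):
    ratio = laplace_pdf(y - 120, 1, 0.5) / laplace_pdf(y - 121, 1, 0.5)
    print(y, ratio, ratio <= math.exp(0.5) + 1e-9)
```

Smaller ε means a larger scale Δf/ε, so the noise grows as the privacy guarantee gets stronger.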

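Composition Properties
• Sequential composition: if each mechanism $M_i$ satisfies $\varepsilon_i$-differential privacy and all of them run on the same data, together they satisfy $(\sum_i \varepsilon_i)$-differential privacy.
• Parallel composition: if each $M_i$ runs on a disjoint subset of the data, together they satisfy $(\max_i \varepsilon_i)$-differential privacy.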
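The two rules amount to simple budget arithmetic. The sketch below applies them to a hypothetical plain prefix tree of height h = 4 (the node counts per level are made up): every record reaches at most one node per level, so the nodes on a level see disjoint groups of records and a level costs ε/h by parallel composition, while the h levels compose sequentially back to ε.

```python
def sequential_composition(epsilons):
    """Mechanisms run on the same data: the budgets add up."""
    return sum(epsilons)


def parallel_composition(epsilons):
    """Mechanisms run on disjoint subsets of the data: the largest budget counts."""
    return max(epsilons)


# Hypothetical prefix tree with total budget eps and height h.
# Each level spends eps / h on every node; nodes on one level cover
# disjoint groups of records, so the level costs eps / h in total.
eps, h = 1.0, 4
nodes_per_level = (3, 9, 27, 81)  # illustrative node counts
per_level = [parallel_composition([eps / h] * n) for n in nodes_per_level]
print(sequential_composition(per_level))  # 1.0
```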

Prefix Tree
• A simple but effective way to explore the entire output domain
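A minimal sketch of a prefix tree over transit sequences, assuming a flat dictionary representation (prefix tuple to count) chosen here for brevity; the paper's own data structure may differ. Each node's count is the number of sequences that start with that prefix, and the root (the empty prefix) counts every sequence. Only non-empty nodes are materialized; the full output domain would also contain every other location as a potential child.

```python
def build_prefix_tree(sequences):
    """Return {prefix_tuple: count}, where count = number of sequences with that prefix."""
    counts = {(): len(sequences)}  # the root counts every sequence
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            prefix = tuple(seq[:i])
            counts[prefix] = counts.get(prefix, 0) + 1
    return counts


def children(counts, prefix):
    """Child nodes of `prefix`: prefixes exactly one location longer."""
    return {p: c for p, c in counts.items()
            if len(p) == len(prefix) + 1 and p[:len(prefix)] == prefix}


# Hypothetical sequences over locations L1..L4.
data = [["L4", "L1", "L2"], ["L1", "L4", "L3"], ["L4", "L3"], ["L4", "L1"]]
tree = build_prefix_tree(data)
print(tree[("L4",)])              # 3
print(children(tree, ("L4",)))    # {('L4', 'L1'): 2, ('L4', 'L3'): 1}
```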

Utility Requirements
• Count query: e.g., how many passengers have visited both Guy-Concordia and McGill stations?
• Frequent sequential pattern mining: e.g., what are the most popular sequences of stations visited?
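The count query from the example can be sketched directly on sequence data. This assumes "visited both" means the two stations appear anywhere in a passenger's sequence, in either order; the helper name and the three passenger sequences are hypothetical.

```python
def count_query(sequences, locations):
    """Number of passengers whose sequence contains every given location (order ignored)."""
    wanted = set(locations)
    return sum(1 for seq in sequences if wanted <= set(seq))


# Hypothetical passenger sequences; not real STM records.
data = [
    ["Guy-Concordia", "McGill", "Berri-UQAM"],
    ["McGill", "Guy-Concordia"],
    ["Berri-UQAM", "McGill"],
]
print(count_query(data, ["Guy-Concordia", "McGill"]))  # 2
```

On sanitized data, the same query would be answered from the released noisy records instead of the originals.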

Sanitization Algorithm
Complexity:

Noisy Prefix Tree
• Each level consists of two sub-levels with different location granularities.
• Each level receives a privacy budget of ε/h, where h is the height of the tree.
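The budget rule can be illustrated with a simplified, single-granularity sketch; it deliberately leaves out the two sub-levels. It assumes, hypothetically, that a node is expanded only when its noisy count reaches a threshold `theta`. The function `noisy_prefix_tree` and its parameters are illustrative names, not the paper's code. Because every record reaches at most one node per level, a level's counts have sensitivity 1, so adding Lap(h/ε) noise spends ε/h per level.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Lap(scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_prefix_tree(sequences, locations, epsilon, h, theta, rng):
    """Simplified sketch: return {prefix: noisy_count} for released nodes.

    Each of the h levels spends epsilon / h. `theta` is a hypothetical
    pruning threshold; only nodes whose noisy count reaches it are
    released and expanded on the next level.
    """
    scale = h / epsilon  # Laplace scale = sensitivity (1) / (epsilon / h)
    result = {}
    frontier = [()]  # start from the root (empty prefix)
    for level in range(1, h + 1):
        next_frontier = []
        for prefix in frontier:
            for loc in locations:  # every location is a candidate child
                child = prefix + (loc,)
                true_count = sum(1 for s in sequences
                                 if tuple(s[:level]) == child)
                noisy = true_count + laplace(scale, rng)
                if noisy >= theta:
                    result[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return result


rng = random.Random(7)
data = [["L1", "L2"]] * 1000 + [["L3"]] * 1000  # hypothetical data
tree = noisy_prefix_tree(data, ["L1", "L2", "L3", "L4"],
                         epsilon=1.0, h=3, theta=50.0, rng=rng)
print(sorted(tree))
```

Expanding only the nodes that survive the threshold is what makes the construction data-dependent: the tree grows where the data actually is, instead of enumerating the whole output domain.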

Efficient Implementation
• Separately handle empty and non-empty nodes
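The slide does not spell out how empty nodes are handled. One statistical shortcut, assumed here rather than taken from the paper, works when nodes are released only if their noisy count reaches a threshold θ (a hypothetical parameter). Every empty node then survives independently with the same probability p = Pr[Lap(b) ≥ θ] = ½·exp(−θ/b) for θ ≥ 0, so the survivors can be picked with geometric skips instead of drawing noise for each empty node. Given survival, the noise is θ plus an exponential with mean b, by memorylessness. All names are illustrative.

```python
import math
import random


def pass_probability(scale, theta):
    """Pr[Lap(scale) >= theta] for theta >= 0: the chance an empty node survives."""
    return 0.5 * math.exp(-theta / scale)


def sample_surviving_empty_nodes(n_empty, scale, theta, rng):
    """Return {index: noisy_count} for the empty nodes that pass theta.

    Gaps between survivors are geometric, so the expected work is
    proportional to the number of survivors, not to n_empty.
    """
    p = pass_probability(scale, theta)
    if p == 0.0:
        return {}
    log_q = math.log1p(-p)  # log(1 - p); negative because 0 < p <= 0.5
    survivors = {}
    i = -1
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        i += 1 + int(math.log(u) / log_q)  # skip a geometric number of failures
        if i >= n_empty:
            return survivors
        # Conditioned on exceeding theta, Laplace noise is theta + Exp(mean=scale).
        survivors[i] = theta + rng.expovariate(1 / scale)


rng = random.Random(1)
picked = sample_surviving_empty_nodes(100000, scale=2.0, theta=3.0, rng=rng)
print(len(picked), pass_probability(2.0, 3.0) * 100000)
```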

Hybrid-Granularity
• For an empty node on level i, we reduce noise by a factor of:

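Consistency Constraints
• For any root-to-leaf path p, $c(v_i) \le c(v_{i+1})$ for every node $v_i \in p$, where $v_i$ is a child of $v_{i+1}$ and $c(v)$ denotes the count of node $v$.
• For each node v, $\sum_{u \in \mathrm{children}(v)} c(u) \le c(v)$.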

Consistency Enforcement
• Constraint Type Ⅰ [6]
• Constraint Type Ⅱ
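As an illustration of what enforcement has to achieve, here is a simple heuristic in Python: clamping plus proportional scaling, applied top-down. It is neither the least-squares method of [6] nor necessarily the procedure used in the paper. It assumes the noisy tree is a dictionary from prefix tuples to counts that contains the root `()` and is closed under prefixes. After the pass, no count is negative, no child exceeds its parent, and siblings sum to at most their parent. Since this only post-processes a differentially private output, it spends no extra privacy budget.

```python
def enforce_consistency(tree):
    """Heuristic top-down post-processing of noisy prefix-tree counts."""
    by_parent = {}
    for node in tree:
        if node:
            by_parent.setdefault(node[:-1], []).append(node)
    fixed = {(): max(0.0, tree[()])}
    # Walk from the root down, so each parent is final before its children change.
    stack = [()]
    while stack:
        parent = stack.pop()
        kids = by_parent.get(parent, [])
        # Path constraint: clamp each child into [0, parent count].
        vals = {k: min(max(0.0, tree[k]), fixed[parent]) for k in kids}
        total = sum(vals.values())
        # Node constraint: scale children down if they sum above the parent.
        if total > fixed[parent] and total > 0:
            factor = fixed[parent] / total
            vals = {k: v * factor for k, v in vals.items()}
        fixed.update(vals)
        stack.extend(kids)
    return fixed


# Hypothetical noisy counts violating both constraints.
noisy = {(): 10.0, ("L1",): 7.5, ("L2",): 6.0,
         ("L1", "L2"): 9.0, ("L2", "L3"): -1.0}
print(enforce_consistency(noisy))
```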

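STM Datasets
• Real-life STM datasets are used for evaluation:

Dataset   |D|       |L|   max|S|   avg|S|
Metro     847,668   68    90       4.21
Bus       778,724   944   121      5.67

(|D|: number of records, one per passenger; |L|: number of distinct locations; max|S| and avg|S|: maximum and average sequence length.)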

[Figures: Average Relative Error vs. ε]
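The slides plot average relative error without defining it. A definition commonly used for noisy count queries, assumed here, is $|\tilde{Q}(D) - Q(D)| / \max(Q(D), s)$ averaged over a set of queries, where the sanity bound s keeps queries with very small true answers from dominating the average. The function names, the sanity bound, and the query answers below are hypothetical.

```python
def relative_error(noisy, true, sanity_bound):
    """|noisy - true| / max(true, sanity_bound)."""
    return abs(noisy - true) / max(true, sanity_bound)


def average_relative_error(noisy_answers, true_answers, sanity_bound):
    errors = [relative_error(n, t, sanity_bound)
              for n, t in zip(noisy_answers, true_answers)]
    return sum(errors) / len(errors)


# Hypothetical answers to three count queries, with a sanity bound of 10.
print(average_relative_error([105, 48, 3], [100, 50, 0], sanity_bound=10))
```

The third query has a true answer of 0; without the sanity bound its relative error would be undefined.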

[Figures: Average Relative Error vs. h]

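Utility vs. k
(TP: true positives; FP (FD): false positives, which equal false drops when both pattern lists contain k patterns; each cell gives Metro/Bus.)

k     TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
100   99/97             1/3                    100/100           0/0
150   143/139           7/11                   149/144           1/6
200   178/168           22/32                  185/177           15/23
250   209/195           41/55                  220/209           30/41
300   241/212           59/88                  257/233           43/67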

Utility vs. ε

ε     TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
0.5   227/194           73/106                 244/215           56/85
0.75  239/206           61/94                  253/224           47/76
1.0   241/212           59/88                  257/233           43/67
1.25  243/216           57/84                  259/238           41/62
1.5   248/224           52/76                  261/242           39/58

Utility vs. h

h     TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
6     234/212           66/88                  241/221           59/79
8     240/217           60/83                  254/232           46/68
10    241/215           58/85                  255/236           45/64
12    241/212           59/88                  257/233           43/67
14    241/212           59/88                  258/233           42/67
16    240/210           60/90                  258/231           42/69
18    240/209           60/91                  255/230           45/70
20    238/206           62/94                  254/228           46/72
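The tables do not show how TP and FP (FD) are computed. A plausible reading, assumed here, is that the top-k frequent sequential patterns mined from the sanitized data are compared with the true top-k patterns of the original data. The sketch counts true positives, false positives, and false drops under that assumption; the helper names and pattern counts are hypothetical.

```python
def top_k(pattern_counts, k):
    """The k patterns with the highest counts (ties broken by pattern)."""
    ranked = sorted(pattern_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {p for p, _ in ranked[:k]}


def compare_top_k(true_counts, sanitized_counts, k):
    """Return (TP, FP, FD) of the sanitized top-k against the true top-k.

    TP: released patterns that are truly in the top k.
    FP: released patterns that are not. FD (false drops): true top-k
    patterns missing from the release. With k patterns on both sides,
    FP == FD and TP + FP == k.
    """
    true_top = top_k(true_counts, k)
    released = top_k(sanitized_counts, k)
    return (len(true_top & released),
            len(released - true_top),
            len(true_top - released))


# Hypothetical support counts of sequential patterns.
true_counts = {("A", "B"): 50, ("B",): 40, ("C",): 30, ("A",): 20}
sanitized = {("A", "B"): 48, ("B",): 25, ("C",): 33, ("A",): 27}
print(compare_top_k(true_counts, sanitized, 2))  # (1, 1, 1)
```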

[Figure: Scalability]

Conclusion
• It is possible to publish useful transit data (sequential data) under differential privacy.
• Generally, a data-dependent solution outperforms a data-independent solution.
• It is worth exploring the utility of released data for more complex data analysis tasks.
• It is important to educate transport service practitioners.

References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[2] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, 2008.
[3] M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In MDM, 2008.
[4] R. Yarovoy, F. Bonchi, L. V. S. Lakshmanan, and W. H. Wang. Anonymizing moving objects: How to hide a MOB in a crowd? In EDBT, 2009.
[5] R. Chen, B. C. M. Fung, N. Mohammed, and B. C. Desai. Privacy-preserving trajectory data publishing by local suppression. Information Sciences, in press.
[6] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 2010.

Q&A
Thank you very much.

Back-up Slides

Detailed Algorithm

