Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System


Rui Chen, Concordia University
Benjamin C. M. Fung, Concordia University
Bipin C. Desai, Concordia University
Nériah M. Sossou, Société de transport de Montréal
Paper ID: 77 Rec.

Introduction

With the deployment of smart card automated fare collection systems, the Société de transport de Montréal (STM), the public transit agency in the Montreal area, has been benefiting from the huge volume of transit data, a kind of sequential data, collected every day. Transit data needs to be shared for diverse reasons, such as profit sharing, business operations, and marketing analysis. However, publishing raw transit data may violate passengers' privacy.

Sanitization Algorithm

We develop a practical sequential data publishing solution under differential privacy that supports several key data analysis tasks. Our solution is composed of two steps: noisy prefix tree construction and private release generation.

Noisy prefix tree construction: We represent a sequential dataset by a prefix tree and adaptively identify prefixes with sufficiently large supports. To reduce added noise, we introduce a hybrid-granularity structure, which decides whether to expand a prefix by first consulting the supports of more general locations. To improve the scalability of our solution, we design a statistical process under the Laplace mechanism. An illustration is given in Figure 1.

Private release generation: We design mechanisms to remove inconsistencies in a noisy prefix tree in order to improve data utility. More specifically, we enforce two sets of consistency constraints in a prefix tree:
1) The noisy count of a node is less than or equal to that of its parent;
2) The noisy count of a node is greater than or equal to the sum of the noisy counts of its children.

Figure 1. Hybrid-granularity prefix tree for sanitization
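The two steps can be illustrated with a minimal Python sketch on the sample paths of Table 1. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single granularity (no hybrid-granularity structure and no statistical process), splits the privacy budget evenly across tree levels, and enforces the two consistency constraints by clamping and proportional rescaling, which is only one possible choice. The names `build_noisy_tree`, `enforce_consistency`, `height`, and `threshold` are hypothetical.

```python
import random

# Sample paths from Table 1 (t1..t8).
PATHS = [
    ["L1", "L2", "L3"], ["L1", "L2"], ["L3", "L2", "L1"], ["L1", "L2", "L4"],
    ["L1", "L2", "L3"], ["L3", "L2"], ["L1", "L2", "L4", "L1"], ["L3", "L1"],
]
STATIONS = ["L1", "L2", "L3", "L4"]


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_noisy_tree(paths, stations, epsilon, height, threshold, rng):
    """Return {prefix tuple: noisy count} for the prefixes kept in the tree."""
    # Assumption: each level receives epsilon / height of the budget.
    scale = height / epsilon
    tree = {}
    frontier = [()]  # the root is the empty prefix
    for level in range(1, height + 1):
        next_frontier = []
        for prefix in frontier:
            # Every station is a candidate child, whether observed or not.
            for station in stations:
                child = prefix + (station,)
                true = sum(1 for p in paths if tuple(p[:level]) == child)
                noisy = true + laplace(scale, rng)
                if noisy >= threshold:  # expand only well-supported prefixes
                    tree[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return tree


def enforce_consistency(tree):
    """Top-down pass so that child <= parent and sum(children) <= parent."""
    by_parent = {}
    for prefix in tree:
        by_parent.setdefault(prefix[:-1], []).append(prefix)
    fixed = {}
    for parent in sorted(by_parent, key=len):
        # Depth-1 nodes hang off the root, which carries no count here.
        cap = fixed[parent] if parent else float("inf")
        kids = by_parent[parent]
        vals = [max(0.0, min(tree[k], cap)) for k in kids]
        total = sum(vals)
        factor = cap / total if total > cap else 1.0
        for kid, val in zip(kids, vals):
            fixed[kid] = val * factor
    return fixed


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = build_noisy_tree(PATHS, STATIONS, epsilon=1.0, height=3,
                             threshold=1.5, rng=rng)
    for prefix, count in sorted(enforce_consistency(noisy).items()):
        print(" → ".join(prefix), round(count, 2))
```

Processing parents in order of increasing depth means a node's count is already corrected before its children are capped against it, so one pass is enough for this simple scheme.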
Table 1. Sample transit data. Li is a station in the transport system

#    Path
t1   L1 → L2 → L3
t2   L1 → L2
t3   L3 → L2 → L1
t4   L1 → L2 → L4
t5   L1 → L2 → L3
t6   L3 → L2
t7   L1 → L2 → L4 → L1
t8   L3 → L1

Experimental Evaluation

We employ two real-life STM transit datasets: Metro contains 847,668 records over 68 metro stations; Bus contains 778,724 records over 944 bus stations. For count queries, our solution achieves small relative errors (Figure 2); for frequent sequential pattern mining, our solution maintains high true positives (TP) and low false positives (FP) / false drops (FD) (Table 2). Our solution is efficient, with runtime complexity of O(|D|∙|L|), where |D| is the dataset size and |L| the universe size (i.e., total number of stations).

Conclusion

We present a practical solution for sanitizing large-scale sequential data. Our solution has been tested on real-life STM transit data and exhibits satisfactory effectiveness and efficiency. We believe that our solution could benefit many other sectors that are facing the dilemma between the demands of sequential data publishing and privacy protection.

Table 2. Utility for frequent sequential pattern mining on top k most frequent patterns

k | TP (M/B) Simple | FP (FD) (M/B) Simple | TP (M/B) Hybrid | FP (FD) (M/B) Hybrid
/971/3100/1000/
/1397/11149/1441/
/16822/32185/17715/
/19541/55220/20930/
/21259/88257/23343/67

Figure 2. Average relative error vs. privacy budget under different query sizes |Q|. (a) max|Q| = 3 (b) max|Q| = 6 (c) max|Q| = 9 (d) max|Q| = 12

Figure 3. Runtime vs. |D| and |L|
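The two utility measures reported in the evaluation can be sketched in Python on the Table 1 sample. The poster does not spell out its exact definitions, so the following are assumptions: a count query returns the number of paths that visit every station in the query, relative error is divided by max(true count, sanity bound), and patterns are contiguous subsequences of bounded length. All function names and the `sanity_bound` and `max_len` parameters are hypothetical.

```python
from collections import Counter

# Sample paths from Table 1 (t1..t8).
PATHS = [
    ["L1", "L2", "L3"], ["L1", "L2"], ["L3", "L2", "L1"], ["L1", "L2", "L4"],
    ["L1", "L2", "L3"], ["L3", "L2"], ["L1", "L2", "L4", "L1"], ["L3", "L1"],
]


def count_query(paths, stations):
    """Number of paths that visit every station in the query (assumed form)."""
    wanted = set(stations)
    return sum(1 for p in paths if wanted <= set(p))


def relative_error(true_count, noisy_count, sanity_bound=1.0):
    # The sanity bound keeps queries with tiny true counts from dominating.
    return abs(noisy_count - true_count) / max(true_count, sanity_bound)


def top_k_patterns(paths, k, max_len=3):
    """Top-k contiguous patterns ranked by the number of supporting paths."""
    support = Counter()
    for p in paths:
        seen = set()  # count each pattern at most once per path
        for i in range(len(p)):
            for j in range(i + 1, min(len(p), i + max_len) + 1):
                seen.add(tuple(p[i:j]))
        support.update(seen)
    # Ties are broken by pattern order so the result is deterministic.
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return {pattern for pattern, _ in ranked[:k]}


def compare_top_k(original, sanitized):
    """Return (TP, FP, FD) for two top-k pattern sets."""
    tp = len(original & sanitized)  # found in both
    fp = len(sanitized - original)  # only in the sanitized release
    fd = len(original - sanitized)  # frequent originally, but dropped
    return tp, fp, fd


if __name__ == "__main__":
    true = count_query(PATHS, ["L1", "L2"])
    print(true, relative_error(true, noisy_count=7.5))
    print(sorted(top_k_patterns(PATHS, k=4)))
```

When both sets contain exactly k patterns, FP and FD are equal, which is consistent with Table 2 reporting them in a single "FP (FD)" column.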

