Preserving Privacy in GPS Traces via Uncertainty-Aware Path Cloaking. Baik Hoh, Marco Gruteser, Hui Xiong, Ansaf Alrabady. Presented by Joseph T. Meyerowitz.


1 Preserving Privacy in GPS Traces via Uncertainty-Aware Path Cloaking Baik Hoh, Marco Gruteser, Hui Xiong, Ansaf Alrabady Presented by Joseph T. Meyerowitz

2 Location Based Services ► Location Based Services (LBSs) are services that use the user's location in some way ► Examples: GPS in your car, Microblog, etc. ► Growing field

3 Privacy Issues ► Giving your location to another party creates privacy concerns ► Two kinds of privacy are involved: location privacy and query privacy ► Location privacy example: You need to visit the hospital and don't want anyone to know that you are at the hospital. You ask an LBS for directions. ► Query privacy example: You are at home and want to ask where the nearest hospital is.

4 Hospital Example ► Pseudonyms are insufficient because of temporal and spatial correlations in your GPS trace ► Identifying locations may be tied to sensitive locations [Figure labels: Home, Hospital]

5 Existing Work ► Location k-anonymity – Queries do not give a coordinate but instead give the LBS a region that encloses k users ► Path perturbation – Traces are perturbed to increase the number of points that cannot be unambiguously assigned to a single user ► Subsampling – Same goal as perturbation, but data points are removed instead of perturbed

6 CliqueCloak ► Best published k-anonymity algorithm ► Data from vehicles in a 70 km x 70 km area

7 Overview ► Suggest a different metric, Time To Confusion (TTC) ► Create an algorithm to meet a TTC bound based on empirical data ► Less focus on road coverage metrics

8 Testbed – Traffic Monitoring ► 233 vehicles ► 1 sample per minute while the car is moving ► Data used for a hypothetical traffic management system ► Determined that 100 m spatial accuracy and a 1/minute sampling frequency are sufficient to determine what major road a car was on

9 Architecture

10 Empirical Data

11 ► A gap of greater than 10 minutes results in the splitting of traces into separate “trips”
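The trip-splitting rule on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `split_into_trips` and the use of timestamps in seconds are assumptions made here, and a gap of exactly 10 minutes is treated as still belonging to the same trip, since the slide says "greater than".

```python
from typing import List

GAP_SECONDS = 10 * 60  # a gap longer than 10 minutes starts a new trip


def split_into_trips(timestamps: List[float],
                     gap: float = GAP_SECONDS) -> List[List[float]]:
    """Split time-sorted sample timestamps (seconds) into separate trips."""
    trips: List[List[float]] = []
    for t in timestamps:
        # Extend the current trip only if the gap to its last sample is small.
        if trips and t - trips[-1][-1] <= gap:
            trips[-1].append(t)
        else:
            trips.append([t])
    return trips


if __name__ == "__main__":
    # One-minute samples, then a 13-minute pause, then driving again.
    print(split_into_trips([0, 60, 120, 900, 960]))
```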

12 Empirical Data ► Average trip time of 10 minutes noted; thus tracking for 10 minutes may connect an identifying location with a sensitive location.

13 Privacy Metric and Adversary ► Adversaries can link correlated space/time anonymous coordinates into paths ► This is done with a simple momentum-free extrapolation based on current velocity ► Time to Confusion (TTC) is the time an adversary could correctly follow a trace ► Suggested as a good metric because the link between identifying locations and sensitive locations can be broken with low TTC
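A rough sketch of the kind of tracker described on this slide: extrapolate the target's position from its current velocity, then pick the anonymous sample closest to the prediction. The planar (x, y) coordinates in meters, the function names, and the nearest-candidate rule are all assumptions for illustration; the slide does not spell out the adversary's exact linking rule.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in meters; hypothetical planar coordinates


def predict_next(pos: Point, velocity: Point, dt: float) -> Point:
    """Constant-velocity extrapolation of a position dt seconds ahead."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)


def link_next_sample(pos: Point, velocity: Point, dt: float,
                     candidates: List[Point]) -> int:
    """Return the index of the candidate sample nearest the predicted position."""
    px, py = predict_next(pos, velocity, dt)
    return min(
        range(len(candidates)),
        key=lambda i: math.hypot(candidates[i][0] - px, candidates[i][1] - py),
    )


if __name__ == "__main__":
    # Target moving east at 10 m/s; the next samples arrive 60 s later.
    samples = [(0.0, 500.0), (580.0, 10.0), (1200.0, 0.0)]
    print(link_next_sample((0.0, 0.0), (10.0, 0.0), 60.0, samples))
```

The time to confusion is then how long such a tracker keeps choosing the correct sample before the candidates become too ambiguous to follow.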

14 Privacy Metric and Adversary ► Tracking uncertainty: H = -Σp log(p) ► p is the probability that a location sample belongs to a given user ► Tracking confidence: C = (1 – H) ► p = exp(-d/μ) ► μ is from the empirical PDF of trip times ► d is distance from predicted location* ► In this dataset, μ = 2094 meters ► One must choose an H threshold
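The formulas on this slide can be tried out with a short sketch. Two details are assumptions made here rather than statements from the slide: the weights exp(-d/μ) are normalized so the probabilities sum to 1, and the logarithm is base 2 (entropy in bits). The function names are hypothetical.

```python
import math
from typing import List

MU_METERS = 2094.0  # value of mu reported on the slide for this dataset


def tracking_uncertainty(distances: List[float], mu: float = MU_METERS) -> float:
    """Entropy H over candidate samples, given distances (m) from the prediction."""
    if not distances:
        raise ValueError("need at least one candidate sample")
    weights = [math.exp(-d / mu) for d in distances]
    total = sum(weights)
    probs = [w / total for w in weights]  # assumed normalization step
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def tracking_confidence(distances: List[float], mu: float = MU_METERS) -> float:
    """Confidence C = 1 - H, as defined on the slide."""
    return 1.0 - tracking_uncertainty(distances, mu)


if __name__ == "__main__":
    # Two equally plausible candidates: maximal confusion for a two-way choice.
    print(tracking_uncertainty([100.0, 100.0]))
    # One candidate near the prediction, one far away: low uncertainty.
    print(tracking_uncertainty([50.0, 10000.0]))
```

With more than two candidates, H in bits can exceed 1, so C = 1 - H can go negative under these assumptions; the sketch does not clamp it.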

15 Proposed Solution ► Maximum time to confusion can be guaranteed if samples are revealed only when either: ▪ Time since last point of confusion is less than the maximum time to confusion, or ▪ Tracking uncertainty is above the confusion threshold (H_i > H_thresh) ► A point of confusion is a point where H_i > H_thresh
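The release rule on this slide can be sketched as a single pass over one user's samples. This is a simplified illustration under stated assumptions: the per-sample uncertainty values are taken as given inputs, nothing is released before the first point of confusion unless the sample is itself a point of confusion, and the name `release_decisions` is hypothetical.

```python
from typing import List, Optional


def release_decisions(times: List[float], uncertainties: List[float],
                      h_thresh: float, max_ttc: float) -> List[bool]:
    """Decide, sample by sample, whether a location sample may be released."""
    released: List[bool] = []
    last_confusion: Optional[float] = None  # time of the last point of confusion
    for t, h in zip(times, uncertainties):
        confused = h > h_thresh
        within_budget = (last_confusion is not None
                         and (t - last_confusion) < max_ttc)
        released.append(confused or within_budget)
        if confused:
            last_confusion = t
    return released


if __name__ == "__main__":
    times = [0, 60, 120, 180, 240, 300]       # seconds
    h = [0.1, 0.9, 0.1, 0.1, 0.1, 0.1]        # hypothetical uncertainty values
    print(release_decisions(times, h, h_thresh=0.4, max_ttc=120))
```

In this example only the confused sample at t = 60 and the one sample that follows within the 120-second budget are released; everything else is suppressed.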

16 Proposed Solution ► Adversary may simply cull points with high H ► Path may still be determinable without a single point ► Empirical CDF of reacquisition ▪ Shows what proportion of reacquisitions can occur after a given time gap* ▪ Original time gaps are empirical* ▪ Remember that each minute is one data point in this system

17 Empirical Reacquisition CDF

18 Extension ► Calculate confusion/uncertainty from past ten minutes ► After Maximum Time to Confusion: ▪ Release samples if past 10 minutes contain an aggregate uncertainty value above the threshold ► Before Maximum Time to Confusion: ▪ Release samples if past 10 minutes + all samples from last release contain an aggregate uncertainty value above the threshold*

19 Evaluation ► Added traces from the same drivers over different days to get to desired density ► Simulated high-density and low-density systems with n=2000 and n=500 ► Metrics used to measure privacy were maximum time to confusion and median time to confusion ► Metric used to measure data quality was relative weighted road coverage

20 Evaluation ► Black dots are suppressed, gray dots are released

21 Does it work? ► Looking at it without reacquisition ► Comparing to a baseline of random sampling ► Uncertainty threshold set to H = 0.4 ► H = 0.4 means the tracker needs to believe that the next sample has a 0.92 chance of belonging to the correct target*
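The 0.92 figure can be checked numerically. The sketch below assumes the simplest case, which the slide does not state explicitly: exactly two candidate samples and entropy measured in bits (log base 2). Under those assumptions, a 0.92 / 0.08 split gives an entropy of roughly 0.40.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-way choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


if __name__ == "__main__":
    print(round(binary_entropy(0.92), 2))
```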

22 Does it work? (n=2000)

23

24 Does it work? (n=500)

25 Release Quantity

26 Continuing Problems ► No defenses against a priori knowledge ► Requires a centralized location server ► All users in this system worked at the same site, artificially aiding the algorithm in finding places of high confusion ► Tracker is crude – knowledge of topology may allow for more accurate tracking

27 Takeaway Concepts ► Path entropy can be calculated for intelligent suppression/subsampling of GPS traces ► Tracking can be made more difficult ► Time to Confusion is a useful privacy metric ▪ Breaks links between identifying locations and sensitive locations

28 My Critique ► No guidance for confusion threshold values ► The algorithm will still fail in low-density situations by obscuring too many data points ▪ They claim low-density areas are irrelevant because they are doing traffic management ► They tested using the empirical data they optimized for – where's the cross-validation? ► Does not protect short trips at all

29 My Conclusion ► Anonymity and privacy are difficult, especially because they are volatile and contextual ► Existing methods cope poorly with low density, but are improving ► Early adoption phases will require better low-density methods ► Hot research topic – ACM workshop on network data anonymization coming up if you're interested

30 Questions?

31 Presenter can be reached at jtm10@duke.edu

