Preserving Privacy in Clickstreams Isabelle Stanton.


1 Preserving Privacy in Clickstreams Isabelle Stanton

2 Outline Introduction Prior Work My Solution Future Work

3 Introduction Useful data: search algorithms, search optimization, ad targeting, etc. AOL scandal: 651,000 users over 3 months = 21 million queries. Current work: – Input perturbation: k-anonymity – Output perturbation: Laplacian noise and sensitivity – User published: pseudorandom sketches

4 k-anonymity “Hiding in a crowd of size k.” Given a set of attributes, find a quasi-identifier. Make sure each quasi-identifier appears at least k times. Prevents attacks that link other available datasets. Problem: each query + click + approximate time is a quasi-identifier.

5 Example: 2-anonymized
Original:
Gender  Zip    B’day
M       22903  1964
M       22904  1964
F       22903  1983
M       22903  1983
2-anonymized:
Gender  Zip    B’day
M       2290*  1964
M       2290*  1964
*       22903  1983
*       22903  1983
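The 2-anonymity property in this example can be checked mechanically. The following is a minimal sketch, assuming the whole (Gender, Zip, B’day) triple is treated as the quasi-identifier; the function name `is_k_anonymous` and the column names are illustrative and not from the talk.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[col] for col in quasi_identifier) for row in rows)
    return all(count >= k for count in counts.values())

# Column names are hypothetical labels for the slide's table.
COLUMNS = ("gender", "zip", "birth_year")

original = [
    {"gender": "M", "zip": "22903", "birth_year": "1964"},
    {"gender": "M", "zip": "22904", "birth_year": "1964"},
    {"gender": "F", "zip": "22903", "birth_year": "1983"},
    {"gender": "M", "zip": "22903", "birth_year": "1983"},
]

# Same rows after generalizing the zip code or suppressing the gender.
anonymized = [
    {"gender": "M", "zip": "2290*", "birth_year": "1964"},
    {"gender": "M", "zip": "2290*", "birth_year": "1964"},
    {"gender": "*", "zip": "22903", "birth_year": "1983"},
    {"gender": "*", "zip": "22903", "birth_year": "1983"},
]

print(is_k_anonymous(original, COLUMNS, 2))    # False: every row is unique
print(is_k_anonymous(anonymized, COLUMNS, 2))  # True: each combination appears twice
```

Every row of the original table is unique, so it fails the check, while the generalized table groups the rows into two crowds of size 2.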

6 Laplacian Noise How do you add noise to a text answer in a sensible way, e.g. ‘What is the most popular query term?’ Sensitivity: the query “How many users searched for x, y and z?” has sensitivity 1 but can uniquely identify a user.
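For a numeric answer such as a count, output perturbation is usually done with the standard Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the sensitivity divided by a privacy parameter. The sketch below assumes that standard mechanism; the function names and the parameter `epsilon` (a privacy budget, unrelated to the bias ε used for sketches) are illustrative and not from the talk.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_count(true_count, sensitivity, epsilon, rng):
    """Return the count perturbed with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# "How many users searched for x, y and z?" has sensitivity 1:
# adding or removing one user changes the true answer by at most 1.
print(noisy_count(true_count=100, sensitivity=1, epsilon=0.5, rng=rng))
```

This works for counts, but it does not say how to perturb a text-valued answer such as the most popular query term, which is the difficulty the slide raises.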

7 Pseudorandom Sketches Want to publish an attribute with value v. Create a vector where each entry represents one possible value of the attribute; 1 means yes, 0 means no. Flip each bit in the vector with prob. p = ½ - ε. This vector is generated by a p-biased pseudorandom function. Publish the input to this function that generates your vector.
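The bit-flipping step can be illustrated with a small simulation. This is a simplified sketch, not the scheme from the talk: it flips bits with an ordinary random number generator and keeps the whole vector, whereas the real construction derives the vector from a p-biased pseudorandom function and publishes only that function's input. The helper names and the debiasing estimator (which follows from the flip probability alone) are illustrative assumptions.

```python
import random

def flipped_indicator(value_index, num_values, p, rng):
    """Build the 0/1 indicator vector for one value, then flip each bit with probability p."""
    bits = [1 if i == value_index else 0 for i in range(num_values)]
    return [bit ^ 1 if rng.random() < p else bit for bit in bits]

def estimate_fraction(vectors, index, p):
    """Estimate the true fraction of users holding value `index`.

    If a fraction f of users hold the value, the expected observed rate of 1s is
    f * (1 - p) + (1 - f) * p, so f = (observed - p) / (1 - 2p).
    """
    observed = sum(vector[index] for vector in vectors) / len(vectors)
    return (observed - p) / (1 - 2 * p)

epsilon = 0.25
p = 0.5 - epsilon  # flip probability from the slide
rng = random.Random(1)

# Hypothetical population: 30% of users hold value 0, the rest hold value 1.
true_values = [0 if user % 10 < 3 else 1 for user in range(10000)]
published = [flipped_indicator(v, num_values=3, p=p, rng=rng) for v in true_values]

print(estimate_fraction(published, index=0, p=p))  # close to 0.3
```

A single flipped vector says little about its owner, yet the aggregate over many users can still be debiased, since 1 - 2p = 2ε.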

8 Pseudorandom Sketches Privacy guarantee: Problem: there is a one-to-one correspondence between #sketches published and #attributes published. Also, possible number of query/clickstreams: – 100,000,000 webpages – 1000 words in a query (Google API) – ~1,000,000 English words – = 2^100,000,000,000,000,000
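The size estimate appears to come from multiplying the three quantities listed to get the number of vector entries, then counting every possible bit vector of that length. The arithmetic below is a quick check under that reading; the variable names are illustrative.

```python
import math

webpages = 100_000_000       # pages a click could land on
max_query_words = 1_000      # query length limit cited on the slide
english_words = 1_000_000    # rough vocabulary size

# One vector entry per (webpage, word position, word) combination.
vector_length = webpages * max_query_words * english_words
print(f"{vector_length:,}")  # 100,000,000,000,000,000 (10^17)

# The number of distinct bit vectors is 2^vector_length, far too large to
# write out; its decimal length alone is about vector_length * log10(2) digits.
decimal_digits = vector_length * math.log10(2)
print(f"{decimal_digits:.2e}")
```

A vector of 10^17 entries is impractical for any user to generate or publish, which motivates shrinking it.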

9 Proposal Publish multiple values per sketch Benefits: doesn’t reveal length of search history, makes vector MUCH smaller – 1,000,000,000,000,000 entries – Can make this smaller Cons: Need to check the privacy guarantees, slight problem with ordering

10 Multiple Values in Sketches Assume a user publishes between 0 and h queries in one sketch Privacy with one sketch: Privacy with l sketches: Minimum length of sketch, M = # users, τ = probability of failure:

11 Search Personalization Sketches can’t be used for this (unless you did something wrong) Can construct clusters of people from sketches

12 Future Work Create a system that makes this easy for users Improve search personalization Find appropriate balance of M, h and ε

13 Questions?

