APPLYING EPSILON-DIFFERENTIAL PRIVATE QUERY LOG RELEASING SCHEME TO DOCUMENT RETRIEVAL Sicong Zhang, Hui Yang, Lisa Singh Georgetown University August.

1 APPLYING EPSILON-DIFFERENTIAL PRIVATE QUERY LOG RELEASING SCHEME TO DOCUMENT RETRIEVAL Sicong Zhang, Hui Yang, Lisa Singh Georgetown University August 13th, 2015 @PIR 2015, Santiago, Chile.

2 Introduction & Motivation Web search query logs are important and valuable for IR research. Many recent IR methodologies were developed from, or inspired by, the analysis of user behavior in search query logs. However, these query logs contain sensitive data, which makes them difficult to release directly, even for research purposes. In 2006, AOL released part of its query log without adequate anonymization, which led to severe social and legal issues. More companies could release their query logs if adequate privacy protection were in place.

3 Query Log Releasing The big picture of query log releasing for a search engine (figure not reproduced in this transcript).

4 Query Log Releasing Existing approaches to query log releasing:
- Deletion: log deletion, hashing queries, identifier deletion, hashing identifiers, scrubbing query content, deleting infrequent queries, shortening sessions, etc. Recent work has shown these to be insufficiently private.
- K-anonymity: requires certain assumptions about the adversary.
- Differential privacy: a stronger privacy notion. Previous work achieves approximate differential privacy; we achieve pure differential privacy.
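For reference, the distinction between pure and approximate differential privacy can be written out using the standard textbook definitions. The notation here is an assumption and does not come from the slides: a randomized release mechanism \mathcal{M}, two neighboring query logs D and D' (at user level, logs that differ in the records of one user), and any set S of possible outputs.

```latex
% Pure \varepsilon-differential privacy: for all neighboring D, D' and all output sets S
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]

% Approximate (\varepsilon, \delta)-differential privacy: the same bound with an additive slack \delta
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
```

Setting \delta = 0 in the second inequality recovers the first, which is why the pure notion is the stricter of the two.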

5 Query Log Releasing This workshop paper introduces our ongoing research project on the privacy-preserving query log releasing problem. We consider a one-time release of the query log in a non-interactive setting. We propose a framework that applies differential privacy to query logs and achieves pure ε-differential privacy, guaranteeing a high level of privacy. We also run document retrieval experiments on the released query log to show that it remains very useful for IR tasks while preserving privacy.

6 Project Framework
- Dataset: we use the AOL 2006 query log dataset, split into two parts: Q (the release algorithm’s input) and Q_test (for evaluation).
- Query log releasing algorithm: takes Q as input and releases the anonymized query log Q’.
- Document retrieval: uses Q’ to help document retrieval for queries in Q_test. The documents actually clicked in Q_test form the ground truth table.
- Evaluation: compares the retrieved results with the ground truth table. IR metrics: nDCG@10, Precision@1, Recall@10, etc.
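The evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes binary relevance (a document is relevant if it was clicked for that query in the held-out test log), a ranked list without duplicates, and the common log2(rank + 1) discount for nDCG. The function names are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """nDCG with binary gains (assumed) and a log2(rank + 1) discount."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked[:k]) if doc in relevant)
    # Ideal ranking: all relevant documents placed at the top.
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3"]   # hypothetical retrieval result
    clicked = {"d1", "d3"}        # hypothetical ground-truth clicks
    print(precision_at_k(ranked, clicked, 1))
    print(recall_at_k(ranked, clicked, 10))
    print(ndcg_at_k(ranked, clicked, 10))
```

In practice these per-query scores would be averaged over the evaluated queries.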

7 Project Framework

8 Major Steps in the Releasing Algorithm
- Remove sensitive information.
- Limit the number of search queries for each user in the input query log.
- Extend the query candidates for release with an external query pool.
- Select queries to release based on the query counts with Laplacian noise.
- Release query counts and click counts with Laplacian noise added.
- Release query transition information, which preserves some sequential information about the search sessions.
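The count-perturbation and selection steps above can be illustrated with a small sketch of the Laplace mechanism. This is not the authors' algorithm: the threshold rule, the parameter names (`epsilon`, `sensitivity`, `threshold`) and the way the candidate set is built are assumptions for illustration, and the sketch on its own makes no claim about the overall privacy guarantee, which depends on details (such as the per-user query limit and how the privacy budget is split across steps) that the slides do not give.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def release_noisy_counts(query_counts, candidate_pool, epsilon,
                         sensitivity, threshold, rng=None):
    """Add Laplace noise to every candidate's count and keep those above a threshold.

    query_counts   : dict mapping query -> count in the input log
    candidate_pool : extra candidate queries (hypothetical external pool)
    sensitivity    : assumed bound on one user's effect on the counts
    threshold      : assumed cut-off on the noisy count
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # noise scale b = sensitivity / epsilon
    released = {}
    # Candidates are the log's queries plus the external pool; pool-only
    # queries start from a true count of zero.
    for query in sorted(set(query_counts) | set(candidate_pool)):
        noisy = query_counts.get(query, 0) + laplace_noise(scale, rng)
        if noisy >= threshold:
            released[query] = noisy
    return released


if __name__ == "__main__":
    counts = {"weather": 40, "georgetown": 12, "rare query": 1}
    pool = ["news", "maps"]
    print(release_noisy_counts(counts, pool, epsilon=1.0,
                               sensitivity=5.0, threshold=10.0,
                               rng=random.Random(0)))
```

A smaller `epsilon` gives a larger noise scale, so fewer true counts survive the threshold and the released counts are less accurate.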

9 Experiments and Privacy Guarantees We proved that the (anonymized) userID attribute of the search logs cannot be released to the public if we want to achieve user-level differential privacy. We also prove that our approach achieves pure differential privacy. Our experiments are based on the document retrieval task using the released query log, with varying privacy guarantees. Furthermore, we can propose recommendations to commercial search engines for their future query log releases using our framework.

10 Evaluations & Results Baselines: a natural baseline that performs document retrieval with the original (non-private) query log, and the k-anonymity approach from Carpineto and Romano [5]. The parameter k in k-anonymity means that only queries issued by at least k different users can be released. "# Evaluated Queries" is the number of queries common to Q’ and Q_test.
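The k-anonymity criterion described above can be sketched as a simple filter. This only illustrates the stated rule (a query is releasable if at least k distinct users issued it) and is not the full method of Carpineto and Romano; the function names and the (user_id, query) log format are assumptions.

```python
from collections import defaultdict


def k_anonymous_queries(log, k):
    """Return the queries issued by at least k distinct users.

    log : iterable of (user_id, query) pairs (assumed format)
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        users_per_query[query].add(user_id)
    return {q for q, users in users_per_query.items() if len(users) >= k}


def evaluated_queries(released_queries, test_queries):
    """Queries common to the released log and the test log."""
    return set(released_queries) & set(test_queries)


if __name__ == "__main__":
    log = [("u1", "weather"), ("u2", "weather"),
           ("u1", "my home address"), ("u1", "my home address")]
    released = k_anonymous_queries(log, k=2)
    print(released)
    print(len(evaluated_queries(released, ["weather", "news"])))
```

Note that repeated queries from the same user do not count toward k, since the criterion is about distinct users.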

11 Conclusions This project addresses the important security concerns in the query log releasing task. We present our ε-differentially private algorithm for releasing query logs and run experiments to examine how useful the released query logs are. In this paper, we evaluate the IR utility of our query log releasing scheme on the document retrieval task. Experiments show that our released query log is still very useful for document retrieval, and that it outperforms the k-anonymity releasing scheme in both privacy and utility.

12 Conclusions (Cont’d) Further comparative experiments in our project also illustrate the privacy-utility trade-off in the query log releasing process. Specifically, the stricter the privacy standard we require, the lower the utility we can retain from the released query log. Since a high level of privacy is already guaranteed by our ε-differentially private query log releasing algorithm, we recommend that commercial search engines use softer parameter settings in our algorithm in order to maintain high utility of the released query log. We believe this project is an important step towards a final solution for releasing web search logs.

13 Thanks! Presenter: Jiyun Luo 1st Author: Sicong Zhang Email: sz303@georgetown.edu Georgetown University @PIR 2015, Santiago, Chile.

14 Q&A

15 Q&A The format of the log Q’: query, URL, click counts.

16 Laplacian distribution
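For reference, the standard density of the Laplacian (Laplace) distribution is given below; the slide's own figure is not reproduced in this transcript. The symbols are assumptions rather than the slide's notation: location \mu, scale b, and, for the usual Laplace mechanism, a sensitivity \Delta f of the released statistic.

```latex
% Density of the Laplace distribution with location \mu and scale b > 0
f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left( -\frac{|x - \mu|}{b} \right)

% Laplace mechanism: zero-mean noise whose scale is calibrated to the privacy parameter
\text{noise} \sim \mathrm{Lap}\!\left(0, \; b = \frac{\Delta f}{\varepsilon}\right)
```

The variance of this distribution is 2b^2, so halving \varepsilon doubles the noise scale and quadruples the variance.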

