PRISM: Private Retrieval of the Internet’s Sensitive Metadata. Ang Chen, Andreas Haeberlen. University of Pennsylvania.

Presentation transcript:

PRISM: Private Retrieval of the Internet’s Sensitive Metadata. Ang Chen, Andreas Haeberlen. University of Pennsylvania

Motivation: Internet-wide threats
- Example: botnet detection, DDoS backtrace, …
- The bots are scattered across many domains, but each victim only sees its own local ‘view’.
[Figure: bots in AS1-AS5 send spoofed and bot traffic toward the victim Bob, who asks: “Who is attacking me?”]

Having multiple data sources helps
- Detect attacks using data from multiple domains: multiple data sources are better than one!
- Example: DDoS detection with 98% accuracy using four domains’ data [Chen-TPDS-2007]
[Figure: Bob’s query is answered using data from AS1-AS5]

Simple to write, hard to implement
- Toy example: the top ASes that generate darknet traffic:
  SELECT TOP 10 flow.SourceAS FROM flow JOIN Internet BY FlowID WHERE flow.destIP IN Darknet
- The catch: the data is not available in a single place, and privacy concerns keep the domains from pooling it! (An illustrative rendering of the query follows.)
[Figure: Bob asks “Top ASes with illegal traffic?” over data held separately by AS1-AS5]
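Purely as an illustration (not part of the original slides), the snippet below shows what the toy query computes when all flow records sit in one place; the field names and the darknet test are hypothetical. The slide’s point is that, on the real Internet, no single party holds all of these records.

    from collections import Counter

    # Hypothetical flow records; in reality they are scattered across many domains.
    flows = [
        {"source_as": 65001, "dest_ip": "192.0.2.7"},
        {"source_as": 65002, "dest_ip": "198.51.100.3"},
        {"source_as": 65001, "dest_ip": "192.0.2.99"},
    ]

    def in_darknet(ip):
        # Stand-in for a real darknet membership test.
        return ip.startswith("192.0.2.")

    # Rough equivalent of: SELECT TOP 10 flow.SourceAS ... WHERE flow.destIP IN Darknet
    top_ases = Counter(f["source_as"] for f in flows if in_darknet(f["dest_ip"])).most_common(10)
    print(top_ases)  # [(65001, 2)]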

An Internet “knowledge plane”
- A long-standing vision [Clark-SIGCOMM-2003]: the Internet produces data about itself and allows real-time queries on that metadata
- You can know what is happening, where, and when
- Benefits: DDoS backtrace, botnet analysis, distributed troubleshooting, distributed forecasting, …

What does it take to make this work?
- Domains produce data about their operations.
- Domains use similar data formats (e.g., NetFlow, sFlow, IPFIX, Sampled NetFlow).
- Domains allow each other to query their data.

Why are domains reluctant to share data?
- Privacy is difficult even if you have the best intentions:
  - even after anonymization (the Netflix de-anonymization case)
  - or after aggregation (auxiliary-information attacks)
- To make a ‘knowledge plane’ work, we need strong privacy guarantees!
- Idea: differential privacy
[Figure: news headlines: “Netflix de-anonymization”, “AOL searcher exposed”]

Differential privacy
- What: a very strict privacy guarantee for individuals
  - ‘Worst-case’ adversary
  - Tunable amount of privacy
  - Composable query costs
- But there are caveats, too:
  - Limited query budget
  - Gives noised answers
  - Distributed DP is hard
  - …
- Is differential privacy a good candidate? Our hypothesis: yes!

Outline
- Motivation
- Challenges
- PRISM: Private Retrieval of the Internet’s Sensitive Metadata
  - The vision
  - Do we have enough budget?
  - What about data quality?
  - Can we deal with attackers?
  - Can we answer all types of queries?
  - What about privacy for ISPs?
- Conclusion

PRISM: differential privacy on Internet data
- PRISM: a system sketch
  - Domains keep their data local.
  - PRISM nodes manage the local data and answer queries.
  - Query answers are released with differential privacy.
- Result: a private Internet knowledge plane (a minimal sketch of this setup follows)
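What follows is a minimal sketch of one naive way such a deployment could answer a global counting query: each domain runs a PRISM node that only ever releases a noised local count, and the querier sums the replies. This is an assumption-laden illustration, not the paper’s actual protocol, and every name in it is made up.

    import numpy as np

    class PrismNode:
        """One node per domain: the raw data stays local; only noised answers leave."""
        def __init__(self, local_flows):
            self.local_flows = local_flows  # never leaves the domain

        def noisy_count(self, predicate, epsilon):
            true_count = sum(1 for f in self.local_flows if predicate(f))
            # Counting queries have sensitivity 1, so Laplace noise with scale 1/epsilon.
            return true_count + np.random.laplace(0.0, 1.0 / epsilon)

    def global_noisy_count(nodes, predicate, epsilon):
        # Only noised values ever cross domain boundaries.
        return sum(node.noisy_count(predicate, epsilon) for node in nodes)

    # Usage: estimate, Internet-wide, how many flows hit a darknet address.
    nodes = [PrismNode([{"dest_ip": "192.0.2.5"}]), PrismNode([{"dest_ip": "203.0.113.9"}])]
    print(global_noisy_count(nodes, lambda f: f["dest_ip"].startswith("192.0.2."), epsilon=0.1))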

Background: Differential privacy
- How: add noise to the query answer before releasing it
  - E.g., noise drawn from a Laplace distribution parameterized by ε
  - ε is the privacy parameter; larger values mean more privacy is given up
- Guarantee: query answers on ‘neighboring databases’ are very similar
- We can view ε as a privacy budget:
  - the total amount of privacy we are willing to give up
  - each query uses up some of the budget
  - refuse further queries once the budget is depleted
(A minimal code sketch of this mechanism follows.)
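The sketch below extends the per-node example above with explicit budget bookkeeping, as described on this slide: each query consumes part of a fixed ε budget, and queries are refused once it is depleted. It is an illustration only; the interfaces and constants are not taken from the paper.

    import numpy as np

    class PrivateDatabase:
        def __init__(self, records, total_epsilon):
            self.records = records
            self.remaining_budget = total_epsilon  # total privacy we are willing to give up

        def noisy_count(self, predicate, epsilon):
            # Each query uses up some of the budget; refuse it once the budget is gone.
            if epsilon > self.remaining_budget:
                raise RuntimeError("privacy budget depleted")
            self.remaining_budget -= epsilon
            true_answer = sum(1 for r in self.records if predicate(r))
            # Laplace noise with scale 1/epsilon (sensitivity-1 counting query).
            return true_answer + np.random.laplace(0.0, 1.0 / epsilon)

    db = PrivateDatabase(records=[{"src_as": 65001}, {"src_as": 65002}], total_epsilon=0.5)
    print(db.noisy_count(lambda r: r["src_as"] == 65001, epsilon=0.05))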

Challenges
- Do we have enough budget?
- Can we detect attacks with noised data?
- What about compromised PRISM nodes?
- Does PRISM provide privacy for ISPs, too?
- Would PRISM work with a partial deployment?
- Can we make all queries differentially private?
- Would PRISM’s query processor scale?
- … (see paper)

The privacy budget
- The budget problem: ε sets a hard limit on how many queries PRISM can answer
- There are many ways to set ε [e.g., Hsu-CSF-2013]
- No matter how large the budget is, it eventually runs out

Challenge #1: enough budget?
- Internet data presents unique opportunities!
- Large size: queries cost less
  - E.g., counting queries about IP addresses: assume the true answer is 40 million and we want the released answer to be within 10% of the true answer with 95% confidence
  - N = 667,616. Per ISP: ~10 queries. (A back-of-envelope reading of these figures follows.)
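One back-of-envelope reading of these numbers (the figures N = 667,616 and "~10 per ISP" are reproduced exactly if we assume a total budget of ε = 0.5 and roughly 60,000 ASes; neither assumption is stated on the slide): for a sensitivity-1 count with Laplace noise of scale 1/ε, P(|noise| > t) = exp(−εt), so the per-query ε needed for ±4 million (10% of 40 million) at 95% confidence is tiny.

    import math

    target_error = 0.10 * 40_000_000   # stay within 10% of a ~40 million count
    confidence = 0.95

    # Laplace tail bound for a sensitivity-1 count: P(|noise| > t) = exp(-eps * t)
    eps_per_query = -math.log(1 - confidence) / target_error   # ~7.5e-7

    total_budget = 0.5   # ASSUMPTION: not stated on the slide
    num_ases = 60_000    # ASSUMPTION: rough number of ASes

    n_queries = total_budget / eps_per_query
    print(round(n_queries))             # 667616 queries in total
    print(round(n_queries / num_ases))  # 11, i.e. the slide's "~10 queries per ISP"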

Challenge #1: enough budget?
- Sampling reduces query cost
  - Internet data is typically sampled already; e.g., NetFlow is typically sampled at 1/4K
  - Theoretical result: sampling at rate α reduces the cost to α·ε
  - We further sample the NetFlow records by ~50%
- Per ISP: ~100,000 queries (a rough sketch of the arithmetic follows)
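To make the scaling concrete, here is a rough sketch using one common form of the subsampling-amplification result, ε' = ln(1 + α(e^ε − 1)) ≈ αε for small ε. The concrete sampling rate (1/4096, thinned by a further 50%) and the reuse of the accuracy target from the previous sketch are assumptions; the slide only states "1/4K" and "~50%".

    import math

    def amplified_epsilon(eps, alpha):
        # Privacy amplification by subsampling: running an eps-DP mechanism on a random
        # alpha-fraction of the data costs only ln(1 + alpha*(e^eps - 1)) ~= alpha*eps.
        return math.log1p(alpha * math.expm1(eps))

    alpha = (1 / 4096) * 0.5                    # NetFlow 1-in-4K sampling, further ~50% thinning
    eps_per_query = math.log(20) / 4_000_000    # accuracy target from the previous sketch

    effective_cost = amplified_epsilon(eps_per_query, alpha)
    print(effective_cost / eps_per_query)        # ~1.2e-4: each query is ~8,000x cheaper
    print(round(0.5 / effective_cost / 60_000))  # ~91,000, i.e. the slide's "~100,000 per ISP"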

Challenge #1: enough budget?
- We probably don’t have a worst-case adversary!
  - ISPs are competitors, so they won’t collude on a large scale
  - Conservatively, if no two ISPs collude, we can give each ISP its own budget; this scales up the budget significantly
  - Even if there are small-scale collusions, roughly 400 million queries per ISP are within reach (1K queries per ISP per day for 1,000 years)

Challenge #1: enough budget?
- Can we replenish the budget? Internet data changes fast:
  - many flows expire within seconds
  - IP-to-user mappings also change; e.g., 40% of /24 address blocks are dynamic
- Eventually, the database may become entirely different; e.g., in 100 years, most users should be different
- This creates an opportunity to replenish the budget once the users are completely different

Challenge #2: data quality?
- The data quality problem: if DP adds noise, can we still detect attacks accurately?
- DP’s noise is easy to interpret!
  - It follows a well-known distribution (Laplace), and dealing with imprecision is a well-understood topic
  - It is added to true data rather than inferred data
  - We are looking for large trends, e.g., DDoS attacks and botnets
(A small sketch of how to reason about the noise follows.)
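As an illustration of how easy the noise is to reason about (a sketch, not taken from the paper): for Laplace noise with scale 1/ε we have P(|noise| > t) = exp(−εt), so an analyst can attach an exact confidence interval to every released count. The ε value below is a made-up example.

    import math

    def laplace_confidence_interval(noisy_count, epsilon, confidence=0.95):
        # Laplace noise with scale 1/epsilon: P(|noise| > t) = exp(-epsilon * t),
        # so the two-sided interval has half-width ln(1 / (1 - confidence)) / epsilon.
        half_width = math.log(1.0 / (1.0 - confidence)) / epsilon
        return noisy_count - half_width, noisy_count + half_width

    # Example: a released darknet-traffic count of 1,200,000 answered at epsilon = 0.0001
    low, high = laplace_confidence_interval(1_200_000, epsilon=0.0001)
    print(low, high)  # roughly +/- 30,000, small next to the DDoS-scale trends we look for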

Challenge #3: compromised nodes?
- What if PRISM nodes are compromised? There are things we can do here, too!
  - Hackers are unlikely to take over a majority of the nodes
  - Quality checking can be integrated with the queries [Reed ICFP]
  - Query answers can be released verifiably [Narayan EuroSys]

Other challenges
- Challenge #4: Difficult queries
- Challenge #5: Privacy for ISPs
- Challenge #6: Partial deployment
- Challenge #7: Scaling the query processor
- …
Please see the paper for details.

Conclusion
- Motivation: Internet-wide threats
- Primary challenge: privacy concerns
- Proposal: PRISM, differential privacy for Internet data
- Feasibility:
  - privacy budget
  - noised data for detection?
  - compromised nodes?
  - …
Questions?