Building and Programming the Cloud, Mysore, Jan 2010: Accountable distributed systems and the accountable cloud. Peter Druschel, joint work with Andreas.

Presentation transcript:

1 Accountable distributed systems and the accountable cloud
Building and Programming the Cloud, Mysore, Jan 2010
Peter Druschel
Joint work with Andreas Haeberlen (1), Petr Kuznetsov (2), Rodrigo Rodrigues
(1) University of Pennsylvania, (2) TU Berlin/Deutsche Telekom Labs

2 Outline
- Why accountability?
- A definition
- A practical implementation: PeerReview
- Accountability in the Cloud
- Technical Challenges
- Conclusion

3 What is the problem?
- Multiple administrative domains (federated, p2p)
- Multiple stakeholders (hosting, Web)
  - different actors, somewhat different interests
  - lack of global visibility, control
- Complex faults
  - software faults, misconfiguration, negligence, disgruntled employees, outside attacks, manipulation
- Lack of transparency

4 Learning from the 'offline' world
- Relies heavily on accountability to deal with faults, misbehavior
- Example: Banking

Requirement            Solution
Commitment             Signed receipts
Tamper-evident record  Double-entry bookkeeping
Inspections            Audits

- Record can be used to
  - (manually) detect problems
  - identify the responsible party
  - convince others that a problem does (not) exist

5 What does accountability mean in distributed systems?
1. Tamper-evident record of each node's actions
2. (Automated) audit for fault detection, localization
3. Evidence to convince a third party that a fault has (not) occurred
Accountability provides
- transparency
- trust
- incentives to avoid faults
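One common way to make a record tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is a minimal illustration under that assumption, not the scheme from the talk; the names `GENESIS`, `entry_hash`, `append` and `first_bad_entry` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash: str, seq: int, action: str) -> str:
    """Hash of an entry covers its content and the previous entry's hash."""
    payload = json.dumps([prev_hash, seq, action]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: list, action: str) -> None:
    """Append an action, chaining it to the current top of the log."""
    prev = log[-1]["hash"] if log else GENESIS
    seq = len(log)
    log.append({"seq": seq, "action": action,
                "hash": entry_hash(prev, seq, action)})


def first_bad_entry(log: list):
    """Recompute the chain; return the index of the first inconsistent entry, or None."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["seq"] != i or entry["hash"] != entry_hash(prev, i, entry["action"]):
            return i
        prev = entry["hash"]
    return None


log = []
for action in ["recv X from A", "send Y to C", "recv Z from B"]:
    append(log, action)
```

Changing any logged action afterwards makes `first_bad_entry(log)` return that entry's index. A node could still recompute the whole chain after tampering, so the top hash would also have to be committed to other parties, for example in a signed message.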

6 Outline
- Why accountability?
- A definition
- A practical implementation: PeerReview
- Accountability in the Cloud
- Technical Challenges
- Conclusion

7 Ideal accountability
- Fault := Node deviates from expected behavior
- Our goal is to automatically
  - detect faults
  - identify the faulty nodes
  - convince others that a node is (or is not) faulty
- Can we build a system that provides the following guarantee?
  Whenever a node is faulty, the system generates a proof of misbehavior against that node

8 Can we detect all faults?
- Problem: Faults that affect only a node's internal state
  - Would require online trusted probes at each node
- Focus on observable faults: Faults that affect a correct node
- Can detect observable faults without requiring trusted components
[Figure: nodes A and C, message X]

9 Can we always get a proof?
- Problem: He-said-she-said
  [Figure: nodes A, B, C. A: "I sent X!" B: "I never received X!"]
- Three possible causes:
  - A never sent X
  - B refuses to acknowledge X
  - X was delayed by the network
- Cannot get proof of misbehavior!
- Generalize to verifiable evidence:
  - a proof of misbehavior, or
  - a challenge that a faulty node cannot answer
- What if the challenged node does not respond?
  - Does not prove a fault, but node is suspected until it responds
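The distinction between a proof of misbehavior and an unanswered challenge can be sketched as a small status model. This is an illustrative sketch only; the class `NodeView` and the status names are assumptions made for the example, not an API from the talk.

```python
from enum import Enum


class Status(Enum):
    TRUSTED = "trusted"
    SUSPECTED = "suspected"  # at least one challenge is still unanswered
    EXPOSED = "exposed"      # a proof of misbehavior exists


class NodeView:
    """One correct node's view of another node (hypothetical structure)."""

    def __init__(self):
        self.open_challenges = set()
        self.proofs = []

    def challenge(self, challenge_id: str) -> None:
        """Record a challenge that the other node must answer."""
        self.open_challenges.add(challenge_id)

    def answer(self, challenge_id: str) -> None:
        """A valid answer clears the challenge, and with it the suspicion."""
        self.open_challenges.discard(challenge_id)

    def add_proof(self, proof: str) -> None:
        """A proof of misbehavior is permanent; later answers do not remove it."""
        self.proofs.append(proof)

    @property
    def status(self) -> Status:
        if self.proofs:
            return Status.EXPOSED
        if self.open_challenges:
            return Status.SUSPECTED
        return Status.TRUSTED


view_of_b = NodeView()
view_of_b.challenge("acknowledge X")
status_while_silent = view_of_b.status
view_of_b.answer("acknowledge X")
status_after_answer = view_of_b.status
```

In the he-said-she-said case, B is only suspected while the challenge to acknowledge X is open, and returns to trusted once it responds.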

10 Practical accountability
- Requirement for an accountable distributed system:
  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
- This is useful
  - Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
- It can be implemented in practice

11 Outline
- Why accountability?
- A definition
- A practical implementation: PeerReview
- Accountability in the Cloud
- Technical Challenges
- Conclusion

12 PeerReview
- Adds accountability to a given system
- Implemented as a library
- Provides tamper-evident record
- Detects faults via state-machine replay
- Assumptions:
  1. Nodes can be modeled as deterministic state machines
  2. There is a trusted reference implementation of the state machines
  3. Correct nodes can eventually communicate
  4. Nodes can sign messages
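Fault detection via state-machine replay can be illustrated with a toy example: an auditor feeds the logged inputs to a trusted reference implementation and compares each output with the logged one. The sketch assumes a log of (input, output) pairs; `reference_counter` and `audit` are hypothetical names, not PeerReview's interface.

```python
from typing import Callable, List, Optional, Tuple

# A deterministic state machine: (state, input) -> (new_state, output)
Step = Callable[[int, str], Tuple[int, str]]


def reference_counter(state: int, msg: str) -> Tuple[int, str]:
    """Hypothetical reference implementation: counts 'inc' requests."""
    if msg == "inc":
        state += 1
    return state, f"count={state}"


def audit(step: Step, initial_state: int,
          log: List[Tuple[str, str]]) -> Optional[int]:
    """Replay logged inputs; return the index of the first output mismatch, or None."""
    state = initial_state
    for i, (logged_input, logged_output) in enumerate(log):
        state, expected = step(state, logged_input)
        if expected != logged_output:
            return i
    return None


honest_log = [("inc", "count=1"), ("get", "count=1"), ("inc", "count=2")]
faulty_log = [("inc", "count=1"), ("get", "count=5"), ("inc", "count=2")]

honest_result = audit(reference_counter, 0, honest_log)
faulty_result = audit(reference_counter, 0, faulty_log)
```

The replay only works because the state machine is deterministic (assumption 1) and the auditor has a trusted reference implementation (assumption 2).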

13 PeerReview is widely applicable
- App #1: NFS server in the Linux kernel
  - Many small, latency-sensitive requests
  - Tampering with files
  - Lost updates
- App #2: Overlay multicast
  - Transfers large volume of data
  - Freeloading
  - Tampering with content
- App #3: P2P
  - Complex, large, decentralized
  - Denial of service
  - Attacks on DHT routing
- Further example faults: Metadata corruption, Incorrect access control, Censorship
- Details in [Haeberlen et al., SOSP'07]; NetReview [Haeberlen et al., NSDI'09]

14 How much does PeerReview cost?
- Log storage
  - 10 – 100 GByte per month, depending on application
- Message signatures
  - Message latency (e.g. 1.5ms RTT with RSA-1024)
  - CPU overhead (embarrassingly parallel)
- Log/authenticator transfer, replay overhead
  - Depends on # witnesses
  - Can be deferred to exploit bursty/diurnal load patterns
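To put the log storage figure in perspective, it can be converted into an average log growth rate. The calculation below is a rough back-of-the-envelope sketch that assumes a 30-day month and 1 GByte = 10^9 bytes; neither assumption comes from the slides.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def sustained_rate_bytes_per_s(gbytes_per_month: float) -> float:
    """Average log growth rate, assuming 1 GByte = 10**9 bytes."""
    return gbytes_per_month * 10**9 / SECONDS_PER_MONTH


low = sustained_rate_bytes_per_s(10)
high = sustained_rate_bytes_per_s(100)
print(f"{low / 1000:.1f} to {high / 1000:.1f} kB/s")  # prints "3.9 to 38.6 kB/s"
```

Under these assumptions, the quoted range corresponds to an average of a few kilobytes to a few tens of kilobytes of log data per second.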

15 Outline
- Why accountability?
- A definition
- A practical implementation: PeerReview
- Accountability in the Cloud
- Technical Challenges
- Conclusion

16 Split administration in the Cloud
[Figure: Alice, Bob, Alice's customers]
- What if there is a problem?
  - Bug in Alice's software
  - Subtle differences between Alice's and Bob's environments
  - Bug in Bob's software
  - Insufficient resource allocation
  - Hacker attack
  - ...

17 Split administration: Alice's perspective
[Figure: Alice, Bob, Alice's customers]
- If something is wrong, how will I know?
- How can I tell if it's my software or the cloud?
- If it's the cloud, how can I convince Bob?

18 Split administration: Bob's perspective
[Figure: Alice, Bob, Alice's customers]
- If something is wrong, how will I know?
- How can I tell if it's the cloud or Alice's software?
- If it's Alice's software, how can I convince Alice?

19 An idealized solution
- What if we had an oracle that Alice and Bob could ask about problems?
  - Completeness: If the cloud is faulty, the oracle will say so
  - Accuracy: If the cloud is not faulty, the oracle will say so
  - Verifiability: The oracle produces evidence that would convince a third party
[Figure: Alice, Bob, Alice's customers, Oracle]

20 The accountable cloud
- Idea: Make cloud accountable
  - Cloud records its actions in a tamper-evident log
  - Alice can audit the log and check for faults
  - Use log to construct evidence that a fault does (not) exist
- Should work even if one party was compromised!
[Figure: Alice, Bob, Alice's customers, tamper-evident log]
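One way Alice could check that the log Bob presents for audit is the log he actually kept is to compare it against a commitment she received earlier. The sketch below is hypothetical: it assumes a hash-chained log and a commitment of the form (sequence number, hash). In a real system the commitment would carry the cloud's signature, which is omitted here, and `chain` and `matches_commitment` are illustrative names.

```python
import hashlib


def chain(actions):
    """Return the list of chained hashes for a sequence of actions."""
    hashes, prev = [], b""
    for action in actions:
        prev = hashlib.sha256(prev + action.encode("utf-8")).digest()
        hashes.append(prev)
    return hashes


def matches_commitment(presented_actions, commitment):
    """Check a presented log against an earlier (seq, hash) commitment."""
    seq, committed_hash = commitment
    hashes = chain(presented_actions)
    return seq < len(hashes) and hashes[seq] == committed_hash


# What the cloud actually did, and the commitment Alice received at the time.
original = ["start VM", "serve request 1", "drop request 2"]
commitment = (2, chain(original)[2])

# Later, the cloud presents a log for audit.
honest_presented = original + ["serve request 3"]
rewritten = ["start VM", "serve request 1", "serve request 2", "serve request 3"]

honest_ok = matches_commitment(honest_presented, commitment)
rewritten_ok = matches_commitment(rewritten, commitment)
```

If the commitment were signed, a rewritten log together with the old commitment would be evidence that a third party could check without trusting either Alice or Bob.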

21 Discussion
- Is this too pessimistic? Cloud isn't malicious!
  - Hacker attacks, software bugs, operator error, malicious client, …
  - Difficult to come up with a more restrictive fault model
  - Without provable properties, evidence has little value
- Why would a provider want to deploy this?
  - Attractive to prospective customers (peace of mind)
  - Helps in handling customer complaints, resolving disputes

22 Outline
- Why accountability?
- A definition
- A practical implementation: PeerReview
- Accountability in the Cloud
- Technical Challenges
- Conclusion

23 Is the technology ready?
- Cloud accountability should
  - Have provable guarantees
  - Work for most cloud applications
  - Require no changes to application code
  - Cover a wide spectrum of properties
  - Have reasonable overhead
- Can existing techniques deliver this?
  - CATS, Repeat&Compare, AIP, PeerReview, NetReview, AudIt, ...
- More work is needed!

24 Work in progress: AVM
- Goal: Provide accountability for arbitrary binary executables
- Idea: Accountable virtual machine (AVM)
  - Cloud records enough data to enable deterministic replay
  - Alice can replay log against a reference implementation
  - Can audit any part of the hosted execution
[Figure: Alice, Bob, virtual machine]
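The idea of recording enough data to enable deterministic replay can be shown at toy scale: every nondeterministic value the program consumes is logged during the original run and fed back during replay. This is only a sketch of the principle; a real AVM would record at the virtual-machine level, and `Recorder`, `Replayer` and `guest_program` are hypothetical names.

```python
import random
import time


class Recorder:
    """Runs with real nondeterministic sources and logs every value."""

    def __init__(self):
        self.log = []

    def nondet(self, source):
        value = source()
        self.log.append(value)
        return value


class Replayer:
    """Feeds back logged values instead of consulting the real sources."""

    def __init__(self, log):
        self._values = iter(log)

    def nondet(self, source):
        return next(self._values)


def guest_program(env):
    """Hypothetical guest: its result depends on a clock and a random number."""
    t = env.nondet(time.time)
    r = env.nondet(lambda: random.randint(0, 9))
    return f"{int(t) % 1000}-{r}"


recorder = Recorder()
recorded_output = guest_program(recorder)
replayed_output = guest_program(Replayer(recorder.log))
```

Because the replay consumes exactly the recorded values, `replayed_output` equals `recorded_output`. An auditor running a reference copy of the guest could therefore detect any output that the recorded inputs do not explain.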

25 Challenges
- Complete state-machine replay expensive
  - limit to spot checks, investigation of suspected faults
  - multi-core replay is hard
  - replay log against an abstract model?
- Checking performance properties
- Checking information flow
- Lots of research opportunities
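Spot checks could, for instance, rely on periodic state snapshots, so that an auditor replays only a chosen segment of the log rather than the whole execution. The sketch below assumes such snapshots exist and uses a trivial state machine; `step`, `record` and `check_segment` are hypothetical names.

```python
import random
from typing import List


def step(state: int, msg: int) -> int:
    """Hypothetical deterministic state machine: running sum of inputs."""
    return state + msg


def record(inputs: List[int], segment_len: int) -> List[int]:
    """Run the machine and snapshot the state at every segment boundary."""
    snapshots, state = [0], 0
    for i, msg in enumerate(inputs, start=1):
        state = step(state, msg)
        if i % segment_len == 0:
            snapshots.append(state)
    return snapshots


def check_segment(inputs: List[int], snapshots: List[int],
                  segment_len: int, k: int) -> bool:
    """Replay only segment k, from its starting snapshot to its ending one."""
    state = snapshots[k]
    for msg in inputs[k * segment_len:(k + 1) * segment_len]:
        state = step(state, msg)
    return state == snapshots[k + 1]


inputs = list(range(1, 13))                 # 12 logged inputs
snapshots = record(inputs, segment_len=4)   # 3 segments of 4 inputs each

# Spot check: audit one randomly chosen segment instead of the whole log.
k = random.randrange(len(snapshots) - 1)
spot_check_ok = check_segment(inputs, snapshots, 4, k)
```

The trade-off is that a fault confined to an unchecked segment goes unnoticed until that segment is audited, which is presumably why spot checks are paired with investigation of suspected faults.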

26 Summary
- Accountability is a useful capability in distributed systems
  - tamper-evident record
  - fault detection and localization
  - evidence
- Proposal: the accountable cloud
  - Can verify correct operation, produce evidence
  - Provable guarantees → solid foundation for both players
- Challenges remain
Questions?