Carnegie Mellon Approved for Public Release, Distribution Unlimited Increasing Intrusion Tolerance Via Scalable Redundancy Michael Reiter

Increasing Intrusion Tolerance Via Scalable Redundancy
Michael Reiter, Anastasia Ailamaki, Greg Ganger, Priya Narasimhan, Chuck Cranor
Carnegie Mellon

The Problem Space
Distributed services manage redundant state across servers to tolerate faults
We consider tolerance to Byzantine faults, as might result from an intrusion into a server or client
- A faulty server or client may behave arbitrarily
We also make no timing assumptions in this work
- An “asynchronous” system

Our Goals
To design, implement and evaluate new protocols for implementing intrusion-tolerant services that scale better
- Here, “scale” refers to efficiency as the number of servers and the number of failures tolerated grow
Targeting three types of services
- Read-write data objects
- Custom “flat” object types for particular applications, notably directories for implementing an intrusion-tolerant file system
- Arbitrary objects that support object nesting

Expected Impact
Significant efficiency and scalability benefits over today’s approaches to intrusion tolerance
For example, for data services, we anticipate
- At least a twofold latency improvement over the current best, even at small configurations (e.g., tolerating 3-5 Byzantine server failures)
- Improvements that grow as the system scales up
- A twofold improvement in throughput, again growing with system size
Without such improvements, intrusion tolerance will remain relegated to small deployments in narrow application areas

Outline
- Concepts
- Challenges
- Techniques
- Systems
- Technology transfer

Concepts: Distributed Services
[Figure: the service, or object, abstraction (operations such as push, pop and sort, each an invocation followed by a response) versus its distributed implementation]

Concepts: Linearizability [Herlihy & Wing 1991]
A strong, widely accepted semantics for shared objects
- Mimics the semantics of a centralized object implementation
- Each method appears to be executed at a distinct point between its invocation and response times
[Figure: object invocations by clients c1 and c2, and their apparent execution]
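To make the definition concrete, here is a minimal brute-force sketch, not a tool from this project, that checks whether a small history on a single read-write register is linearizable. It assumes a hypothetical `Op` record with invocation and response times and a register that starts at `None`; it searches for an order that respects real time (an operation that responded before another was invoked must come first) and register semantics (each read returns the latest preceding write).

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Any


@dataclass(frozen=True)
class Op:
    client: str
    kind: str     # "write" or "read"
    value: Any    # value written, or value the read returned
    inv: float    # invocation time
    resp: float   # response time


def respects_real_time(order: tuple[Op, ...]) -> bool:
    """An operation that responded before another was invoked must come first."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.resp < earlier.inv:
                return False
    return True


def legal_for_register(order: tuple[Op, ...], initial: Any = None) -> bool:
    """Each read must return the value of the most recent preceding write."""
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history: list[Op], initial: Any = None) -> bool:
    """Brute force over all orders; only practical for small histories."""
    return any(respects_real_time(order) and legal_for_register(order, initial)
               for order in permutations(history))


w = Op("c1", "write", 1, inv=0, resp=2)
overlapping_read = Op("c2", "read", None, inv=1, resp=3)  # may precede the write
stale_read = Op("c2", "read", None, inv=3, resp=4)        # starts after the write ends
print(is_linearizable([w, overlapping_read]))  # True
print(is_linearizable([w, stale_read]))        # False
```

A read that overlaps the write may return either the old or the new value, but a read that begins after the write has completed may not return the old one.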

Concepts: State Machine Replication
Every server executes every invocation, so the approach offers no load dispersion and degrades as the system scales
[Figure: invocations (inv) delivered to the servers]
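A minimal sketch of why state machine replication disperses no load: assuming an agreement protocol (not shown) has already fixed a total order of invocations, every replica applies every invocation to a deterministic object. The `StackReplica` class and `run_smr` function are hypothetical illustrations, and all replicas here are correct.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StackReplica:
    """One server's copy of a deterministic stack object (hypothetical example)."""
    items: list = field(default_factory=list)
    executed: int = 0  # invocations this server has processed

    def apply(self, op: str, arg: Any = None) -> Optional[Any]:
        self.executed += 1
        if op == "push":
            self.items.append(arg)
            return None
        if op == "pop":
            return self.items.pop() if self.items else None
        raise ValueError(f"unknown operation {op!r}")


def run_smr(ordered_log, n_servers: int):
    """Apply an already-agreed total order of invocations at every server."""
    replicas = [StackReplica() for _ in range(n_servers)]
    responses = []
    for op, arg in ordered_log:
        # Every server executes every invocation, in the same order.
        results = [r.apply(op, arg) for r in replicas]
        # All replicas here are correct, so results agree; a client tolerating
        # b Byzantine servers would instead wait for b + 1 matching replies.
        responses.append(results[0])
    return replicas, responses


replicas, responses = run_smr([("push", 1), ("push", 2), ("pop", None)], n_servers=4)
print(responses)                       # [None, None, 2]
print([r.executed for r in replicas])  # [3, 3, 3, 3]
```

Adding servers adds fault tolerance, but each server's work per invocation stays the same.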

Concepts: Wait-Freedom [Herlihy 1990]
A liveness property for object invocations
Informally, an implementation is wait-free if any client’s operation is guaranteed to complete
- Assuming a limit on the number of faulty servers [Jayanti et al.]
- But not assuming a limit on the number of faulty clients
Intuitively, wait-freedom precludes synchronization mechanisms that must be “unlocked” by a client
Only read-write objects can be implemented in a wait-free way
- Virtually any other object cannot (in an asynchronous system)

Challenges: Concurrency
Concurrent updates can violate linearizability
[Figure: data and servers]

Challenges: Server Failures
Faulty servers can attempt to mislead clients
- Typically addressed by “voting”
[Figure: servers]
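Voting can be sketched as follows, assuming at most b servers are faulty: a client accepts a value only once at least b + 1 servers report it, because b faulty servers alone cannot reach that count. The `vote` function and its reply format are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def vote(replies: Iterable[Hashable], b: int) -> Optional[Hashable]:
    """Return the most common reply if at least b + 1 servers sent it, else None."""
    counts = Counter(replies)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    # b + 1 matching replies must include at least one correct server.
    return value if count >= b + 1 else None


print(vote(["x", "x", "y"], b=1))  # x
print(vote(["x", "y", "z"], b=1))  # None
```

This only shows that some correct server holds the value; it does not show that the value is the latest one, which is what versioning and repair address.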

Challenges: Client Failures
Byzantine client failures can also mislead clients
- Typically addressed by submitting a request via an agreement protocol
[Figure: servers and data]

Challenges: Object Nesting
Distributed objects have stubs and replicas
[Figure: servers]

Challenges: Object Nesting (continued)

Techniques: Versioning
[Figure: five servers, initially empty (Ø). Version D0 is written at time T0 and version D1 at time T1, but D1 reaches only some servers. A client read after T1 sees D1 as the latest candidate, but D1 is incomplete (3 writes required); D0 then becomes the latest candidate, is determined complete, and is returned.]
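The read in the figure can be sketched as below, under simplifying assumptions: servers are benign (no Byzantine replies), each server returns its whole version history as a dict from timestamp to value, and a version counts as complete once `threshold` servers hold it (3 in the figure). The function names and the three-way classification rule are illustrative, not the project's exact protocol.

```python
from typing import Optional


def classify(count: int, unreachable: int, threshold: int) -> str:
    """Classify a candidate seen at `count` servers, given `unreachable` silent servers."""
    if count >= threshold:
        return "complete"
    if count + unreachable < threshold:
        return "incomplete"      # cannot have reached `threshold` servers
    return "unclassifiable"      # outcome depends on servers that did not reply


def read_latest(responses: list[Optional[dict[int, str]]], threshold: int):
    """responses[i] is server i's version history {timestamp: value}, or None."""
    replies = [h for h in responses if h is not None]
    unreachable = len(responses) - len(replies)
    # Consider the latest candidate first.
    for ts in sorted({t for h in replies for t in h}, reverse=True):
        holders = [h for h in replies if ts in h]
        status = classify(len(holders), unreachable, threshold)
        if status == "complete":
            return status, ts, holders[0][ts]
        if status == "unclassifiable":
            return status, ts, None  # the client must repair this version
        # "incomplete": fall back to the next-latest candidate
    return "empty", None, None  # no version written yet (Ø)


# Five servers: D0 (timestamp 0) reached all of them, D1 (timestamp 1) only one.
servers = [{0: "D0", 1: "D1"}] + [{0: "D0"} for _ in range(4)]
print(read_latest(servers, threshold=3))  # ('complete', 0, 'D0')
```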

Techniques: Repair
[Figure: versions D0, D1 and D2 are written at times T0, T1 and T2. A client read after T2 sees D2 as the latest candidate, but some servers are unreachable, so D2 is unclassifiable. The client repairs D2 by writing it to further servers, after which D2 is returned.]
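Repair can be sketched in the same simplified setting: benign servers holding version histories as dicts, a hypothetical completeness `threshold`, and a client that, on finding an unclassifiable latest candidate, writes it with its original timestamp to every reachable server that lacks it before returning it. Names and the classification rule are illustrative assumptions.

```python
def classify(count: int, unreachable: int, threshold: int) -> str:
    if count >= threshold:
        return "complete"
    if count + unreachable < threshold:
        return "incomplete"
    return "unclassifiable"


def read_with_repair(servers: list[dict[int, str]], reachable: set[int], threshold: int):
    """Read the latest complete version, repairing an unclassifiable candidate."""
    replies = {i: servers[i] for i in reachable}
    unreachable = len(servers) - len(replies)
    for ts in sorted({t for h in replies.values() for t in h}, reverse=True):
        holders = [i for i, h in replies.items() if ts in h]
        value = replies[holders[0]][ts]
        status = classify(len(holders), unreachable, threshold)
        if status == "unclassifiable":
            # Repair: write the candidate, with its original timestamp,
            # to every reachable server that lacks it.
            for h in replies.values():
                h.setdefault(ts, value)
            status = classify(len(replies), unreachable, threshold)
            if status != "complete":
                raise RuntimeError("too few reachable servers to complete repair")
        if status == "complete":
            return ts, value
        # "incomplete": fall back to the next-latest candidate
    return None, None


servers = [{0: "D0", 1: "D1"} for _ in range(5)]
for i in (0, 1, 4):
    servers[i][2] = "D2"  # D2 reached servers 0, 1 and 4
print(read_with_repair(servers, reachable={0, 1, 2, 3}, threshold=3))  # (2, 'D2')
print(sorted(i for i, h in enumerate(servers) if 2 in h))  # [0, 1, 2, 3, 4]
```

Without repair, the client could not decide whether D2 completed; after repair, D2 is complete and later reads will also return it.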

Techniques: Quorum Systems
A quorum system is a data redundancy technique that supports load dispersion among servers
- Only a subset of the servers is accessed in each operation
[Figure: example grid quorum system with n = 49 servers, b = 3]
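A small sketch of a grid quorum system over the same n = 49 servers. The quorum shape here (one full column plus k full rows) is an assumption for illustration, not necessarily the construction in the figure. The code measures quorum size and the smallest pairwise intersection, and checks the intersection condition of b-masking quorum systems (every two quorums share at least 2b + 1 servers); the availability condition is not checked.

```python
from itertools import combinations


def grid_quorums(side: int, k: int) -> list[frozenset]:
    """All quorums made of one full column plus k full rows of a side x side grid."""
    quorums = []
    for col in range(side):
        column = {(r, col) for r in range(side)}
        for rows in combinations(range(side), k):
            row_cells = {(r, c) for r in rows for c in range(side)}
            quorums.append(frozenset(column | row_cells))
    return quorums


def min_intersection(quorums: list[frozenset]) -> int:
    """Smallest number of servers shared by any two distinct quorums."""
    return min(len(a & b) for a, b in combinations(quorums, 2))


def is_b_masking(quorums: list[frozenset], b: int) -> bool:
    """Intersection condition only; the availability condition is not checked."""
    return min_intersection(quorums) >= 2 * b + 1


side = 7  # n = 49 servers
for k in (1, 2, 3):
    qs = grid_quorums(side, k)
    print(k, len(qs[0]), min_intersection(qs), is_b_masking(qs, b=1))
# 1 13 2 False
# 2 19 4 True
# 3 25 6 True
```

Each operation touches one quorum (for example 19 of the 49 servers when k = 2), so the load spreads across servers, while larger quorums buy larger intersections and hence tolerance of more faulty servers.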

Techniques: Cross Checksums [Gong 1989]
A mechanism for defending against Byzantine servers that attempt to alter data in their possession
- Each data fragment is stored with the hashes of all data fragments (the cross checksum)
- When fragments are retrieved, the hashes are used as “votes” to determine the correct data fragments
[Figure: a data-item is split into data-fragments; the hashes of the fragments form the cross checksum]
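A minimal sketch of cross checksums, assuming SHA-256 as the hash, server i storing fragment i, and at most b faulty servers. The client first votes on the cross checksum (accepting one that at least b + 1 servers return) and then keeps only the fragments whose hashes match it. Function names are hypothetical.

```python
import hashlib
from collections import Counter


def cross_checksum(fragments: list[bytes]) -> tuple[bytes, ...]:
    """The hash of every fragment, in fragment order."""
    return tuple(hashlib.sha256(f).digest() for f in fragments)


def store(fragments: list[bytes]) -> list[tuple[bytes, tuple[bytes, ...]]]:
    """Server i keeps fragment i together with the whole cross checksum."""
    cc = cross_checksum(fragments)
    return [(f, cc) for f in fragments]


def retrieve(stored, b: int) -> dict[int, bytes]:
    """Vote on the cross checksum, then keep fragments that match it."""
    votes = Counter(cc for _, cc in stored)
    cc, count = votes.most_common(1)[0]
    if count < b + 1:
        raise ValueError("no cross checksum is vouched for by b + 1 servers")
    return {i: f for i, (f, _) in enumerate(stored)
            if hashlib.sha256(f).digest() == cc[i]}


stored = store([b"frag0", b"frag1", b"frag2", b"frag3"])
stored[2] = (b"forged", stored[2][1])  # a faulty server alters its fragment
print(sorted(retrieve(stored, b=1)))   # [0, 1, 3]
```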

Techniques: Validating Timestamps
A technique for defending against Byzantine clients that attempt to write different data values at the same timestamp
- The cross checksum of the write value is recorded in its timestamp
- Read results are used to regenerate all data fragments, whose hashes are then compared to the cross checksum in the timestamp
[Figure: read results regenerate all data-fragments of the data-item; their hashes form a cross checksum that is compared with the hash in the timestamp]
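A sketch of timestamp validation under stated assumptions: a toy (m + 1, m) erasure code with one XOR parity fragment stands in for the encoding schemes a real system would use, SHA-256 is the hash, and the timestamp is a hypothetical (logical time, client id, cross checksum) tuple. The reader decodes from exactly m fragments, re-encodes all m + 1, and compares their hashes with the timestamp, so fragments that do not encode a single value are rejected.

```python
import hashlib
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> list[bytes]:
    """Toy (m+1, m) erasure code: m data fragments plus one XOR parity fragment."""
    size = -(-len(data) // m)  # ceiling division
    padded = data.ljust(size * m, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(m)]
    return frags + [reduce(xor, frags)]


def cross_checksum(frags: list[bytes]) -> tuple[bytes, ...]:
    return tuple(hashlib.sha256(f).digest() for f in frags)


def make_timestamp(logical_time: int, client_id: str, data: bytes, m: int):
    # The writer embeds the cross checksum of its value in the timestamp.
    return (logical_time, client_id, cross_checksum(encode(data, m)))


def decode(received: dict[int, bytes], m: int) -> bytes:
    """Recover the (padded) data from exactly m of the m+1 fragments."""
    chosen = dict(sorted(received.items())[:m])
    if len(chosen) < m:
        raise ValueError("need at least m fragments")
    data_frags = [chosen.get(i) for i in range(m)]
    if None in data_frags:
        # The missing data fragment is the XOR of parity and the other data fragments.
        data_frags[data_frags.index(None)] = reduce(xor, list(chosen.values()))
    return b"".join(data_frags)


def validate(timestamp, received: dict[int, bytes], m: int) -> bool:
    regenerated = encode(decode(received, m), m)
    return cross_checksum(regenerated) == timestamp[2]


m = 2
value = b"intrusion tolerant"
ts = make_timestamp(7, "c1", value, m)
frags = encode(value, m)
print(validate(ts, {0: frags[0], 2: frags[2]}, m))  # True

# A Byzantine client writes a parity fragment that does not match its data.
bad = frags[:2] + [b"\x00" * len(frags[2])]
bad_ts = (7, "c2", cross_checksum(bad))
print(validate(bad_ts, {0: bad[0], 1: bad[1]}, m))  # False
```

Because the reader re-encodes rather than trusting the fragments it fetched, every reader reaches the same verdict regardless of which m fragments it happened to retrieve.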

Techniques: Replicated Invocation
[Figure: ≤ b stub replicas cannot invoke; > b stub replicas can]
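One way to read the figure, sketched under assumptions: at most b stub replicas are faulty, each nested invocation carries a call identifier shared by all stubs that issue it, and the nested object executes the call only after more than b stub replicas have sent identical requests. The `NestedInvocationGate` class is hypothetical, not Fleet's or the project's interface.

```python
from collections import defaultdict
from typing import Any, Callable, Optional


class NestedInvocationGate:
    """Runs a nested invocation only after more than b stub replicas request it."""

    def __init__(self, b: int, target: Callable[..., Any]):
        self.b = b
        self.target = target                 # stands in for the nested object
        self.requesters = defaultdict(set)   # (call_id, args) -> stub replica ids
        self.results: dict[str, Any] = {}    # call_id -> result, once executed

    def request(self, replica_id: int, call_id: str, args: tuple) -> Optional[Any]:
        if call_id in self.results:          # already executed: reuse the result
            return self.results[call_id]
        key = (call_id, args)
        self.requesters[key].add(replica_id)
        # More than b identical requests must include one from a correct replica.
        if len(self.requesters[key]) > self.b:
            self.results[call_id] = self.target(*args)
            return self.results[call_id]
        return None                          # not enough matching requests yet


calls = []


def double(x):
    calls.append(x)
    return 2 * x


gate = NestedInvocationGate(b=1, target=double)
print(gate.request(0, "inv-1", (21,)))  # None: one request could be from a faulty stub
print(gate.request(1, "inv-1", (21,)))  # 42
print(gate.request(2, "inv-1", (21,)))  # 42, reused without re-executing
print(gate.request(3, "inv-2", (99,)))  # None: a lone faulty stub cannot invoke
print(calls)                            # [21]
```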

Our Research
To summarize, we will explore the use of these techniques for implementing
- Read-write block storage (linearizable, wait-free)
- Specialized metadata objects (e.g., directories) necessary to construct a fully functional file system (linearizable)
- A general framework for arbitrary deterministic objects (linearizable)
Not all techniques will be appropriate for all cases
- “Flat” objects as found in file systems will generally not utilize replicated clients
- Nested objects may not benefit from versioning (TBD)

Systems: PASIS
PASIS is a survivable storage system developed in a DARPA IPTO project
- Funding ended December 2003
Examined the use of encoding schemes for efficiently distributing data storage while protecting confidentiality and integrity
Did not address concurrency control
- Clients would have to handle it explicitly, e.g., using locking
Explored the use of versioning for other purposes: recovery from user mistakes, system failures and penetrations
- Showed the viability of comprehensive versioning

Systems: Fleet
Fleet is a Java-based distributed object architecture developed in previous DARPA ATO projects
- Funding ended June 2004
Focused on the use of quorum systems for efficient object replication
Fleet does not support nested objects or nested method invocations
Nor does it support potentially faulty clients

Technology Transition
Two primary channels are the industry consortia of two research centers at Carnegie Mellon: CyLab and the Parallel Data Lab
CyLab
- A center focused on trustworthy and measurable computing
- Founded in 2003 through the merger of the Center for Computer and Communications Security and the Sustainable Computing Consortium
- Corporate affiliate program includes over fifty companies, including defense suppliers, tech companies and IT-based critical infrastructures
Parallel Data Lab
- A ten-year-old center focused on storage infrastructures
- Corporate affiliates include most major storage vendors
Both have a track record of technology transfer

Questions?