Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo University of Lisboa,

Presentation transcript:

Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt
A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo
University of Lisboa, Portugal
Meeting PT, November 27, 2012

Cloud Infrastructures
[Figure labels: Monitoring Tools and Control Engines; Processing farm; Storage farm; Switching and Routing; Control Events]
Alert! Cloud infrastructures are one of the new hot targets of attacks!

Example scenario: Portugal Telecom
- Cloud computing infrastructure: SmartCloud product
- First and main problem: centralized monitoring approach
- Diversity of monitoring tools: ArcSight, Pulse, SCOM
[Figure labels: Agentless; Agent-Based; Agent with ArcSight; ArcSight (engine); Monitoring Probe; Events; ArcSight or other tool]
Problems: (a) faults and attacks; (b) diversity is hard to achieve in practice.

The TRONE approach
- Fault and Intrusion Tolerant (FIT) Event Broker
- Automated Failure Diagnosis
- Multi-homing for fast reconfiguration

FIT Event Broker: Goals and challenges
Overarching goals:
- To provide support for trustworthy and resilient monitoring of cloud/datacenter infrastructures
- To achieve improved Quality of Protection (QoP) without neglecting Quality of Service (QoS, i.e. performance) needs
Some specific challenges:
- Deal with large flows of information (events)
- Support different kinds of events (e.g. different criticality)
- Low intrusiveness and easy integration

FIT Event Broker: Assumptions
- System entities: probes, event collectors/brokers, consoles
  - Some event processing may be done by collectors
- Fully connected network
  - E.g., all the entities lie in the same monitoring VLAN
- Partially synchronous system
  - Clocks may be used to timestamp events
- Faults
  - Some FIT brokers may crash or fail in a Byzantine way
  - We do not require/enforce clients (probes/consoles) to be correct
  - If this is a problem for monitoring, then it must also be solved

FIT Event Broker: Baseline design options
- Topic-based publish-subscribe paradigm
  - Good fit to the considered scenarios
- State Machine Replication (SMR)
  - Active replication is better for Byzantine fault tolerance
  - f out of n replicas of a FIT Broker may fail in a Byzantine way
- Public-key cryptography
  - Client authentication, to avoid attacks from malicious probes
- Event channels with support for QoP and QoS
  - Differentiated fault-tolerance support (e.g. crash only or BFT)

FIT Event Broker: High-level architectural view

FIT Event Broker: Interface
- Create event channel (In: TAG and CLASS)
- Destroy event channel (In: TAG)
- Register to channel (In: TAG)
- Publish event (In: EVENT)
- Subscribe to channel (In: TAG)
- Receive event (Out: EVENT)
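The six operations listed for the broker interface can be pictured with a toy, single-process broker. This is only a sketch under stated assumptions: the class name `EventBroker`, the method names, and the explicit `publisher`/`subscriber`/`tag` arguments are hypothetical (the slide only names TAG, CLASS and EVENT as inputs and outputs), and nothing here is replicated or authenticated.

```python
from collections import defaultdict, deque


class EventBroker:
    """Toy in-memory broker; all names are illustrative, not the FIT broker API."""

    def __init__(self):
        self.channels = {}                    # TAG -> CLASS
        self.publishers = defaultdict(set)    # TAG -> registered publisher ids
        self.subscribers = defaultdict(set)   # TAG -> subscriber ids
        self.queues = defaultdict(deque)      # subscriber id -> pending events

    def _require(self, tag):
        if tag not in self.channels:
            raise KeyError(f"unknown channel {tag!r}")

    def create_channel(self, tag, channel_class):
        if tag in self.channels:
            raise ValueError(f"channel {tag!r} already exists")
        self.channels[tag] = channel_class

    def destroy_channel(self, tag):
        self._require(tag)
        del self.channels[tag]
        self.publishers.pop(tag, None)
        self.subscribers.pop(tag, None)

    def register(self, publisher, tag):
        self._require(tag)
        self.publishers[tag].add(publisher)

    def subscribe(self, subscriber, tag):
        self._require(tag)
        self.subscribers[tag].add(subscriber)

    def publish(self, publisher, tag, event):
        self._require(tag)
        if publisher not in self.publishers[tag]:
            raise PermissionError(f"{publisher!r} is not registered on {tag!r}")
        # Sorted iteration keeps delivery order deterministic.
        for subscriber in sorted(self.subscribers[tag]):
            self.queues[subscriber].append(event)

    def receive(self, subscriber):
        queue = self.queues[subscriber]
        return queue.popleft() if queue else None


broker = EventBroker()
broker.create_channel("cpu", "BFT")          # CLASS value is an assumed label
broker.register("probe-1", "cpu")
broker.subscribe("console-1", "cpu")
broker.publish("probe-1", "cpu", {"load": 0.93})
print(broker.receive("console-1"))
```

Requiring registration before publishing mirrors the separate "Register to channel" operation; whether the real broker rejects unregistered publishers in this way is an assumption.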

FIT Event Broker: Internal state
From the SMR perspective, it is important to identify the relevant state that must be kept consistent across replicas:
- Data related to the broker configuration
  - Existing channels and their CLASS
  - Registered publishers and subscribers
- Data related to events
  - Events that are ready to be delivered
All client input that affects the state of the FIT broker (e.g. channel and subscription data, some events) must be handled as a state machine command.
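The reason state-changing input has to go through state machine commands is that replicas stay consistent only if they apply the same commands, deterministically, in the same order. The sketch below illustrates that property with a hypothetical `BrokerReplica` class and an assumed dictionary-based command format; it does not use or imitate the BFT-SMaRt API.

```python
import hashlib
import json


class BrokerReplica:
    """Deterministic state machine holding the broker state named on the slide."""

    def __init__(self):
        self.channels = {}      # TAG -> CLASS
        self.subscribers = {}   # TAG -> ordered list of subscriber ids
        self.pending = []       # events ready to be delivered, in order

    def execute(self, command):
        op = command["op"]
        if op == "create":
            self.channels[command["tag"]] = command["class"]
            self.subscribers[command["tag"]] = []
        elif op == "subscribe":
            subs = self.subscribers[command["tag"]]
            if command["client"] not in subs:
                subs.append(command["client"])
        elif op == "publish":
            self.pending.append([command["tag"], command["event"]])
        else:
            raise ValueError(f"unknown operation {op!r}")

    def digest(self):
        # Canonical JSON so that equal states always hash to the same value.
        state = {
            "channels": self.channels,
            "subscribers": self.subscribers,
            "pending": self.pending,
        }
        encoded = json.dumps(state, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()


def run(commands):
    replica = BrokerReplica()
    for command in commands:
        replica.execute(command)
    return replica.digest()


log = [
    {"op": "create", "tag": "cpu", "class": "BFT"},
    {"op": "subscribe", "tag": "cpu", "client": "console-1"},
    {"op": "publish", "tag": "cpu", "event": "load=0.93"},
    {"op": "publish", "tag": "cpu", "event": "load=0.41"},
]
reordered = log[:2] + [log[3], log[2]]

print(run(log) == run(log))        # same order on two replicas
print(run(log) == run(reordered))  # two publishes swapped on one replica
```

Two replicas fed the same log end with identical digests, while swapping two publish commands changes the delivery queue and therefore the digest, which is why the ordering of commands has to be agreed before execution.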

BFT-SMaRt: Overview
- Java-based platform for BFT SMR, available online
- Actively being developed and improved in our group
- BFT SMR "common" features
  - State machine programming model
  - n ≥ 3f+1 replicas required
- A small step away from being a commercial product
- Advanced features
  - Replica recovery (state transfer)
  - Reconfigurations
  - Extensible API: e.g. custom voter
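The n ≥ 3f+1 requirement translates directly into deployment sizes. The two helper functions below are hypothetical names introduced only to make the arithmetic concrete.

```python
def min_replicas(f):
    """Smallest group size that tolerates f Byzantine replicas (n >= 3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faults(n):
    """Largest f such that n >= 3f + 1 still holds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


for n in (4, 6, 7, 10):
    print(n, "replicas tolerate", max_faults(n), "Byzantine fault(s)")
```

Note that going from 4 to 6 replicas does not raise the number of tolerated faults; the next useful size after 4 is 7.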

BFT-SMaRt: Service invocation
[Figure labels: PROBE; FIT Broker state]
Agreement on order performed by SMaRt

BFT-SMaRt: Execution and response
- Commands are delivered to the FIT broker, which updates the state/queues and replies
- Voting on client side
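Client-side voting can be sketched as follows. The rule used here, accepting a reply once f+1 distinct replicas returned the same value, is the classic one from the BFT literature: with at most f faulty replicas, any such group contains at least one correct replica. It is an assumption that this matches the voter actually used by BFT-SMaRt, and the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(replies, f):
    """Return the reply backed by at least f + 1 replicas, or None if there is none.

    replies maps a replica id to the (hashable) reply received from it,
    so each replica contributes at most one vote.
    """
    counts = Counter(replies.values())
    if not counts:
        return None
    value, support = counts.most_common(1)[0]
    return value if support >= f + 1 else None


# n = 4 replicas, f = 1: replica 2 is faulty and lies.
print(vote({0: "ok", 1: "ok", 2: "forged", 3: "ok"}, f=1))
# Only one reply has arrived so far: not enough support yet.
print(vote({2: "forged"}, f=1))
```

A real client would call such a function each time a new reply arrives and stop waiting as soon as it returns a value.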

BFT-SMaRt: Implementation & Evaluation
The FIT Broker is currently being implemented and integrated with BFT-SMaRt.
Evaluation:
- Throughput
  - Aim is to deal with 40K events/sec
- Resilience
  - Measure performance under attack
  - Verify recovery and reconfiguration capabilities
A simple demo is available.

Preliminary results available [DAIS 2012]
[Figure: Throughput for up to 100 channels]

Summary
- FIT Event Broker: event dissemination support
  - For easier deployment of multiple monitoring tools
  - Manage which events are propagated, to which consoles, with which QoS
- BFT-SMaRt: Byzantine fault tolerant replication
  - First usable implementation of BFT replication
  - Leading edge worldwide
  - Resilience against malicious attacks with small overhead
- Portugal Telecom's cloud infrastructure is being used as a real use case for application and evaluation of the work