Load Management and High Availability in Borealis
Magdalena Balazinska, Jeong-Hyon Hwang, and the Borealis team
MIT, Brown University, and Brandeis University


Borealis is a distributed stream processing system (DSPS) based on Aurora and Medusa.

Contract-Based Load Management

Goal: Manage load through collaboration between autonomous participants, and ensure an acceptable allocation in which each node's load is below its threshold.

Approach:
1 - Offline, participants negotiate and establish bilateral contracts that:
  - Fix or tightly bound the price per unit of load (e.g., a contract at price p, or a price range [p, p+e])
  - Are private and customizable (e.g., performance or availability guarantees, SLAs)
2 - At runtime, load moves only between participants that have a contract. Movements are based on marginal costs:
  - Each participant has a private convex cost function mapping offered load (msgs/sec) to total cost (delay, $)
  - Load moves when it is cheaper to pay a partner than to process locally: task t moves from A to B if the unit marginal cost of t is above p at A and below p at B

Properties:
  - Simple, efficient, and low overhead (provably small bounds)
  - Provable incentives to participate in the mechanism

Experimental result: a small number of contracts and small price ranges suffice to achieve an acceptable allocation.

Challenges: incentives, efficiency, and customization.

[Figure: participants A, B, C, D connected by contracts, e.g., a contract specifying that A will pay C $p per unit of load; a convex cost function of total cost (delay, $) versus offered load (msgs/sec), with marginal costs MC(t) at A and MC(t) at B for an arbitrary load(t).]

HA Semantics and Algorithms

Goal: Streaming applications can tolerate different types of failure recovery:
  - Gap recovery: may lose tuples
  - Rollback recovery: produces duplicates but does not lose tuples
  - Precise recovery: takes over precisely from the point of failure

Operator classes and examples:
  - Repeatable: Filter, Map, Join
  - Convergent: BSort, Resample, Aggregate
  - Deterministic: Union, operators with timeouts

Approaches:
  - Upstream backup (ACK and trim): lowest runtime overhead
  - Active standby (replay): shortest recovery time
  - Passive standby (checkpoint, ACK, and trim): most suitable for precise recovery

Challenges: operator and processing non-determinism.

Network Partitions

Goal: Handle network partitions in a distributed stream processing system.

Approach: Favor availability, and use updates to achieve consistency:
  - Use connection points to create replicas and stream versions
  - Downstream nodes monitor upstream nodes, reconnect to an available upstream replica, and continue processing with minimal disruption

Challenges: maximize availability, minimize reprocessing, maintain consistency.
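The runtime marginal-cost rule above can be sketched in a few lines. This is an illustrative sketch, not Borealis code: the quadratic cost function, the price value, and all function names are invented for the example.

```python
# Hypothetical sketch of the marginal-cost rule for contract-based load
# movement. Cost functions map offered load (msgs/sec) to total cost ($).

def marginal_cost(cost_fn, base_load, task_load):
    """Per-unit marginal cost of carrying task_load on top of base_load."""
    return (cost_fn(base_load + task_load) - cost_fn(base_load)) / task_load

def should_move(task_load, load_a, load_b, price, cost_a, cost_b):
    """Task t moves from A to B iff the unit marginal cost of t is
    above the contract price p at A and below p at B."""
    mc_a = marginal_cost(cost_a, load_a - task_load, task_load)  # cost A sheds
    mc_b = marginal_cost(cost_b, load_b, task_load)              # cost B takes on
    return mc_a > price and mc_b < price

# Convex (quadratic) delay-based cost function, shared here for simplicity.
cost = lambda load: 0.001 * load ** 2

# Overloaded A (900 msgs/sec) vs. lightly loaded B (200 msgs/sec), contract at p = 1.5.
print(should_move(100, 900, 200, 1.5, cost, cost))  # -> True
```

With a convex cost function, marginal cost rises with load, so an overloaded node is willing to pay p to shed a task while a lightly loaded partner profits by accepting it, which is exactly the incentive the contracts formalize.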

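The passive-standby approach mentioned above (periodic checkpoints to a secondary, plus ACK and trim) can be illustrated with a minimal sketch. This is not Borealis code; the classes, fields, and checkpointing granularity are invented for the example.

```python
# Illustrative sketch of passive standby: the primary periodically checkpoints
# operator state and its input position to a secondary; on failure the
# secondary resumes from the last checkpoint, and recovery can be made precise
# if upstream nodes replay the tuples processed since that checkpoint.

class Primary:
    def __init__(self, secondary):
        self.state = {}        # operator state, e.g. per-key counts
        self.processed = 0     # position on the input stream
        self.secondary = secondary

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.processed += 1

    def checkpoint(self):
        # Ship a consistent snapshot (state + input position) to the secondary.
        self.secondary.install(dict(self.state), self.processed)

class Secondary:
    def install(self, state, processed):
        self.state, self.processed = state, processed

    def take_over(self):
        # Resume from the checkpoint; upstream replays inputs past `processed`.
        return self.state, self.processed

sec = Secondary()
pri = Primary(sec)
for k in ["x", "y", "x"]:
    pri.process(k)
pri.checkpoint()
pri.process("z")             # not yet checkpointed: lost with the primary
print(sec.take_over())       # -> ({'x': 2, 'y': 1}, 3)
```

The sketch makes the trade-off visible: runtime cost is one snapshot per checkpoint interval, and recovery only needs the tuples that arrived after the last snapshot.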
Load Management Demonstration Setup

All nodes process a network monitoring query over real traces of connection summaries. Query: count the connections established by each IP over 60 seconds, and the number of distinct ports to which each IP connected.

Query graph (three branches over the connection information):
  - Group by IP, count, 60 s -> Filter > 100 -> IPs that establish many connections
  - Group by IP, count distinct ports, 60 s -> Filter > 10 -> IPs that connect over many ports
  - Group by IP prefix, sum, 60 s -> Filter > 100 -> clusters of IPs that establish many connections

Steps:
1) Three nodes (A, B, C) with identical contracts at price p and an uneven initial load distribution.
2) As node A becomes overloaded, it sheds load to its partners B and C until the system reaches an acceptable allocation.
3) Load increases at node B, causing system overload.
4) Node D joins the system with a contract at 0.8p; load flows from node B to C and from C to D until the system reaches an acceptable allocation.

[Figure: per-node load over time, annotated with "acceptable allocation", "node A overloaded", "A sheds load to B then to C", "acceptable allocation", "system overload", "node D joins", "load flows from C to D and from B to C".]
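The first branch of the demonstration query (count connections per IP over 60-second windows, keep IPs above the threshold) can be sketched as a tumbling-window aggregate. This is an illustrative sketch, not the Borealis operator network; the record layout `(timestamp, src_ip)` and the function name are assumptions.

```python
# Illustrative sketch of "Group by IP, count, 60 s -> Filter > 100":
# bucket connection summaries into 60-second tumbling windows, count
# connections per source IP, and keep IPs exceeding the threshold.
from collections import Counter, defaultdict

def heavy_connectors(events, window=60, threshold=100):
    """events: iterable of (timestamp, src_ip); yields (window_start, ip, count)."""
    windows = defaultdict(Counter)
    for ts, ip in events:
        windows[ts - ts % window][ip] += 1          # tumbling-window bucket
    for start in sorted(windows):
        for ip, n in windows[start].items():
            if n > threshold:                        # Filter > 100
                yield start, ip, n

# One IP opens 150 connections in the first window; another opens only 30.
trace = [(t % 60, "10.0.0.1") for t in range(150)] \
      + [(t, "10.0.0.2") for t in range(0, 60, 2)]
print(list(heavy_connectors(trace)))                 # -> [(0, '10.0.0.1', 150)]
```

The distinct-ports branch would be identical in shape, with a `set` of ports per IP in place of the counter and a threshold of 10.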

High Availability Demonstration Setup

Identical queries traverse nodes that use different high-availability approaches: passive standby, active standby, upstream backup, and upstream backup with duplicate elimination.

1) The four primaries (B0, C0, D0, E0) run on one laptop.
2) All other nodes, including the statically assigned secondaries (B1, C1, D1, E1), run on the other laptop.
3) We compare the runtime overhead of the approaches.
4) We kill all primaries at the same time.
5) We compare the recovery time and the effects on tuple delay and duplication.

Results:
  - Active standby has the highest runtime overhead.
  - Upstream backup has the highest overhead during recovery.
  - Passive standby adds the most end-to-end delay.

[Figures: tuples received and end-to-end delay over time for each approach (passive standby, active standby, upstream backup, upstream backup without duplicates), with the failure point and duplicate tuples marked.]
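The upstream-backup approach compared above can be sketched with a small log-and-trim loop. This is a minimal illustration, not Borealis code: the class, the ACK granularity, and the sequence-number scheme are invented for the example.

```python
# Illustrative sketch of upstream backup: an upstream node keeps sent tuples
# in a log, trims the log when downstream ACKs them, and replays the
# unacknowledged suffix to a recovering replica after the primary fails.
from collections import deque

class UpstreamNode:
    def __init__(self):
        self.log = deque()          # (seq, tuple) pairs awaiting ACK
        self.seq = 0

    def send(self, tup):
        self.seq += 1
        self.log.append((self.seq, tup))
        return self.seq, tup        # would go to the downstream primary

    def ack(self, seq):
        # Downstream has safely processed everything up to seq: trim the log.
        while self.log and self.log[0][0] <= seq:
            self.log.popleft()

    def replay(self):
        # On primary failure, resend unACKed tuples to the new downstream node.
        return list(self.log)

up = UpstreamNode()
for t in ["a", "b", "c", "d"]:
    up.send(t)
up.ack(2)                    # downstream acknowledged tuples 1 and 2
print(up.replay())           # -> [(3, 'c'), (4, 'd')]
```

This makes the measured trade-off concrete: runtime cost is only the log, but recovery must reprocess the whole unacknowledged suffix, which is why upstream backup is cheapest at runtime yet most expensive during recovery.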

Network Partition Demonstration Setup

1) The initial query distribution crosses the boundary between the two laptops (nodes A, B, C, and a replica R).
2) We unplug the cable connecting the laptops.
3) Node C detects that node B has become unreachable.
4) Node C identifies node R as a reachable alternate replica: its output stream has the same name but a different version.
5) Node C connects to node R and continues processing from the same point on the stream.
6) Node C changes the version of its output stream.
7) When the partition heals, node C remains connected to R and continues processing uninterrupted.

Results: end-to-end tuple delay increases while C detects the network partition and re-connects to R; there are no duplications and no losses after the network partition.

[Figure: end-to-end tuple delay and sequence number of received tuples over time, distinguishing tuples received through B from tuples received through R.]
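The failover steps above can be sketched as a small state machine on the downstream node. This is a hedged sketch of the idea, not the Borealis protocol: the class, the replica table, and the stream/version bookkeeping are all invented names.

```python
# Illustrative sketch of partition handling: a downstream node remembers its
# position on a named upstream stream; on losing its upstream peer it
# reconnects to a reachable replica of the same stream, resumes from the same
# position (no loss, no duplicates), and bumps the version of its own output.

class Downstream:
    def __init__(self, stream, replicas):
        self.stream = stream              # upstream stream name
        self.replicas = replicas          # {node_name: reachable?}
        self.upstream = next(iter(replicas))
        self.pos = 0                      # last sequence number consumed
        self.out_version = 0              # version tag on our output stream

    def consume(self, n):
        self.pos += n

    def on_upstream_failure(self):
        self.replicas[self.upstream] = False
        alive = [r for r, ok in self.replicas.items() if ok]
        if not alive:
            raise RuntimeError("no reachable replica of " + self.stream)
        self.upstream = alive[0]          # same stream name, different replica
        self.out_version += 1             # downstream output gets a new version
        return self.upstream, self.pos    # resume point on the stream

d = Downstream("alerts", {"B": True, "R": True})
d.consume(42)                 # 42 tuples received through B before the partition
print(d.on_upstream_failure())  # -> ('R', 42)
```

Because the resume point is a position on a named, versioned stream rather than a connection to a specific node, C can stay attached to R even after the partition heals, which matches step 7 above.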