PLATO: Predictive Latency-Aware Total Ordering
Mahesh Balakrishnan, Ken Birman, Amar Phanishayee


Total Ordering
- a.k.a. Atomic Broadcast: delivering messages to a set of nodes in the same order
- Messages arrive at nodes in different orders…
- Nodes agree on a single delivery order
- Messages are delivered at nodes in the agreed order

Modern Datacenters
- Applications: e-tailers, finance, aerospace
- Service-Oriented Architectures, Publish-Subscribe, Distributed Objects, Event Notification… Totally Ordered Multicast!
- Hardware: fast high-capacity networks, failure-prone commodity nodes

Total Ordering in a Datacenter
- [Figure labels: Updates are Totally Ordered; Replicated Service]
- Totally Ordered Multicast is used to consistently update Replicated Services
- Latency of Multicast, System Consistency
- Requirement: order multicasts consistently, rapidly, robustly

Multicast Wishlist
- Low latency!
- High (stable) throughput
- Minimal, proactive overheads
- Leverage hardware properties: HW multicast/broadcast is fast, unreliable
- Handle varying data rates: datacenter workloads have sharp spikes… and extended troughs!

State-of-the-Art
- Traditional protocols: conservative, latency-overhead tradeoff
- Example: Fixed Sequencer. Simple, works well
- Optimistic Total Ordering: deliver optimistically, roll back if incorrect
- Why this works: no out-of-order arrival in LANs
- Optimistic total ordering for datacenters?
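As a rough illustration of the fixed-sequencer baseline named above, here is a minimal single-process sketch. It assumes a sequencer that stamps each message id with the next global sequence number and receivers that buffer payloads until the stamp is known. The class and method names (`Sequencer`, `Receiver`, `on_data`, `on_order`) are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Hypothetical fixed sequencer: assigns the next global sequence number."""
    next_seq: int = 0

    def order(self, msg_id):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg_id


@dataclass
class Receiver:
    """Buffers data until the sequencer's position for it is known."""
    expected: int = 0
    payloads: dict = field(default_factory=dict)   # msg_id -> payload
    orders: dict = field(default_factory=dict)     # seq -> msg_id
    delivered: list = field(default_factory=list)

    def on_data(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._drain()

    def on_order(self, seq, msg_id):
        self.orders[seq] = msg_id
        self._drain()

    def _drain(self):
        # Deliver strictly in sequence-number order, once both the
        # ordering stamp and the payload have arrived.
        while (self.expected in self.orders
               and self.orders[self.expected] in self.payloads):
            msg_id = self.orders.pop(self.expected)
            self.delivered.append(self.payloads.pop(msg_id))
            self.expected += 1


sequencer = Sequencer()
order_a = sequencer.order("a")
order_b = sequencer.order("b")

r1, r2 = Receiver(), Receiver()
r1.on_data("a", "A"); r1.on_data("b", "B")      # r1 sees a then b
r2.on_data("b", "B"); r2.on_data("a", "A")      # r2 sees b then a
for r in (r1, r2):
    r.on_order(*order_b)                        # stamps may also arrive out of order
    r.on_order(*order_a)
```

Both receivers end up delivering in the sequencer's order, but only after the ordering stamp has arrived, which is the latency cost that optimistic delivery tries to avoid.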

PLATO: Predictive Ordering
- In a datacenter, broadcast/multicast occurs almost instantaneously
- Most of the time, messages arrive in the same order at all nodes
- Some of the time, messages arrive in different orders at different nodes
- Can we predict out-of-order arrival?

Reasons for Disorder: Swaps
- Out-of-order arrival can occur when the inter-send interval between two messages is smaller than the diameter of the network
- Typical Datacenter Diameter: microseconds
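A small worked example may make the swap condition concrete. The sketch below assumes each message reaches a node after its send time plus a one-way delay. All numbers are invented for illustration and are not measurements from the talk, and `arrival_order` is a hypothetical helper.

```python
def arrival_order(send_us, delay_us):
    """Return message ids sorted by arrival time (send time + one-way delay)."""
    arrival = {m: send_us[m] + delay_us[m] for m in send_us}
    return sorted(arrival, key=arrival.get)


# Illustrative numbers only, in microseconds.
send_us = {"m1": 0, "m2": 20}            # inter-send interval: 20 us
delay_to_node1 = {"m1": 10, "m2": 50}    # m1 arrives at 10, m2 at 70
delay_to_node2 = {"m1": 50, "m2": 10}    # m1 arrives at 50, m2 at 30

order_node1 = arrival_order(send_us, delay_to_node1)   # ['m1', 'm2']
order_node2 = arrival_order(send_us, delay_to_node2)   # ['m2', 'm1']
```

With a 20 µs inter-send interval and path delays that differ by up to 40 µs, the two nodes see opposite orders. If `m2` were sent at 100 µs instead, both nodes would see `m1` first, since the gap would exceed any delay difference in this toy network.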

Reasons for Disorder: Loss
- Datacenter networks are over-provisioned: loss never occurs in the network
- Datacenter nodes are cheap: loss occurs due to end-host buffer overflows caused by CPU contention

Emulab Testbed (Utah)

Cornell Testbed

Disorder: Emulab3
- At 2800 packets per second, 2% of all packet pairs are swapped and 0.5% of packets are lost
- The percentage of swaps and losses goes up with data rate

Disorder

Predicting Disorder
- Predictor: inter-arrival time of consecutive packets into user-space
- Why?
- Swaps: simultaneous multicasts lead to a low inter-arrival time
- Loss: kernel buffer overflow leads to a sequence of low inter-arrival times
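The predictor described above can be sketched in a few lines. This is a minimal illustration, assuming arrival timestamps in microseconds and a threshold `delta_us`. The function names and the idea of reporting the longest run of short gaps are assumptions made here for illustration, not details from the talk.

```python
def suspect_flags(arrival_times_us, delta_us):
    """Flag each packet that arrived within delta_us of the previous packet."""
    flags = []
    prev = None
    for t in arrival_times_us:
        flags.append(prev is not None and (t - prev) < delta_us)
        prev = t
    return flags


def longest_burst(flags):
    """Length of the longest run of consecutive short gaps.

    A long run is the pattern the slide associates with kernel buffer overflow.
    """
    best = run = 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        best = max(best, run)
    return best


# Illustrative timestamps (microseconds), not data from the talk.
arrivals = [0, 1000, 1050, 1100, 1150, 3000]
flags = suspect_flags(arrivals, delta_us=128)
burst = longest_burst(flags)
```

Here the three packets that follow the one at 1000 µs are flagged, because each arrives 50 µs after its predecessor.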

Predicting Disorder
- 95% of swaps and 14% of all pairs are within 128 µsecs
- [Figure: inter-arrival time of swaps vs. inter-arrival time of all pairs; Cornell Datacenter, 400 multicasts/sec]

Predicting Disorder

PLATO Design
- Heuristic: if two packets arrive within Δ µsecs, there is a possibility of disorder
- PLATO: Heuristic + Lazy Fixed Sequencer
- Heuristic works: ~zero (Δ) latency
- Heuristic fails: fixed sequencer latency

PLATO Design
- API: optdeliver, confirm, revoke
- Ordering Layer:
- Pending Queue: packets suspected to be out-of-order, or queued behind suspected packets
- Suspicious Queue: packets optdelivered to the application, not yet confirmed
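One possible reading of the two-queue design is sketched below. Only the callback names `optdeliver`, `confirm`, and `revoke` and the two queue roles come from the slides. Everything else is an assumption made for illustration: the `OrderingLayer` and `RecordingApp` classes, holding a packet for Δ before optdelivering it, revoking unconfirmed packets in reverse order on a misprediction, and a sequencer whose ordering messages arrive reliably and only after the packet itself.

```python
from collections import deque


class RecordingApp:
    """Hypothetical application that just logs the three upcalls."""
    def __init__(self):
        self.log = []

    def optdeliver(self, m): self.log.append(("optdeliver", m))
    def confirm(self, m): self.log.append(("confirm", m))
    def revoke(self, m): self.log.append(("revoke", m))


class OrderingLayer:
    def __init__(self, delta_us, app):
        self.delta_us = delta_us
        self.app = app
        self.held = None           # (msg_id, arrival_us) waiting out its delta window
        self.pending = deque()     # suspected, or queued behind suspected packets
        self.suspicious = deque()  # optdelivered, not yet confirmed

    def advance(self, now_us):
        """Release the held packet once delta has passed with no other arrival."""
        if self.held is None or now_us - self.held[1] < self.delta_us:
            return
        msg_id, _ = self.held
        self.held = None
        if self.pending:
            self.pending.append(msg_id)        # still behind suspected packets
        else:
            self.app.optdeliver(msg_id)
            self.suspicious.append(msg_id)

    def on_packet(self, msg_id, now_us):
        self.advance(now_us)
        if self.held is not None:
            # Two arrivals within delta: both are suspected of being swapped.
            self.pending.append(self.held[0])
            self.held = None
            self.pending.append(msg_id)
        elif self.pending:
            self.pending.append(msg_id)
        else:
            self.held = (msg_id, now_us)

    def on_order(self, msg_id):
        """The sequencer announces that msg_id is next in the agreed order."""
        if self.suspicious and self.suspicious[0] == msg_id:
            self.suspicious.popleft()
            self.app.confirm(msg_id)
            return
        # Misprediction: undo unconfirmed deliveries, newest first.
        while self.suspicious:
            undone = self.suspicious.pop()
            self.app.revoke(undone)
            self.pending.appendleft(undone)
        if self.held is not None and self.held[0] == msg_id:
            self.held = None
        else:
            self.pending.remove(msg_id)        # ValueError if it never arrived
        self.app.optdeliver(msg_id)
        self.app.confirm(msg_id)


app = RecordingApp()
layer = OrderingLayer(delta_us=128, app=app)
layer.on_packet("a", 0)
layer.on_packet("b", 1000)
layer.on_packet("c", 1050)      # within delta of b: b and c go to pending
for m in ("a", "c", "b"):       # agreed order puts c before b
    layer.on_order(m)
```

In this run `a` is optdelivered and later confirmed, while `b` and `c` wait in the pending queue until the sequencer resolves them, so no revoke is needed. A revoke only occurs when two packets arrive more than Δ apart and the agreed order still disagrees with the local arrival order.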

PLATO Design

Performance
- [Figure: Fixed Sequencer vs. PLATO]
- At small values of Δ, very low latency of delivery but more rollbacks

Performance
- Latency of both Fixed Sequencer and PLATO decreases as throughput increases

Performance
- Traffic spike: PLATO is insensitive to data rate, while Fixed Sequencer depends on data rate

Performance
- Δ is varied adaptively in reaction to rollbacks
- Latency is as good as static Δ parameterization
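The slides do not say how Δ is adapted, so the following is only one plausible policy, sketched under stated assumptions: widen Δ multiplicatively after a rollback and shrink it by a small fixed step after each confirmed delivery, clamped to a range. The class name, the default values, and the grow/shrink rule are all hypothetical.

```python
class AdaptiveDelta:
    """Hypothetical controller for the suspicion threshold, in microseconds."""

    def __init__(self, initial_us=128.0, min_us=8.0, max_us=1024.0,
                 grow=2.0, shrink_us=1.0):
        self.delta_us = initial_us
        self.min_us = min_us
        self.max_us = max_us
        self.grow = grow
        self.shrink_us = shrink_us

    def on_rollback(self):
        # A rollback means the window was too small to catch a swap: widen it.
        self.delta_us = min(self.max_us, self.delta_us * self.grow)

    def on_confirm(self):
        # Correct predictions let the window creep back toward low latency.
        self.delta_us = max(self.min_us, self.delta_us - self.shrink_us)


ctl = AdaptiveDelta(initial_us=100.0)
ctl.on_rollback()               # 100 -> 200
ctl.on_confirm()                # 200 -> 199
after_two_events = ctl.delta_us
```

The asymmetry (fast widening, slow narrowing) is a design choice borrowed from congestion-control-style controllers. It trades a little extra latency after a rollback for fewer repeated rollbacks.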

Conclusion
- First optimistic total order protocol that predicts out-of-order delivery
- Slashes ordering latency in datacenter settings
- Stable at varying loads
- Ordering layer of a time-critical protocol stack for datacenters