Availability in Wide-Area Service Composition
Bhaskaran Raman and Randy H. Katz
SAHARA, EECS, U.C.Berkeley



Problem Statement
Poor availability of wide-area (inter-domain) Internet paths
–10% of paths have only 95% availability
–BGP recovery can take several tens of seconds
[Labovitz, FTCS’99] [Labovitz, SIGCOMM’00]

Architecture
Three layers:
–Application plane: composed services
–Logical platform: peering relations, overlay network
–Hardware platform: service clusters
Service cluster: compute cluster capable of running services
Peering: exchange perf. info.
Functionalities at the Cluster-Manager:
–Finding Overlay Entry/Exit
–Location of Service Replicas
–Service-Level Path Creation, Maintenance, and Recovery
–Link-State Propagation
–At-least-once UDP
–Perf. Meas.
–Liveness Detection
[Figure: Source and Destination connected across the Internet through the overlay]

“Failure” detection in the Wide-Area
Two important characteristics:
–Distribution of outage periods
–Rate of occurrence
Wide-Area traces
–12 pairs of hosts: Berkeley, Stanford, UIUC, CMU, TU-Berlin, UNSW
–300ms heart-beat
[Figure: heart-beat timeline marking the timeout period]
Timeout for failure detection
–Approx. 2sec timeout
–Low rate of occurrence (once an hour)
–Good for many real-time applications
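The timeout rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `detect_outages` and the representation of a trace as a sorted list of heart-beat arrival times in milliseconds are assumptions; only the 300ms heart-beat and the roughly 2sec timeout come from the slide.

```python
from typing import List, Tuple

HEARTBEAT_MS = 300   # heart-beat period, from the slide
TIMEOUT_MS = 2000    # approximate failure-detection timeout, from the slide


def detect_outages(arrivals_ms: List[int],
                   timeout_ms: int = TIMEOUT_MS) -> List[Tuple[int, int]]:
    """Return (declared_at_ms, gap_ms) for every silence longer than the timeout.

    arrivals_ms is a sorted list of heart-beat arrival times (hypothetical
    trace format). A failure is declared timeout_ms after the last heart-beat
    seen before the gap.
    """
    outages = []
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        gap = cur - prev
        if gap > timeout_ms:
            outages.append((prev + timeout_ms, gap))
    return outages


if __name__ == "__main__":
    # Heart-beats every 300ms, with one 3-second silence after t=600ms.
    trace = [0, 300, 600, 3600, 3900]
    print(detect_outages(trace))
```

A shorter timeout detects outages sooner but declares more failures on gaps that would have cleared by themselves, which is the tradeoff the two characteristics above (outage distribution and rate of occurrence) are meant to quantify.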

Key Design Points
Overlay size: how many nodes?
–A comparison: Akamai cache servers
–O(10,000) servers for Internet-wide operation
–Probably fewer data-center locations
Link-state floods:
–Twice for each failure
–For a 1,000-node graph; estimate #edges = 10,000
–Failures (>1.8 sec outage): O(once an hour) per edge in the worst case
–Only about 6 floods/second in the entire network! (10,000 edges × 2 floods per failure ÷ 3,600 sec ≈ 5.6)
Graph computation:
–Modified version of Dijkstra’s for service composition
–O(k*E*log(N)) computation time; k = #services composed
–For 6,510-node network, this takes 50ms
–Huge overhead, but: path caching helps
–Memory: a few MB
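The slides do not spell out how Dijkstra's algorithm is modified. One common construction that is consistent with the O(k*E*log(N)) bound is to run Dijkstra over states (node, number of services already applied), so the graph is effectively copied k+1 times. The sketch below assumes that construction; `compose_path`, the adjacency-list format, and the zero cost for applying a service are all illustrative choices, not the authors' code.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # node -> [(neighbor, link cost)]
State = Tuple[str, int]                      # (node, services applied so far)


def compose_path(graph: Graph, hosts: List[Set[str]], src: str,
                 dst: str) -> Optional[Tuple[float, List[State]]]:
    """Cheapest path from src to dst that applies services 0..k-1 in order.

    hosts[i] is the set of nodes running a replica of service i (assumed
    input format). Applying a service at a node is modeled as a zero-cost
    move from layer i to layer i+1.
    """
    k = len(hosts)
    start, goal = (src, 0), (dst, k)
    dist: Dict[State, float] = {start: 0.0}
    prev: Dict[State, State] = {}
    heap = [(0.0, src, 0)]
    while heap:
        d, u, i = heapq.heappop(heap)
        if d > dist.get((u, i), float("inf")):
            continue  # stale heap entry
        if (u, i) == goal:
            break
        # Moves: follow an overlay link, or apply the next service here.
        moves = [(v, i, w) for v, w in graph.get(u, [])]
        if i < k and u in hosts[i]:
            moves.append((u, i + 1, 0.0))
        for v, j, w in moves:
            nd = d + w
            if nd < dist.get((v, j), float("inf")):
                dist[(v, j)] = nd
                prev[(v, j)] = (u, i)
                heapq.heappush(heap, (nd, v, j))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path


if __name__ == "__main__":
    g: Graph = {
        "A": [("B", 1.0), ("C", 5.0)],
        "B": [("A", 1.0), ("C", 1.0), ("D", 1.0)],
        "C": [("A", 5.0), ("B", 1.0), ("D", 1.0)],
        "D": [("B", 1.0), ("C", 1.0)],
    }
    # One service, hosted only at C: the path must detour through C.
    print(compose_path(g, [{"C"}], "A", "D"))
```

In the example the unconstrained shortest path A, B, D costs 2, but the composed path has to pass through the service replica at C and costs 3. Caching computed paths, as the slide suggests, would avoid repeating this search for every session.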

Wide-Area experiments: setup
8 nodes:
–Berkeley, Stanford, UCSD, CMU
–Cable modem (Berkeley)
–DSL (San Francisco)
–UNSW (Australia), TU-Berlin (Germany)
Text-to-speech composed sessions
–Half with destinations at Berkeley, CMU
–Half with recovery algo enabled, other half disabled
–4 paths in system at any time
–Duration of session: 2min 30sec
–Run for 4 days
Metric: loss-rate measured in 5sec intervals
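The metric above can be computed from a per-packet log roughly as follows. This is a sketch under assumptions: the log format (send time in milliseconds plus a received flag) and the name `loss_rate_per_interval` are hypothetical; only the 5sec interval comes from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

INTERVAL_MS = 5000  # 5sec measurement interval, from the slide


def loss_rate_per_interval(packets: Iterable[Tuple[int, bool]],
                           interval_ms: int = INTERVAL_MS) -> Dict[int, float]:
    """Map each interval index to the fraction of its packets that were lost.

    packets yields (send_time_ms, received) pairs (assumed log format).
    Intervals in which nothing was sent do not appear in the result.
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for send_ms, received in packets:
        bucket = send_ms // interval_ms
        sent[bucket] += 1
        if not received:
            lost[bucket] += 1
    return {b: lost[b] / sent[b] for b in sorted(sent)}


if __name__ == "__main__":
    # Ten packets in the first interval (one lost), four in the second (two lost).
    log = [(t, t != 1500) for t in range(0, 5000, 500)]
    log += [(5000, True), (6000, False), (7000, False), (8000, True)]
    print(loss_rate_per_interval(log))
```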

Loss-rate for a pair of paths

CDF of loss-rates of all failed paths

CDF of gaps seen at client

Summary
Failure detection makes sense in ~2sec
–Improvement in availability for real-time applications
–Text-to-speech composed application
About sec recovery time
–2,000ms failure detection timeout
–1,000ms recovery signaling
–500-1,000ms state restoration (re-process current text sentence)
–Of the 2,872 paths, 18 were recovered (0.63%)
–Availability: Number of 5sec periods with >10% outage:
Other issues: stability, scaling, load-balancing
–Studied using Millennium emulation platform
[Table: Availability %, with rows “No recovery” and “With recovery” and columns Day 1 to Day 4, each split into Dest=Berk and Dest=CMU]
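The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated on the slide; the function names are illustrative, and treating a period as an outage when its loss-rate is strictly above 10% is an assumption about how the threshold was applied.

```python
from typing import Iterable, Tuple

DETECTION_MS = 2000             # failure detection timeout, from the slide
SIGNALING_MS = 1000             # recovery signaling, from the slide
RESTORATION_MS = (500, 1000)    # state restoration range, from the slide


def recovery_time_range_ms() -> Tuple[int, int]:
    """Sum the three listed components into a (min, max) recovery time."""
    base = DETECTION_MS + SIGNALING_MS
    return base + RESTORATION_MS[0], base + RESTORATION_MS[1]


def recovered_percent(recovered: int, total: int) -> float:
    """Percentage of paths on which recovery was triggered."""
    return 100.0 * recovered / total


def count_outage_periods(loss_rates: Iterable[float],
                         threshold: float = 0.10) -> int:
    """Number of 5sec periods whose loss-rate exceeds the threshold."""
    return sum(1 for rate in loss_rates if rate > threshold)


if __name__ == "__main__":
    print(recovery_time_range_ms())               # sum of the listed components
    print(round(recovered_percent(18, 2872), 2))  # 18 of 2,872 paths
    print(count_outage_periods([0.0, 0.05, 0.2, 0.1, 0.5]))
```

The listed components add up to between 3.5 and 4.0 seconds, and 18 of 2,872 paths is about 0.63%, matching the percentage on the slide.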