Princeton University. Towards Predictable Multi-Tenant Shared Cloud Storage. David Shue*, Michael Freedman*, and Anees Shaikh✦ (*Princeton, ✦IBM Research)

2 Shared Services in the Cloud: tenants Zynga (Z), Yelp (Y), TP (T), and Foursquare (F) all build on shared cloud services such as S3, EBS, SQS, and DynamoDB.

3 Shared Services in the Cloud: the same tenants sharing a single storage service (DynamoDB, abbreviated DD in the figures).

4 Shared Storage Service: co-located tenants contend for resources (one tenant drives 2x demand).

5 Co-located Tenants Contend For Resources: tenants contend for resources on co-located service nodes (e.g., S3, DynamoDB) when one tenant drives 2x demand.

6 Shared Storage Service: co-located tenants contend for resources on the shared storage nodes; one tenant drives 2x demand.

7 Shared Service = Shared Resources: tenants contend for resources on co-located service nodes; does 2x demand translate into 2x throughput?

8 Shared Storage Service: resource contention = unpredictable performance.

9 Shared Service = Performance Volatility: unfettered access leads to unpredictable performance. Does 2x demand yield 2x throughput, and does it cause latency spikes for other tenants?

10 Fairness and Isolation: per-VM/per-Server. Per-VM weights (w_z, w_y, w_t, w_f) on each server face non-uniform demand across servers and across tenant VMs, per-VM state (queue) explosion, and overly complex per-VM weight tuning.

11 Towards Predictable Shared Services: per-tenant resource allocation.

12 Towards Predictable Shared Services: per-tenant resource allocation and inter-tenant performance isolation.

13 Unpredictable Cloud
- Early S3 and SQS evaluation found high variability in throughput (double access test) - Harvard TR
- SLAs cover availability only, not performance
- No API for specifying a level of service
- Network variance is rampant (Oktopus)

14 Fairness and Isolation: per-VM/per-Server. Non-uniform demand across tenant VMs makes static per-VM weights (w_z, w_y, w_t, w_f) a poor fit.

15 Network Resource Sharing: Seawall, NetShare, Gatekeeper, Oktopus, CloudPolice/FairCloud, etc.

16 Towards Predictable Shared Cloud Storage: a shared storage service (SS) with per-tenant resource allocation and performance isolation.

17 Towards Predictable Shared Cloud Storage: tenants Zynga, Yelp, Foursquare, and TP with hard per-tenant limits (≤ 80, 120, 160, and 40 kreq/s respectively) on the shared storage service. Hard limits are too restrictive and achieve lower utilization.

18 Towards Predictable Shared Cloud Storage. Goal: per-tenant max-min fair share (a ≥ guarantee) of system resources, e.g. with weights w_z = 20%, w_y = 30%, w_f = 40%, w_t = 10%: Zynga demands 40% of capacity and receives rate_z = 30%, since Foursquare demands only 30% of its 40% share.

19 Towards Predictable Shared Cloud Storage: the per-tenant max-min fair share of system-wide resources in absolute terms: w_z = 80 kreq/s, w_y = 120 kreq/s, w_f = 160 kreq/s, w_t = 40 kreq/s; Zynga demands 160 kreq/s and receives rate_z = 120 kreq/s because Foursquare demands only 120 kreq/s of its 160 kreq/s share.
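
As a rough illustration of the allocation on slides 18-19, the sketch below computes per-tenant weighted max-min fair rates by progressive filling. It is not PISCES code: the function name, the 400 kreq/s aggregate capacity, and the dictionary interface are illustrative assumptions; the weights and demands mirror slide 19.

    # Hedged sketch: weighted max-min fair rates via progressive filling.
    # Capacity, interface, and function name are illustrative assumptions.
    def weighted_max_min(capacity, weights, demands):
        rates = {}
        remaining = float(capacity)
        remaining_w = float(sum(weights.values()))
        # Visit tenants in increasing order of normalized demand (demand/weight);
        # each gets the lesser of its demand and its share of what is left.
        for t in sorted(weights, key=lambda t: demands[t] / weights[t]):
            fair = remaining * weights[t] / remaining_w
            rates[t] = min(demands[t], fair)
            remaining -= rates[t]
            remaining_w -= weights[t]
        return rates

    # Slide 19's example (kreq/s): Zynga demands 160 but receives 120 because
    # Foursquare leaves 40 kreq/s of its 160 kreq/s share unused.
    print(weighted_max_min(400,
                           {"Z": 80, "Y": 120, "F": 160, "T": 40},
                           {"Z": 160, "Y": 120, "F": 120, "T": 40}))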

20 Shared Service Access
- PARDA: per-node (hypervisor) performance isolation for SAN storage via congestion-controlled request scheduling (aggregates VMs, no notion of tenant shares)
- mClock: per-VM weighted (STFQ) I/O scheduling with reservations and limits on a single physical node (per-VM shares, but limited to a single node)
- Stout: batch-based congestion-controlled access to a back-end DB for performance isolation (single tenant, no weighting)
- Database QoS (single node?)

21 Goal: Multi-tenant Service Assurance
Provide multi-tenant fairness and performance isolation
- Allocate per-tenant fair shares of total system resources
- Minimize overhead and maintain system throughput
Support service differentiation
- Service-assured (weight per tenant)
- Best-effort (weight per class)

22 Towards Predictable Shared Cloud Storage
PISCES: Predictable Shared Cloud Storage
- Allocates per-tenant fair shares of total system resources
- Isolates tenant request performance
- Minimizes overhead and preserves system throughput
PISCES mechanisms
- Partition Placement
- Weight Allocation
- Replica Selection
- Fair Queuing

23 PISCES: Predictable Shared Cloud Storage
PISCES goals
- Allocate per-tenant fair shares of total system resources
- Isolate tenant request performance
- Minimize overhead and preserve system throughput
PISCES mechanisms, ordered by timescale from minutes down to microseconds
- Partition Placement (Migration)
- Local Weight Allocation
- Replica Selection
- Fair Queuing

24 Place Partitions By Fairness Constraints: each tenant (A, B, C, D, with global weights weight_A through weight_D) has a keyspace split into partitions of varying popularity, which the controller places across PISCES Nodes 1-4.

25 Place Partitions By Fairness Constraints: the partition placement (PP) controller computes a feasible placement such that each tenant's rate fits its weight (Rate_A < w_A, Rate_B < w_B, Rate_C < w_C); a node whose placed demand exceeds its capacity is overloaded.

26 Place Partitions By Fairness Constraints: the partition placement (PP) controller computes a feasible partition placement.

27 Example: with equal demand (each tenant's rate = 17.5) and matched local weights (w_a1 = w_b1, w_c2 = w_d2, w_a3 = w_b3, w_c4 = w_d4), the shares hold across Nodes 1-4. But when tenants A and B double their demand (rate A = rate B = 30, rate C = rate D = 20), the partitions are properly placed, yet the 2x demand from A and B violates the shares for C and D if local weights are mismatched to demand.

28 Partition Placement: feasible fair sharing. The partition placement controller updates per-partition tenant demand and computes a feasible partition placement across Nodes 1-4 according to the per-tenant weighted max-min fair share, resolving overload (PP operates alongside WA, RS, and FQ).

29 Partition Placement: feasible fair sharing. The controller computes a feasible partition placement according to the per-tenant weighted max-min fair share, then maps and migrates partitions across Nodes 1-4.

30 Partition placement
- Estimate and compute Z_t (the sum of weights and demand proportions)
- Solve a Z_t-constrained optimization problem, or alternatively sample the mapping configuration space
- Start from a randomized initial placement based on average demand and object size
- Migrate (spin up) replicas and allow the system to reach a new equilibrium (average demand)
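
A minimal sketch of the "sample the mapping configuration space" alternative above: draw random partition-to-node mappings and keep the one with the least node overload given estimated partition demands. The scoring rule, data structures, and names are assumptions for illustration, not the PISCES placement optimizer.

    import random

    def overload(mapping, demand, capacity):
        # Worst-case ratio of placed demand to capacity across nodes.
        load = {}
        for part, node in mapping.items():
            load[node] = load.get(node, 0.0) + demand[part]
        return max(load.get(n, 0.0) / capacity[n] for n in capacity)

    def sample_placement(partitions, nodes, demand, capacity, trials=1000, seed=0):
        # Randomized search: keep the sampled mapping with the smallest overload.
        rng = random.Random(seed)
        best, best_score = None, float("inf")
        for _ in range(trials):
            mapping = {p: rng.choice(nodes) for p in partitions}
            score = overload(mapping, demand, capacity)
            if score < best_score:
                best, best_score = mapping, score
        return best, best_score

A real placement would additionally respect per-tenant fair-share constraints and account for migration cost, which this sketch omits.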

31 PISCES Mechanisms (mechanism, timescale, purpose). Partition Placement (Migration), driven by the controller at the minutes timescale: place partitions by fairness constraints for a feasible global allocation. (Later rows of the table add locally fair allocation, load balancing as relaxation, and fairness and isolation.)

32 Allocate Local Weights By Tenant Demand: each tenant's global weight (w_A, w_B, w_C, w_D) is split into per-node local weights (w_a1..w_d1 through w_a4..w_d4). With skewed demand, some tenants exceed their local share (R_A > w_A, R_D > w_D) while others underuse it (R_B < w_B, R_C < w_C).

33 Allocate Local Weights By Tenant Demand: the weight allocation (WA) controller swaps local weights between tenants (e.g. A→C, C→A, D→B, C→D, B→C) to minimize tenant latency given demand.

34 Tenant A rate = 30, Tenant B rate = 30, Tenant C rate = 20, Tenant D rate = 20 (w_a1 = w_b1, w_c2 = w_d2, w_a3 = w_b3, w_c4 = w_d4): partitions are properly placed, but 2x demand from A and B violates the shares for C and D if local weights are mismatched to demand.

35 Queue Weight Allocation: minimize latency. The weight allocation controller updates per-tenant demand and rate and adjusts the per-tenant queue weights for tenants A, B, and C at Node 1.

36 Queue Weight Allocation: minimize latency. Shift weight from a less loaded tenant (A) to the max-latency tenant (B at Node 1), and swap the same amount back at a different node (Node 2) so global weights are preserved.

37 Queue Weight Allocation: minimize latency. Generalizing pairwise swaps across Nodes 1-3 and tenants A, B, and C: the optimal multilateral exchange is equivalent to a maximum bottleneck flow problem.

38 Multilateral exchange: maximum bottleneck flow. (Figure: a flow graph over tenants A, B, and C and nodes 1-3 in which weight donations are routed to maximize the flow reaching the bottleneck tenant B.)

39 Queue Weight Allocation
Python controller
- Queries memcached stats to determine partition demand and the DWRR “rate”
- Medium time-scale: computes and actuates the max-latency queue weight swap
- Long time-scale: remaps partitions based on the weighted max-min fair share, tenant demands, and server capacity
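
The sketch below mimics the medium time-scale step described above: pick the highest-latency (node, tenant) pair, take weight from the least-loaded co-located tenant, and perform the reverse swap on another node so each tenant's global weight is unchanged. The dictionary inputs and the fixed step size delta are illustrative assumptions, not the actual controller interface.

    def weight_swap(latency, weights, delta=0.05):
        # latency[node][tenant]: observed latency; weights[node][tenant]: local weight.
        # 1. Bottleneck: the (node, tenant) pair with the highest latency.
        hot_node, hot_tenant = max(
            ((n, t) for n in latency for t in latency[n]),
            key=lambda nt: latency[nt[0]][nt[1]])
        # 2. Donor on the hot node: the co-located tenant with the lowest latency.
        donor = min((t for t in latency[hot_node] if t != hot_tenant),
                    key=lambda t: latency[hot_node][t])
        # 3. Reverse the swap on the node where the donor is worst off relative
        #    to the bottleneck tenant, keeping each global weight constant.
        other = max((n for n in latency if n != hot_node),
                    key=lambda n: latency[n][donor] - latency[n][hot_tenant])
        weights[hot_node][hot_tenant] += delta
        weights[hot_node][donor] -= delta
        weights[other][donor] += delta
        weights[other][hot_tenant] -= delta
        return hot_node, hot_tenant, donor, other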

40 Allocate Local Weights By Tenant Demand: tenants A-D (global weights weight_A through weight_D) have partitions (p) placed across Nodes 1-4 with local weights w_a1 through w_d4. The controller (PP + WA) shifts weight from a less loaded tenant (A) to the max-latency tenant (C at Node 1) and swaps it back at a different node (Node 2).

41 Achieving System-wide Fairness (timescales from minutes toward microseconds)
- Partition Placement (Migration): place partitions by fairness constraints (feasible global allocation, controller-driven)
- Local Weight Allocation: allocate weights by tenant demand (locally fair allocation)

42 Select Replicas By Local Weight: the request router (RR) initially splits tenant C's GETs 1/2 and 1/2 across its replicas, leaving C overloaded on one node and under-utilized on the other. Replica selection (RS) instead selects replicas based on node latency.

43 Select Replicas By Local Weight: selecting replicas based on node latency shifts tenant C's GET split to 1/3 and 2/3 across its replicas.

44 Example with Tenant A rate = 17.5 and Tenant B rate = 25: even with a feasible placement and demand-aligned local weights, without weight-sensitive replica selection A wastes capacity at Node 1.

45 Replica Selection: maximize throughput. The request router (RR) must decide how to split tenant B's GETs across Node 1 and Node 2 (e.g. 33% vs. 67%); candidate policies include round robin/uniform, shortest queue first, latency proportional, and windowed. Proper load balancing requires (implicit) coordination and control.

46 Distributed Replica Selection
Spymemcached client library / Moxi proxy
- Explicit partition mapping retrieved from the controller
- Request latency estimation (EWMA)
- Replica selection: sampling distribution based on per-tenant node latencies
- Alternatively: use explicit queue-length feedback (on ack) and a shortest-queue-first policy

47 Replica Selection Policies
- The request router tracks per-tenant, per-node latencies
- Latency info is shared by all partitions
- Shortest-queue-first selection policy
- Latency-proportional selection policy
- FAST-TCP-inspired congestion control (windowed) selection policy
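
A small sketch of a latency-proportional router in the spirit of slides 46-47: keep a per-tenant, per-node EWMA of request latency and sample replicas with probability inversely proportional to it. The class name, the smoothing factor, and the inverse-latency weighting are assumptions, not the Moxi/spymemcached implementation.

    import random

    class ReplicaSelector:
        def __init__(self, alpha=0.2, seed=None):
            self.alpha = alpha            # EWMA smoothing factor (assumed)
            self.lat = {}                 # (tenant, node) -> smoothed latency
            self.rng = random.Random(seed)

        def observe(self, tenant, node, latency):
            # Update the per-tenant, per-node latency estimate.
            key = (tenant, node)
            old = self.lat.get(key, latency)
            self.lat[key] = (1 - self.alpha) * old + self.alpha * latency

        def choose(self, tenant, replicas):
            # Sample a replica with weight inversely proportional to its latency;
            # replicas with no estimate yet default to a latency of 1.0.
            weights = [1.0 / self.lat.get((tenant, n), 1.0) for n in replicas]
            return self.rng.choices(replicas, weights=weights, k=1)[0]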

48 Achieving System-wide Fairness (timescales from minutes toward microseconds)
- Partition Placement (Migration): place partitions by fairness constraints (feasible global allocation, controller-driven)
- Local Weight Allocation: allocate weights by tenant demand (locally fair allocation)
- Replica Selection: select replicas by local weight (load balancing as relaxation, at the request routers)

49 Queue Tenants By Dominant Resource: one tenant is bandwidth (out-bytes) limited while the other is request limited; fair queuing on out-bytes alone shares only that one resource fairly.

50 Queue Tenants By Dominant Resource: with a shared out-bytes bottleneck, dominant resource fair sharing splits the bottleneck according to each tenant's dominant resource usage.

51 Tenant workloads may have different resource bottlenecks, leading to unfairness (e.g. Tenant A gets 50% of out-bytes while Tenant B gets 62.5% of request capacity). Instead, tenant shares should be relative to their dominant resource (e.g. Tenant A at 55% of out-bytes and Tenant B at 55% of requests).
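
The percentages above can be reproduced with a dominant-share calculation like the sketch below; the per-node capacities and usage figures are made up to match the slide's numbers, and the function name is illustrative.

    def dominant_share(usage, capacity):
        # A tenant's dominant share is its largest per-resource share.
        return max(usage[r] / capacity[r] for r in usage)

    capacity = {"out_bytes": 1000, "requests": 400}   # assumed per-node capacities
    tenant_a = {"out_bytes": 550, "requests": 100}    # bandwidth (out-bytes) limited
    tenant_b = {"out_bytes": 200, "requests": 220}    # request limited
    print(dominant_share(tenant_a, capacity))         # 0.55: dominated by out bytes
    print(dominant_share(tenant_b, capacity))         # 0.55: dominated by requests

Equal dominant shares (55% each) is the fair outcome the slide describes.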

52 Getting Fair Queuing Right: each tenant's connections feed worker threads; scheduling can be done per connection (DWRR at the network I/O layer) or per request (DWRR at the request processing layer), and made non-blocking with an active queue and work stealing.

53 Weighted Per-tenant Request Queueing
Membase server
- Multi-tenant capable: one bucket per tenant with partition mapping
- Request queueing: servicing connections (network I/O)
- Memory pressure: allocate object memory according to working-set size and the desired weight allocation
- Disk queueing: back-end I/O dispatcher for writes and non-resident reads
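
Since DWRR is the scheduling discipline named on slides 39 and 52, here is a minimal per-tenant deficit weighted round robin sketch: each backlogged tenant earns a quantum proportional to its weight per round and dequeues requests whose cost fits its accumulated deficit. The quantum, the notion of request cost (e.g. bytes), and the class layout are assumptions, not Membase's scheduler.

    from collections import deque

    class DWRRScheduler:
        def __init__(self, quantum=1000):
            self.quantum = quantum   # base credit per round (assumed units: bytes)
            self.queues = {}         # tenant -> deque of (request, cost)
            self.weights = {}        # tenant -> weight
            self.deficit = {}        # tenant -> accumulated credit

        def add_tenant(self, tenant, weight):
            self.queues[tenant] = deque()
            self.weights[tenant] = weight
            self.deficit[tenant] = 0

        def enqueue(self, tenant, request, cost):
            self.queues[tenant].append((request, cost))

        def next_round(self):
            # One DWRR round: each backlogged tenant banks weight * quantum and
            # dequeues requests while their cost fits its deficit counter.
            served = []
            for tenant, queue in self.queues.items():
                if not queue:
                    self.deficit[tenant] = 0      # idle tenants do not bank credit
                    continue
                self.deficit[tenant] += self.quantum * self.weights[tenant]
                while queue and queue[0][1] <= self.deficit[tenant]:
                    request, cost = queue.popleft()
                    self.deficit[tenant] -= cost
                    served.append(request)
            return served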

54 PISCES Architecture: tenants A-D (global weights weight_A through weight_D) issue GETs through request routers (RR) to partitions (p) spread across Nodes 1-4 with local weights w_a1 through w_d4; the controller runs partition placement (PP) and weight allocation (WA), while replica selection (RS) runs at the routers and fair queuing (FQ) at the nodes.

55 An Abstract (Network) Model: tenants A, B, and C are logical sources, service nodes 1-4 holding partitions 0-2 are logical destinations, and the shared service forms the logical links. Tenant route selection resembles multipath TCP or QoS routing, and per-tenant flow QoS maps to fair queuing.

56 Model Properties that Complicate the Problem
- Partitioned data: multiple vs. a single destination (a single flow per tenant)
- Replicated partitions: multipath vs. single path (strict partition placement)
- Skewed partition demand: varying vs. uniform (local allocation = global allocation)
- Contention at service nodes: shared bottleneck links vs. tenant-disjoint paths (HWFQ)

57 Achieving System-wide Fairness (timescales from minutes toward microseconds)
- Partition Placement (Migration): place partitions by fairness constraints (feasible global allocation, controller-driven)
- Local Weight Allocation: allocate weights by tenant demand (locally fair allocation)
- Replica Selection: select replicas by local weight (load balancing as relaxation, at the request routers)
- Fair Queuing: queue tenants by dominant resource (fairness and isolation, at the service nodes)

58 PISCES Mechanisms (mechanism / purpose)
- Partition Placement (Migration): feasible fair allocation
- Local Weight Allocation: demand-matched weight allocation
- Replica Selection: weight-sensitive request distribution
- Fair Queuing: dominant resource scheduling and isolation

59 Evaluating Service Assurance (mechanism / evaluation)
- Partition Placement (Migration): feasible allocation (future work)
- Queue Weight Allocation: matches demands to minimize latency (well-matched allocation)
- Replica Selection Policy: balances load to maximize throughput and enhance fairness (load-balanced allocation)
- Weighted Fair Queueing: enforces throughput fairness and latency isolation (fair allocation)

60 Example Shared Service: Membase
Membase: persistent key-value object store
- Open-source project based on memcached
- Multi-tenant-aware object store with persistence (SQLite)
- Virtual bucket (partition) mapping
- Moxi proxy request router (spymemcached client library)
- C/C++ memcached server and persistence engine
- Erlang-based controller using memcached API protocols

61 Evaluation
- Does PISCES achieve system-wide fairness?
- Does PISCES provide performance isolation?
- Can PISCES adapt to shifting tenant distributions?
Testbed: 8 YCSB client nodes (YCSB 1-8) and 8 PISCES server nodes (PISCES 1-8) connected by a gigabit Ethernet ToR switch.

62 Evaluation
- Does PISCES achieve system-wide fairness?
- Does PISCES provide performance isolation?
- Can PISCES handle heterogeneous workloads?
- Can PISCES adapt to shifting tenant distributions?
Testbed: 8 YCSB client nodes and 8 PISCES server nodes connected by a gigabit Ethernet (GigE) ToR switch.

63 PISCES Achieves System-wide Fairness: GET throughput (kreq/s) over time for four configurations, with an ideal fair share of 110 kreq/s (1 kB requests). Min-max ratio (MMR): Membase (no queue) 0.68, FQ 0.51, Replica Selection 0.79, FQ + Replica Selection 0.97.

64 Hot Partition Collision = Infeasible Allocation: local fair-queuing GET throughput (kreq/s) per server (Servers 1-4) over time; two hot partitions collide on one server, making the allocation infeasible.

65 Hot Partition Collision = Infeasible Allocation: local Membase (no queue) GET throughput (kreq/s) per server (Servers 1-4) over time.

66 PISCES Provides Strong Performance Isolation: GET throughput (kreq/s) over time with 2x-demand vs. 1x-demand tenants (equal weights). MMR: 0.42 (Membase, no queue), 0.50 (FQ), and 0.97 (FQ + Replica Selection).

67 PISCES Provides Strong Performance Isolation: GET throughput (kreq/s) over time with 2x-demand vs. 1x-demand tenants (equal weights). MMR: 0.42 (Membase, no queue), 0.50 (FQ), and 0.97 (FQ + Replica Selection).

68 PISCES Achieves Dominant Resource Fairness (equal global weights, differing resource workloads): each tenant receives 76% of the effective capacity on its bottleneck resource, one measured as effective bandwidth and the other as effective throughput (figure plots bandwidth in Mb/s, GET requests in kreq/s, and latency in ms over time).

69 PISCES Achieves System-wide Weighted Fairness (differentiated global weights): figure plots GET throughput (kreq/s) over time and the latency distribution (fraction of requests vs. latency in ms).

70 PISCES Achieves Local Weighted Fairness (equal global weights, staggered local weights): figure plots GET throughput (kreq/s) over time and the latency distribution (fraction of requests vs. latency in ms).

71 PISCES Achieves Local Weighted Fairness (differentiated, staggered local weights): per-server GET throughput (kreq/s) over time for Servers 1-4 with tenant weights of 4, 3, 2, and 1.

72 PISCES Handles Heterogeneous Workloads

73 PISCES Handles Heterogeneous Workloads

74 PISCES Adapts To Shifting Tenant Distributions: four tenants (Tenants 1-4) with weights alternating between 1x and 2x.

75 PISCES Adapts To Shifting Tenant Distributions: global tenant throughput and tenant latency (1 s average) for Tenants 1-4 across Servers 1-4, annotated with each tenant's weight.

76 Matching Demand Distributions With Weight Allocation
Throughput (with the slide's parenthetical fairness/latency figures) under Even Weights | Proportional Weights | Dynamic Weight Allocation:
- No Replica Selection: 378 kreq/s (0.1) | 422 kreq/s (0.08) | 420 kreq/s (0.08)
- Windowed: 434 kreq/s (0.23) | 434 kreq/s (0.24) | 433 kreq/s (0.22)

77 Weight allocation rides the margins
Throughput (T), fairness (F), and latency (L) under Even | Proportional | Dynamic weight allocation:
- None: 378k (0.1) (0.06) | 422k (0.08) (0.06) | 419k (0.49) (0.28)
- Uniform: 432k (0.17) (0.07) | 431k (0.14) (0.068) | 431k (0.39) (0.09)
- Windowed: 434k (0.228) 0.49 (0.232) | 434k (0.244) 0.43 (0.125) | 433k (0.22) (0.127)

78 Conclusion
PISCES achieves:
- System-wide per-tenant fair sharing
- Strong performance isolation
- Low operational overhead (< 3%)
PISCES combines:
- Partition Placement: finds a feasible fair allocation (TBD)
- Weight Allocation: adapts to (shifting) per-tenant demand
- Replica Selection: distributes load according to local weights
- Fair Queuing: enforces per-tenant fairness and isolation

79 Conclusion
- Most multi-tenant shared services do not provide any notion of fairness or isolation
- PISCES provides system-wide per-tenant fair sharing and performance isolation
- Replica selection and fair queueing achieve (weighted) per-tenant fairness and isolation
- Weight allocation matches per-tenant demand distributions to improve fairness and system latency

80 Future Work
- Implement partition placement (for T >> N)
- Generalize the fairness mechanisms to different services and resources (CPU, memory, disk)
- Scale the evaluation to a larger testbed (simulation)
Thanks!

Princeton University. Thanks! Questions?

82 Thank You! Questions?

83 Not Distributed Rate-Limiting
- DRL = distributed, hierarchical fair queuing (SIGMETRICS paper)
- A single tenant distributed across multiple sites
- Allocates first at the aggregate (per-site) level, then at finer granularities (per-flow)
- Does not account for inter-tenant fairness and resource contention (reciprocal resource swap)
- Rate limiting = non-work-conserving

84 Weighted Max-Min Fair Optimization: the system model and optimization objective express the per-tenant weighted max-min fair share as a high-level global optimization.
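
One common way to write such an objective, consistent with the talk but not necessarily the paper's exact formulation (the symbols below are assumptions): maximize the minimum weight-normalized service rate, subject to per-node capacity.

    \max \; \min_{t} \frac{x_t}{w_t}
    \quad \text{s.t.} \quad
    x_t = \sum_{n} x_{t,n}, \qquad
    \sum_{t} d_{t,n}\, x_{t,n} \le C_n \;\; \forall n, \qquad
    x_{t,n} \ge 0,

    where x_t is tenant t's system-wide service rate, w_t its global weight,
    x_{t,n} the rate it receives at node n, d_{t,n} its per-request demand on
    node n's bottleneck resource, and C_n node n's capacity.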

85 Optimization Formulation Addendum
- M/M/1 queue approximation for the weight-swap computation
- Partition placement optimization to distribute partitions over nodes according to demand and fair-share constraints
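
As a hedged sketch of what an M/M/1 approximation buys the weight-swap computation (the paper's actual model may include additional terms): if tenant t's queue at node n drains at a rate proportional to its local weight, its expected latency is approximately

    L_{t,n} \approx \frac{1}{w_{t,n}\,\mu_n - \lambda_{t,n}},

    where \mu_n is node n's total service rate, w_{t,n} the tenant's local
    weight share, and \lambda_{t,n} its arrival rate; the controller can then
    estimate how a candidate weight swap (changing w_{t,n}) would move each
    tenant's latency before actuating it.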