Model-Based Resource Provisioning for a Web Service Utility Ron Doyle*, Jeff Chase, Omer Asad, Wei Jin, Amin Vahdat Internet Systems and Storage Group.

Model-Based Resource Provisioning for a Web Service Utility Ron Doyle*, Jeff Chase, Omer Asad, Wei Jin, Amin Vahdat Internet Systems and Storage Group Department of Computer Science Duke University *

Internet Service Utilities. Shared server cluster; Web hosting centers; shared reserve capacity to handle surges and failures; service/load multiplexing; dynamic provisioning; service is contractual; performance isolation; differentiated service; SLAs.

Utility Resource Management. Goal: meet contractual service quality (SLA) targets under changing load; use resources efficiently. Approach: assign each hosted service a dynamic “slice” of resources. Combine “slivers” of shared servers, i.e., CPU time and memory: resource containers [Banga99], VMware ESX [Waldspurger02], PlanetLab. Assign shares of storage server I/O throughput. Given the mechanisms for performance isolation and proportional sharing, how do we set the knobs?

Adaptive Multi-Resource Provisioning. This work addresses resource allocation policy for multiple resources, with a focus on memory and storage. 1. Provisioning: how much? [Muse SOSP01] 2. Assignment: which servers and storage units? [Figure labels: clients; utility data center; utility OS executive or service manager; actuator (directives); monitor (observations).]

Model-Based Provisioning. Resources interact in complex ways to determine overall service performance. Incorporate a model of application behavior. The model predicts the effects of candidate allotments. Plan allotments that are predicted to yield the desired behavior. Monitor load and adapt as load intensity varies. [Figure labels: resource manager; candidate allotments; application models; performance predictions; workload profiles (e.g., access locality); storage models.]

Goals. Research question: how can a resource manager incorporate these models when they exist? Manage multiple resources for diverse system goals. Meet SLA targets for response time. Use surplus to optimize global average response time, yield, or value. Adjust to constraints discovered during assignment. Storage-aware caching [Forney03]. Demonstrate that even simple models are a powerful basis for dynamic resource management.

Non-goals. We are NOT trying to: build better models (you can plug in your favorite); parameterize or adapt models online from system observations; manage network bandwidth; schedule resources within each slice; solve the assignment problem (bin-packing); allocate resources across the wide area; make probabilistic performance guarantees. Assume stable average-case behavior at each load level, and provision for average response time.

System Context. [Figure labels: clients; reconfigurable redirecting switch; server pool (stateless, interchangeable); storage tier; offered load λ per service; load and performance measures; configuration commands; Muse [SOSP01]; MBRP.]

Enforcing Slices. Our prototype uses the Dash Web server [Asad02] to enforce resource control for slices at user level. Based on Flash [Pai99], using DAFS network storage. Asynchronous I/O from user space to a user-level cache. Low overhead (zero-copy, etc.) and user-level control. Fully asynchronous, event-driven server: “SEDA meets Click.” Independently size caches for co-hosted services. Request Windows [Jin03]: control the number of outstanding I/Os on a per-service basis. Dash is part of the utility’s trusted computing base.

A Simple Web Service Model. [Figure labels: arrival rate λ; CPU; object cache (M); storage (λ_S).] M yields hit rate H; $\lambda_S = \lambda(1 - H)$. Streams of requests with stable average-case behavior per request class. Varying load intensity λ. Provision each stage, and M. Downstream demand grows and shrinks with M (inverse). Bottlenecks limit demand downstream. Generalize to stages or tiers.

Web Cache Model. Footprint: T objects with average size S; size is independent of popularity. The cache holds M objects. Given Zipf popularity with parameter α and an LFU approximation, integrating over the Zipf PDF gives the hit ratio $H = \frac{1 - M^{1-\alpha}}{1 - T^{1-\alpha}}$. [Plot: hit ratio H vs. cache size M.]
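The hit-ratio formula on this slide, H = (1 - M^(1-α)) / (1 - T^(1-α)), can be evaluated directly. The following is a minimal sketch rather than code from the Dash prototype: the function name `zipf_hit_ratio` and the sample values of T and α are hypothetical, and the α = 1 branch is the mathematical limit of the formula, which the slide does not discuss.

```python
import math


def zipf_hit_ratio(M: float, T: float, alpha: float) -> float:
    """Predicted hit ratio for a cache of M objects over a footprint of T objects.

    Implements H = (1 - M^(1-alpha)) / (1 - T^(1-alpha)).
    """
    if T <= 1 or not 1 <= M <= T:
        raise ValueError("need T > 1 and 1 <= M <= T")
    if math.isclose(alpha, 1.0):
        # Limit of the formula as alpha -> 1 (an addition, not from the slide).
        return math.log(M) / math.log(T)
    return (1 - M ** (1 - alpha)) / (1 - T ** (1 - alpha))


if __name__ == "__main__":
    T, alpha = 10_000, 0.8  # illustrative values, not taken from the talk
    for M in (10, 100, 1_000, 10_000):
        print(f"M={M:>6}  H={zipf_hit_ratio(M, T, alpha):.3f}")
```

The loop shows the diminishing return the plot illustrates: each tenfold increase in M buys a smaller increase in H.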

Storage. [Plot: storage arrival rate (IOPS) vs. cache size M.] $\lambda_S = \lambda S (1 - H)$. Each miss requires S I/O operations. S determines the intensity of bulk I/O in this service’s storage load. The model predicts storage response time R_S for load λ_S given a per-service IOPS share. Account for prefetching and sequential locality indirectly.
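A short sketch of how storage demand follows from cache size under this model: it combines the Zipf hit ratio H = (1 - M^(1-α)) / (1 - T^(1-α)) with λ_S = λS(1 - H). The function names and all numeric values are hypothetical, chosen only for illustration, and the sketch assumes α ≠ 1.

```python
def hit_ratio(M: float, T: float, alpha: float) -> float:
    """Zipf/LFU hit ratio for M cached objects out of T (assumes alpha != 1)."""
    return (1 - M ** (1 - alpha)) / (1 - T ** (1 - alpha))


def storage_arrival_rate(lam: float, S: float, H: float) -> float:
    """Storage load in IOPS: each of the lam * (1 - H) misses costs S I/Os."""
    return lam * S * (1 - H)


if __name__ == "__main__":
    lam, S, T, alpha = 200.0, 4.0, 10_000, 0.8  # illustrative values only
    for M in (10, 100, 1_000, 10_000):
        H = hit_ratio(M, T, alpha)
        print(f"M={M:>6}  H={H:.3f}  lambda_S={storage_arrival_rate(lam, S, H):7.1f} IOPS")
```

Growing M raises H and shrinks λ_S, which is the inverse relationship between memory and downstream demand that the allocator exploits.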

An Example using Dash. IBM 2001 segment. Load λ grows during the trace segment. Dynamic cache resizing. Storage IOPS demand λ_S matches the model prediction (squint). A few transient shifts in request locality.

A Model-Based Allocator. MBRP is a package of three primitives that coordinate with an assignment planner. Candidate: plan an initial allotment vector with CPU share and [M, storage share]. LocalAdjust: adjust a vector to adapt to a resource constraint or surplus, while staying on target for response time. GroupAdjust: modify a set of vectors to adapt to a fixed resource constraint or surplus exposed during assignment. Use any surplus to meet system-wide goals.

Candidate. There is a large space of possible allotment vectors that meet a given response time target. Simplify the search with a simple principle: build a balanced system. Set the CPU share and storage allotment to hit a preconfigured target utilization level. The target utilization determines response time at storage and CPU. Select the minimum M and H that can hit the SLA target for overall response time. Refine the storage allotment based on M, H, and the resulting λ_S. Converges quickly.
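The Candidate steps can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the paper: it assumes each stage behaves like a simple queue with response time D / (1 - ρ) at target utilization ρ, that overall response time is the CPU time plus (1 - H) · S storage I/Os, and that α ≠ 1. The names `Allotment`, `candidate`, `d_cpu`, and `d_io` are hypothetical. Because the assumed stage model depends only on ρ, the sketch needs a single pass instead of the iterative refinement the slide mentions.

```python
from dataclasses import dataclass


@dataclass
class Allotment:
    cpu_share: float       # CPU capacity, in units of one CPU
    memory_objects: float  # cache size M, in objects
    storage_iops: float    # storage throughput share, in IOPS


def cache_size_for_hit_ratio(H: float, T: float, alpha: float) -> float:
    """Invert H = (1 - M^(1-a)) / (1 - T^(1-a)) for M (assumes alpha != 1)."""
    return (1 - H * (1 - T ** (1 - alpha))) ** (1 / (1 - alpha))


def candidate(lam, target_rt, rho, d_cpu, d_io, S, T, alpha) -> Allotment:
    # Assumed stage model: response time at utilization rho is D / (1 - rho).
    r_cpu = d_cpu / (1 - rho)
    r_io = d_io / (1 - rho)
    budget = target_rt - r_cpu  # time left for storage misses
    if budget <= 0:
        raise ValueError("target is unreachable even with a 100% hit ratio")
    # Smallest hit ratio with r_cpu + (1 - H) * S * r_io <= target_rt.
    H = max(0.0, 1 - budget / (S * r_io))
    M = cache_size_for_hit_ratio(H, T, alpha)
    lam_s = lam * S * (1 - H)  # storage load from the web service model
    # Size CPU and storage so that each runs at the target utilization.
    return Allotment(lam * d_cpu / rho, M, lam_s / rho)


if __name__ == "__main__":
    # All parameter values are illustrative, not taken from the talk.
    print(candidate(lam=100, target_rt=0.02, rho=0.5,
                    d_cpu=0.005, d_io=0.01, S=2, T=10_000, alpha=0.5))
```

A tighter response time target raises the required H and therefore M, while the storage allotment falls with the reduced miss stream.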

LocalAdjust. LocalAdjust adapts to a constraint in one resource by adding more of another. Take as much as you can of the constrained resource, then rebalance to meet the SLA target. E.g., in this graph it grows memory to respond to an IOPS constraint. Note: it’s not linear. [Graph labels: Candidate; LocalAdjust.]
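The memory-for-IOPS trade can be sketched by inverting the cache model. This is an illustration, not the paper's LocalAdjust: it assumes storage should stay at a target utilization ρ, so an IOPS cap limits the miss stream to λS(1 - H) ≤ cap · ρ, and it assumes α ≠ 1. The function names and numbers are hypothetical.

```python
def cache_size_for_hit_ratio(H: float, T: float, alpha: float) -> float:
    """Invert H = (1 - M^(1-a)) / (1 - T^(1-a)) for M (assumes alpha != 1)."""
    return (1 - H * (1 - T ** (1 - alpha))) ** (1 / (1 - alpha))


def memory_for_iops_cap(lam, S, T, alpha, rho, iops_cap) -> float:
    """Smallest M that keeps storage at utilization rho under the IOPS cap."""
    # Need lam * S * (1 - H) <= iops_cap * rho, so H >= 1 - cap * rho / (lam * S).
    H_needed = max(0.0, 1 - iops_cap * rho / (lam * S))
    return cache_size_for_hit_ratio(H_needed, T, alpha)


if __name__ == "__main__":
    # Illustrative values only: halve the IOPS cap twice and watch M grow.
    for cap in (200, 100, 50):
        M = memory_for_iops_cap(lam=100, S=2, T=10_000, alpha=0.5, rho=0.5, iops_cap=cap)
        print(f"IOPS cap {cap:>3} -> M = {M:.1f} objects")
```

Each halving of the cap requires a different amount of extra memory, which matches the slide's note that the trade-off is not linear.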

GroupAdjust. Input: a set of allotment vectors, with a group constraint or surplus. E.g., the planner mapped all vectors to a shared server, leaving surplus memory. Adapt the vectors to conform to the constraint, or use the surplus to meet a global goal. E.g., for services with the same profiles (α, S, T), prefer the service with the heaviest load.

Example: Differentiated Service. Four identical services: same load λ, same profiles (α, S, T), same storage units; different SLA targets. Provision memory to meet targets first, then optimize global response time. (Give the next unit of surplus memory to the most constrained service.)
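One way to read "give the next unit of surplus memory to the most constrained service" is as a greedy loop. The sketch below is an interpretation, not the paper's GroupAdjust: it assumes "most constrained" means the service whose load-weighted miss rate λ(1 - H) drops most from one more unit of memory, using the Zipf hit ratio model with α ≠ 1. The function names, the dictionary layout, and the numbers are hypothetical.

```python
def hit_ratio(M: float, T: float, alpha: float) -> float:
    """Zipf/LFU hit ratio for M cached objects out of T (assumes alpha != 1)."""
    return (1 - M ** (1 - alpha)) / (1 - T ** (1 - alpha))


def group_adjust_memory(services: dict, surplus: float, unit: float) -> dict:
    """Hand out surplus memory one unit at a time; returns the new M per service.

    services maps a name to {"lam": load, "M": current memory, "T": ..., "alpha": ...}.
    """
    alloc = {name: s["M"] for name, s in services.items()}
    remaining = surplus
    while remaining >= unit:
        best, best_gain = None, 0.0
        for name, s in services.items():
            m = alloc[name]
            if m + unit > s["T"]:
                continue  # cache already covers the whole footprint
            # Reduction in load-weighted miss rate from one more unit of memory.
            gain = s["lam"] * (hit_ratio(m + unit, s["T"], s["alpha"])
                               - hit_ratio(m, s["T"], s["alpha"]))
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # no service can benefit from more memory
        alloc[best] += unit
        remaining -= unit
    return alloc


if __name__ == "__main__":
    # Four identical services whose SLA targets led to different initial M.
    base = {"lam": 100.0, "T": 10_000, "alpha": 0.5}
    services = {n: dict(base, M=m) for n, m in
                [("a", 1000.0), ("b", 2000.0), ("c", 3000.0), ("d", 4000.0)]}
    print(group_adjust_memory(services, surplus=1000.0, unit=1000.0))
```

With identical profiles and loads the surplus goes to the service holding the least memory; with identical memory it goes to the service with the heaviest load, in line with the GroupAdjust slide.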

Some Other Results in the Paper. 1. GroupAdjust for services with different profiles and equivalent loads: prefer higher-locality services. 2. Simple dynamic example to optimize for global response time in a storage-aware fashion. 3. “Putting it all together” experiment: adjust to changes in locality, SLA targets, and available resources as well as changes in load. 4. Handle overload by shifting a co-hosted service to another server (bin-packing assignment). 5. Preliminary evaluation of storage model.

Conclusion. Models are important for self-managing systems. MBRP shows how to use models to adapt proactively: respond to the changing load signal, rather than reacting to off-target performance measures. It’s easy to plug better models into the framework. It seems clear that we can generalize this to a broader class of systems (e.g., multi-tier) and system goals (e.g., availability). But: models may be brittle or just plain wrong (HAL). Self-managing systems will combine proactive and reactive mechanisms.

http://issg.cs.duke.edu http://www.cs.duke.edu/~chase

Assignment Planning. Map services to servers and storage units. Allocator primitives work in concert with assignment planning. Bin-packing services, balancing affinity, migration costs, local constraints/surplus.

Related Work. Proportional-share schedulers: mechanism to enforce provisioning policies. Resource Containers [Banga99], Cluster Reserves [Aron00]. Response-time schedulers: meet SLA targets without explicit partitioning/provisioning. Neptune [Shen02], Facade [Lumb03]. Adaptive resource management for servers: reactive, feedback-based adjustment of server resources. Web Server Performance Guarantees [Abdelzaher02], Predictable Web Server QoS [Aron-PhD], SEDA [Welsh01]. Memory/storage management: goal-directed allotment of resources to services. Storage Aware Caching [Forney02], Value Sensitive Caching [Kelly99], Hippodrome [Anderson02].

Multiple Shared Resources. Bottleneck behavior: non-bottleneck resource adjustments have little effect. Global constraints: services compete for resources in a zero-sum game. Local constraints: service assignment to nodes exposes local resource constraints. Caching: memory allotment affects storage load for a single service, impacting available resources for other services.

Adaptive Resource Provisioning. Utility OS services: predictable average-case response time; resource intensive. Workload models predict: resource demand; resource interaction; effect of allotment decisions. Framework is reactive to changes in workload characteristics for dynamic adaptation.

Outline. Overview; Resource control mechanisms; Web Service Models; Model-Based Allocator; Conclusions.
