vCAT: Dynamic Cache Management using CAT Virtualization


1 vCAT: Dynamic Cache Management using CAT Virtualization
Meng Xu, Linh Thi Xuan Phan, Hyon-Young Choi, Insup Lee
Department of Computer and Information Science, University of Pennsylvania

2 Trend: Multicore & Virtualization
Cyber-physical systems are becoming increasingly complex and require both high performance and strong isolation. Virtualization on multicore helps handle this complexity: it increases performance and reduces cost. Challenge: timing isolation becomes harder to achieve.
[Figure: four VMs (collision avoidance, adaptive cruise control, pedestrian detection, infotainment) running on one hypervisor]

3 Problem: Shared cache interference
A task uses the shared cache to reduce its execution time. Concurrent tasks may access the same cache area, causing extra cache misses and an increased WCET. Interference arises both within a VM (intra-VM) and across VMs (inter-VM).
[Figure: tasks 1-4 in VM1 and VM2 collide in the shared cache across cores P1-P4]

4 Existing approach: Static management
Statically assign non-overlapping cache areas to tasks (VMs).
Pros: Simple to implement.
Cons: Low cache resource utilization; the unused cache area of one task (VM) cannot be reused by another.
Cons: Not always feasible, e.g., when the whole task set does not fit into the cache.
[Figure: tasks 1-4 in VM1 and VM2 each pinned to disjoint cache areas across cores P1-P4]

5 Our approach: Dynamic management
Dynamically assign disjoint cache areas to tasks (VMs).
Pros: Enables cache reuse, and hence better utilization of the cache; running tasks (VMs) can have larger cache areas, and thus smaller WCETs.
Challenge: Accounting for the cache overhead in schedulability analysis. The cache overhead scenarios under dynamic cache allocation are more complex than those under static cache allocation.
[Figure: tasks 1-4 in VM1 and VM2 sharing cache areas over time across cores P1-P4]

6 Our approach: Dynamic management
Challenge: How to achieve efficient dynamic cache management while guaranteeing isolation?
Efficiency: The dynamic management should incur small overhead.
Solution: Hardware-based. Increasingly many CPUs support hardware cache partitioning, so cache reconfiguration can be done very efficiently.
Example: Intel processors that support cache partitioning (CAT):
Processor family / CAT-capable COTS processors
Intel(R) Xeon(R) processor E5 v3: 6 out of 48
Intel(R) Xeon(R) processor D: 15 out of 15
Intel(R) Xeon(R) processor E3 v4: 5 out of 5
Intel(R) Xeon(R) processor E5 v4: 117 out of 117

7 Contribution: vCAT
vCAT: Dynamic cache management by virtualizing CAT. The first work to achieve dynamic cache management for tasks in virtualization systems on commodity multicore hardware.
Achieves strong shared-cache isolation for tasks and VMs.
Supports dynamic cache management for tasks and VMs: the OS in a VM can dynamically allocate cache partitions for its tasks, and the hypervisor can dynamically reconfigure cache partitions for VMs.
Supports cache sharing among best-effort VMs and tasks.

8 Outline
Introduction
Background: Intel CAT
Design & Implementation
Evaluation

9 Intel Cache Allocation Technology (CAT)
Divide the shared cache into α partitions (α = 20); similar to way-based cache partitioning.
CAT provides two types of model-specific registers:
Each core has a PQR register (the COS id in the upper bits 63:32, a reserved field, and an ID field in the low bits).
K Class of Service (COS) registers are shared by all cores (K = 4); each holds a cache bit mask over the 20 partitions.
[Figure: PQR register layout (COS id, reserved, ID) and COS register layout (cache bit mask) over the shared cache]

10 Intel Cache Allocation Technology (CAT)
(Register layout as on the previous slide.) Configuring cache partitions for a core takes two steps:
Step 1: Set the cache bit mask of a COS register (e.g., 0x0000F selects partitions 0-3).
Step 2: Link the core to that COS by setting its PQR register.
A minimal sketch of these two MSR writes follows.
[Figure: core P2's PQR points at COS 1, whose bit mask 0x0000F maps onto the shared cache]
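In hypervisor code, the two steps amount to two MSR writes. Below is a minimal C sketch, assuming ring-0 execution and a wrmsr() helper; the MSR numbers (IA32_PQR_ASSOC = 0xC8F, IA32_L3_QOS_MASK_0 = 0xC90) and field layout follow Intel's SDM, and a production version would read-modify-write PQR to preserve its other fields.

    #include <stdint.h>

    #define MSR_IA32_PQR_ASSOC  0xC8F  /* per-core: selects the active COS      */
    #define MSR_IA32_L3_MASK_0  0xC90  /* bit mask of COS 0; COS n is 0xC90 + n */

    extern void wrmsr(uint32_t msr, uint64_t val);  /* assumed privileged helper */

    /* Step 1: give COS `cos` the partitions selected by `bitmask` (CAT
     * requires the set bits to be contiguous, e.g. 0x0000F = partitions 0-3). */
    static void cat_set_mask(uint32_t cos, uint32_t bitmask)
    {
        wrmsr(MSR_IA32_L3_MASK_0 + cos, bitmask);
    }

    /* Step 2: link the calling core to COS `cos`; the COS id lives in
     * bits 63:32 of IA32_PQR_ASSOC (this sketch zeroes the low fields). */
    static void cat_bind_core(uint32_t cos)
    {
        wrmsr(MSR_IA32_PQR_ASSOC, (uint64_t)cos << 32);
    }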

11 Intel CAT: Software support
The Xen hypervisor supports Intel CAT, but system operators can allocate cache partitions for VMs only.
Pros: Mitigates the interference among VMs.
Cons: Does not provide strong isolation among VMs.
Cons: Does not allow a VM to manage partitions for its tasks; tasks in the same VM can still interfere with each other.
Cons: Supports only a limited number of VMs with different cache-partition settings, e.g., ≤ 4 on our machine (Intel Xeon 2618L v3 processor).

12 Outline
Introduction
Background: Intel CAT
Design & Implementation
Evaluation

13 Goals
Dynamically control cache allocations for tasks and VMs: each VM should control the cache allocation for its own tasks, and the hypervisor should control the cache allocation for the VMs.
Preserve the virtualization abstraction layer: physical resources should not be exposed to VMs.
Guarantee cache isolation among tasks and VMs: tasks should not interfere with each other after a reconfiguration.

14 Dynamic cache allocation for tasks
To modify the cache configuration of a task, the VM needs to modify the cache control registers. But the cache control registers are available only to the hypervisor. One possible approach: expose the registers to the VMs.
[Figure: a VM modifies the COS register (0xF) for core P2 directly, mapping onto the physical cache]

15 Dynamic cache allocation for tasks
(Continuing the register-exposure approach.) Problem: potential cache interference among VMs; e.g., a VM may overwrite the hypervisor's allocation decision.
[Figure: the VM overwrites the COS register from 0xF to 0xF00]

16 Dynamic cache allocation for tasks
(Continued.) The hypervisor could validate each VM register operation before applying it.
[Figure: the hypervisor validates the VM's write to the COS register]

17 Dynamic cache allocation for tasks
(Continued.) Problem: even with validation, the hypervisor needs to notify the VMs of any changes it makes to the registers.
[Figure: hypervisor-side changes to the COS register must be propagated back to the VM]

18 vCAT: Key insight
Virtualize cache partitions and expose virtual caches to VMs.
The hypervisor assigns virtual and physical cache partitions to VMs; each VM controls the allocation of its assigned virtual partitions to its tasks.
The hypervisor translates a VM's operations on virtual partitions into operations on the physical partitions, as sketched below.
[Figure: the VM operates on its virtual cache; the hypervisor translates the operation and programs core P2's COS register (0xF0) over the physical cache]
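The translation step can be pictured as a bounds-checked shift of the VM's virtual bit mask into the physical region the hypervisor assigned to it. The sketch below is illustrative only (the struct and function names are not vCAT's actual API); the bounds check is what prevents a VM from touching partitions it does not own.

    #include <stdint.h>

    struct vcat_domain {
        uint32_t phys_base;  /* first physical partition assigned to this VM */
        uint32_t nr_parts;   /* number of partitions the VM owns (< 32 here) */
    };

    /* Translate a VM's virtual cache bit mask into a physical one by shifting
     * it into the VM's assigned region; reject masks outside the allocation. */
    static int translate_mask(const struct vcat_domain *d,
                              uint32_t virt_mask, uint32_t *phys_mask)
    {
        if (virt_mask >> d->nr_parts)  /* uses partitions the VM doesn't own */
            return -1;                 /* denied: this preserves isolation   */
        *phys_mask = virt_mask << d->phys_base;
        return 0;
    }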

19 Challenge 1: No control for cache hit requests
A task's contents stay in the cache until they are evicted. Problem: a task can still access content in its previous partitions via cache hits, and thus interfere with another task. This behavior is not explicitly documented in Intel's SDM; we confirmed it with experiments (available in the paper). A sketch of such an experiment follows.
[Figure: a task still hits lines in partitions it no longer owns, colliding with another task in the physical cache]
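One hypothetical way to expose this behavior, assuming a user-level timing loop and the cat_* helpers sketched on slide 10: warm a buffer while bound to one set of partitions, rebind the core to disjoint partitions, and re-time the accesses. If the second pass is still fast, the old content was served by cache hits.

    #include <stddef.h>
    #include <stdint.h>

    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    /* Touch one word per cache line and return the elapsed cycles. */
    static uint64_t time_pass(volatile const char *buf, size_t size)
    {
        uint64_t start = rdtsc();
        for (size_t i = 0; i < size; i += 64)
            (void)buf[i];
        return rdtsc() - start;
    }

    /* Protocol sketch: cat_set_mask(1, 0x0000F); cat_bind_core(1); warm with
     * time_pass(); then cat_set_mask(2, 0xF0000); cat_bind_core(2); re-time. */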

20 Solution: Cache flushing
A task's content in its previous partitions is no longer valid, so vCAT flushes it. Two approaches are provided (sketched below).
Approach 1: Flush each memory address of the task. Pros: Does not affect the other tasks' cache content. Cons: Slow when a task's working set size is large (> 8.46 MB).
Approach 2: Flush the entire cache. Pros: Efficient when a task's working set size is large (> 8.46 MB). Cons: Flushes the other tasks' cache content as well.
vCAT provides both approaches to system operators; a discussion of the tradeoffs and flushing heuristics is in the paper.
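Both options map onto standard x86 mechanisms. A minimal sketch, assuming a 64-byte cache line and GCC-style inline assembly; clflush is unprivileged, while wbinvd is ring-0 only and must therefore run in the hypervisor:

    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64

    /* Approach 1: flush only the task's own lines. Cost grows with the
     * working set size, which is why it loses past roughly 8.46 MB. */
    static void flush_range(const void *addr, size_t size)
    {
        const char *p   = (const char *)((uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1));
        const char *end = (const char *)addr + size;
        for (; p < end; p += CACHE_LINE)
            __asm__ volatile("clflush (%0)" :: "r"(p) : "memory");
    }

    /* Approach 2: write back and invalidate the entire cache, including
     * other tasks' content; constant cost regardless of the WSS. */
    static void flush_all(void)
    {
        __asm__ volatile("wbinvd" ::: "memory");
    }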

21 Challenge 2: Contiguous allocation constraint
Unallocated partitions may NOT be contiguous: dynamic allocation fragments the cache partitions, leading to low cache resource utilization. Since CAT requires each cache bit mask to be contiguous, a fragmented free region is invalid for a new allocation.
[Figure: VM1-VM3's partitions leave non-contiguous holes in the physical cache, so a new contiguous virtual allocation is invalid]

22 Solution: Partition defragmentation
Rearrange the partitions to form contiguous regions: the hypervisor rearranges physical cache partitions for VMs, and each VM rearranges its virtual cache partitions for its tasks. A sketch follows.
[Figure: VM1-VM3's partitions are compacted so the free partitions form one contiguous run]
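A minimal sketch of the compaction idea, assuming the hypervisor tracks each VM's contiguous region as (base, length) pairs sorted by base; the same logic applies inside a VM for its tasks' virtual partitions. A real implementation must also rewrite the affected COS masks and flush the moved partitions.

    #include <stdint.h>

    struct alloc { uint32_t base, len; };  /* one VM's contiguous region */

    /* Slide every allocation toward partition 0 so that all used
     * partitions are packed and the free ones form one contiguous run. */
    static void defragment(struct alloc *a, int n)
    {
        uint32_t next = 0;
        for (int i = 0; i < n; i++) {  /* assumes a[] is sorted by base */
            a[i].base = next;
            next     += a[i].len;
        }
    }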

23 vCAT: Design summary
Introduce virtual cache partitions: enables a VM to control the cache allocation for its tasks without breaking the virtualization abstraction.
Flush the cache when the cache partitions of tasks (VMs) change: guarantees cache isolation among tasks and VMs under dynamic cache management.
Defragment non-contiguous cache partitions: enables better cache utilization.
Refer to the paper for technical details and other design considerations, e.g., how to allocate and de-allocate partitions for tasks and VMs, and how to support an arbitrary number of tasks and VMs with different cache-partition settings.

24 Implementation
Hardware: Intel Xeon 2618L v3 processor. The design works for any processor that supports both virtualization and hardware-based cache partitioning.
Implementation based on Xen 4.8 and LITMUSRT (the Linux Testbed for Multiprocessor Scheduling in Real-Time Systems).
About 5K lines of code (LoC) in total: 3264 LoC in the hypervisor (Xen) and 2086 LoC in the VM (LITMUSRT).
Flexible: new cache management policies are easy to add.

25 Outline
Introduction
Background: Intel CAT
Design & Implementation
Evaluation

26 vCAT Evaluation: Goals
How much overhead is introduced by vCAT?
How much WCET reduction is achieved through cache isolation?
How much real-time performance improvement does vCAT enable?
Static management vs. no management
Dynamic management vs. static management

27 vCAT Evaluation: Goals
(Same goals as the previous slide.) The rest of the evaluation is available in the paper.

28 vCAT Evaluation: Goals
(Goals repeated; next: run-time overhead.)

29 vCAT run-time overhead
Static cache management: overhead occurs only when a task/VM is created; negligible (≤ 1.12 µs).
Dynamic cache management: overhead occurs whenever the partitions of a task/VM are changed; reasonably small (≤ 27.1 ms). The value depends on the workload's working set size (WSS): overhead = min{3.23 ms/MB × WSS, 27.1 ms}. A worked computation follows.
More details are in the paper: the computation of the overhead value from the WSS, and experiments showing the factors that contribute to the overhead.
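As a worked example of the slide's formula (the constants are the measured values reported for this machine):

    /* overhead = min(3.23 ms/MB * WSS, 27.1 ms) */
    static double dynamic_overhead_ms(double wss_mb)
    {
        double per_address = 3.23 * wss_mb;  /* per-address flushing cost */
        double full_flush  = 27.1;           /* whole-cache flush ceiling */
        return per_address < full_flush ? per_address : full_flush;
    }

    /* Example: an 8.46 MB working set gives 3.23 * 8.46 ≈ 27.3 ms, the
     * crossover where flushing the entire cache becomes the cheaper option. */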

30 vCAT Evaluation: Goals
(Goals repeated; next: static management vs. no management.)

31 Static management: Evaluation setup
PARSEC benchmarks, converted into LITMUSRT-compatible real-time tasks; real-time parameters are randomly generated for the benchmarks to form real-time task sets.
[Figure: a benchmark VM (PARSEC tasks) and a pollute VM (cache-intensive task); VCPUs VP1-VP4 pinned to cores P1-P4 over the shared cache]

32 Static management vs. No management
Static management improves system utilization significantly: for the streamcluster benchmark, the schedulable VCPU utilization improves from 0.3 (no management) to 1.0 (static management), a 1.0 / 0.3 = 3.3x gain.
[Figure: fraction of schedulable task sets vs. VCPU utilization, no management vs. static management]

33 Static management vs. No management
The more cache-sensitive the workload is, the larger the performance benefit.

34 vCAT Evaluation: Goals
(Goals repeated; next: dynamic management vs. static management.)

35 Dynamic management: Evaluation setup
Create workloads with dynamic cache demand, using dual-mode tasks that switch from mode 1 to mode 2 after 1 minute:
Type 1: the task increases its utilization by decreasing its period.
Type 2: the task decreases its utilization by increasing its period.
[Figure: a benchmark VM (type 1 and type 2 dual-mode tasks) and a pollute VM (cache-intensive task); VCPUs VP1-VP4 pinned to cores P1-P4 over the shared cache]

36 Dynamic management vs. Static management
Dynamic management significantly outperforms static management: the schedulable VCPU utilization improves from 0.2 (static) to 0.6 (dynamic), a 0.6 / 0.2 = 3x gain.
[Figure: fraction of schedulable task sets vs. VCPU utilization, static vs. dynamic management]

37 Conclusion
vCAT: a dynamic cache management framework for virtualization systems using CAT virtualization.
Provides strong isolation among tasks and VMs.
Supports both static and dynamic cache allocation, for both real-time and best-effort tasks.
Evaluation shows that dynamic management substantially improves schedulability compared to static management.
Future work: develop more sophisticated cache resource allocation policies for tasks and VMs in virtualization systems; apply vCAT to real systems, e.g., automotive systems and cloud computing.

