Published by Kathryn Blair. Modified over 9 years ago.
1
Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach
Ramazan Bitirgen, Engin Ipek, and Jose F. Martinez (MICRO'08)
Presented by PAK, EUNJI
2
Introduction
- Resource sharing problem in CMPs
  - Increasing pressure on shared system resources
  - Efficient sharing is necessary for high utilization and performance
- Multiple interacting resources
  - Cache space, DRAM bandwidth, and power budget
  - The allocation of one resource affects the demand for the others
- Proposed: a resource allocation framework
  - At runtime, monitors the execution of each application and learns a predictive model of its performance as a function of resource allocation decisions
  - Periodically allocates resources to each core using the model
3
Resource Allocation Framework
- Per-application hardware performance model
  - Uses artificial neural networks (ANNs)
  - Predicts each application's performance as a function of the resources allocated to it
- Global resource manager
  - At every interval, searches the space of possible resource allocations by querying the per-application performance models
4
How to Predict Performance? (Artificial Neural Networks)
- Input units, hidden units, and an output unit are connected via a set of weighted edges
- Each hidden (output) unit computes a weighted sum of its inputs (of the hidden values) using the edge weights, then applies an activation function
- Edge weights are trained on training examples (data sets)
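A minimal Python sketch of the forward pass described above, assuming a single hidden layer with the 12-input, 4-hidden-unit shape implied by the implementation slide (12 × 4 + 4 edges), sigmoid activations on both layers, and no bias terms. The weights are random placeholders rather than trained values, and the function names are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic activation: maps any real number into (0, 1).

    Written in two branches so math.exp never overflows for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(inputs, w_hidden, w_out):
    """Forward pass of a one-hidden-layer ANN (no bias terms).

    inputs:   n input values
    w_hidden: h rows of n weights (input -> hidden edges)
    w_out:    h weights (hidden -> output edges)
    """
    # Each hidden unit: weighted sum of the inputs, then the activation.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    # The output unit: weighted sum of the hidden values, then the activation.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


N_INPUTS, N_HIDDEN = 12, 4
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]

edges = N_INPUTS * N_HIDDEN + N_HIDDEN  # 12*4 + 4 = 52 weighted edges
prediction = forward([0.5] * N_INPUTS, w_hidden, w_out)
print(edges, round(prediction, 4))
```

Training would adjust `w_hidden` and `w_out` (for example by backpropagation) so that the output tracks measured performance; the sketch only shows how a trained network is queried.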
5
How to Predict Performance? (Adaptation to a Per-Application Performance Model)
- Input units
  - Allocated L2 cache space, off-chip bandwidth, and power budget
  - Number of read hits, read misses, write hits, and write misses over the last 20K instructions and over the last 1.5M instructions
  - Fraction of cache ways that are dirty (an indicator of writeback traffic)
- Activation function
  - Sigmoid (maps any real input to a value in (0, 1))
- Models an application's performance as a function of its allocated resources and its recent behavior
- Training
  - Trained during the first 1.2 billion cycles with randomly allocated resources
  - Always keeps a training set of 300 points
  - Retrained every 2,500,000 cycles
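The inputs and training window above can be sketched as follows. The ordering of the 12 features, the class and function names, and first-in-first-out replacement of old training points are assumptions made for illustration; in practice the raw counters would likely also be normalized before entering a sigmoid network.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CacheCounters:
    """L2 access counts over one observation window (hypothetical container)."""
    read_hits: int
    read_misses: int
    write_hits: int
    write_misses: int


def feature_vector(cache_ways, bw_mbps, power_w, recent, longer, dirty_fraction):
    """Assemble the 12 ANN inputs; the ordering is an assumption."""
    x = [float(cache_ways), float(bw_mbps), float(power_w)]  # allocated resources
    for window in (recent, longer):  # last 20K and last 1.5M instructions
        x += [window.read_hits, window.read_misses, window.write_hits, window.write_misses]
    x.append(dirty_fraction)  # fraction of dirty ways (writeback pressure)
    return x


class TrainingSet:
    """Keeps only the most recent `capacity` (features, measured IPC) points."""

    def __init__(self, capacity: int = 300):
        self.points = deque(maxlen=capacity)

    def add(self, features, measured_ipc: float) -> None:
        # A full deque with maxlen silently drops its oldest element.
        self.points.append((features, measured_ipc))

    def __len__(self) -> int:
        return len(self.points)


ts = TrainingSet()
for i in range(350):
    ts.add([float(i)], float(i))
fv = feature_vector(4, 1600, 15, CacheCounters(900, 100, 300, 20),
                    CacheCounters(60000, 9000, 20000, 1500), 0.25)
```

After 350 insertions only the newest 300 points remain, so each periodic retraining sees a bounded, recent window of behavior.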
6
How to Predict Performance? (Adaptation to a Per-Application Performance Model)
- Optimization: prevent the model from memorizing outliers in the sample data (overfitting)
- Cross-validation
  - The data set is divided into N equal-sized folds; each model uses N-1 folds for training and 1 fold for testing
  - The ensemble consists of N ANN models, each tested on a different fold
  - Performance is predicted by averaging the predictions of all ANNs in the ensemble
  - Prediction error is estimated as a function of the coefficient of variation (CoV) of the individual ANNs' predictions; this estimate is used during resource allocation
[Figure: cross-validation folds, with each model's "Training" folds and held-out "Test" fold]
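A sketch of the cross-validation ensemble, assuming N = 4 folds and interleaved fold assignment (four models per application would be consistent with 16 virtual ANNs on four cores, but that is an inference). `train_fn` stands in for ANN training; the demo plugs in a one-variable least-squares fit to keep the example short. The slide does not give the function that maps CoV to an error estimate, so the sketch returns the raw CoV.

```python
import statistics


def kfold_ensemble(data, train_fn, n_folds=4):
    """Train n_folds models; model k never sees fold k during training.

    data: list of (x, y) pairs; train_fn(pairs) returns a callable model.
    Returns the models and each model's mean absolute error on its held-out fold.
    """
    folds = [data[k::n_folds] for k in range(n_folds)]  # interleaved, near-equal folds
    models, test_errors = [], []
    for k in range(n_folds):
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        model = train_fn(train)
        models.append(model)
        test_errors.append(statistics.fmean(abs(model(x) - y) for x, y in folds[k]))
    return models, test_errors


def ensemble_predict(models, x):
    """Return the ensemble mean and the CoV (std / |mean|) of the members' predictions."""
    preds = [m(x) for m in models]
    mean = statistics.fmean(preds)
    cov = statistics.pstdev(preds) / abs(mean) if mean else float("inf")
    return mean, cov


def fit_line(pairs):
    """Stand-in learner: least-squares line on a scalar x (used instead of an ANN)."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return lambda x: my + slope * (x - mx)


data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
models, errors = kfold_ensemble(data, fit_line)
mean, cov = ensemble_predict(models, 5.0)
```

When the members agree (as here, on noise-free linear data) the CoV is near zero; disagreement among members raises it, which is what makes it usable as a confidence signal.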
7
Resource Allocation
- Make a resource allocation decision every 500,000 cycles using the trained per-application performance models
- Discard queries involving an application with a high error estimate
  - Distribute resources fairly to the running applications
  - Predict each application's performance and compute the prediction error
  - If the prediction is estimated to be inaccurate (error > 9%), the application is excluded from global resource allocation
- Search the allocation space with stochastic hill climbing
  - Starts with a random solution and iteratively makes small changes to it, keeping each change that improves it
  - Terminates when no further improvement can be found
  - 2,000 trials give the best tradeoff between search performance and overhead
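A sketch of the allocation step, assuming resources are expressed as integer units, the objective is the sum of predicted performance, and each application's error is checked once at its fair share before the search. A move transfers one unit of one resource between two participating applications and is kept only if it improves the objective; for simplicity the search runs for a fixed trial budget rather than detecting convergence. The model interface `model(alloc) -> (performance, error)` and the toy models are hypothetical.

```python
import math
import random


def fair_share(total, n):
    """Split `total` integer units as evenly as possible among n applications."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


def hill_climb(models, totals, max_error=0.09, trials=2000, seed=0):
    """Stochastic hill climbing over discrete allocations; returns alloc[app][resource].

    models: per-app callables model(alloc_tuple) -> (predicted_perf, error_estimate)
    totals: distributable units of each resource
    """
    rng = random.Random(seed)
    n_apps, n_res = len(models), len(totals)
    shares = [fair_share(t, n_apps) for t in totals]
    alloc = [[shares[r][a] for r in range(n_res)] for a in range(n_apps)]

    # Predict at the fair share; apps whose error estimate exceeds the threshold
    # keep their fair share and are excluded from the search.
    active = [a for a in range(n_apps) if models[a](tuple(alloc[a]))[1] <= max_error]
    if len(active) < 2:
        return alloc

    # Random starting point: re-deal the active apps' pooled units at random.
    for r in range(n_res):
        pool = sum(alloc[a][r] for a in active)
        for a in active:
            alloc[a][r] = 0
        for _ in range(pool):
            alloc[rng.choice(active)][r] += 1

    def score():
        return sum(models[a](tuple(alloc[a]))[0] for a in active)

    best = score()
    for _ in range(trials):
        r = rng.randrange(n_res)
        donor, receiver = rng.sample(active, 2)
        if alloc[donor][r] == 0:
            continue
        alloc[donor][r] -= 1  # small change: move one unit of resource r
        alloc[receiver][r] += 1
        candidate = score()
        if candidate > best:
            best = candidate  # keep an improving move
        else:
            alloc[donor][r] += 1  # undo a move that does not help
            alloc[receiver][r] -= 1
    return alloc


def make_model(weights, error):
    """Toy concave model: performance grows with the square root of each resource."""
    return lambda a: (sum(w * math.sqrt(u) for w, u in zip(weights, a)), error)


models = [make_model([3.0, 1.0], 0.01), make_model([1.0, 3.0], 0.01),
          make_model([1.0, 1.0], 0.01), make_model([2.0, 2.0], 0.50)]
allocation = hill_climb(models, totals=[12, 12])
```

In the usage example the fourth application's error estimate (50%) exceeds the 9% threshold, so it keeps its fair share of 3 units per resource while the other three compete for the remaining 9 units of each.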
8
Implementation & Overhead
- Hardware implementation
  - A single hardware ANN whose edge weights are multiplexed on the fly to act as 16 "virtual" ANNs
  - 12 × 4 + 4 = 52 multipliers, one per weighted edge
  - A 50-entry table-based quantized sigmoid function
  - Computation is pipelined; a prediction (search query) takes 16 cycles for the 16 virtual ANNs
- Area, power, and delay
  - 3% of the chip's area
  - 3 W power consumption
  - 2,000 queries can be made within 5% of an interval
- OS interface
  - The training set and the ANN weights are embedded in the process state
  - The OS communicates the desired objective function through a control register (CR)
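The 50-entry quantized sigmoid can be modeled in software as a lookup table. The covered input range [-5, 5) and midpoint sampling are assumptions; the slide gives only the entry count.

```python
import math

LO, HI, ENTRIES = -5.0, 5.0, 50  # assumed input range; 50 entries as on the slide
STEP = (HI - LO) / ENTRIES       # each entry covers a 0.2-wide input bin

# Entry i stores the exact sigmoid at the midpoint of bin i.
TABLE = [1.0 / (1.0 + math.exp(-(LO + (i + 0.5) * STEP))) for i in range(ENTRIES)]


def sigmoid_lut(x: float) -> float:
    """Quantized sigmoid: find x's bin and return the stored value."""
    i = int((x - LO) // STEP)
    i = min(max(i, 0), ENTRIES - 1)  # inputs outside [LO, HI) saturate to the end entries
    return TABLE[i]


def sigmoid(x: float) -> float:
    """Exact reference sigmoid, used only to measure the table's error."""
    return 1.0 / (1.0 + math.exp(-x))


worst = max(abs(sigmoid_lut(x / 100) - sigmoid(x / 100)) for x in range(-1000, 1001))
```

With 0.2-wide bins the lookup stays within about 0.025 of the exact sigmoid, because the sigmoid's slope never exceeds 0.25 and every input lies at most 0.1 from its bin's midpoint; inputs beyond the range saturate to the end entries.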
9
Experimental Setup
- Tools & architecture
  - Heavily modified version of SESC, with Wattch (power) and HotSpot (temperature)
  - Baseline: Intel's Core 2 Quad, DDR2-800
  - 4-core CMP, frequency 0.9-4.0 GHz (in 0.1 GHz steps)
  - 4 MB, 16-way shared L2 cache
- Power: a 60 W power budget is distributed among the 4 applications via per-core DVFS
  - Outs is limited to 57 W
  - 5 W is allocated statically
- Cache: L2 space is partitioned at the granularity of cache ways
  - One way is allocated to each application
  - The remaining 12 ways are distributed
- Bandwidth: each application is statically allocated 800 MB/s of off-chip DRAM bandwidth, and the remaining 3.2 GB/s is distributed
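The partitioning arithmetic above can be checked in a few lines. The slide's numbers imply 6.4 GB/s of total DRAM bandwidth (4 × 800 MB/s static plus 3.2 GB/s distributed), which matches the peak of a single 64-bit DDR2-800 channel. The 200 MB/s bandwidth granularity used to size the search space is a hypothetical choice, not from the slide, and power is left out.

```python
from math import comb

APPS = 4
L2_WAYS = 16
STATIC_WAYS_PER_APP = 1
DRAM_PEAK_MBPS = 800 * 8  # DDR2-800, 64-bit channel: 800 MT/s x 8 bytes = 6400 MB/s
STATIC_BW_PER_APP_MBPS = 800
BW_UNIT_MBPS = 200  # hypothetical allocation granularity


def num_splits(units: int, apps: int) -> int:
    """Ways to give `units` identical units to `apps` apps (stars and bars)."""
    return comb(units + apps - 1, apps - 1)


spare_ways = L2_WAYS - APPS * STATIC_WAYS_PER_APP               # 12 ways
spare_bw_mbps = DRAM_PEAK_MBPS - APPS * STATIC_BW_PER_APP_MBPS  # 3200 MB/s
bw_units = spare_bw_mbps // BW_UNIT_MBPS                        # 16 units

cache_splits = num_splits(spare_ways, APPS)  # C(15, 3)
bw_splits = num_splits(bw_units, APPS)       # C(19, 3)
print(spare_ways, spare_bw_mbps, cache_splits, bw_splits, cache_splits * bw_splits)
# prints: 12 3200 455 969 440895
```

Even without power, 455 cache splits times 969 bandwidth splits gives 440,895 candidate allocations per interval, which suggests why a sampled search with a fixed trial budget is preferable to exhaustive enumeration.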
10
Experimental Setup
- Metrics
  - Weighted speedup
  - Sum of IPCs
  - Harmonic mean of normalized IPCs
  - Weighted sum of IPCs
- Workloads
  - 9 quad-core multiprogrammed workloads from the SPEC2000 and NAS suites
  - Applications are classified into 3 categories: CPU-bound, memory-bound, and cache-sensitive
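Common definitions of the four metrics, written as a Python sketch. Here `ipc_alone` is each application's IPC when run by itself; which resource configuration that standalone run uses, and whether weighted speedup is divided by the number of applications, are assumptions because the slide does not define them. The weights in the last metric are hypothetical user- or OS-supplied priorities.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum over apps of IPC when co-running divided by IPC when running alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def sum_of_ipcs(ipc_shared):
    """Raw throughput: total instructions per cycle across all cores."""
    return sum(ipc_shared)


def hmean_normalized_ipc(ipc_shared, ipc_alone):
    """Harmonic mean of per-app normalized IPCs (shared / alone); rewards fairness."""
    norm = [s / a for s, a in zip(ipc_shared, ipc_alone)]
    return len(norm) / sum(1.0 / n for n in norm)


def weighted_sum_of_ipcs(ipc_shared, weights):
    """Sum of IPCs scaled by per-app weights (hypothetical priorities)."""
    return sum(w * s for w, s in zip(weights, ipc_shared))


shared, alone = [1.0, 0.5], [2.0, 1.0]
print(weighted_speedup(shared, alone), sum_of_ipcs(shared),
      hmean_normalized_ipc(shared, alone), weighted_sum_of_ipcs(shared, [2.0, 1.0]))
# prints: 1.0 1.5 0.5 2.5
```

Because the harmonic mean is dominated by the smallest normalized IPC, it penalizes allocations that starve one application, whereas the sum of IPCs rewards raw throughput regardless of who receives it.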
11
Experimental Setup
- Configurations
  - Unmanaged
  - Isolated Cache Management (Cache): Utility-Based Cache Partitioning (MICRO 2006); distributes L2 cache ways to minimize the miss rate
  - Isolated Power Management (Power): An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget (MICRO 2006)
  - Isolated Bandwidth Management (BW): Fair Queuing Memory Systems (MICRO 2006)
  - Uncoordinated: Cache + Power, Cache + BW, Power + BW, Cache + Power + BW
  - Continuous Stochastic Hill-Climbing (Coordinated-HC): Learning-Based SMT Processor Resource Distribution (issue queue, ROB, and register file) (ISCA 2006)
  - Fair-Share
  - Proposed scheme (Coordinated-ANN): ANN-based models of each application's IPC response to resource allocation guide a stochastic hill-climbing search
12
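Evaluation Results
- Performance (results normalized to Fair-Share)
  - 14% average speedup over Fair-Share
  - Similar results for the other metrics
[Figure: normalized performance for the nine workloads, labeled by application mix (P = CPU-bound, C = cache-sensitive, M = memory-bound): P,C,P,M; M,C,P,M; C,C,C,C; P,C,M,C; C,M,C,C; C,P,C,M; C,M,M,C; P,C,P,M; P,C,P,P]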
13
Evaluation Results
- Sensitivity to the confidence threshold (results normalized to Fair-Share)
[Figure: normalized results for each of the nine workloads]
14
Evaluation Results
- Confidence estimation mechanism
  - Fraction of total execution time during which the ANN's predictions were confident enough for each application to take part in the resource allocation optimization
[Figure: per-application fractions for each of the nine workloads]
15
Conclusions
- Proposed a resource allocation framework that manages multiple shared CMP resources in a coordinated fashion, using ANN-based performance models and a periodic resource allocation scheme
- A coordinated approach to managing multiple resources is key to delivering high performance on multiprogrammed workloads
16
Extras
[Figure: backup chart for the nine workloads]
17
Extras
[Figure: backup chart for the nine workloads]
18
Extras
[Figure: backup chart for the nine workloads]
19
Extras