Published by Prudence Cannon. Modified over 8 years ago.
1
Age Based Scheduling for Asymmetric Multiprocessors
Nagesh B Lakshminarayana, Jaekyu Lee & Hyesoon Kim
2
Outline
Background and Motivation
Age Based Scheduling
Evaluation
Conclusion
3
Asymmetric (Chip) Multiprocessors
Heterogeneous architectures where all cores have the same ISA but different performance
[Figure: heterogeneous architecture with processing elements PE A and PE B]
4
Asymmetric (Chip) Multiprocessors
Potential for better performance than SMPs occupying the same area and consuming the same power
[Figure: a Symmetric Chip Multiprocessor (SMP/CMP) with Core0 to Core3 next to an Asymmetric Chip Multiprocessor (AMP/ACMP) with Core0 to Core3]
5
AMPs present new challenges
Thread scheduling is one of them
6
Scheduling in Multiprocessor OSes
Thread Assignment
–assign to least loaded core
Load Balancing
–make load on all cores uniform
Idle Balancing
–move threads from busy cores to an idle core
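The first and third mechanisms above can be sketched in a few lines of Python. This is only an illustration, not the Linux implementation: the run-queue representation and the function names `assign_thread` and `idle_balance` are assumptions made for this example. Note that nothing in it looks at core speed, which is the limitation the following slides address.

```python
# Minimal sketch (hypothetical names): run_queues[i] is the list of
# threads queued on core i. Core speed is never consulted.

def assign_thread(run_queues):
    """Thread assignment: return the index of the least loaded core."""
    return min(range(len(run_queues)), key=lambda core: len(run_queues[core]))


def idle_balance(run_queues):
    """Idle balancing: each idle core pulls one thread from the busiest core."""
    for queue in run_queues:
        if queue:
            continue  # this core already has work
        busiest = max(range(len(run_queues)), key=lambda c: len(run_queues[c]))
        if len(run_queues[busiest]) > 1:
            queue.append(run_queues[busiest].pop())
    return run_queues


queues = [["t1", "t2", "t3"], ["t4"], []]
target = assign_thread(queues)                       # core 2 is empty
balanced = idle_balance([list(q) for q in queues])   # core 2 pulls "t3"
```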
7
Scheduling in Multiprocessor OSes
Current schedulers assume that all cores are identical
On an AMP this results in bad performance and application instability
[Figure: Parsec benchmarks on a (real) AMP using the Linux Scheduler]
Configurations:
all-fast: 16 cores at 2 GHz
half-half: 8 cores at 2 GHz, 8 cores at 1 GHz
all-slow: 16 cores at 1 GHz
8
Problem with Current Scheduling
Does not take advantage of the fast core
9
Outline
Background and Motivation
Age Based Scheduling (ABS)
Evaluation
Conclusion
10
Motivation for Age Based Scheduling
Many compute-intensive multithreaded applications follow the fork-join model
Milestones (barriers) in thread execution
[Figure: application model showing the main thread, fork, barrier, and join]
11
Symmetry of Applications
Threads created together are symmetric
–Based on instruction count
–Degree of Symmetry = Std Dev / Average
[Figure: Degree of Symmetry of Parsec Benchmarks]
(Symmetric benchmarks are benchmarks with degree of symmetry <= 0.1)
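As a rough illustration of the metric above, the following Python sketch computes the degree of symmetry from per-thread instruction counts. The slide does not say whether the population or the sample standard deviation is used; the population form (`statistics.pstdev`) is an assumption here, and the function names and sample counts are hypothetical.

```python
from statistics import mean, pstdev


def degree_of_symmetry(instruction_counts):
    """Std Dev / Average of per-thread instruction counts.

    Assumption: population standard deviation; the slide does not specify.
    """
    return pstdev(instruction_counts) / mean(instruction_counts)


def is_symmetric(instruction_counts, threshold=0.1):
    """Apply the slide's cutoff: degree of symmetry <= 0.1."""
    return degree_of_symmetry(instruction_counts) <= threshold


# Hypothetical instruction counts for two groups of threads
even_threads = [100, 100, 100, 100]
uneven_threads = [100, 200, 300, 400]

print(is_symmetric(even_threads))    # identical counts give degree 0
print(is_symmetric(uneven_threads))  # widely spread counts exceed the cutoff
```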
12
Insight
exe_dur (T1) = exe_dur (T2) = exe_dur (T3) = exe_dur (T4)
Difficult to predict absolute execution duration, so predict relative execution duration
[Figure: threads T1, T2, T3, and T4 running toward a barrier; execution duration = ?]
13
Putting Together
Applications follow the fork-join model with milestones in between
Many applications are symmetric
Easy to predict relative execution duration to the next milestone
Age Based Scheduling
14
What is Age?
Age is the progress made by a thread towards its next milestone
15
Age Calculation
Threads created together have the same age
As a thread executes, it ages
Reset age when milestone crossed
t_A – age of thread A
t_B – age of thread B
[Figure: timeline from creation through execution to a milestone (barrier) and a milestone (termination). Labels for thread A: t_A = 0, t_A = 30, t_A = X, t_A = 0. Labels for thread B: t_B = 0, t_B = 50, t_B = 0]
X – Unknown, assumed to be a large value
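The three rules above can be sketched as a small Python class. This is an illustration only: the class name `ThreadAge`, its methods, and the unit of progress are assumptions for this example, and the values 30 and 50 are the ones shown on the slide's timeline.

```python
class ThreadAge:
    """Hypothetical per-thread age tracker.

    Age is progress toward the next milestone; the unit of progress
    is an assumption of this sketch.
    """

    def __init__(self):
        self.age = 0  # threads created together all start at age 0

    def run(self, progress):
        """As a thread executes, it ages."""
        self.age += progress

    def cross_milestone(self):
        """Reset age when a milestone (e.g. a barrier) is crossed."""
        self.age = 0


a, b = ThreadAge(), ThreadAge()
a.run(30)
b.run(50)
ages_before = (a.age, b.age)   # ages while executing

a.cross_milestone()
b.cross_milestone()
ages_after = (a.age, b.age)    # both reset after the milestone
```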
16
Age Based Scheduling Algorithm
To make a scheduling decision:
Calculate remaining execution duration to next milestone based on age
Assign threads with longer remaining execution durations to fast core
–Longest Job to Fast Core First (LJFCF)
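A minimal Python sketch of the LJFCF idea follows, assuming the remaining execution durations are already known. The function name `ljfcf`, the core names, and the 1-fast/3-slow configuration are hypothetical; the rem_exe values are the ones used in the deck's worked example. It shows only the ordering rule, not a full scheduler.

```python
def ljfcf(remaining, fast_cores, slow_cores):
    """Longest Job to Fast Core First (sketch).

    remaining: dict mapping thread name -> predicted remaining execution
    duration to its next milestone. Threads are ranked longest first and
    handed cores in the order fast cores, then slow cores. If there are
    more threads than cores, the lowest-ranked threads get no core here.
    """
    ordered = sorted(remaining, key=remaining.get, reverse=True)
    cores = list(fast_cores) + list(slow_cores)
    return dict(zip(ordered, cores))


# rem_exe values from the worked example; the core layout is assumed
remaining = {"A": 50, "B": 70, "C": 90, "D": 30}
placement = ljfcf(remaining, ["fast0"], ["slow0", "slow1", "slow2"])
print(placement)  # thread C, with the longest remaining duration, gets fast0
```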
17
Application of LJFCF
Apply whenever
–Thread is created
–A core becomes idle
–Reassignment timer expires (for load balancing)
18
Working of the Algorithm
[Figure: timeline for thread T1 from creation (t_A = 0) through execution (t_A = 30) to a milestone (barrier) and a milestone (termination)]
Age at barrier = X
rem_exe = (X – 30)
19
Remaining Execution Duration (I)
Track progress of threads
Using Prediction [AGE]
–Predict all threads have same inter-milestone distance
t_A – age of thread A
t_B – age of thread B
[Figure: timeline from creation through execution to a milestone (barrier) and a milestone (termination). Labels for thread A: t_A = 0, t_A = X, t_A = 0, t_A = X. Labels for thread B: t_B = 0, t_B = X]
20
Remaining Execution Duration (II)
Using Profiling [AGE(PROF)]
–Threads have different inter-milestone distances, calculated based on a metric obtained by profiling
t_A – age of thread A
t_B – age of thread B
[Figure: timeline from creation through execution to a milestone (barrier) and a milestone (termination). Labels for thread A: t_A = 0, t_A = X, t_A = 0, t_A = X. Labels for thread B: t_B = 0, t_B = rX]
r is from profiler
Only one r value for each thread
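The two ways of estimating remaining execution duration can be compared in a short Python sketch. It assumes, following the rem_exe = (X – 30) example in the slides, that remaining duration is the inter-milestone distance minus the current age, and that under AGE(PROF) a thread's distance is r times X. The function names, the clamping at zero, and the sample values of X and r are assumptions made for illustration.

```python
def remaining_age(age, distance):
    """AGE: every thread is predicted to have the same inter-milestone
    distance X, so remaining duration is X - age.
    Clamping at 0 is an assumption of this sketch.
    """
    return max(distance - age, 0)


def remaining_prof(age, distance, r):
    """AGE(PROF): a per-thread factor r from the profiler scales the
    inter-milestone distance to r * X.
    """
    return max(r * distance - age, 0)


X = 100  # hypothetical inter-milestone distance
rem_a = remaining_age(30, X)              # thread with age 30
rem_b = remaining_prof(30, X, r=1.5)      # same age, but profiled r = 1.5
```

With the same age, the profiled thread is predicted to have more work left, so LJFCF would rank it ahead of the unprofiled one.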
21
Working of the Algorithm
[Figure: threads A, B, C, and D on a fast and a slow core, with rem_exe A = 50, rem_exe B = 70, rem_exe C = 90, rem_exe D = 30; a second panel shows threads A and C with rem_exe C = 90 and rem_exe A = 50]
22
Benefit of Age Based Scheduling
Asymmetry aware
Utilizes all cores
Gives all threads opportunities to run on fast cores
23
Implementation
OS
–Track progress using Performance Counters
–Disable counter on Interrupts
Compiler (AGE(PROF))
–Passes profiled information, one value for each thread
24
Outline
Background and Motivation
Age Based Scheduling
Evaluation
Conclusion
25
Evaluation
Simulation based experiments
–Trace + execution hybrid simulator
–Locks and barriers are modeled
–Context switch and migration overhead simulated
–10 ms time slice for each thread
Machine configuration
–1 fast, 7 slow, 8:1 speed ratio (others are in the paper)
Benchmarks
–Symmetric: Parsec (simmedium input)
–Asymmetric: Splash-2, OMPSCR, SuperLU
26
Comparisons with Other Policies
Policy: Description
Linux: Linux O(1) Scheduler
RR: Threads are assigned to fast cores in a Round Robin fashion
SCALEDLD [Li’07]: Fast Core First assignment, asymmetry aware load balancing (baseline)
FCA-AGE: Fast Core First assignment with Age based periodic reassignment
AGE: Age based assignment and reassignment using prediction
AGE(PROF): Age based assignment and reassignment using profiling
AGE(ORACLE): Age based assignment and reassignment using oracle
27
LJFCF vs Other Policies (I)
Parsec, Baseline: SCALEDLD
Policy: Avg % reduction over SCALEDLD
RR: -36.64
FCA-AGE: 9.8
AGE: 10.4
AGE(PROF): 13.2
AGE(ORACLE): 15.4
* Default Linux policy, which performs considerably worse than the other policies, is not shown
28
LJFCF vs Other Policies (II)
Asymmetric Benchmarks, Baseline: SCALEDLD
Policy: Avg % reduction over SCALEDLD
FCA-AGE: 8.2
AGE: 7.7
AGE(PROF): 9.4
AGE(ORACLE): 13.1
29
Idle Cycles
Linux Scheduler – most of the idle cycles are contributed by the fast core
SCALEDLD – keeps the same thread(s) on the fast core
AGE – assigns different threads to the fast core
30
Different AMP Configurations
Need for asymmetry aware scheduling increases as cores become more asymmetric
AGE based policies show more improvement over SCALEDLD as asymmetry increases
X/1: Ratio of speeds of Fast and Slow cores is X:1
31
Outline
Background and Motivation
Age Based Scheduling
Evaluation
Conclusion
32
Conclusion
Age based scheduling (ABS) for Asymmetric Multiprocessors
–ABS assumes threads created at the same time are symmetric
–ABS assigns threads to cores based on their predicted remaining execution durations
–Predictions are made based on Age of threads
Improvement of 10.4% (Pred) and 13.2% (Prof) for Parsec and 7.6% (Pred) and 9.4% (Prof) for Asymmetric benchmarks over Li’s mechanism
33
THANK YOU