Age Based Scheduling for Asymmetric Multiprocessors Nagesh B Lakshminarayana, Jaekyu Lee & Hyesoon Kim.


1 Age Based Scheduling for Asymmetric Multiprocessors Nagesh B Lakshminarayana, Jaekyu Lee & Hyesoon Kim

2 Outline: Background and Motivation; Age Based Scheduling; Evaluation; Conclusion

3 Asymmetric (Chip) Multiprocessors: heterogeneous architectures in which all cores have the same ISA but different performance. [Figure: heterogeneous architecture with processing elements PE A and PE B]

4 Asymmetric (Chip) Multiprocessors: potential for better performance than SMPs occupying the same area and consuming the same power. [Figure: a symmetric chip multiprocessor (SMP/CMP) with Core0 to Core3 beside an asymmetric chip multiprocessor (AMP/ACMP) with Core0 to Core3]

5 AMPs present new challenges; thread scheduling is one of them.

6 Scheduling in Multiprocessor OSes
–Thread assignment: assign to the least loaded core
–Load balancing: make the load on all cores uniform
–Idle balancing: move threads from busy cores to idle cores

7 Scheduling in Multiprocessor OSes: current schedulers assume that all cores are identical, which results in bad performance and application instability on AMPs. [Figure: Parsec benchmarks on a (real) AMP using the Linux scheduler]
Configurations:
–all-fast: 16 cores at 2 GHz
–half-half: 8 cores at 2 GHz, 8 cores at 1 GHz
–all-slow: 16 cores at 1 GHz

8 Problem with current scheduling: it does not take advantage of the fast core.

9 Outline: Background and Motivation; Age Based Scheduling (ABS); Evaluation; Conclusion

10 Motivation for Age Based Scheduling: many compute-intensive multithreaded applications follow the fork-join model, with milestones (barriers) in thread execution. [Figure: application model showing the main thread, fork, barrier, and join]

11 Symmetry of Applications: threads created together are symmetric
–Based on instruction count
–Degree of symmetry = Std Dev / Average
[Figure: degree of symmetry of Parsec benchmarks; symmetric benchmarks are benchmarks with degree of symmetry <= 0.1]
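The degree-of-symmetry metric above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the slide does not say which variant is used.

```python
from statistics import mean, pstdev


def degree_of_symmetry(instruction_counts):
    """Std dev / average of per-thread instruction counts.

    Assumption: population standard deviation; the slide does not specify.
    """
    avg = mean(instruction_counts)
    if avg == 0:
        raise ValueError("average instruction count must be non-zero")
    return pstdev(instruction_counts) / avg


def is_symmetric(instruction_counts, threshold=0.1):
    """A group of threads counts as symmetric when the degree is <= 0.1."""
    return degree_of_symmetry(instruction_counts) <= threshold
```

Threads with identical instruction counts give a degree of 0, while one thread doing far more work than its siblings pushes the degree above the 0.1 threshold.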

12 Insight: exe_dur(T1) = exe_dur(T2) = exe_dur(T3) = exe_dur(T4). It is difficult to predict absolute execution duration, so predict relative execution duration. [Figure: threads T1 to T4 running towards a barrier, with execution duration unknown]

13 Putting together
–Applications follow the fork-join model with milestones in between
–Many applications are symmetric
–It is easy to predict relative execution duration to the next milestone
=> Age Based Scheduling

14 What is Age? Age is the progress made by a thread towards its next milestone.

15 Age Calculation
–Threads created together have the same age
–As a thread executes, it ages
–Age is reset when a milestone is crossed
[Figure: timeline from creation through execution to milestones (barrier, termination), where tA is the age of thread A and tB is the age of thread B. At creation tA = 0 and tB = 0; during execution tA = 30 and tB = 50; at the milestone tA = X, where X is unknown and assumed to be a large value; after the milestone tA = 0 and tB = 0]
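A minimal sketch of the age bookkeeping described above. The class and method names are hypothetical, and the unit of progress is left abstract, since the slide does not fix one at this point.

```python
from dataclasses import dataclass


@dataclass
class ThreadAge:
    """Tracks a thread's progress towards its next milestone."""
    name: str
    age: int = 0  # threads created together all start at age 0

    def execute(self, progress):
        # As a thread executes, it ages.
        self.age += progress

    def cross_milestone(self):
        # Age is reset when a milestone (e.g. a barrier) is crossed.
        self.age = 0


# Ages taken from the timeline figure: A has aged 30, B has aged 50.
a, b = ThreadAge("A"), ThreadAge("B")
a.execute(30)
b.execute(50)
ages_before_barrier = (a.age, b.age)
a.cross_milestone()
b.cross_milestone()
```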

16 Age Based Scheduling Algorithm. To make a scheduling decision:
–Calculate the remaining execution duration to the next milestone based on age
–Assign threads with longer remaining execution durations to the fast core: Longest Job to Fast Core First (LJFCF)
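The LJFCF rule above could be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name, the dictionary-based inputs, and the example durations and core speeds are all hypothetical.

```python
def ljfcf_assign(remaining, core_speeds):
    """Pair threads with cores: longest remaining duration to fastest core.

    remaining:   {thread name: predicted remaining execution duration}
    core_speeds: list of relative core speeds, indexed by core id
    Returns {thread name: core id}. If there are more threads than cores,
    the threads with the shortest remaining durations stay unassigned.
    """
    threads = sorted(remaining, key=remaining.get, reverse=True)
    cores = sorted(range(len(core_speeds)),
                   key=lambda c: core_speeds[c], reverse=True)
    return dict(zip(threads, cores))


# Hypothetical input: core 1 is the fast core (8x), cores 0 and 2 are slow.
assignment = ljfcf_assign({"T1": 40, "T2": 90, "T3": 10}, [1, 8, 1])
```

Here T2 has the longest remaining duration, so it is placed on core 1, the fast core.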

17 Application of LJFCF. Apply whenever:
–A thread is created
–A core becomes idle
–The reassignment timer expires (for load balancing)

18 Working of the Algorithm. [Figure: timeline for thread T1 from creation (tA = 0) through execution (tA = 30) to milestones (barrier, termination); the age at the barrier is X, so rem_exe = (X - 30)]

19 Remaining Execution Duration (I): track progress of threads. Using prediction [AGE]
–Predict that all threads have the same inter-milestone distance
[Figure: timeline from creation through execution to milestones (barrier, termination), where tA is the age of thread A and tB is the age of thread B; both start at 0 and both reach X at each milestone]
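A sketch of the prediction-based estimate, assuming remaining duration is computed as X minus age, as in the earlier worked timeline. The function name is hypothetical, the value chosen for X is arbitrary, and clamping at zero is an added safeguard that the slides do not mention.

```python
def remaining_by_prediction(ages, predicted_distance):
    """AGE policy sketch: every thread is predicted to reach its next
    milestone at the same age X (predicted_distance), so the remaining
    duration is X - age. Clamping at 0 is an assumption of this sketch.
    """
    return {t: max(predicted_distance - age, 0) for t, age in ages.items()}


# Ages 30 and 50 as in the age-calculation timeline; X = 1000 is arbitrary.
rem = remaining_by_prediction({"A": 30, "B": 50}, predicted_distance=1000)
longest = max(rem, key=rem.get)
```

Because X is shared by all threads, the ranking does not depend on its actual value: the thread with the smallest age always has the longest predicted remaining duration. This is why only relative, not absolute, durations need to be predicted.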

20 Remaining Execution Duration (II): using profiling [AGE(PROF)]
–Threads have different inter-milestone distances, calculated based on a metric obtained by profiling
[Figure: timeline from creation through execution to milestones (barrier, termination), where tA is the age of thread A and tB is the age of thread B; both start at 0, thread A reaches tA = X at the milestone and thread B reaches tB = rX, where r is from the profiler. There is only one r value for each thread]
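The profiling variant can be sketched by scaling the common distance X with a per-thread ratio r. The function name, the example ratios, and the value of X are hypothetical; the slide only states that each thread has a single r obtained from the profiler.

```python
def remaining_by_profile(ages, ratios, base_distance):
    """AGE(PROF) sketch: thread t is expected to reach its milestone at
    age r_t * X, where r_t comes from profiling and X is base_distance.
    Clamping at 0 is an assumption of this sketch.
    """
    return {t: max(ratios[t] * base_distance - ages[t], 0) for t in ages}


# Hypothetical profile: thread B does twice the work of thread A.
rem = remaining_by_profile(
    ages={"A": 30, "B": 50},
    ratios={"A": 1.0, "B": 2.0},
    base_distance=100,
)
```

With these inputs B has the longer remaining duration even though it is older, which the prediction-only policy would not capture.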

21 Working of the Algorithm. [Figure: threads A, B, C, and D on fast and slow cores, with rem_exe A = 50, rem_exe B = 70, rem_exe C = 90, rem_exe D = 30; threads A and C are then shown again with rem_exe C = 90 and rem_exe A = 50]
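One possible reading of the figure above is a reassignment step in which the thread on the fast core is swapped with the slow-core thread that has the longest remaining duration. The sketch below follows that reading; the initial placement (A on the fast core) and the function name are assumptions, not something the transcript states.

```python
def reassign_fast_core(fast_thread, slow_threads, remaining):
    """Swap the fast-core thread with the slow-core thread that has the
    longest remaining duration, if that duration is strictly longer.
    Returns (new fast-core thread, new list of slow-core threads).
    """
    candidate = max(slow_threads, key=remaining.get)
    if remaining[candidate] > remaining[fast_thread]:
        slow = [fast_thread if t == candidate else t for t in slow_threads]
        return candidate, slow
    return fast_thread, list(slow_threads)


# rem_exe values from the figure; A starting on the fast core is assumed.
rem_exe = {"A": 50, "B": 70, "C": 90, "D": 30}
fast, slow = reassign_fast_core("A", ["B", "C", "D"], rem_exe)
```

With these values C (rem_exe = 90) moves to the fast core and A takes C's place on a slow core.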

22 Benefit of Age Based Scheduling
–Asymmetry aware
–Utilizes all cores
–Gives all threads opportunities to run on fast cores

23 Implementation
OS
–Track progress using performance counters
–Disable the counter on interrupts
Compiler (AGE(PROF))
–Pass profiled information, one value for each thread

24 Outline: Background and Motivation; Age Based Scheduling; Evaluation; Conclusion

25 Evaluation
Simulation based experiments
–Trace + execution hybrid simulator
–Locks and barriers are modeled
–Context switch and migration overhead simulated
–10 ms time slice for each thread
Machine configuration
–1 fast core, 7 slow cores, 8:1 speed ratio (others are in the paper)
Benchmarks
–Symmetric: Parsec (simmedium input)
–Asymmetric: Splash-2, OMPSCR, SuperLU

26 Comparisons with Other Policies
Policy: Description
–Linux: Linux O(1) scheduler
–RR: threads are assigned to fast cores in a round robin fashion
–SCALEDLD [Li’07]: Fast Core First assignment, asymmetry aware load balancing (baseline)
–FCA-AGE: Fast Core First assignment with age based periodic reassignment
–AGE: age based assignment and reassignment using prediction
–AGE(PROF): age based assignment and reassignment using profiling
–AGE(ORACLE): age based assignment and reassignment using oracle

27 LJFCF vs Other Policies (I): Parsec, baseline SCALEDLD
Policy: Avg % reduction over SCALEDLD
–RR: -36.64
–FCA-AGE: 9.8
–AGE: 10.4
–AGE(PROF): 13.2
–AGE(ORACLE): 15.4
* The default Linux policy, which performs considerably worse than the other policies, is not shown.

28 LJFCF vs Other Policies (II): asymmetric benchmarks, baseline SCALEDLD
Policy: Avg % reduction over SCALEDLD
–FCA-AGE: 8.2
–AGE: 7.7
–AGE(PROF): 9.4
–AGE(ORACLE): 13.1
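The "% reduction over SCALEDLD" figures in the two tables above are presumably relative reductions in execution time against the baseline; the transcript does not spell out the metric, so the sketch below is an assumption. The function name and the example times are hypothetical.

```python
def percent_reduction(baseline_time, policy_time):
    """Percentage by which a policy reduces execution time relative to
    the baseline. Negative values mean the policy is slower.
    """
    return 100.0 * (baseline_time - policy_time) / baseline_time


# Hypothetical execution times, not taken from the evaluation.
faster = percent_reduction(100.0, 90.0)   # policy finishes sooner
slower = percent_reduction(100.0, 125.0)  # policy finishes later
```

Under this reading, a negative entry such as the one reported for RR indicates a policy that performs worse than SCALEDLD.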

29 Idle Cycles
–Linux scheduler: most of the idle cycles are contributed by the fast core
–SCALEDLD: keeps the same thread(s) on the fast core
–AGE: assigns different threads to the fast core

30 Different AMP Configurations
–The need for asymmetry aware scheduling increases as cores become more asymmetric
–AGE based policies show more improvement over SCALEDLD as asymmetry increases
[Figure: X/1 denotes a fast-to-slow core speed ratio of X:1]

31 Outline: Background and Motivation; Age Based Scheduling; Evaluation; Conclusion

32 Conclusion
Age based scheduling (ABS) for Asymmetric Multiprocessors
–ABS assumes threads created at the same time are symmetric
–ABS assigns threads to cores based on their predicted remaining execution durations
–Predictions are made based on the age of threads
Improvement of 10.4% (Pred) and 13.2% (Prof) for Parsec and 7.6% (Pred) and 9.4% (Prof) for asymmetric benchmarks over Li’s mechanism

33 THANK YOU

