
1 Scalability-Based Manycore Partitioning. Hiroshi Sasaki (Kyushu University), Koji Inoue (Kyushu University), Teruo Tanimoto (The University of Tokyo), Hiroshi Nakamura (The University of Tokyo). PACT 2012. Presented by Kim, Jong-yul, July 31, 2013.

2 Contents: Motivation; SBMP Scheduler (Scalability Prediction, Core Partitioning, Core Donation, Phase Change Detection); Evaluation Results; Conclusions.

3 Prospects: The limits of increasing clock frequency (F) and ILP, the power wall, and transistor-scaling constraints have pushed systems toward multi-core and many-core designs. [Figure: a system running several multi-threaded applications (APP1, APP2, APP3, ...) at once, i.e. multi-threaded multiprogramming.]

4 Problem: A traditional OS gives equal CPU time to every running application, but programs differ in scalability. Metric: Normalized Turnaround Time (NTT) = clock cycles when multiprogrammed with other applications / clock cycles when solo-run (lower is better). [Figure: average NTT over the workloads; Linux: 2.04, best partitioning: 1.38.]
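Spelled out, the metric defined on this slide for one application i is (a direct restatement; lower is better, and a solo run has NTT = 1):

```latex
\mathrm{NTT}_i \;=\; \frac{C_i^{\text{multiprogrammed}}}{C_i^{\text{solo}}}
```

where C_i is the number of clock cycles application i needs to finish. The numbers reported on the evaluation slides (e.g. Linux: 2.04) are averages of this metric over the programs in each workload.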

5 Experimental System. [Figure: the evaluation machine; cores are allocated in units of one processor die (6 cores).]

6 SBMP Scheduler: Scalability Prediction, Core Partitioning, Core Donation, Phase Change Detection.

7 Overview: SBMP (Scalability-Based Manycore Partitioning) is a scheduler that assigns cores to applications according to their scalability. [Figure: scheduler flow; detect a change, predict scalability, partition cores, donate under-used cores, then run in steady state until the next change is detected.]

8 [Scheduler flow diagram divider; next step: Scalability Prediction.]

9 Scalability Prediction (1/2): Performance is measured as cumulative retired instructions per second (IPS). The total number of retired instructions for a workload varies little with the number of cores (within about 8%), so IPS is a valid measure of progress. [Figure: total instruction count per workload versus core count.]
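As a small illustration of the metric, the cumulative IPS of an application is simply the retired-instruction counts of all its threads summed over a sampling interval and divided by the interval length; collecting the hardware-counter readings themselves is outside this sketch.

```python
def cumulative_ips(retired_instructions_per_thread, interval_seconds):
    """Cumulative retired instructions per second for one application:
    sum the retired-instruction counts of all its threads over the
    sampling interval and divide by the interval length."""
    return sum(retired_instructions_per_thread) / interval_seconds

# Example: three threads retiring instructions over a 0.1 s sample.
print(cumulative_ips([2.1e8, 1.9e8, 2.0e8], 0.1))   # ~6.0e9 IPS
```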

10 Scalability Prediction (2/2): Obtaining the scalability curve directly would require warming up the branch predictors and the cache system and sampling all 8 allocations (6, 12, 18, ..., 48 cores), which takes over 3 seconds. Instead, a simple model with 3 coefficients (α, β, γ) is fitted: performance follows Amdahl's law combined with an overhead term caused by each additional core. Only 3 samples are needed: one single-core run plus 2 different core configurations.
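The slide gives only the shape of the model (Amdahl's law plus a per-core overhead term) and the names of its three coefficients; the concrete formula and the fitting code below are an assumed sketch for illustration, not the paper's exact equation. Three sampled configurations suffice to determine three coefficients, and the fitted model then fills a scalability table for every allocatable core count.

```python
import numpy as np
from scipy.optimize import curve_fit

def perf_model(n, alpha, beta, gamma):
    """Assumed scalability model: an Amdahl-style speedup (beta acting as the
    serial fraction) scaled by alpha, minus a linear overhead gamma*n that
    grows with each additional core."""
    return alpha * (n / (beta * n + (1.0 - beta))) - gamma * n

# Three samples: one single-core run plus two other core allocations (as on the slide).
cores    = np.array([1.0, 6.0, 12.0])
measured = np.array([1.0, 4.1, 5.9])   # hypothetical IPS values, normalized to 1 core

(alpha, beta, gamma), _ = curve_fit(perf_model, cores, measured,
                                    p0=[1.0, 0.5, 0.0], maxfev=10000)

# Predicted performance for the allocations the scheduler can actually use (6, 12, ..., 48).
scalability_table = {n: perf_model(n, alpha, beta, gamma) for n in range(6, 49, 6)}
```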

11 [Scheduler flow diagram divider; next step: Core Partitioning.]

12 Core Partitioning (1/2): [Figure: relative performance versus number of cores; applications fall into High, Medium, and Low scalability classes.]

13 Core Partitioning (2/2): A scalability table is built for each program, a key-value map in which the key is a number of cores and the value is the predicted performance with that many cores. The goal is a near-optimal core assignment for the multiprogrammed mix, found with a hill-climbing algorithm over the per-program tables (a sketch follows below). [Figure: single-run versus multiprogrammed performance.]
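A minimal hill-climbing sketch for this step, assuming each program's scalability table maps an allocatable core count (multiples of 6, one die per step) to its predicted performance. The objective used here (the sum of predicted performance values) and the equal-split starting point are illustrative assumptions, not necessarily the paper's exact choices.

```python
def hill_climb_partition(tables, total_cores=48, step=6):
    """tables: {program: {core_count: predicted_performance}}, with entries
    for every multiple of `step` up to `total_cores`.
    Start from a roughly equal split and repeatedly move one allocation unit
    (a 6-core die) between programs while total predicted performance improves."""
    progs = list(tables)
    # Initial solution: give every program one die, then hand out the rest round-robin.
    alloc = {p: step for p in progs}
    spare = total_cores - sum(alloc.values())
    for i in range(spare // step):
        alloc[progs[i % len(progs)]] += step

    def score(a):
        return sum(tables[p][a[p]] for p in progs)

    improved = True
    while improved:
        improved = False
        best, best_score = None, score(alloc)
        for src in progs:            # take one die from src...
            for dst in progs:        # ...and give it to dst
                if src == dst or alloc[src] <= step or alloc[dst] + step > total_cores:
                    continue
                trial = dict(alloc)
                trial[src] -= step
                trial[dst] += step
                if score(trial) > best_score:
                    best, best_score = trial, score(trial)
        if best is not None:
            alloc, improved = best, True
    return alloc
```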

14 [Scheduler flow diagram divider; next step: Core Donation.]

15 Core Donation: One program is assigned to each processor die (6 cores), but a program may not keep its cores busy. When a program's CPU-utilization ratio falls below a threshold (70%), it becomes a donor and lends cores to a donee, the program expected to benefit most, chosen among the others by utilization and scalability. Donee threads run at lower priority than the donor's, and donation works at a finer granularity than the die. [Figure: CPU utilization of Program1 (donor) and Program2 (donee) over time on Core1 and Core2.]
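The sketch below follows the rule stated on the slide: a program whose CPU-utilization ratio drops below the 70% threshold becomes a donor, and the donee is the program expected to benefit most, judged by utilization and scalability, running at lower priority than the donor. The exact benefit score is not given on the slide, so the one used here is an assumption.

```python
UTILIZATION_THRESHOLD = 0.70   # threshold from the slide

def find_donations(programs):
    """programs: list of dicts with keys
       'name', 'utilization' (0..1 on the program's own cores), and
       'scalability_gain' (predicted benefit of extra cores, from the scalability table)."""
    donors = [p for p in programs if p['utilization'] < UTILIZATION_THRESHOLD]
    donations = []
    for donor in donors:
        candidates = [p for p in programs if p is not donor]
        if not candidates:
            continue
        # Assumed benefit score: programs that already keep their cores busy
        # and scale well are expected to profit most from borrowed cores.
        donee = max(candidates,
                    key=lambda p: p['utilization'] * p['scalability_gain'])
        donations.append({
            'donor': donor['name'],
            'donee': donee['name'],
            'note': 'donee threads run at lower priority than donor threads',
        })
    return donations
```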

16 [Scheduler flow diagram divider; next step: Phase Change Detection.]

18 Detection (1/2): Re-partitioning is triggered when (1) a program is created or terminated, or (2) a phase transition is detected in any of the running programs. [Figure: performance over time illustrating a phase transition.]
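The slide names the two triggers but not how a phase transition is recognized, so the sketch below substitutes an assumed rule: a program's measured IPS drifting from the value predicted by its scalability model by more than a fixed fraction counts as a phase change.

```python
PHASE_CHANGE_TOLERANCE = 0.20   # assumed: a 20% deviation counts as a new phase

def should_repartition(app_events, measurements, predictions,
                       tolerance=PHASE_CHANGE_TOLERANCE):
    """Trigger 1: a program was created or terminated.
       Trigger 2: any program's measured IPS deviates from its predicted IPS
       by more than `tolerance` (an assumed stand-in for phase detection)."""
    if app_events.get('created') or app_events.get('terminated'):
        return True
    for app, measured_ips in measurements.items():
        predicted_ips = predictions[app]
        if predicted_ips > 0 and abs(measured_ips - predicted_ips) / predicted_ips > tolerance:
            return True
    return False
```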

19 Detection (2/2), Phase Prediction: [Figure: the scheduler flow diagram (Detect, Scalability Prediction, Core Partitioning, Core Donation, Steady) with phase prediction feeding the detection step.]

20 Evaluation: Core Partitioning, Phase Prediction, Core Donation, Overall Performance.

21 Experimental System (PARSEC benchmark suite 2.1):
Processor: 4 x AMD Opteron 6172
Dies per processor: 2
Cores per die: 6
Total cores: 48
L3 cache size: 12 MB per socket
Main memory: 96 GB DDR3 PC3-10600
Linux kernel: 2.6.37.6

22 Core Partitioning results: SBMP-base (Scalability Prediction + Core Partitioning) is evaluated on workloads of single-phase applications (2 Medium + 2 Low scalability each). [Figure: average NTT per workload; Linux: 1.88, SBMP-base: 1.54.]

23 Phase Prediction results: SBMP-PP (SBMP-base + Phase Prediction) is evaluated on workloads containing multiple-phase applications. [Figure: average NTT per workload; Linux: 1.89, SBMP-base: 2.09, SBMP-PP: 1.77.]

24 Core Donation results: SBMP-CD (SBMP-PP + Core Donation) is evaluated on workloads with 2 low-CPU-utilization applications and 2 normal ones. [Figure: average NTT per workload; Linux: 2.06, SBMP-PP: 1.68, SBMP-CD: 1.60.]

25 Overall Results over 72 workloads covering all programs (average NTT): Linux: 1.83, SBMP-base: 1.99, SBMP-PP: 1.70 (8% improvement over Linux), SBMP-CD: 1.65 (11% improvement over Linux).

26 Conclusions: The problem is OS scheduling of multiple multi-threaded applications on a many-core system. The SBMP scheduler combines dynamic scalability prediction with core partitioning, phase recognition, and core donation, and improves performance by 11% over Linux.

27 Q&A.

28 Hill Climbing Algorithm: finds a near-optimal solution by starting from an arbitrary solution and incrementally changing a single element while the result improves.

29 Core Donation (detail): One program per processor die (6 cores). A program whose CPU-utilization ratio is below the 70% threshold becomes the donor; the donee is the most beneficial candidate, chosen by utilization and scalability, and runs at lower priority than the donor. Donation operates at a finer granularity than the die. [Figure: Program1 (donor) and Program2 (donee) across dies P1 and P2.]

30 Evaluation methodology: PARSEC benchmark suite 2.1; each workload consists of 4 benchmark programs. Gang-scheduling is used for the programs shown in green in the original figure, co-scheduling for the others. Exception: freqmine, which has multiple phase changes. The BLCR checkpoint/restart tool is used so that only the parallel region of each benchmark is evaluated.


