Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures
Pree Thiengburanathum
Advanced Computer Architecture, Oct 24, 2007

Agenda
- Introduction
- Basic ideas and goals
- Static versus dynamic thread assignment
- Architecture and methodology
- Results
- Conclusion

Introduction
- Multiprogramming
- Chip multiprocessor (CMP)
- Heterogeneous CMP system
- Homogeneous CMP system
- More processor cores on a single chip!

Goals
- Heterogeneous vs. homogeneous CMP system
- What type of core should be replicated?
  - Many simple cores = higher thread parallelism
  - Fewer, larger cores = lower thread parallelism
- A multi-programmed computing environment may present threads of execution with different hardware resource requirements.
- Maximize resource utilization and achieve a high degree of inter-thread parallelism.

Basic ideas
- Taking advantage of a heterogeneous CMP
  - Mapping running tasks
  - Control mechanism
  - Easy to implement
- Claim: a dynamic policy is preferable to a static one.
- What about a static policy?

Scenario of heterogeneous CMP system (1)
- Two processors, P1 and P2, of different types
- Assume that each program runs for 1 million instructions
- IPC = instructions / cycles

Table 1: IPC of threads A and B on cores P1 and P2
            P1     P2
Thread A
Thread B    1.5    1

Scenario of heterogeneous CMP system (2)
- Execution time = number of instructions / IPC
- Moving the threads to different cores gives a better total execution time.

Table 2: Execution time (in cycles) of threads A and B on cores P1 and P2
            P1          P2
Thread A    ~700,000    ~2,500,000
Thread B    ~700,000    ~1,000,000

Scenario of heterogeneous CMP system (3)
- If thread A moves to P2 and thread B to P1, total execution time = 2.5M cycles
- If thread A moves to P1 and thread B to P2, total execution time = 1M cycles
- Mapping programs to the right cores improves performance.
- Assume programs can migrate across cores.
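
The arithmetic in this scenario can be checked with a short Python sketch. It assumes that the two threads run in parallel, so the total time of a mapping is the time of its slower thread, and it uses the approximate cycle counts from Table 2. The names EXEC_CYCLES, total_time and best_mapping are illustrative and do not come from the slides.

```python
from itertools import permutations

# Approximate execution times in cycles (Table 2), for 1 million instructions.
EXEC_CYCLES = {
    "A": {"P1": 700_000, "P2": 2_500_000},
    "B": {"P1": 700_000, "P2": 1_000_000},
}


def total_time(mapping):
    """Total time of a thread -> core mapping.

    Assumption: threads run in parallel, so the slowest thread decides.
    """
    return max(EXEC_CYCLES[thread][core] for thread, core in mapping.items())


def best_mapping(threads=("A", "B"), cores=("P1", "P2")):
    """Try every one-thread-per-core mapping and return the fastest one."""
    candidates = [dict(zip(threads, perm)) for perm in permutations(cores)]
    return min(candidates, key=total_time)


if __name__ == "__main__":
    for mapping in ({"A": "P2", "B": "P1"}, {"A": "P1", "B": "P2"}):
        print(mapping, total_time(mapping))
    print("best:", best_mapping())
```

With only two threads and two cores an exhaustive search is trivial; for larger systems the number of mappings grows quickly, which is one reason to look at simple dynamic policies instead.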

Dynamic thread assignment
- Thread assignment depends on the ratio between a thread's IPCs on the two different cores.
- The higher the ratio, the more the execution time depends on the assignment.
- What about thread assignment in a homogeneous CMP system?

Simulation approach
- Real programs and real processors: SPEC2000 and Alpha
  - Performance
  - Number of cores and programs
  - IPC number problems
  - Thread migration overhead
  - New assignment policies

Processor configurations (1)

Benchmark SPEC

Benchmark of EV5 and EV6

Processor configurations (2)
- Homogeneous configurations:
  - 4 EV6s or 20 EV5s
- Heterogeneous configurations:
  - 5 EV5s and 3 EV6s
  - 10 EV5s and 2 EV6s
  - 15 EV5s and 1 EV6

CMP simulation model
- Purpose: to evaluate different combinations of processors, workloads, and thread assignment policies.
- Working principle: a multiprocessor system can be thought of as a collection of processor and thread objects, where each thread represents an instance of one of the benchmark programs.
- Modeling thread migration: an inter-core context switch moves the architectural state (PC value, registers, etc.).
- The model uses parameters such as switch_duration and switch_loss.

Assignment policies: static assignment
- A well-studied problem: threads are assigned before they run.
- Solutions rely on heuristics.
- Random static assignment: the workloads and IPCs are not known; the faster cores (EV6) are always assigned.
- Pseudo-best static assignment: the workloads and IPCs are known; a heuristic is used to find the assignment.
- Disadvantages
  - Does not optimize EV6 usage
  - "Slow" threads on EV5 penalize overall system performance

Assignment policies: round-robin dynamic assignment
- Rotates the assignment of threads to processors in a round-robin fashion
- Ensures that the available EV6 cores are equally shared among the running programs
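
A minimal Python sketch of the rotation idea follows. The slides do not say how far the assignment rotates at each step or how long an interval lasts, so rotating by the number of fast cores per interval is an assumption, and round_robin_schedule is an illustrative name.

```python
from collections import deque


def round_robin_schedule(threads, n_fast, n_intervals):
    """Return, for each interval, the threads that run on the fast (EV6) cores.

    Assumption: after every interval the thread queue rotates by n_fast,
    so the next group of threads gets the fast cores. All other threads
    are taken to run on the slow (EV5) cores during that interval.
    """
    queue = deque(threads)
    schedule = []
    for _ in range(n_intervals):
        schedule.append(list(queue)[:n_fast])
        queue.rotate(-n_fast)  # move the threads that just ran fast to the back
    return schedule


if __name__ == "__main__":
    for interval, fast in enumerate(round_robin_schedule(["t0", "t1", "t2", "t3"], 1, 4)):
        print(interval, fast)
```

Over a full rotation every thread spends the same number of intervals on an EV6 core, which is the equal-sharing property named on the slide.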

IPC-driven dynamic assignment
- Considers the characteristics of the executing threads
- Looks at the IPC numbers and their ratio to decide
- Threads with a higher ratio run on EV6
- Threads with a lower ratio run on EV5
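
The ratio rule can be sketched in Python as below. This is only an illustration under stated assumptions: each thread's IPC on both core types is taken as already known, the ratio is read as IPC on EV6 divided by IPC on EV5, and the sample IPC values are made up for the example. The function name ipc_driven_assignment is hypothetical.

```python
def ipc_driven_assignment(ipc, n_fast):
    """Assign the n_fast threads with the highest IPC ratio to EV6 cores.

    ipc maps a thread name to a pair (ipc_on_ev6, ipc_on_ev5).
    Assumption: the ratio used for ranking is ipc_on_ev6 / ipc_on_ev5.
    """
    ranked = sorted(ipc, key=lambda t: ipc[t][0] / ipc[t][1], reverse=True)
    return {
        thread: ("EV6" if position < n_fast else "EV5")
        for position, thread in enumerate(ranked)
    }


if __name__ == "__main__":
    # Made-up IPC values, for illustration only.
    sample = {"t0": (1.5, 1.0), "t1": (1.4, 0.4), "t2": (0.9, 0.8)}
    print(ipc_driven_assignment(sample, n_fast=1))
```

In the sample, t1 gains the most from the fast core (ratio 3.5), so it gets the single EV6 even though t0 has the higher absolute IPC.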

Simulation results (1)
- Homogeneous vs. heterogeneous configuration with static assignment

Simulation results (2)
- Dynamic assignment

Conclusion
- Dynamic thread assignment increases performance and usage.
  - Outperforms a random assignment policy by 20% to 40%
  - Outperforms a homogeneous configuration by 20% to 80%

Bibliography
[1] M. Becchi, P. Crowley. "Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures", Proceedings of the 3rd Conference on Computing Frontiers, pages 29-40, May 2006
[2] Silberschatz, Galvin, Gagne. "Operating System Concepts, Sixth Edition",

Questions?