COMP25212 CPU Multi-Threading

Learning outcomes: to be able to:
– describe the motivation for multithreading support in CPU hardware
– distinguish the benefits and implementations of coarse-grain, fine-grain and simultaneous multithreading
– explain when multithreading is inappropriate
– describe multithreading implementations
– estimate the performance of these implementations
– state the important assumptions of this performance model

Revision: Increasing CPU Performance

[Diagram: classic five-stage pipeline – Fetch Logic, Decode Logic, Exec Logic, Mem Logic, Write Logic – with an instruction cache feeding the fetch stage, a data cache serving the memory stage, and a common clock; labels a–f mark the optimisation points listed on the next slide]

How can throughput be increased?

Increasing CPU Performance
a) By increasing clock frequency
b) By increasing instructions per clock
c) Minimising memory access impact – data cache
d) Maximising instruction issue rate – branch prediction
e) Maximising instruction issue rate – superscalar
f) Maximising pipeline utilisation – avoid instruction dependencies – out-of-order execution
g) (What does lengthening the pipeline do?)

Increasing Program Parallelism
– Keep issuing instructions after a branch?
– Keep processing instructions after a cache miss?
– Process instructions in parallel?
– Write a register while a previous write is pending?
Where can we find additional independent instructions?
– In a different program!

Revision – Process States

[Diagram: process state machine with states New, Ready (waiting for a CPU), Running (on a CPU), Blocked (waiting for an event) and Terminated; transitions: New → Ready; Ready → Running on dispatch (scheduler); Running → Blocked when the process needs to wait (e.g. I/O); Blocked → Ready when the I/O occurs; Running → Ready when pre-empted (e.g. timer)]
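As a concrete illustration, the state machine above can be encoded in a few lines of C; the type and function names below are invented for this sketch, not taken from the slides or any real OS:

/* Illustrative encoding of the process state machine above. */
typedef enum { NEW, READY, RUNNING, BLOCKED, TERMINATED } proc_state_t;

/* One function per transition; each returns the next state, or the
   current state unchanged if the event does not apply. */
proc_state_t on_admit(proc_state_t s)     { return s == NEW     ? READY   : s; }
proc_state_t on_dispatch(proc_state_t s)  { return s == READY   ? RUNNING : s; } /* scheduler  */
proc_state_t on_wait(proc_state_t s)      { return s == RUNNING ? BLOCKED : s; } /* e.g. I/O   */
proc_state_t on_io_occurs(proc_state_t s) { return s == BLOCKED ? READY   : s; }
proc_state_t on_preempt(proc_state_t s)   { return s == RUNNING ? READY   : s; } /* e.g. timer */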

Revision – Process Control Block
– Process ID
– Process state
– PC
– Stack pointer
– General registers
– Memory management info
– Open file list, with positions
– Network connections
– CPU time used
– Parent process ID
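A minimal sketch of how such a PCB might be declared in C, assuming a 64-bit machine with 32 general registers; all field names and types are illustrative, not from a real kernel:

#include <stdint.h>

#define NUM_GPRS 32   /* assumed register count, illustrative */

/* Hypothetical process control block mirroring the fields above. */
struct pcb {
    int       pid;                /* process ID */
    int       state;              /* e.g. READY, RUNNING, BLOCKED */
    uint64_t  pc;                 /* saved program counter */
    uint64_t  sp;                 /* saved stack pointer */
    uint64_t  gprs[NUM_GPRS];     /* saved general registers */
    void     *page_table;         /* memory management info */
    struct open_file *files;      /* open file list, with positions */
    struct socket    *conns;      /* network connections */
    uint64_t  cpu_time_used;      /* accounting */
    int       parent_pid;         /* parent process ID */
};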

Revision: CPU Switch Process

[Diagram: context switch between process P0 and process P1 via the operating system – P0 executes, an interrupt or system call occurs, the OS saves state into PCB0 and loads state from PCB1, then P1 executes; on the next interrupt the OS saves state into PCB1 and loads state from PCB0, and P0 resumes]
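A hedged sketch of the dispatcher step the diagram describes, reusing the illustrative struct pcb above; save_cpu_state and load_cpu_state are hypothetical stand-ins for the architecture-specific register save/restore code:

struct pcb;  /* as sketched above */

extern void save_cpu_state(struct pcb *p);
extern void load_cpu_state(const struct pcb *p);

void context_switch(struct pcb *from, struct pcb *to)
{
    save_cpu_state(from);   /* PC, SP, GPRs -> old process's PCB */
    /* switch the address space (memory management info), e.g.
       point the MMU at the new page table -- hypothetical call:
       load_page_table(to->page_table); */
    load_cpu_state(to);     /* new process's PCB -> PC, SP, GPRs */
}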

What does CPU load on dispatch?
– Process ID
– Process state
– PC
– Stack pointer
– General registers
– Memory management info
– Open file list, with positions
– Network connections
– CPU time used
– Parent process ID

What does CPU need to store on deschedule?
– Process ID
– Process state
– PC
– Stack pointer
– General registers
– Memory management info
– Open file list, with positions
– Network connections
– CPU time used
– Parent process ID

CPU Support for Multithreading

[Diagram: the same five-stage pipeline, now with per-thread state duplicated: two program counters (PC A, PC B) feeding the fetch logic, two sets of general registers (GPRs A, GPRs B), and two virtual-address mappings (VA Mapping A, VA Mapping B) feeding the address translation logic; the instruction and data caches remain shared]
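Conceptually, the hardware now holds one copy of the architectural thread state per hardware thread while sharing everything else; a hypothetical sketch of that duplicated state (names and sizes are assumptions, not from the slides):

#include <stdint.h>

#define N_GPRS 32   /* assumed register count, as before */

/* Per-hardware-thread architectural state: exactly the state the
   diagram duplicates. Caches, pipeline logic and execution units
   stay shared between the two threads. */
struct hw_thread_ctx {
    uint64_t pc;            /* per-thread program counter */
    uint64_t gprs[N_GPRS];  /* per-thread general registers */
    uint64_t va_mapping;    /* per-thread translation context,
                               e.g. page-table base or ASID */
};

struct mt_core {
    struct hw_thread_ctx thread[2];  /* contexts A and B */
    int active;                      /* which thread issues this cycle */
};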

How Should the OS View the Extra Hardware Thread?
– A variety of solutions
– Simplest is probably to declare it as an extra CPU
– Needs a multiprocessor-aware OS

CPU Support for Multithreading

[Diagram: the same multithreaded pipeline as above, with duplicated PCs, GPRs and VA mappings for threads A and B]

Design issue: when to switch threads

Coarse-Grain Multithreading
Switch thread on an “expensive” operation:
– e.g. I-cache miss
– e.g. D-cache miss
Some are easier than others!
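A sketch of the switch-on-miss policy this implies, in plain C; real hardware implements this as a small piece of control logic, not software, and the event names here are invented for illustration:

/* Coarse-grain switch policy sketch: stay on the current thread
   unless an "expensive" event occurs. */
enum mt_event { EV_NONE, EV_ICACHE_MISS, EV_DCACHE_MISS };

int next_thread(int current, enum mt_event ev)
{
    switch (ev) {
    case EV_ICACHE_MISS:  /* detected at fetch: no younger instructions in flight */
    case EV_DCACHE_MISS:  /* detected at MEM: younger instructions must be aborted */
        return 1 - current;   /* switch to the other hardware thread */
    default:
        return current;       /* cheap operations never trigger a switch */
    }
}

The two cases differ in cost, which is why some are easier than others: an I-cache miss is detected at fetch, so nothing needs aborting, while a D-cache miss is detected late and the instructions behind it must be squashed – exactly the contrast the next two slides work through.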

Switch Threads on I-cache Miss

Inst a (T1)  IF ID EX MEM WB
Inst b (T1)     IF ID EX MEM WB
Inst c (T1)        IF(MISS) ...(switch; miss serviced)... ID EX MEM WB
Inst X (T2)        ----  IF ID EX MEM WB
Inst Y (T2)                 IF ID EX MEM WB
Inst Z (T2)                    IF ID EX MEM WB
Inst d (T1)                                               IF ID EX MEM
Inst e (T1)                                                  IF ID EX
Inst f (T1)                                                     IF ID

On c's I-cache miss the CPU switches to thread 2; X, Y, Z fill the pipeline while the miss is serviced (“----” is the switch bubble). When the line arrives, the CPU switches back: c continues and d, e, f follow.

Performance of Coarse-Grain Multithreading
Assume (conservatively):
– 1 GHz clock (1 ns clock tick!), 20 ns memory (= 20 clocks)
– 1 I-cache miss per 100 instructions
– 1 instruction per clock otherwise
Then, time to execute 100 instructions without multithreading:
– 100 + 20 = 120 clock cycles
– Instructions per clock = 100 / 120 = 0.83
With multithreading, time to execute 100 instructions:
– 100 [+ 1 cycle to switch] clock cycles
– Instructions per clock = 100 / 101 = 0.99
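The model is easy to replay in code; this small self-contained program uses only the figures assumed on the slide (20-cycle miss penalty, 1 miss per 100 instructions, 1 instruction per clock otherwise, 1-cycle switch):

#include <stdio.h>

int main(void)
{
    const double insts        = 100.0;  /* instructions between misses */
    const double miss_penalty = 20.0;   /* clocks: 20 ns at 1 GHz      */
    const double switch_cost  = 1.0;    /* clocks to switch threads    */

    double cycles_single = insts + miss_penalty;  /* CPU stalls for the miss  */
    double cycles_mt     = insts + switch_cost;   /* other thread hides it    */

    printf("IPC without multithreading: %.2f\n", insts / cycles_single);  /* 0.83 */
    printf("IPC with multithreading:    %.2f\n", insts / cycles_mt);      /* 0.99 */
    return 0;
}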

Switch Threads on D-cache Miss

Inst a (T1)  IF ID EX MEM(MISS) WB
Inst b (T1)     IF ID EX  MEM  WB      <- abort these: they are in
Inst c (T1)        IF ID  EX   MEM WB     the shadow of a's miss
Inst d (T1)           IF  ID   EX  MEM
Inst e (T1)               IF   ID  EX
Inst f (T1)                    IF  ID
Inst X (T2)                    IF  ID ...   (switch once the miss is detected)
Inst Y (T2)                        IF ...

Performance: similar calculation (STATE ASSUMPTIONS!)
Where to restart after the memory cycle? I suggest instruction “a” – why?