Simultaneous Multithreading

1 Simultaneous Multithreading
Pratyusa Manadhata Vyas Carnegie Mellon, Fall 03

2 References
Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. "Simultaneous Multithreading: A Platform for Next-generation Processors." IEEE Micro, September/October 1997.
Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, and Dean Tullsen. "Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading." ACM Transactions on Computer Systems, August 1997.
Dean Tullsen, Susan Eggers, Joel Emer, Henry Levy, Jack Lo, and Rebecca Stamm. "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor." Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.

3 Motivation
Improving the memory subsystem or increasing system integration alone is not sufficient for a significant performance improvement. Instead, exploit parallelism in all its available forms: instruction-level parallelism (ILP) and thread-level parallelism (TLP).

4 Architectural Alternatives
Superscalar, multithreaded superscalar, and multiprocessors (SMP).
Neither a superscalar nor an SMP can capture ILP and TLP in their entirety; each is incapable of adapting to dynamically changing levels of ILP and TLP.

5 Simultaneous Multithreading
Exploits TLP from either multithreaded parallel programs or a multiprogrammed workload, and ILP from each thread.
Characteristics of SMT processors: from superscalars, the ability to issue multiple instructions per cycle; from multithreaded processors, hardware state for multiple threads.

6 Issue-slot usage: superscalar vs. multithreaded vs. SMT (figure)

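7 Comparison
Superscalar: looks at multiple instructions from the same process and suffers both horizontal waste (unused issue slots within a cycle) and vertical waste (cycles in which no instruction issues).
Multithreaded: minimizes vertical waste by tolerating long-latency operations, but still issues from only one thread per cycle.
SMT: selects instructions from any "ready" thread in the same cycle, attacking both kinds of waste.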
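A minimal sketch of the waste comparison above, assuming a hypothetical 4-wide machine and a made-up trace of how many instructions each of three threads could issue per cycle. The "multithreaded" policy is idealized (it picks the one thread with the most ready instructions each cycle); none of these numbers come from the referenced papers.

```python
"""Count horizontal and vertical issue-slot waste for a toy issue trace (illustrative only)."""

ISSUE_WIDTH = 4  # hypothetical issue width


def waste(issued_per_cycle, width=ISSUE_WIDTH):
    """Return (horizontal, vertical) wasted slots.

    Vertical waste: all slots of a cycle in which nothing issued.
    Horizontal waste: the unused slots of a cycle in which something issued.
    """
    horizontal = vertical = 0
    for n in issued_per_cycle:
        if n == 0:
            vertical += width
        else:
            horizontal += width - n
    return horizontal, vertical


def superscalar(ready):
    # Single thread: only thread 0's instructions can issue.
    return [min(cycle[0], ISSUE_WIDTH) for cycle in ready]


def fine_grained_mt(ready):
    # One thread per cycle (idealized: the one with the most ready instructions).
    return [min(max(cycle), ISSUE_WIDTH) for cycle in ready]


def smt(ready):
    # Any ready thread may fill any issue slot in the same cycle.
    return [min(sum(cycle), ISSUE_WIDTH) for cycle in ready]


# ready[c][t] = instructions thread t could issue in cycle c (made-up numbers)
ready = [
    [2, 1, 0],
    [0, 0, 3],  # thread 0 stalled, e.g. on a cache miss
    [1, 2, 2],
    [0, 1, 0],
]

for name, policy in [("superscalar", superscalar), ("multithreaded", fine_grained_mt), ("SMT", smt)]:
    issued = policy(ready)
    h, v = waste(issued)
    print(f"{name:13s} issued={sum(issued):2d} horizontal={h} vertical={v}")
```

With this trace the superscalar wastes 5 slots horizontally and 8 vertically, the multithreaded core 8 and 0, and SMT 5 and 0 while issuing the most instructions (11 of 16 slots).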

8 SMT Model
A minimal extension of a superscalar processor: the changes are mainly in the instruction fetch (IF) stage and the register files.
No static partitioning of resources; most of the hardware is still available to a single thread.

9 SMT Model
Per thread: state for the hardware context (PC, registers); instruction retirement, trapping, and subroutine return; a thread id in the BTB and TLB; an I cache port.
Large register file: number of physical registers = 8 * 32 + registers for renaming (8 hardware contexts with 32 architectural registers each), which results in a longer access time.
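The register-count formula above, as a small sketch. It assumes the rename pool is shared by all threads; the pool size of 100 is an assumption chosen for illustration, not a figure given on the slide.

```python
def physical_registers(threads: int, arch_regs_per_thread: int, rename_regs: int) -> int:
    """Physical registers in one register file of an SMT core.

    Each hardware context needs its own architectural registers; the rename
    pool is assumed to be shared by all threads.
    """
    return threads * arch_regs_per_thread + rename_regs


# 8 contexts x 32 architectural registers, as on the slide; 100 rename registers is assumed.
smt_file = physical_registers(8, 32, 100)
# A single-context superscalar with the same (assumed) rename pool, for comparison.
superscalar_file = physical_registers(1, 32, 100)
print(smt_file, superscalar_file)  # 356 132
```

Under these assumptions the file grows from 132 to 356 entries, the kind of growth behind the longer access time noted above.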

10 Pipeline: superscalar vs. SMT (figure)

11 Fetch Mechanism (2.8 scheme)
Select 2 threads that are not incurring an I cache miss and read 8 instructions from each. Take as many instructions as possible from the first thread and the rest from the second, up to 8 in total.
Alternatives: 1.8, 2.4, 4.2 (number of threads . instructions per thread).
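A sketch of the 2.8 fetch selection described above. The `Thread` record and its `fetchable` field (how many instructions a thread can supply this cycle, e.g. before a taken branch) are hypothetical; the thread list is assumed to be already in priority order. The same function covers the 1.8, 2.4, and 4.2 alternatives by changing `n_threads` and `per_thread`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thread:
    tid: int
    icache_miss: bool  # thread is waiting on an I cache miss this cycle
    fetchable: int     # hypothetical: instructions it can supply this cycle


def fetch_scheme(threads: List[Thread], n_threads: int = 2,
                 per_thread: int = 8, total: int = 8) -> List[Tuple[int, int]]:
    """Return (tid, count) pairs fetched this cycle.

    Picks the first `n_threads` threads without an I cache miss, reads up to
    `per_thread` instructions from each, and fills the `total` fetch slots
    from the first thread before the next.
    """
    candidates = [t for t in threads if not t.icache_miss][:n_threads]
    plan = []
    remaining = total
    for t in candidates:
        if remaining == 0:
            break
        take = min(per_thread, t.fetchable, remaining)
        if take > 0:
            plan.append((t.tid, take))
            remaining -= take
    return plan


threads = [Thread(0, True, 8), Thread(1, False, 5), Thread(2, False, 8), Thread(3, False, 8)]
print(fetch_scheme(threads))                              # 2.8: [(1, 5), (2, 3)]
print(fetch_scheme(threads, n_threads=4, per_thread=2))   # 4.2: [(1, 2), (2, 2), (3, 2)]
```

Thread 0 is skipped because of its I cache miss; with 2.8, the second thread fills the 3 slots the first one could not use.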

12 ICOUNT: which thread to fetch from
Give priority to the threads that have the fewest instructions in the decode, rename, and queue pipeline stages. This gives an even distribution of instructions among threads and prevents starvation.
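A sketch of the ICOUNT priority described above. The per-thread counts are made up, and breaking ties by thread id is an arbitrary choice for this example. Combined with the 2.8 scheme, the two highest-priority threads would be the ones fetched.

```python
from typing import Dict, List


def icount_order(in_flight: Dict[int, int], icache_miss: Dict[int, bool]) -> List[int]:
    """Order threads for fetch under ICOUNT (sketch).

    in_flight[tid]: instructions thread tid currently has in the decode,
    rename, and queue stages. Fewer in flight means higher priority.
    Threads stalled on an I cache miss are skipped; ties go to the lower tid.
    """
    eligible = [tid for tid in in_flight if not icache_miss.get(tid, False)]
    return sorted(eligible, key=lambda tid: (in_flight[tid], tid))


counts = {0: 12, 1: 3, 2: 7, 3: 3}  # made-up per-thread counts
misses = {2: True}                  # thread 2 is waiting on an I cache miss
order = icount_order(counts, misses)
print(order[:2])  # [1, 3]: the two threads a 2.8 fetch would use
```

A thread that clogs the queue (for example, behind a long-latency load) accumulates a high count and drops in priority on its own, which is how the policy spreads fetch bandwidth evenly.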

13 Results/Observations
Superscalars achieve an IPC of only about 1-2.
SMT: an IPC significantly higher than the values reported for superscalars.
Does a single thread see longer latency, and why? The effect on performance is not significant.

14 Results/Observations (continued)
SMT absorbs additional conflicts: it has a greater ability to hide latency because it issues instructions from multiple threads.
The SMP configurations MP2 and MP4 are hindered by static partitioning of resources, whereas SMT partitions resources among threads dynamically.
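A toy sketch of static vs. dynamic partitioning. It assumes an MP4-like configuration of four 2-issue processors (one thread each) against a single 8-issue SMT core running the same four threads; the per-cycle ready counts are made up and do not reproduce the papers' results.

```python
from typing import List


def issued_static(ready: List[List[int]], width_per_cpu: int) -> int:
    """Instructions issued when each thread owns a private processor
    (static partitioning): a thread can use only its own processor's slots,
    even when the other processors sit idle."""
    return sum(min(r, width_per_cpu) for cycle in ready for r in cycle)


def issued_dynamic(ready: List[List[int]], total_width: int) -> int:
    """Instructions issued when all threads share one wide core (SMT):
    any thread may use any free issue slot in the cycle."""
    return sum(min(sum(cycle), total_width) for cycle in ready)


# ready[c][t] = instructions thread t could issue in cycle c (made-up numbers)
ready = [
    [6, 0, 1, 0],  # one thread with high ILP, others mostly stalled
    [2, 2, 2, 2],  # evenly balanced
    [0, 0, 0, 5],  # only one thread has work
]

print(issued_static(ready, width_per_cpu=2))  # 13 (MP4-like: 4 x 2-issue)
print(issued_dynamic(ready, total_width=8))   # 20 (SMT: 1 x 8-issue)
```

Both organizations issue the same 8 instructions in the balanced cycle; the gap comes entirely from cycles where ILP is unevenly spread across threads, which static partitioning cannot redistribute.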

15 Results/Observations (continued)
Multithreading can increase cache misses and conflicts, memory requirements, and stress on the branch prediction hardware.
The impact on program performance is not significant: SMT, together with hardware and compiler optimizations, can hide the added latency.

16 Future Directions
Each processor in an SMP can use SMT.
Next-generation architectures: an SMP on a chip instead of wider superscalars.
Is the performance gain adequate for the additional resource cost? Processor cycle time and design time: cost vs. performance.
Writing optimizing compilers that take advantage of SMT.
OS support for thread scheduling, thread priority, etc.

17 Q & A

18 Thank You.
