How Multi-threading can increase on-chip parallelism

Presentation transcript:

Multi-threaded RTOS: How Multi-threading can increase on-chip parallelism

Outline
- Introduction
- Multi-threading models
- Architectures of multi-threaded processors
- Simultaneous multi-threading and multi-processors
- Cache design
- Examples of multi-threaded environments
- Conclusions

Introduction
Two forms of parallelism:
- instruction-level parallelism (ILP)
- thread-level parallelism (TLP)
Both identify independent instructions that can execute in parallel.
Wide-issue superscalar processors exploit ILP by executing multiple instructions from a single program in a single cycle.
Multiprocessors exploit TLP by executing different threads in parallel on different processors.
The first multi-threaded processor approaches in the 1970s and 1980s applied multi-threading at the user-thread level to solve the memory access latency problem.

Introduction
Motivations for multi-threaded processor architecture development include chip area, cost and complexity.
Multi-threaded architectures: Simultaneous Multi-threading (SMT), single-chip multiprocessing (CMP), SMT VLIW architecture, Multithreaded Vector (SMV) architecture.
DSP applications inherently benefit from parallelization at multiple levels of hierarchy:
- Instruction: separate instruction memory space
- Data: separate data memory space
- Thread: multiple functional units
- Data transfer: multiple wide data buses

Vertical and Horizontal Waste
Vertical waste is introduced when the processor issues no instructions in a cycle.
Horizontal waste is introduced when not all issue slots can be filled in a cycle.
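The two kinds of waste can be made concrete with a small accounting sketch. It assumes a hypothetical per-cycle trace giving the number of instructions issued in each cycle, and it counts waste in issue slots, so a fully idle cycle wastes the whole issue width. The function name `issue_waste` and the trace format are illustrative and do not come from the slides.

```python
def issue_waste(trace, issue_width):
    """Split unused issue slots into vertical and horizontal waste.

    trace       -- instructions issued in each cycle (hypothetical input format)
    issue_width -- number of issue slots available per cycle
    Returns (vertical_waste, horizontal_waste, total_slots), all in slots.
    """
    vertical = 0
    horizontal = 0
    for issued in trace:
        if not 0 <= issued <= issue_width:
            raise ValueError("issued count must lie between 0 and issue_width")
        if issued == 0:
            # Nothing issued this cycle: every slot counts as vertical waste.
            vertical += issue_width
        else:
            # Some slots filled: the unfilled remainder is horizontal waste.
            horizontal += issue_width - issued
    return vertical, horizontal, len(trace) * issue_width


# Six cycles on a 4-wide machine: two idle cycles, three partially filled, one full.
print(issue_waste([0, 2, 4, 1, 0, 3], issue_width=4))  # (8, 6, 24)
```

In this made-up trace only 10 of the 24 slots do useful work; 8 are lost to idle cycles and 6 to partially filled cycles.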

Vertical and Horizontal Waste

Multi-threaded Models
- Fine-Grain Multithreading: only one thread issues instructions each cycle, but it can use the entire issue width of the processor.
- SM: Full Simultaneous Issue
- SM: Single Issue, Dual Issue, Four Issue
- SM: Limited Connection: each hardware context is connected directly to one of each type of functional unit. Less dynamic.
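The difference between these models can be sketched as a toy slot allocator for a single cycle. It is a simplification under stated assumptions: `ready` is a hypothetical list giving how many instructions each thread could issue this cycle, threads are served in fixed index order, and dependencies and functional-unit types are ignored, so the Limited Connection model is not covered. The function names are illustrative only.

```python
def fine_grain_issue(ready, issue_width, cycle):
    """Fine-grain multithreading: one thread per cycle, chosen round-robin,
    and that thread may use the full issue width."""
    thread = cycle % len(ready)
    return {thread: min(ready[thread], issue_width)}


def simultaneous_issue(ready, issue_width, per_thread_limit=None):
    """Simultaneous multithreading: several threads share one cycle's slots.

    per_thread_limit=None models full simultaneous issue; 1, 2 or 4 models
    the single, dual and four issue variants (assumed reading of the slide).
    """
    issued = {}
    free = issue_width
    for thread, count in enumerate(ready):
        cap = count if per_thread_limit is None else min(count, per_thread_limit)
        take = min(cap, free)
        if take:
            issued[thread] = take
        free -= take
    return issued


ready = [1, 3, 0, 2]  # hypothetical ready-instruction counts for four threads
print(fine_grain_issue(ready, issue_width=4, cycle=0))            # {0: 1}
print(simultaneous_issue(ready, issue_width=4))                   # {0: 1, 1: 3}
print(simultaneous_issue(ready, issue_width=4, per_thread_limit=1))  # {0: 1, 1: 1, 3: 1}
```

With these inputs the fine-grain model fills one of four slots in the cycle, while full simultaneous issue fills all four by drawing on a second thread.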

Performance

SMT VLIW Architecture

Simultaneous Vector Multi-threaded Architecture (SVMT)

SMT vs. Multiprocessing

Cache design

Examples: Multi-threaded RTOS
- Analog Devices VDK
- uClinux
- The RTXC Quadros RTOS: RTXC/ss
- ThreadX

Conclusions
- A simultaneous multithreaded architecture is superior in performance to a multiple-issue multiprocessor (multi-issue CMP).
- SMT boosts utilization by dynamically scheduling functional units among multiple threads.
- SMT also increases hardware design flexibility.
- Simultaneous multithreading increases the complexity of instruction scheduling.
- The increased parallelism makes multi-threading ideal for DSP applications, where each application can run as a separate thread.