Processor Performance & Parallelism
Yashwant Malaiya, Colorado State University
With some material from Patterson & Hennessy (PH).

Processor Execution Time
The time taken by a program to execute is the product of:
– Number of machine instructions executed
– Average number of clock cycles per instruction (CPI)
– Duration of a single clock period
Example: 10,000 instructions, CPI = 2, clock period = 250 ps, so
execution time = 10,000 × 2 × 250 ps = 5,000,000 ps = 5 µs (worked in the sketch below).
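A minimal sketch of that calculation in C; the numbers are the example's, and the variable names are only illustrative:

```c
#include <stdio.h>

int main(void) {
    double instructions = 10000;  /* dynamic instruction count */
    double cpi          = 2.0;    /* average clock cycles per instruction */
    double clock_ps     = 250.0;  /* clock period in picoseconds */

    /* Execution time = instruction count x CPI x clock period */
    double time_ps = instructions * cpi * clock_ps;
    printf("Execution time: %.0f ps (%.1f us)\n", time_ps, time_ps / 1e6);
    return 0;
}
```

This prints 5,000,000 ps, i.e. 5 µs, matching the example.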

Processor Execution Time (continued)
Instruction count for a program:
– Determined by the program, the ISA, and the compiler
Average cycles per instruction (CPI):
– Determined by the CPU hardware
– If different instructions have different CPIs, the average CPI is affected by the instruction mix (see the sketch below)
Clock cycle time (the inverse of clock frequency):
– Determined by logic levels and fabrication technology
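To make "instruction mix" concrete, here is a hedged sketch of a weighted-average CPI computation; the mix fractions and per-class CPIs are invented for illustration:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical instruction mix: fraction of dynamic instructions
       in each class, and the CPI of each class. */
    double fraction[] = { 0.50, 0.30, 0.20 };  /* ALU, load/store, branch */
    double cpi[]      = { 1.0,  2.0,  3.0  };

    /* Average CPI is the mix-weighted sum of the per-class CPIs. */
    double avg_cpi = 0.0;
    for (int i = 0; i < 3; i++)
        avg_cpi += fraction[i] * cpi[i];

    printf("Average CPI = %.2f\n", avg_cpi);  /* 0.5*1 + 0.3*2 + 0.2*3 = 1.70 */
    return 0;
}
```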

Reducing Clock Cycle Time
This worked well for decades: smaller transistor dimensions implied smaller delays and hence a shorter clock cycle. Not any more.

CPI (Cycles per Instruction)
What is the LC-3's CPI? Instructions take 5-9 cycles (p. 568), assuming a memory access takes one clock period.
– The ideal LC-3 CPI may be about 6.
With no cache and a memory access time of 100 cycles?
– The LC-3 CPI would be very high.
A cache reduces the access time to about 2 cycles.
– The LC-3 CPI would be higher than 6, but still reasonable.
Load/store instructions are about 20-30% of the mix (see the effective-CPI sketch below).
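A hedged sketch of why CPI explodes without a cache. The model is assumed, not from the slides: the base CPI already counts each memory reference as one cycle, so a slower memory adds (access time − 1) stall cycles per reference, and every instruction is fetched once while loads/stores add one more data reference:

```c
#include <stdio.h>

int main(void) {
    double base_cpi = 6.0;    /* ideal LC-3 CPI from the slide */
    double mem_frac = 0.25;   /* assumed load/store fraction (20-30%) */
    double access   = 100.0;  /* memory access time in cycles, no cache */

    /* One instruction fetch per instruction, plus data references. */
    double refs_per_instr = 1.0 + mem_frac;
    double eff_cpi = base_cpi + refs_per_instr * (access - 1.0);

    printf("Effective CPI = %.2f\n", eff_cpi);  /* 6 + 1.25*99 = 129.75 */
    return 0;
}
```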

Parallelism to Save Time
Do things in parallel to save time. Example: pipelining
– Divide the instruction flow into stages.
– Let instructions flow into the pipeline.
– Multiple instructions are under execution at the same time.

Pipelining Analogy
Pipelined laundry: overlapping execution
– Parallelism improves performance
Four loads done sequentially: time = 4 × 2 = 8 hours
Pipelined: time = 7 × 0.5 = 3.5 hours
Each load still takes 4 × 0.5 = 2 hours from start to finish: pipelining improves throughput, not the latency of a single load.

Pipeline Processor Performance
[Figure: instruction timing diagrams comparing a single-cycle design (Tc = 800 ps) with a pipelined design (Tc = 200 ps).]
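A hedged reading of those numbers, assuming the classic five-stage pipeline model where the clock is limited by the slowest stage; the instruction counts below are illustrative:

```c
#include <stdio.h>

int main(void) {
    double single_cycle_ps = 800.0;  /* one instruction per 800 ps */
    double pipelined_ps    = 200.0;  /* one instruction per 200 ps once full */
    int    stages          = 5;

    for (long n = 3; n <= 1000000; n *= 100) {
        double t_single = n * single_cycle_ps;
        /* First instruction takes 'stages' cycles; each later one adds 1. */
        double t_pipe = (stages + (n - 1)) * pipelined_ps;
        printf("n = %7ld  speedup = %.2f\n", n, t_single / t_pipe);
    }
    /* Speedup approaches 800/200 = 4 as n grows. */
    return 0;
}
```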

Pipelining: Issues
We cannot know in advance which way a branch will go.
– Actually, the hardware can often make a good guess (branch prediction).
– There is a performance penalty for bad guesses.
Instructions may depend on the results of previous instructions (data hazards).
– There may be a way around that problem in some cases, e.g., forwarding results between stages.

Instruction-Level Parallelism (ILP)
Pipelining is one example. Another is multiple issue: provide multiple copies of processor resources.
– Multiple instructions start at the same time.
– This needs careful scheduling:
Compiler-assisted (static) scheduling
Hardware-assisted ("superscalar"): dynamic scheduling
– Example: AMD Opteron X4
– CPI can be less than 1!

Flynn's Taxonomy (Michael J. Flynn, 1966)

                                 Data Streams
                                 Single                     Multiple
Instruction Streams   Single     SISD: Intel Pentium 4      SIMD: SSE instructions of x86
                      Multiple   MISD: no examples today    MIMD: Intel Xeon e5345

Instruction-level parallelism is still SISD. SSE (Streaming SIMD Extensions) provides vector operations; the Intel Xeon e5345 has 4 cores. A short SIMD sketch follows below.
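To make SIMD concrete, here is a minimal sketch using x86 SSE intrinsics (not from the original slides), which add four pairs of floats with one instruction; it assumes an x86 machine and a compiler with SSE support, e.g. gcc:

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics */

int main(void) {
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float r[4];

    /* Load four floats into 128-bit registers, add all four lanes
       with a single SIMD instruction, then store the result. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);
    _mm_storeu_ps(r, vr);

    printf("%.0f %.0f %.0f %.0f\n", r[0], r[1], r[2], r[3]);  /* 11 22 33 44 */
    return 0;
}
```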

Multi-what?
Multitasking: tasks share a processor.
Multithreading: threads share a processor.
Multiprocessors: using multiple processors
– for example, multi-core processors (multiple processors on the same chip);
– scheduling of tasks/subtasks is needed.
Thread-level parallelism: multiple threads on one or more processors (a sketch follows below).
Simultaneous multithreading: multiple threads in parallel (the hardware keeps multiple copies of the thread state).
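As a minimal illustration of thread-level parallelism (not from the original slides), here is a C sketch using POSIX threads, where two threads sum the two halves of an array independently; compile with -pthread:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static long data[N];

struct range { long lo, hi, sum; };

/* Each thread sums its own half of the array, with no shared writes. */
static void *partial_sum(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (long i = r->lo; i < r->hi; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) data[i] = 1;

    struct range halves[2] = { { 0, N / 2, 0 }, { N / 2, N, 0 } };
    pthread_t tid[2];

    for (int t = 0; t < 2; t++)
        pthread_create(&tid[t], NULL, partial_sum, &halves[t]);
    for (int t = 0; t < 2; t++)
        pthread_join(tid[t], NULL);

    printf("total = %ld\n", halves[0].sum + halves[1].sum);  /* 1000000 */
    return 0;
}
```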

Multi-core Processors
Power consumption has become a limiting factor.
Key advantage: lower power consumption for the same performance.
– Example: at a 20% lower clock frequency, a core delivers about 87% of the performance at about 51% of the power (a lower frequency also permits a lower supply voltage, and dynamic power scales roughly with V²f, so power falls roughly as the cube of frequency: 0.8³ ≈ 0.51; see the sketch below).
A processor can switch to a lower frequency to reduce power.
n cores can run n or more threads.
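A hedged sketch of the scaling behind the 51% figure; the cube-law model assumes voltage can be scaled down in proportion to frequency, which is an approximation rather than something stated on the slide:

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Assumed model: dynamic power ~ C * V^2 * f, with V scaled
       in proportion to f, giving P ~ f^3. */
    double f = 0.80;  /* 20% lower clock frequency */
    printf("relative power ~ %.2f\n", pow(f, 3.0));  /* ~0.51 */
    return 0;
}
```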

Multi-core Processors (continued)
Cores may be identical or specialized.
Higher-level caches (e.g., a shared L2/L3) are shared among cores; coherency is required for the lower-level per-core caches.
Cores may use superscalar or simultaneous multithreading architectures.

LC-3 States

Instruction             Cycles
ADD, AND, NOT, JMP      5
TRAP                    8
LD, LDR, ST, STR        7
LDI, STI                9
BR                      5, 6
JSR                     6