10/27: Lecture Topics
Survey results
Current Architectural Trends
Operating Systems Intro
  – What is an OS?
  – Issues in operating systems

Superscalar Pipelines
Superscalar pipelines can execute multiple instructions at once
  – 2+ instructions in any stage of the pipeline
Some processors allow 8 instructions to be issued at once
Most programs can only take advantage of 1 or 2 issue slots
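A toy model can show why wide issue often goes unused. The sketch below is only an illustration, not a model of any real processor: it assumes in-order issue, one-cycle latency for every instruction, and a made-up six-instruction program (`deps`) consisting of two dependence chains placed one after the other.

```python
def in_order_issue(deps, width):
    """Return the cycle in which each instruction issues, in program order.

    deps[i] lists the earlier instructions whose results instruction i needs.
    Assumption: every instruction produces its result one cycle after issue.
    """
    issue_cycle = []
    cycle, used = 0, 0
    for needs in deps:
        # Earliest cycle at which all operands are available.
        ready = max((issue_cycle[d] + 1 for d in needs), default=0)
        if ready > cycle:
            # Stall until the operands arrive; a fresh cycle has empty slots.
            cycle, used = ready, 0
        elif used == width:
            # All issue slots of this cycle are taken; move to the next one.
            cycle, used = cycle + 1, 0
        issue_cycle.append(cycle)
        used += 1
    return issue_cycle


# Hypothetical program: 0 -> 1 -> 2 and 3 -> 4 -> 5 are two dependence chains.
deps = [[], [0], [1], [], [3], [4]]

for width in (1, 2, 8):
    cycles = in_order_issue(deps, width)
    print(width, cycles, "total cycles:", cycles[-1] + 1)
```

Under these assumptions the program needs 6 cycles with one issue slot and 5 cycles with two, and eight slots do no better than two, because the dependences rather than the hardware limit the parallelism.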

Out-of-Order Execution
Allows the processor to execute any instruction whose operands are ready, regardless of program order
Enables more issue slots to be filled
Often out-of-order execution, but in-order commit
  – that is, write back results in the order they should have occurred
Note: IA-64 is in-order
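The idea of out-of-order execution with in-order commit can be sketched with a small simulation. This is a simplified illustration under stated assumptions: one-cycle latency, an unlimited instruction window, unlimited commit bandwidth, and a made-up program (`deps`) with two independent dependence chains. The function names are hypothetical.

```python
def out_of_order_issue(deps, width):
    """Each cycle, issue up to `width` ready instructions, oldest first.

    deps[i] lists the earlier instructions whose results instruction i needs.
    """
    n = len(deps)
    issue_cycle = [None] * n
    cycle = 0
    while any(c is None for c in issue_cycle):
        issued = 0
        for i in range(n):
            if issued == width:
                break
            if issue_cycle[i] is not None:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(issue_cycle[d] is not None and issue_cycle[d] < cycle
                   for d in deps[i]):
                issue_cycle[i] = cycle
                issued += 1
        cycle += 1
    return issue_cycle


def commit_cycles(issue_cycle):
    """In-order commit: no instruction commits before its predecessor."""
    commits, last = [], 0
    for c in issue_cycle:
        last = max(last, c + 1)  # result exists one cycle after issue
        commits.append(last)
    return commits


# Hypothetical program: 0 -> 1 -> 2 and 3 -> 4 -> 5 are two dependence chains.
deps = [[], [0], [1], [], [3], [4]]

issued = out_of_order_issue(deps, width=2)
print("issue: ", issued)
print("commit:", commit_cycles(issued))
```

With two issue slots, instruction 3 issues in cycle 0 alongside instruction 0 even though it comes later in the program, and the whole program issues in 3 cycles. The commit times are still non-decreasing in program order, which is what "in-order commit" requires.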

Longer Pipelines
Pipelines are getting longer
  – original RISC pipelines had 5 stages
  – pipelines now have up to 20 stages
Allows the clock cycle to be very fast
Okay as long as you can accurately predict branches (or get rid of them)
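A back-of-the-envelope calculation shows why branch prediction matters more as pipelines get deeper. The numbers below are assumptions chosen for illustration, not measurements: 20% of instructions are branches, a misprediction costs one pipeline refill (stages minus one cycles), and the cycle time shrinks in proportion to the number of stages.

```python
def time_per_instruction(stages, accuracy, branch_fraction=0.2, logic_delay=1.0):
    """Average time per instruction in units of the unpipelined logic delay."""
    cycle_time = logic_delay / stages      # assumed: deeper pipe, shorter cycle
    penalty = stages - 1                   # assumed: a mispredict refills the pipe
    cpi = 1 + branch_fraction * (1 - accuracy) * penalty
    return cpi * cycle_time


for accuracy in (0.95, 0.50):
    short = time_per_instruction(5, accuracy)
    deep = time_per_instruction(20, accuracy)
    print(f"accuracy {accuracy:.2f}: 20 stages is {short / deep:.2f}x faster than 5")
```

Under these assumptions the 20-stage pipeline is about 3.5 times faster than the 5-stage one with a 95% accurate predictor, but only about 1.9 times faster if half of all branches are mispredicted.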

Speculation
Prediction
  – better branch predictors (95% accurate)
  – predict many levels of branches
  – predict variable values
  – predict load addresses
Simultaneously execute both paths of a branch
Execute instructions even if there could be a dependency
  – sw after lw could be the same address, but probably not
  – let the sw execute and then fix it if you were wrong
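As one concrete example of branch prediction, here is a sketch of the classic 2-bit saturating counter. The slide does not say which predictor it has in mind, so this is only an illustration; the branch history (a loop branch taken 9 times, then not taken once, repeated 10 times) is made up.

```python
def two_bit_accuracy(outcomes):
    """Fraction of branches predicted correctly by one 2-bit saturating counter.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """
    counter = 2  # assumed starting state: weakly taken
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Move toward 3 on a taken branch, toward 0 otherwise, saturating.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then falls through, 10 times over.
history = ([True] * 9 + [False]) * 10
print(two_bit_accuracy(history))
```

On this history the predictor is wrong only on each loop exit, giving 90% accuracy. Reaching the 95% mentioned above on real programs takes more elaborate schemes than a single counter.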

Predicated Execution
Predicated execution allows conditional moves and conditional adds instead of only conditional branches
Avoids branches, which are bad because pipelines are so long
IA-64: almost everything is predicated (many 1-bit predicate registers)
The homework problem with movn and movz was an example of this
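The effect of conditional moves can be modeled in a few lines. The sketch below imitates the semantics of the MIPS `movn` and `movz` instructions (move if the condition register is non-zero or zero, respectively) and uses them to compute a maximum without a branch in the modeled instruction stream. It is a software model only; Python itself still evaluates a conditional inside each helper, and `max_predicated` is a hypothetical name.

```python
def movn(rd, rs, rt):
    """Model of MIPS movn: the new value of rd is rs if rt != 0, else rd."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Model of MIPS movz: the new value of rd is rs if rt == 0, else rd."""
    return rs if rt == 0 else rd


def max_branching(a, b):
    # Conventional version: a conditional branch decides which value to keep.
    if a < b:
        return b
    return a


def max_predicated(a, b):
    # Predicated version: every step below always executes.
    p = int(a < b)               # like slt: p = 1 when a < b, else 0
    result = a                   # unconditional move
    result = movn(result, b, p)  # overwrite with b only when p != 0
    return result


for a, b in [(3, 7), (7, 3), (5, 5), (-2, -9)]:
    print(a, b, max_branching(a, b), max_predicated(a, b))
```

Both versions return the same value for every pair, but the predicated one gives the pipeline nothing to mispredict.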

VLIW
Long Instruction Words (LIW) and Very Long Instruction Words (VLIW)
  – each instruction contains multiple smaller instructions that execute in parallel
  – (V)LIW instructions can be 128 to 1024 bits long and contain 3 to 16 instructions
It's the compiler's job to find independent instructions to execute
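The compiler's job of finding independent instructions can be sketched as a greedy packing pass. This is a minimal illustration, not how a production VLIW compiler works: it assumes one-cycle operations, considers only true data dependences, and uses a made-up program (`deps`). `pack_bundles` is a hypothetical name.

```python
def pack_bundles(deps, slots):
    """Greedily pack operations into long instruction words.

    deps[i] lists the earlier operations whose results operation i needs.
    Each operation goes into the first bundle that is later than all of its
    producers and still has a free slot.
    """
    bundle_of = []  # bundle index chosen for each operation
    bundles = []    # bundles[b] = operations packed into long word b
    for i, needs in enumerate(deps):
        b = max((bundle_of[d] + 1 for d in needs), default=0)
        while b < len(bundles) and len(bundles[b]) == slots:
            b += 1
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(i)
        bundle_of.append(b)
    return bundles


# Hypothetical program: 0 -> 1 -> 2 and 3 -> 4 -> 5 are two dependence chains.
deps = [[], [0], [1], [], [3], [4]]

for word in pack_bundles(deps, slots=3):
    # Unused slots would be filled with no-ops in the emitted long word.
    print(word + ["nop"] * (3 - len(word)))
```

For this program each 3-slot word carries two useful operations and one no-op, which shows how the achievable parallelism depends on what the compiler can prove independent.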

Register Windows
Saving registers on the stack during a procedure call hurts performance
Register windows use a stack of registers that are allocated to a procedure as it needs them

Local Name   Actual Name
             ...
             r76
             r75
             r74
             r73
t2           r72
t1           r71
t0           r70
t1           r69
t0           r68
t2           r67
t1           r66
t0           r65
             ...

Procedures shown: Foo(), Bar(), Baz()
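The mapping from local names to actual registers can be sketched as a small allocator. The sketch assumes that windows are allocated upward from r65 and that the register stack ends at r76, matching the register numbers in the table; the slide does not say which direction the stack grows or which procedure owns which window, so those details and the class name `RegisterWindows` are illustrative assumptions.

```python
class RegisterWindows:
    """Toy register stack: each call gets a fresh window of registers."""

    def __init__(self, base=65, limit=76):
        self.next_free = base  # assumed: windows grow upward from r65
        self.limit = limit     # assumed: last register in the stack is r76
        self.frames = []       # (window base, window size) per active call

    def call(self, size):
        """Enter a procedure that needs `size` local registers."""
        if self.next_free + size - 1 > self.limit:
            # A real design would spill the oldest window to memory here.
            raise OverflowError("register stack full")
        self.frames.append((self.next_free, size))
        self.next_free += size

    def ret(self):
        """Leave the current procedure and release its window."""
        base, _ = self.frames.pop()
        self.next_free = base

    def actual(self, local):
        """Translate local register t<local> of the current procedure."""
        base, size = self.frames[-1]
        if not 0 <= local < size:
            raise IndexError("no such local register")
        return f"r{base + local}"


windows = RegisterWindows()
windows.call(3)
print(windows.actual(0), windows.actual(2))  # first window
windows.call(2)
print(windows.actual(0), windows.actual(1))  # second window
windows.call(3)
print(windows.actual(0), windows.actual(2))  # third window
```

With window sizes 3, 2 and 3, the three nested calls see t0 as r65, r68 and r70 respectively, so no call has to save registers to memory as long as the register stack does not overflow.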

Smarter Compilers
VLIW requires good compilers
Predicated execution and speculation need help from the compiler
Old architectures had instructions to emulate high-level constructs (bad)
New architectures provide many general instructions and instruction options
IA-64 will keep compiler writers busy for a decade

Multiple CPUs on a Chip
Chip multiprocessors
  – multiple simple CPUs, but share a cache
  – can run multiple programs simultaneously
  – single programs are no faster
  – like a multiprocessor machine but cheaper
Simultaneous Multithreading (SMT)
  – more complex CPUs
  – like chip multiprocessors + superscalar + out-of-order
  – also improves single program performance
  – developed at UW
  – memory bandwidth is an issue for both
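One way to picture SMT is that issue slots left empty by one thread are filled by another. The sketch below is a deliberately crude model under stated assumptions: each thread is described only by how many instructions it has ready in each cycle (made-up numbers), and instructions that do not issue are not carried over to the next cycle.

```python
def slots_filled(ready_per_cycle, width):
    """Issue slots used by one thread running alone on a `width`-wide CPU."""
    return sum(min(ready, width) for ready in ready_per_cycle)


def smt_slots_filled(threads, width):
    """Issue slots used when all threads share the same `width` slots."""
    return sum(min(sum(ready), width) for ready in zip(*threads))


# Hypothetical number of ready instructions per cycle for two threads.
thread_a = [1, 2, 0, 1]
thread_b = [2, 0, 1, 3]
width = 4

print("A alone:", slots_filled(thread_a, width), "of", width * len(thread_a))
print("B alone:", slots_filled(thread_b, width), "of", width * len(thread_b))
print("A and B:", smt_slots_filled([thread_a, thread_b], width))
```

Running alone, the two threads fill 4 and 6 of the 16 available slots; sharing the pipeline, they fill 10 of 16 in the same four cycles.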

Funky Hardware on a Chip
We can squeeze more and more transistors on a chip
What do we do with them?
Bigger caches (boring)
Put programmable hardware on the CPU
  – FPGAs can be (re)programmed quickly
  – hardware runs 1000X faster than software
Graphics specific hardware
Instruction Co-Processors
Simultaneously run two copies of all programs to avoid hardware glitches

Low Power
CPUs are being put in everything, even devices that have very small batteries (tiny sensors)
Need to make CPUs that use very little power (only as much as they need)
  – reduce the CPU clock frequency
  – allow the OS to turn off part of the chip
Transmeta is building chips that emulate Intel x86, but with less power

Time to Market
It used to be solely about being the fastest
Now being adequate is enough
Being the first technology to fill a need is the most important