TECH 6 VLIW Architectures {Very Long Instruction Word}

Slides:



Advertisements
Similar presentations
Computer Architecture Instruction-Level Parallel Processors
Advertisements

Superscalar and VLIW Architectures Miodrag Bolic CEG3151.
Intel Pentium 4 ENCM Jonathan Bienert Tyson Marchuk.
Instruction-Level Parallel Processors {Objective: executing two or more instructions in parallel} 4.1 Evolution and overview of ILP-processors 4.2 Dependencies.
VLIW Very Large Instruction Word. Introduction Very Long Instruction Word is a concept for processing technology that dates back to the early 1980s. The.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
1 Lecture 5: Static ILP Basics Topics: loop unrolling, VLIW (Sections 2.1 – 2.2)
ENGS 116 Lecture 101 ILP: Software Approaches Vincent H. Berk October 12 th Reading for today: , 4.1 Reading for Friday: 4.2 – 4.6 Homework #2:
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? ILP: VLIW Architectures Marco D. Santambrogio:
1 Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)
1 Lecture 18: VLIW and EPIC Static superscalar, VLIW, EPIC and Itanium Processor (First introduce fast and high- bandwidth L1 cache design)
Pipelining 5. Two Approaches for Multiple Issue Superscalar –Issue a variable number of instructions per clock –Instructions are scheduled either statically.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
1 Lecture 10: Static ILP Basics Topics: loop unrolling, static branch prediction, VLIW (Sections 4.1 – 4.4)
8 Processing of control transfer instructions TECH Computer Science 8.1 Introduction 8.2 Basic approaches to branch handling 8.3 Delayed branching 8.4.
Microprocessors VLIW Very Long Instruction Word Computing April 18th, 2002.
9. Code Scheduling for ILP-Processors TECH Computer Science {Software! compilers optimizing code for ILP-processors, including VLIW} 9.1 Introduction 9.2.
Transmeta’s Crusoe Architecture Umran A. Khan Microprocessors.
3.13. Fallacies and Pitfalls Fallacy: Processors with lower CPIs will always be faster Fallacy: Processors with faster clock rates will always be faster.
Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.
ELEC Fall 05 1 Very- Long Instruction Word (VLIW) Computer Architecture Fan Wang Department of Electrical and Computer Engineering Auburn.
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania Computer Organization Pipelined Processor Design 3.
Microprocessors Introduction to ia64 Architecture Jan 31st, 2002 General Principles.
Instruction Set Architecture (ISA) for Low Power Hillary Grimes III Department of Electrical and Computer Engineering Auburn University.
5 Pipelined Processor TECH Computer Science  temporal overlapping of processing, assembly line 5.1 Basic concept 5.2 Design space of pipelines 5.3 Overview.
The Pentium: A CISC Architecture Shalvin Maharaj CS Umesh Maharaj:
Reduced Instruction Set Computers (RISC) Computer Organization and Architecture.
Processor Organization and Architecture
High Performance Architectures
CH13 Reduced Instruction Set Computers {Make hardware Simpler, but quicker} Key features  Large number of general purpose registers  Use of compiler.
Basics and Architectures
Very Long Instruction Word (VLIW) Architecture. VLIW Machine It consists of many functional units connected to a large central register file Each functional.
INTRODUCTION Crusoe processor is 128 bit microprocessor which is build for mobile computing devices where low power consumption is required. Crusoe processor.
Is Out-Of-Order Out Of Date ? IA-64’s parallel architecture will improve processor performance William S. Worley Jr., HP Labs Jerry Huck, IA-64 Architecture.
Anshul Kumar, CSE IITD CS718 : VLIW - Software Driven ILP Example Architectures 6th Apr, 2006.
Transmeta and Dynamic Code Optimization Ashwin Bharambe Mahim Mishra Matthew Rosencrantz.
Hardware Support for Compiler Speculation
RISC By Ryan Aldana. Agenda Brief Overview of RISC and CISC Features of RISC Instruction Pipeline Register Windowing and renaming Data Conflicts Branch.
Spring 2003CSE P5481 VLIW Processors VLIW (“very long instruction word”) processors instructions are scheduled by the compiler a fixed number of operations.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
Advanced Processor Technology Architectural families of modern computers are CISC RISC Superscalar VLIW Super pipelined Vector processors Symbolic processors.
CS5222 Advanced Computer Architecture Part 3: VLIW Architecture
Ted Pedersen – CS 3011 – Chapter 10 1 A brief history of computer architectures CISC – complex instruction set computing –Intel x86, VAX –Evolved from.
 Introduction to SUN SPARC  What is CISC?  History: CISC  Advantages of CISC  Disadvantages of CISC  RISC vs CISC  Features of SUN SPARC  Architecture.
ECEG-3202 Computer Architecture and Organization Chapter 7 Reduced Instruction Set Computers.
Pipelining and Parallelism Mark Staveley
Transmeta’s New Processor Another way to design CPU By Wu Cheng
CISC and RISC 12/25/ What is CISC? acronym for Complex Instruction Set Computer Chips that are easy to program and which make efficient use of memory.
Advanced Computer Architecture pg 1 Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8) Henk Corporaal
Jan. 5, 2000Systems Architecture II1 Machine Organization (CS 570) Lecture 1: Overview of High Performance Processors * Jeremy R. Johnson Wed. Sept. 27,
High Performance Computing1 High Performance Computing (CS 680) Lecture 2a: Overview of High Performance Processors * Jeremy R. Johnson *This lecture was.
15-740/ Computer Architecture Lecture 2: ISA, Tradeoffs, Performance Prof. Onur Mutlu Carnegie Mellon University.
1 Aphirak Jansang Thiranun Dumrongson
Pentium 4 Deeply pipelined processor supporting multiple issue with speculation and multi-threading 2004 version: 31 clock cycles from fetch to retire,
Topics to be covered Instruction Execution Characteristics
Crusoe Processor Seminar Guide: By: - Prof. H. S. Kulkarni Ashish.
Advanced Architectures
VLIW Architecture FK Boachie..
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue
Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)
Superscalar Processors & VLIW Processors
The EPIC-VLIW Approach
The Pentium: A CISC Architecture
Henk Corporaal TUEindhoven 2011
CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue
Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.
Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)
Superscalar and VLIW Architectures
Design of Digital Circuits Lecture 19a: VLIW
Presentation transcript:

TECH 6 VLIW Architectures {Very Long Instruction Word} {EPIC: Explicit Parallel Instruction-set Computer} {CISC  RISC EPIC} 6.1 Basic principles 6.2 Overview of proposed and commercial VLIW architectures 6.3 Case study: The Trace 200 family TECH Computer Science CH01

Basic Structure of VLIW Architecture

Basic principle of VLIW Controlled by very long instruction words comprising a control field for each of the execution units Length of instruction depends on the number of execution units (5-30 EU) the code lengths required for controlling each EU (16-32 bits) 256 to 1024 bits Disadvantages: on average only some of the control fields will actually be used waste memory space and memory bandwidth e.g. Fortran code is 3 times larger for VLIW (Trace processor)

Instruction word format: Trace 7/200

VLIW: Static Scheduling of instructions/ Instruction scheduling done entirely by [software] compiler Lesser Hardware complexity translate to increase the clock rate raise the degree of parallelism (more EU); {Can this be utilized?} Higher Software (compiler) complexity compiler needs to aware hardware detail number of EU, their latencies, repetition rates, memory load-use delay, and so on cache misses: compiler has to take into account worst-case delay value this hardware dependency restricts the use of the same compiler for a family of VLIW processors

6.2 overview of proposed and commercial VLIW architectures

6.3 Case study: Trace 200

Trace 7/200 256-bit VLIW words Capable of executing 7 instructions/cycle 4 integer operations 2 FP 1 Conditional branch Found that every 5th to 8th operation on average is a conditional branch Use sophisticated branching scheme: multi-way branching capability executing multi-paths assign priority code corresponds to its relative order

Trace 28/200: storing long instructions 1024 bit per instruction a number of 32-bit fields maybe empty Storing scheme to save space 32-bit mask indicating each sub-field is empty or not followed by all sub-fields that are not empty resulting still 3 time larger memory space required to store Fortran code (vs. VAX object code) very complex hardware for cache fill and refill Performance data is Impressive indeed!

Trace: Performance data

Transmeta: Crusoe Full x86-compatible: by dynamic code translation High performance: 700MHz in mobile platforms

Crusoe “Remarkably low power consumption” “A full day of web browsing on a single battery charge” (333-400Mhz TM3120 in production now, running Linux! TM5400 runs Windows, production mid 2000) Playing DVD: Conventional? 105 C vs. Crusoe: 48 C

Intel: Itanium (VLIW) {EPIC} Processor