10-1 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of.

Slides:



Advertisements
Similar presentations
Instruction Level Parallelism and Superscalar Processors
Advertisements

CH14 Instruction Level Parallelism and Superscalar Processors
Datorteknik F1 bild 1 Higher Level Parallelism The PRAM Model Vector Processors Flynn Classification Connection Machine CM-2 (SIMD) Communication Networks.
Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Computer Organization and Architecture
Computer architecture
CSCI 4717/5717 Computer Architecture
1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
10-1 Chapter 10 - Trends in Computer Architecture Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Parallel.
1 Lecture 5: Part 1 Performance Laws: Speedup and Scalability.
Computer Organization and Architecture 18 th March, 2008.
Chapter 17 Parallel Processing.
Parallel Computer Architectures
10-1 Chapter 10 - Trends in Computer Architecture Principles of Computer Architecture by M. Murdocca and V. Heuring © 1999 M. Murdocca and V. Heuring Principles.
Processor Organization and Architecture
Advanced Computer Architectures
Parallel Computing Basic Concepts Computational Models Synchronous vs. Asynchronous The Flynn Taxonomy Shared versus Distributed Memory Interconnection.
10-1 Chapter 10 - Advanced Computer Architecture Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring.
RISC:Reduced Instruction Set Computing. Overview What is RISC architecture? How did RISC evolve? How does RISC use instruction pipelining? How does RISC.
1 Interconnects Shared address space and message passing computers can be constructed by connecting processors and memory unit using a variety of interconnection.
1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.
4-1 Chapter 4 - The Instruction Set Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of.
Pipeline And Vector Processing. Parallel Processing The purpose of parallel processing is to speed up the computer processing capability and increase.
Flynn’s Taxonomy SISD: Although instruction execution may be pipelined, computers in this category can decode only a single instruction in unit time SIMD:
Parallel Processing - introduction  Traditionally, the computer has been viewed as a sequential machine. This view of the computer has never been entirely.
5-1 Chapter 5 - Languages and the Machine Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of Computer.
6-1 Chapter 6 - Languages and the Machine Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring Computer.
Multiprocessing. Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"
CDA 3101 Fall 2013 Introduction to Computer Organization Computer Performance 28 August 2013.
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
July 2005Computer Architecture, Data Path and ControlSlide 1 Part IV Data Path and Control.
Computer Organization and Architecture Tutorial 1 Kenneth Lee.
Chapter 2 Data Manipulation. © 2005 Pearson Addison-Wesley. All rights reserved 2-2 Chapter 2: Data Manipulation 2.1 Computer Architecture 2.2 Machine.
E X C E E D I N G E X P E C T A T I O N S VLIW-RISC CSIS Parallel Architectures and Algorithms Dr. Hoganson Kennesaw State University Instruction.
Pipelining and Parallelism Mark Staveley
Performance Enhancement. Performance Enhancement Calculations: Amdahl's Law The performance enhancement possible due to a given design improvement is.
1 Basic Components of a Parallel (or Serial) Computer CPU MEM CPU MEM CPU MEM CPU MEM CPU MEM CPU MEM CPU MEM CPU MEM CPU MEM.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
EKT303/4 Superscalar vs Super-pipelined.
CPS 258, Fall 2004 Introduction to Computational Science.
Parallel Processing Presented by: Wanki Ho CS147, Section 1.
3/12/2013Computer Engg, IIT(BHU)1 INTRODUCTION-1.
Computer Engineering Rabie A. Ramadan Lecture 2. Table of Contents 2 Architecture Development and Styles Performance Measures Amdahl’s Law.
CPS 258 Announcements –Lecture calendar with slides –Pointers to related material.
LECTURE #1 INTRODUCTON TO PARALLEL COMPUTING. 1.What is parallel computing? 2.Why we need parallel computing? 3.Why parallel computing is more difficult?
Processor Performance & Parallelism Yashwant Malaiya Colorado State University With some PH stuff.
CPIT Program Execution. Today, general-purpose computers use a set of instructions called a program to process data. A computer executes the.
Processor Level Parallelism 1
Advanced Architectures
CHAPTER SEVEN PARALLEL PROCESSING © Prepared By: Razif Razali.
William Stallings Computer Organization and Architecture 8th Edition
buses, crossing switch, multistage network.
Parallel Processing - introduction
Chapter 14 Instruction Level Parallelism and Superscalar Processors
CSCE 212 Chapter 4: Assessing and Understanding Performance
Chapter 1 Fundamentals of Computer Design
Instruction Level Parallelism and Superscalar Processors
Pipelining and Vector Processing
buses, crossing switch, multistage network.
Overview Parallel Processing Pipelining
Part IV Data Path and Control
COMPUTER ORGANIZATION AND ARCHITECTURE
Presentation transcript:

10-1 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Principles of Computer Architecture Miles Murdocca and Vincent Heuring Chapter 10: Trends in Computer Architecture

10-2 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Chapter Contents 10.1 Quantitative Analyses of Program Execution 10.2 From CISC to RISC 10.3 Pipelining the Datapath 10.4 Overlapping Register Windows 10.5 Multiple Instruction Issue (Superscalar) Machines – The PowerPC 10.6 Case Study: The PowerPC™ 601 as a Superscalar Architecture 10.7 VLIW Machines 10.8 Case Study: The Intel IA-64 (Merced) Architecture 10.9 Parallel Architecture Case Study: Parallel Processing in the Sega Genesis

10-3 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Instruction Frequency Frequency of occurrence of instruction types for a variety of languages. The percentages do not sum to 100 due to roundoff. (Adapted from Knuth, D. E., An Empirical Study of FORTRAN Programs, Software—Practice and Experience, 1, , 1971.)

10-4 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Complexity of Assignments Percentages showing complexity of assignments and procedure calls. (Adapted from Tanenbaum, A., Structured Computer Organization, 4/e, Prentice Hall, Upper Saddle River, New Jersey, 1999.)

10-5 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Speedup and Efficiency Speedup S is the ratio of the time needed to execute a program without an enhancement to the time required with an enhancement. Time T is computed as the instruction count IC times the number of cycles per instruction CPI times the cycle time . Substituting T into the speedup percentage calculation above yields:

10-6 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Example Example: Estimate the speedup obtained by replacing a CPU having an average CPI of 5 with another CPU having an average CPI of 3.5, with the clock period increased from 100 ns to 120 ns. The previous equation becomes:

10-7 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Four-Stage Instruction Pipeline

10-8 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Pipeline Behavior Pipeline behavior during a memory reference and during a branch.

10-9 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Filling the Load Delay Slot SPARC code, (a) with a nop inserted, and (b) with srl migrated to nop position.

10-10 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Call-Return Behavior Call-return behavior as a function of nesting depth and time (Adapted from Stallings, W., Computer Organization and Architecture: Designing for Performance, 4/e, Prentice Hall, Upper Saddle River, 1996).

10-11 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization SPARC Registers User view of RISC I registers.

10-12 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Overlapping Register Windows

10-13 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Example: Compiled C Program Source code for C program to be compiled with gcc.

10-14 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization gcc Generated SPARC Code

10-15 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization gcc Generated SPARC Code (cont’)

10-16 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Effect of Compiler Optimization SPARC code generated with the -O optimization flag:

10-17 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization The PowerPC 601 Architecture

10-18 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization 128-Bit IA-64 Instruction Word

10-19 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Parallel Speedup and Amdahl’s Law In the context of parallel processing, speedup can be computed: Amdahl’s law, for p processors and a fraction f of unparallelizable code: For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used:

10-20 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Efficiency and Throughput Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is: Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O bound and pipelined applications. For the case of a four stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then:

10-21 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Flynn Taxonomy Classification of architectures according to the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD; (d) MISD.

10-22 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Network Topologies Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.

10-23 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Crossbar Internal organization of a crossbar.

10-24 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Crosspoint Settings (a) Crosspoint settings for connections 0  3 and 3  0; (b) adjusted settings to accommodate connection 1  1.

10-25 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Three-Stage Clos Network

10-26 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization 12- Channel Three- Stage Clos Network with n = p = 6

10-27 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization 12- Channel Three- Stage Clos Network with n = p = 2

10-28 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization 12-Channel Three-Stage Clos Network with n = p = 4

10-29 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization 12-Channel Three-Stage Clos Network with n = p = 3

10-30 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization C function computes (x 2 + y 2 )  y 2

10-31 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Dependency Graph (a) Control sequence for C program; (b) dependency graph for C program.

10-32 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Matrix Multiplication (a) Problem setup for Ax = b; (b) equations for computing the b i.

10-33 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Matrix Multiplication Dependency Graph

10-34 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization The Connection Machine CM-1 Block diagram of the CM-1 (Adapted from Hillis, W. D., The Connection Machine, The MIT Press, 1985).

10-35 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization CM-1 Router Network A four-space hypercube for the router network.

10-36 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization CM-1 Processing Element

10-37 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization The Connection Machine CM-5

10-38 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Partitions on the CM-5

10-39 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Fat Tree

10-40 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Parallel Processing in Sega Genesis External view of the Sega Genesis home video game system.

10-41 Chapter 10 - Trends in Computer Architecture Department of Information Technology, Radford University ITEC 352 Computer Organization Sega Genesis Architecture External view of the Sega Genesis home video game system.