Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February 2, 2006 Session 6.


Contents
• Reservation Table
• Latency Analysis
• State Diagrams
• MAL and its bounds
• Delay Insertion
• Throughput
• Group Work

Reservation Table
• A reservation table displays the time-space flow of data through the pipeline for one function evaluation
• A static pipeline is specified by a single reservation table
• A dynamic pipeline may be specified by multiple reservation tables

Static Pipeline
[Reservation table: stages S1, S2, S3, S4 (rows) against time (columns), with one checkmark (X) per stage]

Dynamic Pipeline
[Reservation table for function X: S1 has three checkmarks, S2 two, S3 three]
[Reservation table for function Y: S1 has two checkmarks, S2 one, S3 three]

Reservation Table (Cont.)
• The number of columns in a reservation table is called the evaluation time of a given function
• The checkmarks in a row correspond to the time instants (cycles) at which a particular stage is used
• Multiple checkmarks in a row → repeated usage of the same stage in different cycles

Reservation Table (Cont.)
• Contiguous checkmarks in a row → extended usage of a stage over more than one cycle
• Multiple checkmarks in one column → multiple stages are used in parallel
• A dynamic pipeline may allow different initiations to follow a mix of reservation tables

Reservation Table
[Reservation table: stage A has three checkmarks, B two, C two, D one]

Latency Analysis
• The number of cycles between two initiations is the latency between them
• A latency of k → two initiations are separated by k cycles
• Collision → resource conflict between two initiations
• Latencies that cause collisions → forbidden latencies
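The definition of forbidden latencies can be made concrete with a short sketch. Two checkmarks in the same row that are d columns apart mean that a second initiation started d cycles after the first needs that stage in the same cycle, so d is forbidden. The table `TABLE_X` below is an assumption: the transcript preserves only the number of checkmarks per row for function X (3, 2, 3), so the column positions here are illustrative. They were chosen so that the result agrees with the forbidden latencies 2, 4, 5, 7 listed for X in these slides.

```python
from itertools import combinations

# ASSUMED reservation table for function X: stage -> busy cycles (1-based).
# Only the per-row checkmark counts come from the slides; the column
# positions are illustrative.
TABLE_X = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 5, 7],
}


def forbidden_latencies(table):
    """Return the sorted latencies at which two initiations would collide."""
    forbidden = set()
    for cycles in table.values():
        # Two checkmarks in one row, d columns apart, collide at latency d.
        for a, b in combinations(cycles, 2):
            forbidden.add(abs(a - b))
    return sorted(forbidden)


print(forbidden_latencies(TABLE_X))  # [2, 4, 5, 7]
```

A row with a single checkmark contributes nothing, which is why a linear pipeline with one checkmark per stage has no forbidden latencies.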

Collision with latency 2 & 5 in evaluating X
[Figure: two reservation-table diagrams over stages S1, S2, S3 showing initiations X1 and X2 colliding, one at latency 2 and one at latency 5]

Latency Analysis (cont.)
• Latency Sequence → a sequence of permissible latencies between successive initiations
• Latency Cycle → a latency sequence that repeats the same subsequence (cycle) indefinitely
• Example: Latency Sequence → 1, 8
• Example: Latency Cycle → (1, 8) → 1, 8, 1, 8, 1, 8, …

Latency Analysis (cont.)
• Average Latency (of a latency cycle) → sum of all latencies / number of latencies along the cycle
• Constant Cycle → a latency cycle that contains only one latency value
• Objective → obtain the shortest average latency between initiations without causing collisions
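As a small worked example of the average-latency formula, the sketch below evaluates it for the two latency cycles these slides use for function X, (1, 8) and the constant cycle (6). The function name `average_latency` is an illustrative choice, not something from the course material.

```python
from fractions import Fraction


def average_latency(latency_cycle):
    """Sum of the latencies divided by the number of latencies in the cycle."""
    if not latency_cycle:
        raise ValueError("a latency cycle needs at least one latency")
    return Fraction(sum(latency_cycle), len(latency_cycle))


print(float(average_latency([1, 8])))  # 4.5
print(float(average_latency([6])))     # 6.0
```

`Fraction` keeps the result exact, which makes it easy to compare cycles whose averages differ only slightly.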

Latency Cycle (1,8)
[Figure: reservation table over stages S1, S2, S3 filled with successive initiations X1 to X6 under the latency cycle (1, 8)]
Average Latency = (1+8)/2 = 4.5

Latency Cycle (6)
[Figure: reservation table over stages S1, S2, S3 filled with successive initiations X1 to X4 under the constant latency cycle (6)]
Average Latency = 6

Collision Vector
• $C = (C_m, C_{m-1}, \ldots, C_2, C_1)$
• $C_i = 1$ if latency $i$ causes a collision (forbidden)
• $C_i = 0$ if latency $i$ is permissible
• $C_m = 1$ always, because $m$ is the maximum forbidden latency
• Maximum forbidden latency: $m \le n - 1$, where $n$ is the number of columns in the reservation table

Collision Vector (X after X)
• Forbidden Latencies: 2, 4, 5, 7
• Collision Vector = 1011010

Collision Vector (Y after Y)
• Forbidden Latencies: 2, 4
• Collision Vector = 1010
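The collision vector follows mechanically from the forbidden latencies, so it can be sketched in a few lines. The function below applies the definition $C_i = 1$ if and only if latency $i$ is forbidden, writing the bits from $C_m$ down to $C_1$. Representing the vector as a bit string is a choice made here for readability.

```python
def collision_vector(forbidden):
    """Return C = (C_m, ..., C_1) as a bit string; C_i is 1 iff i is forbidden."""
    forbidden = set(forbidden)
    if not forbidden:
        return ""  # no forbidden latency, so there is no collision vector bit
    m = max(forbidden)  # maximum forbidden latency, hence C_m = 1
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


print(collision_vector([2, 4, 5, 7]))  # 1011010  (X after X)
print(collision_vector([2, 4]))        # 1010     (Y after Y)
```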

Single Function Controller
[Figure: shift-register controller with the labels C.V. (X after X), Gate, OR, and Grant X; X is granted if the bit shifted out is 0]

Controller for a dual-function pipeline
[Figure: two shift-register controllers, each with a Gate and an OR. One uses C.V. (M after M) and C.V. (M after A) and grants M if the bit shifted out is 0; the other uses C.V. (A after M) and C.V. (A after A) and grants A if the bit shifted out is 0]

State Diagram
• It specifies the permissible state transitions among successive initiations
• The collision vector corresponds to the initial state at time t = 1 (initial collision vector)
• The next state comes at time t + p, where p is a permissible latency in the range 1 <= p < m

Right Shift Register
• The next state can be obtained with the help of an m-bit right shift register
• Each 1-bit shift corresponds to an increase in the latency by 1
• The bit shifted out indicates either a collision (1) or that it is safe to allow an initiation (0)
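The shift-register mechanism can be simulated cycle by cycle. The sketch below assumes that a request is always pending, so the controller initiates as soon as the bit shifted out is 0, and that the collision vector is ORed back into the register on every grant. Both assumptions and the function name are illustrative. The vector 1011010 is the one that follows from the forbidden latencies 2, 4, 5, 7 given for X.

```python
def greedy_latencies(cv_bits, count):
    """Simulate the shift register and return the first `count` granted latencies.

    cv_bits is the collision vector as a bit string (C_m ... C_1).
    ASSUMPTION: a new request is always waiting, so every safe cycle is used.
    """
    cv = int(cv_bits, 2)
    register = cv            # state right after the first initiation
    latencies = []
    elapsed = 0
    while len(latencies) < count:
        elapsed += 1
        collision = register & 1   # bit shifted out in this clock cycle
        register >>= 1             # each shift increases the latency by 1
        if not collision:
            register |= cv         # grant: OR the initial C.V. back in
            latencies.append(elapsed)
            elapsed = 0
    return latencies


print(greedy_latencies("1011010", 4))  # [1, 8, 1, 8]
```

Always taking the first safe cycle yields the latency cycle (1, 8), whose average latency is 4.5, so the earliest-possible policy is not necessarily the best one.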

The next state
• The next state is obtained by bitwise ORing the initial collision vector with the shifted register
• Example with C.V. = 1011010 (first state):
     0101101   C.V. right-shifted by 1 bit
  OR 1011010   initial C.V.
   = 1111111   next state
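The shift-and-OR rule can be written as a single function. This is a minimal sketch that represents states as bit strings (C_m ... C_1) and rejects forbidden latencies; the name `next_state` and the error handling are choices made here for illustration.

```python
def next_state(state, initial_cv, p):
    """Return the state reached by initiating p cycles after `state`.

    Both `state` and `initial_cv` are bit strings of equal length m,
    written C_m ... C_1.
    """
    m = len(initial_cv)
    if p < 1:
        raise ValueError("latency must be at least 1")
    # C_p sits at string index m - p; a 1 there means latency p is forbidden.
    if p <= m and state[m - p] == "1":
        raise ValueError(f"latency {p} is forbidden in state {state}")
    shifted = int(state, 2) >> p                 # shift right by p bits
    return format(shifted | int(initial_cv, 2), f"0{m}b")


cv = "1011010"
print(next_state(cv, cv, 1))         # 1111111
print(next_state(cv, cv, 3))         # 1011011
print(next_state("1111111", cv, 8))  # 1011010
```

Any latency greater than m shifts every bit out, so the pipeline returns to the initial state.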

State Diagram for X
[Figure: state diagram for X; edges are labeled with latencies, and minimum-latency edges such as 1* and 3* are marked with an asterisk]

Cycles
• Simple cycles → each state appears only once: (3), (6), (8), (1, 8), (3, 8), and (6, 8)
• Greedy cycles → simple cycles whose edges are all made with minimum latencies from their respective starting states: (1, 8) and (3) → one of them gives the MAL

MAL
• Minimum Average Latency
• At least one of the greedy cycles will lead to the MAL
• Consider the state diagram for Y: the MAL is 3 (see diagram)

State Diagram for Y
[Figure: state diagram for Y; edges are labeled with latencies, and minimum-latency edges are marked with an asterisk]
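The state diagram, its simple and greedy cycles, and the MAL can all be derived from the collision vector alone. The following is a sketch under stated assumptions: states are stored as integers, the single latency m + 1 stands for "m + 1 or more" (written 8+ for X), and the MAL is taken as the smallest average over the greedy cycles, relying on the statement above that at least one greedy cycle achieves it. All function names are illustrative.

```python
from fractions import Fraction


def build_state_diagram(cv_bits):
    """Map each reachable state to {latency: next_state}."""
    m, init = len(cv_bits), int(cv_bits, 2)
    diagram, todo = {}, [init]
    while todo:
        s = todo.pop()
        if s in diagram:
            continue
        # Latency p is permissible if bit C_p of s is 0; p = m + 1 always is.
        diagram[s] = {p: (s >> p) | init
                      for p in range(1, m + 2)
                      if p > m or not (s >> (p - 1)) & 1}
        todo.extend(diagram[s].values())
    return diagram


def simple_cycles(diagram):
    """Return every simple cycle once, as a tuple of latencies."""
    cycles = []
    for start in sorted(diagram):
        stack = [(start, (), {start})]
        while stack:
            state, lats, seen = stack.pop()
            for p, nxt in diagram[state].items():
                if nxt == start:
                    cycles.append(lats + (p,))
                elif nxt > start and nxt not in seen:  # avoid rotated duplicates
                    stack.append((nxt, lats + (p,), seen | {nxt}))
    return cycles


def greedy_cycles(diagram):
    """Follow the minimum-latency edge from every state until a state repeats."""
    found = set()
    for start in diagram:
        path, lats, s = [], [], start
        while s not in path:
            p = min(diagram[s])
            path.append(s)
            lats.append(p)
            s = diagram[s][p]
        cyc = lats[path.index(s):]
        # Canonical rotation, so (8, 1) and (1, 8) count as one cycle.
        found.add(min(tuple(cyc[k:] + cyc[:k]) for k in range(len(cyc))))
    return found


def minimum_average_latency(cv_bits):
    cycles = greedy_cycles(build_state_diagram(cv_bits))
    return min(Fraction(sum(c), len(c)) for c in cycles)


diagram_x = build_state_diagram("1011010")
print(sorted(simple_cycles(diagram_x)))  # [(1, 8), (3,), (3, 8), (6,), (6, 8), (8,)]
print(sorted(greedy_cycles(diagram_x)))  # [(1, 8), (3,)]
print(minimum_average_latency("1011010"))  # 3
print(minimum_average_latency("1010"))     # 3
```

For X this reproduces the six simple cycles and the two greedy cycles listed on the Cycles slide, and for Y it gives the MAL of 3 stated on the MAL slide.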

Bounds on the MAL
• The MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table. (Shar, 1972)
• The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram. (Shar, 1972)
• The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1. This is also an upper bound on the MAL. (Shar, 1972)
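The first and third bounds need only the per-row checkmark counts and the collision vector, so they are easy to evaluate. The sketch below uses the row counts that survive in these slides for X (3, 2, 3) and Y (2, 1, 3) together with the collision vectors 1011010 and 1010 that follow from their forbidden latencies; the function name is an illustrative choice.

```python
def mal_bounds(row_checkmarks, cv_bits):
    """Return (lower, upper) bounds on the MAL.

    lower: maximum number of checkmarks in any row of the reservation table
    upper: number of 1's in the initial collision vector, plus 1
    """
    lower = max(row_checkmarks)
    upper = cv_bits.count("1") + 1
    return lower, upper


print(mal_bounds([3, 2, 3], "1011010"))  # (3, 5)  function X
print(mal_bounds([2, 1, 3], "1010"))     # (3, 3)  function Y
```

For Y the two bounds coincide, which pins the MAL at 3 without drawing the state diagram. For X the bounds leave the range 3 to 5, and the greedy cycle (3) shows that the lower bound is met.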

Delay Insertion
• The purpose is to modify the reservation table, yielding a new collision vector
• This may lead to a modified state diagram, which may produce greedy cycles meeting the lower bound on the MAL

Example
[Figure: three-stage pipeline with stages S1, S2, S3 and an output]

Example (Cont.)
[Reservation table: S1, S2, and S3 each have two checkmarks]
• Forbidden Latencies: 1, 2, 4
• C.V. → 1011

Example (Cont.)
[Figure: state diagram with edges labeled by latency, including a minimum-latency edge marked with an asterisk and an edge labeled 5+]
MAL = 3

Example (Cont.)
[Figure: the same pipeline with stages S1, S2, S3, an output, and inserted delay stages D1 and D2]

Example (Cont.)
[Reservation table: S1, S2, and S3 each have two checkmarks; D1 and D2 each have one]
• Forbidden Latencies: 2, 6
• C.V. → 100010
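The effect of the inserted delays can be checked directly from the two sets of forbidden latencies. A latency cycle is collision-free exactly when no two initiations are separated by a forbidden latency, so the sketch below unrolls a cycle and tests every pair of initiation times. The candidate cycle (1, 3) is not stated on the slides; it is a cycle worked out here for illustration, and the function name is likewise an assumption.

```python
from itertools import accumulate, combinations


def is_collision_free(latency_cycle, forbidden):
    """True if repeating `latency_cycle` forever never causes a collision."""
    forbidden = set(forbidden)
    period = sum(latency_cycle)
    # Unroll far enough that every pair closer than the largest forbidden
    # latency appears, whatever position in the cycle the pair starts from.
    reps = max(forbidden, default=0) // period + 2
    times = [0, *accumulate(list(latency_cycle) * reps)]
    return all((b - a) not in forbidden for a, b in combinations(times, 2))


before = {1, 2, 4}   # forbidden latencies of the original table
after = {2, 6}       # forbidden latencies after inserting D1 and D2

print(is_collision_free([3], before))     # True
print(is_collision_free([1, 3], before))  # False
print(is_collision_free([1, 3], after))   # True
```

Under these assumptions the modified table admits the cycle (1, 3) with average latency 2, which equals the lower bound given by the two checkmarks per row, whereas the original table has a MAL of 3.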

Group Activity 1
Find the State Diagram

Pipeline Throughput
• The average number of task initiations per clock cycle
• The inverse of the MAL
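Since throughput is the inverse of the MAL, converting it to initiations per second only requires the clock period. The sketch below assumes a MAL of 3 cycles (the value found for X and Y) and a clock period of 20 ns; the 20 ns figure is borrowed from the group activity on the assumption that t there denotes the clock period. The function names are illustrative.

```python
def throughput_per_cycle(mal):
    """Average number of initiations per clock cycle: the inverse of the MAL."""
    return 1 / mal


def throughput_per_second(mal, clock_period_s):
    """Initiations per second, ASSUMING a fixed clock period in seconds."""
    return throughput_per_cycle(mal) / clock_period_s


rate = throughput_per_second(mal=3, clock_period_s=20e-9)
print(f"{rate / 1e6:.2f} million initiations per second")  # 16.67 million initiations per second
```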

Group Activity
[Reservation table: S1 has two checkmarks, S2 one, S3 one]
Find: C.V., State Diagram, Simple Cycles, Greedy Cycles, MAL, Throughput (t = 20 ns)