
1 ECE 232 Hardware Organization and Design
Lecture 4: Performance, Design with VHDL
Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html
Adapted from Patterson 97 ©UCB. Copyright 1998 Morgan Kaufmann Publishers.

2 Outline
° Performance, evaluation
    Metrics: MIPS, CPI, execution time
    Amdahl’s law
° VHDL basics
    Combinational logic
    Examples
° Instruction formats, cont’d
    Addressing classes, modes
    Examples
    MIPS assembly

3 Two notions of “performance”
° Time to do the task (Execution Time): execution time, response time, latency
° Tasks per day, hour, week, sec, ns... (Performance): throughput, bandwidth
Response time and throughput are often in opposition.

Plane        Speed      NY to Paris   Passengers   Throughput (pmph)
Boeing 747   610 mph    6.5 hours     470          286,700
Concorde     1350 mph   3 hours       132          178,200

Throughput is measured in passenger-miles per hour (passengers × mph).
Which has higher performance?

4 Performance - Example
° Time of Concorde vs. Boeing 747?
    Concorde is 1350 mph / 610 mph = 2.2 times faster (about the same ratio as 6.5 hours / 3 hours)
° Throughput of Concorde vs. Boeing 747?
    Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
    Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
° Boeing is 1.6 times (“60%”) faster in terms of throughput
° Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job.
Performance is in units of things/time_unit: bigger is better.
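
The two ratios on this slide can be checked with a few lines of Python. This is a minimal sketch that uses only the figures from the Boeing 747 / Concorde table; the dictionary layout and names such as `planes` and `throughput_pmph` are illustrative assumptions, not part of the lecture.

```python
# Figures from the Boeing 747 / Concorde table; all names are illustrative.
planes = {
    "Boeing 747": {"speed_mph": 610, "trip_hours": 6.5, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "trip_hours": 3.0, "passengers": 132},
}

def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]

boeing, concorde = planes["Boeing 747"], planes["Concorde"]

# Execution-time view: ratio of trip times (a smaller time means faster).
time_ratio = boeing["trip_hours"] / concorde["trip_hours"]

# Throughput view: ratio of passenger-miles per hour (bigger is better).
throughput_ratio = throughput_pmph(boeing) / throughput_pmph(concorde)

print(f"Concorde is {time_ratio:.1f}x faster in flying time")
print(f"Boeing 747 is {throughput_ratio:.1f}x faster in throughput")
```

Which plane "wins" depends entirely on which ratio is taken, which is why the rest of the lecture fixes on execution time for a single job.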

5 Metrics of performance
Metrics:
° Answers per month
° Useful operations per second
° (Millions) of instructions per second: MIPS
° (Millions) of (F.P.) operations per second: MFLOP/s
° Megabytes per second
° Cycles per second (clock rate)
[Figure: levels of the system at which these metrics apply, from top to bottom: Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins]
Each metric has a place and a purpose, and each can be misused.

6 Review: Aspects of CPU Performance

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

               Instr count   CPI   Clock rate
Program             X
Compiler            X         X
Instr. Set          X         X
Organization                  X        X
Technology                             X

7 MIPS, CPI

$$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{Instr}_i$$

where CPI_i is the cycles per instruction for class i and Instr_i is the number of instructions of class i.

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{\text{Instr}_i}{\text{Instruction Count}} \quad \text{("instruction frequency")}$$

° Invest resources where time is spent!

CPI = average number of cycles per instruction

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}}$$

MIPS = number of instructions per second (in millions)

$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Execution Time} \times 10^{6}}$$
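
As a rough illustration of how these formulas fit together, the sketch below computes CPU time and the MIPS rating from an instruction count, an average CPI, and a clock rate. The function names and the sample numbers (10 million instructions, a CPI of 2.2, a 100 MHz clock) are assumptions chosen for illustration; they do not come from the lecture.

```python
def cpu_time_seconds(instr_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instr_count * cpi / clock_rate_hz

def mips_rating(instr_count, exec_time_seconds):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instr_count / (exec_time_seconds * 1e6)

# Hypothetical program and machine (illustrative values only).
instr_count = 10_000_000
cpi = 2.2
clock_rate_hz = 100e6

t = cpu_time_seconds(instr_count, cpi, clock_rate_hz)
print(f"CPU time: {t:.3f} s")
print(f"MIPS:     {mips_rating(instr_count, t):.1f}")
```

Substituting the first formula into the second shows that MIPS reduces to clock rate / (CPI × 10^6). The instruction count cancels out, which is one reason a MIPS rating alone cannot compare machines with different instruction sets.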

8 Evaluating Instruction Sets
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static metrics:
° How many bytes does the program occupy in memory?
Dynamic metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best metric: time to execute the program!
NOTE: this depends on the instruction set, processor organization, and compilation techniques.

9 Example (RISC processor)
Typical mix, base machine (Reg / Reg):

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total                    2.2

° How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
° How does this compare with using branch prediction to shave a cycle off the branch time?
° What if two ALU instructions could be executed at once?
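
One way to work through the three questions is to recompute the weighted CPI for each change and compare it with the base CPI of 2.2. The sketch below does this under two assumptions that the slide does not state: the clock rate and instruction count stay fixed, and executing two ALU instructions at once is modeled as halving the effective ALU cycle count. The function names are illustrative.

```python
# Instruction mix from the slide: op -> (frequency, cycles).
base_mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

def average_cpi(mix):
    """Weighted CPI: sum of frequency x cycles over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())

def speedup_with(mix, op, new_cycles):
    """Speedup when one class's cycle count changes (same clock, same count)."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cycles)
    return average_cpi(mix) / average_cpi(changed)

print(f"Base CPI:            {average_cpi(base_mix):.2f}")
print(f"Loads in 2 cycles:   {speedup_with(base_mix, 'Load', 2):.3f}x")
print(f"Branches in 1 cycle: {speedup_with(base_mix, 'Branch', 1):.3f}x")
# Assumption: pairing ALU instructions halves their effective cycle count.
print(f"Two ALU ops at once: {speedup_with(base_mix, 'ALU', 0.5):.3f}x")
```

Under these assumptions the faster loads give the largest gain (2.2 / 1.6, about 1.375), ahead of pairing ALU instructions (2.2 / 1.95, about 1.13) and the branch improvement (2.2 / 2.0 = 1.1). Loads account for the largest share of execution time, so that is where the investment pays off most.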

10 Amdahl's Law
Speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Exec\_time without } E}{\text{Exec\_time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{Exec\_time}(\text{with } E) = \left( (1 - F) + \frac{F}{S} \right) \times \text{Exec\_time}(\text{without } E)$$

$$\text{Speedup}(\text{with } E) = \frac{1}{(1 - F) + \dfrac{F}{S}}$$
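
The formula can be tried out on the load-time question from the RISC example slide. This is a minimal sketch: the function name `amdahl_speedup` is an illustrative choice, and the inputs are read off that slide's table (loads contribute 1.0 of the 2.2 average cycles per instruction, and the better cache cuts load time from 5 cycles to 2).

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Loads take 1.0 of the 2.2 average cycles per instruction (about 45% of time),
# and a better cache cuts load time from 5 cycles to 2, a factor of 2.5.
load_fraction = 1.0 / 2.2
print(f"Faster loads: {amdahl_speedup(load_fraction, 5 / 2):.3f}x")

# Even an infinitely fast enhancement is bounded by 1 / (1 - F).
print(f"Upper bound:  {1.0 / (1.0 - load_fraction):.3f}x")
```

The second line is the practical point of the law: however large S becomes, the speedup cannot exceed 1 / (1 - F), so the unaffected part of the task sets the limit.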

11 Review: Summary of the Design Process
° Hierarchical design to manage complexity
° Top down vs. bottom up vs. successive refinement
° Importance of design representations:
    Block diagrams
    Decomposition into bit slices
    Truth tables, K-maps
    Circuit diagrams
    Other descriptions: state diagrams, timing diagrams, register transfer, ...
° Optimization criteria:
    Gate count [package count]
    Logic levels
    Fan-in/fan-out
    Power
    Area
    Delay
    Cost
    Design time
    Pin out

12 Hardware Representation Languages
° Block Diagrams: FUs, Registers, & Dataflows
° Register Transfer Diagrams: Choice of busses to connect FUs, Regs
° Flowcharts, State Diagrams: two different ways to describe sequencing & microoperations
° Hardware Description Languages (Verilog HDL, VHDL): HW modules described like programs with i/o ports, internal state, & parallel execution of assignment statements
Descriptions in these languages can be used as input to:
    simulation systems: "software breadboard"
    synthesis systems: generate hw from high level description
"To Design is to Represent"

13 VHDL (VHSIC Hardware Description Language)
° Goals:
    Support design, documentation, and simulation of hardware
    Digital system level to gate level
    “Technology Insertion”
° Concepts:
    Design entity
    Time-based execution model
Design Entity = Hardware Component
Interface = External Characteristics
Architecture (Body) = Internal Behavior or Structure

14 VHDL Example: nand Gate
° Entity describes the interface
° Architecture gives the behavior (function)
° y is a signal, not a variable: it changes whenever the inputs change (the NAND process is in an infinite loop)
° BIT is 0 or 1. Can also use STD_LOGIC (0, 1, Z, X, among other values)

    -- The slide names this entity "nand"; NAND is a reserved word in VHDL,
    -- so the entity is called nand_gate here.
    ENTITY nand_gate IS
      PORT (a, b: IN BIT;   -- inputs
            y: OUT BIT);    -- output
    END nand_gate;

    ARCHITECTURE behavioral OF nand_gate IS
    BEGIN
      y <= a NAND b;        -- concurrent signal assignment
    END behavioral;

[Figure: gate block with inputs a, b and output y; the entity and port names are user-given]

15 Modeling Delays
° Model temporal, as well as functional, behavior with delays in signal assignment statements. Time is one difference from programming languages
° Output y changes 1 ns after a or b changes
° Delay statements are not supported by synthesis tools (non-synthesizable)

    -- Entity named nand_gate because NAND is a reserved word in VHDL.
    ENTITY nand_gate IS
      PORT (a, b: IN BIT;
            y: OUT BIT);
    END nand_gate;

    ARCHITECTURE behavioral OF nand_gate IS
    BEGIN
      y <= a NAND b AFTER 1 ns;   -- output updates 1 ns after an input changes
    END behavioral;

16 Bit-vector Operators
° A STD_LOGIC_VECTOR (31 downto 0) can be converted to a 32-bit integer

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY nand32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END nand32;

    ARCHITECTURE behavioral OF nand32 IS
    BEGIN
      y <= a NAND b;   -- bitwise NAND across all 32 bits
    END behavioral;

[Figure: nand32 block with inputs a[31:0], b[31:0] and output y[31:0]]

17 Simple Operators

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux2to1 IS
      PORT (a, b, sel: IN STD_LOGIC;
            y: OUT STD_LOGIC);
    END mux2to1;

    ARCHITECTURE logic OF mux2to1 IS
    BEGIN
      WITH sel SELECT
        y <= a WHEN '0',      -- sel = '0' selects a
             b WHEN OTHERS;   -- any other value of sel selects b
    END logic;

° Must use “others”, since sel can take values besides 0 and 1, such as Z and X (std_logic)
° You can also use other constructs: IF … THEN, WHEN, etc.
[Figure: mux2to1 block with data inputs a (selected by 0) and b (selected by 1), select input sel, and output y]

18 Arithmetic Operations

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY add32 IS
      PORT (a, b: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            y: OUT STD_LOGIC_VECTOR (31 DOWNTO 0));
    END add32;

    ARCHITECTURE behavioral OF add32 IS
    BEGIN
      y <= addum(a, b);   -- addum is the helper function named on the slide
    END behavioral;

° “addum” adds two n-bit vectors to produce an n+1 bit vector
° Alternatively, you can declare a, b, y as INTEGER and use y <= a + b.

19 Control Constructs
° Process fires whenever a signal in its “sensitivity list” changes
° Evaluates the body sequentially
° VHDL provides case statements as well

    LIBRARY ieee;
    USE ieee.std_logic_1164.all;

    ENTITY mux32 IS
      PORT (A, B: IN STD_LOGIC_VECTOR (31 DOWNTO 0);
            DOUT: OUT STD_LOGIC_VECTOR (31 DOWNTO 0);
            SEL: IN BIT);
    END mux32;

    ARCHITECTURE behavior OF mux32 IS
    BEGIN
      mux32_process: PROCESS (A, B, SEL)   -- sensitivity list
      BEGIN
        IF (SEL = '0') THEN
          DOUT <= A;
        ELSE
          DOUT <= B;
        END IF;
      END PROCESS;
    END behavior;

