
Slide 1: EE800 Circuit Elements in Digital Computations (Review). Professor S. Ko, Electrical and Computer Engineering, University of Saskatchewan, Spring 2010. (Course summary, March 2010.)

Slide 2: To Begin With
- Combinational logic vs. sequential logic
- Moore machine (output is a function of the current state only) vs. Mealy machine (output depends on the current state and the current input)
- Latch vs. flip-flop
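The Moore/Mealy distinction can be made concrete with a tiny example. The sketch below is not from the slides; the rising-edge-detector task, state names, and Python encoding are my own. It shows a Moore machine whose output comes from the state alone and a Mealy machine whose output also looks at the current input:

```python
# Minimal Moore vs. Mealy sketch for a rising-edge (0 -> 1) detector.
# The task, state names, and encoding are illustrative assumptions.

def moore_edge_detector(bits):
    """Moore: the output is a function of the current state only."""
    state, outputs = "SAW_0", []
    for b in bits:
        outputs.append(1 if state == "EDGE" else 0)         # output from state alone
        if state == "SAW_0" and b == 1:
            state = "EDGE"                                   # just saw a 0 -> 1 transition
        else:
            state = "SAW_1" if b == 1 else "SAW_0"
    return outputs

def mealy_edge_detector(bits):
    """Mealy: the output depends on the current state and the current input."""
    prev, outputs = 0, []
    for b in bits:
        outputs.append(1 if (prev == 0 and b == 1) else 0)   # state + input
        prev = b
    return outputs

print(moore_edge_detector([0, 1, 1, 0, 1]))  # [0, 0, 1, 0, 0]: edge reported one cycle later
print(mealy_edge_detector([0, 1, 1, 0, 1]))  # [0, 1, 0, 0, 1]: edge reported in the same cycle
```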

Slide 3: Performance and Cost
Amdahl's Law: the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.

Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

Example: a new CPU is 10 times faster than the original processor, and the CPU is busy 40% of the time and waiting 60% of the time.
Solution: Fraction_enhanced = 0.4, Speedup_enhanced = 10, so
Speedup_overall = 1 / (0.6 + 0.4/10) = 1 / 0.64 = 1.56
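A quick check of the worked example, as a hedged Python sketch (function and variable names are illustrative):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The slide's example: 40% of the time can use a mode that is 10x faster.
print(round(amdahl_speedup(0.4, 10), 2))  # 1.56
```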

Slide 4: Performance and Cost
CPU performance depends equally on three characteristics: clock cycle time (or clock rate), clock cycles per instruction (CPI), and instruction count.

CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)
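A hedged sketch of the same product; the instruction count, CPI, and clock rate below are made-up illustrative numbers, not from the slides:

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * (1.0 / clock_rate_hz)

# Illustrative numbers only: 1e9 instructions, CPI of 1.5, 2 GHz clock.
print(cpu_time(1e9, 1.5, 2e9))  # 0.75 seconds
```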

Slide 5: More
- RISC vs. CISC
- Gustafson's law, Amdahl's law
- Big endian vs. little endian
- Moore's law

Slide 6: AND/OR Expression and Realization
[Figure: 4-variable Karnaugh map (variables x, y, z, w) and the corresponding two-level AND/OR (sum-of-products) realization of f.]

Slide 7: AND/XOR Expression and Realization
[Figure: the same 4-variable Karnaugh map realized as a two-level AND/XOR expression of f.]

Slide 8: More
- 5- and 6-variable Karnaugh maps
- Quine-McCluskey (Q-M) algorithm
- Two-level minimization
- Multi-level minimization
- Technology mapping (Shannon and Davio expansion theorems; see the sketch below)
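As a small illustration of the expansion idea behind technology mapping, the sketch below verifies the Shannon expansion f = x'·f|x=0 + x·f|x=1 on an example function; the function and variable names are assumptions, not from the course:

```python
from itertools import product

# Boolean functions as Python callables over 0/1 values; the function is illustrative.
def f(x, y, z):
    return (x & y) | ((1 - x) & z)

def cofactor(fn, index, value):
    """Fix one input of fn to a constant (0 or 1), returning a function of the rest."""
    return lambda *args: fn(*args[:index], value, *args[index:])

# Shannon expansion about x (index 0): f = x'*f(0,y,z) + x*f(1,y,z)
f0 = cofactor(f, 0, 0)   # negative cofactor f|x=0
f1 = cofactor(f, 0, 1)   # positive cofactor f|x=1

for x, y, z in product((0, 1), repeat=3):
    expanded = ((1 - x) & f0(y, z)) | (x & f1(y, z))
    assert expanded == f(x, y, z)
print("Shannon expansion verified on all 8 minterms")
```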

Slide 9: Simple Adders
Binary half-adder (HA) and full-adder (FA).
S = x ⊕ y ⊕ c_in
C_out = y·c_in + x·y + x·c_in
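A bit-level sketch of these equations, with Python standing in for gates:

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in                               # S = x XOR y XOR c_in
    c_out = (x & y) | (x & c_in) | (y & c_in)      # majority of the three inputs
    return s, c_out

print(full_adder(1, 1, 0))  # (0, 1)
print(full_adder(1, 1, 1))  # (1, 1)
```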

Slide 10: Ripple-Carry Adder: Slow but Simple
[Figure: ripple-carry binary adder with 32-bit inputs and output; the carry chain is the critical path.]
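A hedged software sketch of the ripple-carry structure; a 4-bit example (LSB-first bit lists) is used for brevity even though the slide's adder is 32 bits wide:

```python
def full_adder(x, y, c_in):
    return x ^ y ^ c_in, (x & y) | (x & c_in) | (y & c_in)

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    s_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):        # the carry chain is the critical path
        s, carry = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits, carry

# 4-bit illustration: 0111 + 0001 = 1000, with the carry rippling through every stage.
print(ripple_carry_add([1, 1, 1, 0], [1, 0, 0, 0]))  # ([0, 0, 0, 1], 0)
```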

Slide 11: Carry Propagation
Example: 0111 + 0001 = 1000. The carry generated in the least-significant position ripples through every higher position before the sum settles.

Slide 12: Carry-Lookahead Adder
Also covered:
- Multiplication: Booth's algorithm
- Division: restoring, non-restoring, SRT, ...
- Comparator
- Fixed point vs. floating point, etc.
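The carry-lookahead idea can be sketched with generate/propagate signals; the 4-bit width, LSB-first bit ordering, and variable names below are assumptions. In hardware the carry recurrence is flattened into two-level logic; the loop here just evaluates the same equations:

```python
def carry_lookahead_4bit(a_bits, b_bits, c0=0):
    """Compute carries from generate (g) and propagate (p) signals, then the sum bits."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # gi = ai AND bi (carry generate)
    p = [a | b for a, b in zip(a_bits, b_bits)]   # pi = ai OR bi  (carry propagate)
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))            # c(i+1) = gi + pi*ci, flattened in hardware
    s = [a ^ b ^ ci for a, b, ci in zip(a_bits, b_bits, c)]
    return s, c[-1]

# Same example as the ripple-carry slide: 0111 + 0001 = 1000 (LSB first).
print(carry_lookahead_4bit([1, 1, 1, 0], [1, 0, 0, 0]))  # ([0, 0, 0, 1], 0)
```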

Slide 13: Objectives
IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1):
- DFP number formats
- DFP number encoding
- DFP arithmetic operations
- DFP rounding modes (see the sketch below)
- DFP exception handling
Algorithm, architecture, and VLSI circuit design for DFP arithmetic (Lecture 2):
- DFP adder/subtracter
- DFP multiplier
- DFP divider
- DFP transcendental function computation
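For experimenting with DFP rounding behaviour in software, Python's decimal module is a convenient stand-in. This is only an illustration of rounding modes, not the hardware designs covered in Lecture 2; the chosen precision and operands are assumptions:

```python
from decimal import Decimal, getcontext, ROUND_HALF_EVEN, ROUND_HALF_UP

getcontext().prec = 7                        # 7 significant digits, like decimal32
getcontext().rounding = ROUND_HALF_EVEN      # IEEE 754-2008 default: round to nearest, ties to even

a = Decimal("1.234566")
b = Decimal("0.0000005")
print(a + b)                                 # 1.234566 (tie rounds to the even last digit)

getcontext().rounding = ROUND_HALF_UP        # a different DFP rounding mode
print(a + b)                                 # 1.234567 (tie rounds away from zero)
```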

Slide 14: DFP Add/Sub Data Flow

Slide 15: Architecture of DFP Multiplier

Slide 16: DFP Division Data Flow
1. Unpack the decimal floating-point numbers
2. Check for zeros and infinity
3. Subtract exponents
4. Divide the significands (mantissas)
5. Normalize and detect overflow and underflow
6. Perform rounding
7. Set the sign
8. Pack the result
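A hedged, software-level sketch of this flow using Python's decimal module; unpacking and packing are only simulated, infinity handling is omitted, and the operand values are illustrative:

```python
from decimal import Decimal, getcontext

getcontext().prec = 16                                   # decimal64-like precision

def dfp_divide_sketch(x: Decimal, y: Decimal) -> Decimal:
    """Mirror the slide's flow in software; not the hardware design."""
    xs, xc, xe = x.as_tuple()                            # "unpack": sign, coefficient digits, exponent
    ys, yc, ye = y.as_tuple()
    if not any(yc):                                      # check for zeros (infinity omitted here)
        raise ZeroDivisionError("division by a decimal zero")
    sign = xs ^ ys                                       # set the sign
    exponent = xe - ye                                   # subtract exponents
    coeff = Decimal((0, xc, 0)) / Decimal((0, yc, 0))    # divide significands (context rounds)
    return coeff.scaleb(exponent) * (-1 if sign else 1)  # normalization is left to Decimal

print(dfp_divide_sketch(Decimal("-6.00"), Decimal("3.0")))   # -2.0
print(Decimal("-6.00") / Decimal("3.0"))                     # library result, for comparison
```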

Slide 17: Architecture: Decimal Logarithm Converter

Slide 18: Architecture: Decimal Antilogarithm Converter

Slide 19: Memory Hierarchy
Motivations: the principle of locality, smaller hardware is faster, make the common case fast, and the CPU-memory performance gap.
The four memory hierarchy questions:
- Where can a block be placed in the upper level?
- How is a block found if it is in the upper level?
- Which block should be replaced on a miss?
- What happens on a write?
Reducing the miss rate (the three C's):
- Compulsory: the first access to a block cannot be in the cache
- Capacity: the cache cannot contain all the blocks needed during execution of the program
- Conflict: in set-associative or direct-mapped caches, a block may be discarded and later retrieved if too many blocks map to its set
Performance = f(hit time, miss rate, miss penalty); there is a danger in concentrating on just one of these when evaluating performance.
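The hit-time/miss-rate/miss-penalty trade-off is commonly summarized as average memory access time (AMAT); a small sketch with made-up numbers:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers only (in cycles): improving one factor can hurt another,
# which is the "danger of concentrating on just one" noted on the slide.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))   # 6.0 cycles
print(amat(hit_time=2, miss_rate=0.03, miss_penalty=100))   # 5.0 cycles
```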

Slide 20: Memory Hierarchy: Cache Optimization Summary
(MR = miss rate, MP = miss penalty, HT = hit time; + improves the factor, – hurts it)

Technique                             | MR | MP | HT | Complexity
Larger block size                     | +  | –  |    | 0
Higher associativity                  | +  |    | –  | 1
Victim caches                         | +  | +  |    | 2
Pseudo-associative caches             | +  |    |    | 2
HW prefetching of instructions/data   | +  |    |    | 2
Compiler-controlled prefetching       | +  |    |    | 3
Compiler techniques to reduce misses  | +  |    |    | 0
Priority to read misses               |    | +  |    | 1
Subblock placement                    |    | +  |    | 1
Early restart & critical word first   |    | +  |    | 2
Non-blocking caches                   |    | +  |    | 3
Second-level caches                   |    | +  |    | 2
Small & simple caches                 | –  |    | +  | 0
Avoiding address translation          |    |    | +  | 2
Pipelining writes                     |    |    | +  | 1

Slide 21: Multiprocessors: An Example Snoopy Protocol
Invalidation protocol with write-back caches.
Each block of memory is in one state:
- Clean in all caches and up to date in memory (Shared),
- or Dirty in exactly one cache (Exclusive),
- or Not in any cache.
Each cache block is in one state (the cache tracks these):
- Shared: the block can be read,
- or Exclusive: this cache has the only copy, it is writable, and it is dirty,
- or Invalid: the block contains no data.
Read misses cause all caches to snoop the bus; writes to a clean line are treated as misses.
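A hedged sketch of the per-block states described above (Invalid, Shared, Exclusive), ignoring bus arbitration and data movement; the event names are my own simplification, not the exact FSM from the course:

```python
# Per-cache-block states for the write-back invalidation protocol on the slide.
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

def on_processor_event(state, event):
    """Transition for the local processor's read/write on this block."""
    if event == "read":
        return SHARED if state == INVALID else state    # read miss brings the block in Shared
    if event == "write":
        return EXCLUSIVE                                 # any write (even to a clean line)
                                                         # makes the block dirty/Exclusive
    return state

def on_bus_event(state, event):
    """Transition when another cache's request is snooped on the bus."""
    if event == "bus_write":
        return INVALID                                   # invalidate on a remote write
    if event == "bus_read" and state == EXCLUSIVE:
        return SHARED                                    # supply the data, drop to Shared
    return state

s = INVALID
s = on_processor_event(s, "read")    # Shared
s = on_processor_event(s, "write")   # Exclusive (write miss/upgrade goes on the bus)
s = on_bus_event(s, "bus_read")      # back to Shared when another cache reads the block
print(s)
```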

Slide 22: Multiprocessors: A Snoopy Cache Coherence Protocol
Finite-state control mechanism for a bus-based snoopy cache coherence protocol with write-back caches.
[Figure: four processor/cache pairs connected to memory over a shared bus.]

Slide 23: Multiprocessors: Directories to Guide Data Access
Distributed shared-memory multiprocessor with a cache, directory, and memory module associated with each processor.

Slide 24: Multiprocessors: Directory-Based Cache Coherence
States and transitions for a directory entry in a directory-based cache coherence protocol (c is the requesting cache).
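A companion sketch for the directory side; the state names (Uncached, Shared, Exclusive), the sharer set, and the message handling below are illustrative assumptions rather than the exact figure on the slide:

```python
# Directory entry sketch: a state plus the set of caches holding the block.
# "c" is the requesting cache, as in the slide's caption.
class DirectoryEntry:
    def __init__(self):
        self.state = "Uncached"
        self.sharers = set()

    def read_miss(self, c):
        # A read miss adds c to the sharers; an Exclusive owner would write back first.
        self.state = "Shared"
        self.sharers.add(c)

    def write_miss(self, c):
        # A write miss invalidates the other sharers and makes c the exclusive owner.
        self.state = "Exclusive"
        self.sharers = {c}

entry = DirectoryEntry()
entry.read_miss("cache0")
entry.read_miss("cache1")
entry.write_miss("cache2")
print(entry.state, entry.sharers)   # Exclusive {'cache2'}
```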

Slide 25: Multiprocessors: Snooping vs. Directory
Snooping:
- Useful for smaller systems
- Sends all requests for data to all processors; processors snoop to see whether they have a copy and respond accordingly
- Requires broadcast, since the caching information lives at the processors
- Works well with a bus (a natural broadcast medium), but scaling is limited by cache-miss and write traffic saturating the bus
- Dominates for small-scale machines (most of the market)
Directory-based schemes:
- A scalable multiprocessor solution
- Keep track of what is being shared in a directory
- Distributed memory leads to a distributed directory (avoids bottlenecks)
- Send point-to-point requests to processors

