Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 January 24 2008 Session 2.


1 Computer Science and Engineering. Copyright by Hesham El-Rewini. Advanced Computer Architecture, CSE 8383. January 24, 2008. Session 2

2 Contents (Memory)
- Memory Hierarchy
- Cache Memory
- Placement Policies: Direct Mapping, Fully Associative, Set Associative
- Replacement Policies: FIFO, Random, Optimal, LRU, MRU
- Cache Write Policies

3 Memory Hierarchy (figure): CPU registers, cache, main memory, secondary storage. Moving down the hierarchy, latency increases while speed, bandwidth, and cost per bit decrease.

4 Sequence of events
1. Processor makes a request for X
2. X is sought in the cache
3. If it is there: hit (hit ratio h)
4. Otherwise: miss (miss ratio m = 1 - h)
5. On a miss, X is sought in main memory
6. The scheme generalizes to more levels
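
The sequence above can be sketched as a two-level lookup. A minimal sketch; the dictionary-backed cache and memory and the function name lookup are illustrative, not from the slides:

```python
# Hit/miss sequence sketch: look in the cache first, fall back to
# main memory on a miss and bring the item into the cache.
def lookup(address, cache, main_memory, stats):
    if address in cache:           # steps 2-3: sought in cache -> hit
        stats["hits"] += 1
        return cache[address]
    stats["misses"] += 1           # step 4: miss
    value = main_memory[address]   # step 5: sought in main memory
    cache[address] = value         # bring the missed item into the cache
    return value

main_memory = {i: i * 10 for i in range(8)}
cache, stats = {}, {"hits": 0, "misses": 0}
for addr in [1, 2, 1, 3, 2]:
    lookup(addr, cache, main_memory, stats)
print(stats)  # {'hits': 2, 'misses': 3}, i.e. hit ratio h = 2/5
```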

5 Cache Memory
- The idea is to keep the information expected to be used most frequently in the cache
- Locality of Reference: Temporal Locality, Spatial Locality
- Placement Policies
- Replacement Policies

6 Placement Policies: how to map memory blocks (lines) to cache block frames (line frames). (Figure: memory blocks on one side, cache block frames on the other.)

7 Placement Policies
- Direct Mapping
- Fully Associative
- Set Associative

8 Direct Mapping
- Simplest scheme
- A memory block is mapped to a fixed cache block frame (many-to-one mapping)
- J = I mod N, where J is the cache block frame number, I is the memory block number, and N is the number of cache block frames
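
A quick sketch of J = I mod N; N = 128 frames is taken from the example on the following slides, and the sample block numbers are illustrative:

```python
# Direct mapping: memory block I always lands in frame J = I mod N.
N = 128                              # number of cache block frames
for block in (5, 133, 3973):         # 3973 = 3968 + 5
    print(f"memory block {block:4d} -> cache frame {block % N}")
# All three blocks collide on frame 5: many-to-one mapping.
```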

9 Address Format
- Memory: M blocks; block size: B words; cache: N blocks
- Address size: log2(M * B) bits
- Fields: Tag (remaining log2(M/N) bits) | Block frame (log2 N bits) | Word (log2 B bits)

10 Example
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits
- Cache: 128 blocks
- Fields: Tag = 5 bits | Block frame = 7 bits | Word = 4 bits

11 Example (cont.) (figure): memory block groups 0-127, 128-255, ..., 3968-4095 each map onto cache frames 0-127; the 5-bit tag (values 0-31) records which of the 32 groups a cached block came from.

12 Fully Associative
- Most flexible scheme
- A memory block can be mapped to any available cache block frame (many-to-many mapping)
- Requires an associative search over the tags

13 Address Format
- Memory: M blocks; block size: B words; cache: N blocks
- Address size: log2(M * B) bits
- Fields: Tag (remaining log2 M bits) | Word (log2 B bits)

14 Example
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits
- Cache: 128 blocks
- Fields: Tag = 12 bits | Word = 4 bits

15 Example (cont.) (figure): any of the 4096 memory blocks (0-4095) can occupy any of the 128 cache frames; the 12-bit tag identifies the block.

16 Set Associative
- A compromise between the other two schemes
- The cache is divided into a number of sets; each set holds a number of blocks
- A memory block is mapped to any available cache block frame within a specific set
- Associative search only within a set

17 Address Format
- Memory: M blocks; block size: B words; cache: N blocks
- Number of sets: S = N / (number of blocks per set)
- Address size: log2(M * B) bits
- Fields: Tag (remaining log2(M/S) bits) | Set (log2 S bits) | Word (log2 B bits)

18 Example
- Memory: 4K blocks; block size: 16 words; address size: log2(4K * 16) = 16 bits
- Cache: 128 blocks; 4 blocks per set; number of sets = 32
- Fields: Tag = 7 bits | Set = 5 bits | Word = 4 bits

19 Example (cont.) (figure): the 128 cache frames are grouped into 32 sets of 4 (frames 0-3 form set 0, ..., frames 124-127 form set 31); a memory block maps to set (block number mod 32), and the 7-bit tag identifies the block within its set.
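
The 7/5/4 field split of this example can be applied with shifts and masks. A sketch; the function name split and the sample address are illustrative:

```python
# Split a 16-bit address into (tag, set, word) for the example cache:
# 4 word bits (16-word blocks), 5 set bits (32 sets), 7 tag bits.
WORD_BITS, SET_BITS = 4, 5

def split(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

print(split(0b0000101_00011_0110))  # (5, 3, 6)
```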

20 Comparison criteria
- Simplicity
- Associative search
- Cache utilization
- Replacement

21 Group Exercise: The instruction set for your architecture has 40-bit addresses, with each addressable item being a byte. You elect to design a four-way set-associative cache, with each of the four blocks in a set containing 64 bytes. Assume that you have 256 sets in the cache. Show the format of the address.

22 Group Exercise (cont.)
- Address size = 40 bits
- Block size = 64 bytes (6 word bits)
- Blocks per set = 4; number of sets = 256 (8 set bits)
- Cache holds 256 * 4 blocks
- Fields: Tag = 26 bits | Set = 8 bits | Word = 6 bits

23 Group Exercise (cont.): Consider the following sequence of addresses (all are hex numbers): 0E1B01AA05, 0E1B01AA07, 0E1B2FE305, 0E1B4FFD8F, 0E1B01AA0E. In your cache, what will be the tags in the set(s) that contain these references at the end of the sequence? Assume that the cache is initially flushed (empty).

24 Group Exercise (cont.)
- 0E1B01AA05: low 16 bits 1010101000000101 -> set = 10101000 (A8), word = 000101 (05)
- 0E1B01AA07: low 16 bits 1010101000000111 -> set = A8, word = 07
- 0E1B2FE305: low 16 bits 1110001100000101 -> set = 10001100 (8C), word = 05

25 Group Exercise (cont.)
- 0E1B4FFD8F: low 16 bits 1111110110001111 -> set = 11110110 (F6), word = 001111 (0F)
- 0E1B01AA0E: low 16 bits 1010101000001110 -> set = A8, word = 001110 (0E)
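
The decomposition can be checked mechanically with the 26/8/6 split from slide 22. A sketch; the formatting choices are illustrative:

```python
# Split each 40-bit address into tag (26 bits), set (8), word (6).
WORD_BITS, SET_BITS = 6, 8
addresses = [0x0E1B01AA05, 0x0E1B01AA07, 0x0E1B2FE305,
             0x0E1B4FFD8F, 0x0E1B01AA0E]
for a in addresses:
    word = a & ((1 << WORD_BITS) - 1)
    set_no = (a >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = a >> (WORD_BITS + SET_BITS)
    print(f"{a:010X}: tag={tag:07X} set={set_no:02X} word={word:02X}")
# The five references touch three sets: A8 (tag 0386C06, referenced
# three times), 8C (tag 0386CBF), and F6 (tag 0386D3F).
```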

26 Replacement Techniques
- FIFO
- LRU
- MRU
- Random
- Optimal

27 Group Exercise: Suppose that your cache can hold only three blocks and the block requests are as follows: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1. Show the contents of the cache if the replacement policy is (a) LRU, (b) FIFO, (c) Optimal.

28 Group Exercise (cont.): FIFO. Cache contents after each request (one column per request, oldest block first; 15 misses in total):
7 | 7 0 | 7 0 1 | 0 1 2 | 0 1 2 | 1 2 3 | 2 3 0 | 3 0 4 | 0 4 2 | 4 2 3 | 2 3 0 | 2 3 0 | 2 3 0 | 3 0 1 | 0 1 2 | 0 1 2 | 0 1 2 | 1 2 7 | 2 7 0 | 7 0 1

29 Group Exercise (cont.): OPT and LRU, each trace starting 7 | 7 0 | 7 0 1.
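
The whole exercise can be checked with a small simulator. A sketch; the list-based cache and the helper name simulate are illustrative:

```python
# Count misses for FIFO, LRU, and optimal (Belady) replacement on the
# exercise's reference string with a 3-block cache.
def simulate(refs, capacity, policy):
    cache, misses = [], 0
    for i, r in enumerate(refs):
        if r in cache:
            if policy == "LRU":            # refresh recency on a hit
                cache.remove(r)
                cache.append(r)
            continue
        misses += 1
        if len(cache) == capacity:
            if policy in ("FIFO", "LRU"):  # evict oldest / least recent
                cache.pop(0)
            else:                          # OPT: evict the block whose
                future = refs[i + 1:]      # next use is farthest away
                victim = max(cache, key=lambda b: future.index(b)
                             if b in future else len(future) + 1)
                cache.remove(victim)
        cache.append(r)
    return misses

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for p in ("FIFO", "LRU", "OPT"):
    print(p, simulate(refs, 3, p))  # FIFO 15, LRU 12, OPT 9
```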

30 Cache Write Policies
- Cache hit: Write Through or Write Back
- Cache miss: Write-allocate or Write-no-allocate

31 Read Policy on a Cache Miss
- Option 1: the missed block is brought into the cache while the required word is forwarded immediately to the CPU
- Option 2: the missed block is first stored entirely in the cache, and the required word is then forwarded to the CPU

32 Pentium IV two-level cache (figure): Processor <-> L1 cache <-> L2 cache <-> Main Memory.

33 Cache L1
- Cache organization: Set-Associative
- Block size: 64 bytes
- Cache L1 size: 8 KB
- Number of blocks per set: 4
- CPU addressing: Byte addressable
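
The parameters above determine the rest of the geometry; a quick check (standard cache arithmetic, not stated on the slide):

```python
# Derived geometry of the L1 cache described above.
CACHE_BYTES = 8 * 1024   # 8 KB
BLOCK_BYTES = 64
WAYS = 4                 # blocks per set

blocks = CACHE_BYTES // BLOCK_BYTES  # total block frames
sets = blocks // WAYS                # number of sets
print(blocks, sets)  # 128 32
```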

34 CPU and Memory Interface (figure): the CPU's MAR drives n address lines (selecting one of the 2^n memory locations 0 .. 2^n - 1), the MDR transfers data over b lines in each direction, and an R/W line selects the operation.

35 Pipelining

36 Contents
- Introduction
- Linear Pipelines
- Nonlinear Pipelines

37 Basic Idea
- Assembly line
- Divide the execution of a task among a number of stages
- A task is divided into subtasks to be executed in sequence
- Performance improvement compared to sequential execution

38 Pipeline (figure): a task is divided into subtasks 1, 2, ..., n; a stream of tasks flows through pipeline stages 1, 2, ..., n.

39 5 tasks on a 4-stage pipeline (figure): a space-time diagram over time steps 1-8; Task 1 enters at time 1, and Task 5 completes at time 8.

40 Speedup
- A stream of m tasks on an n-stage pipeline, each stage taking time t
- T(Seq) = n * m * t
- T(Pipe) = n * t + (m - 1) * t
- Speedup = n * m / (n + m - 1)
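
A worked check of the formulas above (the stage time t cancels in the ratio; the function name speedup is illustrative):

```python
# Pipeline speedup: T_seq = n*m*t, T_pipe = (n + m - 1)*t.
def speedup(n, m):
    return (n * m) / (n + m - 1)

print(speedup(4, 5))     # 2.5, the slide's 5 tasks on 4 stages
print(speedup(4, 1000))  # approaches n = 4 as m grows large
```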

41 Linear Pipeline
- Processing stages are linearly connected and perform a fixed function
- Synchronous pipeline: clocked latches between stage i and stage i+1; equal delays in all stages
- Asynchronous pipeline: handshaking between stages

42 Latches (figure): stages S1, S2, S3 separated by latches L1, L2. Equal delays set the clock period; the slowest stage determines the delay.

43 Reservation Table (figure): for a linear 4-stage pipeline, a single task occupies S1, S2, S3, S4 in consecutive time steps, so the stage-time table has one X per row, along the diagonal.

44 5 tasks on 4 stages (figure): the reservation table overlaps five tasks; each stage row holds five X's, shifted one time step per stage, forming a staircase over 8 time steps.

45 Nonlinear Pipelines
- Variable functions
- Feed-forward connections
- Feedback connections

46 3 stages and 2 functions (figure): stages S1, S2, S3 with feed-forward and feedback connections; the pipeline can evaluate two different functions, X and Y.

47 Reservation Tables for X and Y (figure): one stage-time table per function, marking with X (or Y) the clock cycles in which each stage is used; in a nonlinear pipeline a stage may be used more than once per evaluation.
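
A reservation table determines which launch latencies would make two evaluations collide in a stage. A sketch with an illustrative 3-stage table (NOT the slide's own figure, which is not reproduced in this transcript):

```python
# Forbidden latencies: if a function uses the same stage at cycles
# t1 < t2, launching another evaluation t2 - t1 cycles later collides.
from itertools import combinations

table = {"S1": [0, 3], "S2": [1], "S3": [2, 4]}  # stage -> cycles used

forbidden = {t2 - t1
             for cycles in table.values()
             for t1, t2 in combinations(sorted(cycles), 2)}
print(sorted(forbidden))  # [2, 3]
```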

48 Linear Instruction Pipelines. Assume the following instruction execution phases:
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write results (W)

49 Pipeline Instruction Execution (figure): space-time diagram for instructions I1, I2, I3 on the F, D, O, E, W stages; each instruction enters F one cycle after its predecessor:
Cycle: 1   2   3   4   5   6   7
F:     I1  I2  I3
D:         I1  I2  I3
O:             I1  I2  I3
E:                 I1  I2  I3
W:                     I1  I2  I3

50 Dependencies
- Data dependency (an operand is not ready yet)
- Instruction dependency (branching)
Will that cause a problem?

51 Data Dependency
I1: Add R1, R2, R3
I2: Sub R4, R1, R5
(Figure, cycles 1-6): on the F D O E W pipeline, I2 fetches its operands (O) in cycle 4, but I1 does not write R1 until its W stage in cycle 5, so I2 would read a stale R1.

52 Solutions
- Stall
- Forwarding
- Write and read in one cycle
- ...
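
The cost of the stall option can be read off the diagram. A small arithmetic sketch; the cycle numbers come from the F D O E W timing above, and the variable names are illustrative:

```python
# I1 writes R1 in W at cycle 5; I2 wants R1 in its O stage at cycle 4.
# With a plain stall, I2's O must wait until the cycle after I1's W.
I1_W = 5
I2_O = 4

stall_cycles = max(0, I1_W - I2_O + 1)
print("stall:", stall_cycles)  # stall: 2
# Forwarding avoids the stall here: I1's result exists at the end of
# its E stage and can be routed directly to I2's E input.
# "Write and read in one cycle" would shave one of the two stall cycles.
```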

53 Instruction Dependency
I1: Branch
I2: ...
(Figure, cycles 1-6): I2 is fetched in cycle 2, before I1's branch outcome is known, so the instructions fetched after a branch may have to be discarded.

54 Solutions
- Stall
- Predict branch taken
- Predict branch not taken
- ...

55 Floating Point Multiplication
- Inputs: (Mantissa 1, Exponent 1), (Mantissa 2, Exponent 2)
- Add the two exponents to get the output exponent
- Multiply the two mantissas
- Normalize the mantissa and adjust the exponent
- Round the product mantissa to a single-length mantissa; this may again adjust the exponent
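
The steps above can be sketched with small decimal (mantissa, exponent) pairs; the function fp_multiply, the base-10 representation, and the mantissas-in-[1, 10) assumption are illustrative (a post-rounding renormalize is omitted for brevity):

```python
# Step-by-step sketch of the multiplication recipe above.
def fp_multiply(m1, e1, m2, e2, digits=3):
    e_out = e1 + e2                   # add the two exponents
    m_out = m1 * m2                   # multiply the two mantissas
    while m_out >= 10:                # normalize; adjust exponent
        m_out /= 10
        e_out += 1
    m_out = round(m_out, digits - 1)  # round to a single-length mantissa
    return m_out, e_out

print(fp_multiply(3.14, 2, 2.5, 1))  # (7.85, 3), i.e. 314 * 25 = 7850
```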

56 Linear Pipeline for floating-point multiplication (figure): Add Exponents -> Multiply Mantissas -> Normalize -> Round; in the expanded version, the multiply stage is split into Partial Products and Accumulator, and a Renormalize stage follows Round.

57 Linear Pipeline for floating-point addition (figure): Subtract Exponents -> Partial Shift (align) -> Add Mantissas -> Find Leading 1 -> Partial Shift (normalize) -> Round -> Renormalize.

58 Combined Adder and Multiplier (figure): the addition and multiplication pipelines share stages (labeled A through H): Subtract/Add Exponents, Partial Shift, Add Mantissas, Partial Products, Find Leading 1, Partial Shift, Round, Renormalize; the two functions traverse the stages in different orders, making the combined pipeline nonlinear.

