August 8th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache

1 August 8th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache

2 Outline: Motivation, Cache Background, System Overview, Methodology, Progress, Future Work

3 Motivation: The goal is to create a configurable shared last-level cache for use in the PolyBlaze system.

4 Introduction: Zia, Eric, Kevan

5 Cache Background: In modern systems, processors outperform main memory, creating a bottleneck. The problem is exacerbated as more cores contend for memory. It is reduced if each processor maintains a local copy of the data.

6 Caches: A cache is a small amount of memory on the same die as the processor. It provides lower latency and higher throughput than main memory. Systems may include multiple cache levels: the smallest and most local cache is the L1 cache, the next level is the L2, and so on.

7 Shared Last Level Cache: Acts as a common location for data. Can be used to maintain cache coherency between processors. Does not exist in the current MicroBlaze system. We will design our own shared L2 cache to maintain cache coherency.

8 Cache Speeds: In typical systems, an L1 cache is very fast (1 or 2 cycles), an L2 cache is slower (10s of cycles), and main memory is very slow (100s of cycles).

9 Cache Speeds: In our system we expect the L1 cache to be very fast (1 or 2 cycles), the L2 cache to take about 10 cycles, and main memory to be faster than in a typical system (10s of cycles). To model the memory bottleneck of a much faster system, we will need to stall main memory.
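As a rough illustration of the stalling idea, the sketch below computes how many extra cycles would have to be added to each main-memory access so that the ratio of memory latency to L2 latency matches a chosen target. The function name `stall_cycles` and all of the numbers are assumptions made for this example; they are not measurements from PolyBlaze.

```python
def stall_cycles(l2_latency, native_mem_latency, target_ratio):
    """Extra cycles to add per main-memory access so that
    (memory latency) / (L2 latency) reaches target_ratio."""
    target_mem_latency = l2_latency * target_ratio
    # Never return a negative stall: memory may already be slow enough.
    return max(0, target_mem_latency - native_mem_latency)


# Illustrative values only: a 10-cycle L2, a 30-cycle main memory, and a
# target ratio of 10 (roughly "100s of cycles" versus "10s of cycles").
extra = stall_cycles(l2_latency=10, native_mem_latency=30, target_ratio=10)
print(extra)
```

With these assumed values the memory would need to be held off for 70 additional cycles per access.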

10 Direct Mapped Cache: Caches store data, a valid bit, and a unique identifier called a tag.

11 Tags: As an example, imagine a system with a 32-bit address bus, a 32-bit word size, and a 64-KByte cache with a 32-byte line size. The cache therefore has 2048 (2^11) lines.
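The arithmetic in this example can be checked with a short sketch. It assumes the cache is direct mapped, as on the previous slide, so every line has its own index; the 5/11/16-bit split and the name `split_address` are derived for illustration and are not taken from the slides.

```python
ADDRESS_BITS = 32
CACHE_BYTES = 64 * 1024   # 64-KByte cache
LINE_BYTES = 32           # 32-byte lines

NUM_LINES = CACHE_BYTES // LINE_BYTES                # 2048 lines
OFFSET_BITS = LINE_BYTES.bit_length() - 1            # log2(32)   = 5
INDEX_BITS = NUM_LINES.bit_length() - 1              # log2(2048) = 11
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 32 - 11 - 5 = 16


def split_address(addr):
    """Split a 32-bit byte address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)                  # byte within the line
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)   # which cache line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)          # remaining upper bits
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))  # 0x1234 0x2b3 0x18
```

Two addresses that share the same index but differ in their tag compete for the same line, which is why the tag has to be stored alongside the data.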

12 Set-Associative Cache: A cache with n possible entries for each address is called an n-way set-associative cache. [Figure: 4-way set-associative cache]
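A minimal software model of an n-way lookup is sketched below. The class name `SetAssociativeCache` and its methods are hypothetical; only tags and valid bits are modelled, and the real design performs this comparison in hardware across all ways at once rather than in a loop.

```python
class SetAssociativeCache:
    """Tag/valid model of an n-way set-associative cache (data omitted)."""

    def __init__(self, num_sets, ways, line_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One list per set; each way holds a (valid, tag) pair.
        self.sets = [[(False, 0)] * ways for _ in range(num_sets)]

    def _locate(self, addr):
        line = addr // self.line_bytes        # drop the byte offset
        return line % self.num_sets, line // self.num_sets  # (index, tag)

    def lookup(self, addr):
        """Return the way that holds addr, or None on a miss."""
        index, tag = self._locate(addr)
        for way, (valid, stored_tag) in enumerate(self.sets[index]):
            if valid and stored_tag == tag:
                return way
        return None

    def fill(self, addr, way):
        """Install addr's tag in the given way of its set."""
        index, tag = self._locate(addr)
        self.sets[index][way] = (True, tag)


# 64 KB, 4 ways, 32-byte lines gives 512 sets (illustrative configuration).
cache = SetAssociativeCache(num_sets=512, ways=4, line_bytes=32)
cache.fill(0x0000, way=0)
cache.fill(0x4000, way=1)   # same set as 0x0000, different tag
print(cache.lookup(0x0000), cache.lookup(0x4000), cache.lookup(0x8000))
```

Both addresses map to set 0 yet coexist because they occupy different ways; in a direct-mapped cache the second fill would have evicted the first.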

13 Replacement Policies: When an entry needs to be evicted from the cache, we need to decide which way it is evicted from. To do this we use a replacement policy: LRU, Clock, or FIFO.

14 LRU: Keep track of when each entry is accessed. Always evict the least recently used entry. Implemented using a stack. [Figure: stack ordered from MRU to LRU, shown after Access 4 and Access 2]
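The stack-based LRU scheme can be sketched as follows for a single set. The class name `LRUSet` and the zero-based way numbering are assumptions for this example; a hardware implementation would encode the ordering in a few bits per set rather than in a list.

```python
class LRUSet:
    """LRU ordering for the ways of one cache set, kept as a stack."""

    def __init__(self, ways):
        # Front of the list is the most recently used (MRU) way,
        # back of the list is the least recently used (LRU) way.
        self.stack = list(range(ways))

    def access(self, way):
        """Move the accessed way to the MRU position."""
        self.stack.remove(way)
        self.stack.insert(0, way)

    def victim(self):
        """The way to evict is always the one at the LRU position."""
        return self.stack[-1]


lru = LRUSet(ways=4)
lru.access(3)
lru.access(1)
print(lru.stack, lru.victim())  # [1, 3, 0, 2] 2
```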

15 Clock: For each way we store a reference (R) bit. We also store a pointer to the oldest entry (the hand). Starting at the hand, we test and clear each R bit until we reach one that is 0. [Figure: R bits for ways 0 to 3]
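A minimal sketch of the Clock policy for one set is given below. The class name `ClockSet` is hypothetical, and advancing the hand past the chosen victim is a common convention assumed here; the slides do not say where the hand rests after an eviction.

```python
class ClockSet:
    """Clock replacement state for the ways of one cache set."""

    def __init__(self, ways):
        self.ref = [0] * ways   # one reference (R) bit per way
        self.hand = 0           # points at the oldest entry

    def access(self, way):
        """Mark a way as recently referenced."""
        self.ref[way] = 1

    def victim(self):
        """Test and clear R bits from the hand until one is already 0."""
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.ref)
        way = self.hand
        # Assumption: step past the victim so it is examined last next time.
        self.hand = (self.hand + 1) % len(self.ref)
        return way


clock = ClockSet(ways=4)
clock.access(0)
clock.access(1)
print(clock.victim())  # ways 0 and 1 are cleared and skipped, way 2 is chosen
```

Compared with LRU, this needs only one bit per way plus the hand, at the cost of approximating the true access order.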

16 System Overview

17 PolyBlaze L2 Cache: 1- to 16-way set-associative cache. LRU or Clock replacement policy. 32- or 64-byte line width. 64-bit memory interface. Write-back cache.

18 L2 Cache

19 Reuse Policy: Determines which way is evicted on a cache miss. Currently uses the LRU policy.

20 Tag Bank: Contains tags and valid bits. Stored on the FPGA using BRAMs. One bank is instantiated for each way.

21 Control Unit: Finite state machine for the L2 cache. Pipelining: while a request is outstanding from the NPI, other requests can be serviced from the SRAM.

22 Data Bank: Control interface for the off-chip SRAM.

23 SRAM: 32-bit ZBT synchronous SRAM, 1 MB.

24 Methodology: Break the L2 cache into three parts (SRAM Controller, NPI Interface, L2 Core) and test each separately, then combine them and test the complete L2 cache.

25 SRAM Controller: Create a wrapper that connects the SRAM controller to the MicroBlaze by an FSL. Write a program that writes and reads data to all addresses in the SRAM. Tests: write all 1's (√), write all 0's (√), alternate writing all 1's and all 0's (√), write random data (√).
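The four write/read-back tests can be sketched against a simulated memory. Everything named here (`make_patterns`, `run_pattern`, the dictionary standing in for the SRAM) is hypothetical; the actual test program ran on the MicroBlaze over the FSL wrapper. Reading "alternate" as alternating all 1's and all 0's from one word to the next is also an assumption.

```python
import random

WORD_MASK = 0xFFFFFFFF  # 32-bit words, matching the 32-bit SRAM


def make_patterns(seed=0):
    """Return the four test patterns as functions of the word address."""
    rng = random.Random(seed)
    random_words = {}

    def random_word(addr):
        # Remember each random value so the read-back check can compare it.
        if addr not in random_words:
            random_words[addr] = rng.getrandbits(32)
        return random_words[addr]

    return {
        "all ones": lambda addr: WORD_MASK,
        "all zeros": lambda addr: 0,
        "alternating": lambda addr: WORD_MASK if addr % 2 == 0 else 0,
        "random": random_word,
    }


def run_pattern(write, read, num_words, pattern):
    """Write the pattern to every word, then read back.
    Returns the list of addresses whose contents did not match."""
    for addr in range(num_words):
        write(addr, pattern(addr) & WORD_MASK)
    return [addr for addr in range(num_words)
            if read(addr) != (pattern(addr) & WORD_MASK)]


# A plain dict stands in for the SRAM; 64 words keeps the example small
# (a 1 MB SRAM of 32-bit words would have 262144 of them).
memory = {}
results = {name: run_pattern(memory.__setitem__, memory.__getitem__, 64, p)
           for name, p in make_patterns().items()}
print(results)
```

Writing the whole memory before reading any of it back, rather than checking each word immediately, helps expose addressing faults where two addresses alias to the same cell.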

26 NPI Interface: Uses a custom FSL width, so we cannot test it using the MicroBlaze. Create a hardware test bench to read and write data to all addresses. Tests: write all 1's (X), write all 0's (X), alternate writing all 1's and all 0's (X), write random data (X).

27 L2 Core: Simulate the core of the L2 cache in iSim (X). Write a test bench that approximates the responses from the L1/L2 arbiter, SRAM controller, and NPI interface (X). The test bench will write to each line multiple times to create a large number of cache misses (X).

28 Complete L2 Cache: Combine the L2 cache with the rest of PolyBlaze (X). Write test programs to read and write to various regions of memory (X).

29 Current Progress: SRAM Controller and Data Bank: designed and tested. NPI Interface: testing and debugging in progress. L2 Core: testing and debugging in progress.

30 Future Work: Add the Clock replacement policy to the L2 cache. Add a write-back buffer to the L2 cache. Migrate the system from the XUPV5 to a BEE3 so we can create a system with more cores. Modify the L2 cache into a NUMA system. Add custom hardware accelerators to PolyBlaze.

31 Questions?
