Pre-fetching for Improved Core Interfacing
Roman Lysecky, Frank Vahid, Tony Givargis, & Rilesh Patel
University of California, Riverside


1 Pre-fetching for Improved Core Interfacing
Roman Lysecky, Frank Vahid, Tony Givargis, & Rilesh Patel
Department of Computer Science, University of California, Riverside, CA 92521
{rlysecky, vahid, givargis, rrpatel}@cs.ucr.edu
This work was supported in part by the NSF and a DAC scholarship.

2 Introduction
[Figure: core library containing MIPS, MEM, Cache, DSP, DMA, Core X, and Core Y]
Core-based designs are becoming common
–Cores are available as both soft and hard
Problem: How can interfacing be simplified to ease integration?

3 Introduction
One solution: one standard on-chip bus
–All cores have the same interface
–Appears to be unlikely (VSIA)
Another solution: divide the core into a bus wrapper and internal parts
–Rowson and Sangiovanni-Vincentelli '97, Interface-Based Design
–VSIA is developing a standard for the interface between wrapper and internals
–Far simpler than a standard on-chip bus
–We refer to the bus wrapper as an interface module (IM)

4 Introduction
Problem: using an interface module can result in extra cycles for reads
Pre-fetching can reduce or eliminate the extra cycles
Outline
–Interfacing options
–Classification of registers and common register occurrences
–Architecture of the IM and pre-fetch heuristics
–Experiments
–Conclusions

5 No Interface Module (IM)
Interface logic is designed as part of the core's internal logic
Pros
–Small size
–High performance (no overhead)
Cons
–May be hard to integrate with different busses

6 Separating a Core into IM & Internals
Interface module is separate from the core internals
–Standard bus between IM and internals
Pros
–Easily integrates with different busses
–Any changes are restricted to the IM
Cons
–May incur performance overhead due to the interface module
–Possible increases in size and power

7 Proposed Solution - Pre-fetching in IM
Pre-fetching
–Analogous to caching: store local copies of registers inside the interface module
–Enables quick response time
–Eliminates extra cycles for register reads
–Transparent to system bus and core internals
Pros
–Easily integrates with different busses
–No performance overhead
Cons
–Possible increases in size and power
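The idea on this slide can be sketched as a small behavioral model. This is only an illustration, not the authors' design: the class names, the register name "D", and the cost of two extra cycles for a forwarded read are all assumptions made for the example.

```python
class CoreInternals:
    """Hypothetical core internals: registers reached over the IM-to-internals bus."""

    def __init__(self, registers):
        self.registers = dict(registers)

    def read(self, name):
        return self.registers[name]


class InterfaceModule:
    """Bus wrapper that can optionally keep local copies of core registers."""

    EXTRA_CYCLES = 2  # assumed cost of forwarding a read to the internals

    def __init__(self, core, prefetch=False):
        self.core = core
        self.prefetch = prefetch
        self.copies = {}  # pre-fetched register copies held inside the IM

    def refresh(self, name):
        """Pre-fetch: copy a core register into the IM before any bus read arrives."""
        self.copies[name] = self.core.read(name)

    def bus_read(self, name):
        """Return (value, extra_cycles) for a read coming from the system bus."""
        if self.prefetch and name in self.copies:
            return self.copies[name], 0  # served from the local copy
        return self.core.read(name), self.EXTRA_CYCLES  # forwarded to the internals


core = CoreInternals({"D": 0x5A})
plain = InterfaceModule(core)                # IM without pre-fetching
fast = InterfaceModule(core, prefetch=True)  # IM with pre-fetching
fast.refresh("D")
print(plain.bus_read("D"))  # value plus the assumed extra cycles
print(fast.bus_read("D"))   # same value, no extra cycles
```

A local copy is only useful while it matches the core register, which is why the choice of when to refresh it (the pre-fetch heuristic) depends on how each register is updated.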

8 Classification of Core Registers
Different registers need different pre-fetch schemes
Need classification for registers
–Update Type
–Access Type
–Notification Type
–Structure Type

9 Common Register Types
We identified three common register combinations found in cores
–Configuration, Task, and Input-buffered registers
–Implemented cores representative of each of these three common register combinations
–Provide classification for registers in each of the cores

10 Common Register Types
Core1 - Configuration Registers
–Example: configuration registers in a UART or DMA controller
Registers: Configuration Register (D)

11 Common Register Types
Core2 - Task Registers
–Example: JPEG or MPEG CODEC, or DES encryption
Registers: Data Input Register (DI), Data Output Register (DO), Status Register (S)

12 Common Register Types
Core3 - Input-buffered Registers
–Example: FIFO or UART
Registers: Status Register (S), Data Register (D)

13 Architecture of IM
Pre-fetch registers
Pre-fetch Unit: implements the pre-fetching heuristic
–Goal: maximize the number of hits
Controller: interfaces to the system bus

14 Pre-fetch Heuristic for Core2
Core2 - Task Registers
–After the system writes to register DI, read S into pre-fetch register S'
–When S indicates completion, read DO from the core into pre-fetch register DO'
–Repeat this process
Similar heuristics were developed for Core1 and Core3
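The Core2 heuristic above can be sketched as a tick-driven simulation. Everything beyond the register names DI, DO, and S is an assumption for illustration: the `TaskCore` model, its fixed latency, the status encoding (1 = done), and the XOR stand-in for the real task are hypothetical and not taken from the slides.

```python
class TaskCore:
    """Hypothetical task core: writing DI starts a job that takes `latency` ticks."""

    def __init__(self, latency=3):
        self.latency = latency
        self.remaining = 0
        self.DI = self.DO = self.S = 0  # assumed encoding: S == 1 means done

    def write_DI(self, value):
        self.DI, self.S, self.remaining = value, 0, self.latency

    def tick(self):
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.DO = self.DI ^ 0xFF  # stand-in for the real computation
                self.S = 1


class PrefetchUnit:
    """Keeps S' and DO' inside the IM, following the Core2 heuristic."""

    def __init__(self, core):
        self.core = core
        self.S_copy = 0    # S'
        self.DO_copy = 0   # DO'
        self.polling = False

    def on_bus_write_DI(self, value):
        # A system write to DI starts the task and arms the heuristic.
        self.core.write_DI(value)
        self.S_copy = 0
        self.polling = True

    def tick(self):
        if not self.polling:
            return
        self.S_copy = self.core.S          # read S into S'
        if self.S_copy == 1:               # S indicates completion
            self.DO_copy = self.core.DO    # read DO into DO'
            self.polling = False           # wait for the next DI write


core = TaskCore(latency=3)
pf = PrefetchUnit(core)
pf.on_bus_write_DI(0x0F)
trace = []
for _ in range(3):
    core.tick()
    pf.tick()
    trace.append(pf.S_copy)
print(trace, hex(pf.DO_copy))
```

In this sketch S' and DO' are updated in the same tick, so a system read that sees completion in S' never finds a stale DO'. A real IM would need to preserve that ordering in hardware.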

15 Experiments - Area (Gates)
Note: To better evaluate the effects of IMs, our cores were kept simple, resulting in smaller-than-normal sizes.
Average increase of IM without pre-fetching over no IM: 1.4K gates
Average increase of IM with pre-fetching over IM without pre-fetching: 1.3K gates

16 Experiments - Performance (ns)

17 Experiments - Energy (nJ)

18 Digital Camera
[Chart: Peripheral Read Access (cycles)]
12% of execution time for peripheral reads
50% decrease in peripheral read access
25% decrease in overall peripheral access
3.2% improvement in overall system performance

19 Conclusion
Separating the interface from the internals eases core integration but may increase read cycles
Pre-fetching eliminated the performance degradation in common cases
–Increases in size and power were acceptable
–Transparent to system bus and core internals
–Pre-fetching thus improves the marketability of cores

