
1  Reconfigurable Caches and their Application to Media Processing
Parthasarathy (Partha) Ranganathan, Dept. of Electrical and Computer Engineering, Rice University, Houston, Texas
Sarita Adve, Dept. of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois
Norman P. Jouppi, Western Research Laboratory, Compaq Computer Corporation, Palo Alto, California

2  Motivation (1 of 2)
- Different workloads on general-purpose processors: scientific/engineering, databases, media processing, …
- Widely different characteristics
- Challenge for future general-purpose systems: use most transistors effectively for all workloads

3  Motivation (2 of 2)
- Challenge for future general-purpose systems: use most transistors effectively for all workloads
- 50% to 80% of processor transistors devoted to cache
  - Very effective for engineering and database workloads
  - BUT large caches often ineffective for media workloads: streaming data and large working sets [ISCA 1999]
- Can we reuse cache transistors for other useful work?

4  Contributions
Reconfigurable caches:
- Flexibility to reuse cache SRAM for other activities
- Several applications possible
- Simple organization and design changes
- Small impact on cache access time

5  Contributions
Reconfigurable caches:
- Flexibility to reuse cache SRAM for other activities
- Several applications possible
- Simple organization and design changes
- Small impact on cache access time
Application for media processing:
- e.g., instruction reuse: reuse memory for computation
- 1.04X to 1.20X performance improvement

6  Outline for Talk
- Motivation
- Reconfigurable caches
  - Key idea
  - Organization
  - Implementation and timing analysis
- Application for media processing
- Summary and future work

7  Reconfigurable Caches: Key Idea
- Dynamically divide SRAM into multiple partitions
- Use partitions for other useful activities
(Figure: current use of on-chip SRAM as a single cache vs. proposed use split into Partition A for caching and Partition B for lookup)
=> Cache SRAM useful for both conventional and media workloads
Key idea: reuse cache transistors!
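
A minimal sketch of this idea in Python (all names and sizes here are illustrative assumptions, not details from the talk): a fixed on-chip SRAM budget that can be re-divided at run time between conventional cache use and another activity such as a lookup table.

```python
# Illustrative sketch only: a fixed SRAM budget split into named partitions
# that can be re-divided at run time (e.g., all-cache vs. cache + lookup table).

class OnChipSRAM:
    def __init__(self, total_kb):
        self.total_kb = total_kb
        self.partitions = {"cache": total_kb}   # conventional use: one big cache

    def repartition(self, sizes_kb):
        """Re-divide the SRAM; sizes must use exactly the available capacity."""
        if sum(sizes_kb.values()) != self.total_kb:
            raise ValueError("partition sizes must add up to the SRAM budget")
        self.partitions = dict(sizes_kb)

sram = OnChipSRAM(total_kb=128)                      # current use: 128 KB cache
sram.repartition({"cache": 64, "reuse_buffer": 64})  # proposed use: cache + lookup
print(sram.partitions)
```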

8  Reconfigurable Cache Uses
A number of different uses for reconfigurable caches:
- Optimizations using lookup tables to store patterns: instruction reuse, value prediction, address prediction, …
- Hardware and software prefetching: caching of prefetched lines
- Software-controlled memory: QoS guarantees, scratch memory area
=> Cache SRAM useful for both conventional and media workloads

9  Key Challenges
- How to partition the SRAM?
- How to address the different partitions as they change?
- Minimize impact on cache access (clock cycle) time
(Figure: current use of on-chip SRAM as a single cache vs. proposed use split into Partition A for caching and Partition B for lookup)
=> Associativity-based partitioning

10  Conventional Cache Organization
(Figure: two-way set-associative cache. The address is split into tag, index, and block offset; the index selects a set in the state/tag/data arrays of Way 1 and Way 2, tags are compared to produce hit/miss, and the matching way's data is selected and driven out.)

11  Associativity-Based Partitioning
(Figure: the same two-way cache, with Way 1 assigned to Partition 1 and Way 2 to Partition 2; each partition has its own tag/index/block-offset address split, compare, and data-out path.)
- Choose partitions at the granularity of "ways"
- Multiple data paths and additional state/logic
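
The sketch below illustrates way-granularity partitioning under stated assumptions (the sizes, field widths, and partition names are made up): each way is assigned to a partition, and an access to one partition probes only that partition's ways.

```python
# Illustrative sketch only: associativity-based partitioning of a set-associative
# cache. Each way is assigned to a partition; an access to a partition probes
# only the ways that belong to it. Sizes and field widths are invented.

BLOCK_BYTES = 32
NUM_SETS = 512
WAY_TO_PARTITION = {0: "cache", 1: "cache", 2: "reuse_buffer", 3: "reuse_buffer"}

# tags[way][set] holds the tag stored in that way/set (None if invalid)
tags = {w: [None] * NUM_SETS for w in WAY_TO_PARTITION}

def split_address(addr):
    """Split an address into (tag, set index, block offset)."""
    offset = addr % BLOCK_BYTES
    index = (addr // BLOCK_BYTES) % NUM_SETS
    tag = addr // (BLOCK_BYTES * NUM_SETS)
    return tag, index, offset

def probe(partition, addr):
    """Hit/miss check that touches only the ways owned by `partition`."""
    tag, index, _ = split_address(addr)
    for way, owner in WAY_TO_PARTITION.items():
        if owner == partition and tags[way][index] == tag:
            return True   # hit in this partition
    return False          # miss (other partitions' ways are never consulted)

tag, index, _ = split_address(0x1A2B40)
tags[0][index] = tag                      # install a line in way 0 ("cache")
print(probe("cache", 0x1A2B40))           # True
print(probe("reuse_buffer", 0x1A2B40))    # False: different partition's ways
```

Here the number and granularity of partitions are bounded by the associativity, which is exactly the limitation the next slide points out.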

12  Reconfigurable Cache Organization
Associativity-based partitioning:
- Simple: small changes to conventional caches
- But the number and granularity of partitions depend on associativity
Alternate approach: overlapped-wide-tag partitioning
- More general, but slightly more complex
- Details in paper

13  Other Organizational Choices (1 of 2)
- Ensuring consistency of data at repartitioning
  - Cache scrubbing: flush data at repartitioning intervals
  - Lazy transitioning: augment state with partition information
- Addressing of partitions: software (ISA) vs. hardware
(Figure: on-chip SRAM as a single cache vs. divided into Partition A and Partition B)
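
A hedged sketch of the two consistency options named above; the bookkeeping is simplified (no dirty-line or write-back handling) and the details are assumptions rather than anything specified in the talk.

```python
# Illustrative sketch only: two ways to keep data consistent when the SRAM is
# repartitioned. Line bookkeeping is simplified (no dirty/write-back handling).

class Line:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.owner = None   # which partition installed this line (lazy scheme)

lines = [Line() for _ in range(1024)]

def repartition_with_scrubbing():
    """Cache scrubbing: flush (invalidate) everything at the repartition point,
    so no stale data from the old partitioning can ever be observed."""
    for line in lines:
        line.valid = False

def lazy_lookup(line, tag, current_partition):
    """Lazy transitioning: keep the data, but tag each line with its owning
    partition and treat an owner mismatch as a miss at access time."""
    return line.valid and line.tag == tag and line.owner == current_partition
```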

14  Other Organizational Choices (2 of 2)
- Method of partitioning: hardware vs. software control
- Frequency of partitioning: frequent vs. infrequent
- Level of partitioning: L1, L2, or lower levels
Tradeoffs based on application requirements
(Figure: on-chip SRAM as a single cache vs. divided into Partition A and Partition B)

15  Outline for Talk
- Motivation
- Reconfigurable caches
  - Key idea
  - Organization
  - Implementation and timing analysis
- Application for media processing
- Summary and future work

16  Conventional Cache Implementation
- Tag and data arrays are split into multiple sub-arrays to reduce/balance the length of word lines and bit lines
(Figure: conventional cache datapath. The address feeds decoders that drive word lines into the tag and data sub-arrays; bit lines pass through column muxes and sense amps; comparators produce the valid output, and mux drivers and output drivers deliver the data.)

17  Changes for Reconfigurable Cache
- Associate sub-arrays with partitions
- Constraint on the minimum number of sub-arrays
- Additional multiplexors, drivers, and wiring
(Figure: same cache datapath as before, but with the address, valid output, and data paths replicated [1:NP] across the tag and data sub-arrays for NP partitions.)
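
A small sketch of the sub-array constraint mentioned above, under the assumption that every partition must be built from whole sub-arrays; the helper name and allocation interface are illustrative only.

```python
# Illustrative sketch only: each partition is built from whole sub-arrays, so a
# cache with S sub-arrays supports at most S partitions, in multiples of one
# sub-array's capacity.

def assign_subarrays(num_subarrays, subarrays_per_partition):
    """subarrays_per_partition: e.g. {"cache": 4, "reuse_buffer": 4}."""
    if sum(subarrays_per_partition.values()) != num_subarrays:
        raise ValueError("partitions must use whole sub-arrays, and all of them")
    if any(n < 1 for n in subarrays_per_partition.values()):
        raise ValueError("every partition needs at least one sub-array")
    assignment, start = {}, 0
    for name, count in subarrays_per_partition.items():
        assignment[name] = list(range(start, start + count))
        start += count
    return assignment

# e.g. a cache built from 8 sub-arrays split evenly between two partitions:
print(assign_subarrays(8, {"cache": 4, "reuse_buffer": 4}))
# {'cache': [0, 1, 2, 3], 'reuse_buffer': [4, 5, 6, 7]}
```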

18  Impact on Cache Access Time
- Sub-array-based partitioning
  - Multiple simultaneous accesses to the SRAM array
  - No additional data ports
- Timing analysis methodology
  - CACTI analytical timing model for cache access time (Compaq WRL)
  - Extended to model reconfigurable caches
  - Experiments varying cache sizes, partitions, technology, …

19  Impact on Cache Access Time
- Cache access time comparable to base (within 1-4%) for few partitions (2)
- Higher for more partitions, especially with small caches
- But still within 6% for large caches
- Impact on clock frequency likely to be even lower

20  Outline for Talk
- Motivation
- Reconfigurable caches
- Application for media processing
  - Instruction reuse with media processing
  - Simulation results
- Summary and future work

21  Application for Media Processing
Instruction reuse/memoization [Sodani and Sohi, ISCA 1997]:
- Exploits value redundancy in programs
- Store instruction operands and result in a reuse buffer
- If a later instruction and its operands match in the reuse buffer, skip execution; read the answer from the reuse buffer
(Figure: the reuse buffer implemented in a cache partition)
- Few changes needed for implementation with reconfigurable caches
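
A minimal sketch of instruction reuse as described above, assuming a simple direct-mapped reuse buffer indexed by PC; the entry format, indexing scheme, and sizes are illustrative assumptions rather than details from the talk or from Sodani and Sohi.

```python
# Illustrative sketch only: a reuse buffer that memoizes (PC, opcode, operands)
# -> result. On a match, "execution" is skipped and the stored result is
# returned; on a miss, the result is computed and stored for later reuse.

REUSE_ENTRIES = 4096   # e.g., roughly what fits in a 64 KB cache partition

reuse_buffer = [None] * REUSE_ENTRIES   # each entry: (pc, op, src1, src2, result)

def execute_with_reuse(pc, op, src1, src2, execute_fn):
    index = pc % REUSE_ENTRIES
    entry = reuse_buffer[index]
    if entry is not None and entry[:4] == (pc, op, src1, src2):
        return entry[4]                  # reuse hit: skip execution
    result = execute_fn(op, src1, src2)  # reuse miss: actually execute
    reuse_buffer[index] = (pc, op, src1, src2, result)
    return result

# usage: a toy ALU standing in for the functional units
def toy_alu(op, a, b):
    return {"add": a + b, "mul": a * b}[op]

print(execute_with_reuse(0x400, "mul", 6, 7, toy_alu))  # computes and stores 42
print(execute_with_reuse(0x400, "mul", 6, 7, toy_alu))  # hits the reuse buffer
```

With a reconfigurable cache, the storage for this buffer would come from an L1 partition rather than a dedicated structure, as the evaluated configuration on slide 23 shows.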

22  Simulation Methodology
- Detailed simulation using RSIM (Rice), a user-level execution-driven simulator
- Media processing benchmarks:
  - JPEG image encoding/decoding
  - MPEG video encoding/decoding
  - GSM speech decoding and MPEG audio decoding
  - Speech recognition and synthesis

23  System Parameters
- Modern general-purpose processor with ILP + media extensions: 1 GHz, 8-way issue, out-of-order, VIS, prefetching
- Multi-level memory hierarchy:
  - 128 KB 4-way associative 2-cycle L1 data cache
  - 1 MB 4-way associative 20-cycle L2 cache
- Simple reconfigurable cache organization:
  - 2 partitions at the L1 data cache: 64 KB data cache, 64 KB instruction reuse buffer
  - Partitioning done in software at the start of the application
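
The simulated configuration above, restated as an illustrative Python dictionary for quick reference; this is only a summary of the slide's numbers, not RSIM's actual configuration format.

```python
# Illustrative summary of the simulated system; not RSIM's real config format.
system_config = {
    "processor": {"clock_ghz": 1, "issue_width": 8, "out_of_order": True,
                  "media_extensions": "VIS", "prefetching": True},
    "l1_data_cache": {"size_kb": 128, "assoc": 4, "latency_cycles": 2},
    "l2_cache": {"size_kb": 1024, "assoc": 4, "latency_cycles": 20},
    "reconfigurable_l1": {"partitions": {"data_cache_kb": 64,
                                         "instruction_reuse_buffer_kb": 64},
                          "repartition": "software, at application start"},
}
```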

24  Impact of Instruction Reuse
- Performance improvements for all applications (1.04X to 1.20X)
- Uses memory to reduce the compute bottleneck
- Greater potential with aggressive design [details in paper]
(Chart: normalized execution time, base = 100: JPEG decode 84, MPEG decode 89, speech synthesis 92)
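
As a quick sanity check on how the chart relates to the quoted range, speedup can be read as base time over reduced time, assuming the bars are execution times normalized to a base of 100:

```python
# Speedup = base execution time / execution time with instruction reuse.
normalized_time = {"JPEG decode": 84, "MPEG decode": 89, "speech synthesis": 92}
for app, t in normalized_time.items():
    print(f"{app}: {100 / t:.2f}X")   # ~1.19X, ~1.12X, ~1.09X
```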

25  Summary
Goal: use cache transistors effectively for all workloads
Reconfigurable caches: flexibility to reuse cache SRAM
- Simple organization and design changes
- Small impact on cache access time
- Several applications possible
Instruction reuse: reuse memory for computation
- 1.04X to 1.20X performance improvement
More aggressive reconfiguration currently under investigation

26  More information available at http://www.ece.rice.edu/~parthas
parthas@rice.edu

