
Slide 1: Trace Caches
Michele Co, CS 451

Slide 2: Motivation
- High-performance superscalar processors
- High instruction throughput
- Exploit ILP
  - Wider dispatch and issue paths
- Execution units designed for high parallelism
  - Many functional units
  - Large issue buffers
  - Many physical registers
- Fetch bandwidth becomes the performance bottleneck

Slide 3: Fetch Performance Limiters
- Cache hit rate
- Branch prediction accuracy
- Branch throughput
  - Need to predict more than one branch per cycle
- Non-contiguous instruction alignment
- Fetch unit latency

Slide 4: Problems with Traditional Instruction Cache
- Contains instructions in compiled (static) order
- Works well for sequential code with little branching, or for code with large basic blocks

Slide 5: Suggested Solutions
- Multiple branch target address prediction
- Branch address cache (Yeh, Marr, Patt, 1993)
  - Provides quick access to multiple target addresses
  - Disadvantages: complex alignment network, additional latency

Slide 6: Suggested Solutions (cont'd)
- Collapsing buffer: multiple accesses to the BTB (Conte, Mills, Menezes, Patel, 1995)
  - Allows fetching non-adjacent cache lines
  - Disadvantages: bank conflicts; poor scalability for interblock branches; significant logic added before and after the instruction cache
- Fill unit (Melvin, Shebanow, Patt, 1988)
  - Caches RISC-like instructions derived from the CISC instruction stream

Slide 7: Problems with Prior Approaches
- Pointers for all non-contiguous instruction blocks must be generated BEFORE fetching can begin
  - Extra stages, additional latency
- Complex alignment network necessary
- Multiple simultaneous accesses to the instruction cache
  - Multiporting is expensive
- Sequencing
  - Additional stages, additional latency

Slide 8: Potential Solution: Trace Cache
- Rotenberg, Bennett, Smith (1996)
- Advantages
  - Caches dynamic instruction sequences, fetching past multiple branches
  - No additional fetch unit latency
- Disadvantages
  - Redundant instruction storage, both between the trace cache and the instruction cache and within the trace cache itself

Slide 9: Trace Cache Details
- Trace: a sequence of instructions, potentially containing branches and their targets
  - Terminates on branches with an indeterminate number of targets: returns, indirect jumps, traps
- Trace identifier: start address + branch outcomes
- Trace cache line:
  - Valid bit
  - Tag
  - Branch flags
  - Branch mask
  - Trace fall-through address
  - Trace target address
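The line fields above can be sketched in code. This is a minimal, illustrative model, not the Rotenberg et al. implementation: it assumes a direct-mapped cache, treats the branch mask as a simple branch count, and uses made-up sizes (64 sets, one bit per branch outcome). A trace hits only when the tag matches and the branch predictor's outcome bits agree with the branch flags stored with the trace.

```python
class TraceCacheLine:
    """One trace cache line, with the fields named on the slide."""
    def __init__(self):
        self.valid = False
        self.tag = 0            # high bits of the trace start address
        self.branch_flags = 0   # taken/not-taken bit per branch in the trace
        self.branch_mask = 0    # number of branches in the trace (slide's "branch mask")
        self.fall_through = 0   # next fetch address if the last branch is not taken
        self.target = 0         # next fetch address if the last branch is taken
        self.insts = []         # the cached dynamic instruction sequence

class TraceCache:
    NUM_SETS = 64               # illustrative size, direct-mapped

    def __init__(self):
        self.lines = [TraceCacheLine() for _ in range(self.NUM_SETS)]

    def lookup(self, start_addr, predicted_outcomes):
        """Return the trace's instructions on a hit, else None.

        A hit requires: valid line, matching tag, and predicted branch
        outcomes equal to the stored branch flags (trace identifier =
        start address + branch outcomes)."""
        line = self.lines[start_addr % self.NUM_SETS]
        if not line.valid or line.tag != start_addr // self.NUM_SETS:
            return None
        # Compare only as many outcome bits as the trace has branches.
        mask = (1 << line.branch_mask) - 1
        if (predicted_outcomes & mask) != (line.branch_flags & mask):
            return None
        return line.insts
```

On a hit the fetch unit consumes the whole trace in one cycle, past multiple branches; on a miss it falls back to the conventional instruction cache.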


Slide 11: Next Trace Prediction (NTP)
- History register
- Correlating table
  - Complex history indexing
- Secondary table
  - Indexed by the most recently committed trace ID
- Index-generating function
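The index-generating function can be sketched as follows. This is a hedged illustration of the general idea, not the published hash: recent trace IDs contribute more bits to the index than older ones, and a secondary table indexed by the last committed trace ID alone serves as a fallback. The bit widths (14-bit table index, 8 bits for the newest ID) are assumptions for the example.

```python
TABLE_BITS = 14  # illustrative correlating-table size: 2**14 entries

def ntp_index(history):
    """Fold a history of recent trace IDs (oldest first) into one
    correlating-table index; newer traces contribute more bits."""
    index = 0
    for depth, trace_id in enumerate(reversed(history)):
        bits = max(1, 8 - 2 * depth)            # fewer bits for older traces
        index = (index << bits) ^ (trace_id & ((1 << bits) - 1))
    return index & ((1 << TABLE_BITS) - 1)

def secondary_index(history):
    """Secondary table: indexed by the most recently committed trace ID."""
    return history[-1] & ((1 << TABLE_BITS) - 1)
```

The correlating table is consulted first; when its entry has low confidence (or misses), the predictor falls back to the secondary table.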

Slide 12: NTP Index Generation

Slide 13: Return History Stack

Slide 14: Trace Cache vs. Existing Techniques

Slide 15: Trace Cache Optimizations
- Performance
  - Partial matching [Friendly, Patel, Patt, 1997]
  - Inactive issue [Friendly, Patel, Patt, 1997]
  - Trace preconstruction [Jacobson, Smith, 2000]
- Power
  - Sequential access trace cache [Hu et al., 2002]
  - Dynamic direction prediction based trace cache [Hu et al., 2003]
  - Micro-operation cache [Solomon et al., 2003]

Slide 16: Trace Processors
- Trace processor architecture
  - Processing elements (PEs)
    - Trace-sized instruction buffer
    - Multiple dedicated functional units
    - Local register file
    - Copy of global register file
  - Uses hierarchy to distribute execution resources
- Addresses superscalar processor issues
  - Complexity
    - Simplified multiple branch prediction (next trace prediction)
    - Elimination of local dependence checking (local register file)
    - Decentralized instruction issue and result bypass logic
  - Architectural limitations
    - Reduced bandwidth pressure on the global register file (local register files)

Slide 17: Trace Processor

Slide 18: Trace Cache Variations
- Block-based trace cache (BBTC): Black, Rychlik, Shen (1999)
  - Needs less storage capacity

Slide 19: Trace Table: BBTC Trace Prediction

Slide 20: Block Cache

Slide 21: Rename Table

Slide 22: BBTC Optimization
- Completion-time multiple branch prediction (Rakvic et al., 2000)
  - Improves over trace table predictions

Slide 23: Tree-based Multiple Branch Prediction

Slide 24: Tree-PHT

Slide 25: Tree-PHT Update

Slide 26: Trace Cache Variations (cont'd)
- Software trace cache: Ramirez, Larriba-Pey, Navarro, Torrellas (1999)
  - Profile-directed code reordering to maximize sequentiality
    - Convert taken branches to not-taken
    - Move unused basic blocks out of the execution path
    - Inline frequent basic blocks
    - Map the most popular traces to a reserved area of the i-cache
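The core reordering step can be sketched as a greedy layout pass over a profiled control-flow graph. This is a simplified illustration of profile-directed code reordering in general, not the authors' exact algorithm: starting from the entry block, it repeatedly places the hottest not-yet-placed successor next in memory, so the frequent branch at the end of each block becomes a not-taken fall-through, while cold blocks drift to the end of the layout. The data structures are hypothetical.

```python
def layout(entry, succ_counts):
    """Greedy profile-directed block layout.

    entry: name of the entry basic block.
    succ_counts: {block: {successor: profiled execution count}}.
    Returns the blocks in their new memory order."""
    placed, order = set(), []
    worklist = [entry]
    while worklist:
        block = worklist.pop(0)
        if block in placed:
            continue
        # Grow a trace: keep following the hottest unplaced successor,
        # so the hot path becomes straight-line (not-taken) code.
        while block is not None and block not in placed:
            placed.add(block)
            order.append(block)
            succs = succ_counts.get(block, {})
            # Remember colder successors; they start later traces.
            worklist.extend(s for s in succs if s not in placed)
            hot = max(succs, key=succs.get, default=None)
            block = hot if hot not in placed else None
    return order
```

With a profile where A→B→D dominates and C is rare, the hot path A, B, D is laid out contiguously and C is appended afterward, which is exactly the "maximize sequentiality" effect the slide describes.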

