Architectural Effects on DSP Algorithms and Optimizations Sajal Dogra Ritesh Rathore.

1 Architectural Effects on DSP Algorithms and Optimizations Sajal Dogra Ritesh Rathore

2 Trends
General-purpose computers:
- Superscalar: out-of-order (O-o-O) execution, deeper pipelines; the complexity lives in hardware, so instruction scheduling is automated.
- VLIW (DSP): needs compiler support and has been unsuccessful for general-purpose processors; the complexity lives in software, which makes design and debugging difficult.
- Itanium? Relaxed issue rules lead to EPIC, with out-of-order memory support, which effectively moves it toward superscalar.

3 Continued
- Compilers: increasingly sophisticated, able to perform some of the optimizations that are hand-coded in DSP software.
- GPPs: ISA support for SIMD instructions, aimed specifically at multimedia apps, which are expected to be the most important benchmarks for determining performance.

4 Our Contention
- VLIWs: specialized, high-performance, high-data-streaming applications.
- DSPs: less VLIW, becoming more like GPPs.
- If data parallel: VLIW.
- If control intensive: superscalar.

5 Task-Specific Processing
- E.g., FFT, OFDM: DSP.
- Video compression based on MPEG:
  - Motion-compensated prediction
  - DCT on the prediction error
  - Motion estimation (block matching) for video coding applications
- High-throughput data: VLIW (SIMD).
- Bitstream processing: RISC core
  - Scalar operation flow
  - Concurrent data and instruction access
  - Control tasks
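To make the block-matching step of motion estimation concrete, here is a minimal full-search sketch in C that uses the sum of absolute differences (SAD) as the matching cost. The function names, the row-major 8-bit frame layout, the square block size `n` and the search `range` are illustrative assumptions, not taken from the slides or from any particular MPEG encoder.

```c
#include <limits.h>
#include <stdlib.h>

/* Sum of absolute differences between the n x n block of the current frame
 * at (cx, cy) and the candidate block of the reference frame at (rx, ry).
 * Both frames are assumed to be row-major 8-bit luma arrays of `width`. */
static unsigned sad_block(const unsigned char *cur, const unsigned char *ref,
                          int width, int n, int cx, int cy, int rx, int ry)
{
    unsigned sad = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sad += (unsigned)abs(cur[(cy + y) * width + cx + x] -
                                 ref[(ry + y) * width + rx + x]);
    return sad;
}

/* Full-search block matching: try every displacement in [-range, range]
 * that keeps the candidate block inside the reference frame, and return
 * the smallest SAD together with its motion vector (best_dx, best_dy). */
static unsigned full_search(const unsigned char *cur, const unsigned char *ref,
                            int width, int height, int n, int cx, int cy,
                            int range, int *best_dx, int *best_dy)
{
    unsigned best = UINT_MAX;
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + n > width || ry + n > height)
                continue; /* candidate falls outside the reference frame */
            unsigned sad = sad_block(cur, ref, width, n, cx, cy, rx, ry);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best;
}
```

The inner SAD loop is regular, branch-free arithmetic over many pixels, which is the kind of high-throughput, data-parallel work the slide assigns to VLIW (SIMD) units.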

6 Expected Results
- VLIW: better for video compression.
- RISC: better for audio/video stream generation:
  - Run-length coding of DCT coefficients
  - Variable-length coding of the coded DCT coefficients (using a Huffman table)
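Run-length coding of DCT coefficients can be sketched as below: the 8x8 block is read in the usual zigzag order and each non-zero coefficient is emitted as a (run, level) pair, where run counts the zeros before it. This is a simplified illustration under stated assumptions: the `RunLevel` type and `run_length_encode` name are hypothetical, all 64 coefficients are treated alike, and the end-of-block code and the Huffman (variable-length) step that would map each pair to a codeword are left out.

```c
#include <stddef.h>

/* Standard zigzag scan order for an 8x8 block (low to high frequency). */
static const int zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Hypothetical output record: one entry per non-zero coefficient. */
typedef struct {
    int run;   /* number of zero coefficients preceding this one */
    int level; /* value of the non-zero coefficient */
} RunLevel;

/* Scan `block` in zigzag order and write (run, level) pairs to `out`.
 * Trailing zeros produce no pair (a real coder would send end-of-block).
 * Returns the number of pairs written. */
static size_t run_length_encode(const int block[64], RunLevel out[64])
{
    size_t count = 0;
    int run = 0;
    for (int i = 0; i < 64; i++) {
        int coeff = block[zigzag[i]];
        if (coeff == 0) {
            run++;
        } else {
            out[count].run = run;
            out[count].level = coeff;
            count++;
            run = 0;
        }
    }
    return count;
}
```

Each coefficient takes a data-dependent branch and produces a variable amount of output, which illustrates why this stage is control-intensive scalar work rather than SIMD-friendly arithmetic.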

7 MediaBench Benchmarks
- Popular benchmark suite for evaluating multimedia systems.
- Written in a high-level language (HLL) to stress compilation technology.
- Ran simulations on SimpleScalar, with some default parameters changed to resemble embedded processors; also used sim-profile.
- Need to infer more from the data.

8 Some Initial Results
- Branch prediction rates are lower than those for SPEC2k. Why?
- Cache miss rates are lower than for SPEC:
  - Constants: cacheable.
  - Data streams: uncacheable, since their accesses cannot be predicted.
- Inference: memory traffic is lower.
- Can the bus width be reduced for multimedia apps?

9 Power Analysis of Loop Unrolling
- Execution time levels out because of limited hardware resources, such as the instruction window size.
- BUT, as unrolling increases, branch prediction accuracy drops (larger code, higher misprediction penalty).
- The RUU (register update unit) fills up, so the fetch unit stalls.
- Average power decreases.
- Expect a different design point for power-aware loop unrolling.
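The unrolling trade-off can be illustrated with a dot product, the core of an FIR filter. This C sketch is an illustration, not the benchmark code used in the study: the function names, the unroll factor of 4 and the use of four separate accumulators are assumptions chosen for clarity.

```c
#include <stddef.h>

/* Baseline loop: one multiply-accumulate and one loop-closing branch
 * per element. */
static long dot_rolled(const short *x, const short *h, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)x[i] * h[i];
    return acc;
}

/* Same computation unrolled by 4: one loop-closing branch per four
 * elements, and four independent accumulators so the multiply-adds do not
 * form a single dependence chain. The code is larger, and a remainder loop
 * is needed when n is not a multiple of 4. */
static long dot_unrolled4(const short *x, const short *h, size_t n)
{
    long a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += (long)x[i]     * h[i];
        a1 += (long)x[i + 1] * h[i + 1];
        a2 += (long)x[i + 2] * h[i + 2];
        a3 += (long)x[i + 3] * h[i + 3];
    }
    for (; i < n; i++) /* remainder elements */
        a0 += (long)x[i] * h[i];
    return a0 + a1 + a2 + a3;
}
```

Raising the unroll factor further removes more loop branches but grows the code and the number of instructions in flight; once the instruction window is full, extra unrolling can no longer shorten execution time, which is consistent with the leveling-out noted above.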

10 To-Do List
- Optimize an existing DSP application.
- Examine different loop optimization techniques.
- Profile a multimedia application and identify bottlenecks.
- Perform optimizations such as loop unfolding (unrolling) to increase performance.
- Expect superscalar to perform better for streaming apps, and VLIW for data-intensive apps such as compression.
