Data-path Synthesis of VLIW Video Signal Processor Zhao Wu and Wayne Wolf Dept. of Electrical Engineering, Princeton University.

1 Data-path Synthesis of VLIW Video Signal Processor Zhao Wu and Wayne Wolf Dept. of Electrical Engineering, Princeton University

2 Outline
Introduction
Architectural paradigm
Trace-driven simulation
Performance estimation
Conclusions

3 Introduction
Why programmable VSP?
  – intense computation
  – complex and diverse video applications
  – increased development cost
  – time-to-market pressure
Why VLIW?
  – easy to implement in hardware
  – high speed
  – high degree of ILP available in video applications

4 Architecture Paradigm

5 Architectural Parameters
Register file
  – number of registers
Functional unit
  – number and type of functional units
Interconnect
  – number of clusters
  – interconnect mechanism

6 Impact on MPEG-2 Encoder

7 Trace-Driven Scheduling
[Flow diagram]
Binary program: prog
Run dis -h: disassembled program prog.asm
Run pixie -idtrace: instrumented program prog.pixie
Dynamic trace
Resource description
Scheduler
Result & statistics

8 Block Diagram of the Scheduler
[Block diagram]
Inputs: disassembled program, program trace, resource description
Blocks: assembly code parser, dependency analyzer, resource manager, register manager, memory manager, functional unit manager, register scoreboard, memory scoreboard, reservation station, scheduling record, VLIW scheduler
Output: result & statistics

9 Features of the Scheduler
(Relatively) fast
  – instrumentation rather than interpretation
  – run time linear in trace length
Moderate memory requirement
  – pipelining saves storage
Large scheduling window
  – up to 10^9 instructions
  – simulates both a VLIW compiler and a VLIW processor
Realistic model
  – limited resources
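The scheduler outlined on slides 7 to 9 can be illustrated with a much-reduced sketch in Python. This is not the authors' implementation: the function name `schedule_trace`, the trace tuple layout, the unit latency, and the sample trace are all hypothetical. It assumes only true (read-after-write) register dependencies matter, ignores memory dependencies, and uses an unbounded scheduling window. It keeps a register scoreboard and a per-cycle functional unit reservation table, and places each traced instruction in the earliest cycle where its operands are ready and a unit of the right type is free.

```python
from collections import defaultdict

def schedule_trace(trace, fu_limits, latency=1):
    """Greedy trace-driven scheduling under limited functional units.

    trace:     iterable of (fu_type, dest_reg, src_regs) in program order
               (hypothetical layout, not the format used in the slides)
    fu_limits: dict mapping fu_type -> number of units available per cycle
    Returns the total cycle count of the resulting schedule.
    """
    reg_ready = defaultdict(int)  # register scoreboard: cycle at which a value is available
    fu_used = defaultdict(int)    # (cycle, fu_type) -> units already reserved
    last_cycle = -1
    for fu_type, dest, srcs in trace:
        # Earliest cycle allowed by data dependencies.
        cycle = max((reg_ready[r] for r in srcs), default=0)
        # Delay further until a functional unit of this type is free.
        while fu_used[(cycle, fu_type)] >= fu_limits[fu_type]:
            cycle += 1
        fu_used[(cycle, fu_type)] += 1
        if dest is not None:
            reg_ready[dest] = cycle + latency
        last_cycle = max(last_cycle, cycle)
    return last_cycle + 1

# Hypothetical four-instruction trace: r3 depends on r1 and r2.
sample_trace = [
    ("alu", "r1", []),
    ("alu", "r2", []),
    ("alu", "r3", ["r1", "r2"]),
    ("alu", "r4", []),
]
print(schedule_trace(sample_trace, {"alu": 2}))  # 2 cycles with two ALUs
print(schedule_trace(sample_trace, {"alu": 1}))  # 4 cycles with one ALU
```

Running the same trace against different `fu_limits` mirrors the idea of evaluating several data-path configurations against one dynamic trace, although a real scheduler would also need the memory scoreboard and storage-saving pipelining mentioned on the slides.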

10 Performance Estimation
Why do we need performance estimation?
  – trace-driven simulation is too slow (the trace is too long)
  – the design space is too big
How do we estimate?
  – start from full-length trace simulation results
  – increase a resource: lower bound on cycle count
  – decrease a resource: upper bound on cycle count
[Diagram: target design, bigger design, smaller design]

11 IPC Histogram of ALU
[Histograms] Average IPC_ALU = 11.47; Average IPC_ALU = 13.24

12 Increase and Decrease Resources

13 Decrease resource
Split cycles that issue more FU ops than the smaller design allows, and retime
  – 16 → 8+8, 15 → 8+7, 14 → 8+6, 13 → 8+5, 12 → 8+4, …
Why this is an upper bound on the cycle count
  – the remainders 7, 6, 5, 4, … could be combined with 1, 2, 3, 4, …
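The splitting rule above can be sketched in Python. This is a minimal illustration, assuming the full-length simulation result is available as a histogram that maps the number of FU ops issued in a cycle to the number of such cycles. The function name `upper_bound_cycles` and the sample histogram are hypothetical and do not come from the presentation.

```python
import math

def upper_bound_cycles(issue_histogram, new_width):
    """Upper bound on cycle count after reducing the number of functional units.

    issue_histogram: dict mapping ops issued in a cycle -> number of such cycles
                     (assumed output of the full-length trace simulation)
    new_width:       number of functional units in the smaller design
    Every cycle is split into ceil(ops / new_width) cycles; a cycle that issues
    nothing still costs one cycle.
    """
    return sum(
        count * max(1, math.ceil(ops / new_width))
        for ops, count in issue_histogram.items()
    )

# Hypothetical histogram for a 16-unit design (52 cycles in total).
hist_16 = {16: 10, 12: 5, 8: 20, 3: 15, 0: 2}
print(upper_bound_cycles(hist_16, 8))  # 67
```

The result is only an upper bound because the split never merges a leftover fragment (such as the 4 in 12 → 8+4) with another partly filled cycle, which a real rescheduling might do.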

14 Increase resource
T_new = T_old - T_8
  – 16 → 8+8, 15 → 8+7, 14 → 8+6, 13 → 8+5, 12 → 8+4, …
Why lower bound of cycle count
  – sometimes can’t merge (e.g. increase from 8 to 12)
  – sometimes no parallelism
[Diagram label: This cycle removed]
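A minimal Python sketch of the formula T_new = T_old - T_8 follows. It rests on an interpretation the slide does not state explicitly: that T_8 is the number of cycles in which all 8 units of the current design are busy, so each such cycle is optimistically assumed to merge into a neighbouring cycle once more units are available. The function name `lower_bound_cycles` and the sample histogram are hypothetical.

```python
def lower_bound_cycles(issue_histogram, old_width):
    """Lower bound on cycle count after adding functional units.

    issue_histogram: dict mapping ops issued in a cycle -> number of such cycles
                     (assumed output of the full-length trace simulation)
    old_width:       number of functional units in the simulated design
    ASSUMPTION: T_8 in the slide is read as the count of fully occupied cycles.
    """
    t_old = sum(issue_histogram.values())
    t_saturated = issue_histogram.get(old_width, 0)  # cycles with every unit busy
    return t_old - t_saturated

# Hypothetical histogram for an 8-unit design (35 cycles in total).
hist_8 = {8: 20, 5: 10, 2: 5}
print(lower_bound_cycles(hist_8, 8))  # 15
```

Under this reading the estimate is optimistic for the two reasons the slide gives: a saturated cycle cannot always be merged (for example when growing from 8 to 12 units), and some saturated cycles have no further parallelism to exploit.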

15 Change More Than One Resource
Have to take into account resource inter-correlation
  – {}: # of cycles when at least one - instruction depends on -instructions
  – {dep_res1,res2,n}: # of cycles when at least one res1-instruction depends on n res2-instructions
Combine several bounds into one semi-bound
Increase resource (m>n):
Decrease resource (m<n):

16 Results

17 Conclusions
Trace-driven simulation
  – quantitative evaluation of an architecture
  – too slow to be applied to every possible design
Performance estimation
  – based on simulated results
  – automated procedure
  – accurate enough

