HW/SW Co-Design of an MPEG-2 Decoder Pradeep Dhananjay Kiran Divakar Leela Kishore Kothamasu Anthony Weerasinghe.


1 HW/SW Co-Design of an MPEG-2 Decoder Pradeep Dhananjay Kiran Divakar Leela Kishore Kothamasu Anthony Weerasinghe

2 Outline
Objectives
Background
HW-SW Partitioning
SW/HW Design
Testing and Debug
VGA Display Driver
Results
Lessons Learned
Future Work

3 Objectives
Accelerate MPEG-2 Decoder
  – Identify bottlenecks
  – Isolate bottleneck functions and partition design
  – Convert SW functions to HW blocks
  – Design HW/SW interfaces for communication
  – Measure accelerated performance
Design VGA display driver on FPGA
  – Attempt to display decoded stream in real time

4 Background
Development Platform
  – TLL-5000 prototyping board
      – ARM9, Spartan3 FPGA, VGA DAC (ADV7125)
Source code for MPEG2 Decoder
  – Obtained from sourceforge.net

5 Background – MPEG2
Consists of a sequence of groups of pictures (GOPs)
Types of pictures
  – I-picture (intra coded)
  – P-picture (forward predicted)
  – B-picture (bidirectionally predicted)

6 Background – MPEG2

7 HW-SW Partitioning
Linux profiling done to determine critical functions
  – Results based on a particular input (MPEG file)
  – Assumed to be representative of a typical use case
  – Profiling done on x86 Linux as well as on the board
      – gmon.out generated on board

8 Profiling on x86-Linux

9 Profiling on ARM-Linux

10 HW-SW Partitioning

11 SW Design
IDCT function uses pointers to access an input array
  – Not suitable for synthesis by Catapult-C
  – Converted all pointer accesses to array accesses
IDCT performs non-sequential accesses with varying stride
  – Modified caller of the IDCT function to reorganize the access pattern into sequential form
  – Created a temporary array, which is passed to the function
  – Array returned from the function is redistributed to the correct locations
Changes to software verified against golden code
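The caller-side change described on this slide can be sketched roughly as below. This is only an illustration under assumptions: the function names, the fixed stride, and the stand-in kernel are hypothetical and are not taken from the decoder source. The idea shown is that the kernel only ever indexes a contiguous 64-element array, while the caller gathers the strided data into a temporary buffer and scatters the results back.

```c
#include <stddef.h>

#define BLOCK_SIZE 64  /* one 8x8 block of coefficients */

/* Hypothetical signature for the synthesizable kernel: it only indexes
 * a contiguous 64-element array. */
typedef void (*block_kernel_fn)(short blk[BLOCK_SIZE]);

/* Gather a strided block into a contiguous temporary buffer, run the
 * kernel on it, then scatter the results back to their original places.
 * A single fixed stride is an assumption made for this sketch. */
void run_kernel_on_strided_block(short *data, size_t stride,
                                 block_kernel_fn kernel)
{
    short tmp[BLOCK_SIZE];
    size_t i;

    for (i = 0; i < BLOCK_SIZE; i++)   /* gather: strided -> sequential */
        tmp[i] = data[i * stride];

    kernel(tmp);                       /* kernel sees sequential data only */

    for (i = 0; i < BLOCK_SIZE; i++)   /* scatter: sequential -> strided */
        data[i * stride] = tmp[i];
}

/* Stand-in kernel for illustration only; this is NOT an IDCT. */
void double_block(short blk[BLOCK_SIZE])
{
    int i;
    for (i = 0; i < BLOCK_SIZE; i++)
        blk[i] = (short)(blk[i] * 2);
}
```

Calling `run_kernel_on_strided_block(data, 2, double_block)` would touch every second element of `data` and leave the elements in between unchanged.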

12 SW Flow Chart
MPEG2 SW code reaches the IDCT function call
Software (ARM) side
  – Create temporary buffer
  – Pass input values in temporary buffer to FPGA memory
  – Issue Start command to FPGA
  – Wait for interrupt
  – Read values from FPGA memory to temporary buffer
  – Store values from temporary buffer back to original array in order
Hardware (FPGA) side, after Start
  – IDCT does computation and stores data back in FPGA memory
  – Generates interrupt signal after computation is done
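The handshake in this flow chart can be mimicked in plain C with a software model of the FPGA, which may be useful for checking the software side without the board. Everything here is a hypothetical sketch: the struct layout, the names, the polled `done` flag standing in for the interrupt, and the placeholder computation (a sign flip, not an IDCT) are all assumptions.

```c
#include <stdint.h>

#define IDCT_BLOCK_LEN 64

/* Hypothetical software model of the FPGA side: a small memory, a start
 * flag, and a done flag that stands in for the interrupt line. */
typedef struct {
    int16_t mem[IDCT_BLOCK_LEN];
    volatile int start;
    volatile int done;
} fpga_model_t;

/* Model of the hardware: when started, process the memory in place,
 * clear start, and raise done. The sign flip is a placeholder. */
void fpga_model_step(fpga_model_t *f)
{
    int i;
    if (f->start) {
        for (i = 0; i < IDCT_BLOCK_LEN; i++)
            f->mem[i] = (int16_t)(-f->mem[i]);
        f->start = 0;
        f->done = 1;
    }
}

/* Software side of the flow: copy in, start, wait, copy out. */
void offload_block(fpga_model_t *f, int16_t block[IDCT_BLOCK_LEN])
{
    int i;

    for (i = 0; i < IDCT_BLOCK_LEN; i++)   /* pass inputs to FPGA memory */
        f->mem[i] = block[i];

    f->done = 0;
    f->start = 1;                          /* issue Start command */

    while (!f->done)                       /* wait for the "interrupt" */
        fpga_model_step(f);

    for (i = 0; i < IDCT_BLOCK_LEN; i++)   /* read results back in order */
        block[i] = f->mem[i];
}
```

On real hardware the polling loop would be replaced by a driver call that sleeps until the interrupt arrives, and the model step would not exist.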

13 HW Design
Mentor Catapult-C Synthesis Tool
  – High-level synthesis from C/C++ to Verilog RTL

14 HW Design
High-Level Synthesis
  – Tool schedules operations on a cycle-by-cycle basis
  – Constrained to available resources
      – Uses target device and library information
  – Builds RTL as an interface + controller + datapath

15 Example: Y = A*C + B*D
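As a rough illustration of this example, the expression can be written as a C function of the kind a high-level synthesis tool takes as input. The second function is a toy schedule-length estimate, not anything produced by Catapult-C: it assumes every operation takes exactly one cycle and that the add cannot share a cycle with a multiply. Both function names are hypothetical.

```c
/* The example expression Y = A*C + B*D as a plain C function. */
int mac2(int a, int b, int c, int d)
{
    int p0 = a * c;   /* first multiply */
    int p1 = b * d;   /* second multiply, independent of the first */
    return p0 + p1;   /* the add depends on both products */
}

/* Toy estimate of schedule length for the expression above, given how
 * many multipliers are available. ASSUMPTION: one cycle per operation,
 * no operation chaining. Returns -1 if there is no multiplier. */
int schedule_cycles(int multipliers)
{
    const int num_multiplies = 2;
    int mult_cycles;

    if (multipliers < 1)
        return -1;

    /* ceiling of num_multiplies / multipliers */
    mult_cycles = (num_multiplies + multipliers - 1) / multipliers;
    return mult_cycles + 1;   /* plus one cycle for the final add */
}
```

Under these assumptions, two multipliers let both products be computed in the same cycle, while a single multiplier forces them into consecutive cycles, which is the kind of resource-constrained trade-off the scheduler resolves.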


19 HW Design
Code conversion for synthesis
  – Isolate IDCT function from MPEG2 code
  – Merge initialization functions
      – One initialization construct was needed
  – Remove all global variables
      – Few dependencies for the IDCT function
  – Convert pointer arithmetic to array offsets
      – Most work needed for this conversion
      – No standard guidelines available

20 HW Design Pointer conversions
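The slide's original figure is not available in this transcript. A minimal sketch of what a pointer-to-array-offset conversion can look like is given below; the functions are hypothetical and are not taken from the decoder's IDCT. Both versions compute the same result, but the second uses a fixed-size array and explicit index expressions instead of a moving pointer.

```c
#define BLK_DIM 8

/* Before: walks each row of an 8x8 block with pointer arithmetic. */
void add_halves_ptr(short *blk)
{
    int i;
    for (i = 0; i < BLK_DIM; i++) {
        short *row = blk + BLK_DIM * i;          /* pointer to row start */
        *row = (short)(*row + *(row + 4));       /* row[0] += row[4] */
    }
}

/* After: same computation, expressed with array offsets only. */
void add_halves_idx(short blk[BLK_DIM * BLK_DIM])
{
    int i;
    for (i = 0; i < BLK_DIM; i++)
        blk[BLK_DIM * i] =
            (short)(blk[BLK_DIM * i] + blk[BLK_DIM * i + 4]);
}
```

Running both versions on identical inputs and comparing the outputs is one simple way to check that a conversion of this kind preserved behavior.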

21 HW Design Hardware Interface

22 HW Design
Verifying isolated IDCT function in C and RTL
  – C testbench written to test isolated IDCT function
  – Catapult-C allows testing of C function vs. RTL
      – Ensures RTL generation matches expected behavior
      – Unconverted pointer code generated wrong RTL

23 HW Design
Integration with communication interface
  – Communication FSM given
  – Integrate IDCT block

24 Problems Faced
IDCT RTL would not synthesize to 66 MHz
  – 27 MHz clock used instead
IDCT code takes ~30 minutes to synthesize
  – Inefficiency of using Catapult-C to generate code
Catapult code difficult to debug
Some reads not returning correct values
  – Read/write alignment
  – Synthesis could be a problem

25 Debug Techniques
Removed IDCT block for fast synthesis
  – Used to check interface memory writes
  – Showed 16-bit writes were not successful
Routed state bits to board LEDs
  – Helpful when program hangs due to lack of DTACK
  – OR’d DTACK with DIP switch to prevent hang
printf and printk statements to check addresses and data being sent

26 Delay Values
Hardware delay
  – Approximately 10 μs to compute IDCT
      – Based on cycle count provided by Catapult-C and 27 MHz clock frequency of FPGA
Pure software implementation
  – Approximately 30 μs
Overhead for communication
  – ~15000 μs
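The figures on this slide can be combined into a small back-of-the-envelope calculation. The sketch below assumes that the communication overhead simply adds to the hardware compute time on every call, which the slide does not state explicitly; the function names are hypothetical.

```c
/* Clock cycles implied by a compute time (microseconds) at a clock
 * frequency (MHz): us * MHz = cycles. */
double hw_cycles(double hw_us, double clock_mhz)
{
    return hw_us * clock_mhz;
}

/* Total time of one offloaded call. ASSUMPTION: communication overhead
 * and hardware compute time are additive. */
double offload_call_us(double hw_us, double comm_us)
{
    return hw_us + comm_us;
}

/* Speedup of the offloaded call relative to pure software
 * (values below 1.0 mean the offload is slower). */
double offload_speedup(double sw_us, double hw_us, double comm_us)
{
    return sw_us / offload_call_us(hw_us, comm_us);
}

/* Largest communication overhead for which offloading still breaks even. */
double max_comm_for_breakeven_us(double sw_us, double hw_us)
{
    return sw_us - hw_us;
}
```

With the slide's numbers (10 μs hardware at 27 MHz, 30 μs software, ~15000 μs communication), this gives about 270 cycles of hardware compute and about 15010 μs per offloaded call, roughly 500 times slower than the software path. Under the same assumption, communication would need to cost less than about 20 μs per call for the offload to pay off.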

27 VGA Display Block Diagram
[Block diagram; labels: ARM (generated ppm files, VGA application, driver), FPGA (VGA controller, main FSM, RAM 1, RAM 2), on-board ADV7125, VGA monitor]

28 VGA Hardware: ADV7125 Video DAC
ADV7125 has triple 8-bit video DACs
VGA DAC requires 8-bit R, G, B values
Needs H-Sync and V-Sync signals

29 VGA Controller
Used double buffer to store frame data
  – FIFO implementation didn’t work
      – ARM cannot keep up with the display data rate requirement
  – Frame resolution: 64x48
  – Each frame transfer requires 3072 words
  – Used 12 KB of RAM to implement double buffer
One full frame transferred with single driver call
  – Reduces system call overhead
  – Each call overhead ~26 μs
Interrupt used to communicate to user application
  – Fills the next buffer
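The double-buffer scheme and its sizing can be sketched in C as follows. This is a software illustration only, with hypothetical names. The 16-bit word size is an inference from the slide's numbers (64 x 48 = 3072 words per frame, and two such buffers in 12 KB implies 2 bytes per word), not something the slide states.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_W     64
#define FRAME_H     48
#define FRAME_WORDS (FRAME_W * FRAME_H)   /* 3072 words per frame */

/* Two frame buffers: one is scanned out to the display while the other
 * is being filled. ASSUMPTION: 16-bit words (2 x 3072 x 2 B = 12 KB). */
typedef struct {
    uint16_t ram[2][FRAME_WORDS];
    int display_idx;                      /* buffer currently displayed */
} double_buffer_t;

/* Write one full frame into the buffer that is NOT being displayed. */
void write_frame(double_buffer_t *db, const uint16_t *frame)
{
    memcpy(db->ram[1 - db->display_idx], frame,
           FRAME_WORDS * sizeof(uint16_t));
}

/* Swap roles at a frame boundary. In the design on the slide this is
 * the point where an interrupt would tell the application to fill the
 * next buffer. */
void swap_buffers(double_buffer_t *db)
{
    db->display_idx = 1 - db->display_idx;
}

/* Buffer the display side should currently read from. */
const uint16_t *display_buffer(const double_buffer_t *db)
{
    return db->ram[db->display_idx];
}
```

Because the writer never touches the buffer being displayed, a slow writer only delays the next swap rather than corrupting the frame on screen, which fits the slide's remark that the ARM could not keep up with a FIFO.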

30 VGA Display Demonstration

31 Lessons Learned
Debugging on an FPGA is difficult!
Hand-conversion of C code could have been more efficient
Create test bench to simulate ARM-FPGA communication
  – Allows quick debug of FPGA hardware
  – Visibility into internal signals
Hardware partition should have high computation-to-communication ratio
  – IDCT called many times with small computation time
  – ~10 μs of computation; ~15000 μs of communication

32 Future Work
Fix erroneous reads from IDCT
Integrate VGA display driver and MPEG2 Decoder

33 Thank you!

