Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project.


1 Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project

2
• The algorithm
• Part A overview
• Part B challenges
• Blocks implementation
• Conclusions

3 The algorithm: Nonlinear Diffusion
• Uses an iterative numerical solution of the diffusion equation
• Why use it for image processing?
  • Image noise is smoothed
  • Edges remain sharp
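To illustrate the "noise is smoothed, edges remain sharp" behavior, here is a minimal 1D sketch of nonlinear diffusion. It is an assumption-laden toy, not the project's scheme: it uses an explicit update with a Perona-Malik style diffusivity, and the names `diffusivity`, `diffuse_step`, `roughness` and the contrast parameter `k` are hypothetical. An explicit scheme like this is only stable for small time steps, so it is not the scheme behind the dt = 30 result shown on slide 5.

```python
def diffusivity(gradient, k=10.0):
    """Edge-stopping function: close to 1 for small gradients, close to 0 for large ones."""
    return 1.0 / (1.0 + (gradient / k) ** 2)


def diffuse_step(signal, dt=0.25, k=10.0):
    """One explicit nonlinear diffusion step on a 1D signal with reflecting borders."""
    n = len(signal)
    out = list(signal)
    for i in range(n):
        right = signal[i + 1] - signal[i] if i + 1 < n else 0.0
        left = signal[i - 1] - signal[i] if i > 0 else 0.0
        flux = diffusivity(abs(right), k) * right + diffusivity(abs(left), k) * left
        out[i] = signal[i] + dt * flux
    return out


def roughness(values):
    """Sum of absolute differences between neighbors (a simple noise measure)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Step edge from 0 to 100 with deterministic +/-2 "noise" on top.
noise = [2.0 if i % 2 == 0 else -2.0 for i in range(20)]
signal = [(0.0 if i < 10 else 100.0) + noise[i] for i in range(20)]

result = signal
for _ in range(20):
    result = diffuse_step(result)

print("edge jump before:", signal[10] - signal[9])
print("edge jump after: ", result[10] - result[9])
print("flat-region roughness before/after:", roughness(signal[:6]), roughness(result[:6]))
```

The small differences inside the flat regions see a diffusivity near 1 and are averaged away, while the large difference at the edge sees a diffusivity near 0 and is barely touched.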

4 Original image

5 dt = 30 !!! one iteration
• Look at the edges (sharp!)
• Look at the hat (smoothed)

6 Difficulties with the algorithm:
• Very complex design, which makes real-time operation almost impossible
• Transposing the entire image
• Reverse-order loop
• Huge memory bandwidth required
So why use this model?
• Good results even after a single iteration (Yoni & Zion needed at least 20 iterations, hence the need for multiple FPGAs)

7
• Exploring different architecture solutions in Matlab
  • Comparing “sub-frames” processing vs. entire frame processing
• Fixed-point analysis of the algorithm in Matlab
• Learning about memory resources:
  • Internal memory: MRAM, M4K, M512
  • External memory: DDR
• Analyzing the memory bandwidth requirements of the algorithm
• DVI signal generators
• Implementation of real-time streaming of pixels through DDR double buffering: DVI in => DDR write => DDR read => DVI out

8
• Transpose image implementation
  • First transpose (800x525 => 525x800)
  • Second transpose (525x800 => 800x525)
  • Each transpose requires synchronization between internal and external memories, using dedicated controllers and FIFOs
• Detection of the first pixel of a frame
  • Needed because each transpose block should start operating only at the first pixel of a frame
  • Also needed because the pipeline of Sergey & Roman needs a start signal when the first pixel of a frame enters the pipeline
• Implementation of frame rate converters
  • Down-rate converter at the input (60 fps => 15 fps)
  • Up-rate converter at the output (15 fps => 60 fps)
• CORRECT DVI synchronization!
  • PLL fixed location at input and output pins
  • Registered input/output pins
• Fixed-point analysis of the algorithm in Quartus

9 [Block diagram. Labels: DVI IN; DVI OUT; data 24bit (RGB); 3bit DVI sync; PLL; Reset detector; DVI Ctrl signals generator; 25.2MHz DVI clk; ¼ DVI clk; DDR 2 banks; Gidel’s memory controller, 180MHz; StratixII; internal memories]

10 [Block diagram. Labels: DVI IN; T’; PIPE; DVI OUT; columns; lines; Freq controller: 4F to F; Freq Controller+T’: 4F to F; data 24bit (RGB); 3bit DVI sync; PLL; Reset detector; DVI Ctrl signals generator; 25.2MHz DVI clk; ¼ DVI clk; DDR 2 banks; Gidel’s memory controller, 180MHz; StratixII]

11 [Block diagram. Labels: DVI IN; T’; PIPE; DVI OUT; columns; lines; Freq controller: 4F to F; Freq Controller+T’: 4F to F; data 24bit (RGB); 3bit DVI sync; PLL; Reset detector; DVI Ctrl signals generator; 25.2MHz DVI clk; ¼ DVI clk; DDR 8 Double Buffers; Gidel’s memory controller, 180MHz; StratixII]

12 [Block diagram repeated from slide 11]

13 There are 4 bidirectional communication channels to/from DDR
• Each channel requires its own controller, which is a variation of a fundamental controller
  • Up rate
  • Down rate
  • First transpose (800x525 => 525x800)
  • Second transpose (525x800 => 800x525)
• Each one has asymmetric behavior for read and write

14 [State diagrams. Labels: WRITE controller; READ controller; synchronization states]

15 [Diagram. Labels: Dual Clock FIFO; DDR WR controller; DDR RD controller; wr fin; continue; rd fin; Dual Clock FIFO; Pipe]
When finishing a frame: each controller calculates its new address and waits for the other controller to finish. While waiting, the controller keeps sending a “continue” signal to the other controller.
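The end-of-frame handshake described on slide 15 can be sketched as a small software model. This is only an illustration under assumptions: the class `DoubleBuffer` and its methods are hypothetical, the two clock domains and FIFOs are ignored, and the “continue” signal is reduced to the two `*_fin` flags. The point it shows is that the buffers swap only once both the write side and the read side have finished their frame.

```python
class DoubleBuffer:
    """Toy model of a two-buffer DDR region shared by a writer and a reader."""

    def __init__(self):
        self.buffers = [None, None]
        self.write_idx = 0      # buffer currently owned by the write controller
        self.wr_fin = False     # writer finished its frame, waiting for reader
        self.rd_fin = False     # reader finished its frame, waiting for writer

    def write_frame(self, frame):
        self.buffers[self.write_idx] = frame
        self.wr_fin = True
        self._swap_if_both_finished()

    def read_frame(self):
        frame = self.buffers[1 - self.write_idx]
        self.rd_fin = True
        self._swap_if_both_finished()
        return frame

    def _swap_if_both_finished(self):
        # Each side computes its "new address" only after the other side is done.
        if self.wr_fin and self.rd_fin:
            self.write_idx = 1 - self.write_idx
            self.wr_fin = False
            self.rd_fin = False


db = DoubleBuffer()
outputs = []
for frame in ["f0", "f1", "f2"]:
    db.write_frame(frame)
    outputs.append(db.read_frame())

print(outputs)  # the reader always sees the previously completed frame
```

In this model the reader lags the writer by exactly one frame, and neither side ever touches the buffer the other one is using.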

16 Flush:
• According to Gidel’s manual, the flush signal is used to force writing the data to the memory when the last word is incomplete.
• BUT, even when using a port size equal to the memory width, one must use the ‘flush’ signal.
Write empty:
• When performing write bursts from different addresses, one must wait for the write_empty signal before starting a new burst. Without waiting, the data is lost.
• NOT in Gidel’s manual!

17 [Block diagram repeated from slide 11]

18 Write controller:
• Writes to DDR only one frame out of every 4 frames
  • Frame rate: 15 frames/sec, pixel rate: 6.2MHz
  • Data loss is almost unnoticeable
  • Algorithm performance is not affected!
• Actual bandwidth: 25 MHz (DVI clock)
Read controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz
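The rates on this slide follow from the frame geometry. The sketch below recomputes them from the 800x525 frame size and the 60 fps and 15 fps frame rates given earlier, and models the down-rate step as keeping one frame in four. The function names are hypothetical. Note that the exact quotient is 6.3 MHz, so the slide's 6.2 MHz appears to be a rounded figure, just as 25 MHz stands for the 25.2 MHz DVI clock.

```python
def pixel_rate_hz(width, height, frames_per_second):
    """Pixels per second for a stream of full frames (including blanking)."""
    return width * height * frames_per_second


def down_rate(frames, keep_every=4):
    """Keep the first frame of every group of `keep_every` frames, drop the rest."""
    return frames[::keep_every]


input_rate = pixel_rate_hz(800, 525, 60)    # DVI side
reduced_rate = pixel_rate_hz(800, 525, 15)  # after the down-rate converter

print(input_rate / 1e6, "MHz in,", reduced_rate / 1e6, "MHz after down-rate")
print(down_rate(list(range(8))))
```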

19 [State diagrams. “Normal” READ controller. WRITE controller: writes 1 frame to DDR, then counts 3 more frames and cleans the pipe]

20 [Block diagram repeated from slide 11]

21 Write controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz
Read controller:
• Reads the same frame from the DDR 4 times
  • To meet DVI data rate requirements
• Actual bandwidth: 25MHz
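A minimal sketch of the up-rate behavior, assuming frames are just list items and using a hypothetical `up_rate` helper: every stored frame is emitted four times, so one second of 15 fps input becomes one second of 60 fps output.

```python
def up_rate(frames, repeat=4):
    """Emit every stored frame `repeat` times in a row."""
    output = []
    for frame in frames:
        output.extend([frame] * repeat)
    return output


one_second_in = list(range(15))          # 15 processed frames
one_second_out = up_rate(one_second_in)  # what the DVI output sees

print(len(one_second_out), "frames out;", one_second_out[:8])
```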

22 [State diagrams. WRITE controller. READ controller: main “loop” reads the same frame 4 times, then syncs with WR and swaps addresses]

23 [Block diagram repeated from slide 11]

24 A reminder of how it works:
[Diagram. Labels: StratixII; M-RAM WRITE; M-RAM READ; DDRII T’ WRITE; DDRII T’ READ; “Penalty every row skip”; “Sequential read from DDR”; “Penalty all the time!”]

25 Two different transposes:
• The first transpose: 800x525
• Transpose back: 525x800
Debugging difficulty…
• Synchronization to the beginning of the frame is required
Transpose counters:
• “Heavy” sequential combinational logic causes timing problems
Transpose on read or on write?

26 Mram
• Max number of rows (minimum penalty)
• Number must divide 800 or 525 (no remainder)
• Number must agree with Gidel controller
• We chose 50 and 35 lines respectively
DDR
• Load balancing
• Gidel requirements
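The row-count constraints can be illustrated with a software model of a strip-wise transpose. This is a sketch under assumptions, not the hardware design: `transpose_in_strips` and `burst_count` are hypothetical names, the Python list `strip` stands in for the internal Mram buffer, and each slice assignment stands in for one write burst to external memory. Buffering more lines per strip gives longer bursts and fewer address jumps, and the strip height has to divide the number of lines so that no partial strip is left over.

```python
def transpose_in_strips(image, strip_rows):
    """Transpose `image` (list of rows) by buffering `strip_rows` lines at a time."""
    rows, cols = len(image), len(image[0])
    if rows % strip_rows != 0:
        raise ValueError("strip_rows must divide the number of rows")
    out = [[None] * rows for _ in range(cols)]
    for top in range(0, rows, strip_rows):
        strip = image[top:top + strip_rows]  # stands in for the internal buffer
        for c in range(cols):
            # One burst: strip_rows consecutive pixels of output row c.
            out[c][top:top + strip_rows] = [line[c] for line in strip]
    return out


def burst_count(rows, cols, strip_rows):
    """Number of bursts (address jumps) needed for one transposed frame."""
    return (rows // strip_rows) * cols


small = [[r * 4 + c for c in range(4)] for r in range(6)]  # 6 lines of 4 pixels
transposed = transpose_in_strips(small, strip_rows=3)

print(transposed)
print(800 % 50, 525 % 35)         # both chosen strip heights divide evenly
print(burst_count(525, 800, 35))  # bursts for a 525-line frame with 35-line strips
```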

27 [Diagram. Labels: Mram write and read address counters; beginning-of-frame detection unit; delaying the data; 3 Mrams for RGB]

28 DDR synchronization on the WR controller:
• New “Data in” port
• Designated states to deal with the first pixel of the frame after reset
• “Cleans” the DCFIFO until detecting the first pixel of a new frame
• The WR controller sends a reset signal to the RD controller

29 DDR and Mram counters:
• The “heaviest” combinational logic of the entire design
If (a) and (not b) and (not c) then
If (a) and (b) and (not c) then
If (a) and (b) and (c) then
Long CL paths result in timing problems!
No code reuse and more HW (but we have enough!) guarantees shorter, parallel CL
If (a) then If (b) then If (c) then

30 Can’t easily “divide and conquer”
• Result is available only after 2 transposes
• We used SignalTap and built verification units
[Diagram. First T’: Mram, DDR, addresses counters, sync, dual clk FIFO. Second T’: Mram, DDR, addresses counters, sync, dual clk FIFO]

31 Can’t simulate DDR’s behavior in MODELSIM
• We don’t have a reliable model of the external memory’s behavior
• Gidel’s controller is NOT “transparent” to the users. We know nothing about:
  • Gidel’s internal implementation
  • Gidel’s policy for handling DDR requests
• We can read from the DDR through PCI, but it changes the data path…

32 Read and Write protocols are different
• WRITE:
  • Wait 16 clks after start
  • Wait ~100 clks after flush
  • Wait for signal write_empty
• READ:
  • Wait for signal almost_empty_RD
• Looks like the READ loop is shorter!
We successfully implemented transpose on read.
• However, the improvement is not good enough to avoid using down/up rate controllers.
• The combined up rate and transpose: the read loop is more “busy”, so it is better to perform T’ on write!

33 Can we avoid the loss of data?
• 2 iterations:
  • Only 2 transposes are needed!
• 2 FPGAs
• DDR configuration (for each FPGA):
  • 1 transpose on bank A (19 MHz)
  • 1 transpose on bank B (19 MHz)
  • For each bank: 180x0.75/3=45 >25.2 !!!
• Add more memory:
  • 1 T’ on bank A, 1 on bank B, 1 on additional memory
  • For each bank: 180x0.75/3=45 >25.2 !!!
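The per-bank check on this slide is plain arithmetic and can be reproduced as follows. The numbers 180, 0.75, 3 and 25.2 come from the slides; the slide does not say what the 0.75 factor and the divisor 3 stand for, so the constant names below are guesses and should be read as assumptions.

```python
DDR_CLOCK_MHZ = 180.0       # Gidel memory controller clock shown in the block diagram
EFFICIENCY = 0.75           # factor from the slide; assumed to be a usable fraction
SHARES_PER_BANK = 3         # divisor from the slide; its meaning is not stated
DVI_PIXEL_RATE_MHZ = 25.2   # DVI clock, the rate that must be sustained


def rate_per_share_mhz(clock_mhz, efficiency, shares):
    """Rate available to each share of one DDR bank, in MHz."""
    return clock_mhz * efficiency / shares


per_share = rate_per_share_mhz(DDR_CLOCK_MHZ, EFFICIENCY, SHARES_PER_BANK)
meets_dvi_rate = per_share > DVI_PIXEL_RATE_MHZ

print(per_share, "MHz per share; enough for full-rate DVI:", meets_dvi_rate)
```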

34 [Block diagram repeated from slide 11]

35 Problems
• Inconsistent compilation results
• Jittery image
• Lost data
• Timing problems
Solutions
• Registered I/Os
• PLL fixed placing

36 Multiport
• Data loss at end of burst
• Long penalties
• I/O strength
• ProcII vs. ProcIII (no DVI)
Sync
• Waiting for signal from second group
[Figure: a 4x5 matrix numbered 1 to 20 row by row, shown next to its 5x4 transpose]

37 SignalTap

38 Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization

39

40 We invite you to join us in the lab for a short demonstration

