Roman Kofman & Sergey Kleyman Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part A (Annual project)

Presentation on theme: "Roman Kofman & Sergey Kleyman Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part A (Annual project)"— Presentation transcript:

1 Roman Kofman & Sergey Kleyman Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part A (Annual project)

2 Project Recap; Data Flow; Blocks implementation; Conclusions; Project B - Time Table

3 The algorithm: Nonlinear Diffusion. Use an iterative numerical solution to solve the diffusion equation. Why use it for image processing? Image noise is smoothed, while edges remain sharp.

4 Original image

5 dt = 30, one iteration! Look at the edges (sharp!). Look at the hat (smoothed).

6 Difficulties with the semi-implicit model: a very complex design (Thomas algorithm) that makes real-time operation almost impossible; transposing the entire image; a reverse-order loop; multiple memory accesses. So why use this model? Strong effect: good results after very few iterations.
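The reverse-order loop mentioned on this slide comes from the Thomas algorithm itself: a forward sweep over the line is followed by a back substitution that walks the same line from its last element to its first. The following is a minimal Python sketch of the textbook algorithm, not the project's hardware implementation; the function name `thomas_solve` and the argument names are illustrative.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward sweep: elements are visited in natural order 0..n-1.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: elements are visited in REVERSE order n-1..0.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Small diagonally dominant example whose exact solution is [1, 2, 3, 4].
a = [0.0, -1.0, -1.0, -1.0]
b = [4.0, 4.0, 4.0, 4.0]
c = [-1.0, -1.0, -1.0, 0.0]
d = [2.0, 4.0, 6.0, 13.0]
x = thomas_solve(a, b, c, d)
```

Because the second loop consumes the results of the first in the opposite order, a streaming implementation needs some way to reverse each line between the two passes.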

7 [Block diagram. Blocks shown: DVI IN; lines pipe (Thomas 3, M4K line reverse write/read buffers); T’; columns pipe (Thomas 3, M4K line reverse write/read buffers); DVI OUT.] How to implement T’ in real time?

8 [Data-flow block diagram. Blocks shown: DVI IN; Freq controller: 4F to F (DDRII T’ write/read); rows pipe and columns pipe, each with Thomas 3 and M4K line reverse write/read buffers; two Transpose stages (M-RAM write/read, DDRII T’ write/read); Freq controller: F to 4F (DDRII T’ write/read); DVI OUT.] Callouts: double buffers; external memory; balanced channels; reduced frequency.

9 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

10 Addressing controller. Addressing method, first attempt: use a cache-organization approach. Fast: direct access to data in memory. Easy to implement: no logic is needed for “translation”. However, expensive: 10 bits is more than we need for the column representation. [Address layout: 15 bits in total, split into row, area and column fields; widths shown: 4 bits, 10 bits, 1 bit.]

11 Addressing controller. The first-attempt implementation requires 98KB, but one M-RAM block is 64KB. Solution: use consecutive addressing, Address = block + row + phase. This requires “translation”, but the size is 61KB, so it fits (Quartus report).
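The two sizes on this slide can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' design: it assumes 24-bit pixels (3 bytes), 640 used columns, 16 rows (4 bits) and 2 areas (1 bit). The `consecutive_address` mapping is a hypothetical packed layout; it is not the slide's "block + row + phase" formula, whose terms the transcript does not define.

```python
# Assumptions (not stated on the slide): 24-bit pixels and 640 used columns.
BYTES_PER_PIXEL = 3
COLUMNS = 640
ROW_BITS, COLUMN_BITS, AREA_BITS = 4, 10, 1   # 15 address bits in total
ROWS = 2 ** ROW_BITS      # 16 rows per area
AREAS = 2 ** AREA_BITS    # 2 areas
MRAM_BYTES = 64 * 1024    # one M-RAM block, as given on the slide


def field_address(area, row, col):
    """First attempt: one bit field per coordinate (field order is assumed)."""
    return (area << (ROW_BITS + COLUMN_BITS)) | (row << COLUMN_BITS) | col


def consecutive_address(area, row, col):
    """Hypothetical consecutive layout: no unused column addresses."""
    return (area * ROWS + row) * COLUMNS + col


# Bit fields reserve 1024 column addresses per row although only 640 are used.
field_bytes = 2 ** (ROW_BITS + COLUMN_BITS + AREA_BITS) * BYTES_PER_PIXEL
consecutive_bytes = AREAS * ROWS * COLUMNS * BYTES_PER_PIXEL

print(field_bytes, consecutive_bytes, MRAM_BYTES)
```

Under these assumptions the bit-field layout needs 98304 bytes and the consecutive layout 61440 bytes, which matches the 98KB and 61KB figures and explains why only the second fits in a single 64KB block. The price is the multiply-and-add "translation" in `consecutive_address`.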

12 Addressing controller Address translation units

13 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

14 [Data-flow block diagram. Blocks shown: DVI IN; Freq controller: 4F to F (DDRII T’ write/read); lines pipe and columns pipe, each with Thomas 3 and M4K line reverse write/read buffers; two Transpose stages (M-RAM write/read, DDRII T’ write/read); Freq controller: F to 4F (DDRII T’ write/read); DVI OUT.]

15 Transpose. Goal: write the transposed data so that it can later be read sequentially, in rows. Problem: random access in DDR is too expensive (32 clk penalty!). Solution: use internal memory to invert the order, so that most of the penalty is “paid” in random accesses to FPGA memory; write to DDR in “windows”, which enables sequential row writes with a penalty only at every row skip.

16 Transpose, how it works: [Diagram: M-RAM write, M-RAM read, DDRII T’ write, DDRII T’ read. Annotations: penalty at every row skip; sequential read from DDR; penalty all the time!]
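The benefit of writing to DDR in windows can be illustrated with a small software model. This is a sketch under simplifying assumptions: the DDR is a flat list, every non-sequential write address counts as one "jump" (each of which would cost the 32 clk penalty), and the internal memory holds `window_rows` full image rows. The function names and the window size are illustrative, not taken from the design.

```python
def transpose_naive(image):
    """Write each incoming pixel straight to its transposed DDR address."""
    h, w = len(image), len(image[0])
    ddr = [None] * (h * w)
    jumps, last = 0, None
    for r in range(h):
        for c in range(w):
            addr = c * h + r              # transposed position
            if last is None or addr != last + 1:
                jumps += 1                # non-sequential DDR access
            ddr[addr] = image[r][c]
            last = addr
    return ddr, jumps


def transpose_windowed(image, window_rows):
    """Buffer a window of rows internally, then write column segments as bursts."""
    h, w = len(image), len(image[0])
    assert h % window_rows == 0
    ddr = [None] * (h * w)
    jumps, last = 0, None
    for r0 in range(0, h, window_rows):
        window = image[r0:r0 + window_rows]   # models the internal memory
        for c in range(w):
            for k in range(window_rows):      # random access stays on chip
                addr = c * h + r0 + k
                if last is None or addr != last + 1:
                    jumps += 1
                ddr[addr] = window[k][c]
                last = addr
    return ddr, jumps


H, W = 8, 6
image = [[r * W + c for c in range(W)] for r in range(H)]
ddr_naive, jumps_naive = transpose_naive(image)
ddr_win, jumps_win = transpose_windowed(image, window_rows=4)
```

For this 8x6 example the naive version jumps on every one of the 48 writes, while the windowed version jumps only once per burst of 4 sequential words (12 jumps), and both leave the same transposed image in the modelled DDR.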

17 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

18 [Data-flow block diagram, same as slide 14.]

19 Reverse Line Order. Used for the Thomas algorithm. Implementation: on M4K blocks; double-sized buffer with alternating pointers for read/write. [Diagram: addresses 0 to 640 with Read and Write pointers, before and after “swap addresses”.]
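A double-sized buffer with alternating read and write pointers can be modelled in a few lines. The sketch below is a software illustration only, assuming one pixel is written and one is read per step; the class name `LineReverser` and its methods are hypothetical.

```python
class LineReverser:
    """Ping-pong buffer: write one line forward while reading the previous one backward."""

    def __init__(self, line_length):
        self.n = line_length
        self.mem = [None] * (2 * line_length)   # double-sized buffer
        self.write_base = 0                     # half currently being written
        self.read_base = line_length            # half currently being read

    def process_line(self, line):
        out = []
        for i, pixel in enumerate(line):
            # Read the previous line from its last address down to its first.
            out.append(self.mem[self.read_base + self.n - 1 - i])
            # Write the current line in natural order into the other half.
            self.mem[self.write_base + i] = pixel
        # Swap addresses: the half just written becomes the half to read.
        self.write_base, self.read_base = self.read_base, self.write_base
        return out


rev = LineReverser(4)
first = rev.process_line([1, 2, 3, 4])    # nothing stored yet: one line of latency
second = rev.process_line([5, 6, 7, 8])   # previous line, reversed
third = rev.process_line([9, 10, 11, 12])
```

The output lags the input by exactly one line, so the block can run continuously at the pixel rate without ever reading and writing the same half at the same time.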

20 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

21 [Data-flow block diagram, same as slide 14.]

22 We need very large double buffers that can be integrated easily with FPGA designs. The FPGA is resource-limited. Solution: use external memory for this purpose.

23 The multi-port core enables efficient usage of the memory on the GiDEL PROC board. Up to 16 ports per bank, 2 banks per FPGA. Each port may be forced to access a different memory area and limited to a certain address space. Straightforward random memory access with random ports is slow and inefficient. A segmented working mode is available for sequential ports; it enables fast read/write bursts.

24 Two ports: sequential read and sequential write, each accessing a different memory area. The double buffer is implemented by switching the starting address at the end of every burst.

25 [Diagram: Pipeline Design, OurEntity with Controller, Multi-port core; control signals, write sequential port, read sequential port. Clock domains: fixed CLK and external DVI CLK. PROBLEM.]

26 Add FIFOs to implement data-rate matching. Altera provides a dual-clock FIFO (DCFIFO) megafunction; using it before and after each write/read port would solve the problem. Control logic is integrated into the control entity. Extra FIFOs = extra FPGA resources.

27 Solution: [Diagram: Pipeline Design, OurEntity with Controller, Multi-port core; control signals, write sequential port, read sequential port.]

28 DVI clk Multi clk

29 Buffer controller schema. [State diagram: Reset, Prepare for read/write, Read/write, Flush.] Follows the DDR protocol, including wait states. Symmetric read/write bursts according to the FIFO states. Burst length can be adjusted. Next slide…

30 Problem: data is written to DDR only when the internal DDR FIFO is full. Solution: flush forces the FIFO to pass data; not using the accurate flush length results in image noise! Problem: the flush delay length is not constant and depends on the burst length. Solution: stretch write bursts until the FIFO is almost full, which lowers the flush influence.

31 Fixed controller schema. [State diagram: Reset, Prepare for read/write, Read/write, Flush; transition condition: internal FIFO is almost full.]
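One possible reading of the fixed controller is a four-state machine in which the burst state is held until the internal FIFO is almost full, and only then followed by a flush. The sketch below encodes that reading in Python. The transition conditions, the `flush_done` input and the fill rate of 4 words per step are assumptions made for illustration; the transcript names only the states and the almost-full condition.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    PREPARE = auto()   # "Prepare for read / write"
    BURST = auto()     # "Read / write"
    FLUSH = auto()


def next_state(state, fifo_level, almost_full_level, flush_done):
    """Return the next controller state (assumed transition rules)."""
    if state is State.RESET:
        return State.PREPARE
    if state is State.PREPARE:
        return State.BURST
    if state is State.BURST:
        # Stretch the burst until the internal FIFO is almost full.
        return State.FLUSH if fifo_level >= almost_full_level else State.BURST
    if state is State.FLUSH:
        return State.PREPARE if flush_done else State.FLUSH
    raise ValueError(state)


# Short trace with a hypothetical fill rate of 4 words per burst step.
state, fifo = State.RESET, 0
trace = [state]
for _ in range(6):
    if state is State.BURST:
        fifo += 4
    elif state is State.FLUSH:
        fifo = 0
    state = next_state(state, fifo, almost_full_level=8, flush_done=True)
    trace.append(state)
```

In this model the flush always starts from the same FIFO fill level, which is one way to read the claim that stretching the bursts lowers the influence of the variable flush delay.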

32 Up to 8 buffers per memory bank. Must comply with bandwidth restrictions (MultiPort utilization). Integration effort.

33 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

34 [Data-flow block diagram, same as slide 14.]

35 In the original design, the down-rate controller used internal memory; however, the required FIFO will not fit on the FPGA. The implementation is based on the DDR buffer with asymmetric read/write. Extra DDR access. Input and output DCFIFOs are asymmetric in size. Full data path. The down-rate buffer saves to DDR only 1 frame out of 4. The up-rate buffer reads the same frame from DDR 4 times.
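At the frame level, the two rate controllers behave like a decimator and a repeater with a factor of 4. The following Python sketch models only that behaviour, with frames as list items; which of the four frames is kept is an assumption (the first), and the function names are illustrative.

```python
RATE = 4  # from the slide: 1 frame out of 4 saved, same frame read 4 times


def down_rate(frames, rate=RATE):
    """Keep one frame out of every `rate` (assumed: the first of each group)."""
    return [frame for i, frame in enumerate(frames) if i % rate == 0]


def up_rate(frames, rate=RATE):
    """Emit every stored frame `rate` times."""
    return [frame for frame in frames for _ in range(rate)]


incoming = list(range(8))            # frame indices 0..7 at the input rate
processed = down_rate(incoming)      # frames handed to the slower pipeline
outgoing = up_rate(processed)        # frames sent back out at the input rate
```

The output stream has the same frame rate as the input, while the processing pipeline in between only has to handle a quarter of the frames.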

36 Re/Wr sync controller. [State diagrams with the states: reset, Prepare for write, Prepare for read, Read, Write, Read/write, Flush.]

37 AGENDA. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

38

39 [Diagram: DVI rx, DVI in controller, mux, flag frame, data path with memory access, flag detector, signal generation (gen hsync, gen vsync, gen de), PLL, 24-bit to 12-bit double-rate conversion, DVI tx, all inside the FPGA. Signals: 24 data bits, 12 bits, hsync, vsync, data enable, clk.] The signals must pass through the same long delays as the data, which means extra bits written to memory.

40 Send a known flag through the data path. Start generating the signals according to the flag arrival. [Diagram: same blocks as slide 39: DVI rx, DVI in controller, mux, flag frame, data path with memory access, flag detector, signal generation (gen hsync, gen vsync, gen de), PLL, 24-bit to 12-bit double-rate conversion, DVI tx.]
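The flag idea can be sketched as follows: instead of carrying the sync signals through the delayed data path, a known marker is sent through it and the output side regenerates the signals when the marker arrives. The Python model below is illustrative only. It assumes the data path behaves like a fixed delay line, uses a hypothetical 24-bit `FLAG` value, and regenerates just a data-enable signal for a fixed number of samples after the flag.

```python
from collections import deque

FLAG = 0xABCDEF  # hypothetical marker value, not taken from the design


def delayed_path(samples, delay):
    """Model the data path with memory access as a fixed delay (assumption)."""
    line = deque([0] * delay)
    out = []
    for sample in samples:
        line.append(sample)
        out.append(line.popleft())
    return out


def regenerate_enable(received, frame_len):
    """Assert data enable for `frame_len` samples following the flag."""
    enable, remaining = [], 0
    for sample in received:
        if sample == FLAG:
            remaining = frame_len      # flag detected: start generating
            enable.append(0)
        elif remaining > 0:
            enable.append(1)
            remaining -= 1
        else:
            enable.append(0)
    return enable


sent = [FLAG, 1, 2, 3, 0, 0, 0, 0, 0]
received = delayed_path(sent, delay=4)
enable = regenerate_enable(received, frame_len=3)
```

The generator never needs to know the length of the delay, only when the flag shows up. A real design would also have to make sure that ordinary pixel data cannot be mistaken for the flag, which this sketch does not address.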

41 [Data-flow block diagram. Blocks shown: DVI IN; Freq controller: 4F to F; lines pipe and columns pipe, each with Thomas 3 and M4K line reverse write/read buffers; Transpose stages (M-RAM write/read, DDRII T’ write/read); a 48bit path with Delay and additional M-RAM write/read blocks; DVI OUT.]

42 [Data-flow block diagram. Blocks shown: DVI IN; Freq controller: 4F to F (DDRII T’ write/read); lines pipe and columns pipe, each with Thomas 3 and M4K line reverse write/read buffers; Transpose stages (M-RAM write/read, DDRII T’ write/read); a 48bit path with Delay and additional M-RAM write/read blocks; Freq controller: F to 4F (DDRII T’ write/read); DVI OUT.]

43 [Data-flow block diagram, same as slide 14.]

44 Summary. Internal memory blocks: addressing controller, transpose, line reverse. External memory: double buffer on DDR, up/down rate controller. DVI synchronization.

45 Problem with the board’s RESET. Problem with loading the design.

46 Plan and implement logic blocks: SQRT and DIV are the main problem; verify the required precision (based on our conclusions from part A). Integration of frequency controllers and transpose blocks. Implement one full iteration.

47 Divide between 2 problems: design of logic blocks; full DDR blocks integration. How? Implement the processing algorithm for a smaller frame, avoiding the use of external memory.

48 [Diagram. Blocks shown: DVI IN; sample smaller frame; M-RAM write/read; logic blocks; M-RAM write/read; DVI OUT.]

49 Project B goal: create end to end data path - with Image Processing

50

