VLSI Design of 2-D Discrete Wavelet Transform for Area-Efficient and High-Speed Image Computing - End Presentation
Presenter: Eyal Vakrat
Instructor: Tsachi Martsiano

Table of contents
– Project goals
– Compression methods
– The DWT
– Why is it any good?
– DWT vs DFT
– Project stages
– The MATLAB algorithm
– Results - MATLAB
– Top level architecture
– Micro-architecture
– Results - Simulation
– Results - Synthesis
– Frequencies
– Suggestions for a follow-up project

Project goals
– Implementation of a high-speed, real-time 2-D Discrete Wavelet Transform on an FPGA
– Based on a new and fast convolution approach
– Efficient memory usage (in-place)
– Reference article: Mountassar Maamoun, Mehdi Neggazi, Abdelhamid Meraghni, and Daoud Berkani, "VLSI Design of 2-D Discrete Wavelet Transform for Area-Efficient and High-Speed Image Computing," World Academy of Science, Engineering and Technology 21, 2008.

Compression methods
Lossless vs. lossy compression
Lossless
– Digitally identical to the original image
– Only achieves a modest amount of compression
Lossy
– Discards components of the signal that are known to be redundant
– The signal is therefore changed from the input
– Achieves much higher compression; under normal viewing conditions no visible loss is perceived (visually lossless)

The DWT
The wavelet transform has gained widespread acceptance in signal processing and image compression. Because of their inherent multi-resolution nature, wavelet-coding schemes are especially suitable for applications where the resolution and quality of the image are important. In 2000, the JPEG committee released its new image coding standard, JPEG-2000, which is based on the DWT.

The DWT cont.
The wavelet transform decomposes a signal into a set of basis functions. These basis functions are called wavelets. Wavelets are obtained from a single prototype wavelet, called the mother wavelet, by scaling and shifting:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right)$$

where a is the scaling parameter and b is the shifting parameter.
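The scaling and shifting of a mother wavelet can be illustrated with a few lines of Python. This is only a sketch: it assumes the standard form ψ_{a,b}(t) = ψ((t − b)/a)/√|a| and uses the Haar function as a stand-in mother wavelet, which is not the wavelet used in this project. The names `haar_mother` and `scaled_wavelet` are hypothetical.

```python
import math


def haar_mother(t):
    """Haar mother wavelet (illustrative choice): +1 on [0, 0.5), -1 on [0.5, 1)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def scaled_wavelet(t, a, b, mother=haar_mother):
    """Evaluate psi_{a,b}(t) = psi((t - b) / a) / sqrt(|a|)."""
    return mother((t - b) / a) / math.sqrt(abs(a))


# Stretching by a = 2 doubles the support to [0, 2) and lowers the amplitude.
first_half = scaled_wavelet(0.5, a=2.0, b=0.0)
second_half = scaled_wavelet(1.5, a=2.0, b=0.0)
outside = scaled_wavelet(2.5, a=2.0, b=0.0)
```

A larger a stretches the wavelet in time (coarser time resolution), while b moves it along the time axis.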

The DWT cont.
The discrete wavelet transform (DWT) transforms a discrete-time signal into a discrete wavelet representation. It converts an input series x_0, x_1, ..., x_N into one high-pass wavelet coefficient series and one low-pass wavelet coefficient series (of length N/2 each, because of downsampling), given by:

$$y_L(n) = \sum_{k} h(k)\,x(2n-k), \qquad y_H(n) = \sum_{k} g(k)\,x(2n-k)$$

where h and g are called wavelet filters (low-pass and high-pass, respectively), and n = 0, ..., [N/2]-1.
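The filtering and downsampling step described above can be sketched in Python. The sketch assumes the common convention y_L(n) = Σ_k h(k)·x(2n − k) and y_H(n) = Σ_k g(k)·x(2n − k), handles the borders by periodic wrap-around (an assumption, not taken from the slides), and uses 2-tap averaging/differencing filters purely for illustration. The name `dwt_step` is hypothetical.

```python
def dwt_step(x, h, g):
    """One level of a 1-D DWT: filter with h (low-pass) and g (high-pass),
    keeping only every second output sample (downsampling by 2)."""
    n_in = len(x)
    half = n_in // 2

    def filt(taps, n):
        # y[n] = sum_k taps[k] * x[2n - k], with periodic wrap-around at the
        # borders (the boundary handling is an assumption of this sketch).
        return sum(t * x[(2 * n - k) % n_in] for k, t in enumerate(taps))

    y_low = [filt(h, n) for n in range(half)]
    y_high = [filt(g, n) for n in range(half)]
    return y_low, y_high


# Illustrative 2-tap filters (not the project's filters).
h = [0.5, 0.5]
g = [0.5, -0.5]
y_low, y_high = dwt_step([1, 3, 5, 7], h, g)
```

Each output series has half the length of the input, which is the downsampling mentioned above.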

The DWT cont.
In practice, this transformation is applied recursively to the low-pass series until the desired number of iterations is reached.
[Figure: decomposition diagram with outputs Y_L and Y_H]
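The recursion on the low-pass series can be sketched as follows. This is a minimal illustration, assuming a simple pairwise average/difference step (`haar_step`, a hypothetical stand-in for the real filter pair) so that the structure of the recursion stays visible.

```python
def haar_step(x):
    """Illustrative one-level step: pairwise average (low) and difference (high)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]
    return low, high


def dwt_multilevel(x, levels, step=haar_step):
    """Apply `step` repeatedly to the low-pass output only.

    Returns the final low-pass series and the list of high-pass series,
    one per level (finest level first).
    """
    low = list(x)
    details = []
    for _ in range(levels):
        low, high = step(low)
        details.append(high)
    return low, details


approx, details = dwt_multilevel([1, 3, 5, 7, 9, 11, 13, 15], levels=2)
```

Only the low-pass branch is decomposed again, so the amount of data processed halves at every level.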

Why is it any good?
– Most terms of the given equations are zero because the wavelet filters are short, so results are computed quickly.
– A smart architecture produces a valid result every clock period (after a short latency).
– Efficient (in-place) memory usage reduces the memory size needed for the implementation.

DWT vs DFT
– Localization in both time and frequency. According to the Heisenberg uncertainty principle, $\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$. The time-frequency resolution is constant when using the DFT, whereas with the DWT it varies because of the scaling of the mother wavelet. This behavior is key because it lets us trade off resolution between the time and frequency domains to reach the desired result.
– Efficient: complexity of $O(N)$, compared to $O(N \log N)$ with the DFT (computed via the FFT).
– Speed: faster to calculate.

Project stages
– Learn the 2D-DWT algorithm from the article
– Write floating-point MATLAB DWT and IDWT
    – Choose coefficients
    – Compare the results to the MATLAB DWT function
– Write fixed-point MATLAB DWT and IDWT
    – Compare the results to the MATLAB DWT function
    – Select the fixed-point resolution
– Architecture:
    – Learn the proposed architecture from the paper
    – Adjust it to our case: different coefficients and picture size
– Code the module in VHDL
– Simulate the module using ModelSim
– Synthesize the module using Vivado

The MATLAB algorithm
The coefficients I use in my project are those of the biorthogonal 4.4 series.
Because floating point is not supported by the FPGA, we wrote a fixed-point algorithm that uses only part of each coefficient, to see what resolution (number of digits after the point) gives a good result:
– Performing the DWT/IDWT on the image with the floating-point coefficients.
– Modifying the coefficients to fixed-point values ( )
– Performing the DWT/IDWT on the image with the fixed-point coefficients.
– Comparing the results.
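The fixed-point step above can be illustrated with a short Python sketch. It assumes that the fixed-point resolution is expressed as a scale factor of 1024 (2^10), and it uses the commonly quoted 9/7 low-pass analysis taps as stand-in values; the project's exact coefficients and normalization are not reproduced here. The names `to_fixed` and `from_fixed` are hypothetical.

```python
SCALE = 1024  # assumed scale factor (2**10) for the fixed-point representation

# Commonly quoted 9/7 low-pass analysis taps (center tap first, then the
# symmetric neighbours). Stand-in values, not necessarily the project's.
taps = [0.602949018236, 0.266864118443, -0.078223266529,
        -0.016864118443, 0.026748757411]


def to_fixed(c, scale=SCALE):
    """Quantize a floating-point coefficient to an integer number of 1/scale steps."""
    return round(c * scale)


def from_fixed(q, scale=SCALE):
    """Convert the integer representation back to a real value."""
    return q / scale


fixed = [to_fixed(c) for c in taps]
errors = [abs(c - from_fixed(q)) for c, q in zip(taps, fixed)]
worst_error = max(errors)
```

With rounding to the nearest step, the quantization error of each coefficient is at most half a step, that is 1/(2·SCALE). Raising the scale factor reduces the error at the cost of wider multipliers.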

Results - MATLAB
[Figure: original picture, floating-point result, and fixed-point (1024) result]

Results - MATLAB
[Figure: my DWT & IDWT compared with the MATLAB DWT & IDWT]

Top Level Architecture
[Block diagram. Blocks: DWT (rows), MEM (high), DWT (cols_high), MEM (low), DWT (cols_low), and a controller (SM). Signals: start, data, reset, clk, data_valid. Outputs: Y_LL/Y_LH and Y_HL/Y_HH.]
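The data flow of the block diagram (row transform, two intermediate memories, then two column transforms) can be mimicked in software. The sketch below is an assumption-laden illustration: it uses a pairwise average/difference step instead of the project's filters, and it assumes that the low memory feeds the Y_LL/Y_LH outputs and the high memory feeds Y_HL/Y_HH. All function names are hypothetical.

```python
def haar_pairs(v):
    """Illustrative 1-D step: pairwise average (low) and difference (high)."""
    low = [(v[i] + v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]
    high = [(v[i] - v[i + 1]) / 2 for i in range(0, len(v) - 1, 2)]
    return low, high


def transform_columns(block):
    """Apply the 1-D step to every column of a row-major 2-D list."""
    cols = [list(c) for c in zip(*block)]
    pairs = [haar_pairs(c) for c in cols]
    low = [list(r) for r in zip(*(p[0] for p in pairs))]
    high = [list(r) for r in zip(*(p[1] for p in pairs))]
    return low, high


def dwt2_level(image):
    """One 2-D level: rows first, then columns of each intermediate memory."""
    row_pairs = [haar_pairs(row) for row in image]
    mem_low = [p[0] for p in row_pairs]    # plays the role of MEM (low)
    mem_high = [p[1] for p in row_pairs]   # plays the role of MEM (high)
    y_ll, y_lh = transform_columns(mem_low)
    y_hl, y_hh = transform_columns(mem_high)
    return y_ll, y_lh, y_hl, y_hh


y_ll, y_lh, y_hl, y_hh = dwt2_level([[1, 2], [3, 4]])
```

Because the row and column passes are independent 1-D transforms, the two column units can work in parallel on the two memories.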

Micro-architecture

Component      Number of units   Width
Registers      25                9-14 bit
Adders         8                 9-13 bit
Multipliers    5                 9-11 bit

Registers are added between the multipliers and adders to speed up the computation. The outputs Y_L and Y_H are obtained alternately at the trailing edges of the even and odd clock cycles. The latency until the first output is ready is 10 clock cycles (e.g., at cycles 10, 12, … we get Y_H0, Y_H1, … and at cycles 11, 13, … we get Y_L0, Y_L1, …).
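The output schedule described above (first output after 10 cycles, then Y_H and Y_L alternating) can be written as a small lookup function. This is only a sketch of the stated timing, not of the hardware itself; the name `output_at` is hypothetical.

```python
LATENCY = 10  # cycles until the first output, as stated above


def output_at(cycle, latency=LATENCY):
    """Return (band, index) of the output produced at `cycle`, or None
    while the pipeline is still filling."""
    if cycle < latency:
        return None
    offset = cycle - latency
    band = "H" if offset % 2 == 0 else "L"  # Y_H first, then Y_L
    return band, offset // 2


schedule = [output_at(c) for c in range(9, 14)]
```

For cycles 9 to 13 this gives no output, then Y_H0, Y_L0, Y_H1, Y_L1, matching the example in the text.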

Memory usage - in-place
The first row of Y_LH and Y_HH can be obtained after the beginning of the third row storage of the first-level outputs. After the beginning of the fifth row storage of the first-level outputs, we can obtain the second row of Y_LH and Y_HH and the first row of Y_LL and Y_HL. Nine FPGA block RAMs in dual-port mode are required to accomplish the second level of the parallel DWT architecture with our wavelet filters.

Results - Simulation
Using MATLAB, we created a .txt file containing the original picture values. Our testbench (TB) read the values from the file and fed them as input to our model. Once the model started generating valid outputs, the TB wrote them into a new .txt file. Using MATLAB to read these values, we were able to evaluate the results.
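The text-file exchange between the modeling tool and the testbench can be sketched in Python. The file layout (one integer per line), the file name `image_in.txt`, and the comparison metric are assumptions made for illustration; the slides do not specify them.

```python
import os
import tempfile


def write_values(path, values):
    """Write one integer value per line (assumed file layout)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_values(path):
    """Read the values back, skipping empty lines."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def max_abs_diff(a, b):
    """Largest absolute difference between two equally long series."""
    return max(abs(x - y) for x, y in zip(a, b))


pixels = [0, 17, 128, 255]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image_in.txt")  # hypothetical file name
    write_values(path, pixels)
    restored = read_values(path)

difference = max_abs_diff(restored, pixels)
```

The same comparison function can be applied to the simulation output file and the reference model output to quantify how closely the two agree.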

Results - Simulation
[Figure: my DWT & IDWT (MATLAB) compared with the simulation DWT & IDWT]

Results - Synthesis
Using the Zynq ZedBoard, we performed two synthesis runs:
– Regular memory size usage.
– In-place memory size usage.
The goal was to reach up to 600 MHz.

Frequencies
– Regular memory usage: 165 MHz. The slow speed is due to a large counter in the controller, which is needed because of the large memory sizes.
– In-place memory usage: 376 MHz. The slow speed is due to the connection to the device output.
Ways to improve performance:
– Using faster and smaller memories.
– Improving the address counter.
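The reported frequencies can be put in proportion with a few lines of arithmetic. The sketch uses only the figures stated in this presentation (165 MHz, 376 MHz, and the 600 MHz goal); the variable names are hypothetical.

```python
GOAL_MHZ = 600.0
results_mhz = {"regular": 165.0, "in_place": 376.0}

# Fraction of the 600 MHz goal reached by each synthesis run.
fraction_of_goal = {name: f / GOAL_MHZ for name, f in results_mhz.items()}

# Relative gain of the in-place memory organisation over the regular one.
speedup = results_mhz["in_place"] / results_mhz["regular"]
```

The in-place run reaches roughly 63% of the goal and is about 2.3 times faster than the regular run, which reaches 27.5% of the goal.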

Suggestions for a follow-up project
– Implementing the in-place architecture in our model.
– Improving the controller of the model to overcome the address counter issue.

Development environments
– MATLAB: modeling
– ModelSim: simulation
– Vivado: synthesis

THANK YOU!
