Novel Hardware-software Architecture for Computation of DWT Using Recursive Merge Algorithm
Piyush Jamkhandi, Amar Mukherjee, Kunal Mukherjee, Robert Franceschini

School of EECS, University of Central Florida, Orlando, FL 32816

11 Feb, 2000 VLSI Systems Lab, School of EECS, UCF

Goals
- Propose a new FPGA-µP based architecture for the RMF algorithm to compute the Discrete Wavelet Transform (DWT).
- Present a technique to overcome the data routing bottleneck of the Recursive Merge Filtering (RMF) technique for DWT.
- Transform the data routing problem for RMF into basic arithmetic computation on the FPGA with local memory access.

Introduction to Reconfigurable Computing
Reconfigurable Computing (EE Times, Nov. 1998)

Why Reconfigurable Computing?
- ASICs have long design turnaround times
  - Rapid prototyping using FPGAs
- High design change/error costs
  - Incorrect designs in silicon incur a very high cost of modification
- Speedup achievable is far greater
  - Image correlation: 0.69 sec on an FPGA vs. 38 sec on a 133 MHz Pentium processor (roughly 55x) [Kean et al., 1997]
  - 512-bit RSA decoding: an FPGA implementation decodes at 200 kbits/sec vs. 19 kbits/sec for an ASIC implementation (roughly 10x) [Bertin et al., 1992]
- Reusability of hardware
  - The same silicon chip is used for diverse applications, unlike ASICs
- Dynamic reconfigurability
  - The FPGA can be configured for a different application while another application is already running on the chip

Recursive Merge Filtering Algorithm
- Based on the principle of preserving the spatial correlation between the inputs and the wavelet coefficients obtained at any stage.
- Uses recursive sub-block computation: in the 1-D case, the size of the image whose RMF is computed is halved at each iteration.
- Blocks are computed bottom-up, followed by hierarchical merging of the sub-blocks to obtain the wavelet transform.
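The bottom-up, merge-based flow described on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' formal RMF definition: it assumes an unnormalised Haar pair (average and half-difference), an input length that is a power of two, and the conventional output layout [approximation, coarsest detail, ..., finest details]. The names `haar_pair` and `merge_dwt` are hypothetical.

```python
def haar_pair(a, b):
    """Unnormalised Haar low-pass (average) and high-pass (half-difference)."""
    return (a + b) / 2.0, (a - b) / 2.0


def merge_dwt(x):
    """1-D Haar DWT computed by recursive halving followed by merging.

    Assumes len(x) is a power of two. Each half is transformed on its own,
    then the two results are merged: only the two approximation terms are
    filtered again, and the detail bands are routed into place unchanged.
    """
    n = len(x)
    if n == 1:
        return [float(x[0])]
    if n == 2:
        lo, hi = haar_pair(x[0], x[1])
        return [lo, hi]

    left = merge_dwt(x[: n // 2])
    right = merge_dwt(x[n // 2:])

    # Filter step: combine the approximation terms of the two halves.
    lo, hi = haar_pair(left[0], right[0])
    out = [lo, hi]

    # Routing step: detail bands of size 1, 2, 4, ... are concatenated,
    # left half first, without any further arithmetic.
    start, size = 1, 1
    while start < n // 2:
        out += left[start:start + size] + right[start:start + size]
        start += size
        size *= 2
    return out
```

Only one filter operation is needed per merge; everything else is data movement, which is the routing cost the rest of the talk sets out to reduce.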

[Figure: Fast Wavelet Transform Data Flow]

[Figure: RMF DWT Data Flow]

[Figure: Data Routing (DR) in RMF]

RMF Algorithm
Formally, the RMF technique is defined (1-D) as:
[equation]
where h and g are the Haar filters and [symbol] is the concatenation operator.
The DWT operation can be defined in terms of RMF as:
[equation]

RMF Algorithm (3): RMF for 2-D
If k > 1, then:
[equation]

RMF Algorithm (4): RMF for 2-D (continued)
If k = 1, then:
[equation]
Key point: the RMF algorithm has two parts in its computation, an arithmetic computation phase (filter) and a routing phase (merge). Separating these two phases can lead to improvement.

Transformation of DR to +/-
[Figure: a block with top-left corner (TL_x, TL_y) and bottom-right corner (BR_x, BR_y) on x and y axes, divided into Quad. 1 to Quad. 4 and holding sub-blocks 1 to 9. Sub-block labels in the first grid: 1 2 3 4 5 7 6 9 8; in the second grid: 9 2 8 4 5 7 6 1 3]
Data movement: block 9 and block 1 need to be swapped.
Key: use a virtual position matrix for the data items in the quadrants instead of moving the data items.

Virtual Mapping Index
Initially:
for every position (i, j) in the input data (2-D) do
    set the virtual index position (i, j) to [equation], where [symbol] is the packed storage of i and j.
endfor
[Figure: data matrix and virtual map. Initial state: Data(i, j) = image pixel value, VirtMap(i, j) = [equation]. New state, pixel (i, j) moved by (x, y): Data(i, j) = image pixel value, VirtMap(i, j) = [equation]]
Position of the data pixel at (i, j) = VirtMap(i, j)
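The idea of replacing data movement with arithmetic on a position table can be sketched as follows. This is a hedged illustration, not the authors' implementation: the packing scheme (row in the high bits, column in the low 16 bits), the constant `BITS`, and the helper names `pack`, `unpack`, `make_virtual_map`, `move_block` and `materialise` are all assumptions introduced here. As on the slide, `vmap[i][j]` holds the current logical position of the pixel physically stored at (i, j).

```python
BITS = 16  # assumed width of each packed coordinate
MASK = (1 << BITS) - 1


def pack(i, j):
    """Pack a (row, column) pair into one integer (assumed layout)."""
    return (i << BITS) | j


def unpack(p):
    """Recover the (row, column) pair from a packed integer."""
    return p >> BITS, p & MASK


def make_virtual_map(rows, cols):
    """Initial state: every pixel is logically where it is stored."""
    return [[pack(i, j) for j in range(cols)] for i in range(rows)]


def move_block(vmap, top_left, size, di, dj):
    """Logically move a size x size block by (di, dj).

    Only the map entries change, by an addition or subtraction per entry;
    the pixel data itself is never touched. Offsets are assumed to keep
    every position inside the image.
    """
    r0, c0 = top_left
    for i in range(r0, r0 + size):
        for j in range(c0, c0 + size):
            vi, vj = unpack(vmap[i][j])
            vmap[i][j] = pack(vi + di, vj + dj)


def materialise(data, vmap):
    """Place every stored pixel at its logical position (final step only)."""
    out = [[None] * len(data[0]) for _ in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            vi, vj = unpack(vmap[i][j])
            out[vi][vj] = value
    return out
```

Swapping two blocks then amounts to two calls to `move_block` with opposite offsets, for example moving the block at (0, 0) by (2, 2) and the block at (2, 2) by (-2, -2) in a 3x3 grid of unit blocks.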

Architecture for RMF Using FPGAs
- The virtual mapping index separates the routing from the computation.
- The microprocessor can proceed with the arithmetic computation while the circuit loaded onto the FPGA carries out the data routing.
- The virtual index is stored in the FPGA board RAM, giving the FPGA fast access to the virtual index table.
- The microprocessor has to refer to the virtual index table to determine the actual position of the values needed during the computation.

Computation Using the RMF Architecture: Initial State
[Figure: input data array (4x4, cells labelled 1 to 16) in main memory, virtual map (on FPGA RAM), intermediate mapping, Queue 1, Queue 2, PBU and MPU]

RMF Computation (2)
- Set the global context to Q1, i.e. all input coordinates are read from Q1.
- Read 2x2 blocks and compute the filter operation on the CPU.
- Store the results back in main memory.
- Write the coordinates to Q1.
- Repeat the process until all 2x2 blocks are computed.
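The first pass over the 2x2 blocks can be sketched as below. This is an illustration under assumptions rather than the authors' code: it uses an unnormalised 2-D Haar filter (row pass followed by column pass, each averaging and half-differencing), writes results in place, and models Q1 as a plain list of (top-left row, top-left column, bottom-right row, bottom-right column) tuples. The names `filter_2x2` and `filter_all_blocks` are hypothetical.

```python
def filter_2x2(mem, r, c):
    """Apply an unnormalised 2-D Haar filter to the 2x2 block at (r, c), in place."""
    a, b = mem[r][c], mem[r][c + 1]
    e, d = mem[r + 1][c], mem[r + 1][c + 1]
    mem[r][c] = (a + b + e + d) / 4.0          # low-low (block average)
    mem[r][c + 1] = (a - b + e - d) / 4.0      # horizontal difference
    mem[r + 1][c] = (a + b - e - d) / 4.0      # vertical difference
    mem[r + 1][c + 1] = (a - b - e + d) / 4.0  # diagonal difference


def filter_all_blocks(mem):
    """Filter every 2x2 block of a square array and return the queued coordinates.

    Assumes the side length of mem is even. The returned list stands in
    for Q1: one coordinate tuple per finished block.
    """
    n = len(mem)
    q1 = []
    for r in range(0, n, 2):
        for c in range(0, n, 2):
            filter_2x2(mem, r, c)
            q1.append((r, c, r + 1, c + 1))
    return q1
```

Every operation here is pure arithmetic on values already in main memory, which is why this phase is assigned to the CPU.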

RMF Computation (3)
- Read four 2x2 coordinates from Q1 and merge.
- Generate the new coordinates on the FPGA using the RAM values.
- If the block size is greater than 2x2, write to Q2 with the parameter 'false' to specify that the block size is not yet 2x2.
- If the block size is 2x2, the basic filter computation needs to be done on the CPU; write to Q2 with 'true'.

RMF Computation (4)
- When Q1 becomes empty, set the global context to Q2.
- Read coordinates from Q2 and repeat the merging process.
- Merge and write to Q1 with the 'true'/'false' flag set.
- In parallel, the PBU checks the global context queue and checks the flag.
- If the filter operation is to be carried out, the PBU reads the FPGA RAM location using the coordinates in the queue and determines the data values.
- It computes the data values and writes them back to the same location in main memory.

RMF Computation (5)
- The process is repeated for the PBU and MPU until one of the queues contains only a single coordinate pair, covering the whole input data.
- The final coefficients are generated by resetting the main memory and putting the data in their proper positions.
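The alternation between the two queues can be simulated in a few lines. This is a simplified sketch, not the architecture itself: it tracks block coordinates only, omits the 'true'/'false' flag and the parallel PBU, and assumes the 2x2 blocks are enqueued in Z-order so that four consecutive entries always form one larger block. The names `z_order_blocks` and `merge_rounds` are hypothetical.

```python
from collections import deque


def z_order_blocks(n, size=2):
    """Yield (tl_x, tl_y, br_x, br_y) for each size x size block of an n x n grid.

    Blocks come out in Z-order (assumed), so siblings are adjacent in the queue.
    n is assumed to be a power of two and at least `size`.
    """
    def visit(x, y, s):
        if s == size:
            yield (x, y, x + s - 1, y + s - 1)
        else:
            h = s // 2
            for dy in (0, h):
                for dx in (0, h):
                    yield from visit(x + dx, y + dy, h)
    yield from visit(0, 0, n)


def merge_rounds(n):
    """Merge blocks four at a time, alternating queues, until one block remains.

    Returns the final block and the number of queue switches performed.
    """
    current = deque(z_order_blocks(n))  # plays the role of Q1
    other = deque()                     # plays the role of Q2
    rounds = 0
    while len(current) > 1:
        while current:
            quad = [current.popleft() for _ in range(4)]
            merged = (min(b[0] for b in quad), min(b[1] for b in quad),
                      max(b[2] for b in quad), max(b[3] for b in quad))
            other.append(merged)
        # The drained queue becomes the output queue for the next round.
        current, other = other, current
        rounds += 1
    return current[0], rounds
```

For an 8x8 input this performs two rounds (sixteen 2x2 blocks, then four 4x4 blocks, then one 8x8 block), which matches the stopping condition of a single coordinate pair covering the whole input.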

Filter and Merge Equations
Given any data item in block 1 at position (x, y), it is moved to position:
[equation]
where (BR_x, BR_y) and (TL_x, TL_y) are the bottom-right and top-left coordinates of the block. We further define the width and height of the block as:
[equation]

Merge and Filter Equations
- The primitive Block( ) computes the merge process for the given top and bottom coordinates.
- Two more primitives are defined as part of Block( ):
  - Move_Data( ): handles data movement.
  - Compute_1D( ): computes the 1-D RMF for rows and columns.
- To compute the basic 2x2 block we define another primitive:
[equation]

Block primitive invokes:
If the size of the block is greater than 2x2, the Block primitive is invoked as:
[equation]
The Block primitive invokes the following primitives to perform the proper filter and merge operations:
[equation]

Block primitive invokes (continued):
[equation]

Block primitive invokes (continued):
[equation]

Architecture for RMF (2): Hardware-Software Architecture for DWT Using RMF
[Figure: block diagram with the labels Primitive Block Computation SW Unit, Merge Process SW Unit, Queue Structure Q1, Queue Structure Q2, Queue 1 Exclusion Zone, Microprocessor, FPGA RAM, Main Memory, rMap Access, Exclusion Zone]

Results: Accesses to Main Memory
[Figure: original image and reconstructed image]

[Figure: Total Data Accesses]
