Multiprocessor Architecture for Image Processing
Under the guidance of Dr. Anshul Kumar
Mayank Kumar (2006EE10331), Pushpendre Rastogi (2006EE50412)
Introduction: Signal processing, particularly image/video processing on embedded platforms, requires high-end processors to implement complex algorithms within real-time deadlines. Power consumption and cost are the major obstacles to massive deployments of embedded processing nodes, e.g. surveillance camera networks and traffic monitoring and control.
Introduction: FPGAs/reconfigurable ASICs provide a promising solution to this problem, because hardware can be designed specifically to exploit the parallelism in the algorithm. However, there are several shortcomings: gates get used up when complex algorithms are implemented, and implementing sequential algorithms directly on an FPGA is highly inefficient.
Our Approach: Design a multiprocessor architecture that facilitates the processing of high-resolution image/video frames. This involves designing the PE (node processor), customized to handle pixel/region-level operations efficiently, and then, given the PE, designing the architecture that interconnects these processors together with the input/output hardware.
Novelty: By having an array of processors, we exploit the parallelism offered by processing different regions of a frame on different processors. Within each processor, sequential algorithms are implemented efficiently by providing an application-specific instruction set. In short: locally sequential, globally parallel.
Locally Sequential, Globally Parallel: The approach suits any class of algorithms that are window-based and essentially operate on regions of the image rather than on the image as a whole, e.g. image change detection for surveillance applications, optic flow, motion estimation and filtering. We chose image change detection using background modeling as the test algorithm.
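A minimal C sketch of the "locally sequential, globally parallel" idea, assuming a tiny hypothetical 8x8 grayscale frame and a 3x3 box filter as a stand-in window-based operation (neither comes from the slides). Each call to `box3x3_rows` processes one horizontal strip sequentially, as one PE would; `strips_match_whole` (a hypothetical helper) checks that splitting the frame into strips gives the same output as one pass over the whole frame, which holds as long as each PE can also read one halo row above and below its strip.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 8  /* hypothetical frame width  */
#define H 8  /* hypothetical frame height */

/* 3x3 box filter over output rows [row_begin, row_end). Frame-border pixels
 * are copied unchanged. Output row y reads source rows y-1..y+1, so a PE that
 * owns rows [row_begin, row_end) also needs one "halo" row on each side. */
static void box3x3_rows(const uint8_t *src, uint8_t *dst,
                        int row_begin, int row_end)
{
    assert(0 <= row_begin && row_begin <= row_end && row_end <= H);
    for (int y = row_begin; y < row_end; ++y) {
        for (int x = 0; x < W; ++x) {
            if (y == 0 || y == H - 1 || x == 0 || x == W - 1) {
                dst[y * W + x] = src[y * W + x];
                continue;
            }
            unsigned sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[(y + dy) * W + (x + dx)];
            dst[y * W + x] = (uint8_t)(sum / 9);
        }
    }
}

/* Returns 1 if filtering the frame as n_strips horizontal strips (one per PE)
 * gives exactly the same output as a single pass over the whole frame. */
static int strips_match_whole(const uint8_t *src, int n_strips)
{
    uint8_t whole[W * H], split[W * H];
    assert(n_strips > 0 && n_strips <= H);
    box3x3_rows(src, whole, 0, H);
    for (int s = 0; s < n_strips; ++s)
        box3x3_rows(src, split, s * H / n_strips, (s + 1) * H / n_strips);
    return memcmp(whole, split, sizeof whole) == 0;
}
```

On real hardware the halo rows would have to be delivered to both neighbouring PEs; a purely pixel-wise operation such as frame differencing needs no halo at all.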
Work Done:
- Hardware part: initial architecture, its drawbacks, change of platform, new architecture, implementation
- Software part: algorithm analysis and implementation, fixed-point Matlab simulation, C implementation
Initial Architecture: [Figure: block diagram of the initial architecture. Blocks: Camera, Video ADC, Virtex-II Pro, RGB Conversion, PowerPC, M1, two memories, MPMC, Video DAC, Monitor, Array Topology.]
Architectural Drawbacks: The multi-port memory controller (MPMC) could handle only a limited number (2-4) of parallel accesses from different processors. Solution: use BRAM for parallel access. However, the image format on the XUPV30 is interlaced, so the whole frame has to be stored, which would use up all available BRAMs. Solution: use a board that provides progressive data; moreover, digital cameras today generally provide progressive image data.
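A back-of-the-envelope check of why buffering a whole interlaced frame exhausts the BRAM. Since the slides give no resolution or device, it assumes a 640x480 frame at 8 bits per pixel and a board built around the Virtex-II Pro XC2VP30, which has 136 block RAMs of 18 Kb each (16 Kb of data plus 2 Kb of parity). The function name is hypothetical.

```c
#include <assert.h>

/* Assumed device figures (Virtex-II Pro XC2VP30), for illustration only. */
#define BRAM_DATA_BITS  16384UL  /* 18 Kb block RAM: 16 Kb data + 2 Kb parity */
#define BRAMS_AVAILABLE 136UL    /* number of block RAMs on the device        */

/* Block RAMs needed to buffer one full frame, rounding up. Only the data
 * bits of each BRAM are counted; the parity bits are left unused. */
static unsigned long brams_for_frame(unsigned long width, unsigned long height,
                                     unsigned long bits_per_pixel)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0);
    unsigned long bits = width * height * bits_per_pixel;
    return (bits + BRAM_DATA_BITS - 1) / BRAM_DATA_BITS;
}
```

Under these assumptions one 8-bit grayscale frame needs 150 BRAMs against 136 available (an RGB frame would need 450), before any BRAM is set aside for the processors' own memories.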
Change of Platform: We switched to the Xilinx ML401 Virtex Video Starter Kit, which provides progressive video input, much more BRAM, and Matlab/Simulink as a design platform for designing at a higher level of abstraction. However, switching platforms cost time because of the associated learning curve.
New Architecture: [Figure: block diagram of the new architecture: Camera -> Video ADC -> VIO_in -> Custom Memory Controller (Verilog module) -> Array of Block RAM -> Array of Processor Network -> VIO_out -> Video DAC -> Monitor.]
Description and Implementation: The ML401 VSK provides two FPGAs: a Xilinx XUP2V7 for image input/output and the Xilinx ML401 for developing the application. VIO_in and VIO_out are reference designs that sandwich the user-level design and provide progressive image data. We designed a custom memory controller suited to our needs; it writes the data to FIFOs implemented using BRAMs.
Custom Memory Controller: It takes H_sync, v_sync, rst and Pixel_clk as inputs and selects a target FIFO to which the incoming data is written. Each BRAM stores the image data corresponding to 4 lines. The controller first empties the queue, reading out the result computed in the previous iteration. The other end of each FIFO is read by a MicroBlaze processor over an FSL (Fast Simplex Link).
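The FIFO selection can be modelled behaviourally in C (the real block is a Verilog module). This sketch assumes a line width of 640 pixels, 3 FIFOs, and a round-robin assignment of consecutive 4-line groups to FIFOs; only the 4-lines-per-BRAM figure comes from the slide, the draining of previous results is not modelled, and all names are hypothetical.

```c
#include <assert.h>

#define LINE_WIDTH     640  /* assumed pixels per line                */
#define LINES_PER_FIFO 4    /* from the slide: each BRAM holds 4 lines */
#define N_FIFOS        3    /* assumed number of BRAM FIFOs           */

/* Behavioural model of the controller state (hypothetical, not RTL). */
typedef struct {
    unsigned line;  /* line index within the current frame */
    unsigned col;   /* pixel index within the current line */
} mem_ctrl;

/* rst or v_sync: a new frame starts. */
static void mc_frame_start(mem_ctrl *mc) { mc->line = 0; mc->col = 0; }

/* H_sync: the current line is complete, move to the next one. */
static void mc_line_end(mem_ctrl *mc) { mc->line++; mc->col = 0; }

/* One Pixel_clk with valid data: returns the FIFO that receives the pixel. */
static unsigned mc_pixel(mem_ctrl *mc)
{
    assert(mc->col < LINE_WIDTH);  /* no more pixels than one line holds */
    mc->col++;
    return (mc->line / LINES_PER_FIFO) % N_FIFOS;
}
```

With these assumptions lines 0-3 go to FIFO 0, lines 4-7 to FIFO 1, lines 8-11 to FIFO 2, and line 12 wraps back to FIFO 0.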
Processor Network: Each processor network comprises one master processor and 1-7 slave processors. The master processor reads data from the FIFO and distributes the work among the slave processors. We demonstrated this using 3 processors: 1 master and 2 slaves.
Processor Network, Basic Design: We connected the master processor to a UART to establish a serial link for input/output. The master processor is connected to slave processors that all run the same algorithm. It takes input from the UART and passes it to the different slaves, distributing the work by sending different regions of the image to different processors.
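One way the master could split the work is to give each slave a contiguous band of rows. The sketch below assumes row bands (the slides only say "different regions") and spreads any remainder rows over the first slaves; `row_range` and `slave_rows` are hypothetical names, and how each band is actually transferred to a slave is not shown.

```c
#include <assert.h>

/* Half-open band of image rows [first, last) assigned to one slave. */
typedef struct { int first; int last; } row_range;

/* Splits `rows` rows among `n_slaves` slaves as evenly as possible:
 * every slave gets rows / n_slaves rows, and the first rows % n_slaves
 * slaves get one extra row each. */
static row_range slave_rows(int rows, int n_slaves, int slave)
{
    assert(rows >= 0 && n_slaves > 0 && slave >= 0 && slave < n_slaves);
    int base = rows / n_slaves;
    int extra = rows % n_slaves;
    row_range r;
    r.first = slave * base + (slave < extra ? slave : extra);
    r.last = r.first + base + (slave < extra ? 1 : 0);
    return r;
}
```

In the 1-master/2-slave demonstration with an assumed 480-row frame, `slave_rows(480, 2, 1)` hands rows 240 to 479 to the second slave. Because every slave runs the same code, only this range differs between them.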
Software Architecture: We studied the Adaptive Background Mixture Model and analyzed the algorithm for:
- exploitable parallelism
- length of code for the implementation
- memory requirements to store data
- feasibility
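For the memory-requirement analysis a rough estimate helps. Stauffer and Grimson keep K Gaussians per pixel, each with a weight, a mean and a variance. The sketch assumes grayscale pixels (one mean and one variance per Gaussian), K = 3, and 32-bit words for every parameter (the word length chosen for the fixed-point data types); the function name is hypothetical.

```c
#include <assert.h>

#define WORD_BYTES          4UL  /* assumed 32-bit storage per parameter      */
#define PARAMS_PER_GAUSSIAN 3UL  /* weight, mean, variance (grayscale pixels) */

/* Bytes needed to hold the mixture model of `pixels` pixels with k Gaussians each. */
static unsigned long model_bytes(unsigned long pixels, unsigned long k)
{
    assert(k > 0);
    return pixels * k * PARAMS_PER_GAUSSIAN * WORD_BYTES;
}
```

Under these assumptions each pixel costs 36 bytes of model state, so a 4-line strip of an assumed 640-pixel-wide frame needs 92,160 bytes of model, far more than the strip's own 2,560 bytes of 8-bit pixel data.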
The Algorithm: The model represents each region of the image frame as a sum of N weighted Gaussians and is updated whenever a new frame arrives. Depending on which Gaussian distribution (k) the current pixel value belongs to, the foreground/background decision is made. The model effectively captures repetitive changes in the background and is resistant to noise and slow illumination variations.
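A minimal single-pixel sketch of the Stauffer and Grimson update in C, using floats for readability rather than the fixed-point formats the project used. Assumptions not taken from the slides: grayscale pixels, K = 3 Gaussians (the slide's N), learning rate 0.01, a 2.5-sigma match test, background threshold T = 0.7, the initial weight and variance of new Gaussians, a variance floor, and the common simplification rho = alpha for the mean/variance learning rate. All function names are hypothetical.

```c
#include <assert.h>

/* All parameter values are assumptions for illustration. */
#define K            3       /* Gaussians per pixel (the slide's N)         */
#define ALPHA        0.01f   /* learning rate                               */
#define MATCH_SIGMAS 2.5f    /* x matches if |x - mu| <= 2.5 sigma          */
#define BG_T         0.7f    /* T: weight fraction explained by background  */
#define INIT_VAR     900.0f  /* variance of a newly created Gaussian        */
#define INIT_W       0.05f   /* weight of a newly created Gaussian          */
#define MIN_VAR      4.0f    /* variance floor, keeps var strictly positive */

typedef struct { float w, mu, var; } gaussian;

/* Starts a pixel model from its first observed value x0. */
static void mog_init(gaussian g[K], float x0)
{
    for (int k = 0; k < K; ++k) {
        g[k].w = (k == 0) ? 1.0f : 0.0f;
        g[k].mu = (k == 0) ? x0 : 0.0f;
        g[k].var = INIT_VAR;
    }
}

/* Ranking key w / sigma, compared as w^2 / var (same order because w >= 0). */
static float fitness2(const gaussian *g) { return g->w * g->w / g->var; }

/* Insertion sort by decreasing fitness; *tracked follows one element's index. */
static void sort_by_fitness(gaussian g[K], int *tracked)
{
    for (int i = 1; i < K; ++i) {
        gaussian key = g[i];
        int key_tracked = (*tracked == i);
        int j = i - 1;
        while (j >= 0 && fitness2(&g[j]) < fitness2(&key)) {
            g[j + 1] = g[j];
            if (*tracked == j) *tracked = j + 1;
            --j;
        }
        g[j + 1] = key;
        if (key_tracked) *tracked = j + 1;
    }
}

/* Updates one pixel's model with value x.
 * Returns 1 if x is foreground, 0 if it is background. */
static int mog_update(gaussian g[K], float x)
{
    int match = -1;
    /* g[] is kept sorted, so the first match is the most probable Gaussian. */
    for (int k = 0; k < K && match < 0; ++k) {
        float d = x - g[k].mu;
        if (d * d <= MATCH_SIGMAS * MATCH_SIGMAS * g[k].var) match = k;
    }
    /* w_k <- (1 - alpha) w_k + alpha M_k, where M_k = 1 only for the match. */
    for (int k = 0; k < K; ++k)
        g[k].w = (1.0f - ALPHA) * g[k].w + (k == match ? ALPHA : 0.0f);

    if (match >= 0) {
        /* Simplification: rho = alpha instead of alpha * eta(x | mu, sigma). */
        gaussian *m = &g[match];
        m->mu = (1.0f - ALPHA) * m->mu + ALPHA * x;
        float d = x - m->mu;
        m->var = (1.0f - ALPHA) * m->var + ALPHA * d * d;
        if (m->var < MIN_VAR) m->var = MIN_VAR;
    } else {
        /* No match: replace the least probable Gaussian (last in sorted order). */
        g[K - 1].w = INIT_W;
        g[K - 1].mu = x;
        g[K - 1].var = INIT_VAR;
    }

    float sum = 0.0f;  /* renormalise the weights so they sum to 1 */
    for (int k = 0; k < K; ++k) sum += g[k].w;
    assert(sum > 0.0f);
    for (int k = 0; k < K; ++k) g[k].w /= sum;

    sort_by_fitness(g, &match);

    /* Background = the first B Gaussians whose cumulative weight exceeds T. */
    float cum = 0.0f;
    int b = 0;
    while (b < K && cum <= BG_T) cum += g[b++].w;
    return !(match >= 0 && match < b);
}
```

Keeping the Gaussians sorted by w/sigma means the match search stops at the most probable distribution and the background set is simply a prefix of the array. A value that matches no Gaussian is always reported as foreground.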
Fixed-Point Matlab Simulation: Using the Fixed-Point Toolbox, we redefined our variables and constants in Q format with the following data types:
Quantity                  DataTypeMode                        Signed  WordLength  FractionLength
Weights/other constants   Fixed-point: binary point scaling   true    32          31
Pixel data                Fixed-point: binary point scaling   true    32          23
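In C, the same two formats map onto plain 32-bit integers. The sketch below (type and function names are hypothetical) converts to and from both formats with rounding and saturation, and multiplies a weight by a pixel value through a 64-bit intermediate: 31 + 23 = 54 fraction bits, shifted right by 31 to return to the pixel format. With 31 fraction bits a weight lies in [-1, 1), so a weight of exactly 1.0 saturates to the largest representable value; with 23 fraction bits pixel data lies in [-256, 256).

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_W 31  /* weights/other constants: 32-bit signed, 31 fraction bits */
#define FRAC_P 23  /* pixel data: 32-bit signed, 23 fraction bits             */

typedef int32_t q_weight;  /* hypothetical type names */
typedef int32_t q_pixel;

/* double -> fixed point with `frac` fraction bits, rounded to nearest, saturated. */
static int32_t to_fixed(double v, int frac)
{
    assert(frac >= 0 && frac <= 31);
    double scaled = v * (double)(1LL << frac);
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Fixed point with `frac` fraction bits -> double (exact for 32-bit values). */
static double from_fixed(int32_t v, int frac)
{
    return (double)v / (double)(1LL << frac);
}

/* weight x pixel -> pixel format, via a 64-bit product with 54 fraction bits.
 * On compilers such as GCC, >> on a negative value is an arithmetic shift,
 * so the result is rounded toward minus infinity. */
static q_pixel mul_weight_pixel(q_weight w, q_pixel p)
{
    int64_t prod = (int64_t)w * (int64_t)p;
    return (q_pixel)(prod >> FRAC_W);
}
```

For example, scaling a pixel value of 200 by a weight of 0.5 gives exactly 100.0 after converting back, since both operands are exactly representable.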
C Implementation: The code was ported to Xilinx Platform Studio to run on the MicroBlaze processors, and simulations show equivalent results. All PEs contain the same code; they operate on different data coming from different regions of the image.
Pitfalls: The Xilinx VSK design suite promises high-level design of image/video processing using Simulink. We tried it, but it does not provide enough granularity for our design needs: the design becomes very complex to debug, and the sample designs are very hard to tweak. Xilinx EDK should be used for these kinds of designs.
Conclusions: We designed different parts of our proposed architecture: input/output, the custom memory controller and a basic processor network. We simulated and implemented the test algorithm on a network of processors as a proof of concept. We learnt the FPGA design flow and hardware/software co-design.
Future Work: In this work we used MicroBlaze processors, whose instruction set is not optimized for pixel/region-based image processing and which carry many extra features that could be trimmed. A custom processor designed for these applications would need less FPGA area and be more efficient.
References:
- C. Stauffer, W. E. L. Grimson, "Adaptive Background Mixture Models for Real-Time Tracking", AI Lab, MIT, 1999.
- P. W. Power, J. A. Schoonees, "Understanding Background Mixture Models", Image and Vision Computing New Zealand, 2002.
- P. Huerta, J. Castillo, J. I. Martínez, "A MicroBlaze Based Multiprocessor SoC", 2007.
- Xilinx, MicroBlaze Processor Reference Guide, v7.0 (UG081).
- Xilinx, Virtex-II Pro User Guide.
- Xilinx, Video Starter Kit (VSK) User Guide.
- Xilinx, XAPP529: Connecting Customized IP to the MicroBlaze Soft Processor Core Using FSL Link.
- Xilinx, EDK 9.1i MicroBlaze Tutorial: A Getting Started Guide.
- Xilinx, white paper: Multiprocessor on XPS.