Skeletons and Asynchronous RPC for Embedded Data- and Task Parallel Image Processing (IAPR Conference on Machine Vision Applications, May 16-18, 2005)


1 Skeletons and Asynchronous RPC for Embedded Data- and Task Parallel Image Processing
IAPR Conference on Machine Vision Applications, May 16-18, 2005
Wouter Caarls, Pieter Jonker, Henk Corporaal
Quantitative Imaging Group, Department of Imaging Science & Technology

2 Overview
- Introduction & Motivation
- Approach
  - Algorithmic skeletons
  - Asynchronous RPC
- Implementation
  - Run-time system
  - Prototype architecture
- Results
- Conclusions & Future work

3 Introduction
SmartCam: integrating efficient, user-programmable image processing hardware within the camera or sensor itself.
[Figure: Philips CFT IV Inca+; 320 x 10-bit SIMD; 5-issue VLIW]
- Efficient hardware implies parallelism and heterogeneity
- Efficient programmability implies custom, application-dependent hardware
- Finding the best hardware configuration for an application requires hardware-independent software
- For wide acceptance, programming should be easy

4 Our approach
- Data parallelism: algorithmic skeletons
  - Avoid parallel bookkeeping
  - Hide hardware implementation
  - Locality of reference
- Heterogeneity, task parallelism: asynchronous RPC
  - Familiar
  - Hide processor configuration

5 Algorithmic Skeletons
Separating structure from computation
[Figure: the same structure repeated three times, each with the operations <=t, >t and +=]
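
As a rough illustration of separating structure from computation, the sketch below splits a per-pixel operation into a skeleton that owns the loop and a kernel that only sees one pixel. The names `pixel_kernel`, `pixel_op` and this `binarize` signature are assumptions made for the example and are not the library interface from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel type: the "computation" applied to one pixel. */
typedef uint8_t (*pixel_kernel)(uint8_t value, int param);

/* Hypothetical skeleton: the "structure". It owns the iteration, so an
 * implementation could change the scanning order or distribute the
 * pixels over processing elements without touching the kernel. */
static void pixel_op(pixel_kernel kernel, const uint8_t *in, uint8_t *out,
                     size_t n_pixels, int param)
{
    assert(kernel != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n_pixels; ++i)
        out[i] = kernel(in[i], param);
}

/* Example kernel: 255 where the value exceeds threshold t, else 0.
 * It has no side effects and depends only on its inputs. */
static uint8_t binarize(uint8_t value, int t)
{
    return value > t ? 255 : 0;
}
```

Calling `pixel_op(binarize, in, out, n, 50)` thresholds an image at 50; swapping in another kernel reuses the same loop unchanged.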

6 Algorithmic Skeletons
Implicit parallel programming
- Choice of skeleton implies a set of constraints (dependencies)
- System is free as long as constraints are not violated
  - Distribution
  - Scanning order
- Consistent library interface facilitates between-skeleton dependency analysis
  - No side effects
  - Well-defined inputs and outputs

7 Algorithmic Skeletons
Disadvantages
- Inability to parallelize algorithms that cannot be expressed using one of the skeletons in the library
- Inability to specify certain algorithmic optimizations
- Inability to specify architecture-dependent optimizations
Solution: allow the programmer to add their own (application-specific or architecture-specific) skeletons to the library

8 Remote procedure call
- Just like a function call
- Computes the function on a different processor
- All data goes through the calling processor
- Synchronous: stub returns when the remote function is done; data is available immediately
- Asynchronous: stub returns immediately; data is available later
[Figure: the control processor calls Function1(…) and Function2(…); the coprocessor executes Function1 { } and Function2 { }]

9 Futures
- Function returns a reference to the future result
- Reference can be used in other RPC calls
- Using the reference outside an RPC call requires an (implicit) block
[Figure: the control processor calls Function1(&a) and Function2(&a); the coprocessor executes Function1 { } and Function2 { }]
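
The behaviour of futures can be imitated in a few lines of C. The sketch below is a single-threaded simulation under stated assumptions: calls are queued instead of being shipped to a coprocessor, and `future_int`, `rpc_async`, `block`, `produce` and `twice` are hypothetical names chosen for this example, not the run-time system described in the talk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16  /* arbitrary queue size for the sketch */

/* A future: the value is only meaningful once ready is nonzero. */
typedef struct {
    int value;
    int ready;
} future_int;

typedef void (*remote_fn)(const future_int *arg, future_int *result);

typedef struct {
    remote_fn fn;
    const future_int *arg;
    future_int *result;
} pending_call;

static pending_call queue[MAX_PENDING];
static size_t queue_head = 0, queue_tail = 0;

/* Asynchronous stub: records the call and returns immediately.
 * The result is a future that may be passed to later calls. */
static void rpc_async(remote_fn fn, const future_int *arg, future_int *result)
{
    assert(queue_tail < MAX_PENDING);
    result->ready = 0;
    queue[queue_tail++] = (pending_call){fn, arg, result};
}

/* Block: run queued calls in issue order until the future resolves.
 * In-order execution guarantees an argument is ready before use. */
static int block(future_int *f)
{
    while (!f->ready && queue_head < queue_tail) {
        pending_call c = queue[queue_head++];
        c.fn(c.arg, c.result);
        c.result->ready = 1;
    }
    assert(f->ready);
    return f->value;
}

/* Two stand-in "remote" functions. */
static void produce(const future_int *arg, future_int *result)
{
    (void)arg;
    result->value = 21;
}

static void twice(const future_int *arg, future_int *result)
{
    result->value = 2 * arg->value;
}
```

Issuing `rpc_async(produce, NULL, &a)` and then `rpc_async(twice, &a, &b)` returns at once with neither future resolved; only `block(&b)` forces both calls to run. A real implementation would execute the queued calls concurrently on other processors rather than lazily on the caller.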

10 Parallelism through RPC
- RPC is not intrinsically parallel
- Synchronous RPC calling a parallel function: data parallelism
- Asynchronous RPC calling a (parallel) function: task parallelism
[Figure: two configurations of a control processor calling Function1(…) and Function2(…). In one, coprocessor 1 executes Function1 { } and coprocessor 2 executes Function2 { }; in the other, both coprocessors execute Function1 { } and Function2 { }]

11 Optimizing communications
- Real-time image processing requires vast amounts of bandwidth
- Scatter-gather creates a bottleneck at the control processor
→ Allow peer-to-peer communications between remote functions
[Figure: the control processor calls Function1(…) and Function2(…); coprocessor 1 executes Function1 { } and coprocessor 2 executes Function2 { }]

12 Optimizing memory usage
- In embedded applications, memory is scarce
- Normal task parallelism requires a frame store per concurrent operation
→ Pipelining
[Figure: the control processor calls Function1(…) and Function2(…); coprocessor 1 executes Function1 { } and coprocessor 2 executes Function2 { }]
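
To make the memory argument concrete, the sketch below chains two per-line stages so that the intermediate result only ever occupies one line buffer instead of a full frame store. It is a sequential simulation under assumptions: in a real system the stages would run on different processors connected by FIFOs, and `line_stage`, `pipeline2`, `halve`, `threshold_50` and the fixed `WIDTH` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH 4  /* hypothetical line width, kept tiny for the sketch */

/* A stage consumes one input line and produces one output line. */
typedef void (*line_stage)(const uint8_t *in, uint8_t *out, size_t width);

/* Runs two stages back to back on each line. The intermediate data
 * lives in a single line buffer, not in a frame-sized array. */
static void pipeline2(line_stage first, line_stage second,
                      const uint8_t *in, uint8_t *out, size_t height)
{
    uint8_t line[WIDTH];
    assert(first != NULL && second != NULL && in != NULL && out != NULL);
    for (size_t y = 0; y < height; ++y) {
        first(in + y * WIDTH, line, WIDTH);
        second(line, out + y * WIDTH, WIDTH);
    }
}

/* Example stage 1: halve every pixel value. */
static void halve(const uint8_t *in, uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; ++x)
        out[x] = in[x] / 2;
}

/* Example stage 2: 255 where the value exceeds 50, else 0. */
static void threshold_50(const uint8_t *in, uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; ++x)
        out[x] = in[x] > 50 ? 255 : 0;
}
```

Without pipelining, the output of the first stage would need a buffer of `WIDTH * height` bytes before the second stage could start; here it needs `WIDTH` bytes. Stages that read a neighbourhood, such as a 5x5 window, would need a few lines of buffering rather than one.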

13 Example: object following

    /* Object following */
    while (1) {
        GetImage(in);
        IsoWindowOp(WDW(5), gauss_5x5, in, filtered);
        IsoPixelOp(binarize, filtered, segmented, 50);
        {x, y, n} = {0, 0, 0};
        AnisoPixelReductionOp(gravity, add, segmented, &{x, y, n}, 3*sizeof(int));
        block(&{x, y, n});
        x = x/n; y = y/n;
        SetMotorSpeed(WIDTH/2-x, HEIGHT/2-y);
    }

[Figure: schedule of two iterations of the loop across the Sensor, SIMD and GP processors, showing GetImage, gauss_5x5, binarize, {x, y, n}={0,0,0}, gravity, x=x/n; y=y/n; and SetMotorSpeed]
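
The reduction step of the example can be written out in plain sequential C to show what the `gravity` kernel and `add` combiner are presumably computing: the sums of the x and y coordinates of all set pixels, plus their count. This is an interpretation, not the library's implementation; `gravity_acc`, `gravity_reduce` and `centroid` are hypothetical names, and the zero-count check is an addition that the slide's code does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulator for the reduction: coordinate sums and pixel count. */
typedef struct {
    long x, y, n;
} gravity_acc;

/* Sequential stand-in for the reduction skeleton: every nonzero pixel
 * contributes its coordinates, and contributions are combined by
 * addition, which is why the scan order does not matter. */
static gravity_acc gravity_reduce(const uint8_t *segmented,
                                  size_t width, size_t height)
{
    gravity_acc acc = {0, 0, 0};
    assert(segmented != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            if (segmented[y * width + x]) {
                acc.x += (long)x;
                acc.y += (long)y;
                acc.n += 1;
            }
        }
    }
    return acc;
}

/* Writes the centre of gravity and returns 1, or returns 0 when no
 * pixel is set, which avoids a division by zero. */
static int centroid(const gravity_acc *acc, long *cx, long *cy)
{
    if (acc->n == 0)
        return 0;
    *cx = acc->x / acc->n;
    *cy = acc->y / acc->n;
    return 1;
}
```

Because addition is associative and commutative, partial sums computed on separate processing elements can be merged in any order, which is what lets a reduction skeleton distribute the work freely.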

14 Run-time system implementation
Function calls, e.g. Read(&a); Process(&a, &b); pass through four components:
- Frontend
  - Marshalling
  - Resolving futures
- Mapper
  - Find processor (user/static/dynamic)
- Dispatcher
  - Set up FIFOs, channels
  - Dispatch operations
- Gatherer
  - Await completion
  - Signal blocks

15 Prototype architecture

16 Results
Double thresholding edge detection

Operation    Single    Split (TM only)    Split (TM+XTL)
Time (ms)    115       124                67
Relative     100%      108%               58%

17 Conclusion
The proposed programming model
- Is easy to use
  - Skeletons hide data-parallel bookkeeping
  - RPC hides task-parallel implementation
- Is architecture independent
  - A skeleton can be implemented for different architectures
  - RPC can map to a heterogeneous system
- Is optimized for embedded usage
  - Peer-to-peer communication: no scatter/gather bottleneck
  - Pipelined: no frame stores

18 Future work
- Skeletons
  - Skeleton Definition Language
  - Skeleton merging
- Mapping
  - Memory
  - Scalar dependencies
- Evaluation
  - New prototype architecture
  - Dynamic, complex application

