Evaluation of Message Passing Synchronization Algorithms in Embedded Systems

Presentation transcript:

1 Evaluation of Message Passing Synchronization Algorithms in Embedded Systems. Lazaros Papadopoulos, Ivan Walulya, Philippas Tsigas, Dimitrios Soudris and Brendan Barry. National Technical University of Athens, School of Electrical and Computer Engineering, Division of Computer Science.

2 “Watt’s Next?” (http://bit.ly/t6zo2j). Power consumption: design decisions; performance/watt metric. Improvements in compute performance: more power budget; cooling problems.

3 GPU FLOPS/W Trend

4 Emerging Embedded Systems Trend. [Chart: GPU FLOPS/W trend on a logarithmic axis from 0.10 to 1000.00. GPU rate of increase: 1.4x per year; 7 years to hit 50 GFLOPS/W. Myriad (65nm, 2011): 49.37. Myriad 2 (28nm, 2014): 438.86. Other data labels: 0.40, 2.02, 3.95, 4.99, 6.05, 6.19.]
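The "7 years" figure on the chart can be checked with a short compound-growth calculation. This is only a sketch: it assumes the chart values are in GFLOPS/W and that the GPU baseline is the 6.19 data label, neither of which the transcript states explicitly. The function name `years_to_reach` is hypothetical.

```python
import math


def years_to_reach(start, target, rate_per_year):
    """Whole years for `start` to reach `target` when multiplied by `rate_per_year` each year."""
    return math.ceil(math.log(target / start) / math.log(rate_per_year))


# Assumption: GPU baseline of 6.19 GFLOPS/W (a data label on the chart),
# growing at the stated 1.4x per year toward the 50 GFLOPS/W mark.
gpu_years = years_to_reach(6.19, 50.0, 1.4)
print(gpu_years)
```

Under these assumptions six years of 1.4x growth falls just short of 50 GFLOPS/W, so the seventh year is the first to cross it.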

5 But how? Old Approach / New Approach

6 Now that I’ve got an ultra-low-power compute platform, what can I do with it? Potential of such low-power processors for use in high-end computations. Can they offer a solution to power problems? Can high-performance computing techniques be deployed on these processors?

7 Outline. Introduction: synchronization on multi-core platforms; Movidius SoC. Algorithmic Designs. Experimental results. Conclusions.

8 Synchronization. Hardware support. Mutexes: scalability; busy waiting. Lock alternatives or lock-free designs? Message-passing techniques from the HPC domain.

9 Myriad architecture. Processors: 32-bit general-purpose RISC SPARC processor (LEON); 8 SHAVE (Streaming Hybrid Architecture Vector Engine) processors for computational processing. Memory: CMX (Connection Matrix), 1 MB on-chip RAM (with 128 KB per SHAVE core); SDRAM, 64 MB. Synchronization support on Myriad1: mutexes, FIFO registers.

10 Algorithmic Designs: Single Lock; Double Lock; Client-Server; Remote Core Locking (RCL).

11 Single Lock: no concurrency; busy waiting; no scalability. [Figure: waiting threads repeatedly ask "Done yet?"]
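A minimal sketch of the single-lock design, written in Python rather than for the Myriad hardware. The class name `SingleLockQueue` is hypothetical, and `threading.Lock` stands in for a hardware mutex. Every operation polls the same lock, which is the busy waiting the slide describes.

```python
import threading
from collections import deque


class SingleLockQueue:
    """FIFO queue in which every operation takes the same lock."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()  # stand-in for one hardware mutex

    def _spin_acquire(self):
        # Busy waiting: keep polling until the lock is free ("Done yet?").
        while not self._lock.acquire(blocking=False):
            pass

    def enqueue(self, item):
        self._spin_acquire()
        try:
            self._items.append(item)
        finally:
            self._lock.release()

    def dequeue(self):
        """Return the oldest item, or None if the queue is empty."""
        self._spin_acquire()
        try:
            return self._items.popleft() if self._items else None
        finally:
            self._lock.release()
```

Because enqueuers and dequeuers contend for one lock, only one core makes progress at a time, which is why the design does not scale.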

12 Multiple Locks: better concurrency; improved scalability; busy waiting.
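The slide does not spell out how the locks are split. One common arrangement, assumed here, is the two-lock queue of Michael and Scott: one lock guards the head and another guards the tail, so an enqueue and a dequeue can run at the same time. The names `TwoLockQueue` and `_Node` are hypothetical.

```python
import threading


class _Node:
    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Linked-list FIFO with separate head and tail locks (assumed design)."""

    def __init__(self):
        dummy = _Node()                 # dummy node keeps head and tail apart
        self._head = dummy
        self._tail = dummy
        self._head_lock = threading.Lock()
        self._tail_lock = threading.Lock()

    def enqueue(self, value):
        node = _Node(value)
        with self._tail_lock:           # only enqueuers contend here
            self._tail.next = node
            self._tail = node

    def dequeue(self):
        """Return the oldest value, or None if the queue is empty."""
        with self._head_lock:           # only dequeuers contend here
            first = self._head.next
            if first is None:
                return None
            value = first.value
            first.value = None          # `first` becomes the new dummy node
            self._head = first
            return value
```

Threads performing the same kind of operation still wait on each other, which matches the slide's note that busy waiting remains.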

13 Client-Server arbitration (C-S): request for access; spin on local variable; access granted by server; limited concurrency. [Figure labels: Thread, Server, Queue, Pend, Post]
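The arbitration steps on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `ArbitrationServer` and `_Request` are hypothetical names, a Python `queue.Queue` stands in for the hardware FIFO, and a thread stands in for the server core. The client posts a request, spins on its own flag, runs the critical section itself once granted, and then tells the server it is done.

```python
import queue
import threading
import time


class _Request:
    def __init__(self):
        self.granted = False                # local flag the client spins on
        self.released = threading.Event()   # set when the client leaves its critical section


class ArbitrationServer:
    """Grants access to one client at a time, in request order."""

    def __init__(self):
        self._inbox = queue.Queue()         # stand-in for a hardware FIFO
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            request = self._inbox.get()
            if request is None:             # shutdown marker
                return
            request.granted = True          # grant access to this client
            request.released.wait()         # wait until the client is done

    def acquire(self):
        request = _Request()
        self._inbox.put(request)            # request for access
        while not request.granted:          # spin on a local variable
            time.sleep(0)                   # yield so the server thread can run
        return request

    def release(self, request):
        request.released.set()

    def stop(self):
        self._inbox.put(None)
        self._thread.join()
```

A client brackets its critical section with `ticket = server.acquire()` and `server.release(ticket)`. Concurrency is limited because the server admits only one client at a time.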

14 Remote Core Locking (RCL): migrate critical section; no shared data transfers; reduced bus traffic. [Figure labels: Queue, Thread, Server, Post]
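In contrast to client-server arbitration, RCL has the server run the critical section on the client's behalf, so the shared data never leaves the server. A minimal sketch under assumptions: `RclServer` and `_Call` are hypothetical names, a Python thread stands in for the dedicated server core, and error handling is omitted (a critical section that raises would stall its caller).

```python
import queue
import threading
import time
from collections import deque


class _Call:
    def __init__(self, fn, args):
        self.fn = fn
        self.args = args
        self.result = None
        self.done = False                   # local flag the client spins on


class RclServer:
    """Executes posted critical sections one by one on the server thread."""

    def __init__(self):
        self._inbox = queue.Queue()         # stand-in for a hardware FIFO
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            call = self._inbox.get()
            if call is None:                # shutdown marker
                return
            call.result = call.fn(*call.args)   # critical section runs here
            call.done = True

    def execute(self, fn, *args):
        call = _Call(fn, args)
        self._inbox.put(call)               # post the critical section
        while not call.done:                # wait for the server's reply
            time.sleep(0)
        return call.result

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


# Usage: the FIFO is only ever touched by the server thread.
fifo = deque()
rcl = RclServer()
rcl.execute(fifo.append, "e1")
first = rcl.execute(lambda: fifo.popleft() if fifo else None)
rcl.stop()
```

Because only the server thread touches `fifo`, no lock on the data structure is needed. The cost is that every operation is serialized through one core.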

15 Remote Core Locking - RCL. [Diagram: threads Th-1 and Th-2, a Server, and a queue in Memory with head and tail pointers over elements e0 to e6; labelled operations include enq() with &e6 and deq() with &e1.]

16 RCL - Drawbacks: client-server communication costs; serialization of a concurrent data structure; losing one core.

17 Experimental evaluation. FIFO queues: cores execute Enqueue and Dequeue operations (high contention). Test configurations: 1. Random, initial size 0; 2. N/2 producers / N/2 consumers, initial size 0; 3. 20,000 operations. Measured execution time in cycles.
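The producer/consumer configuration can be sketched as a small harness. This is illustrative only: `LockedQueue` and `run_producer_consumer` are hypothetical names, the default of 8 threads is an assumption chosen to match the 8 SHAVE cores, and the harness reports wall-clock nanoseconds because Python has no portable cycle counter, whereas the slides report cycles.

```python
import threading
import time
from collections import deque


class LockedQueue:
    """Simple lock-based FIFO used as the queue under test."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def enqueue(self, item):
        with self._lock:
            self._items.append(item)

    def dequeue(self):
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)


def run_producer_consumer(q, n_threads=8, total_ops=20_000):
    """Run N/2 producers and N/2 consumers on an initially empty queue.

    Returns (elapsed_ns, items_enqueued, items_dequeued).
    """
    ops_each = total_ops // n_threads
    half = n_threads // 2
    hits = [0] * half                       # successful dequeues per consumer

    def producer():
        for i in range(ops_each):
            q.enqueue(i)

    def consumer(slot):
        for _ in range(ops_each):
            if q.dequeue() is not None:     # a dequeue on an empty queue returns None
                hits[slot] += 1

    threads = [threading.Thread(target=producer) for _ in range(half)]
    threads += [threading.Thread(target=consumer, args=(s,)) for s in range(half)]
    start = time.perf_counter_ns()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns, half * ops_each, sum(hits)
```

Any queue object with `enqueue` and `dequeue` methods can be passed in, so the same harness can time different synchronization designs against each other.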

18 Experimental Results

19 Experimental Results

20 Conclusions. Complex data structures can be deployed on ultra-low-power processors. With relatively low absolute performance, can they be viable for high-end computing? With 3D stacking it may become possible to stack many processors for very fast and energy-efficient communication.

21 Questions?

