Combinatorial Optimization for Embedded System Design


1 Combinatorial Optimization for Embedded System Design
Michela Milano, Michele Lombardi, Alessio Bonfietti, Luca Benini, Andrea Bartolini, Davide Bertozzi, Alessio Guerri, Martino Ruggiero, Giuseppe Tagliavini

2 Embedded Systems: a rough definition
“Any computing system which is not a computer.” Embedded systems cover a large variety of devices, which need high performance (as they often run real-time applications) and high energy efficiency (e.g. when powered by a battery). Classical design approaches use either dedicated G.P. (general-purpose) systems, whose issue is efficiency, or dedicated hardware, whose issue is high design cost and poor flexibility.

3 MultiProcessor Systems-on-Chip
MPSoCs address all such problems: flexibility through software (or mixed HW/SW) applications, performance through parallelism, and low power consumption through lower-frequency cores and power-saving techniques. HOWEVER: they raise thermal issues and require proper use of the exposed resources. Squeezing out the full power of a modern MPSoC can definitely be HARD.

4 This thing has to perform some automatic optimization
Ideally, given an input application (code) and a target platform description, the black box should yield an optimized application. What is the black box? A compiler? A CAD tool (less blackish)? A run-time support? FOR SURE: this thing has to perform some automatic optimization.

5 On-line vs off-line approaches
What should the black box look like? Most likely, it has two distinct components. ON-LINE: OS-level scheduler; on-line application-to-core dispatcher (current multi-core CPUs...); out-of-order execution; ... OFF-LINE: off-line code optimization (e.g. VLIW compilers); memory allocation (even hand-made); off-line resource allocation (e.g. mixed HW/SW design).

6 Requirements for automatic optimization
1. A formal description of the application must be available. Formal = it can be understood by a computer and manipulated by mathematics. Usually: task-based models. Task = atomic computation unit (e.g. an instruction, a process, a code block...). Tasks may have dependencies. Tasks have measurable “features” (e.g. execution time), which must be computed in some way. Tasks use hardware resources.
(Figure: example task graph with tasks T0 to T6.)
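A task-based model like this one fits in a small data structure. The sketch below is illustrative only and does not reproduce the authors' formalism. The `Task` class and the `earliest_starts` helper are hypothetical, and so are the edges and durations given to T0 to T6. The code computes as-soon-as-possible start times from dependencies alone and ignores resources.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

@dataclass
class Task:
    name: str
    duration: int                              # measured or estimated execution time
    preds: list = field(default_factory=list)  # names of tasks that must finish first

def earliest_starts(tasks):
    """ASAP start times when only dependencies (no resources) are considered."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.preds) for t in tasks}  # node -> its predecessors
    start = {}
    for name in TopologicalSorter(graph).static_order():
        start[name] = max((start[p] + by_name[p].duration for p in by_name[name].preds),
                          default=0)
    return start

# Hypothetical task graph over T0..T6 (edges and durations made up for illustration)
tasks = [
    Task("T0", 2),
    Task("T1", 3, ["T0"]), Task("T2", 1, ["T0"]),
    Task("T3", 2, ["T1"]), Task("T4", 4, ["T1", "T2"]),
    Task("T5", 1, ["T2"]), Task("T6", 2, ["T3", "T4", "T5"]),
]
starts = earliest_starts(tasks)
print(starts)
```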

7 Requirements for automatic optimization
2. A formal description of the platform must be available. Usually: resource-based models. Resource = an “energy” provider over time; each resource has a finite capacity. Platform = collection of resources (e.g. PROC: processor cores; MEM: memory devices). Additionally, for off-line approaches: 3. A formal description of the performance metrics must be available, e.g. completion time (makespan), throughput, energy consumption, number of bus transactions, ...
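A cumulative capacity check is one way to illustrate the resource model together with the makespan metric. In this hypothetical sketch, a schedule maps each task to a `(start, duration)` pair. Each task uses some units of one resource, such as a shared memory, and that resource has a fixed capacity. The function names and all numbers are assumptions. A processor core fits the same model as a resource of capacity 1. Usage can only rise when a task starts, so it is enough to check at start times.

```python
def resource_feasible(schedule, demand, capacity):
    """True if the concurrent demand on the resource never exceeds its capacity."""
    # Usage only increases when a task starts, so start times are the only check points.
    for t in {start for start, _ in schedule.values()}:
        used = sum(demand[k] for k, (s, d) in schedule.items() if s <= t < s + d)
        if used > capacity:
            return False
    return True

def makespan(schedule):
    """Completion time of the last task."""
    return max(s + d for s, d in schedule.values())

# Hypothetical instance: three tasks sharing one resource of capacity 2
schedule = {"A": (0, 3), "B": (0, 2), "C": (2, 2)}
demand = {"A": 1, "B": 1, "C": 1}
print(resource_feasible(schedule, demand, 2), makespan(schedule))  # True 4
```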

8 Compiled code + directives + run-time support = OPTIMIZED APPLICATION
An example on MPSoCs: the Mapping & Scheduling Problem. Given an application description and a platform description, an off-line optimization algorithm provides directives for the run-time support: resource-to-task allocation and task scheduling decisions. Compiled code + directives + run-time support = OPTIMIZED APPLICATION.
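To make the two decision levels concrete, the toy below treats a tiny Mapping & Scheduling instance. It enumerates every task-to-core allocation. For each allocation it builds a list schedule in a fixed topological order, then keeps the allocation with the smallest makespan. This is a hypothetical illustration and not the off-line algorithm used in the cited work. The search is exhaustive over allocations only, and the per-allocation schedule comes from a heuristic, so the result is not guaranteed optimal in general. The instance and helper names are made up.

```python
from itertools import product

def schedule_for(alloc, tasks, order):
    """List-schedule tasks in a fixed topological order for a given task->core allocation."""
    core_free, finish = {}, {}
    for name in order:
        dur, preds = tasks[name]
        ready = max((finish[p] for p in preds), default=0)   # all predecessors done
        start = max(ready, core_free.get(alloc[name], 0))    # core is available
        finish[name] = start + dur
        core_free[alloc[name]] = finish[name]
    return finish

def best_mapping(tasks, order, n_cores):
    """Exhaustive search over allocations; only viable for toy-sized instances."""
    best = None
    for cores in product(range(n_cores), repeat=len(order)):
        alloc = dict(zip(order, cores))
        ms = max(schedule_for(alloc, tasks, order).values())
        if best is None or ms < best[0]:
            best = (ms, alloc)
    return best

# Hypothetical instance: name -> (duration, predecessors)
tasks = {"a": (2, []), "b": (3, ["a"]), "c": (3, ["a"]), "d": (1, ["b", "c"])}
order = ["a", "b", "c", "d"]
ms, alloc = best_mapping(tasks, order, 2)
print(ms, alloc)
```

On this instance, putting `b` and `c` on different cores lets them run in parallel. The makespan then equals the critical path a, b, d.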

9 Communication-aware approach
Communication-aware: the approach minimizes inter-core communication between tasks. Decisions: task allocation and scheduling. Approach: logic-based Benders decomposition. Validation on a cycle-accurate simulator and on the target platform.
Martino Ruggiero, Alessio Guerri, Davide Bertozzi, Michela Milano, Luca Benini: A Fast and Accurate Technique for Mapping Parallel Applications on Stream-Oriented MPSoC Platforms with Communication Awareness. International Journal of Parallel Programming 36(1): 3-36 (2008)
Luca Benini, Michele Lombardi, Michela Milano, Martino Ruggiero: Optimal resource allocation and scheduling for the CELL BE platform. Annals OR 184(1): (2011)
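Logic-based Benders decomposition splits the problem into two parts: a master problem (here, allocation) and subproblems (here, per-core scheduling) that send cuts back to the master. The toy below only illustrates that loop and is not the authors' model. The master minimizes hypothetical communication volumes by exhaustive search. The subproblem is reduced to a per-core load check against a deadline, which ignores idle time caused by precedences. Each cut forbids putting an overloaded task set, or any superset of it, on one core. Cores are assumed identical, so a cut applies to all of them.

```python
from itertools import product

def comm_cost(alloc, edges):
    """Total volume of the edges whose endpoints sit on different cores."""
    return sum(vol for (u, v), vol in edges.items() if alloc[u] != alloc[v])

def solve_master(tasks, edges, n_cores, cuts):
    """Cheapest allocation w.r.t. communication that violates no cut (toy-sized only)."""
    best = None
    for cores in product(range(n_cores), repeat=len(tasks)):
        alloc = dict(zip(tasks, cores))
        groups = [frozenset(t for t in tasks if alloc[t] == c) for c in range(n_cores)]
        if any(cut <= g for cut in cuts for g in groups):
            continue  # some core receives a task set already proven infeasible
        cost = comm_cost(alloc, edges)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best

def lbbd(tasks, durations, edges, n_cores, deadline):
    cuts = []
    while True:
        sol = solve_master(tasks, edges, n_cores, cuts)
        if sol is None:
            return None  # every allocation is excluded: the instance is infeasible
        cost, alloc = sol
        # Simplified subproblem: each core runs its tasks back to back within the deadline.
        new_cuts = []
        for c in range(n_cores):
            group = frozenset(t for t in tasks if alloc[t] == c)
            if sum(durations[t] for t in group) > deadline:
                new_cuts.append(group)  # no core can host this set or any superset of it
        if not new_cuts:
            return cost, alloc
        cuts.extend(new_cuts)

# Hypothetical instance: four 3-unit tasks, two identical cores, deadline 6
tasks = ["a", "b", "c", "d"]
durations = {t: 3 for t in tasks}
edges = {("a", "b"): 5, ("c", "d"): 5, ("b", "d"): 1}
print(lbbd(tasks, durations, edges, 2, 6))  # (1, {'a': 0, 'b': 0, 'c': 1, 'd': 1})
```

The first master solution puts every task on one core. The subproblem rejects it, and the resulting cut moves the master toward the cheapest split that fits.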

10 MP-OPT
A high-performance data-flow programming environment for the Cell BE processor, made of a programming interface, a solver and a runtime system.

11 Communication-aware approach
(Figure: panels “Robustness and Variability” and “Performance”, comparing Cell SuperScalar and MPOpt.)
All experiments were executed on a PlayStation 3 (3.2 GHz Cell) running Yellow Dog Linux 6.0.

12 Dynamic voltage and frequency scaling
Energy-aware: the approach minimizes energy dissipation. Decisions: task allocation, frequency selection and scheduling. Approach: logic-based Benders decomposition. Validation on a cycle-accurate simulator.
Martino Ruggiero, Davide Bertozzi, Luca Benini, Michela Milano, A. Andrei: Reducing the Abstraction and Optimality Gaps in the Allocation and Scheduling for Variable Voltage/Frequency MPSoC Platforms. IEEE Trans. on CAD of Integrated Circuits and Systems 28(3): (2009)
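The frequency-selection trade-off shows up even on a single core: a lower frequency saves energy but stretches execution time. The sketch assumes hypothetical (frequency, voltage) operating points. It also assumes a textbook dynamic-energy model in which energy per cycle scales with V², with the switched capacitance folded into the units. It picks per-task frequencies by brute force so that the sequence meets a deadline. The cited approach relies on logic-based Benders decomposition and a much richer platform model.

```python
from itertools import product

# Hypothetical operating points: (frequency in MHz, supply voltage in V)
LEVELS = [(200, 0.8), (400, 1.0), (600, 1.2)]

def energy_and_time(cycles, level):
    f, v = level
    time_us = cycles / f        # cycles / MHz = microseconds
    energy = cycles * v ** 2    # dynamic energy per cycle ~ C * V^2 (C folded into units)
    return energy, time_us

def best_frequencies(task_cycles, deadline_us):
    """Minimum-energy frequency per task for tasks run in sequence on one core."""
    best = None
    for choice in product(LEVELS, repeat=len(task_cycles)):
        e_t = [energy_and_time(c, lvl) for c, lvl in zip(task_cycles, choice)]
        energy = sum(e for e, _ in e_t)
        time = sum(t for _, t in e_t)
        if time <= deadline_us and (best is None or energy < best[0]):
            best = (energy, [lvl[0] for lvl in choice])
    return best

energy, freqs = best_frequencies([1200, 2400], 8.0)
print(freqs)  # [600, 400]
```

Running the short task fast frees enough slack to run the long task at a lower, cheaper voltage.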

13 Energy-aware approach

14 Robust optimization for conditional task graphs
Robust optimization: minimizes the expected execution time while guaranteeing resource feasibility in all scenarios. Target: conditional task graphs. Approach: Constraint Programming solver; the probabilistic problem is transformed into a deterministic counterpart.
Michele Lombardi, Michela Milano, Martino Ruggiero, Luca Benini: Stochastic allocation and scheduling for conditional task graphs in multi-processor systems-on-chip. J. Scheduling 13(4): (2010)
Michele Lombardi, Michela Milano: Allocation and scheduling of Conditional Task Graphs. Artif. Intell. 174(7-8): (2010)
Against scenario-based scheduling: same performance as solvers considering 50% of the scenarios, with much higher solution quality (49% improvement on average).
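In a conditional task graph, tasks on mutually exclusive branches never run together, so they may share a core in the same time window. The sketch below is hypothetical: the branch probabilities, the schedule and the helper names are assumptions, and it is not the CP model from the cited papers. It enumerates scenarios explicitly and checks that no two tasks active in the same scenario overlap on one core. It then computes the expected makespan. The cited work avoids this explicit enumeration by transforming the problem into a deterministic counterpart.

```python
from itertools import combinations

def overlaps(a, b):
    """True if two (start, duration) intervals intersect."""
    (s1, d1), (s2, d2) = a, b
    return s1 < s2 + d2 and s2 < s1 + d1

def feasible_all_scenarios(schedule, alloc, scenarios):
    """No two tasks active in the same scenario may overlap on the same core."""
    for _, active in scenarios:
        for u, v in combinations(sorted(active), 2):
            if alloc[u] == alloc[v] and overlaps(schedule[u], schedule[v]):
                return False
    return True

def expected_makespan(schedule, scenarios):
    """Probability-weighted completion time of the active tasks."""
    return sum(p * max(schedule[t][0] + schedule[t][1] for t in active)
               for p, active in scenarios)

# Hypothetical CTG: after T0, branch to T1 (p = 0.7) or to T2 then T3 (p = 0.3)
schedule = {"T0": (0, 2), "T1": (2, 4), "T2": (2, 1), "T3": (3, 2)}
alloc = {"T0": 0, "T1": 0, "T2": 0, "T3": 0}
scenarios = [(0.7, {"T0", "T1"}), (0.3, {"T0", "T2", "T3"})]
print(feasible_all_scenarios(schedule, alloc, scenarios), expected_makespan(schedule, scenarios))
```

T1 overlaps T2 and T3 on the same core, yet the schedule is feasible because they never run in the same scenario.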

15 Robust Optimization under duration uncertainty
Robust optimization: minimizes the expected execution time while guaranteeing deadline feasibility in all scenarios. Target: task graphs with known WCET and BCET (worst-case and best-case execution times). Approach: partial-order scheduling, with a min-flow algorithm for identifying critical sets.
Michele Lombardi, Michela Milano, Luca Benini: Robust Scheduling of Task Graphs under Execution Time Uncertainty. IEEE Trans. Computers 62(1): (2013)
Fixed-priority scheduler based on tabu search
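In a partial-order schedule, resource conflicts are resolved by adding precedence arcs, and each task starts as soon as its predecessors finish. The makespan is then a longest path, and a longest path never decreases when any task duration grows. Evaluating it with all WCETs therefore bounds the makespan for every combination of durations between BCET and WCET. The sketch shows only this check on a hypothetical graph. It does not reproduce the min-flow detection of critical sets from the cited work.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def longest_path(durations, preds):
    """Makespan when every task starts as soon as all its predecessors have ended."""
    end = {}
    for t in TopologicalSorter(preds).static_order():
        end[t] = max((end[p] for p in preds[t]), default=0) + durations[t]
    return max(end.values())

# Hypothetical partial-order schedule: arc b->c assumed added to resolve a resource conflict
preds = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": {"c"}}
bcet = {"a": 1, "b": 2, "c": 1, "d": 1}
wcet = {"a": 2, "b": 4, "c": 3, "d": 2}

best_case = longest_path(bcet, preds)
worst_case = longest_path(wcet, preds)
deadline = 12
print(best_case, worst_case, worst_case <= deadline)  # 5 11 True
```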

16 Synchronous data flow graphs
Synchronous data-flow graphs (SDFGs): the approach maximizes throughput. Approach: Constraint Programming solver with a throughput constraint.
Alessio Bonfietti, Michele Lombardi, Michela Milano, Luca Benini: Maximum-throughput mapping of SDFGs on multi-core SoC platforms. J. Parallel Distrib. Comput. 73(10): (2013)
Against simulation, SDF3 and Swing Modulo Scheduling (SMS) as implemented in gcc: the SDF3 tool is the fastest approach, but its solutions present an average optimality gap of 12.1% (with a peak of 47.7% on the MPEG-2 benchmark), compared with 4.8% for SMS.
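SDFG throughput is defined per graph iteration. In one iteration, each actor fires as many times as its entry in the repetition vector. That vector is the smallest positive integer solution of the balance equations q[src] · produced = q[dst] · consumed, one per channel. The sketch computes it for a hypothetical three-actor chain. This is a standard preliminary step in SDFG analysis, not the CP throughput constraint of the cited paper. The graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm  # multi-argument form needs Python 3.9+

def repetition_vector(actors, channels):
    """channels: list of (src, dst, produced, consumed) token rates."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along channels
        changed = False
        for src, dst, p, c in channels:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, dst, p, c in channels:  # every balance equation must hold
        if rate[src] * p != rate[dst] * c:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}

# Hypothetical chain: A produces 2 / B consumes 3; B produces 1 / C consumes 2
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```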

17 Cyclic scheduling
Cyclic applications: possibly more than one, with different periods. Approach: CP solver with a modular-arithmetic-based approach.
Alessio Bonfietti, Michele Lombardi, Luca Benini, Michela Milano: CROSS cyclic resource-constrained scheduling solver. Artif. Intell. 206: (2014)
Presentation tomorrow by Alessio Bonfietti on a project with ABB.
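Modular reasoning for cyclic schedules rests on one observation. If every task restarts every λ time units, its resource usage wraps around modulo λ. A single window of length λ then captures all overlapping iterations, including a task that overlaps its own next iteration. The sketch checks a given periodic schedule under integer time. The function names, the instance and the capacity are hypothetical. It is a plain feasibility check and does not reproduce the CROSS solver's propagation.

```python
def modular_profile(starts, durations, demand, period):
    """Resource usage in each slot 0..period-1 when every task repeats with the given period."""
    usage = [0] * period
    for t in starts:
        for k in range(durations[t]):
            usage[(starts[t] + k) % period] += demand[t]  # wrap into the periodic window
    return usage

def cyclic_feasible(starts, durations, demand, period, capacity):
    return max(modular_profile(starts, durations, demand, period)) <= capacity

# Hypothetical instance: period 4, unary resource
demand = {"a": 1, "b": 1}
print(modular_profile({"a": 0, "b": 3}, {"a": 3, "b": 2}, demand, 4))  # [2, 1, 1, 1]
```

Task `b` spills past the end of the period into slot 0 of the next iteration, where it collides with `a`.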

18 Empirical Model Learning
Machine learning and data analytics are used to characterize the application and the platform; the learned model is then inserted into the optimization model.
Michele Lombardi, Michela Milano, Andrea Bartolini: Empirical decision model learning. Artif. Intell.: online (2016)

19 Empirical Model Learning
Thermal behaviour is complex. It depends on: room temperature, core workload, neighbor workload, heat and sink position.

20 Empirical Model Learning
Building and training a neural network to model the thermal behavior of a (simulated) quad-core chip.
(Figure: network with inputs temp_(0) and power inputs pwr_0 to pwr_3, a sigmoid hidden layer and a linear output temp_3(Δ).)
Encoding the network in CP, using a “Neuron Constraint” to encode each neuron and decision variables for the inputs. Building a CP model to perform thermal-aware workload dispatching with temperature constraints.
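The sketch below shows the kind of bound reasoning a neuron-level constraint can perform. From an interval on each input it derives an interval on the neuron's output. It covers only forward interval propagation through a network shaped like the one above, with a sigmoid hidden layer and a linear output. With it, the temperature variation can be bounded while the power inputs are still undecided. All weights and helper names are hypothetical and do not come from the trained thermal model.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def identity(x):
    return x

def neuron_bounds(weights, bias, in_bounds, activation):
    """Output interval of one neuron, given an interval (lo, hi) for each input."""
    # Pick the input bound that minimizes (resp. maximizes) each weighted term.
    lo = bias + sum(w * (l if w >= 0 else h) for w, (l, h) in zip(weights, in_bounds))
    hi = bias + sum(w * (h if w >= 0 else l) for w, (l, h) in zip(weights, in_bounds))
    # Valid because both activations used here are non-decreasing.
    return activation(lo), activation(hi)

def network_bounds(hidden, output, in_bounds):
    """hidden: list of (weights, bias) sigmoid neurons; output: (weights, bias) linear neuron."""
    h_bounds = [neuron_bounds(w, b, in_bounds, sigmoid) for w, b in hidden]
    return neuron_bounds(output[0], output[1], h_bounds, identity)

# Hypothetical weights; inputs are [temp_0, pwr_0, pwr_1, pwr_2, pwr_3], normalized
hidden = [([0.5, 1.0, 0.2, 0.2, 0.1], -1.0), ([0.3, 0.1, 0.2, 1.0, 0.2], -0.5)]
output = ([2.0, -0.5], 0.1)
in_bounds = [(0.4, 0.4)] + [(0.0, 1.0)] * 4  # temp_0 fixed, power levels still undecided
lo, hi = network_bounds(hidden, output, in_bounds)
print(lo, hi)
```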

21 Empirical Model Learning
(Figure: comparison WITHOUT EML and WITH EML.)

22 Open issues: accuracy vs. efficiency
How to consider accuracy/confidence levels IN the optimization process; definition of the training set; more inference methods for ML models.

