CHALLENGING SCHEDULING PROBLEM IN THE FIELD OF SYSTEM DESIGN Alessio Guerri Michele Lombardi * Michela Milano DEIS, University of Bologna.

Presentation transcript:

CHALLENGING SCHEDULING PROBLEM IN THE FIELD OF SYSTEM DESIGN
Alessio Guerri, Michele Lombardi*, Michela Milano
DEIS, University of Bologna

OUTLINE: Talk topic
- Present a mixed resource allocation and scheduling problem arising in the design flow of embedded systems
- Discuss some problem variants (including a stochastic one)
- Discuss the instances we used…
- …and the instance generator

MOTIVATION & CONTEXT: Why is this stuff interesting?
1. It's challenging
   - Mainly due to the resource allocation phase
2. It's a real-world problem
   - It's important to test solvers on real problems
3. We considered many problem variants
   - Different objective functions
   - Deterministic vs. stochastic
4. The instance generator
   - Flexible, from random to "real-like" instances

MOTIVATION & CONTEXT: Embedded system
Embedded system: any information processing device embedded in another product.
- They are finding widespread application (figure labels: automotive, digital convergence, market share)
- They are special purpose
- They often run a fixed set of applications
- Strong off-line optimization before deployment

MOTIVATION & CONTEXT: Platforms for embedded systems
General trend: H/S (hardware/software) co-design
- Special purpose application
- "General purpose" platform
A promising platform: MPSoCs (figure: IBM, CELL processor)

MOTIVATION & CONTEXT: System design with MPSoCs
Exploit application and platform parallelism to achieve real-time performance.
Design flow:
- Given a platform description
- and an application abstraction
- compute an allocation & schedule
- verify results & perform changes
[Figure: platform with P1, P2 and RAM; schedule over time t]
Crucial role of the allocation & scheduling algorithm.

PROBLEM DESCRIPTION: MPSoC platform
- Identical processing elements (PE)
- Local storage devices
- Remote on-chip memory
- Shared bus
Resources:
- PEs
- Local memory devices
- Shared bus
Constraints:
- PE frequencies (DVS)
- Local device capacity
- Bus bandwidth (additive resource)
- Architecture-dependent constraints

PROBLEM DESCRIPTION: Application
Task graph:
- Nodes are tasks/processes
- Arcs are data communication
Each task:
- Reads data for each ingoing arc (RD)
- Performs some computation (EXEC)
- Writes data for each outgoing arc (WR)
[Figure: task phases RD, EXEC, WR, annotated with: COMM. BUFFER (LOC/REM), PE, PROG. DATA (LOC/REM), FREQ.]

PROBLEM DESCRIPTION: Application
Durations depend on:
- Memory allocation
  - Remote memory is slower than local memory
- Execution frequency
Different phases have different bus requirements.
[Figure: task phases RD, EXEC, WR, annotated with: COMM. BUFFER (LOC/REM), PE, PROG. DATA (LOC/REM), FREQ.]
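
As a rough illustration of how phase durations could depend on memory allocation and execution frequency, the Python sketch below models the RD, EXEC and WR phases of one task. The `Task` fields, the per-unit access costs and the "cycles divided by frequency" timing are all assumptions made for this example; the talk only states that remote memory is slower and that frequency matters.

```python
from dataclasses import dataclass

# Hypothetical access costs in cycles per data unit. The talk only says
# that remote memory is slower than local memory.
LOCAL_CYCLES_PER_UNIT = 1
REMOTE_CYCLES_PER_UNIT = 4


@dataclass
class Task:
    read_units: int   # data read in the RD phase
    exec_cycles: int  # computation in the EXEC phase
    write_units: int  # data written in the WR phase


def phase_durations(task, freq, buffer_remote):
    """Return (rd, exec, wr) durations in seconds for one task.

    freq is the PE clock frequency in Hz and buffer_remote tells whether
    the communication buffer is allocated in remote memory.
    Assumed model: duration = cycles / frequency.
    """
    per_unit = REMOTE_CYCLES_PER_UNIT if buffer_remote else LOCAL_CYCLES_PER_UNIT
    rd = task.read_units * per_unit / freq
    ex = task.exec_cycles / freq
    wr = task.write_units * per_unit / freq
    return rd, ex, wr


task = Task(read_units=100, exec_cycles=1000, write_units=50)
local = phase_durations(task, freq=200e6, buffer_remote=False)
remote = phase_durations(task, freq=200e6, buffer_remote=True)
slow = phase_durations(task, freq=100e6, buffer_remote=False)
```

In this toy model only the RD and WR phases change with the memory allocation, while all three phases stretch when the frequency is lowered.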

PROBLEM VARIANTS
We focused on problem variants with different:
- Objective function (O.F.): bus traffic, energy consumption, makespan
- Graph features (G.F.): pipelined, generic, generic with cond. branches
- Platform: DVS, no DVS

PROBLEM VARIANTS: Objective function
Bus traffic
- Tasks produce traffic when they access the bus
- Completely depends on memory allocation
Makespan
- Completely depends on the computed schedule
Energy (DVS)
- The higher the frequency, the higher the energy consumption
- Cost for frequency switching (time & energy cost)
[Figure: tasks 1 to 4 on a timeline t, annotated "allocation dependent" and "schedule dependent"]
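
The three objective functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and input encodings are hypothetical, and the energy model (energy per task proportional to cycles times frequency squared, plus a fixed cost per frequency switch) is a common textbook approximation, not a formula taken from the talk.

```python
def bus_traffic(data_units, in_remote):
    """Traffic depends only on allocation: count data placed in remote memory."""
    return sum(d for d, remote in zip(data_units, in_remote) if remote)


def makespan(schedule):
    """Makespan depends only on the schedule: latest end over (start, duration)."""
    return max(start + duration for start, duration in schedule)


def energy(pe_sequence, k=1.0, switch_cost=0.0):
    """Energy of the tasks run in order on one PE.

    pe_sequence is a list of (cycles, frequency) pairs. Assumed model:
    each task costs k * cycles * frequency**2, and every change of
    frequency between consecutive tasks adds switch_cost.
    """
    total = 0.0
    previous = None
    for cycles, freq in pe_sequence:
        total += k * cycles * freq ** 2
        if previous is not None and freq != previous:
            total += switch_cost
        previous = freq
    return total


traffic = bus_traffic([10, 20, 30], [True, False, True])
span = makespan([(0, 5), (5, 3), (2, 4)])
joules = energy([(10, 1.0), (10, 2.0)], k=1.0, switch_cost=5.0)
```

With this model, running the same number of cycles at a higher frequency always costs more energy, and reordering tasks on a PE can change the number of frequency switches.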

PROBLEM VARIANTS: Graph structure
- Pipelined (chain of tasks 1, 2, 3, 4): typical of stream processing applications
- Generic
- Generic with cond. branches (branch conditions a / !a, b / !b): stochastic problem, stochastic O.F. (expected value)
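
For graphs with conditional branches, the objective becomes an expected value over the possible branch outcomes. The sketch below enumerates all scenarios for a tiny example. It assumes that branch outcomes are independent and that each task carries a fixed cost and an activation condition; both the encoding and the numbers are hypothetical.

```python
from itertools import product


def expected_value(branch_probs, tasks):
    """Expected total cost over all branch scenarios.

    branch_probs maps a condition name to the probability that it is true.
    tasks is a list of (cost, condition) pairs, where condition maps
    condition names to the outcome required for the task to execute.
    Assumes independent branch outcomes.
    """
    names = sorted(branch_probs)
    total = 0.0
    for outcome in product([True, False], repeat=len(names)):
        scenario = dict(zip(names, outcome))
        prob = 1.0
        for name in names:
            p = branch_probs[name]
            prob *= p if scenario[name] else 1.0 - p
        value = sum(
            cost
            for cost, cond in tasks
            if all(scenario[k] == v for k, v in cond.items())
        )
        total += prob * value
    return total


tasks = [
    (10, {}),                       # always executed
    (20, {"a": True}),              # executed on branch a
    (5, {"a": False}),              # executed on branch !a
    (8, {"a": True, "b": True}),    # executed on a and b
]
result = expected_value({"a": 0.7, "b": 0.4}, tasks)
```

Full enumeration is exponential in the number of conditions, so it is only meant to make the definition of the stochastic objective concrete.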

INSTANCES & INSTANCE GENERATOR: Instances, an overview
Where do they come from?
Different needs…
- Solve real problems
- Test the solvers and the models
- Perform computational studies
…different instances
- Real-world instances
- "Synthetic" instances
- Random instances

INSTANCES & INSTANCE GENERATOR: Real-world instances
- From application code…
- …extract a task graph…
- …and compute durations by repeated runs
The extraction phase is tricky and slow.

INSTANCES & INSTANCE GENERATOR: Synthetic instances
- Randomly generate a graph…
- …wrap it into a synthetic computer program…
- …again, compute durations by repeated runs
Faster, but still not fast enough for a statistical complexity study.

INSTANCES & INSTANCE GENERATOR: Random instances
Why random instances?
- Quickly available "real-like" instances
- Completely random instances to test the solvers
Instance generator* able to provide both.
Generation algorithm:
- Start from a bipartite graph…
- …repeatedly replace an arc with a series of nodes…
* Soon available for download at

INSTANCES & INSTANCE GENERATOR: Random instances
Generation algorithm (continued):
- …randomly add some nodes…
- …randomly add some arcs…
- …cut arcs to turn some nodes into "tails"…
Many types of instances, from "random" to "real-like", by balancing these two steps.
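
A minimal sketch of the generation idea, not the authors' generator: it starts from a complete bipartite graph, repeatedly replaces an arc with a chain of new nodes, and then adds random arcs. The function names, the parameters, the chain length range and the way acyclicity is preserved are assumptions, and the "add nodes" and "tails" steps are left out.

```python
import random


def topological_order(n_nodes, arcs):
    """Kahn's algorithm; returns fewer than n_nodes nodes if there is a cycle."""
    indeg = [0] * n_nodes
    succ = [[] for _ in range(n_nodes)]
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in range(n_nodes) if indeg[v] == 0]
    order = []
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def generate_task_graph(n_sources, n_sinks, n_expansions, n_extra_arcs, seed=0):
    """Return (number of nodes, set of arcs) of a random acyclic task graph."""
    rng = random.Random(seed)
    sources = range(n_sources)
    sinks = range(n_sources, n_sources + n_sinks)
    arcs = {(s, t) for s in sources for t in sinks}  # bipartite start
    next_id = n_sources + n_sinks

    # Structured step: replace an arc (u, v) with u -> w1 -> ... -> wk -> v.
    for _ in range(n_expansions):
        u, v = rng.choice(sorted(arcs))
        arcs.remove((u, v))
        prev = u
        for _ in range(rng.randint(1, 3)):
            arcs.add((prev, next_id))
            prev = next_id
            next_id += 1
        arcs.add((prev, v))

    # Random step: add arcs oriented along a fixed topological order,
    # so the graph stays acyclic.
    order = topological_order(next_id, arcs)
    pos = {node: i for i, node in enumerate(order)}
    for _ in range(n_extra_arcs):
        a, b = rng.sample(range(next_id), 2)
        if pos[a] > pos[b]:
            a, b = b, a
        arcs.add((a, b))
    return next_id, arcs


n_nodes, arcs = generate_task_graph(2, 2, n_expansions=5, n_extra_arcs=4, seed=1)
```

Raising `n_expansions` relative to `n_extra_arcs` gives more chain-like graphs, while the opposite gives denser, more random ones, which is one possible reading of "balancing" the steps.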

INSTANCES & INSTANCE GENERATOR: Random instances
The generation process is tuned by means of "random parameters":
- rp(MIN, MAX, DISTRIBUTION)
- mrp(MEAN, ST. DEV., DISTRIBUTION)
DISTRIBUTION: linear, quadratic, exponential
Nodes and arcs are labeled with custom lists of attributes (crucial to have "real-like" instances).
Node and arc attributes can depend on each other and are specified via a simple functional language:
- node_attr dur rp(100, 200, LIN)
- node_attr long_dur sum(dur, rp(50, 100, EXP))
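
The two attribute definitions above can be mimicked in Python as follows. This is a sketch under loud assumptions: the talk does not define what the LIN, QUAD and EXP distributions are, so here they are read as shaping functions applied to a uniform sample, and `rp` takes an explicit random generator. The `mrp` form is not covered.

```python
import math
import random

# Hypothetical reading of the distribution names: each maps a uniform
# sample u in [0, 1) to a shaped value in [0, 1).
SHAPES = {
    "LIN": lambda u: u,
    "QUAD": lambda u: u * u,
    "EXP": lambda u: (math.exp(u) - 1.0) / (math.e - 1.0),
}


def rp(lo, hi, dist, rng):
    """Sample a value in [lo, hi) following the assumed shape of dist."""
    return lo + (hi - lo) * SHAPES[dist](rng.random())


def make_node(rng):
    """Build node attributes where long_dur depends on dur."""
    dur = rp(100, 200, "LIN", rng)              # node_attr dur rp(100, 200, LIN)
    long_dur = dur + rp(50, 100, "EXP", rng)    # sum(dur, rp(50, 100, EXP))
    return {"dur": dur, "long_dur": long_dur}


node = make_node(random.Random(0))
```

Because `long_dur` is built from `dur`, the two attributes stay correlated, which is the kind of dependency the functional language is meant to express.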

CONCLUSIONS
- Mixed allocation & scheduling problem
  - Complex allocation phase
- It's important for a competition to evaluate solvers on real problems
  - Different O.F.s
  - Deterministic & stochastic problems
- Random instances
  - Realistic instances needed for real problems
Questions?
