1 Lic Presentation Memory Aware Task Assignment and Scheduling for Multiprocessor Embedded Systems Radoslaw Szymanek / Embedded System Design

Presentation transcript:

1 Lic Presentation Memory Aware Task Assignment and Scheduling for Multiprocessor Embedded Systems Radoslaw Szymanek / Embedded System Design

2 Outline
- Introduction
- Problem Formulation and Motivational Example
- CLP Introduction
- CLP Modeling
- Optimization Heuristic and Experimental Results
- Conclusions

3 System Level Synthesis (SLS)
- Multiprocessor embedded systems are designed using CPUs, ASICs, buses, and interconnection links
- The application areas range from signal and image processing to multimedia and telecommunication
- The application is represented as a task graph
- The main design activities are task assignment and scheduling for a given architecture
- Memory constraints (code and data memory)

4 SLS with memory constraints
[Figure: an annotated task graph and a target architecture with components P1, P2, P3 (each with ROM and RAM), A1 (with RAM), B1, L1, and L2]

5 Problem Assumptions and Formulation
Assumptions:
- Data-dominated application represented as a directed bipartite acyclic task graph
- Each task is annotated with its execution time and its code and data memory requirements
- Heterogeneous architecture
- Both tasks and communications are atomic and must be performed in one step
Formulation:
- Find a good CLP model
- Find a good heuristic for memory-constrained, time-minimizing task assignment and scheduling that satisfies all constraints

6 Motivation
- SoC multiprocessor architectures
- Co-design methodology needs tool support
- Memory consideration to decrease cost and power consumption
- System-level design for fast evaluation

7 Motivating example (memory)
[Figure: a task graph, an architecture with P1, P2, and L1, and alternative schedules with the data memory used over time by communications C1, C2, and C3]
Task: 1 kB code memory, 4 kB data memory. Communication: 2 kB data memory.

8 CLP Introduction “Constraint programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it.” Eugene C. Freuder CONSTRAINTS, April 1997

9 CLP Introduction
- Relatively young and attractive approach for modeling many types of optimization problems
- Many heterogeneous applications of constraint programming exist today
- State the decision variables which constitute the solution
- State the constraints which must be satisfied by the solution
- Search for solutions using knowledge that can be derived from the constraints

10 Constraints properties
- may specify partial information: a constraint need not uniquely specify the values of its variables,
- non-directional: typically one can infer a constraint on each variable present,
- declarative: constraints specify a relationship, not a procedure to enforce it,
- additive: the order of imposing constraints does not matter,
- rarely independent: typically they share variables.
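The non-directional property can be illustrated with a minimal sketch of bounds propagation for one precedence constraint, start_a + dur_a <= start_b. The names Interval and propagate_precedence and the example domains are assumptions made for illustration; they are not taken from the presentation or from the CHIP solver.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A finite-domain variable approximated by its bounds [lo, hi]."""
    lo: int
    hi: int


def propagate_precedence(start_a: Interval, dur_a: int, start_b: Interval) -> bool:
    """Enforce start_a + dur_a <= start_b on the bounds of both variables.

    Returns False if one of the domains becomes empty.
    """
    # Forward direction: b cannot start before a can finish at the earliest.
    start_b.lo = max(start_b.lo, start_a.lo + dur_a)
    # Backward direction: a must start early enough to finish by b's latest start.
    start_a.hi = min(start_a.hi, start_b.hi - dur_a)
    return start_a.lo <= start_a.hi and start_b.lo <= start_b.hi


# Hypothetical domains: TS1 in 0..10, TS3 in 0..7, duration of the first task 4.
ts1 = Interval(0, 10)
ts3 = Interval(0, 7)
ok = propagate_precedence(ts1, 4, ts3)
print(ok, ts1, ts3)  # True Interval(lo=0, hi=3) Interval(lo=4, hi=7)
```

The same constraint narrows both variables: the lower bound of the later task and the upper bound of the earlier one.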

11 A simple constraint problem
1. Specify all decision variables and their initial domains
Natural language description: There are three tasks, namely, T1, T2, and T3. Each of these tasks can execute on any of two available processors, P1 and P2. Tasks T1 and T2 send data to task T3. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds.
CLP description:
TP1, TP2, TP3 :: 1..2,
TS1, TS2, TS3 :: 0..10,
Cost :: 0..10,

12 A simple constraint problem
2. Specify all constraints and additional variables
Natural language description: The execution time of task T1 is four seconds on processor P1 and two seconds on processor P2. Task T2 requires three and five seconds to complete execution on processors P1 and P2, respectively. Task T3 always needs three seconds for execution.
CLP description:
If TP1 = 1 then TD1 = 4,
If TP1 = 2 then TD1 = 2,
If TP2 = 1 then TD2 = 3,
If TP2 = 2 then TD2 = 5,
TD3 = 3,

13 A simple constraint problem
Natural language description: Tasks T1 and T2 must execute on different processors. Tasks T1 and T2 send data to task T3. If two communicating tasks are executed on different processors, there must be a delay of at least one second between them so the data can be transferred. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds.
CLP description:
TP1 != TP2,
If TP1 != TP3 then D1 = 1 else D1 = 0,
TS1 + TD1 + D1 <= TS3,
[…],
Cost >= TS1 + TD1,
Cost >= TS2 + TD2,
Cost >= TS3 + TD3.
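As a cross-check of the three-task problem, the following sketch enumerates all processor assignments and start times in plain Python instead of using a CLP solver. It relies on two labeled assumptions: the constraint elided as […] is taken to be the T2-to-T3 counterpart of the T1-to-T3 precedence, and no processor-sharing constraint is modeled because none appears in the listed constraints. The function names are hypothetical.

```python
from itertools import product

HORIZON = 10  # schedule length limit from the problem statement


def duration(task: int, proc: int) -> int:
    """Execution times in seconds, as stated in the problem."""
    if task == 1:
        return 4 if proc == 1 else 2
    if task == 2:
        return 3 if proc == 1 else 5
    return 3  # T3 always needs three seconds


def solve():
    """Return (cost, processors, start times) of the cheapest feasible schedule."""
    best = None
    for tp1, tp2, tp3 in product((1, 2), repeat=3):
        if tp1 == tp2:  # TP1 != TP2
            continue
        td1, td2, td3 = duration(1, tp1), duration(2, tp2), duration(3, tp3)
        d1 = 1 if tp1 != tp3 else 0
        d2 = 1 if tp2 != tp3 else 0  # ASSUMPTION: mirrors D1 for the elided constraint
        for ts1, ts2, ts3 in product(range(HORIZON + 1), repeat=3):
            if ts1 + td1 + d1 > ts3:  # TS1 + TD1 + D1 <= TS3
                continue
            if ts2 + td2 + d2 > ts3:  # ASSUMPTION: TS2 + TD2 + D2 <= TS3
                continue
            # Cost is the smallest value satisfying all three Cost >= ... constraints.
            cost = max(ts1 + td1, ts2 + td2, ts3 + td3)
            if cost > HORIZON:
                continue
            if best is None or cost < best[0]:
                best = (cost, (tp1, tp2, tp3), (ts1, ts2, ts3))
    return best


best = solve()
print(best)  # (6, (2, 1, 1), (0, 0, 3))
```

Under these assumptions the shortest schedule has length 6: T1 runs on P2, T2 and T3 run on P1, and T3 starts at time 3. A constraint solver reaches the same result by pruning domains rather than enumerating them.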

14 Search Tree

15 Modeling
- Constraint Logic Programming (finite domain, CHIP solver)
- Global constraints (cumulative, diffn, sequence, etc.) reduce the model complexity of the synthesis problem and exploit specific features of the problem
- Global constraints are useful for modeling placement problems and graph problems
- Problem-specific search heuristic for the NP-hard problem

16 CLP Model
Decision variables for a task:
- TS: start time of the task execution
- TP: resource on which the task is executed
- TDP: exact placement of the task's local data in memory
Additional variables for a task:
- TD: task duration
- TCM and TDM: the amounts of code and data memory needed for task execution

17 CLP Model
Decision variables for data:
- DS: start time of the data communication
- DB: resource on which the data is communicated
- DCP and DPP: exact placement of the data in the memory of the producer and consumer processors
Additional variables for data:
- DD: data communication duration

18 CLP Model – Task Requirements
[Figure with three panels: a) execution time (axes: time, PU; labels: TS, TP, TD, 1); b) code memory (axes: PU, CM; labels: TP, 1, TCM); c) data memory (axes: time, DM; labels: TS, TDP, TD, TDM)]

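19 CLP Model – Data Requirements
[Figure with three panels: communication time (axes: time, CU; labels: DS, DB, DD, 1); data memory of the producer (axes: time, DM; labels: TS p, DPP, DA, DS+DD); data memory of the consumer (axes: time, DM; labels: DS, DCP, DA, TS c + TD c)]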

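20 Simple Example
[Figure: a task graph with tasks T1, T2, T3 and data D1, D2, shown next to its placement on P1, P2, and B1; the rectangles labeled T1, T2, T3, C1, D1_p, D1_c, D1_e, D2_p, D2_c, D2_e, and D3_e illustrate the diffn constraint]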
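The diffn constraint requires that a set of rectangles does not overlap. The sketch below only checks this condition for fixed two-dimensional rectangles; a real solver propagates it over variable domains. The Rect type, the function names, and the example values are assumptions for illustration. Here a task is a rectangle with origin (TS, TP), length TD on the time axis, and height 1 on the processor axis.

```python
from itertools import combinations
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    x: int   # origin on the first axis, e.g. start time TS
    y: int   # origin on the second axis, e.g. processor TP or memory address TDP
    dx: int  # length on the first axis, e.g. duration TD
    dy: int  # length on the second axis, e.g. 1 processor or TDM memory units


def overlap(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area (touching edges are fine)."""
    return (a.x < b.x + b.dx and b.x < a.x + a.dx and
            a.y < b.y + b.dy and b.y < a.y + a.dy)


def diffn_holds(rects: Iterable[Rect]) -> bool:
    """Check the diffn condition: no pair of rectangles overlaps."""
    return not any(overlap(a, b) for a, b in combinations(list(rects), 2))


# Hypothetical schedule: T1 on P2 at time 0, T2 on P1 at time 0, T3 on P1 at time 3.
schedule = [Rect(0, 2, 2, 1), Rect(0, 1, 3, 1), Rect(3, 1, 3, 1)]
print(diffn_holds(schedule))  # True

# Starting T3 one second earlier makes it overlap T2 on P1.
clash = [Rect(0, 2, 2, 1), Rect(0, 1, 3, 1), Rect(2, 1, 3, 1)]
print(diffn_holds(clash))  # False
```

The same check applies to data memory if the second axis is the memory address and the height is the amount of memory used.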

21 Code Memory Constraint
[Figure: the code memory of tasks T1 to T8 packed under the processor code memory limit]
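The code memory constraint can be read as a capacity check: the code of all tasks assigned to a processor must fit within that processor's code memory limit. The following sketch checks a fixed assignment; the function name, the task names, and all sizes (in kB) are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def code_memory_ok(assignment: Dict[str, str],
                   code_mem: Dict[str, int],
                   limit: Dict[str, int]) -> bool:
    """True if every processor's assigned tasks fit within its code memory limit."""
    used = defaultdict(int)
    for task, proc in assignment.items():
        used[proc] += code_mem[task]  # code stays resident, so sizes simply add up
    return all(used[proc] <= limit[proc] for proc in used)


# Hypothetical data: three tasks, two processors, sizes in kB.
assignment = {"T1": "P1", "T2": "P1", "T3": "P2"}
code_mem = {"T1": 1, "T2": 2, "T3": 4}

print(code_memory_ok(assignment, code_mem, {"P1": 4, "P2": 4}))  # True
print(code_memory_ok(assignment, code_mem, {"P1": 4, "P2": 3}))  # False
```

Unlike data memory, this check has no time dimension, because the sketch assumes that task code occupies memory for the whole schedule.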

22 Constraints types
- precedence constraints
- processing resource constraints
- communication resource constraints
- pipelining constraints
- code memory constraints
- data memory constraints

23 Task Assignment and Scheduling Heuristic
[Flowchart; the arrows between the boxes did not survive extraction. Box labels:]
- Choose a task from the ready task set with min(max(T i )), to minimize the schedule length
- Assign the task to a processor with the minimal implementation cost c i
- Schedule communications so that T i is minimal
- Data memory estimate no. 1 holds? (Y/N)
- Assign data memory
- Data memory estimate no. 2 holds? (Y/N)
- Undo all decisions and choose a task which consumes the most data
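The general shape of such a heuristic can be sketched as a greedy list scheduler. This is a simplified stand-in, not the heuristic of the presentation: it has no data memory estimates and no undo step, and it selects by earliest finish time instead of the cost functions c i. All names and input values are hypothetical; the durations reuse the three-task example.

```python
def list_schedule(durations, preds, code_mem, code_limit, comm_delay=1):
    """Greedy list scheduling sketch.

    durations[task][proc] is the execution time, preds[task] lists predecessors.
    Returns a dict task -> (processor, start, end).
    """
    proc_free = {p: 0 for p in code_limit}  # time at which each processor is idle
    code_used = {p: 0 for p in code_limit}  # code memory already occupied
    placed = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = sorted(t for t in remaining
                       if all(q in placed for q in preds.get(t, [])))
        if not ready:
            raise ValueError("precedence graph contains a cycle")
        best = None
        for t in ready:
            for p in proc_free:
                if code_used[p] + code_mem[t] > code_limit[p]:
                    continue  # code memory constraint would be violated
                start = proc_free[p]
                for q in preds.get(t, []):
                    q_proc, _, q_end = placed[q]
                    # Data from another processor arrives after a fixed delay.
                    start = max(start, q_end + (comm_delay if q_proc != p else 0))
                end = start + durations[t][p]
                if best is None or end < best[0]:
                    best = (end, t, p, start)
        if best is None:
            raise ValueError("no processor has enough code memory left")
        end, t, p, start = best
        placed[t] = (p, start, end)
        proc_free[p] = end
        code_used[p] += code_mem[t]
        remaining.remove(t)
    return placed


durations = {"T1": {"P1": 4, "P2": 2},
             "T2": {"P1": 3, "P2": 5},
             "T3": {"P1": 3, "P2": 3}}
preds = {"T3": ["T1", "T2"]}
code_mem = {"T1": 1, "T2": 1, "T3": 1}
plan = list_schedule(durations, preds, code_mem, {"P1": 4, "P2": 4})
print(plan["T3"])  # ('P1', 3, 6)
```

For this input the greedy choice places T1 on P2 and then T2 and T3 on P1, giving a schedule length of 6. A greedy scheduler gives no optimality guarantee in general, which is one reason a heuristic may need an undo step.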

24 Execution Cost
Ind = LowTS/PTS – LowCM/PCM
ATS = available time slots, ACM = available code memory
Indices: i-th task, n-th processor

25 Data and Communication Cost i-th task, n-th processor

26 Estimates
Estimate no. 1: [formula not preserved in the transcript], where S (S n ) is the set of tasks already scheduled on a processor (processor P n ), tasks t j are direct successors of task t i , and d ij is the amount of data communicated between t i and t j .
Estimate no. 2 uses the global constraint diffn and takes time into account.

27 MATAS System

28 Synthesis Results - H.261 example
[Figure: block diagram of the H.261 video coding algorithm with blocks DCT, FB1, IN, BMA, FIR, PRAE, Q, IQ, IDCT, REK, FB2, and C]

29 Experimental results H.261 example

30 Experimental results (random task graphs)

31 Main Contributions
- Definition of the extended task assignment and scheduling problem
- Inclusion of memory constraints to decrease the cost for data-dominated applications
- Specialized search heuristic to solve resource-constrained task assignment and scheduling
- CLP modeling framework to facilitate an efficient, clean, and readable problem definition

32 Conclusions and Future Work
Conclusions:
- The synthesis problem is modeled as a constraint satisfaction problem and solved by the proposed heuristic
- Good coupling between the model and the search method gives efficient search space pruning
- Memory constraints and pipelined designs are taken into account
- Heterogeneous constraints can be modeled in CLP, an important advantage over other approaches
Future work:
- Need for our own constraint engine implementation, approximate solutions, and a mixture of techniques
- Need for better lower bounds, problem-specific global constraints, and designer interaction during search


34 Related Work
- J. Madsen, P. Bjorn-Jorgensen, “Embedded System Synthesis under Memory Constraints”, CODES ‘99 (GA, only RAM)
- S. Prakash and A. Parker, “Synthesis of Application-Specific Heterogeneous Multiprocessor Systems”, VLSI Signal Processing, ‘94 (MILP, no ASICs, optimal)

35 A simple constraint problem
There are three tasks, namely, T1, T2, and T3. Each of these tasks can execute on any of two available processors, P1 and P2. Tasks T1 and T2 send data to task T3. Tasks T1 and T2 must execute on different processors for fault-tolerance reasons. The execution time of task T1 is four seconds on processor P1 and two seconds on processor P2. Task T2 requires three and five seconds to complete execution on processors P1 and P2, respectively. Task T3 always needs three seconds for execution. When two communicating tasks are executed on different processors, there must be a one-second delay between them so the data can be transferred. The tasks should be assigned and scheduled in such a way that the schedule length does not exceed 10 seconds.