1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca.

1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca Baldini Moreno Coli ENEA – Computing and Modeling Unit University “La Sapienza” – Electronic Engineering Dep’t

2 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions


4 Problem Statement We want to synthesize a synchronous pipelined system that executes both the task $P_{Sy}$, sustaining its throughput, and $m$ mutually exclusive tasks $P_{As_1}, P_{As_2}, \dots, P_{As_m}$ whose activations are randomly triggered and whose results must be produced within a predefined time.

5 Problem Statement We represent the tasks as Control Data Flow Graphs (CDFG) $G = (N, E)$, where $N = \{n_1, n_2, \dots, n_N\}$ is the set of operations of the task and $E$ is the set of data and control dependencies.

6 Problem Statement Aperiodic tasks are characterized by random execution requests; we call them asynchronous to mark the difference from the synchronous nature of periodic tasks. They are subject to Real-Time constraints (RTC), collected in the set $RTC_{As} = \{RTC_{As_1}, RTC_{As_2}, \dots, RTC_{As_m}\}$, where $RTC_{As_i}$ contains the RTC on the $i$-th aperiodic task. Input data for the synchronous task $P_{Sy}$ arrive with frequency $f_i = 1/\Delta t$, where $\Delta t$ is the period characterizing $P_{Sy}$.

7 Problem Statement We present a method to determine:
- The target architecture: a (nearly) minimum set of HW devices to execute all the tasks (synchronous and asynchronous).
- The feasible mapping onto the architecture: the allocation and the scheduling on the HW resources such that $P_{Sy}$ is executed sustaining its required throughput and all the mutually exclusive asynchronous tasks $P_{As_1}, P_{As_2}, \dots, P_{As_m}$ satisfy the constraints in $RTC_{As}$.

8 Problem Statement The adoption of a parallel system can be mandatory when Real-Time constraints are computationally demanding. The iterative arrival of input data makes pipelined systems a very suitable solution for the problem.

9 Problem Statement Example of a pipeline serving the synchronous task P Sy

10 Problem Statement Time step $S_k$ starts at $(k-1)T_{ck}$ and ends at $kT_{ck}$. In a pipeline with $L$ stages, $S_L$ denotes the last stage. $DII = \Delta t / T_{ck}$.

11 Problem Statement We assume the absence of synchronization delays due to control or data dependencies: the throughput of the pipelined system is $1/DII$.
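As a small illustration of the relation between the input period, the clock period and DII, the following Python sketch computes DII and the resulting throughput. The function name is hypothetical, the values 150 ut and 50 ut are the ones quoted later on the Results slides, and the requirement that the period be a whole multiple of the clock period is an assumption made here for simplicity. Throughput is read as "one result every DII clock cycles".

```python
from fractions import Fraction

def data_introduction_interval(delta_t, t_ck):
    """Return DII = delta_t / t_ck for integer inputs.

    Assumption of this sketch: delta_t must be a whole multiple of t_ck.
    """
    ratio = Fraction(delta_t, t_ck)
    if ratio.denominator != 1:
        raise ValueError("delta_t must be an integer multiple of t_ck")
    return int(ratio)

# Values from the Results slides: delta_t = 150 ut, T_ck = 50 ut.
dii = data_introduction_interval(150, 50)

# Throughput in results per clock cycle, i.e. 1/DII.
throughput = Fraction(1, dii)
```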

12 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

13 Asynchronous events We assume the asynchronous tasks to be mutually exclusive, i.e., at most one asynchronous task can be requested between two successive activations of the periodic task.

14 Asynchronous events In red the asynchronous service requests in a pipelined system.

15 Asynchronous events Like the synchronous events, we represent the asynchronous events $\{P_{As_1}, P_{As_2}, \dots, P_{As_m}\}$ through a set of CDFGs $ASG = \{AsG_1(N_{As_1}, E_{As_1}), \dots, AsG_m(N_{As_m}, E_{As_m})\}$.

16 Asynchronous events We consider a unique CDFG made up by composing the graph of the periodic task with the $m$ graphs of the aperiodic tasks: $G(N, E) = SyG(N_{Sy}, E_{Sy}) \cup AsG_1(N_{As_1}, E_{As_1}) \cup AsG_2(N_{As_2}, E_{As_2}) \cup \dots \cup AsG_m(N_{As_m}, E_{As_m})$.

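17 Asynchronous events Aperiodic tasks are subject to Real-Time constraints (RTC). As all RTC must be respected, the mapping function $M$ has to define a scheduling such that, for every $RTC_{As_i} \in RTC_{As}$, the response time of the $i$-th aperiodic task does not exceed its deadline $\delta_i$.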

18 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

19 Mapping methodology In order to develop a pipelined system implementing $G$, a HW resource $r_j = D(n_j)$ and a time step $S_k$ must be associated with each $n_j \in N$.

20 Mapping methodology We must determine the mapping function $M: N \rightarrow UR \times S$, where $UR$ is the set of the used HW resources (each $r_j$ is replicated $k_j$ times).

21 Mapping methodology $r_j = D(n_i)$ is the HW resource on which $n_i$ will be executed; $S(n_i)$ is the stage of the pipeline, or the time step, in which $n_i$ will be executed.

22 Mapping methodology We search for the mapping function $M'$ which, for a given DII:
- respects all the RTC;
- uses a minimum number $ur$ of resources;
- gives the minimum pipeline length for the periodic task.

23 Mapping methodology The mapping is determined by solving a minimization problem over the cost $C(M)$, in which:
- $C_1(M)$ is responsible for the fulfillment of all the RTC;
- $C_2(M)$ minimizes the used silicon area;
- $C_3(M)$ minimizes the length of the pipeline.

24 Mapping methodology While searching for a mapping of $G$, we force the response to aperiodic tasks to be synchronous with the periodic task. The execution of an aperiodic task, requested at a generic time instant $t_0$, is delayed until the next start of the pipeline of the periodic task.

25 Mapping methodology

26 Mapping methodology

27 Mapping methodology In a pipelined system with $DII = 1$:
- the used resource set is maximum;
- the execution time of each $AsG_i$ on the pipeline is minimum.
A lower bound for the execution time of $AsG_i$ is given by the lowest execution time of the longest path of $AsG_i$; $Lp_{As_i}$ is this lower bound, expressed in number of clock cycles.
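The longest-path lower bound can be illustrated with a short Python sketch. It assumes that each node already carries its smallest possible duration in clock cycles and that the graph is acyclic; the function name, the dictionaries and the four-node example graph are hypothetical and not taken from the presentation.

```python
from graphlib import TopologicalSorter

def longest_path_cycles(cycles, preds):
    """Length, in clock cycles, of the longest path of an acyclic graph.

    cycles: node -> duration in clock cycles (assumed: fastest resource).
    preds:  node -> iterable of predecessor nodes (missing key = no predecessors).
    """
    graph = {node: preds.get(node, ()) for node in cycles}
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in preds.get(node, ())), default=0)
        finish[node] = start + cycles[node]
    return max(finish.values(), default=0)

# Hypothetical graph: a and b feed c, which feeds d.
example_cycles = {"a": 1, "b": 2, "c": 1, "d": 1}
example_preds = {"c": ["a", "b"], "d": ["c"]}
lp = longest_path_cycles(example_cycles, example_preds)
```

In this toy graph the critical path is b, c, d, so the bound is 4 clock cycles.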

28 Mapping methodology Maximum value allowed for DII, compatible with all the $RTC_{As_i} \in RTC_{As}$: $Lp_{As_i} \cdot T_{ck}$ gives the minimal execution time for $AsG_i$. The deadline associated with $AsG_i$ is $\delta_i$.

29 Mapping methodology Maximum value allowed for DII, compatible with all the $RTC_{As_i} \in RTC_{As}$ (continued): In the worst case, the request for the aperiodic task is sensed immediately after a pipeline start; the aperiodic task then begins execution $DII \cdot T_{ck}$ seconds after the request, at the next start of the pipeline.

30 Mapping methodology Maximum value allowed for DII, compatible with all the $RTC_{As_i} \in RTC_{As}$ (continued): A necessary condition to meet all the $RTC_{As_i} \in RTC_{As}$ is that the lower bound of the execution time of each asynchronous task must not exceed the associated deadline diminished by the DII, i.e. $\delta_i \geq DII \cdot T_{ck} + Lp_{As_i} \cdot T_{ck}, \; \forall i = 1, 2, \dots, m$.
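The necessary condition can be checked numerically. The sketch below, in Python, tests a given DII against the condition "deadline at least (DII + Lp) clock periods" and derives the largest integer DII that still satisfies it. The deadlines and the clock period are the values from the Results slides; the longest-path values `lp_cycles` are hypothetical, since the presentation text does not report them, and the function names are illustrative.

```python
def satisfies_necessary_condition(dii, deadlines, lp_cycles, t_ck):
    """True if deadline_i >= (dii + Lp_i) * t_ck for every aperiodic task."""
    return all(d >= (dii + lp) * t_ck for d, lp in zip(deadlines, lp_cycles))

def max_admissible_dii(deadlines, lp_cycles, t_ck):
    """Largest integer DII meeting the necessary condition for all tasks.

    A result below 1 means that no DII satisfies the condition.
    """
    return min(d // t_ck - lp for d, lp in zip(deadlines, lp_cycles))

deadlines = [300, 250, 350]   # ut, from the Results slides
t_ck = 50                     # ut, from the Results slides
lp_cycles = [2, 2, 3]         # hypothetical longest paths, in clock cycles

dii_max = max_admissible_dii(deadlines, lp_cycles, t_ck)
```

With these assumed path lengths the second task is the binding one: 250 ut leaves room for at most 3 waiting cycles on top of its 2 execution cycles.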

31 Mapping methodology Combining the previous relations with a congruence condition between the period of the synchronous task ($\Delta t$) and the clock period ($T_{ck}$), we obtain the set $DII_p$, which contains all the admissible DII values.

32 Mapping methodology Steps of the mapping methodology:
- A set of allowed values of DII is determined.
- A sufficient HW resource set $UR_0$ is determined. At the end of the optimization process the number of used resources $ur$ could be less than $ur_0$ if mutually exclusive nodes are contained in the graph.

33 Mapping methodology Steps of the mapping methodology (continued):
- An initial feasible mapping $M_0$ is determined; $S_{L_0}$ is the last time step needed to execute $P$ by using $M_0$.
- Starting from $M_0$, we use the Simulated Annealing algorithm to solve the minimization problem.

34 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

35 Searching space In order to represent a mapping function $M$ we adopt the formalism based on the allocation tables $t(M)$.
- $t(M)$ is a table with $ur$ horizontal lines and $DII$ vertical sectors $Os_i$, with $i = 1, 2, \dots, DII$.
- Each $Os_i$ contains the time steps $S_{i + k \cdot DII}$ ($k = 0, 1, 2, \dots$), which will be overlapped during the execution of $P$.
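A possible way to picture the allocation table is a dictionary keyed by (resource, sector). The Python sketch below folds each time step into its sector using the rule that sector i collects the steps i, i + DII, i + 2 DII, and so on. The function names, the resource labels and the three-node mapping are hypothetical.

```python
def sector_of(step, dii):
    """Sector index (1-based) of time step S_step: S_{i + k*DII} belongs to Os_i."""
    return (step - 1) % dii + 1

def build_allocation_table(mapping, dii):
    """Group nodes by table cell.

    mapping: node -> (resource, step), with 1-based time steps.
    Returns {(resource, sector): [nodes assigned to that cell]}.
    """
    table = {}
    for node, (resource, step) in mapping.items():
        table.setdefault((resource, sector_of(step, dii)), []).append(node)
    return table

# Hypothetical mapping for DII = 3: steps 1 and 4 fold onto sector 1.
example_mapping = {"n1": ("A1", 1), "n2": ("B1", 2), "n3": ("A2", 4)}
example_table = build_allocation_table(example_mapping, dii=3)
```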

36 Searching space Each node is assigned to a cell of $t(M)$, i.e. it is associated with a HW resource and with a time step. For example, we consider the 23-node graph $AsG_1$.

37 Searching space

38 Searching space For DII=3, a possible mapping M is described through the following t(M )

39 Searching space An allocation table $t(M)$ must respect both:
1. the causality condition;
2. the overlapping condition.
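The transcript names the two conditions without defining them, so the following Python sketch rests on an assumed reading: causality is taken to mean that every node is scheduled strictly after all of its predecessors, and overlapping is taken to mean that no two nodes occupy the same (resource, sector) cell. Cell sharing between mutually exclusive nodes, which the methodology allows, is deliberately not modelled. All names and the example mapping are hypothetical.

```python
def is_feasible(mapping, preds, dii):
    """Check an allocation under the two ASSUMED conditions described above.

    mapping: node -> (resource, step), with 1-based time steps.
    preds:   node -> iterable of predecessor nodes.
    """
    # Assumed causality condition: a node runs strictly after its predecessors.
    for node, (_, step) in mapping.items():
        for pred in preds.get(node, ()):
            if mapping[pred][1] >= step:
                return False
    # Assumed overlapping condition: at most one node per (resource, sector) cell.
    seen = set()
    for resource, step in mapping.values():
        cell = (resource, (step - 1) % dii + 1)
        if cell in seen:
            return False
        seen.add(cell)
    return True

chain = {"b": ["a"], "c": ["b"]}
ok = is_feasible({"a": ("A1", 1), "b": ("B1", 2), "c": ("A1", 3)}, chain, dii=3)
# Step 4 folds onto sector 1, where resource A1 is already busy with node a.
clash = is_feasible({"a": ("A1", 1), "b": ("B1", 2), "c": ("A1", 4)}, chain, dii=3)
```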

40 Searching space We define the searching space $\Omega$ over which the minimization of $C(M)$ must be performed. $\Omega$ is the space containing all the feasible allocation tables: $\Omega = \{t(M) \mid t(M) \text{ is a feasible mapping}\}$; an allocation table that is not feasible lies outside $\Omega$.

41 Searching space We can write the minimization problem in terms of the cost associated to the mapping M represented by the allocation table:

42 Searching space We solve the problem by using a Simulated Annealing (SA) algorithm SA requires the generation of a sequence of points belonging to the searching space; each point of the sequence must be close, according to a given measure criterion, to its predecessor and to its successor.

43 Searching space As $\Omega$ consists of allocation tables, we have to generate a sequence of allocation tables $t(M_i) \in Neigh[t(M_{i-1})]$, where $Neigh[t(M)]$ is the set of the allocation tables adjacent to $t(M)$ according to some adjacency criteria.
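The idea of walking through a sequence of neighbouring points can be sketched with a textbook simulated annealing loop in Python. This is a generic version, not the RT-PSA algorithm of the presentation: the geometric cooling schedule, the parameter values and the function names are assumptions, and the toy problem on integers merely stands in for allocation tables and their adjacency criteria.

```python
import math
import random

def simulated_annealing(initial, cost, neighbours, t_start=10.0, t_end=1e-3,
                        alpha=0.95, moves_per_temp=50, seed=0):
    """Generic simulated annealing; returns (best_point, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidates = neighbours(current)
            if not candidates:
                break
            candidate = rng.choice(candidates)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Always accept improvements; accept worse points with
            # probability exp(-delta / temp).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling (an assumption of this sketch)
    return best, best_cost

# Toy stand-in: points are integers, neighbours differ by one.
toy_cost = lambda x: (x - 7) ** 2
toy_neighbours = lambda x: [x - 1, x + 1]
best_point, best_value = simulated_annealing(0, toy_cost, toy_neighbours)
```

Each visited point is a neighbour of its predecessor, which is why a connected searching space matters: every feasible table can then be reached from the initial one.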

44 Searching space Searching space connection: Theorem 2. The searching space $\Omega$ is connected under the adopted adjacency conditions.

45 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

46 Optimization by RT-PSA Algorithm We start from a feasible allocation table $t(M_0) \in \Omega$. We rely on the optimization algorithm to find the desired mapping $M$.

47 Optimization by RT-PSA Algorithm We iterate over all the allowed values of DII. The final result of the whole optimization process is the allocation table with minimal cost.
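The outer loop over the allowed DII values might look like the Python sketch below. The `optimise` callback stands for one optimization run at a fixed DII and is assumed to return a pair (allocation table, cost); here it is replaced by a stub with made-up costs, so neither the names nor the numbers come from the presentation.

```python
def best_over_dii(dii_values, optimise):
    """Run optimise(dii) for each allowed DII and keep the cheapest result.

    optimise: callable returning (allocation_table, cost) for a given DII.
    Returns (best_dii, (allocation_table, cost)).
    """
    results = {dii: optimise(dii) for dii in dii_values}
    best_dii = min(results, key=lambda dii: results[dii][1])
    return best_dii, results[best_dii]

# Stub optimizer with made-up costs, standing in for a real annealing run.
made_up_costs = {1: 40.0, 3: 25.0}
stub_optimise = lambda dii: (f"table for DII={dii}", made_up_costs[dii])

chosen_dii, (chosen_table, chosen_cost) = best_over_dii([1, 3], stub_optimise)
```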

48 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

49 Results In order to illustrate the results achievable through the presented RT-PSA algorithm, we consider the following graphs.

50 Results

51 Results

52 Results

53 Results

54 Results We have $N = N_{Sy} + N_{As_1} + N_{As_2} + N_{As_3} = 126$, with $r_1 = A$, $r_2 = B$, $r_3 = C$, $r_4 = E$. The execution times and resource areas are:
- $T(A) = 10$ ut, $Ar(A) = 10$ us
- $T(B) = 20$ ut, $Ar(B) = 10$ us
- $T(C) = 30$ ut, $Ar(C) = 13$ us
- $T(E) = 40$ ut, $Ar(E) = 15$ us
- $T_r = 5$ ut, $Ar(mr) = 1$ us

55 Results The input data interarrival period is $\Delta t = 150$ ut. We fix the pipeline clock cycle $T_{ck} = 50$ ut. The RTC are $RTC_{As_1} = 300$ ut, $RTC_{As_2} = 250$ ut, $RTC_{As_3} = 350$ ut. The set of possible DII values is $DII_p = \{1, 3\}$.

56 Results Results for DII = 3

57 Results Results for DII = 1

58 Outline of Presentation Problem statement Asynchronous events Mapping methodology Searching space Optimization by RT-PSA Algorithm Results Conclusions

59 Conclusions We presented an algorithm to optimize the mapping, onto a dedicated pipelined system, of a periodic task $P_{Sy}$ and $m$ mutually exclusive aperiodic tasks $P_{As_1}, P_{As_2}, \dots, P_{As_m}$ subject to real-time (RT) constraints.

60 Conclusions The algorithm, while searching for a mapping that satisfies all RT constraints of the aperiodic tasks, tries to minimize the number of HW resources needed to implement the system as well as the length of the schedule. The mapping optimization is formulated as a minimization problem, which is solved through the Simulated Annealing algorithm.

61 Conclusions Mappings are represented through allocation tables. The searching space, the adjacency criteria on it, and a cost function evaluating the quality of a mapping have been defined. We demonstrated that the searching space containing all the feasible mappings is connected.

62 Remarks