1 Discrete Event u Explicit notion of time (global order…) u DE simulator maintains a global event queue (Verilog and VHDL) u Drawbacks s global event queue => tight coordination between parts s Simultaneous events => non-deterministic behavior u Some simulators use delta delay to prevent non-determinacy

2 Simultaneous Events in DE u Consider a chain A -> B -> C with an event at time t: fire B or C first? u If B has 0 delay, does C fire once or twice? u If B has delta delay, C fires twice (at t and t+delta) u Still have a problem with 0-delay (causality) loops u Can be refined, e.g. introduce timing constraints (minimum reaction time 0.1 s)
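The delta-delay mechanism above can be sketched with a toy event queue ordered by (time, delta) pairs. The simulator kernel and the A -> B -> C example below are illustrative assumptions, not actual Verilog/VHDL internals:

```python
import heapq

# Minimal discrete-event kernel sketch: events are ordered by
# (time, delta), so a zero-time reaction is scheduled at the same
# time t but one delta step later, giving a deterministic order.
class DESimulator:
    def __init__(self):
        self.queue = []  # heap of (time, delta, seq, callback)
        self.seq = 0     # tie-breaker so heap comparisons never reach callbacks

    def schedule(self, time, delta, callback):
        heapq.heappush(self.queue, (time, delta, self.seq, callback))
        self.seq += 1

    def run(self):
        while self.queue:
            time, delta, _, callback = heapq.heappop(self.queue)
            callback(time, delta)

fired = []
sim = DESimulator()
# A emits at t=1; B reacts with delta delay, so C sees B's output one
# delta later and fires twice in total (once per input event).
def fire_C(t, d): fired.append(("C", t, d))
def fire_B(t, d):
    fired.append(("B", t, d))
    sim.schedule(t, d + 1, fire_C)     # B's zero-time output, next delta step
def fire_A(t, d):
    fired.append(("A", t, d))
    sim.schedule(t, d + 1, fire_B)     # A drives B
    sim.schedule(t, d + 1, fire_C)     # A also drives C directly

sim.schedule(1, 0, fire_A)
sim.run()
# C fires at (t=1, delta=1) from A and again at (t=1, delta=2) via B
```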

3 Outline u Synchrony and asynchrony u CFSM definitions s Signals & networks s Timing behavior s Functional behavior u CFSM & process networks u Examples of CFSM behaviors s Equivalence classes

4 Codesign Finite State Machine u Underlying MOC of Polis u Combine aspects from several other MOCs u Preserve formality and efficiency in implementation u Mix s synchronicity t zero and infinite time s asynchronicity t non-zero, finite, and bounded time u Embedded systems often contain both aspects

5 Synchrony: Basic Operation u Synchrony is often implemented with clocks u At clock ticks s Module reads inputs, computes, and produces outputs s All synchronous events happen simultaneously s Zero-delay computations u Between clock ticks s An infinite amount of time passes

6 Synchrony: Basic Operation (2) u Practical implementation of synchrony s Impossible to get zero or infinite delay s Require: computation time <<< clock period s Computation time = 0, w.r.t. reaction time of environment u Features of synchrony s Functional behavior independent of timing t Simplifies verification s Cyclic dependencies may cause problems t Among (simultaneous) synchronous events

7 Synchrony: Triggering and Ordering u All modules are triggered at each clock tick u Simultaneous signals s No a priori ordering s Ordering may be imposed by dependencies t Implemented with delta steps [figure: continuous-time clock ticks expanded into delta-step computations]

8 Synchrony: System Solution u System solution s Output reaction to a set of inputs u Well-designed system: s Is completely specified and functional s Has a unique solution at each clock tick s Is equivalent to a single FSM s Allows efficient analysis and verification u Well-designedness s May need to be checked for each design (Esterel) t Cyclic dependency among simultaneous events

9 Synchrony: Implementation Cost u Must verify synchronous assumption on final design s May be expensive u Examples: s Hardware t Clock cycle > maximum computation time u Inefficient for average case s Software t Process must finish computation before u New input arrival u Another process needs to start computation

10 Asynchrony: Basic Operation u Events are never simultaneous s No two events have the same tag u Computation starts at a change of the input u Delays are arbitrary, but bounded

11 Asynchrony: Triggering and Ordering u Each module is triggered to run at a change of input u No a priori ordering among triggered modules s May be imposed by scheduling at implementation

12 Asynchrony: System Solution u Solution strongly dependent on input timing u At implementation s Events may “appear” simultaneous s Difficult/expensive to maintain total ordering t Ordering at implementation decides behavior t Becomes DE, with the same pitfalls

13 Asynchrony: Implementation Cost u Achieves low (average) computation time s Different parts of the system compute at different rates u Analysis is difficult s Behavior depends on timing s May be easier for designs that are insensitive to t Internal delay t External timing

14 Asynchrony vs. Synchrony in System Design u They differ at least in s Event buffering s Timing of event read/write u Asynchrony s Explicit buffering of events for each module t Varying and unknown at start time u Synchrony s One global copy of each event t Same start time for all modules

15 Combining Synchrony and Asynchrony u Want to combine s Flexibility of asynchrony s Verifiability of synchrony u Asynchrony s Globally, a timing-independent style of thinking u Synchrony s Local portions of the design are often tightly synchronized u Globally asynchronous, locally synchronous s CFSM networks

16 CFSM Overview u CFSM is FSM extended with s Support for data handling s Asynchronous communication u CFSM has s FSM part t Inputs, outputs, states, transition and output relation s Data computation part t External, instantaneous functions

17 CFSM Overview (2) u CFSM has: s Locally synchronous behavior t CFSM executes based on snap-shot input assignment t Synchronous from its own perspective s Globally asynchronous behavior t CFSM executes in non-zero, finite amount of time t Asynchronous from system perspective u GALS model s Globally: Scheduling mechanism s Locally: CFSMs

18 12/09/1999 Network of CFSMs: Depth-1 Buffers [figure: network of CFSMs connected by depth-1 buffers over signals A, B, C, F, G, with transitions such as C=>G, C=>F, B=>C, F^(G==1), (A==0)=>B, C=>A, C=>B] u Globally Asynchronous, Locally Synchronous (GALS) model

19 Introducing a CFSM u A Finite State Machine u Input events, output events and state events u Initial values (for state events) u A transition function s Transitions may involve complex, memory-less, instantaneous arithmetic and/or Boolean functions s All the state of the system is in the form of events u Need rules that define the CFSM behavior

20 CFSM Rules: phases u Four-phase cycle: 1) Idle 2) Detect input events 3) Execute one transition 4) Emit output events u Discrete time s Sufficiently accurate for synchronous systems s Makes formal verification feasible u Model semantics: timed traces, i.e. sequences of events labeled by time of occurrence

21 CFSM Rules: phases u Implicit unbounded delay between phases u Non-zero reaction time (avoid inconsistencies when interconnected) u Causal model based on partial order (global asynchronicity) s potential verification speed-up u Phases may not overlap u Transitions always clear input buffers (local synchronicity)
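The four-phase cycle and the buffer-clearing rule above can be sketched as a toy interpreter. The transition-table encoding and the two-state example machine are hypothetical illustrations, not the Polis implementation:

```python
# Toy sketch of one CFSM executing its four-phase cycle
# (idle -> detect -> execute one transition -> emit). The depth-1
# buffers cleared on every transition follow the slides; the
# transition-table encoding is an invented example.
class CFSM:
    def __init__(self, transitions, state):
        # transitions: {(state, frozenset(events)): (next_state, outputs)}
        self.transitions = transitions
        self.state = state
        self.buffers = {}               # depth-1 input buffers: signal -> present?

    def receive(self, signal):
        self.buffers[signal] = True     # a later event overwrites an earlier one

    def step(self):
        # Phase 2: atomically capture a snapshot of the present input events
        snapshot = frozenset(s for s, present in self.buffers.items() if present)
        key = (self.state, snapshot)
        # Phase 3: execute one matching transition (else: empty transition)
        if key not in self.transitions:
            return []                   # empty transition: consume nothing
        self.buffers.clear()            # transitions always clear input buffers
        self.state, outputs = self.transitions[key]
        # Phase 4: emit output events
        return list(outputs)

# Hypothetical machine: in state "idle", event a moves to "busy" and
# emits b; in "busy" no transition matches, so steps are empty.
m = CFSM({("idle", frozenset({"a"})): ("busy", ["b"])}, "idle")
m.receive("a")
out1 = m.step()   # matching transition fires and emits b
out2 = m.step()   # nothing buffered: empty transition
```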

22 Communication Primitives u Signals s Carry information in the form of events and/or values t Event signals: present/absent t Data signals: arbitrary values u Event and data may be paired s Communicate between two CFSMs t 1 input buffer / signal / receiver s Emitted by a sender CFSM s Consumed by a receiver CFSM by setting the buffer to 0 s “Present” if emitted but not consumed

23 Communication Primitives (2) u Input assignment s A set of values for the input signals of a CFSM u Captured input assignment s A set of input values read by a CFSM at a particular time u Input stimulus s Input assignment with at least one event present

24 Signals and CFSM u CFSM s Initiates communication through events s Reacts only to input stimulus t except initial reaction s Writes data first, then emits associated event s Reads event first, then reads associated data

25 CFSM networks u Net s A set of connections on the same signal s Associated with single sender and multiple receivers s An input buffer for each receiver on a net t Multi-cast communication u Network of CFSMs s A set of CFSMs, nets, and a scheduling mechanism s Can be implemented as t A set of CFSMs in SW (program/compiler/OS/uC) t A set of CFSMs in HW (HDL/gate/clocking) t Interface (polling/interrupt/memory-mapped)

26 Scheduling Mechanism u At the specification level s Should be as abstract as possible to allow optimization s Not fixed in any way by CFSM MOC u May be implemented as s RTOS for single processor s Concurrent execution for HW s Set of RTOSs for multi-processor s Set of scheduling FSMs for HW

27 Timing Behavior u Scheduling Mechanism s Globally controls the interaction of CFSMs s Continually deciding which CFSMs can be executed u CFSM can be s Idle t Waiting for input events t Waiting to be executed by scheduler s Executing t Generate a single reaction t Reads its inputs, computes, writes outputs

28 Timing Behavior: Mathematical Model u Transition Point s Point in time a CFSM starts executing u For each execution s Input signals are read and cleared s Partial order between input and output s Event is read before data s Data is written before event emission

29 Timing Behavior: Transition Point u A transition point t_i s Input may be read between t_i and t_{i+1} s An event that is read may have occurred between t_{i-1} and t_{i+1} s Data that is read may have occurred between t_0 and t_{i+1} s Outputs are written between t_i and t_{i+1} u CFSM allows loose synchronization of event & data s Less restrictive implementation s May lead to non-intuitive behavior

30 Event/Data Separation [figure: sender S writes v1, emits the event, then writes v2 and emits again; receiver R reads the event at one transition point but reads the value only after v2 has been written] u Value v1 is lost even though s It is sent with an event s The event itself may not be lost u Need atomicity

31 Atomicity u Group of actions considered as a single entity u May be costly to implement u Only atomicity requirement of CFSM s Input events are read atomically t Can be enforced in SW (bit vector) or HW (buffer) t CFSM is guaranteed to see a snapshot of input events u Non-atomicity of event and data s May lead to undesirable behavior s May be made atomic as an implementation trade-off decision

32 Non-Atomic Data Value Reading u Sender S writes X:=4, Y:=4, then X:=5, Y:=5 u Receiver R1 gets (X=4, Y=5); R2 gets (X=5, Y=4) u The sender state X=4, Y=5 never occurs, yet R1 observes it u Can be remedied if values are sent with events s Still suffers from the separation of data and event [figure: timeline of the writes by S and the interleaved reads of X and Y by R1 and R2]

33 Atomicity of Event Reading u Sender S emits X, then emits Y; the receivers read at different times u R1 sees no events, R2 sees X, R3 sees X and Y u Each sees a snapshot of the events in time u Different captured input assignments s Because of scheduling and delay [figure: timeline of S's emissions and the read points of R1, R2, R3]

34 Functional Behavior u Transition and output relations s input, present_state, next_state, output u At each execution, a CFSM s Reads a captured input assignment s If there is a match in transition relation t consume inputs, transition to next_state, write outputs s Otherwise t consume no inputs, no transition, no outputs

35 Functional Behavior (2) u Empty Transition s No matching transition is found u Trivial Transition s A transition that has no output and no state changes s Effectively throw away inputs u Initial transition s Transition to the init (reset) state s No input event needed for this transition

36 CFSM and Process Networks u CFSM s An asynchronous extended FSM model s Communication via bounded non-blocking buffers t Versus CSP and CCS (rendezvous) t Versus SDL (unbounded queue & variable topology) s Not continuous in Kahn’s sense t Different event ordering may change behavior u Versus dataflow (ordering insensitive)

37 CFSM Networks u Defined based on a global notion of time s Total order of events s Synchronous with relaxed timing t Global consistent state of signals is required t Input and output are in partial order

38 Buffer Overwrite u CFSM Network has s Finite Buffering s Non-blocking write t Events can be overwritten u if the sender is “faster” than receiver u To ensure no overwrite s Explicit handshaking mechanism s Scheduling

39 Example of CFSM Behaviors u A and B produce i1 and i2 at every i u C produces err or o at every (i1, i2) u Delay (i to o) for normal operation is n_r; for err operation, 2 n_r u Minimum input interval is n_i u Intuitive “correct” behavior s No events are lost [figure: A and B feed i1 and i2 into C, which outputs o or err]

40 Equivalence Classes of CFSM Behavior u Assume parallel execution (HW, 1 CFSM/processor) u Equivalence classes of behaviors are: s Zero delay t n_r = 0 s Input buffer overwrite t n_i <= n_r s Time-critical operation t n_i/2 <= n_r <= n_i s Normal operation t n_r <= n_i/2

41 Equivalence Classes of CFSM Behavior (2) u Zero delay: n_r = 0 s If C emits an error on some input t A, B can react instantaneously & output differently s May be logically inconsistent u Input buffer overwrite: n_i <= n_r s Execution delay of A, B is larger than the arrival interval t always a loss of events t requirements not satisfied

42 Equivalence Classes of CFSM Behavior (3) u Time-critical operation: n_i/2 <= n_r <= n_i s Normal operation results in no loss of events s Error operation may cause lost inputs u Normal operation: n_r <= n_i/2 s No events are lost s May be expensive to implement u If errors are infrequent s Designer may also accept time-critical operation t Can result in a lower-cost implementation

43 Equivalence Classes of CFSM Behavior (4) u Implementation on a single processor s Loss of events may be caused by t Timing constraints u n_i < 3 n_r t Incorrect scheduling u If an empty transition also takes n_r, an ACBC round robin will miss events while an ABC round robin will not

44 Some Possible Equivalence Classes u Given 2 arbitrary implementations and 1 input stream: s Dataflow equivalence t Output streams have the same ordering s Petri net equivalence t Output streams satisfy some partial order s Golden model equivalence t Output streams have the same ordering, except for reordering of concurrent events t One of the implementations is a reference specification s Filtered equivalence t Output streams are the same after filtering by an observer

45 Conclusion u CFSM s Extension: ACFSM: initially unbounded FIFO buffers t Bounds on buffers are imposed by refinement to yield ECFSM s Delay is also refined by implementation s Local synchrony t Relatively large atomic synchronous entities s Global asynchrony t Breaks synchrony, no compositionality problems t Allows efficient mapping to heterogeneous architectures

46 Data-flow networks u A bit of history u Syntax and semantics s actors, tokens and firings u Scheduling of Static Data-flow s static scheduling s code generation s buffer sizing u Other Data-flow models s Boolean Data-flow s Dynamic Data-flow

47 Data-flow networks u Powerful formalism for data-dominated system specification u Partially-ordered model (no over-specification) u Deterministic execution independent of scheduling u Used for s simulation s scheduling s memory allocation s code generation for Digital Signal Processors (HW and SW)

48 A bit of history u Karp computation graphs (‘66): seminal work u Kahn process networks (‘74): formal model u Dennis Data-flow networks (‘75): programming language for the MIT DF machine u Several recent implementations s graphical: t Ptolemy (UCB), Khoros (U. New Mexico), Grape (U. Leuven) t SPW (Cadence), COSSAP (Synopsys) s textual: t Silage (UCB, Mentor) t Lucid, Haskell

49 Data-flow network u A Data-flow network is a collection of functional nodes which are connected and communicate over unbounded FIFO queues u Nodes are commonly called actors u The bits of information that are communicated over the queues are commonly called tokens

50 Intuitive semantics u (Often stateless) actors perform computation u Unbounded FIFOs perform communication via sequences of tokens carrying values s integer, float, fixed point s matrix of integer, float, fixed point s image of pixels u State implemented as self-loop u Determinacy: s unique output sequences given unique input sequences s Sufficient condition: blocking read (process cannot test input queues for emptiness)

51 Intuitive semantics u At each time, one actor is fired u When firing, actors consume input tokens and produce output tokens u Actors can be fired only if there are enough tokens in the input queues

52 Intuitive semantics u Example: FIR filter s single input sequence i(n) s single output sequence o(n) s o(n) = c1 i(n) + c2 i(n-1) [figure: i feeds multipliers *c1 and *c2, the latter through a unit delay holding the initial token i(-1); their outputs are added to produce o]
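A minimal executable sketch of this FIR actor, modeling the unit delay as a self-loop FIFO carrying the initial token i(-1) = 0. The coefficient values and input stream are arbitrary assumptions:

```python
from collections import deque

# The FIR example as a dataflow actor: each firing consumes one input
# token and produces one output token o(n) = c1*i(n) + c2*i(n-1).
# The unit delay is a self-loop FIFO with initial token i(-1) = 0.
c1, c2 = 0.5, 0.5

inp = deque([1.0, 2.0, 3.0, 4.0])   # input FIFO
delay = deque([0.0])                # self-loop FIFO carrying i(-1)
out = deque()                       # output FIFO

def fire_fir():
    # firing rule: one token available on each input queue
    if inp and delay:
        x, prev = inp.popleft(), delay.popleft()
        out.append(c1 * x + c2 * prev)
        delay.append(x)             # current sample becomes the next i(n-1)
        return True
    return False

while fire_fir():
    pass
# out now holds [0.5, 1.5, 2.5, 3.5]: a 2-tap moving average
```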


62 Questions u Does the order in which actors are fired affect the final result? u Does it affect the “operation” of the network in any way? u Go to Radio Shack and ask for an unbounded queue!!

63 Formal semantics: sequences u Actors operate from a sequence of input tokens to a sequence of output tokens u Let tokens be denoted by x1, x2, x3, etc. u A sequence of tokens is defined as X = [x1, x2, x3, …] u Over the execution of the network, each queue will grow a particular sequence of tokens u In general, we consider the actors mathematically as functions from sequences to sequences (not from tokens to tokens)

64 Ordering of sequences u Let X1 and X2 be two sequences of tokens u We say that X1 is less than X2 if and only if (by definition) X1 is an initial segment of X2 u Homework: prove that the relation so defined is a partial order (reflexive, antisymmetric and transitive) u This is also called the prefix order u Example: [x1, x2] <= [x1, x2, x3] u Example: [x1, x2] and [x1, x3, x4] are incomparable
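The prefix order can be checked mechanically; a small sketch reproducing the two examples on the slide:

```python
# Minimal sketch of the prefix (initial-segment) order on finite sequences.
def is_prefix(x, y):
    """True iff sequence x is an initial segment of sequence y."""
    return len(x) <= len(y) and list(y[:len(x)]) == list(x)

# [x1, x2] <= [x1, x2, x3]
comparable = is_prefix(["x1", "x2"], ["x1", "x2", "x3"])

# [x1, x2] and [x1, x3, x4]: neither is a prefix of the other
a, b = ["x1", "x2"], ["x1", "x3", "x4"]
incomparable = not is_prefix(a, b) and not is_prefix(b, a)
```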

65 Chains of sequences u Consider the set S of all finite and infinite sequences of tokens u This set is partially ordered by the prefix order u A subset C of S is called a chain iff all pairs of elements of C are comparable u If C is a chain, then it must be a linear order inside S (otherwise, why call it a chain?) u Example: { [x1], [x1, x2], [x1, x2, x3], … } is a chain u Example: { [x1], [x1, x2], [x1, x3], … } is not a chain

66 (Least) Upper Bound u Given a subset Y of S, an upper bound of Y is an element z of S such that z is larger than all elements of Y u Consider now the set Z (subset of S) of all the upper bounds of Y u If Z has a least element u, then u is called the least upper bound (lub) of Y u The least upper bound, if it exists, is unique u Note: u might not be in Y (if it is, then it is the largest value of Y)

67 Complete Partial Order u Every chain in S has a least upper bound u Because of this property, S is called a Complete Partial Order u Notation: if C is a chain, we indicate the least upper bound of C by lub( C ) u Note: the least upper bound may be thought of as the limit of the chain

68 Processes u Process: function from a p-tuple of sequences to a q-tuple of sequences, F : S^p -> S^q u Tuples have the induced point-wise order: for Y = (y1, …, yp), Y’ = (y’1, …, y’p) in S^p: Y <= Y’ iff yi <= y’i for all 1 <= i <= p u Given a chain C in S^p, F( C ) may or may not be a chain in S^q u We are interested in conditions that make that true

69 Continuity and Monotonicity u Continuity: F is continuous iff (by definition) for all chains C, lub( F( C ) ) exists and F( lub( C ) ) = lub( F( C ) ) u Similar to continuity in analysis using limits u Monotonicity: F is monotonic iff (by definition) for all pairs X <= X’, F( X ) <= F( X’ ) u Continuity implies monotonicity s intuitively, outputs cannot be “withdrawn” once they have been produced s timeless causality: F transforms chains into chains

70 Least Fixed Point semantics u Let X be the tuple of all sequences in the network u A network is a mapping F from sequences to sequences: X = F( X, I ) u The behavior of the network is defined as the unique least fixed point of the equation u If F is continuous then the least fixed point exists: LFP = lub( { F^n( ⊥, I ) : n >= 0 } ), where ⊥ is the tuple of empty sequences
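A toy illustration of the fixed-point construction for a single feedback loop. The particular actor function F here is an invented example (prepend an initial token, then add the feedback stream to the input pointwise), chosen so that iterating from the empty sequence converges on a finite input:

```python
# Least-fixed-point iteration for a one-loop Kahn network sketch:
# X = F(X, I), where F prepends an initial token 0 and then adds the
# feedback stream to the input stream pointwise.
def F(X, I):
    return [0] + [x + i for x, i in zip(X, I)]

def least_fixed_point(F, I):
    X = []                       # bottom: the empty sequence
    while True:
        X_next = F(X, I)
        if X_next == X:          # the chain has reached its lub
            return X
        X = X_next               # each iterate extends the previous (a chain)

lfp = least_fixed_point(F, [1, 2, 3])
# lfp == [0, 1, 3, 6]: delayed prefix sums of the input
```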

71 From Kahn networks to Data Flow networks u Each process becomes an actor : set of pairs of s firing rule (number of required tokens on inputs) s function (including number of consumed and produced tokens) u Formally shown to be equivalent, but actors with firing are more intuitive u Mutually exclusive firing rules imply monotonicity u Generally simplified to blocking read

72 Examples of Data Flow actors u SDF: Synchronous (or, better, Static) Data Flow t fixed input and output tokens u BDF: Boolean Data Flow t control token determines consumed and produced tokens [figure: an FFT actor (SDF), and merge/select actors with T/F control ports (BDF)]

73 Static scheduling of DF u Key property of DF networks: output sequences do not depend on time of firing of actors u SDF networks can be statically scheduled at compile-time s execute an actor when it is known to be fireable s no overhead due to sequencing of concurrency s static buffer sizing u Different schedules yield different s code size s buffer size s pipeline utilization

74 Static scheduling of SDF u Based only on process graph (ignores functionality) u Network state: number of tokens in FIFOs u Objective: find schedule that is valid, i.e.: s admissible (only fires actors when fireable) s periodic (brings network back to initial state firing each actor at least once) u Optimize cost function over admissible schedules

75 Balance equations u The number of produced tokens must equal the number of consumed tokens on every edge u Repetitions (or firing) vector v_S of schedule S: number of firings of each actor in S u v_S(A) n_p = v_S(B) n_c must be satisfied for each edge [figure: actor A produces n_p tokens per firing on an edge from which actor B consumes n_c tokens per firing]

76 Balance equations [figure: SDF graph with actors A, B, C] u Balance for each edge: s 3 v_S(A) - v_S(B) = 0 s v_S(B) - v_S(C) = 0 s 2 v_S(A) - v_S(C) = 0

77 Balance equations u M v_S = 0 iff S is periodic u Full rank (as in this case) t no non-zero solution t no periodic schedule (too many tokens accumulate on A->B or B->C) [figure: the topology matrix M of the graph above]

78 Balance equations u Non-full rank t infinitely many solutions exist (a linear space of dimension 1) u Any multiple of q = |1 2 2|^T satisfies the balance equations u ABCBC and ABBCC are minimal valid schedules u ABABBCBCCC is a non-minimal valid schedule [figure: the modified SDF graph and its topology matrix M]

79 Static SDF scheduling u Main SDF scheduling theorem (Lee ‘86): s A connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1 s If M has rank n-1 then there exists a unique smallest integer solution q to M q = 0 u Rank must be at least n-1 because we need at least n-1 edges (connectedness), each providing a linearly independent row u Admissibility is not guaranteed, and depends on initial tokens on cycles
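The smallest integer solution q can be computed by propagating the balance equations over a connected graph. The edge rates below are a hypothetical graph chosen to reproduce q = (1, 2, 2) from the earlier slide; the slide's actual rates are not given:

```python
from fractions import Fraction
from math import gcd
from functools import reduce

# Sketch: smallest integer repetition vector q with M q = 0 for a
# connected SDF graph given as edges (producer, consumer, n_p, n_c).
def repetition_vector(actors, edges):
    # Fix q[first actor] = 1, then propagate the balance equation
    # v(src) * n_p = v(dst) * n_c along each edge.
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, n_p, n_c in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * n_p / n_c
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * n_c / n_p
                changed = True
    # Rank must be n-1: every edge must now balance, else no periodic schedule.
    if any(q[s] * n_p != q[d] * n_c for s, d, n_p, n_c in edges):
        return None
    # Scale the rational solution to the smallest positive integer vector.
    lcm = reduce(lambda a, b: a * b // gcd(a, b),
                 (q[a].denominator for a in actors))
    ints = [int(q[a] * lcm) for a in actors]
    g = reduce(gcd, ints)
    return [v // g for v in ints]

q = repetition_vector(["A", "B", "C"],
                      [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)])
# q == [1, 2, 2] for these (assumed) rates
```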

80 Admissibility of schedules u No admissible schedule: BACBA, then deadlock… u Adding one token (delay) on A->C makes BACBACBA valid u Making a periodic schedule admissible is always possible, but changes the specification... [figure: the SDF graph with actors A, B, C]

81 Admissibility of schedules u Adding an initial token changes the FIR order [figure: the FIR graph with two initial tokens, i(-1) and i(-2), on the delay edges]

82 From repetition vector to schedule u Repeatedly schedule fireable actors up to the number of times in the repetition vector q = |1 2 2|^T u Can find either ABCBC or ABBCC u If deadlock occurs before returning to the original state, no valid schedule exists (Lee ‘86) [figure: the SDF graph with actors A, B, C]
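The "repeatedly schedule fireable actors" procedure can be sketched directly. The edge rates are again the hypothetical graph with q = (1, 2, 2), and a deadlocked cycle yields no schedule:

```python
# Sketch: build a periodic admissible schedule by firing any actor that
# is fireable and still below its repetition count.
def schedule_sdf(actors, edges, q, tokens=None):
    tokens = dict(tokens or {(s, d): 0 for s, d, _, _ in edges})
    remaining = dict(zip(actors, q))
    schedule = []
    progress = True
    while progress and any(remaining.values()):
        progress = False
        for a in actors:
            fireable = remaining[a] > 0 and all(
                tokens[(s, d)] >= n_c for s, d, n_p, n_c in edges if d == a)
            if fireable:
                for s, d, n_p, n_c in edges:
                    if s == a: tokens[(s, d)] += n_p
                    if d == a: tokens[(s, d)] -= n_c
                remaining[a] -= 1
                schedule.append(a)
                progress = True
    # Deadlock before completing q means no valid schedule (Lee '86).
    return "".join(schedule) if not any(remaining.values()) else None

edges = [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]
s = schedule_sdf(["A", "B", "C"], edges, [1, 2, 2])
# with this actor ordering the procedure finds "ABCBC" and the
# network returns to its initial (empty) buffer state
```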

83 From schedule to implementation u Static scheduling used for: s behavioral simulation of DF (extremely efficient) s code generation for DSP s HW synthesis (Cathedral by IMEC, Lager by UCB, …) u Issues in code generation s execution speed (pipelining, vectorization) s code size minimization s data memory size minimization (allocation to FIFOs) s processor or functional unit allocation

84 Compilation optimization u Assumption: code stitching (chaining custom code for each actor) u More efficient than C compiler for DSP u Comparable to hand-coding in some cases u Explicit parallelism, no artificial control dependencies u Main problem: memory and processor/FU allocation depends on scheduling, and vice-versa

85 Code size minimization u Assumptions (based on DSP architecture): s subroutine calls expensive s fixed iteration loops are cheap (“zero-overhead loops”) u Absolute optimum: single appearance schedule e.g. ABCBC -> A (2BC), ABBCC -> A (2B) (2C) t may or may not exist for an SDF graph… t buffer minimization relative to single appearance schedules (Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)

86 Buffer size minimization u Assumption: no buffer sharing u Example: q = | |^T u Valid SAS: (100 A) (100 B) (10 C) D t requires 210 units of buffer area u Better (factored) SAS: (10 (10 A) (10 B) C) D t requires 30 units of buffer area, but… t requires 21 loop initiations per period (instead of 3) [figure: SDF graph with actors A, B, C, D and its edge rates]

87 Dynamic scheduling of DF u SDF is limited in modeling power s no run-time choice t cannot implement Gaussian elimination with pivoting u More general DF is too powerful s non-Static DF is Turing-complete (Buck ‘93) t bounded-memory scheduling is not always possible u BDF: semi-static scheduling of special “patterns” s if-then-else s repeat-until, do-while u General case: thread-based dynamic scheduling (Parks ‘96: may not terminate, but never fails if feasible)

88 Example of Boolean DF u Compute the absolute value of the average of n samples [figure: BDF graph from In to Out using switch/select actors with T/F control ports driven by >n and <0 tests]

89 Example of general DF u Merge streams of multiples of 2 and 3 in order (removing duplicates) u Deterministic merge (no “peeking”) [figure: generators *2 and *3, each with a dup self-loop and initial token 1, feed streams A and B into an ordered merge producing O] a = get(A) b = get(B) forever { if (a < b) { put(O, a); a = get(A) } else if (a > b) { put(O, b); b = get(B) } else { put(O, a); a = get(A); b = get(B) } }

90 Summary of DF networks u Advantages: s Easy to use (graphical languages) s Powerful algorithms for t verification (fast behavioral simulation) t synthesis (scheduling and allocation) s Explicit concurrency u Disadvantages: s Efficient synthesis only for restricted models t (no input or output choice) s Cannot describe reactive control (blocking read)