
1. Outline
Part 3: Models of Computation
- FSMs
- Discrete Event Systems
- CFSMs
- Data Flow Models
- Petri Nets
- The Tagged Signal Model

2. Discrete Event
- Explicit notion of time (a global order)
- Events can happen at any time, asynchronously
- As soon as an input appears at a block, the block may be executed
- Execution may take non-zero time; the output is stamped with a time equal to the arrival time plus the execution time
- Time determines the order in which events are processed
- A DE simulator maintains a global event queue (Verilog and VHDL)
- Drawbacks:
  - global event queue => tight coordination between parts
  - simultaneous events => non-deterministic behavior
  - some simulators use delta delay to prevent non-determinacy
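The global-event-queue mechanism can be sketched with a priority queue. This is a minimal illustrative kernel, not the actual Verilog/VHDL algorithm; the sequence counter is one possible way of breaking ties among simultaneous events deterministically.

```python
import heapq

# Minimal discrete-event simulation kernel in the spirit of the global
# event queue described above (names are illustrative).
class DESimulator:
    def __init__(self):
        self.queue = []      # heap of (time, seq, block, value)
        self.seq = 0         # tie-breaker for simultaneous events

    def schedule(self, time, block, value):
        heapq.heappush(self.queue, (time, self.seq, block, value))
        self.seq += 1

    def run(self):
        while self.queue:
            time, _, block, value = heapq.heappop(self.queue)
            # Execution takes non-zero time: each output is stamped with
            # arrival time + execution delay.
            for delay, target, out in block(time, value):
                self.schedule(time + delay, target, out)

# Example: block A feeds block B with a 2-time-unit execution delay.
log = []

def block_b(t, v):
    log.append((t, v))
    return []

def block_a(t, v):
    return [(2, block_b, v + 1)]

sim = DESimulator()
sim.schedule(0, block_a, 10)
sim.run()
print(log)  # [(2, 11)]
```

Note that without the sequence counter, two events at the same time stamp would be popped in an arbitrary order, which is exactly the non-determinism the slide warns about.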

3. Simultaneous Events in DE
(diagram: blocks A -> B -> C with events on a time axis)
- If B has 0 delay: fire B or C first? Fire C once or twice?
- If B has delta delay: fire C twice, at t and t+delta
- Still a problem with 0-delay (causality) loops
- Can be refined, e.g. by introducing timing constraints (minimum reaction time 0.1 s)

4. Outline
Part 3: Models of Computation
- FSMs
- Discrete Event Systems
- CFSMs
- Data Flow Models
- Petri Nets
- The Tagged Signal Model

5. Co-Design Finite State Machines: Combining FSM and Discrete Event
- Synchrony and asynchrony
- CFSM definitions
  - signals & networks
  - timing behavior
  - functional behavior
- CFSMs & process networks
- Examples of CFSM behavior
  - equivalence classes

6. Codesign Finite State Machine
- The underlying MoC of Polis and VCC
- Combines aspects of several other MoCs
- Preserves formality and efficiency in implementation
- Mixes
  - synchronicity: zero and infinite time
  - asynchronicity: non-zero, finite, bounded time
- Embedded systems often contain both aspects

7. Synchrony: Basic Operation
- Synchrony is often implemented with clocks
- At clock ticks
  - each module reads inputs, computes, and produces outputs
  - all synchronous events happen simultaneously
  - zero-delay computation
- Between clock ticks
  - an infinite amount of time passes

8. Synchrony: Basic Operation (2)
- Practical implementation of synchrony
  - impossible to get zero or infinite delay
  - requires: computation time << clock period
  - computation time is effectively 0 w.r.t. the reaction time of the environment
- Features of synchrony
  - functional behavior independent of timing
    - simplifies verification
  - cyclic dependencies may cause problems
    - among (simultaneous) synchronous events

9. Synchrony: Triggering and Ordering
- All modules are triggered at each clock tick
- Simultaneous signals
  - no a priori ordering
  - ordering may be imposed by dependencies
    - implemented with delta steps
(diagram: computation over continuous time, with clock ticks and delta steps)

10. Synchrony: System Solution
- System solution
  - the output reaction to a set of inputs
- A well-designed system:
  - is completely specified and functional
  - has a unique solution at each clock tick
  - is equivalent to a single FSM
  - allows efficient analysis and verification
- Well-designedness
  - may need to be checked for each design (Esterel)
    - cyclic dependencies among simultaneous events

11. Synchrony: Implementation Cost
- Must verify the synchronous assumption on the final design
  - may be expensive
- Examples:
  - hardware
    - clock cycle > maximum computation time
      - inefficient for the average case
  - software
    - a process must finish its computation before
      - a new input arrives
      - another process needs to start computing

12. Pure Asynchrony: Basic Operation
- Events are never simultaneous
  - no two events have the same tag
- Computation starts at a change of the input
- Delays are arbitrary, but bounded

13. Asynchrony: Triggering and Ordering
- Each module is triggered to run at a change of its input
- No a priori ordering among triggered modules
  - may be imposed by scheduling at implementation

14. Asynchrony: System Solution
- The solution depends strongly on input timing
- At implementation
  - events may "appear" simultaneous
  - difficult/expensive to maintain a total ordering
    - ordering at implementation decides behavior
    - becomes DE, with the same pitfalls

15. Asynchrony: Implementation Cost
- Achieves low (average) computation time
  - different parts of the system compute at different rates
- Analysis is difficult
  - behavior depends on timing
  - may be easier for designs that are insensitive to
    - internal delay
    - external timing

16. Asynchrony vs. Synchrony in System Design
- They differ at least in
  - event buffering
  - timing of event read/write
- Asynchrony
  - explicit buffering of events for each module
    - varying and unknown at start time
- Synchrony
  - one global copy of each event
    - same start time for all modules

17. Combining Synchrony and Asynchrony
- We want to combine
  - the flexibility of asynchrony
  - the verifiability of synchrony
- Asynchrony
  - globally, a timing-independent style of thinking
- Synchrony
  - local portions of the design are often tightly synchronized
- Globally asynchronous, locally synchronous
  - CFSM networks

18. CFSM Overview
- A CFSM is an FSM extended with
  - support for data handling
  - asynchronous communication
- A CFSM has
  - an FSM part
    - inputs, outputs, states, transition and output relation
  - a data computation part
    - external, instantaneous functions

19. CFSM Overview (2)
- A CFSM has:
  - locally synchronous behavior
    - a CFSM executes based on a snapshot input assignment
    - synchronous from its own perspective
  - globally asynchronous behavior
    - a CFSM executes in a non-zero, finite amount of time
    - asynchronous from the system perspective
- GALS model
  - globally: scheduling mechanism
  - locally: CFSMs

20. Network of CFSMs: Depth-1 Buffers (12/09/1999)
(diagram: CFSM1, CFSM2, CFSM3 connected by signals A, B, C, F, G, with transitions such as C=>G, C=>F, B=>C, F^(G==1), (A==0)=>B, C=>A, C=>B)
- Globally Asynchronous, Locally Synchronous (GALS) model

21. Introducing a CFSM
- A finite state machine
- Input events, output events and state events
- Initial values (for state events)
- A transition function
  - transitions may involve complex, memory-less, instantaneous arithmetic and/or Boolean functions
  - all the state of the system is in the form of events
- We need rules that define the CFSM behavior

22. CFSM Rules: Phases
- Four-phase cycle:
  1. idle
  2. detect input events
  3. execute one transition
  4. emit output events
- Discrete time
  - sufficiently accurate for synchronous systems
  - feasible formal verification
- Model semantics: timed traces, i.e. sequences of events labeled by their time of occurrence

23. CFSM Rules: Phases (2)
- Implicit unbounded delay between phases
- Non-zero reaction time (avoids inconsistencies when interconnected)
- Causal model based on partial order (global asynchronicity)
  - potential verification speed-up
- Phases may not overlap
- Transitions always clear input buffers (local synchronicity)

24. Communication Primitives
- Signals
  - carry information in the form of events and/or values
    - event signals: presence/absence
    - data signals: arbitrary values
  - event and data may be paired
- Communication between two CFSMs
  - 1 input buffer per signal per receiver
  - emitted by a sender CFSM
  - consumed by a receiver CFSM by setting the buffer to 0
  - "present" if emitted but not yet consumed

25. Communication Primitives (2)
- Input assignment
  - a set of values for the input signals of a CFSM
- Captured input assignment
  - the set of input values read by a CFSM at a particular time
- Input stimulus
  - an input assignment with at least one event present

26. Signals and CFSMs
- A CFSM
  - initiates communication through events
  - reacts only to input stimuli
    - except for the initial reaction
  - writes data first, then emits the associated event
  - reads the event first, then reads the associated data

27. CFSM Networks
- Net
  - a set of connections on the same signal
  - associated with a single sender and multiple receivers
  - an input buffer for each receiver on a net
    - multi-cast communication
- Network of CFSMs
  - a set of CFSMs, nets, and a scheduling mechanism
  - can be implemented as
    - a set of CFSMs in SW (program/compiler/OS/uC)
    - a set of CFSMs in HW (HDL/gate/clocking)
    - interfaces (polling/interrupt/memory-mapped)

28. Scheduling Mechanism
- At the specification level
  - should be as abstract as possible to allow optimization
  - not fixed in any way by the CFSM MoC
- May be implemented as
  - an RTOS for a single processor
  - concurrent execution for HW
  - a set of RTOSs for a multi-processor
  - a set of scheduling FSMs for HW

29. Timing Behavior
- Scheduling mechanism
  - globally controls the interaction of the CFSMs
  - continually decides which CFSMs can be executed
- A CFSM can be
  - idle
    - waiting for input events
    - waiting to be executed by the scheduler
  - executing
    - generates a single reaction
    - reads its inputs, computes, writes its outputs

30. Timing Behavior: Mathematical Model
- Transition point
  - the point in time at which a CFSM starts executing
- For each execution
  - input signals are read and cleared
  - partial order between input and output
  - an event is read before its data
  - data is written before the associated event is emitted

31. Timing Behavior: Transition Points
- For a transition point t_i
  - inputs may be read between t_i and t_i+1
  - an event that is read may have occurred between t_i-1 and t_i+1
  - data that is read may have occurred between t_0 and t_i+1
  - outputs are written between t_i and t_i+1
- CFSMs allow loose synchronization of events and data
  - less restrictive implementation
  - may lead to non-intuitive behavior

32. Event/Data Separation
(timing diagram: sender S writes v1 and emits, then writes v2 and emits; receiver R reads the event at t_i but reads the value only after v2 has been written)
- Value v1 is lost even though
  - it is sent with an event
  - the event itself may not be lost
- We need atomicity

33. Atomicity
- A group of actions considered as a single entity
- May be costly to implement
- The only atomicity requirement of CFSMs
  - input events are read atomically
    - can be enforced in SW (bit vector) or HW (buffer)
    - a CFSM is guaranteed to see a snapshot of its input events
- Non-atomicity of event and data
  - may lead to undesirable behavior
  - can be made atomic as an implementation trade-off decision

34. Non-Atomic Data Value Reading
(timing diagram: sender S writes X:=4, Y:=4, then X:=5, Y:=5, while receivers R1 and R2 read X and Y at different times)
- Receiver R1 gets (X=4, Y=5); R2 gets (X=5, Y=4)
- The combination X=4, Y=5 never actually occurs
- Can be remedied if values are sent with events
  - but still suffers from the separation of data and event

35. Atomicity of Event Reading
(timing diagram: sender S emits X, then Y; receivers R1, R2, R3 read at different times)
- R1 sees no events, R2 sees X, R3 sees X and Y
- Each sees a snapshot of the events in time
- Different captured input assignments
  - because of scheduling and delay

36. Functional Behavior
- Transition and output relations
  - over (input, present_state, next_state, output)
- At each execution, a CFSM
  - reads a captured input assignment
  - if there is a match in the transition relation
    - consumes inputs, transitions to next_state, writes outputs
  - otherwise
    - consumes no inputs, makes no transition, writes no outputs
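The matching rule above can be sketched in a few lines. The data structures and names here are illustrative, not taken from Polis: a transition is a (state, trigger, next_state, outputs) tuple, and the captured input assignment is a set of present events.

```python
# One CFSM execution step: match the captured input snapshot against the
# transition relation; on a match, consume inputs, change state, emit
# outputs; otherwise do nothing (an empty transition).
def cfsm_step(state, captured, transitions):
    for s, trigger, next_state, outputs in transitions:
        if s == state and trigger == captured:
            return next_state, set(captured), outputs  # consumed, emitted
    return state, set(), set()  # no match: nothing consumed or emitted

# Toy CFSM: in state "idle", event {"go"} moves to "run" and emits {"ack"}.
transitions = [("idle", {"go"}, "run", {"ack"}),
               ("run", {"stop"}, "idle", set())]
print(cfsm_step("idle", {"go"}, transitions))    # ('run', {'go'}, {'ack'})
print(cfsm_step("idle", {"stop"}, transitions))  # ('idle', set(), set())
```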

37. Functional Behavior (2)
- Empty transition
  - no matching transition is found
- Trivial transition
  - a transition with no outputs and no state change
  - effectively throws away inputs
- Initial transition
  - the transition to the initial (reset) state
  - no input event is needed for this transition

38. CFSMs and Process Networks
- A CFSM is
  - an asynchronous extended FSM model
  - communication via bounded non-blocking buffers
    - versus CSP and CCS (rendezvous)
    - versus SDL (unbounded queues & variable topology)
  - not continuous in Kahn's sense
    - different event orderings may change the behavior
    - versus dataflow (ordering-insensitive)

39. CFSM Networks
- Defined based on a global notion of time
  - total order of events
  - synchronous with relaxed timing
    - a globally consistent state of signals is required
    - inputs and outputs are in partial order

40. Buffer Overwrite
- A CFSM network has
  - finite buffering
  - non-blocking writes
    - events can be overwritten if the sender is "faster" than the receiver
- To ensure no overwrites:
  - an explicit handshaking mechanism
  - scheduling

41. Example of CFSM Behaviors
(network: input i feeds A and B, which produce i1 and i2; C consumes i1, i2 and produces o or err)
- A and B produce i1 and i2 at every i
- C produces err or o at every (i1, i2)
- The delay (i to o) for normal operation is n_r; for err operation, 2 n_r
- The minimum input interval is n_i
- Intuitive "correct" behavior
  - no events are lost

42. Equivalence Classes of CFSM Behavior
- Assume parallel execution (HW, 1 CFSM per processor)
- The equivalence classes of behaviors are:
  - zero delay: n_r = 0
  - input buffer overwrite: n_i <= n_r
  - time-critical operation: n_i/2 <= n_r <= n_i
  - normal operation: n_r <= n_i/2

43. Equivalence Classes of CFSM Behavior (2)
- Zero delay: n_r = 0
  - if C emits an error on some input
    - A, B can react instantaneously & output differently
  - may be logically inconsistent
- Input buffer overwrite: n_i <= n_r
  - the execution delay of A, B is larger than the arrival interval
    - events are always lost
    - the requirements are not satisfied

44. Equivalence Classes of CFSM Behavior (3)
- Time-critical operation: n_i/2 <= n_r <= n_i
  - normal operation results in no loss of events
  - error operation may cause lost inputs
- Normal operation: n_r <= n_i/2
  - no events are lost
  - may be expensive to implement
- If errors are infrequent
  - the designer may also accept time-critical operation
    - can result in a lower-cost implementation

45. Equivalence Classes of CFSM Behavior (4)
- Implementation on a single processor
  - loss of events may be caused by
    - timing constraints: n_i < 3 n_r
    - incorrect scheduling: if the empty transition also takes n_r, an ACBC round-robin will miss events while an ABC round-robin will not

46. Some Possible Equivalence Classes
- Given 2 arbitrary implementations and 1 input stream:
  - dataflow equivalence
    - the output streams have the same ordering
  - Petri net equivalence
    - the output streams satisfy some partial order
  - golden model equivalence
    - the output streams have the same ordering, except for reordering of concurrent events
    - one of the implementations is a reference specification
  - filtered equivalence
    - the output streams are the same after filtering by an observer

47. Conclusion
- CFSM
  - extension: ACFSM: initially unbounded FIFO buffers
    - bounds on the buffers are imposed by refinement, yielding an ECFSM
  - delay is also refined by the implementation
  - local synchrony
    - relatively large atomic synchronous entities
  - global asynchrony
    - breaks synchrony without compositional problems
    - allows efficient mapping to heterogeneous architectures

48. Outline
Part 3: Models of Computation
- FSMs
- Discrete Event Systems
- CFSMs
- Data Flow Models
- Petri Nets
- The Tagged Signal Model

49. Data-flow Networks
- A bit of history
- Syntax and semantics
  - actors, tokens and firings
- Scheduling of static data-flow
  - static scheduling
  - code generation
  - buffer sizing
- Other data-flow models
  - Boolean data-flow
  - dynamic data-flow

50. Data-flow Networks
- A powerful formalism for data-dominated system specification
- Partially-ordered model (no over-specification)
- Deterministic execution, independent of scheduling
- Used for
  - simulation
  - scheduling
  - memory allocation
  - code generation for Digital Signal Processors (HW and SW)

51. A Bit of History
- Karp computation graphs ('66): seminal work
- Kahn process networks ('74): formal model
- Dennis data-flow networks ('75): programming language for the MIT DF machine
- Several more recent implementations
  - graphical:
    - Ptolemy (UCB), Khoros (U. New Mexico), Grape (U. Leuven)
    - SPW (Cadence), COSSAP (Synopsys)
  - textual:
    - Silage (UCB, Mentor)
    - Lucid, Haskell

52. Data-flow Networks
- A data-flow network is a collection of functional nodes which are connected and communicate over unbounded FIFO queues
- The nodes are commonly called actors
- The bits of information communicated over the queues are commonly called tokens

53. Intuitive Semantics
- (Often stateless) actors perform computation
- Unbounded FIFOs perform communication via sequences of tokens carrying values
  - integers, floats, fixed point
  - matrices of integers, floats, fixed point
  - images of pixels
- State is implemented as a self-loop
- Determinacy:
  - unique output sequences given unique input sequences
  - sufficient condition: blocking reads (a process cannot test its input queues for emptiness)

54. Intuitive Semantics (2)
- At each time, one actor is fired
- When firing, actors consume input tokens and produce output tokens
- Actors can be fired only if there are enough tokens in their input queues

55. Intuitive Semantics: FIR Filter Example
(diagram: input i feeds multipliers *c1 and *c2, the second through a delay holding the initial token i(-1); an adder produces o)
- single input sequence i(n)
- single output sequence o(n)
- o(n) = c1 i(n) + c2 i(n-1)

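The FIR firing behavior above can be sketched directly: one firing per input token, with the initial token i(-1) = 0 seeding the delay queue (the helper name and coefficients are illustrative).

```python
from collections import deque

# FIR actor o(n) = c1*i(n) + c2*i(n-1) as a dataflow firing function;
# the delay FIFO starts with the initial token i(-1) = 0.
def fir_network(inputs, c1=0.5, c2=0.5):
    delayed = deque([0])          # initial token i(-1)
    out = []
    for x in inputs:              # one firing per input token
        delayed.append(x)
        out.append(c1 * x + c2 * delayed.popleft())
    return out

print(fir_network([1, 2, 3], c1=1, c2=1))  # [1, 3, 5]
```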

65. Questions
- Does the order in which actors are fired affect the final result?
- Does it affect the "operation" of the network in any way?
- Go to Radio Shack and ask for an unbounded queue!!

66. Formal Semantics: Sequences
- Actors operate from a sequence of input tokens to a sequence of output tokens
- Let tokens be denoted by x1, x2, x3, etc.
- A sequence of tokens is defined as X = [x1, x2, x3, ...]
- Over the execution of the network, each queue grows a particular sequence of tokens
- In general, we consider the actors mathematically as functions from sequences to sequences (not from tokens to tokens)

67. Ordering of Sequences
- Let X1 and X2 be two sequences of tokens
- We say that X1 is less than X2 if and only if (by definition) X1 is an initial segment of X2
- Homework: prove that the relation so defined is a partial order (reflexive, antisymmetric and transitive)
- This is also called the prefix order
- Example: [x1, x2] <= [x1, x2, x3]
- Example: [x1, x2] and [x1, x3, x4] are incomparable
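The prefix order can be stated directly in code (a small illustrative helper, using the two examples above):

```python
# Prefix (initial-segment) order on token sequences: x <= y iff x is an
# initial segment of y.
def is_prefix(x, y):
    return len(x) <= len(y) and list(y[:len(x)]) == list(x)

assert is_prefix([1, 2], [1, 2, 3])      # [x1, x2] <= [x1, x2, x3]
assert not is_prefix([1, 2], [1, 3, 4])  # incomparable pair:
assert not is_prefix([1, 3, 4], [1, 2])  # neither is a prefix of the other
```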

68. Chains of Sequences
- Consider the set S of all finite and infinite sequences of tokens
- This set is partially ordered by the prefix order
- A subset C of S is called a chain iff all pairs of elements of C are comparable
- If C is a chain, then it must be a linear order inside S (otherwise, why call it a chain?)
- Example: { [x1], [x1, x2], [x1, x2, x3], ... } is a chain
- Example: { [x1], [x1, x2], [x1, x3], ... } is not a chain

69. (Least) Upper Bound
- Given a subset Y of S, an upper bound of Y is an element z of S such that z is larger than all elements of Y
- Consider the set Z (a subset of S) of all the upper bounds of Y
- If Z has a least element u, then u is called the least upper bound (lub) of Y
- The least upper bound, if it exists, is unique
- Note: u might not be in Y (if it is, then it is the largest element of Y)

70. Complete Partial Order
- Every chain in S has a least upper bound
- Because of this property, S is called a Complete Partial Order
- Notation: if C is a chain, we denote its least upper bound by lub(C)
- Note: the least upper bound may be thought of as the limit of the chain

71. Processes
- Process: a function from a p-tuple of sequences to a q-tuple of sequences, F : S^p -> S^q
- Tuples have the induced point-wise order: for Y = (y1, ..., yp) and Y' = (y'1, ..., y'p) in S^p: Y <= Y' iff yi <= y'i for all 1 <= i <= p
- Given a chain C in S^p, F(C) may or may not be a chain in S^q
- We are interested in conditions that make that true

72. Continuity and Monotonicity
- Continuity: F is continuous iff (by definition) for all chains C, lub(F(C)) exists and F(lub(C)) = lub(F(C))
- Similar to continuity in analysis, defined via limits
- Monotonicity: F is monotonic iff (by definition) for all pairs X <= X', F(X) <= F(X')
- Continuity implies monotonicity
  - intuitively, outputs cannot be "withdrawn" once they have been produced
  - timeless causality: F transforms chains into chains

73. Least Fixed Point Semantics
- Let I be the input sequences; a network is a mapping F on sequences: X = F(X, I)
- The behavior of the network is defined as the unique least fixed point of this equation
- If F is continuous, then the least fixed point exists: LFP = lub({ F^n(⊥, I) : n >= 0 }), where ⊥ is the empty sequence
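The iteration lub({ F^n(⊥, I) }) can be illustrated with a toy feedback network (chosen for illustration, not from the slides): a running-sum accumulator X = F(X, I), where F prepends an initial token and adds point-wise. Each application of F extends the output by one token, so the iterates form a chain whose lub is the fixed point.

```python
# F(X, I) = [0] + pointwise_sum(X, I): a feedback accumulator.
# Iterating F from bottom (the empty sequence) climbs the chain
# F^0(bottom), F^1(bottom), ... to the least fixed point.
def F(X, I):
    return [0] + [x + i for x, i in zip(X, I)]

I = [1, 2, 3, 4]
X = []                       # bottom: the empty sequence
for _ in range(len(I) + 1):  # enough iterations to reach the fixed point
    X = F(X, I)
print(X)  # [0, 1, 3, 6, 10] -- the running sum, and indeed X == F(X, I)
```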

74. From Kahn Networks to Data Flow Networks
- Each process becomes an actor: a set of pairs of
  - firing rule (number of required tokens on the inputs)
  - function (including the numbers of consumed and produced tokens)
- Formally shown to be equivalent, but actors with firings are more intuitive
- Mutually exclusive firing rules imply monotonicity
- Generally simplified to blocking reads

75. Examples of Data Flow Actors
- SDF: Synchronous (or, better, Static) Data Flow
  - fixed numbers of input and output tokens per firing
- BDF: Boolean Data Flow
  - a control token determines the numbers of consumed and produced tokens
(diagram: an adder with rates 1, 1 -> 1; an FFT with rate 1024; BDF merge and select actors with T/F control inputs)

76. Static Scheduling of DF
- Key property of DF networks: the output sequences do not depend on the firing times of the actors
- SDF networks can be statically scheduled at compile-time
  - execute an actor when it is known to be fireable
  - no overhead due to sequencing of concurrency
  - static buffer sizing
- Different schedules yield different
  - code size
  - buffer size
  - pipeline utilization

77. Static Scheduling of SDF
- Based only on the process graph (ignores functionality)
- Network state: the number of tokens in the FIFOs
- Objective: find a schedule that is valid, i.e.:
  - admissible (only fires actors when they are fireable)
  - periodic (brings the network back to its initial state, firing each actor at least once)
- Optimize a cost function over the admissible schedules

78. Balance Equations
- The number of produced tokens must equal the number of consumed tokens on every edge
- Repetitions (or firing) vector v_S of schedule S: the number of firings of each actor in S
- For an edge from A to B where A produces n_p tokens and B consumes n_c tokens per firing:
  v_S(A) n_p = v_S(B) n_c must be satisfied for each edge

79. Balance Equations: Example
(graph: edge A->B with rates 3/1, edge B->C with rates 1/1, edge A->C with rates 2/1)
- Balance for each edge:
  - 3 v_S(A) - v_S(B) = 0
  - v_S(B) - v_S(C) = 0
  - 2 v_S(A) - v_S(C) = 0

80. Balance Equations (2)
- M v_S = 0 iff S is periodic
- Topology matrix of the example (rows = edges, columns = A, B, C):

  M = | 3 -1  0 |
      | 0  1 -1 |
      | 2  0 -1 |

- M has full rank (as in this case)
  - the only solution is the zero vector
  - no periodic schedule exists (too many tokens accumulate on A->B or B->C)

81. Balance Equations (3)
- With the rate of A->B changed from 3 to 2 (rows = edges A->B, B->C, A->C):

  M = | 2 -1  0 |
      | 0  1 -1 |
      | 2  0 -1 |

- M is not full rank
  - infinitely many solutions exist (a linear space of dimension 1)
- Any multiple of q = |1 2 2|^T satisfies the balance equations
- ABCBC and ABBCC are minimal valid schedules
- ABABBCBCCC is a non-minimal valid schedule
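The balance equations can be solved mechanically. A sketch (helper names are illustrative): fix one actor's rate at 1, propagate rational rates along the edges, check consistency, then scale to the smallest positive integer vector.

```python
from fractions import Fraction
from math import lcm

# Minimal repetition-vector computation for a connected SDF graph:
# for each edge (a, b, p, c), the solution must satisfy q[a]*p == q[b]*c.
def repetition_vector(actors, edges):
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate rates along edges
        changed = False
        for a, b, p, c in edges:
            if a in q and b not in q:
                q[b] = q[a] * p / c
                changed = True
            elif b in q and a not in q:
                q[a] = q[b] * c / p
                changed = True
    for a, b, p, c in edges:            # every edge must balance
        assert q[a] * p == q[b] * c, "inconsistent rates: no periodic schedule"
    scale = lcm(*(f.denominator for f in q.values()))
    return {k: int(f * scale) for k, f in q.items()}

# The consistent example: A->B (2,1), B->C (1,1), A->C (2,1).
print(repetition_vector(["A", "B", "C"],
                        [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]))
# {'A': 1, 'B': 2, 'C': 2}
```

With the inconsistent rates of the previous slide (A->B produces 3), the consistency check fails, matching the "no periodic schedule" conclusion.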

82. Static SDF Scheduling
- Main SDF scheduling theorem (Lee '86):
  - a connected SDF graph with n actors has a periodic schedule iff its topology matrix M has rank n-1
  - if M has rank n-1, then there exists a unique smallest integer solution q to M q = 0
- The rank must be at least n-1 because connectedness requires at least n-1 edges, each providing a linearly independent row
- Admissibility is not guaranteed, and depends on the initial tokens on cycles

83. Admissibility of Schedules
(diagram: a cyclic SDF graph on A, B, C)
- No admissible schedule: after BACBA, deadlock...
- Adding one token (delay) on A->C makes BACBACBA valid
- Making a periodic schedule admissible is always possible, but changes the specification...

84. Admissibility of Schedules (2)
- Adding an initial token changes the FIR order
(diagram: the FIR filter with initial tokens i(-1) and i(-2) on the delay line)

85. From Repetition Vector to Schedule
- Repeatedly schedule fireable actors, up to the number of firings in the repetition vector q = |1 2 2|^T
- This can find either ABCBC or ABBCC
- If deadlock occurs before returning to the original state, no valid schedule exists (Lee '86)
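The procedure above can be sketched as a simulation (names are illustrative; which minimal schedule you obtain depends on the priority order used to pick among fireable actors):

```python
def fireable(actor, edges, tokens):
    # An actor may fire when every input edge holds at least the number
    # of tokens the actor consumes per firing.
    return all(tokens[(a, b)] >= c for a, b, p, c in edges if b == actor)

def build_schedule(edges, q, priority):
    tokens = {(a, b): 0 for a, b, _, _ in edges}   # FIFOs start empty
    remaining = dict(q)
    schedule = []
    progress = True
    while progress and any(remaining.values()):
        progress = False
        for actor in priority:
            if remaining[actor] and fireable(actor, edges, tokens):
                for a, b, p, c in edges:           # consume and produce
                    if b == actor:
                        tokens[(a, b)] -= c
                    if a == actor:
                        tokens[(a, b)] += p
                remaining[actor] -= 1
                schedule.append(actor)
                progress = True
                break
    # Deadlock before completing the repetition vector => no valid schedule.
    return "".join(schedule) if not any(remaining.values()) else None

edges = [("A", "B", 2, 1), ("B", "C", 1, 1), ("A", "C", 2, 1)]
q = {"A": 1, "B": 2, "C": 2}
print(build_schedule(edges, q, "ABC"))  # ABBCC
```

With an A-then-B-then-C priority and a restart after each firing, this yields ABBCC, one of the two minimal valid schedules on the earlier slide.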

86. From Schedule to Implementation
- Static scheduling is used for:
  - behavioral simulation of DF (extremely efficient)
  - code generation for DSPs
  - HW synthesis (Cathedral by IMEC, Lager by UCB, ...)
- Issues in code generation
  - execution speed (pipelining, vectorization)
  - code size minimization
  - data memory size minimization (allocation to FIFOs)
  - processor or functional unit allocation

87. Compilation Optimization
- Assumption: code stitching (chaining custom code for each actor)
- More efficient than a C compiler for DSPs
- Comparable to hand-coding in some cases
- Explicit parallelism, no artificial control dependencies
- Main problem: memory and processor/FU allocation depends on scheduling, and vice-versa

88. Code Size Minimization
- Assumptions (based on DSP architecture):
  - subroutine calls are expensive
  - fixed-iteration loops are cheap ("zero-overhead loops")
- Absolute optimum: single appearance schedule
  - e.g. ABCBC -> A (2BC), ABBCC -> A (2B) (2C)
  - may or may not exist for a given SDF graph...
  - buffer minimization relative to single appearance schedules (Bhattacharyya '94, Lauwereins '96, Murthy '97)

89. Buffer Size Minimization
(graph: A and B each produce 1 token per firing for C, which consumes 10 from each input and produces 1 for D, which consumes 10)
- Assumption: no buffer sharing
- Example: q = |100 100 10 1|^T
- Valid SAS: (100 A) (100 B) (10 C) D
  - requires 210 units of buffer area
- Better (factored) SAS: (10 (10 A) (10 B) C) D
  - requires 30 units of buffer area, but...
  - requires 21 loop initiations per period (instead of 3)
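The two buffer areas quoted above can be checked by simulating peak queue occupancy. The topology here is an assumption reconstructed from the repetition vector and the quoted numbers (A and B each feed C, which feeds D, with C and D consuming 10 tokens per firing):

```python
# Peak per-edge queue occupancy over a flat firing sequence, summed over
# all edges (no buffer sharing, as assumed above).
def peak_buffers(firing_seq, edges):
    tokens = {e[:2]: 0 for e in edges}
    peak = dict(tokens)
    for actor in firing_seq:
        for a, b, p, c in edges:
            if b == actor:
                tokens[(a, b)] -= c        # consume on input edges
            if a == actor:
                tokens[(a, b)] += p        # produce on output edges
            peak[(a, b)] = max(peak[(a, b)], tokens[(a, b)])
    return sum(peak.values())

edges = [("A", "C", 1, 10), ("B", "C", 1, 10), ("C", "D", 1, 10)]
flat = "A" * 100 + "B" * 100 + "C" * 10 + "D"    # (100 A)(100 B)(10 C) D
nested = ("A" * 10 + "B" * 10 + "C") * 10 + "D"  # (10 (10 A)(10 B) C) D
print(peak_buffers(flat, edges), peak_buffers(nested, edges))  # 210 30
```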

90. Dynamic Scheduling of DF
- SDF is limited in modeling power
  - no run-time choice
    - cannot implement Gaussian elimination with pivoting
- More general DF is too powerful
  - non-static DF is Turing-complete (Buck '93)
    - bounded-memory scheduling is not always possible
- BDF: semi-static scheduling of special "patterns"
  - if-then-else
  - repeat-until, do-while
- General case: thread-based dynamic scheduling (Parks '96: may not terminate, but never fails if feasible)

91. Example of Boolean DF
- Compute the absolute value of the average of n samples
(diagram: a BDF network built from adders, a counter comparison >n, a sign test <0, and T/F-controlled switch/select actors)

92. Example of General DF
- Merge streams of multiples of 2 and 3 in order (removing duplicates)
- Deterministic merge (no "peeking")
(diagram: actors *2 and *3, each with a dup self-loop, feeding an ordered-merge actor with output O)

  a = get(A)
  b = get(B)
  forever {
    if (a < b)      { put(O, a); a = get(A) }
    else if (a > b) { put(O, b); b = get(B) }
    else            { put(O, a); a = get(A); b = get(B) }
  }
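The same merge runs directly with Python generators standing in for blocking get/put (an illustrative translation, not a DF tool's syntax): emit the smaller head, advance that stream, and advance both on a tie to drop the duplicate.

```python
# Streams of multiples of 2 and 3, merged in increasing order without
# duplicates; next() plays the role of a blocking get().
def multiples(k):
    n = k
    while True:
        yield n
        n += k

def ordered_merge(A, B):
    a, b = next(A), next(B)
    while True:
        if a < b:
            yield a; a = next(A)
        elif a > b:
            yield b; b = next(B)
        else:                       # equal: emit once, advance both
            yield a; a, b = next(A), next(B)

m = ordered_merge(multiples(2), multiples(3))
print([next(m) for _ in range(8)])  # [2, 3, 4, 6, 8, 9, 10, 12]
```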

93. Summary of DF Networks
- Advantages:
  - easy to use (graphical languages)
  - powerful algorithms for
    - verification (fast behavioral simulation)
    - synthesis (scheduling and allocation)
  - explicit concurrency
- Disadvantages:
  - efficient synthesis only for restricted models
    - (no input or output choice)
  - cannot describe reactive control (blocking read)

94. Outline
Part 3: Models of Computation
- FSMs
- Discrete Event Systems
- CFSMs
- Data Flow Models
- Petri Nets
- The Tagged Signal Model

