Presentation is loading. Please wait.

Presentation is loading. Please wait.

 R. Ernst, TU Braunschweig 1 Embedded System Modeling Part 2 R. Ernst TU Braunschweig.

Similar presentations


Presentation on theme: " R. Ernst, TU Braunschweig 1 Embedded System Modeling Part 2 R. Ernst TU Braunschweig."— Presentation transcript:

1  R. Ernst, TU Braunschweig 1 Embedded System Modeling Part 2 R. Ernst TU Braunschweig

2  R. Ernst, TU Braunschweig 2 Lecture Overview 1Introduction and motivation 2Application modeling 3Target architecture modeling Part 1 Part 2

3  R. Ernst, TU Braunschweig 3 3 Target architecture modeling implementation language Simulink subsystem 2 input language 2 subsystem 3 1 IP UML application development architecture layer application layer application „articulation point“ Part 1: application models Part 2: architecture models implementation

4  R. Ernst, TU Braunschweig 4 Target architecture model application implementation language Simulink subsystem 2 input language 2 subsystem 3 1 IP UML application development architecture layer application layer application „articulation point“ Part 1: application models Part 2: architecture models implementation design under target architecture estimations analysis and result back annotation

5  R. Ernst, TU Braunschweig 5 Architecture modeling purpose implementation verification performance evaluation power consumption evaluation cost evaluation (not part of this lecture)

6  R. Ernst, TU Braunschweig 6 Implementation verification correct implementation of specified function HW/SW co-simulation (CVE, CoWare, CoCentric, VCC), verification correct target architecture parameters processor and communication performance adherence to timing requirements sufficient memory size no run-time dependent dead-locks general design problem challenge to heterogeneous ES design

7  R. Ernst, TU Braunschweig 7 Models for architecture evaluation performance evaluation –performance depends on component types and parameters system architecture clock frequency executed clock cycles transmitted data power consumption evaluation –power consumption depends on component types and parameters component clock frequency component voltage component and system state (sleep,...) executed clock cycles transmitted data architecture application architecture application

8  R. Ernst, TU Braunschweig 8 Embedded architectures different processing element types different interconnection networks and communication protocols different memory types different scheduling and synchronization strategies complex SW architecture M CoP M M PDSP M P

9  R. Ernst, TU Braunschweig 9 Complex run-time interdependencies run-time dependencies of independent components via communication influence on timing and power IP M P M P M

10  R. Ernst, TU Braunschweig 10 ES verification - state of the art current approach: Target architecture co-simulation  combines functional and performance validation  reuse component validation pattern for system integration and function test  reuse application benchmarks for target architecture function validation  visualization of system execution –extensive run-times reduction using models with different levels of abstraction

11  R. Ernst, TU Braunschweig 11 Model abstractions signal abstraction –analog signal, digital signal, value, tagged token, token temporal abstraction –continuous time, discrete time, clock cycle, system event, partially ordered events functional abstraction –logic, RTL, algorithm programming abstraction –instruction set, assembly code, programming language, programming objects see also: VSIA system level design model taxonomy, 98,

12  R. Ernst, TU Braunschweig 12 Co-simulation with different abstraction different component abstraction levels require different simulation techniques co-simulation principle e.g. [Gosh 95] –communication via shared variables (here: nets) –interface requires model adaptation (cp. “type casting” in SW) simulator 1 simulator n global simulation engine mindelay () maxdelay () schedule sim () report_internal_time () update_nets ()... run_until_time ()...

13  R. Ernst, TU Braunschweig 13 Co-simulation limitations identification of system performance corner cases –different from component performance corner cases –target architecture behavior unknown to the application function developer (cp. functional HW test)  test case definition and selection ? analysis of target architecture –confusing variety of run-time interdependencies –data dependent “transient” run-time effects –mixed in co-simulation  limited support of design space exploration  debugging challenge inclusion of incomplete application specifications  additional performance models required

14  R. Ernst, TU Braunschweig 14 Lecture part 2 objective better understanding of target architecture run-time effects propose models to improve and formalize analysis

15  R. Ernst, TU Braunschweig 15 Target architecture evaluation given –an application and its environment modeled by a set of communicating processes –a heterogeneous HW/SW target architecture –an implementation of the processes on the architecture model and evaluate –the target architecture information flow –system timing P P P M CoP M M PDSP M P

16  R. Ernst, TU Braunschweig 16 architecture component timing subsystem timing system timing IP M PMP M P1P1 P2P2 Timing parameters environment model

17  R. Ernst, TU Braunschweig 17 architecture component timing –process execution timing –communication timing IP M P MP M P1P1 P2P2 Architecture component modeling&analysis processing elements memories interconnect

18  R. Ernst, TU Braunschweig 18 Processing elements processing elements –fully programmable components (processors) –weakly programmable coprocessors components with selectable, predefined control sequences, possibly with chaining (FP coprocessor, graphics processor, DMA,...) –hard coded function components DSPRISC CC CAN bus interface ADC multi-channel module image coprocessor VLIW VLD coprocessor

19  R. Ernst, TU Braunschweig 19 Processing element timing processing element timing and communication determined by execution path –control data dependent –input data dependent function implementation –component architecture –software platform –compiler or synthesis if... then...else... for {.....}

20  R. Ernst, TU Braunschweig 20 Processing element timing - 2 process timing can be evaluated by –simulation/performance monitoring, e.g. using break points stimuli, e.g. from component design data dependent execution  upper and lower timing bounds –simulation challenges coverage? cache and context switch overhead due to run-time scheduling with process preemptions influence of run-time scheduling depending on external event timing –formal analysis of individual process timing serious progress in recent years  process timing can be approximated or just estimated (cp. VCC processor models)

21  R. Ernst, TU Braunschweig 21 Formal execution path timing analysis execution path timing then... else { send(..); receive (...);... } for {.....} if... b1b1 b2b2 b3b3 b4b4 F b i basic block (Li/Malik) or program segment (Ye/Ernst) t pe (b i,pe j ) execution time of b i on processing element pe j c(b i )execution frequency of b i path dependent worst/best case timing bounds solved e.g. as ILP problem with flow analysis (e.g. Li/Malik, Wolf/Ye/Ernst)

22  R. Ernst, TU Braunschweig 22 Implementation influence in t pe (b i ) t pe (b i, pe j ) determined by –processing element architecture –compiler / HW synthesis –software platform t pe (b i, pe j ) analysis can use –instruction execution table –abstract execution model –local b i simulation cache effects must be added requires compiler code analysis

23  R. Ernst, TU Braunschweig 23 Process communication execution path communication then... else { send(..); receive (...);... } for {.....} if... b1b1 b2b2 b3b3 b4b4 F s(b i )sent data in b i r(b i )received data in b i worst/best case communication analysis solved e.g. as ILP problem (Wolf/Ernst)

24  R. Ernst, TU Braunschweig 24 Implementation influence in r(b i ) and s(b i ) r(b i ) and s(b i ) determined by –data volume –data encoding –communication protocol

25  R. Ernst, TU Braunschweig 25 Interconnect timing interconnect timing can be evaluated by –simulation, cp. process element timing –statistical load data –simple formal models, e.g. for TDMA (e.g. MicroNetwork (Sonics))  process timing can be approximated or just estimated

26  R. Ernst, TU Braunschweig 26 Formal interconnect analysis word transfer packet transfer (simplified: fixed length pl) pe i pe k ce j send(..)receive(..)

27  R. Ernst, TU Braunschweig 27 Memories SRAM –equal access time –no control overhead FLASH, ROM, EPROM, EEPROM –asymmetric read and write –similar to SRAM otherwise DDRAM, RDRAM, SDRAM, SRAM,... –multiple banks –burst access (packets) Cache –various control mechanisms –burst access to background (cache lines) SDRAM SRAM Flash RAM I$ D$

28  R. Ernst, TU Braunschweig 28 Cache miss overhead cache miss overhead –data cache cm d –program cache cm p pe k SDRA M cm d cm p Flash RAM c dmiss (b i ): number data cache misses in b i c pmiss (b i ): number program cache misses in b i miss analysis uses data flow analysis, abstract interpretation,... pe i I$D$

29  R. Ernst, TU Braunschweig 29 Improved process timing model state and „context“ consideration modeled as process “mode” example: image filter

30  R. Ernst, TU Braunschweig 30 Image filter example image filter process –receives packet with header and image data –performs address match verification –filters picture data –forwards filtered picture data to pe 2 execution contexts (picture size): not considered, large picture, small picture execution contexts (address): not considered, address miss, address match pe 1 header image data filter algorithm pe 2 header

31  R. Ernst, TU Braunschweig 31 [ 25.0, 25.0 ] [ 0, 0 ] [ 24.4, 24.4 ] [ 6.2, 6.2 ][ 6.2, 6.2 ] [ 0, 0 ][ 5.9, 5.9 ][ 6.2, 25.0 ] [ 0, 0 ][ 5.9, 24.4 ] [ 25.0, 25.0 ] [ 0, 24.4 ] [ 6.2, 6.2 ] [ 0, 5.9 ] [ 6.2, 25.0 ] [ 0, 24.4 ] Image filter results Tight send and receive data rate intervals [min, max] of the filter process (results: SYMTA) Size notRec consideredSnd Large PictureRec Snd Small PictureRec Snd Receive Data [kB]Address not AddressAddress Send Data [kB] consideredmissmatch

32  R. Ernst, TU Braunschweig 32 [ 20, 39 ][ 265, 572 ] [ 6, 13 ][ 38, 64 ] [ 6, 40 ][ 38, 681 ] [ 19, 572 ] [ 5, 67 ] Size not considered Large Picture Small Picture Timing [ms]Address notAddressAddress consideredmissmatch [ 5, 681 ] Image filter intervals Timing intervals [min, max] for StrongARM architecture incl. caches (results: SYMTA)

33  R. Ernst, TU Braunschweig 33 Timing and communication model What timing and communication model is appropriate? –worst case? –min/max (interval)? –typical? –statistics?  more information needed

34  R. Ernst, TU Braunschweig 34 Subsystem & component summary analysis of individual process timing as a first step –used as a basis for activation and resource sharing model –approach borrowed from RTOS include local memories and local communication

35  R. Ernst, TU Braunschweig 35 Process execution modeling component data required for execution modeling –process execution time t pe (F, pe j ) –communication load r, s –communication timing t com (s, ce k ) –process activation function environment model

36  R. Ernst, TU Braunschweig 36 Component parameter model commercial model: VCC, based on communicating CFSMs research example: SPI - System Property Intervals –SPI Workbench as environment (TU Braunschweig, ETH Zurich, Univ. Paderborn)

37  R. Ernst, TU Braunschweig 37 architecture component timing –process execution timing –communication timing –activation -> environment model IP M P MP M P1P1 Timing parameters - environment environment model P2P2

38  R. Ernst, TU Braunschweig 38 Environment model periodic events periodic events with jitter events with minimum inter arrival times –burst events, packets, sporadic events, etc. t ei typically timer released tptp tptp t e1 t e2 t e3 t p - j/ 2 t e1 t e2 t e3 t int t min t e1 t e2 t e3 t en t en+1

39  R. Ernst, TU Braunschweig 39 Execution modeling summary event model defines the frequency and context of process activation event frequency given as interval, worst case or statistical value

40  R. Ernst, TU Braunschweig 40 Chip Bus core RTOS Software architecture I/OInt Bus- CTRL timer drivers RTOS-APIs application periphery cache mem private shared hardware software architecture application layered software architecture with API consider function call hierarchy ce 1 pe 1

41  R. Ernst, TU Braunschweig 41 architecture component timing –process execution timing –communication timing –activation -> environment model –timing includes all HW & SW components for process execution M P MP IPM environment model Timing parameters - SW architecture P1P1 core RTOS I/OInt Bus- CTRL timer drivers RTOS-APIs include software architecture

42  R. Ernst, TU Braunschweig 42 include SW architecture in process model resolve APIs, drivers, OS calls, memory accesses, etc. include multi-hop communication Release Airbag Application example Actuator chip-bus Sensor RTOS I/Oint bus- CTRL timer core drivers RTOS-APIs application cache MEM RTOS core drivers RTOS-APIs application I/Oint bus- CTRL timer cache MEM Crash P ctrl C P sens. P detc C C P act. C C reaction time of airbag after crash ? t sens t crash + t csens + t detc + t fbus + t cact + t airbag + t act ++ t ctrl t act + t airbag + physical delay t sens + t crash + physical delay

43  R. Ernst, TU Braunschweig 43 Actuator chip-bus Sensor RTOS I/Oint bus- CTRL timer core drivers RTOS-APIs application cache MEM RTOS core drivers RTOS-APIs application I/Oint bus- CTRL timer Release Airbag Timing refinement Crash P ctrl C P sens. P detc C C P act. C C Reaction time of airbag after crash ? t com + t drv = t API + t process + t API = t drv + t com + t drv = t API + t process + t API = t drv + t com = t sens t crash + t csens + t detc + t fbus + t cact + t airbag + t act ++ t ctrl t act + t airbag + physical delay t sens + t crash + physical delay t com + t drv t API + t process + t API t drv + t com + t drv t API + t process + t API t drv + t com t exe t com = f (code, core, cache, mem, etc.) = f (amount of data, width, speed, etc.) cache MEM

44  R. Ernst, TU Braunschweig 44 Modeling shared resources resource sharing requires –resource arbitration - scheduling static or dynamic order of processes preemptive (interrupt) or non-preemptive („run to completion“) –context switching on pe: process context switch on ce: packet or connection setup overhead in memory: similar to ce –context switching time t csw can be determined at design time –resource arbitration effects are run-time dependent

45  R. Ernst, TU Braunschweig 45 architecture component timing subsystem timing –resource sharing process scheduling communication scheduling Timing parameters - resource sharing IPM M PMP environment model P1P1 P2P2

46  R. Ernst, TU Braunschweig 46 Scheduling strategies static execution order time driven scheduling –fixed –dynamic priority driven scheduling –static priority assignment –dynamic priority assignment efficiency depends on environment model (periodic, jitter, burst)

47  R. Ernst, TU Braunschweig 47 Static execution order scheduling C1C1 P3P3 P1P1 P2P2 tptp t p : scheduling period P1P1 P3P3 C1C1 P4P4 C2C2 P5P5 P2P2 pe 1 pe 2 ce 1 pe 2 ce 1 architecture example P5P5 C2C2 P4P4 tcsw

48  R. Ernst, TU Braunschweig 48 Static execution order scheduling - 2 execution time: sum of process times + t CSW + t com best suited if timing and control are input data independent supports –interleaved resource utilization –buffer size optimization –compiler optimization across processes –process context exploitation application example: –DSP P1P1 P2P2 P 12 merged processes

49  R. Ernst, TU Braunschweig 49 Static execution order scheduling - 3 different event timing models –periodic input events  periodic output events with jitter –sporadic events  sporadic output events static order problems –dynamic environments: jitter, bursts –deadline requirements –processes/communication with context dependent timing

50  R. Ernst, TU Braunschweig 50 Time driven scheduling time division multiple access (TDMA) –periodic assignment of fixed time slots –applicable to pe or ce P1P2P3P4P1P2P3P4 t pTDMA P1P1 P 1 -P 4 P2P2 P4P4 P3P3 12 t P1 t P4 13 t pTDMA process preempted TDMA example t P2 10 t P

51  R. Ernst, TU Braunschweig 51 TDMA predictable and independent performance down scaling allows to merge individual solutions time slot size adaptable to different service levels supports all input event timing models generates output jitter as a result of execution times

52  R. Ernst, TU Braunschweig 52 TDMA predictable and independent performance made TDMA very popular –safety critical automotive systems: TTP, Flexray –Sonics SiliconBackplane µNetworks problems –utilization –extended deadlines

53  R. Ernst, TU Braunschweig TDMA scheduling example 12 P1P1 P2P2 P4P4 P3P P 1 -P 4 P2P2 t P1,response = t 10 idle resource t pTDMA Scheduling and idle times in TDMA t CSW omitted for simplicity

54  R. Ernst, TU Braunschweig 54 Dynamic time driven scheduling example: Round Robin scheduling –cyclic process execution –resource released when task is finished or not activated t P1,response = P1P1 P2P2 P4P4 P3P P 1 -P t t RR (1) t RR (2) t RR (3) cycle 1cycle 2cycle 3 Round Robin example

55  R. Ernst, TU Braunschweig 55 Round Robin scheduling no idle times - higher efficiency than TDMA guaranteed minimum resource assignment per process –appropriate e.g. for soft deadlines (“best effort”) and QoS requirements supports all input event timing models creates output jitter and possibly output bursts problems –process execution interdependency –timing analysis more difficult than TDMA

56  R. Ernst, TU Braunschweig 56 Priority driven scheduling Static priority assignment –model 1 multi rate periodic input events with jitter and deadlines at end of period –typical scheduling approach: Rate-monotonic scheduling (RMS, priority decreases with period) –optimal solution for single processors (Liu, Layland) –model 2 like model 1 but with process dependencies –solution for multiprocessors based on RMS: Yen, Wolf –model 3 like model 1 but arbitrary deadlines –timing analysis solution by Lehoczky –iterative algorithm for priority assignment (Audsley) –analysis for jitter and bursts (single processor) by Tindell

57  R. Ernst, TU Braunschweig 57 Model 3 - arbitrary deadlines scheduling with arbitrary deadlines - may create output bursts for periodic input events busy period T1T1 T2T2 T2T2 T2T2 T2T2 P3P3 P1P1 P2P2 priority Static priority scheduling with arbitrary deadlines

58  R. Ernst, TU Braunschweig 58 Burst generation P3P3 P1P1 burst P2P2 jitter burst priority output eventsinput events Output bursts in static priority scheduling with arbitrary deadlines and periodic input events

59  R. Ernst, TU Braunschweig 59 Dynamic priority assignment priority assignment at run time optimal priority assignment: Earliest Deadline First (EDF) EDF adapts to input event timing problem: –requires run-time scheduler task  overhead –requires deadline determination (e.g. Ziegenbein, ICCAD 2000) –difficult to analyze - active research topic (e.g. load models)

60  R. Ernst, TU Braunschweig 60 Resource sharing - summary individual component and process data required formal analysis for numerous scheduling strategies, event models and constraints available (even for manual calculation) results can be used to calculate –communication and processor load –timing, e.g. response times, output event models –memory requirements –power consumption

61  R. Ernst, TU Braunschweig 61 architecture component timing subsystem timing system timing –timing of coupled subsystems Architecture and global analysis M PMP environment model P1P1 P2P2 SB 1 schedule SB 2 schedule SB 1 SB 2 IPM

62  R. Ernst, TU Braunschweig 62 Subsystem coupling independently scheduled subsystems are coupled by data flow  subsystems coupled by event streams  coupling corresponds to event propagation SB 1 scheduling SB 1 P2P2 P1P1 SB 2 scheduling SB 2 P4P4 P3P3

63  R. Ernst, TU Braunschweig 63 System timing analysis approaches analysis scope extension to several subsystems event model generalization for a set of scheduling strategies event model adaptation

64  R. Ernst, TU Braunschweig 64 Analysis scope extension coherent analysis („holistic“ approach) example: Tindell 94, Pop/Eles (DATE 2000): TDMA + static priority problem: large number of combinations possible! P2P2 P1P1 T TTP bus interface P3P3 P4P4 D TTP bus interface queue RTOS TTP bus (TDMA) static priority process scheduling static priority queueing T: Transmitter process

65  R. Ernst, TU Braunschweig 65 ParseSearchModifySchedule paket processing characteristics –dynamic load –interleaved flows of packets –flow-specific task chains –event bursts –variable data sizes Event model generalization Example: Network processor design (Thiele et al.)

66  R. Ernst, TU Braunschweig 66 Network process generalized event model Arrival/service curves (L. Thiele) SB 1 scheduling SB 1 P2P2 P1P1 upper bound number of packets in any interval of length 4 number of packets in time interval [0,4] lower bound number of packets in any interval of length 4 arrival service  u ( 

67  R. Ernst, TU Braunschweig 67 Load analysis with interval event model Load analyis: Addition propagation of computation and load intervals - min/max algebra (time dependent) e.g. buffer size: difference between input load and processed events HW and SW resources resource bounds input stream bounds remaining resourcesprocessed packet streams source: L. Thiele, ETH Zurich L. Thiele, ETH Zurich

68  R. Ernst, TU Braunschweig 68 Event model adaptation idea: –leverage on the many efficient scheduling strategies available for different domains –utilize their corresponding analysis techniques –adapt their event models for combination –combine local results to global analysis SB 1 scheduling SB 1 P2P2 P1P1 SB 2 scheduling SB 2 P4P4 P3P3 output event models input event models

69  R. Ernst, TU Braunschweig 69 Event model adaptation - 2 adaptation for compatible input and output event models (Richter 02) –simple mathematical transformation to adapt analysis Event Model InterFace (EMIF) –no added function, no run-time effect

70  R. Ernst, TU Braunschweig 70 Event model adaptation - 3 compatibility is non symmetric –example: output model periodic  input model jitter compatible (jitter =0) output model jitter  input model periodic incompatible compatible model adaptation can be „lossy“ –some of the event bounds get lost –example: output model periodic  output model sporadic  only minimum event distance is translated (= period), maximum event distance is lost

71  R. Ernst, TU Braunschweig 71 Event model adaptation - 4 compatible event models compatible lossy

72  R. Ernst, TU Braunschweig 72 Event model adaptation - 5 adaptation for non-compatible input and output event models –insert Event Adaptation Function (EAF) to transform models –EAF describes interface buffer function  shows that buffer is required to couple subsystems pe 1 pe 2 transformation: T 2 = T 1 /n periodic with burstperiodic EAF T1T1 e1e1 enen e n+1 T2T2 e1e1

73  R. Ernst, TU Braunschweig 73 BUS CPU 2 CPU 1 CPU 4 CPU 3 P1P1 P2P2 C2C2 C1C1 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 burst sporadic periodic EMIF EAF EMIF EAF EMIF EAF EMIF EAF lossless lossy periodic burst Identification of complex interdependencies X-talk

74  R. Ernst, TU Braunschweig 74 Putting it all together buses modeled as communication tasks bus C2C2 C1C1 P1P1 P5P5 P2P2 P6P6 P7P7 P3P3 P8P8 P4P4 EAF

75  R. Ernst, TU Braunschweig 75 Finally - timing and communication model What timing and communication model is appropriate? –worst case? –min/max (interval)? –typical? –statistics?

76  R. Ernst, TU Braunschweig 76 Timing and communication model - 2 statistical or „typical“ timing and communication load –summarizes context dependent timing and communication variations typical or average communication load and time can simply be accumulated replaces discrete simulation by statistical analysis –challenges: (buffer) memory overflow not fully covered communication link overload not fully covered critical deadlines not covered  not safe

77  R. Ernst, TU Braunschweig 77 Timing and communication model - 3 worst case timing and communication load –conservative –risk of overdesign precise worst case required combination with context dependent execution useful but: worst case timing alone is not safe, too!

78  R. Ernst, TU Braunschweig 78 Worst-case-timing-only problems bus load calculation –faster process execution causes faster output event sequence  higher transient bus load (burst)  minimum execution time defines maximum bus load multiprocessor scheduling anomaly fast process blocks critical process  only interval models for communication and time are safe

79  R. Ernst, TU Braunschweig 79 Conclusions HW/SW platforms show complex run-time behavior due to optimized resource sharing implementation creates dependencies which are not reflected in the system function influence on timing and memory usage may compromize correct system behavior simulation for performance analysis is increasingly inadequate proposed systematic approach to platform analysis exploiting knowledge from RTOS reviewed different techniques to investigate subsystem compositions techniques can also be used for estimation

80  R. Ernst, TU Braunschweig 80 Acknowledgement The following persons contributed in developing this tutorial –Bettina Boettger –Marek Jersak –Kai Richter –Fabian Wolf –Dirk Ziegenbein

81  R. Ernst, TU Braunschweig 81 Literature see:

82  R. Ernst, TU Braunschweig 82 SPI - Parameters  data intervals on communication channels d C,init, d C  process execution modes consider execution context  activation functions A P 1, A P 2  latency time intervals lat P 1, lat P 2, lat C lat P 1 lat P 2 lat C  data rate intervals s C, r C d C,init, d C AP1AP1 AP2AP2 LCLC P1P1 P1P1 P2P2 P2P2 C C sCsC rCrC process mode dependent


Download ppt " R. Ernst, TU Braunschweig 1 Embedded System Modeling Part 2 R. Ernst TU Braunschweig."

Similar presentations


Ads by Google