
1 Predictable Design of Embedded Systems using Networked Architectures Henk Corporaal www.ics.ele.tue.nl/~heco ASCI Winterschool on Embedded Systems Rockanje, March 2006

2 ASCI Winterschool 2006, Henk Corporaal (2)
Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
- Proposed design flow
- Open issues
Note: this lecture is not about a solved problem

3 Outline
- Trends and design problems
  - Embedded systems everywhere
  - Design practice
  - Design complexity
  - Memory wall
- Unpredictability
- Platforms
- Predictable design
- Design flow
- Open issues

4 Embedded systems everywhere
- Convergence of the 3 Cs: computers, communications and consumer electronics
- The computer enters its 3rd phase: computing power, networking, intelligent processing
- The world is one network: all information and communication available wherever, whenever
We get a smart environment

5 Design practice: Informal system specification
[Figure labels: System people, Paper spec; Software people (C, ASM); Hardware people (VHDL, Verilog); Integration task]

6 Design practice
[Figure: Y-chart (Gajski-Kuhn) with axes Behavioral specification, Structure description and Physical realization; levels System, Algorithm, R/T, Logic, Circuit]
- A design flow is a path in the Y-chart
- Until RT level, the flow is largely manual

7 Design complexity problem
[Figure: complexity (log scale, 10^1 to 10^3) versus year (4, 8, 12, 16), showing the HW gap and the SW gap]
- Process technology: +58%
- HW design productivity: +21%
- SW productivity: +8%
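To make the size of the two gaps concrete, the following sketch compounds the three growth rates from the slide. It assumes the percentages are annual rates and that all curves start from the same point; the function name `relative_gap` and the 10-year horizon are illustrative choices, not part of the lecture.

```python
def relative_gap(years, fast_rate, slow_rate):
    """Factor by which the faster-growing quantity outpaces the slower one
    after `years`, assuming both start equal and compound annually."""
    return ((1 + fast_rate) / (1 + slow_rate)) ** years

# Growth rates from the slide (assumed to be per year).
TECHNOLOGY = 0.58
HW_PRODUCTIVITY = 0.21
SW_PRODUCTIVITY = 0.08

hw_gap = relative_gap(10, TECHNOLOGY, HW_PRODUCTIVITY)
sw_gap = relative_gap(10, TECHNOLOGY, SW_PRODUCTIVITY)
print(f"HW gap after 10 years: about {hw_gap:.0f}x")
print(f"SW gap after 10 years: about {sw_gap:.0f}x")
```

Because the rates compound, even a modest difference in yearly growth turns into an order-of-magnitude gap within a decade.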

8 Hitting the memory wall
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2005); CPU curve labelled "Moore's Law" above the DRAM curve] [Patterson]
- µProc: 55%/year
- DRAM: 7%/year
- Processor-memory performance gap: grows 50%/year
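The two growth rates on this slide can be related to the quoted gap growth with a few lines of arithmetic. This is only a sketch under the assumption of pure compound growth; the variable names are illustrative.

```python
import math

CPU_GROWTH = 0.55   # per year, from the slide
DRAM_GROWTH = 0.07  # per year, from the slide

# The gap is the ratio of the two performance curves, so its yearly growth
# factor is the ratio of the two yearly growth factors.
gap_factor = (1 + CPU_GROWTH) / (1 + DRAM_GROWTH)
gap_growth_percent = (gap_factor - 1) * 100

# Number of years after which the gap has doubled.
doubling_time = math.log(2) / math.log(gap_factor)

print(f"gap grows about {gap_growth_percent:.0f}% per year")
print(f"gap doubles about every {doubling_time:.1f} years")
```

Under these assumptions the ratio grows by roughly 45% per year, the same order as the rounded 50% quoted on the slide.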

9 Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
- Proposed design flow
- Open issues

10 Unpredictability at all levels
[Figure levels: applications, architectures, DSM VLSI design]
Uncertainty increases at all levels

11 Application: Two forms of unpredictability
[Figure: video processing task graph with two inputs (Video In1, Video In2), each followed by NR, HSRC and VSRC; a text generator (Txt gen); memories (mem); mixers (mix); and 100Hz, Peak and Matrix stages]
- Applications can be data dependent
- Applications may have different scenarios

12 In addition: dynamically changing set of applications
Multi-standard modem operation
- Several applications have to be activated simultaneously
- Too many combinations for an analysis at design time (non-deterministic events)
[Figure: compute load (25 to 125) versus time across the phases Initial acquisition, UMTS connected, UMTS connected / WLAN acquisition, Inter-system handover, WLAN connected / UMTS monitoring. Tasks shown: CPICH search, SCH (= SCH search), RAKE sym-rate proc., RAKE chip-rate processing, WLAN acquisition, WLAN receiver] [Philips EVP]

13 Architecture unpredictability
[Figure labels: cpu with caches ($), IP, mem, interconnect, arb., ext. mem]
Local schedulers:
- OS: task switching, interrupts
- Cache strategy: cache pollution
- Interconnect: busses, bridges, networks
- Memory controllers: external memory
e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, …
What is the global (end-to-end) behavior, composed of interacting local solutions?

14 DSM VLSI Unpredictability
- Global wiring delay becomes dominant over gate delay (timing closure)

15 DSM VLSI Unpredictability
Other DSM problems:
- Clock distribution, skew
- VDD and VSS voltage drop
- Signal integrity, cross-talk
- Variance in process parameters increases
[Figure: length of the isochronous zone as a function of frequency]

16 Unpredictability: Design Closure problems
Design closure = a realization meets all requirements, including functionality, speed, power, area, yield, etc., without design iterations
[Figure labels: application, architecture, mapping & scheduling, placement & routing, FPGA realization, VLSI realization]
Closure problem at all levels

17 Unpredictability: Design Closure problems
[Figure: computational requirements (0% to 1200%) versus time; label: Orders of Magnitude]
Mapping with performance guarantees looks impossible!

18 Solution ingredients:
- Higher abstraction levels
- SW and HW IP reuse / PnP principle
  - Standards
- Avoid large design iterations
  - Design correct by synthesis
- Avoid worst-case resource requirements
How do we achieve all of this?

19 Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
- Design flow
- Open issues

20 What is a platform?
Definition: A platform is a generic, but domain-specific, information processing (sub-)system.
Generic means that it is flexible, containing programmable component(s).
Platforms are meant to quickly realize your next system (in a certain domain).
Single chip?

21 Platforms, why?
- Reuse
- Short time-to-market
- High quality
- Flexible and programmable
- Large software component
- Standardization
- Optimized for a specific domain
And you do not have to solve this design closure problem!

22 Platforms separate the design communities!
[Figure labels: Applications; Platform; Enabling technologies; Design technology: SDT (system design technology), PDT (platform design technology)]

23 Platform examples: Digital camera, Sanyo [Okada99]

24 TI OMAP
[Figure labels:]
- Up to 192Mbyte off-chip memory
- 64Kb dual port (8x 4K x 16b)
- 96Kb single port (12x 4k x 16b)
- 32Kb ROM
- 16Kb (2-way)
- 8Kb mem (2x 4K)
- 8Kb data cache (2-way, 512 lines of 16 bytes)
- Write buffer (17 elements)
- 16Kb (2-way)
- 192Kbyte shared SRAM

25 SpaceCake (Philips Research)
- Homogeneous: set of equal tiles
- Per tile, e.g.:
  - n * MIPS
  - m * TriMedia
  - Accelerators
  - k * L2 cache bank
  - Shared memory
  - Cache coherency
  - Big interconnect switch
- Inter-tile:
  - Router
  - Message passing
  - Working on inter-tile cache coherence
[Figure: single tile with switch and L2 cache memory banks]

26 IMAGINE Stream Processor (Stanford)
- IMAGINE = SIMD of VLIWs
- It is controlled by a host processor, which sends it stream instructions (load, store, receive, send, VLIW op, load microcode)

27 Hybrid FPGAs: Xilinx Virtex 4-Pro
[Figure, courtesy of Xilinx (Virtex II Pro): PowerPCs and reconfigurable logic]
- PowerPC
- Reconfigurable logic blocks
- Memory blocks & multipliers
- GHz IO: up to 16 serial transceivers

28 Fundamental platform design decisions
- Homogeneous versus heterogeneous?
- Bus versus network?
- Shared memory versus message passing?
- QoS support, guarantees built in?
- Generic versus application specific?
- What types of parallelism to support? ILP, DLP, TLP
- Focus on performance, power or cost?
- Memory organisation?
- HW or SW reconfigurable?
And further:
- OS support, middleware?
- Mapping support?

29 Homogeneous or Heterogeneous
- Homogeneous
  - replication effect
  - memory dominated anyway
  - solve realization issues once and for all
  - less flexible

30 Homogeneous or Heterogeneous
- Heterogeneous
  - more flexible
  - better fit to application domain
  - smaller increments
  - no tile reuse

31 Homogeneous or Heterogeneous
- Middle-of-the-road approach
  - Flexible tiles
  - Fixed tile structure at top level
[Figure labels: tile, router]

32 HW or SW reconfigurable?
[Figure labels: data path granularity (fine to coarse); reconfiguration time (1 cycle, loopbuffer, context, reset); configuration bandwidth; spatial mapping, temporal mapping; FPGA, VLIW; subword parallelism]

33 Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
  - Current practice
  - Predictability
  - Architecture consequences
  - Design consequences
- Design flow
- Open issues

34 How should we design?
- Trajectory, from idea to realization
- Decisions based on models
  - Abstract from implementation details (not all known yet)
  - Relatively cheap to create, validate and simulate
[Figure labels: Generate ideas, Construct models, Evaluate properties, Make design decisions; "steers"; Design problem; Design time; Idea, Concepts, Requirements, Realization]

35 Current practice
Mapping, easy, but...
- Given
  - reference C code for the application, e.g. MPEG-4 Motion Estimation
  - platform: SUPERDUPER-LX50
- Task: map the application onto the architecture
- But … wait a moment
me@work> CC -o2 mpeg4_me mpeg4_me.c
Thank you for running SUPERDUPER-LX50 compiler.
Your program uses 257321886 bytes memory, 78 Watt, 428798765291 clock cycles
[Figure labels: Idea; code fragment "a=b*5+d; for (...) {.. }"]

36 Current design process
- Post analysis: check constraints after mapping
- Simulation based
- Does it still work for other data?
- Does it still work when other applications are active?
- Too many iterations
  - Easy to program, hard to tune
- Can this be improved? E.g. constraints = input
[Figure: flow from application and constraints through mapping to an "OK?" check with yes/no branches]

37 Predictable design
What is it?
- Being able to reason at a high level about a design (in terms of functional and non-functional properties), and
- Being able to realize this design without time-consuming iterations in the design flow (design closure)
How:
- Predictable architecture
  - Making resources predictable
  - Proper modeling of less predictable elements
- Predictable design flow
  - Compositionality
  - Composability
  - Design-time analysis
- Run-time analysis

38 Making architectures predictable
- Getting rid of all unpredictable elements
- Caches? No problem, but WCET estimation may be big and unacceptable!
  - Software controlled: locked cache lines, non-cachable memory, controlled replacement
- Shared memory
- Communication

39 Making architectures predictable: NoC (Philips AETHEREAL)
[Figure: IP blocks attached through network interfaces to a network of routers (R)]
- The router provides both guaranteed throughput (GT) and best effort (BE) services to communicate with IPs.
- The combination of GT and BE leads to efficient use of bandwidth and a simple programming model.

40 Making the NoC predictable: how to support GT traffic?
Time wheel concept
- control traffic injection at the network interface
[Figure: time wheel with slots 1 to 8 over time]
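The time wheel can be read as a TDMA slot table at the network interface: a guaranteed-throughput (GT) connection may inject only in the slots it owns. The sketch below is not the AETHEREAL implementation; it assumes a single 8-slot table, and the connection names and the helper `gt_guarantees` are hypothetical. It derives two guarantees that follow from a slot assignment: the reserved fraction of bandwidth and the worst-case wait for the next owned slot.

```python
def gt_guarantees(slot_table, connection):
    """Return (bandwidth_fraction, worst_case_wait_in_slots) for a connection.

    slot_table[i] names the connection that owns slot i; None marks a slot
    left to best-effort traffic. The table repeats cyclically.
    """
    n = len(slot_table)
    owned = [i for i, owner in enumerate(slot_table) if owner == connection]
    if not owned:
        raise ValueError(f"{connection!r} owns no slot")
    bandwidth = len(owned) / n
    # Cyclic distance from each owned slot to the next owned one. Data that
    # just misses an owned slot waits at most the largest such distance.
    gaps = [
        (owned[(k + 1) % len(owned)] - owned[k] - 1) % n + 1
        for k in range(len(owned))
    ]
    return bandwidth, max(gaps)

# Hypothetical assignment of the 8 slots of the wheel.
wheel = ["A", None, "B", None, "A", None, None, "B"]
for name in ("A", "B"):
    share, wait = gt_guarantees(wheel, name)
    print(f"{name}: {share:.0%} of the bandwidth, wait <= {wait} slots")
```

Both connections reserve the same bandwidth here, but spreading the slots evenly (as for A) gives the tighter latency bound.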

41 Making the design flow predictable: Compositionality
[Figure: high-level design and low-level design, with components x, y, z, a, b, annotated "P(x,y) if [P(a,b),...] !" and "P(x,y) if [P(a,b),...] ?"]

42 Making the design flow predictable
- Design time
  - Determine upper bounds on time and resources → Pareto curves
  - Scenario discovery: separate your application into parts for which upper bounds are not too far from worst case
[Figure: frequency (Freq) versus load for scenarios Sc1, Sc2, Sc3]
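A small numerical sketch of why scenarios help: if each scenario gets its own upper bound, resources only have to be reserved at the global worst case while the worst scenario is active. The frequencies and load bounds below are invented for illustration and are not taken from the lecture.

```python
# scenario name -> (relative frequency, upper bound on load); hypothetical values
scenarios = {
    "Sc1": (0.6, 30),
    "Sc2": (0.3, 55),
    "Sc3": (0.1, 100),
}

# Without scenarios, every activation is budgeted at the global worst case.
global_bound = max(load for _, load in scenarios.values())

# With scenarios, each activation is budgeted at its own scenario's bound,
# so the long-run average reservation is the frequency-weighted sum.
average_reservation = sum(freq * load for freq, load in scenarios.values())

print(f"global worst-case bound: {global_bound}")
print(f"average reservation with scenarios: {average_reservation:.1f}")
```

The guarantee is unchanged, since every activation still runs within an upper bound, but the resources freed in the lighter scenarios can be given to other applications.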

43 What do we want? Design-time analysis
- Single application
  - Reasoning about end-to-end timing constraints (for given resources and quality) = predictability
  - Which local arbitration mechanisms are needed?
  - How to translate this to the global level?
- Example: given computing resources, bandwidth and buffer size → throughput → Pareto curve
[Figure: actors A1 to A5 mapped onto processors P1 to P4; Pareto curve of cost (resources) versus 1/throughput with point (q1,c1)]

44 Scenarios: MP3

45 What do we want? Composability
- Multiple applications
  - If app. 1 and app. 2 each fit individually, what can be said about the combination?
  - Concept of virtual platform
[Figure: actors A1 to A4 mapped onto Proc1 and Proc2]

46 Predictability: Composability
Can we add Pareto points?
[Figure: quality (Q) versus cost (resources) curves for application 1, with point (q1,c1), and application 2, with point (q2,c2); combined point (q1+q2, c1+c2)?]
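If Pareto points could simply be added, combining two applications would amount to a product-sum followed by removal of dominated points. The sketch below shows that operation under exactly the assumption the slide questions: quality and cost are additive and the applications do not interfere. The point values and the function names are hypothetical.

```python
from itertools import product

def dominates(a, b):
    """Point a = (quality, cost) dominates b if it is at least as good in
    both dimensions (higher quality, lower cost) and is not the same point."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_min(points):
    """Keep only the points that no other point dominates."""
    pts = set(points)
    return sorted(p for p in pts if not any(dominates(q, p) for q in pts))

def combine(app1, app2):
    """Add every pair of points, then drop the dominated combinations."""
    return pareto_min(
        (q1 + q2, c1 + c2) for (q1, c1), (q2, c2) in product(app1, app2)
    )

# Hypothetical (quality, cost) Pareto points of two applications.
app1 = [(1, 10), (2, 30)]
app2 = [(1, 20), (3, 40)]
print(combine(app1, app2))
```

On a real platform, shared processors, memories and interconnect make the combined cost differ from `c1 + c2`, which is why composability has to be designed into the architecture.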

47 Problem: Predictable resource utilization?
[Figure: task graphs A and B (label: 50), mapping & scheduling onto processors P1, P2, P3]

48 Problem: Predictable resource utilization?
[Figure: schedule of A and B on P1, P2, P3 over times t0 to t3 (label: 50)]
- Scheduling conflict!
- Only 50% processor utilization!
- Add ordering dependences (edges)

49 Where is the problem?
- Different orders of actors give different throughput
- The number of possibilities for the overall graph increases exponentially with the number of actors and individual graphs
- A complete analysis to obtain an optimal order is very difficult
- Different arbitration strategies are hard to model and analyze realistically

50 Problem: Too many possibilities!
[Figure labels: A, B, C]

51 So, what is Composability?
- The degree to which we can analyze the applications in isolation: throughput, latency, resource utilization, deadlock, switching / reconfiguration overhead, etc.
- Design-time analysis for the complete system is too expensive and often infeasible
- Each job should be executed as if it had access to its own dedicated resources (virtualization)
- Consider applications separately and then reason about the behavior of the overall system

52 Providing a Bound for Resources
- Arbitration strategy plays an important role in determining resource requirements
- A naive strategy leads to over-estimation of resources
- A worst-case estimate is not always possible
- Need a predictable arbitration mechanism
  - More 'realistic' worst-case bounds
  - Handle dynamism in the system
- An overall quality versus resources Pareto curve is needed
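As one example of a predictable arbitration mechanism, non-preemptive round-robin gives a simple bound: an actor waits for at most one execution of every other actor sharing the resource. The sketch below assumes each actor fires at most once per round and has a known worst-case execution time; the actor names and numbers are hypothetical.

```python
def rr_worst_case_response(exec_times, actor):
    """Worst-case response time of `actor` on a resource shared with
    non-preemptive round-robin arbitration.

    exec_times maps each actor on the resource to its worst-case execution
    time. In the worst case the actor arrives just after its turn has passed
    and every other actor executes once before it runs again.
    """
    waiting = sum(t for name, t in exec_times.items() if name != actor)
    return waiting + exec_times[actor]

# Hypothetical actors of two applications mapped onto one processor.
processor = {"a1": 50, "a2": 20, "b1": 30}
for name in processor:
    print(name, rr_worst_case_response(processor, name))
```

The bound is safe but pessimistic: it assumes every other actor is ready in every round, which is the kind of over-estimation the slide warns about.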

53 Making the design flow predictable: Run-time aspects
- Scalable applications
- QoS management
[Figure: applications (application n / scenario m), each with a local manager; a global manager; the platform; QoS protocol]

54 Match quality with resources
[Figure axes: Quality^-1 →, Computational requirements →]

55 Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
- Design flow
- Open issues

56 Design flow
[Figure labels: Idea; Requirements spec; Spec: POOSL/SystemC, C; Models: Reactive Process Network, Kahn Process Network (YAPI), SDF, BDF; correct by synthesis; Platform]

57 RPN (Reactive Process Networks): events and streaming
[Figure labels:]
- Processing of events: finite state machine; controlling host CPU (e.g. ARM); RTOS; hard real-time; 'classical' SW complexity; Event_in, Event_out
- Streaming: soft real-time; compute intensive; special hardware; Stream_in, Stream_out
- Coupling: status, mode

58 POOSL Modeling Language
- Mathematically defined semantics
- Allows formal analysis of model properties
- Can formally describe:
  - concurrency
  - synchronous communication
  - timing (delay statements)
  - functionality
[Figure: processes P1 and P2; statement "delay 1;"]

59 POOSL: Phases of Model Execution
[Figure: state space versus model time, alternating asynchronous execution of actions and synchronous passage of time]

60 From Model to Realization
a()();
sel
  delay d1; b()();
or
  c()(); delay d2;
les;
Possible execution (timed) traces:
(S1, t1), (S2, t1), (S3, t1+d1), (S5, t1+d1)
(S1, t1), (S2, t1), (S4, t1+d2), (S6, t1+d2)
(S1, t1), (S2, t1+wcet(a)), (S3, t1+d1), (S5, t1+d1+wcet(b))
(S1, t1), (S2, t1+wcet(a)), (S4, t1+wcet(a)+wcet(c)), (S6, t1+d2)

61 ε-hypothesis: property preservation
- If the time deviation between two timed execution traces is less than ε, then, if one trace satisfies a real-time property, that property, weakened up to ε, is preserved in the second one as well
[Figure: deviations ε1, ε2 < ε]
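The hypothesis can be illustrated on two traces shaped like the first and third traces of the previous slide. This is a sketch only: it assumes the time deviation is the largest timestamp difference between corresponding states, and the concrete values (t1 = 0, d1 = 10, wcet(a) = 2, wcet(b) = 3, the deadline and ε) are hypothetical.

```python
def time_deviation(trace_a, trace_b):
    """Largest timing difference between corresponding states of two traces.

    Each trace is a list of (state, time) pairs; both must visit the same
    states in the same order.
    """
    if [s for s, _ in trace_a] != [s for s, _ in trace_b]:
        raise ValueError("traces visit different states")
    return max(abs(ta - tb) for (_, ta), (_, tb) in zip(trace_a, trace_b))

def meets_deadline(trace, state, deadline):
    """True if `state` is reached no later than `deadline`."""
    return any(s == state and t <= deadline for s, t in trace)

# Model trace (zero-time actions) and realization trace (actions take time).
model = [("S1", 0), ("S2", 0), ("S3", 10), ("S5", 10)]
realization = [("S1", 0), ("S2", 2), ("S3", 10), ("S5", 13)]

epsilon = 4
deadline = 12
print("deviation:", time_deviation(model, realization))
print("model meets deadline:", meets_deadline(model, "S5", deadline))
print("realization meets weakened deadline:",
      meets_deadline(realization, "S5", deadline + epsilon))
```

The realization misses the original deadline but meets the deadline relaxed by ε, which is the sense in which the property is preserved.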

62 Extending SDF
SADF: Scenario Aware Data Flow
- Can deal with dynamism
- Still possible to reason about deadlock, resource utilization, latency and throughput
- Currently implemented in POOSL

63 SADF example: MPEG-2 Decoder
- Pipelined MPEG-2 decoder for I and P frames
  - VLD and IDCT fire per macro-block
  - MC and RC fire per frame
  - FD (frame detector) models the control part of VLD that determines the frame type
  - Image size = 176x144
- I-frame
  - 99 macro-blocks
  - No motion vectors
- Px-frame
  - x macro-blocks
  - Motion vectors from VLD to MC
  - Previous frame from RC to MC
- P0-frame (still video)
  - Copy previous frame
- FD model based on occurrence probability of frame types
- Execution time distributions of kernels determined with profiling tool

Rate   I    P0   Px
a      0    0    1
b      0    0    x
c      99   1    x
d      1    0    1
e           0    x
x = {30, 40, 50, 60, 70, 80, 99}

64 Results for MPEG-2 Decoder
- Time unit = 1 kCycle
- Small extension of current tools would enable analyzing deadline miss probability for RC
- Accuracy results based on confidence levels of 0.95

Process   Throughput
VLD       0.063    (rel. error ≤ 0.036%)
IDCT      0.063    (rel. error ≤ 0.036%)
MC        0.00106  (rel. error ≤ 0.190%)
RC        0.00106  (rel. error ≤ 0.191%)

Latency between successive firings
Process   Max.   Average                         Variance
VLD       710    15.99 (rel. error ≤ 0.031%)     75.38 (rel. error ≤ 0.18%)
IDCT      698    15.99 (rel. error ≤ 0.031%)     56.45 (rel. error ≤ 4.99%)
MC        3305   940.3 (rel. error ≤ 0.017%)     2.4·10^5 (rel. error ≤ 3.46%)
RC        2216   940.3 (rel. error ≤ 0.017%)     1.5·10^5 (rel. error ≤ 4.99%)

Channel memory between processes
Channel        Max. occupancy   Time-average occupancy          Time-variance in occupancy
VLD and IDCT   9                1.910 (rel. error ≤ 0.064%)     0.528 (rel. error ≤ 1.99%)
IDCT and RC    154              60.19 (rel. error ≤ 0.178%)     671.8 (rel. error ≤ 4.55%)
VLD and MC     133              34.73 (rel. error ≤ 0.517%)     698.4 (rel. error ≤ 4.39%)
MC and RC      1                0.577 (rel. error ≤ 0.561%)     0.244 (rel. error ≤ 3.27%)
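A quick consistency check on the reported numbers: in steady state, throughput should be the reciprocal of the average latency between successive firings. The sketch below only re-derives that relation from values given on the slide (time unit 1 kCycle); the dictionary layout is an illustrative choice.

```python
# process -> (reported throughput, reported average latency between firings)
reported = {
    "VLD": (0.063, 15.99),
    "IDCT": (0.063, 15.99),
    "MC": (0.00106, 940.3),
    "RC": (0.00106, 940.3),
}

for process, (throughput, avg_latency) in reported.items():
    derived = 1.0 / avg_latency
    rel_diff = abs(derived - throughput) / throughput
    print(f"{process}: 1/latency = {derived:.5f}, reported = {throughput}, "
          f"difference = {rel_diff:.1%}")
```

The derived and reported values agree to within the rounding of the reported throughput figures.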

65 Design flow
- Run-time
  - Combine Pareto points: exploit Pareto algebra
  - QoS management / scalable application

66 Mapping multiple jobs
Multiple jobs can be active simultaneously.
- When can a second job start?
- Are the requested resources available?
- If not, can the quality level be lowered?
- If not, can other jobs go for a lower quality?
- If yes, independent from other jobs?
- How to give guarantees?
[Figure: resources (up to 100%) versus time for jobs T0, T1, T2, with reconfiguration]

67 Combining Pareto points
[Figure: cost versus cycle budget curves for applications 1, 2 and 3; budget marks 80 and 100]
- A new thread frame is coming
- 20 cycle budgets available

68 Combining Pareto points
[Figure: cost versus cycle budget curves for applications 1, 2 and 3; budget marks 80 and 100; application 3 gets the remaining 20]
Feasible, but optimal?

69 Combining Pareto points
[Figure: cost versus cycle budget curves for applications 1, 2 and 3; budget marks 20, 40, 80, 100]
- Cost increase Δ1 for one application, cost decrease Δ2 for another
- Δ2 > Δ1: a better solution
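The trade shown on these three slides (take budget from one application so that another can move to a cheaper Pareto point) can be sketched as a search over one Pareto point per application under a total cycle budget. The total of 100 follows the slides; the individual (budget, cost) points and the function name are hypothetical, and the exhaustive search is only meant for small examples.

```python
from itertools import product

def best_configuration(apps, total_budget):
    """Pick one (budget, cost) point per application so that the budgets fit
    in total_budget and the summed cost is minimal.

    Returns (total_cost, chosen_points), or None if nothing fits.
    """
    best = None
    for choice in product(*apps):
        budget = sum(b for b, _ in choice)
        cost = sum(c for _, c in choice)
        if budget <= total_budget and (best is None or cost < best[0]):
            best = (cost, choice)
    return best

# Hypothetical Pareto points: a larger cycle budget gives a lower cost.
app1 = [(40, 5), (20, 8)]
app2 = [(40, 4), (30, 6)]
app3 = [(20, 20), (40, 10)]   # the newly arriving application

# Leaving applications 1 and 2 untouched gives application 3 only the
# remaining 20 cycles: feasible, but not optimal.
print(best_configuration([[app1[0]], [app2[0]], app3], 100))
# Letting every application move along its curve finds the better trade.
print(best_configuration([app1, app2, app3], 100))
```

In the second call application 1 gives up budget at a small cost increase, while application 3 gains budget with a larger cost decrease, so the total cost drops.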

70 Outline
- Trends and design problems
- Unpredictability
- Platforms
- Predictable design
- Design flow
- Open issues

71 Open issues
- Gap between specification and architecture modeling
- High-level modeling
  - use of modeling pattern library
- Incorporate multiple Pareto solutions into DSE
  - Pareto algebra
- Get synthesis correct for control applications including compute-intensive tasks
  - mapping to multi-processor
- Managing QoS
  - Scenario detection, merging, prediction and exploitation
  - Run-time resource manager optimizing overall quality
  - Measuring overall quality

72 Open issues (cont'd)
- Architecture modeling
  - how to deal with local memory (scratch pad / cache)
- Modeling scheduling and arbitration
  - make things composable!
- Definition of NAL (run-time services)
- Automatic partitioning
  - e.g., the SPRINT tool of IMEC is a good start (C to SystemC)
- VLSI tiling
- … and many more …
E.g. see: Ogras et al., "Key research problems in NoC design: a holistic perspective", CODES+ISSS 2005
