Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University,


1 Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University, Sweden

2 GSM Phone:
• Search
• Radio Link Control
• Talking
MP3 player
Digital Camera:
• Take Photo
• Restore Photo
...
• High performance
• Low power
• Predictable

3 Design Flow
Hardware platform: CPU0, ASIC0, CPU1, Bus
Software Application(s):
    for (i=0; i<99; i++) x = x + a[i];
    for (j=0; j<100; j++) y = y + b[j];
    if (x < y) z = y;
Steps: Extract Task Graph; Extract Task Parameters (worst case execution times, task power); Optimize; Formal Simulation; Implement
[Figure: task graph with deadline dl]

4 Application Model
[Figure: task graph with deadline dl]

5 Hardware Architecture
[Figure: CPUs with CACHE and Private Memory, connected by a Bus to Shared Memory, an Interrupt Device and a Semaphore Device]

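6 Execution Model
Platform: CPU 1, CPU 2, Cache, Private Mem 1, Private Mem 2, BUS, Shared Mem
Original TG: τ1, τ2
τ1 performs comp(x) and copy(x,s); Private Mem 1 holds x and the instructions of τ1
τ2 performs copy(s,y) and use(y); Private Mem 2 holds y and the instructions of τ2
Shared Mem holds s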

7 Task Model
Original TG: τi, τj
Extended TG: τi, τwi, τrj, τj (τwi and τrj are the explicit communication)

8 Motivational Example
Task graph: τ1, τ2, τwi
WCET: τ1 = 60; τ2 = 25; τw2 = 12
τ1 and τ2 have a deadline at time 63
Platform: CPU 1 with PMem 1, CPU 2 with PMem 2, Bus, ShMem

9 Motivational Example (2)
[Gantt chart: rows CPU 1, CPU 2, BUS; tasks τ1, τ2; implicit communication M1 to M5 and I1 to I5; explicit communication τw2; dl=63]

10 Motivational Example (3)
Using a FCFS bus arbiter
[Gantt chart: rows CPU 1, CPU 2, BUS; tasks τ1, τ2, τw2; M1 to M5 and I1 to I5; dl=63]
Deadline violation!

11 Motivational Example (4)
Using a bus schedule
[Gantt chart: rows CPU 1, CPU 2, BUS; tasks τ1, τ2, τw2; M1 to M4 and I1 to I5; dl=63]

12 Motivational Example Message
In multiprocessor systems, the WCET depends on the bus load!
In multiprocessor systems, the WCET depends on the schedule!
In multiprocessor systems, the schedule depends on the WCET!

13 Implicit Communication
Benchmark    Bus Utilization    Impl. Communication
GSM 1)       12%                39%
MP3 2)       26%                42%
MP3 3)       49%                86%
Setup: ARM7 cores, ST bus protocol
1) Icache: 4096b, Dcache: 1024b
2) Icache: 4096b, Dcache: 1024b
3) Icache: 16b, Dcache: 256b

14 WCET Analysis
• Difficult for both single-processor and multiprocessor systems
• Single processor tools: Symta/P, Absint aiT
• Handle instruction and data caches
• Basic idea: enumerate all the possible paths of the program (CFG) and always consider the longest one
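The basic idea on this slide can be illustrated with a minimal sketch. It assumes a loop-free CFG whose blocks are numbered so that every edge goes from a lower to a higher index, and it uses a dynamic program instead of enumerating the paths one by one (the result is the same longest path). The names `struct block`, `longest_path`, `MAX_SUCC` and `MAX_BLOCKS` and all costs are hypothetical and do not come from Symta/P or aiT.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC   2    /* hypothetical: at most two successors per block */
#define MAX_BLOCKS 64   /* hypothetical upper limit for this sketch */

struct block {
    unsigned cost;        /* worst-case cycles of the block itself */
    int succ[MAX_SUCC];   /* successor indices, -1 if unused */
};

/* Cost of the most expensive path from block 0 to any exit block.
 * Assumes every edge goes from a lower to a higher index (no loops). */
unsigned longest_path(const struct block *cfg, size_t n)
{
    unsigned longest[MAX_BLOCKS];   /* longest[i]: worst path starting at i */

    assert(n > 0 && n <= MAX_BLOCKS);
    for (size_t i = n; i-- > 0; ) {
        unsigned tail = 0;
        for (int k = 0; k < MAX_SUCC; k++) {
            int s = cfg[i].succ[k];
            if (s < 0)
                continue;
            assert((size_t)s > i && (size_t)s < n);
            if (longest[s] > tail)
                tail = longest[s];
        }
        longest[i] = cfg[i].cost + tail;
    }
    return longest[0];
}
```

For an if/else diamond with block costs 2, 10, 4 and 1, the function returns 13, the cost of the path through the more expensive branch.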

15 WCET Analysis Flow
[Flow diagram. Inputs: source files, binary file. Steps: abstract syntax tree generation; data dependency analysis; data flow extraction; data flow analysis; data address analysis; instr. address extraction; CFG construction; program segment simulation; instr. cache analysis; data cache analysis. Results: annotated CFG, WCET]

16 WCET Analysis: Example
    void foo() {
        int i, temp;
        for (i=0; i<100; i++) {
            temp=a[i];
            a[temp]=0;
        }
    }

17 WCET Analysis: CFG
1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<N;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }
CFG nodes: id: 2; id: 17 (Lno: 3,4,9); id: 12 (Lno: 3,4,6); id: 4; id: 13 (Lno: 6,7,5,4,6); id: 16 (Lno: 6,7,5,4,8); id: 11

18 WCET Analysis: CFG
CFG nodes: id: 2; id: 17 (Lno: 3,4); id: 12 (Lno: 3,4); id: 4; id: 13 (Lno: 6,7,5,4,6); id: 16 (Lno: 6,7,5,4,8); id: 11
Control nodes: 2, 4, 11
Basic blocks: 12, 17, 13, 16
Loop bound at id: 4 (for ex. N=100)

19 WCET Analysis with Instruction Cache
• Generate the address traces for each program block
• Always assume a miss at the beginning of each block
• Use a cache simulator to get the cache hit/miss ratio for each block
• We can do better
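The cache simulation step can be sketched as follows, assuming a direct-mapped cache that is empty at the start of each block (the "always a miss at the beginning" rule). The geometry (`NUM_LINES`, `LINE_BYTES`) and the names `struct dm_cache`, `cache_flush`, `cache_access` and `count_misses` are hypothetical; this is not the simulator used in the presented tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  4    /* hypothetical cache geometry: 4 lines ... */
#define LINE_BYTES 16   /* ... of 16 bytes each */

struct dm_cache {
    bool valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
};

/* Empty the cache, so the next block starts with cold misses. */
void cache_flush(struct dm_cache *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++)
        c->valid[i] = false;
}

/* Returns true on a hit; on a miss the line is loaded and false is returned. */
bool cache_access(struct dm_cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx = line % NUM_LINES;
    uint32_t tag = line / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag)
        return true;
    c->valid[idx] = true;
    c->tag[idx] = tag;
    return false;
}

/* Number of misses produced by the address trace of one program block. */
size_t count_misses(struct dm_cache *c, const uint32_t *trace, size_t n)
{
    size_t misses = 0;

    cache_flush(c);
    for (size_t i = 0; i < n; i++)
        if (!cache_access(c, trace[i]))
            misses++;
    return misses;
}
```

With this geometry, the trace 0x00, 0x04, 0x10, 0x00, 0x40, 0x00 produces four misses: two cold misses, one conflict miss when 0x40 evicts the line holding 0x00, and one more when 0x00 is fetched again.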

20 WCET Analysis with ICache: Unrolled CFG
1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<100;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }
CFG nodes: id: 2; id: 17 (Lno: 3,4); id: 12 (Lno: 3,4); id: 4; id: 13 (Lno: 6,7,5,4,6); id: 16 (Lno: 6,7,5,4,8); id: 11
Unrolled: id: 104; id: 13 (Lno: 6,7,5,4,6)

21 WCET Analysis with ICache: Unrolled CFG
CFG nodes: id: 2; id: 17 (Lno: 3,4); id: 12 (Lno: 3,4); id: 4; id: 13 (Lno: 6,7,5,4,6); id: 16 (Lno: 6,7,5,4,8); id: 11
Unrolled: id: 104; id: 13 (Lno: 6,7,5,4,6)
Cache annotations:
• Block 12: miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4
• Block 13, first iteration: miss lno 6 (d), miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4
• Block 13, unrolled copy: miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4
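The effect of unrolling can be put into numbers with a small sketch. It assumes that (i) and (d) stand for instruction and data misses, that the first loop iteration takes three instruction and two data misses while every later iteration takes only the two data misses (as the annotations on this slide suggest), and that each miss costs a constant penalty as in the single-processor case. The execution cycles, the penalty and the names `struct iter_cost`, `iter_wcet` and `loop_wcet` are hypothetical.

```c
#include <assert.h>

/* Cost of one loop iteration; the numbers used with it are hypothetical. */
struct iter_cost {
    unsigned exec_cycles;   /* cycles when every access hits the cache */
    unsigned imisses;       /* instruction cache misses */
    unsigned dmisses;       /* data cache misses */
};

unsigned iter_wcet(struct iter_cost c, unsigned miss_penalty)
{
    return c.exec_cycles + (c.imisses + c.dmisses) * miss_penalty;
}

/* WCET of a loop whose first iteration is analysed separately from the rest.
 * `bound` is the loop bound and must be at least 1. */
unsigned loop_wcet(struct iter_cost first, struct iter_cost rest,
                   unsigned bound, unsigned miss_penalty)
{
    assert(bound >= 1);
    return iter_wcet(first, miss_penalty)
         + (bound - 1) * iter_wcet(rest, miss_penalty);
}
```

With 4 execution cycles per iteration, a penalty of 10 cycles and the loop bound 100, the first iteration costs 54 cycles and each of the remaining 99 costs 24, for a total of 2430. Charging every iteration the first-iteration cost would give 5400, which is why treating the first iteration separately pays off.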

22 WCET Analysis: Multiprocessor
• Cache miss penalty is constant in the single processor case
• Cache miss penalty is variable in the multiprocessor case

23 Predictable MPSoC Bus Access
• Partition the bus period into bus slots (TDMA)
• Assign bus slots to the processors
• The bus arbiter grants the bus to a processor only during its allocated slots
• Eliminates the bus interference
• Not flexible: an idle bus slot cannot be used by another processor
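A minimal sketch of how a TDMA slot table makes the cache miss penalty computable: given the time of a request, the completion time follows from the table alone, independently of what the other processors do. The sketch assumes that a transfer must fit entirely inside one slot and that the table is sorted by start time; `struct slot`, `transfer_done` and all numbers are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One TDMA slot inside the bus period; slots are sorted by start time. */
struct slot {
    unsigned start;   /* offset from the beginning of the period */
    unsigned len;     /* slot length in cycles */
    int owner;        /* CPU that may use the bus during this slot */
};

/* Completion time of a transfer of `xfer` cycles requested by `cpu` at time
 * `t`. Assumption: a transfer must fit entirely inside one slot.
 * Returns UINT_MAX if no slot owned by `cpu` is long enough. */
unsigned transfer_done(const struct slot *tbl, size_t n, unsigned period,
                       int cpu, unsigned t, unsigned xfer)
{
    assert(period > 0);
    unsigned base = (t / period) * period;   /* start of the current period */

    /* The current and the next period suffice: in the next period the
     * request is already pending when each slot begins. */
    for (int round = 0; round < 2; round++, base += period) {
        for (size_t i = 0; i < n; i++) {
            if (tbl[i].owner != cpu)
                continue;
            unsigned begin = base + tbl[i].start;
            unsigned end = begin + tbl[i].len;
            if (t > begin)
                begin = t;                   /* request arrived mid-slot */
            if (begin <= end && end - begin >= xfer)
                return begin + xfer;
        }
    }
    return UINT_MAX;
}
```

With a period of 16 cycles split into one 8-cycle slot per CPU, a 4-cycle transfer requested by CPU 1 completes at time 6 if requested at time 2 (penalty 4), but only at time 20 if requested at time 6 (penalty 14), because it no longer fits into the current slot. The penalty varies with the request time, yet it is known at analysis time.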

24 Analysis & Bus Access
[Unrolled CFG with the cache miss annotations of slide 21]
Bus schedule: CPU1, CPU2, CPU1, CPU2, CPU1, ...
Time marks: 0, 8, 16, 24, 32, 42, 52

25 Multiprocessor Analysis and Optimization
In multiprocessor systems, the WCET depends on the schedule!
In multiprocessor systems, the schedule depends on the WCET!

26 Overall Approach
Mapping: CPU 1: τ1, τ4; CPU 2: τ2; CPU 3: τ3, τ5
[Gantt chart: rows CPU 1, CPU 2, CPU 3, BUS; tasks τ1 to τ5, with bus slots assigned to the tasks]

27-28 Overall Approach
• Ψ is the set of all tasks that are active at time t
• Select bus schedule B for the time interval starting at t
• Determine WCET of the tasks from set Ψ
• θ is the earliest time a task from set Ψ finishes
• Schedule new task at time t >= θ
Flowchart labels: new task to schedule; bus schedule optimization
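The structure of this loop can be shown with a toy sketch. It is not the presented tool: the bus schedule selection and the WCET analysis are replaced by a stand-in model in which k active tasks share the bus equally, so each of them runs k times slower than alone. Data dependencies are ignored, and every CPU simply runs its own queue in order. The names `struct cpu_queue`, `schedule_length` and `MAX_CPUS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8   /* hypothetical limit for this sketch */

/* Tasks mapped to one CPU, in execution order. */
struct cpu_queue {
    const unsigned *demand;   /* cycles each task needs with the bus alone */
    size_t count;
};

/* Toy stand-in for the loop: returns the resulting schedule length. */
unsigned schedule_length(const struct cpu_queue *cpus, size_t ncpus)
{
    size_t next[MAX_CPUS] = {0};    /* next queued task per CPU */
    unsigned rem[MAX_CPUS] = {0};   /* remaining demand of the running task */
    unsigned t = 0;

    assert(ncpus <= MAX_CPUS);
    for (;;) {
        unsigned active = 0, dmin = 0;

        /* Schedule a new task on every idle CPU; collect the active set. */
        for (size_t c = 0; c < ncpus; c++) {
            while (rem[c] == 0 && next[c] < cpus[c].count)
                rem[c] = cpus[c].demand[next[c]++];
            if (rem[c] > 0) {
                if (active == 0 || rem[c] < dmin)
                    dmin = rem[c];
                active++;
            }
        }
        if (active == 0)
            return t;               /* all tasks have finished */

        /* Stand-in WCET model: the earliest finish is dmin * active cycles
         * away; until then every active task advances by dmin. */
        t += dmin * active;
        for (size_t c = 0; c < ncpus; c++)
            if (rem[c] > 0)
                rem[c] -= dmin;
    }
}
```

For two CPUs with task demands {3, 2} and {5}, the first task to finish does so at time 6, the next step ends at time 10, and the schedule length is 10. In the approach on the slide, the stand-in model would be replaced by the bus schedule optimization and the WCET analysis of the active tasks.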

29 Bus Schedule: BSA1
Bus schedule table over a period:
slot_start: t0, t1, t2, t3, ...
owner: CPU 1, CPU 2, CPU 1, ...
[Timeline: slots starting at t0, t1, t2, t3, t4, owned by CPU 1 or CPU 2]

30 Bus Schedule: BSA2
Bus schedule over a period, divided into segments:
Segment 1: seg_start t0; seg_size 12; owners 1, 2; slot size per owner: CPU 1 = 1, CPU 2 = 3
Segment 2: seg_start t4; seg_size 7; owners 2, 1; slot size per owner: CPU 1 = 2, CPU 2 = 5
[Timeline: slots at t0 to t6, owned by CPU 1 or CPU 2]

31 Bus Schedule: BSA3
Bus schedule over a period, divided into segments:
Segment 1: seg_start t0; owners 1, 2; slot_size 3
Segment 2: seg_start t4; owners 2, 1; slot_size 6
[Timeline: slots at t0 to t6, owned by CPU 1 or CPU 2]
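The segment-based tables of BSA2 and BSA3 can be sketched as a data structure with a lookup function. The sketch assumes that the owners of a segment take turns in the listed order until the segment ends. The field names follow the slides (`seg_start`, `seg_size`, `owners`, `slot_size`), while `struct segment`, `bus_owner_at` and `MAX_OWNERS` are hypothetical. BSA3 is the special case in which all owners of a segment have the same `slot_size`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OWNERS 4   /* hypothetical limit for this sketch */

/* One segment of the bus period: its owners take turns in the given order. */
struct segment {
    unsigned seg_start;               /* offset inside the period */
    unsigned seg_size;                /* segment length in cycles */
    size_t n_owners;
    int owners[MAX_OWNERS];           /* CPU ids in slot order */
    unsigned slot_size[MAX_OWNERS];   /* slot length of the matching owner */
};

/* CPU that owns the bus at time `t`, or -1 if no segment covers that time. */
int bus_owner_at(const struct segment *segs, size_t n,
                 unsigned period, unsigned t)
{
    assert(period > 0);
    unsigned tp = t % period;         /* position inside the period */

    for (size_t s = 0; s < n; s++) {
        const struct segment *g = &segs[s];
        if (tp < g->seg_start || tp >= g->seg_start + g->seg_size)
            continue;

        unsigned round = 0;           /* length of one turn of all owners */
        for (size_t k = 0; k < g->n_owners; k++)
            round += g->slot_size[k];
        assert(round > 0);

        unsigned off = (tp - g->seg_start) % round;
        for (size_t k = 0; k < g->n_owners; k++) {
            if (off < g->slot_size[k])
                return g->owners[k];
            off -= g->slot_size[k];
        }
    }
    return -1;
}
```

Using the BSA2 values from slide 30 and assuming t0 = 0 and t4 = 12 (so the period is 19), the bus belongs to CPU 1 at time 0, to CPU 2 during times 1 to 3, to CPU 1 again at time 4, and in the second segment to CPU 2 during times 12 to 16 and to CPU 1 during times 17 and 18.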

32 Experimental Results
[Plot: Normalized Schedule Length (1 to 4) versus Number of CPUs (2 to 20), one curve each for BSA 1, BSA 2, BSA 3 and BSA 4]

33 Experimental Results
[Plot: Normalized Schedule Length versus Number of CPUs]

34 Real-life Example
• Smart phone
• GSM voice codec (encoder+decoder) and Mp3 player
• 64 tasks, between 100 and 2000 lines of C code per task
• 4 ARM7 processors, interconnected via a bus

35 Real-life Example
BSA_1    BSA_2    BSA_3    BSA_4
1.17     1.33     1.31     1.62
• GSM + Mp3
• 64 tasks
• 4 ARM7 processors

36 Conclusions
• Realistic model for MPSoC
• WCET analysis must be integrated in the system scheduling
• Tool for system level scheduling and WCET
• Tested on real applications

37 ARTIST
Partners: LiU, TU Braunschweig, U. of Bologna
Contributions shown: original SymtaP code; bus controller implementation

