
1 Jim Anderson 1 Dagstuhl, Mar 2015 Adding Cache and Memory Management to the MC 2 (Mixed Criticality on Multicore) Framework Jim Anderson University of North Carolina at Chapel Hill

2 Jim Anderson 2 Dagstuhl, Mar 2015 Outline Motivation for MC 2. Basic MC 2 design. »Per-level scheduling and schedulability guarantees. MC 2 with shared hardware management. »Cache and memory-bank partitioning (and sharing). »Isolating the OS. »Complexities such as shared libraries. Future research directions.

3 Jim Anderson 3 Dagstuhl, Mar 2015 Original Driving Problem Joint Work with Northrop Grumman Corp. Next-generation UAVs. »Pilot functions performed using “AI-type” software. »Causes certification difficulties. »Workload is dynamic and computationally intensive. »Suggests the use of multicore. »Research focus: How to support such workloads while making efficient use of the hardware platform? Image source:

4 Jim Anderson 4 Dagstuhl, Mar 2015 Starting Assumptions Modest core count (e.g., 2-8). »Quad-core in avionics would be a tremendous innovation.

5 Jim Anderson 5 Dagstuhl, Mar 2015 Starting Assumptions Modest core count (e.g., 2-8). Modest number of criticality levels (e.g., 2-5). »2 may be too constraining; a very large number isn’t practically interesting. »These levels may not necessarily match DO-178B. »Perhaps 3 levels? –Safety Critical: Operations required for stable flight. –Mission Critical: Responses typically performed by pilots. –Background Work: E.g., planning operations.

6 Jim Anderson 6 Dagstuhl, Mar 2015 Starting Assumptions Modest core count (e.g., 2-8). Modest number of criticality levels (e.g., 2-5). Statically prioritize criticality levels. »Schemes that dynamically mix levels may be problematic in practice. –Note: This is done in much theoretical work on mixed criticality scheduling. »For example, support for resource sharing and implementation overheads may be concerns. »Also, practitioners tend to favor simple resource sharing schemes.

7 Jim Anderson 7 Dagstuhl, Mar 2015 Starting Assumptions Modest core count (e.g., 2-8). Modest number of criticality levels (e.g., 2-5). Statically prioritize criticality levels.

8 Jim Anderson 8 Dagstuhl, Mar 2015 Outline Motivation for MC 2. Basic MC 2 design. »Per-level scheduling and schedulability guarantees. MC 2 with shared hardware management. »Cache and memory bank partitioning (and sharing). »Isolating the OS. »Complexities such as shared libraries. Future research directions.

9 Jim Anderson 9 Dagstuhl, Mar 2015 Basic MC 2 Design We assume four criticality levels, A-D. »Originally, we assumed five, like in DO-178B. We statically prioritize higher levels over lower ones. We assume: »Levels A & B require HRT guarantees. »Level C requires SRT guarantees in the form of bounded deadline tardiness. »Level D is non-RT. »All tasks are implicit-deadline periodic/sporadic.

10 Jim Anderson 10 Dagstuhl, Mar 2015 Level-A Scheduling “Absolute, Under All Circumstances, HRT” Partitioned scheduling is used, with a cyclic executive on each core. »Cyclic executive: offline-computed dispatching table. »Time-triggered scheduling is the de facto standard in avionics. »Preferred by certification authorities. »But somewhat painful from a software-development perspective. Require: »Total per-core level-A utilization ≤ 1 (assuming level-A execution costs). »Sufficient to ensure that tables can be constructed (no level-A deadline is missed).
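As an illustration of the dispatching-table idea, here is a minimal sketch (not the MC 2/LITMUS RT implementation; the frame layout is an assumption, and the example table is built for the T1 = (5,10), T2 = (10,20) system used later in this talk):

    /* One per-core dispatching table covering a single level-A hyperperiod. */
    #include <stdio.h>

    struct frame {
        const char *task;   /* which level-A task runs in this frame        */
        int start;          /* frame start offset within the hyperperiod    */
        int length;         /* frame length = provisioned level-A exec cost */
    };

    /* Example table for T1 = (5,10), T2 = (10,20); hyperperiod = 20.
       T2's 10-unit job is split across two 5-unit frames. */
    static const struct frame table[] = {
        { "T1", 0, 5 }, { "T2", 5, 5 }, { "T1", 10, 5 }, { "T2", 15, 5 },
    };
    #define HYPERPERIOD 20
    #define NFRAMES (sizeof(table) / sizeof(table[0]))

    void dispatch(int now)  /* called at each frame boundary */
    {
        int offset = now % HYPERPERIOD;
        for (unsigned i = 0; i < NFRAMES; i++)
            if (table[i].start == offset)
                printf("t=%d: run %s for %d time units\n",
                       now, table[i].task, table[i].length);
    }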

11 Jim Anderson 11 Dagstuhl, Mar 2015 So Far… (Figure: Cores 0-3 with Levels A-D ordered from higher to lower static priority; Level A is scheduled by a per-core CE, and the lower levels are still to be filled in.)

12 Jim Anderson 12 Dagstuhl, Mar 2015 Level-B Scheduling “HRT Unless Something Really Unexpected Happens” Use P-EDF scheduling. »Overhead-aware studies at UNC have shown that P-EDF excels at HRT scheduling. –Harmonic periods ⇒ can instead use partitioned rate-monotonic (P-RM). »Eases the software-development pain of CEs. Require: »Total per-core level-A+B utilization ≤ 1 (assuming level-B execution costs). (*) »Each level-B period is a multiple of the level-A hyperperiod (= LCM of level-A periods). (**) –More on this later. »(*) and (**) ⇒ if no level-A+B task ever exceeds its level-B execution cost, then no level-B task will miss a deadline.
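A minimal sketch of checking conditions (*) and (**) for one core (not the MC 2 admission-control code; the task representation is an assumption):

    #include <stdbool.h>

    struct task { int period; double cost_b; };  /* level-B provisioned cost */

    static int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }
    static int lcm(int a, int b) { return a / gcd(a, b) * b; }

    /* Check (*) and (**) for one core: level-A tasks a[0..na), level-B tasks b[0..nb). */
    bool level_b_admissible(const struct task *a, int na,
                            const struct task *b, int nb)
    {
        int hyper = 1;
        double util = 0.0;

        for (int i = 0; i < na; i++) {            /* level-A hyperperiod and util */
            hyper = lcm(hyper, a[i].period);
            util += a[i].cost_b / a[i].period;    /* level-B costs, per (*) */
        }
        for (int i = 0; i < nb; i++) {
            if (b[i].period % hyper != 0)         /* (**): multiple of the hyperperiod */
                return false;
            util += b[i].cost_b / b[i].period;
        }
        return util <= 1.0;                       /* (*): per-core A+B utilization <= 1 */
    }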

13 Jim Anderson 13 Dagstuhl, Mar 2015 So Far… (Figure: Cores 0-3 with Levels A-D ordered from higher to lower static priority; Level A: per-core CE, Level B: per-core EDF/RM.)

14 Jim Anderson 14 Dagstuhl, Mar 2015 Level-C Scheduling “SRT” Use G-EDF (or, optionally, G-FL) scheduling. »Prior UNC work has shown that G-EDF excels at SRT scheduling. Require: »Total level-A+B+C utilization ≤ M = number of cores (assuming level-C execution costs). »Theorem [Prior UNC work]: G-EDF ensures bounded deadline tardiness, even on restricted-capacity processors, if no over-utilization and certain task utilization restrictions hold. »These restrictions aren’t very limiting if level-A+B utilization (assuming level-C execution costs) is not large. »Bottom Line: If no level-A+B+C task ever exceeds its level-C execution cost, can ensure bounded tardiness at level C.
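Putting the per-level requirements together, here is a sketch of the utilization checks across Levels A-C (illustrative only; the data layout is hypothetical, it assumes at most 16 cores, and the level-B period constraint and the level-C task-utilization restrictions are checked separately):

    #include <stdbool.h>

    struct mc_task { int level; int core; int period; double cost[3]; };
    enum { LEVEL_A, LEVEL_B, LEVEL_C };

    bool utilization_ok(const struct mc_task *t, int n, int m /* cores, <= 16 */)
    {
        double a[16] = {0}, ab[16] = {0}, abc = 0.0;   /* per-core and global sums */

        for (int i = 0; i < n; i++) {
            if (t[i].level == LEVEL_A)
                a[t[i].core] += t[i].cost[LEVEL_A] / t[i].period;
            if (t[i].level <= LEVEL_B)
                ab[t[i].core] += t[i].cost[LEVEL_B] / t[i].period;
            abc += t[i].cost[LEVEL_C] / t[i].period;
        }
        for (int c = 0; c < m; c++)
            if (a[c] > 1.0 || ab[c] > 1.0)     /* per-core level-A and level-A+B checks */
                return false;
        return abc <= (double)m;               /* global level-A+B+C check (G-EDF)      */
    }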

15 Jim Anderson 15 Dagstuhl, Mar 2015 So Far… (Figure: Cores 0-3 with Levels A-D ordered from higher to lower static priority; Level A: per-core CE, Level B: per-core EDF/RM, Level C: global EDF.)

16 Jim Anderson 16 Dagstuhl, Mar 2015 Level-D Scheduling “Best Effort” Ready Level-D tasks run whenever there is no pending Level-A+B+C work on some processor. Seems suitable for potentially long-running jobs that don’t require RT guarantees. We won’t be considering Level D further in this talk…

17 Jim Anderson 17 Dagstuhl, Mar 2015 Full Architecture (Figure: Cores 0-3 with Levels A-D ordered from higher to lower static priority; Level A: per-core CE, Level B: per-core EDF/RM, Level C: global EDF, Level D: best effort.)

18 Jim Anderson 18 Dagstuhl, Mar 2015 Remaining Issues Why the Level-B period restriction? Budgeting. Slack shifting.

19 Jim Anderson 19 Dagstuhl, Mar 2015 Example Task System (All Tasks on the same Processor)
    Task  Crit.  T.p  T.e_A  T.u_A  T.e_B  T.u_B
    T1    A      10   5      0.5    3      0.3
    T2    A      20   10     0.5    6      0.3
    T3    B      10   --     --     2      0.2
    T4    B      40   --     --     8      0.2
    Σ                        1.0           1.0

20 Jim Anderson 20 Dagstuhl, Mar 2015 Example Task System (All Tasks on the same Processor)
    Task  Crit.  T.p  T.e_A  T.u_A  T.e_B  T.u_B
    T1    A      10   5      0.5    3      0.3
    T2    A      20   10     0.5    6      0.3
    T3    B      10   --     --     2      0.2
    T4    B      40   --     --     8      0.2
    Σ                        1.0           1.0
This violates our Level-B period assumption (T3's period, 10, is not a multiple of the level-A hyperperiod, 20).

21 Jim Anderson 21 Dagstuhl, Mar 2015 Example Task System (All Tasks on the same Processor)
    Task  Crit.  T.p  T.e_A  T.u_A  T.e_B  T.u_B
    T1    A      10   5      0.5    3      0.3
    T2    A      20   10     0.5    6      0.3
    T3    B      10   --     --     2      0.2
    T4    B      40   --     --     8      0.2
    Σ                        1.0           1.0
Each Level-A task has a different execution cost (and hence utilization) per level. Execution costs (and hence utilizations) are larger at higher levels.

22 Jim Anderson 22 Dagstuhl, Mar 2015 “Level-A System” (Figure: schedule of the Level-A tasks in isolation, T1 = (5,10) and T2 = (10,20), with job releases and deadlines marked.) Higher levels have statically higher priority. Thus, Level-A tasks are oblivious of other tasks. Level A: T1 & T2. Level B: T3 & T4.

23 Jim Anderson 23 Dagstuhl, Mar 2015 “Level-B System” (Figure: schedule with Level-B execution costs, T1 = (3,10), T2 = (6,20), T3 = (2,10), T4 = (8,40), with job releases and deadlines marked.) T3 misses some deadlines. This is because our period constraint for Level B is violated. We can satisfy this constraint by “slicing” each job of T2 into two “subjobs”… Level A: T1 & T2. Level B: T3 & T4.
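The arithmetic behind the slicing: with T2 = (6,20), the level-A hyperperiod is LCM(10,20) = 20, and T3's period of 10 is not a multiple of 20. Slicing each 6-unit job of T2 into two 3-unit subjobs, one per 10-unit interval, makes T2 behave like (3,10); the level-A hyperperiod then becomes LCM(10,10) = 10, of which both T3's period (10) and T4's period (40) are multiples.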

24 Jim Anderson 24 Dagstuhl, Mar 2015 “Level-B System” (Figure: the same schedule with T2 sliced, so T1 = (3,10), T2 = (3,10), T3 = (2,10), T4 = (8,40).) All deadlines are now met. Level A: T1 & T2. Level B: T3 & T4.

25 Jim Anderson 25 Dagstuhl, Mar 2015 Budgeting MC 2 optionally supports budget enforcement. »The OS treats a Level-X task’s Level-X execution time as a budget that cannot be exceeded. (Figure: provisioned budget vs. actual execution of Jobs 1-3, with and without budget enforcement.)

26 Jim Anderson 26 Dagstuhl, Mar 2015 Budgeting MC 2 optionally supports budget enforcement. »The OS treats a Level-X task’s Level-X execution time as a budget that cannot be exceeded. »Makes “job slicing” easier.
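A minimal sketch of how such per-job budget tracking might look (not the actual LITMUS RT plugin code; the tick-based granularity and field names are assumptions):

    struct budget {
        long provisioned;   /* Level-X execution time, in scheduler ticks */
        long consumed;      /* time this job has run so far               */
    };

    /* Called from the scheduler tick while the job is running.  Returns
       nonzero when the budget is exhausted, i.e., the job must be
       descheduled (and handled as an overrun) rather than being allowed
       to exceed its Level-X provisioning. */
    int budget_tick(struct budget *b)
    {
        b->consumed++;
        return b->consumed >= b->provisioned;
    }

    /* At each new job release, the budget is replenished. */
    void budget_replenish(struct budget *b)
    {
        b->consumed = 0;
    }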

27 Jim Anderson 27 Dagstuhl, Mar 2015 Slack Shifting Provisioned execution times at higher criticality levels may be very pessimistic. »At Level A, WCETs (worst-case execution times) may be much higher than actual execution times. Thus, higher levels produce “slack” that can be consumed by lower levels. »Could lessen tardiness at Level C… »… or increase throughput at Level D.

28 Jim Anderson 28 Dagstuhl, Mar 2015 Slack Shifting (Cont’d) Slack Shifting Rule: »When a Level-X job under-runs its Level-X execution time, it becomes a “ghost job.” »When a ghost job is scheduled, its processor time is donated to lower levels. Such donations do not invalidate correctness properties.
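A sketch of the ghost-job idea under assumed data structures (illustrative only, omitting how priorities and remaining budgets are actually maintained):

    /* When a Level-X job completes after consuming less than its Level-X
       budget, the leftover budget is kept as a "ghost job" at Level X.
       Whenever the scheduler would pick that ghost job, the time is
       donated to a lower-level job instead. */
    struct job { int level; long remaining_budget; int is_ghost; };

    void job_completed(struct job *j)
    {
        if (j->remaining_budget > 0)
            j->is_ghost = 1;            /* under-run: becomes a ghost job      */
    }

    struct job *pick_next(struct job *highest_prio, struct job *lower_level)
    {
        if (highest_prio && highest_prio->is_ghost)
            return lower_level;         /* donate the ghost's time downward    */
        return highest_prio;
    }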

29 Jim Anderson 29 Dagstuhl, Mar 2015 A (Very) Simple Example (Figure: Level-B and Level-C timelines; execution time donated at Level B is consumed by Level-C work.)

30 Jim Anderson 30 Dagstuhl, Mar 2015 Without Slack Shifting (Figure: the same Level-B and Level-C timelines without donation; the Level-C work finishes later. Remember, the provisioned execution time (at Level B) could be huge.)

31 Jim Anderson 31 Dagstuhl, Mar 2015 Using Virtual Time at Level C To deal with potential overloads at Level C, we recently introduced scheduling based on virtual time at that level [IPDPS ‘15]. Virtual time may “tick” faster or slower than real time. »To dissipate an overload, slow down virtual time for a while. –Jobs are “slowed down” but they are not dropped. »To our knowledge, this is the first work to apply virtual time under global scheduling.
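A sketch of one way virtual time could be tracked (illustrative; the rate-based representation is an assumption, and the actual Level-C rules are those of the IPDPS ’15 paper):

    /* Virtual time advances at a configurable rate relative to real time.
       Slowing it down (rate < 1.0) during an overload effectively stretches
       Level-C deadlines in virtual time without dropping any jobs. */
    struct vclock {
        double vtime;       /* current virtual time           */
        double last_real;   /* real time of the last update   */
        double rate;        /* virtual ticks per real tick    */
    };

    double vclock_update(struct vclock *c, double now_real)
    {
        c->vtime += c->rate * (now_real - c->last_real);
        c->last_real = now_real;
        return c->vtime;
    }

    /* During an overload, dissipate it by slowing virtual time for a while,
       then restore rate = 1.0 once the backlog has cleared. */
    void vclock_set_rate(struct vclock *c, double now_real, double rate)
    {
        vclock_update(c, now_real);   /* account for time at the old rate */
        c->rate = rate;
    }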

32 Jim Anderson 32 Dagstuhl, Mar 2015 Outline Motivation for MC 2. Basic MC 2 design. »Per-level scheduling and schedulability guarantees. MC 2 with shared hardware management. »Cache and memory-bank partitioning (and sharing). »Isolating the OS. »Complexities such as shared libraries. Future research directions.

33 Jim Anderson 33 Dagstuhl, Mar 2015 Hardware Platform Freescale i.MX 6Quad 1 GHz ARM® Cortex™-A9 processor. Caches: »32 KB L1 I-cache per core. »32 KB L1 D-cache per core. »1 MB shared L2 cache. –Cache line size = 32 B, 2048 sets, 16 ways. 1 GB DDR3 SDRAM, up to 533 MHz. »8 banks, each 128 MB.
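The L2 geometry implies the coloring parameters used on the next slides: 2048 sets × 16 ways × 32 B lines = 1 MB total, and one way spans 2048 × 32 B = 64 KB. Assuming 4 KB pages, 64 KB / 4 KB = 16 page colors, selected by physical-address bits [15:12].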

34 Jim Anderson 34 Dagstuhl, Mar 2015 Cache Partitioning (of the Shared L2) Option 1: Set Partitioning, i.e., Page Coloring. (Figure: the 16 ways × 16 colors of the L2; physical-address bits [15:12] of a 32-bit address select the page color, e.g., [0010] = color 2.)
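A sketch of the color computation this implies (assuming 4 KB pages; the function is hypothetical):

    #include <stdint.h>

    #define COLOR_SHIFT 12   /* page size = 4 KB, so the color bits start at bit 12 */
    #define NUM_COLORS  16   /* bits [15:12] => 16 colors */

    /* The L2 set a line maps to is determined by its physical address, so a
       physical page's color is given by address bits [15:12].  Allocating a
       task only pages of its assigned colors confines it to those L2 sets. */
    static inline unsigned page_color(uint64_t phys_addr)
    {
        return (phys_addr >> COLOR_SHIFT) & (NUM_COLORS - 1);
    }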

35 Jim Anderson 35 Dagstuhl, Mar 2015 Cache Partitioning Option 2: Way Partitioning. (Figure: each CPU has an L2 cache lockdown register with lockdown bits [15:0]; setting a bit prevents that CPU from allocating into the corresponding way.)
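A sketch of how per-CPU way restrictions could be programmed on the PL310 L2 controller used with the Cortex-A9 (the register offsets follow the PL310 manual but should be treated as assumptions here, and error handling is omitted):

    #include <stdint.h>

    /* Lock CPU `cpu` out of every L2 way NOT in `allowed_ways` (a 16-bit mask).
       A set bit in the lockdown register prevents that CPU from allocating
       into the corresponding way.  l2_base is the (already-mapped) PL310
       register base; the 0x900 + cpu*8 offsets for the per-CPU data and
       instruction lockdown registers are an assumption taken from the PL310
       documentation. */
    void l2_restrict_ways(volatile uint32_t *l2_base, int cpu, uint16_t allowed_ways)
    {
        uint32_t lockdown = (~allowed_ways) & 0xFFFF;
        volatile uint32_t *d_lock = l2_base + (0x900 + cpu * 8) / 4;
        volatile uint32_t *i_lock = l2_base + (0x904 + cpu * 8) / 4;

        *d_lock = lockdown;   /* data-side allocations        */
        *i_lock = lockdown;   /* instruction-side allocations */
    }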

36 Jim Anderson 36 Dagstuhl, Mar 2015 Can Combine These Approaches (Figure: the 16 ways × 16 colors of the L2, partitioned along both dimensions at once.)

37 Jim Anderson 37 Dagstuhl, Mar 2015 One Possibility for the Whole System (Figure: one L2 partition per core for Levels A+B on Cores 0-3, plus a shared partition for Level C and the OS.) Questions: How to precisely size these partitions? How should the Level A+B partitions be configured? Level C is provisioned on the average case and would likely benefit from a shared cache.

38 Jim Anderson 38 Dagstuhl, Mar 2015 DRAM Banks (Figure: a DRAM bank, with the address split between a row decoder and a column decoder, a per-bank row buffer, and the data bus.)

39 Jim Anderson 39 Dagstuhl, Mar 2015 Address Decoding (Figure: address-decoding diagram showing the cache-index bits and the bank-index bits with bank interleaving on vs. off; the bit fields involved are [15:12], [14:12], and [29:27].) Our current thinking: Turn interleaving off. Give dedicated banks to Levels A and B. Let Level C and statically allocated kernel pages share banks.
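Since the platform has 8 banks of 128 MB each, turning interleaving off means the bank index comes from high-order physical-address bits; a sketch (the bit positions follow the figure and are an assumption):

    #include <stdint.h>

    #define BANK_SHIFT 27   /* 2^27 B = 128 MB per bank, per the platform slide */
    #define NUM_BANKS  8

    /* With bank interleaving off, consecutive 128 MB regions of physical
       memory map to distinct DRAM banks, so bits [29:27] give the bank.
       Allocating a level's pages only from "its" banks keeps its row-buffer
       state isolated from the other levels'. */
    static inline unsigned dram_bank(uint64_t phys_addr)
    {
        return (phys_addr >> BANK_SHIFT) & (NUM_BANKS - 1);
    }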

40 Jim Anderson 40 Dagstuhl, Mar 2015 Current Implementation Status MC 2 is implemented as a LITMUS RT plugin. We currently support set- and way-based partitioning and DRAM partitioning. »For set-based, we re-color (almost) all task pages before task execution. »Two limitations: –Each task has a signal handling page (which should rarely be accessed) that is not colored. –We don’t color shared pages (but can handle shared libraries through static linking).
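A sketch of how per-color page pools might back the re-coloring step (illustrative only; the actual mechanism inside LITMUS RT/Linux differs in the details):

    /* Free pages are kept in per-color pools.  To "re-color" a task, each of
       its pages is copied into a fresh page drawn from one of the task's
       assigned colors, and the task's page table is updated to point at the
       copy.  (Page-table manipulation is elided here.) */
    #define NUM_COLORS 16

    struct page { struct page *next; /* ... */ };

    static struct page *free_pool[NUM_COLORS];   /* one free list per color */

    struct page *alloc_colored_page(unsigned color)
    {
        struct page *p = free_pool[color % NUM_COLORS];
        if (p)
            free_pool[color % NUM_COLORS] = p->next;
        return p;                                /* null if the pool is empty */
    }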

41 Jim Anderson 41 Dagstuhl, Mar 2015 OS Isolation To our knowledge, we are the first to consider isolating the OS from application tasks in mixed-criticality work. The OS can be isolated from Levels A and B w.r.t. the L2 cache and w.r.t. DRAM banks, except for dynamically allocated OS pages in DRAM banks. »Dynamically allocated OS pages could be dealt with in principle, but this would require major kernel modifications.

42 Jim Anderson 42 Dagstuhl, Mar 2015 To Share or Not To Share? Interesting issues arise when considering when different criticality levels should or should not share hardware resources. For example, consider the Level A+B cache partitions…

43 Jim Anderson 43 Dagstuhl, Mar 2015 Consider One Per-Core Level A+B Partition Two options: »Spatial Isolation: A partition is only used by one criticality level on one core. –No cache-related impacts of A on B or vice versa. »Temporal Isolation: A partition is only used by one criticality level and one core at a time. –More cache space overall for A and B. –More flexibility in adjusting partition parameters. (Figure: the two layouts, the A and B partitions disjoint vs. A a subset of B.)

44 Jim Anderson 44 Dagstuhl, Mar 2015 The “Big Picture” Many options exist (and this assumes we pre-allocate a region of the cache to Level C and the OS)… (Figure: three ways to lay out the per-core Level A and B partitions A0-A3 and B0-B3 alongside the Level C + OS region: by ways, by sets, or arbitrarily.)

45 Jim Anderson 45 Dagstuhl, Mar 2015 Research Issues/Questions We are currently working on algorithms to define these partitions. Some questions: »Which is better, way- or set-based partitioning? »In determining partitions, what should be the objective? Minimizing utilization? At which level? »Should Level A even use the cache at all (since its provisioning is presumably very conservative)?

46 Jim Anderson 46 Dagstuhl, Mar 2015 Research Issues/Questions (Cont’d) More questions: »We assume in experiments: –Level-A and -B tasks are “flyweight”: we model them using the Mälardalen benchmark suite. –Level-C tasks are more “heavyweight”: we model them using the PARSEC benchmark suite. »Is this OK?

47 Jim Anderson 47 Dagstuhl, Mar 2015 Research Issues/Questions (Cont’d) More questions: »We assume in experiments: –Level-C execution times are based on the observed average case. –Level-B execution times are based on the observed worst case. –Level-A execution times are X times Level-B times. »What should X be? –This represents a “fudge factor.” What are common fudge factors assumed in industry?

48 Jim Anderson 48 Dagstuhl, Mar 2015 Research Issues/Questions (Cont’d) More questions: »How should overheads be dealt with analytically? –Larger partitions yield lower execution costs but higher CPMDs (cache-related preemption and migration delays). –Levels matter: e.g., perhaps when analyzing Level A, it makes no difference whether A gets any cache space or not, but when analyzing Level C, the average-case costs of the Level-A tasks may be too large since they never use the cache.

49 Jim Anderson 49 Dagstuhl, Mar 2015 Research Issues/Questions (Cont’d) More questions: »In experiments, what fraction of the overall workload should be assumed to be at Level A, B, or C? »We assume “fraction” is defined w.r.t. Level-C (average-case) utilizations. –Is this OK?

50 Jim Anderson 50 Dagstuhl, Mar 2015 Outline Motivation for MC 2. Basic MC 2 design. »Per-level scheduling and schedulability guarantees. MC 2 with shared hardware management. »Cache and memory-bank partitioning (and sharing). »Isolating the OS. »Complexities such as shared libraries. Future research directions.

51 Jim Anderson 51 Dagstuhl, Mar 2015 Future Work Our future plans include: »Determining whether we can fully isolate the OS (even w.r.t. dynamically allocated DRAM pages). »Applying recent work on stochastic provisioning to Level C. »Enabling dynamic adaptations at Level C. »Enabling precedence constraints everywhere. »Extending page coloring to more efficiently deal with shared libraries and to deal with writable shared data.

52 Jim Anderson 52 Dagstuhl, Mar 2015 MC 2 Papers (All papers available at
J. Anderson, S. Baruah, and B. Brandenburg, “Multicore Operating-System Support for Mixed Criticality,” Proc. of the Workshop on Mixed Criticality: Roadmap to Evolving UAV Certification. »A “precursor” paper that discusses some of the design decisions underlying MC 2.
M. Mollison, J. Erickson, J. Anderson, S. Baruah, and J. Scoredos, “Mixed Criticality Real-Time Scheduling for Multicore Systems,” Proc. of the 7th IEEE International Conf. on Embedded Software and Systems. »Focus is on schedulability, i.e., how to check timing constraints at each level and “shift” slack.
J. Herman, C. Kenna, M. Mollison, J. Anderson, and D. Johnson, “RTOS Support for Multicore Mixed-Criticality Systems,” Proc. of the 18th RTAS. »Focus is on RTOS design, i.e., how to reduce the impact of RTOS-related overheads on high-criticality tasks due to low-criticality tasks.
B. Ward, J. Herman, C. Kenna, and J. Anderson, “Making Shared Caches More Predictable on Multicore Platforms,” Proc. of the 25th ECRTS. »Adds shared cache management to a two-level variant of MC 2. The approach in today’s talk is different.
J. Erickson, N. Kim, and J. Anderson, “Recovering from Overload in Multicore Mixed-Criticality Systems,” Proc. of the 29th IPDPS, 2015. »Adds virtual-time-based scheduling to Level C.

53 Jim Anderson 53 Dagstuhl, Mar 2015 Thanks! Questions?

