1 CSC 714
Center for Embedded Systems Research (CESR)
Department of Computer Science
North Carolina State University
Frank Mueller
Missing in Action: Timing Analysis and Soft Error Protection
2 CSC 714 Example: A380 Overheat Detection
w/ Hamilton Sundstrand/United Techn.
Overall system has 54 sensors
When too hot, isolate air channels
— Close valves over AFDX network
Avoids overheating upon leakage
— Plane’s hull is hybrid carbon/metal: leaking hot air can burn a hole into it!
SW has to adhere to RTCA DO-178B standard
— Level A: modified condition/decision coverage (MC/DC), branch/decision/statement coverage
— Level B: branch/decision/statement coverage
— Level C: statement coverage
SW is written as cyclic executives
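To make the phrase "cyclic executive" concrete, here is a minimal sketch of a frame-based loop for the overheat example. Only the sensor count (54) comes from the slide. The temperature threshold, the one-valve-per-sensor mapping, the state layout, and all function names are hypothetical, and a real executive would wait for a hardware timer tick between frames and send AFDX messages instead of updating a set.

```python
from typing import Callable, Dict, List

NUM_SENSORS = 54            # sensor count stated on the slide
OVERHEAT_LIMIT_C = 200.0    # hypothetical threshold, not from the slide


def make_state(sensor_source: Callable[[], List[float]]) -> Dict:
    """Shared state handed to every task (hypothetical layout)."""
    return {
        "sensor_source": sensor_source,
        "readings": [],
        "hot": [],
        "closed_valves": set(),
    }


def read_sensors(state: Dict) -> None:
    # Sample all sensors once per major cycle.
    state["readings"] = state["sensor_source"]()


def check_overheat(state: Dict) -> None:
    # Record the indices of sensors at or above the limit.
    state["hot"] = [
        i for i, temp in enumerate(state["readings"]) if temp >= OVERHEAT_LIMIT_C
    ]


def command_valves(state: Dict) -> None:
    # Assumption: one valve per sensor. A real system would send AFDX messages.
    state["closed_valves"].update(state["hot"])


# Static schedule: the order of tasks is fixed at design time.
MAJOR_CYCLE = [
    [read_sensors, check_overheat],  # minor frame 0
    [command_valves],                # minor frame 1
]


def run(state: Dict, major_cycles: int) -> Dict:
    for _ in range(major_cycles):
        for frame in MAJOR_CYCLE:
            for task in frame:
                task(state)
            # A real executive would block here until the next timer tick.
    return state
```

Because the task order is a fixed table and not a run-time scheduling decision, every path through the loop is known in advance, which is one reason this style suits coverage-based certification.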
3 CSC 714 Requirements
SW standard requirements (some examples):
— All switch statements must have a default case
— Single-entry, single-exit functions only
— Strict type checking required
SW certification requirements
— Qualified tools to check for adherence to standard
— Simulation environment for testing functionality
— Explicit tests for every low-level requirement
— Programmer independence
— New: timing guarantees (required by Airbus!), hence worst-case execution time (WCET) analysis
4 CSC 714 Missing in Action 1: Timing Analysis
WCET: worst-case execution time
— Needed for schedulability analysis
WCET bounds: determined by timing analysis
— Should be safe and tight
— Derived by tools: only semi-automated, small programs
— Restrictions: known loop bounds, no heap, no function pointers
— Requires a predictable architecture
Problems:
— WCET >> actual execution time, leading to under-utilization
— Complexity wall:
–Timing analysis tools lagging behind architectural innovation
–Not getting closer (maybe even losing ground)
Tools and methods lag behind
What to do?
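The link between WCET bounds and schedulability can be illustrated with the classic response-time analysis for fixed-priority scheduling. This is a generic textbook technique, not a method named on the slide. The task sets below are hypothetical, deadlines are assumed equal to periods, and integer time units are assumed so the fixed-point test is exact.

```python
import math
from typing import List, Optional, Tuple

Task = Tuple[int, int]  # (wcet, period), listed highest priority first


def response_time(tasks: List[Task], i: int) -> Optional[int]:
    """Worst-case response time of task i, or None if it exceeds the period."""
    wcet_i, period_i = tasks[i]
    r = wcet_i
    while True:
        # Interference from all higher-priority tasks released within r.
        interference = sum(
            math.ceil(r / period_j) * wcet_j for wcet_j, period_j in tasks[:i]
        )
        r_next = wcet_i + interference
        if r_next > period_i:
            return None      # deadline (= period) missed
        if r_next == r:
            return r         # fixed point reached
        r = r_next


def schedulable(tasks: List[Task]) -> bool:
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


# Hypothetical task set with tight WCET bounds.
tight = [(1, 4), (2, 6), (3, 12)]
# Same tasks, but with WCET bounds overestimated by a factor of two.
pessimistic = [(2, 4), (4, 6), (6, 12)]
```

With the tight bounds the set passes the test. With the doubled bounds the analysis rejects it even if the real code never runs that long, which is the under-utilization problem listed above.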
5 CSC 714 Timing Analysis: Status Quo and Needs
Capabilities of static timing analysis
— In-order scalar pipeline, static branch prediction, split I/D caches
Contemporary processors
— Out-of-order, multiple issue, dynamic branch prediction, multi-level caches, deep speculation, etc.
Analyzability is fundamental to the design of safe systems
— Excludes contemporary microarchitectures
— Long-term implications
Complexity wall: new methods for timing analysis needed
Promote hybrid HW/SW solution
— Timings on actual processor in special execution mode
— Steer execution through SW: realistic! (ARM)
Rigorous methodology and tools needed!
6 CSC 714 Another Failure: Single Event Upset
Radiation from space due to solar flares can cause bit flips
— Heavy ion strikes flip-flop, RAM, …
— Issue in the higher atmosphere: planes flying over the poles
Typically sufficient to consider single (bit) event upsets (SEU)
— Multiple-bit upsets statistically too rare to care about
— Also caused by smaller fabs, hence smaller noise ratios, hence errors
Protect RAM w/ ECC
Caches/processors unprotected
— Unless radiation hardened, which is expensive
Examples: solar flares
— Many failed servers in 1999
— Nozomi Mars Probe rendered inoperable
IBM has built-in checks for 80% of server-chip circuits
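As a small illustration of how ECC can undo a single bit flip, here is a sketch of the textbook Hamming(7,4) code. It is chosen only because it fits in a few lines: the slide does not say which code the memory uses, and ECC RAM typically uses wider codes over whole memory words. The function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, syndrome); the syndrome is the 1-based position
    of a single flipped bit, or 0 if no error was seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1   # correct the single flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome


# Simulate one upset: flip the bit at position 6 of a stored word.
word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1
recovered, position = hamming74_decode(stored)
```

The code corrects any one flipped bit per codeword. Two flips in the same word would be miscorrected, which is acceptable only under the single-upset assumption made on the slide.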
7 CSC 714 SEU on the Airbus 380
Uses PowerPC 750CXe
— Off-the-shelf
— RAM has ECC
— L2 has ECC but L1 does not
— No protection against SEU in processor core
Options:
— Do not use L1 and make a best effort to “code against” SEU
— Use EDDI: error detection by duplicating instructions
–But who wants to pay the overhead?
— Selective use of fault (SEU) resilient development techniques
–Pure software or hybrid (minimal HW support + SW)
–Protection only where needed in code
Rigorous methodology and tools needed!
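The idea behind EDDI can be sketched at source level: every operation is performed twice on separate state, and the two copies are compared before the result leaves the computation. Real EDDI duplicates machine instructions onto shadow registers at compile time, so the Python below is only an analogy. The function name, the exception, and the `fault` parameter used to inject a simulated upset are all hypothetical.

```python
from typing import List, Optional, Tuple


class SoftErrorDetected(Exception):
    """Raised when the master and shadow copies disagree."""


def checked_sum(values: List[int], fault: Optional[Tuple[int, int]] = None) -> int:
    """Sum values in a master and a shadow accumulator and compare them.

    fault = (step, bit) flips one bit of the master accumulator after the
    given step, simulating a single event upset (test hook only).
    """
    acc = 0
    acc_shadow = 0
    for step, value in enumerate(values):
        acc += value           # original operation
        acc_shadow += value    # duplicated operation on shadow state
        if fault is not None and fault[0] == step:
            acc ^= 1 << fault[1]   # simulated bit flip in the master copy
    # Compare before the result is "stored" (returned to the caller).
    if acc != acc_shadow:
        raise SoftErrorDetected(f"master={acc} shadow={acc_shadow}")
    return acc
```

Every arithmetic step is executed twice and followed by a comparison, which shows where the overhead questioned on the slide comes from, and why protecting only selected code regions is attractive.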
8 CSC 714 Conclusion
Off-the-shelf processors everywhere
— Airbus 380, Boeing 787
— Automotive industry (waking up!)
Lack of predictability and protection
New methods for timing analysis
— Increasing complexity gap
— Promote hybrid HW/SW solution
–Timings on actual processor in special execution mode
–Steer execution through SW: realistic! (ARM)
New methods for soft error protection
— Either pure software or hybrid (min. HW + SW)
— Fault (SEU) resilient software development, applied selectively
Missing in action: methods and tools needed today / yesterday!