Updating RT Embedded Software in the Field Lui Sha Real Time Systems Laboratory Department of CS, UIUC October, 2002.



Real-time (RT) embedded systems have long life spans. How can we develop real-time systems that can be easily changed in the field, even on the fly, while maintaining stability and controllability in spite of arbitrary errors in the new software, or in spite of malicious attacks by insiders disguised as upgrades?

Interactive Demo on the Web: click "project", then "drii", then "telelab", and download. [Screen shot: Telelab. Diagram labels: Win98/NT, LynxOS, Simplex, A/V streams, and an annotated, pre-recorded presentation (e.g., HTML) for use in case of communication failures.]

Some Initial Application Interest. “By providing protection from faults, Simplex enables such functionality to be applied on a mission. … Joint Strike Fighter (JSF)—the JSF mission software architecture builds on the architectural principles developed under the INSERT project” “The Space and Naval Warfare Systems Command (SPAWAR) has initiated a process to transition SIMPLEX technology … The technology will be transitioned to the Surface Combatant for the 21st Century (SC21), the Next Generation Carrier (CV(X)), and other Navy systems.” Currently, DoD’s Open Systems Joint Task Force (OS-JTF) “is extending the Simplex approach for safe insertion of COTS software”.

Job 1 Is Robustness Against Bugs. We begin by investigating the principles of developing software systems that are robust against bugs. Left unchecked, bugs can destroy correctness, performance, reliability, security, or any other software property you care about.

The Software Reliability Conundrum. If history is any guide, formal methods will only be able to handle software of moderate complexity for the foreseeable future. How about software fault tolerance based on diversity? But wait: what if the fault-tolerance system is itself too complex to verify, and has faults of its own? For example, the Six Western States blackout in the US was triggered by the shorting of a single power line in Oregon, and was spread by the flawed “self-healing” architecture of the time.

Complexity, Diversity and Reliability. To build a robust software system that can tolerate arbitrary application software faults, we must understand the relations among software complexity, diversity, and reliability. Complexity: the root cause of software faults. Diversity: a necessary condition for software fault tolerance. Reliability: a function of complexity and diversity. We begin with postulates based on self-evident facts.

Software Development Postulates. We assert that the following postulates are self-evident:
P1: Complexity Breeds Bugs: everything else being equal, the more complex a software project is, the harder it is to make it reliable.
P2: All Bugs are Not Equal: you can fix a bunch of obvious bugs quickly, but finding and fixing the last few bugs is much harder.
P3: All Budgets are Finite: there is only a finite amount of effort (budget) that we can spend on any project.
How can we model “software complexity”?

Logical Complexity. Computational complexity is the number of steps in a computation; logical complexity is the number of steps in verification. A program can have different logical and computational complexities: bubble sort has lower logical complexity but higher computational complexity, while heap sort is the other way around. Residual logical complexity: a program may have high logical complexity initially, but if it has been verified and can be used as is, its residual complexity is zero.
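To make the bubble-sort versus heap-sort contrast concrete, here is a minimal sketch that counts comparisons. The wrapper class `Counted` and the helper `sort_and_count` are hypothetical names introduced for illustration. The heap sort leans on Python's standard `heapq` module, which loosely mirrors the "residual complexity" idea: reusing an already-trusted component instead of writing the sift logic ourselves.

```python
import heapq


class Counted:
    """Wraps a value and counts every '<' comparison made on any instance."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def bubble_sort(items):
    # Low logical complexity: two nested loops and a swap, easy to verify.
    # High computational complexity: always n(n-1)/2 comparisons.
    a = list(items)
    n = len(a)
    for i in range(n):
        for j in range(n - 1 - i):
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def heap_sort(items):
    # O(n log n) comparisons, but correctness rests on the heap invariant,
    # which is harder to verify; here heapq supplies the sift logic.
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def sort_and_count(sort_fn, values):
    """Run sort_fn on wrapped values; return (sorted values, comparison count)."""
    Counted.comparisons = 0
    result = sort_fn([Counted(v) for v in values])
    return [c.value for c in result], Counted.comparisons


if __name__ == "__main__":
    data = list(range(100, 0, -1))  # reverse order
    print("bubble:", sort_and_count(bubble_sort, data)[1])
    print("heap:  ", sort_and_count(heap_sort, data)[1])
```

For n = 100 the bubble sort always performs 100·99/2 = 4950 comparisons; the heap sort needs far fewer, at the price of a harder correctness argument.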

The Implications of the 3 Postulates. P1: Complexity Breeds Bugs: for a given mission duration t, the reliability of software decreases as complexity increases. P2: All Bugs are Not Equal: for a given degree of complexity, the reliability function has a monotonically decreasing rate of improvement with respect to development effort. P3: Budgets are Finite: diversity is not free; if we go for n-version diversity, we must divide the available effort n ways. One simple model that satisfies P1, P2 and P3: the sum of the efforts used across the diverse versions equals the available effort, and the reliability function is $R(t) = e^{-k\,(\text{complexity}/\text{effort})\,t}$.
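As a worked restatement of the slide's model (writing C for complexity and E for the total available effort), splitting the effort evenly over n versions is, for each individual version, the same as multiplying its complexity by n; each version's reliability drops from R(t) to R(t) raised to the n-th power. This only rearranges the formula above and adds no new assumptions.

```latex
R(t) = e^{-k\,\frac{C}{E}\,t}
\qquad
E_i = \frac{E}{n}
\;\Longrightarrow\;
R_i(t) = e^{-k\,\frac{C}{E/n}\,t} = e^{-k\,\frac{nC}{E}\,t} = \bigl(R(t)\bigr)^{n}
```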

Diversity, Complexity and Reliability. [Figure: reliability curves for 1-version programming, 3-version programming, and a reliable core with 10x complexity reduction.] Analysis shows that what really counts is not the degree of diversity, but the existence of a simple and reliable core that can guarantee the stability of the system. This result is also robust against changes in the model assumptions. (L. Sha, “Using Simplicity to Control Complexity,” IEEE Software, July/August 2001.)
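A minimal sketch of the three configurations in the figure, under stated assumptions. The reliability formula and the n-way effort split come from the slides; the rest is ours: 3-version programming is modeled as 2-out-of-3 majority voting with independent failures (an optimistic assumption), and the reliable core is hypothetically given half the effort with 10x lower complexity, with system stability depending only on the core. `complexity_over_effort` values are illustrative, not data.

```python
import math


def reliability(complexity, effort, t=1.0, k=1.0):
    """Slide's model: R = exp(-k * (complexity / effort) * t)."""
    return math.exp(-k * (complexity / effort) * t)


def one_version(c, e):
    # All available effort goes into a single version.
    return reliability(c, e)


def three_version_majority(c, e):
    # P3: effort is split three ways. ASSUMPTION: majority voting, so the
    # system is correct if at least 2 of 3 independent versions are correct.
    p = reliability(c, e / 3)
    return 3 * p**2 - 2 * p**3


def reliable_core(c, e, reduction=10.0, core_share=0.5):
    # HYPOTHETICAL: core_share of the effort builds a simple core whose
    # complexity is c / reduction; stability depends only on this core.
    return reliability(c / reduction, e * core_share)


if __name__ == "__main__":
    for complexity_over_effort in (0.1, 0.5, 1.0, 2.0):
        c, e = complexity_over_effort, 1.0
        print(complexity_over_effort,
              round(one_version(c, e), 3),
              round(three_version_majority(c, e), 3),
              round(reliable_core(c, e), 3))
```

For the sample ratios above, the reliable core ranks first; the exact margins depend on the assumed voting scheme and effort share, so the numbers illustrate the trend rather than reproduce the paper's analysis.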

Putting the Principle to Work. Complexity is the side effect of features and performance, and the root cause of software faults. It is kind of like money: a source of many evils, but something we cannot live without. So let’s find a way to control complexity, instead of letting it control our systems.

An Example. Once upon a time, there was an exam on sorting programs. Grades were given as follows: A: correct and fast (n log n in the worst case); B: correct but slow; F: incorrect. Joe can verify his bubble sort, but has only a 50% chance of writing heap sort correctly. What is his optimal strategy?

Requirement Decomposition. Often, requirements can be decomposed into:
Critical (correctness) requirements. Sorting: output the numbers in the correct order. TSP: visit every city exactly once. Control: remain stable and controllable.
Performance optimization. Sorting: faster. TSP: shorter path. Control: less time, error, or energy.
Joe can safely exploit software he cannot verify. [Diagram: Heap Sort, Bubble Sort]
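One way Joe could exploit his unverified heap sort safely is sketched below, assuming that the critical requirement ("output is in order and is a permutation of the input") is cheap to check. The function names `is_correct_sort`, `simplex_sort`, and `buggy_fast_sort` are hypothetical. The untrusted fast sort is tried first; any exception or wrong output falls back to the verified bubble sort, so the worst outcome is grade B, never F.

```python
from collections import Counter


def bubble_sort(items):
    """Trusted: simple enough to verify."""
    a = list(items)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j + 1] < a[j]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def is_correct_sort(original, result):
    """Critical requirement: in order AND a permutation of the input."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return in_order and Counter(result) == Counter(original)


def simplex_sort(items, fast_sort):
    """Use the unverified fast sort only if its output passes the check."""
    try:
        candidate = list(fast_sort(list(items)))  # hand it a copy
        ok = is_correct_sort(items, candidate)
    except Exception:
        ok = False  # crashes and garbage output are treated alike
    if ok:
        return candidate, "fast"           # grade A path
    return bubble_sort(items), "trusted"   # grade B path, never F


def buggy_fast_sort(items):
    # HYPOTHETICAL faulty upgrade: sorts, but drops the largest element.
    return sorted(items)[:-1]


if __name__ == "__main__":
    data = [5, 3, 9, 1, 3]
    print(simplex_sort(data, sorted))
    print(simplex_sort(data, buggy_fast_sort))
```

The checker is deliberately much simpler than any fast sort, which is the point: only the checker and the fallback need to be verified.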

Stability Control. Stability control is a mechanism that ensures that errors are bounded in a way that satisfies the preconditions for the recovery operations. Stability control must be simple, or it will be self-defeating. What if the untrusted sorting program alters an item in the input list? 1. Create a verified, simple primitive called “permute”. 2. Do not allow the untrusted sorting software to touch the input list except through the permute primitive. 3. Enforce the restriction using an object whose only method is “permute”. Under stability control, the untrusted heap sort can only produce “out of order” application errors.
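A minimal sketch of the "permute-only" object, assuming that permute means swapping two positions and that read access is also allowed (a sort must compare items). The class name `PermuteOnly` and the two sort functions are hypothetical. Python cannot truly hide attributes, so here the restriction is by convention; a real system would enforce it through the language or memory protection.

```python
import random
from collections import Counter


class PermuteOnly:
    """Read access plus a single mutating primitive: swap two positions."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        return self._items[i]

    def permute(self, i, j):
        # The only mutation; any sequence of swaps is a permutation.
        self._items[i], self._items[j] = self._items[j], self._items[i]

    def snapshot(self):
        return list(self._items)


def untrusted_heap_sort(view):
    """In-place heap sort that can only read and call view.permute()."""
    n = len(view)

    def sift_down(root, end):
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and view[child] < view[child + 1]:
                child += 1
            if view[root] < view[child]:
                view.permute(root, child)
                root = child
            else:
                return

    for start in range(n // 2 - 1, -1, -1):  # build a max-heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):          # move the max to the end
        view.permute(0, end)
        sift_down(0, end)


def faulty_sort(view):
    # HYPOTHETICAL buggy upgrade: random swaps. It can scramble the order,
    # but it cannot change, add, or drop items.
    rng = random.Random(0)
    for _ in range(10):
        view.permute(rng.randrange(len(view)), rng.randrange(len(view)))


if __name__ == "__main__":
    v = PermuteOnly([7, 2, 9, 2, 4, 1])
    untrusted_heap_sort(v)
    print(v.snapshot())
```

Whatever the untrusted code does through this interface, the result is always a permutation of the input, so the only remaining error class is "out of order", which the simple check-and-fallback above can detect.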

Stability Control for Control Systems. Having a reliable controller, we identify the recovery region within which that controller can operate successfully. The recovery region is a subset of the states that are admissible with respect to the operational constraints. The largest recovery region can be found using linear matrix inequalities (LMIs). This approach is applicable to any linearizable system, which covers most practical control systems. [Figure: operational constraints, recovery region, stability envelope.] The system under the new complex controller must stay within the recovery region.
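A minimal sketch of the two checks implied above, for a hypothetical 2-state system: whether a state lies in an ellipsoidal recovery region {x : xᵀPx ≤ 1}, and whether that whole region lies inside box-shaped operational constraints |xᵢ| ≤ limitᵢ. The matrix `P` and the `limits` are made-up values; in practice P would come from an LMI solver. The containment test uses the standard fact that over the ellipsoid xᵀPx ≤ 1, the largest |xᵢ| equals sqrt((P⁻¹)ᵢᵢ).

```python
import math

# HYPOTHETICAL positive-definite P and box constraints |x[i]| <= limits[i].
P = [[2.0, 0.5],
     [0.5, 1.0]]
limits = [1.0, 1.5]


def quad_form(P, x):
    """x^T P x for a 2-vector x."""
    return sum(x[i] * P[i][j] * x[j] for i in range(2) for j in range(2))


def inverse_2x2(P):
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]


def in_recovery_region(x, P=P):
    return quad_form(P, x) <= 1.0


def region_within_constraints(P=P, limits=limits):
    # max |x[i]| over {x^T P x <= 1} is sqrt((P^-1)[i][i]).
    Pinv = inverse_2x2(P)
    return all(math.sqrt(Pinv[i][i]) <= limits[i] for i in range(2))


if __name__ == "__main__":
    print(region_within_constraints())
    print(in_recovery_region([0.5, 0.2]), in_recovery_region([0.9, 0.9]))
```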

Simplex Architecture for Control. [Data flow block diagram: trusted, simple and reliable controller; online-upgradeable complex controller; stability monitoring; plant.] The Simplex architecture for control systems allows control software to be upgraded online without shutting down operation. It also maintains control in spite of arbitrary application errors in the upgrade process. To try an interactive demonstration, see www-drii.cs.uiuc.edu/download.
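A minimal sketch of the stability-monitoring decision, assuming a hypothetical scalar plant x[k+1] = A·x[k] + B·u[k], a verified safety controller u = −K·x with |A − B·K| < 1, and the recovery region |x| ≤ R. These constants and function names are illustrative, not taken from the Simplex implementation. The complex controller's command is used only if the predicted next state stays in the region; otherwise the safety controller acts. Because the region is invariant under the safety controller, the state never leaves it.

```python
# HYPOTHETICAL plant and safety controller: A - B*K = 0.3, so the safety
# controller shrinks |x| every step and |x| <= R is invariant under it.
A, B, K, R = 1.2, 1.0, 0.9, 1.0


def safety_controller(x):
    return -K * x


def next_state(x, u):
    return A * x + B * u


def decision_logic(x, complex_controller):
    """Accept the complex command only if the next state stays in |x| <= R."""
    try:
        u = float(complex_controller(x))
        if abs(next_state(x, u)) <= R:  # NaN fails this test too
            return u, "complex"
    except Exception:
        pass
    return safety_controller(x), "safety"


def run(complex_controller, x0=0.5, steps=50):
    x, trace = x0, []
    for _ in range(steps):
        u, source = decision_logic(x, complex_controller)
        x = next_state(x, u)
        trace.append((x, source))
    return trace


if __name__ == "__main__":
    # HYPOTHETICAL faulty upgrade: wrong sign and far too much gain.
    trace = run(lambda x: 5.0 * x)
    print(max(abs(x) for x, _ in trace))
```

A real recovery region would be a Lyapunov level set of a multi-state system rather than an interval, but the switching structure is the same.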

Dynamic Component Replacement. [Diagram: runtime component replacement middleware. Layers: hardware; operating system; eSimplex middleware (monitoring and switching logic); application layer (a simple & reliable component alongside complex, feature-rich components).]
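A minimal sketch of runtime component replacement, assuming the middleware owns a fixed simple component, an upgradeable slot for a complex component, and a monitor predicate that judges each output. The class `ReplaceableSlot`, its `max_failures` eviction policy, and the monitor are hypothetical illustrations of "monitoring and switching logic", not the eSimplex API. A new component is installed between calls to `step()`, without stopping the caller's loop.

```python
class ReplaceableSlot:
    """HYPOTHETICAL middleware slot with monitoring and switching logic."""

    def __init__(self, simple, monitor, max_failures=3):
        self.simple = simple          # trusted component, never replaced
        self.complex = None           # upgradeable component
        self.monitor = monitor        # monitor(x, y) -> True if y acceptable
        self.max_failures = max_failures
        self.failures = 0

    def replace(self, new_component):
        # Install an upgrade; takes effect on the next call to step().
        self.complex = new_component
        self.failures = 0

    def step(self, x):
        if self.complex is not None:
            try:
                y = self.complex(x)
                ok = self.monitor(x, y)
            except Exception:
                ok = False
            if ok:
                return y, "complex"
            self.failures += 1
            if self.failures >= self.max_failures:
                self.complex = None   # evict a persistently faulty upgrade
        return self.simple(x), "simple"


if __name__ == "__main__":
    slot = ReplaceableSlot(simple=lambda x: 0.5 * x,
                           monitor=lambda x, y: abs(y) <= abs(x))
    print(slot.step(4.0))
    slot.replace(lambda x: 0.25 * x)   # a well-behaved upgrade
    print(slot.step(4.0))
```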

Intrusion Tolerance. Untrusted software may contain not just application-level faults or attacks, but also attacks aimed at corrupting the system: 1. Overusing system memory and CPU resources. 2. Corrupting other programs’ code or data. 3. Usurping supervisory control privileges. The first two can be handled by address space protection (e.g., via the process abstraction) and by memory and temporal resource restrictions.

Preventing Untrusted Code from Usurping Privileges. To handle the third, we begin by restricting the available system calls to memory allocation only, and we do not allow embedded assembly. Under these constraints, to usurp privileges one has to violate code safety constraints, e.g., by jumping to data areas to execute data (hidden or synthesized machine code), or by jumping to system code areas and running system code.
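Control_C enforces these restrictions at compile time for C. As a loose analogy in the language of these sketches, a loader could statically scan upgrade code and reject imports and any call outside an allowlist. The policy set `ALLOWED_CALLS` (with a hypothetical `allocate` standing in for "memory allocation only") and the function `check_upgrade` are assumptions; this uses Python's standard `ast` module and is a conservative illustration, not a secure sandbox.

```python
import ast

# HYPOTHETICAL policy: upgrade code may call only these names
# (a stand-in for "no system calls except memory allocation").
ALLOWED_CALLS = {"allocate", "min", "max", "abs"}


def check_upgrade(source):
    """Return a list of policy violations found statically in `source`."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append(f"line {node.lineno}: import not allowed")
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in ALLOWED_CALLS:
                continue
            violations.append(f"line {node.lineno}: call not allowed")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Blocks dunder tricks used to reach interpreter internals.
            violations.append(f"line {node.lineno}: dunder attribute access")
    return violations


if __name__ == "__main__":
    ok_src = "def control(x):\n    return max(-1.0, min(1.0, -0.5 * x))\n"
    bad_src = "import os\nos.system('reboot')\n"
    print(check_upgrade(ok_src))
    print(check_upgrade(bad_src))
```

The check is conservative: it also rejects calls to the upgrade's own helper functions, which a real checker would need to handle.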

C Code Safety Checks. Due to the large installed base of C, we are working with colleagues to define a subset of C, called Control_C, that can be statically checked for safety and is expressive enough for control and signal processing. Control_C = C + { strong typing } + { Java-style pointers } + { region-based heap with only 1 region } + { “bounded” arrays } – { system calls except memory allocation } – { embedded assembly }. [Diagram: Code → Compiler Analysis → GCC.] Reference: Kowshik, Dhurjati & Adve, “Ensuring Code Safety Without Runtime Checks for Real-Time Control Systems,” CASES 2002.

Technology Integration in eSimplex Middleware. [Diagram mapping threats to countermeasures: attacks on the execution environment → code safety checks (development environment); application logic bugs + attacks → safety controller + stability control (application domain technology); resource depletion attacks → RT resource management (middleware).]

UIUC Real Time Systems Lab. How can we integrate real-time, fault-tolerance, compiler, and control technologies into middleware for real-time, fault- and intrusion-tolerant upgrades in the field? How can we maximize the performance of special-purpose streaming applications, such as sonar, by co-designing protocols for cache, bus, CPU, and communication? How can we integrate queueing-model-based feed-forward and control-theory-based feedback to suppress performance variations in distributed command and control networks? How can we integrate legacy control software components with modern real-time control software components in a way that minimizes the need for recertification? How can we perform quality-driven RT communication in wireless sensor networks? How can we handle physical constraints such as heat and power in real-time search and tracking for multi-function phased array radars?

Using Simplicity to Control Complexity.
The high-assurance control subsystem. Application level: well-understood controllers, to keep the control software simple. System software level: certified OS kernels. Hardware level: well-established, fault-tolerant hardware. System development: a high-assurance process, e.g., DO-178B. Requirement management: critical properties and essential services.
The high-performance control subsystem. Application level: advanced control technologies. System software level: COTS OS and middleware. Hardware level: standard industrial hardware. System development: standard industrial development processes. Requirement management: features, performance, and rapid innovation.

Intrusion Tolerance. When attacks are disguised as upgrades, they can attack the system through: Malicious control logic: countered by the analytically redundant controller and the recovery region. Resource depletion attacks: countered by static memory allocation and by temporal firewalls from real-time schedulers. Corrupting other applications’ code and data: countered by address space protection. Usurping system management authority: discussed next.
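A minimal sketch of a temporal firewall, assuming a tick-based, fixed-priority scheduler in which each task has a CPU budget that is replenished every period and enforced regardless of how much the task asks for. The `Reservation` fields and `simulate` function are hypothetical, and unfinished work is simply dropped at each period boundary to keep the model small. A greedy upgrade at higher priority cannot starve the lower-priority control task, because it is cut off once its budget is spent.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    budget: int        # CPU ticks allowed per period (the firewall)
    period: int        # replenishment period, in ticks
    demand: int        # ticks the task tries to use per period
    remaining: int = 0
    pending: int = 0
    granted: int = 0   # total ticks actually received


def simulate(tasks, ticks):
    """Fixed-priority scheduling (list order = priority) with budgets."""
    for t in range(ticks):
        for task in tasks:
            if t % task.period == 0:       # new period: replenish budget
                task.remaining = task.budget
                task.pending = task.demand
        for task in tasks:                 # highest-priority eligible task runs
            if task.pending > 0 and task.remaining > 0:
                task.pending -= 1
                task.remaining -= 1
                task.granted += 1
                break                      # one task per tick; else idle
    return {task.name: task.granted for task in tasks}


if __name__ == "__main__":
    # HYPOTHETICAL: a greedy upgrade wants the whole CPU but has budget 2/5.
    greedy = Reservation("upgrade", budget=2, period=5, demand=10**6)
    control = Reservation("control", budget=2, period=5, demand=2)
    print(simulate([greedy, control], 20))
```

With the budget in place, both tasks get 2 ticks in every 5-tick period; raising the greedy task's budget to the full period would starve the control task entirely.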

Examples

Language & Compiler Support for Security. Current languages and approaches (Java, SafeC, PCC, Modula-3+) are too general: their safety requires extensive runtime checks plus garbage collection. Control_C is a language for safe, upgradeable, real-time control: Control_C = C + { strong typing } + { Java-style pointers } + { region-based heap with only 1 region } + { “bounded” arrays } – { system calls }.

The Stability Bounds. We cannot use the boundary of the admissible states as the switching rule, because of the inertia of the physical plant. The recovery region is closed with respect to the operations of the simple controller: it is a level set of a Lyapunov function that lies inside the polytope of state constraints. The largest recovery region can be found using LMIs. [Figure: state constraints, recovery region, and Lyapunov function level set; the switching rule is based on the Lyapunov function.]
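A simplified sketch of building such a recovery region, under stated assumptions. Instead of the LMI optimization mentioned above, it fixes the shape by solving the discrete Lyapunov equation AᵀPA − P = −I for a hypothetical stable closed-loop matrix `A` (plant under the simple controller), then scales the level set V(x) = xᵀPx ≤ c to the largest c that fits inside box constraints `LIMITS`. Since V(Ax) = V(x) − |x|², every trajectory of the simple controller starting inside the level set stays inside it, which is the closure property the slide relies on.

```python
import random

# HYPOTHETICAL closed-loop matrix (eigenvalue magnitude ~0.86, so stable)
# and operational constraints |x[i]| <= LIMITS[i].
A = [[0.9, 0.2],
     [-0.1, 0.8]]
LIMITS = [1.0, 2.0]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def lyapunov_P(A, iterations=500):
    """Solve A^T P A - P = -I by iterating P <- I + A^T P A
    (converges because A is stable)."""
    I = [[1.0, 0.0], [0.0, 1.0]]
    P, At = I, transpose(A)
    for _ in range(iterations):
        APA = matmul(matmul(At, P), A)
        P = [[I[i][j] + APA[i][j] for j in range(2)] for i in range(2)]
    return P


def V(P, x):
    return sum(x[i] * P[i][j] * x[j] for i in range(2) for j in range(2))


def largest_level(P, limits):
    """Largest c such that {x : V(x) <= c} fits inside the box constraints,
    using max |x[i]| over the level set = sqrt(c * (P^-1)[i][i])."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    Pinv_diag = [P[1][1] / det, P[0][0] / det]
    return min(limits[i] ** 2 / Pinv_diag[i] for i in range(2))


def closed_loop_step(A, x):
    return [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]


if __name__ == "__main__":
    P = lyapunov_P(A)
    c = largest_level(P, LIMITS)
    print("switching rule: use simple controller before V(x) exceeds", c)
```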

Compiler Detection of Violations. Attack: writing beyond the ends of a buffer or array. Compiler solution: check for array bounds violations (or insert runtime checks). Attack: jumping to illegal code within the data area. Compiler solution: check for jumps to non-label types. Attack: illegal pointer usage that corrupts data. Compiler solution: region-based protection with a single region. [Figure: stack layout from the stack bottom: buffer new at &new through &new + 2, followed by the return address, which an overflow overwrites to point at injected “killcode”.]
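A toy model of the stack-smashing attack in the figure, assuming a frame made of a 2-slot buffer `new` followed by the saved return address; a Python list stands in for memory, and the function names are hypothetical. The unchecked write mimics what unchecked C indexing permits; the checked write shows what a bounds check, whether proven statically or inserted at runtime, rules out.

```python
# HYPOTHETICAL frame layout: slots 0..1 = buffer `new`, slot 2 = return address.
RETURN_SLOT = 2
BUFFER_LEN = 2


def make_frame():
    return [0, 0, "return_to_caller"]


def unchecked_write(frame, index, value):
    # Unchecked indexing: any slot in the frame can be overwritten.
    frame[index] = value


def checked_write(frame, index, value, buf_len=BUFFER_LEN):
    # Bounds-checked indexing: only the buffer's own slots are writable.
    if not 0 <= index < buf_len:
        raise IndexError(f"index {index} outside buffer of length {buf_len}")
    frame[index] = value


def copy_input(frame, data, write):
    """Copy attacker-controlled input into the buffer, one slot at a time."""
    for i, v in enumerate(data):
        write(frame, i, v)


if __name__ == "__main__":
    attack = [1, 2, "killcode"]  # one element too many
    f = make_frame()
    copy_input(f, attack, unchecked_write)
    print("unchecked:", f[RETURN_SLOT])
    g = make_frame()
    try:
        copy_input(g, attack, checked_write)
    except IndexError as err:
        print("checked:", err)
```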