1 Presented by: Jeff Schaffer Sr. Field Applications Engineer QNX Software Systems 818-227-5105 Embedded Operating Systems: The State.

1 Presented by: Jeff Schaffer, Sr. Field Applications Engineer, QNX Software Systems
Embedded Operating Systems: The State of the Art
QNX is a leading provider of real-time operating system (RTOS) software, development tools, and services for mission-critical embedded applications.

2 Role of the Embedded OS
Traditional
–Permit sharing of common resources of the computer (disks, printers, CPU)
–Provide low-level control of I/O devices that may be complex, time-dependent, and non-portable
–Provide device-independent abstractions (e.g. files, filenames, directories)
Additional Roles
–Prevent common causes of system failure and instability; minimize impact when they occur
–Extend system life cycles
–Isolate problems during development and at runtime

3 Architecture Comparison
REAL TIME EXECUTIVE
Advantage: single address space
Disadvantage: single address space, different binary images
Failure: means reboot
MONOLITHIC KERNEL
Advantage: apps run in own memory space
Disadvantage: kernel not protected, kernel testing
Failure: might mean reboot
TRUE MICROKERNEL
Advantage:
–Modules run in own memory space
–Add/replace services on the fly
–Reusable modules
–Direct hardware access
Disadvantage: context switching
Failure: usually does not mean reboot

4 MicroKernel – Neutrino
[Diagram labels: Microkernel; X86, PPC, MIPS, SH4, ARM, StrongARM, XScale; App; Photon GUI; Flash fsys; Audio driver; TCP/IP; Serial driver; Http server; Java; Process Manager]
Dynamic architecture makes hot-start and upgrades easy, even with drivers
Philosophy: a trusted kernel running a system of untrusted software components
Processes provide a reusable component model with well-defined message interfaces
Processes communicate via messages or other methods, such as shared memory. Permits loose inter-module coupling.
No requirement for filesystem, GUI, etc.

5 Typical Forms of IPC
[Diagram labels: Pipes (Process 1, Process 2); Shared Memory (process address map, shared memory object map, process address map); Message Queues (Process 1, Process 2; msg 2, msg 3, msg 4, msg 5); Mailboxes; Kernel]

6 Which Architecture for me?
Depends on your application and processor!
Simple apps (such as single control loops) generally need only a real-time executive
As a system becomes more complex, it typically needs a more complex operating system architecture
Need to look at factors such as scalability and reliability
Do standards matter?

7 APIs
Two most common standards
Advantages of standards
–Portability of code
–Hiring of programmers

8 Do I need Real-Time? Maybe...
What is Real Time?
–Less than 1 second response?
–Less than 1 millisecond response?
–Less than 1 microsecond response?

9 9 Real-Time "A real-time system is one in which the correctness of the computations not only depends upon the logical correctness of the computation but also upon the time at which the result is produced. If the timing constraints of the system are not met, system failure is said to have occurred." Donald Gillies (comp.realtime FAQ)

10 A Simple Example "... it doesn't do you any good if the signal that cuts fuel to the jet engine arrives a millisecond after the engine has exploded" Bill O. Gallmeister - POSIX.4 Programming for the Real World

11 Hard vs. Soft Real Time
[Image label: ATM]
Hard
–absolute deadlines
–late responses cannot be tolerated and may have a catastrophic effect on the system
–example: flight control
Soft
–systems which have reduced constraints on "lateness"; e.g. late responses may still have some value
–still must operate very quickly and repeatably
–example: cardiac pacemaker
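The hard/soft distinction above can be sketched as a function that scores how useful a result still is once it arrives. This is an illustration only: the name `result_value`, the `grace` parameter, and the linear decay of a late soft result are assumptions, not something the presentation specifies.

```python
def result_value(finish_time, deadline, hard, grace=0.0):
    """Return the usefulness (0.0 to 1.0) of a result produced at finish_time.

    hard=True  : any lateness makes the result worthless (system failure).
    hard=False : a late result loses value linearly over `grace` time units
                 (the linear shape is an illustrative assumption).
    """
    if finish_time <= deadline:
        return 1.0                      # on time: full value in both cases
    if hard or grace <= 0:
        return 0.0                      # hard deadline missed: no value left
    lateness = finish_time - deadline
    return max(0.0, 1.0 - lateness / grace)


# A result 2 units late is useless under a hard deadline but may still be
# partly useful under a soft one.
print(result_value(12, 10, hard=True))
print(result_value(12, 10, hard=False, grace=4))
```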

12 Real-time OS Requirements
Operating system factors that permit real-time:
–Thread Scheduling
–Control of Priority Inversion
–Time Spent in Kernel
–Interrupt Processing

13 Factor #1: Scheduling
Non real-time scheduling
–round-robin
–FIFO
–adaptive
Real-time scheduling
–priority based
–sporadic
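As a rough illustration of priority-based scheduling, the sketch below keeps a ready queue in which the highest-priority thread always runs next and threads of equal priority run in arrival order. The class name, thread names, and priority values are hypothetical, and a real RTOS scheduler also handles pre-emption and blocking, which are left out here.

```python
import heapq
import itertools


class ReadyQueue:
    """Priority-based ready queue: highest priority first, FIFO among equals."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()   # tie-breaker preserving arrival order

    def make_ready(self, name, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def pick_next(self):
        """Remove and return the name of the thread that should run next."""
        _, _, name = heapq.heappop(self._heap)
        return name


rq = ReadyQueue()
rq.make_ready("logger", 5)
rq.make_ready("motor_control", 20)
rq.make_ready("ui", 10)
rq.make_ready("sensor_poll", 20)
run_order = [rq.pick_next() for _ in range(4)]
print(run_order)
```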

14 Factor #2: Priority Inversion
[Diagram labels: Information Bus Manager; Meteorological Data Gathering Task; Communications Task]
Sequence:
1. Low priority task acquires bus mutex to transfer data
2. High priority task blocks until mutex released
3. Medium priority task pre-empts low priority task
4. Watchdog timer resets the system since Bus Manager has not run in some time
Source: Embedded Systems Programming
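The sequence above can be reproduced with a small tick-by-tick simulation. This is a toy model under stated assumptions: the task names, priorities, arrival ticks, and run lengths are invented for illustration, and priority inheritance is shown as one common way to control inversion, not necessarily the remedy the presenter had in mind.

```python
def high_task_finish_tick(priority_inheritance):
    """Simulate three tasks one tick at a time.

    Returns the tick at which the high-priority task completes.
    Simplification: a task that needs the mutex while another task owns it
    is treated as blocked immediately, with no cost for the failed attempt.
    """
    tasks = {
        "low":    {"prio": 1, "arrive": 0, "left": 3, "mutex": True},
        "medium": {"prio": 2, "arrive": 2, "left": 5, "mutex": False},
        "high":   {"prio": 3, "arrive": 1, "left": 1, "mutex": True},
    }
    owner = None  # name of the task currently holding the bus mutex

    for tick in range(100):
        active = [n for n, t in tasks.items()
                  if t["arrive"] <= tick and t["left"] > 0]
        blocked = [n for n in active
                   if tasks[n]["mutex"] and owner not in (None, n)]

        def effective_priority(name):
            prio = tasks[name]["prio"]
            if priority_inheritance and name == owner:
                # The owner temporarily inherits the priority of its waiters.
                prio = max([prio] + [tasks[b]["prio"] for b in blocked])
            return prio

        runnable = [n for n in active if n not in blocked]
        if not runnable:
            continue
        running = max(runnable, key=effective_priority)

        if tasks[running]["mutex"] and owner is None:
            owner = running               # acquire the mutex
        tasks[running]["left"] -= 1       # run for one tick
        if tasks[running]["left"] == 0:
            if owner == running:
                owner = None              # release the mutex on completion
            if running == "high":
                return tick + 1
    return None


print("without inheritance:", high_task_finish_tick(False))
print("with inheritance:   ", high_task_finish_tick(True))
```

With these made-up numbers the high-priority task finishes at tick 9 without inheritance, because the medium task runs to completion while the low task still holds the mutex, and at tick 4 with inheritance.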

15 Factor #3: Kernel Time
Kernel operations must be pre-emptible
–if they are not, an unknown amount of time can be spent in the kernel performing an operation on behalf of a user process
–can cause real-time process to miss deadline
All kernels have some window (or multiple windows) of time where pre-emption cannot occur
Some operating systems attempt to provide real-time capability by adding checkpoints within the kernel so they can be interrupted at these points

16 Example: A Kernel call is a software interrupt
[Timeline diagram from int KER to iret. Labels: Entry (a few opcodes, interrupts off); Kernel operation, which may include message pass (unlocked, µsecs to msecs, pre-emptable); Locked (µsecs, no pre-emption); Unlocked (interrupts on, µsecs, pre-emptable); Exit (a few opcodes, interrupts off)]

17 Split Out Long Operations
[Diagram labels: Process Manager; Thread, Sync, Message, Sched, Signal, Channel, Clock, Timer, Intr; Fork, Exec, Pathname, Spawn, Mmap, Waitpid, Session, UID/GID, Debug; NtoProc]

18 Factor #4: Interrupts
This is broken down into the following areas:
–Method of handling the interrupt processing chain
–Handling of nested interrupts

19 Interrupt Processing Chain
[Diagram 1: ISR INT x, ISR INT y, IST] IST scheduled whenever queue emptied, non-deterministic
[Diagram 2: ISR INT x, ISR INT y, IST] IST scheduled by normal OS scheduling, deterministic

20 Can I Make Any Conventional OS Real-Time?
[Diagram labels: Conventional OS; Real-time kernel]
Method
–Add real-time layer below conventional OS, running conventional OS as a low priority real-time process
–Add real-time layer to hardware service layer
Problems
–different APIs
–real-time layer proprietary
–existing OS apps not R/T
–poor communication between operating systems
–loss of control issue

21 Scalability

22 Scaling Solution #1: Single Board, Single Node
[Diagram labels: CPU, Bridge, Mem., Bus, PCI, Peripherals]
The only scaling possible is a CPU replacement

23 Scaling Solution #2: Single Board, Multiple Nodes
[Diagram labels: Node 1, Node 2; CPU, Bridge, Mem., Bus, PCI, Peripherals]
Relatively simple to implement
Allows scaling-on-demand
Suitable if nodes have independent work
Inter-node IPC slower than memory access
Complexity in maintaining global view of data
Difficult to break up computationally-intensive tasks

24 Scaling Solution #3: Single Board, Multiple Processors
[Diagram labels: CPU 0, CPU 1, Bridge, Mem., Bus, PCI, Peripherals]
Tightly-coupled symmetric multiprocessing (SMP)
All processors have a symmetric and consistent view of physical memory and peripherals
Scales processing power
Need software (RTOS) support

25 The SMP OS Dilemma
SMP systems to date use desktop operating systems; not responsive enough for real-time requirements
–Application servers
–Databases
–Web servers
Typical real-time operating systems (home-built or commercial), such as those commonly used in routers and switches today, do not have SMP support
SMP-capable real-time operating systems run the CPUs as independent processors with independent operating systems

26 SMP Support
True (tightly coupled) SMP support
Only the kernel needs SMP awareness
Transparent to application software and drivers - identical binaries for UP and SMP systems
Automatic scheduling across all CPUs

27 QNX True SMP
[Diagram labels: CPU 0, CPU 1; Process, Thread; Running; Ready queues; Priority; 63; Blocked states]
STATE_RUNNING thread on each processor
Priority-based ready queues
Each thread can be locked to a specific CPU by using a processor affinity mask
Scheduler remembers last CPU thread ran on
–Minimize thread migration
–Optimize cache usage
Highest-priority READY thread always immediately scheduled
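The affinity-mask and last-CPU points above can be sketched as a CPU-selection rule. This is a hypothetical illustration, not the QNX scheduler: the function name `choose_cpu`, its arguments, and the tie-breaking rule (lowest-numbered idle CPU) are all assumptions.

```python
def choose_cpu(affinity_mask, last_cpu, idle_cpus):
    """Pick a CPU for a READY thread.

    affinity_mask : bit i set means the thread may run on CPU i
    last_cpu      : CPU the thread last ran on, or None if it never ran
    idle_cpus     : set of CPU numbers currently free to take the thread
    Returns a CPU number, or None if no permitted CPU is available.
    """
    allowed = [cpu for cpu in sorted(idle_cpus) if affinity_mask & (1 << cpu)]
    if not allowed:
        return None
    if last_cpu in allowed:
        return last_cpu        # stay put: avoids migration, keeps the cache warm
    return allowed[0]          # otherwise take the lowest-numbered permitted CPU


print(choose_cpu(0b11, last_cpu=1, idle_cpus={0, 1}))   # prefers the last CPU
print(choose_cpu(0b01, last_cpu=1, idle_cpus={0, 1}))   # mask forbids CPU 1
```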

28 Why Is Cache Important?
Cache efficiency is probably the single largest determinant of performance on SMP
Coherent view of physical memory is maintained using cache snooping
Cache snooping is done at the CPU bus level and so operates at lower speeds than the core
Coherency is invisible to software

29 Performance Implications
Snoop traffic expected on SMP
Cache hits generally cause no bus transaction
Multiple processors writing to the same location degrades performance (ping-pong effect)
Performance degrades when a large amount of data is modified on one processor and read on the other
Sometimes it is better to have specific threads in a process run on the same CPU

30 Designing for SMP: One Big Task
[Diagram labels: Giant App; Single thread]
Will not work with SMP

31 Designing for SMP: Single Threaded Tasks
[Diagram labels: App 1, Single thread; App 2, Single thread]
Works with SMP
Process data can be shared with shared memory
Good concurrency, some complexity
IPC not usually as efficient as memory sharing

32 Designing for SMP: Scaling Software with Threads
[Diagram labels: Server; Threads]
Single copy server
All process data is implicitly shared and accessible
Can achieve good concurrency with less complexity
POSIX synchronization used
–Mutexes
–Semaphores
–Condition variables
Usually more efficient than inter-process synchronization
Note: SMP finds concurrency problems fast!

33 Optimizing Compute-intensive Applications
[Diagram labels: Application; Main thread; Worker thread; Threads]
Pool of worker threads
Dispatch work to worker threads
Scales very well with SMP
The tricky part is breaking up the problem
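A minimal sketch of the main-thread/worker-pool shape described above, using Python's standard `concurrent.futures` purely to show the structure. The prime-counting workload and the function names are made up for illustration. Note that CPython's global interpreter lock keeps these threads from executing bytecode in parallel, so this shows how the problem is broken up and dispatched, not the speed-up an SMP RTOS with POSIX threads would deliver.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(lo, hi):
    """Count primes in the half-open range [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))


def parallel_count_primes(limit, workers=4):
    # "The tricky part is breaking up the problem": here the range is simply
    # cut into equal-sized chunks, one per worker.
    step = max(1, -(-limit // workers))          # ceiling division
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The main thread dispatches chunks and then combines partial results.
        partials = pool.map(lambda r: count_primes(*r), ranges)
        return sum(partials)


print(parallel_count_primes(1000))
```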

34 Interrupt Handling
[Diagram labels: CPU 0, CPU 1; IRQ 7, IRQ 8, IRQ 9, IRQ 10; IRQ, CPU, ISR, IST]
Interrupt processed on CPU that was targeted
Can distribute load by handling interrupts on different processors
Sometimes not the optimal strategy due to cache effects

35 Scaling Solution #4: Multiple Processors/Nodes
[Diagram labels: Node 1, Node 2; CPU 0, CPU 1, Bridge, Mem., Bus, PCI, Peripherals]

36 Example
[Diagram labels: Network; Chassis; High-speed interconnect; Low-speed bus; Line card]

37 The QNET MicroNetwork
[Diagram labels: Microkernel; QNET; App; Photon App; Flash Fsys; CDROM Fsys; TCP/IP; Audio; Process Manager; LAN or Internet or Backplane]
Messages flow transparently through QNET from one message bus to another.
All applications and servers become network distributed without any special code.

38 QNX Qnet Manager
[Diagram labels: Line card, Line card, Control card]
Extends message passing across multiple QNX microkernels
Over anything with a packet driver:
–Ethernet, RapidIO, 3GIO, InfiniBand, Stargen, etc.
Class of service
Use symbolic prefixes to make client code independent of location of resource manager

39 QNET Class of Service
[Diagram labels: Line card, Control card, Line card]
One or multiple links can connect different nodes.

40 QNET: Load-Balanced Distribution
[Diagram labels: Line card, Control card, Line card]
Data is sent out the link which will deliver it the fastest. This is based upon link speed and queue length for each link.
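One plausible reading of "link speed and queue length" is to estimate, per link, how long the new data would take to leave the node and pick the smallest estimate. The sketch below does exactly that. It is not Qnet's actual algorithm: the `Link` fields, the delay formula, and the example link names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    speed_bps: float     # nominal link speed in bits per second
    queued_bytes: int    # bytes already waiting to be sent on this link


def estimated_delivery_s(link, payload_bytes):
    """Time to drain the existing queue plus the new payload, in seconds."""
    return (link.queued_bytes + payload_bytes) * 8 / link.speed_bps


def pick_link(links, payload_bytes):
    """Choose the link expected to deliver the payload soonest."""
    return min(links, key=lambda link: estimated_delivery_s(link, payload_bytes))


# A fast but congested link can lose to a slower idle one.
links = [Link("backplane", 1e9, 2_000_000), Link("ethernet", 1e8, 0)]
print(pick_link(links, 1500).name)
```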

41 QNET: Ordered Distribution
[Diagram labels: Line card, Control card, Line card]
Data is sent out a primary link. If it fails, data is diverted to a secondary link. The primary link is probed and when it comes back online, data is diverted back to it.
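The ordered policy amounts to a two-state selector: use the primary while it is up, fall back to the secondary on failure, and return to the primary once a probe succeeds. A minimal sketch follows; the class and method names are hypothetical and the probe itself is represented only by its boolean result.

```python
class OrderedLinks:
    """Primary/secondary link selection with probe-driven fail-back."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.primary_up = True

    def report_primary_failure(self):
        # Called when a send on the primary link fails.
        self.primary_up = False

    def probe_primary(self, probe_ok):
        # probe_ok is the outcome of a periodic health probe on the primary.
        if probe_ok:
            self.primary_up = True

    def link_for_next_send(self):
        return self.primary if self.primary_up else self.secondary


links = OrderedLinks("link0", "link1")
links.report_primary_failure()
print(links.link_for_next_send())     # traffic is diverted to the secondary
links.probe_primary(probe_ok=True)
print(links.link_for_next_send())     # traffic returns to the primary
```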

42 QNET: Parallel Distribution
[Diagram labels: Line card, Control card, Line card]
Data is sent out both links at the same time. A failure on either of the links is handled gracefully.

43 Designing for Networked SMP: Single/Multi Threaded Tasks
[Diagram labels: App 1, Multiple threads; App 2, Single thread]
Different processes necessary for different nodes
Works with SMP
Process data can be shared with shared memory
IPC for networked communication

44 Transparent Redirection
[Diagram labels: Client, Client; /service; Node A, B; /net/a/dev/service; /net/b/dev/service]
Simple link provides transparent redirection
Process has to monitor status of link
Switch over is not transparent to client

45 Transparent Redirection
[Diagram labels: Client, Client; Service mgr; /dev/service; Node A, B; /net/a/dev/service; /net/b/dev/service]
Service manager acts as a proxy
Monitors health of and/or load on services/nodes
Switch over is transparent to client

46 Redundant Links
[Diagram labels: Client, Client; Service mgr; /dev/service; Node A, B; /net/a/dev/service; /net/b/dev/service]
Requests serviced redundantly
First/majority/best result
Different implementations
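Of the three result policies named above, the majority rule is the easiest to pin down in code. The sketch below accepts a result only when more than half of the redundant services agree on it. The function name and the choice to return `None` when there is no strict majority are assumptions made for illustration.

```python
from collections import Counter


def majority_result(results):
    """Return the value reported by a strict majority of replicas, else None."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes * 2 > len(results) else None


print(majority_result([42, 42, 41]))   # two of three replicas agree
print(majority_result([1, 2, 3]))      # no agreement
```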

47 [Diagram labels: MOST BUS; Qnet (three nodes); FLASH FSYS, TCP/IP, App, Bluetooth; FLASH FSYS, Graphics, Browser, Audio, Photon; CDROM FSYS, Graphics, Browser, Audio, Photon]

48 [Diagram labels: MOST BUS; Qnet (two nodes); FLASH FSYS, TCP/IP, App, Bluetooth; FLASH FSYS, Graphics, Browser, Audio, Photon; CDROM FSYS, Graphics, Browser]

49 Reliability and Availability

50 Why?
Embedded systems are different!
Failure in an embedded system can have severe effects - like death …
"Pilots really hate to be told they have to reboot their plane while in flight" Walter Shawlee

51 Definitions
MTBF: Mean Time Between Failures
–The average number of hours between failures for a large number of components over a long time (e.g. MIL-HDBK-217).
MTTR: Mean Time To Repair
–Total amount of time spent performing all corrective maintenance repairs divided by the number of repairs.
MTBI: Mean Time Between Interruptions
–The average number of hours between failures while a redundant component is down.

52 Defining HA
Reliability
–Quantified by failure rate (MTBF)
–Time to resume service after failure is MTTR
Availability (% uptime)
–Allows for failure, with quick service restoration
–As MTTR → 0, Availability → 100%
–Assume faults exist: design to contain, notify, recover and restore rapidly
5 Nines
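The limit stated above follows from the standard steady-state relation Availability = MTBF / (MTBF + MTTR). The sketch below evaluates it for a fixed MTBF and shrinking repair times. The MTBF value and the three repair times are hypothetical figures chosen for illustration, not data from the presentation.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTBF_HOURS = 1000.0   # hypothetical failure interval, chosen for illustration

for label, mttr_hours in [("4 h repair", 4.0),
                          ("5 min reboot", 5 / 60),
                          ("50 ms restart", 0.05 / 3600)]:
    print(f"{label:>14}: {availability(MTBF_HOURS, mttr_hours):.7%}")
```

The failure rate is the same in all three rows; only the repair time changes, which is why fast component-level recovery raises availability without making any component more reliable.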

53 Annual Cost of Downtime versus Availability
[Chart; $13,000/minute cross-industry average]
Costs speak for themselves
Source: Gartner Group
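The chart's figures did not survive in this transcript, but the arithmetic behind such a chart is simple to sketch. The block below converts an availability percentage into minutes of downtime per year and multiplies by the $13,000/minute average quoted on the slide. Treating cost as strictly proportional to downtime, and ignoring leap years, are simplifying assumptions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600, ignoring leap years
COST_PER_MINUTE = 13_000           # US dollars, the average quoted on the slide


def annual_downtime_minutes(availability_percent):
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def annual_downtime_cost(availability_percent):
    # Assumes cost scales linearly with minutes of downtime.
    return annual_downtime_minutes(availability_percent) * COST_PER_MINUTE


for percent in (99.0, 99.9, 99.99, 99.999):
    print(f"{percent}% available: "
          f"{annual_downtime_minutes(percent):9.2f} min/year, "
          f"${annual_downtime_cost(percent):,.0f}/year")
```

Five nines works out to a little over five minutes of downtime per year.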

54 Availability via Reliability and Repair
low MTTR -> high availability
–System is composed of reliable components that are protected from each other and that communicate ONLY through well-known interfaces.
this leads to
–fault isolation
–speedy recovery
–reset a component, not a board/system
–dynamic control: stop/start, upgrade

55 Software vs Hardware HA
Hardware HA
–utilizes redundancy of key components; a single fault cannot cause all redundant components to fail (No SPOF), e.g. mirrored disks, multiple system boards, I/O cards
–Active/active, active/spare, active/standby
Software is a Significant Cause of Downtime
But that's only part of the problem!!!

56 56 Comparison

57 57 High Level Look at a Core Router/Switch One or more control elements

58 58 Handling Failures Isolate Fault to a Board Switch to Backup

59 [Diagram labels: Route Manager, TCP/IP stack, SNMP Manager, Application, Flash, Drivers, Device Manager, Network Manager, RTOS, Hardware]
Isolate fault to a SW component
May not be in the Hardware

60 Ideal: Identify and Fix
[Diagram labels: Route Manager, TCP/IP stack, SNMP Manager, Application, Flash, Drivers, Device Manager, Network Manager, RTOS; Faulty Software Component]
Isolate and contain
Repair (e.g. restart)
Notify
Diagnose
Upgrade

61 Component-level recovery rarely done
–Lack of suitable protection and isolation
–Lack of modularity
–Tight component coupling
–Few dynamic capabilities
Software failures normally handled by:
–Hardware watchdogs
–Redundant boards

62 Repair Time
Board Replacement: Hours
Reboot: Minutes
Failover to Standby: Seconds
SW Component Restart: 10s of Milliseconds
SW Failover: Milliseconds

63 High Availability Manager
[Diagram labels: Microkernel; TCP/IP; HA Manager; FLASH FSYS; DISK FSYS; ATM]
Steps shown: Process Memory Violation; Kernel notifies HA Manager; Dump file for post-mortem analysis; HA Manager restarts service

64 Notification and Recovery
[Diagram labels: Driver, Stack, App; HAM, Guardian HAM; Checkpointed State]
HA Manager (HAM) monitors components, sends notification of component failure
Heart-beat services detect component hangs
Core file on crash can be created for debugging and analysis
Checkpointing permits recovering current state
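Heartbeat-based hang detection can be sketched in a few lines: each component reports in periodically, and anything silent for longer than a timeout is flagged. This is a generic illustration, not the QNX HAM API. The class and method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class HeartbeatMonitor:
    """Flags components whose last heartbeat is older than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = {}            # component name -> time of last heartbeat

    def beat(self, component, now):
        self.last_beat[component] = now

    def hung(self, now):
        """Return the sorted names of components considered hung at `now`."""
        return sorted(name for name, seen in self.last_beat.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(timeout=2.0)
monitor.beat("driver", now=0.0)
monitor.beat("stack", now=0.0)
monitor.beat("stack", now=2.5)         # the stack keeps reporting, the driver stops
print(monitor.hung(now=3.0))
```

A crash is reported by the kernel directly, so a heartbeat is only needed for the case the slide calls a hang, where the process is still alive but no longer making progress.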

65 Recovery
A second shadow server attaches to the same name

66 Recovery
A second shadow server attaches to the same name
If primary faults, new clients connect to shadow server
Old clients can re-connect to shadow server.

67 Recovery
Start a new shadow server

68 Service Upgrades
[Diagram labels: /dev/service; Server v 1.0, Client; Server v 1.1, New Client]
New version of server attaches to same name
New clients connect to new server
Old server exits when all old clients have exited
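The upgrade sequence above can be modelled with a name registry that always hands new clients the most recently attached server, while the previous server drains. This is a conceptual sketch only: `Server`, `NameRegistry`, and `disconnect` are hypothetical names, and real pathname registration in the RTOS is not shown.

```python
class Server:
    def __init__(self, version):
        self.version = version
        self.clients = 0
        self.retiring = False   # set once a newer server has taken over the name
        self.exited = False


class NameRegistry:
    """Maps a service name to the server that new clients should reach."""

    def __init__(self):
        self._current = {}

    def attach(self, name, server):
        old = self._current.get(name)
        if old is not None:
            old.retiring = True
            if old.clients == 0:
                old.exited = True          # nothing left to drain
        self._current[name] = server

    def connect(self, name):
        server = self._current[name]
        server.clients += 1
        return server


def disconnect(server):
    server.clients -= 1
    if server.retiring and server.clients == 0:
        server.exited = True               # old server exits with its last client


registry = NameRegistry()
v1_0 = Server("1.0")
registry.attach("/dev/service", v1_0)
old_client = registry.connect("/dev/service")
v1_1 = Server("1.1")
registry.attach("/dev/service", v1_1)      # upgrade while v1.0 still has a client
new_client = registry.connect("/dev/service")
disconnect(old_client)
print(new_client.version, v1_0.exited)
```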

69 69 QNX Momentics Tools

70 Design Goals
Tools needed to be easy to learn
Tools needed to take advantage of QNX
Tools needed to integrate tools from other vendors, company-designed tools, and industry-specific tools, and have them work with our tools and each other
Tools needed to be customizable to the user or the company

71 QNX® Momentics: The Best Tools and the Best RTOS
[Diagram labels: Windows, Solaris, QNX Neutrino; IDE Workbench (Eclipse framework); Source debugger; Java code developer; C/C++ code developer; Target information; System builder; Profiler; Photon app builder; Memory analysis; Target agent; Photon microGUI; Flash fsys; TCP/IP; Http server; Java; Ethernet, Serial, JTAG, ROMulator; Microkernel; Command-line tools; BSPs; DDKs; Neutrino runtime; 3rd-Party Tools; Virtio; Rational; …TBA; Invoke command-line tools; QNX® Neutrino® RTOS; XScale]

72 IDE: Standards based
IBM donated Framework
Java IDE
200 person-years of effort
Open Source Consortium
founding members include QNX

73 73 System Profiling

74 Systems Analysis Toolkit
[Diagram labels: Application; Protocol; TCP/IP; Device Driver; Instrumented MicroKernel; Trace; System Event Log; Printer; Data display; Statistical & Numerical Analysis]
System Events: interrupts, scheduler, messages, system calls
System Characterization
Performance analysis
Field diagnostic
Live or post-mortem

75 Providing Technology for Today… Architecture for Tomorrow
Irvine Office: David Weintraub - Regional Sales Manager
Woodland Hills Office: Jeff Schaffer - Sr. Field Applications Engineer
