GLAST Large Area Telescope Instrument Flight Software Flight Unit Design Review 16 September 2004 Software Watchdog Steve Mazzoni Stanford Linear Accelerator Center

Presentation transcript:

GLAST Large Area Telescope
Instrument Flight Software
Flight Unit Design Review, 16 September 2004
Software Watchdog
Steve Mazzoni
Stanford Linear Accelerator Center
Gamma-ray Large Area Space Telescope

16 September 2004 Flight Unit Peer Review - Software Watchdog
Software Watchdog: Requirements
Flight Software General Requirements:
– Watchdog ( ): Once booting of a unit is complete, the FSW shall provide a heartbeat to a hardware watchdog that reboots the unit if the heartbeat is not received within 10 seconds.

Software Watchdog: Functional Components
Functional Inputs
– The Watchdog registers callbacks, through which other system tasks report their status/progress
– The Watchdog can accept telecommands to set operational parameters or instruct the Watchdog to force a timed software or hardware reboot of a CPU
Functional Processing
– The Watchdog checks progress indicators reported by other system tasks, and decides whether to (1) initiate a software reboot or (2) update the hardware watchdog to prevent a hardware reboot
Functional Outputs
– At regular intervals, the Watchdog function writes a value into the hardware watchdog timer present on the CPU to prevent the hardware watchdog from resetting the CPU
– The Watchdog function sends telemetry reporting the Watchdog’s current operation mode, number of tasks checked, and a timestamp for the last time a watchdog cycle was run
– For debugging purposes, the Watchdog function records the identity of tasks failing their progress check and the identity of the last task checked prior to reboot or reset

Design Overview
The software Watchdog is a very simple task, run at high priority.
– It wakes periodically and resets the hardware watchdog register to a value greater than the software watchdog wake-up period
– The task itself knows nothing about other tasks in the system. The software watchdog simply provides a uniform facility for tasks to monitor themselves.
Any task can register a callback routine (and a pointer) with the software Watchdog.
– When it wakes up, the Watchdog simply runs down the list of registered routines and calls them back in turn
– The return code from the callback is examined by the Watchdog to determine whether the task targeted by the callback is making progress
  – If any callback indicates that the task is not making progress, the software watchdog will initiate a software reboot once all callbacks have been called
  – If all tasks report progress, the software watchdog will reset the hardware watchdog
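
The callback cycle described on this slide can be modeled in a few lines. What follows is a Python sketch of the logic only, not the flight code, which the slides do not show. The names `SoftwareWatchdog`, `register`, `run_cycle`, `kick_hardware`, and `reboot` are hypothetical, and the callback's return code is simplified to a boolean.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical callback type: returns True if the monitored task is making progress.
ProgressCallback = Callable[[Any], bool]


class SoftwareWatchdog:
    """Toy model of the callback cycle; all names here are illustrative."""

    def __init__(self, kick_hardware: Callable[[], None],
                 reboot: Callable[[], None]) -> None:
        self._callbacks: List[Tuple[str, ProgressCallback, Any]] = []
        self._kick_hardware = kick_hardware  # stands in for writing the hardware watchdog register
        self._reboot = reboot                # stands in for initiating a software reboot

    def register(self, name: str, callback: ProgressCallback,
                 context: Any = None) -> None:
        """Register a callback routine plus the pointer-like context handed back to it."""
        self._callbacks.append((name, callback, context))

    def run_cycle(self) -> List[str]:
        """Call every callback in turn, then either reboot or reset the hardware watchdog."""
        # The comprehension visits every entry, so all callbacks run
        # even when an early one reports no progress.
        failed = [name for name, cb, ctx in self._callbacks if not cb(ctx)]
        if failed:
            self._reboot()
        else:
            self._kick_hardware()
        return failed
```

The watchdog holds only opaque callbacks and contexts, which mirrors the slide's point that the task itself knows nothing about the tasks it monitors.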

FSW Architecture (Watchdog-specific)
[Block diagram. Labels shown: Spacecraft; 1553; LAT Instrument; Spacecraft Interface Unit; Event Processing Unit(s); 1553 Rx service; 1553 Tx service; LCB Rx service; LCB Tx service; Watchdog; Masters; Slaves; queues (Q); Event Builder (EB) input side and output side, and Command/Response Unit (CRU), each an element of the GASU; Event Assembly; Solid State Recorder; connections marked To EPU(s), To SSR, To SIU, From SIU, From EPU(s).]
[Legend: Telecommand (SC to LAT); Telemetry (LAT to SC); Master to slave; Slave to master; Physics data from LAT; Data to SSR; Command/Response; Discretes (to SIU PIDs).]

Software Watchdog: Operational Modes
The Watchdog supports the following modes. Transitions (caused by telecommand) are shown below.
– Active: “Standard” mode, in which task progress is checked and
– Passive: Check task progress and always reset the WDT
– Timed Software Reset: Load time at which software reboot should be performed
– Timed Hardware Reset: Load specified period into hardware watchdog and never reset
[Transition diagram. States shown: Active, Hardware Reset, Software Reset, Passive.]
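
The four modes can be read as a per-cycle decision. The sketch below is a Python model under stated assumptions, not the flight implementation. `Mode`, `Action`, and `decide` are hypothetical names. The Active behaviour is taken from the Masters and Slaves slide (refresh only if all tasks report progress, otherwise reboot). Treating Timed Software Reset like Active until its deadline is an assumption, since the slides do not say what happens before the loaded time.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    TIMED_SOFTWARE_RESET = "timed_software_reset"
    TIMED_HARDWARE_RESET = "timed_hardware_reset"


class Action(Enum):
    REFRESH_HARDWARE_WATCHDOG = "refresh"
    SOFTWARE_REBOOT = "software_reboot"
    LET_HARDWARE_EXPIRE = "let_hardware_expire"  # never reset the hardware watchdog


def decide(mode: Mode, all_tasks_progressing: bool,
           now_ms: int = 0, reset_at_ms: int = 0) -> Action:
    """Return what one watchdog cycle should do in the given mode."""
    if mode is Mode.PASSIVE:
        # Progress is still checked, but the hardware watchdog is always reset.
        return Action.REFRESH_HARDWARE_WATCHDOG
    if mode is Mode.TIMED_HARDWARE_RESET:
        # The period was loaded when the mode was entered; do nothing further.
        return Action.LET_HARDWARE_EXPIRE
    if mode is Mode.TIMED_SOFTWARE_RESET and now_ms >= reset_at_ms:
        # Reboot at the loaded time even if every task reports progress.
        return Action.SOFTWARE_REBOOT
    # Active mode, or (assumption) a timed software reset before its deadline.
    if all_tasks_progressing:
        return Action.REFRESH_HARDWARE_WATCHDOG
    return Action.SOFTWARE_REBOOT
```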

Software Watchdog: Masters and Slaves
The Watchdog tasks collect and evaluate information from other tasks. If the information indicates the presence of a problem, they call a "bug check" mechanism. In addition, they issue the CPU's "heartbeat". If the hardware watchdog does not receive this on a timely basis, it will initiate a hard reset of the CPU's crate.
The master task:
– Receives software watchdog change-of-state requests and redistributes them to each targeted slave.
Each slave task:
– Implements the software watchdog function for its CPU (e.g., interrogating registered tasks for progress).
– In passive mode, the slave always refreshes the hardware watchdog.
– In active mode, the slave only refreshes the hardware watchdog if all subsystems report progress; otherwise, it performs a "warm boot".

Diagnosing the Watchdog In Flight
LSW will store information that persists across software reboots or hardware resets and that can assist in determining the task responsible for the reboot. The information may be accessed by a memory dump. The persistent storage items are:
– Circular (FIFO) buffer containing the names of 5 tasks that were not making progress during the last check cycle, i.e. the buffer is cleared at the start of each progress check cycle.
– Name of last task checked. This name is stored immediately preceding the call of each task callback. If a task “hangs”, its name will be saved in persistent storage, which can be downloaded when the system is started up again.
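
The two persistent items can be illustrated with a small Python model. This is a sketch under assumptions rather than the LSW code. `PersistentDiagnostics` and `run_progress_check` are hypothetical names, an ordinary object stands in for storage that survives a reboot, and overwriting the oldest name when more than 5 tasks fail is an assumption about how the circular buffer behaves.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional

MAX_FAILED_NAMES = 5  # the slide says the buffer holds the names of 5 tasks


class PersistentDiagnostics:
    """Stand-in for reboot-persistent storage (here just an ordinary object)."""

    def __init__(self) -> None:
        # A bounded deque drops its oldest entry when full, like a circular buffer.
        self.failed_tasks: Deque[str] = deque(maxlen=MAX_FAILED_NAMES)
        self.last_task_checked: Optional[str] = None


def run_progress_check(callbacks: Dict[str, Callable[[], bool]],
                       diag: PersistentDiagnostics) -> bool:
    """Run one check cycle and return True if every task reported progress."""
    diag.failed_tasks.clear()  # cleared at the start of each progress check cycle
    all_ok = True
    for name, callback in callbacks.items():
        # Stored before the call, so a callback that hangs leaves its own name behind.
        diag.last_task_checked = name
        if not callback():
            diag.failed_tasks.append(name)
            all_ok = False
    return all_ok
```

Writing the name before the call, rather than after, is what lets a later memory dump identify a task whose callback never returned.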

Watchdog Telecommands
Telecommands
– Though commands and telemetry definitions have not yet been coded using the LCAT tool, the following general types of telecommands will be provided for use by developers and test personnel on the ground; they are not expected to be used in flight.
  – Set software watchdog to active mode: This is the bread and butter case. If any task reports a lack of progress, initiate a software reboot.
  – Set software watchdog to passive mode: This one is for developers. When debugging a task, it might appear to the software watchdog that the task is not making forward progress. In this mode, the software watchdog will perform all its normal functions, but will always reset the hardware watchdog.
  – Set software watchdog to software reset in xxx milliseconds: This is for testers. The software watchdog will follow the software reboot logic after the requested interval, even if all tasks are reporting forward progress.

Watchdog Telecommands (cont’d)
  – Set software watchdog to hardware reset in xxx milliseconds: This is another one for testers. The software watchdog will set up the hardware watchdog to expire after the requested interval and then do nothing to reset it.
  – Set software watchdog refresh period to xxx milliseconds and load hardware watchdog with period yyy milliseconds (yyy > xxx): This is another way for developers to put off any software/hardware resets into the far future. It also means that there’s no rush to figure out what the best periods for these are.
– Diagnostics/debugging data in persistent storage retrieved using memory dump telecommands (not part of the LSW package)
– Telecommands are forwarded by the SIU Watchdog master task to all Watchdog slave tasks on all CPUs.
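
The refresh-period telecommand carries one explicit constraint, yyy > xxx: the hardware watchdog period must exceed the software refresh period, or the hardware would expire between refreshes. The Python sketch below shows how a handler might enforce it. `WatchdogPeriods` and `set_periods` are hypothetical names, and rejecting non-positive periods is an added assumption that the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchdogPeriods:
    refresh_ms: int   # "xxx": how often the software watchdog wakes up
    hardware_ms: int  # "yyy": period loaded into the hardware watchdog


def set_periods(refresh_ms: int, hardware_ms: int) -> WatchdogPeriods:
    """Validate and return a new period pair, enforcing yyy > xxx."""
    if refresh_ms <= 0:
        # Assumption: a zero or negative refresh period is never meaningful.
        raise ValueError("refresh period must be positive")
    if hardware_ms <= refresh_ms:
        raise ValueError("hardware period must exceed the refresh period")
    return WatchdogPeriods(refresh_ms=refresh_ms, hardware_ms=hardware_ms)
```

For example, `set_periods(1000, 5000)` is accepted, while `set_periods(2000, 1000)` raises `ValueError`.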

Watchdog Telemetry
Telemetry
– Provided by LSW to signify normal operation. The persistent storage structure is used for debugging after a reboot or reset. Telemetry from LSW includes:
  – Current mode.
  – Number of tasks checked.
  – A timestamp for the last time a watchdog cycle was run.

LSW Software Package Organization and Configuration Management
The LSW Package contains the following components:
– Shareables
  – liblsw: provides the watchdog functionality on both SIUs and EPUs
LSW directly uses the following packages/constituents:
– PBS
– MSG