Autosave Additions/Upgrades and Experiences at SLAC
Zheqiao Geng, Controls Department, SLAC National Accelerator Laboratory
Oct. 12, 2010

Presentation transcript:

Autosave Additions/Upgrades and Experiences at SLAC
Zheqiao Geng
Controls Department, SLAC National Accelerator Laboratory
EPICS Collaboration Meeting Fall 2010, Oct. 12, 2010

Outline
- Introduction to Autosave
- Problems reported at SLAC during operation
- Additions/upgrades of Autosave
- Conclusion

Introduction to Autosave

Autosave
“Autosave automatically saves the values of EPICS process variables (PVs) to files on a server, and restores those values when the IOC is rebooted.” (from the autosave manual)

Save and Restore Methods Supported
Data save methods:
- Periodic: save data at a defined period
- Triggered: save data when a CA event arrives from a defined PV
- Monitored: periodically check the PVs on the save list for changes, and save the data if anything has changed
- Manual: save data with shell commands
Data restore methods:
- Pass 0 restore: restore PV values at database initialization pass 0
- Pass 1 restore: restore PV values at database initialization pass 1
- Manual restore: restore data with shell commands
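As an illustration of the "monitored" save method, the following is a minimal Python sketch of one periodic check. It is a model of the idea only, not autosave's C implementation; the function name `monitored_save_pass` and the `read_pv` and `write_file` callables are assumptions made for this example.

```python
from typing import Callable, Dict, List


def monitored_save_pass(read_pv: Callable[[str], float],
                        save_list: List[str],
                        last_saved: Dict[str, float],
                        write_file: Callable[[Dict[str, float]], None]) -> bool:
    """Run one periodic check; write the save file only if some PV changed.

    read_pv and write_file are hypothetical stand-ins for reading a PV
    and writing the .sav file.
    """
    current = {name: read_pv(name) for name in save_list}
    if current == last_saved:
        return False              # nothing changed, skip the write
    write_file(current)
    last_saved.clear()            # remember what was written for the next check
    last_saved.update(current)
    return True
```

A periodic save would call `write_file` on every pass regardless of changes; the monitored variant trades a comparison for fewer file writes.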

Context of Autosave

Threads Relevant to Autosave
- Run-time data save thread
- IOC shell
- Callback threads

Operation Experiences (problems reported) at SLAC

IOCs Using Autosave at SLAC
- Soft IOCs running on Red Hat Linux
- Hard IOCs running RTEMS on MVME6100
- Embedded IOCs running RTEMS on ColdFire uC5282

Reported Problems during Operation
- P1: Fails to write the .sav file in some cases, such as after an NFS server reboot
- P2: Stops attempting to write the .sav file for unknown reasons
- P3: The .sav file is updated, but the new PV data is not written into it
- P4: Bad file descriptors on soft IOCs
- P5: [RTEMS] Fails to flush the saved data to the NFS disk
- P6: The status string sometimes does not correctly reflect the problem (example: after an NFS reboot, the status string still shows “Can’t open save file”)
- P7: The buffer used to auto-generate the save list from the record's info field is too small

Additions/Upgrades of Autosave

Automatic NFS Remounting for RTEMS
For P1 (fails to write the .sav file in some cases, such as after an NFS server reboot)
- If there are too many file saving failures (such as failing to open, read or write the file), remount the NFS
- The file status is also checked after writing to the disk
- Operating-system-dependent code for NFS mounting is implemented for the different OSs (vxWorks, RTEMS and Linux)
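The remount-on-repeated-failure idea can be sketched in Python as follows. This is a hedged model, not the autosave code: the class name `SaveFileWriter`, the `remount` callable (standing in for the OS-dependent NFS mount code) and the `max_failures` threshold are assumptions for illustration.

```python
import os
from typing import Callable


class SaveFileWriter:
    """Count consecutive save failures and remount after too many of them."""

    def __init__(self, remount: Callable[[], None], max_failures: int = 3):
        self.remount = remount            # hypothetical OS-dependent remount hook
        self.max_failures = max_failures  # assumed threshold, not from the slides
        self.failures = 0

    def write(self, path: str, text: str) -> bool:
        try:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(text)
                f.flush()
                os.fsync(f.fileno())      # push the data out of local buffers
            # Check the file status after writing, as the slide describes.
            if os.stat(path).st_size != len(text.encode("utf-8")):
                raise OSError("file size mismatch after write")
        except OSError:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.remount()
                self.failures = 0
            return False
        self.failures = 0                 # a success resets the failure count
        return True
```

Counting only consecutive failures keeps a single transient error from triggering a remount.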

Timeout Checking for Callback Functions
For P2 (stops attempting to write the .sav file for unknown reasons)
- Reviewing the existing code, we found that the way data saving is activated for PERIODIC and MONITORED saves is potentially risky:
  - Callback routines are used to activate the saving
  - A callback is requested again ONLY after the saving has been activated, in order to introduce some delay
- So if a callback fails even ONCE, the data saving is not activated, the callback is never requested again, and data saving never happens again
- Fix: timeout checking is added for the callback function; on timeout, data saving is forcibly activated to restart the saving loop
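The failure mode and the timeout fix can be modelled with a short Python sketch. It is an illustration under assumptions, not the autosave source: `SaveLoop`, `request_callback` (a stand-in for scheduling a delayed callback) and `do_save` are hypothetical names.

```python
import threading
from typing import Callable


class SaveLoop:
    """Save loop that is normally woken by a callback, with a timeout fallback."""

    def __init__(self, period: float, timeout: float,
                 request_callback: Callable[[float, Callable[[], None]], None],
                 do_save: Callable[[], None]):
        self.period = period
        self.timeout = timeout                    # longer than period in practice
        self.request_callback = request_callback  # hypothetical delayed-callback API
        self.do_save = do_save
        self.wake_event = threading.Event()
        self.forced = 0                           # number of saves forced by timeout

    def wake(self) -> None:
        """Callback routine: activate the next save."""
        self.wake_event.set()

    def run_once(self) -> None:
        fired = self.wake_event.wait(self.timeout)
        if not fired:
            self.forced += 1          # callback was lost; force the save anyway
        self.wake_event.clear()
        self.do_save()
        # The callback is only re-requested here, after a save. Without the
        # timeout above, one lost callback would stall the loop forever.
        self.request_callback(self.period, self.wake)
```

With `wait()` called without a timeout, a single dropped callback would block `run_once` permanently, which matches the behaviour described for P2.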

Retrying of CA Connection
- In the existing code, PVs are connected only at the start of the program
- Retrying of unconnected PVs has been added
- Temporarily unreachable PVs are included in the save list without rebooting the IOC
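A minimal Python sketch of the retry step, assuming a hypothetical `try_connect` callable in place of the real Channel Access connection call; the function would be invoked periodically from the save task.

```python
from typing import Callable, Dict, List


def retry_unconnected(pvs: Dict[str, bool],
                      try_connect: Callable[[str], bool]) -> List[str]:
    """Retry every PV still marked unconnected; return the newly connected names.

    pvs maps PV name -> connected flag. try_connect is a hypothetical
    stand-in for a Channel Access connection attempt.
    """
    newly_connected = []
    for name, connected in pvs.items():
        if not connected and try_connect(name):
            pvs[name] = True          # value update only; keys are unchanged
            newly_connected.append(name)
    return newly_connected
```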

Cleaning Up the Status String
For P6 (the status string does not correctly reflect the problem; example: after an NFS reboot, the status string still shows “Can’t open save file”)
- In the existing code, the status string always shows the most serious failure among the different parts of the program
- The status string “Can’t open save file” is generated with the highest severity level if an error occurs during reboot restore, and it then masks the status reports of all other parts at run time
- Solution: keep this status string only as a separate reboot_status
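The masking effect and the fix can be illustrated with a small Python sketch. The severity levels, the function names and the dictionary layout are assumptions made for this example; only the `reboot_status` name and the “Can’t open save file” string come from the slides.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Sev(IntEnum):
    # Hypothetical severity levels, higher means more serious.
    OK = 0
    WARNING = 1
    FAILURE = 2


Status = Tuple[Sev, str]


def combined_status(reboot: Status, runtime: List[Status]) -> Status:
    """Old behaviour: report the single most severe status of all parts."""
    return max([reboot] + runtime, key=lambda s: s[0])


def separate_status(reboot: Status, runtime: List[Status]) -> Dict[str, Status]:
    """New behaviour: the reboot-restore result is kept on its own."""
    return {
        "reboot_status": reboot,
        "status": max(runtime, key=lambda s: s[0]),
    }
```

In the combined form, a reboot-restore failure stays at the top for the rest of the run, so later run-time problems (or recovery) never become visible.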

Increased Buffer Size for makeAutosaveFileFromDbInfo()
For P7 (the buffer used to auto-generate the save list from the record's info field is too small)
- In some soft IOCs, tens of fields need to be saved for one record, so the buffer for the record's info field must be large enough to hold all of these field names
- The buffer size is increased from 100 to 2048 bytes
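To show why a 100-byte buffer is too small, here is a Python model of copying a space-separated field list into a fixed-size C string buffer. How autosave actually handles an over-long info value is not stated in the slides; the truncation behaviour, the function name `fields_from_info` and the field names below are assumptions.

```python
from typing import List


def fields_from_info(info_value: str, buffer_size: int) -> List[str]:
    """Model a fixed-size C string copy, then split into field names.

    One byte of the buffer is reserved for the terminating NUL, so at most
    buffer_size - 1 bytes of the info value survive (assumed behaviour).
    """
    buf = info_value.encode("ascii")[: buffer_size - 1]
    return buf.decode("ascii").split()


# 30 hypothetical field names, 179 bytes in total once joined with spaces.
info = " ".join(f"FLD{i:02d}" for i in range(30))
small = fields_from_info(info, 100)    # old buffer size
large = fields_from_info(info, 2048)   # new buffer size
```

In this model the 100-byte buffer keeps 16 complete names plus a cut-off `FLD`, while the 2048-byte buffer keeps all 30.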

Other Problems
- P3 (the .sav file is updated, but the new PV data is not written into it)
- P4 (bad file descriptors)
- P5 ([RTEMS] fails to flush the saved data to the NFS disk)
These problems have not appeared again so far.
Concrete examples and further investigation are still needed to solve them.

Conclusion

Conclusion and Outlook
- The modified version of Autosave has been running on several IOCs at LCLS for about one month, and the previously reported problems have not reappeared so far
- The additions/upgrades work smoothly
- More operational experience is needed to improve and finalize the design
- Documents for the requirements and architecture of Autosave have also been prepared (including reverse engineering from the source code)
- We will submit both the modified source package and the documents to the original author (Tim Mooney) and the collaboration for review
- We hope to hear about the experiences of other labs, so that together we can make Autosave more robust on various platforms

Thank you for your attention!