Software Configuration Management Lessons Learned Patrick Bong Safety Systems Group Stanford Linear Accelerator Center.

1 Software Configuration Management Lessons Learned
Patrick Bong
Safety Systems Group
Stanford Linear Accelerator Center

2 Background
August 23, 2007
  – There was a failure of a Programmable Logic Controller (PLC)
  – The PLC was repaired using the incorrect version of the software
  – The failure revealed systemic shortcomings within the safety systems group

3 Development
R&D effort
  – Two engineers
  – Ten architectures
  – Three working demonstrations
  – Numerous software revisions
Design effort
  – Two engineers
  – Three programmable systems
  – Two versions of software
      Test lab version (development)
      Field version (reviewed and approved)

4 Approved Architecture

5 Proposed Policy
Software repository
  – Single location to retrieve approved software (Concurrent Versions System, CVS)
Testing
  – Verification of the approved software version
  – Black box certification
Repair policy
  – A failed CPU requires system certification
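
The testing and repair rules above amount to a gate that a repaired system must pass before it returns to service. The sketch below is a minimal Python version of that gate. It assumes the approved program file checked out of CVS has a recorded SHA-256 digest and that the program loaded into the PLC can be read back as a file. The digest comparison and all function names are illustrative assumptions. They are not the procedure SLAC adopted.

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Hex SHA-256 digest of a program image held in memory."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: Path) -> str:
    """Hex SHA-256 digest of a program file, e.g. one checked out of CVS."""
    return sha256_of_bytes(path.read_bytes())


def may_return_to_service(loaded_digest: str,
                          approved_digest: str,
                          cpu_was_replaced: bool,
                          certified_after_repair: bool) -> bool:
    """Hypothetical gate combining the two policy rules:
    1. the loaded program must be byte-identical to the approved version;
    2. a failed (replaced) CPU requires a new system certification.
    """
    if loaded_digest.lower() != approved_digest.lower():
        return False  # wrong software version: never return to service
    if cpu_was_replaced and not certified_after_repair:
        return False  # repair policy: certification is still outstanding
    return True
```

A digest match shows only that the loaded program is byte-for-byte the approved version. That is the check that would have caught the repair described in the Background slide. It is not a substitute for black box certification of the system's behavior.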

6 Development Software Locations Not in CVS
A:\ Floppy disks
C:\ Desktop drive
D:\ CD burner
F:\ Flash drives
V:\ Group drive
Z:\ Employee drive
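
Finding stray copies like these is mostly bookkeeping: hash every candidate program file and compare it with the releases held in CVS. The sketch below makes three assumptions. The drives are mounted as directories. Program files can be recognized by extension, and the suffix set is a hypothetical parameter. The approved digests have already been collected from CVS.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List, Set


def inventory(roots: Iterable[Path], suffixes: Set[str]) -> Dict[Path, str]:
    """Map every file under the given roots whose lowercase suffix
    (including the dot, e.g. ".acd") is in `suffixes` to its SHA-256 digest."""
    found: Dict[Path, str] = {}
    for root in roots:
        if not root.is_dir():
            continue  # e.g. no floppy disk or flash drive mounted
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in suffixes:
                found[path] = hashlib.sha256(path.read_bytes()).hexdigest()
    return found


def unapproved_copies(found: Dict[Path, str], approved: Set[str]) -> List[Path]:
    """Paths whose contents do not match any approved release in CVS."""
    return sorted(p for p, digest in found.items() if digest not in approved)
```

For example, `unapproved_copies(inventory([Path("V:/"), Path("Z:/")], {".acd"}), approved)` would list every matching file on the group and employee drives that is not byte-identical to an approved release. The `.acd` suffix is only an example.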

7 Vendor Notification
Allen-Bradley issued an engineering note
  – We determined that the application software was not affected by the problem
  – We determined that we did not wish to upgrade the operating system without an appropriate amount of testing
  – Our test and development systems did not exhibit any problems
The engineering note did not specify that the operating system was affected by the problem
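
A vendor notice can be dismissed because the application code is unaffected. One guard against that is to record, for each notice, which firmware (operating system) revisions it covers and to check every installed controller against that range. This minimal sketch assumes revisions are written as "major.minor" and that a notice can be reduced to a first-affected and a first-fixed revision. Both fields are hypothetical and are not taken from the Allen-Bradley note.

```python
from typing import Tuple

Revision = Tuple[int, int]  # (major, minor) firmware revision


def parse_revision(text: str) -> Revision:
    """Parse a 'major.minor' firmware revision string into integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def advisory_applies(installed: str, first_affected: str, first_fixed: str) -> bool:
    """True if the installed revision lies in [first_affected, first_fixed)."""
    rev = parse_revision(installed)
    return parse_revision(first_affected) <= rev < parse_revision(first_fixed)
```

Comparing integer tuples rather than strings matters here. As strings, "16.10" sorts before "16.2".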

8 System in Operation
The Laser Safety System is certified
The Personnel Protection System is certified
  – The safety system operates as expected
  – The software is verified to be the correct version
  – The system is power cycled to verify the power-up cycle
      The internal registers of the CPU are reset
The battery system installation is completed
  – The temporary power is removed
  – The battery power is turned on
The safety systems are power cycled
Four months pass with no problems
  – The system manager/engineer goes on vacation
  – The system fails while the manager is on vacation

9 Day One
Two electrically isolated systems fail safe
Staff: one technician
  – Technical depth
      Wiring
      Hardware
      Software (safety vs. process control)
      Debugging tools
It is not possible to restore the system
  – The on-call technician has insufficient ability

10 Safety System Architecture

11 Day Two
There were two simultaneous core dumps
Staff: eight
  – Authority to approve a repair
  – Technical ability to troubleshoot the system
The recovered system reveals problems
  – The status reported on EPICS was not all correct
One version of approved software in CVS
Development software in numerous locations
Troubleshooting by black box methods

12 Day Three
The recovered system requires certification
Staff: one technician, one engineer
  – The manager has returned
The certification document was not consistent with the previously executed version
  – A hardcopy was not available, so a document was printed from Microsoft Word
  – Track Changes in Microsoft Word created two versions
  – Version control

13 Shortcomings
Failure to react to vendor notification
  – Not a root cause
      Would have delayed but not prevented an incident
Personnel resources
  – Training
      No qualified backup engineer with adequate knowledge of the system
  – Authorization
      No authorized backup engineer or technician
Document control
  – Controlled copies of procedures
      A controlled hardcopy of the certification procedure was unavailable
  – Written policies
      Lack of a robust and reliable procedure for retrieval of software
  – Released documentation
      Lack of a clear procedure for retrieval of current documentation

14 Root Causes
Insufficient resources within the safety group
Lack of skill sets
  – Need for another highly skilled safety system expert
Inadequate peer review
  – Design
  – Software
  – Documentation
  – Procedures
Insufficient document control process
  – The Controls Department needed to define a document control process
Lack of external verification and formal tracking of actions required by the RSC
  – This should be done by a Controls Department member with authority to allocate resources

15 Corrective Actions
Hire more staff
Develop safety systems documentation
Training and cross training
Formal peer reviews
Formal tracking and verification
Department directives defining authority

16 Hire More Staff
Staffing plan
  – Address the immediate need for another senior safety system engineer by arranging a sabbatical visit by a senior engineer from another laboratory
      A program is being put in place to foster cooperation with other labs
  – Filled employee requisitions for key missing positions, including:
      A safety system manager
      A senior safety engineer
      A documentation specialist
      An associate engineer

17 Documentation
Created a web-accessible site on the SLAC intranet for all safety systems group documentation
Created an accurate and complete document hierarchy and document catalog
Created a document describing the documentation process
Created procedures describing how to download from CVS and upload to A/B and Pilz PLCs
  – Deployed the procedures on the SLAC intranet
Created a Roles and Responsibilities document for the PPS group that has been approved by Controls Department line management
  – The document provides
Completed PPS Group training in the new documentation system and PLC code management system
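
The document catalog lets someone on call tell a current copy from a stale one, which was the problem that surfaced on Day Three. This is a minimal sketch of that lookup. It assumes each controlled document has an identifier and a single released revision. The identifiers, field names, and revision letters are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CatalogEntry:
    doc_id: str        # hypothetical identifier, e.g. "PPS-CERT-001"
    title: str
    released_rev: str  # the one revision approved for use


def check_copy(catalog: Dict[str, CatalogEntry], doc_id: str, copy_rev: str) -> str:
    """Classify a copy (printed or electronic) against the catalog."""
    entry = catalog.get(doc_id)
    if entry is None:
        return "uncontrolled: not in catalog"
    if copy_rev != entry.released_rev:
        return f"superseded: copy is rev {copy_rev}, released rev is {entry.released_rev}"
    return "current"
```

A copy printed from a working file that still carries tracked changes would either have no catalog identifier or carry the wrong revision, and this check flags both cases.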

20 More Documentation
Develop software configuration management procedures and tools
  – Defined a configuration management process
  – Documented the process, including the review-and-approval process following major/minor changes
  – Developed a procedure for retrieving and deploying the current software version from CVS
      A tool for extracting the coded version number from an Allen-Bradley PLC was developed
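
The major/minor review rule and the version check can be sketched together. This sketch assumes version numbers are coded as "major.minor". The source does not describe how the SLAC tool reads the version out of the PLC, so the function simply takes the reported string as input. The review descriptions are hypothetical placeholders, not the approved process.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse a 'major.minor' version string into integers."""
        major, minor = text.strip().split(".")
        return cls(int(major), int(minor))


def required_review(old: Version, new: Version) -> str:
    """Hypothetical rule: which review a proposed release needs."""
    if new <= old:
        raise ValueError("a new release must have a higher version number")
    if new.major > old.major:
        return "major change: full review and approval"
    return "minor change: peer review and approval"


def deployed_matches_release(plc_reported: str, cvs_released: str) -> bool:
    """Compare the version coded into the PLC program with the CVS release."""
    return Version.parse(plc_reported) == Version.parse(cvs_released)
```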

22 Training and Cross Training
Training of PPS Group members in the new documentation and software procedures
  – Work has started on cross training within the PPS Group and on more formalized job assignments
  – This is the beginning of a formal training program
  – For the short term, training in the new documentation and software management procedures has been completed

23 Reviews, Tracking and Verification
Reviews
  – Establishment of a formal life cycle and process for PPS reviews
  – Internal to the PPS group
      Peer review of the design, software, and documentation
  – Internal to the Controls Department
      Tracking and external verification of the requirements mandated by the RSC
      Formal statement of policies on approval authorities, PLC operating system software upgrades, etc.
  – External
      An annual review of safety systems practices at SLAC by experts from other laboratories
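
Tracking and external verification can be reduced to a small rule set: an action is closed only when it is completed and verified by someone other than its owner. This minimal sketch assumes that independence rule, which is one reading of "external verification" rather than a stated SLAC requirement. The field names and IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionItem:
    item_id: str                      # hypothetical ID, e.g. "RSC-1"
    description: str
    owner: str
    completed: bool = False
    verified_by: Optional[str] = None


def verify(item: ActionItem, verifier: str) -> None:
    """Record independent verification of a completed action."""
    if not item.completed:
        raise ValueError(f"{item.item_id}: cannot verify an open action")
    if verifier == item.owner:
        raise ValueError(f"{item.item_id}: owner cannot verify own action")
    item.verified_by = verifier


def open_items(items: List[ActionItem]) -> List[str]:
    """IDs of actions that are not yet both completed and verified."""
    return [i.item_id for i in items if not (i.completed and i.verified_by)]
```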

24 Department Directives
Organization
  – Reorganize the Safety Section in the Controls Department to establish an independent team focused on operational issues, including procedures, documentation, and liaison with Operations and the RSC
  – The new safety section will consist of three branches:
      Engineering
      Operations: procedures, compliance, liaisons with OPS, RSC, etc.
      Maintenance

25 Software Configuration Management
Policy
  – A written policy must exist
Execution
  – Staff must be aware of the policy
  – Staff must be trained to apply the policy
Quality control
  – Formal tracking and verification need to be in place to ensure that the policy is being followed

26 Q&A
State your:
  – Name
  – Organization
  – Title
  – Question

