EE 585 : FAULT TOLERANT COMPUTING SYSTEMS B.RAM MOHAN


1 THERAC 25 RAM MOHAN EE 585 : CASE STUDY

2 Background
The Therac-25 accidents are among the most serious computer-related accidents to date. The Therac-25 was a medical linear accelerator (linac) developed by Atomic Energy of Canada Limited (AECL): a radiation therapy machine used to destroy tumors with high-energy beams. Eleven Therac-25s were installed, five in the US and six in Canada. For shallow tissue penetration, the electron beam was used directly; to reach deeper tissue, the beam was converted into X-rays.

3 Background (contd.)
The Therac-25 was derived from its predecessors, the Therac-6 and the Therac-20. Differences from the Therac-20:
- Uses a double-pass technique, absent in the previous versions
- Software is responsible for safety
- Hardware safety interlocks removed
- More compact and more economical
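The second and third points above made the control software the only safety mechanism. A minimal Python sketch of the alternative, a two-channel (2-out-of-2) beam permit in which an independent hardware interlock must agree with the software check, follows. The class and field names are hypothetical; this illustrates the general fault-tolerance pattern, not AECL's design.

```python
from dataclasses import dataclass


@dataclass
class BeamPermit:
    """Hypothetical two-channel beam permit (not AECL's actual design)."""
    software_ok: bool        # verdict of the control software's own checks
    hardware_ok: bool        # verdict of an independent hardware interlock
    fault_latched: bool = False

    def beam_permitted(self) -> bool:
        # If the channels disagree, one of them is wrong: latch a fault
        # rather than guessing which channel to trust.
        if self.software_ok != self.hardware_ok:
            self.fault_latched = True
        # 2-out-of-2 voting: both channels must say "safe", and a latched
        # fault keeps the beam off until it is explicitly cleared.
        return self.software_ok and self.hardware_ok and not self.fault_latched

    def clear_fault(self) -> None:
        # A deliberate, separate action (e.g. after a service check).
        self.fault_latched = False
```

With this structure, a software error alone (`software_ok` wrongly `True`) cannot enable the beam, which is the role hardware interlocks played in earlier designs.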

4 Modes Of Operation

5 Set Up Of The Machine

6 General Layout

7 [Figure: Therac-25 turntable, labeled with the field light mirror, counterweight, beam flattener (X-ray mode), and scan magnet (electron mode)]

8 Accidents
- 3 June 1985: patient at Marietta, GA, received an overdose
- 26 July 1985: patient at Hamilton, Ontario, severely burned; died November 1985
- December 1985: patient at Yakima, WA, received an overdose
- 21 March 1986: first Tyler, TX, accident
- 11 April 1986: second Tyler, TX, accident
- 17 January 1987: second Yakima, WA, accident

9 Responses
- 3 June 1985, Marietta, GA: not recognized as an overdose until after the Tyler incident.
- 26 July 1985, Hamilton, Ontario: overdose; the console showed "no dose" indications, so an overdose was not suspected until the patient returned. A microswitch malfunction was suspected and fixed.
- December 1985, Yakima, WA: not attributed to an overdose until the second Yakima incident.
- 21 March 1986, Tyler, TX: "Malfunction 54"; the operator overrode it, and the event was blamed on an "electrical surge".
- 11 April 1986, Tyler, TX: thought to be an editing error; the up-arrow key was disabled.
- 17 January 1987, Yakima, WA: all systems shut down for a complete investigation and rework.
Manufacturer, government, and user response: On February 3, 1987, after interaction with the FDA and others, including the user group, AECL announced to its customers a new software release to correct both the Tyler and Yakima software problems, a hardware single-pulse shutdown circuit, a turntable potentiometer to independently monitor turntable position, and a hardware turntable interlock circuit.

10 Why?
- The turntable was in the wrong position.
- Patients received the high-current beam intended for X-ray mode without the target and beam flattener in its path.
- No hardware safety interlocks
- Non-descriptive error messages
- User-overridable error modes
- Software designed by only one person
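The root cause above is a mismatch between the selected treatment mode and the physical turntable position. A minimal Python sketch of a consistency check, assuming hypothetical `Mode` and `Turntable` enums that model the three legal turntable positions plus an "intermediate" state for anything in between. In a real machine the position would come from independent sensors, not from the software's own record of where it commanded the turntable to go.

```python
from enum import Enum, auto


class Mode(Enum):
    ELECTRON = auto()
    XRAY = auto()


class Turntable(Enum):
    ELECTRON = auto()      # scan magnets in the beam path
    XRAY = auto()          # target and beam flattener in the beam path
    FIELD_LIGHT = auto()   # light/mirror for patient positioning
    INTERMEDIATE = auto()  # between legal positions (moving or misaligned)


# The only (mode, turntable) pairs for which the beam may be switched on.
# Field-light and intermediate positions never appear here.
ALLOWED = {
    (Mode.ELECTRON, Turntable.ELECTRON),
    (Mode.XRAY, Turntable.XRAY),
}


def beam_on_allowed(mode: Mode, position: Turntable) -> bool:
    """Refuse the beam unless the turntable matches the selected mode."""
    return (mode, position) in ALLOWED
```

Listing the permitted combinations (rather than the forbidden ones) means any unanticipated state defaults to "beam off".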

11 Cost of the Bug
- To users (patients): four deaths, two other serious injuries
- To developers (AECL): one lawsuit, settled out of court; time and money to investigate and fix the bugs
- To product owners (11 hospitals): system downtime

12 Corrective Action Plan
Summary:
- Numerous hardware and software changes
- All interruptions related to dosimetry made non-continuable
- Independent hardware and software shutdowns
- Potentiometer on the turntable
- Hardware interlocks
- "Dead man's switch" motion enable
- Fixed documentation, messages, and user manuals

In detail:
- All interruptions related to the dosimetry system will go to a treatment suspend, not a treatment pause. Operators will not be allowed to restart the machine without reentering all parameters.
- A software single-pulse shutdown will be added.
- An independent hardware single-pulse shutdown will be added.
- Monitoring logic for turntable position will be improved to ensure that the turntable is in one of the three legal positions.
- A potentiometer will be added to the turntable. It will provide a visible signal of position that operators will use to monitor exact turntable location.
- Interlocking with the 270-degree bending magnet will be added to ensure that the target and beam flattener are in position if the X-ray mode is selected.
- Beam on will be prevented if the turntable is in the field-light or an intermediate position.
- Cryptic malfunction messages will be replaced with meaningful messages and highlighted dose-rate messages.
- Editing keys will be limited to cursor up, backspace, and return. All other keys will be inoperative.
- A motion-enable foot switch will be added, which the operator must hold closed during movement of certain parts of the machine to prevent unwanted motions when the operator is not in control (a type of "dead man's switch").
- Twenty-three other changes to the software to improve its operation and reliability, including disabling of unused keys, changing the operation of the set and reset commands, preventing copying of the control program on site, changing the way various detected hardware faults are handled, eliminating errors in the software that were detected during the review process, adding several additional software interlocks, disallowing changing to the service mode while a treatment is in progress, and adding meaningful error messages.
- The known software problems associated with the Tyler and Yakima accidents will be fixed.
- The manuals will be fixed to reflect the changes.
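The first corrective action distinguishes a treatment pause (resumable with a single keypress) from a treatment suspend (which requires reentering all parameters). A minimal Python sketch of that rule as a small state machine; the class, the state names, and the parameter dictionary are hypothetical and not taken from the AECL software.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    TREATING = auto()
    PAUSED = auto()     # operator may resume directly
    SUSPENDED = auto()  # parameters discarded; full re-entry required


class TreatmentController:
    """Hypothetical controller illustrating the pause-versus-suspend rule."""

    def __init__(self) -> None:
        self.state = State.READY
        self.params = None  # prescription entered by the operator

    def enter_parameters(self, params: dict) -> None:
        if self.state not in (State.READY, State.SUSPENDED):
            raise RuntimeError("parameters can only be entered when idle")
        self.params = dict(params)
        self.state = State.READY

    def start(self) -> None:
        if self.state is not State.READY or self.params is None:
            raise RuntimeError("cannot start: no valid parameters")
        self.state = State.TREATING

    def interrupt(self, dosimetry_related: bool) -> None:
        if self.state is not State.TREATING:
            return
        if dosimetry_related:
            # Dosimetry faults are never continuable: drop the parameters so
            # the operator must re-enter everything before the next start.
            self.state = State.SUSPENDED
            self.params = None
        else:
            self.state = State.PAUSED

    def resume(self) -> None:
        # Only a pause can be resumed; a suspend cannot be overridden here.
        if self.state is not State.PAUSED:
            raise RuntimeError(f"cannot resume from {self.state.name}")
        self.state = State.TREATING
```

Because a suspend clears the stored parameters, there is no code path from `SUSPENDED` back to `TREATING` that skips re-entry.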

13 Lessons Learned
- For complex interrupt-driven software, timing is of critical importance.
- Do not remove standard hardware interlocks when adding computer control.
- Revalidate reused software.
- Do not over-rely on software.
In a 1987 paper, Miller, director of the Division of Standards Enforcement, CDRH, wrote about the lessons learned from the Therac-25 experiences.[6] The first was the importance of safe versus "user-friendly" operator interfaces; in other words, making the machine as easy as possible to use may conflict with safety goals. The second is the importance of providing fail-safe designs: "The second lesson is that for complex interrupt-driven software, timing is of critical importance. In both of these situations, operator action within very narrow time-frame windows was necessary for the accidents to occur. It is unlikely that software testing will discover all possible errors that involve operator intervention at precise time frames during software operation. These machines, for example, have been exercised for thousands of hours in the factory and in the hospitals without accident. Therefore, one must provide for prevention of catastrophic results of failures when they do occur. I, for one, will not be surprised if other software errors appear with this or other equipment in the future."
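Miller's point is that narrow timing windows are rarely found by testing, so the design itself must tolerate them. One common defensive pattern, sketched here in Python under stated assumptions, gives the shared prescription a version counter: the setup routine records the version it configured the hardware from and refuses the beam if an operator edit arrived in the meantime. `Prescription` and `set_up_and_fire` are hypothetical names; this is not the fix AECL shipped.

```python
import threading
from typing import Callable, Tuple


class Prescription:
    """Hypothetical shared treatment data edited from the operator console."""

    def __init__(self, mode: str) -> None:
        self._lock = threading.Lock()
        self._mode = mode
        self._version = 0  # incremented on every edit

    def edit(self, mode: str) -> None:
        with self._lock:
            self._mode = mode
            self._version += 1

    def snapshot(self) -> Tuple[str, int]:
        # Read mode and version together so they are mutually consistent.
        with self._lock:
            return self._mode, self._version

    def version(self) -> int:
        with self._lock:
            return self._version


def set_up_and_fire(rx: Prescription,
                    configure_hardware: Callable[[str], None]) -> bool:
    """Configure the machine from a snapshot, then refuse the beam if the
    prescription changed while the (slow) configuration step was running."""
    mode, seen = rx.snapshot()
    configure_hardware(mode)   # e.g. move turntable, set bending magnets
    if rx.version() != seen:
        return False           # stale setup: operator edited mid-way
    return True                # caller may enable the beam
```

The check does not depend on how fast the operator types, which is the property testing could not establish for the Therac-25. A production controller would also need the final check and beam-on to be atomic (or would lock out edits while the beam is on); the sketch only shows detecting a stale setup.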

14 References
- Nancy Leveson and Clark S. Turner, "An Investigation of the Therac-25 Accidents," IEEE Computer, vol. 26, no. 7, July 1993.

