Death by Software The Therac-25 Radio-Therapy Device Brian MacKay ESE6361 - Requirements Engineering – Fall 2013.


1 Death by Software The Therac-25 Radio-Therapy Device Brian MacKay ESE6361 - Requirements Engineering – Fall 2013

2 The Atomic Age. World War II ushered in the atomic age and the start of the nuclear arms race. In many countries, the question was how to harness this power for peaceful purposes.

3 In Canada: AECL. Atomic Energy of Canada Limited (AECL) is a “Crown Corporation”. It designed and implemented a heavy-water nuclear reactor, the CANDU system. It also included AECL-Medical: harnessing the atom for medical purposes.

4 AECL & CGR: Medical Accelerator Technology. AECL-Medical and the French company la Compagnie Générale de Radiologie (CGR) worked together during the 1970s on using linear accelerators for radio-therapy: high-energy, low-dose electron beams, or a stream of photons in the X-Ray spectrum. The two companies’ partnership produced the 6 MeV, X-Ray-only “Therac-6” and the dual-mode, 20 MeV “Therac-20”.

5 Therac-6 & Therac-20. Stand-alone electro-mechanical units. The operator could set all settings manually and position beam devices manually; once everything was set and the system was “safe”, deliver the dose. The system had an optional computer that allowed a simpler UI: a Digital Equipment PDP-11 with 32 kilobytes of memory, programmed entirely in assembly code.

6 True Innovation: the Therac-25. AECL only; the CGR partnership had dissolved. Used a double-pass accelerator. Halved the space that the Therac-6 and Therac-20 had occupied. Made the computer the primary controller, with no stand-alone manual mode. Shipped in 1983. Still used a DEC PDP-11.

7 It was the best on the market… Except… it seriously injured 6 patients between 1985 and 1987, killing 3 of those patients. All because of software.

8 Hubris. When an engineer graduates in Canada, he/she attends The Ritual of the Calling of an Engineer and gets an Iron Ring. Rudyard Kipling wrote the ceremony. It instills a sense of professionalism, and humility.

9 Supreme Faith in Software. It appears that this device had rigorous safety engineering on the hardware side: a complete hazard analysis with a fault tree. On the software side, the likelihood of error was described in implausibly low terms: fault probabilities on the order of 10^-9 and 10^-11. “Software does not degrade due to wear, fatigue or the reproduction process.” They had no expectation that a bug could cause a problem.

10 Malfunction 54. When there was a problem, the UI displayed the word “Malfunction” followed by a number from 1 to 64. There was NO documentation of what these codes meant in the user manual. An internal AECL service manual described #54 as “dose input 2” and pointed out that this error code was only there for internal diagnostic reasons. Under normal conditions, an operator might see as many as 40 malfunction codes in a day, but Malfunction 54 was very rare. Malfunction codes were easily dismissed by pressing [P] (for “Proceed”).

11 Electron Mode vs. X-Ray Mode. In Electron mode, a low-power beam is scanned across the patient. In X-Ray mode, a high-power beam is aimed at a target, producing X-Rays, which then irradiate the patient. The electron scanning mechanism and the X-Ray target were mounted on a turntable, whose position was controlled by the computer.

12 Usability. The user interface was a VT-100 green screen that contained the prescription entered by the operator. Originally, on error, the prescription had to be re-entered. Usability studies changed this near the end of the dev cycle, and the change introduced a major error. Sample screen:

    PATIENT NAME : JOHN DOE
    TREATMENT MODE : FIX     BEAM TYPE: X     ENERGY (MeV): 25

                                 ACTUAL     PRESCRIBED
    UNIT RATE/MINUTE                0          200
    MONITOR UNITS                  50 50       200
    TIME (MIN)                      0.27         1.00

    GANTRY ROTATION (DEG)           0.0          0      VERIFIED
    COLLIMATOR ROTATION (DEG)     359.2        359      VERIFIED
    COLLIMATOR X (CM)              14.2         14.3    VERIFIED
    COLLIMATOR Y (CM)              27.2         27.3    VERIFIED
    WEDGE NUMBER                    1            1      VERIFIED
    ACCESSORY NUMBER                0            0      VERIFIED

    DATE : 84-OCT-26     SYSTEM : BEAM READY     OP.MODE: TREAT AUTO
    TIME : 12:55. 8      TREAT : TREAT PAUSE              X-RAY 173777
    OPR ID : T25VO2-RO3  REASON : OPERATOR       COMMAND:

13 A Race Condition: UI & Operations Threads. In the Therac-25, the operator entered the prescription information, then the Electron/X-Ray mode, then a command to execute. If the operator entered an X-Ray command in error, re-edited the page and changed it to Electron, then executed the dose, all within 8 seconds, the patient was given an X-Ray dose directly through the Electron turntable element, that is, the high-power beam without the X-Ray target in its path. The screen (same layout as on slide 12) then showed: Malfunction 54.
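The timing window described on this slide can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not AECL's code: the class `TreatmentState`, its fields and the function `simulate` are hypothetical names, and the model assumes the turntable follows every edit immediately while the beam settings are captured by a slow setup task that ignores edits made during its 8-second window.

```python
from dataclasses import dataclass
from typing import Optional

SETUP_SECONDS = 8.0  # window quoted on the slide


@dataclass
class TreatmentState:
    """Hypothetical model of shared state with no synchronization."""
    turntable: str = "none"  # assumed to follow every edit immediately
    beam: str = "none"       # assumed to be configured by a slow setup task
    setup_started: Optional[float] = None

    def edit_mode(self, mode: str, now: float) -> None:
        self.turntable = mode
        busy = (self.setup_started is not None
                and now - self.setup_started < SETUP_SECONDS)
        if not busy:
            # The setup task reads the shared mode and starts configuring.
            self.beam = mode
            self.setup_started = now
        # If the setup task is busy, the edit is silently missed.

    def is_consistent(self) -> bool:
        return self.turntable == self.beam


def simulate(edit_delay: float) -> TreatmentState:
    """Enter X-Ray by mistake, then correct to Electron after edit_delay seconds."""
    state = TreatmentState()
    state.edit_mode("xray", now=0.0)
    state.edit_mode("electron", now=edit_delay)
    return state


fast = simulate(5.0)   # correction made inside the window
slow = simulate(10.0)  # correction made after the window
print(fast.beam, fast.turntable, fast.is_consistent())  # xray electron False
print(slow.beam, slow.turntable, slow.is_consistent())  # electron electron True
```

In this model only a fast operator triggers the mismatch, which is consistent with the slide's point that the failure depended on all edits happening within 8 seconds.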

14 Why Have One Deadly Bug? A second deadly bug was eventually found in the Therac-25. The system periodically tested whether everything was positioned properly, setting a variable with the result of the test. A zero indicated OK. Instead of simply setting the value to 1 or 0, the program incremented the value, and the variable was a byte. The result was that on every 256th test of the positioning, the value wrapped around to zero and the system falsely indicated that everything was ready to proceed.
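The wrap-around described on this slide is easy to reproduce. The sketch below is a hypothetical illustration in Python, not the original assembly: the function name `skipped_checks` is invented here, the one-byte variable is emulated by masking with `0xFF`, and the run assumes the machine is out of position the whole time, so the flag should never read zero.

```python
def skipped_checks(passes: int, buggy: bool = True) -> list:
    """Return the pass numbers on which the position check is wrongly skipped.

    Assumption: the machine is out of position for the whole run, so the
    flag should stay nonzero and no pass should ever be skipped.
    """
    flag = 0  # stored in one byte; zero is read as "OK, no check needed"
    skipped = []
    for n in range(1, passes + 1):
        if buggy:
            flag = (flag + 1) & 0xFF  # increment wraps to 0 after 255
        else:
            flag = 1  # assign a constant nonzero value instead
        if flag == 0:
            skipped.append(n)
    return skipped


print(skipped_checks(600))               # [256, 512]
print(skipped_checks(600, buggy=False))  # []
```

Assigning a fixed nonzero value, rather than incrementing, removes the wrap-around entirely in this model.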

15 Noteworthy: The Users Found the Bugs. It’s worth noting that AECL’s initial reaction to the problems was denial. Eventually, they got to the stage where they did piecemeal fixes. Without the efforts of the staff at the East Texas Cancer Center in Tyler, AECL might never have acknowledged the first bug. After two accidents with the same operator, the staff spent time trying to recreate the race condition. After the Therac-25, the FDA changed the way it evaluated software (and software engineering) in medical devices.

16 The Scorecard

    Cause                               Total Accidents   Deaths
    Malfunction 54 Race Condition              3             2
    Incorrect Increment Logic                  3             1*
    Total                                      6             3

    * One patient died of cancer, but would have died of radiation poisoning in a few weeks had the cancer not killed him.

17 Not the Bugs: The Software Engineering. All software systems have bugs; even Knuth hands out the occasional $2.56 check. AECL coalesced their entire operator interface, control system and safety system into one program. They apparently had very little in the way of formal requirements gathering, design or development standards. All of the software was developed by one programmer. Their reaction to the problems was to fix them one at a time.

18 Software Reuse. The Therac-20 reused some of the software from the Therac-6, and the Therac-25 reused software from both of the previous models. But the earlier models had hardware interlocks to prevent over-dosing. The desire to reuse previous software resulted in a home-made real-time operating system, on an expensive, 10-year-old computer system, running a program written entirely in assembly language that relied on global variables for inter-task communication, without synchronization.

19 No Requirement to Separate Layers. AECL architected the Therac-25’s software into a single point of failure. This was far from accepted practice in the early 1980s. Safety systems were migrating from hardware to software, but they were usually separate, simpler systems (e.g. PLCs). By the early 80s, there were usually three distinct layers: safety and integrity; control and positioning; operator interface and supervisory.
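The layering idea on this slide can be sketched in a few lines. This is a hypothetical illustration, not a description of any real product: `InterlockError`, `safety_interlock` and `fire_beam` are invented names, and in practice the safety layer would typically live in separate hardware or a PLC rather than in the same process as the control code.

```python
class InterlockError(RuntimeError):
    """Raised by the safety layer when the machine state is unsafe."""


def safety_interlock(beam: str, turntable: str) -> None:
    # Safety and integrity layer: small, independent, and deliberately simple.
    # It trusts neither the operator interface nor the control layer.
    if beam != turntable:
        raise InterlockError(
            f"beam mode {beam!r} does not match turntable {turntable!r}"
        )


def fire_beam(beam: str, turntable: str, interlock=safety_interlock) -> str:
    # Control layer: must pass the interlock before turning the beam on.
    interlock(beam, turntable)
    return f"beam on: {beam}"


print(fire_beam("electron", "electron"))  # beam on: electron
try:
    fire_beam("xray", "electron")
except InterlockError as err:
    print("blocked:", err)
```

The point of the separation is that a bug in the operator interface or control code cannot by itself turn the beam on, because the final check is made by a component that shares none of their logic.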

20 Testability and Auditing. AECL’s task architecture and real-time OS made adequate testing nearly impossible. Look at the deadly errors: neither is discoverable through testing. No auditing of operations or failures was included in the system. After all the issues with the Therac-25, a check was done on the Therac-20 system and the same bugs were found. But, because that system had mechanical interlocks, no injuries resulted.


