Computational Methods for Finding Patterns of Human and System ‘Failure’ in Mishap Reports. Chris Johnson, University of Glasgow, Scotland. UCD: 12th December 2003
Johnson, Le Galo and Blaize; European Incident Reporting Requirements in Air Traffic Management, EUROCONTROL, 2000.
NASA safety managers complain that the Web Program Compliance Assurance and Status System is too cumbersome. Personnel use Lessons Learned Information System only on an ad hoc basis. Hazard reports rarely communicated effectively, nor are databases used by engineers and managers capable of translating operational experiences into effective risk management practices. (CAIB, p.189) “Centers and contractors used Problem Reporting and Corrective Action database differently, preventing comparisons across the database.”
Probabilistic information retrieval: Avoids problem of codification; But issues of precision and recall. Conversational case based reasoning: Extended form of US Navy’s NACODAE system; Flexible precision & recall. Word sense disambiguation etc.
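The precision and recall trade-off mentioned above can be made concrete with a small calculation. This is a minimal sketch, not part of the original slides: the function name and the report identifiers are hypothetical, and it assumes the set of truly relevant reports for a query is known.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = share of retrieved reports that are relevant
    recall    = share of relevant reports that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical report identifiers, for illustration only.
retrieved = ["r1", "r2", "r3", "r4"]   # what the search engine returned
relevant = ["r2", "r4", "r7"]          # what an analyst judged relevant
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more reports tends to raise recall and lower precision, which is why a system offering adjustable precision and recall is attractive for incident analysts.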
Meta-Level Concerns for Aerospace: FAA GAIN lacks computational support. Someone must address this opportunity…
Linda, JavaSpaces and Middleware for Incident Reporting (diagram nodes: UK, US, Australia, …): Concurrency and distribution
Linda, JavaSpaces and Middleware for Incident Reporting (diagram nodes: UK, US, Australia, …): Overloading of matching operators
Linda, JavaSpaces and Middleware for Incident Reporting (diagram nodes: UK, US, Australia, …): Leases and persistence
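The tuple-space ideas named on these slides (template matching, overloaded matching, leases) can be sketched in a few lines. This is a toy, single-process illustration in Python rather than JavaSpaces itself: the class, the `ANY` wildcard and the predicate fields are hypothetical names, the operations do not block as Linda's `rd` and `in` do, and treating a callable field as a custom matcher is only one reading of "overloading of matching operators".

```python
import time

ANY = object()  # wildcard field in a template (hypothetical name)


class TupleSpace:
    """Toy stand-in for a Linda-style tuple space (non-blocking, one process)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = []  # list of (tuple, expiry time or None)

    def out(self, tup, lease=None):
        """Write a tuple. lease is a lifetime in seconds; None means no expiry."""
        expiry = None if lease is None else self._clock() + lease
        self._entries.append((tuple(tup), expiry))

    def _live(self):
        """Drop entries whose lease has run out and return the rest."""
        now = self._clock()
        self._entries = [(t, e) for t, e in self._entries if e is None or e > now]
        return self._entries

    @staticmethod
    def _matches(template, tup):
        if len(template) != len(tup):
            return False
        for want, got in zip(template, tup):
            if want is ANY:
                continue
            if callable(want):          # custom matcher supplied by the caller
                if not want(got):
                    return False
            elif want != got:           # default: exact equality
                return False
        return True

    def rd(self, template):
        """Return the first matching tuple without removing it, else None."""
        for tup, _ in self._live():
            if self._matches(template, tup):
                return tup
        return None

    def take(self, template):
        """Linda's 'in': remove and return the first matching tuple, else None."""
        for i, (tup, _) in enumerate(self._live()):
            if self._matches(template, tup):
                del self._entries[i]
                return tup
        return None


# Usage with a fake clock so the lease expiry is deterministic.
now = [0.0]
space = TupleSpace(clock=lambda: now[0])
space.out(("UK", "altitude deviation", "report text ..."), lease=60)
space.out(("US", "runway incursion", "report text ..."))

first = space.rd((ANY, lambda kind: "altitude" in kind, ANY))
now[0] = 61.0                      # the UK tuple's 60-second lease has passed
expired = space.rd(("UK", ANY, ANY))
```

Because readers match on content rather than on a fixed schema or address, national systems can publish and query reports without agreeing on who holds what.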
So does the software say something new and useful?
Case Study 1: FDA Telemedicine. Look, I’m not blaming you, I’m just suing you… Medical errors lead to: 45, ,000 deaths (US). RTA=43,000, AIDS=16,000. Additional care $15 billion: –45% have some mishap; –17% prolonged hospital stay.
Courtesy: Univ. of Virginia, Office of Telemedicine SE Virginia medical centres: 1 nurse monitors system; 49 remote patients; 5 ICUs at 3 centres. Staff 50-80% of ICU budget. Courtesy: NASA Telemedicine Instrumentation Pack project
Master Event Data File Format, section identifiers: A: MDR Report Identifier; B: Event Information; E: Professional Information; F: Distributor Information; G: Manufacturer Information; H: Device Information.
–Master Event Data File, Section A: MDR Report Identifier. Fields: MDR Report Key; MDR Event Key; Report Number; Source Code; Number of devices; Date received; Number of patients.
–Master Event Data File, Section G: Manufacturer Information. Fields: MDR Report Key; Manufacturer’s Name; Manufacturer’s Address; Source Type; Date Manufacturer received report.
–Master Event Data File, Section H: Device Information. Fields: MDR Report Key; Made when?; Single use device?; Remedial Action; Use code; Correction number; Event type.
–Device Data File. Fields: MDR Report Key; Device Event Key; Device Seq. Number; Device available for examination?; Brand Name; Generic Name; Age?; …
–Patient Data File. Fields: MDR Report Key; Patient Seq. Number; Date report received; Sequence and treatment; Patient Outcome.
–Text Data File. Fields: MDR Report Key; Text key; Text type; Patient Seq. number; Report date; Text.
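Every file listed above carries the MDR Report Key, so one report can be reassembled by grouping rows on that key. The sketch below assumes each file has already been parsed into a list of dictionaries; the dictionary keys, sample values and function name are hypothetical and do not reflect the real column names or file layout.

```python
def link_reports(master, devices, patients, texts):
    """Group rows from the four files under their shared MDR Report Key."""
    linked = {}
    for row in master:
        linked[row["mdr_report_key"]] = {
            "event": row, "devices": [], "patients": [], "texts": [],
        }
    for name, rows in (("devices", devices), ("patients", patients), ("texts", texts)):
        for row in rows:
            entry = linked.get(row["mdr_report_key"])
            if entry is not None:   # ignore rows with no master event record
                entry[name].append(row)
    return linked


# Hypothetical rows, for illustration only.
master = [{"mdr_report_key": 1001, "event_type": "malfunction"}]
devices = [{"mdr_report_key": 1001, "device_seq": 1, "brand_name": "ExampleMonitor"}]
patients = [{"mdr_report_key": 1001, "patient_seq": 1, "outcome": "no injury"}]
texts = [
    {"mdr_report_key": 1001, "text_key": 1, "text": "..."},
    {"mdr_report_key": 9999, "text_key": 2, "text": "orphan row"},
]
linked = link_reports(master, devices, patients, texts)
```

The free-text rows are the part a retrieval engine would index; the linked structure lets a hit on the text be traced back to its device and patient records.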
Findings from MAUDE: Safety Culture and Telemedical Mishaps. Introduction of telemedicine implies: –fewer clinical staff, more technical staff; –technical staff don’t understand devices/procedures? Increasing reliance on vendor’s guidance: –vendors in turn rely on manufacturers; –communication often breaks down or is too slow. No common ‘safety culture’: –many incidents stem from poor communication; –strong parallels with NASA (CAIB Chapter 7).
Cluster 1: Configuration. EASI™ software provides 12-lead ECG data using 5 leads to the patient. Fewer electrodes reduce work for nurses and improve patient comfort. TECH NOTED EASI 12-LEAD DISPLAY ON CENTRAL STATION FROM TRANSMITTER THAT WASNT EASI CAPABLE. CUSTOMER REPLACED TRANSMITTER, RELOADED CENTRAL STATION SOFTWARE, CONFIRMED ALL SIGNALS WERE CORRECTLY TRANSMITTED AND LABELED. CUSTOMER DID NOT UNDERSTAND DIFFERENCE BETWEEN STANDARD ECG AND EASI. CUSTOMER WAS RETRAINED TO FURTHER THEIR UNDERSTANDING OF DIFFERENCE. (MDR TEXT KEY: )
Cluster 1: Configuration. Social implications: clinicians and support rely on suppliers’ explanations. Symptomatic of system safety problems: –manufacturers gain insights that should be caught earlier in development. Retraining is proposed, no idea of systemic causes of human ‘error’? DURING INVESTIGATION, ENGINEERS CONFIGURED A SYSTEM IN SAME SETUP AS CUSTOMER. FOUND MAINFRAME RECEIVERS CAN RECEIVE INCORRECT BIT TO MISIDENTIFY TRANSMITTER AS EASI CAPABLE … Report doesn’t state how to prevent mis-configuration.
Cluster 2: Sub-contractors. End-user frustrated by device unreliability and manufacturers’ response: SEVERAL UNITS RETURNED FOR REPAIR HAD FAN UPGRADES TO ALLEVIATE TEMP PROBLEMS. HOWEVER, THEY FAILED IN USE AGAIN AND WERE RETURNED FOR REPAIR … AGAIN SALESMAN STATED ITS NOT A THERMAL PROBLEM ITS A PROBLEM WITH X’s Circuit Board. X ENGINEER STATED Device HAS ALWAYS BEEN HOT INSIDE, RUNNING AT 68°C AND THEIR product ONLY RATED AT 70°C …. ANOTHER TRANSPONDER STARTED TO BURN … SENT FOR REPAIR. SHORTLY AFTER MONITOR BEGAN RESETTING FOR NO REASON … (MDR TEXT KEY: ) Manufacturers felt reports not safety-related: –“reports relate to end-user frustration regarding product reliability (not safety)”.
Cluster 2: Subcontractors. Telemedicine applications developed by groups of suppliers: –flexibility and cost savings during development, manufacture, marketing; –problems if incidents stem from sub-components not manufactured by suppliers; –incident reports must be propagated back along the supply chain. Manufacturer states problems stem from subcontractor’s circuit board: –more problems after faulty board replaced, customer returns unit again; –connectors to PCB not properly seated but still passes acceptance test? –connector not seated completely during initial repair and gradually loosens over time?
Cluster 2: Subcontractors. “Fly-fix-fly” approach undermines attempts to improve patient safety. Confused dialogue between clinician, vendor, manufacturer… –End-user may see technical issues as form of excuse (e.g. PCB connectors)… Device repairs not only rectify problems, they introduce new ones: –compounds end-user uncertainty and distrust of device reliability; –communication fails and shared safety culture erodes over time.
Cluster 3: Modification Induced Bugs. IN SOFTWARE RELEASE VF2, IF PATIENT IN "AUTOADMIT" MODE, PARAMETER DATA AUTOMATICALLY COLLECTED AND STORED IN THE SYSTEMS DATABASE, IF THE PATIENT LATER REMOVED (BUT NOT DISCHARGED) FROM ORIGINAL BED/NETWORK LOCATION, DATA COLLECTION TEMPORARILY DEACTIVATED (EG DURING MOVE FOR TREATMENT). PROBLEM OCCURS WHEN NEW PATIENT ADMITTED TO SAME BED/NETWORK LOCATION BUT ORIGINAL PATIENT NOT DISCHARGED WHILE CONNECTED TO THAT LOCATION. NEW PATIENT ADMISSION STORES DATA IN DATABASE CORRECTLY. HOWEVER, IN PARALLEL, INCORRECTLY APPENDS NEW PATIENT DATA ON TOP OF OLD PATIENT'S RECORD … (MDR TEXT KEY: )
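The failure mode in this report can be reproduced in a toy model, which may help show why it is easy to introduce during a modification. Everything below is hypothetical: the class, method names and the guard are illustrative only and say nothing about how the actual product is built or how the manufacturer fixed it. The model simply assumes that samples are routed by bed location to every record still open at that location.

```python
class Monitor:
    """Toy model: samples arrive per bed and are stored per patient record."""

    def __init__(self, guard_location=False):
        self.records = {}   # patient id -> list of stored samples
        self.open_at = {}   # bed location -> patient ids not yet discharged
        self.guard_location = guard_location

    def admit(self, patient, bed):
        still_open = self.open_at.setdefault(bed, [])
        if self.guard_location and still_open:
            # One possible safeguard: refuse admission until the earlier
            # patient has been discharged from this location.
            raise ValueError(f"{bed} still holds undischarged {still_open[-1]}")
        still_open.append(patient)
        self.records[patient] = []

    def discharge(self, patient, bed):
        self.open_at[bed].remove(patient)

    def store(self, bed, sample):
        # Behaviour matching the report: every open record at the location
        # receives the sample, so an undischarged patient's record is polluted.
        for patient in self.open_at.get(bed, []):
            self.records[patient].append(sample)


faulty = Monitor()
faulty.admit("patient_A", "bed_3")
faulty.store("bed_3", "A-sample")
faulty.admit("patient_B", "bed_3")      # A was moved but never discharged
faulty.store("bed_3", "B-sample")
polluted = faulty.records["patient_A"]  # now also holds B's sample

guarded = Monitor(guard_location=True)
guarded.admit("patient_A", "bed_3")
try:
    guarded.admit("patient_B", "bed_3")
    blocked = False
except ValueError:
    blocked = True
```

The new patient's own record is correct in both cases, which matches the report's observation that the fault shows up only in the old patient's record.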
Safety Culture and Telemedical Mishaps. Software identifies 40-50% more US telemedical mishaps in 6 months. Analysis of reports suggests no ‘quick fixes’ but: –regulators need to focus on dialogue between manufacturers and users; –consider detailed training requirements for telemedicine before approval; –especially look at end-user maintenance and configuration issues; –introduce training in safety and risk management for support staff? Joint US/UK AHRQ presentation in Washington. –Things are only going to get worse…
Da Vinci, 1st robotic aid approved by the FDA: New York Presbyterian Hospital uses it on atrial septal defects.
Case Study 2: Inter-Industry Comparisons
Cluster 1: Programming Errors. Pilot didn’t check 1st Officer programming FMC. “ATC informed us we were off course... it took minutes to figure out what happened. ATC vectored us back onto departure and gave us a climb clearance. ATC also pointed out traffic, but we never saw it. We aren’t sure if our error caused a conflict. First Officer programmed FMC. I checked the Route Page to see if it matched our clearance. It showed correct departure and transition. I did not check Legs Pages to see if all fixes were there. I will next time! We made an error programming the FMC, then became complacent… I should have done a more complete check of the First Officer's programming”
Cluster 1: Programming Errors. “Computer flight plan was route ABC. ATC clearance was via route D-E-F. Original flight plan should have been destroyed, so as not to accidentally revert to old route. First Officer very experienced and I had complete trust that he was capable of loading correct waypoints, but both he and I failed to use a visible method of marking the computer flight plan. 99% of time, cleared route is same as computer flight plan, but not always, as I found out the hard way. ATC caught my error”.
Cluster 1: Programming Errors. Container ship grounds, same route every week. 4 deck officers, good visibility, 2 radars and GPS. Charts had courses in black ink, couldn’t be erased. At 0243 altered course to 237°, position plotted. 45 minutes later, ship grounds at full speed. Watch officer set auto steering to wrong course. 237 next to reciprocal 157 for return voyage.
Cluster 2: Warnings as Safety Nets. During the descent, we were doing some HF radio checks, and forgot to arm the altitude select mode on the flight director. As a result, we descended through our altitude.... We promptly returned to FL280. As a crew, we are very diligent and disciplined about altitude assignments. But in this case, because our attention was diverted from the task at hand, we flew through our assigned altitude. It was that classic trap: both crew members distracted by something and nobody flying the airplane.
Cluster 2: Warnings as Safety Nets. 3 on fishing vessel, 2 cook, pump bilges, maintain watch. Skipper asleep on the deck of the wheelhouse. Vessel’s planned track 0.35 miles from a rig. Automated radar alarm system set to 0.3 miles. VHF off; skipper said too much distracting traffic. Rig asks stand-by safety vessel for help, alongside boat. Nobody on bridge or deck even after sounding horns. ‘Abandon platform stations’ as precautionary measure. Skipper protests on being wakened, “under control”. Radar warning system is a safety net or final safeguard.
Conclusions: Must make better use of lessons learned systems. Use Tuple Space and IR to search for key issues: –distributed and persistent architectures for retrieval; –avoids need for standardised formats; –can be used within and between industries. Caveats: –does it tell us anything new? –how valid are inter-industry comparisons? –how do we get from clusters to recommendations?
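To illustrate how free-text retrieval can work across industries without a standardised report format, here is a minimal ranking sketch. It uses plain TF-IDF with cosine similarity as a simple stand-in; it is not the retrieval engine used in the talk, and the report snippets (paraphrased from the cases above), identifiers and function names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, reports):
    """Return report ids ordered by TF-IDF cosine similarity to the query."""
    docs = {rid: Counter(tokenize(text)) for rid, text in reports.items()}
    n = len(docs)
    # Document frequency: in how many reports does each term occur?
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}

    def vector(counts):
        # Terms never seen in any report carry no weight.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scored = [(cosine(q, vector(counts)), rid) for rid, counts in docs.items()]
    return [rid for score, rid in sorted(scored, reverse=True) if score > 0]


# Snippets paraphrased from the cases above; ids are hypothetical.
reports = {
    "aviation": "First Officer programmed FMC, route page checked but legs pages not checked",
    "maritime": "Watch officer set auto steering to wrong course",
    "medical": "Customer reloaded central station software after transmitter misidentified",
}
ranked = rank("programmed wrong course", reports)
print(ranked)
```

A single query pulls back reports from two different industries, which is the kind of cross-domain cluster the case studies describe; whether such matches are meaningful is exactly the validity caveat raised above.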