On Effective Testing of Health Care Simulation Software Christian Murphy, M.S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup.

1 On Effective Testing of Health Care Simulation Software
Christian Murphy, M.S. Raunak, Andrew King, Sanjian Chen, Christopher Imbriano, Gail Kaiser, Insup Lee, Oleg Sokolsky, Lori Clarke, Lee Osterweil
University of Pennsylvania, Loyola University Maryland, Columbia University, University of Massachusetts Amherst

2 Overview
- Simulation software is used widely in the field of health care
- Simulators must not only accurately model the real world, but also be free of software defects
- It is particularly hard to test simulation software because often there is no “test oracle”
- Our research shows that defects can be detected by checking whether known properties of the software are violated

3 Outline
- Motivating examples
- Overview of testing approach
- Study #1: Demonstrating feasibility
- Study #2: Measuring effectiveness
- Future work & conclusion

4 Flow of Patients through ED
Raunak et al., “Simulating patient flow through an emergency department using process-driven discrete event simulation”, SEHC’09

5 Glycemic Control (Insulin Pump)
King et al., “Prototyping closed loop physiologic control with the Medical Device Coordination Framework”, SEHC’10

6 Problem Statement
- Partial oracles may exist for a limited subset of the input domain in simulation software
- Obvious errors (e.g., crashes) can be detected with certain inputs or testing techniques
- However, it is difficult to detect subtle computational defects in simulators without test oracles in the general case

7 What do I mean by “defect”?
- Deviation of the implementation from the specification
- Violation of a sound property of the software
- “Discrete localized” calculation errors
  - Off-by-one
  - Incorrect sentinel values for loops
  - Wrong comparison or mathematical operator
- Misinterpretation of specification
  - Parts of input domain not handled
  - Incorrect assumptions made about input

8 Research Goals
- Identify an approach for testing simulation software that is effective even without a test oracle
  - Reliably detect defects
  - Increase confidence that the software works
- Demonstrate feasibility of the approach
- Measure the effectiveness of the approach

10 Observation
- Many programs without oracles have properties such that certain changes to the input yield predictable changes to the output
- We can detect defects in these programs by looking for any violations of these “metamorphic properties”
- This is known as “metamorphic testing”
  - T.Y. Chen et al., HKUST Tech Report, 1998

11 Metamorphic Testing
- If new test case output f(t(x)) is as expected, it is not necessarily correct
- However, if f(t(x)) is not as expected, either f(x) or f(t(x)), or both, is wrong
- Initial test case: x → f → f(x)
- New test case: t(x) → f → f(t(x))
- t is a transformation function based on metamorphic properties of f
- f(x) and f(t(x)) are “pseudo-oracles”
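The scheme on this slide can be sketched as a small generic checker. This is only an illustration: the helper name `metamorphic_check`, the sine identity sin(π − x) = sin(x) used as the metamorphic property, and the deliberately inaccurate `buggy_sin` are assumptions chosen for the example and do not come from the presentation.

```python
import math


def metamorphic_check(f, t, expected, x, rel_tol=1e-9):
    """Run the initial test case x and the new test case t(x).

    `expected` maps f(x) to the output predicted for f(t(x)).
    Returns True when the property holds, False when it is violated.
    No oracle for the "true" value of f(x) is needed.
    """
    original = f(x)          # initial test case output
    follow_up = f(t(x))      # new test case output
    return math.isclose(follow_up, expected(original), rel_tol=rel_tol)


def reflect(x):
    """Transformation t: the property says sin(pi - x) equals sin(x)."""
    return math.pi - x


def buggy_sin(x):
    """Hypothetical defective implementation: a truncated Taylor series."""
    return x - x ** 3 / 6


# The library sine satisfies the property; the truncated series violates it.
library_ok = metamorphic_check(math.sin, reflect, lambda y: y, 0.7)
buggy_ok = metamorphic_check(buggy_sin, reflect, lambda y: y, 0.7)
print(library_ok, buggy_ok)
```

A passing check does not prove the function correct; a failing check shows that f(x), f(t(x)), or both are wrong.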

12 Metamorphic Testing Example
Consider a function to determine the standard deviation of a set of numbers
Test case           Input                          std_dev output
Initial input       a, b, c, d, e, f               s
New test case #1    c, e, b, a, f, d               s ?
New test case #2    a+2, b+2, c+2, d+2, e+2, f+2   s ?
New test case #3    2a, 2b, 2c, 2d, 2e, 2f         2s ?
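The three new test cases can be written out directly. The sketch below uses Python's `statistics.pstdev` as the function under test; the concrete input values are arbitrary and are an assumption of this example, not data from the presentation.

```python
import math
import statistics

# Initial input a, b, c, d, e, f (arbitrary illustrative values)
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
s = statistics.pstdev(data)

# New test case #1: permute the input (c, e, b, a, f, d); output should stay s
permuted = [data[2], data[4], data[1], data[0], data[5], data[3]]

# New test case #2: add 2 to every element; output should stay s
shifted = [v + 2 for v in data]

# New test case #3: double every element; output should become 2s
doubled = [2 * v for v in data]

results = {
    "permute": math.isclose(statistics.pstdev(permuted), s, rel_tol=1e-9),
    "add 2": math.isclose(statistics.pstdev(shifted), s, rel_tol=1e-9),
    "multiply by 2": math.isclose(statistics.pstdev(doubled), 2 * s, rel_tol=1e-9),
}
print(results)
```

None of the checks needs the correct value of `s`; each only compares two outputs of the same function.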

13 Related Work
- Verification of simulation models
  - O. Balci, 1997 Winter Simulation Conf.
  - R. Sargent, 2005 Winter Simulation Conf.
- Applying metamorphic testing to applications without test oracles
  - T.Y. Chen et al., Info. and Soft. Tech., 2002

15 Feasibility Study
- Goal: Demonstrate that metamorphic testing is feasible for testing simulation software
- We first identify metamorphic properties in the applications of interest
  - JSim: discrete event simulator (patients in ED)
  - GCS: glycemic control simulator (insulin pump)
- We then apply metamorphic testing and look for defects

16 Metamorphic Properties
- JSim: Flow of patients through ED
  - Increasing number of resources (e.g., beds) should not increase average patient length of stay
  - Increasing number of resources should not decrease other resources’ utilization rates
  - Multiplying the time necessary for each step by a positive constant c should increase the overall time by a factor of c
- GCS: glycemic control system (insulin pump)
  - A patient who weighs more should get more insulin
  - A patient who produces more endogenous glucose should get more insulin
  - The modeled insulin absorption rate should vary inversely with the insulin distribution volume
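Two of the JSim properties above can be illustrated on a toy model. The sketch below is not JSim: it is a hypothetical first-come-first-served queue with interchangeable resources, and the function name, arrival times, and service times are all assumptions made for this example. In this toy model the scaling property is applied to every time in the input (arrivals as well as service durations).

```python
import heapq


def average_length_of_stay(arrivals, service_times, num_resources):
    """Toy first-come-first-served simulation (illustrative only, not JSim).

    Each patient waits for the earliest free resource, then occupies it
    for that patient's service time. Length of stay = departure - arrival.
    """
    free_at = [0] * num_resources        # time at which each resource is free
    heapq.heapify(free_at)
    total_stay = 0
    for arrival, service in zip(arrivals, service_times):
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        departure = start + service
        heapq.heappush(free_at, departure)
        total_stay += departure - arrival
    return total_stay / len(arrivals)


arrivals = [0, 1, 2, 3]
service_times = [5, 5, 5, 5]

base = average_length_of_stay(arrivals, service_times, num_resources=1)

# Property: adding a resource should not increase average length of stay.
more_resources = average_length_of_stay(arrivals, service_times, num_resources=2)
resources_property_holds = more_resources <= base

# Property: multiplying every time by c should multiply the result by c.
c = 3
scaled = average_length_of_stay(
    [c * a for a in arrivals], [c * s for s in service_times], num_resources=1
)
scaling_property_holds = scaled == c * base

print(resources_property_holds, scaling_property_holds)
```

Integer times are used so that the scaling check can compare exactly; with floating-point durations a tolerance would be more appropriate.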

17 JSim Findings

18 Unexpected JSim Findings
With 1 nurse:
ID   Arrival Time   Departure Time   Length of Stay
1    2              159              157
2    8              185              177
3    14             197              183
4    20             295              275
5    26             321              295
Average LOS with 1 nurse: 217.4
With 2 nurses:
ID   Arrival Time   Departure Time   Length of Stay
1    2              159              157
2    8              185              177
3    14             194              180
4    20             312              292
5    26             321              295
Average LOS with 2 nurses: 220.2
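The arithmetic behind this finding can be checked with a few lines of Python. The arrival and departure times are taken from the two tables on this slide; the variable and function names are assumptions of this sketch.

```python
# (patient ID, arrival time, departure time), copied from the slide
one_nurse = [(1, 2, 159), (2, 8, 185), (3, 14, 197), (4, 20, 295), (5, 26, 321)]
two_nurses = [(1, 2, 159), (2, 8, 185), (3, 14, 194), (4, 20, 312), (5, 26, 321)]


def average_los(rows):
    """Average length of stay, where length of stay = departure - arrival."""
    return sum(departure - arrival for _, arrival, departure in rows) / len(rows)


los_one = average_los(one_nurse)
los_two = average_los(two_nurses)

# Metamorphic property: more resources should not increase average LOS.
property_violated = los_two > los_one
print(round(los_one, 1), round(los_two, 1), property_violated)
```

The averages match the values on the slide, and the comparison flags the violation: adding a nurse increased the average length of stay.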

20 Measuring Effectiveness
- Goal: Estimate the effectiveness of metamorphic testing at detecting defects in simulators
- We first systematically seed the software with defects
- We then measure the number that are detected

21 Methodology
- Mutation testing was used to seed defects into each application
  - Reverse comparison operators
  - Change math operators
  - Introduce off-by-one errors
- For each program, we created multiple versions, each with exactly one mutation
- We ignored mutants that yielded outputs that were obviously wrong, caused crashes, etc.
- Effectiveness is determined by measuring what percentage of the mutants were “killed”
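The methodology can be sketched on a small scale. The example below is not the study's tooling: it seeds three hand-written, hypothetical mutants into a standard deviation function (reusing the earlier std_dev properties) and reports the percentage killed. All function names and the input data are assumptions of this sketch.

```python
import math


def std_dev(values):
    """Reference implementation: population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def mutant_skip_last(values):
    """Off-by-one mutant: the last element is never processed."""
    return std_dev(values[:-1])


def mutant_wrong_operator(values):
    """Math-operator mutant: '+' used instead of '-' in the deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v + mean) ** 2 for v in values) / len(values))


def mutant_sample_denominator(values):
    """Off-by-one mutant: divides by n - 1 instead of n."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def violates_a_property(f, values):
    """True if any of the three metamorphic properties is violated."""
    s = f(values)
    follow_ups = [
        (list(reversed(values)), s),        # permuting keeps the output
        ([v + 2 for v in values], s),       # adding a constant keeps the output
        ([2 * v for v in values], 2 * s),   # doubling the input doubles the output
    ]
    return any(
        not math.isclose(f(new_input), expected, rel_tol=1e-9)
        for new_input, expected in follow_ups
    )


mutants = {
    "skip_last": mutant_skip_last,
    "wrong_operator": mutant_wrong_operator,
    "sample_denominator": mutant_sample_denominator,
}
data = [1, 2, 3, 4, 10]

killed = [name for name, mutant in mutants.items() if violates_a_property(mutant, data)]
effectiveness = 100 * len(killed) / len(mutants)
print(killed, f"{effectiveness:.1f}%")
```

The third mutant survives because dividing by n - 1 still satisfies all three properties, which illustrates why a set of metamorphic properties may not kill every mutant.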

22 Results
Application         JSim    GCS Control   GCS Patient
Mutants generated   104     306           644
Usable mutants      25      237           487
Mutants detected    25      58            333
Effectiveness       100%    24.4%         68.4%

23 Analysis: JSim
- “Statistical metamorphic testing” useful for killing mutants related to non-deterministic event timing
- If timing range is [A, B] and observed mean is μ, then mean μ’ for range [10A, 10B] should be around 10μ
- Because of mutant, range is actually [A, B-1]
- Over many executions, observed mean μ’ has statistically significant difference from expected mean 10μ
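One plausible reading of this analysis can be sketched as follows. The sketch assumes that event times are drawn uniformly from an integer range, that the mutant silently drops the upper bound of whatever range it is given, and that a two-sample z statistic with a threshold of 5 stands in for the statistical test; none of these details is stated in the presentation, and all names are hypothetical.

```python
import math
import random
import statistics


def correct_draw(rng, low, high):
    """Event time drawn uniformly from [low, high], both ends included."""
    return rng.randint(low, high)


def mutant_draw(rng, low, high):
    """Off-by-one mutant: the range is actually [low, high - 1]."""
    return rng.randint(low, high - 1)


def z_statistic(draw, a, b, scale=10, runs=20000, seed=1):
    """Compare the mean for [scale*a, scale*b] with scale times the mean for [a, b]."""
    rng = random.Random(seed)
    expected = [scale * draw(rng, a, b) for _ in range(runs)]     # samples of 10 * x
    observed = [draw(rng, scale * a, scale * b) for _ in range(runs)]
    standard_error = math.sqrt(
        statistics.variance(expected) / runs + statistics.variance(observed) / runs
    )
    return (statistics.mean(observed) - statistics.mean(expected)) / standard_error


THRESHOLD = 5  # assumed cut-off for "statistically significant"

z_correct = z_statistic(correct_draw, 1, 10)
z_mutant = z_statistic(mutant_draw, 1, 10)
print(abs(z_correct) > THRESHOLD, abs(z_mutant) > THRESHOLD)
```

With the mutant, the mean for the scaled range is about 54.5 while ten times the base mean is about 50, a gap that is far larger than the sampling noise over many executions.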

24 Analysis: GCS
- Metamorphic testing not as effective in control algorithm (rules for delivering insulin)
- Rules are usually of the form “if patient blood sugar is x then adjust infusion rate by y”
- Single mutants did not have much effect on overall insulin delivered
- These may be detected by more “straightforward” software testing approaches

26 Future Work
- Formalizing the process of identifying metamorphic properties for simulators
- Consider the use of metamorphic testing for validation
  - If a property is violated, does that mean there is a defect, or is the property simply unsound?
  - If the property is unsound, is this simulator appropriate for the task it is meant to model?

27 Conclusion
- We have demonstrated that metamorphic testing is an effective technique for testing simulation software
- It can increase confidence in the implementation
- It also helps increase understanding of how the software behaves

28 On Effective Testing of Health Care Simulation Software
Christian Murphy, University of Pennsylvania, cdmurphy@cis.upenn.edu
M.S. Raunak, Loyola University Maryland, raunak@loyola.edu

