Vose Software
Making the best use of predictive microbiology (PM) data and models in food safety risk assessment
David Vose, Director, Vose Software
www.vosesoftware.com
What we do…
- Risk analysis consulting
  - Experienced in risk analysis, risk management and supporting decision making under uncertainty
  - Past experience of working with clients in many different industries
  - Expert witness services and litigation support in high-profile risk-related disputes
- Risk analysis training
  - General Risk Analysis
  - Risk Analysis in Business and Engineering
  - Risk Analysis in Health and Epidemiology
- Risk software solutions
  - Developers of ModelRisk (risk analysis software tool)
  - Developers of risk-related bespoke applications
Government and academia; Industry
Spoiler (risk management applied for a rambling talker)
- Risk assessment models are too complicated
  - Clients ask for them
  - Risk assessors say they can do them
  - Risk assessors aren’t trained programmers and don’t have the debugging tools
- Risk assessments don’t deliver
  - Too many assumptions
  - Too few data
  - Too much uncertainty
  - Results carry too many caveats
- Simpler, more focused analyses are often possible
- Lab-based PM data (e.g. ComBase) are already good enough in a risk assessment context
  - Compared with all the other uncertainties
- A change of focus in PM could help answer, or even outright answer, many risk questions
  - Mechanical removal
  - Location and importance of pathogens in carcasses
  - Help rank pathogen concentrations in food in terms of risk
What is food safety risk assessment?
- The analytical component of food safety risk management
  - Attempts to quantify the risk and uncertainty in a food safety-related problem
  - Gives managers a better understanding of the impact of the different decision options they have available
  - Quantification of risk (e.g. there is a 1% chance of X occurring) is potentially much more useful than saying “the risk of X is very low”
- Based on mathematical models
  - A simplified representation of how the system is assumed to behave, both now and after any interventions under consideration
  - “Simplified” implies that our probability values are approximate
  - “Assumed” implies that the numbers generated would only be true if the assumptions all turned out to be correct
  - The more numerous and tentative the assumptions are, the less useful the numerical results will be
- Components of uncertainty, which we should try to minimise
  - Assumptions
  - Randomness
  - Imprecise statistical inference from data
  - Bad data
Designing a risk assessment
- Should be a creative process
  - Figure out the real problem with managers
  - Find the quantitative information that managers will be able to use
  - Do we need a model? Will a simple data analysis suffice?
- Should be pragmatic
  - Focused on answering the most important questions
  - Based on available data
  - Adheres to constraints
  - Achievable within a budget and timeframe
  - Understandable, adaptable, auditable
- Should be believable (the hardest part)
  - Often so many assumptions
  - Loads of maths few people can understand
  - Experts tend to be defensive and stick to what they know or believe
  - Models deal with national-level issues, whilst data almost never have the same coverage
Example of Farm-to-Fork model: Campylobacter in poultry
From: Draft report 2001, Institute of Food Safety and Toxicology, Division of Microbiological Safety, Danish Veterinary and Food Administration
- From a PM viewpoint, a much ‘simpler’ problem than usual, since for Campylobacter there are no growth or reservoir considerations outside the host animal
- But we still have a lot of variation to consider:
  - Between farms
  - Between slaughter plants
  - Between CP strains
  - Between food products and their preparation
  - Between consumer handling
  - Between consumer vulnerabilities
But the problem is more complicated … Campylobacter in poultry
- Source of exposure? Could be:
  - Poultry
  - Cattle (meat, milk)
  - Sheep (meat, milk)
  - Goats (meat, milk)
  - Pigs
  - Ducks
  - Wild birds
  - Dogs, cats (from meat?)
- And their faeces in:
  - Lakes
  - Streams
  - Vegetables
  - Mud
  - Fertilizer
- And in some countries: poultry litter fed to cattle
- “[P]reparation and consumption of broiler meat may account for 20% to 30% of human cases of campylobacteriosis, while 50% to 80% may be attributed to the chicken reservoir as a whole.” (EFSA)
- How many people get ill?
  - “the true number of cases of illness is likely to be 10-100 times higher than the reported number” (EFSA)
- In summary: a lot of uncertainty about the cause and pathway, and even more about how many people get ill. This makes it difficult to calibrate the model.
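To make the calibration problem concrete, the two EFSA ranges quoted above can be combined in a few lines. This is only an illustrative sketch: the reported case count of 1,000 is a hypothetical number, not a figure from the talk, and simply multiplying the range ends together is a crude bounding exercise rather than a proper uncertainty analysis.

```python
def true_case_range(reported, low_mult=10, high_mult=100):
    """Bracket the true number of cases using the quoted 10-100x under-reporting."""
    return reported * low_mult, reported * high_mult


def attributed_range(case_range, frac_low, frac_high):
    """Bracket the cases attributable to one source, given a range of fractions."""
    low_cases, high_cases = case_range
    return low_cases * frac_low, high_cases * frac_high


REPORTED = 1_000  # hypothetical reported cases, for illustration only

true_cases = true_case_range(REPORTED)
broiler_meat_cases = attributed_range(true_cases, 0.20, 0.30)  # 20% to 30%

print("true cases:", true_cases)
print("broiler meat cases:", broiler_meat_cases)
```

Even this crude bracket spans more than an order of magnitude, which is the target any farm-to-fork model would have to be calibrated against.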
Actually, it’s even more complicated … Campylobacter in poultry
- Final stages of the chain: consumer behaviour, dose response, risk estimation
  - Mostly a black box
  - Dose response: from a limited data set (young adults, in water), we are 90% confident that the number of CP needed to give a 50% probability of illness lies in [1, ~50000]
- With all this uncertainty, are fancy models justified?
- I think we have to look at another approach
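A rough sketch of what that dose-response interval means in practice. It assumes a single-parameter exponential dose-response model, P(ill | dose) = 1 - exp(-r * dose), which is an illustrative choice and not necessarily the model behind the quoted interval; the dose of 100 organisms is likewise hypothetical.

```python
import math


def r_from_id50(id50):
    """Exponential-model parameter r that gives P(ill) = 0.5 at dose = id50."""
    return math.log(2) / id50


def p_ill(dose, id50):
    """Probability of illness at a given dose under the exponential model."""
    return 1.0 - math.exp(-r_from_id50(id50) * dose)


DOSE = 100  # hypothetical ingested dose (number of organisms)

p_if_id50_is_1 = p_ill(DOSE, 1)           # lower end of the quoted interval
p_if_id50_is_50000 = p_ill(DOSE, 50_000)  # upper end of the quoted interval

print(f"P(ill) if ID50 = 1:     {p_if_id50_is_1:.4f}")
print(f"P(ill) if ID50 = 50000: {p_if_id50_is_50000:.4f}")
```

Under these assumptions the same dose gives a probability of illness anywhere between roughly 0.1% and essentially 100%, which is why precision upstream of the dose-response step buys little.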
Big model example: Salmonella in pigs
- Modelled:
  - Farm-to-consumption of pigs
  - Accounts for variability between and within Member States
- Very large model:
  - Three groups involved, all experienced risk analysts
  - 100,000 lines of code in Matlab
  - 150 parameters for each Member State, plus generic parameters
  - An estimated 900-1000 parameters in total
- Checking:
  - “[E]very effort was made in order to minimise the risk of … errors occurring and a long process of review was carried out”
  - Reached model version 27
  - “The validation of the intervention analysis is particularly difficult as there are no validation data with which to compare the model results. In addition, with such a complex and nonlinear model, it is only really possible to assess whether the resulting trend is reasonable, rather than the absolute reduction”
  - i.e. they had no way to check the numbers that came out
Big model example: Salmonella in pigs
- Struggling with data:
  - Didn’t use the EFSA baseline survey data as required (not possible with simulation anyway)
  - Used data from other countries
  - Large farm/small farm management from one MS
  - Used expert estimates to fill in gaps
  - Used other bacteria for increase in bacteria during polishing
  - Used chicken data for transfer during belly opening
  - Small slaughterhouse parameters estimated from one Dutch slaughterhouse
  - Don’t have representative machinery data for slaughter plants, so “variability and uncertainty … is expected to be much larger”
  - Meat production selection (cuts, minced, fermented) not representative
  - No sensitivity analysis for the dose-response model
  - Data on transport between farms and to slaughter are scarce
  - Need data on attachment/removal of bacteria to/from surfaces
  - Assumes Salmonella acts like E. coli in the scalding stage
  - Used D-value (10-fold reduction time) from chicken
  - Used transfer from steel surfaces to sponges and roasted chicken as a surrogate for pig to knife
  - Assumes even distribution of bacteria all over the carcass
  - Time and temperature from retail to home missing
  - Assumed the same human susceptibility for all MSs
  - Dose-response data not representative for the young, old, pregnant or immunocompromised, and taken from much higher doses than modelled
  - Ignored trade between MSs
  - …
- Conclusion: “There are data gaps and critical assumptions of the model, and these should be considered when interpreting the results of the model.”
  - How?
Big model example: Salmonella in pigs
Response to quantitative questions:
TOR | Provided | BIOHAZ answer to European Commission
1   | Yes      | “Guesstimate”
2   | Yes      | An n-fold reduction in prevalence produces an n-fold reduction in illnesses
3   | Yes      | “Theoretically, according to the QMRA following scenarios appear possible” and then some fairly hard numbers
4   | No       | Descriptive
5   | No       | Descriptive
6   | Yes      | 2 log (99%) reduction in carcass load “sufficient to reduce cases by over 90%”
7   | Yes      | 90% reduction in herd prevalence “could theoretically results in a reduction in an order of magnitude of two thirds of … lymph node prevalence”
8   | Yes      | See 7.
9   | No       |
10  | No       |
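Two of the answers in the table are simple enough to check by hand. The sketch below converts a log10 reduction into a percentage (answer 6) and applies the proportional prevalence-to-illness relationship stated in answer 2; the function names and the 1,000-case baseline are illustrative assumptions.

```python
def fraction_removed(log10_reduction):
    """Fraction of the load removed by a given log10 reduction."""
    return 1.0 - 10.0 ** (-log10_reduction)


def illnesses_after(illnesses_before, prevalence_fold_reduction):
    """Answer 2 in the table: an n-fold prevalence cut gives an n-fold illness cut."""
    return illnesses_before / prevalence_fold_reduction


print(f"2 log reduction removes {fraction_removed(2):.0%} of the load")
print("1,000 cases after a 4-fold prevalence cut:", illnesses_after(1000, 4))
```

A relationship this simple needs no simulation model at all, which is relevant to the question the talk raises next.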
Big model example: Salmonella in pigs
- The error:
  - For MS #4, consumer travel time was modelled in hours, not minutes (60x too big)
  - Salmonella cases were estimated as 29,901, later corrected to 2,686
  - Unfortunately, the first adopted report used MS #4 as the representative MS
- Conclusion:
  - “The Scientific Opinion (EFSA, 2010b) focused on the intervention analysis. Therefore the conclusions of the Scientific Opinion are unaffected by this error.”
  - “although the quantitative conclusions of the intervention analysis do change the qualitative conclusions regarding the effect of interventions do not change, as the relative reductions are similar to those presented in the original report”
  - So did we need the model?
- Typical coding error rates:
  - “Mistakes are probably inevitable in a model of this complexity”
  - They report 0.01 errors/kLOC (thousand lines of code), which is very, very low
  - Microsoft: 0.5/kLOC on release
  - Industry average: 10/kLOC
  - Vose Software: 1.2/kLOC
  - Clean room: 0.1/kLOC
  - Space shuttle: 0 in 500 kLOC (so they were close to NASA)
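For scale, the quoted error rates can be applied to a model of 100,000 lines, the size given for this model earlier in the talk. The sketch is plain arithmetic; the dictionary keys are just labels for the rates on the slide.

```python
MODEL_KLOC = 100  # 100,000 lines of Matlab, as stated earlier in the talk

# Error rates per thousand lines of code, as quoted on the slide
ERROR_RATES_PER_KLOC = {
    "Reported for this model": 0.01,
    "Clean room": 0.1,
    "Microsoft (on release)": 0.5,
    "Vose Software": 1.2,
    "Industry average": 10,
}


def expected_errors(rate_per_kloc, kloc=MODEL_KLOC):
    """Expected number of coding errors in a code base of the given size."""
    return rate_per_kloc * kloc


for source, rate in ERROR_RATES_PER_KLOC.items():
    print(f"{source:<25} {expected_errors(rate):8.1f} expected errors")

# Effect of the one error that was found: case estimate before vs after the fix
case_ratio = 29_901 / 2_686
print(f"Case estimate was about {case_ratio:.1f}x too high")
```

The reported rate corresponds to a single error in the whole model, whereas the other quoted rates would imply between 10 and 1,000.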
Why big models tend to fail
- More errors
- Simulation models are stochastic:
  - We can’t easily check the numbers being produced
- Big models have more variables:
  - Which means greater data needs, so we scratch around for data, and there is less chance of the model being kept up to date
  - More assumptions, so it is hard to know how realistic the model is
  - Simpler models may seem less ‘realistic’, but at least we know it
- Few people are competent to provide an external check:
  - Internal checks have a very poor success rate
- Better to start differently:
  - What can we say without a model, or with a very simple one?
  - How complete are the data?
  - What are the uncertainties?
What do we usually (not) know?
- We have some idea of pathogen prevalence
  - Maybe at the farm
  - Usually at the slaughter plant (pre-processing)
- Some idea of load
  - Some samples of skin, occasionally an organ
  - Maybe enumerated for individuals, maybe for pooled samples
  - Maybe whole carcass rinses
  - Often just presence/absence
  - Almost always at the slaughter plant
- Maybe some idea of strain
  - But it’s quite rare to have enumeration by strain, just presence/absence
- Often some idea of the dose-response relationship
  - But not very statistically accurate
- In summary
  - Focus on simpler models
  - Get better information from the data we habitually collect
Consider this problem: chicken neck skin samples
- Procedure, from Ukmeat.org (based on (EC) No 2073/200)
  - Collect samples from carcasses after they have been chilled for at least 1.5 hours
  - Select a bird with a long neck skin for sampling (green arrows)
  - Grab the neck skin through the bag (photo) and cut at least 10g (photo)
  - Collect 2 more samples in the same way to make 3 in total inside the bag
  - “A bag containing 3 skins and a combined weight of more than 30g (roughly 1 oz) is classed as a single sample.”
  - Salmonella test results are reported as either positively detected or absent
- It’s a HACCP plan, so it doesn’t give us much load information for food safety risk assessment
- Things a risk analyst would love to know
  - How many cfus are on the carcass
  - Where they are located
  - Whether the location affects survivability and the probability of exposure
  - What the attenuation rates are for different processes, by location on the carcass
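A small sketch of why a pooled presence/absence result says so little about load. It assumes, purely for illustration, that cfus are Poisson-distributed through the skin at a given concentration and that the test detects any 30g sample containing at least one cfu; neither assumption comes from the sampling procedure itself.

```python
import math


def detection_probability(cfu_per_g, sample_mass_g=30.0):
    """P(at least one cfu in the pooled sample), assuming Poisson-distributed cfus."""
    return 1.0 - math.exp(-cfu_per_g * sample_mass_g)


# Hypothetical concentrations spanning five orders of magnitude
for conc in (0.01, 0.1, 1.0, 10.0, 1000.0):
    print(f"{conc:8.2f} cfu/g -> P(positive) = {detection_probability(conc):.3f}")
```

Under these assumptions the test is already positive about 95% of the time at 0.1 cfu/g, so a positive result cannot distinguish that load from one ten thousand times higher.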
Consider this problem: red meat carcass samples
- Procedure, from Ukmeat.org ((EC) No 2073/200)
  - A sponge sample must be taken and tested for Salmonella
  - The sponge should have an area of at least 50 cm²
  - The width of the sponge should be no larger than 10 cm
  - Wet the sponge (photo), massage inside bag, grasp sponge through bag (photo)
  - Swab the carcass post inspection, prior to chilling, following the pattern (photos A: cattle; B: sheep; C: pig)
  - Weekly, 5 carcasses / session / species
  - Salmonella test results are reported as either positively detected or absent
- Same problem: HACCP based, little load information
- Some research says that swabbing picks up about 20% of the load acquired with incision
Moment-based modelling: a work in progress …
- Lets us anchor to the data where we have them, e.g.
  - Prevalence at farm
  - Load and prevalence at chiller
  - Estimated people getting sick
- Then we use PM data to fill in the gaps
  - Change in prevalence
  - Change in log load
[Diagram of the process chain: Broiler house; Transport; Slaughter house (Hanging, Scalding, Defeathering, Evisceration, Washing, Chilling); Chicken parts or Whole chickens; Chilled or Frozen; Catering (Cross-contamination, Heat treatment); Retail; Consumer (Cross-contamination, Heat treatment); Dose response; Risk estimation]
Moment-based modelling: a work in progress …
- Collected data tend to be at the slaughter plant
  - It’s a communal point, regulated, and can be consistent
  - But a lot has happened before this stage that could be controlled
  - Farm (fly nets, biosecurity, feed, etc.), transport, cross-contamination during slaughter, mechanical and chemical removal
- Log load change data are often not Normally distributed
  - So shape is important (e.g. skewness, kurtosis)
  - This makes it impossible to ‘back-calculate’ loads at previous stages in the process using Monte Carlo
  - Which means we have trouble estimating the effects of interventions
- A possible solution is moment-based estimates
  - Probability maths lets us estimate moments (mean, variance, skewness, kurtosis) even if we cannot know the distributional form
- How PM can help
  - For log load changes, provide at least the first three moments (AVERAGE, VAR, SKEW, maybe KURT in Excel) for your raw data or, better still, make the raw data available
  - For prevalence changes, provide s/n before and after
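One way the moment idea could work is sketched below. It relies on a standard result: if a log load change is independent of the log load before the step, the mean, variance and third central moment of the resulting load are the sums of the corresponding moments. Moments can therefore be pushed forward through a step by addition and pulled back by subtraction, without knowing the distribution. The independence assumption, the data values and the function names are all illustrative; this is not presented as the method the talk is developing.

```python
from statistics import fmean


def central_moments(values):
    """Mean, population variance and third central moment of a data set."""
    values = list(values)
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    mu3 = fmean([(v - mean) ** 3 for v in values])
    return mean, var, mu3


def add_step(before, change):
    """Moments after a step: these three moments add for independent changes."""
    return tuple(b + c for b, c in zip(before, change))


def remove_step(after, change):
    """Back-calculate the moments before a step by subtraction."""
    return tuple(a - c for a, c in zip(after, change))


def skewness(moments):
    """Skewness from (mean, variance, third central moment)."""
    _, var, mu3 = moments
    return mu3 / var ** 1.5


# Hypothetical log10 loads before a step, and log10 changes caused by the step
load_before = [2.0, 2.5, 3.1, 3.4, 4.2]
step_change = [-2.0, -1.8, -1.5, -0.4]

m_before = central_moments(load_before)
m_change = central_moments(step_change)
m_after = add_step(m_before, m_change)
m_back = remove_step(m_after, m_change)

print("after step :", m_after, "skewness", skewness(m_after))
print("back-calc  :", m_back)
```

Note that Excel's VAR and SKEW apply small-sample corrections, so their values differ slightly from the population moments used here; with raw data available, either convention can be computed.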
Source attribution model (developed from Hald et al.)
- Tries to determine which food source causes infections
  - Matches data on prevalence in food types by serovar
  - With data on human illness rates by serovar
  - Good for Salmonella, not Campylobacter (insufficient typing ability)
  - 240 lines of code
- Indices
  - i = serovar index
  - j = food type index
  - k = consuming country index
  - α = producing country index
- Parameters
  - M_jkα is the amount of food type j that is consumed in country k but originates from country α
  - p_jαi is the prevalence of infection/contamination with serovar i in food type j coming from country α
  - a_j relates to the general way the food type is handled (stored, cooked) and can be country-specific
  - q_i relates to the serovar: a relative global measure of the serovar’s ability to survive, grow and cause infection
- It would be great to be able to pin a_j and q_i down better, e.g. by looking at relative rates of growth and toxin production averaged over the naturally occurring range of conditions found in the food products
Hald, T., Vose, D., Wegener, H.C., Koupeev, T., 2004. A Bayesian approach to quantify the contribution of animal-food sources to human salmonellosis. Risk Anal. 24, 255-269.
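The model equation itself did not survive in this transcript. As a hedged sketch only, the code below assumes the multiplicative form used in Hald et al. (2004), extended with the country indices defined above: expected cases of serovar i from food type j in consuming country k are a_j * q_i times the sum over producing countries α of M_jkα * p_jαi. The data values and the function name are hypothetical, and the Bayesian estimation of a_j and q_i in the published model is not shown.

```python
def expected_cases(M, p, a, q, i, j, k):
    """Expected cases of serovar i from food type j in consuming country k.

    M maps (j, k, alpha) to the amount consumed; p maps (j, alpha, i) to prevalence.
    Assumed form: a[j] * q[i] * sum over alpha of M[j, k, alpha] * p[j, alpha, i].
    """
    exposure = sum(
        amount * p[(food, alpha, i)]
        for (food, country, alpha), amount in M.items()
        if food == j and country == k
    )
    return a[j] * q[i] * exposure


# Hypothetical inputs: one food type, one consuming country, two producing countries
M = {(0, 0, 0): 100.0, (0, 0, 1): 50.0}  # amount of food j=0 eaten in k=0, by origin
p = {(0, 0, 0): 0.10, (0, 1, 0): 0.20}   # prevalence of serovar i=0, by origin
a = [0.5]                                # food-type factor a_j
q = [2.0]                                # serovar factor q_i

print(expected_cases(M, p, a, q, i=0, j=0, k=0))
```

In the published approach the factors a_j and q_i are not set by hand as here but estimated by matching the expected cases to observed human illness counts by serovar.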