Population-Wide Anomaly Detection
Weng-Keen Wong 1, Gregory Cooper 2, Denver Dash 3, John Levander 2, John Dowling 2, Bill Hogan 2, Michael Wagner 2

Presentation transcript:

Population-Wide Anomaly Detection
Weng-Keen Wong 1, Gregory Cooper 2, Denver Dash 3, John Levander 2, John Dowling 2, Bill Hogan 2, Michael Wagner 2
1 School of Electrical Engineering and Computer Science, Oregon State University
2 Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh
3 Intel Research, Santa Clara

Motivation
Suppose you monitor Emergency Department (ED) data that arrives in real time.
Can you specifically detect a large-scale anthrax attack?
Date / Time Admitted | Age | Gender | Home Zip | Chief Complaint
Aug 1, : | | Male | 15213 | Shortness of breath
Aug 1, : | | Male | 15146 | Diarrhea
Aug 1, : | | Female | 15132 | Fever
: | : | : | : | :

Model non-outbreak conditions and notice deviations
Traditional univariate methods, e.g. control chart, CUSUM, EWMA, time series models
Spatial methods, e.g. Spatial Scan Statistic
Multivariate methods, e.g. WSARE
2. Sat : SCORE = PVALUE =
% (58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True
6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True
These are non-specific methods: they look for anything unusual in the data, but not specifically for the onset of an anthrax attack.
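The comparison in the WSARE output above (58 of 467 cases today versus 653 of 10000 baseline cases) can be checked with a contingency-table test. The following is a minimal sketch, assuming a one-sided Fisher exact test is an acceptable way to score a single rule. The function name `fisher_upper_tail` is hypothetical, and the slide does not say which test produced its SCORE and PVALUE, so the result need not match the values missing from the transcript.

```python
from math import comb

def fisher_upper_tail(today_hits, today_total, base_hits, base_total):
    """One-sided Fisher exact p-value: P(X >= today_hits) under the
    hypergeometric null that today and the baseline share one rate."""
    hits = today_hits + base_hits        # matching cases, both periods
    total = today_total + base_total     # all cases, both periods
    misses = total - hits
    upper = min(today_total, hits)
    # Count the ways to draw today's cases with at least today_hits matches.
    numer = sum(
        comb(hits, k) * comb(misses, today_total - k)
        for k in range(today_hits, upper + 1)
    )
    return numer / comb(total, today_total)

# Counts taken from the slide.
today_rate = 58 / 467
base_rate = 653 / 10000
p_value = fisher_upper_tail(58, 467, 653, 10000)
print(f"today {today_rate:.2%}, baseline {base_rate:.2%}, p = {p_value:.3g}")
```

A small p-value says only that today's proportion is unusual relative to the baseline, which is exactly the non-specific behavior the slide describes.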

Population-wide ANomaly Detection and Assessment (PANDA)
A detector specifically for a large-scale outdoor release of inhalational anthrax
Uses a massive causal Bayesian network
Population-wide approach: each person in the population is represented as a subnetwork in the overall model

Population-Wide Approach
Note the conditional independence assumptions
Anthrax is infectious but non-contagious
[Diagram: Anthrax Release, Location of Release, and Time of Release nodes, labeled as global nodes and interface nodes, connected to one Person Model for each person in the population]

Population-Wide Approach
Structure designed by expert judgment
Parameters obtained from census data, training data, and expert assessments informed by literature and experience
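The conditional independence assumptions noted for the population-wide model suggest how the posterior probability of a release could be computed: if person models are independent of each other given the release nodes, the likelihood of all evidence factors into a product over people. The sketch below illustrates that idea only. Every name and number in it (`PRIOR_RELEASE`, `ZIPS`, `person_likelihood`, the probabilities 0.01 and 0.05) is a hypothetical assumption, not a PANDA parameter, and the time of release is omitted for brevity.

```python
import math

# Hypothetical toy parameters, not taken from PANDA.
PRIOR_RELEASE = 0.001
ZIPS = ["15213", "15146"]

def person_likelihood(person, release_zip):
    """P(one person's evidence | release hypothesis).
    release_zip is None when there is no release."""
    p_resp = 0.01  # assumed chance of a respiratory ED visit without anthrax
    if release_zip is not None and person["home_zip"] == release_zip:
        p_resp = 0.05  # assumed elevated chance inside the release zip
    return p_resp if person["respiratory_visit"] else 1.0 - p_resp

def posterior_release(people):
    """P(release | evidence), enumerating every release hypothesis."""
    log_joint = {None: math.log(1.0 - PRIOR_RELEASE)}
    for z in ZIPS:
        log_joint[z] = math.log(PRIOR_RELEASE / len(ZIPS))
    # Person models are assumed independent given the hypothesis,
    # so their log-likelihoods simply add.
    for hypothesis in log_joint:
        log_joint[hypothesis] += sum(
            math.log(person_likelihood(p, hypothesis)) for p in people
        )
    top = max(log_joint.values())
    weights = {h: math.exp(v - top) for h, v in log_joint.items()}
    return 1.0 - weights[None] / sum(weights.values())

cluster = [{"home_zip": "15213", "respiratory_visit": True} for _ in range(20)]
print(posterior_release(cluster))
```

Working in log space and subtracting the maximum before exponentiating keeps the computation stable when the population is large.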

Person Model (Initial Prototype)
[Diagram: two copies of the person subnetwork under the shared nodes Anthrax Release, Location of Release, and Time of Release. Each person subnetwork contains the nodes Anthrax Infection, Home Zip, Age Decile, Gender, Other ED Disease, Respiratory from Anthrax, Respiratory CC from Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission. Example evidence values shown: Yesterday, never, False, Female, Unknown, Male.]

Prototype is Computationally Feasible
Aside from caching tricks, there are two main optimizations:
– Incremental Updating
– Equivalence Classes
Performance: on a P4 3.0 GHz machine with 2 GB RAM, 45 seconds of initialization time and 3 seconds for each hour's worth of ED data
See Cooper G.F., Dash D.H., Levander J.D., Wong W-K, Hogan W.R., Wagner M.M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on UAI. Banff, Canada: AUAI Press; pp
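The two optimizations named above can be illustrated with a small sketch. It assumes that people with identical attributes and evidence contribute identical likelihood terms, so they can be counted as one equivalence class, and that new ED data changes the class of only a few people, so the total can be adjusted rather than recomputed. The helper names (`class_key`, `class_loglik`, `move_person`) and the probabilities are hypothetical and are not taken from the PANDA implementation.

```python
import math
from collections import Counter

def class_key(person):
    """People with the same attributes and evidence share one key."""
    return (person["age_decile"], person["gender"],
            person["home_zip"], person["respiratory_visit"])

def class_loglik(key):
    """Hypothetical per-person log-likelihood for a class (toy numbers)."""
    return math.log(0.05 if key[3] else 0.95)

def total_loglik(counts):
    """Equivalence classes: one term per class, weighted by its size."""
    return sum(n * class_loglik(key) for key, n in counts.items())

def move_person(counts, total, old_key, new_key):
    """Incremental updating: adjust the total for one person who changed class."""
    counts[old_key] -= 1
    counts[new_key] += 1
    return total - class_loglik(old_key) + class_loglik(new_key)

population = [
    {"age_decile": 3, "gender": "F", "home_zip": "15213",
     "respiratory_visit": False}
    for _ in range(1000)
]
counts = Counter(class_key(p) for p in population)
total = total_loglik(counts)

# One person is admitted to the ED with a respiratory chief complaint.
quiet = (3, "F", "15213", False)
admitted = (3, "F", "15213", True)
total = move_person(counts, total, quiet, admitted)
print(len(counts), total)
```

With this layout the cost of an update depends on the number of new ED records and the number of distinct classes, not on the size of the population.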

What do you gain with a population-wide approach?
Coherent framework for:
1. Incorporating background knowledge
2. Incorporating different types of evidence
3. Data fusion
4. Explanation

1. Incorporating Background Knowledge
Limited data from actual anthrax attacks available:
– Postal attacks 2001 (only 11 people affected, not representative of a large-scale attack)
– Sverdlovsk 1979
But the literature contains studies on the characteristics of inhalational anthrax

1. Incorporating Background Knowledge
Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:
– Progression of symptoms
– Incubation period
– Spatial dispersion pattern
This knowledge is captured at an individual level, and can be represented by its effects over individuals.

2. Incorporating Evidence
Easily incorporate different types of evidence, e.g. spatial, temporal, demographic, symptomatic
Easily incorporate new evidence that distinguishes an individual (or individuals) from others
– Modify the appropriate person model

3. Data Fusion
[Figure: sample ED data table shown alongside OTC data]
No data from an actual anthrax attack is available that captures the correlation between these two data sources.
By modeling the actions of individuals and incorporating background knowledge, we can come up with a plausible model of the effects of an attack on these two data sources.

3. Data Fusion
OTC data: aggregated over zip code and available daily
ED data: individual patient records, usually available in real time

3. Data Fusion
By representing the population at the finest granularity (i.e. each individual), we can easily deal with different spatial and temporal granularities in data fusion.
See Wong W-K, Cooper G.F., Dash D.H., Dowling J.N., Levander J.D., Hogan W.R., Wagner M.M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.

4. Explanation
Important to know why the model believes an anthrax attack is occurring
Can find the subset of evidence E* that most influences such a belief
In PANDA, E* would correspond to a group of individuals
Identify the individuals that most contribute to the hypothesis of an attack

4. Explanation
Can also use the Bayesian network to calculate the most likely location of release and time of release
Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring
Gender | Age Decile | Home Zip | Respiratory Symptoms | Date Admitted
M | | | True | 2 days ago
F | | | True | 2 days ago
M | | | True | 2 days ago
F | | | True | 2 days ago
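One way to picture the ranking of equivalence classes is to score each class by how much it shifts the evidence toward the attack hypothesis. The sketch below assumes the contribution of a class is its size times the per-person log-likelihood ratio between "attack" and "no attack". The slide does not state the exact measure PANDA uses, so this scoring rule, the function names, the zip codes, the age deciles, and all probabilities are hypothetical.

```python
import math

def lik_no_attack(key):
    """Hypothetical P(evidence | no attack) for one person in the class."""
    _gender, _age_decile, _home_zip, respiratory = key
    return 0.01 if respiratory else 0.99

def lik_attack(key):
    """Hypothetical P(evidence | attack), with zip 15213 assumed exposed."""
    _gender, _age_decile, home_zip, respiratory = key
    p = 0.05 if home_zip == "15213" else 0.01
    return p if respiratory else 1.0 - p

def rank_classes(counts, top=3):
    """Return the classes with the largest count-weighted log-likelihood ratio."""
    scored = [
        (n * (math.log(lik_attack(key)) - math.log(lik_no_attack(key))), key)
        for key, n in counts.items()
    ]
    # Sort on the score only, so keys never need to be compared.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]

# Keys are (gender, age decile, home zip, respiratory symptoms); toy counts.
counts = {
    ("M", 2, "15213", True): 12,
    ("F", 5, "15146", True): 3,
    ("F", 5, "15146", False): 400,
}
for score, key in rank_classes(counts):
    print(f"{score:8.2f}  {key}")
```

A large class with a weak per-person signal can outrank a small class with a strong one, which is why the count is part of the score.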

Future Work
More sophisticated person models
Improved explanation capabilities
Validation of data fusion model
More disease models apart from anthrax
Contagious disease models
Combining outputs from multiple Bayesian detectors

Thank You!
RODS Laboratory:
Bayesian Biosurveillance:
This research was supported by grants IIS from the National Science Foundation, F from the Department of Homeland Security, and ME from the Pennsylvania Department of Health.