
1 Population-Wide Anomaly Detection
Weng-Keen Wong (1), Gregory Cooper (2), Denver Dash (3), John Levander (2), John Dowling (2), Bill Hogan (2), Michael Wagner (2)
(1) School of Electrical Engineering and Computer Science, Oregon State University
(2) Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh
(3) Intel Research, Santa Clara

2 Motivation
Suppose you monitor Emergency Department (ED) data that arrives in real time. Can you specifically detect a large-scale anthrax attack?
Date / Time Admitted | Age | Gender | Home Zip | Chief Complaint
Aug 1, 2005 3:02 | 20-30 | Male | 15213 | Shortness of breath
Aug 1, 2005 3:07 | 40-50 | Male | 15146 | Diarrhea
Aug 1, 2004 3:09 | 70-80 | Female | 15132 | Fever
: | : | : | : | :

3 Model non-outbreak conditions and notice deviations
- Traditional univariate methods, e.g. control chart, CUSUM, EWMA, time series models
- Spatial methods, e.g. Spatial Scan Statistic
- Multivariate methods, e.g. WSARE
2. Sat 2001-03-13: SCORE = -0.00000464 PVALUE = 0.00000000
12.42% ( 58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True
6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True

4 Model non-outbreak conditions and notice deviations
- Traditional univariate methods, e.g. control chart, CUSUM, EWMA, time series models
- Spatial methods, e.g. Spatial Scan Statistic
- Multivariate methods, e.g. WSARE
2. Sat 2001-03-13: SCORE = -0.00000464 PVALUE = 0.00000000
12.42% ( 58/467) of today's cases have 20 ≤ Age < 30 AND Respiratory Syndrome = True
6.53% (653/10000) of baseline have 20 ≤ Age < 30 AND Respiratory Syndrome = True
These are non-specific methods: they look for anything unusual in the data, but not specifically for the onset of an anthrax attack.
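To make the rule output on this slide concrete, here is a minimal Python sketch that recomputes the two percentages and then asks how surprising today's count would be if today and the baseline shared one underlying rate. The one-sided hypergeometric tail (a Fisher-style exact test) is an illustrative choice; it is not claimed to reproduce the SCORE or PVALUE printed above, and the function name is hypothetical.

```python
from math import comb

def upper_tail_hypergeom(k_today, n_today, k_base, n_base):
    """Probability of seeing at least k_today matching cases among today's
    n_today cases, if today's and baseline cases share the same match rate."""
    total = n_today + n_base        # all records, today plus baseline
    matches = k_today + k_base      # all records that satisfy the rule
    denom = comb(total, n_today)
    upper = min(n_today, matches)
    tail = sum(
        comb(matches, k) * comb(total - matches, n_today - k)
        for k in range(k_today, upper + 1)
    )
    return tail / denom             # exact integer ratio converted to float

# Counts taken from the rule shown on the slide.
today_rate = 58 / 467
baseline_rate = 653 / 10000
p_value = upper_tail_hypergeom(58, 467, 653, 10000)
print(f"today {today_rate:.2%}, baseline {baseline_rate:.2%}, tail p = {p_value:.3g}")
```

A test like this flags any rule whose frequency shifts, which is exactly why such detectors are non-specific: nothing in the computation refers to anthrax.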

5 Population-wide ANomaly Detection and Assessment (PANDA)
- A detector specifically for a large-scale outdoor release of inhalational anthrax
- Uses a massive causal Bayesian network
- Population-wide approach: each person in the population is represented as a subnetwork in the overall model

6 Population-Wide Approach
- Note the conditional independence assumptions
- Anthrax is infectious but non-contagious
[Figure: network with the nodes Anthrax Release, Location of Release, and Time of Release, labeled as global nodes and interface nodes, linked to a Person Model subnetwork for each person in the population]

7 Population-Wide Approach
- Structure designed by expert judgment
- Parameters obtained from census data, training data, and expert assessments informed by literature and experience
[Figure: network with the nodes Anthrax Release, Location of Release, and Time of Release, labeled as global nodes and interface nodes, linked to a Person Model subnetwork for each person in the population]
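The conditional independence assumption is what makes a population-sized network tractable: once the release nodes are fixed, each person's evidence contributes an independent factor. The Python sketch below illustrates this under a strong simplification, namely that the release location and time are collapsed into a single binary "release / no release" hypothesis. All probabilities and names are hypothetical and are not PANDA's parameters.

```python
import math

def posterior_release(prior, lik_given_release, lik_given_no_release):
    """P(release | evidence) when people are conditionally independent
    given the release hypothesis.

    lik_given_release[i]    = P(evidence for person i | release)
    lik_given_no_release[i] = P(evidence for person i | no release)
    """
    # Work in log space so products over a large population do not underflow.
    log_r = math.log(prior) + sum(math.log(p) for p in lik_given_release)
    log_n = math.log(1 - prior) + sum(math.log(p) for p in lik_given_no_release)
    m = max(log_r, log_n)
    w_r, w_n = math.exp(log_r - m), math.exp(log_n - m)
    return w_r / (w_r + w_n)

# Hypothetical evidence: two people admitted with respiratory complaints,
# two people not admitted at all.
prior = 0.001
lik_release = [0.02, 0.02, 0.90, 0.90]
lik_no_release = [0.01, 0.01, 0.95, 0.95]
post = posterior_release(prior, lik_release, lik_no_release)
print(f"posterior probability of a release: {post:.5f}")
```

In a fuller model the same product would be evaluated for every candidate release location and time and then summed, rather than for a single binary hypothesis.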

8 Person Model (Initial Prototype)
[Figure: the Anthrax Release, Location of Release, and Time of Release nodes with two person-model subnetworks. Each subnetwork contains the nodes Anthrax Infection, Home Zip, Respiratory from Anthrax, Other ED Disease, Gender, Age Decile, Respiratory CC from Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission.]

9 Person Model (Initial Prototype)
[Figure: the Anthrax Release, Location of Release, and Time of Release nodes with two person-model subnetworks. Each subnetwork contains the nodes Anthrax Infection, Home Zip, Respiratory from Anthrax, Other ED Disease, Gender, Age Decile, Respiratory CC from Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission. Example evidence values shown on the nodes: Yesterday, never, False, 15213, 20-30, Female, Unknown, 15146, 50-60, Male.]

10 Prototype is Computationally Feasible
Aside from caching tricks, there are two main optimizations:
- Incremental updating
- Equivalence classes
Performance: on a P4 3.0 GHz machine with 2 GB RAM, 45 seconds of initialization time and 3 seconds for each hour's worth of ED data.
See Cooper G.F., Dash D.H., Levander J.D., Wong W-K, Hogan W.R., Wagner M.M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on UAI. Banff, Canada: AUAI Press; 2004. pp. 94-103.
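The two optimizations can be sketched in Python as follows. People with identical attribute values form one equivalence class, so a likelihood is evaluated once per class instead of once per person; when a new ED record arrives, one person moves between classes and only two terms change. This is a sketch of the general idea under assumed data structures, not the prototype's implementation; the class name, the attribute tuple, and the likelihood values are all hypothetical.

```python
import math
from collections import Counter

def person_likelihood(attrs):
    """Hypothetical P(evidence | hypothesis) for one person."""
    age_decile, gender, home_zip, admitted_with_resp = attrs
    return 0.02 if admitted_with_resp else 0.98

class EquivalenceClassModel:
    def __init__(self, people, likelihood):
        self.likelihood = likelihood
        self.counts = Counter(people)   # equivalence class -> number of people
        # One likelihood evaluation per class, weighted by class size.
        self.log_lik = sum(
            n * math.log(likelihood(c)) for c, n in self.counts.items()
        )

    def move(self, old, new):
        """Incremental update: one person leaves class `old` for class `new`."""
        if self.counts[old] <= 0:
            raise ValueError("no person left in the old class")
        self.counts[old] -= 1
        self.counts[new] += 1
        self.log_lik += math.log(self.likelihood(new)) - math.log(self.likelihood(old))

population = (
    [("20-30", "M", "15213", False)] * 1000
    + [("50-60", "F", "15146", False)] * 500
)
model = EquivalenceClassModel(population, person_likelihood)
# A 20-30 year old male from 15213 is admitted with a respiratory complaint.
model.move(("20-30", "M", "15213", False), ("20-30", "M", "15213", True))
print(len(model.counts), "classes, log-likelihood", round(model.log_lik, 3))
```

Here 1,500 people collapse into three classes, and the admission costs two logarithms rather than a full pass over the population.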

11 What do you gain with a population-wide approach?
Coherent framework for:
1. Incorporating background knowledge
2. Incorporating different types of evidence
3. Data fusion
4. Explanation

12 1. Incorporating Background Knowledge
Limited data from actual anthrax attacks available:
- Postal attacks 2001 (only 11 people affected, not representative of a large-scale attack)
- Sverdlovsk 1979
But the literature contains studies on the characteristics of inhalational anthrax.

13 1. Incorporating Background Knowledge
Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:
- Progression of symptoms
- Incubation period
- Spatial dispersion pattern

14 1. Incorporating Background Knowledge
Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:
- Progression of symptoms
- Incubation period
- Spatial dispersion pattern
At an individual level

15 1. Incorporating Background Knowledge
Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax:
- Progression of symptoms
- Incubation period
- Spatial dispersion pattern
Can represent this by the effects over individuals

16 2. Incorporating Evidence
- Easily incorporate different types of evidence, e.g. spatial, temporal, demographic, symptomatic
- Easily incorporate new evidence that distinguishes an individual (or individuals) from others
  - Modify the appropriate person model

17 3. Data Fusion
Two data sources: ED data (the table below) and OTC (over-the-counter) data.
Date / Time Admitted | Age | Gender | Home Zip | Chief Complaint
Aug 1, 2005 3:02 | 20-30 | Male | 15213 | Shortness of breath
Aug 1, 2005 3:07 | 40-50 | Male | 15146 | Diarrhea
Aug 1, 2004 3:09 | 70-80 | Female | 15132 | Fever
: | : | : | : | :
No data from an actual anthrax attack is available that captures the correlation between these two data sources. By modeling the actions of individuals and incorporating background knowledge, we can come up with a plausible model of the effects of an attack on these two data sources.

18 3. Data Fusion
Two data sources: ED data (the table below) and OTC data.
Date / Time Admitted | Age | Gender | Home Zip | Chief Complaint
Aug 1, 2005 3:02 | 20-30 | Male | 15213 | Shortness of breath
Aug 1, 2005 3:07 | 40-50 | Male | 15146 | Diarrhea
Aug 1, 2004 3:09 | 70-80 | Female | 15132 | Fever
: | : | : | : | :
- OTC data: aggregated over zip code and available daily
- ED data: individual patient records, usually available in real time

19 3. Data Fusion
Two data sources: ED data (the table below) and OTC data.
Date / Time Admitted | Age | Gender | Home Zip | Chief Complaint
Aug 1, 2005 3:02 | 20-30 | Male | 15213 | Shortness of breath
Aug 1, 2005 3:07 | 40-50 | Male | 15146 | Diarrhea
Aug 1, 2004 3:09 | 70-80 | Female | 15132 | Fever
: | : | : | : | :
By representing the data at the finest granularity (i.e. each individual), we can easily deal with different spatial and temporal granularities in data fusion.
See Wong W-K, Cooper G.F., Dash D.H., Dowling J.N., Levander J.D., Hogan W.R., Wagner M.M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.
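One way to see why individual-level modeling handles mixed granularity: if each person has a modeled probability of an OTC purchase on a given day, summing those probabilities over a zip code gives the expected zip-level daily count, which can be compared with the aggregated OTC feed, while the same per-person model is matched directly against individual ED records. The Python sketch below shows only that aggregation step; the purchase probabilities, the `infected` flag, and the function names are hypothetical.

```python
from collections import defaultdict

def p_buy(person):
    """Hypothetical daily probability that this person buys an OTC product."""
    return 0.05 if person["infected"] else 0.01

def expected_otc_by_zip(people, purchase_prob):
    """Roll per-person purchase probabilities up to zip-level expected counts."""
    totals = defaultdict(float)
    for person in people:
        totals[person["home_zip"]] += purchase_prob(person)
    return dict(totals)

people = [
    {"home_zip": "15213", "infected": True},
    {"home_zip": "15213", "infected": False},
    {"home_zip": "15213", "infected": False},
    {"home_zip": "15146", "infected": False},
    {"home_zip": "15146", "infected": False},
]
expected = expected_otc_by_zip(people, p_buy)
print(expected)
```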

20 4. Explanation
- Important to know why the model believes an anthrax attack is occurring
- Can find the subset of evidence E* that most influences such a belief
- In PANDA, E* would correspond to a group of individuals
- Identify the individuals that most contribute to the hypothesis of an attack

21 4. Explanation
- Can also use the Bayesian network to calculate the most likely location of release and time of release
- Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring
Gender | Age Decile | Home Zip | Respiratory Symptoms | Date Admitted
M | 20-30 | 15213 | True | 2 days ago
F | 20-30 | 15213 | True | 2 days ago
M | 30-40 | 15213 | True | 2 days ago
F | 40-50 | 15213 | True | 2 days ago
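A simple way to rank equivalence classes by how much they support the attack hypothesis is to score each class by its size times the log-likelihood ratio of its evidence under "attack" versus "no attack". The Python sketch below assumes that scoring rule for illustration; the slides do not state how PANDA measures contribution, and the counts, likelihoods, and third class are hypothetical.

```python
import math

def rank_classes(class_counts, lik_attack, lik_no_attack, top=3):
    """Return the `top` classes with the largest total log-likelihood ratio."""
    scores = {
        c: n * (math.log(lik_attack(c)) - math.log(lik_no_attack(c)))
        for c, n in class_counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Class key: (gender, age decile, home zip, admitted with respiratory symptoms)
class_counts = {
    ("M", "20-30", "15213", True): 6,
    ("F", "20-30", "15213", True): 4,
    ("M", "30-40", "15146", False): 900,
}

def lik_attack(c):
    return 0.03 if c[3] else 0.97      # hypothetical values

def lik_no_attack(c):
    return 0.01 if c[3] else 0.99      # hypothetical values

ranked = rank_classes(class_counts, lik_attack, lik_no_attack)
for cls, score in ranked:
    print(cls, round(score, 2))
```

Classes of admitted patients with respiratory symptoms rise to the top, while the large class of people who were not admitted has a negative score and counts as evidence against an attack.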

22 Future Work
- More sophisticated person models
- Improved explanation capabilities
- Validation of data fusion model
- More disease models apart from anthrax
- Contagious disease models
- Combining outputs from multiple Bayesian detectors

23 Thank You!
RODS Laboratory: http://rods.health.pitt.edu
Bayesian Biosurveillance: http://www.cbmi.pitt.edu/panda/
This research was supported by grants IIS-0325581 from the National Science Foundation, F30602-01-2-0550 from the Department of Homeland Security, and ME-01-737 from the Pennsylvania Department of Health.

