Spatiotemporal Cluster Detection in ESSENCE Biosurveillance Systems Panelist: Howard Burkom National Security Technology Department, Johns Hopkins University.

Presentation transcript:

Spatiotemporal Cluster Detection in ESSENCE Biosurveillance Systems
Panelist: Howard Burkom, National Security Technology Department, Johns Hopkins University Applied Physics Laboratory
DIMACS Working Group Workshop on Analytical Methods for Surveillance of Multidimensional Data Streams
Rutgers University, Piscataway, NJ, February 19, 2004

Problem/Data Context of ESSENCE Surveillance Systems
Data sources: Physician Office Visits, Absentee Rates, Sales of OTC Remedies, Hosp ER Admissions
Processing: Normalization, Analysis, Fusion
Outputs: Counts/Clusters of Statistical Significance, Epidemiological Significance
Questions: Who? What? Where? When?

Applying Statistical Process Control to Multiple Data Streams
Multiplicity from intertwined effects: multiple data sources, regions, strata (syndrome groups, product groups)
Multiple univariate methods
– Critical issue: use individual detector outputs without getting overwhelmed by multiple testing
– Low power for anomalies spread over inputs
Multivariate methods
– Critical issue: need modifications to reduce alerts due to irrelevant changes in data relationships
– Need to retain power in individual source data

Significance Assessment: Multiple Univariate Alerting Algorithms
Bonferroni bound: replace $\alpha$ by $\alpha/N$
– Alert based on individual outputs (conservative)
Edgington's "consensus" method (1972)
– Combined probability from an algebraic combination of N individual p-values
– Z-score approximation: $(\mathrm{mean}(p\text{-values}) - 0.5) / (0.2887 / \sqrt{N})$, where $0.2887 \approx 1/\sqrt{12}$ is the standard deviation of a Uniform(0, 1) p-value
Bayes Belief Net
– Originated as an effort to add sensor data, intelligence info, …
– Recently applied to separate algorithm outputs
– Can weight each type of information based on training data and/or intuition
– Configurable to soften thresholds for evidence accrual
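The contrast between the Bonferroni bound and Edgington's consensus method can be sketched as follows. This is a minimal illustration, not the ESSENCE implementation: the function names and the example p-values are hypothetical, and the p-values are assumed to be independent and Uniform(0, 1) under the null, so that their mean has expectation 0.5 and standard deviation 1/sqrt(12 N).

```python
import math


def bonferroni_alert(p_values, alpha=0.05):
    """Alert if any single p-value falls below alpha / N (conservative)."""
    n = len(p_values)
    return any(p < alpha / n for p in p_values)


def edgington_z(p_values):
    """Z-score approximation for the mean of N independent p-values.

    Assumes each p-value is Uniform(0, 1) under the null hypothesis,
    so the mean has expectation 0.5 and std. dev. 1 / sqrt(12 * N).
    """
    n = len(p_values)
    mean_p = sum(p_values) / n
    return (mean_p - 0.5) / (1.0 / math.sqrt(12.0 * n))


def edgington_combined_p(p_values):
    """Combined p-value: lower tail of the standard normal at the z-score."""
    z = edgington_z(p_values)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical outputs of three detectors, each only mildly unusual.
example_p_values = [0.04, 0.06, 0.05]
```

With these hypothetical inputs, no single p-value clears the Bonferroni threshold of 0.05/3, while the combined value does fall below 0.05. That is the "low power for anomalies spread over inputs" issue the multiple-testing correction raises.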

Multivariate Alerting Strategies
Variants of Hotelling's $T^2$
– $\mu$ = vector mean estimated from current baseline
– $S$ = estimate of covariance matrix calculated from baseline
– $X$ = multivariate (filtered?) data from test interval
– $T^2$ statistic: $T^2 = (X - \mu)^\top S^{-1} (X - \mu)$ (Ye et al., 2002)
– "Neighbor-regression" preconditioning strategy of Hawkins; removal of covariance effects
MEWMA (Lowry), MCUSUM (Crosier, Pignatiello/Runger)
– Numerous strategies, adaptations to Poisson data
– But which is appropriate for multivariate syndromic data streams? Can EWMA/Shewhart (or CUSUM/Shewhart) encompass both a point-source "bioweapon" epicurve and a seasonal endemic => epidemic outbreak?
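A minimal sketch of the basic Hotelling statistic, assuming NumPy is available. It estimates the mean vector and covariance matrix from a baseline window and scores one test-interval observation. The function name and array layout (rows = days, columns = data streams) are assumptions for illustration, and none of the preconditioning variants mentioned on the slide are included.

```python
import numpy as np


def hotelling_t2(baseline, x):
    """Return (x - mu)' S^{-1} (x - mu) for one test observation.

    baseline: array of shape (n_days, n_streams), the current baseline window
    x:        array of shape (n_streams,), data from the test interval
    """
    baseline = np.asarray(baseline, dtype=float)
    x = np.asarray(x, dtype=float)
    mu = baseline.mean(axis=0)                          # mean vector estimate
    s = np.atleast_2d(np.cov(baseline, rowvar=False))   # sample covariance
    diff = x - mu
    # Solve S y = diff rather than forming the inverse explicitly.
    return float(diff @ np.linalg.solve(s, diff))
```

Solving the linear system instead of inverting the covariance estimate is a numerical-stability choice; the baseline must contain more days than streams for the estimate to be invertible.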

Detection Challenge: faint rise in all 3 data sets
[Figure: Respiratory Syndrome Data Counts; series: Military Dx, Military Rx, Civilian Dx]

Detection Challenge: faint rise in all 3 data sets
Lowry's MEWMA: Day 4 alert at each FA rate
[Figure: Respiratory Syndrome Data Counts]
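For readers unfamiliar with MEWMA, the following sketch shows the standard form of the statistic: an exponentially weighted average of the deviation vectors, scored against its exact time-dependent covariance. It assumes NumPy and a known in-control mean and covariance. The smoothing weight `lam` and the function name are illustrative assumptions, not the settings used to produce the Day 4 alert above.

```python
import numpy as np


def mewma_statistics(x, mu, sigma, lam=0.2):
    """Return the MEWMA statistic for each row of x.

    x:     array (n_days, n_streams) of observations
    mu:    in-control mean vector, shape (n_streams,)
    sigma: in-control covariance matrix, shape (n_streams, n_streams)
    lam:   smoothing weight in (0, 1]; lam = 1 gives a Shewhart-type chart
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = np.zeros_like(mu)
    stats = []
    for t, row in enumerate(x, start=1):
        # Exponentially weighted average of deviations from the mean.
        z = lam * (row - mu) + (1.0 - lam) * z
        # Exact covariance of z at time t is scale * sigma.
        scale = (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * t))
        stats.append(float(z @ np.linalg.solve(scale * sigma, z)))
    return np.array(stats)
```

An alert would be raised on the first day the statistic exceeds a threshold chosen to give the desired false-alarm rate. Because small shifts in every stream accumulate in `z`, this kind of chart is suited to a faint rise shared across data sets.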

Scan Statistics for Biosurveillance
Scarlet Fever Outbreak Study
Analysis of Claims Data in National Capital Area
ICD9 codes for scarlet fever: …
[Figure: cluster annotations: 10 cases, 5 days, p = …; … cases, 12 days, p = …; … cases, 7 days, p < …]

Surveillance combining outpatient visits, OTC anti-flu sales, school absenteeism

Practical Issues in Spatiotemporal Monitoring and Evaluation
Control needed for mismatched scales & variances among data sources
– To retain power in individual sources, gain combined sensitivity
Difficult to assess delays, relative scales of effects among separate sources, in both background & signal
– Simulation much harder to validate
If distance matrix is used, it should reflect proximity according to the epidemiological case definition:
– Modifications to reflect plausible demographic behaviors
The importance of significance testing grows with the number of sources, especially for subregions where expected counts are low
– More sources => more small spurious clusters

Finding Clusters with Multiple Data Sources
For candidate cluster $J_1$, the Kulldorff likelihood ratio is:
$LR(J_1) \equiv \left(\frac{O_1}{E_1}\right)^{O_1} \left(\frac{N - O_1}{N - E_1}\right)^{N - O_1}$
where $O_1$ = number of cases inside $J_1$, $E_1$ = expected number of cases inside $J_1$, $N$ = total case count
Extension by treating multiple sources as covariates: $O_1 = \sum_k O_{1k}$, $E_1 = \sum_k E_{1k}$, $N = \sum_k N_k$, for sources $k = 1, \dots, K$
– "adjusted method": problem of adding sources with mismatched scales, variances
Alternate multisource approach: "stratified" scan statistic $\sum_k \log LR(J_{1k})$, $k = 1, \dots, K$
– reduces chances for a noisy source to overwhelm others
– can cost power to detect faint signal spread over sources
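The two multisource combinations can be sketched for a single candidate cluster as below. This is an illustrative sketch under stated assumptions, not the ESSENCE code: E is taken to be the expected count inside the candidate cluster (as in Kulldorff's Poisson model), the log-likelihood ratio is set to zero when observed does not exceed expected (a common convention for detecting excesses that the slide does not state), and the function names and tuple layout are hypothetical.

```python
import math


def log_lr(observed, expected, total):
    """Log of (O/E)^O * ((N-O)/(N-E))^(N-O) for one candidate cluster.

    Returns 0.0 when observed <= expected (assumed convention: only
    excess counts inside the cluster are of interest).
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


def adjusted_log_lr(sources):
    """'Adjusted' method: pool O, E, N over sources, then score once.

    sources: iterable of (observed, expected, total) tuples, one per source.
    """
    sources = list(sources)
    o = sum(s[0] for s in sources)
    e = sum(s[1] for s in sources)
    n = sum(s[2] for s in sources)
    return log_lr(o, e, n)


def stratified_log_lr(sources):
    """'Stratified' method: sum the per-source log likelihood ratios."""
    return sum(log_lr(o, e, n) for o, e, n in sources)
```

Pooling lets a high-volume source dominate the sums, whereas the stratified form scores each source on its own scale before combining. A full scan would take the maximum of either score over all candidate clusters and assess significance by Monte Carlo replication, which is omitted here.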

FROC Performance Assessment: Adjusted vs Stratified Multisource Scan Statistics
[Figure axes: Prob. Random Background Significant Cluster; Prob. Signal-Based Significant Cluster]