Spatiotemporal Cluster Detection in ESSENCE Biosurveillance Systems. Panelist: Howard Burkom, National Security Technology Department, Johns Hopkins University.


1 Spatiotemporal Cluster Detection in ESSENCE Biosurveillance Systems
Panelist: Howard Burkom, National Security Technology Department, Johns Hopkins University Applied Physics Laboratory
DIMACS Working Group Workshop on Analytical Methods for Surveillance of Multidimensional Data Streams
Rutgers University, Piscataway NJ, February 19, 2004

2 Problem/Data Context of ESSENCE Surveillance Systems
[Diagram labels. Data sources: Physician Office Visits, Absentee Rates, Sales of OTC Remedies, Hospital ER Admissions. Processing: Normalization, Analysis, Fusion. Outputs: Counts/Clusters of Statistical Significance; Epidemiological Significance (Who? What? Where? When?)]

3 Applying Statistical Process Control to Multiple Data Streams
Multiplicity from intertwined effects: multiple data sources, regions, strata (syndrome groups, product groups)
Multiple univariate methods
–Critical issue: use individual detector outputs without getting overwhelmed by multiple testing
–Low power for anomalies spread over inputs
Multivariate methods
–Critical issue: need modifications to reduce alerts due to irrelevant changes in data relationships
–Need to retain power in individual source data

4 Significance Assessment: Multiple Univariate Alerting Algorithms
Bonferroni bound: replace $\alpha$ by $\alpha/N$
–Alert based on individual outputs (conservative)
Edgington's "consensus" method (1972)
–Combined prob. from alg. comb. of N individual p-values
–Z-score approximation: $Z = (\bar{p} - 0.5) / (0.2887 / \sqrt{N})$, where $\bar{p}$ is the mean of the N p-values
Bayes Belief Net
–Originated effort to add sensor data, intelligence info,…
–Recently applied to separate algorithm outputs
–Can weight each type of information based on training data and/or intuition
–Configurable to soften thresholds for evidence accrual
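The contrast between the Bonferroni bound and the consensus Z-score on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the ESSENCE implementation: the p-values, the 0.05 threshold, and the function names are assumptions made for the example. The constant 0.2887 is the standard deviation of a Uniform(0, 1) variable, sqrt(1/12).

```python
import math

def bonferroni_alerts(p_values, alpha=0.05):
    """Flag each detector whose p-value falls below alpha / N."""
    n = len(p_values)
    return [p < alpha / n for p in p_values]

def edgington_z(p_values):
    """Z-score approximation from the slide.

    Under the null hypothesis each p-value is Uniform(0, 1), with mean 0.5
    and standard deviation sqrt(1/12) ~= 0.2887, so the mean of N p-values
    has standard deviation 0.2887 / sqrt(N).
    """
    n = len(p_values)
    mean_p = sum(p_values) / n
    return (mean_p - 0.5) / (math.sqrt(1.0 / 12.0) / math.sqrt(n))

def normal_cdf(z):
    """Standard normal CDF, used to turn the Z-score into a combined p-value."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical p-values from four detectors, each only mildly unusual.
p_values = [0.04, 0.06, 0.03, 0.08]

bonf = bonferroni_alerts(p_values)   # per-detector decisions at alpha / 4
z = edgington_z(p_values)            # strongly negative when p-values are jointly small
combined_p = normal_cdf(z)           # lower-tail probability of the consensus score
```

With these made-up inputs no single detector clears the Bonferroni threshold of 0.0125, while the consensus score is far in the lower tail, which is the "anomaly spread over inputs" situation the previous slide describes.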

5 Multivariate Alerting Strategies
Variants of Hotelling's $T^2$
–$\mu$ = vector mean est. from current baseline
–$S$ = est. of covariance matrix calc. from baseline
–$X$ = multivariate (filtered?) data from test interval
–$T^2$ statistic: $T^2 = (X - \mu)^T S^{-1} (X - \mu)$ (Ye et al, 2002)
–"Neighbor-regression" preconditioning strategy of Hawkins; removal of covariance effects
MEWMA (Lowry), MCUSUM (Crosier, Pignatiello/Runger)
–Numerous strategies, adaptations to Poisson data
–But which is appropriate for multivariate syndromic data streams?
Can EWMA/Shewhart (or CUSUM/Shewhart) encompass both point-source "bioweapon" epicurve and seasonal endemic => epidemic outbreak?
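The basic $T^2$ statistic defined on this slide can be computed as follows. This is a plain-Python sketch under stated assumptions: the baseline counts are invented, the three columns are only loosely modeled on the three data streams shown later in the deck, and none of the variants (Hawkins preconditioning, MEWMA, MCUSUM) is implemented. The helper names are hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of the baseline observations (the slide's mu)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mu):
    """Sample covariance matrix of the baseline (the slide's S)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j]
    return [[c / (n - 1) for c in r] for r in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def hotelling_t2(x, mu, cov):
    """T^2 = (x - mu)^T S^{-1} (x - mu), using a linear solve instead of an explicit inverse."""
    dev = [xi - mi for xi, mi in zip(x, mu)]
    w = solve(cov, dev)
    return sum(d * wi for d, wi in zip(dev, w))

# Hypothetical baseline: daily counts for three data streams over eight days.
baseline = [
    [20, 15, 40], [22, 14, 43], [19, 16, 38], [21, 17, 41],
    [23, 15, 39], [20, 13, 42], [18, 16, 40], [21, 14, 37],
]
mu = mean_vector(baseline)
S = covariance_matrix(baseline, mu)

t2_quiet = hotelling_t2([21, 15, 40], mu, S)     # a day close to the baseline mean
t2_elevated = hotelling_t2([24, 18, 45], mu, S)  # a modest rise in all three streams
```

A modest simultaneous rise in all three streams produces a much larger $T^2$ than a day near the baseline mean, even though no single stream moves far on its own. An alert threshold would still have to be chosen separately.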

6 Detection Challenge: faint rise in all 3 data sets
[Plot: Respiratory Syndrome Data Counts. Series: Military Dx, Military Rx, Civilian Dx]

7 Detection Challenge: faint rise in all 3 data sets
Lowry's MEWMA: Day 4 alert at each false-alarm (FA) rate
[Plot: Respiratory Syndrome Data Counts]

8 Scan Statistics for Biosurveillance: Scarlet Fever Outbreak Study
Analysis of Claims Data in National Capital Area
ICD9 codes for scarlet fever: 034, 034.1
Cluster annotations from the figure:
–10 cases, 5 days, p = 0.013
–15 cases, 12 days, p = 0.002
–11 cases, 7 days, p < 0.001

9 Surveillance combining outpatient visits, OTC anti-flu sales, school absenteeism

10 Practical Issues in Spatiotemporal Monitoring and Evaluation
Control needed for mismatched scales & variances among data sources
–To retain power in individual sources, gain combined sensitivity
Difficult to assess delays, relative scales of effects among separate sources, in both background & signal
–Simulation much harder to validate
If distance matrix is used, it should reflect proximity according to the epidemiological case definition
–Modifications to reflect plausible demographic behaviors
The importance of significance testing grows with the number of sources, especially for subregions where expected counts are low
–More sources => more small spurious clusters

11 Finding Clusters with Multiple Data Sources
For candidate cluster $J_1$, the Kulldorff likelihood ratio is:
$LR(J_1) \equiv (O_1/E_1)^{O_1} \cdot ((N - O_1)/(N - E_1))^{N - O_1}$
where $O_1$ = number of observed cases inside $J_1$, $E_1$ = number of expected cases inside $J_1$, $N$ = total case count
Extension by treating multiple sources as covariates: $O_1 = \sum_k O_1^k$, $E_1 = \sum_k E_1^k$, $N = \sum_k N^k$, for sources k = 1,…,K
–"adjusted method": problem of adding sources with mismatched scales, variances
Alternate multisource approach: "stratified" scan statistic $\sum_k \log(LR(J_1^k))$, k = 1,…,K
–reduces chances for a noisy source to overwhelm others
–can cost power to detect faint signal spread over sources
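The difference between the two multisource statistics can be illustrated with a short Python sketch for a single candidate cluster. The counts below are invented, the function names are hypothetical, and one convention is assumed that the slide does not state: the log likelihood ratio is set to 0 when observed cases do not exceed expected cases, so only excesses are scored.

```python
import math

def log_lr(observed, expected, total):
    """Log of LR = (O/E)^O * ((N-O)/(N-E))^(N-O) for one candidate cluster.

    Assumption: returns 0.0 when observed <= expected, so that only
    excess counts inside the cluster contribute.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside

def adjusted_log_lr(sources):
    """'Adjusted' method: pool O, E and N across sources, then score once."""
    o = sum(s[0] for s in sources)
    e = sum(s[1] for s in sources)
    n = sum(s[2] for s in sources)
    return log_lr(o, e, n)

def stratified_log_lr(sources):
    """'Stratified' method: sum the per-source log likelihood ratios."""
    return sum(log_lr(o, e, n) for o, e, n in sources)

# Hypothetical (observed, expected, total) triples for one candidate cluster.
sources = [
    (12, 5.0, 200),       # small source with a clear excess
    (480, 500.0, 20000),  # large source, slightly below expectation
]

adjusted = adjusted_log_lr(sources)      # pooled: 492 observed vs 505 expected
stratified = stratified_log_lr(sources)  # keeps the small source's excess
```

In this made-up case the large source swamps the pooled counts and the adjusted statistic is 0, while the stratified statistic retains the small source's signal. The opposite trade-off, lost power for a faint signal spread evenly over sources, is not shown here.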

12 FROC Performance Assessment: Adjusted vs Stratified Multisource Scan Statistics
[Plot axes: Prob. Random Background Significant Cluster; Prob. Signal-Based Significant Cluster]

