

Project Mimic: Simulation for Syndromic Surveillance
Thomas Lotze, Applied Mathematics and Scientific Computation, University of Maryland
Galit Shmueli and Inbal Yahav, RH Smith School of Business, University of Maryland
with Howard Burkom and Sean Murphy, JHU Applied Physics Lab
This work was partially supported by NIH grant RFA-PH

Outline
- The Biosurveillance Problem
- Motivation: Reasons for simulation
- Simulation Methodology
  - Options/Generation
  - Mimicking a dataset
- Analysis
  - Is this a good mimic?
- Results

The Biosurveillance Problem

The Biosurveillance Problem, cont.
- Given time series (usually pre-diagnostic daily data)
- Detect disease outbreaks
- With few false alerts
- Early

Difficulties with Biosurveillance Data
- Teams work on different authentic datasets
  - Each team has its own private data
  - Results cannot be compared
  - Researchers with no data cannot join the effort
- Data are unlabeled
  - We don’t know exactly when there are outbreaks
  - This makes evaluating algorithm performance difficult
  - It also hinders comparison of different algorithms

Project Mimic
Q: What if there were a way to
  - generate pseudo-authentic data
  - similar in statistical structure to real data
  - AND insert simulated outbreak signatures into it?
A: We’d have new, labeled pseudo-real data!

Project Mimic: Dataset Mimicker
- “Mimics” statistical structure of background data
  - Levels of counts of different series
  - Day-of-week patterns
  - Seasonal patterns
  - Holidays
  - Within-series autocorrelation
  - Cross-series cross-correlation
- Extracts features from the authentic dataset
- Output: a dataset that “looks” like the real dataset

Set of 6 series from one city (figure: Original vs. Mimicked; series shown include Resp and GI)

3 series from one city, zoomed in

Mimic Methodology
- Our method(s):
  - Create random autocorrelated multivariate data
    - Normal or Poisson
    - Uses mean, standard deviation, reduced cross-correlation, and 1-day ACF from the original
  - Holiday factor
  - Seasonal factor
  - Day-of-week factor
  - Details at
- Mimicking implicitly uses a generative model
- What is the right model?
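The generation step above can be sketched roughly as follows. This is a minimal illustration, not the Mimicker itself (which the slides say is written in R): the function name `mimic_series`, the parameters `acf1` and `shared`, and the use of a single shared shock to induce cross-correlation are all assumptions made for the example. Holiday and seasonal factors are omitted for brevity.

```python
import math
import random

def mimic_series(n_days, means, sds, dow_factors, acf1=0.5, shared=0.3, seed=0):
    """Hypothetical sketch: generate len(means) daily count series with
    lag-1 autocorrelation, cross-series correlation, and a multiplicative
    day-of-week factor. Uses a Normal model rounded to non-negative counts."""
    rng = random.Random(seed)
    k = len(means)
    innov_scale = math.sqrt(1.0 - acf1 ** 2)  # keeps the AR(1) variance at 1
    state = [rng.gauss(0, 1) for _ in range(k)]
    rows = []
    for day in range(n_days):
        common = rng.gauss(0, 1)  # shock shared by all series (cross-correlation)
        row = []
        for j in range(k):
            # Unit-variance innovation: part shared, part series-specific.
            noise = (math.sqrt(shared) * common
                     + math.sqrt(1.0 - shared) * rng.gauss(0, 1))
            state[j] = acf1 * state[j] + innov_scale * noise
            value = (means[j] + sds[j] * state[j]) * dow_factors[day % 7]
            row.append(max(0, round(value)))
        rows.append(row)
    return rows

# Two series over four weeks, with lower counts on the last two days of each week.
data = mimic_series(28, [50.0, 20.0], [5.0, 3.0], [1.0] * 5 + [0.6, 0.5])
```

In practice the means, standard deviations, and day-of-week factors would be estimated from the authentic dataset rather than typed in by hand.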

Evaluating Mimics
- Test: could the original data have been generated from the mimicker?
- Compare different generative models
- If the model were simple, we could use AIC
- Instead, we use chi-squared tests
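For reference, the two criteria named above have the following standard textbook forms. The symbols are introduced here for illustration and do not appear on the slides: k is the number of model parameters, \hat{L} the maximized likelihood, and O_i and E_i the observed and expected counts in bin i.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\qquad
\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}
```

AIC needs an explicit likelihood, which is why it suits simple models; the chi-squared statistic needs only binned counts.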

Chi-squared Goodness-of-fit Tests
- By series
- By day of week
- Separate values into bins
- Chi-squared test on counts
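One way the binning and test statistic could look, as a sketch only: the helper names `bin_counts` and `chi_squared_statistic`, the choice of bin edges, and the decision to skip bins with zero expected count are assumptions, not details taken from the slides. Expected counts come from the mimic's bin proportions scaled to the size of the original sample.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values per bin; sorted edges e_0 < ... < e_m define m + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

def chi_squared_statistic(original, mimic, edges):
    """Pearson chi-squared statistic of the original's binned counts against
    expected counts derived from the mimic. Returns (statistic, degrees of
    freedom). Bins with zero expected count are skipped in this sketch."""
    observed = bin_counts(original, edges)
    mimic_counts = bin_counts(mimic, edges)
    scale = len(original) / len(mimic)
    stat, used_bins = 0.0, 0
    for obs, m in zip(observed, mimic_counts):
        expected = m * scale
        if expected == 0:
            continue
        stat += (obs - expected) ** 2 / expected
        used_bins += 1
    return stat, used_bins - 1

# Observed bins [3, 1] against expected [2, 2].
stat, dof = chi_squared_statistic([1, 1, 1, 10], [1, 1, 10, 10], edges=[5])
```

To test by series and by day of week, the same function would be called once per (series, weekday) subset of the data.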

Example of Disparity

Project Mimic: Outbreak signature simulator
- Generates multivariate outbreak-signatures
- Options:
  - Number of outbreak-signatures in series?
  - Magnitude of outbreak?
  - How many (and which) series will include outbreak-signatures?
  - Stochastic/fixed?
  - Include effects such as DOW, holidays, etc.? (like background data)
- Output: matrix of outbreak-signatures to be inserted in the background data
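A minimal sketch of such a simulator, exposing a few of the options listed above. The function name `outbreak_signatures`, its parameters, and the triangular signature shape are assumptions chosen for illustration; the slides do not specify the shape the real simulator uses.

```python
import random

def outbreak_signatures(n_days, n_series, start_days, affected, peak=10.0,
                        duration=7, stochastic=False, seed=0):
    """Hypothetical sketch: build an n_days x n_series matrix of extra counts.
    Each outbreak starts on a day in start_days, lasts `duration` days, and
    adds counts to the series indices listed in `affected`."""
    rng = random.Random(seed)
    matrix = [[0] * n_series for _ in range(n_days)]
    half = (duration - 1) / 2
    for start in start_days:
        for offset in range(duration):
            day = start + offset
            if day >= n_days:
                break
            # Triangular shape (an assumption): rises to `peak` mid-outbreak.
            shape = 1.0 - abs(offset - half) / (half + 1)
            for j in affected:
                extra = peak * shape
                if stochastic:
                    # Add noise with variance equal to the expected extra count.
                    extra = rng.gauss(extra, extra ** 0.5)
                matrix[day][j] += max(0, round(extra))
    return matrix

# One fixed outbreak starting on day 5, affecting series 0 and 2 of 3.
signatures = outbreak_signatures(30, 3, start_days=[5], affected=[0, 2])
```

Day-of-week or holiday effects could be applied to this matrix in the same multiplicative way as for background data.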

Outbreak labels

Project Mimic
- Combining the background matrix + outbreak-signature matrix yields labeled data
- Two final products
  - Mimicker: Data and outbreak-signature simulators (in freeware R)
    - Can be used by data owners to disseminate pseudo-data
    - Can be used by research teams to evaluate robustness of methods
  - Mimics: Datasets that mimic DARPA BioALIRT data
    - Benchmark datasets for comparison across groups
    - Can be used to optimize methods for improved detection
  - Available at
- Example: BioALIRT data on 3 series (Resp from civilian/military/prescriptions)
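The combination step can be illustrated as an element-wise sum that also records where outbreak counts were added. This is a sketch assuming both matrices are lists of rows with identical shape; the function name `insert_outbreaks` and the boolean label format are hypothetical.

```python
def insert_outbreaks(background, signatures):
    """Add an outbreak-signature matrix to a background matrix.
    Returns one (counts, labels) pair per day, where labels[j] is True
    if series j received outbreak counts on that day."""
    if len(background) != len(signatures):
        raise ValueError("matrices must cover the same number of days")
    labeled = []
    for bg_row, sig_row in zip(background, signatures):
        if len(bg_row) != len(sig_row):
            raise ValueError("matrices must have the same number of series")
        counts = [b + s for b, s in zip(bg_row, sig_row)]
        labels = [s > 0 for s in sig_row]
        labeled.append((counts, labels))
    return labeled

# Three days, two series; an outbreak adds 4 counts to series 0 on day 1.
labeled = insert_outbreaks([[50, 20], [48, 22], [51, 19]],
                           [[0, 0], [4, 0], [0, 0]])
```

Because the labels are known exactly, a detection algorithm run on the combined counts can be scored for missed outbreaks, false alerts, and delay.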

Mimicked data + outbreak-signature

Conclusions
- Mimic opens the door to:
  - new techniques
  - new researchers
- First data sets of their kind
  - Open methodology
  - Publicly available
  - Realistic