Statistical Issues and Challenges Associated with Rapid Detection of Bio-Terrorist Attacks SE Fienberg and G Shmueli (2005) Presented by Lisa Denogean.

1 Statistical Issues and Challenges Associated with Rapid Detection of Bio-Terrorist Attacks SE Fienberg and G Shmueli (2005) Presented by Lisa Denogean 11/17/2005

2 Detection Problems Traditionally used medical and public health data may take months to collect, obtain, and analyze –Need better system for collection, efficient detection and privacy protection Real-time collection often does not result in enough data, the signal is too weak for detection –Need to be able to collect and effectively analyze more data from different sources

3 Outline System and Data Requirements for Timely Detection Grocery Sales Data Example: Combining Data Across Sources Advantages and Disadvantages of Different Data Sources

4 Detection System and Data Requirements Types of Data Available Traditional data –ER visits, 911 calls, mortality records, veterinary reports, school or work absence records… Non-traditional –To detect known agent, e.g. anthrax –OTC medication sales, grocery (e.g. OJ and soup) sales Initial Data Requirements Frequently collected –Real-time, frequent non-traditional data, or improved traditional Fast transfer –Electronic recording and data conversion

5 Essential Data Features Early signature of the outbreak –Data allow detection of a disease signature a day or a week before the disease is apparent –OTC sales, website searches, bio-sensors Sufficient amounts of data –Lack of sufficient data leads to under-detection –Temporal or spatial aggregation can increase data volume, but could slow detection or dampen a signal Local, not regional or national data –Improves sensitivity and timeliness

6 Detection System Requirements Immediate analysis of incoming data –Resources for quick storage and efficient detection algorithms Immediate output –Output an operational decision-making conclusion in a user-friendly transferable format Flexibility –Almost or fully automated for different outbreak types Considerations –False alarm rate vs. speed of true detection –Expense of false alarms vs. risk of not detecting a true outbreak

7 Advances NYC syndromic surveillance system* –Track 911 calls, OTC sales, ER admissions, absenteeism (weekly false alarms) Real-time outbreak and disease surveillance (RODS) system –Real-time collection of ER visits in Western Pennsylvania (including retailer data) National Electronic Disease Surv. System –CDC initiative for electronic transfer of health information New sources (not yet available) –Track medical web searches, body tracking devices, biosensor data

8 Grocery Sales Example Inhalational Anthrax First stage –A few hours to a few days (assume within 3 days) –Nonspecific symptoms: fever, sweat, fatigue, cough, sore throat, nausea, headache –Similar to flu symptoms, except no runny nose –Rapid treatment improves survival Second stage –Develops rapidly –Extreme symptoms –At least 80% fatality rate within 2 – 48 hours

9 Sales Data Features Data electronically recorded in real-time Large amounts of data at rich levels of detail Processing time vs. level of detail considerations Aggregated level of daily sales for each item and hourly basket-level data Purchase data are localized, useful for detecting large-scale outbreaks in small areas OTC and grocery sales can show an early signature of symptoms of an outbreak Dependence between sales within neighboring periods of time due to fine time scale Smaller ratio between signal and noise

10 Statistical Detection System Framework Decide which items to monitor –Epidemiological and statistical analysis of information contained in different sales Model the no-outbreak sales baseline –Account for promotions, sales, season, etc that would add noise (clean data) Simulate an outbreak signature –Footprint of anthrax known in traditional data, consult with outside experts for new data Develop a roll-forward algorithm –Integrate previous data for detection in new data Test system for real and false alarms –Select threshold based on simulations

11 Data Nasal symptoms are unrelated to anthrax Focus on cough meds (daily) and tissues, OJ and soup (basket-level)

12 Baseline Data indicates seasonal effect in overall sales and includes flu cases Assume cough meds insensitive to promotions Smoothing methods applied Estimate baseline variability False alarms near holidays for all methods Simulation Epidemiologist opinions on how anthrax is manifested in cough medication sales Sales increase linearly over 3 day period
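The simulated footprint on this slide (sales increasing linearly over a 3-day period) can be sketched as below. This is only an illustration, not the authors' simulation code: the function name `inject_linear_footprint`, the multiplicative form of the increase, and the flat baseline of 100 units are assumptions made for the example. The peak factor 1.36 is taken from the evaluation result reported later in the deck.

```python
import numpy as np

def inject_linear_footprint(baseline, start, peak_factor, days=3):
    """Return a copy of `baseline` with a linearly increasing spike.

    Hypothetical helper: over `days` consecutive days starting at index
    `start`, sales are multiplied by factors that rise linearly and reach
    `peak_factor` on the last day. The multiplicative form is an assumption.
    """
    series = np.asarray(baseline, dtype=float).copy()
    # Factor on day k (k = 1..days): 1 + (peak_factor - 1) * k / days
    ramp = 1.0 + (peak_factor - 1.0) * np.arange(1, days + 1) / days
    end = min(start + days, len(series))  # truncate if the spike runs off the end
    series[start:end] *= ramp[: end - start]
    return series

# Illustrative flat baseline of 100 units per day (made-up data).
baseline = np.full(10, 100.0)
outbreak = inject_linear_footprint(baseline, start=5, peak_factor=1.36)
```

With these inputs the three affected days rise in equal steps up to 1.36 times the baseline, and all other days are unchanged.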

13 Detection System Formulation Detail from reference [12] Clean data –Preprocess: Account for store level sales –Filter/De-noise: Decompose series into cosine waves, retain those with large magnitudes Forecast via wavelet approach –Efficient and tractable for non-stationary series –Autoregressive moving average model not flexible to data type, user intervention required –Decompose series into resolutions of different frequencies –For each resolution, use autoregressive model for forecasting the next point
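The filter/de-noise step (decompose the series into cosine waves and retain those with large magnitudes) can be sketched roughly as follows. This is a stand-in, not the method of reference [12]: it uses NumPy's real FFT rather than whatever cosine transform the authors used, and the function name `denoise_top_k`, the choice of `k`, and the synthetic weekly-style series are all assumptions for illustration.

```python
import numpy as np

def denoise_top_k(series, k):
    """Keep only the k largest-magnitude frequency components of a series.

    Hypothetical helper using a real FFT as a stand-in for the cosine
    decomposition mentioned on the slide.
    """
    x = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(x)
    # Indices of the k coefficients with the largest magnitudes.
    keep = np.argsort(np.abs(coeffs))[-k:]
    filtered = np.zeros_like(coeffs)
    filtered[keep] = coeffs[keep]
    # Invert back to the time domain at the original length.
    return np.fft.irfft(filtered, n=len(x))

# Synthetic example (made-up data): a level of 100 plus one periodic
# component, observed with unit-variance noise.
rng = np.random.default_rng(0)
t = np.arange(64)
clean = 100.0 + 10.0 * np.cos(2 * np.pi * t / 8)
noisy = clean + rng.normal(0.0, 1.0, size=t.size)
denoised = denoise_top_k(noisy, k=2)  # keeps the mean and the period-8 wave
```

Keeping few components removes more noise but risks discarding real structure, so `k` would need tuning against the actual sales series.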

14 Detection System (cont.) Threshold for next-day forecasts –Control chart type argument to determine anthrax-related variability –Alarm if true sales more than 3 standard deviations above de-noised series prediction Basket-level (50 products, 200k-500k/week) –Method of association rules: Pairs and triplets –Threshold: Most unexpected combinations Evaluation –Simulate anthrax footprint as a 3-day, linearly increasing spike –Study different configurations of the system –If the footprint increases cough medication sales by a factor of 1.36 or more, 100% of footprints are detected –Outbreaks coinciding with holidays are problematic
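The next-day threshold rule on this slide (alarm when true sales exceed the prediction by more than 3 standard deviations) can be sketched as a simple control-chart check. The function name `three_sigma_alarms` and the numbers are hypothetical, and the sketch assumes the standard deviation `sigma` has already been estimated from outbreak-free baseline residuals.

```python
import numpy as np

def three_sigma_alarms(actual, predicted, sigma, n_sigma=3.0):
    """Return indices of days where actual sales exceed the forecast
    by more than `n_sigma` standard deviations (one-sided rule).

    `sigma` is assumed to come from outbreak-free baseline residuals.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    excess = actual - predicted
    return np.flatnonzero(excess > n_sigma * sigma).tolist()

# Made-up example: forecasts of 100 units per day, baseline sigma of 2.
predicted = [100, 100, 100, 100, 100, 100]
actual = [101, 99, 102, 98, 100, 136]
alarms = three_sigma_alarms(actual, predicted, sigma=2.0)
```

Only the last day exceeds the threshold of 106 units here, so it is the only day flagged. The rule is one-sided because only unusually high sales are of interest for outbreak detection.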


16 Combining Data Sources: Benefits and Challenges Data Linkage Linking data from multiple sources requires system-wide unique identifiers or variables for record linkage Linkage methods use match features or string distances –Need extensions that link multiple lists and allow for missing identifiers
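Linkage by string distance can be illustrated with a minimal sketch. It uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as one possible string-distance measure; the slide does not say which measure is used in practice. The function names, the 0.85 threshold, and the name lists are invented for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_records(list_a, list_b, threshold=0.85):
    """Link each name in list_a to its most similar name in list_b,
    keeping the pair only if the similarity reaches `threshold`.

    Hypothetical helper; the threshold value is an assumption.
    """
    links = []
    for a in list_a:
        best = max(list_b, key=lambda b: similarity(a, b), default=None)
        if best is not None and similarity(a, best) >= threshold:
            links.append((a, best))
    return links

# Made-up records with spelling variations across two sources.
source_a = ["Jon Smith", "Maria Garcia"]
source_b = ["John Smith", "Mara Garcia", "Wei Chen"]
links = link_records(source_a, source_b)
```

A pairwise matcher like this handles only two lists and a single field, which is why the slide calls for extensions that link multiple lists and tolerate missing identifiers.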

17 Approaches to Using Multiple Data Sources Independently and simultaneously monitor separate sources –Multiple testing inflates false alarm rate Track different series intensively but sequentially –Alarms trigger further data collection and analyses of other series (Univ. of Utah – flu) –Hierarchical signaling Multivariate modeling –Use merged records for individuals or families –Measurement error from record linkage –Privacy and confidentiality concerns
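The claim that multiple testing inflates the false alarm rate can be made concrete with a short calculation. The sketch assumes the monitored series are independent and share the same per-series false alarm probability, which is a simplification; the function names are hypothetical, and the Šidák-style correction is shown as one standard adjustment, not as something the slides prescribe.

```python
def combined_false_alarm_rate(alpha, k):
    """Probability of at least one false alarm across k independent
    series, each with per-series false alarm probability alpha."""
    return 1.0 - (1.0 - alpha) ** k

def per_series_alpha(target, k):
    """Per-series rate that keeps the combined rate at `target`
    (Sidak-style correction, valid under independence)."""
    return 1.0 - (1.0 - target) ** (1.0 / k)

# Ten series, each tuned to a 1% daily false alarm probability.
overall = combined_false_alarm_rate(0.01, 10)   # roughly 9.6% per day
adjusted = per_series_alpha(0.05, 10)           # per-series rate for 5% overall
```

Positively correlated series would inflate the combined rate less than this, so the independent case serves as a pessimistic reference point.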

18 Privacy and Confidentiality Issues Health Ins. Portability and Accountability Act (HIPAA) restrictions –Permits de-identified data for research –Medical and public health org. may be exempt Private commercial interests –Concern over information in grocery and OTC sales data Integrated data concerns –Linking across databases may pose more risks in exposing confidential information

19 Summary: Questions for Consideration What signals of an outbreak do non-traditional data carry, and how? How can we efficiently and accurately integrate and analyze data from multiple sources? How can we effectively temporally or spatially aggregate data? How can we use geographic detail to control excessive false alarms? Can merged files be useful for detection without allowing re-identification and linkage to source? Is a risk-utility trade-off tolerable? Can a trusted third-party update files in real-time, separately from the detection system?
