Early Statistical Detection of Bio-Terrorism Attacks by Tracking OTC Medication Sales Galit Shmueli Dept. of Statistics and CALD Carnegie Mellon University.


1 Early Statistical Detection of Bio-Terrorism Attacks by Tracking OTC Medication Sales Galit Shmueli Dept. of Statistics and CALD Carnegie Mellon University With Stephen Fienberg (Statistics) Anna Goldenberg & Rich Caruana (CS)

2 Overview Current bio-surveillance systems – Monitoring traditional data – Using simple SPC methods Early detection – Use of non-traditional data – Building a flexible, automated detection system – Evaluating the system Results and enhancements

3 Traditional Data Sources Public health sources – School absence records – Sentinel practices – Laboratory data Medical sources – Patient visits at urgent care, outpatient clinics, emergency rooms Speed of detection: weeks after the actual occurrence – Rate of data arrival

4 Why is detection slow? Data arrives late – Projects using electronic reporting systems: Influenza surveillance system (U of Utah) Tracking ICD9 codes (U of Pittsburgh) Future: increasing availability of electronic means for gathering surveillance data Data available on weekly or monthly scale Data are nation-wide Signature of outbreak in data is late!

5 Non-Traditional Data Data that indirectly measure symptoms – Over-the-counter medication and grocery sales – Web browsing at medical websites – Automatic body tracking devices Different levels of availability Regional, localized data Confidentiality issues

6 Manifestation of Flu in Traditional and Non-Traditional Data [Figure: x-axis in weeks; series labels: Lab Flu, WebMD, School, Cough & Cold, Throat, Resp, Viral, Death]

7 OTC Medication and Grocery Sales Benefits – Manifestation of outbreak is very early – Timeliness in collection and reporting (daily) – Extremely detailed (basket-level) Drawbacks – No info about epidemic manifestation in sales data – Requires knowledge about marketing efforts (sales, discounts) – If outbreak replicates sales patterns – hard to detect (Holidays are a big challenge) – Hard to model!

8 Prior Uses of Non-Traditional Data Diarrheal Disease Surveillance: data from 38 drug stores in NY (Mikol et al., 2000) Monitoring near-real-time satellite vegetation and climate data for predicting emerging Rift Valley Fever epidemics in East Africa (DoD and NASA, 2001)

9 Description of Our Data Daily sales of several OTC medication groups for 541 days, from Aug 8, '99 to Jan 31, '01 Concentrated on cough & cold medication (inhalational symptoms): – Cough medication – Tabs & Caps – Nasal medication

10 Hypothetical Scenario of an Inhalational Anthrax Attack Symptoms: almost all typical of flu! – fever – fatigue – cough – mild chest discomfort – but no runny nose (!) Death may occur within 24-36 hours

11 Sales of Four Sub-Categories

12 Overview Current bio-surveillance systems Non-traditional data The detection system An evaluation method Results and Conclusions Future work

13 The Detection System Take into account special features of OTC and grocery sales data – Time series – Seasonality – Weekday/Weekend effect – Stores closed on certain days – Influence of total sales patterns – Very noisy, non-stationary Create automated system

14 Layers of the Detection System 1) Preprocessing 2) De-noising 3) Forecasting next day sales 4) Creating a threshold 5) New day sales: real-time sales > threshold? YES: WARNING! Possible beginning of an epidemic/attack. NO: no warning
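The layering above can be sketched as a single function that takes each layer as an interchangeable callable. This is only an illustration of the flow: the function name `monitor` and its parameters are hypothetical, and the slides do not say how the layers are wired together in the actual system.

```python
def monitor(history, new_sales, preprocess, denoise, forecast, limit_from):
    """Run one pass of the layered check for a single new day.

    All four callables are hypothetical placeholders for the layers
    named on the slide; any concrete method can be plugged in.
    Returns True when the new day's sales exceed forecast + limit.
    """
    clean = denoise(preprocess(history))   # layers 1 and 2
    predicted = forecast(clean)            # layer 3: next-day forecast
    limit = limit_from(clean)              # layer 4: allowed excess
    return new_sales > predicted + limit   # layer 5: raise a warning?


# Toy usage with stand-in layers (not the methods from the talk).
alarm = monitor(
    [1.0, 2.0, 3.0], 6.0,
    preprocess=lambda h: h,
    denoise=lambda h: h,
    forecast=lambda h: h[-1],
    limit_from=lambda h: 2.0,
)
```

Passing the layers as arguments mirrors the point made later in the deck that the tools are interchangeable.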

15 Pre-Processing

16 De-Noising Target: obtain main features of data, reduce noise to improve predictability Selected method: Discrete Cosine Transform with horizontal filtering How much to de-noise? – Retain minimal coefficient set that Maximizes accuracy Optimizes predictability – Use cross-validation and MSE-based criteria
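A minimal sketch of DCT-based de-noising, assuming that "horizontal filtering" means keeping only the first (lowest-frequency) coefficients and zeroing the rest; the slides do not define the term, so this reading is an assumption. The function names and the `keep` parameter are hypothetical, and the transform is written out in plain Python rather than taken from a library.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats (O(n^2), fine for short series)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def denoise(x, keep):
    """Keep the first `keep` DCT coefficients, zero the rest, and invert."""
    c = dct(x)
    c = [v if k < keep else 0.0 for k, v in enumerate(c)]
    return idct(c)
```

In this sketch, `keep` plays the role of the "minimal coefficient set": it could be chosen by trying several values and comparing a cross-validated MSE, as the slide suggests.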

17 De-Noising: DCT with Horizontal Filtering [Figure: de-noised set 1 and de-noised set 2]

18 Forecasting Target: Predict next day sales Use pre-processed, de-noised data Problem: non-stationary (ARIMA doesn't work) Method: 1) decompose with wavelets 2) predict each wavelet resolution 3) sum to obtain overall prediction
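The three-step method can be sketched as follows. The slides name neither the wavelet nor the per-resolution predictor, so both choices here are assumptions: a causal Haar "à trous" (redundant) decomposition, which is additive by construction, and a one-step AR(1) fit without intercept at each resolution. All function names are hypothetical.

```python
def haar_a_trous(x, levels):
    """Additive decomposition: x[t] == smooth[t] + sum(d[t] for d in details).

    Each level averages the current smooth series with a copy of itself
    lagged by 2**j days; indices before the start are clamped to day 0.
    """
    smooth = list(x)
    details = []
    for j in range(levels):
        lag = 2 ** j
        coarser = [
            0.5 * (smooth[t] + smooth[max(t - lag, 0)])
            for t in range(len(smooth))
        ]
        details.append([s - c for s, c in zip(smooth, coarser)])
        smooth = coarser
    return details, smooth


def ar1_forecast(y):
    """One-step AR(1) forecast without intercept, fitted by least squares."""
    num = sum(a * b for a, b in zip(y[1:], y[:-1]))
    den = sum(b * b for b in y[:-1])
    phi = num / den if den > 0 else 0.0
    return phi * y[-1]


def forecast_next(x, levels=3):
    """Step 1: decompose. Step 2: predict each resolution. Step 3: sum."""
    details, smooth = haar_a_trous(x, levels)
    return ar1_forecast(smooth) + sum(ar1_forecast(d) for d in details)
```

Because the decomposition only looks backward in time, the value for the most recent day does not depend on future data, which is what makes a next-day forecast from each resolution meaningful.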

19 Prediction Using Wavelets

20 Threshold Selection: SPC Based on empirical distribution of residuals (real values – predictions), we fit a "3σ" limit
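One way to read the "3σ" limit is as an upper control limit on the forecast residuals: their mean plus three sample standard deviations. The sketch below follows that reading; the slides do not say exactly how the limit is fitted to the empirical distribution, and the function names and the parameter `k` are hypothetical.

```python
import statistics


def upper_limit(actuals, predictions, k=3.0):
    """Mean of the residuals plus k sample standard deviations."""
    residuals = [a - p for a, p in zip(actuals, predictions)]
    return statistics.mean(residuals) + k * statistics.stdev(residuals)


def is_alarm(new_sales, forecast, limit):
    """True when the new day's residual exceeds the limit."""
    return new_sales - forecast > limit


# Toy numbers, not data from the talk.
limit = upper_limit([10, 12, 11, 13], [9, 13, 11, 12])
```

Only the upper limit is used here because the scenario of interest is an unexpected rise in sales.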

21 Comparing Next-Day Sales to the Threshold

22 Overview Current bio-surveillance systems Non-traditional data The detection system An evaluation method Results and Conclusions Ongoing work (basket-level data) Future work

23 Evaluating the System How fast does it detect an anthrax footprint? Problems: – Data do not include an outbreak signature – We don't know what the signature looks like in such data Solution: simulated signature [Figure: inhalational anthrax signature; x-axis: day (1, 2, 3); labels: spike, base]

24 Constructing the Signature Sverdlovsk outbreak, 1979 Based on data from Meselson et al., Science (1994)

25 Anthrax Signature in OTC Sales Add signature at each data point sequentially, and look at rate of detection Try different slopes, heights Compare different configurations of system for different signatures For slope = 1/3: detects 100% of spikes within 3 days for height = 1.3 × (data range)
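The evaluation loop can be sketched as below. The signature shape is an assumption (a linear ramp that reaches its full height after a fixed number of days), and the forecasts are held fixed while the signature is injected, whereas a full evaluation would presumably re-run the forecasting on the contaminated history. The function names and parameters are hypothetical.

```python
def ramp_signature(height, ramp_days):
    """Linear rise to `height` over `ramp_days` days (hypothetical shape)."""
    return [height * (d + 1) / ramp_days for d in range(ramp_days)]


def detection_rate(sales, forecasts, limit, signature):
    """Fraction of start days at which the injected signature raises an alarm
    on at least one of the days it covers."""
    n_sig = len(signature)
    starts = range(len(sales) - n_sig + 1)
    hits = 0
    for s in starts:
        detected = any(
            sales[s + d] + signature[d] - forecasts[s + d] > limit
            for d in range(n_sig)
        )
        if detected:
            hits += 1
    return hits / len(starts)
```

Running `detection_rate` over a grid of heights and ramp lengths gives the kind of comparison across signatures that the slide describes.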

26 Results and Conclusions The detection system – works with grocery data – detects simulated footprint quickly – has low false alarm rate The system is flexible (tools are interchangeable) Almost fully automated, efficient computation "Perfect bio-attack" is on holiday

27 Future Work Combine with traditional medical and public health data sources Aggregated data: Track several series simultaneously Basket data: Utilize other features of grocery data such as spatial factor, customer information

