Discovery of Temporal Patterns in Course-of-Disease Medical Data. Jorge C. G. Ramirez, Ph.D. Candidate. Lynn L. Peterson and Diane J. Cook, Supervising Professors.

1 Discovery of Temporal Patterns in Course-of-Disease Medical Data
Jorge C. G. Ramirez, Ph.D. Candidate
Lynn L. Peterson and Diane J. Cook, Supervising Professors

2 Overview
Objective
Contributions
Approach
TEMPADIS
Summary and Conclusions

3 Objective
Discover patterns that represent groups of patients that had a similar course of disease for a catastrophic or chronic illness
Motivation
- Medical
- AI

4 Contributions
Data Preprocessing
- Normalization
- Learning Missing Data
- Learning Implicit Knowledge
Exploratory Analysis
- Event Set Sequence Approach

5 Contributions
Domain Understanding
- New perspective on mass of data
- Identify groups of patients for further medical study

6 Approach
Example Events
- Laboratory Results
461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00

7 Approach
Example Events
- Visits
- Diagnoses
- Pharmacy
468 C CV
468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED
469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG

8 Approach
Event Set Sequences
- Events
  - Value Event: laboratory test result, visit
  - Duration Event: pharmacy, diagnosis
- Event Set is all Events that occur in a window of time
- Event Set Sequence is all Event Sets that occur over a long period of time
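
The definitions above can be illustrated with a minimal Python sketch. It groups the example event records into Event Sets by time window and returns them in order as an Event Set Sequence. The 14-day window, the anchoring of windows to the first event, and the names `Event`, `parse_event`, and `build_event_set_sequence` are assumptions made for illustration; the slides do not state the window length or how windows are aligned. The last record is hypothetical and only shows a second window.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    day: int     # day number, as in the example events
    kind: str    # L = laboratory, C = visit, D = diagnosis, P = pharmacy
    detail: str  # remainder of the record, left unparsed here


def parse_event(line: str) -> Event:
    """Split a record such as '461 L WBC 2.70' into day, kind, and detail."""
    day, kind, detail = line.split(maxsplit=2)
    return Event(int(day), kind, detail)


def build_event_set_sequence(events, window_days=14):
    """Group events into fixed-width windows (assumed width), in time order."""
    if not events:
        return []
    first_day = min(ev.day for ev in events)
    buckets = defaultdict(list)
    for ev in events:
        # Windows are anchored at the first event day (an assumption).
        buckets[(ev.day - first_day) // window_days].append(ev)
    return [buckets[k] for k in sorted(buckets)]


records = [
    "461 L WBC 2.70",
    "461 L HCT 40.10",
    "468 C CV",
    "469 P AZT 200 ZIDOVUDINE 100MG",
    "861 L WBC 1.10",  # hypothetical later record, not from the slides
]
sequence = build_event_set_sequence([parse_event(r) for r in records])
```

With these inputs the records from days 461 to 469 fall into one Event Set, and the later record forms a second one.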

9 Approach
Example Event Set
461 L WBC 2.70
461 L HCT 40.10
461 L PLT 239.00
461 L CD4% 19.00
461 L CD4A 188.00
468 C CV
468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED
469 P CTM 60 CO-TRIMOXAZOLE DS
469 P AZT 200 ZIDOVUDINE 100MG

10 Approach
Normalization
- Normal for each patient is different
- Especially when affected by a catastrophic or chronic illness
- Example: CD4A
  - General Population Normal: 416 - 1751
  - Well HIV-positive patient: 200 - 350
  - Severely immune-compromised patient: 0 - 50

11 Approach
Normalization (continued)
- Scale to -4…0…+4
  - 0 is normal
  - Each number represents a deviation from normal
  - 1 and 2 are noticeable but not severe
  - 3 is severe
  - 4 is very severe
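
A rough Python sketch of per-patient normalization onto the -4 to +4 scale follows. The slides do not give the actual mapping, so the rule used here is an assumption: 0 inside the patient's own normal range, then one step per half range-width of deviation, capped at 4. The function name `normalize` and the `step` parameter are hypothetical.

```python
def normalize(value, low, high, step=None):
    """Map a raw lab value to an integer in -4..+4 relative to a
    patient-specific normal range [low, high]; 0 means normal."""
    if high <= low:
        raise ValueError("normal range must satisfy low < high")
    if low <= value <= high:
        return 0
    if step is None:
        step = (high - low) / 2  # assumed size of one deviation step
    if value < low:
        return -min(4, 1 + int((low - value) // step))
    return min(4, 1 + int((value - high) // step))


# The CD4A value 188 from the example events, read against the three
# normal ranges listed on the Normalization slide.
general = normalize(188, 416, 1751)
well_hiv_positive = normalize(188, 200, 350)
severely_compromised = normalize(188, 0, 50)
```

Under these assumed steps, the same raw value is slightly low for a well HIV-positive patient but far above normal for a severely immune-compromised patient, which is the point of normalizing per patient.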

12 Approach
Replace Missing Data
- Diagnosis data very incomplete
- Learn severity of condition from pharmacy data
- Induce decision tree to classify conditions

13 Approach
Create Health Status Categories
1 = HIV-positive asymptomatic
2 = Asymptomatic, on anti-HIV therapy
3 = Immune-compromised, on prophylactic therapy
4 = Active illness
5 = Severe active illness

14 Approach
Learn Implicit Knowledge
- Need to augment explicit knowledge
- Recovery time is expert’s implicit knowledge
- Use neural network to learn recovery time function
  - 0 = Nothing to recover from
  - 1-4 = weeks to recover
  - 5 = 5 or more weeks to recover

15 Approach
Categorize Pharmacy Data
- A myriad of drugs prescribed
- Need to understand significance
- Categorize by use

16 Approach
Categories
- Nucleoside Analogs
- Protease Inhibitors
- Prophylaxis Therapies
- Intravenous antibiotics
- Anti-virals
- Anti-PCP/Toxoplasmosis
- Anti-mycobacterials

17 Approach
Categories (continued)
- Anti-wasting syndrome
- Anti-fungals
- Chemotherapies

18 Approach
Result: Understandable representation of patient data
861 C 1.1 26.1 167 0.0 0 16 0
862 0.0 0.0 0 0.0 0 0 2 24: 30 38: 50
867 H 4.3 19.2 144 0.0 0 11 3 0: 3 22: 1 35: 2
868 H 2.2 26.2 144 0.0 0 5 3 0: 3 22: 1 35: 2
869 0.0 0.0 0 0.0 0 0 1 35: 60
874 C 1.3 32.4 0 0.0 0 17 0
889 C 1.1 30.4 154 0.0 0 36 0
890 0.0 0.0 0 0.0 0 0 3 22: 30 38: 50 39:480
923 0.0 0.0 0 0.0 0 0 1 39:480
933 H 3.6 20.4 182 0.0 0 11 3 0: 2 22: 1 39: 12

19 Approach
Result: Understandable representation of patient data
861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0
867 H 4 4 0 -4 -1 -9 -9 -2 0 0 2 0 0 0 1 1 0 0
868 H 4 1 -2 -3 -1 -9 -9 -4 0 0 2 0 0 0 1 1 0 0
874 C 4 3 -4 -1 -9 -9 -9 0 0 0 2 0 0 0 1 1 0 0
889 C 4 2 -4 -2 -1 -9 -9 2 0 0 2 0 0 0 1 1 0 0
933 H 4 4 0 -4 0 -9 -9 -2 0 0 1 0 0 0 0 2 0 0

20 Approach
Result: Understandable representation of patient data
<
{ (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0)(LMPH -1)(onD 0010000000) }
{ (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1)(LMPH -2)(onD 0010001100) }
{ (EV H)(HS 4)(RT 1)(WBC -2)(HCT -3)(PLT -1)(LMPH -4)(onD 0010001100) }
{ (EV C)(HS 4)(RT 3)(WBC -4)(HCT -1)(onD 00010001100) }
{ (EV C)(HS 4)(RT 2)(WBC -4)(HCT -2)(PLT -1)(LMPH 2)(onD 0010001100) }
{ (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT 0)(LMPH -2)(onD 0010000100) }
>
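
Comparing the normalized numeric rows on slide 19 with the bracketed Event Sets on slide 20 suggests how one maps to the other. The Python sketch below encodes that reading and should be taken as an inference, not the author's documented procedure. It assumes the column order HS, RT, WBC, HCT, PLT, CD4P, CD4A, LMPH, that -9 marks a missing value, and that any nonzero drug-category count becomes a 1 in the `onD` string. The names `FIELDS`, `MISSING`, and `row_to_event_set` are hypothetical. The slide's entry for day 874 omits LMPH even though the row holds 0 there, which this sketch does not reproduce.

```python
# Assumed column order, inferred by aligning slide 19 rows with slide 20 sets.
FIELDS = ["HS", "RT", "WBC", "HCT", "PLT", "CD4P", "CD4A", "LMPH"]
MISSING = -9  # assumed sentinel for "no value recorded"


def row_to_event_set(row: str) -> str:
    """Render one normalized row in the bracketed Event Set notation."""
    tokens = row.split()
    event_type = tokens[1]  # tokens[0] is the day number
    values = [int(t) for t in tokens[2:2 + len(FIELDS)]]
    drug_counts = [int(t) for t in tokens[2 + len(FIELDS):]]

    parts = [f"(EV {event_type})"]
    for name, value in zip(FIELDS, values):
        if value != MISSING:
            parts.append(f"({name} {value})")
    # One flag per drug category: 1 if the patient is on any drug in it.
    on_drug = "".join("1" if count else "0" for count in drug_counts)
    parts.append(f"(onD {on_drug})")
    return "{ " + "".join(parts) + " }"


first = row_to_event_set("861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0")
last = row_to_event_set("933 H 4 4 0 -4 0 -9 -9 -2 0 0 1 0 0 0 0 2 0 0")
```

For the day 861 and day 933 rows this yields the same text as the first and last Event Sets shown on slide 20.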

21 Approach
Inexact Match
- Use set difference
  - Partial match, feature by feature
  - Assumes default partial match for missing data
- Use weakest-link/average-link
  - Require minimum degree of match
  - Require average degree of match
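
One way to picture the inexact match is the Python sketch below. It scores two Event Sets feature by feature, substitutes a default partial score when a feature is missing on either side, and accepts two sequences only if both the weakest and the average Event Set scores clear a threshold. The scoring formula, the default score of 0.5, both thresholds, the requirement that both criteria hold together, and all function names are assumptions; the slides name the ideas but give no formulas. Dividing every numeric difference by the width of the -4 to +4 scale is a simplification, since HS and RT use other ranges.

```python
DEFAULT_PARTIAL = 0.5  # assumed score when a feature is missing on one side
SCALE_SPAN = 8         # width of the -4..+4 normalized scale


def feature_match(a, b):
    """Degree of match for one feature, between 0.0 and 1.0."""
    if a is None or b is None:
        return DEFAULT_PARTIAL
    if isinstance(a, int) and isinstance(b, int):
        return 1.0 - min(abs(a - b), SCALE_SPAN) / SCALE_SPAN
    return 1.0 if a == b else 0.0


def event_set_match(x, y):
    """Average feature match over every feature seen in either Event Set."""
    keys = sorted(set(x) | set(y))
    return sum(feature_match(x.get(k), y.get(k)) for k in keys) / len(keys)


def sequences_match(p, q, min_link=0.6, avg_link=0.8):
    """Weakest-link and average-link test over aligned Event Sets."""
    if not p or len(p) != len(q):
        return False
    scores = [event_set_match(x, y) for x, y in zip(p, q)]
    return min(scores) >= min_link and sum(scores) / len(scores) >= avg_link


a = {"EV": "C", "HS": 3, "RT": 0, "WBC": 0}
b = {"EV": "C", "HS": 3, "RT": 0, "WBC": -1, "PLT": 1}
c = {"EV": "H", "HS": 5, "RT": 5, "WBC": 4}
close = sequences_match([a], [b])
far = sequences_match([a], [c])
```

Here `a` and `b` differ by one scale step in WBC and by one missing PLT value, so they still match; `a` and `c` differ on every feature and do not.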

22 TEMPADIS
Diagram elements:
- Raw Target Data
- Data Cleaning
- Data Normalization
- Normalized Database

23 TEMPADIS
Diagram elements:
- Normalized Database
- Decision Tree
- Neural Net
- Reduced, Knowledge-Added Data

24 TEMPADIS
Diagram elements:
- Knowledge-Added Database
- Sequence Builder
- Temporal Patterns

25 Results
Validation
- Results are temporal patterns that demonstrate that groups of patients had similar experience during the course of disease
- Only medical experts can assess validity of discovered patterns
- These results have been validated by the experts in the HIV Clinical Research Group

26 Results
Given a database of patients followed for 4 to 9 years
- Discovered interesting patterns
- Interestingness has multiple dimensions
  - Length
  - Data that appears in the patterns
  - Data that does not appear in the patterns

27 Results
Advanced patients, subject to various OIs
<
{ (EV C)(HS 3)(RT 0)(WBC 0)(HCT -1)(PLT 0)(LMPH -3)(onD 0000000000) }
{ (EV E)(HS 3)(RT 2)(WBC 3)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
{ (EV C)(HS 3)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -1)(LMPH 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 1)(WBC -1)(HCT -1)(PLT 1)(LMPH 2)(onD 1010000000) }
{ (EV E)(HS 3)(RT 1)(WBC 2)(HCT -1)(PLT 1)(LMPH 4)(onD 0000000000) }
{ (EV C)(HS 3)(RT 1)(WBC 1)(HCT 0)(PLT 0)(CD4P -3)(CD4A -2)(LMPH 0)(onD 1010000000) }
>

28 Advanced patients, fairly stable
<
{ (EV C)(HS 3)(RT 0)(WBC -1)(HCT -1)(PLT 1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
{ (EV C)(HS 3)(RT 0)(WBC 0)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 0)(onD 1010000000) }
{ (EV C)(HS 3)(RT 0)(WBC -2)(HCT 0)(PLT -1)(CD4P -4)(CD4A -4)(LMPH 0)(onD 0010000000) }
{ (EV C)(HS 4)(RT 1)(WBC 1)(HCT -4)(PLT 0)(CD4P -4)(CD4A -4)(LMPH -4)(onD 0011001000) }
{ (EV C)(HS 3)(RT 3)(onD 0010000000) }
{ (EV )(HS 3)(RT 1)(WBC 0)(HCT 0)(PLT 0)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 3)(RT 0)(CD4A -4)(onD 0010000000) }
>

29 Asymptomatic period
<
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 1)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV E)(HS 1)(RT 0)(WBC -1)(HCT 0)(PLT 1)(CD4P -1)(CD4A -2)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) }
{ (EV E)(HS 1)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P 0)(CD4A 0)(LMPH 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
{ (EV C)(HS 1)(RT 0)(onD 0000000000) }
>

30 Summary
Nine Steps of KDD
- Identify goal
- Identify target data set
- Data cleaning and preprocessing
- Data reduction and projection
- Identify data mining method

31 Summary
Nine Steps of KDD
- Exploratory Analysis
- Data Mining
- Interpretation of Mined Patterns
- Acting on Discovered Knowledge

32 Conclusions
Objective Met with Contributions
- Patterns discovered representing groups of patients with similar experience in course of disease
- This perspective on the data has not previously been produced
- This kind of computation on this kind of data has not previously been produced

33 Future Work
Improve discovery algorithm
- Backtracking is a barrier to overcome
Improve search control
Develop heuristic for measuring interestingness
Add ability to identify clinically identical/similar patterns

34 Future Work
Move database to new Intelligent Systems in Medicine and Biology Lab
Bring database up to date
Include more domain data in Event Sets
Explore impact of new developments in HIV treatment

