Presentation is loading. Please wait.

Presentation is loading. Please wait.

An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou

Similar presentations

Presentation on theme: "An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou"— Presentation transcript:

1 An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou RENNES - France

2 The application Cardiac arrhytmias P wave Normal QRS complex …ok, but how can a specific arrhythmia be automatically characterized ? Abnormal QRS complex Electrocardiograms Abnormal rhythm Normal rhythm M. Rabbit, you suffer from bigeminy, a severe cardiac arrhythmia M. Dog, you are ok !...

3 A problem definition close to supervised machine learning Discretized and labeled electrocardiograms p,Q,p,q,p,Q,p,Q p,q,p,q,p,q,p,q p,Q,p,p,Q,p,p,Q N..ok, which patterns are frequent in the sequence labeled but not frequent in sequences labeled ? P N p,Q,p,q,p,Q,p,Q p,q,p,q,p,q,p,q p,Q,p,p,Q,p,p,Q Temporal patterns representative of the sequence P P Frequent temporal patterns

4 Formalization of the problem The framework of inductive database (IDB) Sequences {L bigeminy } 2 P {L lbb, L mobitz, L normal } 2 N Temporal patterns … An IDB …ok, which temporal patterns C satisfy Qu expert (P,N,T,C) = (9 L 2 P, freq(C,L) ¸ T L ) Æ ( 8 L 2 N, freq(C,L) < T L ) ? Sequences {L bigeminy } 2 P {L lbb, L mobitz, L normal } 2 N Temporal patterns {C|freq(C, L bigeminy )¸ T 0 }, {C|freq(C, L lbbb )¸ T 1 }, {C|freq(C, L mobitz )¸ T 2 }, {C|freq(C, L normal )¸ T 3 }, {C| Qu expert (P,N,T,C) }

5 Plan Introduction Problem features Sequences Chronicles Inductive databases Order relation What is frequency ? Algorithms Frequent Minimal Chronicles Search (Fmc Search) Querying the IDB Experiments and problems to be solved Conclusion and future work

6 Long sequences of time-stamped events with few types Numerical temporal information of major importance An example of an event sequence: Features of sequences ABA B ABBAA B … Events time

7 Features of temporal patterns Chronicles Chronicle: a set of events temporally constrained May contain several events of the same type Specifies numerical temporal constraint between events: the uncertain delay represented by an interval [d min,d max ] d min,d max 2 Z Is easily readable by an expert of the application domain C, t 0 A, t 1 B, t 2 [5;10] [-2;20] C,1B,5A,8B,10C,15C,16A,26B,34A,27 Instances I C (L) Event sequence: an ordered list of time-stamped events C:

8 Inductive database Required definitions A query language that makes use of frequency constraints freq(C,L) ¸ T and freq(C,L) · T If a query on frequency satisfies monotonicity or anti-monotonicity properties then a search based on frequency is easier to compute An order relation on chronicles must be defined

9 An order relation on chronicles C is more general than C (C v C ), each event of C can be matched to an event of C each temporal constraint of C is more general than the corresponding constraint in C' A,t 2 B,t 3 B,t 4 [5;10] [9;20] A,t 0 B,t 1 [8;21] v C C

10 How to compute the frequency of a chronicle in a sequence ? B B AABB I C (L) L: ABB [2,3][-1,3] [1,5] C: BABBAB The cardinal of the set of All the instances ? Minimal occurences ? [Mannila,97] Earliest distinct instances ? [Dousson, 99] Distinct instances ?

11 Monotonicity and anti-monotonicity properties Constraints on frequency should satisfy monotonicity or anti-monotonicity properties A ABB A B [-2,2] A freq(C,L) ¸ freq(C,L) L: C: 2 instances 3 instances · Minimal occurences [Mannila, 97] don t have monotonicity and anti-monotonicity properties

12 Recognition criterion Let I C (L) be the set of instances of the chronicle C in the sequence L A recognition criterion selects a unique set E of instances from I C (L) The frequency of the chronicle C in the sequence L according to the recognition criterion Q is freq Q (C,L) = |E| A monotonic criterion is a recognition criterion Q such that C v C ) freq Q (C,L) ¸ freq Q (C,L)

13 Fmc Search Fmc Search: Frequent Minimal Chronicles Search freq Q (C,L) ¸ T, max win (C) · W Input: L: An event sequence T: A minimum frequency threshold Q:A recognition criterion (application dependent) W: A maximal time window Output: Fmc Q,W (L,T) Every chronicle from Fmc Q,W (L,T) satisfies 3 properties: is as specific as possible generalizes at least T instances… …that respect the recognition criterion Q Algorithm: Step 1Step 2Step 3 Step 4 x x x x x Fmc Q,W (L,T)

14 Fmc Search Step 1: Chronicle instance extraction The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W] Implemented in the software FACE (Frequency Analyser for Chronicle Extraction) x x x x x Fmc Q,W (L,T) The numerical temporal constraints of chronicles found by FACE are not specific enough

15 Fmc Search Step 2: Fuzzy clustering of instances A fuzzy clustering of each set I C (L) found at step 1 is performed B B AABB AB BB x x x x I C (L) An instance has a membership degree to each cluster x x x x x Fmc Q,W (L,T)

16 Fmc Search Step 3: Chronicle construction from clusters For each fuzzy cluster of step 2: Instances are sorted in the decreasing order of their membership degree The T first instances that respect the Q criterion are kept to construct a chronicle This chronicle is the lgg (least general generalization) of the selected instances x x x x x Fmc Q,W (L,T) The specificity of chronicles depends on the clustering

17 Fmc Search Step 4: Chronicle filtering - keep the most specific Compute the set of frequent minimal (maximally specific) chronicles Fmc Q,W (L,T) The most specific chronicles are retained Monotonicity property: A chronicle C that satisfies freq Q (C,L) ¸ T is more general than at least one chronicle of Fmc Q,W (L,T) x x x x x Fmc Q,W (L,T)

18 Querying the IDB A chronicle C satisfies this query iff: C is more general than at least one chronicle of Fmc Q,W (L P,T P ) monotonicity property C is not more general than every chronicle of Fmc Q,W (L N,T N ) anti-monotonicity property Version space ? T An adaptation of Mitchells algorithm computes this version space Remember my query: Qu expert (P,N,T,C). For the explanation P = {L P } and N = {L N } freq(C,L P )¸T P Æ freq(C,L N ) < T N

19 Experiments Characterization of cardiac arrhythmias Data: 4 sequences of cardiac events elaborated from electrocardiograms labeled by an expert containing ~4000 events of 3 types (P waves, normal QRS complexes, abnormal QRS complexes) A typical query : freq Q d (C,L bigeminy ) ¸ 5% Æ freq Q d (C,L normal ) · 10% Æ freq Q d (C,L mobitz ) · 10% Æ freq Q d (C,L lbbb ) · 10% Æ W = 3 s

20 Experiments An example of cardiac chronicle Characterizes bigeminy arrhythmia Also found by a supervised learning method (ILP) from ECGs [Carrault, 03] p = P waves q = normal QRS Q = abnormal QRS

21 Problems to be solved The step 3 of the Fmc-search has to cluster up to 180,000 instances per chronicle For a minimum threshold of 5%, up to 1000 chronicles can be extracted in one sequence This slows down Mitchell's algorithm dramatically Finding Fmc is an NP-complete problem The set Fmc Q,W (L N,T N ) is correct but not complete Results have to be filtered in order to give the correct solution ? T Optimal Fmc Q,W (L N,T N ) Practical Fmc Q,W (L N,T N )

22 Conclusion An original method to extract temporal patterns in the form of chronicles Chronicles express constraints on time by numerical intervals A formalization of the problem in the framework of inductive database which provides the definition of: An order relation on temporal patterns A monotonic recognition criterion and the related frequency A management of numerical temporal constraints (this task is very hard) An algorithm that finds Fmc in sequences A method to reuse and adapt Mitchells algorithm

23 Future work Control the clustering step of Fmc search in order to compute only Fmcs that are needed by Mitchells algorithm Adapt Mitchells algorithm in order to provide an approximate solution whose quality is user-defined Extend the method to other measures of interest Explore new applications intrusion detection

24 in Event Sequences An IDB for Mining Temporal Patterns Alexandre Vautier, Marie-Odile Cordier, and René Quiniou RENNES - France

25 Maximum size of I C (L) as a function of the number of events in a chronicle

Download ppt "An Inductive Database for Mining Temporal Patterns in Event Sequences Alexandre Vautier, Marie-Odile Cordier and René Quiniou"

Similar presentations

Ads by Google