Towards Data Mining Without Information on Knowledge Structure. Wednesday, September 19th, 2007. Alexandre Vautier, Marie-Odile Cordier and René Quiniou.

1 Towards Data Mining Without Information on Knowledge Structure
Wednesday, September 19th, 2007
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
Université de Rennes 1, INRIA Rennes - Bretagne Atlantique

3 Usual KD Process
User needs: a data mining task; domain knowledge
Data → [Selection] → Target Data → [Preprocessing] → Preprocessed Data → [Transformation] → Transformed Data → [Data Mining] → Models → [Interpretation/Evaluation] → Knowledge
What can a user extract from data without domain knowledge?

4 Application context: Network Alarms
–Represent network alarms
–Understand network behavior
–Detect new DDoS attacks
An alarm is composed of
–A directed link between two IP addresses
–A date
–A severity (low, med, high), related to the link rate
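
The alarm structure above maps naturally onto a small record type. The sketch below is an illustration only: the names Alarm, Severity and parse_alarm are hypothetical, and it assumes dates arrive in ISO 8601 form, which the slide does not specify.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Union

IP = Union[IPv4Address, IPv6Address]


class Severity(Enum):
    """The three severity levels listed on the slide."""
    LOW = "low"
    MED = "med"
    HIGH = "high"


@dataclass(frozen=True)
class Alarm:
    """One network alarm: a directed link, a date and a severity (hypothetical type)."""
    src: IP             # source end of the directed link
    dst: IP             # destination end of the directed link
    date: datetime
    severity: Severity  # the slide relates it to the link rate


def parse_alarm(src: str, dst: str, date: str, severity: str) -> Alarm:
    """Build an Alarm from strings; raises ValueError on any malformed field.

    Assumption: `date` is an ISO 8601 string such as "2005-11-01T10:00:00".
    """
    return Alarm(ip_address(src), ip_address(dst),
                 datetime.fromisoformat(date), Severity(severity))


if __name__ == "__main__":
    alarm = parse_alarm("1.5.5.7", "10.0.0.1", "2005-11-01T10:00:00", "high")
    print(alarm.severity)
```

Freezing the dataclass makes alarms hashable, so a set of alarms can serve directly as the data d in the complexity computations later in the talk.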

6 Application context: Network Alarms
Alarms → Data Mining Algorithms → Models
–Generalized links: M1 = { → *, * → , …}
–Sequences: M2 = {1.5.5.* → * > * → , …}
–Clustering on date and severity: M3 = {{11/01/05…11/03/05, low}, {11/07/05…11/15/05, high}}
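
To make the generalized links concrete, here is a minimal sketch of how one such model could cover concrete alarm links. It assumes that "*" is a wildcard and that, inside a dotted pattern such as 1.5.5.*, each "*" stands for exactly one octet; the helper names are hypothetical, and sequences and clusters are not handled.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str]  # (source IP, destination IP) of one alarm


def ip_matches(pattern: str, ip: str) -> bool:
    """Match a dotted IPv4 address against a pattern such as '1.5.5.*'.

    Assumption: '*' alone matches any address; inside a dotted
    pattern, each '*' stands for exactly one octet.
    """
    if pattern == "*":
        return True
    p_parts, ip_parts = pattern.split("."), ip.split(".")
    return len(p_parts) == len(ip_parts) and all(
        p == "*" or p == q for p, q in zip(p_parts, ip_parts))


def covers(gen_link: Link, link: Link) -> bool:
    """A generalized link covers a concrete link if both ends match."""
    return ip_matches(gen_link[0], link[0]) and ip_matches(gen_link[1], link[1])


def covered_by(gen_link: Link, links: Iterable[Link]) -> List[Link]:
    """Image of one generalized link under this covering relation."""
    return [link for link in links if covers(gen_link, link)]
```

For instance, the generalized link ("1.5.5.*", "*") covers every alarm whose source lies in 1.5.5.0/24, whatever its destination.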

7 Objectives
Goal: search for models that fit the given data
–Current assumption: the user has sufficient knowledge to
 define the type of model
 choose the relevant DM algorithm
–Our proposition: alleviate the current assumption by
 automatically executing DM algorithms to extract models from data
 evaluating the resulting models in a generic manner
 proposing the best-suited model(s) to the user

8 Framework
DM algorithm specifications + Data specification → Unification of specifications → Model extraction → Generic evaluation → Model ranking
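
The last three framework steps can be sketched as a small loop. This is a hypothetical skeleton, not the authors' system: it assumes unification has already selected the applicable algorithms, that each algorithm is a callable from data to a model, and that a lower generic complexity means a better-suited model.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical signatures: an algorithm maps data to a model,
# and a complexity function scores a (data, model) pair.
Algorithm = Callable[[Any], Any]
Complexity = Callable[[Any, Any], float]


def rank_models(data: Any,
                algorithms: Dict[str, Algorithm],
                complexity: Complexity) -> List[Tuple[str, Any, float]]:
    """Run every (already unified) algorithm, score each model, rank them.

    Assumption: lower complexity ranks first, in the spirit of a
    description-length reading of the generic evaluation step.
    """
    scored = []
    for name, algorithm in algorithms.items():
        model = algorithm(data)                                # model extraction
        scored.append((name, model, complexity(data, model)))  # generic evaluation
    return sorted(scored, key=lambda entry: entry[2])          # model ranking
```

Because evaluation only sees a (data, model) pair, models of different kinds (clusters, generalized links, sequences) can be ranked in the same list, which is the point of a generic evaluation.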

9 Schemas for specification
–Enhanced algebraic specifications (types, operations and equations)
–Category theory [Mac Lane 1942]
 –Sketch [Ehresmann 1965]
–Use specification inheritance

10 Data specification: Network Alarm Schema
–Node: a type
–Edge: a function or a relation
–Green dotted edge: projection ⇒ Cartesian product
–Red dashed edge: inclusion ⇒ union
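
A schema of this kind can be held as a labelled graph. The sketch below assumes four edge kinds (function, relation, projection, inclusion) and reads projections leaving a node as its Cartesian-product components and inclusions entering a node as its union members. The Schema class and the alarm fragment are hypothetical; the actual schema figure is not recoverable from the transcript.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

EDGE_KINDS = {"function", "relation", "projection", "inclusion"}


@dataclass
class Schema:
    """Nodes are types; edges are (name, kind, source, target)."""
    nodes: Set[str] = field(default_factory=set)
    edges: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def add_edge(self, name: str, kind: str, src: str, dst: str) -> None:
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        self.nodes.update((src, dst))
        self.edges.append((name, kind, src, dst))

    def product_components(self, node: str) -> List[str]:
        """Projection edges leaving `node`: node is their Cartesian product."""
        return [dst for _, kind, src, dst in self.edges
                if kind == "projection" and src == node]

    def union_members(self, node: str) -> List[str]:
        """Inclusion edges entering `node`: node is the union of their sources."""
        return [src for _, kind, src, dst in self.edges
                if kind == "inclusion" and dst == node]


# Hypothetical fragment of a network-alarm schema.
alarms = Schema()
alarms.add_edge("link", "projection", "Alarm", "Link")
alarms.add_edge("date", "projection", "Alarm", "Date")
alarms.add_edge("severity", "projection", "Alarm", "Severity")
alarms.add_edge("src", "projection", "Link", "IP")
alarms.add_edge("dst", "projection", "Link", "IP")
for level in ("Low", "Med", "High"):
    alarms.add_edge("is_" + level.lower(), "inclusion", level, "Severity")
```

Here Alarm is read as Link × Date × Severity, Link as IP × IP, and Severity as the union of Low, Med and High.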

13 DM Algorithm specification: Generalized edges
–Covering relation
–Model type
–DM algorithm

15 Schema unification
Abstract Data Type
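
The slide names only "schema unification" and "abstract data type". One plausible reading, sketched here as an assumption rather than as the authors' procedure, is a search for a node mapping from the algorithm's input schema into the data schema that sends every edge to an edge of the same kind, where abstract types may bind to any data type. The function unify and its parameters are hypothetical. Plain backtracking like this is exponential in the number of nodes in the worst case, which agrees with the cost reported in the Discussions slide.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (kind, source node, target node)


def unify(alg_nodes: List[str], alg_edges: Set[Edge],
          data_nodes: List[str], data_edges: Set[Edge],
          abstract: Set[str]) -> Optional[Dict[str, str]]:
    """Find a node mapping from the algorithm schema into the data schema
    such that every algorithm edge is sent to a data edge of the same kind.

    Nodes listed in `abstract` (abstract data types) may bind to any data
    node; every other node must map to the data node with the same name.
    Plain backtracking: exponential in the number of nodes in the worst case.
    """
    def consistent(mapping: Dict[str, str]) -> bool:
        # Check only the edges whose two ends are already mapped.
        return all((kind, mapping[src], mapping[dst]) in data_edges
                   for kind, src, dst in alg_edges
                   if src in mapping and dst in mapping)

    def search(i: int, mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(alg_nodes):
            return dict(mapping)
        node = alg_nodes[i]
        if node in abstract:
            candidates = data_nodes
        else:
            candidates = [node] if node in data_nodes else []
        for candidate in candidates:
            mapping[node] = candidate
            if consistent(mapping):
                found = search(i + 1, mapping)
                if found is not None:
                    return found
            del mapping[node]  # backtrack
        return None

    return search(0, {})
```

With a data schema containing a Link type that projects onto IP, an algorithm schema whose abstract type E projects onto IP unifies with E bound to Link.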

18 Generic evaluation
Compare different kinds of models
Inspired by Kolmogorov complexity: the complexity of an object x is the size s(p) of the shortest program p that outputs x when executed on a universal machine f:
$C_f(x) = \min\{\, s(p) \mid f(p) = x \,\}$
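
$C_f(x)$ is not computable, so any practical evaluation needs a stand-in. A common one, shown here as an assumption and not necessarily what the authors used, is the size of x after lossless compression: it upper-bounds the complexity up to an additive constant (the size of the decompressor). The helper compressed_bits is hypothetical.

```python
import os
import zlib


def compressed_bits(x: bytes) -> int:
    """Size in bits of x after zlib compression at level 9.

    C_f(x) itself is not computable; the output of a lossless compressor,
    plus the fixed size of its decompressor, upper-bounds it.
    """
    return 8 * len(zlib.compress(x, 9))


if __name__ == "__main__":
    regular = b"10.0.0.1->10.0.0.2;" * 100  # highly repetitive alarm text
    irregular = os.urandom(len(regular))     # incompressible noise
    print(compressed_bits(regular), compressed_bits(irregular))
```

Regular data compresses far below its raw size while random bytes do not, which matches the intuition that a good model is one that makes the data "cheap" to describe.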

19 Generic evaluation
Complexity of data d in a schema S relative to a model m, with covering relation $c: M \leftrightarrow D$:
$K(d,m,S) = k(M) + k(D) + k(c) + k(m \mid M) + k(d \mid m,c,D)$
–$k(M)$: the model structure
–$k(D)$: the data structure
–$k(c)$: the covering relation
–$k(m \mid M)$: the model
–$k(d \mid m,c,D)$: the data knowing …

22 Path Indexing Covering Relation Decomposition
Null decomposition: a model m in M covers c(m) in D through $c: M \leftrightarrow D$, and the data d is encoded as
$k(d \mid m,c,D) = k(d \mid c(m)) + k(d \setminus c(m) \mid D)$
Decomposition relying on relation composition: with $t: M \leftrightarrow A$ and $s: A \leftrightarrow D$, the covering relation is $c = s \circ t: M \leftrightarrow D$, so $c(m) = s \circ t(m)$. With $a$ taken from $t(m)$:
$k(d \mid m, s \circ t, D) = k(a \mid t(m)) + k(d \mid s(a)) + k(d \setminus s(a) \mid D)$
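
The two decompositions can be made concrete with an assumed coding cost: naming each element of a set by its index in the set it is drawn from, i.e. $\log_2$ of that set's size in bits per element. This cost function, the choice of $a$ as the part of $t(m)$ whose $s$-image meets $d$, and all helper names are hypothetical. The slide does not fix them. Relations are stored as maps from an element to the set of elements it is related to.

```python
import math
from typing import Dict, Hashable, Set

# A relation R: X <-> Y stored as: element of X -> set of related elements of Y.
Relation = Dict[Hashable, Set[Hashable]]


def image(rel: Relation, xs: Set[Hashable]) -> Set[Hashable]:
    """All elements related (by rel) to at least one element of xs."""
    out: Set[Hashable] = set()
    for x in xs:
        out |= rel.get(x, set())
    return out


def compose(s: Relation, t: Relation) -> Relation:
    """s ∘ t: x is related to z iff t(x) contains some y with z in s(y)."""
    return {x: image(s, ys) for x, ys in t.items()}


def index_cost(items: Set[Hashable], universe: Set[Hashable]) -> float:
    """Assumed coding cost: name each item by its index in `universe`."""
    return len(items) * math.log2(len(universe)) if items else 0.0


def null_decomposition(d: Set[Hashable], cm: Set[Hashable], D: Set[Hashable]) -> float:
    """k(d | c(m)) + k(d \\ c(m) | D), with cm = c(m) and d a subset of D."""
    return index_cost(d & cm, cm) + index_cost(d - cm, D)


def path_decomposition(d: Set[Hashable], m: Hashable,
                       s: Relation, t: Relation, D: Set[Hashable]) -> float:
    """k(a | t(m)) + k(d | s(a)) + k(d \\ s(a) | D) for c = s ∘ t.

    Assumption: a is the part of t(m) whose s-image meets d.
    """
    tm = image(t, {m})
    a = {y for y in tm if s.get(y, set()) & d}
    sa = image(s, a)
    return index_cost(a, tm) + index_cost(d & sa, sa) + index_cost(d - sa, D)
```

With D = {0, …, 7}, t = {m: {x, y}}, s = {x: {0, 1}, y: {2, 3}} and d = {0, 1}, the null decomposition over c(m) = {0, 1, 2, 3} costs 4 bits, while going through A costs 1 + 2 = 3 bits, because only the relevant part of the composed cover is indexed. Without any cover, d costs 6 bits.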

23 Experiments
Extraction of clusters, generalized edges, and sequences
–Dataset: alarms
–Duration: 400 seconds (excluding the run time of the DM algorithms)
–6 operational algorithms
Experiments on datasets generated by models
Network alarms from a real network

24 Discussions
Unification:
–Exponential in time with respect to the number of nodes in a schema
Generic evaluation:
–Linear in time and space
Adapting the evaluation method:
–User-defined
–According to a model visualization
–According to local data instead of global data

25 What do schemas bring to Data Mining?
–They describe data and DM algorithms in a common language
–They make it possible to unify the data structure with the DM algorithm inputs
–They provide a way to compute the model complexity relative to a type in a schema
–They provide a way to compute the data complexity relative to
 –A model
 –A covering relation and its decomposition
–They can be implemented efficiently

26 Towards Data Mining Without Information on Knowledge Structure
Thank you!
Alexandre Vautier, Marie-Odile Cordier and René Quiniou
INRIA Rennes - Bretagne Atlantique, Université de Rennes 1

