Djamel A. Zighed and Nicolas Nicoloyannis ERIC Laboratory University of Lyon 2 (France) Prague Sept. 04.


1 Djamel A. Zighed and Nicolas Nicoloyannis ERIC Laboratory University of Lyon 2 (France) zighed@univ-lyon2.fr Prague Sept. 04

2 About the computer science department. In Lyon, there are 3 universities with 100,000 students. Lumière University Lyon 2 has 22,000 students and is mainly a liberal arts university. The Faculty of Economics has three departments, among them the computer science department, to which we belong. We have Bachelor, Master and PhD programs for 300 students.

3 ERIC Lab at the University. Faculties of the University of Lyon 2: Economics, Sociology, Linguistics, Law. Research centers of the university: ERIC (Knowledge Engineering Research Center). - The budget of ERIC does not depend on the university; it is provided by the national Ministry of Education. - We have a large autonomy in decision making.

4 ERIC Lab. Founded in 1995; 11 professors (N. Nicoloyannis, director); 15 PhD students; grants + contracts + WK + … = 200 K€/year. Research topics: – Data mining (theory, tools and applications) – Data warehouse management (T,T,A)

5 Data Mining (T,T,A). Theory: – Induction graphs – Learning and classification. Tools: – SIPINA: platform for data mining. Applications: – Medical fields – Chemical applications – Human sciences – … Data mining TTA for complex data

6 Data mining on complex data An example : Breast cancer diagnosis

7 Motivations Contingency table Association measure : It measures the strength of the relationship between X and Y

10 Motivations. Contingency table. Association measure: it measures the strength of the relationship between X and Y. According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?
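As an illustration of what an association measure on a contingency table looks like, here is a minimal Python sketch of Tschuprow's t, the measure named later in the talk. It assumes the usual textbook definition, t = sqrt(chi2 / (n * sqrt((r-1)(c-1)))); the slides do not spell out the formula, and the function name `tschuprow_t` and the list-of-rows table layout are choices made for this example only.

```python
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's t for a contingency table given as a list of rows of counts.

    Assumes the standard definition based on the Pearson chi-square statistic.
    """
    r, c = len(table), len(table[0])
    if r < 2 or c < 2:
        return 0.0  # convention chosen here: a one-row or one-column table carries no association
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    if n == 0:
        return 0.0
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells in empty rows or columns
                chi2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))


if __name__ == "__main__":
    print(tschuprow_t([[10, 0], [0, 10]]))  # perfectly associated 2 x 2 table
    print(tschuprow_t([[5, 5], [5, 5]]))    # independent 2 x 2 table
```

Merging two rows (or two columns) changes both the chi-square statistic and the normalizing term, which is why a smaller table can score higher on this measure.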

12 An example

13 Goal: Find the groupings that maximize the association between attributes. Yes, we can improve the association by reducing the size of the contingency table. For the preceding example, maximizing Tschuprow's t gives

14 Extension. Contingency table. According to a specific association measure, can we find the optimal reduced contingency table?

15 Optimal solution (exhaustive search) Goal : Find the best cross partition on T

16 Optimal solution (exhaustive search)

17 According to a specific association measure, can we find the optimal reduced contingency table? Yes, but the solution is intractable in the real world because of its high time complexity.
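To make the cost of the exhaustive search concrete, the following Python sketch enumerates every cross partition of a small table and keeps the one with the highest Tschuprow's t. It assumes nominal categories, so any grouping of rows or columns is allowed, and it assumes the standard formula for t; all function names are hypothetical. Under that assumption the number of row groupings is the Bell number of r, so a 6 × 6 table already has 203 × 203 = 41,209 cross partitions, and the count grows very quickly with the table size.

```python
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's t (standard chi-square based definition, assumed here)."""
    r, c = len(table), len(table[0])
    if r < 2 or c < 2:
        return 0.0
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            e = rows[i] * cols[j] / n
            if e > 0:
                chi2 += (table[i][j] - e) ** 2 / e
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))


def set_partitions(items):
    """Yield every partition of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` forms its own group
        for k in range(len(part)):  # or `first` joins an existing group
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def merge(table, row_groups, col_groups):
    """Sum the cells of `table` according to the row and column groups."""
    return [[sum(table[i][j] for i in rg for j in cg) for cg in col_groups]
            for rg in row_groups]


def exhaustive_best(table):
    """Return (best t, row groups, column groups) over all cross partitions."""
    col_parts = [p for p in set_partitions(list(range(len(table[0]))))
                 if len(p) >= 2]  # keep at least 2 columns
    best = (-1.0, None, None)
    for rp in set_partitions(list(range(len(table)))):
        if len(rp) < 2:  # keep at least 2 rows
            continue
        for cp in col_parts:
            t = tschuprow_t(merge(table, rp, cp))
            if t > best[0]:
                best = (t, rp, cp)
    return best


if __name__ == "__main__":
    toy = [[10, 10, 0], [10, 10, 0], [0, 0, 20]]  # made-up counts
    print(exhaustive_best(toy))
```

The toy table is invented for the example and is not the one shown on the slides.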

18 Heuristic: Successively group the 2 (row or column) values whose merging maximizes the increase in the association criterion.
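The greedy idea on this slide can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: it assumes Tschuprow's t (standard formula) as the criterion, allows any pair of rows or columns to be merged, never reduces a dimension below 2, and stops as soon as no merge increases the criterion. The stopping rule and all function names are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's t (standard chi-square based definition, assumed here)."""
    r, c = len(table), len(table[0])
    if r < 2 or c < 2:
        return 0.0
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            e = rows[i] * cols[j] / n
            if e > 0:
                chi2 += (table[i][j] - e) ** 2 / e
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))


def merge_rows(table, a, b):
    """Return a new table in which row b has been added into row a (a < b)."""
    merged = [x + y for x, y in zip(table[a], table[b])]
    return [merged if i == a else row for i, row in enumerate(table) if i != b]


def transpose(table):
    return [list(col) for col in zip(*table)]


def greedy_grouping(table):
    """Repeatedly apply the single row or column merge that raises t the most."""
    table = [list(row) for row in table]
    current = tschuprow_t(table)
    while True:
        candidates = []
        if len(table) > 2:  # candidate row merges
            for a, b in combinations(range(len(table)), 2):
                candidates.append(merge_rows(table, a, b))
        if len(table[0]) > 2:  # candidate column merges, via the transpose
            flipped = transpose(table)
            for a, b in combinations(range(len(flipped)), 2):
                candidates.append(transpose(merge_rows(flipped, a, b)))
        if not candidates:
            break
        best = max(candidates, key=tschuprow_t)
        best_t = tschuprow_t(best)
        if best_t <= current:  # no merge improves the criterion: stop
            break
        table, current = best, best_t
    return table, current


if __name__ == "__main__":
    toy = [[10, 10, 0], [10, 10, 0], [0, 0, 20]]  # made-up counts
    print(greedy_grouping(toy))
```

Each step examines only the pairwise merges of the current table, so the number of tables evaluated grows polynomially with the table size instead of with the number of cross partitions.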

19 Complexity

20 Simulation. Goal: How far is the quasi-optimal solution from the true optimum? The comparison is tractable for tables no larger than 6 × 6. Simulation design: Randomly generate 200 tables. Analyze the distribution of the deviations between optima and quasi-optima. Generating the tables: 10000 cases distributed over the c × r cells of the table with a uniform distribution (worst case).
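The table-generation step can be sketched in Python as below. The sketch assumes that "uniform distribution" means each of the 10000 cases falls into any cell with equal probability, and that the number of rows and columns is drawn at random between 2 and 6; the slides do not say how table sizes were chosen, so that part is purely an assumption, as are the function names and the seed.

```python
import random


def random_table(r, c, n_cases=10000, rng=None):
    """Distribute n_cases over an r x c table, each cell being equally likely."""
    if rng is None:
        rng = random.Random()
    table = [[0] * c for _ in range(r)]
    for _ in range(n_cases):
        cell = rng.randrange(r * c)  # pick one of the r*c cells uniformly
        table[cell // c][cell % c] += 1
    return table


def simulation_tables(n_tables=200, max_size=6, seed=0):
    """Generate the tables for one simulation run (sizes drawn at random: an assumption)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        r = rng.randint(2, max_size)
        c = rng.randint(2, max_size)
        tables.append(random_table(r, c, rng=rng))
    return tables


if __name__ == "__main__":
    tables = simulation_tables()
    print(len(tables), "tables; first one has",
          len(tables[0]), "rows and", len(tables[0][0]), "columns")
```

With cases spread evenly over the cells, rows and columns are close to independent, so there is little association for any grouping to recover, which is consistent with the slide calling this the worst case.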

21 Quasi-optimal solution

23 Conclusion. Implementation in a new approach to decision tree induction: – Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Abogodaï, a New Approach for Decision Trees, in Lavrac, N., Gamberger, D., Todorovski, L. and Blockeel, H. (eds), Knowledge Discovery in Databases: PKDD 2003, LNAI 2838, Berlin: Springer, 495-506. – Zighed, D.A., Ritschard, G., Erray, W. and Scuturici, V.-M. (2003), Decision tree with optimal join partitioning, to appear in Journal of Information Intelligent Systems, Kluwer (2004). Divisive top-down approach. Extension to the multidimensional case.

