
20.12.05/12:00 Agora, Aud 2. Public examination of PhD thesis: “Feature Extraction for Supervised Learning in Knowledge Discovery Systems”

Slide 1: Feature Extraction for Supervised Learning in Knowledge Discovery Systems
Mykola Pechenizkiy
Public examination of dissertation
JYU, Agora Building, Auditorium 2, December 20, 2005, 12:00
Supervisors: Prof. Seppo Puuronen (JYU), Dr. Alexey Tsymbal (TCD), Prof. Tommi Kärkkäinen (JYU)
Reviewers: Prof. Ryszard Michalski (GMU), Prof. Peter Kokol (UM)
Opponent: Dr. Kari Torkkola (Motorola Labs)

Slide 2: Outline
- Data mining (DM) and knowledge discovery in databases (KDD) background
  - KDD as a process
  - DM strategy
- Classification
  - Curse of dimensionality and indirectly relevant features
  - Feature extraction (FE) as dimensionality reduction
- Feature extraction for classification
  - Conventional Principal Component Analysis (PCA)
  - Class-conditional FE: parametric and non-parametric
- Research questions
- Research methods
- Contributions

Slide 3: Knowledge discovery as a process
Source: Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.

Slide 4: Classification
The task of classification: there are J classes, n training observations and p features. A training set of n instances (x_i, y_i) is given, where x_i are the values of the attributes and y_i is the class. Goal: given a new instance x_0, predict its class membership y_0.
Examples: diagnosis of thyroid diseases; heart attack prediction, etc.
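The task statement above can be made concrete with a minimal sketch. It assumes a k-nearest-neighbour rule, one of the classifiers studied in the thesis, and runs on invented toy data. The function name `knn_predict` and the data are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, x0, k=3):
    """Predict the class y0 of a new instance x0 from training pairs (x_i, y_i)."""
    # Sort the training instances by Euclidean distance to x0 and keep the k closest
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], x0))[:k]
    # Majority vote over the class labels of the k nearest neighbours
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical): n = 6 instances, p = 2 features, J = 2 classes
train = [
    ((1.0, 1.0), "healthy"), ((1.2, 0.8), "healthy"), ((0.9, 1.1), "healthy"),
    ((3.0, 3.2), "disease"), ((3.1, 2.9), "disease"), ((2.8, 3.0), "disease"),
]
print(knn_predict(train, (1.1, 0.9)))  # → healthy
```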

Slide 5: Improvement of the representation space
- Curse of dimensionality: a drastic increase in computational complexity and classification error for data with a large number of dimensions
- Indirectly relevant features
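One facet of the curse of dimensionality is the concentration of distances. As the number of dimensions grows, the nearest and the farthest training points from a query become almost equally far away, which weakens distance-based classifiers. The sketch below illustrates this on uniform random data. The function name and the data-generating choices are assumptions made for illustration only.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of nearest to farthest distance from a random query point.

    Values close to 1 mean all points look almost equally far away.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(p, query) for p in points]
    return min(dists) / max(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

The printed ratio grows towards 1 as the dimension increases.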

Slide 6: FE example, “Heart Disease”
Extracted features (linear combinations of the original features):
0.1·Age - 0.6·Sex - 0.73·RestBP - 0.33·MaxHeartRate
-0.01·Age + 0.78·Sex - 0.42·RestBP - 0.47·MaxHeartRate
-0.7·Age + 0.1·Sex - 0.43·RestBP + 0.57·MaxHeartRate
Variance covered (values shown on the slide): 100%, 87%, 60%, 67%
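Each extracted feature on the slide is a weighted sum of the original attributes. A minimal sketch follows that applies the slide's coefficients to one patient record. It assumes that the record has already been standardised (z-scores), as is usual before PCA, and the record values are invented rather than taken from the Heart Disease data.

```python
# Coefficients of the three extracted features, copied from the slide
COEFFS = [
    {"Age": 0.1,   "Sex": -0.6, "RestBP": -0.73, "MaxHeartRate": -0.33},
    {"Age": -0.01, "Sex": 0.78, "RestBP": -0.42, "MaxHeartRate": -0.47},
    {"Age": -0.7,  "Sex": 0.1,  "RestBP": -0.43, "MaxHeartRate": 0.57},
]

def extract(record):
    """Project one standardised record onto the three extracted features."""
    return [sum(w * record[name] for name, w in coeffs.items()) for coeffs in COEFFS]

# Hypothetical standardised record (z-scores), not from the thesis data
patient = {"Age": 1.0, "Sex": 0.0, "RestBP": 0.5, "MaxHeartRate": -1.0}
print(extract(patient))
```

For this record, the three new feature values are approximately 0.065, 0.25 and -1.485.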

Slide 7: How to construct a good representation space (RS) for supervised learning (SL)?
[Figure: original features and extracted features]
RQ1 – How important is it to use class information in the FE process?
RQ2 – Is FE data-oriented, SL-oriented, or both?
RQ3 – Is FE for dynamic integration of base-level classifiers useful in a similar way as it is for a single base-level classifier?
RQ4 – Which features – original, extracted or both – are useful for SL?
RQ5 – How many extracted features are useful for SL?
RQ6 – How to cope with the presence of contextual features in data, and with data heterogeneity?
RQ7 – What is the effect of sample reduction on the performance of FE for SL?

Slide 8
Research problem: studying both the theoretical background and the practical aspects of FE for SL in knowledge discovery systems (KDSs).
Main contribution:
- a many-sided analysis of the research problem;
- an ensemble of relatively small contributions.
Research method: a multimethodological approach to the construction of an artefact for DM (following Nunamaker et al.).
[Figure: DM artefact development, experimentation, theory building, observation]

Slide 9: Further research
- How to support decision making on the selection of an appropriate DM strategy for the problem at hand?
- When is FE useful for SL?
- What is the effect of FE on the interpretability of results and the transparency of SL?

Slide 10: Additional slides …
Further slides for a step-by-step analysis of the research questions and the corresponding contributions

Slide 11: Research questions
RQ1 – How important is it to use class information in the FE process?
RQ2 – Is FE a data-driven or a hypothesis-driven constructive induction?
RQ3 – Is FE for dynamic integration of base-level classifiers useful in a similar way as it is for a single base-level classifier?
RQ4 – Which features – original, extracted or both – are useful for SL?
RQ5 – How many extracted features are useful for SL?

Slide 12: Research questions (cont.)
RQ6 – How to cope with the presence of contextual features in data, and with data heterogeneity?
RQ7 – What is the effect of sample reduction on the performance of FE for SL?
RQ8 – When is FE useful for SL?
RQ9 – What is the effect of FE on the interpretability of results and the transparency of SL?
RQ10 – How to decide on the selection of an appropriate DM strategy for the problem at hand?

Slide 13: RQ1: Use of class information in FE
Tsymbal A., Puuronen S., Pechenizkiy M., Baumgarten M., Patterson D. Eigenvector-based Feature Extraction for Classification (Article I, FLAIRS’02)
- Use of class information in the FE process is crucial for many datasets. Class-conditional FE can result in better classification accuracy, while purely variance-based FE either has no effect on the accuracy or makes it worse.
- No technique is superior overall, but the nonparametric approaches are more stable across various dataset characteristics.
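To see why class information can matter, compare conventional PCA, which looks only at variance, with a class-conditional extractor. The extractor below is a parametric, LDA-style one: it takes the leading eigenvectors of S_w^{-1} S_b, where S_w and S_b are the within-class and between-class scatter matrices. The formulation, the function names and the toy data are assumptions made for illustration. The nonparametric variants studied in Article I are not shown.

```python
import numpy as np

def pca_fit(X, k):
    """Conventional PCA: top-k eigenvectors of the covariance matrix; labels are unused."""
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, np.argsort(vals)[::-1][:k]]          # largest variance first

def parametric_fe_fit(X, y, k):
    """Class-conditional FE, LDA-style: top-k eigenvectors of S_w^{-1} S_b."""
    mean = X.mean(axis=0)
    p = X.shape[1]
    S_w, S_b = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        S_w += (Xk - mk).T @ (Xk - mk)                   # within-class scatter
        d = (mk - mean)[:, None]
        S_b += len(Xk) * (d @ d.T)                       # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    return vecs[:, np.argsort(vals.real)[::-1][:k]].real

# Toy data: a high-variance feature that is pure noise, and a
# low-variance feature that carries the class signal
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = np.column_stack([rng.normal(0, 5.0, 200), rng.normal(0, 0.5, 200) + 2.0 * y])

print("PCA direction:              ", np.round(pca_fit(X, 1).ravel(), 2))
print("Class-conditional direction:", np.round(parametric_fe_fit(X, y, 1).ravel(), 2))
```

PCA is drawn to the noisy first feature, whereas the class-conditional direction points (up to sign) along the second, discriminative feature.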

Slide 14: RQ2: Is FE a data-driven or a hypothesis-driven constructive induction (CI)?
Pechenizkiy M. Impact of the Feature Extraction on the Performance of a Classifier: kNN, Naïve Bayes and C4.5 (Article III, AI’05)
[Figure, shown in two variants: train set; FE process (search for the most appropriate FE technique) producing an FE model and a transformed train set; SL process (search for the most appropriate SL technique) producing an SL model; test set; prediction]
- When FE techniques are ranked by the accuracy that a given SL technique achieves with them, the ranking can vary a lot from dataset to dataset.
- Different FE techniques also behave differently when integrated with different SL techniques.
- The selection of the FE method is not independent of the selection of the classifier.
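The conclusion above suggests searching FE and SL techniques jointly rather than one after the other. A minimal sketch of such a joint search follows: every (FE, classifier) pair is fitted on the training set and scored on a held-out test set. The FE candidates ("no FE" and PCA) and the classifiers (1-NN and a nearest-centroid rule) are stand-ins, not the kNN, Naïve Bayes and C4.5 setup of Article III. All function names and the data are hypothetical.

```python
import itertools
import numpy as np

def fe_none(X_train, y_train, k):
    """No feature extraction: identity transform."""
    return lambda X: X

def fe_pca(X_train, y_train, k):
    """PCA fitted on the training set; y_train is ignored (variance-based FE)."""
    mean = X_train.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X_train, rowvar=False))
    W = vecs[:, np.argsort(vals)[::-1][:k]]
    return lambda X: (X - mean) @ W

def sl_1nn(Z_train, y_train):
    """1-nearest-neighbour classifier."""
    def predict(Z):
        d = ((Z[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
        return y_train[d.argmin(axis=1)]
    return predict

def sl_centroid(Z_train, y_train):
    """Nearest class-centroid classifier."""
    classes = np.unique(y_train)
    cents = np.array([Z_train[y_train == c].mean(axis=0) for c in classes])
    def predict(Z):
        d = ((Z[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return classes[d.argmin(axis=1)]
    return predict

def joint_search(X_train, y_train, X_test, y_test, k=1):
    """Score every (FE technique, SL technique) pair together on held-out data."""
    fes = {"none": fe_none, "pca": fe_pca}
    sls = {"1nn": sl_1nn, "centroid": sl_centroid}
    scores = {}
    for (fe_name, fe), (sl_name, sl) in itertools.product(fes.items(), sls.items()):
        transform = fe(X_train, y_train, k)              # FE model from the train set only
        model = sl(transform(X_train), y_train)          # SL model on the transformed train set
        scores[(fe_name, sl_name)] = float((model(transform(X_test)) == y_test).mean())
    return scores

rng = np.random.default_rng(1)
def make_data(n):
    y = rng.integers(0, 2, size=n)
    X = np.column_stack([rng.normal(0, 5.0, n), rng.normal(0, 0.5, n) + 2.0 * y])
    return X, y

X_tr, y_tr = make_data(100)
X_te, y_te = make_data(100)
for pair, acc in sorted(joint_search(X_tr, y_tr, X_te, y_te).items(), key=lambda kv: -kv[1]):
    print(pair, round(acc, 2))
```

Because the FE choice is scored only through the accuracy of the classifier that follows it, the best FE technique can change when the classifier changes.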

Slide 15: RQ3: FE for dynamic integration of classifiers (Article VIII, Pechenizkiy et al., 2005)

Slide 16: RQ4: How to construct a good RS for SL?
Pechenizkiy M., Tsymbal A., Puuronen S. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004)
Which features – original, extracted or both – are useful for SL?
Combining the original features with the extracted features can be beneficial for SL on many datasets, especially when tree-based inducers such as C4.5 are used for classification.
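The "both" option can be sketched by appending k principal components to the original attributes, with PCA fitted on the training set only. The function name, the value of k and the random data are assumptions. One plausible reading of the C4.5 result, offered here as an interpretation: an axis-parallel tree inducer can then choose between splits on original attributes and splits on components, and the latter act like oblique splits in the original space.

```python
import numpy as np

def augment_with_pcs(X_train, X_test, k):
    """Return train/test matrices holding the original features plus k principal components."""
    mean = X_train.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X_train, rowvar=False))
    W = vecs[:, np.argsort(vals)[::-1][:k]]        # top-k eigenvectors, fitted on train only
    pcs_train = (X_train - mean) @ W
    pcs_test = (X_test - mean) @ W
    return np.hstack([X_train, pcs_train]), np.hstack([X_test, pcs_test])

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))
X_test = rng.normal(size=(10, 4))
A_train, A_test = augment_with_pcs(X_train, X_test, k=2)
print(A_train.shape, A_test.shape)  # → (50, 6) (10, 6)
```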

Slide 17: RQ4: How to construct a good RS for SL? (cont.)
Pechenizkiy M., Tsymbal A., Puuronen S. On Combining Principal Components with Parametric LDA-based Feature Extraction for Supervised Learning (Article III, FCDS)
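One way to read "combining principal components with parametric LDA-based FE" is to concatenate k principal components with the J - 1 discriminant directions. J - 1 is the most that the between-class scatter can support. The sketch below does exactly that. The LDA-style formulation (eigenvectors of S_w^{-1} S_b), the function names and the data are assumptions, not necessarily the exact scheme of the article.

```python
import numpy as np

def top_eigvecs(M, k):
    """Eigenvectors of M for its k largest eigenvalues (by real part), as columns."""
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1][:k]
    return vecs[:, order].real

def pca_plus_lda(X, y, k_pca):
    """Concatenate k_pca principal components with (J - 1) LDA-style features."""
    mean = X.mean(axis=0)
    classes = np.unique(y)
    p = X.shape[1]
    S_w, S_b = np.zeros((p, p)), np.zeros((p, p))
    for c in classes:
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        S_w += (Xk - mk).T @ (Xk - mk)                 # within-class scatter
        d = (mk - mean)[:, None]
        S_b += len(Xk) * (d @ d.T)                      # between-class scatter
    W_pca = top_eigvecs(np.cov(X, rowvar=False), k_pca)
    W_lda = top_eigvecs(np.linalg.solve(S_w, S_b), len(classes) - 1)  # rank(S_b) <= J - 1
    centred = X - mean
    return np.hstack([centred @ W_pca, centred @ W_lda])

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)                                   # J = 3 classes
X = rng.normal(size=(90, 5)) + y[:, None] * np.array([1.0, 0.0, 0.5, 0.0, 0.0])
Z = pca_plus_lda(X, y, k_pca=2)
print(Z.shape)  # → (90, 4)
```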

Slide 18: RQ5: How many extracted features are useful?
Criteria for selecting the most useful transformed features include:
- the variance accounted for by the features to be selected (the most common basis);
- all the components whose corresponding eigenvalues are significantly greater than one;
- a ranking procedure: select the principal components that have the highest correlations with the class attribute.
Pechenizkiy M., Tsymbal A., Puuronen S. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004)
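The three selection criteria can each be written in a few lines. In this sketch, the 85% variance threshold is an assumed value (the slide gives none). The eigenvalue rule uses a plain "> 1" cut-off (the Kaiser criterion) instead of a significance test. The class label is treated as numeric for the correlation ranking. All function names and data are hypothetical.

```python
import numpy as np

def select_by_variance(eigvals, threshold=0.85):
    """Smallest number of leading components whose share of total variance reaches `threshold`."""
    ratios = np.sort(eigvals)[::-1] / eigvals.sum()
    k = int(np.searchsorted(np.cumsum(ratios), threshold)) + 1
    return min(k, len(ratios))

def select_by_kaiser(eigvals):
    """Kaiser criterion: keep components whose eigenvalue exceeds 1 (correlation-matrix PCA)."""
    return int((np.asarray(eigvals) > 1.0).sum())

def rank_by_class_correlation(Z, y):
    """Component indices ordered by |Pearson correlation| with a numeric class label."""
    corr = [abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])]
    return [int(j) for j in np.argsort(corr)[::-1]]

eigvals = np.array([2.0, 1.1, 0.6, 0.3])   # hypothetical eigenvalues of a 4x4 correlation matrix
print(select_by_variance(eigvals))          # → 3  (cumulative shares 0.5, 0.775, 0.925, 1.0)
print(select_by_kaiser(eigvals))            # → 2

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
Z = np.column_stack([rng.normal(size=100), y + 0.1 * rng.normal(size=100)])
print(rank_by_class_correlation(Z, y))      # → [1, 0]
```

The variance and eigenvalue criteria ignore the class entirely; only the ranking procedure uses it.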

Slide 19: RQ6: How to cope with data heterogeneity?
Pechenizkiy M., Tsymbal A., Puuronen S. Supervised Learning and Local Dimensionality Reduction within Natural Clusters: Biomedical Data Analysis (T-ITB, “Mining Biomedical Data”)
[Figure: training data and test data; natural clustering of the training data into clusters 1 … n; dimensionality reduction (DR) and SL within each cluster, giving local classifiers C1 … Cn; compared against DR and SL on the whole data set by accuracy]
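A sketch of the local scheme in the figure follows: cluster the training data, fit DR and a classifier inside each cluster, and route each test instance to its nearest cluster's model. As assumptions, plain k-means stands in for the natural-clustering step, PCA for the local DR, and a nearest-centroid rule for the local SL. The names and the two-subpopulation toy data are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means, standing in for the natural clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(n_clusters)])
    return centers, assign(X, centers)

def fit_local(X, y, n_clusters=2, k=1):
    centers, labels = kmeans(X, n_clusters)
    models = []
    for c in range(n_clusters):
        Xc, yc = X[labels == c], y[labels == c]
        mean = Xc.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        W = vecs[:, np.argsort(vals)[::-1][:k]]                  # local DR: PCA inside the cluster
        Z = (Xc - mean) @ W
        classes = np.unique(yc)
        cents = np.array([Z[yc == cl].mean(axis=0) for cl in classes])  # local SL: nearest centroid
        models.append((mean, W, classes, cents))
    return centers, models

def predict_local(X, centers, models):
    preds = []
    for x, c in zip(X, assign(X, centers)):                      # route each instance to its cluster
        mean, W, classes, cents = models[c]
        z = (x - mean) @ W
        preds.append(classes[((cents - z) ** 2).sum(axis=1).argmin()])
    return np.array(preds)

rng = np.random.default_rng(0)
def make_data(n):
    group = rng.integers(0, 2, size=n)                            # hidden natural subpopulation
    y = rng.integers(0, 2, size=n)
    X = rng.normal(0, 0.3, size=(n, 3)) + 10.0 * group[:, None]
    X[:, 0] += np.where(group == 0, 3.0 * y, 0.0)                # class signal on feature 0 in group 0
    X[:, 1] += np.where(group == 1, 3.0 * y, 0.0)                # ... and on feature 1 in group 1
    return X, y

X_tr, y_tr = make_data(200)
X_te, y_te = make_data(100)
centers, models = fit_local(X_tr, y_tr)
print("accuracy:", float((predict_local(X_te, centers, models) == y_te).mean()))
```

In this toy data the class signal lies along a different feature in each subpopulation. A single global PCA would be dominated by the gap between the two groups, whereas each local PCA picks out that cluster's discriminative direction.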

Slide 20: RQ7: What is the effect of sample reduction?
Pechenizkiy M., Puuronen S., Tsymbal A. The Impact of Sample Reduction on PCA-based Feature Extraction for Naïve Bayes Classification (Article V, ACM SAC’06: DM Track)
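The slide names only the study. As a hedged illustration of what sample reduction before PCA-based FE can involve, the sketch below draws a 10% random sample and a 10% class-stratified sample. It then checks how closely each sample's leading principal component matches the one computed on the full data. The sampling schemes, the fraction, the names and the data are assumptions, not necessarily those evaluated in Article V.

```python
import numpy as np

def random_sample(X, y, fraction, rng):
    """Simple random sample of `fraction` of the instances, ignoring class labels."""
    idx = rng.choice(len(X), size=max(1, int(fraction * len(X))), replace=False)
    return X[idx], y[idx]

def stratified_sample(X, y, fraction, rng):
    """Keep the same fraction of instances from every class."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        take = max(1, int(round(fraction * len(members))))
        idx.extend(rng.choice(members, size=take, replace=False))
    idx = np.array(idx)
    return X[idx], y[idx]

def leading_pc(X):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, np.argmax(vals)]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3)) * np.array([3.0, 1.0, 0.5])
y = np.array([0] * 900 + [1] * 100)                  # imbalanced classes

full = leading_pc(X)
for name, sampler in [("random", random_sample), ("stratified", stratified_sample)]:
    Xs, ys = sampler(X, y, 0.1, rng)
    similarity = abs(full @ leading_pc(Xs))            # |cos| between the two directions
    print(name, len(ys), np.bincount(ys), round(float(similarity), 3))
```

Stratified sampling guarantees that the class proportions of the reduced sample match the full data. Random sampling preserves them only on average.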

Slide 21: RQ8: When is FE useful for SL?
The Kaiser-Meyer-Olkin (KMO) criterion takes into account both the total and the partial correlations between features:
IF KMO > 0.5 THEN apply PCA
This is a general recommendation, but it rarely works in the context of SL.
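The overall KMO measure is computed as KMO = Σ r_ij² / (Σ r_ij² + Σ p_ij²), with all sums over i ≠ j. Here r_ij are the simple correlations and p_ij the partial correlations, which can be obtained from the inverse of the correlation matrix. A minimal sketch with the slide's decision rule follows. The function name and the one-factor toy data are assumptions.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    # Partial correlation of each pair, controlling for all other variables
    d = np.sqrt(np.diag(inv))
    P = -inv / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)              # mask selecting the i != j entries
    r2 = (R[off] ** 2).sum()
    p2 = (P[off] ** 2).sum()
    return r2 / (r2 + p2)

rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
X = factor + 0.5 * rng.normal(size=(500, 4))           # four indicators of one common factor
score = kmo(X)
print(round(float(score), 2), "-> apply PCA" if score > 0.5 else "-> PCA not recommended")
```

Note that this rule looks only at the features and never at the class labels. That may be one reason why the slide finds it rarely works for SL.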

Slide 22: RQ9: What is the effect of FE on interpretability?
Pechenizkiy M., Tsymbal A., Puuronen S. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004)
Interpretability refers to whether a classifier is easy to understand:
– rule-based classifiers, such as decision trees and association rules, are very easy to interpret;
– neural networks and other connectionist and “black-box” classifiers have low interpretability.
FE enables:
- new concepts, and hence new understanding;
- a summary of the information from a large number of features in a limited number of components;
- insight into the importance of the original features, since the transformation formulae provide this information;
- a better RS, and hence a better neighbourhood and better interpretability by analogy with similar medical cases;
- visual analysis by projecting the data onto 2D or 3D plots.
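Two of the listed benefits can be sketched directly. The first is reading feature importance off the transformation formulae: here, as one simple assumed reading, original features are ranked by the magnitude of their weight in the first principal component of standardised data. The second is projecting the data onto the first two components for a 2D plot. The function names, the feature names f1–f3 and the data are hypothetical.

```python
import numpy as np

def pc_weights_ranking(X, names):
    """Original features ranked by |weight| in the first principal component of standardised data."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    first = vecs[:, np.argmax(vals)]
    order = np.argsort(np.abs(first))[::-1]
    return [(names[i], round(float(first[i]), 2)) for i in order]

def project_2d(X):
    """Scores on the first two principal components, e.g. for a 2D scatter plot."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ vecs[:, np.argsort(vals)[::-1][:2]]

rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([base + 0.1 * rng.normal(size=200),   # f1 and f2 share a common source
                     base + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])               # f3 is unrelated
print(pc_weights_ranking(X, ["f1", "f2", "f3"]))
print(project_2d(X).shape)                                # → (200, 2)
```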

Slide 23: RQ9: Feature extraction and interpretability (cont.)
Objectivity of interpretability:
- The assessment of interpretability relies on the user’s perception of the classifier.
- The assessment of an algorithm’s practicality depends largely on the user’s background, preferences and priorities.
- Most of the characteristics related to practicality can be described only by reporting users’ subjective evaluations.
- Thus,
  – interpretability issues are disputable and difficult to evaluate;
  – many conclusions on interpretability are relative and subjective.
- Collaboration between DM researchers and domain experts is needed for further analysis of interpretability issues.
Pechenizkiy M., Tsymbal A., Puuronen S. PCA-based feature transformation for classification: issues in medical diagnostics (Article II, CBMS’2004)

Slide 24: RQ10: Framework for DM strategy selection
Pechenizkiy M. DM strategy selection via empirical and constructive induction (Article IX, DBA’05)

Slide 25: Additional slides …

Slide 26: Meta-learning

Slide 27: New research framework for DM research

Slide 28: New research framework for DM research, following the Hevner et al. framework

Slide 29: Some multidisciplinary research
- Pechenizkiy M., Puuronen S., Tsymbal A. Why Data Mining Does Not Contribute to Business? In: C. Soares et al. (Eds.), Proc. of Data Mining for Business Workshop, DMBiz (ECML/PKDD’05), Porto, Portugal, pp
- Pechenizkiy M., Puuronen S., Tsymbal A. Competitive advantage from Data Mining: Lessons learnt in the Information Systems field. In: IEEE Workshop Proc. of DEXA’05, 1st Int. Workshop on Philosophies and Methodologies for Knowledge Discovery PMKD’05, IEEE CS Press, pp (Invited paper).
- Pechenizkiy M., Puuronen S., Tsymbal A. Does the relevance of data mining research matter? (resubmitted as a book chapter to) Foundations of Data Mining, Springer.
- Pechenizkiy M., Tsymbal A., Puuronen S. Knowledge Management Challenges in Knowledge Discovery Systems. In: IEEE Workshop Proc. of DEXA’05, 6th Int. Workshop on Theory and Applications of KM, TAKMA’05, IEEE CS Press, pp

Slide 30: Some applications
- Pechenizkiy M., Tsymbal A., Puuronen S., Shifrin M., Alexandrova I. Knowledge Discovery from Microbiology Data: Many-sided Analysis of Antibiotic Resistance in Nosocomial Infections. In: K.D. Althoff et al. (Eds.), Post-Conference Proc. of 3rd Conf. on Professional Knowledge Management: Experiences and Visions, LNAI 3782, Springer Verlag, pp
- Pechenizkiy M., Tsymbal A., Puuronen S. Supervised Learning and Local Dimensionality Reduction within Natural Clusters: Biomedical Data Analysis (T-ITB, “Mining Biomedical Data”)
- Tsymbal A., Pechenizkiy M., Cunningham P., Puuronen S. Dynamic Integration of Classifiers for Handling Concept Drift (submitted to the Special Issue on Application of Ensembles, Information Fusion, Elsevier)

Slide 31: Contact info
Mykola Pechenizkiy, Department of Computer Science and Information Systems, University of Jyväskylä, FINLAND
THANK YOU!
MS PowerPoint slides of recent talks and full texts of selected publications are available online at:

