A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search.

A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan. International Workshop on Similarity Search, 1-2 September, 1999, Florence (Italy)

Authors
Seppo Puuronen - Department of Computer Science and Information Systems, University of Jyvaskyla, FINLAND
Vagan Terziyan - Department of Artificial Intelligence, Kharkov State Technical University of Radioelectronics, UKRAINE

Contents
- The Research Problem and Goal
- Basic Concepts
- External Similarity Evaluation
- Evaluation of Classifiers' Competence
- An Example
- Internal Similarity Evaluation
- Conclusions

The Research Problem During the past several years, in a variety of application domains, researchers in machine learning, computational learning theory, pattern recognition and statistics have tried to combine efforts to learn how to create and combine an ensemble of classifiers. The primary goal of combining several classifiers is to obtain a more accurate prediction than can be obtained from any single classifier alone.

Goal
- The goal of this research is to develop a simple similarity evaluation technique to be used for classification problems based on an ensemble of classifiers.
- Classification here means finding an appropriate class among the available ones for a certain instance, based on the classifications produced by an ensemble of classifiers.

Basic Concepts: Training Set (TS)
The TS of an ensemble of classifiers is a quadruple <D, C, S, P>, where:
- D is the set of instances D1, D2, ..., Dn to be classified;
- C is the set of classes C1, C2, ..., Cm that are used to classify the instances;
- S is the set of classifiers S1, S2, ..., Sr, which select classes to classify the instances;
- P is the set of semantic predicates that define relationships between D, C, and S.

Basic Concepts: Semantic Predicate P
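The slide above carries only a title, so here is a minimal sketch of how the predicate P might be encoded, assuming P(Di, Cj, Sk) takes the value +1 when classifier Sk selects class Cj for instance Di, -1 when it refuses it, and 0 when it abstains (the convention used in the example slides later in the deck):

```python
# Hypothetical encoding of the semantic predicate P(D_i, C_j, S_k).
# Values: +1 = classifier selects the class, -1 = refuses it, 0 = abstains.
# Shape: n instances x m classes x r classifiers.

def make_predicate(n, m, r):
    """Create an empty predicate table with every classifier abstaining."""
    return [[[0] * r for _ in range(m)] for _ in range(n)]

P = make_predicate(n=3, m=5, r=4)   # 3 papers, 5 topics, 4 referees
P[0][3][1] = +1   # S2 selects class C4 for instance D1
P[0][1][1] = -1   # S2 refuses class C2 for instance D1
```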

Problem 1: Deriving External Similarity Values
(diagram: relations among instances, classes, and classifiers)

External Similarity Values
External Similarity Values (ESV) are binary relations DC, SC, and SD between the elements of (sub)sets of D and C, of S and C, and of S and D. ESV are based on the total support among all the classifiers voting for the appropriate classification (or refusing to vote).
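A sketch of how one ESV entry could be accumulated, assuming the total support for pairing instance Di with class Cj is simply the sum of the classifiers' votes (an assumption; the slides do not reproduce the exact formula):

```python
def dc_support(P, i, j):
    """Total support among all classifiers for pairing instance D_i
    with class C_j (votes: +1 select, -1 refuse, 0 abstain)."""
    return sum(P[i][j])

# Toy table: one instance, one class, four classifiers voting +1, 0, +1, -1.
P = [[[1, 0, 1, -1]]]
print(dc_support(P, 0, 0))   # 1  (two selections outweigh one refusal)
```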

Problem 2: Deriving Internal Similarity Values
(diagram: relations among instances, classes, and classifiers)

Internal Similarity Values
Internal Similarity Values (ISV) are binary relations between two subsets of D, two subsets of C, and two subsets of S. ISV are based on the total support among all the classifiers voting for the appropriate connection (or refusing to vote).

Why Do We Need Similarity Values (or a Distance Measure)?
- Distance between instances is used to recognize the nearest neighbors of any classified instance.
- Distance between classes is necessary to define the misclassification error during the learning phase.
- Distance between classifiers is useful to evaluate the weights of all classifiers, so that they can be integrated by weighted voting.
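The weighted-voting idea in the last bullet above can be sketched as follows; the class labels and weights here are hypothetical, not taken from the deck:

```python
def weighted_vote(votes, weights):
    """Combine classifiers' class choices by weighted voting.
    votes: list of class labels, one per classifier.
    weights: matching list of classifier weights."""
    tally = {}
    for cls, w in zip(votes, weights):
        tally[cls] = tally.get(cls, 0.0) + w
    return max(tally, key=tally.get)

# Three classifiers pick classes; C1 collects 0.2 + 0.3 = 0.5,
# while the single higher-weighted vote for C2 collects 0.9.
print(weighted_vote(["C1", "C2", "C1"], [0.2, 0.9, 0.3]))   # C2
```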

Deriving External Relation DC: How Well a Class Fits the Instance
(diagram: instances, classes, classifiers)

Deriving External Relation SC: Measures a Classifier's Competence in the Area of Classes
- The value of the relation (Sk, Cj) represents the total support that the classifier Sk obtains by selecting (or refusing to select) the class Cj to classify all the instances.

Example of SC Relation
(diagram: classifiers, instances, classes)

Deriving External Relation SD: Measures a Classifier's "Competence" in the Area of Instances
- The value of the relation (Sk, Di) represents the total support that the classifier Sk receives by selecting (or refusing to select) all the classes to classify the instance Di.

Example of SD Relation
(diagram: instances, classes, classifiers)

Standardizing External Relations to the Interval [0, 1]
- n is the number of instances;
- m is the number of classes;
- r is the number of classifiers.
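One plausible standardization, assuming each DC value is a sum of r votes in [-r, r] (and each SC value a sum over n instances in [-n, n]), maps the support linearly onto [0, 1]. This is an assumption for illustration, not the authors' exact formula:

```python
def standardize_dc(support, r):
    """Map a DC support value from [-r, r] linearly onto [0, 1]."""
    return (support + r) / (2 * r)

def standardize_sc(support, n):
    """Map an SC support value, accumulated over n instances, onto [0, 1]."""
    return (support + n) / (2 * n)

print(standardize_dc(4, r=4))    # 1.0: unanimous selection by 4 classifiers
print(standardize_dc(-4, r=4))   # 0.0: unanimous refusal
print(standardize_dc(0, r=4))    # 0.5: no net support
```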

Competence of a Classifier
(diagram: classifier linking the conceptual pattern of features of instance Di to the conceptual pattern of the class definition Cj; competence in the instance area vs. competence in the area of classes)

Classifier's Evaluation: Competence Quality in the Instance Area
- A measure of the "classification abilities" of a classifier relative to the instances, from the support point of view.

Classifier's Evaluation: Competence Quality in the Area of Classes
- A measure of the "classification abilities" of a classifier in the correct use of classes, from the support point of view.

Quality Balance Theorem
The evaluation of a classifier's competence (ranking, weighting, quality evaluation) does not depend on the competence area, whether the "real world of instances" or the "conceptual world of classes", because both competence values are always equal.
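The theorem's intuition can be checked numerically with a sketch. The assumption here (not spelled out on the slide) is that competence in the instance area sums a classifier's per-(instance, class) support over classes first and then over instances, while competence in the class area sums in the opposite order; the two totals coincide because both aggregate exactly the same terms:

```python
def competence_by_instances(P, k):
    """Sum classifier S_k's support instance by instance."""
    return sum(sum(P[i][j][k] for j in range(len(P[i])))
               for i in range(len(P)))

def competence_by_classes(P, k):
    """Sum classifier S_k's support class by class."""
    m = len(P[0])
    return sum(sum(P[i][j][k] for i in range(len(P)))
               for j in range(m))

# Toy predicate table: 2 instances x 2 classes x 1 classifier.
P = [[[1], [-1]],
     [[0], [1]]]
a = competence_by_instances(P, 0)
b = competence_by_classes(P, 0)
print(a, b)   # 1 1 -- equal, as the theorem predicts
```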

Proof...

An Example
- Suppose that four classifiers have to classify three papers submitted to a conference with five conference topics.
- The classifiers should define their selection of an appropriate conference topic for every paper.
- The final goal is to obtain a cooperative result of all the classifiers concerning the "paper - topic" relation.

C (Classes) Set in the Example

Classes (Conference Topics)  | Notation
-----------------------------|---------
AI and Intelligent Systems   | C1
Analytical Technique         | C2
Real-Time Systems            | C3
Virtual Reality              | C4
Formal Methods               | C5

S (Classifiers) Set in the Example

Classifiers ("Referees") | Notation
-------------------------|---------
A.B.                     | S1
H.R.                     | S2
M.L.                     | S3
R.S.                     | S4

D (Instances) Set in the Example

Instances (Papers) | Notation
-------------------|---------
"Paper 1"          | D1
"Paper 2"          | D2
"Paper 3"          | D3

Selections Made for the Instance "Paper 1"
(table: P(D1, Cj, Sk) values for classes C1-C5 and classifiers S1-S4; the row of S2 is described below)
Classifier H.R. (S2) considers "Paper 1" to fit the topic Virtual Reality (selects C4) and refuses to include it in Analytical Technique (C2) or Formal Methods (C5). H.R. neither chooses nor refuses the AI and Intelligent Systems (C1) or Real-Time Systems (C3) topics to classify "Paper 1".

Selections Made for the Instance "Paper 2"
(table: P(D2, Cj, Sk) values for classes C1-C5 and classifiers S1-S4)

Selections Made for the Instance "Paper 3"
(table: P(D3, Cj, Sk) values for classes C1-C5 and classifiers S1-S4)

Result of Cooperative Paper Classification Based on DC Relation

Results of Classifiers' Competence Evaluation (based on SC and SD sets)
Proposals obtained from the classifier A.B. should be accepted if they concern the topics Real-Time Systems and Virtual Reality or the instances "Paper 1" and "Paper 3", and rejected if they concern AI and Intelligent Systems or "Paper 2". In some cases it may be possible to accept classification proposals from A.B. concerning Analytical Technique and Formal Methods. All four classifiers are expected to give acceptable proposals concerning "Paper 3", and only the suggestion of the classifier M.L. can be accepted if it concerns "Paper 2".

Deriving Internal Similarity Values
- Via one intermediate set
- Via two intermediate sets

Internal Similarity for Classifiers: Instance-Based Similarity
(diagram: classifiers related through instances)

Internal Similarity for Classifiers: Class-Based Similarity
(diagram: classifiers related through classes)

Internal Similarity for Classifiers: Class-Instance-Based Similarity
(diagram: classifiers related through both classes and instances)
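A sketch of an internal similarity value between two classifiers, under the assumption that similarity is the fraction of (instance, class) pairs on which the two classifiers cast the same vote, joint abstention included (the slides leave the exact formula to the diagrams):

```python
def classifier_similarity(P, k, l):
    """Fraction of (instance, class) pairs on which classifiers
    S_k and S_l agree (same vote: select, refuse, or abstain)."""
    pairs = [(i, j) for i in range(len(P)) for j in range(len(P[i]))]
    agree = sum(1 for i, j in pairs if P[i][j][k] == P[i][j][l])
    return agree / len(pairs)

# Two classifiers over 1 instance x 3 classes: they agree on 2 of 3 pairs.
P = [[[1, 1], [-1, 0], [0, 0]]]
print(classifier_similarity(P, 0, 1))   # 0.666...
```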

Conclusion
- We discussed methods of deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported classification result and to evaluate the classifiers according to their competence.
- We also discussed relations between elements taken from the same set: instances, classes, or classifiers. This can be used, for example, to divide classifiers into groups of similar competence relative to the instance-class environment.