Issues with Data Mining

Data Mining Involves Generalization

- Data mining (machine learning) learns generalizations of the instances in the training data. E.g. a decision tree learnt from the weather data captures generalizations about how to predict values of the Play attribute.
- These generalizations predict (or describe) the behaviour of instances beyond the training data.
- In other words, data mining extracts knowledge from raw data, and this knowledge drives the end user's decision-making process.

Generalization as Search

- The process of generalization can be viewed as searching a space of all possible patterns or models for a pattern that fits the data.
- This view provides a standard framework for understanding all data mining techniques. E.g. decision tree learning involves searching through all possible decision trees.
- Lecture 4 shows two example decision trees that fit the weather data; one of them (Example 2) is a better generalization than the other.

Bias

- Important choices made in a data mining system:
  - Representation language: the language chosen to represent the patterns or models.
  - Search method: the order in which the space is searched.
  - Model pruning method: the way overfitting to the training data is avoided.
- Accordingly, each data mining scheme involves a language bias, a search bias, and an overfitting-avoidance bias.

Language Bias

- Different languages are used for representing patterns and models, e.g. rules and decision trees.
- A concept fits a subset of the training data, and that subset can be described as a disjunction of rules. E.g. a classifier for the weather data can be represented as a disjunction of rules (see the sketch below).
- Languages differ in their ability to represent patterns and models. When a language with weaker representational ability is used, the data mining system may not achieve good performance.
- Domain knowledge (external to the training data) helps to cut down the search space.
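To make the 'disjunction of rules' representation concrete, here is a minimal Python sketch. The rule set is one plausible classifier for the standard weather data, written for illustration only; it is not the specific example from the lectures.

```python
def play(outlook, humidity, windy):
    """Classify a weather instance using a disjunction of rules.

    Each 'if' branch is one rule; the classifier is the disjunction
    of the rules that predict 'yes'.
    """
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny" and humidity == "normal":
        return "yes"
    if outlook == "rainy" and not windy:
        return "yes"
    return "no"  # every remaining combination

print(play("sunny", "high", False))  # -> no
```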

Search Bias

- An exhaustive search over the search space is computationally expensive, so search is sped up by using heuristics. E.g. pure child nodes indicate good tree stumps in decision tree learning.
- By definition, heuristics cannot guarantee optimal patterns or models. E.g. using information gain may mislead us into selecting a suboptimal attribute at the root (a sketch of the information-gain computation follows this list).
- More complex search strategies are possible: those that pursue several alternatives in parallel, and those that allow backtracking.
- There is also a high-level search bias:
  - General-to-specific: start with a root node and grow the decision tree to fit the data.
  - Specific-to-general: choose specific examples in each class and then generalize the class, e.g. by including the k-nearest-neighbour examples.
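The information-gain heuristic mentioned above can be stated in a few lines. This is a minimal sketch, assuming instances are dicts of nominal attribute values; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on one attribute."""
    n = len(labels)
    # Group the class labels by the attribute's value.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

rows = [{"outlook": "sunny"}, {"outlook": "overcast"},
        {"outlook": "rainy"}, {"outlook": "sunny"}]
labels = ["no", "yes", "yes", "no"]
print(information_gain(rows, labels, "outlook"))  # 1.0 bit on this toy subset
```

The heuristic is greedy: it scores each attribute in isolation, which is exactly why it can be misled into a suboptimal choice at the root.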

Overfitting-Avoidance Bias

- We want to search for the 'best' patterns and models; other things being equal, simpler models are preferred.
- Two strategies:
  - Start with the simplest model and stop building the model when it starts to become complex.
  - Start with a complex model and prune it to make it simpler.
- Each strategy biases the search in a different way.
- Biases are unavoidable in practice: each data mining scheme involves a particular configuration of biases, and those biases may serve some problems well and others poorly. There is no universally best learning scheme! We saw this in our practicals with Weka.

Combining Multiple Models

- Because there is no ideal data mining scheme, it is useful to combine multiple models.
- The idea is democratic: decisions are made on the basis of collective wisdom. Each model acts like an expert, using its knowledge to make decisions.
- Three general approaches: bagging, boosting and stacking.
- Bagging and boosting both take a vote on the class predictions of the different models. Bagging uses a simple average of the votes, while boosting uses a weighted average that gives more weight to the more 'knowledgeable' experts.
- Boosting is generally considered the most effective of the three.

Bias-Variance Decomposition

- Assume:
  - infinitely many training data sets of the same size, n
  - infinitely many classifiers, one trained on each of these data sets
- For any learning scheme:
  - Bias = the expected error of the classifier that remains even after increasing the training data infinitely.
  - Variance = the expected error due to the particular training set used.
  - Total expected error = bias + variance (the squared-loss form of this decomposition is written out below).
- Combining multiple classifiers decreases the expected error by reducing the variance component.
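For squared-error loss the decomposition can be written out explicitly. Here f is the true target, h_D is the model trained on training set D, and the expectation is over training sets; the irreducible noise term is omitted to match the slide's two-component form. (For 0-1 classification error the decomposition is more delicate, but the intuition is the same.)

```latex
\mathbb{E}_D\!\left[(h_D(x) - f(x))^2\right]
  = \underbrace{\left(\mathbb{E}_D[h_D(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(h_D(x) - \mathbb{E}_D[h_D(x)]\right)^2\right]}_{\text{variance}}
```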

Bagging

- Bagging stands for 'bootstrap aggregating'. It combines equally weighted predictions from multiple models.
- Bagging exploits instability in learning schemes. Instability: a small change in the training data results in a big change in the model.
- Idealized version for a classifier:
  - Collect several independent training sets.
  - Build a classifier from each training set, e.g. learn a decision tree from each.
  - The class of a test instance is the prediction that receives the most votes from all the classifiers.
- In practice it is not feasible to obtain several independent training sets, hence the bootstrap sampling in the algorithm on the next slide.

Bagging Algorithm

Involves two stages: model generation and classification.

Model generation:
- Let n be the number of instances in the training data.
- For each of t iterations:
  - Sample n instances with replacement from the training data.
  - Apply the learning algorithm to the sample.
  - Store the resulting model.

Classification:
- For each of the t models, predict the class of the instance using that model.
- Return the class that has been predicted most often.

A runnable sketch of both stages follows.
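A minimal Python sketch of the algorithm, under the assumption that a 'model' is simply a function from instance to class and that learn(data, labels) returns such a function; all names here are illustrative.

```python
import random
from collections import Counter

def bagging_fit(learn, data, labels, t):
    """Model generation: train one model per bootstrap sample."""
    n = len(data)
    models = []
    for _ in range(t):
        # Sample n instances with replacement from the training data.
        idx = [random.randrange(n) for _ in range(n)]
        models.append(learn([data[i] for i in idx],
                            [labels[i] for i in idx]))
    return models

def bagging_predict(models, instance):
    """Classification: return the class predicted most often."""
    votes = Counter(model(instance) for model in models)
    return votes.most_common(1)[0][0]

# Toy base learner: 1-nearest neighbour on one numeric feature.
def learn(sample, sample_labels):
    return lambda x: min(zip(sample, sample_labels),
                         key=lambda pair: abs(pair[0] - x))[1]

models = bagging_fit(learn, [1.0, 2.0, 8.0, 9.0], ["a", "a", "b", "b"], t=25)
print(bagging_predict(models, 1.5))  # -> 'a' (almost surely)
```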

Boosting

- Multiple data mining methods may complement each other, each method performing well on a subset of the data. Boosting combines such complementary models using weighted voting.
- Boosting is iterative: each new model is built to overcome the deficiencies of the earlier models.
- There are several variants of boosting; AdaBoost.M1 is based on the idea of giving weights to instances.
- Like bagging, boosting involves two stages: model generation and classification.

Boosting Algorithm (AdaBoost.M1)

Model generation:
- Assign equal weight to each training instance.
- For each of t iterations:
  - Apply the learning algorithm to the weighted dataset and store the resulting model.
  - Compute the error e of the model on the weighted dataset and store it.
  - If e = 0 or e >= 0.5, terminate model generation.
  - For each instance in the dataset: if the instance is classified correctly by the model, multiply its weight by e / (1 - e).
  - Normalize the weights of all instances.

Classification:
- Assign a weight of zero to all classes.
- For each of the t (or fewer) models, add -log(e / (1 - e)) to the weight of the class predicted by that model.
- Return the class with the highest weight.

A Python sketch of this procedure follows.
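A minimal sketch of AdaBoost.M1 as described above, reusing the functions-as-models convention of the bagging sketch. It assumes the base learner accepts instance weights; learners that cannot are often handled by resampling the data according to the weights instead. All names are illustrative.

```python
import math
from collections import defaultdict

def adaboost_fit(learn, data, labels, t):
    """Model generation: learn(data, labels, weights) -> model."""
    n = len(data)
    weights = [1.0 / n] * n                       # equal initial weights
    models = []                                   # (model, error) pairs
    for _ in range(t):
        model = learn(data, labels, weights)
        # Weighted error on the training data (weights sum to 1).
        e = sum(w for x, y, w in zip(data, labels, weights) if model(x) != y)
        if e == 0 or e >= 0.5:                    # perfect or too weak: stop
            break
        models.append((model, e))
        # Down-weight the correctly classified instances ...
        weights = [w * e / (1 - e) if model(x) == y else w
                   for x, y, w in zip(data, labels, weights)]
        # ... and renormalize so all weights again sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    return models

def adaboost_predict(models, instance):
    """Classification: weighted vote, weight -log(e / (1 - e)) per model."""
    score = defaultdict(float)                    # class weights start at 0
    for model, e in models:
        score[model(instance)] += -math.log(e / (1 - e))
    return max(score, key=score.get)              # assumes >= 1 model kept
```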

Stacking

- Bagging and boosting combine models of the same type, e.g. a set of decision trees. Stacking is applied to models of different types.
- Voting may not work well when the different models do not perform comparably: if two out of three classifiers perform poorly, the vote is dominated by their bad predictions.
- Stacking instead uses a metalearner to combine the base learners:
  - Base learners: level-0 models.
  - Metalearner: level-1 model.
- The predictions of the base learners are fed as inputs to the metalearner.
- The base learners' predictions on their own training data cannot be used as metalearner input, because they are optimistically accurate; instead, use cross-validation predictions from the base learners.
- Because the real classification work is done by the base learners, the metalearner can use a simple learning scheme.

A sketch of the two levels follows.
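A minimal sketch of the two levels, again treating models as functions; the fold assignment and all names are illustrative assumptions, not the lecture's notation.

```python
def crossval_predictions(learn, data, labels, k=5):
    """Level-0 predictions made only on held-out folds, so that each
    instance is predicted by a model that never saw it."""
    n = len(data)
    preds = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = learn([data[i] for i in train], [labels[i] for i in train])
        for i in range(fold, n, k):               # the held-out instances
            preds[i] = model(data[i])
    return preds

def stacking_fit(base_learners, meta_learn, data, labels):
    """Train the level-1 model on cross-validated level-0 predictions."""
    level0_preds = [crossval_predictions(learn, data, labels)
                    for learn in base_learners]
    meta_rows = list(zip(*level0_preds))          # one row per instance
    meta_model = meta_learn(meta_rows, labels)
    # The level-0 models used at prediction time see all the data.
    models = [learn(data, labels) for learn in base_learners]
    return models, meta_model

def stacking_predict(models, meta_model, instance):
    return meta_model(tuple(m(instance) for m in models))
```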

Combining Models Using Weka

- Weka offers methods to perform bagging, boosting and stacking over classifiers.
- In the Explorer, under the Classify tab, expand the 'meta' section of the hierarchical menu.
- AdaBoostM1 (one of the boosting methods) misclassifies only 7 of the 150 instances of the Iris data.
- You are encouraged to try these methods on your own.