Report #1 By Team: Green Ensemble AusDM 2009 ENSEMBLE Analytical Challenge: Rules, Objectives, and Our Approach

What is it all about? The purpose of this challenge is to somehow combine the individual models to give the best overall model performance. We call it Ensembling.

What do we mean by Ensembling? Ensembling, Blending, and Committee of Experts are various terms for the process of improving predictive accuracy by combining models built with different algorithms, or with the same algorithm under different parameter settings. It is a technique frequently used to win predictive modelling competitions, but how it is actually achieved in practice may be somewhat arbitrary.

Why Ensembling? Remember the NETFLIX Prize? Over 1,000 sets of predictions were provided. Taking the mean prediction over all of these models is only slightly worse than the best individual model, while the mean of the best 10 models is significantly better than any individual model.

That’s why we are after Ensembling!

Data (1/2)
RMSE Small - 200 sets of predictions for 15,000 ratings.
AUC Small - 200 sets of predictions for 15,000 ratings.
RMSE Medium - sets of predictions for 20,000 ratings.
AUC Medium - sets of predictions for 20,000 ratings.
RMSE Large - 1,151 sets of predictions for 50,000 ratings.
AUC Large - 1,151 sets of predictions for 50,000 ratings.

Data (2/2)
The predicted rating values have been converted to integers by rounding to 3 decimal places and multiplying by 1,000, so 1000 < Prediction < 5000.
Targets = {1000, 2000, 3000, 4000, 5000} for the RMSE challenge.
Targets = {-1, 1} for the AUC challenge.
Each of the data sets is split into 2 files, one for Training (Target/Rating provided) and one for Scoring (Target/Rating withheld).
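
As a hedged illustration of this encoding (the variable names and the sample value are ours, not from the challenge kit):

    % A raw prediction of 3.14159 stars, rounded to 3 decimals and
    % scaled by 1,000, becomes the integer 3142.
    raw = 3.14159;
    encoded = round(raw * 1000);   % equivalent to round(raw, 3) * 1000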

Our First Approach: Weighted Averaging (1/3)
A: 200 models' movie rating predictions (one column per model).
Target: the movies' real ratings.
We blend by taking a weighted average of the models, Prediction = A*w. How to find such weights? Our approach is to find the vector w such that A*w ~= Target.

Weighted Averaging (2/3)
A is not square, so we must use the pseudo-inverse of A. It's easy in MATLAB: w = A\Target. w is the least-squares solution of the previous equation. The formal mathematical problem: minimize ||A*w - Target||^2 over w.
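
A minimal MATLAB sketch of this blend, using synthetic stand-ins for the challenge files (the sizes match the Small RMSE set; A, target, and the random data are our assumptions, not the official loader):

    n = 15000; m = 200;                     % ratings x models
    A = 1000 + 4000*rand(n, m);             % hypothetical prediction matrix
    target = 1000*randi(5, n, 1);           % targets in {1000,...,5000}
    w = A \ target;                         % least-squares solution of A*w ~= target
    rmse = sqrt(mean((A*w - target).^2));   % training RMSE of the blend

At scoring time the same w is simply applied to the test prediction matrix, which is exactly the "just multiply w by the test matrix" step described on the next slide.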

Weighted Averaging (3/3)
The result is above the baselines, good for us! RMSE =
The problem is that it is heavily overfitted to the Train data set. Also, we don't use any information from the Test data set: we just multiply w by the test matrix and get the results.

Our Second Approach: Ensemble Selection (1/3)
Implemented from this paper: Ensemble Selection from Libraries of Models [R. Caruana, A. Niculescu-Mizil, Proceedings of ICML'04].
The winning team of KDD Cup 2009 (Orange), the IBM team, also used this method.
Just like weighted averaging, but the weights are found by hill-climbing search: search for models which improve RMSE.

Ensemble Selection (2/3)
The ensemble selection procedure (sketched in code below):
1. Start with the empty ensemble, initialized with the N best models (N ~ 5-25).
2. Select a random bag of models from the library. Add to the ensemble the model in the bag that maximizes the ensemble's performance on the error metric over a hillclimb (validation) set.
3. Repeat Step 2 for a fixed number of iterations, or until all the models have been used.
4. Return, from the nested set of ensembles, the one with maximum performance on the hillclimb (validation) set.
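
A hedged MATLAB sketch of this loop on synthetic stand-in data; the bag size (50), iteration count (200), and N = 10 are our illustrative guesses, not values from the paper or our runs:

    n = 15000; m = 200;
    A_val = 1000 + 4000*rand(n, m);            % hillclimb-set predictions
    t = 1000*randi(5, n, 1);                   % hillclimb-set targets
    rmse = @(p) sqrt(mean((p - t).^2));
    [~, order] = sort(arrayfun(@(j) rmse(A_val(:,j)), 1:m));
    sel = order(1:10);                         % step 1: start from the N best models
    best_sel = sel;
    best_rmse = rmse(mean(A_val(:,sel), 2));
    for it = 1:200                             % step 3: fixed number of iterations
        bag = randperm(m, 50);                 % step 2: a random bag of models
        scores = arrayfun(@(j) rmse(mean(A_val(:,[sel j]), 2)), bag);
        [s, k] = min(scores);                  % best addition within the bag
        sel = [sel bag(k)];                    % models may repeat (selection with replacement)
        if s < best_rmse, best_rmse = s; best_sel = sel; end   % step 4: track the best nested ensemble
    end
    blend = mean(A_val(:, best_sel), 2);       % final averaged ensemble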

Ensemble Selection (3/3)
It's a fast search, but there are better searches than simple hill climbing! RMSE slightly improved: RMSE =
The authors argued their method performs better than many methods such as SVM, ANN, BAG-DT, KNN, and BST-DT. We compared Ensemble Selection and an ANN on our problem. They were right!

Some Statistics about the Data Set

How good are these 200 models? Predicting 1000s

How good are these 200 models? Predicting 2000s

How good are these 200 models? Predicting 3000s

How good are these 200 models? Predicting 4000s

How good are these 200 models? Predicting 5000s

Noise: difference from target

Some other Ideas
Discretize the predictions, then find frequent motifs in each set of movies rated 1000 to 5000.
Use a GA to search for better weights.
Use estimation theory (the noises are Gaussian or semi-Gaussian).
Use some metric on each row of the test data set to determine its distance to selected rows of the training data sets; metrics could be RMSE, KL divergence, cos(θ), ... (a small sketch follows below).
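
For the last idea, a small sketch of the three candidate row metrics on hypothetical prediction rows x and y (normalizing the rows to distributions for KL is our assumption):

    x = 1000 + 4000*rand(1, 200);                 % one test row of 200 predictions
    y = 1000 + 4000*rand(1, 200);                 % one training row
    rmse_d = sqrt(mean((x - y).^2));              % RMSE distance
    p = x / sum(x);  q = y / sum(y);              % normalize rows to distributions
    kl_d = sum(p .* log(p ./ q));                 % KL divergence
    cos_sim = dot(x, y) / (norm(x) * norm(y));    % cos(theta) similarity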

Green Ensemble: E. Khoddam Mohammadi, M. J. Mahzoon, A. Askari, A. Ghaffari Nejad

Thanks for your attention.