Konstantina Christakopoulou, Liang Zeng (Group G21)


Big Data: New Tricks for Econometrics
Varian, Hal R. "Big data: New tricks for econometrics." The Journal of Economic Perspectives (2014): 3-27.
Konstantina Christakopoulou, Liang Zeng (Group G21)
Related to Chapter 28: Data Mining

Motivation: Machine Learning for Economic Transactions. Linear regression is not enough! Big data means large sample sizes, many features (so variables must be selected), and relationships that are not only linear.

Connection to the Course: Decision Trees, e.g. ID3. Challenges of ID3: cannot handle continuous attributes; prone to outliers.
1. C4.5 and Classification And Regression Trees (CART) can handle: + continuous and discrete attributes + missing attribute values + over-fitting, via post-pruning
2. Random Forests: an ensemble of decision trees. Randomization (sampling the training data and sampling the attributes) leads to better accuracy!

ID3 Decision Tree

Classification and Regression Trees (CART). A classification tree is used when the predicted outcome is the class to which the data belongs. A regression tree is used when the predicted outcome is a real number (e.g. the age of a house, or a patient's length of stay in a hospital).
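The core of CART is the binary split: for a classification tree, pick the threshold that minimizes the weighted Gini impurity of the two child nodes. A minimal sketch in pure Python, on made-up illustration data (the ages and labels are assumptions, not taken from the slides):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Best binary split x <= t on one numeric feature.

    Returns (threshold, weighted Gini impurity of the two children).
    """
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        n = len(ys)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: passenger ages and survival labels (illustrative only).
ages = [4, 8, 15, 21, 35, 40, 62, 70]
survived = [1, 1, 1, 0, 0, 0, 0, 0]
t, score = best_split(ages, survived)
print(t, score)  # -> 15 0.0 : the split age <= 15 separates the classes perfectly
```

A full CART implementation applies this search recursively (over all features) to grow the tree, then post-prunes it.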

Classification and Regression Trees (CART): predicting Titanic survivors using age and class.

Classification and Regression Trees (CART): a CART for survivors of the Titanic, built with the R language.

Random Forests

Random Forests = Decision Tree Learning + many decision trees.
Growing a forest:
+ Choose a bootstrap sample and start to grow a tree.
+ At each node, choose a random sample of the predictors to make the next decision.
+ Repeat many times to grow a forest of trees.
+ For prediction: have each tree make its prediction, then take a majority vote.
One tree vs. a forest:
+ A single decision tree is trained on all learning samples and is prone to distortions, e.g. outliers.
+ A random forest trains each decision tree on a random subset of the samples, which reduces the effect of outliers (less overfitting).
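The recipe on this slide can be sketched in pure Python. For brevity each "tree" here is a one-feature threshold stump and the feature is re-sampled once per tree rather than at every node; the data, the stump learner, and all names are illustrative assumptions, not the slides' implementation:

```python
import random
from collections import Counter

random.seed(0)

def fit_stump(rows, labels, feature):
    """Threshold stump on one feature, split at the midpoint of the class means."""
    vals0 = [r[feature] for r, y in zip(rows, labels) if y == 0]
    vals1 = [r[feature] for r, y in zip(rows, labels) if y == 1]
    if not vals0 or not vals1:          # bootstrap drew a single class
        const = labels[0]
        return lambda r: const
    t = (sum(vals0) / len(vals0) + sum(vals1) / len(vals1)) / 2
    high_is_1 = sum(vals1) / len(vals1) > sum(vals0) / len(vals0)
    return lambda r: int((r[feature] > t) == high_is_1)

def fit_forest(rows, labels, n_trees=25):
    trees = []
    for _ in range(n_trees):
        # Bootstrap: sample the rows with replacement.
        idx = [random.randrange(len(rows)) for _ in rows]
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        # Random predictor choice (a real forest re-samples predictors at every node).
        feature = random.randrange(len(rows[0]))
        trees.append(fit_stump(boot_rows, boot_labels, feature))
    return trees

def predict(trees, row):
    # Majority vote over the trees' predictions.
    return Counter(t(row) for t in trees).most_common(1)[0][0]

# Toy data: [age, fare], label 1 = survived (made-up numbers).
X = [[5, 30], [10, 25], [8, 40], [50, 8], [60, 5], [45, 10]]
y = [1, 1, 1, 0, 0, 0]
forest = fit_forest(X, y)
print([predict(forest, r) for r in X])
```

Because each tree sees a different bootstrap sample and a different predictor, individual trees can be weak, yet the majority vote is accurate.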

Boosting, Bagging, Bootstrap: randomization can help!
+ Bootstrap: choose (with replacement) a sample of the data.
+ Bagging: average across models estimated on several bootstrap samples.
+ Boosting: repeated estimation in which misclassified observations are given increasing weight; the final prediction is an average of the models.
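Bootstrap and bagging fit in a few lines. A minimal sketch where the "model" fitted on each bootstrap sample is simply the sample mean (the data values are made up for illustration):

```python
import random

random.seed(1)

data = [2.0, 3.5, 4.0, 5.5, 7.0, 8.5]

def bootstrap_sample(xs):
    """Draw len(xs) points from xs with replacement."""
    return [random.choice(xs) for _ in xs]

estimates = []
for _ in range(200):
    boot = bootstrap_sample(data)
    estimates.append(sum(boot) / len(boot))  # the "model": a sample mean

# Bagging: average the 200 bootstrap estimates.
bagged = sum(estimates) / len(estimates)
print(round(bagged, 2))  # close to the full-sample mean of about 5.08
```

With richer models such as deep decision trees, this same averaging is what reduces variance; boosting differs in that each round reweights the observations the previous models got wrong instead of resampling them uniformly.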

Thank you!