1 Big Data: New Tricks for Econometrics
Varian, Hal R. "Big data: New tricks for econometrics." The Journal of Economic Perspectives (2014): 3-27. Presented by Konstantina Christakopoulou and Liang Zeng, Group G21. Related to Chapter 28: Data Mining.

2 Motivation: Machine Learning for Economic Transactions
Linear regression is not enough: the data are big, there are many features (so variables must be selected), and the relationships are not only linear.

3 Connection to the Course: Decision Trees, e.g. ID3
Challenges of ID3: it cannot handle continuous attributes and is prone to outliers. 1. C4.5 and Classification And Regression Trees (CART) can handle continuous and discrete attributes, handle missing attributes, and control over-fitting by post-pruning. 2. Random Forests: an ensemble of decision trees, where randomization (sampling the training cases and sampling the attributes) leads to better accuracy.

4 ID3 Decision Tree
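The slide's example ID3 tree figure is not reproduced here. As a minimal sketch of the idea behind ID3 (a toy illustration, not the slide's figure), the R code below computes the information gain that ID3 uses to choose the attribute to split on; the 'weather' data frame and its columns are hypothetical.

```r
# Minimal sketch of ID3's split criterion (information gain) in base R.
# The 'weather' data frame below is a hypothetical toy example.
entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(p * log2(p))
}

info_gain <- function(data, attribute, target) {
  groups  <- split(data[[target]], data[[attribute]])
  weights <- sapply(groups, length) / nrow(data)
  entropy(data[[target]]) - sum(weights * sapply(groups, entropy))
}

weather <- data.frame(
  outlook = c("sunny", "sunny", "overcast", "rain", "rain"),
  windy   = c("no", "yes", "no", "no", "yes"),
  play    = c("no", "no", "yes", "yes", "no")
)

# ID3 greedily splits on the attribute with the highest information gain.
info_gain(weather, "outlook", "play")
info_gain(weather, "windy",   "play")
```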

5 Classification and Regression Trees (CART)
A classification tree is used when the predicted outcome is the class to which the data belong. A regression tree is used when the predicted outcome can be considered a real number (e.g. the age of a house, or a patient's length of stay in a hospital).
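A minimal sketch of this distinction in R, using the rpart package (which implements CART): method = "class" grows a classification tree and method = "anova" a regression tree. The 'patients' data frame and its column names are hypothetical.

```r
library(rpart)  # CART implementation in R

# Classification tree: the outcome is a class label.
# 'patients' is a hypothetical data frame; 'recovered' is a factor.
class_tree <- rpart(recovered ~ age + blood_pressure,
                    data = patients, method = "class")

# Regression tree: the outcome is a real number (length of stay in days).
reg_tree <- rpart(length_of_stay ~ age + blood_pressure,
                  data = patients, method = "anova")
```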

6 Classification and Regression Trees (CART)
Predict Titanic survivors using age and class.

7 Classification and Regression Trees (CART)
A CART for survivors of the Titanic, built using the R language.
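The slide's tree figure is not reproduced here. A minimal sketch of how such a tree could be fit in R with rpart, assuming a data frame with Survived, Age, and Pclass columns (these names are assumptions about a Titanic passenger dataset):

```r
library(rpart)

# 'titanic' is a hypothetical data frame with columns Survived, Age, Pclass.
fit <- rpart(factor(Survived) ~ Age + Pclass,
             data = titanic, method = "class")

# Plot the fitted tree and print its splitting rules.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
print(fit)
```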

8 Random Forests

9 Random Forests
Choose a bootstrap sample and start to grow a tree. At each node, choose a random sample of the predictors to make the next decision. Repeat many times to grow a forest of trees. For prediction, have each tree make its prediction and then take a majority vote.
Random Forest vs. decision tree learning: many decision trees vs. one tree; each tree grown on a random subset of the samples vs. on all learning samples; reduces the effect of outliers (less overfitting) vs. prone to distortions such as outliers.
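A minimal sketch of this procedure with the randomForest package in R; the 'titanic' data frame and its columns are the same hypothetical ones as above.

```r
library(randomForest)

# Each tree is grown on a bootstrap sample, and at each node only 'mtry'
# randomly chosen predictors are considered for the split.
# 'titanic' and its columns (Survived, Age, Pclass, Sex, Fare) are hypothetical.
rf <- randomForest(factor(Survived) ~ Age + Pclass + Sex + Fare,
                   data  = titanic,
                   ntree = 500,   # number of trees in the forest
                   mtry  = 2,     # predictors sampled at each split
                   na.action = na.omit)

# Prediction: each tree votes and the majority class wins.
predict(rf, newdata = titanic)
```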

10 Boosting, Bagging, Bootstrap
Randomization can help! Bootstrap: choose a sample (with replacement). Bagging: average across models estimated on several bootstrap samples. Boosting: repeated estimation in which misclassified observations are given increasing weight; the final prediction is an average over the rounds.
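A minimal sketch of bootstrap sampling and bagging in base R with rpart trees (boosting, which re-weights misclassified observations at each round, is usually done with dedicated packages such as gbm and is not sketched here); the 'titanic' data frame is again the hypothetical one from above.

```r
library(rpart)
set.seed(1)

n_models <- 50
preds <- matrix(NA, nrow = nrow(titanic), ncol = n_models)

for (b in 1:n_models) {
  # Bootstrap: draw rows with replacement.
  idx  <- sample(nrow(titanic), replace = TRUE)
  tree <- rpart(factor(Survived) ~ Age + Pclass,
                data = titanic[idx, ], method = "class")
  # Predicted probability of the second class (survival, if levels are 0/1)
  # from the b-th bootstrap tree.
  preds[, b] <- predict(tree, newdata = titanic)[, 2]
}

# Bagging: average the predictions of the bootstrapped models.
bagged_prob <- rowMeans(preds)
```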

11 Thank you!

