Presentation on theme: "Exploring advanced methods"— Presentation transcript:

1 Exploring advanced methods
Advanced Computer Algorithms: Exploring advanced methods. School of Electronic, Electrical and Computer Engineering, 홍기주

2 This chapter covers Training variance Non-monotone effect
If we built a model that judges whether a patient is healthy from only their height and weight, what problems would arise? (limited features) Training variance: even a small change in the training set changes the predictions substantially. Non-monotone effect: the ideal healthy weight is in a bounded range, not arbitrarily heavy or arbitrarily light. Linearly inseparable data: the classes cannot be separated by a linear boundary.

3 This chapter covers Reducing training variance with bagging and random forests Learning non-monotone relationships with generalized additive models Increasing data separation with kernel methods Modeling complex decision boundaries with support vector machines

4 Using bagging and random forests to reduce training variance
Decision trees are an attractive method for a number of reasons: They take any type of data, numerical or categorical, without any distributional assumptions and without preprocessing. Most implementations (in particular, R’s) handle missing data; the method is also robust to redundant and nonlinear data. The algorithm is easy to use, and the output (the tree) is relatively easy to understand. Once the model is fit, scoring is fast.

5 Using bagging and random forests to reduce training variance
On the other hand, decision trees do have some drawbacks: They have a tendency to overfit, especially without pruning. They have high training variance: samples drawn from the same population can produce trees with different structures and different prediction accuracy. Prediction accuracy can be low compared to other methods. Bagging and random forests are used to address these drawbacks.

6 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Data set: spamD.tsv

7 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Preparing Spambase data
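A minimal sketch of the data preparation, assuming the spamD.tsv file from the Spambase example: tab-separated, with an rgroup column (0-99) used for the train/test split and a spam label column taking the values "spam" and "non-spam".

```r
# Load the Spambase data and split it into training and test sets.
# Assumes spamD.tsv is in the working directory with an `rgroup` column
# and a `spam` label column, as in the Spambase example.
spamD <- read.table("spamD.tsv", header = TRUE, sep = "\t",
                    stringsAsFactors = TRUE)   # keep `spam` as a factor
spamTrain <- subset(spamD, spamD$rgroup >= 10)
spamTest  <- subset(spamD, spamD$rgroup < 10)

# Build the model formula: predict `spam` from every other feature column.
spamVars <- setdiff(colnames(spamD), c("rgroup", "spam"))
spamFormula <- as.formula(paste("spam == 'spam'",
                                paste(spamVars, collapse = " + "),
                                sep = " ~ "))
```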

8 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Evaluating the performance of decision trees
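A sketch of fitting a single decision tree with rpart and scoring it on both sets; accuracyMeasures() is a small helper written here (not from any package) that reports accuracy, F1, and normalized deviance for a vector of predicted probabilities.

```r
library(rpart)

# Fit a single decision tree to the training data.
treemodel <- rpart(spamFormula, spamTrain)

# Helper: log likelihood of the observed 0/1 outcomes under predicted probabilities.
loglikelihood <- function(y, py) {
  pysmooth <- ifelse(py == 0, 1e-12, ifelse(py == 1, 1 - 1e-12, py))
  sum(y * log(pysmooth) + (1 - y) * log(1 - pysmooth))
}

# Helper: accuracy, F1, and normalized deviance for predicted probabilities.
accuracyMeasures <- function(pred, truth, name = "model") {
  dev.norm <- -2 * loglikelihood(as.numeric(truth), pred) / length(pred)
  ctable <- table(truth = truth, pred = (pred > 0.5))
  accuracy <- sum(diag(ctable)) / sum(ctable)
  precision <- ctable[2, 2] / sum(ctable[, 2])
  recall <- ctable[2, 2] / sum(ctable[2, ])
  f1 <- 2 * precision * recall / (precision + recall)
  data.frame(model = name, accuracy = accuracy, f1 = f1, dev.norm = dev.norm)
}

# The single tree's scores degrade noticeably from training to test.
accuracyMeasures(predict(treemodel, newdata = spamTrain),
                 spamTrain$spam == "spam", name = "tree, training")
accuracyMeasures(predict(treemodel, newdata = spamTest),
                 spamTest$spam == "spam", name = "tree, test")
```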

9 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Evaluating the performance of decision trees The accuracy and F1 scores both degrade on the test set, and the deviance increases

10 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Bagging decision trees

11 Using bagging and random forests to reduce training variance
Using bagging to improve prediction Bagging decision trees Bagging improves accuracy and F1, and reduces deviance over both the training and test sets when compared to the single decision tree (less generalization error)
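A sketch of bagging by hand: draw bootstrap samples of the training rows, fit one rpart tree per sample, and average the trees' predicted probabilities. This reuses spamFormula and accuracyMeasures from the earlier sketches.

```r
# Bagging: fit one tree per bootstrap sample of the training rows,
# then score new data by averaging the trees' predictions.
ntrain <- dim(spamTrain)[1]
ntree <- 100

# Each column of `samples` holds the row indices of one bootstrap sample.
samples <- sapply(1:ntree,
                  FUN = function(iter) sample(1:ntrain, size = ntrain, replace = TRUE))

# Fit one decision tree per bootstrap sample.
treelist <- lapply(1:ntree,
                   FUN = function(iter) rpart(spamFormula, spamTrain[samples[, iter], ]))

# Average the individual trees' predicted probabilities.
predict.bag <- function(treelist, newdata) {
  preds <- sapply(1:length(treelist),
                  FUN = function(iter) predict(treelist[[iter]], newdata = newdata))
  rowMeans(preds)
}

accuracyMeasures(predict.bag(treelist, newdata = spamTrain),
                 spamTrain$spam == "spam", name = "bagging, training")
accuracyMeasures(predict.bag(treelist, newdata = spamTest),
                 spamTest$spam == "spam", name = "bagging, test")
```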

12 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction A drawback of bagging: the individual trees tend to use almost the same set of features. What is a random forest? The random forest algorithm draws bootstrap samples, with replacement, from the training dataset. Repeating this N times produces N bootstrap datasets, and when the decision tree algorithm is applied, m explanatory variables are chosen at random at each node. By reducing the correlation between the trees, the random forest method has the advantage of reducing variance compared to bagging.

13 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction The random forest method does the following: draws a bootstrapped sample from the training data; for each sample, grows a decision tree, and at each node of the tree randomly draws a subset of mtry variables from the p total features that are available, picks the best variable and the best split from that set of mtry variables, and continues until the tree is fully grown.

15 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction Using random forests
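A sketch of fitting the forest with the randomForest package, reusing the spamTrain/spamTest split, spamVars, and the accuracyMeasures helper from the earlier sketches.

```r
library(randomForest)

set.seed(5123512)   # fix the pseudo-random seed so the run is reproducible
fmodel <- randomForest(x = spamTrain[, spamVars],
                       y = as.factor(spamTrain$spam),
                       ntree = 100,        # number of trees in the ensemble
                       nodesize = 7,       # minimum size of a terminal node
                       importance = TRUE)  # also compute variable importance

# Score the forest with the same helper used for the other models.
accuracyMeasures(predict(fmodel, newdata = spamTrain[, spamVars],
                         type = "prob")[, "spam"],
                 spamTrain$spam == "spam", name = "random forest, training")
accuracyMeasures(predict(fmodel, newdata = spamTest[, spamVars],
                         type = "prob")[, "spam"],
                 spamTest$spam == "spam", name = "random forest, test")
```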

16 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction Report the model quality The random forest model performed dramatically better than the other two models in both training and test. But the random forest’s generalization error was comparable to that of a single decision tree (and almost twice that of the bagged model).

17 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction Examining Variable Importance Setting importance = TRUE in the randomForest() call computes variable importance.
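A sketch of inspecting the importance scores computed when importance = TRUE was set; fmodel is the forest from the previous sketch.

```r
# importance() returns a matrix with one row per variable; with
# importance = TRUE it includes the mean decrease in accuracy.
varImp <- importance(fmodel)
varImp[1:10, ]               # peek at the first ten variables

# Plot the variables ranked by mean decrease in accuracy.
varImpPlot(fmodel, type = 1)
```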

18 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction Examining Variable Importance Selecting only the important variables makes it possible to build smaller, faster trees, and the selected variables can also be used with other modeling algorithms.

20 Using bagging and random forests to reduce training variance
Using random forests to further improve prediction Fitting with fewer variables The smaller model performs just as well as the random forest model built using all 57 variables.
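A sketch of refitting on a reduced variable set; the choice of the top 25 variables is illustrative, and varImp comes from the importance sketch above.

```r
# Keep the 25 variables with the largest mean decrease in accuracy
# and refit a smaller random forest on just those.
selVars <- names(sort(varImp[, "MeanDecreaseAccuracy"], decreasing = TRUE))[1:25]

fsel <- randomForest(x = spamTrain[, selVars],
                     y = as.factor(spamTrain$spam),
                     ntree = 100, nodesize = 7, importance = TRUE)

accuracyMeasures(predict(fsel, newdata = spamTest[, selVars],
                         type = "prob")[, "spam"],
                 spamTest$spam == "spam", name = "RF small, test")
```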

21 Using bagging and random forests to reduce training variance
Bagging and random forest takeaways Bagging stabilizes decision trees and improves accuracy by reducing variance. Bagging reduces generalization error. Random forests further improve decision tree performance by de-correlating the individual trees in the bagging ensemble. Random forests' variable importance measures can help you determine which variables are contributing the most strongly to your model. Because the trees in a random forest ensemble are unpruned and potentially quite deep, there's still a danger of overfitting. Be sure to evaluate the model on holdout data to get a better estimate of model performance.

22 Using generalized additive models (GAMs) to learn non-monotone relationships
Understanding GAMs For an underweight patient, gaining weight can make them healthier, but only up to a point (a non-monotone relationship).

23 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example Preparing an artificial problem
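One way to construct such a dataset; the exact generating function here is an assumption, and the point is only that y is a noisy, non-monotone function of x.

```r
# Artificial 1-D regression problem: y is a nonlinear (sin/cos-based)
# function of x plus Gaussian noise, split into train and test sets.
set.seed(602957)
x <- rnorm(1000)
y <- 3 * sin(2 * x) + cos(0.75 * x) - 1.5 * (x^2) + rnorm(1000, sd = 1.5)

frame <- data.frame(y = y, x = x)
select <- runif(1000)
train <- frame[select > 0.1, ]
test  <- frame[select <= 0.1, ]
```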

24 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example Linear regression applied to our artificial example Because the data was generated with sin() and cos(), it is not linear, and the R-squared is very low (about 0.04).

25 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example Linear regression applied to our artificial example The errors of the current model are heteroscedastic.

26 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example GAM applied to our artificial example
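A sketch of fitting both models with the mgcv package on the artificial data above; wrapping x in s() asks gam() to learn a smooth spline for x instead of a straight-line term.

```r
library(mgcv)

# Ordinary linear regression: forced to fit a straight line, so it cannot
# capture the sin/cos shape (low R-squared, structured residuals).
lin.model <- lm(y ~ x, data = train)
summary(lin.model)$r.squared

# GAM: s(x) lets the model learn a smooth, possibly non-monotone curve for x.
glin.model <- gam(y ~ s(x), data = train)
summary(glin.model)$r.sq     # adjusted R-squared reported by gam()
```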

29 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example GAM applied to our artificial example The GAM has been fit to be homoscedastic

30 Using generalized additive models (GAMs) to learn non-monotone relationships
A one-dimensional regression example Comparing linear regression and GAM performance The GAM performed similarly on both sets (RMSE of 1.40 on test versus 1.45 on training; R-squared of 0.78 on test versus 0.83 on training).

31 Using generalized additive models (GAMs) to learn non-monotone relationships
Extracting the nonlinear relationships Extracting a learned spline from a GAM
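A sketch of pulling the learned s(x) curve out of the model with predict(type = "terms"), using the glin.model fit from the earlier sketch.

```r
# predict(type = "terms") returns each smooth term's contribution per row;
# here there is only one smooth term, s(x).
sx <- predict(glin.model, type = "terms")

# Plot the learned spline against x to see the recovered nonlinear shape.
xframe <- cbind(train, sx = sx[, 1])
plot(xframe$x, xframe$sx, xlab = "x", ylab = "s(x)")
```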

32 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM on actual data Applying linear regression (with and without GAM) to health data Dataset: CDC 2010 natality dataset ( irthData.rData). Predict the newborn's birth weight from the given data. Independent variables: mother's weight (PWGT), mother's pregnancy weight gain (WTGAIN), mother's age (MAGER), number of prenatal medical visits (UPREVIS).
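A sketch, assuming the natality data has already been loaded into a training frame (called natalTrain here, a hypothetical name) containing the columns listed above plus the birth weight DBWT.

```r
library(mgcv)

# Linear regression baseline: every term enters as a straight line.
form.lin <- as.formula("DBWT ~ PWGT + WTGAIN + MAGER + UPREVIS")
linmodel <- lm(form.lin, data = natalTrain)   # natalTrain: assumed training frame
summary(linmodel)

# GAM: wrap each variable in s() so gam() can learn a smooth curve per term.
form.glin <- as.formula("DBWT ~ s(PWGT) + s(WTGAIN) + s(MAGER) + s(UPREVIS)")
glinmodel <- gam(form.glin, data = natalTrain)
summary(glinmodel)    # edf > 1 for a term suggests a nonlinear relationship
```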

36 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM on actual data Applying linear regression (with and without GAM) to health data Since the edf (effective degrees of freedom) is greater than 1 for each term, all four variables can be said to have a nonlinear relationship with the outcome.

37 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM on actual data Plotting GAM results

38 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM on actual data Plotting GAM results The shape of the s() spline is similar to that of the smoothed curve.

39 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM on actual data Checking GAM model performance on hold-out data Performance is not much different from the training set, so the model has not badly overfit.

40 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM for logistic regression GLM logistic regression GAM logistic regression Predicting whether the newborn's birth weight is below 2000 (DBWT < 2000).
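A sketch of the two logistic models, again assuming the hypothetical natalTrain frame; the outcome is whether DBWT is below 2000.

```r
# GLM logistic regression: linear terms only.
form.log <- as.formula("DBWT < 2000 ~ PWGT + WTGAIN + MAGER + UPREVIS")
logmodel <- glm(form.log, data = natalTrain, family = binomial(link = "logit"))

# GAM logistic regression: smooth terms, same binomial family.
form.glog <- as.formula("DBWT < 2000 ~ s(PWGT) + s(WTGAIN) + s(MAGER) + s(UPREVIS)")
glogmodel <- gam(form.glog, data = natalTrain, family = binomial(link = "logit"))
summary(glogmodel)
```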

41 Using generalized additive models (GAMs) to learn non-monotone relationships
Using GAM for logistic regression GAM logistic regression

42 Using generalized additive models (GAMs) to learn non-monotone relationships
GAM takeaways GAMs let you represent nonlinear and non-monotonic relationships between variables and outcome in a linear or logistic regression framework. In the mgcv package, you can extract the discovered relationship from the GAM model using the predict() function with the type="terms" parameter. You can evaluate the GAM with the same measures you'd use for standard linear or logistic regression: residuals, deviance, R-squared, and pseudo R-squared. The gam() summary also gives you an indication of which variables have a significant effect on the model. Because GAMs have increased complexity compared to standard linear or logistic regression models, there's more risk of overfit.

43 Using kernel methods to increase data separation
Synthetic variables? When the variables currently available are not enough to build a good model and we want new ones, we can combine the existing data into new variables; these are called synthetic variables. Kernel methods are used to create such new variables and thereby improve machine learning performance.

45 Using kernel methods to increase data separation
Understanding kernel functions An artificial kernel example k(u,v) = phi(u) %*% phi(v)
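A small sketch of the identity above: an explicit feature map phi() and the kernel k(u, v) it induces as a dot product in the transformed space.

```r
# phi() maps a numeric vector into the space of its squares, pairwise
# products, and original coordinates; k() is the induced kernel.
phi <- function(x) {
  x <- as.numeric(x)
  c(x * x, combn(x, 2, FUN = prod), x)
}

k <- function(u, v) {
  as.numeric(phi(u) %*% phi(v))   # dot product in the transformed space
}

u <- c(1, 2)
v <- c(3, 4)
phi(u)    # the transformed representation of u
k(u, v)   # same value as sum(phi(u) * phi(v))
```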

46 Using kernel methods to increase data separation
Understanding kernel functions The goal is to use a kernel transformation so that the data can be separated linearly.

47 Using kernel methods to increase data separation
Using an explicit kernel on a problem Applying stepwise linear regression to PUMS data

48 Using kernel methods to increase data separation
Using an explicit kernel on a problem Applying an example explicit kernel transform Use phi() to generate new modeling variables.

50 Using kernel methods to increase data separation
Using an explicit kernel on a problem Modeling using the explicit kernel transform The RMSE improves slightly.

51 Using kernel methods to increase data separation
Using an explicit kernel on a problem Inspecting the results of the explicit kernel model A new variable, AGEP_AGEP (the product of age with itself), is used to capture the non-monotone relationship between age and log income.
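A hypothetical illustration of that variable: pums, AGEP, and PINCP are the assumed PUMS data frame and its age and income columns. The squared-age column from the kernel transform lets an otherwise linear model bend the age effect.

```r
# AGEP_AGEP: the product of age with itself, i.e. a squared-age term.
pums$AGEP_AGEP <- pums$AGEP * pums$AGEP   # pums: assumed PUMS data frame

# With both AGEP and AGEP_AGEP available, a linear model can fit a
# non-monotone (rise-then-fall) relationship between age and log income.
m <- lm(log(PINCP, base = 10) ~ AGEP + AGEP_AGEP, data = pums)
summary(m)
```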

52 Using kernel methods to increase data separation
Kernel takeaways Kernels provide a systematic way of creating interactions and other synthetic variables that are combinations of individual variables The goal of kernelizing is to lift the data into a space where the data is separable, or where linear methods can be used directly

53 Using SVMs to model complicated decision boundaries
Understanding support vector machines Data that is not linearly separable (left) is lifted into a higher-dimensional kernel space (right), and the problem becomes finding a hyperplane that separates the data linearly.

54 Using SVMs to model complicated decision boundaries
Trying an SVM on artificial example data Setting up the spirals data as an example classification problem
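A sketch of setting up the spirals example with kernlab, following the standard kernlab spirals example: spectral clustering supplies the class labels for the two spirals.

```r
library(kernlab)

# Load the two-spiral data shipped with kernlab and label the spirals
# with spectral clustering, giving a linearly inseparable class problem.
data(spirals)
sc <- specc(spirals, centers = 2)
s <- data.frame(x = spirals[, 1], y = spirals[, 2],
                class = as.factor(sc))

# Hold out roughly 10% of the points as a test set.
set.seed(2335246L)
s$group <- sample.int(100, size = dim(s)[1], replace = TRUE)
sTrain <- subset(s, group > 10)
sTest  <- subset(s, group <= 10)
```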

56 Using SVMs to model complicated decision boundaries
SUPPORT VECTOR MACHINES WITH THE WRONG KERNEL SVM with a poor choice of kernel

58 Using SVMs to model complicated decision boundaries
SUPPORT VECTOR MACHINES WITH A GOOD KERNEL SVM with a good choice of kernel
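A sketch comparing the two kernel choices on the spirals data with kernlab's ksvm(); vanilladot is the linear kernel (the poor choice here) and rbfdot is the Gaussian radial kernel (the good one). sTrain and sTest come from the spirals sketch above.

```r
# Linear kernel: the SVM can only draw a straight boundary, which cannot
# separate the interleaved spirals.
mSVMV <- ksvm(class ~ x + y, data = sTrain, kernel = "vanilladot")
sTest$predSVMV <- predict(mSVMV, newdata = sTest, type = "response")

# Gaussian kernel: the lifted space makes the spirals (nearly) separable.
mSVMG <- ksvm(class ~ x + y, data = sTrain, kernel = "rbfdot")
sTest$predSVMG <- predict(mSVMG, newdata = sTest, type = "response")

# Confusion matrices for the two kernel choices.
table(actual = sTest$class, linearKernel = sTest$predSVMV)
table(actual = sTest$class, gaussianKernel = sTest$predSVMG)
```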

59 Using SVMs to model complicated decision boundaries
Using SVMs on real data Revisiting the Spambase example with GLM
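A sketch of the logistic regression baseline on the same Spambase split (spamFormula and the data frames come from the earlier sketches); its predicted probabilities are kept for the threshold comparison a few slides below.

```r
# Logistic regression baseline on the Spambase training set.
glmM <- glm(spamFormula, data = spamTrain, family = binomial(link = "logit"))

# Keep the predicted spam probabilities for the test set.
spamTest$glmPred <- predict(glmM, newdata = spamTest, type = "response")
table(truth = spamTest$spam, prediction = spamTest$glmPred > 0.5)
```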

60 Using SVMs to model complicated decision boundaries
Using SVMs on real data Applying an SVM to the Spambase example
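A sketch of fitting an SVM to the Spambase training set, reusing spamVars, spamTrain, and spamTest from the earlier sketches; the C setting is illustrative.

```r
library(kernlab)

# Formula with the plain `spam` factor on the left-hand side.
spamFormulaV <- as.formula(paste("spam",
                                 paste(spamVars, collapse = " + "),
                                 sep = " ~ "))

svmM <- ksvm(spamFormulaV, data = spamTrain,
             kernel = "rbfdot",     # Gaussian radial kernel
             C = 10,                # penalty for misclassified training points
             prob.model = TRUE,     # also fit a class-probability model
             cross = 5)             # 5-fold cross-validated error estimate

# Confusion matrix on the held-out test set.
spamTest$svmPred <- predict(svmM, newdata = spamTest, type = "response")
table(truth = spamTest$spam, prediction = spamTest$svmPred)
```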

61 Using SVMs to model complicated decision boundaries
Using SVMs on real data Printing the SVM results summary

62 Using SVMs to model complicated decision boundaries
COMPARING RESULTS Shifting decision point to perform an apples-to-apples comparison Because the SVM predicted 162 messages as spam (false positives), the GLM's decision threshold is adjusted accordingly.
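One way to do that comparison, assuming the GLM's predicted spam probabilities for the test set are stored in spamTest$glmPred (from the GLM sketch above) and the SVM's predicted classes in spamTest$svmPred: pick the GLM threshold so it flags the same number of messages as spam as the SVM did, then compare confusion matrices.

```r
# Number of test messages the SVM labeled as spam.
nSvmSpam <- sum(spamTest$svmPred == "spam")

# Threshold that makes the GLM flag the same number of messages.
threshold <- sort(spamTest$glmPred, decreasing = TRUE)[nSvmSpam]

# GLM confusion matrix at the shifted decision point.
table(truth = spamTest$spam,
      prediction = ifelse(spamTest$glmPred >= threshold, "spam", "non-spam"))
```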

63 Using SVMs to model complicated decision boundaries
Support vector machine takeaways SVMs are a kernel-based classification approach where the kernels are represented in terms of a (possibly very large) subset of the training examples. SVMs try to lift the problem into a space where the data is linearly separable (or as near to separable as possible). SVMs are useful in cases where the useful interactions or other combinations of input variables aren’t known in advance. They’re also useful when similarity is strong evidence of belonging to the same class.

64 Summary Bagging and random forests—To reduce the sensitivity of models to early modeling choices and reduce modeling variance Generalized additive models—To remove the (false) assumption that each model feature contributes to the model in a monotone fashion Kernel methods—To introduce new features that are nonlinear combinations of existing features, increasing the power of our model Support vector machines—To use training examples as landmarks (support vectors), again increasing the power of our model

65 END

