Classification by Machine Learning Approaches - Exercise Solution Michael J. Kerner – Center for Biological Sequence.


1 Classification by Machine Learning Approaches - Exercise Solution
Michael J. Kerner – kerner@cbs.dtu.dk
Center for Biological Sequence Analysis
Technical University of Denmark

2 Exercise Solution: donors_trainset.arff - All features: trees.J48

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4972    94.5967 %
Incorrectly Classified Instances     284     5.4033 %
Kappa statistic                        0.8381

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.87      0.034     0.875       0.87     0.872       true
0.966     0.13      0.965       0.966    0.966       false

=== Confusion Matrix ===

   a    b   <-- classified as
 971  145 | a = true
 139 4001 | b = false
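The summary figures on this slide follow directly from the confusion matrix. The sketch below recomputes them for the `true` class; the function name `summarize` and its argument names are illustrative and not part of Weka.

```python
def summarize(tp, fn, fp, tn):
    """Summary statistics for a 2x2 confusion matrix (rows = actual class)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row (actual) and column (predicted) totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to the TP rate
    fp_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
    }


# J48 on donors_trainset.arff: a = true is treated as the positive class.
stats = summarize(tp=971, fn=145, fp=139, tn=4001)
for name, value in stats.items():
    print(f"{name:10s} {value:.4f}")
```

Swapping the roles of the two classes (tp=4001, fn=139, fp=145, tn=971) gives the row for `false`.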

3 Exercise Solution: donors_trainset.arff - All features: bayes.NaiveBayes

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4910    93.417 %
Incorrectly Classified Instances     346     6.583 %
Kappa statistic                        0.8056

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.862     0.046     0.834       0.862    0.848       true
0.954     0.138     0.962       0.954    0.958       false

=== Confusion Matrix ===

   a    b   <-- classified as
 962  154 | a = true
 192 3948 | b = false

4 Exercise Solution: donors_trainset.arff - All features: functions.SMO

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4986    94.863 %
Incorrectly Classified Instances     270     5.137 %
Kappa statistic                        0.8455

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.871     0.03      0.885       0.871    0.878       true
0.97      0.129     0.965       0.97     0.967       false

=== Confusion Matrix ===

   a    b   <-- classified as
 972  144 | a = true
 126 4014 | b = false

5 donors_trainset.arff: Binary Feature Encoding

@RELATION donors.train
@ATTRIBUTE -7_A {0,1}
@ATTRIBUTE -7_T {0,1}
@ATTRIBUTE -7_C {0,1}
[...]
@ATTRIBUTE 6_A {0,1}
@ATTRIBUTE 6_T {0,1}
@ATTRIBUTE 6_C {0,1}
@ATTRIBUTE 6_G {0,1}
@ATTRIBUTE class {true,false}
@DATA
0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,1,0,0,true
0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,true
[...]

Exercise Solution: donors_trainset_diffencod.arff: Fewer features, four (nominal) values per feature

@RELATION donors.train
@ATTRIBUTE -7 {A,C,G,T}
@ATTRIBUTE -6 {A,C,G,T}
@ATTRIBUTE -5 {A,C,G,T}
@ATTRIBUTE -4 {A,C,G,T}
[...]
@ATTRIBUTE +3 {A,C,G,T}
@ATTRIBUTE +4 {A,C,G,T}
@ATTRIBUTE +5 {A,C,G,T}
@ATTRIBUTE +6 {A,C,G,T}
@ATTRIBUTE splicesite {true,false}
@DATA
C,T,C,C,G,A,A,A,G,G,A,T,T,true
T,C,A,G,A,A,G,G,A,G,G,G,C,true
T,T,G,G,A,A,G,T,C,G,C,A,G,true
[...]
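The two files hold the same 13 sequence positions in different encodings: four 0/1 columns per position in the binary file, one nominal column per position in the other. The sketch below converts between the two. It assumes the column order A, T, C, G at every position, as suggested by the visible attribute names (-7_A, -7_T, -7_C, ... 6_G); the helper names `to_binary` and `to_nominal` are illustrative.

```python
ALPHABET = "ATCG"  # assumed per-position column order, read off the attribute names


def to_binary(nominal_row):
    """One-hot encode a list of bases into four 0/1 values per position."""
    bits = []
    for base in nominal_row:
        bits.extend(1 if base == letter else 0 for letter in ALPHABET)
    return bits


def to_nominal(binary_row):
    """Invert to_binary: read each group of four bits back as a base."""
    width = len(ALPHABET)
    bases = []
    for start in range(0, len(binary_row), width):
        group = binary_row[start:start + width]
        bases.append(ALPHABET[group.index(1)])
    return bases


# First instance of the nominal file (class label left out).
first = "C,T,C,C,G,A,A,A,G,G,A,T,T".split(",")
encoded = to_binary(first)
print(",".join(str(bit) for bit in encoded))
```

With this column order, the first nominal instance maps onto the first instance shown for the binary file, so both files appear to describe the same examples.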

6 Exercise Solution: donors_trainset_diffencod.arff - All features: trees.J48

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4948    94.14 %
Incorrectly Classified Instances     308     5.86 %
Kappa statistic                        0.8248

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.862     0.037     0.862       0.862    0.862       true
0.963     0.138     0.963       0.963    0.963       false

=== Confusion Matrix ===

   a    b   <-- classified as
 962  154 | a = true
 154 3986 | b = false

7 Exercise Solution: donors_trainset_diffencod.arff - All features: bayes.NaiveBayes

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4922    93.6454 %
Incorrectly Classified Instances     334     6.3546 %
Kappa statistic                        0.8078

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.834     0.036     0.862       0.834    0.848       true
0.964     0.166     0.956       0.964    0.96        false

=== Confusion Matrix ===

   a    b   <-- classified as
 931  185 | a = true
 149 3991 | b = false

8 Exercise Solution: donors_trainset_diffencod.arff - All features: functions.SMO

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances      4986    94.863 %
Incorrectly Classified Instances     270     5.137 %
Kappa statistic                        0.8456

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.872     0.031     0.885       0.872    0.878       true
0.969     0.128     0.966       0.969    0.967       false

=== Confusion Matrix ===

   a    b   <-- classified as
 973  143 | a = true
 127 4013 | b = false
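The six runs reported so far (three classifiers on each of the two encodings) are easier to compare side by side. The sketch below ranks them by accuracy using the confusion-matrix counts copied from the slides; the labels "binary" and "nominal" are shorthand for donors_trainset.arff and donors_trainset_diffencod.arff, and the names `RESULTS` and `accuracy` are illustrative.

```python
# (encoding, classifier) -> (TP, FN, FP, TN), with a = true as the positive class.
RESULTS = {
    ("binary", "J48"): (971, 145, 139, 4001),
    ("binary", "NaiveBayes"): (962, 154, 192, 3948),
    ("binary", "SMO"): (972, 144, 126, 4014),
    ("nominal", "J48"): (962, 154, 154, 3986),
    ("nominal", "NaiveBayes"): (931, 185, 149, 3991),
    ("nominal", "SMO"): (973, 143, 127, 4013),
}


def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified instances."""
    return (tp + tn) / (tp + fn + fp + tn)


# Highest accuracy first; ties keep the order in which they were listed.
ranked = sorted(RESULTS, key=lambda key: accuracy(*RESULTS[key]), reverse=True)
for encoding, classifier in ranked:
    percent = 100 * accuracy(*RESULTS[(encoding, classifier)])
    print(f"{encoding:8s} {classifier:11s} {percent:.3f} %")
```

On these numbers the encoding changes the J48 and NaiveBayes results slightly in opposite directions, while SMO reaches the same overall accuracy with both.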

9 Exercise Solution: Feature Selection

CfsSubsetEval, BestFirst:
Features: -2A, -1G, 1A, 2A, 3G
Correlation Coefficients: J48: 0.7981, NaiveBayes: 0.7762, SMO: 0.7388, MultilayerPerceptron: 0.8053

ClassifierSubsetEval (w/ NaiveBayes), BestFirst:
Features: -7A, -7C, -6G, -4A, -1G, 1A, 1T, 1C, 2A, 3G, 4T, 5A
Correlation Coefficients: J48: 0.7935, NaiveBayes: 0.8033, SMO: 0.7597, MultilayerPerceptron: 0.7765
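The slide does not say which correlation coefficient is reported. One common choice for two-class problems is the Matthews correlation coefficient (MCC), and the sketch below assumes that definition purely for illustration. The confusion matrices for the feature-selected runs are not shown, so the example input is the all-features J48 matrix from slide 2 and does not reproduce the values listed here.

```python
import math


def matthews_cc(tp, fn, fp, tn):
    """Matthews correlation coefficient of a 2x2 confusion matrix."""
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column total is empty
    return (tp * tn - fp * fn) / denominator


# Illustrative input only: J48 with all features (slide 2), a = true as positive.
mcc_all_features = matthews_cc(tp=971, fn=145, fp=139, tn=4001)
print(f"MCC = {mcc_all_features:.4f}")
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), which makes it less sensitive than raw accuracy to the roughly 1:4 class imbalance in this data set.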

10 Summary

Generally, there is no ‘best’ method for all problems.
Feature representation can influence classification results.
Feature selection often improves classification performance, but not always.
Feature selection significantly speeds up classification, which also makes computationally very demanding classifiers practical.
Always try to test multiple methods!

