Team Question Mark Jeff Brown Dante Kappotis Robert Vanderley Anthony Biasella A little information on: HERPES presents.


1 Team Question Mark presents
A little information on: HERPES
Jeff Brown, Dante Kappotis, Robert Vanderley, Anthony Biasella

2 Num Features vs. Running Time
Random Forests ran out of memory at 4000 features.
At 3000 features:
SVM = 17 sec/fold
RF-100 = 743 sec/fold
RF-50 = 368 sec/fold
RF-10 = 79 sec/fold
Run time approximately doubles when the number of features doubles, and approximately doubles again when the number of trees doubles.
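As a rough check of the tree-count part of this scaling claim, the per-fold timings reported at 3000 features can be compared directly. The sketch below uses only the numbers from this slide; the variable names are illustrative and not taken from the presentation.

```python
# Per-fold Random Forest run times at 3000 features, as reported on the slide (seconds).
rf_seconds_per_fold = {10: 79, 50: 368, 100: 743}

# Seconds per tree: roughly constant if run time is linear in the number of trees.
per_tree = {trees: secs / trees for trees, secs in rf_seconds_per_fold.items()}

# Ratio of run times when the tree count doubles from 50 to 100.
doubling_ratio = rf_seconds_per_fold[100] / rf_seconds_per_fold[50]

for trees, secs in sorted(per_tree.items()):
    print(f"RF-{trees}: {secs:.2f} sec per tree per fold")
print(f"RF-100 / RF-50 = {doubling_ratio:.2f}")
```

The per-tree cost stays between 7 and 8 seconds across the three settings, which is consistent with run time growing about linearly in the number of trees. The slide gives timings at only one feature count, so the feature-scaling part of the claim cannot be checked the same way.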

3 SVM Results
Dataset              Attributes   Accuracy (%)
Domains              6848         50.9 to 79.4
Localization         12           51.4 to 57.7
Physiochemical       18           56.5 to 67.5
Primary Features     1096         50.9 to 81.7
Secondary Structure  262          50.2 to 73.3
RBF with C = 1.0 or C = 100, with G = 0.25
PolyKernel: all settings relatively equal
PolyKernel outperformed RBF on average

4 SVM Testing Methodology
SMO used for results
Used WEKA Explorer to generate runs
Able to classify each dataset in its entirety without splitting
Tried libsvm: much faster, crazy accuracy
Accuracy and ROC AUC were equal for SVM
Information Gain with threshold of 0 used for all datasets
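To illustrate what filtering by Information Gain with a threshold of 0 means, here is a minimal sketch in plain Python. It is not the WEKA implementation: it handles only discrete attributes, the function names and toy data are hypothetical, and keeping attributes strictly above the threshold is an assumption about how the cutoff is applied.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the expected entropy after splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(columns, labels, threshold=0.0):
    """Keep attributes whose information gain is strictly above the threshold (assumed rule)."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return [name for name, gain in gains.items() if gain > threshold]


# Hypothetical toy data: "informative" tracks the class, "constant" carries no signal.
labels = ["yes", "yes", "no", "no"]
columns = {
    "informative": ["a", "a", "b", "b"],
    "constant": ["x", "x", "x", "x"],
}
print(select_attributes(columns, labels))
```

With a threshold of 0, only attributes that carry no information about the class at all are dropped, which is why small datasets such as Localization can keep few attributes after filtering.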

5 RF Results
Dataset               Attributes   Accuracy (%)
Domains               1000         74.3 to 77.1
Localization          12           64.1 to 71.0
Localization (no IG)  44           64.2 to 71.0
Physiochemical        18           70.3 to 72.9
Primary Features      994          75.5 to 77.1
Secondary Structure   1000         69.2 to 76.2
FastRandomForest used
Thresholds of IG (Ranker) used to limit datasets
For smaller datasets, 2000 (and 1000) trees crashed WEKA

6 RF Testing Methodology
FastRandomForest used for results
Used WEKA Explorer to generate runs
Able to classify each dataset in its entirety, with Information Gain, without splitting
Information Gain often used to restrict datasets to approximately 1000 attributes
Information Gain used with a threshold of 0 on sets under the 1000-attribute mark

7 RF Best Results
Primary Dataset: 77.1% correctly classified, 0.83 AUC with 500 trees and ½ sqrt features
Secondary Dataset: 76.2% correctly classified, 0.76 AUC with 100 trees and ½ sqrt features
Localization Dataset: 71.0% correctly classified, 0.75 AUC with 500 trees and sqrt features
Physiochemical Dataset: 72.9% correctly classified, 0.79 AUC with 500 trees and sqrt features
Domains Dataset: 77.1% correctly classified, 0.85 AUC with 100 trees and ½ sqrt features
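For a sense of scale, "sqrt features" and "½ sqrt features" can be turned into concrete numbers of candidate attributes per split using the attribute counts from the RF Results slide. This is only a sketch: the helper name is hypothetical, rounding down with a minimum of 1 is an assumption, and the Localization count of 12 assumes the Information Gain filtered version of that dataset was used.

```python
import math

# Attribute counts for the Random Forest runs, as listed on the RF Results slide.
attribute_counts = {
    "Primary Features": 994,
    "Secondary Structure": 1000,
    "Localization": 12,  # assumed: the Information Gain filtered version
    "Physiochemical": 18,
    "Domains": 1000,
}


def features_per_split(num_attributes, scale=1.0):
    """Candidate features per split: scale * sqrt(n), rounded down, at least 1 (assumed rounding)."""
    return max(1, int(scale * math.sqrt(num_attributes)))


for name, n in attribute_counts.items():
    full = features_per_split(n)
    half = features_per_split(n, scale=0.5)
    print(f"{name}: sqrt = {full}, half sqrt = {half}")
```

Under these assumptions the roughly 1000-attribute datasets consider about 31 attributes per split at sqrt and about 15 at ½ sqrt, while the small datasets consider only a handful.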

8 Neat Charts

9 Best Results
Primary Dataset: SVM @ 81.7%
Secondary Dataset: RF @ 76.2%
Localization Dataset: RF @ 71.0%
Physiochemical Dataset: RF @ 72.9%
Domains Dataset: RF @ 77.1%

