Team Question Mark presents
A little information on: HERPES
Jeff Brown, Dante Kappotis, Robert Vanderley, Anthony Biasella

Num Features vs. Running Time
Random Forests ran out of memory at 4000 features.
At 3000 features (RF-n = Random Forest with n trees):
  SVM    =  17 sec/fold
  RF-100 = 743 sec/fold
  RF-50  = 368 sec/fold
  RF-10  =  79 sec/fold
Run time approximately doubles when the number of features doubles, and doubles again when the number of trees doubles.
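The tree-scaling claim can be checked with a little arithmetic on the timings above. The following is a minimal sketch that uses only the reported 3000-feature numbers; the variable and function names are illustrative and not part of the original slides. The feature-scaling claim cannot be checked the same way, because only one feature count has timings listed.

```python
# Reported Random Forest timings at 3000 features (seconds per fold),
# keyed by the number of trees in the forest.
rf_seconds_per_fold = {10: 79, 50: 368, 100: 743}


def seconds_per_tree(timings):
    """Return the per-tree cost (sec/fold divided by trees) for each forest size."""
    return {trees: secs / trees for trees, secs in timings.items()}


per_tree = seconds_per_tree(rf_seconds_per_fold)

# Ratio of run times when the tree count doubles from 50 to 100.
doubling_ratio = rf_seconds_per_fold[100] / rf_seconds_per_fold[50]

for trees, cost in sorted(per_tree.items()):
    print(f"RF-{trees}: {cost:.2f} sec per tree per fold")
print(f"50 -> 100 trees: run time x{doubling_ratio:.2f}")
```

The per-tree cost stays between roughly 7.4 and 7.9 seconds per fold, which is what linear scaling in the number of trees would look like.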

SVM Results
  Dataset              Attributes  Accuracy (%)
  Domains                    6848  50.9 to 79.4
  Localization                 12  51.4 to 57.7
  Physiochemical               18  56.5 to 67.5
  Primary Features           1096  50.9 to 81.7
  Secondary Structure         262  50.2 to 73.3
RBF with C = 1.0 or C = 100 with G = 0.25
PolyKernel: all settings relatively equal
PolyKernel outperformed RBF on average

SVM Testing Methodology
SMO used for results
Used WEKA Explorer to generate runs
Able to classify each dataset in its entirety without splitting
Tried libsvm: much faster, crazy accuracy
Accuracy and ROC AUC were equal for SVM
Information Gain with threshold of 0 used for all datasets
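One possible explanation for accuracy and ROC AUC coinciding, offered here as an assumption rather than something the slides state: if the classifier emits only hard 0/1 predictions instead of graded scores, the ROC curve has a single operating point and the AUC reduces to the mean of the true positive and true negative rates, which equals accuracy when the two classes are the same size. The sketch below illustrates this with made-up labels; the data and function names are hypothetical and not taken from the experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Tied scores count as half a win, which is how a hard 0/1 classifier
    ends up with a single-point ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, balanced data (assumption: 10 positives and 10 negatives).
y_true = [1] * 10 + [0] * 10
# Hard predictions: 8 of 10 positives and 7 of 10 negatives are correct.
y_pred = [1] * 8 + [0] * 2 + [0] * 7 + [1] * 3

acc = accuracy(y_true, y_pred)
roc_auc = auc(y_true, y_pred)
print(f"accuracy = {acc:.2f}, AUC = {roc_auc:.2f}")
```

With unbalanced classes the two numbers would generally differ, so this explanation only fits if the datasets were roughly balanced.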

RF Results
  Dataset              Attributes  Accuracy (%)
  Domains                    1000  74.3 to 77.1
  Localization                 12  64.1 to 71.0
  Localization (no IG)         44  64.2 to 71.0
  Physiochemical               18  70.3 to 72.9
  Primary Features            994  75.5 to 77.1
  Secondary Structure        1000  69.2 to 76.2
FastRandomForest used
Thresholds of IG (Ranker) used to limit datasets
For smaller datasets, 2000 (and 1000) trees crashed WEKA

RF Testing Methodology
FastRandomForest used for results
Used WEKA Explorer to generate runs
Able to classify each dataset in its entirety with information gain, without splitting
Information Gain often used to restrict datasets to approx. 1000 attributes
Information Gain used with threshold of 0 on sets under the 1000-attribute mark
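To make the information gain filter concrete, here is a minimal sketch of ranking attributes by information gain and discarding those at or below a threshold. It is not WEKA's implementation: it assumes attributes are already discrete, assumes that an attribute is kept only when its gain is strictly above the threshold, and uses made-up toy data and hypothetical function names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the entropy remaining after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_attributes(columns, labels, threshold=0.0):
    """Rank attributes by gain and keep those strictly above the threshold (assumed rule)."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [(name, gain) for name, gain in ranked if gain > threshold]


# Toy data: one attribute that determines the class, one that is independent of it.
labels = ["yes", "yes", "no", "no"]
columns = {
    "informative": ["a", "a", "b", "b"],
    "useless": ["x", "y", "x", "y"],
}

selected = select_attributes(columns, labels)
for name, gain in selected:
    print(f"{name}: gain = {gain:.3f} bits")
```

A threshold of 0 only drops attributes that carry no information about the class at all, which is consistent with the small Localization and Physiochemical sets shrinking only modestly or not at all.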

RF Best Results
Primary Dataset: 77.1% correctly classified, 0.83 AUC with 500 trees and ½ sqrt features
Secondary Dataset: 76.2% correctly classified, 0.76 AUC with 100 trees and ½ sqrt features
Localization Dataset: 71.0% correctly classified, 0.75 AUC with 500 trees and sqrt features
Physiochemical Dataset: 72.9% correctly classified, 0.79 AUC with 500 trees and sqrt features
Domains Dataset: 77.1% correctly classified, 0.85 AUC with 100 trees and ½ sqrt features
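For a sense of scale, the "sqrt features" and "½ sqrt features" settings can be turned into the number of attributes considered at each tree split. This is a rough sketch: the attribute counts come from the RF Results table, the rounding rule (nearest integer, at least 1) is an assumption, and for Localization it is not stated whether the 12-attribute or the 44-attribute version produced the best result, so 12 is used here.

```python
from math import sqrt


def features_per_split(num_attributes, scale=1.0):
    """Number of randomly chosen attributes per split: scale * sqrt(n), rounded (assumed)."""
    return max(1, round(scale * sqrt(num_attributes)))


# (attribute count, scale) for each best-result setting listed above.
settings = {
    "Primary": (994, 0.5),
    "Secondary": (1000, 0.5),
    "Localization": (12, 1.0),
    "Physiochemical": (18, 1.0),
    "Domains": (1000, 0.5),
}

k = {name: features_per_split(n, scale) for name, (n, scale) in settings.items()}

for name, value in k.items():
    print(f"{name}: {value} attributes per split")
```

Under these assumptions the large datasets use about 16 attributes per split, while the small ones use 3 or 4.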

Neat Charts

Best Results
Primary Dataset: SVM @ 81.7%
Secondary Dataset: RF @ 76.2%
Localization Dataset: RF @ 71.0%
Physiochemical Dataset: RF @ 72.9%
Domains Dataset: RF @ 77.1%
