A Short Introduction to Weka Natural Language Processing Thursday, September 27 Frank Enos and Andrew Rosenberg.

1 A Short Introduction to Weka Natural Language Processing Thursday, September 27 Frank Enos and Andrew Rosenberg

2 What is weka?
● Java-based Machine Learning Tool
● Implements numerous classifiers
● 3 modes of operation
  – GUI
  – Command Line
  – Java API (not discussed here)
● Google: ‘weka java’

3 Homework 2 Weka Workflow
[Diagram. Stages: Preprocessing (you), Experimentation (you), Grading (us). Labels: S1 S2 … SN, Your Feature Extractor, .arff, Weka, best model, results, T1 … TN, Your Feature Extractor, Test.arff, Weka, results.]

4 weka Homepage
● http://www.cs.waikato.ac.nz/ml/weka/
● To run:
  – java -Xmx1024M -jar ~cs4705/bin/weka.jar &

5 .arff file format
● http://www.cs.waikato.ac.nz/~ml/weka/arff.html

  % 1. Title: Iris Plants Database
  %
  @RELATION iris

  @ATTRIBUTE sepallength NUMERIC
  @ATTRIBUTE sepalwidth NUMERIC
  @ATTRIBUTE petallength NUMERIC
  @ATTRIBUTE petalwidth NUMERIC
  @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

  @DATA
  5.1,3.5,1.4,0.2,Iris-setosa
  4.9,3.0,1.4,0.2,Iris-setosa
  4.7,3.2,1.3,0.2,Iris-setosa
  …

6 .arff file format
@attribute attrName {numeric, string, nominal, date}
– numeric: a number
– nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
– string: arbitrary text
– date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
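
A feature extractor ultimately has to emit a file in this format. The following is a minimal Python sketch of such a writer, not part of the original slides. The names `write_arff` and `arff_escape` are hypothetical, and the quoting rule (single quotes around values that contain spaces, commas, braces, or quotes) is a conservative assumption about what the ARFF reader accepts.

```python
import io


def arff_escape(value):
    """Return value as text, single-quoted if it contains special characters."""
    text = str(value)
    if text == "" or any(ch in text for ch in " ,{}%'\"\t"):
        # Escape backslashes first, then the quote character itself.
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return text


def write_arff(out, relation, attributes, rows):
    """Write rows to the file-like object `out` in ARFF format.

    attributes: list of (name, type) pairs; type is a string such as
                "NUMERIC" or "STRING", or a list of allowed nominal values.
    rows:       list of value lists, one value per attribute, in order.
    """
    out.write("@RELATION %s\n\n" % arff_escape(relation))
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values inside braces.
            kind = "{" + ",".join(arff_escape(v) for v in kind) + "}"
        out.write("@ATTRIBUTE %s %s\n" % (arff_escape(name), kind))
    out.write("\n@DATA\n")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row has %d values, expected %d"
                             % (len(row), len(attributes)))
        out.write(",".join(arff_escape(v) for v in row) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_arff(
        buf,
        "iris",
        [("sepallength", "NUMERIC"),
         ("class", ["Iris-setosa", "Iris-versicolor", "Iris-virginica"])],
        [[5.1, "Iris-setosa"], [4.9, "Iris-setosa"]],
    )
    print(buf.getvalue())
```

Writing the header from the same attribute list that drives the data rows keeps the two in sync when features are added later.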

7 Example Arff Files
● ~cs4705/bin/weka-3-4-11/data/
● iris.arff
● soybean.arff
● weather.arff

8 To Classify with weka GUI
1. Run weka GUI
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

9 Classify
● Some classifiers to start with.
  – NaiveBayes
  – JRip
  – J48
  – SMO
● Find References by selecting a classifier
● Use Cross-Validation!
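
Weka performs the fold splitting itself, so nothing like this is needed for the homework. The Python sketch below only illustrates what N-fold cross-validation does: every instance is held out for testing exactly once and used for training in the other folds. The function name `k_fold_indices` and the fixed seed are hypothetical choices for illustration, and Weka's own splitting may differ in detail.

```python
import random


def k_fold_indices(n_instances, k, seed=1):
    """Yield (train, test) lists of instance indices for k-fold cross-validation."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train on", sorted(train), "test on", sorted(test))
```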

10 Analyzing Results
● Important tools for Homework 2
  – Accuracy
    ● “Correctly classified instances”
  – F-measure
  – Confusion matrix
  – Save model
  – Visualization
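
Weka reports these numbers directly. To make clear what they mean, here is a small Python sketch that computes accuracy, per-class F-measure, and a confusion matrix from lists of gold and predicted labels. The function names are hypothetical, and the example labels are made up for illustration.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def f_measure(gold, predicted, label):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def confusion_matrix(gold, predicted, labels):
    """matrix[i][j] = count of instances of class labels[i] predicted as labels[j]."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


if __name__ == "__main__":
    gold = ["yes", "yes", "no", "no"]       # made-up example labels
    predicted = ["yes", "no", "no", "no"]
    print("accuracy:", accuracy(gold, predicted))
    print("F(yes):", f_measure(gold, predicted, "yes"))
    print("confusion:", confusion_matrix(gold, predicted, ["yes", "no"]))
```

Rows of the matrix are the actual classes and columns the predicted ones, so off-diagonal cells show which classes get confused with which.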

11 Running weka from the Command Line
● Running an N-fold cross validation experiment
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i
● Using a predefined test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff

12
● Saving the model
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model
● Classifying a test set
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff
● Getting help
  – java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
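
When running many experiments, it can help to build these command lines from a script. The Python sketch below assembles the argument list using only the flags shown on slides 11 and 12 (`-t`, `-T`, `-x`, `-i`, `-d`, `-l`). The helper names `weka_command` and `run_weka` are hypothetical, and `run_weka` assumes `java` is on the PATH and that the jar path is valid on your machine.

```python
import os
import subprocess


def weka_command(classifier, jar, train=None, test=None, folds=None,
                 save_model=None, load_model=None, stats_flag=False):
    """Build the argument list for a Weka command-line run."""
    # No shell is involved below, so expand a leading ~ here.
    cmd = ["java", "-cp", os.path.expanduser(jar), classifier]
    if train is not None:
        cmd += ["-t", train]            # training data
    if test is not None:
        cmd += ["-T", test]             # predefined test set
    if folds is not None:
        cmd += ["-x", str(folds)]       # N-fold cross-validation
    if stats_flag:
        cmd.append("-i")                # the -i flag as used on slide 11
    if save_model is not None:
        cmd += ["-d", save_model]       # save the trained model
    if load_model is not None:
        cmd += ["-l", load_model]       # load a saved model
    return cmd


def run_weka(cmd):
    """Run the command and return Weka's standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    cmd = weka_command("weka.classifiers.bayes.NaiveBayes",
                       "~cs4705/bin/weka.jar",
                       train="trainingdata.arff", folds=10, stats_flag=True)
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.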

13 Tips for Homework Success
● Start early
● Read instructions carefully
● Start simply
● Your system should always work
  – 80/20 Rule
  – Add features incrementally
  – This way, you always have something you can turn in.

