The Weka: W(aikato) E(nvironment) for K(nowledge) A(nalysis), developed by the University of Waikato


1

2
The Weka
• The weka is a well-known bird of New Zealand.
• Weka stands for W(aikato) E(nvironment) for K(nowledge) A(nalysis).
• Developed by the University of Waikato in New Zealand.
• It is a comprehensive suite of Java class libraries.
• It implements many state-of-the-art machine learning and data mining algorithms.
• It supports data files such as CSV (comma-separated values) and ARFF (Attribute-Relation File Format).

3
• Collection of ML (machine learning) algorithms: an open-source Java package.
• Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptron.
• Schemes for numeric prediction include: linear regression, model tree generators, locally weighted regression, instance-based learners, decision tables, multi-layer perceptron.
• Meta-schemes include: bagging, boosting, stacking, regression via classification, classification via regression, cost-sensitive classification.
• Schemes for clustering: EM and Cobweb.

4
• 49 data preprocessing tools
• 76 classification/regression algorithms
• 8 clustering algorithms
• 15 attribute/subset evaluators + 10 search algorithms for feature selection
• 3 algorithms for finding association rules
• 3 graphical user interfaces:
    “The Explorer” (exploratory data analysis)
    “The Experimenter” (experimental environment)
    “The Knowledge Flow” (new process-model-inspired interface)

5
• ARFF files require declarations of @RELATION, @ATTRIBUTE and @DATA.
• The @RELATION declaration associates a name with the dataset.
    Syntax: @RELATION <relation-name>
    E.g.: @RELATION stud
• The @ATTRIBUTE declaration specifies the name and type of an attribute.
    Syntax: @ATTRIBUTE <attribute-name> <datatype>
    The datatype can be numeric, nominal, string or date.
    E.g.:
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
• The @DATA declaration is a single line denoting the start of the data segment. Missing values are represented by a question mark (?).
    @DATA
    5.1, 3.5, 1.4, 0.2, Iris-setosa
    4.9, ?, 1.4, ?, Iris-versicolor
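To make the three declarations concrete, here is a minimal sketch of a reader for small dense ARFF files, written in plain Python. It is not Weka's own parser: the function name `parse_arff` and the sample text are illustrative, and the sketch assumes unquoted attribute names and one instance per line. The sample declares all five iris attributes so that each data row matches the header.

```python
import csv
import io


def parse_arff(text):
    """Parse a small dense ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and % comments
            continue
        if in_data:
            # skipinitialspace lets "5.1, 3.5" parse the same as "5.1,3.5"
            values = next(csv.reader(io.StringIO(line), skipinitialspace=True))
            # a bare ? marks a missing value
            rows.append([None if v.strip() == "?" else v.strip() for v in values])
            continue
        keyword = line.split(None, 1)[0].lower()  # keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip()
        elif keyword == "@attribute":
            _, name, datatype = line.split(None, 2)
            if datatype.startswith("{"):
                # nominal attribute: keep the list of allowed values
                datatype = [v.strip() for v in datatype.strip("{}").split(",")]
            else:
                datatype = datatype.lower()
            attributes.append((name, datatype))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, ?, 1.4, ?, Iris-versicolor
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[1])
```

Values are kept as strings here; converting numeric columns to floats would be a natural next step once the attribute types are known.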

6
• In addition to nominal and numeric attributes, exemplified by the weather data, the ARFF format has two further attribute types: string attributes and date attributes.
• String attributes have values that are textual. Suppose you have a string attribute that you want to call description. In the block defining the attributes, it is specified as follows:
    @attribute description string
• Then, in the instance data, include any character string in quotation marks (to include quotation marks in your string, use the standard convention of preceding each one by a backslash, \).
• Strings are stored internally in a string table and represented by their address in that table. Thus two strings that contain the same characters will have the same value.
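The string-table idea can be sketched in a few lines of Python. This is a toy model, not Weka's internal data structure: the class `StringTable` and the helper `unquote` are hypothetical names, and `unquote` only handles the backslash-before-character convention described above.

```python
class StringTable:
    """Toy string table: each distinct string is stored once and referred to by index."""

    def __init__(self):
        self._strings = []
        self._index = {}

    def add(self, value):
        """Return the index of value, storing it first if it is new."""
        if value not in self._index:
            self._index[value] = len(self._strings)
            self._strings.append(value)
        return self._index[value]

    def get(self, index):
        return self._strings[index]


def unquote(token):
    """Strip surrounding double quotes and resolve backslash-escaped characters."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        token = token[1:-1]
    out = []
    chars = iter(token)
    for ch in chars:
        if ch == "\\":
            out.append(next(chars, ""))  # keep the escaped character itself
        else:
            out.append(ch)
    return "".join(out)


table = StringTable()
first = table.add(unquote(r'"say \"hi\""'))
second = table.add(unquote(r'"say \"hi\""'))  # same characters, same index
other = table.add(unquote('"something else"'))
print(first, second, other, table.get(first))
```

Because equal strings share one index, comparing two string values reduces to comparing two integers.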

7
• Date attributes are strings with a special format and are introduced like this (for an attribute called today):
    @attribute today date
• Weka uses the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss, with four digits for the year, two each for the month and day, then the letter T followed by the time with two digits for each of hours, minutes, and seconds.
• In the data section of the file, dates are specified as the corresponding string representation of the date and time, for example, 2004-04-03T12:00:00.
• Although they are specified as strings, dates are converted to numeric form when the input file is read.
• Dates can also be converted internally to different formats, so you can have absolute timestamps in the data file and use transformations to forms such as time of day or day of the week to detect periodic behavior.
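The conversion from a date string to a number, and from that number to a periodic feature, might look like the following sketch. It uses Python's `datetime` rather than Weka itself. Two assumptions are made for illustration: the numeric form is milliseconds since 1970-01-01, and the timestamp is interpreted as UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone

# strptime equivalent of the pattern yyyy-MM-dd'T'HH:mm:ss
ARFF_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse_arff_date(text):
    """Convert a date string to milliseconds since the epoch (UTC assumed)."""
    moment = datetime.strptime(text, ARFF_DATE_FORMAT).replace(tzinfo=timezone.utc)
    return moment.timestamp() * 1000.0


def _from_millis(millis):
    return datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)


def day_of_week(millis):
    """Derive a periodic feature: the name of the weekday."""
    return DAY_NAMES[_from_millis(millis).weekday()]


def hour_of_day(millis):
    """Derive a periodic feature: the hour, 0 to 23."""
    return _from_millis(millis).hour


stamp = parse_arff_date("2004-04-03T12:00:00")
print(stamp, day_of_week(stamp), hour_of_day(stamp))
```

Keeping the absolute timestamp in the file and deriving weekday or hour afterwards means the same data can serve several periodicity analyses.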

8
• Sparse ARFF files are similar to ARFF files except that data values of 0 are not represented.
• Non-zero attributes are specified by attribute number and value.
• For examples of ARFF files see $WEKAHOME/data.
• Dense form:
    @data
    0, X, 0, Y, "class A"
    0, 0, W, 0, "class B"
• Equivalent sparse form (attribute numbers start at 0):
    @data
    {1 X, 3 Y, 4 "class A"}
    {2 W, 4 "class B"}
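The dense-to-sparse mapping can be expressed as a short Python sketch. The helper names `to_sparse` and `to_dense` are illustrative, and the parsing is deliberately naive: it assumes values contain no commas.

```python
def to_sparse(row):
    """Format one dense row as a sparse ARFF data line, omitting zero values."""
    pairs = []
    for index, value in enumerate(row):
        if value in (0, "0"):
            continue  # zeros are implied by their absence
        text = str(value)
        if " " in text:
            text = '"' + text + '"'  # quote values that contain spaces
        pairs.append(f"{index} {text}")
    return "{" + ", ".join(pairs) + "}"


def to_dense(line, width):
    """Expand a sparse data line back to a dense row of strings."""
    row = ["0"] * width
    body = line.strip().strip("{}")
    for pair in body.split(","):
        if not pair.strip():
            continue
        index, value = pair.strip().split(None, 1)
        row[int(index)] = value.strip('"')
    return row


print(to_sparse([0, "X", 0, "Y", "class A"]))
print(to_sparse([0, 0, "W", 0, "class B"]))
print(to_dense('{2 W, 4 "class B"}', 5))
```

Note that an omitted value means 0, not missing; a missing value still has to be written explicitly as ?.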

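9
• -t <training file>: specify the training file
• -T <test file>: specify the test file; if none is given, cross-validation (CV) is performed on the training data
• -x <number of folds>: number of folds for cross-validation
• -s <random number seed>: random seed for CV
• -l <input file>: use a saved model
• -d <output file>: output the model to a file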
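How these flags combine can be sketched as a small Python helper that assembles an argument list. The helper `weka_args` is hypothetical, and the jar location `weka.jar`, the file names, and the choice of the J48 classifier are assumptions for illustration. The default fold count and seed are placeholders; check the Weka documentation for the actual defaults.

```python
def weka_args(classifier, train=None, test=None, folds=10, seed=1,
              load_model=None, save_model=None):
    """Build a command line for a Weka classifier from the options above."""
    args = ["java", "-cp", "weka.jar", classifier]  # jar path is an assumption
    if train is not None:
        args += ["-t", train]
    if load_model is not None:
        args += ["-l", load_model]
    if test is not None:
        args += ["-T", test]
    elif train is not None:
        # no test file: cross-validation runs on the training data
        args += ["-x", str(folds), "-s", str(seed)]
    if save_model is not None:
        args += ["-d", save_model]
    return args


cv_run = weka_args("weka.classifiers.trees.J48", train="iris.arff")
holdout_run = weka_args("weka.classifiers.trees.J48", train="train.arff",
                        test="test.arff", save_model="j48.model")
print(" ".join(cv_run))
print(" ".join(holdout_run))
```

The resulting list could be passed to `subprocess.run` on a machine where Java and the Weka jar are available.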

10
• Internal variables are private; they should have protected or package-level access.
• SparseInstance for strings requires a dummy at index 0.
    Problem: Strings are mapped to internal indices into an array. The string at position 0 is mapped to the value 0. When written out as a SparseInstance, it will not be written (0 value). If read back in, the first string is missing from the Instances.
    Solution: Put a dummy string in position 0 when writing a SparseInstance with strings. The dummy will be ignored while writing, and the actual instance will be written properly.
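The problem and the workaround can be reproduced with a toy model in Python. This is not Weka's code: the functions below are hypothetical and model a single string attribute whose values are stored as indices into a string table, with sparse output dropping every stored value of 0.

```python
def write_sparse(rows, table):
    """Write rows of string-table indices sparsely, dropping stored value 0."""
    return [{pos: table[idx] for pos, idx in enumerate(row) if idx != 0}
            for row in rows]


def read_sparse(sparse_rows, width, table):
    """Rebuild index rows and the string table; an absent entry means 0."""
    rows = []
    for sparse in sparse_rows:
        row = []
        for pos in range(width):
            if pos in sparse:
                text = sparse[pos]
                if text not in table:
                    table.append(text)
                row.append(table.index(text))
            else:
                row.append(0)
        rows.append(row)
    return rows


def roundtrip(strings, use_dummy):
    """Store strings, write them sparsely, read them back, and decode."""
    table = ["<dummy>"] if use_dummy else []  # dummy occupies index 0
    rows = []
    for text in strings:
        if text not in table:
            table.append(text)
        rows.append([table.index(text)])
    written = write_sparse(rows, table)
    new_table = []
    restored = read_sparse(written, 1, new_table)
    return [new_table[r[0]] if r[0] < len(new_table) else None
            for r in restored]


print(roundtrip(["red", "green"], use_dummy=False))
print(roundtrip(["red", "green"], use_dummy=True))
```

Without the dummy, the string stored at index 0 is never written, so it cannot be recovered on reading. With the dummy in place, every real string has a non-zero index and survives the round trip.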

11

