Slide 1: Using Weka

Slide 2: Data Mining Using Weka
- What’s Data Mining?
  - We are overwhelmed with data
  - Data mining is about going from data to information, information that can give you useful predictions
- Examples?
  - You’re at the supermarket checkout.
  - You’re happy with your bargains … and … the supermarket is happy you’ve bought some more stuff
  - Say you want a child, but you and your partner can’t have one. Can data mining help?
- Data mining vs. machine learning

Slide 3: Data Mining Using Weka
- What’s Weka?
  - A bird found only in New Zealand?
- Data mining workbench
  - Waikato Environment for Knowledge Analysis
- Machine learning algorithms for data mining tasks
  - 100+ algorithms for classification
  - 75 for data preprocessing
  - 25 to assist with feature selection
  - 20 for clustering, finding association rules, etc.

Slide 4: Data Mining Using Weka
- What will you learn?
  - Load data into Weka and look at it
  - Use filters to preprocess it
  - Explore it using interactive visualization
  - Apply classification algorithms
  - Interpret the output
  - Understand evaluation methods and their implications
  - Understand various representations for models
  - Explain how popular machine learning algorithms work

Slide 5: Data Mining Using Weka
- What will you learn? (cont.)
  - Be aware of common pitfalls with data mining
  - Use Weka on your own data … and understand what you are doing!

Slide 6: Data Mining Using Weka
- Getting started with Weka
  - Install Weka
  - Explore the “Explorer” interface
  - Explore some datasets
  - Build a classifier
  - Interpret the output
  - Use filters
  - Visualize your data set

Slide 7: Data Mining Using Weka
- Install Weka
  - Download links available on Course Page: http://chouc.people.cofc.edu/SCU/DM/index.html
- Platform:
  - Windows X86
  - Windows X64
  - Mac OSX
- Version: 3.6.10
  - the latest stable version of Weka
  - datasets for the course

Slide 8: Data Mining Using Weka
- Exercise
  - Install Weka
  - Get datasets along with the installation
  - Load the Weka program
  - Open Explorer
  - Open a dataset (weather.nominal.arff)
  - Look at attributes
  - Edit the dataset
  - Save it if you need to make changes to the dataset

Slide 9: Exploring the Explorer
[Screenshot labels: Command-line interface, Graphical interface, Performance comparisons]

Slide 10: Exploring the Explorer

Slide 11: Exploring the Explorer
- Open a dataset (weather.nominal.arff)
- Columns are attributes; rows are instances (1 to 14)

      Outlook   Temp  Humidity  Windy  Play
   1  Sunny     Hot   High      False  No
   2  Sunny     Hot   High      True   No
   3  Overcast  Hot   High      False  Yes
   4  Rainy     Mild  High      False  Yes
   5  Rainy     Cool  Normal    False  Yes
   6  Rainy     Cool  Normal    True   No
   7  Overcast  Cool  Normal    True   Yes
   8  Sunny     Mild  High      False  No
   9  Sunny     Cool  Normal    False  Yes
  10  Rainy     Mild  Normal    False  Yes
  11  Sunny     Mild  Normal    True   Yes
  12  Overcast  Mild  High      True   Yes
  13  Overcast  Hot   Normal    False  Yes
  14  Rainy     Mild  High      True   No

Slide 12: Exploring the Explorer
[Screenshot: open file weather.nominal.arff]

Slide 13: Exploring the Explorer
[Screenshot labels: attributes, attribute values]

Slide 14: Exploring the Explorer
[Same weather.nominal.arff table as slide 11: 5 attributes (columns), 14 instances (rows)]

Slide 15: Exploring the Explorer
[Screenshot labels: attributes, class, attribute values; open file weather.nominal.arff]

Slide 16: Exploring the Explorer
- Classification (sometimes called “supervised learning”)
  - Dataset: classified examples
  - “Model” that classifies new examples
- Classified example = instance + class
  - Instance: fixed set of features (attribute 1, attribute 2, …, attribute n)
  - Each attribute is discrete (“nominal”) or continuous (“numeric”)
  - Class discrete: “classification” problem
  - Class continuous: “regression” problem
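Note: the distinction on this slide can be sketched in a few lines of Python. This is only an illustration, not how Weka decides: in an ARFF file the declared attribute type determines whether the class is nominal or numeric, whereas the hypothetical helper `problem_kind` below guesses from the values themselves.

```python
def is_numeric(value):
    """Return True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def problem_kind(class_values):
    """Guess the problem type from the class column (illustrative heuristic only)."""
    if all(is_numeric(v) for v in class_values):
        return "regression"       # continuous ("numeric") class
    return "classification"       # discrete ("nominal") class


play = ["No", "No", "Yes", "Yes"]        # class values as in the weather table
price = ["12.5", "9.0", "15.25"]         # made-up numeric class values
print(problem_kind(play), problem_kind(price))
```

A nominal class that happens to be coded with digits (for example class labels 1 to 7) would fool this heuristic, which is why the declared type is the reliable source.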

Slide 17: Exploring the Explorer
[Screenshot labels: attributes, class, attribute values; open file weather.numeric.arff]

Slide 18: Exploring the Explorer
[Screenshot: open file glass.arff]

Slide 19: Exploring the Explorer
- Exercise on the classification problem
  - Datasets: weather.nominal, weather.numeric
  - Nominal vs numeric attributes
  - ARFF file format
  - Checking attributes

Slide 20: Exploring the Explorer
- File format
  - ARFF file format
    - Native in Weka
    - More information
  - CSV file format
    - Compatible with Excel and Weka
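Note: to make the ARFF format concrete, here is a minimal sketch in Python. The ARFF text uses the attribute names and values from the weather table on slide 11; the spelling and case in the file shipped with Weka may differ, and the relation name `weather` is an assumption. The hypothetical `parse_arff` helper handles only the simple case shown (nominal attributes, comma-separated data, `%` comments) and is not Weka's loader.

```python
ARFF_TEXT = """\
% Header: relation name, then one @attribute line per column
@relation weather
@attribute Outlook {Sunny, Overcast, Rainy}
@attribute Temp {Hot, Mild, Cool}
@attribute Humidity {High, Normal}
@attribute Windy {True, False}
@attribute Play {Yes, No}
% Data: one instance per line
@data
Sunny,Hot,High,False,No
Sunny,Hot,High,True,No
Overcast,Hot,High,False,Yes
"""


def parse_arff(text):
    """Parse a very small subset of ARFF into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blanks and comments
            continue
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):               # nominal: list of allowed values
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, [name for name, _ in attributes], len(rows))
```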

Slide 21: Exploring the Explorer
- Exercise on File Preparation
  - Prepare ARFF file
    - Specialized format
    - Need to follow ARFF syntax
  - CSV file format
    - Comma separated format
    - Notepad compatible
    - Excel compatible

Slide 22: Exploring the Explorer
- Exercise on File Preparation (cont.)
  - ARFF → CSV
    - Easy
    - In Weka Explorer, use the Save… feature after loading the dataset and change the file format to CSV data files
  - CSV → ARFF
    - Easy
    - In Weka Explorer, use the Open File… feature and change the file format to CSV data files
    - Next, use the Save… feature and change the file format to Arff data files
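Note: the CSV to ARFF direction can also be sketched outside the Explorer. The Python below is an illustration under simplifying assumptions, not what Weka's converter does internally: a column is treated as numeric if every value parses as a number, otherwise as nominal with values listed in order of first appearance. The function name `csv_to_arff`, the relation name, and the sample rows are made up for the example.

```python
import csv
import io


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="converted"):
    """Convert CSV text (first row = header) to a simple ARFF string."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(_is_number(v) for v in column):
            lines.append("@attribute " + name + " numeric")
        else:
            values = list(dict.fromkeys(column))   # unique, original order
            lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"] + [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""
arff = csv_to_arff(CSV_TEXT, relation="weather")
print(arff)
```

Values containing commas, spaces, or missing entries would need quoting and `?` handling, which this sketch leaves out.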

Slide 23: Building a classifier
- Use J48 to analyze the glass dataset
  - Open file glass.arff
  - Check the available classifiers
  - Choose the J48 decision tree learner (trees>J48)
  - Run it
  - Examine the output
  - Look at the correctly classified instances … and the confusion matrix
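Note: the two numbers this slide asks you to look at, the share of correctly classified instances and the confusion matrix, can be computed by hand from actual and predicted class labels. The Python sketch below uses made-up labels (not J48 output on glass.arff); rows are the actual class and columns the predicted class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels for illustration only.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
labels = ["yes", "no"]

matrix = confusion_matrix(actual, predicted, labels)
score = accuracy(actual, predicted)
print(matrix, score)
```

The diagonal of the matrix holds the correctly classified instances; every off-diagonal cell is a particular kind of error.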

Slide 24: Building a classifier
- Investigate J48
  - Open the configuration panel
  - Check the More information
  - Examine the options
  - Use an unpruned tree
  - Look at leaf sizes
  - Set minNumObj to 15 to avoid small leaves
  - Visualize tree using right-click menu

Slide 25: Building a classifier
- From C4.5 to J48
  - ID3 (1979)
  - C4.5 (1993)
  - C4.8 (1996)
  - C5.0 (commercial)
  - J48
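Note: the slides do not describe how these tree learners choose a split. As a rough illustration, ID3, the first algorithm in this lineage, picks the attribute with the highest information gain. The Python sketch below computes that on the 14-instance weather table from slide 11. It is not J48: C4.5 and its descendants use gain ratio and add pruning, among other refinements, and the function names here are made up for the example.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Windy"]   # last column is the class
WEATHER = [
    ("Sunny", "Hot", "High", "False", "No"),
    ("Sunny", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Rainy", "Mild", "High", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Sunny", "Mild", "High", "False", "No"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class minus the weighted entropy after splitting on a column."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[column] for row in rows}:
        subset = [row[-1] for row in rows if row[column] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


gains = {name: information_gain(WEATHER, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(best, gains)
```

On this table Outlook has the largest gain, so an ID3-style learner would place it at the root of the tree.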

Slide 26: Building a classifier
- Investigate J48
  - Classifiers in Weka
  - Classifying the glass dataset
  - Interpreting J48 output
  - J48 configuration panel
  - … option: pruned vs unpruned trees
  - … option: avoid small leaves

Slide 27: Using a filter
- Use a filter to remove an attribute (3rd attribute)
  - Open weather.nominal.arff
  - Check the filters
    - supervised vs unsupervised
    - attribute vs instance
  - Choose the unsupervised attribute filter Remove
  - Check the More information; look at the options
  - Set attributeIndices to 3 and click OK (to remove the 3rd attribute)
  - Apply the filter
  - Save the result or press Undo to skip the change
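Note: the effect of the Remove filter with attributeIndices set to 3 can be pictured with a short Python sketch. This mimics the result on a few rows of the weather table; it is not Weka's implementation, and `remove_attributes` is a made-up helper name. Indices are 1-based, as on the slide.

```python
def remove_attributes(header, rows, attribute_indices):
    """Drop the columns named by 1-based indices and return new header and rows."""
    drop = {i - 1 for i in attribute_indices}            # convert to 0-based
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


header = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
rows = [
    ["Sunny", "Hot", "High", "False", "No"],
    ["Overcast", "Hot", "High", "False", "Yes"],
    ["Rainy", "Cool", "Normal", "True", "No"],
]

new_header, new_rows = remove_attributes(header, rows, [3])
print(new_header)
```

Because the function returns new lists and leaves `header` and `rows` untouched, discarding the result plays the role of Undo.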

Slide 28: Using a filter
- Use Remove button to remove attributes
  - Open weather.nominal.arff
  - Use check boxes and Remove button

Slide 29: Using a filter
- Remove instances where humidity is high
  - Open weather.nominal.arff
  - Supervised or unsupervised?
  - Attribute or instance?
  - Look at them
  - Select RemoveWithValues
  - Set attributeIndex to 3 (3rd attribute)
  - Set nominalIndices to 1 (first value: high)
  - Apply
  - Undo
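Note: the two settings on this slide select a column (attributeIndex) and a position in that attribute's declared value list (nominalIndices). A minimal Python sketch of the resulting row filter follows; it only imitates the outcome and is not Weka's RemoveWithValues code. The helper name and the assumption that humidity's values are declared in the order High, Normal are illustrative.

```python
def remove_with_values(rows, attribute_index, declared_values, nominal_indices):
    """Drop rows whose value in the given column is one of the selected nominal values.

    attribute_index and nominal_indices are 1-based, as on the slide.
    """
    targets = {declared_values[i - 1] for i in nominal_indices}
    return [row for row in rows if row[attribute_index - 1] not in targets]


humidity_values = ["High", "Normal"]    # assumed declaration order
rows = [
    ["Sunny", "Hot", "High", "False", "No"],
    ["Sunny", "Hot", "High", "True", "No"],
    ["Overcast", "Hot", "High", "False", "Yes"],
    ["Rainy", "Mild", "High", "False", "Yes"],
    ["Rainy", "Cool", "Normal", "False", "Yes"],
    ["Rainy", "Cool", "Normal", "True", "No"],
]

kept = remove_with_values(rows, 3, humidity_values, [1])
print(kept)
```

This is an instance filter: the number of columns stays the same and only rows disappear, in contrast to the attribute filter Remove.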

Slide 30: Using a filter
- Fewer attributes, better classification!
  - Open glass.arff
  - Run J48 (trees>J48)
  - Remove Fe
  - Remove all attributes except RI and MG
  - Look at the decision trees
  - Use right-click menu to visualize decision trees

Slide 31: Using a filter
- Summary
  - Filters in Weka
  - Supervised vs unsupervised, attribute vs instance
  - To find the right one, you need to look
  - Filters can be very powerful
  - Smartly removing attributes
    - improve performance
    - increase comprehensibility

Slide 32: Visualizing your data
- Using the Visualize panel
  - Open iris.arff
  - Bring up Visualize panel
  - Click one of the plots; examine some instances
  - Set x axis to petalwidth and y axis to petallength
  - Click on Class color to change the color
  - Bars on the right correspond to attributes: click for x axis; right-click for y axis
  - Jitter slider (to see the overlapped instances)
  - Show Select Instance: Rectangle option
  - Submit, Reset, Clear and Save

Slide 33: Visualizing your data
- Visualizing classification errors
  - Open iris.arff
  - Run J48 (trees>J48)
  - Visualize classifier errors (from Results list)
  - Plot predictedclass against class
  - Identify errors shown by confusion matrix

Slide 34: Visualizing your data
- Summary
  - Get down and dirty with your data
  - Visualize it
  - Clean it up by deleting outliers
  - Look at classification errors
    - (there’s a filter that allows you to add classifications as a new attribute)

