Machine Learning in Practice Lecture 4 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.

1 Machine Learning in Practice Lecture 4 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

2 What is Sleep Apnea? http://www.snoresnomore.com/images/snorin2.gif

4 Plan for Today
  - Any questions
    - About the second assignment?
  - Announcements
    - Quiz 1 Answer Key on Blackboard
    - Comments about Assignment 1
  - Kwiatkowska Paper
  - Error Analysis
  - Data Cleansing
  - Trees vs Tables
  - Weka helpful hints
  - ARFF format

5 General Points – Assignment I
  - Bookkeeping Issues
    - Write your name on the assignment
    - Write the assignment number
    - Format the assignment as an MS Word .doc file
  - Please don’t use the .docx format
  - Visualize the tree in graphical fashion
    - Right-click and then take a screen shot
    - Embed the figure in the .doc file

6 Visualizing the Tree

7

8 Kwiatkowska Paper

9 Clinical Prediction Rules
  - Example application of machine learning
  - Rules created by medical practitioners based on their experience
  - Can we use machine learning to contribute to the accumulation of medical wisdom?
  - ER1 = If BMI > 40 and Age > 65 and Gender = male
    - Then OSA = Yes
  - ER2 = If BMI < 25 and Age < 25 and Gender = female
    - Then OSA = No

10 Methodological Flaw?
  - Human-generated rules don’t cover most of the data
  - (Figure labels: ER1, ER2)
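The coverage problem can be made concrete with a minimal sketch. The two rules come from the slides; the patient records and the helper names (`expert_rule`, `coverage`) are hypothetical and invented purely for illustration.

```python
from typing import Optional

def expert_rule(bmi: float, age: int, gender: str) -> Optional[str]:
    """Apply ER1 and ER2; return None when neither rule fires."""
    if bmi > 40 and age > 65 and gender == "male":
        return "yes"   # ER1: OSA = Yes
    if bmi < 25 and age < 25 and gender == "female":
        return "no"    # ER2: OSA = No
    return None        # patient is not covered by either rule

# Hypothetical patients (bmi, age, gender), not taken from the paper.
patients = [
    (42.0, 70, "male"),    # covered by ER1
    (22.0, 21, "female"),  # covered by ER2
    (31.0, 50, "male"),    # uncovered
    (28.5, 44, "female"),  # uncovered
    (45.0, 40, "male"),    # uncovered: BMI matches ER1 but age does not
]

def coverage(records) -> float:
    """Fraction of records for which some expert rule makes a prediction."""
    covered = sum(1 for r in records if expert_rule(*r) is not None)
    return covered / len(records)

predictions = [expert_rule(*p) for p in patients]
print(predictions)
print(coverage(patients))
```

Any accuracy figure computed only over the covered records says nothing about the patients for whom the rules return no prediction, which is the concern the slide raises.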

11 Machine Learning Result

12 ML Wins Some and Loses Others
  - (Figure labels: ER1, ER2)

13 Compare results
  - ER1 = If BMI > 40 and Age > 65 and Gender = male
    - Then OSA = Yes
  - ER2 = If BMI < 25 and Age < 25 and Gender = female
    - Then OSA = No
  - Note: the paper says this is the tree for set B.

14 Claims from paper…
  - Learned rules were largely consistent with human-generated rules
  - Automatic
    - If BMI > 28.03 and Gender = Male Then OSA = Yes
  - Human
    - If BMI > 40 and Age > 65 and Gender = Male Then OSA = Yes
  - Do you buy their argument?

15 More claims…
  - Is this a contradiction?
  - Automatic
    - If Gender = Male and MP = 2 Then OSA = No
  - Human
    - If BMI > 40 and Age > 65 and Gender = Male Then OSA = Yes
  - What about the relationship between age and MP or BMI and MP?
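One way to probe the question is to check whether a single patient can trigger both rules with opposite conclusions. The sketch below encodes the two rules from the slide; the function names and the example patients are hypothetical.

```python
def learned_rule(gender, mp):
    """Automatic rule from the slide: Gender = Male and MP = 2 -> OSA = No."""
    return "no" if gender == "male" and mp == 2 else None

def human_rule(bmi, age, gender):
    """Human rule from the slide: BMI > 40, Age > 65, Male -> OSA = Yes."""
    return "yes" if bmi > 40 and age > 65 and gender == "male" else None

def conflicts(patient):
    """True when both rules fire on the patient and disagree."""
    a = learned_rule(patient["gender"], patient["mp"])
    h = human_rule(patient["bmi"], patient["age"], patient["gender"])
    return a is not None and h is not None and a != h

# Hypothetical patients, invented for illustration.
p1 = {"gender": "male", "mp": 2, "bmi": 45.0, "age": 70}  # both rules fire
p2 = {"gender": "male", "mp": 2, "bmi": 27.0, "age": 50}  # only learned rule fires
print(conflicts(p1), conflicts(p2))
```

The rules are contradictory in practice only if patients like `p1` actually occur, which depends on how MP relates to age and BMI in the data.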

16 What is the Mallampati classification? http://www.accessmedicine.com/search/searchAMResultImg.aspx?rootterm=mallampati+score&rootID=46310&searchType=1

17 Thought questions
  - Would you trust medical “wisdom” that comes from data mining?
  - What would be your concerns?
  - What would you want to know about how the “wisdom” was learned?

18 Data Cleansing
  - Obvious things you can fix…
  - Inconsistent naming of nominal values
    - Names with or without middle initial
    - Nickname versus real name
    - Typos
  - City or street names may change over time
    - Street names may change depending on the block
  - Inconsistencies in how forms are filled out
    - Address and phone number fields in different countries
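As a rough illustration of fixing inconsistent nominal values, the sketch below maps variant spellings onto one canonical label. The alias table and the `canonical` helper are hypothetical; a real data set would need its own table, usually built by inspecting the distinct values.

```python
import re

# Hypothetical alias table mapping variant spellings to one canonical value.
ALIASES = {
    "bob smith": "robert smith",       # nickname versus real name
    "robert j smith": "robert smith",  # with or without middle initial
    "pittsburg": "pittsburgh",         # typo
}

def canonical(value: str) -> str:
    """Lower-case, drop punctuation, collapse whitespace, then apply aliases."""
    cleaned = re.sub(r"[^\w\s]", "", value.casefold())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned)

raw = ["Robert Smith", "Bob Smith", "Robert J. Smith", "  robert  smith "]
distinct = {canonical(v) for v in raw}
print(distinct)
```

Without this step a learner would treat the four spellings as four unrelated nominal values.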

21 Trees versus Tables

22 Decision Tables vs Decision Trees
  - Decision Trees: Open World Assumption
    - Only examine some attributes in particular contexts
    - Use the majority class within a context to eliminate the closed world requirement
    - Divide and Conquer approach
  - Decision Tables: Closed World Assumption
    - Every case enumerated
    - No generalization except by limiting the number of attributes
    - The 1R algorithm produces the simplest possible Decision Table
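To show why 1R yields the simplest possible decision table, here is a minimal sketch of the idea: for each attribute, build a one-column table that predicts the majority class for each value, then keep the attribute with the fewest training errors. This is not Weka's implementation; the function name `one_r` and the toy rows (patterned on the familiar weather example, restricted to two attributes) are assumptions for illustration.

```python
from collections import Counter, defaultdict

def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single attribute whose
    majority-class table makes the fewest training errors."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One table entry per attribute value: predict the majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Illustrative rows: (outlook, windy, play).
raw = [
    ("sunny", "false", "no"), ("sunny", "true", "no"),
    ("overcast", "false", "yes"), ("rainy", "false", "yes"),
    ("rainy", "false", "yes"), ("rainy", "true", "no"),
    ("overcast", "true", "yes"), ("sunny", "false", "no"),
    ("sunny", "false", "yes"), ("rainy", "false", "yes"),
    ("sunny", "true", "yes"), ("overcast", "true", "yes"),
    ("overcast", "false", "yes"), ("rainy", "true", "no"),
]
rows = [dict(zip(("outlook", "windy", "play"), r)) for r in raw]

attr, rule, errors = one_r(rows, ["outlook", "windy"], "play")
print(attr, rule, errors)
```

The resulting table enumerates every value of the chosen attribute, so the only generalization comes from ignoring all the other attributes.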

26 Weka Helpful Hints

27 Use the visualize tab to view 3-way interactions

28 Click in one of the boxes to zoom in

29 Use the visualize tab to view 3-way interactions

30 Weka Data Structures

31 ARFF format
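For orientation, an ARFF file is plain text: a `@relation` line, one `@attribute` declaration per column, then `@data` followed by comma-separated rows. The sketch below renders that layout from Python. The `to_arff` helper and the sample rows are hypothetical, and the helper does not quote values that contain spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Render a header plus comma-separated data rows in ARFF layout.

    attributes is a list of (name, kind) pairs, where kind is a type
    keyword such as 'numeric' or 'string', or a list of nominal labels.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ", ".join(kind) + "}"  # nominal value set
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

# Hypothetical sample rows for illustration.
arff_text = to_arff(
    "weather",
    [("outlook", ["sunny", "overcast", "rainy"]),
     ("temperature", "numeric"),
     ("play", ["yes", "no"])],
    [("sunny", 85, "no"), ("overcast", 83, "yes")],
)
print(arff_text)
```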

32 Types of Attributes
  - Numeric (continuous)
    - @attribute temperature numeric
    - Real numbers or integers
    - Can be compared (less than, greater than, equality, inequality)
    - Some algorithms treat numeric scales as ratios or look at “distances”
    - Some methods normalize numeric scales
    - Some machine learning algorithms treat numbers as nominal values
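Normalization matters most for distance-based methods, where an attribute with a large range would otherwise dominate. A minimal sketch of min-max scaling to the range 0 to 1 follows; the helper name `min_max` and the sample temperatures are assumptions, and other schemes (such as standardizing to zero mean and unit variance) are equally common.

```python
def min_max(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

temps = [60, 70, 80]  # hypothetical temperature readings
scaled = min_max(temps)
print(scaled)
```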

33 Types of Attributes
  - Nominal (categorical)
    - @attribute outlook {sunny, overcast, rainy}
    - Finite number of pre-specified values
    - Values are just labels (the actual label is not meaningful to the algorithms)
    - Values are not ordered and cannot be compared except for equality/inequality

34 Types of Attributes
  - Strings (just like nominal, makes troubleshooting text processing more convenient)
    - @attribute description string
    - Value can be any string in quotes
    - “Look, Mom! No hands!”
    - Can be converted to a vector of numeric attributes, each representing one word
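The conversion from a string to a vector of numeric attributes can be sketched as a simple word count. Weka provides a StringToWordVector filter for this purpose; the code below is not that filter, only an illustration of the idea, and the tokenization rule and the second example sentence are assumptions.

```python
import re
from collections import Counter

def word_vector(texts):
    """Return (vocabulary, vectors): one count per vocabulary word per text."""
    tokenized = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])  # missing words count as 0
    return vocab, vectors

vocab, vectors = word_vector(["Look, Mom! No hands!", "No, no, look!"])
print(vocab)
print(vectors)
```

Each vocabulary word becomes one numeric attribute, so the number of attributes grows with the vocabulary rather than with the length of any one string.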

35 Types of Attributes
  - Date (numeric)
    - @attribute today date "yyyy-MM-dd'T'HH:mm:ss"
    - 2006-01-24T12:00:00
    - Specified as strings but then converted to numbers when the file is read
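To see what converting a date string to a number buys you, the sketch below parses an ISO-style timestamp and returns milliseconds since 1970-01-01 UTC. The pattern, the UTC assumption, and the helper name `to_epoch_millis` are illustrative choices, not a description of Weka's internals.

```python
from datetime import datetime, timezone

def to_epoch_millis(text, pattern="%Y-%m-%dT%H:%M:%S"):
    """Parse a timestamp and return milliseconds since 1970-01-01 UTC."""
    parsed = datetime.strptime(text, pattern).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)

first = to_epoch_millis("2006-01-24T12:00:00")
second = to_epoch_millis("2006-01-25T12:00:00")
print(second - first)  # elapsed milliseconds between the two timestamps
```

Once dates are numbers, they can be compared and subtracted like any other numeric attribute, which is what makes threshold splits on time possible.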

36 Reasoning About Time

37 Not Bad Performance with a Simple Split

38 Threshold is Off

39 Ordinal Values
  - Weka technically does not have ordinal attributes
    - But you can simulate them with “temperature coding”!
    - Try to represent “If X less than or equal to .35”?
  - Values: .2 .25 .28 .31 .35 .45 .47 .52 .6 .63
  - Bins: A B C D
  - Coded attributes: A | A or B | A or B or C | A or B or C or D
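A minimal sketch of temperature coding, assuming the bins A through D are ordered from lowest to highest: each ordinal level is replaced by cumulative binary attributes, so a test such as "X is at most B" becomes an equality test on the single attribute "A or B". The helper name `temperature_code` is hypothetical, and the mapping from the numeric values to bins is not recoverable from the slide, so it is left out.

```python
LEVELS = ["A", "B", "C", "D"]  # assumed ordering, lowest to highest

def temperature_code(level):
    """Encode an ordinal level as cumulative binary attributes:
    [A, A or B, A or B or C, A or B or C or D]."""
    rank = LEVELS.index(level)
    return [1 if rank <= i else 0 for i in range(len(LEVELS))]

codes = {level: temperature_code(level) for level in LEVELS}
for level, code in codes.items():
    print(level, code)

# "X <= B" is now a test on one binary attribute (index 1, "A or B").
at_most_b = [level for level, code in codes.items() if code[1] == 1]
print(at_most_b)
```

The last attribute is 1 for every level and so carries no information; it is kept here only to mirror the four columns listed on the slide.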

40 Questions?
