
1 Learning II Introduction to Artificial Intelligence CS440/ECE448 Lecture 21

2 Last lecture
– The (in)efficiency of exact inference with Bayes nets
– The learning problem
– Decision trees
This lecture
– Identification trees
– Neural networks: Perceptrons
Reading
– Chapters 18 and 20


6 Inductive learning method Construct/adjust h to agree with f on training set. (h is consistent if it agrees with f on all examples) E.g., curve fitting:

7 Inductive learning method Construct/adjust h to agree with f on training set. (h is consistent if it agrees with f on all examples) E.g., curve fitting:

8 Inductive learning method Construct/adjust h to agree with f on training set. (h is consistent if it agrees with f on all examples) E.g., curve fitting:

9 Inductive learning method Ockham's razor: prefer the simplest consistent hypothesis. Construct/adjust h to agree with f on training set. (h is consistent if it agrees with f on all examples) E.g., curve fitting:
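The curve-fitting idea is easy to demonstrate in code. Below is a minimal sketch; the data, polynomial degrees, and variable names are invented here for illustration and are not from the slides.

```python
# Fit hypotheses h of increasing complexity to noisy samples of a target f.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(8)  # noisy target f

for degree in (1, 3, 7):
    h = np.polynomial.Polynomial.fit(x, y, degree)  # candidate hypothesis h
    err = np.sum((h(x) - y) ** 2)                   # disagreement on training set
    print(f"degree {degree}: training error {err:.4f}")

# The degree-7 polynomial interpolates all 8 points, so it is consistent,
# but Ockham's razor prefers the simplest hypothesis that fits acceptably.
```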

10 Inductive Learning Given examples of some concepts and a description (features) for these concepts as training data, learn how to classify subsequent descriptions into one of the concepts.
– Concepts (classes)
– Features
– Training set
– Test set
Here, the function has discrete outputs (the classes).

11 Decision Trees Should I play tennis today?
Outlook?
– Sunny → Humidity? High → No; Normal → Yes
– Overcast → Yes
– Rain → Wind? Strong → No; Weak → Yes
Note: A decision tree can be expressed as a disjunction of conjunctions:
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
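Each root-to-leaf "yes" path is one conjunction, so the tree reads off directly as nested conditionals. A sketch in Python (the function name and value spellings are ours):

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    """The tennis decision tree above, read off as nested conditionals."""
    if outlook == "Sunny":
        return humidity == "Normal"   # Sunny: play only if humidity is normal
    if outlook == "Overcast":
        return True                   # Overcast: always play
    if outlook == "Rain":
        return wind == "Weak"         # Rain: play only if the wind is weak
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_tennis("Sunny", "Normal", "Strong"))  # True
print(play_tennis("Rain", "High", "Strong"))     # False
```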

12 Learning Decision Trees This is inductive learning: given a set of positive and negative training examples of a concept, can we learn a decision tree that appropriately classifies other examples? Identification Trees: ID3 [Quinlan, 1979].

13 What on Earth causes people to get sunburns? I don't know, so let's go to the beach and collect some data.

14 Sunburn data

Name   Hair   Height   Suit Color  Lotion  Result
Sarah  Blond  Average  Yellow      No      Sunburned
Dana   Blond  Tall     Red         Yes     Fine
Alex   Brown  Short    Red         Yes     Fine
Annie  Blond  Short    Red         No      Sunburned
Emily  Red    Average  Blue        No      Sunburned
Pete   Brown  Tall     Blue        No      Fine
John   Brown  Average  Blue        No      Fine
Katie  Blond  Short    Yellow      Yes     Fine

There are 3 x 3 x 3 x 2 = 54 possible feature vectors.
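For use in the calculations on the later slides, the table can be written down directly as data. The field names below are our choice; the values follow the slide.

```python
# The eight training samples, as (feature_dict, label) pairs:
# Sarah, Dana, Alex, Annie, Emily, Pete, John, Katie.
SAMPLES = [
    ({"hair": "Blond", "height": "Average", "suit": "Yellow", "lotion": "No"},  "Sunburned"),
    ({"hair": "Blond", "height": "Tall",    "suit": "Red",    "lotion": "Yes"}, "Fine"),
    ({"hair": "Brown", "height": "Short",   "suit": "Red",    "lotion": "Yes"}, "Fine"),
    ({"hair": "Blond", "height": "Short",   "suit": "Red",    "lotion": "No"},  "Sunburned"),
    ({"hair": "Red",   "height": "Average", "suit": "Blue",   "lotion": "No"},  "Sunburned"),
    ({"hair": "Brown", "height": "Tall",    "suit": "Blue",   "lotion": "No"},  "Fine"),
    ({"hair": "Brown", "height": "Average", "suit": "Blue",   "lotion": "No"},  "Fine"),
    ({"hair": "Blond", "height": "Short",   "suit": "Yellow", "lotion": "Yes"}, "Fine"),
]
```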

15 Exact Matching Method Construct a table recording observed cases. Use table lookup to classify new data.
Problem: for realistic problems, exact matching can't be used.
– 8 people and 54 possible feature vectors: only a 15% chance of finding an exact match.
– Another example: 10^6 examples, 12 features, 5 values per feature: 5^12 ≈ 2.4 x 10^8 possible feature vectors, so 10^6 / 5^12 ≈ 0.4% chance of an exact match.
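A quick check of the slide's arithmetic (assuming a uniformly random query vector and distinct stored examples):

```python
# Chance that a new feature vector exactly matches a stored example.
print(8 / (3 * 3 * 3 * 2))  # sunburn data: 8/54 ≈ 0.148, i.e. about 15%
print(1e6 / 5 ** 12)        # 10^6 examples, 12 five-valued features: ≈ 0.004, about 0.4%
```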

16 How can we do the classification?
– Nearest-neighbor method (but only if we can establish a distance between feature vectors).
– Use identification trees: an identification tree is a decision tree in which each set of possible conclusions is implicitly established by a list of samples of known class.
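The nearest-neighbor option needs a distance. For discrete feature vectors like these, one simple choice (our assumption, not from the slides) is the Hamming distance, the count of disagreeing features:

```python
# A sketch of nearest-neighbor classification over discrete features.
def hamming(a: dict, b: dict) -> int:
    """Number of features on which two vectors disagree."""
    return sum(a[k] != b[k] for k in a)

def nearest_neighbor_label(query: dict, samples) -> str:
    """Label the query with the class of the closest stored sample."""
    features, label = min(samples, key=lambda s: hamming(query, s[0]))
    return label
```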

17 An ID tree consistent with the data
Hair Color?
– Blond → Lotion Used? No → Sunburned (Sarah, Annie); Yes → Not Sunburned (Dana, Katie)
– Red → Sunburned (Emily)
– Brown → Not Sunburned (Alex, Pete, John)

18 Another consistent ID tree
Height?
– Short → Suit Color? Yellow → Not Sunburned (Katie); Red → Hair Color? Blond → Sunburned (Annie); Brown → Not Sunburned (Alex)
– Average → Suit Color? Yellow → Sunburned (Sarah); Blue → Hair Color? Red → Sunburned (Emily); Brown → Not Sunburned (John)
– Tall → Not Sunburned (Dana, Pete)

19 An idea Select the test that divides the people as well as possible into sets with homogeneous labels. For example:
Hair Color? Blond → Sarah, Annie, Dana, Katie; Red → Emily; Brown → Alex, Pete, John
Lotion used? No → Sarah, Annie, Emily, Pete, John; Yes → Dana, Alex, Katie
(Similarly for Height and Suit Color.)

20 Then among blonds...
Lotion used? No → Sarah, Annie; Yes → Dana, Katie. This is perfectly homogeneous.
Height? Short → Katie, Annie; Average → Sarah; Tall → Dana
Suit Color? Yellow → Sarah, Katie; Red → Dana, Annie; Blue → (none)

21 Combining these two together...
Hair Color?
– Blond → Lotion Used? No → Sunburned (Sarah, Annie); Yes → Not Sunburned (Dana, Katie)
– Red → Sunburned (Emily)
– Brown → Not Sunburned (Alex, Pete, John)

22 Decision Tree Learning Algorithm
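The slide's figure (the standard top-down decision-tree learning pseudocode from the reading) is not reproduced in this transcript. Below is a minimal Python sketch of that induction loop; the helper names are ours, and choose_feature is a placeholder for the disorder-minimizing test selection developed on the next slides.

```python
from collections import Counter

def learn_tree(samples, features, choose_feature):
    """samples: list of (feature_dict, label) pairs, as in SAMPLES above."""
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:                  # homogeneous set: leaf node
        return labels[0]
    if not features:                           # no tests left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose_feature(samples, features)   # pick the best remaining test
    branches = {}
    for value in {feats[best] for feats, _ in samples}:
        subset = [(f, lab) for f, lab in samples if f[best] == value]
        remaining = [f for f in features if f != best]
        branches[value] = learn_tree(subset, remaining, choose_feature)
    return (best, branches)                    # internal node: (test, branches)
```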

23 Problem: for practical problems, it is unlikely that any test will produce one completely homogeneous subset.
Solution: minimize a measure of inhomogeneity or disorder, available from information theory.

24 Information Let's say we have a question with n possible answers v_i, and that answer v_i occurs with probability P(v_i). Then the information content (entropy) of knowing the answer, measured in bits, is:
I(P(v_1), ..., P(v_n)) = -Σ_i P(v_i) log2 P(v_i)
One bit of information is enough to answer a yes-or-no question. E.g., consider flipping a fair coin: how much information do you have if you know which side comes up?
I(½, ½) = -(½ log2 ½ + ½ log2 ½) = 1 bit
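The entropy formula above, directly in code (a small sketch; the function name is ours):

```python
import math

def information(probs):
    """I(p1, ..., pn) = -sum_i pi * log2(pi), in bits (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(information([0.5, 0.5]))   # fair coin: 1.0 bit
```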

25 Information at a node In our decision tree, for a given feature (e.g. hair color) we have:
– b: number of branches (e.g. possible values for the feature)
– N_b: number of samples in branch b
– N_p: total number of samples in all branches
– N_bc: number of samples of class c in branch b
Using frequencies as an estimate of the probabilities, the information for a single branch b is simply
I_b = -Σ_c (N_bc / N_b) log2 (N_bc / N_b)
and the information for the whole test is the weighted average over branches:
Information = Σ_b (N_b / N_p) · I_b
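The two formulas above in code: per-branch information I_b, weighted by branch size N_b / N_p. The function takes one list of class labels per branch; the name is ours.

```python
import math
from collections import Counter

def branch_information(branches):
    """branches: per-branch lists of class labels; returns weighted bits."""
    n_p = sum(len(b) for b in branches)
    total = 0.0
    for labels in branches:
        n_b = len(labels)
        i_b = -sum((n / n_b) * math.log2(n / n_b)
                   for n in Counter(labels).values())   # single-branch I_b
        total += (n_b / n_p) * i_b                      # weight by N_b / N_p
    return total
```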

26 Example Consider a single branch (b = 1) which only contains members of two classes A and B.
– If half of the points belong to A and half belong to B: I = -(½ log2 ½ + ½ log2 ½) = 1 bit.
– What if all the points belong to A (or to B)? I = -(1 · log2 1) = 0 bits.
We like the latter situation: the branch is homogeneous, so less information is needed to make a decision (information gain is maximized).

27 What is the amount of information required for classification after we have used the hair test?
Hair Color? Blond → Sarah, Annie, Dana, Katie; Red → Emily; Brown → Alex, Pete, John
– Blond (2 sunburned, 2 fine): -(2/4 log2 2/4 + 2/4 log2 2/4) = 1
– Red (1 sunburned): -(1 · log2 1) = 0
– Brown (3 fine): -(3/3 log2 3/3) = 0
Information = 4/8 · 1 + 1/8 · 0 + 3/8 · 0 = 0.5

28 Selecting top level feature Using the 8 samples we have so far, we get:

Test        Information
Hair        0.5
Height      0.69
Suit Color  0.94
Lotion      0.61

Hair wins: it needs the least additional information for the rest of the classification. This is used to build the first level of the identification tree:
Hair Color? Blond → Sarah, Annie, Dana, Katie; Red → Emily; Brown → Alex, Pete, John
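The table can be reproduced by combining the SAMPLES list and the branch_information sketch from the earlier slides:

```python
def test_information(samples, feature):
    """Weighted information remaining after splitting on one feature."""
    by_value = {}
    for feats, label in samples:
        by_value.setdefault(feats[feature], []).append(label)
    return branch_information(list(by_value.values()))

for feature in ("hair", "height", "suit", "lotion"):
    print(feature, round(test_information(SAMPLES, feature), 2))
# hair 0.5, height 0.69, suit 0.94, lotion 0.61 -> hair wins
```

Plugged into the learn_tree sketch from slide 22, choose_feature = lambda s, fs: min(fs, key=lambda f: test_information(s, f)) selects tests exactly this way.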

29 Selecting second level feature Let's consider the remaining features for the blond branch (4 samples):

Test        Information
Height      0.5
Suit Color  1
Lotion      0

Lotion wins: least additional information.

30 Thus we get to the tree we had arrived at earlier:
Hair Color?
– Blond → Lotion Used? No → Sunburned (Sarah, Annie); Yes → Not Sunburned (Dana, Katie)
– Red → Sunburned (Emily)
– Brown → Not Sunburned (Alex, Pete, John)

31 Using the Identification tree as a classification procedure
Hair Color?
– Blond → Lotion Used? No → Sunburn; Yes → OK
– Red → Sunburn
– Brown → OK
Rules:
– If blond and uses lotion, then OK
– If blond and does not use lotion, then gets burned
– If red-haired, then gets burned
– If brown hair, then OK

32 Performance measurement How do we know that h ≈ f?
1. Use theorems of computational/statistical learning theory.
2. Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).
Learning curve = % correct on the test set as a function of training set size.
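A sketch of how such a learning curve could be measured, with learner and classify as placeholders (e.g. the decision-tree sketches above); the procedure and names here are ours:

```python
import random

def learning_curve(samples, learner, classify, trials=20, seed=0):
    """Print average accuracy on held-out data vs. training-set size."""
    rng = random.Random(seed)
    for n in range(2, len(samples)):
        correct = total = 0
        for _ in range(trials):
            shuffled = samples[:]
            rng.shuffle(shuffled)                      # random train/test split
            train, test = shuffled[:n], shuffled[n:]
            h = learner(train)                         # fit hypothesis h
            correct += sum(classify(h, x) == y for x, y in test)
            total += len(test)
        print(n, correct / total)
```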

