
Published by Robin Conant, modified over 4 years ago

1
**Introduction to Artificial Intelligence CS440/ECE448 Lecture 21**

Learning II

2
**Last lecture / This lecture**

Last lecture:
- The (in)efficiency of exact inference with Bayes nets
- The learning problem
- Decision trees

This lecture:
- Identification trees
- Neural networks: Perceptrons

Reading: Chapters 18 and 20

6
**Inductive learning method**

Construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples). E.g., curve fitting (figure not preserved).

9
**Inductive learning method**

Ockham’s razor: prefer the simplest consistent hypothesis, i.e., among all hypotheses that agree with f on every training example, choose the simplest one.
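A minimal Python sketch of curve fitting with Ockham’s razor, assuming the hypothesis space is polynomials ordered by degree (simplest first) and that a hypothesis counts as consistent when every training residual is within a small tolerance `tol`. The helper names (`solve_linear`, `fit_poly`, `simplest_consistent_degree`) and the tolerance are illustrative choices, not from the lecture; the least-squares fit uses the normal equations so no third-party library is needed.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        total = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = total / aug[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations A^T A c = A^T y."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
           for i in range(degree + 1)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(degree + 1)]
    return solve_linear(ata, aty)


def predict(coeffs, x):
    return sum(c * x ** p for p, c in enumerate(coeffs))


def simplest_consistent_degree(xs, ys, tol=1e-6):
    """Ockham's razor: return the lowest degree whose fit agrees with every example."""
    for degree in range(len(xs)):
        coeffs = fit_poly(xs, ys, degree)
        if all(abs(predict(coeffs, x) - y) <= tol for x, y in zip(xs, ys)):
            return degree, coeffs
    raise ValueError("no consistent polynomial found (duplicate x values?)")
```

For points lying on y = 2x + 1 the search stops at degree 1, even though higher-degree polynomials could also pass through every point; preferring the line is the razor at work.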

10
**Inductive Learning**

Given examples of some concepts and a description (features) of these concepts as training data, learn how to classify subsequent descriptions into one of the concepts.
- Concepts (classes)
- Features
- Training set
- Test set

Here, the function has discrete outputs (the classes).

11
**Decision Trees: “Should I play tennis today?”**

Outlook
- Sunny → Humidity
  - High → No
  - Normal → Yes
- Overcast → Yes
- Rain → Wind
  - Strong → No
  - Weak → Yes

Note: A decision tree can be expressed as a disjunction of conjunctions:

$$(\text{Outlook} = \text{Sunny} \land \text{Humidity} = \text{Normal}) \lor (\text{Outlook} = \text{Overcast}) \lor (\text{Outlook} = \text{Rain} \land \text{Wind} = \text{Weak})$$
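To make the "disjunction of conjunctions" reading concrete, here is a small Python sketch, assuming the tennis tree’s branch labels are Sunny/Overcast/Rain, High/Normal and Strong/Weak. The nested-tuple representation and the names `TENNIS_TREE`, `classify`, `paths_to` and `to_dnf` are hypothetical choices for illustration: an internal node is `(attribute, {value: subtree})` and a leaf is a class label.

```python
# Internal node: (attribute, {value: subtree}); leaf: a class-label string.
TENNIS_TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def classify(tree, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


def paths_to(tree, label, conditions=()):
    """One conjunction (tuple of (attribute, value) tests) per root-to-leaf path ending at `label`."""
    if isinstance(tree, str):
        return [conditions] if tree == label else []
    attribute, branches = tree
    result = []
    for value, subtree in branches.items():
        result.extend(paths_to(subtree, label, conditions + ((attribute, value),)))
    return result


def to_dnf(tree, label="Yes"):
    """Render the paths to `label` as a disjunction (OR) of conjunctions (AND)."""
    conjunctions = [" AND ".join(f"{a} = {v}" for a, v in path)
                    for path in paths_to(tree, label)]
    return " OR ".join(f"({c})" for c in conjunctions)


print(to_dnf(TENNIS_TREE))
```

Each path from the root to a "Yes" leaf becomes one conjunction, and the tree predicts "Yes" exactly when one of those conjunctions holds.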

12
**Learning Decision Trees**

Inductive learning: given a set of positive and negative training examples of a concept, can we learn a decision tree that can be used to appropriately classify other examples? Identification trees: ID3 [Quinlan, 1979].

13
**What on Earth causes people to get sunburns?**

I don’t know, so let’s go to the beach and collect some data.

14
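**Sunburn data**

| Name | Hair | Height | Swim Suit Color | Lotion | Result |
|---|---|---|---|---|---|
| Sarah | Blond | Average | Yellow | No | Sunburned |
| Dana | Blond | Tall | Red | Yes | Fine |
| Alex | Brown | Short | Red | Yes | Fine |
| Annie | Blond | Short | Red | No | Sunburned |
| Emily | Red | Average | Blue | No | Sunburned |
| Pete | Brown | Tall | Blue | No | Fine |
| John | Brown | Average | Blue | No | Fine |
| Katie | Blond | Short | Yellow | Yes | Fine |

There are 3 × 3 × 3 × 2 = 54 possible feature vectors.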

15
**Exact Matching Method**

Construct a table recording observed cases. Use table lookup to classify new data.

Problem: for realistic problems, exact matching can’t be used.
- 8 people and 54 possible feature vectors: 8/54 ≈ 15% chance of finding an exact match.
- Another example: $10^6$ examples, 12 features, 5 values per feature: $10^6 / 5^{12} \approx 0.4\%$.
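A sketch of the exact matching method on the sunburn table, in Python. The dictionary layout (feature tuple → result) and the name `exact_match_classify` are assumptions for illustration; the two `Fraction` values reproduce the coverage arithmetic on this slide (8/54 and $10^6/5^{12}$).

```python
from fractions import Fraction

# Observed cases: (hair, height, suit color, lotion) -> result.
SUNBURN = {
    ("Blond", "Average", "Yellow", "No"): "Sunburned",  # Sarah
    ("Blond", "Tall", "Red", "Yes"): "Fine",            # Dana
    ("Brown", "Short", "Red", "Yes"): "Fine",           # Alex
    ("Blond", "Short", "Red", "No"): "Sunburned",       # Annie
    ("Red", "Average", "Blue", "No"): "Sunburned",      # Emily
    ("Brown", "Tall", "Blue", "No"): "Fine",            # Pete
    ("Brown", "Average", "Blue", "No"): "Fine",         # John
    ("Blond", "Short", "Yellow", "Yes"): "Fine",        # Katie
}


def exact_match_classify(table, features):
    """Table lookup: the recorded class, or None if this exact vector was never observed."""
    return table.get(features)


# Fraction of the feature space covered by observed cases.
coverage_small = Fraction(len(SUNBURN), 3 * 3 * 3 * 2)  # 8 of 54 vectors
coverage_large = Fraction(10 ** 6, 5 ** 12)             # 10^6 examples, 12 features, 5 values each

print(exact_match_classify(SUNBURN, ("Red", "Tall", "Red", "Yes")))  # unseen vector
print(f"{float(coverage_small):.1%}  {float(coverage_large):.2%}")
```

The lookup returns `None` for any vector not already in the table, which is exactly why the method breaks down when coverage is this low.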

16
**How can we do the classification?**

- Nearest-neighbor method (but only if we can establish a distance between feature vectors).
- Identification trees: an identification tree is a decision tree in which each set of possible conclusions is implicitly established by a list of samples of known class.
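For the nearest-neighbor option, the sunburn features are categorical, so some distance has to be chosen. This Python sketch assumes the Hamming distance (the number of features on which two vectors differ); that choice, and the names `hamming` and `nearest_neighbor`, are illustrative assumptions rather than part of the lecture.

```python
# Sunburn samples: (name, (hair, height, suit color, lotion), result).
SAMPLES = [
    ("Sarah", ("Blond", "Average", "Yellow", "No"), "Sunburned"),
    ("Dana", ("Blond", "Tall", "Red", "Yes"), "Fine"),
    ("Alex", ("Brown", "Short", "Red", "Yes"), "Fine"),
    ("Annie", ("Blond", "Short", "Red", "No"), "Sunburned"),
    ("Emily", ("Red", "Average", "Blue", "No"), "Sunburned"),
    ("Pete", ("Brown", "Tall", "Blue", "No"), "Fine"),
    ("John", ("Brown", "Average", "Blue", "No"), "Fine"),
    ("Katie", ("Blond", "Short", "Yellow", "Yes"), "Fine"),
]


def hamming(a, b):
    """Assumed distance: count of features on which two vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(samples, query):
    """Return (name, result) of the closest sample; ties go to the earliest sample."""
    name, _, result = min(samples, key=lambda s: hamming(s[1], query))
    return name, result


print(nearest_neighbor(SAMPLES, ("Red", "Short", "Red", "No")))
```

Unlike exact matching, this always returns an answer, but the answer is only as meaningful as the distance; ties between equally distant samples are common with so few features.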

17
**An ID tree consistent with the data**

Hair Color
- Blond → Lotion Used
  - Yes → Dana, Katie (Not Sunburned)
  - No → Sarah, Annie (Sunburned)
- Red → Emily (Sunburned)
- Brown → Alex, Pete, John (Not Sunburned)

18
**Another consistent ID tree**

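Height
- Tall → Dana, Pete (Not Sunburned)
- Short → Hair Color
  - Brown → Alex (Not Sunburned)
  - Blond → Suit Color
    - Red → Annie (Sunburned)
    - Yellow → Katie (Not Sunburned)
- Average → Suit Color
  - Yellow → Sarah (Sunburned)
  - Blue → Hair Color
    - Red → Emily (Sunburned)
    - Brown → John (Not Sunburned)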

19
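**An idea: select tests that divide people, as well as possible, into sets with homogeneous labels**

- Hair Color: Blond → Sarah, Annie, Dana, Katie; Red → Emily; Brown → Alex, Pete, John
- Lotion used: No → Sarah, Annie, Emily, Pete, John; Yes → Dana, Alex, Katie
- Height: Short → Alex, Annie, Katie; Average → Sarah, Emily, John; Tall → Dana, Pete
- Suit Color: Yellow → Sarah, Katie; Red → Dana, Alex, Annie; Blue → Emily, Pete, John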

20
**Then among blonds...**

- Height: Short → Annie, Katie; Average → Sarah; Tall → Dana
- Lotion used: No → Sarah, Annie; Yes → Dana, Katie. This is perfectly homogeneous...
- Suit Color: Yellow → Sarah, Katie; Red → Dana, Annie; Blue → (none)

21
**Combining these two together …**

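Putting the hair-color test at the root and the lotion test under the blond branch yields the consistent ID tree shown earlier: blonds who used lotion are not sunburned, blonds who did not are sunburned, red-haired people are sunburned, and brown-haired people are not.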

22
**Decision Tree Learning Algorithm**

23
Problem: for practical problems, it is unlikely that any test will produce one completely homogeneous subset. Solution: minimize a measure of inhomogeneity, or disorder; such a measure is available from information theory.

24
**Information**

Let’s say we have a question which has n possible answers; call them $v_i$. If answer $v_i$ occurs with probability $P(v_i)$, then the information content (entropy), measured in bits, of knowing the answer is:

$$I(P(v_1), \ldots, P(v_n)) = -\sum_{i=1}^{n} P(v_i) \log_2 P(v_i)$$

One bit of information is enough to answer a yes-or-no question. E.g., consider flipping a fair coin: how much information do you have if you know which side comes up?

$$I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}$$
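The entropy formula translates directly into Python. This is a minimal sketch; the function name `information` is a hypothetical choice, and terms with $P(v_i) = 0$ are skipped, following the usual convention $0 \log_2 0 = 0$.

```python
import math


def information(*probabilities):
    """I(P(v1), ..., P(vn)) = -sum_i P(vi) * log2 P(vi), in bits (0 log2 0 taken as 0)."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(information(0.5, 0.5))  # fair coin
print(information(1.0))       # outcome already certain
print(information(*[0.25] * 4))  # four equally likely answers
```

A fair coin gives 1 bit, a certain outcome gives 0 bits, and four equally likely answers give 2 bits (two yes-or-no questions).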

25
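**Information at a node**

In our decision tree, for a given feature (e.g. hair color) we have:
- $b$: a branch (one per possible value of the feature)
- $N_b$: number of samples in branch $b$
- $N_p$: number of samples in all branches
- $N_{bc}$: number of samples of class $c$ in branch $b$

Using frequencies as estimates of the probabilities, we have

$$\text{Information} = \sum_{b} \frac{N_b}{N_p} \left( -\sum_{c} \frac{N_{bc}}{N_b} \log_2 \frac{N_{bc}}{N_b} \right)$$

For a single branch ($N_b = N_p$), the information is simply

$$\text{Information} = -\sum_{c} \frac{N_{bc}}{N_b} \log_2 \frac{N_{bc}}{N_b}$$

with the convention $0 \log_2 0 = 0$.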
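The node formula is a weighted average of per-branch entropies. A Python sketch, assuming a split is represented as a dictionary mapping each branch value to the list of class labels in that branch (the names `branch_information` and `node_information` are hypothetical). Classes absent from a branch simply never appear in the `Counter`, which implements the $0 \log_2 0 = 0$ convention.

```python
import math
from collections import Counter


def branch_information(labels):
    """-sum_c (N_bc / N_b) * log2(N_bc / N_b) for one branch's class labels."""
    n_b = len(labels)
    return sum(-(k / n_b) * math.log2(k / n_b) for k in Counter(labels).values())


def node_information(branches):
    """sum_b (N_b / N_p) * branch information; `branches` maps branch value -> class labels."""
    n_p = sum(len(labels) for labels in branches.values())
    return sum(len(labels) / n_p * branch_information(labels) for labels in branches.values())


# Hair-color split of the sunburn data (S = sunburned, F = fine).
hair_split = {"Blond": ["S", "F", "S", "F"], "Red": ["S"], "Brown": ["F", "F", "F"]}
print(node_information(hair_split))
```

For the hair split this evaluates $\tfrac{4}{8}\cdot 1 + \tfrac{1}{8}\cdot 0 + \tfrac{3}{8}\cdot 0 = 0.5$.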

26
Example: consider a single branch that only contains members of two classes, A and B. If half of the points belong to A and half belong to B:

$$-\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$$

What if all the points belong to A (or to B)?

$$-\left(1\log_2 1 + 0\log_2 0\right) = 0$$

We prefer the latter situation: the branch is homogeneous, so less information is needed to make a decision (maximize information gain).

27
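**Information after the hair test**

What is the amount of information required for classification after we have used the hair test?

- Blond (Sarah, Annie, Dana, Katie; 2 sunburned, 2 not): $-\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1$
- Red (Emily; sunburned): $-1\log_2 1 - 0\log_2 0 = 0$
- Brown (Alex, Pete, John; none sunburned): $-0\log_2 0 - \tfrac{3}{3}\log_2\tfrac{3}{3} = 0$

$$\text{Information} = \tfrac{4}{8}\cdot 1 + \tfrac{1}{8}\cdot 0 + \tfrac{3}{8}\cdot 0 = 0.5$$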

28
**Selecting top level feature**

Using the 8 samples we have so far, we get:

| Test | Information |
|---|---|
| Hair | 0.5 |
| Height | 0.69 |
| Suit Color | 0.94 |
| Lotion | 0.61 |

Hair wins: it leaves the least additional information needed for the rest of the classification. This is used to build the first level of the identification tree:

Hair Color
- Blond → Sarah, Annie, Dana, Katie
- Red → Emily
- Brown → Alex, Pete, John
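The table of information values can be reproduced in Python from the sunburn data. This sketch assumes each sample is stored as a dictionary keyed by column name; `information_after_test` and `best_test` are hypothetical helper names implementing the node-information formula and the "least additional information" rule.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Name", "Hair", "Height", "Suit Color", "Lotion", "Result")
ROWS = [
    ("Sarah", "Blond", "Average", "Yellow", "No", "Sunburned"),
    ("Dana", "Blond", "Tall", "Red", "Yes", "Fine"),
    ("Alex", "Brown", "Short", "Red", "Yes", "Fine"),
    ("Annie", "Blond", "Short", "Red", "No", "Sunburned"),
    ("Emily", "Red", "Average", "Blue", "No", "Sunburned"),
    ("Pete", "Brown", "Tall", "Blue", "No", "Fine"),
    ("John", "Brown", "Average", "Blue", "No", "Fine"),
    ("Katie", "Blond", "Short", "Yellow", "Yes", "Fine"),
]
SAMPLES = [dict(zip(COLUMNS, row)) for row in ROWS]
FEATURES = ["Hair", "Height", "Suit Color", "Lotion"]


def information_after_test(samples, feature, target="Result"):
    """Weighted average disorder of the branches produced by splitting on `feature`."""
    branches = defaultdict(list)
    for s in samples:
        branches[s[feature]].append(s[target])
    total = 0.0
    for labels in branches.values():
        n_b = len(labels)
        disorder = sum(-(k / n_b) * math.log2(k / n_b) for k in Counter(labels).values())
        total += n_b / len(samples) * disorder
    return total


def best_test(samples, features):
    """The feature that leaves the least additional information."""
    return min(features, key=lambda f: information_after_test(samples, f))


for f in FEATURES:
    print(f, round(information_after_test(SAMPLES, f), 2))
# Hair 0.5, Height 0.69, Suit Color 0.94, Lotion 0.61
print(best_test(SAMPLES, FEATURES))  # Hair
```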

29
**Selecting second level feature**

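Hair Color
- Blond → Sarah, Annie, Dana, Katie
- Red → Emily
- Brown → Alex, Pete, John

Let’s consider the remaining features for the blond branch (4 samples):

| Test | Information |
|---|---|
| Height | 0.5 |
| Suit Color | 1 |
| Lotion | 0 |

Lotion wins: it leaves the least additional information (none at all, since both of its branches are homogeneous).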
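The same computation restricted to the blond branch, as a self-contained Python sketch. It assumes each blond sample is stored as a tuple `(height, suit color, lotion, result)`, since Hair has already been used; the `REMAINING` mapping from feature name to tuple position is an illustrative assumption.

```python
import math
from collections import Counter, defaultdict

# Blond branch only: (height, suit color, lotion, result).
BLONDS = {
    "Sarah": ("Average", "Yellow", "No", "Sunburned"),
    "Dana": ("Tall", "Red", "Yes", "Fine"),
    "Annie": ("Short", "Red", "No", "Sunburned"),
    "Katie": ("Short", "Yellow", "Yes", "Fine"),
}
REMAINING = {"Height": 0, "Suit Color": 1, "Lotion": 2}  # feature -> tuple position


def information_after_test(rows, index):
    """Weighted branch disorder after splitting `rows` on the feature at `index`."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[index]].append(row[-1])  # last field is the class label
    n = len(rows)
    return sum(
        len(labels) / n * sum(-(k / len(labels)) * math.log2(k / len(labels))
                              for k in Counter(labels).values())
        for labels in branches.values()
    )


scores = {f: information_after_test(list(BLONDS.values()), i) for f, i in REMAINING.items()}
print(scores)                        # Height 0.5, Suit Color 1.0, Lotion 0.0
print(min(scores, key=scores.get))   # Lotion
```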

30
**Thus we get to the tree we had arrived at earlier**

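The whole greedy procedure from the preceding slides can be sketched as a recursive learner in Python: if a set of samples is homogeneous, make a leaf; otherwise split on the feature that leaves the least additional information and recurse on each branch. This is a simplified ID3-style sketch under stated assumptions: it only creates branches for feature values that actually occur, and falls back to the majority class when no features remain. Classic ID3 maximizes information gain, which is equivalent here because the information before the split is the same for every candidate test at a node. The names `id3` and `information_after_test` are hypothetical.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ("Name", "Hair", "Height", "Suit Color", "Lotion", "Result")
ROWS = [
    ("Sarah", "Blond", "Average", "Yellow", "No", "Sunburned"),
    ("Dana", "Blond", "Tall", "Red", "Yes", "Fine"),
    ("Alex", "Brown", "Short", "Red", "Yes", "Fine"),
    ("Annie", "Blond", "Short", "Red", "No", "Sunburned"),
    ("Emily", "Red", "Average", "Blue", "No", "Sunburned"),
    ("Pete", "Brown", "Tall", "Blue", "No", "Fine"),
    ("John", "Brown", "Average", "Blue", "No", "Fine"),
    ("Katie", "Blond", "Short", "Yellow", "Yes", "Fine"),
]
SAMPLES = [dict(zip(COLUMNS, row)) for row in ROWS]


def information_after_test(samples, feature, target):
    """Weighted average disorder of the branches produced by splitting on `feature`."""
    branches = defaultdict(list)
    for s in samples:
        branches[s[feature]].append(s[target])
    total = 0.0
    for labels in branches.values():
        n_b = len(labels)
        total += n_b / len(samples) * sum(
            -(k / n_b) * math.log2(k / n_b) for k in Counter(labels).values())
    return total


def id3(samples, features, target="Result"):
    """Return a leaf label or a node (feature, {value: subtree})."""
    labels = [s[target] for s in samples]
    if len(set(labels)) == 1:   # homogeneous: make a leaf
        return labels[0]
    if not features:            # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = min(features, key=lambda f: information_after_test(samples, f, target))
    rest = [f for f in features if f != best]
    branches = defaultdict(list)
    for s in samples:
        branches[s[best]].append(s)
    return (best, {value: id3(subset, rest, target) for value, subset in branches.items()})


TREE = id3(SAMPLES, ["Hair", "Height", "Suit Color", "Lotion"])
print(TREE)
```

On the eight sunburn samples this recovers the hair-color tree with the lotion test under the blond branch.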

31
**Using the Identification tree as a classification procedure**

Hair Color
- Blond → Lotion Used
  - Yes → OK
  - No → Sunburn
- Red → Sunburn
- Brown → OK

Rules:
- If blond and uses lotion, then OK.
- If blond and does not use lotion, then gets burned.
- If red-haired, then gets burned.
- If brown-haired, then OK.
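Read as a classification procedure, the rules become a small function. This Python sketch assumes lotion use is passed as a boolean; the name `sunburn_rule` and the error for an unseen hair color are illustrative choices, since the tree says nothing about hair colors outside the training data. The final check confirms the rules agree with all eight training samples.

```python
def sunburn_rule(hair, lotion_used):
    """Rules read off the identification tree: 'OK' or 'Sunburn'."""
    if hair == "Blond":
        return "OK" if lotion_used else "Sunburn"
    if hair == "Red":
        return "Sunburn"
    if hair == "Brown":
        return "OK"
    raise ValueError(f"no rule covers hair color {hair!r}")


# (name, hair, lotion used?, observed result) from the sunburn data table.
OBSERVED = [
    ("Sarah", "Blond", False, "Sunburned"),
    ("Dana", "Blond", True, "Fine"),
    ("Alex", "Brown", True, "Fine"),
    ("Annie", "Blond", False, "Sunburned"),
    ("Emily", "Red", False, "Sunburned"),
    ("Pete", "Brown", False, "Fine"),
    ("John", "Brown", False, "Fine"),
    ("Katie", "Blond", True, "Fine"),
]

# The rules are consistent with the training data if every prediction matches.
consistent = all((sunburn_rule(hair, lotion) == "Sunburn") == (result == "Sunburned")
                 for _, hair, lotion, result in OBSERVED)
print(consistent)
```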

32
**Performance measurement**

How do we know that h ≈ f?
- Use theorems of computational/statistical learning theory.
- Try h on a new test set of examples (drawn from the same distribution over the example space as the training set).

Learning curve = % correct on the test set as a function of training set size.
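A learning curve can be measured with a few lines of Python. This sketch assumes a learner is any function that takes a list of `(x, y)` training pairs and returns a hypothesis `h(x)`; the names `accuracy`, `learning_curve` and `learn_1nn`, the synthetic threshold concept, and the fixed random seed are all illustrative assumptions. Shuffling once and then splitting keeps the training and test examples drawn from the same distribution, as the slide requires.

```python
import random


def accuracy(h, test_set):
    """Fraction of test examples (x, y) on which hypothesis h agrees with f."""
    return sum(h(x) == y for x, y in test_set) / len(test_set)


def learning_curve(learn, examples, sizes, test_size, seed=0):
    """% correct on a held-out test set as a function of training set size."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    test_set, pool = shuffled[:test_size], shuffled[test_size:]
    return [(n, 100 * accuracy(learn(pool[:n]), test_set)) for n in sizes]


def learn_1nn(train):
    """Hypothetical learner: 1-nearest-neighbor on a single numeric feature."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]


data = [(x, x >= 50) for x in range(100)]  # synthetic concept: f(x) = (x >= 50)
for n, pct in learning_curve(learn_1nn, data, sizes=[1, 5, 20, 80], test_size=20):
    print(f"{n:3d} training examples: {pct:.0f}% correct")
```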
