1
Case-Based Reasoning Lecture 3: CBR Case-Base Indexing

2
Outline
- Indexing CBR case knowledge
- Why might we want an index?
- Decision tree indexes
- C4.5 algorithm
- Summary

3
Why might we want an index?
- Efficiency
  - Similarity matching is computationally expensive for large case-bases
  - Similarity matching can be computationally expensive for complex case representations
- Relevancy of cases for similarity matching
  - Some features of the new problem may make certain cases irrelevant despite their being very similar
- With an index, cases are pre-selected from the case-base, and similarity matching is applied only to that subset

4
What to index?
[Figure: an example client case, with some features marked as indexed and the rest as unindexed]
- Client Ref #: 64
- Client Name: John Smith
- Address: 39 Union Street
- Tel:
- Photo:
- Age: 37
- Occupation: IT Analyst
- Income: £ …
A case's features are either indexed or unindexed.

5
Indexed vs Unindexed Features
Indexed features:
- are used for retrieval
- are predictive of the case's solution
Unindexed features:
- are not used for retrieval
- are not predictive of the case's solution
- provide valuable contextual information and lessons learned
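As a concrete illustration, a case with this split might be represented as below. This is a minimal Python sketch; the class and field names (Case, indexed, unindexed) are chosen here for illustration, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Case:
    # Indexed features: used for retrieval, predictive of the solution.
    indexed: dict
    # Unindexed features: context and lessons learned, ignored at retrieval time.
    unindexed: dict

case = Case(
    indexed={"Age": 37, "Occupation": "IT Analyst"},
    unindexed={"Client Name": "John Smith", "Address": "39 Union Street"},
)
```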

6
Playing Tennis Example (case-base)

Outlook   Temperature   Humidity   Windy   Play
Sunny     Hot           High       False   No
Sunny     Hot           High       True    No
Cloudy    Hot           High       False   Yes
Rainy     Mild          High       False   Yes
Rainy     Cool          Normal     False   Yes
Rainy     Cool          Normal     True    No
Cloudy    Cool          Normal     True    Yes
Sunny     Mild          High       False   No
Sunny     Cool          Normal     False   Yes
Rainy     Mild          Normal     False   Yes
Sunny     Mild          Normal     True    Yes
Cloudy    Mild          High       True    Yes
Cloudy    Hot           Normal     False   Yes
Rainy     Mild          High       True    No
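To make the later sketches concrete, the same 14 cases can be written as plain Python data; CASES is a name chosen here for illustration.

```python
# The 14 tennis cases from the table above, used by the sketches that follow.
CASES = [
    {"Outlook": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Windy": True,  "Play": "No"},
    {"Outlook": "Cloudy", "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "No"},
    {"Outlook": "Cloudy", "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Sunny",  "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",  "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Sunny",  "Temperature": "Mild", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Cloudy", "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "Yes"},
    {"Outlook": "Cloudy", "Temperature": "Hot",  "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "No"},
]
```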

7
Decision Tree (Index) for Playing Tennis

outlook
- sunny  → humidity
    - high   → No
    - normal → Yes
- cloudy → Yes
- rainy  → windy
    - true  → No
    - false → Yes
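One minimal way to represent and walk such a tree, as a sketch: internal nodes are nested dicts mapping an attribute to its branches, and leaves are class labels. The dict layout and the name classify are illustrative choices, not a prescribed format.

```python
# The tree above as a nested dict: {attribute: {value: subtree-or-label}}.
TREE = {"Outlook": {
    "Sunny":  {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Cloudy": "Yes",
    "Rainy":  {"Windy": {True: "No", False: "Yes"}},
}}

def classify(tree, case):
    """Descend from the root, following the branch that matches the case."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[case[attribute]]
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```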

8
Choosing the Root Attribute
[Figure: four candidate one-level splits, one per attribute, each branch labelled with its Yes/No case counts: outlook (sunny / cloudy / rainy), temperature (hot / mild / cool), humidity (high / normal), windy (true / false)]
Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)

9
Building Decision Trees – C4.5 Algorithm
- Based on information theory (Shannon 1948)
- Divide and conquer strategy (sketched in code below):
  - choose an attribute for the root node
  - create a branch for each value of that attribute
  - split the cases according to the branches
  - repeat the process for each branch until all cases in the branch have the same class
- Assumption: the simplest tree which classifies the cases is best
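A minimal sketch of that divide-and-conquer loop in Python. It assumes a best_attribute helper (the information-gain selection defined on the coming slides); pruning and numeric attributes, which full C4.5 also handles, are omitted.

```python
from collections import Counter

def build_tree(cases, attributes, target="Play"):
    """Recursive divide and conquer in the style of C4.5/ID3 (sketch only)."""
    labels = [c[target] for c in cases]
    if len(set(labels)) == 1:       # all cases in this branch share a class: stop
        return labels[0]
    if not attributes:              # no attributes left: fall back to majority class
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(cases, attributes, target)   # highest information gain
    remaining = [x for x in attributes if x != a]
    # One branch per observed value of the chosen attribute.
    return {a: {v: build_tree([c for c in cases if c[a] == v], remaining, target)
                for v in {c[a] for c in cases}}}
```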

10
Entropy of a set of cases
Playing Tennis Example: S is the set of 14 cases (9 Yes cases, 5 No cases).
We want to classify the cases according to the values of Play, i.e. Yes and No in this example:
- the proportion of Yes cases is 9 out of 14: 9/14 = 0.64
- the proportion of No cases is 5 out of 14: 5/14 = 0.36
The Entropy measures the impurity of S:
Entropy(S) = -0.64 log2(0.64) - 0.36 log2(0.36) = -0.64(-0.644) - 0.36(-1.474) = 0.41 + 0.53 = 0.94

11
Entropy of a set of cases
- S is a set of cases
- A is a feature (Play in the example)
- {S_1, ..., S_i, ..., S_n} are the partitions of S according to the values of A (Yes and No in the example)
- {P_1, ..., P_i, ..., P_n} are the proportions of {S_1, ..., S_i, ..., S_n} in S, i.e. P_i = |S_i| / |S|

Entropy(S) = - Σ_i P_i log2(P_i)
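In code, this is a few lines; a minimal sketch, where the function name entropy is our own choice and CASES comes from the earlier sketch.

```python
from math import log2

def entropy(cases, target="Play"):
    """Entropy(S) = -sum_i P_i * log2(P_i), over the class proportions P_i."""
    labels = [c[target] for c in cases]
    proportions = [labels.count(v) / len(labels) for v in set(labels)]
    return -sum(p * log2(p) for p in proportions)

# For the full case-base: 9/14 Yes and 5/14 No, so the entropy is about 0.94.
# print(round(entropy(CASES), 2))   # 0.94
```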

12
Gain of an attribute
- Calculate Gain(S, A) for each attribute A: the expected reduction in entropy due to sorting on A
- Choose the attribute with the highest gain as the root of the tree (a code sketch follows below)

Gain(S, A) = Entropy(S) - Expectation(A), where Expectation(A) = Σ_i (|S_i| / |S|) × Entropy(S_i)

- {S_1, ..., S_i, ..., S_n} = partitions of S according to the values of attribute A
- n = number of values of attribute A
- |S_i| = number of cases in the partition S_i
- |S| = total number of cases in S
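A direct transcription of the formula, building on the entropy sketch above; gain and best_attribute are illustrative names.

```python
def gain(cases, attribute, target="Play"):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    expectation = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c for c in cases if c[attribute] == value]
        expectation += len(subset) / len(cases) * entropy(subset, target)
    return entropy(cases, target) - expectation

def best_attribute(cases, attributes, target="Play"):
    """The attribute with the highest gain becomes the (sub)tree root."""
    return max(attributes, key=lambda a: gain(cases, a, target))
```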

13
Which attribute is root?
If Outlook is made root of the tree, there are 3 partitions of the cases: S_1 for Sunny, S_2 for Cloudy, S_3 for Rainy.
S_1 (Sunny) = {cases 1, 2, 8, 9, 11}, so |S_1| = 5. In these 5 cases the values for Play are 3 No and 2 Yes.
Entropy(S_1) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
Similarly: Entropy(S_2) = 0, Entropy(S_3) = 0.97
(the case numbers refer to the rows of the case-base table on slide 6)

14
Choosing the Root Attribute
[Figure: four candidate one-level splits, one per attribute, each branch labelled with its Yes/No case counts: outlook (sunny / cloudy / rainy), temperature (hot / mild / cool), humidity (high / normal), windy (true / false)]
Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)

15
Which attribute is root?
Gain(S, Outlook) = Entropy(S) – Expectation(Outlook)
= 0.94 – [5/14 × 0.97 + 4/14 × 0 + 5/14 × 0.97] = 0.94 – 0.69 = 0.25
Similarly:
Gain(S, Temperature) = 0.03
Gain(S, Humidity) = 0.15
Gain(S, Windy) = 0.05
Gain(S, Outlook) is the highest gain, so Outlook should be the root of the decision tree (index).
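Combining the earlier entropy and gain sketches reproduces these numbers (CASES and gain as defined above).

```python
for a in ["Outlook", "Temperature", "Humidity", "Windy"]:
    print(a, round(gain(CASES, a), 2))
# Outlook 0.25, Temperature 0.03, Humidity 0.15, Windy 0.05
```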

16
Repeat for Sunny Node
[Figure: three candidate subtrees; in each, outlook is the root with cloudy → Yes, rainy still open, and the sunny branch split on one remaining attribute]
For the five sunny cases:
- temperature: hot → No, mild → mixed (No/Yes), cool → Yes
- windy: false → mixed (No/Yes), true → mixed (No/Yes)
- humidity: high → No, normal → Yes (both branches pure, so humidity is chosen; the sketch below confirms it)
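A quick check using the illustrative CASES and gain defined earlier.

```python
sunny = [c for c in CASES if c["Outlook"] == "Sunny"]
for a in ["Temperature", "Humidity", "Windy"]:
    print(a, round(gain(sunny, a), 2))
# Temperature 0.57, Humidity 0.97, Windy 0.02 -> humidity wins
```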

17
Repeat for Rainy Node
[Tree so far: outlook at the root; sunny → humidity (high → No, normal → Yes); cloudy → Yes; rainy → ?]
The five rainy cases are:

Temperature   Humidity   Windy   Play
Mild          High       False   Yes
Cool          Normal     False   Yes
Cool          Normal     True    No
Mild          Normal     False   Yes
Mild          High       True    No

Windy separates them perfectly: false → Yes, true → No.

18
Decision Tree (Index) for Playing Tennis

outlook
- sunny  → humidity
    - high   → No
    - normal → Yes
- cloudy → Yes
- rainy  → windy
    - true  → No
    - false → Yes

19
Case Retrieval via DTree Index
Typical implementation:
- the case-base is indexed using a decision tree
- cases are stored in the index leaves…
- the DTree is created from the cases: automated indexing of the case-base
(a retrieval sketch follows below)
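A hedged sketch of that flow, assuming leaves store the list of cases that fell into them and that a similarity(query, case) function is supplied by the caller; both names are illustrative.

```python
def leaf_cases(tree, query):
    """Descend the index using the query's indexed features; return the leaf's cases."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[query[attribute]]
    return tree   # the list of cases stored at this leaf

def retrieve(tree, query, similarity, k=3):
    """Pre-select via the index, then apply k-NN only within the leaf partition."""
    candidates = leaf_cases(tree, query)
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)[:k]
```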

20
Summary
- A decision tree is built from the cases
- Decision trees are often used for problem-solving directly; in CBR, the decision tree is used to partition the cases
- Similarity matching is applied to the cases in a leaf node
- Indexing pre-selects relevant cases for k-NN retrieval
BRING A CALCULATOR on MONDAY
