1
Case-Based Reasoning Lecture 3: CBR Case-Base Indexing

2
Outline
- Indexing CBR case knowledge
- Why might we want an index?
- Decision tree indexes
- C4.5 algorithm
- Summary

3
Why might we want an index?
- Efficiency
  - Similarity matching is computationally expensive for large case-bases
  - Similarity matching can be computationally expensive for complex case representations
- Relevancy of cases for similarity matching
  - Some features of the new problem may make certain cases irrelevant despite their being very similar
- With an index, cases are pre-selected from the case-base, and similarity matching is applied only to that subset

4
What to index?
[Figure: an example client case, with some features marked as indexed and the rest as unindexed]
- Client Ref #: 64
- Client Name: John Smith
- Address: 39 Union Street
- Tel:
- Photo:
- Age: 37
- Occupation: IT Analyst
- Income: £ …
A case's features are either indexed or unindexed.

5
Indexed vs Unindexed Features
Indexed features:
- are used for retrieval
- are predictive of the case's solution
Unindexed features:
- are not used for retrieval
- are not predictive of the case's solution
- provide valuable contextual information and lessons learned
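As a concrete illustration, a case with this split might be represented as below. This is a minimal Python sketch; the class and field names (Case, indexed, unindexed) are chosen here for illustration, not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Case:
    # Indexed features: used for retrieval, predictive of the solution.
    indexed: dict
    # Unindexed features: context and lessons learned, ignored at retrieval time.
    unindexed: dict

case = Case(
    indexed={"Age": 37, "Occupation": "IT Analyst"},
    unindexed={"Client Name": "John Smith", "Address": "39 Union Street"},
)
```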

6
Playing Tennis Example (case-base)

Outlook   Temperature   Humidity   Windy   Play
Sunny     Hot           High       False   No
Sunny     Hot           High       True    No
Cloudy    Hot           High       False   Yes
Rainy     Mild          High       False   Yes
Rainy     Cool          Normal     False   Yes
Rainy     Cool          Normal     True    No
Cloudy    Cool          Normal     True    Yes
Sunny     Mild          High       False   No
Sunny     Cool          Normal     False   Yes
Rainy     Mild          Normal     False   Yes
Sunny     Mild          Normal     True    Yes
Cloudy    Mild          High       True    Yes
Cloudy    Hot           Normal     False   Yes
Rainy     Mild          High       True    No
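To make the later sketches concrete, the same 14 cases can be written as plain Python data; CASES is a name chosen here for illustration.

```python
# The 14 tennis cases from the table above, used by the sketches that follow.
CASES = [
    {"Outlook": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",  "Temperature": "Hot",  "Humidity": "High",   "Windy": True,  "Play": "No"},
    {"Outlook": "Cloudy", "Temperature": "Hot",  "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "No"},
    {"Outlook": "Cloudy", "Temperature": "Cool", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Sunny",  "Temperature": "Mild", "Humidity": "High",   "Windy": False, "Play": "No"},
    {"Outlook": "Sunny",  "Temperature": "Cool", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Sunny",  "Temperature": "Mild", "Humidity": "Normal", "Windy": True,  "Play": "Yes"},
    {"Outlook": "Cloudy", "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "Yes"},
    {"Outlook": "Cloudy", "Temperature": "Hot",  "Humidity": "Normal", "Windy": False, "Play": "Yes"},
    {"Outlook": "Rainy",  "Temperature": "Mild", "Humidity": "High",   "Windy": True,  "Play": "No"},
]
```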

7
Decision Tree (Index) for Playing Tennis

outlook
- sunny  → humidity
    - high   → No
    - normal → Yes
- cloudy → Yes
- rainy  → windy
    - true  → No
    - false → Yes
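One minimal way to represent and walk such a tree, as a sketch: internal nodes are nested dicts mapping an attribute to its branches, and leaves are class labels. The dict layout and the name classify are illustrative choices, not a prescribed format.

```python
# The tree above as a nested dict: {attribute: {value: subtree-or-label}}.
TREE = {"Outlook": {
    "Sunny":  {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Cloudy": "Yes",
    "Rainy":  {"Windy": {True: "No", False: "Yes"}},
}}

def classify(tree, case):
    """Descend from the root, following the branch that matches the case."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[case[attribute]]
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```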

8
Choosing the Root Attribute
[Figure: four candidate one-level splits, one per attribute, each branch labelled with its Yes/No case counts: outlook (sunny / cloudy / rainy), temperature (hot / mild / cool), humidity (high / normal), windy (true / false)]
Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)

9
Building Decision Trees – C4.5 Algorithm
- Based on information theory (Shannon 1948)
- Divide and conquer strategy (sketched in code below):
  - choose an attribute for the root node
  - create a branch for each value of that attribute
  - split the cases according to the branches
  - repeat the process for each branch until all cases in the branch have the same class
- Assumption: the simplest tree which classifies the cases is best
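A minimal sketch of that divide-and-conquer loop in Python. It assumes a best_attribute helper (the information-gain selection defined on the coming slides); pruning and numeric attributes, which full C4.5 also handles, are omitted.

```python
from collections import Counter

def build_tree(cases, attributes, target="Play"):
    """Recursive divide and conquer in the style of C4.5/ID3 (sketch only)."""
    labels = [c[target] for c in cases]
    if len(set(labels)) == 1:       # all cases in this branch share a class: stop
        return labels[0]
    if not attributes:              # no attributes left: fall back to majority class
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(cases, attributes, target)   # highest information gain
    remaining = [x for x in attributes if x != a]
    # One branch per observed value of the chosen attribute.
    return {a: {v: build_tree([c for c in cases if c[a] == v], remaining, target)
                for v in {c[a] for c in cases}}}
```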

10
Entropy of a set of cases
Playing Tennis Example: S is the set of 14 cases (9 Yes cases, 5 No cases).
We want to classify the cases according to the values of Play, i.e. Yes and No in this example:
- the proportion of Yes cases is 9 out of 14: 9/14 = 0.64
- the proportion of No cases is 5 out of 14: 5/14 = 0.36
The Entropy measures the impurity of S:
Entropy(S) = -0.64 log2(0.64) - 0.36 log2(0.36) = -0.64(-0.644) - 0.36(-1.474) = 0.41 + 0.53 = 0.94

11
Entropy of a set of cases
- S is a set of cases
- A is a feature (Play in the example)
- {S_1, ..., S_i, ..., S_n} are the partitions of S according to the values of A (Yes and No in the example)
- {P_1, ..., P_i, ..., P_n} are the proportions of {S_1, ..., S_i, ..., S_n} in S, i.e. P_i = |S_i| / |S|

Entropy(S) = - Σ_i P_i log2(P_i)
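In code, this is a few lines; a minimal sketch, where the function name entropy is our own choice and CASES comes from the earlier sketch.

```python
from math import log2

def entropy(cases, target="Play"):
    """Entropy(S) = -sum_i P_i * log2(P_i), over the class proportions P_i."""
    labels = [c[target] for c in cases]
    proportions = [labels.count(v) / len(labels) for v in set(labels)]
    return -sum(p * log2(p) for p in proportions)

# For the full case-base: 9/14 Yes and 5/14 No, so the entropy is about 0.94.
# print(round(entropy(CASES), 2))   # 0.94
```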

12
Gain of an attribute
- Calculate Gain(S, A) for each attribute A: the expected reduction in entropy due to sorting on A
- Choose the attribute with the highest gain as the root of the tree (a code sketch follows below)

Gain(S, A) = Entropy(S) - Expectation(A), where Expectation(A) = Σ_i (|S_i| / |S|) × Entropy(S_i)

- {S_1, ..., S_i, ..., S_n} = partitions of S according to the values of attribute A
- n = number of values of attribute A
- |S_i| = number of cases in the partition S_i
- |S| = total number of cases in S
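A direct transcription of the formula, building on the entropy sketch above; gain and best_attribute are illustrative names.

```python
def gain(cases, attribute, target="Play"):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    expectation = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c for c in cases if c[attribute] == value]
        expectation += len(subset) / len(cases) * entropy(subset, target)
    return entropy(cases, target) - expectation

def best_attribute(cases, attributes, target="Play"):
    """The attribute with the highest gain becomes the (sub)tree root."""
    return max(attributes, key=lambda a: gain(cases, a, target))
```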

13
Which attribute is root?
If Outlook is made root of the tree, there are 3 partitions of the cases: S_1 for Sunny, S_2 for Cloudy, S_3 for Rainy.
S_1 (Sunny) = {cases 1, 2, 8, 9, 11}, so |S_1| = 5. In these 5 cases the values for Play are 3 No and 2 Yes.
Entropy(S_1) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
Similarly: Entropy(S_2) = 0, Entropy(S_3) = 0.97
(the case numbers refer to the rows of the case-base table on slide 6)

14
Choosing the Root Attribute
[Figure: four candidate one-level splits, one per attribute, each branch labelled with its Yes/No case counts: outlook (sunny / cloudy / rainy), temperature (hot / mild / cool), humidity (high / normal), windy (true / false)]
Which attribute is best for the root of the tree?
- the one that gives the best information gain
- in this case outlook (as we are going to see)

15
Which attribute is root?
Gain(S, Outlook) = Entropy(S) – Expectation(Outlook)
= 0.94 – [5/14 × 0.97 + 4/14 × 0 + 5/14 × 0.97] = 0.94 – 0.69 = 0.25
Similarly:
Gain(S, Temperature) = 0.03
Gain(S, Humidity) = 0.15
Gain(S, Windy) = 0.05
Gain(S, Outlook) is the highest gain, so Outlook should be the root of the decision tree (index).
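Combining the earlier entropy and gain sketches reproduces these numbers (CASES and gain as defined above).

```python
for a in ["Outlook", "Temperature", "Humidity", "Windy"]:
    print(a, round(gain(CASES, a), 2))
# Outlook 0.25, Temperature 0.03, Humidity 0.15, Windy 0.05
```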

16
Repeat for Sunny Node
[Figure: three candidate subtrees; in each, outlook is the root with cloudy → Yes, rainy still open, and the sunny branch split on one remaining attribute]
For the five sunny cases:
- temperature: hot → No, mild → mixed (No/Yes), cool → Yes
- windy: false → mixed (No/Yes), true → mixed (No/Yes)
- humidity: high → No, normal → Yes (both branches pure, so humidity is chosen; the sketch below confirms it)
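A quick check using the illustrative CASES and gain defined earlier.

```python
sunny = [c for c in CASES if c["Outlook"] == "Sunny"]
for a in ["Temperature", "Humidity", "Windy"]:
    print(a, round(gain(sunny, a), 2))
# Temperature 0.57, Humidity 0.97, Windy 0.02 -> humidity wins
```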

17
Repeat for Rainy Node
[Tree so far: outlook at the root; sunny → humidity (high → No, normal → Yes); cloudy → Yes; rainy → ?]
The five rainy cases are:

Temperature   Humidity   Windy   Play
Mild          High       False   Yes
Cool          Normal     False   Yes
Cool          Normal     True    No
Mild          Normal     False   Yes
Mild          High       True    No

Windy separates them perfectly: false → Yes, true → No.

18
Decision Tree (Index) for Playing Tennis

outlook
- sunny  → humidity
    - high   → No
    - normal → Yes
- cloudy → Yes
- rainy  → windy
    - true  → No
    - false → Yes

19
Case Retrieval via DTree Index
Typical implementation:
- the case-base is indexed using a decision tree
- cases are stored in the index leaves…
- the DTree is created from the cases: automated indexing of the case-base
(a retrieval sketch follows below)
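A hedged sketch of that flow, assuming leaves store the list of cases that fell into them and that a similarity(query, case) function is supplied by the caller; both names are illustrative.

```python
def leaf_cases(tree, query):
    """Descend the index using the query's indexed features; return the leaf's cases."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[query[attribute]]
    return tree   # the list of cases stored at this leaf

def retrieve(tree, query, similarity, k=3):
    """Pre-select via the index, then apply k-NN only within the leaf partition."""
    candidates = leaf_cases(tree, query)
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)[:k]
```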

20
Summary
- A decision tree is built from the cases
- Decision trees are often used for problem-solving directly; in CBR, the decision tree is used to partition the cases
- Similarity matching is applied to the cases in a leaf node
- Indexing pre-selects relevant cases for k-NN retrieval
BRING A CALCULATOR on MONDAY
