Source for Information Gain Formula: Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, Chapter 18: Learning from Observations

Similarity in CBR (Cont'd)
Sources: Chapter 4

Other Similarity Metrics
Suppose that we have cases represented as attribute-value pairs (e.g., the restaurant domain), and suppose initially that the values are binary. We want to define similarity between two cases of the form:
- X = (X_1, …, X_n) where each X_i = 0 or 1
- Y = (Y_1, …, Y_n) where each Y_i = 0 or 1

Preliminaries
Let:
- A = Σ_{i=1..n} X_i·Y_i (the number of attributes for which X_i = 1 and Y_i = 1)
- B = Σ_{i=1..n} X_i·(1 - Y_i) (the number of attributes for which X_i = 1 and Y_i = 0)
- C = Σ_{i=1..n} (1 - X_i)·Y_i (the number of attributes for which X_i = 0 and Y_i = 1)
- D = Σ_{i=1..n} (1 - X_i)·(1 - Y_i) (the number of attributes for which X_i = 0 and Y_i = 0)
Then A + B + C + D = n, A + D is the number of matching attributes, and B + C is the number of mismatching attributes.
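A minimal Python sketch (ours, not from the slides) of these four counts, assuming cases are given as equal-length lists of 0/1 values:

```python
def match_counts(x, y):
    """Return (A, B, C, D) for two equal-length binary attribute vectors."""
    assert len(x) == len(y)
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both 0
    return a, b, c, d
```

For example, match_counts([1, 0, 1, 1], [1, 1, 0, 1]) returns (2, 1, 1, 0), and indeed A + B + C + D = n = 4.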

Hamming Distance
H(X,Y) = n - Σ_{i=1..n} X_i·Y_i - Σ_{i=1..n} (1 - X_i)·(1 - Y_i)
Properties:
- Range of H: [0, n]
- H counts the mismatches between the attribute values
- H is a distance metric: H(X,X) = 0 and H(X,Y) = H(Y,X)
- H((1 - X_1, …, 1 - X_n), (1 - Y_1, …, 1 - Y_n)) = H((X_1, …, X_n), (Y_1, …, Y_n)), i.e., complementing every attribute leaves the distance unchanged
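Building on the match_counts sketch above, the Hamming distance is just the mismatch count B + C:

```python
def hamming_distance(x, y):
    """H(X,Y): number of attributes on which x and y disagree (equals n - A - D)."""
    _, b, c, _ = match_counts(x, y)
    return b + c
```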

Simple-Matching-Coefficient (SMC)
H(X,Y) = n - (A + D) = B + C (the number of mismatches)
Another distance-similarity compatible function is f(x) = 1 - x/max (where max is the maximum value for x). Applying f with max = n to H gives the SMC similarity, sim_H:
sim_H(X,Y) = 1 - (n - (A + D))/n = (A + D)/n = 1 - (B + C)/n (one minus the proportion of mismatches)
Homework (I): Show that f(x) is order inverting: if x < y then f(x) > f(y).
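Again as an illustrative sketch using match_counts, the SMC similarity is the proportion of matching attributes:

```python
def smc_similarity(x, y):
    """sim_H(X,Y) = (A + D) / n = 1 - (B + C) / n."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return (a + d) / n
```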

Simple-Matching-Coefficient (SMC) (II)
If we write sim_H(X,Y) = (A + D)/n = 1 - (B + C)/n = factor(A, B, C, D), then factor is:
- Monotonic:
  - If A ≤ A′ then factor(A, B, C, D) ≤ factor(A′, B, C, D)
  - If B ≤ B′ then factor(A, B′, C, D) ≤ factor(A, B, C, D)
  - If C ≤ C′ then factor(A, B, C′, D) ≤ factor(A, B, C, D)
  - If D ≤ D′ then factor(A, B, C, D) ≤ factor(A, B, C, D′)
- Symmetric: sim_H(X,Y) = sim_H(Y,X)

Variations of the SMC
The Hamming similarity assigns equal value to all matches (both 0 or both 1). There are situations in which you want to count a match on 1 differently from a match on 0.
- Thus, sim((1 - X_1, …, 1 - X_n), (1 - Y_1, …, 1 - Y_n)) = sim((X_1, …, X_n), (Y_1, …, Y_n)) may not hold
- Example: two patients' symptoms are similar if both have fever (X_i = 1 and Y_i = 1) but not necessarily similar if neither has fever (X_i = 0 and Y_i = 0)
- Specific attributes may be more important than other attributes. Example: in a manufacturing domain, some parts of the workpiece are more important than others

Variations of SMC (III)
We introduce a weight α with 0 < α < 1:
sim_H(X,Y) = (A + D)/n = (A + D)/(A + B + C + D)
sim_α(X,Y) = α(A + D) / (α(A + D) + (1 - α)(B + C))
- For which α is sim_α(X,Y) = sim_H(X,Y)? α = 0.5
- sim_α(X,Y) preserves the monotonic and symmetric conditions
Homework (II): Show that sim_α(X,Y) is monotonic.
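A sketch of the weighted variant; with alpha = 0.5 it reduces to sim_H:

```python
def alpha_similarity(x, y, alpha=0.5):
    """sim_alpha(X,Y) = alpha(A+D) / (alpha(A+D) + (1-alpha)(B+C)), 0 < alpha < 1.

    For nonempty vectors, matches + mismatches = n > 0, so the
    denominator is positive whenever 0 < alpha < 1.
    """
    a, b, c, d = match_counts(x, y)
    matches, mismatches = a + d, b + c
    return alpha * matches / (alpha * matches + (1 - alpha) * mismatches)
```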

The similarity depends only on A, B, C and D:
sim_α(X,Y) = α(A + D) / (α(A + D) + (1 - α)(B + C))
What is the role of α? What happens if α > 0.5? If α < 0.5?
- α = 0.5: sim_α coincides with sim_H
- If α > 0.5 we give more weight to the matching attributes than to the mismatching ones
- If α < 0.5 we give more weight to the mismatching attributes than to the matching ones

Discarding the 0-match
Thus, sim((1 - X_1, …, 1 - X_n), (1 - Y_1, …, 1 - Y_n)) = sim((X_1, …, X_n), (Y_1, …, Y_n)) may not hold. Only attributes that occur in both cases (i.e., X_i = 1 and Y_i = 1) will contribute to the similarity.
- Possible definition of the similarity: sim = A / (A + B + C)
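For binary attributes this ratio is the Jaccard coefficient. A sketch (returning 0.0 when both vectors are all zeros is our convention, not fixed by the slides):

```python
def jaccard_similarity(x, y):
    """sim = A / (A + B + C): attributes absent from both cases are ignored."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) > 0 else 0.0
```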

Specific Attributes May Be More Important Than Other Attributes
The significance of the attributes varies, so we use a weighted Hamming distance:
H_w(X,Y) = 1 - Σ_{i=1..n} w_i·X_i·Y_i - Σ_{i=1..n} w_i·(1 - X_i)·(1 - Y_i)
where (w_1, …, w_n) is a weight vector such that Σ_{i=1..n} w_i = 1.
Example: in process planning, some features are more important than others.
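A sketch of the weighted distance, using the fact that for binary values X_i·Y_i + (1 - X_i)·(1 - Y_i) equals 1 exactly when X_i = Y_i:

```python
def weighted_hamming(x, y, w):
    """H_w(X,Y) = 1 - (sum of w_i over the attributes where x and y agree)."""
    assert len(x) == len(y) == len(w) and abs(sum(w) - 1.0) < 1e-9
    agreement = sum(wi for xi, yi, wi in zip(x, y, w) if xi == yi)
    return 1.0 - agreement
```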

Homework (Part III): Attributes May Have Multiple Values
X = (X_1, …, X_n) where X_i ∈ T_i
Y = (Y_1, …, Y_n) where Y_i ∈ T_i
Each T_i is finite. Define a formula for the Hamming distance in this context.

Non-Monotonic Similarity
The monotonicity condition on similarity, formally, says that sim(A,B) ≤ sim(A′,B) always holds for any A and A′ such that A ⊆ A′ (viewing cases as sets of matching features). Informally: for any attribute-value vectors X, Y, and X′, if we obtain X′ by modifying X on the value of one attribute such that X′ and Y have the same value on that attribute, then sim(X,Y) ≤ sim(X′,Y).

Non-Monotonic Similarity (2)
sim_H(X,Y) = Σ_{i=1..n} eq(X_i, Y_i) / n, where eq(a,b) = 1 if a = b and 0 otherwise
- Is the Hamming similarity monotonic? Yes.
- Consider the XOR function:
  - (0,0) and (1,1) are in the same class (+)
  - (0,1) and (1,0) are in the same class (−)
  - Thus a class-respecting distance requires d((1,1),(1,0)) > d((1,1),(0,0)), but Hamming gives the opposite ordering: (1,1) differs from (1,0) in one attribute and from (0,0) in two
- Is this class-respecting similarity monotonic? No.

Non-Monotonic Similarity (3)
You may think: "Well, that was mathematics; how about the real world?" Suppose that we have two interconnected batteries B and B′ and three lamps X, Y and Z with the following properties:
- If X is on, B and B′ work
- If Y is on, B or B′ works
- If Z is on, B works

Situation | X   | Y   | Z   | B    | B′
1         | off | on  | on  | Ok   | Fail
2         | off | on  | off | Fail | Ok
3         | off | off | off | Fail | Fail

On the lamp readings, situation 1 differs from situation 2 in one lamp but from situation 3 in two. Yet in terms of the underlying battery states, 1 and 3 differ only on B while 1 and 2 differ on both. Thus sim(1,3) > sim(1,2). Non-monotonic!

Tversky Contrast Model
Defines a non-monotonic distance. It compares a situation S with a prototype P (i.e., a case), where S and P are sets of features, using the following sets:
- A = S ∩ P
- B = P − S
- C = S − P
(In the Venn diagram of S and P: A is their overlap, B is the part of P outside S, and C is the part of S outside P.)

Tversky Contrast Model (2)
Tversky-distance: T(P,S) = α·f(A) − β·f(B) − γ·f(C)
where f maps sets to [0, ∞); f, α, β, and γ are fixed and defined by the user.
Example:
- If f(A) = the number of elements in A, and α = β = γ = 1
- Then T counts the number of elements in common minus the differences
The Tversky-distance is not symmetric.
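A sketch with the feature sets represented as Python sets and f defaulting to set size, as in the example:

```python
def tversky(p, s, alpha=1.0, beta=1.0, gamma=1.0, f=len):
    """T(P,S) = alpha*f(P & S) - beta*f(P - S) - gamma*f(S - P)."""
    p, s = set(p), set(s)
    return alpha * f(p & s) - beta * f(p - s) - gamma * f(s - p)
```

The asymmetry shows up when β ≠ γ: with β = 1, γ = 0, tversky({'a', 'b'}, {'a'}) = 1 − 1 − 0 = 0, while tversky({'a'}, {'a', 'b'}) = 1 − 0 − 0 = 1.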

Local versus Global Similarity Metrics
In many situations we have similarity metrics between attributes of the same type (called local similarity metrics). Example: for a complex engine, we may have a similarity for the temperature of the engine. In such situations a reasonable approach to defining a global similarity sim_Φ(x,y) is to aggregate the local similarity metrics sim_i(x_i, y_i). A widely used practice is to require sim_Φ(x,y) to increase monotonically with each sim_i(x_i, y_i). What requirements should we place on sim_Φ(x,y) in terms of its use of the sim_i(x_i, y_i)?

Local versus Global Similarity Metrics (Formal Definitions)
- A local similarity metric on an attribute type T_i is a similarity metric sim_i : T_i × T_i → [0,1]
- A function Φ : [0,1]^n → [0,1] is an aggregation function if:
  - Φ(0, 0, …, 0) = 0
  - Φ is monotonic non-decreasing in every argument
- Given a collection of n local similarity metrics sim_1, …, sim_n for attributes taking values from T_1, …, T_n, a global similarity metric is a similarity metric sim : V × V → [0,1], with V ⊆ T_1 × … × T_n, such that there is an aggregation function Φ with:
  sim(X,Y) = sim_Φ(X,Y) = Φ(sim_1(X_1,Y_1), …, sim_n(X_n,Y_n))
- Example: Φ(x_1, x_2, …, x_n) = (x_1 + x_2 + … + x_n)/n
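A sketch of this aggregation scheme. The two local metrics at the bottom (a scaled absolute difference for temperatures, exact match for symbols) are illustrative choices of ours, not prescribed by the slides:

```python
def average_aggregation(values):
    """The example aggregation: Phi(x_1, ..., x_n) = (x_1 + ... + x_n) / n."""
    return sum(values) / len(values)

def global_similarity(x, y, local_sims, phi=average_aggregation):
    """sim_Phi(X,Y) = Phi(sim_1(X_1,Y_1), ..., sim_n(X_n,Y_n))."""
    return phi([sim_i(xi, yi) for sim_i, xi, yi in zip(local_sims, x, y)])

# Illustrative local metrics (assumed value ranges):
temp_sim = lambda a, b: 1.0 - abs(a - b) / 200.0   # temperatures in [0, 200]
label_sim = lambda a, b: 1.0 if a == b else 0.0    # exact match for symbols

sim = global_similarity((90, "diesel"), (110, "diesel"), [temp_sim, label_sim])
# (0.9 + 1.0) / 2 = 0.95
```

Note that the average satisfies both aggregation conditions: it is 0 when every local similarity is 0, and it is non-decreasing in each argument.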