
Chapter 6 - Basic Similarity Topics


1 Chapter 6 - Basic Similarity Topics
Case-based reasoning

2 Introduction
Common term in everyday language, where two objects are usually considered similar if they look or sound similar
Similarity is a core concept within CBR
From a CBR perspective: «Two problems are similar if they have similar solutions»
Not as clearly defined as the term equality
Accepted that similarity is subjective and requires approximate rather than exact reasoning
Similarity is a common term in everyday language. In a popular view, two objects are usually considered similar if they look or sound similar. In case-based reasoning (CBR), similarity is a core concept. From a CBR perspective, two problems are similar if they have similar solutions. But this definition doesn't actually tell us anything about what the similarity concept itself means. Similarity is not as clearly defined as the term equality, and the accepted view is that similarity is subjective and requires approximate rather than exact reasoning.

3 Similarity and case representation
Similarity measures are defined to compare objects (cases)
The measures operate on the case representation
Similarity is the essential function used for retrieval and the link between case representation and retrieval
Only consider attribute-value case representations and attribute-based similarity measures
Similarity measures are defined to compare objects, and they operate on the case representation. Similarity is the essential function used for retrieval, meaning it is the link between the case representation and case retrieval. There exists a variety of both case representations and similarity measures, but to keep things simple, this chapter only considers attribute-value representations and attribute-based similarity measures.

4 The mathematics of similarity
Two influencing factors:
Fuzzy sets offer a background for modeling inexact expressions. They do not deal with classical yes-or-no answers, but rather with ones that have a vague character
Metrics are used in mathematics whenever approximations (rather than exact solutions) are involved. This makes them suitable for modeling similarity
Similarity measures may inherit and benefit from properties of these two factors. Examples of such properties are symmetry, transitivity, etc.
Similarity measures are rooted in mathematics, and two of the most important mathematical influences on the similarity concept are fuzzy sets and metrics.
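For reference, the standard axioms below must hold for a distance function d on a space P to be a metric. How far a given similarity measure inherits them varies. Reflexivity (sim(x,x) = 1) and symmetry are commonly assumed. The triangle inequality, which is the closest metric counterpart to the transitivity mentioned on the slide, is not guaranteed to carry over to every similarity measure.

```latex
\begin{aligned}
& d(x,y) \ge 0 \quad\text{and}\quad d(x,y) = 0 \iff x = y && \text{(non-negativity, identity)}\\
& d(x,y) = d(y,x) && \text{(symmetry)}\\
& d(x,z) \le d(x,y) + d(y,z) && \text{(triangle inequality)}
\end{aligned}
\qquad \text{for all } x, y, z \in P
```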

5 Two mathematical models of similarity
Similarity as a relation:
Qualitative measure comparing different similarities
Example: two objects are more similar to each other than two other objects
R(x,y,z) ⇔ «x is at least as similar to y as x is to z»
Allows the definition of the nearest neighbour concept
The nearest neighbour of x is the y for which the R-relation above holds for all z
There are two mathematical ways to represent similarity: as a relation or as a function. The relational form is a qualitative type of measure. It just compares different similarities, for instance by saying that two objects are more similar to each other than two other objects.
Figure: example of k-NN where k=3
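The relational view can be sketched in Python without ever assigning numbers to similarities: the helpers below only ever ask R(x, y, z). This is a minimal sketch under two assumptions. First, R orders all candidates consistently (a total preorder). Second, `nearest_neighbour`, `k_nearest` and the example relation `closer_or_equal` are hypothetical names invented for illustration.

```python
from functools import cmp_to_key
from typing import Any, Callable, List, Sequence

# R(x, y, z) is True when "x is at least as similar to y as x is to z".
Relation = Callable[[Any, Any, Any], bool]


def nearest_neighbour(x: Any, candidates: Sequence[Any], R: Relation) -> Any:
    """Return the first y for which R(x, y, z) holds for all z in candidates."""
    for y in candidates:
        if all(R(x, y, z) for z in candidates):
            return y
    raise ValueError("R does not single out a nearest neighbour")


def k_nearest(x: Any, candidates: Sequence[Any], R: Relation, k: int) -> List[Any]:
    """Order candidates using only the qualitative relation R and keep the first k."""
    def compare(y: Any, z: Any) -> int:
        y_first, z_first = R(x, y, z), R(x, z, y)
        if y_first and not z_first:
            return -1  # y is strictly more similar to x than z is
        if z_first and not y_first:
            return 1   # z is strictly more similar to x than y is
        return 0       # equally similar; the stable sort keeps input order
    return sorted(candidates, key=cmp_to_key(compare))[:k]


# Hypothetical relation on numbers: a smaller absolute difference means more similar.
def closer_or_equal(x: float, y: float, z: float) -> bool:
    return abs(x - y) <= abs(x - z)


print(nearest_neighbour(5.0, [1.0, 4.0, 9.0], closer_or_equal))    # 4.0
print(k_nearest(5.0, [1.0, 4.0, 9.0, 6.0], closer_or_equal, k=3))  # [4.0, 6.0, 1.0]
```

With k=3 this mirrors the k-NN figure. No similarity value is ever computed; only comparisons are made.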

6 Two mathematical models of similarity
Similarity as a function:
Makes similarity quantitative by expressing how similar two objects are
Assigns a number/degree of similarity to pairs of objects
Def.: A similarity measure for a problem space P is a function sim: P × P → [0,1]
Example of similarity functions and how they may be compared
sim(x,y) ≥ sim(x,z) ⇔ «x is at least as similar to y as x is to z»
Similarity as a function is more quantitative, i.e. it puts a number on the similarity.
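To make the function view concrete, the sketch below defines one hypothetical similarity function on numbers in an assumed interval [lo, hi]. It returns values in [0,1] and derives the relation R from it exactly as the equivalence sim(x,y) ≥ sim(x,z) ⇔ R(x,y,z) states. The linear form 1 - |x - y| / (hi - lo) is an illustrative choice, not a measure taken from the book.

```python
from typing import Callable


def make_interval_sim(lo: float, hi: float) -> Callable[[float, float], float]:
    """Hypothetical sim for numbers in [lo, hi]: 1 - |x - y| / (hi - lo), clamped to [0, 1]."""
    span = hi - lo

    def sim(x: float, y: float) -> float:
        return max(0.0, min(1.0, 1.0 - abs(x - y) / span))

    return sim


sim = make_interval_sim(0.0, 100.0)


def R(x: float, y: float, z: float) -> bool:
    """The relation induced by sim: x is at least as similar to y as x is to z."""
    return sim(x, y) >= sim(x, z)


print(sim(20, 30))    # 0.9
print(sim(20, 60))    # 0.6
print(R(20, 30, 60))  # True
```

Going from a function to a relation is always possible this way. The reverse direction requires choosing numbers that respect the ordering.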

7 Distances
Proxy to similarities: both look at the same object from different points of view
In most situations we can freely choose between distances and similarities
It is possible to convert between similarities and distances. However, such a transformation does not necessarily conserve the exact numerical similarity/distance values
Another concept that is closely related to similarity measures, and also mentioned in the book, is distance measures. Distances can be seen as a proxy to similarities; they both look at the same object from different points of view.
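The remark that conversion does not conserve exact values can be illustrated with two common distance-to-similarity transformations. Both keep the ordering (a smaller distance still gives a higher similarity), but they assign different numbers to the same distance. The function names and the maximum distance `d_max` are assumptions made for this sketch.

```python
from typing import List


def sim_inverse(d: float) -> float:
    """Maps distances in [0, inf) onto similarities in (0, 1]."""
    return 1.0 / (1.0 + d)


def sim_linear(d: float, d_max: float) -> float:
    """Maps distances in [0, d_max] onto similarities in [0, 1]; needs a known d_max."""
    return 1.0 - d / d_max


distances: List[float] = [0.0, 1.0, 3.0]
print([sim_inverse(d) for d in distances])      # [1.0, 0.5, 0.25]
print([sim_linear(d, 4.0) for d in distances])  # [1.0, 0.75, 0.25]
```

A distance of 1.0 becomes 0.5 under one transformation and 0.75 under the other, while the ranking of the three distances is the same in both.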

8 Types of similarity measures
Counting similarities
Metric similarities
Transformation similarities
Structure-oriented similarities
Information-oriented similarities
Relevance-oriented similarities
Dynamic-oriented similarities
There is no universal similarity measure, but we can still distinguish different types of similarities. The book lists seven different types of measures, which you can see on the left-hand side. There is no sharp boundary between the types, and they may overlap. On the right-hand side is a table of some concrete examples to illustrate what elementary similarity measures may look like. They are just for exemplification, and I will not go more deeply into them.

9 Types of similarity measures
Measures similarity by counting certain occurrences in the representation
Count the number of family members for tax purposes
Example: Hamming measures
The first type of similarity measure is counting similarities, which operate by counting occurrences of certain attributes in the representation. For instance, one could count the number of family members for tax purposes. An example of a measure of this type is the Hamming measure, which calculates the similarity by counting the number of matching attributes between two objects.
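A Hamming-style counting similarity can be sketched as the share of attribute positions on which two cases agree. This is a minimal illustration assuming both cases list the same attributes in the same order. The normalisation to [0,1] is a choice made here, and the book's exact formulation may differ. `hamming_similarity` and the car attributes are hypothetical.

```python
from typing import Sequence


def hamming_similarity(x: Sequence, y: Sequence) -> float:
    """Share of attribute positions on which x and y have the same value."""
    if len(x) != len(y):
        raise ValueError("cases must have the same attributes")
    if not x:
        return 1.0  # two empty cases are treated as identical
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# Hypothetical cases described by (colour, body type, fuel)
print(hamming_similarity(("red", "sedan", "petrol"), ("red", "suv", "petrol")))  # 2 of 3 match
```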

10 Types of similarity measures
Applicable to attributes with numerical values
Arise as variations of Euclidean metrics
Typically distance functions that represent a travel view
Next we have metric similarities. They are applicable to attributes with numerical values and arise as variations of Euclidean metrics. Typically, metric similarities are distance functions and represent a travel view of the similarity between objects.
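One familiar family of "variations of Euclidean metrics" is the (optionally weighted) Minkowski distance. With p = 2 it is the Euclidean distance, and with p = 1 it is the city-block distance, which is literally the distance travelled along a grid. This is a sketch for numeric attribute vectors. The function name and the optional per-attribute weights are assumptions for illustration.

```python
from typing import Optional, Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0,
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|^p)^(1/p)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y)) ** (1.0 / p)


print(minkowski_distance((0, 0), (3, 4)))         # 5.0 (Euclidean)
print(minkowski_distance((0, 0), (3, 4), p=1.0))  # 7.0 (city-block)
```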

11 Types of similarity measures
The measure counts the number of operations required to transform one object into another
Example: Levenshtein distance. Uses insertion, deletion and modification as possible change actions and counts the number of changes required
The third type is transformation similarities. This measure counts the number of operations required to transform one object into another, e.g. how many bits must be changed to convert one object into the other. A measure that is often used for this is the Levenshtein distance. The possible change actions in the Levenshtein measure are insertion, deletion and modification, and the number of changes required to make the objects equal is counted.
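The Levenshtein distance can be computed with the standard dynamic-programming recurrence, keeping one row of the table at a time. It counts the minimum number of insertions, deletions and modifications (substitutions). The normalised similarity `1 - d / max(len)` at the end is one common convention, added here as an assumption rather than taken from the book.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and modifications turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # modification (or match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Hypothetical normalisation of the distance into [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


print(levenshtein("kitten", "sitting"))  # 3
```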

12 Types of similarity measures
The structure in which the knowledge is represented plays a role, e.g. object-oriented representation
Refers mainly to attributes that have symbolic attribute values from which the attribute-based structure is built
For structure-oriented similarities, the structure in which the knowledge is represented plays a role. This would often be the case for object-oriented representations. Structure-oriented similarity refers mainly to attributes that have symbolic attribute values.
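One way structure can enter a measure for symbolic values is a taxonomy, where values that share a deeper common ancestor count as more similar. The sketch below is a hypothetical illustration, not the book's definition. The tree, the names and the scoring rule are all assumptions. The rule used is depth of the deepest common ancestor divided by depth of the deeper value.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy of symbolic values, stored as child -> parent links.
PARENT: Dict[str, str] = {
    "car": "vehicle", "truck": "vehicle",
    "sedan": "car", "suv": "car",
    "pickup": "truck", "lorry": "truck",
}


def path_to_root(node: str) -> List[str]:
    """The node itself followed by all its ancestors up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node: str) -> int:
    return len(path_to_root(node)) - 1  # the root has depth 0


def taxonomy_similarity(a: str, b: str) -> float:
    """Depth of the deepest common ancestor relative to the deeper of the two values."""
    if a == b:
        return 1.0
    ancestors_b = set(path_to_root(b))
    common: Optional[str] = next((n for n in path_to_root(a) if n in ancestors_b), None)
    if common is None:
        return 0.0  # values from unrelated taxonomies
    return depth(common) / max(depth(a), depth(b))


print(taxonomy_similarity("sedan", "suv"))     # 0.5 (both are cars)
print(taxonomy_similarity("sedan", "pickup"))  # 0.0 (only "vehicle" in common)
```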

13 Types of similarity measures
Information and knowledge play an essential role
Often used for texts; objects are considered similar if they provide similar information to the user
Information-oriented similarities focus on the information and knowledge contained within the object. This type of similarity is often used for texts. Two objects are then considered similar if they provide similar information to the user.
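For texts, a simple stand-in for "provides similar information" is the cosine similarity of word-count vectors. This is a sketch that swaps in that widely used technique for illustration. It assumes naive whitespace tokenisation and ignores word meaning, so it captures shared vocabulary rather than shared information in any deeper sense. `cosine_text_similarity` is a hypothetical name.

```python
import math
from collections import Counter


def cosine_text_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(cosine_text_similarity("engine will not start", "engine does not start"))  # 0.75
```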

14 Types of similarity measures
Weight the importance of different aspects contributing to similarity
Not a type in itself; rather, it may be used in combination with the other types
Relevance-oriented similarities operate by putting weights on the attributes in a representation to reflect each attribute's importance or relevance for the overall similarity. This measure is not actually a type in itself, but rather an add-on to the other types.
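Because relevance weighting is an add-on rather than a type of its own, it can be shown by attaching weights to a counting measure. Here each matching attribute contributes its weight instead of 1. The weights in the example are invented for illustration, and `weighted_hamming_similarity` is a hypothetical name.

```python
from typing import Sequence


def weighted_hamming_similarity(x: Sequence, y: Sequence, weights: Sequence[float]) -> float:
    """Sum of the weights of matching attributes, divided by the total weight."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("cases and weights must cover the same attributes")
    total = sum(weights)
    return sum(w for a, b, w in zip(x, y, weights) if a == b) / total


# Hypothetical relevance: body type matters most, colour least.
query = ("red", "sedan", "petrol")
case = ("blue", "sedan", "petrol")
print(weighted_hamming_similarity(query, case, (1, 6, 3)))  # 0.9
print(weighted_hamming_similarity(query, case, (1, 1, 1)))  # equal weights: 2 of 3 match
```

With equal weights the measure falls back to the plain counting similarity. The weights only change how much each match counts.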

15 Types of similarity measures
Consider and compare dynamic processes
The last type of measure is dynamic-oriented similarities. They operate on and compare dynamic processes.

16 Local-global principle of similarity
Useful when dealing with complex structures
The principle: Each object is constructed from atomic parts by some construction process
Possible to compare the atomic parts by using local measures before comparing the more complex structure
Determine the influence each of the local parts should have on the global measure by assigning weights to each part
Difficult problem to determine the weights
Then we have the local-global principle. This principle was also presented at the seminar last week, and we will now apply it for similarity purposes. The principle says that each object is constructed from atomic parts by some construction process. This has turned out to be useful when dealing with complex structures. It makes it possible to first compare the atomic parts by using so-called local measures, before comparing the more complex structure. The local measures are then combined, with a weight for each part, to reflect the global view, and determining these weights is a difficult problem.
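A common way to realise the local-global principle for attribute-value cases is sketched below. Each attribute gets its own local measure, and the global similarity is the weighted average of the local values. The car attributes, value ranges, weights and function names are all hypothetical. A weighted average is one typical aggregation, but other aggregation functions are possible.

```python
from typing import Any, Callable, Dict

LocalSim = Callable[[Any, Any], float]


def local_numeric(lo: float, hi: float) -> LocalSim:
    """Local measure for a numeric attribute with an assumed range [lo, hi]."""
    span = hi - lo
    return lambda a, b: max(0.0, 1.0 - abs(a - b) / span)


def local_exact(a: Any, b: Any) -> float:
    """Local measure for a symbolic attribute: equal or not."""
    return 1.0 if a == b else 0.0


# Hypothetical car domain: one local measure and one weight per attribute.
LOCAL: Dict[str, LocalSim] = {
    "price": local_numeric(0, 50_000),
    "colour": local_exact,
    "seats": local_numeric(2, 9),
}
WEIGHTS: Dict[str, float] = {"price": 5, "colour": 2, "seats": 3}


def global_similarity(query: Dict[str, Any], case: Dict[str, Any]) -> float:
    """Weighted average of the local similarities (the global measure)."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[a] * LOCAL[a](query[a], case[a]) for a in LOCAL) / total


q = {"price": 20_000, "colour": "red", "seats": 5}
c = {"price": 25_000, "colour": "blue", "seats": 5}
print(round(global_similarity(q, c), 2))  # 0.75
```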

17 Virtual attributes
A problem with the local-global principle arises when there are dependencies between the attributes that influence similarity
Example: bank loans
Reliability for getting a loan depends on both income and spending
Assigning weights to each attribute independently makes little sense
Introduce additional attributes that reflect the dependencies explicitly
Such attributes are defined in terms of the given attributes and are called virtual attributes
Allows a simpler similarity measure
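The bank-loan example can be sketched by deriving a hypothetical virtual attribute, "surplus" (income minus spending), and comparing applicants on that attribute alone. The attribute names, the surplus definition and the normalising constant `max_surplus_diff` are assumptions for illustration. A real loan model would likely combine the virtual attribute with others.

```python
from typing import Dict


def add_virtual_attributes(case: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical virtual attribute defined from the given ones: surplus = income - spending."""
    return {**case, "surplus": case["income"] - case["spending"]}


def loan_similarity(query: Dict[str, float], case: Dict[str, float],
                    max_surplus_diff: float = 5000.0) -> float:
    """Compare two applicants on the virtual surplus attribute only."""
    q, c = add_virtual_attributes(query), add_virtual_attributes(case)
    return max(0.0, 1.0 - abs(q["surplus"] - c["surplus"]) / max_surplus_diff)


a = {"income": 5000, "spending": 4000}  # surplus 1000
b = {"income": 3000, "spending": 2000}  # surplus 1000
c = {"income": 5000, "spending": 2000}  # surplus 3000
print(loan_similarity(a, b))  # 1.0 although income and spending each differ
print(loan_similarity(a, c))  # 0.6
```

Applicants a and b differ by 2000 in both income and spending, so separate local measures would rate them as fairly dissimilar. On the virtual attribute they are identical, which matches the dependency described above.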

18 Which similarity measure should be used?
Some influencing factors for the choice are:
Case representation
Size of case base
Efficiency needed for retrieval
Number of values in the domain of the attributes
Useful guidelines:
Try to ensure compatibility between case representation and the similarity measure
If possible, apply the local-global principle for complex structures
This issue is important, but only briefly described in this chapter. It is described more deeply in chapter 11.

19 Summary
Similarity is the link between case representation and retrieval
There is no clear definition of the concept, and there exists a variety of different types of measures
Similarity measures are heavily influenced by mathematics. Two mathematical ways to represent similarity are as a function or as a relation
The local-global principle may also apply to similarity measures
What type of similarity measure should be used depends on the objects to be compared

20 Comments
Few comparisons; missing an overview of the differences between the different types of similarity measures
Mainly descriptive presentation, making it difficult to distinguish between the different measures
What are the implications of choosing one type of measure over another? In a later chapter?

