Evaluating semantic similarity using GML in Geographic Information Systems Fernando Ferri 1, Anna Formica 2, Patrizia Grifoni 1, and Maurizio Rafanelli.


1 Evaluating semantic similarity using GML in Geographic Information Systems. Fernando Ferri (1), Anna Formica (2), Patrizia Grifoni (1) and Maurizio Rafanelli (2). (1) IRPPS-CNR, via Nizza 128, Roma, Italy; (2) IASI-CNR, viale Manzoni 30, Roma, Italy

2 Summary: Motivation; Related works; Coding a Part-of Hierarchy using GML; Similarity evaluation; Conclusion

3 Motivation (1) In Geographic Information Systems (GISs), semantic similarity plays an important role, as it supports the identification of objects that are conceptually close but not identical. GML (Geography Markup Language) is emerging as the dominant standard for exchanging geographic data across the Internet. A semantic similarity model facilitates the comparison of entities and allows information retrieval and integration to handle semantically similar concepts. The goal of a similarity model is to obtain more flexible and better matches between the information the user expects and the information the system retrieves.

4 Motivation (2) Given the relevance of the Is-in relationship in the geographic context, we focus on GML elements organized according to Part-of (meronymic) hierarchies. The semantics essentially concerns parts which are similar to and inseparable from the whole.

5 Related works (1) The similarity of hierarchically related concepts has been widely investigated in the literature [Resnik] [Rodriguez, Egenhofer]. Among the various proposals, we follow the probabilistic approach of Lin, which is based on the notion of information content and overcomes the drawbacks of the traditional edge-counting approach.

6 Related Works (2) Resnik proposes algorithms that exploit taxonomic similarity to resolve syntactic and semantic ambiguities. Lin builds on Resnik's work and also takes into account the information content of the concepts being compared.

7 Coding a Part-of Hierarchy with GML (1) In the geographic domain, the real world can be represented as a set of features, and in GML a geographic feature is encoded by AbstractFeatureType. An important property of a feature is its geometry type: it is expressed in the reference coordinate system and describes the extent, position or relative location of the represented concept.

8 Coding a Part-of Hierarchy with GML (2) The geometric types defined in GML provide the framework for modelling geographical concepts. Using this framework it is possible to model, for example, the concepts composing a network of communication ways, such as roads, rivers, canals and other communication infrastructures.

9 Coding a Part-of Hierarchy with GML (3) [Figure: type hierarchy rooted at AbstractFeatureType, with the GML geometric types MultiLineStringType, MultiPolygonType, … and the derived types ComWayType, RoadType, RiverType, CanalType, NavSegmentType and NNavSegmentType.] This figure shows an example of a type hierarchy that introduces concepts concerning communication infrastructures, starting from the GML geometric types.

10 Coding a Part-of Hierarchy with GML (4) As mentioned in the motivation, given the relevance of the Is-in relationship in the geographic context, the paper focuses on GML elements organized according to Part-of (meronymic) hierarchies. For instance, in our example a Part-of relationship holds between communication ways (ComWay) and roads, rivers and canals.

11 Coding a Part-of Hierarchy with GML (5) In the literature, Part-of hierarchies are usually modelled in XML using "sequences of elements", and a similar approach could be followed in GML. [Figure: element tree with ComWay, River, Road, Canal, NavRiver, NNavRiver, NavCanal, NNavCanal, Country and Kind.] However, this approach does not make it possible to distinguish the elements of the Part-of hierarchy from other elements possibly defined outside it, such as Kind and Country.

12 Coding a Part-of Hierarchy with GML (6) To make meronymic relationships explicit within the GML element hierarchy, a Part-of hierarchy can be modelled by introducing special geographic types such as PartOfWayType, PartOfRivType and PartOfCanType. [Figure: element tree with ComWay, PartOfWay, River, Canal, Road, PartOfRiv, PartOfCan, NavRiver, NNavRiver, NavCanal, NNavCanal, Country and Kind.] Each special type models a Part-of relationship between a geographic concept and its component concepts.

13 Coding a Part-of Hierarchy with GML (7) [GML listing: …] This GML code shows how to make a meronymic relationship explicit within the GML element hierarchy by introducing a special geographic type such as PartOfWayType.
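As a rough illustration of how special PartOf types let a program separate Part-of elements from other elements, the following Python sketch parses a hypothetical XML Schema fragment and collects the components of ComWayType by following only elements whose type name begins with "PartOf". The schema layout, the absence of a target namespace, the "PartOf" naming convention used as a marker and the `part_of_children` helper are all assumptions for illustration, not the authors' GML listing.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical XML Schema fragment: type and element names follow the slides,
# but the layout (and the missing target namespace) are assumptions.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ComWayType">
    <xs:sequence>
      <xs:element name="PartOfWay" type="PartOfWayType"/>
      <xs:element name="Country" type="xs:string"/>
      <xs:element name="Kind" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PartOfWayType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="River" type="RiverType"/>
      <xs:element name="Canal" type="CanalType"/>
      <xs:element name="Road" type="RoadType"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>"""


def part_of_children(schema_text, whole_type, marker="PartOf"):
    """Return the names of the components of `whole_type`.

    Only child elements whose type name starts with `marker` are treated as
    Part-of links (a naming convention assumed here); other elements such as
    Country and Kind are ignored.
    """
    root = ET.fromstring(schema_text)
    types = {ct.get("name"): ct for ct in root.iter(XS + "complexType")}
    parts = []
    for el in types[whole_type].iter(XS + "element"):
        type_name = el.get("type", "")
        if type_name.startswith(marker) and type_name in types:
            parts.extend(p.get("name") for p in types[type_name].iter(XS + "element"))
    return parts


print(part_of_children(SCHEMA, "ComWayType"))  # ['River', 'Canal', 'Road']
```

With a plain "sequence of elements" encoding, Country and Kind would sit next to River, Canal and Road with nothing to tell them apart; here the dedicated PartOfWayType is what makes the distinction mechanical.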

14 Evaluating similarity (1) To evaluate concept similarity, this paper combines and revisits: the information content approach [Lin98]; a proposal inspired by the maximum weighted matching problem in bipartite graphs [FM02].

15 Evaluating similarity (2) The starting assumption is that associating probabilities with the Part-of taxonomy allows us to introduce the notion of a weighted element hierarchy. In particular, in our example the probabilities have been estimated in line with WordNet 2.0. For instance, the concepts Road and River are defined below with their frequencies (the numbers in parentheses): (95) Road – an open way (generally public) for travel and transportation; (55) River – a large natural stream of water (larger than a creek).

16 Evaluating similarity (3) The probability of a concept c is defined as: p(c) = freq(c)/N, where freq(c) is the frequency of the concept c in the taxonomy and N is the total number of concepts. In the example, probabilities have been assigned according to WordNet.

17 Evaluating similarity (4) Example: Weighted Concept Hierarchy

18 Evaluating similarity (5) Following the standard approach of information theory [Ross76], the information content of a concept c can be quantified as −log p(c); that is, as the probability increases, the informativeness decreases.
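A minimal Python sketch of the two definitions p(c) = freq(c)/N and −log p(c). Only the frequencies of Road (95) and River (55) come from the slides; the total N = 1000 is a placeholder assumption, and the natural logarithm is used (the base cancels out in the ics ratio, so it does not change similarity values).

```python
import math


def probability(freq, n_total):
    """p(c) = freq(c) / N."""
    return freq / n_total


def information_content(p):
    """Information content -log p(c) (natural log assumed)."""
    return -math.log(p)


# Road (95) and River (55) are the WordNet frequencies quoted in the slides;
# N = 1000 is a placeholder, not a value from the paper.
N = 1000
for concept, freq in [("Road", 95), ("River", 55)]:
    p = probability(freq, N)
    print(f"{concept}: p = {p:.3f}, -log p = {information_content(p):.3f}")
```

The rarer concept, River, gets the higher information content, in line with the remark that informativeness decreases as probability increases.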

19 Evaluating similarity (6) The information content similarity (ics) of two concepts such as River and Canal is defined as: ics(River, Canal) = 2 log p(ComWay) / (log p(River) + log p(Canal)) = 0.72, where ComWay is the concept carrying the maximum information content shared by River and Canal. According to Lin's approach, the more information two concepts share, the more similar they are.
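The ics formula can be sketched in Python as below. The probabilities are hypothetical, since the weighted hierarchy that yields 0.72 is only available as a figure, and the caller is assumed to supply the probability of the shared concept (ComWay in the example) rather than having it derived from a taxonomy.

```python
import math


def ics(p_c1, p_c2, p_shared):
    """Lin-style information content similarity.

    p_shared is the probability of the concept carrying the maximum
    information content shared by c1 and c2 (e.g. ComWay for River and
    Canal); here it is supplied by the caller. All probabilities must lie
    strictly between 0 and 1.
    """
    return 2 * math.log(p_shared) / (math.log(p_c1) + math.log(p_c2))


# Hypothetical probabilities, not the values behind the slides' 0.72.
p_river, p_canal, p_comway = 0.05, 0.03, 0.2
print(round(ics(p_river, p_canal, p_comway), 2))
```

A more specific shared concept (smaller p_shared, hence more information) raises the score, and a concept compared with itself scores exactly 1.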

20 Evaluating similarity (7) Structural similarity (asim) Inspired by the maximum weighted matching problem in bipartite graphs, we have to identify the set of pairs of typed attributes that maximizes the sum of the products of the information content similarity of the attributes and that of their types.

21 Evaluating similarity (8) Example
RiverType: label:string, length:integer, flow:integer, deepness:integer
CanalType: label:string, profundity:integer, capacity:integer, length:integer

22 Evaluating similarity (9) In the previous example, the set of pairs of attributes that maximizes the sum of the related information content similarities is: {(label, label), (length, length), (flow, capacity), (deepness, profundity)}

23 Evaluating similarity (10) In fact, by assuming that deepness and profundity are synonyms, we have: ics(label, label) = ics(length, length) = ics(deepness, profundity) = 1 and ics(flow, capacity) = 0.

24 Evaluating similarity (11) The similarity of the sets of attributes of two complexTypes (asim) is therefore defined as the above maximum sum divided by the larger of the cardinalities of the two attribute sets. In the case of RiverType and CanalType, which both have four attributes, we have: asim(RiverType, CanalType) = (1 + 1 + 1 + 0)/4 = 3/4 = 0.75
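The matching and normalisation steps can be sketched with a brute-force search over one-to-one attribute pairings, which is adequate for types with a handful of attributes (for larger types an assignment algorithm such as the Hungarian method would be the usual choice). The similarity functions `name_sim` and `type_sim` are assumptions: names score 1 when identical or listed as synonyms and 0 otherwise, and types score 1 only when identical. Under these assumptions the sketch reproduces the pairing and the value 0.75 given for RiverType and CanalType.

```python
from itertools import permutations


def asim(attrs1, attrs2, name_sim, type_sim):
    """Structural similarity sketch.

    Finds the one-to-one pairing of typed attributes (name, type) that
    maximises the sum of name_sim * type_sim, then divides that sum by the
    larger of the two attribute-set cardinalities. Brute force, so only
    suitable for small attribute sets; name_sim and type_sim are assumed
    to be symmetric.
    """
    flipped = len(attrs1) > len(attrs2)
    small, large = (attrs2, attrs1) if flipped else (attrs1, attrs2)
    best_score, best_pairs = float("-inf"), []
    # Each permutation assigns a distinct attribute of `large` to every
    # attribute of `small`.
    for perm in permutations(large, len(small)):
        pairs = list(zip(small, perm))
        score = sum(name_sim(a[0], b[0]) * type_sim(a[1], b[1]) for a, b in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    if flipped:
        # Report pairs as (attribute of attrs1, attribute of attrs2).
        best_pairs = [(b, a) for a, b in best_pairs]
    return best_score / max(len(attrs1), len(attrs2)), best_pairs


# Attribute lists from the RiverType / CanalType example.
river = [("label", "string"), ("length", "integer"),
         ("flow", "integer"), ("deepness", "integer")]
canal = [("label", "string"), ("profundity", "integer"),
         ("capacity", "integer"), ("length", "integer")]

# As in the slides, deepness and profundity are assumed to be synonyms.
SYNONYMS = {frozenset({"deepness", "profundity"})}


def name_sim(a, b):
    """Hypothetical ics for attribute names: 1 if identical or synonyms, else 0."""
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


def type_sim(t1, t2):
    """Hypothetical ics for attribute types: 1 if identical, else 0."""
    return 1.0 if t1 == t2 else 0.0


score, pairs = asim(river, canal, name_sim, type_sim)
print(score, [(a[0], b[0]) for a, b in pairs])
```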

25 Evaluating similarity (12) Concept Similarity (Gsim) The similarity (Gsim) of the concepts River and Canal is defined as: Gsim(River, Canal) = (ics(River, Canal)*w + asim(River, Canal)*(1-w)) * t(RiverType, CanalType), where: ics(River, Canal) is the information content similarity; asim(River, Canal) is the structural similarity; w is a weight such that 0 <= w <= 1; t is a Boolean function that, given two complexTypes, returns 0 if their least upper bound in the type hierarchy is AbstractFeatureType, and 1 otherwise.
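A sketch of Gsim in Python, including the Boolean function t computed as a least-upper-bound test. The parent links are assumptions loosely based on the type hierarchy figure (in particular, ComWayType is placed under MultiLineStringType), and the hierarchy is assumed to be a tree, so that the least upper bound is simply the lowest common ancestor.

```python
# Hypothetical single-inheritance hierarchy (child -> parent). The links are
# assumptions loosely based on the slides' type hierarchy figure.
PARENT = {
    "MultiLineStringType": "AbstractFeatureType",
    "MultiPolygonType": "AbstractFeatureType",
    "ComWayType": "MultiLineStringType",
    "RoadType": "ComWayType",
    "RiverType": "ComWayType",
    "CanalType": "ComWayType",
}


def ancestors(type_name):
    """Return type_name followed by all of its ancestors, up to the root."""
    chain = [type_name]
    while type_name in PARENT:
        type_name = PARENT[type_name]
        chain.append(type_name)
    return chain


def least_upper_bound(type1, type2):
    """Lowest common ancestor of two types (the hierarchy is assumed to be a tree)."""
    above_type2 = set(ancestors(type2))
    return next(a for a in ancestors(type1) if a in above_type2)


def t(type1, type2, root="AbstractFeatureType"):
    """Boolean factor: 0 if the least upper bound is the root feature type, else 1."""
    return 0 if least_upper_bound(type1, type2) == root else 1


def gsim(ics_value, asim_value, type1, type2, w=0.5):
    """Gsim = (ics * w + asim * (1 - w)) * t(type1, type2), with 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (ics_value * w + asim_value * (1 - w)) * t(type1, type2)


print(gsim(0.72, 0.75, "RiverType", "CanalType"))
print(gsim(0.72, 0.75, "RiverType", "MultiPolygonType"))
```

With ics = 0.72, asim = 0.75 and w = 0.5, `gsim` returns approximately 0.735 for RiverType and CanalType, whose least upper bound is ComWayType; pairing RiverType with MultiPolygonType gives 0, because their least upper bound is AbstractFeatureType.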

26 Evaluating similarity (13) In particular, if we assume w = 0.5: Gsim(River, Canal) = (ics(River, Canal)*w + asim(River, Canal)*(1-w)) * t(RiverType, CanalType) = 0.5 * (0.72 + 0.75) * 1 = 0.735 ≈ 0.74

27 Conclusion Thank you

