
1 Advanced Information Systems Laboratory (http://iaaa.cps.unizar.es), Department of Computer Science and Systems Engineering
1st Workshop of COST Action C21: "Ontologies for Urban Development: Interfacing Urban Information Systems"
Building an Address Gazetteer on top of an Urban Network Ontology
J. Nogueras-Iso, F.J. López, J. Lacasta, F.J. Zarazaga-Soria, P.R. Muro-Medrano
Geneva, 6-7 November 2006

2 Outline
  1. Introduction
  2. A typical use-case: IDEZar
  3. Ontology building using a manual mapping
  4. Ontology building using an automated approach
  5. Conclusions

3 1. Introduction
- The increasing relevance of geographic information (GI) for decision-making and resource management in diverse areas has promoted the creation of Spatial Data Infrastructures (SDI)
- SDI: a coordinated approach to the technology, policies, standards, and human resources necessary for the effective acquisition, management, distribution and utilization of GI at different organizational levels, involving both public and private institutions
- Gazetteer Service
  - A typical component of an SDI
  - A directory of instances of a class or classes of features containing some information regarding position
  - Looks up geographic feature locations based on geographic identifiers

4 Address Gazetteer Service
- In SDIs for local administrations such as a city council, address gazetteer services are among the most important services that councils must offer to their citizens
- An Address Gazetteer Service
  - Specializes in urban network features (addresses)
  - Councils are responsible for managing urban networks, and these networks serve as reference information for other national-level services such as cadaster or census services

5 Creation of the contents of a gazetteer
- It usually requires combining multiple repositories
  - The same feature (concept) is stored in different repositories, each contributing a different piece of attribute information
- Typical problems of heterogeneity
  - Different data models (roles, granularity) and encodings
- Our proposal to deal with heterogeneity in this context:
  - Build an urban network ontology upon existing feature type taxonomies

6 2. A typical use-case: IDEZar
- The IDEZar Project is the result of a collaboration agreement signed in March 2004 between the City Council and the University of Zaragoza
  - Zaragoza is a medium-sized city (some 650,000 inhabitants) in the northeast of Spain (capital of Aragón), growing fast in extent and population. The municipality covers about 1000 km² and includes several towns
- Objective: development of a local SDI for Zaragoza
  - To facilitate, increase and coordinate the use of spatial data by the Council
  - To develop applications for citizens and to give them access to public sector information

7 IDEZar Service Architecture (http://www.zaragoza.es/idezar/)
[Architecture diagram. Applications: GeoPortal, Street Map and Gazetteer. Services grouped by SDI:]
- IDEZar (Local SDI)
  - Catalog
  - Urban-Thematic: public services (libraries, police stations...), private services (pharmacies, parkings...)
  - Base: street maps
  - Environment-Thematic: Agenda 21, protected areas...
  - Street names
  - Arriving at Zaragoza
- IDEE (National SDI)
  - IDEE-Nomenclátor: toponyms
  - IDEE-Base: base map of Spain up to 1:25000
- IDEAr (Aragón, Regional SDI)
  - Base: orthoimages

8 Address related repositories
- Multiple repositories
- Not very different models
  - Feature = name + type + additional info (location, range, …)
- But different taxonomies for urban network feature types
- Not particularly well synchronized
[Diagram: data exchanges between offices and the feature type taxonomy each one uses. Zaragoza City Council: Informatics Office (AYTO), Statistics Office (TVIAN), Tax Office (SIGLA), Urban Planning Office (AYTO, SIGLA), IDEZar (AYTO). External organizations: National Statistics Institute (TVIAN), National Cadaster Office (SIGLA). Exchanged data: electoral census, inhabitant census, property census, addresses and address ranges, street names and street types, maps, amendments (streets, addresses), and site development, town planning and address updates.]

9 Address related repositories
- Statistics Office repository
  - Inhabitant and electoral census, exchanges from/to the National Statistics Institute
  - TVIAN (Tipo de Vía Normalizada): standardized network feature types of the National Statistics Institute

10 Address related repositories
- Cadaster Office repository
  - Land/tax management, exchanges from/to the National Cadaster Office
  - SIGLA: network feature types of the Cadaster Office

11 Address related repositories
- Informatics Office repository
  - Central repository used for the assignment of new street names
  - AYTO: network feature types of the council

12 Gazetteer content creation
- Why do we need to combine all 3 repositories?
  - Not all features are in all 3 repositories
  - Attribute information is distributed across the different repositories

13 Gazetteer content creation II
- Problems found while combining
  - Matching cannot be based solely on feature names
    - 2 features may differ in type but not in name (Spain square vs Spain avenue)
  - Which is the most appropriate feature type taxonomy for the gazetteer contents?
- Solution proposed: define an urban network ontology
  - An ontology explicitly defines the concepts in a domain and the relations between them
  - This ontology will provide a unified model of the feature types that can be found in this domain
  - It requires mappings to the particular taxonomies used in the different council offices or external organizations
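The name-matching problem can be illustrated with a minimal sketch. The sample records are invented for illustration, and the type equivalence table is an assumption standing in for the ontology mappings; only the codes PL, PZ and AV appear elsewhere in the slides.

```python
# Hypothetical sample records as (type code, name); not taken from the real databases.
council = [("PL", "España"), ("AV", "España")]    # AYTO-style codes
cadaster = [("PZ", "España"), ("AV", "España")]   # SIGLA-style codes

# Assumed equivalences between the two taxonomies (square: PL ~ PZ, avenue: AV ~ AV).
TYPE_EQUIVALENCE = {("PL", "PZ"), ("AV", "AV")}


def match_by_name(left, right):
    """Pair every left feature with every right feature that shares its name."""
    return [(l, r) for l in left for r in right if l[1] == r[1]]


def match_by_name_and_type(left, right):
    """Keep only the name matches whose type codes are considered equivalent."""
    return [(l, r) for l, r in match_by_name(left, right)
            if (l[0], r[0]) in TYPE_EQUIVALENCE]


name_only = match_by_name(council, cadaster)
name_and_type = match_by_name_and_type(council, cadaster)
print(len(name_only), len(name_and_type))
```

Matching on names alone pairs the square with the avenue; adding the feature type removes those spurious pairs, which is why a shared view of the type taxonomies is needed.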

14 How to build up the ontology
- The construction of ontologies upon existing vocabularies is a classical and widely used approach
- The underlying problem (ontology alignment)
  - How to find the relationships that hold between the entities represented in different taxonomies
- Two approaches for the ontology construction
  - Manual mapping approach
  - Automated approach
[Diagram: the TVIAN, AYTO and SIGLA taxonomies]

15 3. Manual Mapping approach
- Matching of terms (names + acronyms) between the different taxonomies (TVIAN, AYTO, SIGLA)
- Difficulties: lack of semantic descriptions
- Categories of matches
  - Exact match
  - Partial match: one concept is broader or narrower
  - No match
  - Provisional match: taxonomy errors (homonyms) imply erroneous matches
[Figure: example mappings between SIGLA (Cadaster) and AYTO (City Council) acronyms and concepts. Acronyms shown: "PZ", "PL", "CN", "CM", "CL", "CLP", "CLTP", "AN". Concepts shown: SQUARE, RESIDENTIAL DEVELOPMENT, MINOR ROAD, COUNTRY HOUSE (SOUTH OF SPAIN), STREET, PEDESTRIAN STREET, PEDESTRIAN STREET SEGMENT.]
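A manually built mapping table with the four match categories could be recorded as follows. This is only a sketch: the class and field names are hypothetical, and the three sample entries are limited to the relations stated in the Results slide (PL/PZ for square, and SIGLA CL as broader than AYTO CLP and AN).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchKind(Enum):
    EXACT = "exact"
    PARTIAL = "partial"          # one concept is broader or narrower
    NONE = "no match"
    PROVISIONAL = "provisional"  # suspected taxonomy error, e.g. a homonym


@dataclass(frozen=True)
class Mapping:
    ayto_code: Optional[str]     # None when the concept is missing in AYTO
    sigla_code: Optional[str]    # None when the concept is missing in SIGLA
    kind: MatchKind
    note: str = ""


MAPPINGS = [
    Mapping("PL", "PZ", MatchKind.EXACT, "square"),
    Mapping("CLP", "CL", MatchKind.PARTIAL, "AYTO concept is narrower than SIGLA street"),
    Mapping("AN", "CL", MatchKind.PARTIAL, "AYTO concept is narrower than SIGLA street"),
]


def by_kind(mappings, kind):
    """Return the mappings that fall in one match category."""
    return [m for m in mappings if m.kind is kind]


partial_codes = [m.ayto_code for m in by_kind(MAPPINGS, MatchKind.PARTIAL)]
print(partial_codes)
```

Keeping the category explicit makes it easy to list the provisional matches that still need review by a domain expert.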

16 A more flexible approach
- Previous approach
  - Too time-consuming and scales poorly
- Improvement
  - Use a well-established, shared common core ontology and map each distinct source to this common core
- New experiment: use of the URBISOC thesaurus
  - A thesaurus focused on Spanish terminology for town planning
  - Developed by the CINDOC/CSIC institute (Centre for Scientific Information and Documentation / Spanish National Research Council)
[Diagram: TVIAN, AYTO and SIGLA mapped to URBISOC]

17 A more flexible approach II
- Use of the Towntology ontology editor
  - Focused on ontology construction
  - Stores concepts with several definitions that are in a process of selection and characterization
- Although it improves scalability, this approach is still time-consuming and error-prone

18 4. Ontology building using an automated approach
- Why?
  - Manual mappings are time-consuming
  - Some mappings may fail because content creators have not assigned the correct feature type
- Technique proposed
  - Formal Concept Analysis (1980, Wille & Ganter …)
  - It enables the extraction of a hierarchy of concepts from the feature instances contained in the source repositories
[Diagram: ontology generated from TVIAN, AYTO and SIGLA]

19 Basics of FCA
- Definition of formal contexts: a triple $(G, M, I)$
  - $G$: objects
  - $M$: attributes
  - $I$: binary relation between $G$ and $M$ (incidence matrix)
- It is possible to extract formal concepts
  - Given $A \subseteq G$ and $B \subseteq M$, a pair $(A, B)$ is a formal concept if and only if
    - the set of all attributes shared by the objects in $A$ is identical to $B$
    - $A$ is also the set of all objects that have in common the attributes in $B$
- Additionally, it is possible to establish a subconcept-superconcept relation
  - $(A_1, B_1) \leq (A_2, B_2) \Leftrightarrow A_1 \subseteq A_2 \ (\Leftrightarrow B_2 \subseteq B_1)$
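These definitions can be checked on a toy formal context. The sketch below assumes the context is stored as a dictionary from object to attribute set; the objects f1 to f3 and the function names are hypothetical, and the attribute names merely follow the AYTO_/SIGLA_ naming used in the lattice slide.

```python
# Toy formal context (G, M, I): each object in G maps to the attributes of M it has.
context = {
    "f1": {"AYTO_CL", "SIGLA_CL"},
    "f2": {"AYTO_CLP", "SIGLA_CL"},
    "f3": {"AYTO_PL", "SIGLA_PZ"},
}


def common_attributes(objects, ctx):
    """All attributes shared by every object in `objects`."""
    shared = set().union(*ctx.values())
    for g in objects:
        shared &= ctx[g]
    return shared


def common_objects(attributes, ctx):
    """All objects that have every attribute in `attributes`."""
    return {g for g, attrs in ctx.items() if attributes <= attrs}


def is_formal_concept(A, B, ctx):
    """(A, B) is a formal concept iff A and B determine each other."""
    return common_attributes(A, ctx) == B and common_objects(B, ctx) == A


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) iff A1 is a subset of A2."""
    return c1[0] <= c2[0]


street = ({"f1", "f2"}, {"SIGLA_CL"})
pedestrianized = ({"f2"}, {"AYTO_CLP", "SIGLA_CL"})
print(is_formal_concept(*street, context))
print(is_formal_concept({"f1"}, {"SIGLA_CL"}, context))
print(is_subconcept(pedestrianized, street))
```

The pair ({f1}, {SIGLA_CL}) is rejected because f1 also carries AYTO_CL, so the attribute set is not the full set shared by the objects.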

20 Applying FCA
- How to obtain a single repository of instances, i.e. the formal context required by FCA?
- Traditional datalinking has been applied to the feature instances contained in the different databases
  - Based on the analysis of the lexical and spatial similarities of feature attributes
- Transform the datalinking matrix into the incidence matrix
  - Each checked cell (a match of source features) generates an object/instance in the incidence matrix
  - The columns result from transforming the urban network feature type codes (e.g., AYTO CODE, SIGLA CODE) into attributes with boolean values
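The transformation from matched feature pairs to a boolean incidence matrix might look like the following sketch. The feature identifiers, the sample links and the function name are assumptions for illustration; the datalinking step itself (lexical and spatial similarity) is not reproduced here.

```python
# Hypothetical inputs: feature id -> feature type code in each source database.
ayto_features = {"a1": "CL", "a2": "CLP", "a3": "PL"}
sigla_features = {"s1": "CL", "s2": "CL", "s3": "PZ"}

# Checked cells of the datalinking matrix: pairs of features judged to be the same.
links = [("a1", "s1"), ("a2", "s2"), ("a3", "s3")]


def build_incidence(links, ayto, sigla):
    """Turn each matched pair into one object with boolean type-code attributes."""
    columns = sorted({f"AYTO_{code}" for code in ayto.values()}
                     | {f"SIGLA_{code}" for code in sigla.values()})
    rows = {}
    for a_id, s_id in links:
        present = {f"AYTO_{ayto[a_id]}", f"SIGLA_{sigla[s_id]}"}
        rows[(a_id, s_id)] = [col in present for col in columns]
    return columns, rows


columns, rows = build_incidence(links, ayto_features, sigla_features)
print(columns)
for obj, row in rows.items():
    print(obj, row)
```

Each row has exactly two true cells here, one per source taxonomy, because every object originates from one matched pair.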

21 Incidence matrix
[Figure: the datalinking matrix is converted into the incidence matrix by replacing features by their type codes. Figure labels: 2718 features, 18 AYTO codes; 4318 features, 35 SIGLA codes.]

22 Applying FCA
- Obtain the concept lattice
  - NEXT CLOSED SET algorithm (Ganter 87)
[Figure: FCA turns the incidence matrix into a concept lattice (only attributes shown), from the supremum (least common superconcept) down to the infimum (greatest common subconcept). Concepts labelled in the figure:]
  - AYTO_PL, SIGLA_PZ (square)
  - SIGLA_AV (avenue)
  - AYTO_AV, SIGLA_AV (traffic allowed avenue)
  - SIGLA_AV, AYTO_AVP (pedestrian avenue)
  - SIGLA_CL (street)
  - SIGLA_CL, AYTO_CL (traffic allowed street)
  - SIGLA_CL, AYTO_CLP (pedestrianized street)
  - SIGLA_CL, AYTO_AN (carfree designed street)
  - …
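For readers unfamiliar with the next closed set procedure, here is a compact sketch that enumerates all concept intents in lectic order. It is a textbook-style formulation run on a hypothetical four-object context, not the implementation used in the experiment; the objects f1 to f4 and all function names are assumptions.

```python
# Toy formal context (hypothetical instances f1..f4): object -> set of attributes.
CONTEXT = {
    "f1": {"AYTO_CL", "SIGLA_CL"},
    "f2": {"AYTO_CLP", "SIGLA_CL"},
    "f3": {"AYTO_AN", "SIGLA_CL"},
    "f4": {"AYTO_PL", "SIGLA_PZ"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))  # fixed linear order on M
POSITION = {m: i for i, m in enumerate(ATTRIBUTES)}


def closure(attrs):
    """Attributes shared by all objects that have every attribute in attrs."""
    closed = set(ATTRIBUTES)
    for object_attrs in CONTEXT.values():
        if attrs <= object_attrs:
            closed &= object_attrs
    return frozenset(closed)


def next_closed_set(current):
    """Return the closed set that follows `current` in lectic order, or None."""
    kept = set(current)
    for m in reversed(ATTRIBUTES):
        if m in kept:
            kept.discard(m)
            continue
        candidate = closure(kept | {m})
        # Accept only if the closure added no attribute smaller than m.
        if all(POSITION[x] >= POSITION[m] for x in candidate - kept):
            return candidate
    return None


def all_intents():
    """Enumerate every concept intent, starting from the closure of the empty set."""
    found = []
    current = closure(frozenset())
    while current is not None:
        found.append(current)
        current = next_closed_set(current)
    return found


intents = all_intents()
for intent in intents:
    print(sorted(intent))
```

On this toy context AYTO_PL and SIGLA_PZ only ever appear together in an intent, while SIGLA_CL forms an intent of its own that is contained in the intents of the three AYTO street types.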

23 Results
- Experiment: combining the COUNCIL_FEATURE and CADASTER_FEATURE databases
- A concept lattice of 36 concepts from the original 53 concepts
- Identification of equivalent concepts in both taxonomies
  - E.g., square (PL in AYTO and PZ in SIGLA)
- And also subconcept-superconcept relations
  - E.g., identification of street as a broader concept in SIGLA (CL), which has narrower concepts in AYTO
    - traffic-allowed streets (CL)
    - pedestrianized streets (CLP)
    - carfree-designed streets (AN)
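Relations of this kind can be read off the data by comparing which instances carry each type code. The sketch below is a simplification that compares attribute extents directly instead of walking the full lattice, and it runs on a hypothetical four-object context; it is not the procedure reported in the slides.

```python
# Toy formal context (hypothetical instances f1..f4): object -> set of attributes.
context = {
    "f1": {"AYTO_CL", "SIGLA_CL"},
    "f2": {"AYTO_CLP", "SIGLA_CL"},
    "f3": {"AYTO_AN", "SIGLA_CL"},
    "f4": {"AYTO_PL", "SIGLA_PZ"},
}


def attribute_extents(ctx):
    """Map each attribute to the set of objects that carry it."""
    extents = {}
    for obj, attrs in ctx.items():
        for attr in attrs:
            extents.setdefault(attr, set()).add(obj)
    return extents


def relations(ctx):
    """Return (equivalent pairs, (broader, narrower) pairs) between attributes."""
    extents = attribute_extents(ctx)
    names = sorted(extents)
    equivalent, broader = [], []
    for a in names:
        for b in names:
            if a < b and extents[a] == extents[b]:
                equivalent.append((a, b))      # same instances: equivalent codes
            elif a != b and extents[a] > extents[b]:
                broader.append((a, b))         # strict superset: a is broader than b
    return equivalent, broader


equivalent, broader = relations(context)
print(equivalent)
print(broader)
```

On this toy data the square codes come out as equivalent and SIGLA_CL as broader than each of the three AYTO street codes, mirroring the kind of result listed above.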

24 5. Conclusions
- The FCA approach seems to be more flexible
  - Dynamic building of the ontology (at least, a draft)
  - We don't need to define the concepts; we just need to observe the data that exists
- We have created a domain-specific ontology that facilitates the interoperability (synchronization, update and merge) of the separate repositories
- Future lines
  - Improve the efficiency of the method
  - Enrich the generated concepts with commonalities found in other feature attributes of the instances (e.g., geometry, perimeter, area)
  - Apply to other domains
    - Hydrology: NMA vs Water Agency repositories

25 Advanced Information Systems Laboratory (http://iaaa.cps.unizar.es)

