ISDSI 2009 Francesco Guerra– Università di Modena e Reggio Emilia, DB unimo

Searching for data and services

F. Guerra 1, A. Maurino 2, M. Palmonari 2, G. Pasi 2, A. Sala 3
1 DEA - Università di Modena e Reggio Emilia, Modena, Italy
2 DISCO - Università di Milano Bicocca, v.le Sarca 336, Milano, Italy
3 DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy

1st International Workshop on Interoperability through Semantic Data and Service Integration
25 June 2009, Camogli, Italy
Outline
1. Motivation
2. Building the Global Data and Service View at Set-up Time
3. Data and eService Retrieval
4. Conclusion and future work
Motivation
The research on data integration and service discovery has involved, from the beginning, different (not always overlapping) communities.
– Data and services are described with different models, and different techniques have been developed to retrieve them.
From a user perspective, the border between data and services is often blurred, since data and services provide complementary views of the available resources. Users need new techniques to manage data and services in a unified way.
Integration of data and services can be tackled from different perspectives:
– access to data is provided through Service Oriented Architectures (SOA), and Web services are exploited to build information integration platforms;
– a global view of the data sources and of the eServices available in the peer is provided, to support access to both kinds of complementary resources at the same time.
Motivation (2)
The problem we address is to retrieve, among the many available services, those that are related to a query on the data, according to the semantics of the terms involved in the query. For example:
Select Name, Country from Accommodation Where City=Modena
The approach (overview)
We assume a mediator-based data integration system which provides a global virtual view of the data: the Semantic Peer Data Ontology (SPDO).
We assume a set of semantically annotated service descriptions.
– The ontologies used in the service descriptions can be developed outside the peer and are not known in advance by the integration process.
We propose a semantic-based approach to data and service integration:
– given an SQL-like query expressed in the terminology of the SPDO, retrieve all the services that can be considered related to the query on the data sources.
The approach is based on:
– a mediator-based data integration system, MOMIS (Mediator envirOnment for Multiple Information Sources);
– a service retrieval engine based on IR techniques, which performs semantic indexing of service descriptions and keyword-based semantic search.
The approach (overview)
The integration of data and services is achieved by:
1. building the SPDO (a functionality already provided by MOMIS);
2. building a Global Service Ontology (GSO) consisting of the ontologies used in the semantic service descriptions;
3. defining a set of mappings between the SPDO and the GSO;
4. exploiting, at query time, query rewriting techniques based on these mappings to build, from an SQL-like query on the data sources, a keyword-based query for service retrieval expressed in the GSO terminology.
Building the Global Data and Service View
The Global Light Service Ontology is built through the following steps: service indexing, Global Service Ontology (GSO) construction, Global Light Service Ontology (GLSO) construction, and Semantic Similarity Matrix (SSM) definition.
The SPDO is built by exploiting the MOMIS integration system.
MOMIS
Service Indexing
Our approach requires a formal representation of the service descriptions and is based on full-text indexing, which extracts terms from six specific sections of each service description:
– service name,
– service description,
– input,
– output,
– pre-condition,
– post-condition.
A set I of index terms, which forms the dictionary, is extracted, with I = I_O ∪ I_T:
– I_O = the set of index terms consisting of ontology terms;
– I_T = the set of index terms extracted from textual descriptions.
The indexing structure follows a structured-document approach, where the inverted file structure consists of:
– a dictionary file based on I;
– a posting file, with a list of references to the service sections where the considered term occurs.
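The inverted file described above can be sketched as follows. This is a minimal illustration, not the XIRE implementation: it assumes each service is a dictionary of sections, where ontology annotations are given as lists of qualified names (contributing to I_O) and free text is a string (contributing to I_T). The section keys, the naive tokenizer, and the service data are all hypothetical.

```python
from collections import defaultdict

# The six indexed sections of a service description (keys are hypothetical).
SECTIONS = ("name", "description", "input", "output", "precondition", "postcondition")

def tokenize(text):
    """Naive tokenizer (an assumption): lower-case, split on whitespace, strip punctuation."""
    return [t.strip(".,;:()").lower() for t in text.split() if t.strip(".,;:()")]

def build_index(services):
    """Return (I_O, I_T, postings); postings[term] lists (service_id, section) pairs."""
    postings = defaultdict(set)
    onto_terms, text_terms = set(), set()
    for sid, sections in services.items():
        for section in SECTIONS:
            value = sections.get(section)
            if value is None:
                continue
            if isinstance(value, list):      # semantic annotations -> I_O
                terms = value
                onto_terms.update(terms)
            else:                            # textual description -> I_T
                terms = tokenize(value)
                text_terms.update(terms)
            for term in terms:
                postings[term].add((sid, section))
    # The dictionary file is the key set of the posting file (I = I_O ∪ I_T).
    return onto_terms, text_terms, {t: sorted(p) for t, p in postings.items()}

services = {
    "s1": {"name": "HotelFinder", "description": "Finds hotels in a city",
           "input": ["travel:City"], "output": ["travel:Hotel"]},
    "s2": {"name": "CityWeather", "input": ["geo:City"], "output": ["weather:Forecast"]},
}
I_O, I_T, postings = build_index(services)
print(postings["travel:Hotel"])  # [('s1', 'output')]
```

Keeping the section in each posting is what makes this a structured-document index: a later ranking step could, for instance, weight a match in the output section differently from one in the free-text description.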
GSO construction
The GSO is built by:
– loosely merging each service ontology O such that i ∈ O for some i ∈ I_O;
– associating a concept C_i with each i ∈ I_T, introducing a class Terms, subclass of Thing, in the GSO, and stating that C_i is a subclass of Terms for every i ∈ I_T.
Loose merging means that the service ontologies (SOs) are merged without attempting to integrate similar concepts across the different ontologies.
– If the source SOs are consistent, the GSO can be assumed to be consistent.
– Loose merging is clearly not the optimal choice with respect to ontology integration.
– However, since the XIRE component is based on approximate IR techniques and semantic similarity, approximate solutions to the ontology integration problem are acceptable, whereas the whole GSO building process needs to be fully automated.
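A rough sketch of loose merging, under simplifying assumptions: each ontology is reduced to a map from qualified class names to their direct superclasses, so classes from different ontologies (e.g. travel:City and geo:City) never collide and are never aligned. The "term:" prefix used for the concepts C_i is a hypothetical naming choice; real OWL merging would also carry properties, restrictions, and instances.

```python
def loose_merge(service_ontologies, onto_terms, text_terms):
    """Sketch of GSO construction by loose merging (no cross-ontology alignment).

    service_ontologies: ontology name -> {qualified class -> set of direct superclasses}.
    Returns the GSO as a {class -> set of direct superclasses} map.
    """
    gso = {"Thing": set(), "Terms": {"Thing"}}
    for onto in service_ontologies.values():
        # Merge ontology O only if some index term i in I_O belongs to O.
        if any(term in onto for term in onto_terms):
            for cls, supers in onto.items():
                # Root classes of O become direct subclasses of Thing.
                gso.setdefault(cls, set()).update(supers or {"Thing"})
    for term in text_terms:
        gso[f"term:{term}"] = {"Terms"}      # C_i subclass of Terms, for i in I_T
    return gso

ontologies = {
    "travel": {"travel:Accommodation": set(), "travel:Hotel": {"travel:Accommodation"},
               "travel:City": set()},
    "weather": {"weather:Forecast": set()},
    "unused": {"unused:Foo": set()},
}
gso = loose_merge(ontologies, {"travel:Hotel", "weather:Forecast"}, {"hotels"})
print(sorted(gso["travel:Hotel"]))  # ['travel:Accommodation']
```

Because nothing is unified, the merge is trivially automatic and preserves the consistency of each source ontology, which matches the trade-off described on the slide.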
GLSO construction and Semantic Similarity Matrix
The GSO may turn out to be extremely large, while only a subset of the terms of its ontologies is relevant to the SWS (Semantic Web Service) descriptions.
– A technique to reduce the ontology size is exploited, and a GLSO (Global Light Service Ontology) is obtained.
– We extract from the GSO the sub-ontology that preserves the meanings of the terms explicitly used in the service descriptions, namely the set of index terms I.
The Semantic Similarity Matrix (SSM), which is exploited later for query expansion at query time, is then computed.
– The SSM is defined by analyzing the GLSO structure according to semantic similarity measures from the literature, taking into account subclass paths, domain and range restrictions on properties, membership of instances, and so on.
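The slide does not specify the extraction technique or the similarity measure, so the following is only a sketch under two stated assumptions: the GLSO keeps the index terms plus all their ancestors in the subclass hierarchy, and the SSM uses a simple path-based score, sim(a, b) = 1 / (1 + shortest subclass-path length). The measure described above also considers property restrictions and instances, which this sketch ignores.

```python
from collections import deque
from itertools import combinations

def extract_glso(gso, index_terms):
    """Keep the index terms and all their ancestors (an assumed extraction rule)."""
    keep, stack = set(), [t for t in index_terms if t in gso]
    while stack:
        cls = stack.pop()
        if cls in keep or cls not in gso:
            continue
        keep.add(cls)
        stack.extend(gso[cls])
    # Restrict subclass edges to the retained classes.
    return {c: {s for s in gso[c] if s in keep} for c in keep}

def _adjacency(glso):
    """Undirected adjacency over subclass edges."""
    adj = {c: set() for c in glso}
    for c, supers in glso.items():
        for s in supers:
            adj[c].add(s)
            adj[s].add(c)
    return adj

def _path_length(adj, a, b):
    """Breadth-first search for the shortest path length; None if unreachable."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def similarity_matrix(glso, terms):
    """SSM sketch: symmetric scores, 0.0 for pairs with no connecting path."""
    adj = _adjacency(glso)
    ssm = {}
    for a, b in combinations(sorted(terms), 2):
        d = _path_length(adj, a, b)
        ssm[(a, b)] = ssm[(b, a)] = 0.0 if d is None else 1.0 / (1 + d)
    return ssm

gso = {"Thing": set(), "Terms": {"Thing"}, "travel:Accommodation": {"Thing"},
       "travel:Hotel": {"travel:Accommodation"}, "travel:Hostel": {"travel:Accommodation"},
       "travel:Flight": {"Thing"}, "term:hotels": {"Terms"}}
glso = extract_glso(gso, {"travel:Hotel", "travel:Hostel"})
ssm = similarity_matrix(glso, ["travel:Hotel", "travel:Hostel"])
```

In this example, travel:Flight and the Terms branch are dropped from the GLSO, and Hotel and Hostel are two subclass edges apart (via Accommodation), giving a similarity of 1/3.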
Mapping of Data and Service Ontologies
Mappings between the elements of the SPDO and the GLSO are generated by exploiting, and suitably modifying, the MOMIS clustering algorithm.
The clustering algorithm takes as input the SPDO and the GLSO, with their associated metadata, and generates a set of clusters of classes belonging to the SPDO and the GLSO.
Mappings are automatically generated from the clustering result:
– A cluster contains only SPDO classes: it is not exploited for mapping generation; such a cluster arises because the clustering threshold chosen is less selective than the one used when the SPDO was created.
– A cluster contains only GLSO classes: it is not exploited for mapping generation; it means that there are strongly related Web Service descriptions.
– A cluster contains classes belonging to both the SPDO and the GLSO: it produces, for each SPDO class, a mapping to each GLSO class in the cluster.
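The three cases above can be expressed as a small post-processing step over the clustering output. This sketch assumes clusters are already given as lists of (origin, class) pairs with origin "SPDO" or "GLSO"; the MOMIS clustering algorithm itself is not reproduced here, and the example clusters are hypothetical.

```python
from itertools import product

def mappings_from_clusters(clusters):
    """Derive SPDO -> GLSO class mappings from clusters of (origin, class) pairs."""
    mappings = []
    for cluster in clusters:
        spdo = sorted(c for origin, c in cluster if origin == "SPDO")
        glso = sorted(c for origin, c in cluster if origin == "GLSO")
        # Mixed cluster: every SPDO class is mapped to every GLSO class.
        if spdo and glso:
            mappings.extend(product(spdo, glso))
        # SPDO-only and GLSO-only clusters produce no mappings.
    return mappings

clusters = [
    [("SPDO", "Accommodation"), ("GLSO", "Hotel"), ("GLSO", "Hostel")],  # mixed
    [("SPDO", "Restaurant"), ("SPDO", "Bar")],                           # SPDO only
    [("GLSO", "Forecast"), ("GLSO", "Weather")],                         # GLSO only
]
print(mappings_from_clusters(clusters))
# [('Accommodation', 'Hostel'), ('Accommodation', 'Hotel')]
```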
Example
The following mappings are generated by applying our technique:
Accommodation --> Hotel
Accommodation.Name --> Hotel.Denomination
Accommodation.City --> Hotel.Location
Accommodation.Country --> Hotel.Country
[Figure: SPDO fragment and GLSO fragment (class Hotel with attributes Denomination, Location, Country)]
Data and eService Retrieval
Given a query of the form select ... from ... where ..., the answer is a data set from the data sources together with a set of services which are potentially useful, since they are related to the concepts appearing in the query and thus to the retrieved data.
Query processing is divided into two steps executed simultaneously:
– the data set is obtained by processing the query on an integrated view. The results are obtained by exploiting the MOMIS Query Manager, which rewrites the global query, by means of an unfolding process, as an equivalent set of queries expressed on the local schemata (local queries);
– a set of services related to the query is obtained by exploiting the mappings between the SPDO and the GLSO and the concept of relevant service mapping. Services are retrieved by the XIRE (eXtended Information Retrieval Engine) component, a service search engine based on the vector space model.
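To give an idea of what unfolding does, here is a heavily simplified sketch, not the MOMIS Query Manager: it assumes a hypothetical mapping table that, for each global class, lists per source a local table and a global-to-local attribute map. Each source that can evaluate the WHERE conditions gets one local query; attributes a source lacks are returned as NULL. Fusing the local answers (as a mediator would) is not shown, and values are inlined only for readability; a real system would use query parameters.

```python
def unfold(select, global_class, where, mapping_table):
    """Rewrite a global query into one local SQL query per source (simplified unfolding).

    mapping_table[global_class][source] = (local_table, {global_attr: local_attr}).
    """
    local_queries = {}
    for source, (table, attr_map) in mapping_table.get(global_class, {}).items():
        # A source that cannot evaluate a WHERE condition is skipped.
        if any(attr not in attr_map for attr in where):
            continue
        cols = ", ".join(
            f"{attr_map[a]} AS {a}" if a in attr_map else f"NULL AS {a}"
            for a in select
        )
        conds = " AND ".join(f"{attr_map[a]} = '{v}'" for a, v in where.items())
        query = f"SELECT {cols} FROM {table}"
        if conds:
            query += f" WHERE {conds}"
        local_queries[source] = query
    return local_queries

# Hypothetical sources for the global class Accommodation.
mapping_table = {"Accommodation": {
    "hotels_db": ("hotel", {"Name": "hname", "Country": "country", "City": "city"}),
    "bnb_db": ("bnb", {"Name": "title", "City": "town"}),
}}
local = unfold(["Name", "Country"], "Accommodation", {"City": "Modena"}, mapping_table)
print(local["bnb_db"])
# SELECT title AS Name, NULL AS Country FROM bnb WHERE town = 'Modena'
```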
Data and eService Retrieval (overview)
Managing keywords
Given a query in an SQL-like notation expressed in the SPDO terminology, the set of extracted keywords consists of:
– all the classes given in the FROM clause,
– all the attributes and values used in the SELECT and WHERE clauses,
– all their ranges defined by ontology classes.
The set of keywords is rewritten into GLSO terms by exploiting the mappings between the SPDO and the GLSO.
The semantic similarity between GLSO terms defined in the SSM is exploited to expand the keyword set into a set of weighted terms.
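The three steps above (extraction, rewriting through the mappings, weighted expansion) can be sketched on the running example. The query is passed in already parsed, the mappings are those of the Example slide, and the SSM scores, the 0.5 threshold, and the rule "original terms weigh 1.0, expanded terms weigh their SSM score" are all illustrative assumptions rather than the XIRE weighting scheme.

```python
def query_keywords(from_cls, select, where, ranges=None):
    """Keywords: FROM class, SELECT/WHERE attributes, WHERE values, attribute ranges."""
    ranges = ranges or {}
    keywords = {from_cls}
    for attr in list(select) + list(where):
        keywords.add(f"{from_cls}.{attr}")
        if attr in ranges:
            keywords.add(ranges[attr])        # range defined by an ontology class
    keywords.update(str(v) for v in where.values())
    return keywords

def to_glso(keywords, mappings):
    """Rewrite SPDO keywords into GLSO terms; unmapped keywords (e.g. values) are kept."""
    return {mappings.get(k, k) for k in keywords}

def expand(terms, ssm, threshold=0.5):
    """Original terms weigh 1.0; similar GLSO terms get their SSM score if >= threshold."""
    weights = {t: 1.0 for t in terms}
    for (a, b), sim in ssm.items():
        if a in terms and b not in terms and sim >= threshold:
            weights[b] = max(weights.get(b, 0.0), sim)
    return weights

mappings = {"Accommodation": "Hotel", "Accommodation.Name": "Hotel.Denomination",
            "Accommodation.City": "Hotel.Location", "Accommodation.Country": "Hotel.Country"}
keywords = query_keywords("Accommodation", ["Name", "Country"], {"City": "Modena"})
glso_terms = to_glso(keywords, mappings)
ssm = {("Hotel", "Hostel"): 0.8, ("Hostel", "Hotel"): 0.8, ("Hotel", "Flight"): 0.2}
weights = expand(glso_terms, ssm)
print(weights["Hostel"])  # 0.8
```

Here the value Modena has no mapping and is kept verbatim, while Flight is too dissimilar to Hotel to enter the expanded query.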
eServices retrieval
Query evaluation is based on the vector space model:
– in this model, both documents (i.e., Web Service descriptions) and queries (the extracted keywords) are represented as vectors in an n-dimensional space;
– each document vector has non-zero weights for the keywords that index that description;
– relevance weights are used to modify the weights in the list resulting from the keyword evaluation process.
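A minimal vector space ranking sketch: sparse vectors are term-to-weight dictionaries, and services are ranked by cosine similarity with the weighted query. Treating the expansion weights as the query's relevance weights is an assumption for illustration, as are the service vectors (in practice they would come from the index, e.g. with tf-idf weighting).

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_services(query_weights, service_vectors):
    """Return (service_id, score) pairs with a non-zero score, best first."""
    scores = {sid: cosine(query_weights, vec) for sid, vec in service_vectors.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(sid, s) for sid, s in ranked if s > 0]

query = {"Hotel": 1.0, "Hotel.Location": 1.0, "Hostel": 0.8}   # expanded, weighted query
service_vectors = {
    "HotelFinder": {"Hotel": 1.0, "Hotel.Location": 1.0},
    "HostelBooking": {"Hostel": 1.0},
    "FlightSearch": {"Flight": 1.0},
}
print([sid for sid, _ in rank_services(query, service_vectors)])
# ['HotelFinder', 'HostelBooking']
```

The lower weight of the expanded term Hostel is what places HostelBooking below HotelFinder, while FlightSearch shares no term with the query and is not returned.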
Conclusion and future work
In this paper we introduced a technique for publishing and retrieving a unified view of data and services. Such a unified view may be exploited to improve the user's knowledge of a set of sources and to retrieve a list of Web services related to a data set. The approach is semi-automatic and works jointly with the tools typically provided for searching for data and services separately.
Future work will address the evaluation of the approach's effectiveness on the real cases provided within the NeP4B project and against the OWLS-TC benchmark.