
1 Claudio Gennaro ISDSI 2009
Query Processing in a Mediator System for Data and Multimedia
D. Beneventano (1), C. Gennaro (2), M. Mordacchini (2), R. Carlos Nana Mbinkeu (1)
(1) DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy
(2) ISTI – CNR, via Moruzzi 1, Pisa, Italy

2 Outline
1. Motivation
2. The system and scenario overview
3. Querying an ontology of data and multimedia sources: mapping, query unfolding for multimedia conditions, ranking
4. Conclusion and future work

3 Motivation
We proposed a method for building a populated domain ontology representative of a set of web data sources. The method exploits the capabilities of a mediator system (MOMIS) to create an integrated view of a set of data sources, i.e. a domain ontology schema, and a set of annotations linking data to the integrated view. We extend that approach with multimedia sources, thus obtaining a methodology for building and querying an ontology that represents both data and multimedia sources.
1. There are several use cases where applications interact with ontologies of data and multimedia sources.
2. Multimedia and data sources are usually represented with different models; no standard for representing data and multimedia sources together has been adopted by large communities.
3. Different languages and interfaces have been developed for querying "traditional" and "multimedia" data sources. The former rely on expressive languages that support selection clauses; the latter typically implement similarity search techniques for retrieving multimedia documents similar to the ones provided by the user.

4 Managing a Semantic Peer: MOMIS + MILOS
A NeP4B Semantic Peer provides unified access to different data sources referring to the same domain by means of a Semantic Peer Data Ontology (SPDO), i.e. a common representation of all the data sources belonging to the peer.
– MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework to perform information extraction and integration of heterogeneous, structured and semistructured data sources.
– MILOS is a general-purpose Multimedia Content Management System that manages and serves any multimedia document and manages any metadata of the documents.

5 Data and Multimedia Sources (DMSs)
A Data and Multimedia Source (DMS) is an object-oriented database of metadata objects describing a collection of multimedia documents (such as images, videos, etc.), represented with a schema defined in ODLI3. In general, the DMS schema includes:
– a set of standard attributes, declared using standard predefined ODLI3 types such as string, double, integer, etc., supporting the selection predicates typical of structured and semi-structured data, such as =, <, >, ...;
– a set of multimedia attributes, declared by means of special predefined ODLI3 classes, which support similarity-based searches (full-text search, image similarity, geographical search, etc.).

6 A sample scenario

7 A sample scenario

8 A sample scenario

9 Querying DMSs
A DMS M_i can be queried using an extension of the standard SQL-like SELECT syntax. The WHERE clause consists of a conjunctive combination of predicates on the single standard attributes of M_i, and ORDER BY together with LIMIT K specifies, in practice, a top-k similarity query:
SELECT M_i.A_k, …, M_i.S_l, …
FROM M_i
WHERE M_i.A_x op_1 val_1 AND M_i.A_y op_2 val_2 ...
ORDER BY M_i.S_w(Q_1), M_i.S_z(Q_2), …
LIMIT K

10 Querying DMSs
interface city() {
    // standard attributes
    attribute string Name;
    attribute string Zip;
    attribute string Country;
    attribute integer Surface;
    attribute integer Population;
    // similarity attributes
    attribute Image Photo;
    attribute Text Description;
    attribute GeoCoord GeoPosition;
}
// query example
SELECT Name FROM city
WHERE Country = "Italy"
ORDER BY Photo("http://www.flickr.com/32e324e.jpg"), GeoPosition(40.25, 14.32), Description("sea mozzarella pizza")
LIMIT 100
This query tries to find, among all Italian cities, the ones that best match the example image and the textual description and that are as near as possible to the geographical location 40.25N, 14.32E.

11 DMS: Assumptions
Since we want to build a general-purpose framework, we make the following assumptions:
– the way the returned objects are ordered is not known (black box);
– the DMS does not return scores indicating the relevance of the objects with respect to the query;
– if no ORDER BY clause is specified, the DMS returns the records in random order.

12 Representing the SPDO
We build a conceptualization of a set of DMSs, composed of global classes and global attributes, and mappings between the SPDO and the DMS schemata.

13 Mapping
The mapping is defined in a semiautomatic way as follows:
– A Mapping Table (MT) is specified for each global class G; its columns represent the n local classes M_1, …, M_n belonging to G, and its rows represent the h global attributes of G. Multimedia attributes can be mapped only onto global multimedia attributes of the same type.
– Join Conditions are defined between pairs of local classes belonging to G and allow the system to identify instances of the same real-world object in different sources.

14 Example of mapping
Hotel (global attribute) | resort (L1) | hotel (L2)
name (join) | Name | denomination
telephone | Telephone | tel
fax | Fax | fax
www | Web-site | www
room_num | Room_num | rooms
price (RF) | Price_avg | mean_price
city | City | location
stars | Stars | –
free_wifi | – |
photo | Photo | img
description | Description | commentary

15 Mapping
– Resolution Functions are introduced to solve conflicts between local attribute values associated with the same real-world object. In our framework we consider and implement some of these resolution functions, in particular the PREFERRED function, which takes the value of a preferred source, and the RANDOM function, which takes a random value.
– For multimedia attributes, we introduce a new resolution function, called MOST_SIMILAR, which returns the multimedia object most similar to the one expressed in the query (if any).
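The two standard resolution functions can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the MOMIS implementation: local values for one real-world object arrive as a dict from source name to value (None when missing), and falling back to the next preferred source when the first has no value is our choice. The function names and the sample data are hypothetical.

```python
import random
from typing import Any


def preferred(values: dict[str, Any], preference: list[str]) -> Any:
    """PREFERRED: value from the most preferred source; this sketch
    falls back to the next source when the preferred one has no value."""
    for source in preference:
        if values.get(source) is not None:
            return values[source]
    return None


def random_value(values: dict[str, Any]) -> Any:
    """RANDOM: value of a randomly chosen source among those that have one."""
    present = [v for v in values.values() if v is not None]
    return random.choice(present) if present else None


# Conflicting local values for one real-world hotel (hypothetical data).
telephone = {"resort": "tel-A", "hotel": "tel-B"}
print(preferred(telephone, ["hotel", "resort"]))  # tel-B
print(random_value(telephone))                    # tel-A or tel-B
```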

16 Query the SPDO
Given a global class G with m attributes, of which k are multimedia attributes, denoted by G.S_1, …, G.S_k (such as photo and description in the class Hotel), and h are standard attributes, denoted by G.A_1, …, G.A_h, a query on G (global query) is a conjunctive query, expressed in a simple abstract SQL-like syntax as:
SELECT G.A_l, …, G.S_j
FROM G
WHERE G.A_x op_1 val_1 AND G.A_y op_2 val_2 ...
ORDER BY G.S_w(Q_1), …, G.S_z(Q_2)
LIMIT K

17 Query unfolding
To answer a global query on G, the query must be rewritten as an equivalent set of queries (local queries) expressed on the local classes L(G) belonging to G. The query rewriting is performed by means of query unfolding, which consists of the following four steps:
1. Computation of Local Query conditions
2. Computation of Residual Conditions
3. Fusion of local answers
4. Application of the Residual Condition

18 Query Fusion: Ranking
Why?
– Modern multimedia content managers typically return multimedia objects (i.e., objects whose attributes support similarity search) in decreasing order of relevance, so that the "best" answers are at the top;
– we want to preserve this knowledge at the global level;
– however, since we cannot exploit scores, we use the rank as an indicator of the relevance of each returned record.

19 Ranking the results
Our problem falls into the category of partial rank aggregation problems, in which we merge top-k lists rather than fully ranked lists. We use a simple yet effective aggregation function for ordinal ranks, the median function:
– the score of an object is its median position across all the returned lists.
Fagin et al. showed that the median function is near-optimal, even for top-k or partial lists. The MEDRANK algorithm is based on median rank aggregation.

20 The MedRank algorithm
R1: 1 A, 2 B, 3 C, 4 D
R2: 1 B, 2 A, 3 D, 4 C
R3: 1 B, 2 C, 3 A, 4 D
R: 1 –, 2 –, 3 –, 4 –
Access the rankings sequentially: when an element has appeared in more than half of the rankings, output it in the aggregated ranking.

21 The MedRank algorithm (rankings R1, R2, R3 as above)
R: 1 B, 2 –, 3 –, 4 –

22 The MedRank algorithm (rankings R1, R2, R3 as above)
R: 1 B, 2 A, 3 –, 4 –

23 The MedRank algorithm (rankings R1, R2, R3 as above)
R: 1 B, 2 A, 3 C, 4 –

24 The MedRank algorithm (rankings R1, R2, R3 as above)
R: 1 B, 2 A, 3 C, 4 D

25 Example
We would like to find images of the "Arch of Triumph of Rome by night". We assume we have two DMSs containing images of monuments around the world: the first, DMS1, with geographical-coordinate search capabilities, and the second, DMS2, with image-similarity search capabilities.

26 Example
G | DMS1 | DMS2
URL (join) | url | url_address
Subject | subject | type
Img | – | img
GeoCoord | | –

27 SELECT … FROM DMS1 WHERE subject = "Monument" ORDER BY GeoCoord(41°53'43.68"N, 12°28'56.34"E) LIMIT 5
[Figure: result panel labelled "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)"; image captions "Dist = 1km", "Dist = 2km", "Roma. Palazzo della Civiltà del Lavoro. EUR"]
Unfortunately, if I search by geographic coordinates alone, giving the coordinates of Rome as input, I find many images of the Colosseum.

28 SELECT … FROM DMS2 WHERE type = "Monument" ORDER BY Img(URL) LIMIT 5
[Figure: result panel labelled "ORDER BY Image(URL)"; image caption "Roma. Palazzo della Civiltà del Lavoro. EUR"]
And if I search only by similarity to an image of the "Arch of Triumph of Rome by night", I find many images of the Arch of Triumph of Paris, which is very similar but more famous.

29 SELECT … FROM WorldMonuments WHERE Subject = "Monument" ORDER BY Img(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E) LIMIT 5
[Figure: result panels labelled "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)"; image captions "Dist = 1km", "Dist = 2km", "Roma. Palazzo della Civiltà del Lavoro. EUR"]

30 Conclusion and future work
We presented a methodology, implemented in a tool, that allows a user to create and query an integrated view of data and multimedia sources. Future work will be devoted to testing the tool in real scenarios. In particular, our tool will be used to integrate business catalogs related to the area of "tiles".
– We think that such data may provide useful test cases because of the need to connect data about the features of the tiles with their images.

31 THE END

32 Building the Data Ontology: MOMIS
MOMIS* (Mediator envirOnment for Multiple Information Sources) is a framework to perform information extraction and integration of heterogeneous, structured and semistructured data sources.
Semantic Integration of Information:
– a common data model, ODLI3 (derived from ODL-ODMG and I3), mapped into the OLCD description logic;
– tool-supported techniques to construct the Global Virtual View (GVV): local source wrapping; local schema annotation w.r.t. a common lexical ontology (WordNet); semi-automatic discovery of relationships between local schemata; clustering techniques to build the GVV and the mappings between the GVV and the local schemata (Mapping Table); automatic GVV annotation w.r.t. a common lexical ontology and OWL export;
– Global Query Management, including services and multimedia data sources.
25/03/2008
* D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini: "Synthesizing an Integrated Ontology", IEEE Internet Computing Magazine, September-October 2003, 42-51.
S. Bergamaschi, S. Castano, M. Vincini: "Semantic Integration of Semistructured and Structured Data Sources", SIGMOD Record Special Issue on Semantic Interoperability in Global Information, Vol. 28, No. 1, March 1999.

33 MOMIS architecture
[Figure: MOMIS architecture diagram. Labels: wrapping of local sources into ODLI3 local schemata 1 … N; manual and semi-automatic annotation (synsets 1 … #); Common Thesaurus generation from schema-derived, lexicon-derived, user-supplied and inferred relationships; GVV generation producing global classes and mapping tables]

34 Mapping definition in MOMIS
Mappings between a global class G of the GVV and its local classes are represented by a Mapping Table.
Global-as-View (GAV) mappings: for each global class G, a view V_G over the local classes of G is defined by a Full-Join Merge operator:
1. Outer Join: to include in the result all tuples of all local sources
2. Merge: to perform data reconciliation (Resolution Functions)

35 Building the Mappings: an example
Mapping Table of the global class Hotel = {L1.resort, L2.hotel}
Full Join Merge:
SELECT name, avg(T_L1.price_avg, T_L2.mean_price) AS price, T_L1.Stars, …
FROM T_L1 OUTER JOIN T_L2 ON (T_L1.Name = T_L2.denomination)
– Join attribute and join condition: T_L1.Name = T_L2.denomination
– Resolution function: avg(L1, L2)
– Data conversion function: DollarEuro(mean_price)
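A minimal Python sketch of the Full-Join Merge for Hotel shown above, under stated assumptions: local tuples are dicts keyed by the local attribute names of the mapping table (Name, Price_avg, Stars for resort; denomination, mean_price for hotel), join values are unique within each source, and the DollarEuro conversion is passed in as a hypothetical `convert` callable (identity by default) because its rate is not given. Only name, price and stars are merged, to keep the example short.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def avg(*values: Optional[float]) -> Optional[float]:
    """AVG resolution function: mean of the non-missing local values."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def full_join_merge(resort: list[Row], hotel: list[Row],
                    convert: Callable[[float], float] = lambda p: p) -> list[Row]:
    """Full outer join of resort (L1) and hotel (L2) on Name = denomination,
    followed by the merge step (price reconciled with AVG)."""
    hotel_by_name = {h["denomination"]: h for h in hotel}
    matched: set[str] = set()
    result: list[Row] = []

    def l2_price(h: Optional[Row]) -> Optional[float]:
        # Apply the data conversion only when a hotel value exists.
        if h is None or h.get("mean_price") is None:
            return None
        return convert(h["mean_price"])

    # Every resort tuple, with or without a matching hotel tuple.
    for r in resort:
        h = hotel_by_name.get(r["Name"])
        if h is not None:
            matched.add(r["Name"])
        result.append({"name": r["Name"],
                       "price": avg(r.get("Price_avg"), l2_price(h)),
                       "stars": r.get("Stars")})
    # Hotel tuples that matched no resort tuple.
    for h in hotel:
        if h["denomination"] not in matched:
            result.append({"name": h["denomination"],
                           "price": avg(l2_price(h)),
                           "stars": None})
    return result


# Hypothetical local answers.
L1 = [{"Name": "Alpha", "Price_avg": 100, "Stars": 4},
      {"Name": "Beta", "Price_avg": 80, "Stars": 3}]
L2 = [{"denomination": "Alpha", "mean_price": 120},
      {"denomination": "Gamma", "mean_price": 90}]
rows = full_join_merge(L1, L2)
```

Here "Alpha" appears in both sources and gets the averaged price, while "Beta" and "Gamma" survive the outer join with values from a single source.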

36 Global Query Management
The querying problem: how do we answer queries expressed on the global schema (global queries)?
In a virtual data integration system, data reside at the data sources, so query processing is based on query rewriting: a global query is rewritten as an equivalent set of queries expressed on the local schemata of the data sources (local queries).
GAV approach: query rewriting is performed by unfolding, i.e. by expanding a global query on G according to the view associated with G.
– Query optimization techniques for the Full-Join Merge operator. Motivation:
1. full outer join queries are very expensive, especially in a distributed environment;
2. only limited optimization is performed on full outer joins.

37 An example of Full-Join Merge Optimization
Global query: SELECT * FROM G WHERE city LIKE "%Modena%" AND price < 200
LQ1 = SELECT * FROM L1 WHERE City LIKE "%Modena%"
LQ2 = SELECT * FROM L2 WHERE location LIKE "%Modena%"
LQ1 FULL JOIN LQ2 → apply resolution functions (price = AVG()) → apply residual constraints (price < 200) → result
– Adding AND stars = 4: RIGHT JOIN
– Adding AND free_wifi = true: INNER JOIN
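The reasoning behind this optimization can be sketched in Python: a tuple contributed by only one source has NULL for the attributes that source lacks, so it can never satisfy a condition on such an attribute, and the outer join can drop those tuples. This is our own illustration, not the MOMIS optimizer. It reports which source's unmatched tuples must be kept rather than a LEFT/RIGHT keyword, since that keyword depends on operand order. The attribute sets in `HOLDERS` are hypothetical; placing free_wifi only in L2 is an assumption inferred from the INNER JOIN case, and the function is meant for two sources.

```python
def choose_join(condition_attrs: list[str],
                holders: dict[str, set[str]]) -> str:
    """Pick the cheapest join for fusing two local answers.

    holders maps each source to the global attributes it provides."""
    # Sources whose unmatched tuples may still satisfy every condition.
    preserved = set(holders)
    for attr in condition_attrs:
        preserved &= {src for src, attrs in holders.items() if attr in attrs}
    if preserved == set(holders):
        return "FULL OUTER JOIN"
    if not preserved:
        return "INNER JOIN"
    return "OUTER JOIN preserving " + ", ".join(sorted(preserved))


# Hypothetical attribute sets for the Hotel example.
HOLDERS = {"L1": {"city", "price", "stars"},
           "L2": {"city", "price", "free_wifi"}}
print(choose_join(["city", "price"], HOLDERS))                        # FULL OUTER JOIN
print(choose_join(["city", "price", "stars"], HOLDERS))               # OUTER JOIN preserving L1
print(choose_join(["city", "price", "stars", "free_wifi"], HOLDERS))  # INNER JOIN
```

The condition on price never restricts the join, because both sources provide a price and AVG can be computed from either one alone.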

38 MILOS
– Metadata Editor: Visual Basic (SOAP communication)
– Repository
– Metadata Integrator: access to documents, access to metadata, metadata independence (SOAP Web Service)
– MultiMedia document server: allows homogeneous access to heterogeneous media (SOAP Web Service)
– XML Search Engine: structure search, fielded search, full-text search, multimedia search, schema-independent XQuery support (SOAP Web Service)
– Retrieval Interface: JSP (SOAP communication)
Metadata independence: the schema seen in the interface logic can be different from the one(s) used in the repository.

39 MILOS (2)
The MILOS system is based on a three-tier distributed architecture:
– Client tier: the topmost level of the system. It contains the client applications that interact with MILOS and display results to the user.
– Business logic: it manages query processing by integrating and aligning information stored in the databases, and it reconciles retrieved data by managing ranking.
– Data tier: it is composed of the Large Object Database, which physically stores the multimedia documents managed by the system, and the metadata database, where all metadata associated with the multimedia items are stored.
Multimedia metadata are represented in the data tier in XML formats. MILOS adopts a native XML database, which supports XML query language standards and offers advanced search and indexing functionality on arbitrary XML documents. The MILOS XML database provides full-text search, automatic classification and feature similarity search. The Large Object Database lets MILOS clients deal with multimedia in a uniform way.

40 The MedRank algorithm
Whenever there are multiple multimedia attributes, strange side effects can affect the precision of the answer. Example:
– Suppose we have two image databases of monument images: MS1 provides image similarity and geographic coordinates; MS2 provides only image similarity.
– The query consists of a sample image and a point (geographic coordinates).

41 SELECT … FROM WorldMonuments ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E) LIMIT 5
[Figure: result panels labelled "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)"; image captions "Dist = 1km", "Dist = 2km", "Roma. Palazzo della Civiltà del Lavoro. EUR"]

42 DMS: Assumptions
The rationale of the above assumptions is that we aim to work in a general environment with heterogeneous DMSs whose scoring functions are unknown to us.
– Final scores are often the result of combining the scores of each attribute: a scoring function is usually defined as an aggregation over heterogeneous partial scores (e.g., relevance for text-based IR with keyword queries, or similarity degrees for the color and texture of images in a multimedia database).
– Even for a single multimedia attribute, scores become meaningless outside the context in which they are computed. Consider, for example, the TF*IDF scoring function used by ordinary text search engines: the score of a document depends on the collection statistics, and different search engines may use different scoring algorithms.
However, treating a local DMS as a black box that returns no score with its results does not imply that the DMS does not internally use scoring functions to combine different multimedia attributes.
– Modern multimedia systems typically use fuzzy logic to aggregate the scores of different multimedia attributes, graded in the interval [0,1]. Classical examples of such functions are min and mean.
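A small Python illustration of the last point, with made-up grades: the two classical fuzzy aggregations, min and mean, can rank the same two objects in opposite orders. This is one reason why a mediator that cannot see a source's aggregation function should not try to reinterpret its output. The function name `aggregate` and the grades are hypothetical.

```python
from statistics import fmean


def aggregate(grades: list[float], how: str = "min") -> float:
    """Fuzzy combination of per-attribute similarity grades in [0, 1]."""
    if not grades or not all(0.0 <= g <= 1.0 for g in grades):
        raise ValueError("expected a non-empty list of grades in [0, 1]")
    if how == "min":
        return min(grades)
    if how == "mean":
        return fmean(grades)
    raise ValueError(f"unknown aggregation: {how}")


# Two hypothetical objects graded on (image similarity, text relevance).
x, y = [0.9, 0.4], [0.6, 0.6]
print(aggregate(x, "min"), aggregate(y, "min"))    # 0.4 0.6  -> y ranks first
print(aggregate(x, "mean"), aggregate(y, "mean"))  # 0.65 0.6 -> x ranks first
```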

43 Computation of Local Query conditions
Each atomic predicate P_i and each similarity predicate in the global query is rewritten into corresponding constraints supported by the local classes. For example, the constraint stars = 3 is translated into the constraint Stars = 3 for the local class resort, and is not translated into any constraint for the local class hotel.

44 Computation of Residual Conditions
Conditions on non-homogeneous standard attributes cannot be translated into local conditions: they are considered residual and have to be solved at the global level.
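The two steps (local conditions and residual conditions) can be sketched together in Python for standard attributes. This is an illustration under our own assumptions: `MAPPING` hypothetically encodes part of the Hotel mapping table, conditions are (attribute, operator, value) triples, and an attribute counts as non-homogeneous exactly when its values are reconciled by a resolution function (price, with AVG). Similarity predicates are left out.

```python
Condition = tuple[str, str, object]

# Hypothetical encoding of part of the Hotel mapping table:
# global attribute -> {local class: local attribute}.
MAPPING: dict[str, dict[str, str]] = {
    "name": {"resort": "Name", "hotel": "denomination"},
    "price": {"resort": "Price_avg", "hotel": "mean_price"},
    "city": {"resort": "City", "hotel": "location"},
    "stars": {"resort": "Stars"},
}
# Non-homogeneous attributes: values reconciled by a resolution function.
RESOLVED = {"price"}


def unfold_conditions(conditions: list[Condition]):
    """Split global conditions into per-source local conditions and
    residual conditions to be checked after fusion."""
    local: dict[str, list[Condition]] = {
        src: [] for attrs in MAPPING.values() for src in attrs}
    residual: list[Condition] = []
    for attr, op, value in conditions:
        if attr in RESOLVED:
            residual.append((attr, op, value))
            continue
        # Translate only for local classes that map the attribute;
        # stars, for instance, has no counterpart in hotel.
        for src, local_attr in MAPPING[attr].items():
            local[src].append((local_attr, op, value))
    return local, residual


local, residual = unfold_conditions(
    [("city", "LIKE", "%Modena%"), ("price", "<", 200), ("stars", "=", 3)])
```

With this input, resort receives City LIKE "%Modena%" and Stars = 3, hotel receives only location LIKE "%Modena%", and price < 200 remains residual.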

45 Computation of Residual Conditions
For multimedia attributes we use the MOST_SIMILAR function. For example, suppose we are searching for images similar to one specified in the query by means of the ORDER BY clause. If we retrieve two or more local objects describing the same real-world object, each with a corresponding image, the MOST_SIMILAR function simply selects the image that is most similar to the query image. However, since we do not know the scores, how do we evaluate similarity?

46 Computation of Residual Conditions
Rank-based similarity: we simply use the rank of the objects in the returned list as an indicator of the similarity of the attribute values belonging to those objects. This aspect is related to the problem of fusion.
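A minimal Python sketch of rank-based MOST_SIMILAR, assuming each candidate value comes with the 1-based rank of its local record in that source's answer list, and breaking ties by source order (our choice). The function name and the file names are hypothetical.

```python
from typing import Any, Optional


def most_similar(candidates: dict[str, Optional[tuple[Any, int]]]) -> Any:
    """Rank-based MOST_SIMILAR.

    candidates maps each local source to (value, rank), where rank is the
    1-based position of the local record in that source's answer list, or
    to None when the source has no value. The best-ranked value wins."""
    present = {src: c for src, c in candidates.items() if c is not None}
    if not present:
        return None
    best = min(present, key=lambda src: present[src][1])
    return present[best][0]


print(most_similar({"DMS1": ("photo_17.jpg", 3),
                    "DMS2": ("photo_42.jpg", 1)}))  # photo_42.jpg
```

Comparing ranks from different sources treats them as commensurable, which is a simplification; it is used here only because, under the black-box assumptions, ranks are the only similarity evidence available.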

47 Fusion of local answers
For each local source involved in the global query, a local query is generated and executed on that source. The local answers are fused into the global answer on the basis of the mapping query q_G defined for G, i.e. by using the Full Outerjoin-merge (FOJ) operation:
– Computation of the full outer join of the local answers (FOJ). The result of this operation is ordered on the basis of the multimedia attributes specified in the query; this aspect is examined in detail on the next slide.
– Application of the Resolution Functions: for each attribute GA of the global query, the related Resolution Function is applied to FOJ.

48 Ranking the results
In principle, if we had ALL the (fused) records of the result set, we could use an optimal rank aggregation method based on a distance measure that quantifies the disagreement among different rankings. The overall ranking is then the one with minimum distance to the rankings obtained from the different sources. Several distance measures are available in the literature. However, the difficulty of distance-based rank aggregation lies in the choice of the distance measure and its corresponding complexity, which can even be NP-hard in some cases (e.g., the Kendall distance). Fortunately, our case falls into the category of partial rank aggregation problems, in which we measure the distance only between top-k lists rather than fully ranked lists.
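To illustrate the distance-based approach, here is a Python sketch of the Kendall distance between two full rankings and a brute-force aggregation that minimizes the total Kendall distance (Kemeny aggregation). It is meant only for tiny inputs: it enumerates all n! permutations, which shows why exact distance-based aggregation does not scale. The function names are ours, and the rankings reuse R1, R2, R3 from the MedRank slides.

```python
from itertools import combinations, permutations


def kendall_distance(r1: list[str], r2: list[str]) -> int:
    """Number of element pairs that two full rankings order differently."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    if pos1.keys() != pos2.keys():
        raise ValueError("both rankings must order the same elements")
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)


def kemeny(rankings: list[list[str]]) -> list[str]:
    """Brute-force Kemeny aggregation: the permutation with minimum total
    Kendall distance to all rankings (n! candidates, tiny n only)."""
    return list(min(permutations(rankings[0]),
                    key=lambda p: sum(kendall_distance(list(p), r)
                                      for r in rankings)))


R = [list("ABCD"), list("BADC"), list("BCAD")]
print([kendall_distance(R[0], r) for r in R[1:]])  # [2, 2]
print(kemeny(R))                                   # ['B', 'A', 'C', 'D']
```

On this small example, the exact optimum coincides with the MEDRANK output B, A, C, D.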

49 Example (1)
R1: 1 A, 2 B, 3 C, 4 D
R2: 1 B, 2 A, 3 D, 4 C
R3: 1 B, 2 C, 3 A, 4 D
Positions of each element in the three rankings (sorted): A: (1, 2, 3), B: (1, 1, 2), C: (2, 3, 4), D: (3, 4, 4)
R: 1 B, 2 A, 3 C, 4 D
(1) http://www.cs.helsinki.fi/u/tsaparas/InformationNetworks/lectures/lecture10.ppt
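The position tuples and the final order of this example can be checked with a short Python sketch of median rank aggregation over full lists. The helper names are ours. Ties in median keep first-seen order, and this sketch does not handle partial lists in any special way: an element missing from some lists simply gets the median of the positions it has.

```python
from statistics import median


def positions(rankings: list[list[str]]) -> dict[str, list[int]]:
    """Sorted 1-based positions of each element across the rankings."""
    pos: dict[str, list[int]] = {}
    for ranking in rankings:
        for p, item in enumerate(ranking, start=1):
            pos.setdefault(item, []).append(p)
    return {item: sorted(ps) for item, ps in pos.items()}


def median_aggregate(rankings: list[list[str]]) -> list[str]:
    """Order elements by median position (ties keep first-seen order)."""
    pos = positions(rankings)
    return sorted(pos, key=lambda item: median(pos[item]))


R = [list("ABCD"), list("BADC"), list("BCAD")]
print(positions(R))         # {'A': [1, 2, 3], 'B': [1, 1, 2], 'C': [2, 3, 4], 'D': [3, 4, 4]}
print(median_aggregate(R))  # ['B', 'A', 'C', 'D']
```

The medians are 1 for B, 2 for A, 3 for C and 4 for D, which gives the aggregated ranking R.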

50 Combining rankings
In many cases the scores are not known:
– e.g. meta-search engines: scores are proprietary information;
… or we do not know how they were obtained:
– one search engine returns a score of 10, the other 100. What does this mean?
… or the scores are incompatible:
– apples and oranges: does it make sense to combine price with distance?
In these cases we can only work with the rankings.

