Claudio Gennaro ISDSI 2009 1 Query Processing in a Mediator System for Data and Multimedia D. Beneventano 1, C. Gennaro 2, M. Mordacchini 2, R. Carlos.

Presentation transcript:

Claudio Gennaro, ISDSI
Query Processing in a Mediator System for Data and Multimedia
D. Beneventano (1), C. Gennaro (2), M. Mordacchini (2), R. Carlos Nana Mbinkeu (1)
(1) DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy
(2) ISTI - CNR, via Moruzzi 1, Pisa, Italy

Outline
1. Motivation
2. The system and scenario overview
3. Querying an ontology of data and multimedia sources
   – mapping
   – query unfolding for multimedia conditions
   – ranking
4. Conclusion and future work

Motivation
We proposed a method for building a populated domain ontology representative of a set of web data sources. The method exploits the capabilities of a mediator system (MOMIS) to create an integrated view of a set of data sources, i.e. a domain ontology schema, and a set of annotations linking data to the integrated view. We extend that approach with multimedia sources, thus obtaining a methodology for building and querying an ontology representing data and multimedia sources.
1. There are several use cases where applications interact with ontologies of data and multimedia sources.
2. Multimedia and data sources are usually represented with different models. No standard for representing data and multimedia sources at the same time has been adopted by large communities.
3. Different languages and interfaces have been developed for querying "traditional" and "multimedia" data sources. The former rely on expressive languages that allow selection clauses; the latter typically implement similarity search techniques for retrieving multimedia documents similar to the ones provided by the user.

Managing a Semantic Peer: MOMIS + MILOS
A NeP4B Semantic Peer provides unified access to different data sources referring to the same domain by means of a Semantic Peer Data Ontology (SPDO), i.e. a common representation of all the data sources belonging to the peer.
– MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous, structured and semistructured data sources.
– MILOS is a general-purpose Multimedia Content Management System. It manages and serves any multimedia documents, and manages any metadata of documents.

Data and Multimedia Sources (DMSs)
A Data and Multimedia Source (DMS) is an object-oriented database of metadata objects describing a collection of multimedia documents (such as images, videos, etc.), represented with a schema defined in ODLI3.
The DMS schema includes, in general:
– a set of standard attributes, declared using standard predefined ODLI3 types (such as string, double, integer, etc.), which support the selection predicates typical of structured and semi-structured data, such as =, <, >, ...;
– a set of multimedia attributes, declared by means of special predefined classes in ODLI3, which support similarity-based searches (full text search, image similarity, geographical search, etc.).

A sample scenario
[Figure slides: the scenario is shown graphically and is not reproduced in the transcript.]

Querying DMSs
A DMS M_i can be queried using an extension of the standard SQL-like SELECT syntax. The WHERE clause consists of a conjunctive combination of predicates on single standard attributes of M_i, as in the following:

    SELECT M_i.A_k, …, M_i.S_l, …
    FROM M_i
    WHERE M_i.A_x op_1 val_1 AND M_i.A_y op_2 val_2 ...
    ORDER BY M_i.S_w(Q_1), M_i.S_z(Q_2), …
    LIMIT K

Together, ORDER BY and LIMIT K specify in practice a top-k similarity query.

Querying DMSs: an example

    interface city() {
        // standard attributes
        attribute string Name;
        attribute string Zip;
        attribute string Country;
        attribute integer Surface;
        attribute integer Population;
        // similarity attributes
        attribute Image Photo;
        attribute Text Description;
        attribute GeoCoord GeoPosition;
    }

    // query example
    SELECT Name
    FROM city
    WHERE Country = "Italy"
    ORDER BY Photo("…"), GeoCoord(40.25, 14.32), Description("sea mozzarella pizza")
    LIMIT 100

Among all Italian cities, this query looks for the ones that best match the example image and the textual description, and that are as near as possible to the geographical point 40.25N, 14.32E.

DMS: Assumptions
Since we would like to build a general-purpose framework, we make the following assumptions:
– The way in which the returned objects are ordered is not known (black box).
– The DMS does not return scores indicating the relevance of the objects with respect to the query.
– If no ORDER BY clause is specified, the DMS returns the records in random order.

Representing the SPDO
We build a conceptualization of a set of DMSs, composed of global classes, global attributes, and mappings between the SPDO and the DMS schemata.

Mapping
The mapping query is defined in a semi-automatic way as follows:
– A Mapping Table (MT) is specified for each global class G. Its columns represent the n local classes M_1, …, M_n belonging to G and its rows represent the h global attributes of G. Multimedia attributes can be mapped only onto global multimedia attributes of the same type.
– Join Conditions are defined between pairs of local classes belonging to G; they allow the system to identify instances of the same real-world object in different sources.

Example of mapping

    Hotel (global)    resort         hotel
    name (join)       Name           denomination
    telephone         Telephone      tel
    fax               Fax            fax
    www               Web-site       www
    room_num          Room_num       rooms
    price (RF)        Price_avg      mean_price
    city              City           location
    stars             Stars          –
    free_wifi         –              [cell lost in transcript]
    photo             Photo          img
    description       Description    commentary

Mapping
– Resolution Functions are introduced to solve conflicts among the local attribute values associated with the same real-world object. In our framework we consider and implement some of these resolution functions, in particular the PREFERRED function, which takes the value of a preferred source, and the RANDOM function, which takes a random value.
– As for multimedia attributes, we introduce a new resolution function, called MOST_SIMILAR, which returns the multimedia objects most similar to the one expressed in the query (if any).

Query the SPDO
Given a global class G with m attributes, of which k are multimedia attributes, denoted by G.S_1, …, G.S_k (such as photo and description in the class Hotel), and h are standard attributes, denoted by G.A_1, …, G.A_h, a query on G (global query) is a conjunctive query, expressed in a simple abstract SQL-like syntax as:

    SELECT G.A_l, …, G.S_j
    FROM G
    WHERE G.A_x op_1 val_1 AND G.A_y op_2 val_2 ...
    ORDER BY G.S_w(Q_1), …, G.S_z(Q_2)
    LIMIT K

Query unfolding
To answer a global query on G, the query must be rewritten as an equivalent set of queries (local queries) expressed on the local classes L(G) belonging to G. The query rewriting is performed by means of query unfolding, which consists of the following four steps:
1. Computation of Local Query conditions
2. Computation of Residual Conditions
3. Fusion of local answers
4. Application of the Residual Condition

Query Fusion: Ranking
Why?
– Modern multimedia content managers (i.e., those that support similarity search) typically return multimedia objects in decreasing order of relevance, so that the "best" answers are at the top.
– We want to preserve this knowledge at the global level.
– However, since we cannot exploit scores, we use the rank as an indicator of the relevance of each returned record.

Ranking the results
Our problem falls into the category of partial rank aggregation problems, in which we merge top-k lists rather than fully ranked lists.
A simple yet effective aggregation function for ordinal ranks is the median function:
– The score of an object is its median position across all the returned lists.
The median function was shown by Fagin et al. to be near-optimal, even for top-k or partial lists.
The MEDRANK algorithm is based on median rank aggregation.

The MedRank algorithm
Access the rankings sequentially:
– when an element has appeared in more than half of the rankings, output it in the aggregated ranking.

Input rankings:

    Pos   R1   R2   R3
    1     A    B    B
    2     B    A    C
    3     C    D    A
    4     D    C    D

Aggregated ranking R, built step by step:

    Step 1:   1 B
    Step 2:   1 B   2 A
    Step 3:   1 B   2 A   3 C
    Step 4:   1 B   2 A   3 C   4 D
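The walkthrough above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `medrank`, the optional cut-off `k`, and the choice to read the lists round-robin in the order R1, R2, R3 are assumptions made here, and each list is assumed to contain no duplicates.

```python
from collections import defaultdict


def medrank(rankings, k=None):
    """Emit an item as soon as it has been seen in more than half of the lists.

    rankings: list of ranked lists (best item first), possibly of different length.
    k: optional number of results to return (hypothetical parameter).
    """
    if not rankings:
        return []
    threshold = len(rankings) / 2
    seen = defaultdict(int)   # how many lists have shown each item so far
    emitted = set()
    output = []
    depth = max(len(r) for r in rankings)
    for pos in range(depth):              # sequential access, one position at a time
        for ranking in rankings:
            if pos >= len(ranking):
                continue
            item = ranking[pos]
            seen[item] += 1
            if seen[item] > threshold and item not in emitted:
                emitted.add(item)
                output.append(item)
                if k is not None and len(output) == k:
                    return output
    return output


r1 = ["A", "B", "C", "D"]
r2 = ["B", "A", "D", "C"]
r3 = ["B", "C", "A", "D"]
print(medrank([r1, r2, r3]))  # ['B', 'A', 'C', 'D']
```

With a cut-off, the scan stops as soon as `k` items have been emitted, which is what makes the approach attractive for top-k lists.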

Example
We would like to find images of the "Arch of Triumph of Rome by night". We assume we have two DMSs containing images of monuments in the world: the first, DMS1, with geographical coordinate search capabilities, and the second, DMS2, with image similarity search capabilities.

Example

    G             DMS1        DMS2
    URL (join)    url         url_address
    Subject       subject     type
    Img           -           img
    GeoCoord      GeoCoord    -

    SELECT …
    FROM DMS1
    WHERE subject = "Monument"
    ORDER BY GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    STOP AFTER 5

[Figure: result images, with the labels "Dist = 1km" and "Dist = 2km" and the caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
Unfortunately, if I search only by geographic coordinates, giving the coordinates of Rome as input, I find a lot of images of the Colosseum.

    SELECT …
    FROM DMS2
    WHERE type = "Monument"
    ORDER BY Img(URL)
    STOP AFTER 5

[Figure: result images, with the caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]
And if I search only by similarity to an image of the "Arch of Triumph of Rome by night", I find a lot of images of the Arch of Triumph of Paris, which is very similar but more famous.

    SELECT …
    FROM WorldMonuments
    WHERE Subject = "Monument"
    ORDER BY Img(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    STOP AFTER 5

[Figure: result images for the panels "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)", with the labels "Dist = 1km" and "Dist = 2km" and the caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]

Conclusion and future work
We presented a methodology, implemented in a tool, that allows a user to create and query an integrated view of data and multimedia sources.
Future work will be devoted to experimenting with the tool in real scenarios. In particular, our tool will be used to integrate business catalogs related to the area of "tiles".
– We think that such data may provide useful test cases because of the need to connect data about the features of the tiles with their images.

THE END

Building the Data Ontology: MOMIS
MOMIS* (Mediator envirOnment for Multiple Information Sources) is a framework for information extraction and integration of heterogeneous, structured and semistructured data sources.
Semantic Integration of Information
– A common data model, ODLI3 (derived from ODL-ODMG and I3), mapped into OLCD description logics
– Tool-supported techniques to construct the Global Virtual View (GVV):
   – Local sources wrapping
   – Local Schema Annotation w.r.t. a common lexical ontology (WordNet)
   – Semi-automatic discovery of relationships between local schemata
   – Clustering techniques to build the GVV and the mappings between the GVV and local schemata (Mapping Table)
   – Automatic GVV Annotation w.r.t. a common lexical ontology, and OWL exportation
– Global Query Management
– Including services and multimedia data sources
* D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini: "Synthesizing an Integrated Ontology", IEEE Internet Computing Magazine, September-October 2003; S. Bergamaschi, S. Castano, M. Vincini: "Semantic Integration of Semistructured and Structured Data Sources", SIGMOD Record Special Issue on Semantic Interoperability in Global Information, Vol. 28, No. 1, March.

MOMIS architecture
[Figure: architecture diagram. Labels: WRAPPING; ODLI3 LOCAL SCHEMA 1 … ODLI3 LOCAL SCHEMA N; MANUAL ANNOTATION; SEMI-AUTOMATIC ANNOTATION (SYNSET 1, SYNSET 2, SYNSET 4, SYNSET #); COMMON THESAURUS GENERATION (LEXICON DERIVED RELATIONSHIPS, SCHEMA DERIVED RELATIONSHIPS, INFERRED RELATIONSHIPS, USER SUPPLIED RELATIONSHIPS); Common Thesaurus; GVV GENERATION; GLOBAL CLASSES; MAPPING TABLES.]

Mapping definition in MOMIS
Mappings between a Global Class G of the GVV and its local classes are represented by a Mapping Table.
Global-as-View (GAV) mappings: for each global class G, a view V_G over the local classes of G is defined by a Full-Join Merge Operator:
1. Outer Join: to include in the result all tuples of all local sources
2. Merge: to perform data reconciliation (Resolution Functions)

Building the Mappings: an example
Mapping Table of the global class Hotel = {L1.resort, L2.hotel}

Full Join Merge:

    Select name, avg(T_L1.price_avg, T_L2.mean_price) as price, T_L1.Stars, …
    from T_L1 outer join T_L2
    on (T_L1.Name = T_L2.denomination)

Annotations on the query:
– Resolution Functions: avg(L1, L2)
– Full Join: T_L1 outer join T_L2
– Data Conversion Functions: DollarEuro(mean_price)
– Join Conditions on the Join Attribute: T_L1.Name = T_L2.denomination
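To make the Full Join Merge concrete, here is a minimal Python sketch over in-memory rows. It is not MOMIS code: the function names, the sample rows, and the default "first non-null value" resolver are assumptions made for illustration, join keys are assumed unique within each source, and the DollarEuro conversion step is left out.

```python
from statistics import mean


def _merge(left, right, mapping, resolvers):
    """Build one global record from a (possibly missing) pair of local rows."""
    record = {}
    for g_attr, (l_attr, r_attr) in mapping.items():
        values = []
        if left is not None and l_attr is not None and left.get(l_attr) is not None:
            values.append(left[l_attr])
        if right is not None and r_attr is not None and right.get(r_attr) is not None:
            values.append(right[r_attr])
        resolve = resolvers.get(g_attr, lambda vs: vs[0])  # default: first non-null
        record[g_attr] = resolve(values) if values else None
    return record


def full_join_merge(left_rows, right_rows, mapping, join_attr, resolvers):
    """Full outer join on the join attribute, then apply the resolution functions."""
    l_key, r_key = mapping[join_attr]
    right_by_key = {row[r_key]: row for row in right_rows}
    matched, merged = set(), []
    for left in left_rows:
        right = right_by_key.get(left[l_key])
        if right is not None:
            matched.add(left[l_key])
        merged.append(_merge(left, right, mapping, resolvers))
    for right in right_rows:               # rows that exist only in the right source
        if right[r_key] not in matched:
            merged.append(_merge(None, right, mapping, resolvers))
    return merged


# Hypothetical sample data for the local classes resort (L1) and hotel (L2).
resort = [{"Name": "Astoria", "Price_avg": 100, "Stars": 4},
          {"Name": "Bellavista", "Price_avg": 80, "Stars": 3}]
hotel = [{"denomination": "Astoria", "mean_price": 120},
         {"denomination": "Centrale", "mean_price": 90}]

# Fragment of the mapping table: global attribute -> (resort attr, hotel attr).
mapping = {"name": ("Name", "denomination"),
           "price": ("Price_avg", "mean_price"),
           "stars": ("Stars", None)}

hotels = full_join_merge(resort, hotel, mapping, "name", {"price": mean})
for h in hotels:
    print(h)
```

The hotel present in both sources gets the average of the two prices, while hotels known to only one source keep that source's values and a null for unmapped attributes.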

Global Query Management
The querying problem: how to answer queries expressed on the GS (global queries)?
In a Virtual Data Integration system, data reside at the data sources, so query processing is based on query rewriting: a global query is rewritten as an equivalent set of queries expressed on the local schemata of the data sources (local queries).
GAV approach: query rewriting is performed by unfolding, i.e. by expanding a global query on G according to the view associated with G.
– Query Optimization Techniques for the Full-Join Merge Operator
– Motivation:
1. full outer join queries are very expensive, especially in a distributed environment
2. only limited optimization is performed on full outer joins

An example of Full-Join Merge Optimization
Global query:

    SELECT * FROM G WHERE city LIKE "%Modena%" AND price < 200

Local queries:

    LQ1 = SELECT * FROM L1 WHERE City LIKE "%Modena%"
    LQ2 = SELECT * FROM L2 WHERE location LIKE "%Modena%"

Execution:
1. LQ1 FULL JOIN LQ2
2. Apply resolution functions: price = AVG()
3. Apply residual constraints: price < 200
4. Result

Optimization of the join as conditions are added to the global query:
– AND stars = 4: the FULL JOIN becomes a RIGHT JOIN
– AND free_wifi = true: the join becomes an INNER JOIN

MILOS
– Metadata Editor: Visual Basic (SOAP Comm.)
– Retrieval Interface: JSP (SOAP Comm.)
– Repository Metadata Integrator (SOAP Web Service): access to documents, access to metadata, metadata independence
– MultiMedia doc. serv. (SOAP Web Service): allows homogeneous access to heterogeneous media
– XML Search Engine (SOAP Web Service): structure search, fielded search, full text search, multimedia search, schema independent, XQuery support
Metadata independence: the schema seen in the interface logic can be different from the one(s) used in the repository.

MILOS (2)
The MILOS system is based on a three-tier distributed architecture:
– Client tier: this is the topmost level of the system. It contains the client application that interacts with MILOS and displays results to user applications.
– Business logic: it manages query processing by integrating and aligning information stored in the databases. It performs reconciliation of retrieved data by managing ranking.
– Data tier: it is composed of the Large Object Database, which physically stores the multimedia documents managed by the system, and the metadata database, where all metadata associated with the multimedia items are stored.
Multimedia metadata are represented in the data tier in XML formats. MILOS adopts a native XML database, which supports XML query language standards and offers advanced search and indexing functionality on arbitrary XML documents. The MILOS XML database provides full-text search, automatic classification, and feature similarity search functionalities. The Large Object Database permits clients of MILOS to deal with multimedia in a uniform way.

The MedRank algorithm
Whenever there are multiple multimedia attributes, strange side effects can affect the precision of the answer.
Example:
– Suppose we have two image databases consisting of monument images.
   MS1: provides image similarity and geographic coordinates
   MS2: provides only image similarity
– The query consists of a sample image and the coordinates of a point.

    SELECT …
    FROM WorldMonuments
    ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)
    STOP AFTER 5

[Figure: result images for the panels "ORDER BY Image(URL), GeoCoord(41°53'43.68"N, 12°28'56.34"E)" and "ORDER BY Image(URL)", with the labels "Dist = 1km" and "Dist = 2km" and the caption "Roma. Palazzo della Civiltà del Lavoro. EUR".]

DMS: Assumptions
The rationale for the above assumptions is that we aim to work in a general environment with heterogeneous DMSs whose scoring functions are unknown to us.
– The motivation is that the final scores themselves are often the result of the contributions of the scores of each attribute. A scoring function is therefore usually defined as an aggregation over partial heterogeneous scores (e.g., the relevance for text-based IR with keyword queries, or similarity degrees for color and texture of images in a multimedia database).
– Even in the simpler case of a single multimedia attribute, the scores become meaningless outside the context in which they are evaluated. As an example, consider the TF * IDF scoring function used by ordinary text search engines: the score of a document depends on the collection statistics, and search engines may use different scoring algorithms.
However, treating a local DMS as a black box that does not return any score for its result elements does not imply that local DMSs do not internally use scoring functions for combining different multimedia attributes.
– Typically, modern multimedia systems use fuzzy logic to aggregate the scores of different multimedia attributes, which are graded in the interval [0,1]. Classical examples of these functions are the min and mean functions.
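A small Python sketch can show why the internal aggregation function matters. The two objects and their per-attribute scores below are invented for illustration; the point is only that the same partial scores in [0,1] produce different orderings under min and under mean, so a mediator that does not know which function a DMS uses cannot reconstruct its ranking.

```python
from statistics import mean

# Hypothetical partial scores in [0, 1] for two multimedia attributes
# (e.g. image similarity, text relevance) of two objects.
scores = {"a": (0.9, 0.2), "b": (0.5, 0.4)}


def rank_by(aggregate, partial_scores):
    """Order objects by decreasing aggregated score."""
    return sorted(partial_scores,
                  key=lambda obj: aggregate(partial_scores[obj]),
                  reverse=True)


by_min = rank_by(min, scores)    # min:  a -> 0.2,  b -> 0.4
by_mean = rank_by(mean, scores)  # mean: a -> 0.55, b -> 0.45
print(by_min, by_mean)
```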

Computation of Local Query conditions
Each atomic predicate P_i and each similarity predicate in the global query is rewritten into the corresponding constraints supported by the local classes.
For example, the constraint stars = 3 is translated into the constraint Stars = 3 for the local class resort, and is not translated into any constraint for the local class hotel.
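The rewriting of atomic predicates through the mapping table can be sketched as follows. The dictionary layout and the function name `local_conditions` are assumptions made for this example, and only attributes that map directly onto a local attribute are handled; attributes with a resolution function are the subject of the residual conditions.

```python
# Fragment of the Hotel mapping table: global attribute -> local attribute per class.
# None means the local class has no corresponding attribute.
MAPPING_TABLE = {
    "name":  {"resort": "Name",  "hotel": "denomination"},
    "city":  {"resort": "City",  "hotel": "location"},
    "stars": {"resort": "Stars", "hotel": None},
}


def local_conditions(global_predicates, local_class, mapping_table):
    """Translate (attribute, operator, value) predicates for one local class.

    Predicates on attributes that the local class does not have are dropped.
    """
    translated = []
    for attr, op, value in global_predicates:
        local_attr = mapping_table[attr].get(local_class)
        if local_attr is not None:
            translated.append((local_attr, op, value))
    return translated


query = [("stars", "=", 3), ("city", "LIKE", "%Modena%")]
print(local_conditions(query, "resort", MAPPING_TABLE))
print(local_conditions(query, "hotel", MAPPING_TABLE))
```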

Computation of Residual Conditions
Conditions on non-homogeneous standard attributes cannot be translated into local conditions: they are considered residual and have to be solved at the global level.
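The following Python sketch illustrates why such a condition has to stay at the global level. It assumes a price attribute resolved with an average, as in the Hotel example, and uses invented prices; pushing price < 200 down to the sources changes the set of values being averaged and therefore the answer.

```python
from statistics import mean

# Hypothetical local prices for the same real-world hotel in two sources.
resort_price = {"Astoria": 150}
hotel_price = {"Astoria": 300}


def fused_price(name, left, right):
    """AVG resolution function over the sources that know this hotel."""
    values = [src[name] for src in (left, right) if name in src]
    return mean(values) if values else None


names = sorted(set(resort_price) | set(hotel_price))

# Residual evaluation: fuse first, then test price < 200 on the fused value.
residual = [n for n in names
            if fused_price(n, resort_price, hotel_price) < 200]

# Incorrect push-down: filter each source first, then fuse what is left.
left = {n: p for n, p in resort_price.items() if p < 200}
right = {n: p for n, p in hotel_price.items() if p < 200}
pushed_down = [n for n in names if fused_price(n, left, right) is not None]

print(residual)     # the fused price is 225, so the hotel is not returned
print(pushed_down)  # the hotel is returned with a price of 150
```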

Computation of Residual Conditions
For multimedia attributes we use the MOST_SIMILAR function.
For example, suppose we are searching for images similar to one specified in the query by means of the ORDER BY clause. If we retrieve two or more multimedia objects with one or more corresponding images, the MOST_SIMILAR function simply selects the image that is most similar to the query image.
However, since we do not know the scores, how do we evaluate similarity?

Computation of Residual Conditions
Rank-Based Similarity: we simply exploit the rank of the objects in the returned list as an indicator of similarity between the attribute values belonging to the objects.
This aspect is related to the problem of fusion.
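A rank-based MOST_SIMILAR can be sketched as below. The signature and the candidate format are assumptions made for this example: each candidate value comes with the position (1 = best) that its object had in the list returned by its own source, and the value with the best position wins. Ties go to the first candidate listed, which is an arbitrary choice of this sketch.

```python
def most_similar(candidates):
    """Pick the attribute value whose object had the best rank in its source.

    candidates: list of (value, rank) pairs, where rank 1 is the top result
    of the local answer the value came from; None values are ignored.
    """
    usable = [(rank, value) for value, rank in candidates if value is not None]
    if not usable:
        return None
    return min(usable, key=lambda pair: pair[0])[1]


# Two sources returned an image for the same fused record (hypothetical names).
print(most_similar([("img_a.jpg", 4), ("img_b.jpg", 2)]))  # img_b.jpg
```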

Fusion of local answers
For each local source involved in the global query, a local query is generated and executed on that source.
The local answers are fused into the global answer on the basis of the mapping query q_G defined for G, i.e. by using the Full Outerjoin-merge (FOJ) operation.
– Computation of the full outer join of the local answers (FOJ). The result of this operation is ordered on the basis of the multimedia attributes specified in the query; this aspect is examined in detail in the next slide.
– Application of the Resolution Functions: for each attribute GA of the global query, the related Resolution Function is applied to the FOJ.

Ranking the results
In principle, if we had ALL the (fused) records of the result set, we could exploit an optimal rank aggregation method based on a distance measure that quantifies the disagreements among different rankings.
In this respect, the overall ranking is the one that has minimum distance from the different rankings obtained from the different sources.
Several distance measures are available in the literature. However, the difficulty of solving the distance-based rank aggregation problem depends on the choice of the distance measure and on its corresponding complexity, which can even be NP-hard in some cases (see the Kendall distance).
Fortunately, our case falls into the category of partial rank aggregation problems, in which we measure the distance only between top-k lists rather than fully ranked lists.
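As an illustration of a distance measure between rankings, here is a minimal Python sketch of the Kendall distance, counted as the number of item pairs that the two rankings order differently. It assumes both rankings are full lists over the same items; the function name is chosen for this example. Computing the distance between two lists is cheap; the expensive part is searching for the aggregate ranking that minimizes the total distance to all the lists.

```python
from itertools import combinations


def kendall_distance(ranking_a, ranking_b):
    """Number of item pairs ordered differently by the two rankings."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("both rankings must contain the same items")
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


r1 = ["A", "B", "C", "D"]
r2 = ["B", "A", "D", "C"]
r3 = ["B", "C", "A", "D"]
candidate = ["B", "A", "C", "D"]

print(kendall_distance(r1, r2))                                 # 2
print(sum(kendall_distance(candidate, r) for r in (r1, r2, r3)))  # 3
```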

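Example 1
Input rankings:

    Pos   R1   R2   R3
    1     A    B    B
    2     B    A    C
    3     C    D    A
    4     D    C    D

Positions of each object in the three rankings (sorted) and their median:

    A: (1, 2, 3)   median 2
    B: (1, 1, 2)   median 1
    C: (2, 3, 4)   median 3
    D: (3, 4, 4)   median 4

Aggregated ranking R:

    1 B   2 A   3 C   4 D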
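The median-position computation for this example can be reproduced with a short Python sketch. It assumes every ranking is a full list over the same objects, and the function name is chosen for this example only.

```python
from statistics import median


def median_rank_order(rankings):
    """Return each object's sorted positions and the objects ordered by median position."""
    items = rankings[0]
    positions = {
        item: sorted(r.index(item) + 1 for r in rankings)  # 1-based positions
        for item in items
    }
    order = sorted(items, key=lambda item: median(positions[item]))
    return positions, order


r1 = ["A", "B", "C", "D"]
r2 = ["B", "A", "D", "C"]
r3 = ["B", "C", "A", "D"]

positions, order = median_rank_order([r1, r2, r3])
print(positions)
print(order)  # ['B', 'A', 'C', 'D']
```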

Combining rankings
In many cases the scores are not known
– e.g. meta-search engines: scores are proprietary information
… or we do not know how they were obtained
– one search engine returns score 10, the other 100. What does this mean?
… or the scores are incompatible
– apples and oranges: does it make sense to combine price with distance?
In these cases we can only work with the rankings.