Center for E-Business Technology Seoul National University Seoul, Korea Optimization of Multi-Domain Queries on the Web Daniele Braga, Stefano Ceri, Florian.

Presentation transcript:

Center for E-Business Technology, Seoul National University, Seoul, Korea
Optimization of Multi-Domain Queries on the Web
Daniele Braga, Stefano Ceri, Florian Daniel, Davide Martinenghi
Dipartimento di Elettronica e Informazione – Politecnico di Milano
VLDB
Presented by Babar Tareen, IDS Lab., Seoul National University
Based on Conference Presentation

Copyright © 2008 by CEBT

Multi-Domain Queries
• Queries that can be answered by combining knowledge from two or more domains
• Examples
  – Where can I attend an interesting database workshop close to a sunny beach?
  – Who are the strongest experts on service computing, based upon their recent publication record and accepted European projects?
  – Can I spend an April weekend in a city that is served by a low-cost direct flight from Milano and offers a Mahler symphony?

Intro
• General-purpose search engines (e.g. Yahoo, Google)
  – Very large search space, yet
  – Not able to index deep Web data
• Domain-specific search engines (e.g. an airline’s flight search form, Amazon’s book search facility)
  – Typically of high quality, but
  – Limited to restricted domains
• We lack the ability to answer multi-domain queries

Scenario: a multi-domain query
• In general: “Given a query over a set of services, find the query plan that minimizes the expected execution cost according to a given metric in order to obtain the best k answers.”
• Reference query:
  – “Find all database conferences in the next six months in locations where the average temperature is at least 28°C and for which a cheap travel solution including a luxury accommodation exists.”
• Answering this query requires:
  – Finding interesting conferences in the desired timeframe via online services by the scientific community;
  – Understanding whether the conference location is served by low-cost flights;
  – Finding luxury hotels close to the conference location with available rooms; and
  – Checking the expected average temperature of the location

Overall Picture
[Figure]

Preliminaries – (1)
• Characteristics of information sources (services)
  – Search services: return answers in ranking order
  – Exact services: return indistinguishable tuples (no ranking)
  – Services have access patterns: combinations of input and output parameters corresponding to different ways of invocation

Preliminaries – (2)
• Characteristics of information sources (services)
  – Expected result size per invocation (ERSPI): proliferative services (ERSPI > 1) vs. selective services (0 ≤ ERSPI ≤ 1)
  – Chunking/paging of result sets: bulk vs. chunked services
• Joins
  – Can be considered system services
  – ERSPI of a join: the product of the ERSPI values of the joined services, multiplied by the selectivity of the join condition
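The ERSPI rule for joins can be written down in a few lines. The following is a minimal sketch in Python; the function names and the numeric values are illustrative assumptions and do not come from the presentation.

```python
from math import prod

def is_proliferative(erspi: float) -> bool:
    """A service is proliferative if ERSPI > 1, selective otherwise."""
    return erspi > 1

def join_erspi(service_erspis, join_selectivity):
    """ERSPI of a join: product of the services' ERSPI values
    multiplied by the selectivity of the join condition."""
    return prod(service_erspis) * join_selectivity

# Hypothetical ERSPI values for two services and a hypothetical selectivity.
flight_erspi = 20.0
hotel_erspi = 5.0
print(join_erspi([flight_erspi, hotel_erspi], 0.5))
```

With these made-up numbers, each joint invocation is expected to yield 20 × 5 × 0.5 tuples.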

Preliminaries – (3)
• Query plan: indicates the invocations of services and their conjunctive composition through joins
  – Represented as a directed acyclic graph (DAG)
  – Nodes = atoms in the conjunctive query (service, join)
  – Arcs = precedence constraints + data flows
  – Joins: join strategy + number of fetches per service

Preliminaries – (4)
• Cost metrics: associate a cost with a plan
  – Sum cost metric = sum of the costs of each operator
  – Execution time metric = expected time from query input to result output
  – Request-response cost metric = special case of the sum cost metric where each invocation has a cost of 1
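The relation between the sum cost metric and the request-response metric can be sketched as follows. This is only an illustration: it assumes that the cost of an operator is its number of invocations times a per-invocation cost, and the service names and numbers are hypothetical.

```python
def sum_cost(invocations, unit_cost):
    """Sum cost metric, assuming each operator costs
    (number of invocations) x (cost per invocation)."""
    return sum(n * unit_cost[service] for service, n in invocations.items())

def request_response_cost(invocations):
    """Special case of the sum cost metric: every invocation costs 1."""
    return sum_cost(invocations, {service: 1 for service in invocations})

# Hypothetical plan: how often each service is invoked, and its unit cost.
invocations = {"conference": 1, "weather": 4, "flight": 4}
unit_cost = {"conference": 2.0, "weather": 0.5, "flight": 3.0}

print(sum_cost(invocations, unit_cost))
print(request_response_cost(invocations))
```

The execution time metric is not covered here, because it depends on which invocations overlap in time and not only on how many there are.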

Optimization Approach
• Exploring a highly combinatorial solution space
  – 1st phase: selection of a query rewriting such that every service is called with one of its available access patterns
  – 2nd phase: selection of the query plan
  – 3rd phase: assignment of the exact number of fetches to be performed over chunked services

Services, access patterns, queries
• Web services and access patterns: [figure]
• The example query (in Datalog-like syntax): [figure]
• Services with alternative access patterns

Query plans
• Representation as DAGs
  – Placing a node = invoking the respective service/join
  – Two nodes connected by an arc = sequential execution
  – Two nodes without connection = parallel execution
• Graphical notation (note the parallel vs. pipe join): [figure]
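The DAG reading of a plan can be illustrated with a small Python sketch. The plan below is hypothetical (its node names borrow from the conference scenario, but its shape is not taken from the slides), and "without connection" is interpreted here as "no directed path in either direction", which is an assumption.

```python
def reachable(plan, start):
    """Return all nodes reachable from `start` by following arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for successor in plan.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen

def can_run_in_parallel(plan, a, b):
    """Two distinct nodes may run in parallel if neither precedes the other."""
    return a != b and b not in reachable(plan, a) and a not in reachable(plan, b)

# Hypothetical plan: arcs point from a node to the nodes that must follow it.
plan = {
    "conference": ["weather"],
    "weather": ["flight", "hotel"],
    "flight": ["join"],
    "hotel": ["join"],
    "join": [],
}

print(can_run_in_parallel(plan, "flight", "hotel"))
print(can_run_in_parallel(plan, "conference", "hotel"))
```

In this example the flight and hotel nodes have no path between them, so they qualify for parallel execution, whereas the conference node precedes the hotel node.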

Join strategies for parallel joins
• Nested loop: one service “dominates” the other
• Merge-scan: no a-priori distinction of services
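One way to picture the difference between the two strategies is the order in which chunks are fetched from two chunked services. The sketch below is an interpretation, not the definition given in the presentation: it assumes that nested loop first exhausts the dominating service and that merge-scan alternates between the two services. Function and service names are hypothetical.

```python
from itertools import zip_longest

def nested_loop_fetches(dominant, n_dominant, other, n_other):
    """Assumed reading: fetch every chunk of the dominating service first,
    then the chunks of the other service one by one."""
    order = [(dominant, i) for i in range(1, n_dominant + 1)]
    order += [(other, j) for j in range(1, n_other + 1)]
    return order

def merge_scan_fetches(a, n_a, b, n_b):
    """Assumed reading: alternate fetches, treating both services alike."""
    order = []
    for i, j in zip_longest(range(1, n_a + 1), range(1, n_b + 1)):
        if i is not None:
            order.append((a, i))
        if j is not None:
            order.append((b, j))
    return order

print(nested_loop_fetches("A", 2, "B", 3))
print(merge_scan_fetches("A", 2, "B", 3))
```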

Annotated query plans
• In order to estimate the number of tuples in output, we further need to know:
  – The number of tuples in output of each service
  – The number of fetches for each chunked service
  – The join strategy for each parallel join
• The final annotation is the output of the optimization

Instrumented branch and bound
• Possible service combinations: [figure]
  – Not feasible: City would need to be an input parameter to the query!
  – α1 has more input fields than α2
• Access pattern selection
  – Heuristic: “Bound is better” = the more input fields in the access pattern, the better
• Query plan selection
  – Heuristic: “Selective and parallel are better” = selective services in series (with increasing ERSPI) and proliferative services in parallel
• Chunked service selection
  – Heuristic: “Greedy and square are better” = either we increment the number of fetches to chunked services individually (greedy) or together (square)
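The three heuristics can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the optimizer described in the presentation: it ignores feasibility checks and the branch and bound search itself, and all function names, pattern names, and numbers are hypothetical.

```python
def pick_access_pattern(patterns):
    """'Bound is better': prefer the pattern with the most input fields.
    `patterns` maps a pattern name to its set of input fields."""
    return max(patterns, key=lambda name: len(patterns[name]))

def arrange_services(erspi):
    """'Selective and parallel are better': selective services in series,
    ordered by increasing ERSPI; proliferative services side by side."""
    series = sorted((s for s, e in erspi.items() if e <= 1), key=erspi.get)
    parallel = sorted(s for s, e in erspi.items() if e > 1)
    return series, parallel

def next_fetch_vectors(fetches, mode):
    """'Greedy and square are better': candidate fetch assignments that
    increment one chunked service at a time (greedy) or all together (square)."""
    if mode == "square":
        return [{s: n + 1 for s, n in fetches.items()}]
    if mode == "greedy":
        return [{**fetches, s: fetches[s] + 1} for s in fetches]
    raise ValueError(f"unknown mode: {mode}")

print(pick_access_pattern({"a1": {"City", "Date"}, "a2": {"City"}}))
print(arrange_services({"weather": 0.3, "conference": 0.8, "flight": 12.0, "hotel": 6.0}))
print(next_fetch_vectors({"flight": 1, "hotel": 1}, "greedy"))
```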

Final annotation of query plan
[Figure: annotated query plan, with the execution time cost metric, the service characterization, and the fetching factors]

Query execution
• Execution environment
  – Service registration: signature, patterns, ERSPI, response times, chunk sizes, indication of join strategy, ...
  – Service orchestration: query execution
  – Multi-threading: to leverage parallelism
• Logical caching (speed + elimination of duplicates)
  – No cache = each call individually repeated
  – One-call cache = caching of the last call to each service
  – Optimal cache = all calls to all services are cached
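The effect of the three cache settings on the number of remote calls can be sketched with a small counter. This is an illustrative model only: the policy names, the request trace, and the assumption that a call is identified by its service and input value are not taken from the presentation.

```python
def count_remote_calls(requests, policy):
    """Count the calls that actually reach a service.
    `requests` is a list of (service, input) pairs in call order."""
    remote_calls = 0
    last_input = {}   # one-call cache: last input seen per service
    seen = set()      # optimal cache: every (service, input) seen so far
    for service, value in requests:
        if policy == "none":
            hit = False
        elif policy == "one-call":
            hit = service in last_input and last_input[service] == value
            last_input[service] = value
        elif policy == "optimal":
            hit = (service, value) in seen
            seen.add((service, value))
        else:
            raise ValueError(f"unknown policy: {policy}")
        if not hit:
            remote_calls += 1
    return remote_calls

# Hypothetical trace of calls to a weather service.
trace = [("weather", "Rome"), ("weather", "Rome"),
         ("weather", "Paris"), ("weather", "Rome")]

for policy in ("none", "one-call", "optimal"):
    print(policy, count_remote_calls(trace, policy))
```

On this trace the one-call cache saves only the immediately repeated call, while the optimal cache also saves the later repetition.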

# of calls under varying cache settings
[Figure]

Results of the optimal plan
• Screenshot of the prototype query engine [figure]

Conclusion
• In this work, we have
  – defined a formal model for the optimization of multi-domain queries over web services (conjunctive queries)
  – defined query plans similar to relational physical access plans
  – derived an optimization technique based on a classical branch and bound technique
  – given experimental evidence that the proposed model fits real-world settings (existing web services and wrapped ones)
• Next
  – Generic query engine + declarative representation of query plans
  – User interface for the mashup of services/queries

Discussion
• Very simple experimental setup
• No details about the semi-automatically generated wrappers
• How to decide which service to select for a specific domain?
• How to map input and output parameters between different services?
• If we have to pre-program the system for new domains, it is like developing a special-purpose application
• How effective is the system for answering multi-domain queries?