DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal, Surajit Chaudhuri, Gautam Das Cathy Wang 04-04-2006.

Slides:



Advertisements
Similar presentations
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Introduction to Rails.
Advertisements

Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
Sanjay Agrawal Microsoft Research Surajit Chaudhuri Microsoft Research Gautam Das Microsoft Research DBXplorer: A System for Keyword Based Search over.
Lukas Blunschi Claudio Jossen Donald Kossmann Magdalini Mori Kurt Stockinger.
Discovering Queries based on Example Tuples
1 gStore: Answering SPARQL Queries Via Subgraph Matching Presented by Guan Wang Kent State University October 24, 2011.
Space-for-Time Tradeoffs
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
DISCOVER: Keyword Search in Relational Databases Vagelis Hristidis University of California, San Diego Yannis Papakonstantinou University of California,
Foundations of Relational Implementation n Defining Relational Data n Relational Data Manipulation n Relational Algebra.
Tries Standard Tries Compressed Tries Suffix Tries.
Outline SQL Server Optimizer  Enumeration architecture  Search space: flexibility/extensibility  Cost and statistics Automatic Physical Tuning  Database.
1 Chapter 10 Query Processing: The Basics. 2 External Sorting Sorting is used in implementing many relational operations Problem: –Relations are typically.
XML Views El Hazoui Ilias Supervised by: Dr. Haddouti Advanced XML data management.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Keyword Proximity Search on XML Graphs Vagelis Hristidis Yannis Papakonstatinou Andrey Presenter: Feng Shao.
6/29/20151 Efficient Algorithms for Motif Search Sudha Balla Sanguthevar Rajasekaran University of Connecticut.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Computing for Bioinformatics Introduction to databases What is a database? Database system components Data types DBMS architectures DBMS systems available.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.
XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay,
Keyword Search in Relational Databases Jaehui Park Intelligent Database Systems Lab. Seoul National University
CS621 : Seminar-2008 DEEP WEB Shubhangi Agrawal ( )‏ Jayalekshmy S. Nair ( )‏
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande.
Sanjay Agarwal Surajit Chaudhuri Gautam Das Presented By : SRUTHI GUNGIDI.
Probabilistic Ranking of Database Query Results Surajit Chaudhuri, Microsoft Research Gautam Das, Microsoft Research Vagelis Hristidis, Florida International.
Michael Cafarella Alon HalevyNodira Khoussainova University of Washington Google, incUniversity of Washington Data Integration for Relational Web.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
MET280: Computing for Bioinformatics Introduction to databases What is a database? Not a spreadsheet. Data types and uses DBMS (DataBase Management System)
Querying Structured Text in an XML Database By Xuemei Luo.
Harikrishnan Karunakaran Sulabha Balan CSE  Introduction  Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model 
 2004 Prentice Hall, Inc. All rights reserved. 1 Segment – 6 Web Server & database.
Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Copyright © 2006 Pilothouse Consulting Inc. All rights reserved. Search Overview Search Features: WSS and Office Search Architecture Content Sources and.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Session 1 Module 1: Introduction to Data Integrity
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
APEX: An Adaptive Path Index for XML data Chin-Wan Chung, Jun-Ki Min, Kyuseok Shim SIGMOD 2002 Presentation: M.S.3 HyunSuk Jung Data Warehousing Lab. In.
Database: SQL, MySQL, LINQ and Java DB © by Pearson Education, Inc. All Rights Reserved.
Basics of Databases and Information Retrieval1 Databases and Information Retrieval Lecture 1 Basics of Databases and Information Retrieval Instructor Mr.
Manipulating Data Lesson 3. Objectives Queries The SELECT query to retrieve or extract data from one table, how to retrieve or extract data by using.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Automatic Categorization of Query Results Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang Sushruth Puttaswamy.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Author: Akiyoshi Matonoy, Toshiyuki Amagasay, Masatoshi Yoshikawaz, Shunsuke Uemuray.
An Algorithm for the Consecutive Ones Property Claudio Eccher.
BY: Mark Gruszecki.  What is a Recursive Query?  Definition(s) and Algorithm(s)  Optimization Techniques  Practical Issues  Impact of each Optimization.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
 2012 Pearson Education, Inc. All rights reserved.
Data Integration for Relational Web
Keyword Searching and Browsing in Databases using BANKS
Efficient Subgraph Similarity All-Matching
Database Systems Summary and Overview
Answering Cross-Source Keyword Queries Over Biological Data Sources
A Framework for Testing Query Transformation Rules
A Semantic Peer-to-Peer Overlay for Web Services Discovery
Tantan Liu, Fan Wang, Gagan Agrawal The Ohio State University
Probabilistic Ranking of Database Query Results
Kittiya Poonsilp, Rujijan Vichivanives, Attakorn Poonsilp
WSExpress: A QoS-Aware Search Engine for Web Services
Presentation transcript:

DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal, Surajit Chaudhuri, Gautam Das Cathy Wang

2 Outline Introduction Overview of DBXplorer Publish Search Generalized Matches Conclusion

3 Introduction Internet search engines have popularized keyword- based search. Traditional database management systems do not support keyword-based search. e.g. search the Microsoft intranet on ‘Jim Gray’ to obtain matched rows, i.e., rows in the database where ‘Jim Gray’ occur. In this paper, DBXplorer, an efficient and scalable keyword search utility for relational databases, is described. The goal is to enable such searches without necessarily requiring the users to know the schema of the respective databases.

4 Overview of DBXplorer Given a set of query keywords, DBXplorer returns all rows (either from single tables, or by joining tables connected by foreign-key joins) such that the each row contains all keywords. Two Steps: Publish Search

5 Overview of DBXplorer - Publish Publish - a preprocessing step that enables databases for keyword search by building the symbol table and associated structures Step 1: A database is identified, along with the set of tables and columns within the database to be published. Step 2: Auxiliary tables are created for supporting keyword searches. The most important structure is a symbol table S that is used at search time to efficiently determine the locations of query keywords in the database (i.e., the tables, columns, rows they occur in).

6 Overview of DBXplorer - Search Search - gets matching rows from the published databases. Step 1: Searching the symbol table Step 2: Enumerating Join Trees. Step 3: Identify matching rows. The final rows are ranked and presented to the user.

7 Overview of DBXplorer Architecture of DBXplorer The publish component provides interfaces to (a)select a database, (b)select tables/columns within the database to publish, and (c)modify/remove/maintain the publication. For a given set of keywords, the search component provides interfaces to (1)retrieve matching databases from a set of published databases, and (2)selectively identify tables, columns/rows that need to be searched within each database identified in step (1).

8 Publish – Symbol Table The symbol table is the key data structure used to look up the respective locations of query keywords in the database. An important design consideration is deciding the location granularity (a) column level granularity (Pub-Col) (i.e., list of table.column ) (b) cell level granularity (Pub-Cell) (i.e., list of table.column.rowid). The Pub-Col symbol table alternative is almost always better than the Pub-Cell table, unless certain columns do not have indexes.

9 Publish – Symbol Table Store symbol tables (Pub-Col) in databases – compression algorithms FK-Comp CP-Comp (a)Partition H into a minimum number of bipartite cliques (a bipartite clique is any subgraph of H with a maximal number of edges). (b) Compress each clique. Uncompressed hash table Compressed hash table ColumnsMap table (map the symbol table S to a bipartite graph H having two node sets, HashVal and ColId, where every row (v, c) in table S corresponds to an edge in H.) v2v2 v3v3 v4v4 c1c1 c2c2 x

10 Search Step 1 – Search the symbol table (using generated SQL) to identify the database tables, columns/cells that contain at least one of the keywords in the query. Step2 – enumerate Join Trees Step3 – identify matching rows

11 Search – Enumerate Join Trees Identify and enumerate all potential subsets of tables in the database that, if joined, might contain rows having all keywords. The resulting relation will contain all potential rows having all keywords specified in the query. If we view the schema graph G as an undirected graph, this step enumerates join trees, i.e., sub- trees of G such that: (a)the leaves belong to MatchedTables and (b)together, the leaves contain all keywords of the query Join Trees

12 Search – Identify matching rows The input to this final search step is the enumerated join trees. Each join tree is then mapped to a single SQL statement that joins the tables as specified in the tree, and selects those rows that contain all keywords. The retrieved rows are ranked before being output.

13 Generalized Matches – Token Matches Token matches – the keyword in the query matches only a token or sub-string of an attribute value (e.g., retrieve rows of address by specifying only a street name). Pub-Prefix method – based on the following crucial observation: B+ tree indexes can be used to retrieve rows whose cell matches a given prefix string. During publishing of a database, for every keyword K, the entry (hash(K), T.C, P) is kept in the symbol table if there exists a string in column T.C which (a) contains a token K, and (b) has prefix P.

14 Generalized Matches – Token Matches Database table T Pub-Prefix table Let the hash values of the searchable tokens i.e., ‘string’, ‘ball’ and ‘round’ be 1, 2 and 3 respectively RowIdC 1this is a string 2this string 3this is a ball 4no string 5any ball is round

15 Conclusion This paper discusses DBXplorer, a system that enables keyword-based search in relational databases. DBXplorer uses symbol table alternatives to store the location of keywords in database. DBXplorer support exact matches and generalized matches. DBXplorer has been implemented using a commercial relational database and web server and allows users to interact via a browser front-end.