DBXplorer: A System for Keyword-Based Search over Relational Databases. Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. Presented by Bhushan Pachpande.



Contents
1. Introduction
2. Overview of DBXplorer
3. Symbol Table Design - Publish
4. Keyword Search
5. Support for Generalized Matches
6. Conclusion

Introduction
- Internet search engines have popularized keyword-based search.
- Searching a traditional database management system is done through customized applications that are closely tied to the database schema.
- Traditional database management systems do not support keyword-based search.
  e.g. search the Microsoft intranet for ‘Jim Gray’ to obtain matched rows, i.e., rows in the database where ‘Jim Gray’ occurs.
- This paper describes DBXplorer, an efficient and scalable keyword search utility for relational databases.
  The goal is to enable such searches without requiring users to know the schema of the respective databases.

Example: Searching for a book
- Keywords: “Programming” by “Ritchie”
- Both keywords are unlikely to occur in a single row of one table.
- To produce results, rows must be generated by joining tables on the fly (all possible combinations).
[Figure: schema with tables Authors, AuthorsBooks, Books, BookStores, Store]

Overview of DBXplorer
- Objective: given a set of query keywords, DBXplorer returns all rows (either from single tables, or from joins of tables connected by foreign keys) such that each row contains all keywords.
- Applying IR techniques from the document world to databases is difficult, because of
  - Database normalization, by which logical units of information may be fragmented and scattered across several tables
  - A matching row may have to be obtained by joining several tables on the fly
- IR techniques use inverted lists; the corresponding structure in DBXplorer is the symbol table.
- Symbol table: stores information about keywords at different granularities (column/row), e.g. at row granularity it stores, for each keyword, the list of all rows that contain it.

Overview of DBXplorer - Publish
Enabling keyword search in DBXplorer requires 2 steps
- Publish: enables keyword search on a database by building the symbol table and associated structures
- Search: retrieves matching rows from a published database
Steps in Publish
- Step 1: A database is identified, along with the set of tables and columns within the database to be published.
- Step 2: Auxiliary tables (such as the symbol table) are created to support keyword searches.

Overview of DBXplorer - Search
Steps in Search
- Look up the symbol table to find the tables/columns that contain the keywords
- Identify all possible joins (subsets of tables that, if joined, might produce rows containing all required keywords)
- Generate an SQL statement for each join (returning the rows that contain all keywords), rank the rows, and return them

Architecture of DBXplorer
The publish component provides interfaces to
- select a database,
- select tables/columns within the database to publish, and
- modify/remove/maintain the publication.
For a given set of keywords, the search component provides interfaces to
- (1) retrieve matching databases from a set of published databases, and
- (2) selectively identify the tables and columns/rows that need to be searched within each database identified in step (1).
[Figure: Architecture of DBXplorer]

Symbol Table Design
- Only the exact-match problem is considered here.
- Symbol table (S): stores information about keywords at different granularities (column/row).
- Column-level granularity (Pub-Col)
  - For every keyword, S maintains the list of all database columns that contain it (i.e. table.column)
- Cell-level granularity (Pub-Cell)
  - For every keyword, S maintains the list of all database cells that contain it (i.e. table.column.rowid)
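The two granularities can be illustrated with a minimal Python sketch. The toy database, the lowercase normalization, and the function names `build_pub_col` and `build_pub_cell` are assumptions made for illustration; they are not taken from the paper or the slides.

```python
from collections import defaultdict

# Hypothetical toy database: table -> column -> list of cell values (index = row id).
DATABASE = {
    "Authors": {"name": ["Ritchie", "Kernighan"]},
    "Books": {"title": ["Programming", "Programming"]},
}

def build_pub_col(db):
    """Column-level symbol table: keyword -> set of 'table.column' entries."""
    symbol_table = defaultdict(set)
    for table, columns in db.items():
        for column, values in columns.items():
            for value in values:
                symbol_table[value.lower()].add(f"{table}.{column}")
    return dict(symbol_table)

def build_pub_cell(db):
    """Cell-level symbol table: keyword -> set of 'table.column.rowid' entries."""
    symbol_table = defaultdict(set)
    for table, columns in db.items():
        for column, values in columns.items():
            for rowid, value in enumerate(values):
                symbol_table[value.lower()].add(f"{table}.{column}.{rowid}")
    return dict(symbol_table)

pub_col = build_pub_col(DATABASE)
pub_cell = build_pub_cell(DATABASE)
```

In this toy data the keyword "programming" occurs twice in Books.title, so it has one Pub-Col entry but two Pub-Cell entries.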

Symbol Table - Design factors
- Space and time requirements
  - Size: Pub-Col is smaller than Pub-Cell, since a keyword repeated within a column does not add entries in Pub-Col
  - Time: Pub-Col takes less time to build
- Keyword search performance
  - Depends on efficient generation and execution of the SQL statements (built from symbol table entries)
  - Pub-Cell yields more SQL statements than Pub-Col, because a keyword that occurs in a column has multiple entries
- Ease of maintenance
  - Insert/Update: Pub-Col needs updating only when a new distinct value is inserted, while Pub-Cell needs updating on every insert/update
  - The same holds for deletes

Storing Symbol Table
- Pub-Col: the symbol table is stored in the database as (keyword hash, column Id)
  - FK Compression (Foreign Key)
    - If there is a foreign key relationship between c1 and c2, store only c1
  - CP Compression
    - Partition H into a minimum number of bipartite cliques (a bipartite clique is any subgraph of H with a maximal number of edges).
    - Compress each clique.
- Pub-Cell: the symbol table is stored in the database as (keyword hash, list of all cellids)
[Figure: uncompressed hash table, compressed hash table, and ColumnsMap table]
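FK compression can be sketched as follows, under a stated assumption: c1 is a foreign key column referencing c2, so every value in c1 also occurs in c2 and the entry for c2 can be re-derived from the entry for c1. The function names, the dictionary encoding of foreign keys, and the requirement that foreign key chains are acyclic are illustrative choices, not details confirmed by the slides.

```python
def fk_compress(entries, foreign_keys):
    """Drop (hash, c2) whenever (hash, c1) is present and c1 references c2.

    entries: set of (keyword_hash, column) pairs from a Pub-Col symbol table.
    foreign_keys: dict mapping a referencing column to the column it references.
    """
    compressed = set(entries)
    for keyword_hash, column in entries:
        referenced = foreign_keys.get(column)
        if referenced is not None:
            compressed.discard((keyword_hash, referenced))
    return compressed

def fk_expand(compressed, foreign_keys):
    """Recover implied entries by following foreign key chains (assumed acyclic)."""
    expanded = set()
    for keyword_hash, column in compressed:
        while column is not None:
            expanded.add((keyword_hash, column))
            column = foreign_keys.get(column)
    return expanded

# Hypothetical example: AuthorsBooks.authorId references Authors.authorId.
FOREIGN_KEYS = {"AuthorsBooks.authorId": "Authors.authorId"}
ENTRIES = {
    (1, "AuthorsBooks.authorId"),
    (1, "Authors.authorId"),
    (2, "Authors.authorId"),
}
compressed_entries = fk_compress(ENTRIES, FOREIGN_KEYS)
restored_entries = fk_expand(compressed_entries, FOREIGN_KEYS)
```

Here the entry (1, Authors.authorId) is dropped because it is implied by (1, AuthorsBooks.authorId), while (2, Authors.authorId) must stay because nothing implies it.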

Search - Enumerating Join Trees
- Step 1: Look up the symbol table to find the tables/columns that contain the keywords
- Step 2: Enumerate join trees
  - Identify and enumerate all potential subsets of tables in the database that, if joined, might contain rows having all keywords.
  - The resulting relation will contain all potential rows having all keywords specified in the query.
Join Trees
If we view the schema graph G as an undirected graph, this step enumerates join trees, i.e., sub-trees of G such that
(a) the leaves belong to MatchedTables, and
(b) together, the leaves contain all keywords of the query.
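The two leaf conditions for join trees can be checked with a brute-force Python sketch. It assumes an acyclic schema graph (so every connected set of tables is a sub-tree) and a chain-shaped layout of the Authors/Books example tables; both are assumptions, as are the function names. It tries every subset of tables, so it only suits tiny schemas and is not the enumeration algorithm of the paper.

```python
from itertools import combinations

# Hypothetical schema graph (undirected foreign key edges); the chain layout is assumed.
SCHEMA_EDGES = [
    ("Authors", "AuthorsBooks"),
    ("AuthorsBooks", "Books"),
    ("Books", "BookStores"),
    ("BookStores", "Store"),
]

def neighbours_within(node, nodes, edges):
    """Neighbours of `node` in the subgraph induced by `nodes`."""
    result = set()
    for a, b in edges:
        if a == node and b in nodes:
            result.add(b)
        elif b == node and a in nodes:
            result.add(a)
    return result

def is_connected(nodes, edges):
    """Depth-first search restricted to `nodes`."""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for n in neighbours_within(stack.pop(), nodes, edges):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen == nodes

def enumerate_join_trees(edges, matched):
    """matched: keyword -> set of tables that contain it (from the symbol table)."""
    tables = sorted({t for edge in edges for t in edge})
    matched_tables = set().union(*matched.values())
    trees = []
    for size in range(1, len(tables) + 1):
        for subset in combinations(tables, size):
            nodes = set(subset)
            if not is_connected(nodes, edges):
                continue
            leaves = {n for n in nodes
                      if len(neighbours_within(n, nodes, edges)) <= 1}
            # (a) every leaf is a matched table; (b) the leaves cover every keyword.
            if leaves <= matched_tables and all(leaves & t for t in matched.values()):
                trees.append(nodes)
    return trees

MATCHED = {"ritchie": {"Authors"}, "programming": {"Books"}}
join_trees = enumerate_join_trees(SCHEMA_EDGES, MATCHED)
```

For the two keywords above, the only join tree is Authors, AuthorsBooks, Books: any larger tree ends in a leaf that matches no keyword.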

Search - Identify matching rows
- The input to this final search step is the set of enumerated join trees.
- Each join tree is mapped to a single SQL statement that joins the tables as specified in the tree and selects those rows that contain all keywords.
- The retrieved rows are ranked before being output.
- Rows are ranked by the number of joins involved, with fewer joins ranked higher and ties broken arbitrarily (analogous to ranking documents higher when keywords occur close to one another).
[Figure: Join Trees]
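Mapping a join tree to one SQL statement can be sketched with Python's built-in sqlite3 module. The table definitions, column names, sample rows, and the helper `join_tree_to_sql` are hypothetical; the sketch covers exact matches only and passes keywords as bound parameters.

```python
import sqlite3

def join_tree_to_sql(tables, join_conditions, keyword_columns):
    """Build one SELECT that joins the tree's tables and filters on every keyword.

    tables: tables in the join tree.
    join_conditions: equality predicates for the tree's foreign key edges.
    keyword_columns: one 'table.column' per keyword; values are bound as parameters.
    """
    predicates = list(join_conditions) + [f"{col} = ?" for col in keyword_columns]
    return f"SELECT * FROM {', '.join(tables)} WHERE " + " AND ".join(predicates)

# Hypothetical data for the Authors / AuthorsBooks / Books join tree.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors(authorId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Books(bookId INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE AuthorsBooks(authorId INTEGER, bookId INTEGER);
INSERT INTO Authors VALUES (1, 'Ritchie'), (2, 'Knuth');
INSERT INTO Books VALUES (10, 'Programming'), (11, 'Algorithms');
INSERT INTO AuthorsBooks VALUES (1, 10), (2, 11);
""")

sql = join_tree_to_sql(
    tables=["Authors", "AuthorsBooks", "Books"],
    join_conditions=[
        "Authors.authorId = AuthorsBooks.authorId",
        "AuthorsBooks.bookId = Books.bookId",
    ],
    keyword_columns=["Authors.name", "Books.title"],
)
rows = conn.execute(sql, ["Ritchie", "Programming"]).fetchall()
```

Ranking by number of joins could then be done by sorting the result sets of the different join trees by the number of join conditions each statement used.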

Generalized Matches - Token Matches
- Token matches: the keyword in the query matches only a token or sub-string of an attribute value (e.g., retrieve rows of addresses by specifying only a street name).
- Pub-Prefix method
  - B+ tree indexes can be used to retrieve rows whose cell matches a given prefix string
  - The generated SQL clause is of the form WHERE T.C LIKE 'P%K%'
  - During publishing of a database, for every keyword K, the entry (hash(K), T.C, P) is kept in the symbol table if there exists a string in column T.C that contains the token K and has prefix P

Generalized Matches - Token Matches (example)
Database table T:

RowId   C
1       this is a string
2       this string
3       this is a ball
4       no string
5       any ball is round

Pub-Prefix table: let the hash values of the searchable tokens, i.e., 'string', 'ball' and 'round', be 1, 2 and 3 respectively.
Consider searching for the keyword "string". The Pub-Prefix table returns the prefixes "th" and "no", and the subsequent SQL will contain
(T.C LIKE 'th%string%') OR (T.C LIKE 'no%string%')
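The Pub-Prefix example can be reproduced with a short Python sketch. The prefix length of 2, the whitespace tokenization, the use of the token itself in place of hash(K), and the names `build_pub_prefix` and `like_clause` are assumptions for illustration. The generated text is shown as a string only; production code would need to escape LIKE wildcards and quotes.

```python
from collections import defaultdict

# Table T from the example: RowId -> value of column C.
ROWS = {
    1: "this is a string",
    2: "this string",
    3: "this is a ball",
    4: "no string",
    5: "any ball is round",
}
SEARCHABLE_TOKENS = {"string", "ball", "round"}
PREFIX_LEN = 2  # assumed prefix length

def build_pub_prefix(rows, column, searchable, prefix_len):
    """token -> ordered, duplicate-free collection of (column, prefix) entries."""
    table = defaultdict(dict)  # inner dict is used as an ordered set
    for value in rows.values():
        for token in value.split():
            if token in searchable:
                table[token][(column, value[:prefix_len])] = None
    return table

def like_clause(pub_prefix, keyword):
    """OR together one LIKE 'P%K%' predicate per stored prefix of the keyword."""
    terms = [f"({col} LIKE '{prefix}%{keyword}%')"
             for col, prefix in pub_prefix[keyword]]
    return " OR ".join(terms)

pub_prefix = build_pub_prefix(ROWS, "T.C", SEARCHABLE_TOKENS, PREFIX_LEN)
clause = like_clause(pub_prefix, "string")
```

Rows 1 and 2 share the prefix "th", so the keyword "string" yields only two predicates even though it occurs in three rows.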

Conclusion
- This paper discusses DBXplorer, a system that enables keyword-based search in relational databases.
- DBXplorer uses alternative symbol table designs to store the locations of keywords in the database.
- DBXplorer supports exact matches and, to some extent, generalized matches.