Presentation is loading. Please wait.

Presentation is loading. Please wait.

Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina.

Similar presentations


Presentation on theme: "Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina."— Presentation transcript:

1 Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina

2 Introduction The database selection problem for relational data sources  Demand for keyword-based searches of structured databases  What about searches over multiple structured databases?

3 Current Approach Document collections are summarized with keyword frequency lists Inadequate because…  Relational tables are typically normalized  Results must consider number of join operations Example: query Q = {multimedia, database, VLDB}

4 The Usefulness of a Database in Answering a Keyword Query A database is useful if it has high quality results to the keyword query. What are “high quality” results?  The results contain all the query keywords.  There is a keyword relationship – the query keywords can be connected meaningfully in the database.

5 This paper proposes… A method that summaries the relationships between keywords in a database. A ranking method based on the keyword summaries in order to select the most useful databases for a given keyword query.

6 Why Summaries? We have an equation which measures the usefulness of a database DB to query Q But calculating this score isn’t feasible for every database in the system! So we estimate the usefulness of the database using a summary of the relationships between all pairs of keywords in the database.

7 Results of a Keyword Query A minimal tree of tuples containing all the query keywords * tuples in the tree are connected according to their relationships defined in the database schema The more distant the relationships, the weaker the relationships connecting the tuples. distance - the number of join operations in a joining sequence of tuples Examples: t 1 t 2 t 3, distance = 2 t 4, distance = 0 DB2 of Figure 1, Q = {multimedia, binder}  t 1 t 5 t 12 or t 4 t 9 t 12, distance = 2  t 15 t 10 t 1 t 5 t 12, distance = 4

8 The Matrices Each relational database DB has several types of matrices  one D matrix – represents the presence or absence of each keyword in each tuple in DB  multiple T matrices – represents the relationship between tuples in a relational database at each distance  construct the W d matrices – the frequencies of keyword pairs at distance d; Example: Figure 3

9 Calculate the Key Relationship Matrix (KRM) w d (k i, k j ) is the frequency of d-distance joining sequences to connect the pair of keywords k i and k j is the maximum number of join operations (user-specified) Example: Figure 4 Q = {database:VLDB} for DB2 in Figure 3, the only non-zero is when d = 2 1/(2+1) * 1 occurrence = 0.33

10 Implementation in SQL The generation of the matrices: D, T 1, T 2, … T and W 0, W 1, …, W, for each DB, can be performed conveniently inside the local RDBMS using SQL. Tables are created for each matrix. Tables can be maintained dynamically when tuples are added to the database or old tuples are deleted.

11 Data Sources Selection Using KRM Estimate the score of the data source DB to Q through the scores of the individual pairs. With the KR-summary, we can effectively rank a set of databases. A higher score of the data source DB to Q indicates a better ranking.

12 Definitions & Metrics for Comparing Experimental Results real ranking – the “real” score of each database based on Equation 2-1 recall – compares the accumulated score of the top l databases selected based on the summaries of the source databases against the total available score when we select top l databases according to real ranking (summaries vs. real ranking) precision – measures the fraction of the top l selected databases which have high quality results

13 Experimental Results Observations: The selection performance of KR-summaries generally gets better as gets larger. The precision and recall performance for different values of tends to cluster into groups. There are big gaps in both precisions and recalls between KR-summaries when 0 ≤ ≤ 1 and when is greater.

14 Conclusion “The estimation method is effective in assigning high ranks to useful databases, although less relevant or irrelevant databases might be selected.”


Download ppt "Effective Keyword-Based Selection of Relational Databases By Bei Yu, Guoliang Li, Karen Sollins & Anthony K. H. Tung Presented by Deborah Kallina."

Similar presentations


Ads by Google