Presentation is loading. Please wait.

Presentation is loading. Please wait.

NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality.

Similar presentations


Presentation on theme: "NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality."— Presentation transcript:

1 NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality services Critical Challenges: Efficiency Result presentation …… 1. Internet users search information with search engines Expect to query databases in the same way Not know the database schema and SQL 2. Hidden Web problem Most of data on the Web are stored in databases. They are hidden to Web search engines because of the mismatch of search interfaces between Web search engines and databases 3. Integrating different classes of information systems Modern information systems manage many kinds of data structured relational data semi-structured XML documents unstructured text documents …… Keyword search: to provide a unified query language/interface NUITS High efficiency User-friendly result display Supporting advanced keyword queries Characteristics of NUITS Input: In addition to simple keyword queries (a set of keywords), advanced queries with specified conditions are also supported Output: Search results can be organized into clusters, facilitating users' quick browsing Search Engine: Adopt a new and efficient search algorithm ArchitectureArchitecture

2 Keyword Query Specification Simple keyword: just a keyword eg: database Typed keyword: a type can be either a relation-name or attribute-name eg: Paper:database Author:* Writer:* Conditional keyword: conditions associated with a keyword eg: database year>2000 database year~2000 Keyword query: a set of keywords associated with Boolean operators Q ::= p | (Q) | Q AND Q | Q OR Q | NOT Q p: a keyword which can be a simple keyword, typed keyword, or conditional keyword Q: a keyword query eg: Ullman AND (database OR algorithm) TreeCluster: Clustering Results In order to assist users to find the needed results, we propose to organize the result trees into two levels of clusters  Structural-Level Clustering: Clustering trees using tree isomorphism. The two trees below are isomorphic, and both mean the two authors Hristidis and Papakonstantinou coauthor papers Search Algorithm NUITS adopts a data-graph-based search algorithm, and each result is a tuple-connection-tree The algorithm proposes a dynamic programming approach to find the optimal top-1 with a low time complexity It computes top-k minimum cost tuple-connection-trees one- by-one incrementally, and does not need to compute or sort all result trees in order to find the top-k results An example of result treeAnother example of result tree The structural pattern of the above two trees  Content-Level Clustering: Based on keyword frequencies and content similarity. If the size of the cluster is larger than the user-given threshold after structural-level clustering, the content- level clustering further clusters result trees eg: Considering a keyword query Gray Database, we cluster the results based on content of author names, because Gray occurs less frequently than Database in DBLP dataset. Then there may be clusters for different authors, such as Jim Gray or W.A. Gray Note: All examples in this poster are based on DBLP dataset

3 Structural pattern Implementation of TreeCluster TreeCLuster is cost effective: Use labels to represent schema information of each result tree and reformulate the clustering problem as a problem of judging whether labeled trees are isomorphic Rank user keywords according to their frequencies in databases, and further partition the large clusters based on keyword nodes Give each cluster a readable description, and present the description and each result graphically Structural-Level Clusters Content-Level Clusters Tuple-connection-tree ArchitectureArchitecture


Download ppt "NUITS: A Novel User Interface for Efficient Keyword Search over Databases The integration of DB and IR provides users with a wide range of high quality."

Similar presentations


Ads by Google