NUITS: A Novel User Interface for Efficient Keyword Search over Databases


The integration of DB and IR provides users with a wide range of high-quality services.

Critical challenges:
- Efficiency
- Result presentation
- ……

1. Internet users search for information with search engines
   - They expect to query databases in the same way
   - They do not know the database schema or SQL
2. Hidden Web problem
   - Most data on the Web is stored in databases. It is hidden from Web search engines because of the mismatch between the search interfaces of Web search engines and those of databases
3. Integrating different classes of information systems
   - Modern information systems manage many kinds of data:
     - structured relational data
     - semi-structured XML documents
     - unstructured text documents
     - ……
   - Keyword search: to provide a unified query language/interface

NUITS
- High efficiency
- User-friendly result display
- Support for advanced keyword queries

Characteristics of NUITS
- Input: In addition to simple keyword queries (a set of keywords), advanced queries with specified conditions are also supported
- Output: Search results can be organized into clusters, which lets users browse them quickly
- Search Engine: Adopts a new and efficient search algorithm

[Figure: Architecture]

Keyword Query Specification
- Simple keyword: just a keyword
  e.g. database
- Typed keyword: a keyword with a type, which can be either a relation name or an attribute name
  e.g. Paper:database    Author:*    Writer:*
- Conditional keyword: a keyword with associated conditions
  e.g. database year>2000    database year~2000
- Keyword query: a set of keywords combined with Boolean operators
  Q ::= p | (Q) | Q AND Q | Q OR Q | NOT Q
  p: a keyword, which can be a simple keyword, typed keyword, or conditional keyword
  Q: a keyword query
  e.g. Ullman AND (database OR algorithm)

Search Algorithm
- NUITS adopts a data-graph-based search algorithm, and each result is a tuple-connection-tree
- The algorithm uses a dynamic programming approach to find the optimal top-1 result with low time complexity
- It computes the top-k minimum-cost tuple-connection-trees one by one, incrementally, and does not need to compute or sort all result trees in order to find the top-k results

TreeCluster: Clustering Results
To help users find the results they need, we propose to organize the result trees into two levels of clusters:
- Structural-Level Clustering: Clusters trees using tree isomorphism. The two trees below are isomorphic, and both indicate that the two authors Hristidis and Papakonstantinou coauthored papers

  [Figures: An example of a result tree; Another example of a result tree; The structural pattern of the above two trees]

- Content-Level Clustering: Based on keyword frequencies and content similarity. If a cluster is still larger than the user-given threshold after structural-level clustering, content-level clustering further clusters its result trees
  e.g. For the keyword query Gray Database, we cluster the results based on the content of author names, because Gray occurs less frequently than Database in the DBLP dataset. There may then be clusters for different authors, such as Jim Gray or W.A. Gray

Note: All examples in this poster are based on the DBLP dataset
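The keyword query grammar Q ::= p | (Q) | Q AND Q | Q OR Q | NOT Q can be illustrated with a small recursive-descent parser. This is only a sketch and not the NUITS implementation: the class names, the `parse_query` function, and the set of condition operators are hypothetical. The grammar as written is ambiguous, so the sketch assumes the usual precedence (NOT binds tighter than AND, which binds tighter than OR). It also assumes that a typed keyword is written `type:word` and that each condition is a single whitespace-free token such as `year>2000` following its keyword.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Tokens are parentheses or runs of characters without whitespace/parentheses.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
# Assumed condition syntax: attribute, operator, value (e.g. year>2000, year~2000).
COND_RE = re.compile(r"^(\w+)(>=|<=|>|<|=|~)(\S+)$")


@dataclass(frozen=True)
class Keyword:
    text: str
    type: Optional[str] = None   # relation or attribute name, if typed
    conditions: tuple = ()       # (attribute, operator, value) triples


@dataclass(frozen=True)
class Not:
    operand: "Query"


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Keyword, Not, And, Or]


def parse_query(text: str) -> Query:
    """Parse a keyword query into a tree (assumed precedence: NOT > AND > OR)."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = Or(node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            take()
            node = And(node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            take()
            return Not(parse_not())
        return parse_primary()

    def parse_primary():
        tok = peek()
        if tok is None or tok in {")", "AND", "OR"}:
            raise ValueError(f"expected a keyword, got {tok!r}")
        take()
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            take()
            return node
        # Typed keyword: everything before the last ':' is the type.
        type_, _, word = tok.rpartition(":")
        conditions = []
        while peek() is not None and COND_RE.match(peek()):
            conditions.append(COND_RE.match(take()).groups())
        return Keyword(word, type_ or None, tuple(conditions))

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree
```

Under these assumptions, `parse_query("Ullman AND (database OR algorithm)")` yields an `And` node whose right operand is an `Or` node, and `parse_query("database year>2000")` yields a single `Keyword` carrying one condition triple.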

[Figure label: Structural pattern]

Implementation of TreeCluster
TreeCluster is cost-effective:
- It uses labels to represent the schema information of each result tree and reformulates the clustering problem as one of judging whether labeled trees are isomorphic
- It ranks user keywords according to their frequencies in the database, and further partitions large clusters based on keyword nodes
- It gives each cluster a readable description, and presents the description and each result graphically

[Figure labels: Structural-Level Clusters; Content-Level Clusters; Tuple-connection-tree; Architecture]
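The two clustering levels of TreeCluster can be sketched as follows. This is a minimal illustration under stated assumptions, not the poster's implementation: result trees are assumed to be rooted, each node carries a schema label (such as a relation name) plus the tuple's text, and isomorphism of labeled trees is tested by comparing a canonical form built from sorted child encodings. The names `Node`, `canonical_form`, `structural_clusters`, and `tree_cluster` are hypothetical, and the keyword frequencies are supplied by the caller rather than computed from a database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                     # schema label, e.g. "Paper" or "Author"
    value: str = ""                # text content of the tuple
    children: list = field(default_factory=list)


def canonical_form(node):
    """Encode a rooted labeled tree so that isomorphic trees get equal encodings."""
    return (node.label, tuple(sorted(canonical_form(c) for c in node.children)))


def structural_clusters(trees):
    """Structural level: group trees whose labeled shapes are isomorphic."""
    clusters = defaultdict(list)
    for tree in trees:
        clusters[canonical_form(tree)].append(tree)
    return dict(clusters)


def find_value(node, keyword):
    """Return the value of the first node (preorder) containing the keyword."""
    if keyword.lower() in node.value.lower():
        return node.value
    for child in node.children:
        found = find_value(child, keyword)
        if found is not None:
            return found
    return None


def tree_cluster(trees, keywords, frequency, threshold):
    """Two-level clustering.

    Structural clusters larger than `threshold` are split further by the
    content of the node matching the least frequent keyword.
    `frequency` maps each keyword to its (caller-supplied) frequency.
    """
    rarest = min(keywords, key=lambda k: frequency[k])
    result = {}
    for pattern, members in structural_clusters(trees).items():
        if len(members) <= threshold:
            result[(pattern, None)] = members
            continue
        for tree in members:
            key = (pattern, find_value(tree, rarest))
            result.setdefault(key, []).append(tree)
    return result
```

For a query such as Gray Database, where Gray is the rarer keyword, an oversized structural cluster would be split by the matching author value, giving separate clusters for Jim Gray and W.A. Gray. Comparing canonical forms avoids pairwise isomorphism tests, since each tree is encoded once and then grouped by dictionary lookup.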