CAREER: Towards Unifying Database Systems and Information Retrieval Systems NSF IDM Workshop 10 Oct 2004 Jayavel Shanmugasundaram Cornell University.

Presentation transcript:

CAREER: Towards Unifying Database Systems and Information Retrieval Systems
NSF IDM Workshop, 10 Oct 2004
Jayavel Shanmugasundaram, Cornell University

10000 foot view of Data Management
[Figure: two-axis grid. Data axis: Structured, Unstructured. Queries axis: Complex and Structured, Ranked Keyword Search. Labeled regions: Database Systems, Information Retrieval Systems.]

10000 foot view of Data Management
[Figure: two-axis grid. Data axis: Structured, Unstructured. Queries axis: Complex and Structured, Ranked Keyword Search. Labeled regions: Database Systems, Information Retrieval Systems, Text search in databases, Ranking based on structured values.]

Internet Archive Database

Movies
Mid | Name            | Description
10  | Amateur Film    | … they stand on the golden gate bridge and …
20  | American Thrift | … golden gate bridge with statue of liberty …
…   | …               | …

    SELECT * FROM Movies M
    ORDER BY score(M.description, 'golden gate')
    FETCH TOP 10 RESULTS ONLY

Traditional IR scoring methods (e.g., TF*IDF) often not very meaningful in this context
  – Developed for stand-alone document collections

Internet Archive Database

Movies
Mid | Name            | Description
10  | Amateur Film    | … they stand on the golden gate bridge and …
20  | American Thrift | … golden gate bridge with statue of liberty …
…   | …               | …

Reviews
[Table with columns Rid, Mid, Name, Rating. Surviving cell values: 10 bleblanc 2; 20 cooker 4; 10 harry 1; alice5904; …]

Statistics
[Table with columns Sid, Mid, Visits, Downloads; rows shown as …]

Structured Value Ranking (SVR)

Structured Value Ranking
Use structured data values associated with text columns to score results
Main technical challenge
  – Need to produce top-k results efficiently
      Order inverted lists by score
  – But scores change frequently [Aizen et al., 2004]
      Flash crowds on Internet
      Recent award announcements
  – How can we process top-k results efficiently while allowing frequent score updates?
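The idea of scoring text matches by their associated structured values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Movie` fields, the weighted-sum formula in `svr_score`, the weights, and all numeric values are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
import heapq


@dataclass
class Movie:
    mid: int
    description: str
    avg_rating: float  # hypothetical aggregate over the Reviews table
    visits: int        # hypothetical value from the Statistics table
    downloads: int     # hypothetical value from the Statistics table


def svr_score(m, w_rating=100.0, w_visits=1.0, w_downloads=2.0):
    """Score a movie from structured values only (assumed weighted sum)."""
    return w_rating * m.avg_rating + w_visits * m.visits + w_downloads * m.downloads


def top_k(movies, keywords, k):
    """Keep movies whose description contains every keyword, ranked by svr_score."""
    matches = [
        m for m in movies
        if all(kw in m.description.lower() for kw in keywords)
    ]
    return heapq.nlargest(k, matches, key=svr_score)


# Made-up rows loosely modeled on the Movies example.
MOVIES = [
    Movie(10, "they stand on the golden gate bridge", 1.5, 100, 10),
    Movie(20, "golden gate bridge with statue of liberty", 4.0, 50, 5),
    Movie(30, "a film about thrift", 5.0, 1000, 100),
]

best = top_k(MOVIES, ["golden", "gate"], k=1)
```

This version rescans every matching row on each query, which is exactly the cost that an index ordered by score is meant to avoid.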

Solution Overview
Order inverted lists by score
  – Queries efficient
  – Score updates slow
Order inverted lists by document id
  – Queries slow
  – Score updates efficient
Hybrid solution: order inverted lists by chunk
  – Order chunks by score
  – Order documents within chunk by id
Guo et al. [ICDE 2005]
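A minimal Python sketch of the hybrid layout may help make the trade-off concrete. It is a simplification and not the algorithm from Guo et al.: the class name `ChunkedPostingList`, the fixed score boundaries, and the in-memory lists are all assumptions made for illustration. Chunks are ordered by score range, documents inside a chunk are ordered by id, and a score update only touches the list when the document crosses a chunk boundary.

```python
import bisect
import heapq


class ChunkedPostingList:
    """Posting list for one term, split into score-range chunks (hypothetical)."""

    def __init__(self, boundaries):
        # Ascending lower bounds of the chunks, e.g. [0, 100, 1000].
        self.boundaries = sorted(boundaries)
        # One list of doc ids per chunk, each kept sorted by doc id.
        self.chunks = [[] for _ in self.boundaries]
        # Current score of every document, stored outside the list.
        self.scores = {}

    def _chunk_of(self, score):
        # Index of the last boundary <= score (clamped to the lowest chunk).
        return max(bisect.bisect_right(self.boundaries, score) - 1, 0)

    def upsert(self, doc_id, score):
        """Insert or rescore a document. Returns True if the list itself changed."""
        old = self.scores.get(doc_id)
        new_chunk = self._chunk_of(score)
        if old is not None:
            old_chunk = self._chunk_of(old)
            if old_chunk == new_chunk:
                self.scores[doc_id] = score  # cheap update: list untouched
                return False
            ids = self.chunks[old_chunk]
            ids.pop(bisect.bisect_left(ids, doc_id))
        bisect.insort(self.chunks[new_chunk], doc_id)
        self.scores[doc_id] = score
        return True

    def top_k(self, k):
        """Scan chunks from highest score range down, stopping once k docs are seen."""
        candidates = []
        for ids in reversed(self.chunks):
            candidates.extend(ids)
            if len(candidates) >= k:
                break  # every remaining chunk holds only lower scores
        return heapq.nlargest(k, candidates, key=self.scores.__getitem__)


plist = ChunkedPostingList([0, 100, 1000])
for doc_id, score in [(1, 50), (2, 150), (3, 2000), (4, 120)]:
    plist.upsert(doc_id, score)
```

Wider chunks make more updates cheap but force queries to read more candidates per chunk, so the chunk boundaries act as the tuning knob between the two extremes listed above.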


Applications
Content management
  – Mix of structured and unstructured data
      Database with date and time of accident (structured data) and accident description (unstructured data)
  – Semi-structured data
      Scientific documents, Shakespeare’s plays, …
Support flexible keyword search interface over mix of structured and unstructured data
  – XRANK [Guo et al., SIGMOD 2003]

XML Keyword Search
[Figure: example XML document. Workshop “XML and Information Retrieval: A SIGIR 2000 Workshop” (David Carmel, Yoelle Maarek, Aya Soffer), containing the paper “XQL and Proximal Nodes” (Ricardo Baeza-Yates, Gonzalo Navarro) with the text “We consider the recently proposed language …” and “Searching on structured text is becoming more important with XML …”]
Most specific results (exploits structure!)
Ranking at granularity of elements (generalizes PageRank)
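The "most specific results" idea can be illustrated with a short Python sketch over a document shaped like the example above. This is a simplified reading and not the XRANK algorithm: it returns only the deepest elements whose subtree contains every keyword, uses crude substring matching, and does no ranking. The element names (`workshop`, `paper`, `abstract`, `section`, and so on) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the example document; only the text comes from the slide.
DOC = """
<workshop>
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <author>Ricardo Baeza-Yates</author>
    <author>Gonzalo Navarro</author>
    <abstract>We consider the recently proposed language ...</abstract>
    <section>Searching on structured text is becoming more important with XML ...</section>
  </paper>
</workshop>
"""


def contains_all(elem, keywords):
    """True if the element's subtree text contains every keyword (substring match)."""
    text = " ".join(elem.itertext()).lower()
    return all(kw in text for kw in keywords)


def most_specific(elem, keywords):
    """Return the deepest elements whose subtree contains all keywords."""
    if not contains_all(elem, keywords):
        return []
    hits = []
    for child in elem:
        hits.extend(most_specific(child, keywords))
    # If no child covers all keywords on its own, this element is the answer.
    return hits or [elem]


root = ET.fromstring(DOC)
paper_hits = [e.tag for e in most_specific(root, ["xql", "language"])]
section_hits = [e.tag for e in most_specific(root, ["structured", "xml"])]
```

The first query matches keywords spread across the paper's title and abstract, so the whole paper element is returned; the second is satisfied inside a single section, so the result narrows to that section rather than the enclosing paper or workshop.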


Applications
The Internet is enabling end-users to directly ask queries and explore results
  – E.g., used car marketplace
  – Find all “bright red ford mustangs” that cost less than 20% of the average price of cars in their class
Characteristics of queries
  – Keyword search (for ease of use)
  – Complex query operations (information synthesis)
  – Want to see ranked results!
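The used-car query combines all three characteristics, and a small Python sketch shows how they interact. Everything here is an assumption for illustration: the record layout, the made-up listings, the scoring rule (fraction of keywords matched), and the literal reading of the price condition as price < 0.2 × the class average.

```python
import re
from collections import defaultdict

# Made-up listings; "cls" is the car's class.
CARS = [
    {"id": 1, "cls": "sports", "price": 4000, "desc": "Bright red Ford Mustang, needs work"},
    {"id": 2, "cls": "sports", "price": 30000, "desc": "Bright red Ford Mustang, like new"},
    {"id": 3, "cls": "sports", "price": 50000, "desc": "Silver Porsche 911"},
    {"id": 4, "cls": "sports", "price": 4400, "desc": "Red Ford Mustang convertible"},
    {"id": 5, "cls": "sedan", "price": 3000, "desc": "Blue Honda Civic"},
]


def search_cars(cars, keywords, fraction=0.2):
    """Keyword match + aggregate price filter + ranked output (list of ids)."""
    # Complex query operation: average price per class.
    totals = defaultdict(lambda: [0.0, 0])
    for car in cars:
        totals[car["cls"]][0] += car["price"]
        totals[car["cls"]][1] += 1
    avg = {cls: total / count for cls, (total, count) in totals.items()}

    results = []
    for car in cars:
        # Keyword search: fraction of query words present in the description.
        words = set(re.findall(r"\w+", car["desc"].lower()))
        score = sum(kw in words for kw in keywords) / len(keywords)
        if score > 0 and car["price"] < fraction * avg[car["cls"]]:
            results.append((score, car))

    # Ranked results: best keyword match first, cheaper car breaks ties.
    results.sort(key=lambda r: (-r[0], r[1]["price"]))
    return [car["id"] for _, car in results]


ranked_ids = search_cars(CARS, ["bright", "red", "ford", "mustang"])
```

Neither a plain SQL query nor a plain keyword search expresses this whole request: the aggregate needs database operators, while the partial keyword match and the ordering need IR-style scoring.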

Towards Unifying DB and IR
No standard query language for both DB and IR
  – SQL, XQuery mostly “database query languages”
Have developed TeXQuery: a full-text search extension to XQuery
  – Amer-Yahia et al. (WWW 2004)
  – Full composability of database and IR primitives, ranking
  – Adopted as the precursor to the XQuery full-text extensions currently being developed by the W3C
Come see demo tomorrow

Related Work
Integrating DB and IR systems
  – For the most part, treat individual systems as “black boxes”
  – Our goal is to unify DB and IR systems
Search over Semi-Structured Data
  – Specialized techniques for searching semi-structured data
  – Our goal is to generalize DB and IR techniques
Keyword search and ranking in databases

Summary
Many emerging applications require a unification of DB and IR techniques
  – E-commerce applications
  – Semi-structured documents
  – Content management
Argues for a new generation of systems and techniques that seamlessly provide this capability
  – SVR, XRANK, TeXQuery, …
Educational benefit: present unified view of data management
  – Currently at graduate level
  – Eventually introduce concepts at undergraduate level