Data Frame Augmentation of Free Form Queries for Constraint Based Document Filtering Andrew Zitzelberger.



Problem

Constraint Based Queries

Queries
Test Queries
1) Find me a Wii game.
2) Find me a Honda for under 15 thousand dollars.
3) Roller Coaster more than 150 feet high
4) mountains at least 15K feet
5) games under $25
6) mountains less than 4 km
7) ps games < $40
8) coasters longer than 1000 feet
9) car for under 5 grand newer than 1990 with less than 115K miles
10) more than 15K miles under 5 grand newer than 2004

Keywords + Semantics
Semantic queries are computationally expensive
Keyword queries are fast and simple
o People are used to keyword queries
Synergistic solution:
o extract numerical constraints from the query
o use keywords to quickly narrow the search space
o use constraints as a filter

Data Frames
Price
  internal representation: Double
  external representation: \$[1-9]\d{0,2}(,\d{3})*|
  right units: (K)?\s*(cents|dollars|[Gg]rand|...)
  canonicalization method: toUSDollars
  comparison methods:
    LessThan(p1: Price, p2: Price) returns (Boolean)
    external representation: (less than|<|under|...)\s*{p2}|
end
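A minimal sketch of how the Price data frame above might look in code, assuming Python's `re` module stands in for the recognizers. The names `PRICE_PATTERN`, `to_us_dollars`, and `less_than` are hypothetical, and only the alternatives actually listed on the slide are covered, since the rest are elided there.

```python
import re

# Hypothetical recognizer modelled on the slide's patterns. The slide elides
# further alternatives ("..."), so only the listed ones are handled here.
PRICE_PATTERN = re.compile(
    r"\$(?P<dollars>[1-9]\d{0,2}(?:,\d{3})*)"        # external representation, e.g. $6,000
    r"|(?P<amount>\d+(?:\.\d+)?)\s*(?P<k>K)?\s*"     # bare number with optional K ...
    r"(?P<unit>cents|dollars|[Gg]rand)\b"            # ... followed by a right unit
)

def to_us_dollars(match):
    """Canonicalize a PRICE_PATTERN match to a float in US dollars."""
    if match.group("dollars") is not None:
        return float(match.group("dollars").replace(",", ""))
    value = float(match.group("amount"))
    if match.group("k"):
        value *= 1000                                # "15K dollars" -> 15000
    unit = match.group("unit").lower()
    if unit == "grand":
        value *= 1000                                # "6 grand" -> 6000
    elif unit == "cents":
        value /= 100
    return value

def less_than(p1, p2):
    """Comparison method over canonical (US dollar) prices."""
    return p1 < p2
```

For example, `to_us_dollars(PRICE_PATTERN.search("Car under 6 grand"))` yields a canonical value that `less_than` can compare against prices extracted from a page.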

Data Frame Library

Free Form Query
Car under 6 grand newer than 1990 with less than 115K miles

Step 1: Condition Extraction
Car under 6 grand newer than 1990 with less than 115K miles
Extracted Conditions
o (Price < 6000)
o (Year > 1990)
o (Distance < )
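Condition extraction can be sketched as a small set of rules, each pairing an attribute and operator with a recognizer and a canonicalizer. The rule table `RULES` and the function `extract_conditions` are hypothetical and cover only the phrasings in this one query. The Distance value the sketch produces follows from its own rule (K means thousands, unit is miles), not from the slide.

```python
import re

# Hypothetical rules: (attribute, operator, recognizer, canonicalizer).
RULES = [
    ("Price", "<",
     re.compile(r"(?:under|less than|<)\s*(?P<value>\d+)\s*grand\b"),
     lambda m: int(m.group("value")) * 1000),
    ("Year", ">",
     re.compile(r"newer than\s*(?P<value>(?:19|20)\d{2})\b"),
     lambda m: int(m.group("value"))),
    ("Distance", "<",
     re.compile(r"(?:under|less than|<)\s*(?P<value>\d+)(?P<k>K)?\s*miles\b"),
     lambda m: int(m.group("value")) * (1000 if m.group("k") else 1)),
]

def extract_conditions(query):
    """Return (attribute, operator, canonical value) triples found in the query."""
    conditions = []
    for attribute, operator, pattern, canonicalize in RULES:
        for match in pattern.finditer(query):
            conditions.append((attribute, operator, canonicalize(match)))
    return conditions

conditions = extract_conditions(
    "Car under 6 grand newer than 1990 with less than 115K miles"
)
```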

Step 2: Remove Condition Values
Car under newer than with less than

Step 3: Remove Stopwords
Car
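Steps 2 and 3 can be sketched as two string passes. This is an illustration only: `VALUE_PATTERN` and the tiny `STOPWORDS` set are assumptions chosen to cover the example query, not the recognizers or stopword list used in the actual system.

```python
import re

# Assumed value pattern: a number, optional K suffix, optional unit word.
VALUE_PATTERN = re.compile(
    r"\$?\d[\d,]*K?(?:\s*(?:grand|dollars|miles|feet|km))?\b"
)

# Assumed, deliberately small stopword list.
STOPWORDS = {"under", "newer", "than", "with", "less", "more", "for", "a", "at", "least"}

def remove_condition_values(query):
    """Step 2: strip recognized values (and their units) from the query."""
    stripped = VALUE_PATTERN.sub(" ", query)
    return " ".join(stripped.split())          # collapse leftover whitespace

def remove_stopwords(text):
    """Step 3: drop stopwords, leaving the keywords for the search engine."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)

query = "Car under 6 grand newer than 1990 with less than 115K miles"
without_values = remove_condition_values(query)
keywords = remove_stopwords(without_values)
```

The comparison phrases ("under", "newer than", "less than") survive Step 2 and are only dropped in Step 3, which mirrors the two intermediate strings shown on the slides.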

Step 4: Perform Keyword Search

Step 5: Filter Document on Constraints
Keep page if every constraint is satisfied by at least one extracted value
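The filtering rule can be sketched as an all-of-any check. The data layout here is an assumption: constraints are (attribute, operator, bound) triples, and each page is represented by a dictionary mapping an attribute to the canonical values extracted from it. The names `satisfies_all` and `filter_documents` are hypothetical.

```python
import operator

OPERATORS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

def satisfies_all(constraints, extracted_values):
    """True if every constraint is met by at least one extracted value."""
    return all(
        any(OPERATORS[op](value, bound)
            for value in extracted_values.get(attribute, []))
        for attribute, op, bound in constraints
    )

def filter_documents(constraints, documents):
    """Keep the ids of pages (from the keyword search) that pass every constraint."""
    return [doc_id for doc_id, values in documents.items()
            if satisfies_all(constraints, values)]

constraints = [("Price", "<", 6000), ("Year", ">", 1990)]
documents = {
    "a": {"Price": [5500, 12000], "Year": [1998]},   # kept: 5500 < 6000, 1998 > 1990
    "b": {"Price": [7000], "Year": [2001]},          # dropped: no price under 6000
    "c": {"Price": [4000]},                          # dropped: no year extracted
}
kept = filter_documents(constraints, documents)
```

Because a single matching value is enough, any unrelated number on a page that happens to fall in range also satisfies the constraint, which is why short, focused pages filter more cleanly.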

Experimental Setup
300 web documents
o 100 car+trucks pages from
o 100 video gaming pages from
o 50 mountain pages from
o 50 roller coaster pages from
10 queries
o 8 with usable conditions
2 data sets
o test-development
o blind test

Results Summary
Precision increase for 56% of queries
o 75% for test-dev, 50% for blind-test
Precision never worse than keyword query
Most effective for short, focused documents

Discussion
Issues:
1. inadequate narrowing or ranking of search space
2. noise caused by other numbers
Distance <

Future Work
Scalability
o Indexing data frame extracted terms
Precision vs Recall trade-offs
Pay-as-you-go search construction

Related Work
Question-Answering Systems
Keyword search over databases and semantic stores

Questions?

Results (Test-Dev Set)

Results (Blind Test Set)