HyKSS: Hybrid Keyword and Semantic Search Andrew Zitzelberger 1.

Presentation transcript:

HyKSS: Hybrid Keyword and Semantic Search Andrew Zitzelberger 1

Keyword Search 2

Form Based Search 3

4 What about? over 8,000 meters in elevation; less than 100K miles; faster than 100 mph

5

HyKSS Hybrid Keyword and Semantic Search Semantics – extracted annotations – Multiple ontologies Keywords – text 6

Thesis Statement HyKSS (hybrid search) – Outperforms keyword and semantic search – Dynamic query weighting outperforms various other hybrid search approaches – Allows queries over multiple ontologies – Allows pay-as-you-go improvement 7

Extraction Ontologies 8

Data Frames 9

Indexing Architecture 10 Keyword Indexer, Semantic Indexer, Keyword Index, Semantic Index, Document Collection

Indexing Architecture Implementation 11 Keyword Indexer Semantic Indexer Keyword Index Semantic Index Document Collection OntoES Ontology Library Sesame Lucene

Query Processing 12 Free Form Query Keyword Processing: Pre-Process Query, Execute Query, Post-Process Query Semantic Processing: Pre-Process Query, Execute Query, Post-Process Query Combine Results

Keyword Query Pre-Processing 13 – Remove Lucene special characters (except quotes) – Remove (inequality) comparison constraints – Remove non-phrase stopwords Example: hondas in "excellent condition" in orem for under 12 grand becomes hondas "excellent condition" orem
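The three pre-processing steps on this slide can be sketched in Python. This is a minimal illustration, not the HyKSS implementation: the stopword list, the comparison-constraint pattern, and the function name preprocess_keyword_query are all assumptions made for the example, since the slides do not specify them.

```python
import re

# Assumed (hypothetical) stopword list; the slides do not list the one HyKSS uses.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to"}

# Lucene query-syntax characters, with the double quote deliberately left out
# so that quoted phrases survive.
LUCENE_SPECIAL = re.compile(r'[+\-!(){}\[\]^~*?:\\/&|]')

# Assumed (hypothetical) pattern for inequality constraints such as "under 12 grand".
COMPARISON = re.compile(
    r"\b(?:under|over|below|above|less than|more than|at least|at most)"
    r"\s+\S+(?:\s+(?:grand|k|dollars|miles|mph|meters))?",
    re.IGNORECASE,
)


def preprocess_keyword_query(query: str) -> str:
    """Apply the three slide steps and return the cleaned keyword query."""
    # Step 1: drop Lucene special characters (quotes are kept).
    query = LUCENE_SPECIAL.sub(" ", query)
    parts = []
    # Split into quoted phrases and the text between them.
    for segment in re.split(r'("[^"]*")', query):
        if segment.startswith('"'):
            parts.append(segment)  # phrases are kept untouched
            continue
        # Step 2: drop comparison constraints outside phrases.
        segment = COMPARISON.sub(" ", segment)
        # Step 3: drop stopwords outside phrases.
        parts.extend(w for w in segment.split() if w.lower() not in STOPWORDS)
    return " ".join(parts)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'
))
```

With these assumed lists, the running example reduces to the same keyword query shown on the slide.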

Keyword Query Execution and Post-Processing Executed by Lucene Empty Post-Processing step 14

Semantic Query Pre-Processing Individual Ontology Scoring Example query: hondas in "excellent condition" in orem for under 12 grand 15

Semantic Query Pre-Processing Ontology Set Creation For each ontology sorted by score: – For each remaining ontology: add a point for each new or subsuming match; if added points > 0, add the ontology. Completely subsumed ontologies are removed during query generation. 16
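One possible reading of the set-creation loop is sketched below in Python. It is a simplified single-pass interpretation, not the HyKSS algorithm itself: representing a match as a (start, end) character span in the query, the overlap test, and the names is_new_or_subsuming and create_ontology_set are all assumptions made for illustration.

```python
def is_new_or_subsuming(span, covered):
    """True if span touches no covered span, or strictly contains all it touches."""
    start, end = span
    overlapping = [c for c in covered if c[0] < end and start < c[1]]
    if not overlapping:
        return True  # a new match
    # a subsuming match: contains every overlapping span and is larger than it
    return all(
        start <= c[0] and c[1] <= end and (start, end) != c for c in overlapping
    )


def create_ontology_set(matches_by_ontology, scores):
    """Greedily pick ontologies, best score first, that contribute something."""
    ranked = sorted(matches_by_ontology, key=lambda name: scores[name], reverse=True)
    selected = []
    covered = set()
    for name in ranked:
        # one point per new or subsuming match
        added_points = sum(
            1 for span in matches_by_ontology[name]
            if is_new_or_subsuming(span, covered)
        )
        if added_points > 0:
            selected.append(name)
            covered.update(matches_by_ontology[name])
    return selected


# Hypothetical spans for: hondas in "excellent condition" in orem for under 12 grand
matches = {
    "Vehicle": {(0, 6), (44, 58)},        # "hondas", "under 12 grand"
    "Location": {(35, 39)},               # "orem"
    "ContractualServices": {(44, 58)},    # "under 12 grand"
}
scores = {"Vehicle": 2.0, "Location": 1.0, "ContractualServices": 1.0}
print(create_ontology_set(matches, scores))
```

Under these illustrative inputs, ContractualServices adds no new or subsuming match once Vehicle is selected, so it is left out of the set.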

Semantic Query Pre-Processing Ontology Set Creation 17 [Figure: ontology set creation example over the Vehicle, Location and ContractualServices ontologies; labels: US_City="orem", Price <, Vehicle_Score, Vehicle_Score + 1, ContractualServices_Score + 1]

Semantic Query Pre-Processing Structured Query Generation Open world assumption SPARQL query 18

Semantic Query Execution and Post-Processing Sesame query execution Semantic ranking: – 1 point for each requested projection satisfied – Normalized by # of projections requested Example: hondas in "excellent condition" in orem for under 12 grand – Projections on Make, Price and US_City 19
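The semantic ranking rule can be written as a short Python sketch. The dictionary shape of a query result and the function name semantic_score are assumptions for illustration; the slide only states the scoring rule.

```python
def semantic_score(requested_projections, result_bindings):
    """One point per requested projection the result fills, normalized to [0, 1]."""
    if not requested_projections:
        return 0.0  # assumed convention when nothing is requested
    satisfied = sum(
        1 for name in requested_projections
        if result_bindings.get(name) is not None
    )
    return satisfied / len(requested_projections)


# Projections from the slide's example query; the bindings are made-up values.
requested = ["Make", "Price", "US_City"]
bindings = {"Make": "Honda", "Price": "9500"}  # no US_City extracted
print(semantic_score(requested, bindings))
```

A document that satisfies two of the three requested projections therefore scores two thirds.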

Hybrid Query Processing Linear interpolation: – (kw_weight * kw_score) + (sm_weight * sm_score) Dynamic solution: – # keywords remaining (#kw) – concept match score (cms) = ½ * (selections + projections) – kw_weight = #kw/(#kw + cms) – sm_weight = cms/(#kw + cms) 20
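The dynamic weighting formulas translate directly into Python. The function names and the even split used when both counts are zero are assumptions; the input counts in the example are illustrative and not taken from the experiments.

```python
def dynamic_weights(num_keywords, selections, projections):
    """Return (kw_weight, sm_weight) from the slide's formulas."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumed fallback; the slide does not cover this case
    return num_keywords / total, cms / total


def hybrid_score(kw_score, sm_score, kw_weight, sm_weight):
    """Linear interpolation of the keyword and semantic scores."""
    return (kw_weight * kw_score) + (sm_weight * sm_score)


# Illustrative counts: 3 keywords remaining, 1 selection, 3 projections.
kw_weight, sm_weight = dynamic_weights(num_keywords=3, selections=1, projections=3)
print(kw_weight, sm_weight)
print(hybrid_score(kw_score=0.5, sm_score=1.0, kw_weight=kw_weight, sm_weight=sm_weight))
```

Because both weights share the denominator #kw + cms, they always sum to 1, so a query with more surviving keywords leans on the keyword score and a query with more recognized concepts leans on the semantic score.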

Basic Search 21

Results Display 22

23 Form Based Search

Results Display

Experimental Setup – Ontology Libraries 5 Ontology Levels – Number – Generic Units – Vehicle Units – Vehicle – Vehicle+ 25

Experimental Setup – Query Sets 113 syntactically unique queries from database students; 60 syntactically unique queries from linguistics students 26

Experimental Setup – Document Collection 250 vehicle advertisements (Craigslist) – 100 training, 50 validation, 100 test; 318 mountain pages (Wikipedia); 66 roller coaster pages (Wikipedia); 88 video game advertisements (Craigslist) 27

Experiments 1) Training queries over test vehicle documents 2) Test queries over test vehicle documents 3) Training queries over test vehicle documents + additional noise 4) Test queries over test vehicle documents + additional noise 5) 5 queries over noisy data (Generic Units only) 28

Experiments - Metric Mean Average Precision 29
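For reference, the textbook definition of Mean Average Precision can be computed as below. The slides do not say which variant was used, so this Python sketch assumes binary relevance and the standard formula; the document IDs are made up.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # relevant documents that were never retrieved contribute zero
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Average the per-query AP over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = 1/2
]
print(mean_average_precision(runs))
```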

Experimental Results 30

Experimental Results 31

Experimental Results 32

Conclusions – Hybrid search outperforms keyword and semantic search – HyKSS's dynamic query weighting approach outperforms various other weighting techniques – Using multiple ontologies does not outperform selecting and using a single ontology 33

External Image Citations 34 – Slide 2 Google search screenshot: (07/30/11) – Slide 3 partial car search form screenshots: (07/30/11) – Slide 4 mountain image: (04/26/11) – Slide 4 car image: (04/26/11) – Slide 4 roller coaster image: (04/26/11) – Slide 4 Wikipedia logo: (04/26/11) – Slide 4 craigslist logo: (04/26/11)