Efficient Instant-Fuzzy Search with Proximity Ranking. Authors: Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li. ICDE Conference 2014.

Presentation transcript:

Presented by: Priagung Khusumanegara

Abstract
Instant search finds answers to a query while the user types keywords character by character. Fuzzy search improves the search experience by finding relevant answers whose keywords are similar, not necessarily identical, to the query keywords. The main computational challenge in this paradigm is the high-speed requirement: answers must be computed for each keystroke. At the same time, the system needs good ranking functions that consider the proximity of keywords when computing relevance scores.
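To make "keywords similar to query keywords" concrete, here is a minimal sketch that assumes similarity is measured by Levenshtein edit distance with a small threshold. The slides do not name the similarity measure, so both the measure and the hypothetical `max_edits` parameter are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_similar(query_keyword: str, term: str, max_edits: int = 1) -> bool:
    """Assumed similarity test: edit distance within a threshold."""
    return edit_distance(query_keyword, term) <= max_edits
```

Under this assumption a mistyped keyword such as "serch" would still match the dictionary term "search", because the two differ by a single insertion.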

Problem Statement & Proposed Solution
Problem Statement:
o Achieving efficient time and space complexities.
Solution:
o Index phrases with a proper indexing scheme, and
o Develop an incremental-computation algorithm for efficiently segmenting a query into phrases and computing relevant answers.
Result Metrics:
o Experimental study on real data sets to show the tradeoffs between time, space, and quality of these solutions.

General Idea of Instant Search
Instant search returns answers immediately, based on the partial query a user has typed in so far. Many users prefer seeing the search results instantly and formulating their queries accordingly, instead of being left in the dark until they hit the search button.

Architecture
Phrase Validator: When the search server receives a request, it first identifies all the phrases in the query that are similar to a term in the dictionary D (these are called "valid phrases"). The inverted lists of the valid phrases are later intersected to compute answers.
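The Phrase Validator step can be sketched as follows. This is an illustrative brute-force version, not the authors' implementation: it enumerates every contiguous run of query keywords and keeps those that match a dictionary term. The function name `valid_phrases`, the `(start, end)` span representation, and the pluggable `matches` predicate (exact match by default, where a fuzzy test could be substituted) are all assumptions.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def valid_phrases(
    keywords: List[str],
    dictionary: Iterable[str],
    matches: Optional[Callable[[str, str], bool]] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) spans of keywords whose phrase matches a term in D."""
    if matches is None:
        # Assumption: exact matching stands in for the fuzzy similarity test.
        matches = lambda phrase, term: phrase == term
    terms = list(dictionary)
    found = []
    for start in range(len(keywords)):
        for end in range(start + 1, len(keywords) + 1):
            phrase = " ".join(keywords[start:end])
            if any(matches(phrase, term) for term in terms):
                found.append((start, end))  # end is exclusive
    return found
```

For the query keywords `["new", "york", "pizza"]` and a dictionary holding "new", "york", "new york", and "pizza", the sketch reports four valid phrases, including the two-keyword phrase "new york".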

Architecture (Cont'd)
Query Plan Builder: After the valid phrases are identified, the Query Plan Builder generates a query plan Q, which contains all the possible valid segmentations in a specific order. The ranking of Q determines the order in which the segmentations are executed.
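A minimal sketch of how a query plan might be built from the valid phrases. It assumes valid phrases are given as `(start, end)` keyword spans with an exclusive end, and it orders segmentations so that those with fewer (hence longer) phrases come first. The slides only say the order is "specific", so this ordering rule and the function names are assumptions.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]


def valid_segmentations(n: int, spans: Iterable[Span]) -> List[List[Span]]:
    """Enumerate every way to cover keywords 0..n-1 with consecutive valid spans."""
    span_set = set(spans)
    results: List[List[Span]] = []

    def extend(start: int, current: List[Span]) -> None:
        if start == n:
            results.append(list(current))
            return
        for end in range(start + 1, n + 1):
            if (start, end) in span_set:
                current.append((start, end))
                extend(end, current)
                current.pop()

    extend(0, [])
    return results


def build_query_plan(n: int, spans: Iterable[Span]) -> List[List[Span]]:
    """Assumed ranking: segmentations with fewer phrases are executed first."""
    return sorted(valid_segmentations(n, spans), key=len)
```

With three keywords and the valid spans `(0, 1)`, `(0, 2)`, `(1, 2)`, `(2, 3)`, the plan contains two segmentations, and the one that uses the two-keyword phrase is placed ahead of the one that splits the query into three single keywords.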

Architecture (Cont'd)
Index Searcher: After Q is generated, the segmentations are passed to the Index Searcher one by one until the top-k answers are computed or all the segmentations in the plan have been used.
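The early-termination loop of the Index Searcher could look like the sketch below. It assumes each segmentation is a list of phrase strings and that the inverted index is a plain dictionary from phrase to a set of record ids; both representations, and the name `search_top_k`, are hypothetical simplifications.

```python
from typing import Dict, Iterable, List, Set


def search_top_k(
    plan: Iterable[List[str]],
    inverted_index: Dict[str, Set[int]],
    k: int,
) -> List[int]:
    """Run segmentations in plan order; stop as soon as k answers are found."""
    answers: List[int] = []
    seen: Set[int] = set()
    if k <= 0:
        return answers
    for segmentation in plan:
        if not segmentation:
            continue
        # Intersect the inverted lists of all phrases in this segmentation.
        lists = [set(inverted_index.get(phrase, ())) for phrase in segmentation]
        matches = set.intersection(*lists)
        for record_id in sorted(matches):
            if record_id not in seen:
                seen.add(record_id)
                answers.append(record_id)
                if len(answers) == k:
                    return answers  # top-k reached; skip remaining segmentations
    return answers
```

Because the plan is ordered, answers from earlier segmentations appear first in the result, and later segmentations are never executed once k answers are available.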

Architecture (Cont'd)
Cache Module: The Phrase Validator uses the Cache module to validate a phrase without traversing the trie from scratch, while the Index Searcher uses the cache to retrieve the answers to an earlier query, which reduces the computational cost.
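One way to picture "validating without traversing the trie from scratch" is to remember the trie node reached by each prefix already typed, so that the next keystroke costs a single step. The sketch below handles exact prefixes only (no fuzzy matching), and the class `CachedTrie`, its `locate` method, and the `steps` counter are hypothetical names introduced for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_term = False


class CachedTrie:
    def __init__(self, terms):
        self.root = TrieNode()
        self.cache = {"": self.root}  # prefix -> trie node reached by that prefix
        self.steps = 0                # child lookups performed, for illustration
        for term in terms:
            node = self.root
            for ch in term:
                node = node.children.setdefault(ch, TrieNode())
            node.is_term = True

    def locate(self, prefix):
        """Return the node for prefix, resuming from the longest cached prefix."""
        cut = len(prefix)
        while prefix[:cut] not in self.cache:
            cut -= 1  # always terminates: the empty prefix is cached
        node = self.cache[prefix[:cut]]
        for i in range(cut, len(prefix)):
            self.steps += 1
            node = node.children.get(prefix[i])
            if node is None:
                return None  # no dictionary term starts with this prefix
            self.cache[prefix[: i + 1]] = node
        return node
```

In this sketch, typing "sea" and then "sear" costs three steps plus one step, rather than three plus four, because the second lookup resumes from the cached node for "sea".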

Computing Valid Phrases

Generating Valid Segmentations

Incremental Computation of Valid Phrases

Example Table for Architecture Explanation
The data in the example table is stored in indexed form. Two types of indices are used to structure this data:
1. Trie indices
2. Forward indices

Index Structure
Indices:
o Trie
o Forward
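The two index types can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: the trie is built from nested dicts, a reserved key `"$"` (so terms are assumed not to contain that character) marks a complete term and holds its inverted list, and the forward index maps each record id to its terms in order. The function names and the sample records are made up for the example.

```python
def build_indices(records):
    """records: dict mapping record id -> text. Returns (trie, forward_index)."""
    trie = {}     # nested dicts; key "$" holds the inverted list of a complete term
    forward = {}  # record id -> list of terms in document order
    for record_id, text in records.items():
        terms = text.lower().split()
        forward[record_id] = terms
        for term in terms:
            node = trie
            for ch in term:
                node = node.setdefault(ch, {})
            node.setdefault("$", set()).add(record_id)
    return trie, forward


def records_with_prefix(trie, prefix):
    """Collect ids of all records containing a term that starts with prefix."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return set()
    result = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                result |= child      # inverted list stored at a complete term
            else:
                stack.append(child)  # keep descending into the subtree
    return result
```

The trie answers "which records contain a keyword starting with what the user has typed", while the forward index allows a candidate record's terms to be inspected directly, for example to check how close two matched keywords are.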

Experiments
In the experiments, the authors implemented the following methods:
1. FindAll ("FA")
2. QuerySegmentation ("QS")
3. Term Pair ("TP")

Efficiency of Computing Valid Phrases

Query Time

Cache Hit Rate

Scalability

Conclusion
o The authors studied how to improve the ranking of an instant-fuzzy search system by considering proximity information when computing top-k answers.
o They presented an incremental-computation algorithm for efficiently finding the indexed phrases in a query.
o The experiments on real data showed the efficiency of the proposed technique for 2-keyword and 3-keyword queries, which are common in search applications.