Query-Driven Indexing for Peer-to-Peer Text Retrieval **
WWW 2007, Banff, Canada
Contact: Gleb Skobeltsyn

G.Skobeltsyn, T.Luu, I.Podnar *, M.Rajman, K.Aberer

* I.Podnar is currently affiliated with University of Zagreb, Croatia
** The work presented in this paper was (partly) carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European projects BRICKS (507457) and ALVIS (002068).

Our goal: Scalable full-text web retrieval in a structured P2P network.

Features:
- Low bandwidth during retrieval, as posting lists of bounded size are transmitted,
- The content of the index adapts to the current query popularity distribution,
- Tradeoff between retrieval quality and index size (i.e., indexing cost).

Experiments: retrieval quality of the query-driven index when compared to Google
Top-20 overlap measure: Use Google to answer a query and compare the result to the union of the top-DF_max Google results for each of its indexed keys. Keys are indexed if contained in more than QF_min queries in the global query history.

Figure: Overlap achieved for different sizes of the query log, measured in number of days, with QF_min = 1, DF_max = 600
Figure: Overlap achieved for different values of DF_max with QF_min = 1
Figure: Overlap achieved for different values of QF_min / 3 months with DF_max = 600

Example of resolving a query:
  id=481, q=“what did babe ruth do in the 1920”
    “1920 babe ruth”, qf=0 ----> 100%
    “1920 babe”, qf= > 9%
  + “1920 ruth”, qf= > 33%
  + “babe ruth”, qf= > 69%
  - “1920”, qf= > 1%
  - “babe”, qf= > 2%
  - “ruth”, qf= > 7%
  Size: 192, Keys used: 2, 94%

More details in:
- Skobeltsyn et al: “Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval”, in Infoscale’07, Suzhou, China, 2007
- Skobeltsyn et al: “Web Text Retrieval with a P2P Query-Driven Index”, in SIGIR’07, Amsterdam, The Netherlands,
- Alvis project web site:

Distributed query-driven index:
A distributed query-driven index maintains truncated posting lists (TPLs), storing top-DF_max document references, for carefully selected term combinations (keys). We maintain a global query history and use it to identify popular (qf ≥ QF_min) and non-redundant combinations. To process a multi-term query abc, we compute the top-k results by collecting (truncated) posting lists for currently indexed combinations, e.g., ab or bc.
Figure: Processing the query abc with a query-driven index

Distributed single-term index:
A distributed single-term index maintains global posting lists for each single term in a DHT. To process a multi-term query abc, it intersects the full posting lists of a, b and c.
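The single-term processing just described can be illustrated with a minimal sketch. It is not taken from the poster: the toy posting lists, the function name intersect_full_lists and the "postings fetched" counter are assumptions made only to show why every full list has to travel to the querying peer before it can be intersected.

```python
from typing import Dict, List, Tuple

# Hypothetical toy index: term -> full (untruncated) posting list of document ids.
SINGLE_TERM_INDEX: Dict[str, List[int]] = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10],
    "c": [4, 8, 12],
}


def intersect_full_lists(
    query_terms: List[str], index: Dict[str, List[int]]
) -> Tuple[List[int], int]:
    """Intersect the full posting lists of all query terms.

    Returns the matching document ids and the number of postings that
    had to be fetched, used here as a stand-in for retrieval traffic.
    """
    fetched = 0
    result = None
    for term in query_terms:
        postings = index.get(term, [])
        fetched += len(postings)  # the whole list is fetched, however long it is
        docs = set(postings)
        result = docs if result is None else result & docs
    return sorted(result or set()), fetched
```

In this sketch the fetched count grows with the length of each term's posting list, so it grows with the size of the document collection.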
The naïve approach: intersections lead to unscalable retrieval traffic.
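For contrast, the query-driven lookup described earlier can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names select_keys, truncate and process_query are hypothetical, postings are assumed to carry a numeric score, merging by maximum score is an arbitrary choice, and the rule "skip a key that is a subset of a key already used" is only inferred from the babe ruth example, where the single terms are not used once two pairs are found. The non-redundancy filter on keys is not specified on the poster and is left out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Key = FrozenSet[str]
Posting = Tuple[str, float]  # (document id, score); the scoring itself is assumed


def select_keys(query_log: List[List[str]], qf_min: int, max_key_size: int = 3) -> Set[Key]:
    """Keep term combinations contained in at least qf_min logged queries."""
    qf: Dict[Key, int] = {}
    for query in query_log:
        terms = sorted(set(query))
        for size in range(1, min(max_key_size, len(terms)) + 1):
            for combo in combinations(terms, size):
                key = frozenset(combo)
                qf[key] = qf.get(key, 0) + 1
    return {key for key, count in qf.items() if count >= qf_min}


def truncate(postings: List[Posting], df_max: int) -> List[Posting]:
    """Build a truncated posting list (TPL): the df_max best-scored postings."""
    return sorted(postings, key=lambda p: p[1], reverse=True)[:df_max]


def process_query(
    query_terms: List[str], index: Dict[Key, List[Posting]], k: int
) -> Tuple[List[Posting], List[Key]]:
    """Collect TPLs of indexed term combinations, largest combinations first."""
    terms = sorted(set(query_terms))
    used: List[Key] = []
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            key = frozenset(combo)
            # Assumption: a key covered by a larger key already used is skipped.
            if key in index and not any(key < bigger for bigger in used):
                used.append(key)
    best: Dict[str, float] = {}
    for key in used:
        for doc, score in index[key]:
            best[doc] = max(score, best.get(doc, float("-inf")))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k], used
```

Each TPL holds at most df_max postings, so the number of postings collected per query is bounded by df_max times the number of keys used, independent of collection size.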