
Query-Driven Indexing for Peer-to-Peer Text Retrieval ** WWW 2007, Banff, Canada. Contact: Gleb Skobeltsyn




Query-Driven Indexing for Peer-to-Peer Text Retrieval **
WWW 2007, Banff, Canada

G. Skobeltsyn, T. Luu, I. Podnar *, M. Rajman, K. Aberer
Contact: Gleb Skobeltsyn, gleb.skobeltsyn@epfl.ch, http://lsirpeople.epfl.ch/skobelts

* I. Podnar is currently affiliated with University of Zagreb, Croatia.
** The work presented in this paper was (partly) carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European projects BRICKS (507457) and ALVIS (002068).

Our goal: Scalable full-text web retrieval in a structured P2P network.

Features:
- Low bandwidth during retrieval, as posting lists of bounded size are transmitted.
- The content of the index adapts to the current query popularity distribution.
- Tradeoff between retrieval quality and index size (i.e., indexing cost).

Experiments: retrieval quality of the query-driven index when compared to Google
Figure: Processing the query abc with a query-driven index.

More details in:
- Skobeltsyn et al., "Query-Driven Indexing for Scalable Peer-to-Peer Text Retrieval", in Infoscale'07, Suzhou, China, 2007
- Skobeltsyn et al., "Web Text Retrieval with a P2P Query-Driven Index", in SIGIR'07, Amsterdam, The Netherlands, 2007

Alvis project web site: http://globalcomputing.epfl.ch/alvis

Figures:
- Overlap achieved for different sizes of the query log, measured in number of days, with QF_min = 1, DF_max = 600
- Overlap achieved for different values of DF_max with QF_min = 1
- Overlap achieved for different values of QF_min / 3 months with DF_max = 600

Example of resolving a query:
id=481, q="what did babe ruth do in the 1920"
    "1920 babe ruth", qf=0     -> Ov@100 = 100%
    "1920 babe",      qf=0     -> Ov@100 = 9%
  + "1920 ruth",      qf=1     -> Ov@100 = 33%
  + "babe ruth",      qf=495   -> Ov@100 = 69%
  - "1920",           qf=716   -> Ov@100 = 1%
  - "babe",           qf=3196  -> Ov@100 = 2%
  - "ruth",           qf=1653  -> Ov@100 = 7%
Size: 192, Keys used: 2, Overlap@100: 94%

Top-20 overlap measure: Use Google to answer a query and compare the answer to the union of the top-DF_max Google results for each of the query's indexed keys. Keys are indexed if contained in more than QF_min queries in the global query history.

A distributed query-driven index maintains truncated posting lists (TPLs), storing top-DF_max document references, for carefully selected term combinations (keys).

To process a multi-term query abc, we compute the top-k results by collecting (truncated) posting lists for currently indexed combinations, e.g., ab or bc.
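The query-processing step and the overlap measure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `process_query` and `overlap_at_k`, the in-memory dictionary standing in for the DHT, the max-score merge of posting lists, and the rule of skipping keys already covered by a larger used key (suggested by the example, where the single terms are not used) are all hypothetical.

```python
from itertools import combinations


def process_query(terms, index, k):
    """Collect TPLs for indexed term combinations of the query and merge them.

    `index` maps frozenset-of-terms keys to truncated posting lists given as
    (doc_id, score) pairs; here a plain dict stands in for the DHT (assumption).
    """
    scores = {}
    keys_used = []
    # Try larger term combinations first, then smaller ones.
    for size in range(len(terms), 0, -1):
        for combo in combinations(sorted(terms), size):
            key = frozenset(combo)
            # Assumption: a key that is a proper subset of a used key is skipped.
            if any(key < used for used in keys_used):
                continue
            tpl = index.get(key)
            if tpl is None:
                continue  # combination is not currently indexed
            keys_used.append(key)
            for doc_id, score in tpl:
                # Assumption: merge by keeping each document's best score.
                scores[doc_id] = max(scores.get(doc_id, 0.0), score)
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k], keys_used


def overlap_at_k(result, reference, k):
    """Fraction of the reference top-k that also appears in the result top-k."""
    ref = set(reference[:k])
    if not ref:
        return 0.0
    return len(set(result[:k]) & ref) / len(ref)


if __name__ == "__main__":
    toy_index = {
        frozenset({"b", "c"}): [("d1", 0.9), ("d2", 0.5)],
        frozenset({"a", "b"}): [("d2", 0.7), ("d3", 0.4)],
        frozenset({"a"}): [("d9", 1.0)],
    }
    top, used = process_query(["a", "b", "c"], toy_index, k=2)
    print(top, used, overlap_at_k(top, ["d2", "d5"], 2))
```

Only bounded-size lists are fetched here, which is the source of the low retrieval bandwidth the poster claims; the actual ranking function and key-lookup order are not specified in this transcript.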
Distributed query-driven index: We maintain a global query history and use it to identify popular (qf ≥ QF_min) and non-redundant combinations.

The naïve approach: distributed single-term index
- Maintains global posting lists for each single term in a DHT.
- To process a multi-term query abc, it intersects the full posting lists of a, b and c.
- Intersections lead to unscalable retrieval traffic.
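To make the traffic problem of the naïve approach concrete, here is a small sketch of intersecting full single-term posting lists. It is an illustration only: the function name `intersect_posting_lists`, the dictionary standing in for the DHT, and the simple cost model (every full posting list is shipped once) are assumptions, not details from the poster.

```python
def intersect_posting_lists(terms, single_term_index):
    """Intersect the full posting lists of all query terms.

    Returns the matching document ids and the number of postings that had to
    be transferred, assuming each full list is sent once (assumed cost model).
    """
    lists = [set(single_term_index.get(term, ())) for term in terms]
    if not lists:
        return set(), 0
    transferred = sum(len(postings) for postings in lists)
    return set.intersection(*lists), transferred


if __name__ == "__main__":
    toy_index = {
        "a": [1, 2, 3, 4],
        "b": [2, 3, 5],
        "c": [3, 6],
    }
    docs, cost = intersect_posting_lists(["a", "b", "c"], toy_index)
    print(docs, cost)
```

Under this cost model the transferred volume grows with the full length of each term's posting list, whereas truncated posting lists cap it at DF_max entries per key.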

