Paraskevi Raftopoulou, Euripides G.M. Petrakis

Presentation transcript:

iCluster: a Self-Organising Overlay Network for P2P Information Retrieval
Paraskevi Raftopoulou, Euripides G.M. Petrakis
Dept. of Electronic and Computer Engineering, Technical University of Crete (TUC), Chania, Crete, Greece
ECIR 2008

Motivation and Objectives
- Information sharing in P2P networks can be slow or inaccurate
  - large & distributed data collections
  - random overlay networks & blind query forwarding
- We want to provide real-time, low-overhead, dynamic sharing of available information
- Query processing can be sped up by restricting the search to peers similar to the query
4/12/2018 iCluster - ECIR08

Outline
- Background
  - Semantic overlay networks
  - Rewiring strategies
- iCluster
  - Overview
  - Protocols
  - Evaluation
- Extensions

P2P Indexing
- Distributed Hash Tables (DHTs)
  - provide fast lookup mechanisms
  - lay aside peer autonomy
  - strictly coupled with the data model and the query language
- Semantic Overlay Networks (SONs)
  - support peer autonomy
  - loose coupling of peers
  - specialisation assumption

Semantic Overlay Networks
- Are formed by neighborhoods of logically interconnected peers
  - virtually connected peers based on their content
  - routing indices with links to other peers
  - peers connected to each other are called neighbors
- support rich data models and expressive query languages
- provide semantic (and social) information about peers
[Figure: peers p1 to p8 shown in the physical network and in the overlay network; routing index RI4 with entries p1 and p7]

Rewiring strategies
- Techniques for self-organising peers:
  - abandon old connections and create new ones
  - periodic process
- Inspired by the ‘small world effect’
  - reach anybody in a small number of routing hops
  - inter-cluster or long-range links
  - intra-cluster or short-range links

Searching
- Queries are propagated to the regions of the network that are most likely to match the query

What has been proposed?
- SONs created using
  - similar documents [Klampanos et al. 04]
  - concepts from ontologies [Schmitz 04]
- Hierarchies of clusters [Doulkeridis et al. 06]
  - bottlenecks at cluster gateways
  - complex structure to maintain
- Peer specialisation assumption (one interest/peer)
  - not a realistic assumption
  - too general / too specific content

iCluster
- An approach towards efficient peer organisation into SONs to support IR functionality
- iCluster is
  - automatic (no user intervention)
  - general (no previous knowledge & all data types)
  - adaptive (adjusts to dynamic content changes)
  - efficient (fast query processing)
  - accurate (high recall)

Contributions
- Drops specialisation assumption
  - multiple peer interests
  - evolving peer interests
  - better representation of peer contents
- Novel strategy for self-organising peers
  - any peer may exploit a rewiring message
  - reserve a portion of the RI for long-range links
- Supports IR in a dynamic environment
  - data model independent

Peer join
- When a peer p joins the network, it
  - applies a local clustering algorithm
  - computes its interests {c1, c2, …, cL}
  - connects to λ peers for each interest ck
- These (randomly selected) links will be refined using peer rewiring
[Figure: the contents of peer p grouped into interests c1, c2, c3, with routing indices RI1, RI2, RI3]
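
The join step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Peer` class, the `join` function and the `known_peers` bootstrap list are hypothetical names, the interests are assumed to have been computed already by the local clustering step, and `links_per_interest` plays the role of λ.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    # One vector per interest c_1..c_L (assumed output of local clustering).
    interests: list
    # routing_index[k] lists the peers linked for interest k (RI_k).
    routing_index: dict = field(default_factory=dict)


def join(peer, known_peers, links_per_interest, rng):
    """Give `peer` up to lambda randomly chosen links for each interest."""
    candidates = [p.name for p in known_peers if p.name != peer.name]
    for k in range(len(peer.interests)):
        n = min(links_per_interest, len(candidates))
        # Random links only; rewiring is expected to refine them later.
        peer.routing_index[k] = rng.sample(candidates, n)
    return peer
```

Keeping one routing index per interest, rather than one per peer, is what lets a peer with several interests belong to several clusters at once.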

Peer rewiring
- A peer p
  - computes its intra-cluster similarity (average similarity with its short-range links)
  - initiates rewiring if similarity < threshold θ
  - sends a message (msg) with its interest to m neighbors
- All peers receiving msg append their interest and forward msg to m neighbors
- The message is sent back to p when TTL = 0

Peer rewiring (insights)
- Each peer starts the rewiring process
  - independently of other peers
  - based on its local view of the network
- Message is forwarded non-deterministically:
  - to m peers most similar to p, or
  - to m randomly selected peers
- All peers receiving msg update both short- & long-range links
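
A rough Python sketch of one rewiring round follows. It rests on several assumptions that are not in the slides: one interest vector per peer, cosine similarity as the similarity measure, rewiring triggered when the intra-cluster similarity falls below θ (a poorly clustered neighbourhood), and only the initiator updating its short-range links. The names `collect_candidates`, `rewire`, `p_random` and `k` are hypothetical.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def intra_cluster_similarity(interest, neighbour_interests):
    """Average similarity between a peer and its short-range links."""
    if not neighbour_interests:
        return 0.0
    total = sum(cosine(interest, n) for n in neighbour_interests)
    return total / len(neighbour_interests)


def collect_candidates(start, interests, links, ttl, m, rng, p_random=0.5):
    """Forward the rewiring message for `ttl` hops, gathering peer interests."""
    collected, visited, frontier = {}, {start}, [start]
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            neighbours = [n for n in links[peer] if n not in visited]
            if rng.random() < p_random:
                rng.shuffle(neighbours)  # m randomly selected peers
            else:
                # m peers most similar to the initiator
                neighbours.sort(
                    key=lambda n: cosine(interests[start], interests[n]),
                    reverse=True)
            for n in neighbours[:m]:
                visited.add(n)
                collected[n] = interests[n]
                next_frontier.append(n)
        frontier = next_frontier
    return collected


def rewire(start, interests, links, theta, ttl, m, k, rng):
    """Return the new short-range links of `start` (at most k of them)."""
    current = [interests[n] for n in links[start]]
    if intra_cluster_similarity(interests[start], current) >= theta:
        return list(links[start])  # neighbourhood is already cohesive
    found = collect_candidates(start, interests, links, ttl, m, rng)
    pool = sorted(set(links[start]) | set(found))
    pool.sort(key=lambda n: cosine(interests[start], interests[n]),
              reverse=True)
    return pool[:k]
```

Mixing random and similarity-driven forwarding is what lets the message escape a poor neighbourhood while still drifting towards peers that resemble the initiator.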

Query processing (1/2)
- Locate a cluster of peers similar to the query
- A peer p issuing the query q
  - initiates a message (msg)
  - compares q against its interests
    - if sim(q,p) ≥ θ, broadcast msg in p’s neighborhood
    - if sim(q,p) < θ, forward msg to the m peers most similar to q

Query processing (2/2)
- Find the documents similar to the query
- Each peer p receiving a msg compares q against its interests
  - if sim(q,p) ≥ θ
    - p matches q against its locally stored content
    - p retrieves similar documents
- Pointers to similar documents are sent to the peer that issued q
- Candidate answers are ordered by similarity to q and returned to the user
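
The two query-processing phases can be combined into one short Python sketch. It is only an illustration under stated assumptions: one interest vector per peer, cosine similarity, the same threshold θ reused for document matching, and a hop limit `ttl` on the query message. `process_query` and the `documents` layout (peer name to a dict of document id to vector) are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def process_query(q, issuer, interests, links, documents, theta, m, ttl):
    """Route q towards a similar cluster, search inside it, rank the hits."""
    hits = {}
    visited = set()
    frontier = [issuer]
    for _ in range(ttl + 1):
        next_frontier = []
        for peer in frontier:
            if peer in visited:
                continue
            visited.add(peer)
            if cosine(q, interests[peer]) >= theta:
                # Similar peer: match q against local content ...
                for doc_id, vec in documents[peer].items():
                    score = cosine(q, vec)
                    if score >= theta:
                        hits[(peer, doc_id)] = score
                # ... and broadcast in the peer's neighbourhood.
                next_frontier.extend(links[peer])
            else:
                # Dissimilar peer: forward to the m neighbours closest to q.
                ranked = sorted(links[peer],
                                key=lambda n: cosine(q, interests[n]),
                                reverse=True)
                next_frontier.extend(ranked[:m])
        frontier = next_frontier
    # Pointers to documents, ordered by similarity to q.
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
```

The returned list holds `((peer, doc_id), score)` pairs, which stands in for the pointers that are sent back and ranked before being shown to the user.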

Experimental setting
- Data sets
  - a subset of OHSUMED TREC: 32K medical documents, 10 classes
  - TREC-6: 556K general interest documents, 100 classes
- Experimental set-up
  - 2K peers
  - 10 links per peer (10% long-range links)
  - 5 initial network topologies (5 runs for each topology)
- Experiments with different parameter values (θ, TTL, …)
- Compare against
  - the [Schmitz 04] approach
  - exhaustive search by flooding

Performance measures
- (weighted) Clustering coefficient
  - measures cluster cohesion: is p linked with similar peers?
  - ratio of similarity between peers over total number of links in a peer’s neighborhood
- Communication overhead
  - Organisation messages: to organise peers into clusters
  - Search messages: to answer a query
- Retrieval accuracy (recall)
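
As a small worked illustration of two of these measures, the Python sketch below computes recall and a per-peer cohesion score. Recall is the standard definition (relevant documents retrieved over all relevant documents). The cohesion function is only one plausible reading of "ratio of similarity between peers over total number of links" and may differ from the exact coefficient used in the evaluation; `neighbourhood_cohesion` and the `sim` callable are hypothetical names.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def neighbourhood_cohesion(peer, links, sim):
    """Summed similarity to linked peers divided by the number of links.

    Assumed interpretation of the weighted clustering coefficient for a
    single peer; `sim(a, b)` is any pairwise similarity function.
    """
    neighbours = links[peer]
    if not neighbours:
        return 0.0
    return sum(sim(peer, n) for n in neighbours) / len(neighbours)
```

For example, retrieving two of four relevant documents gives a recall of 0.5, and a peer whose two links have similarities 0.5 and 1.0 gets a cohesion of 0.75.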

Recall
[Figure]

Recall / Search message
[Figure]

Discussion / Observations
- More effective organisation and better recall with fewer messages (rewiring & search)
- Effects of different parameters
  - Rewiring similarity threshold θ affects clustering cohesion
  - Query effectiveness depends on rewiring similarity threshold θ and on query TTL
- IR can be highly improved by peer organisation
  - high clustering can give rise to ‘caveman worlds’
  - long-range links ensure inter-cluster connectivity

Future Work
- Study the effect of churn
  - in terms of peers
  - in terms of content
- Assume different query distributions
  - more realistic
  - will improve query processing
- Extend the proposed protocols to support information filtering functionality

Thank you!

Weighted clustering coefficient
[Figure]

Search messages
[Figure]