Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,

Presentation transcript:

Exploiting Content Localities for Efficient Search in P2P Systems
Lei Guo (1), Song Jiang (2), Li Xiao (3), and Xiaodong Zhang (1)
(1) College of William and Mary, USA
(2) Los Alamos National Laboratory, USA
(3) Michigan State University, USA

Peer-to-Peer Search
Two Performance Objectives
–Individual peer: improve the search quality
–Internet management: minimize the search cost
P2P user: "Fast, fast, fast, and the more the better!"
Network manager: "Don't be so greedy, the Internet is shared by all the people!"

Existing Solutions
Existing solutions generally aim at one of the two objectives and perform poorly on the other
Flooding:
–Most effective for the user's experience
–Least efficient in network resource utilization
Random walk:
–Traffic efficient, but
–Long response time and a limited number of search results
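The trade-off between flooding and random walk can be seen on a toy overlay. The following is a minimal sketch, not the simulator used in this work: the function names `flood` and `random_walk`, the adjacency-list graph, and the TTL and step values are all hypothetical, and the flood is simplified so that a peer also forwards back to the sender.

```python
import random
from collections import deque

def flood(graph, start, ttl):
    """TTL-limited flooding (breadth-first). Returns (peers reached, messages sent)."""
    visited = {start}
    frontier = deque([(start, ttl)])
    messages = 0
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs a message, duplicates included
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return visited, messages

def random_walk(graph, start, steps, rng):
    """Single random walker. Returns (peers reached, messages sent)."""
    visited = {start}
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])  # one message per step
        visited.add(node)
    return visited, steps

# Hypothetical 6-peer ring overlay.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
flood_reached, flood_msgs = flood(ring, 0, ttl=2)
walk_reached, walk_msgs = random_walk(ring, 0, steps=3, rng=random.Random(0))
```

Flooding reaches many peers quickly but pays for duplicate messages; the walker sends one message per hop but may need many hops to find enough results.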

Super-Node Architecture
Super-node
–Index server for its leaf nodes
Problems
–Index-based search has limits
  Hard for full-text search
  Impossible for encrypted content search
–Not responsible for the content quality of its leaf nodes
–The structure becomes large and inefficient
  A leaf node has to connect to multiple super-nodes to avoid a single point of failure
  This generates an increasingly large number of super-nodes

Gnutella Population in One Day (2003)
[Figure: number of peers and number of super peers over one day]
On average, one super node connects to only 3-4 peers!

Outline
–Our Measurement Study
–CAC: Constructing Content Abundant Cluster
–SPIRP: Selectively Prefetching Indices from Responding Peers
–CAC-SPIRP: Combining CAC and SPIRP
–Performance Evaluation
–Conclusion

Our Measurement Study
Existing measurement studies
–A small percentage of popular files account for most shared storage and transmissions in P2P systems
–A small number of peers contribute the majority of files in P2P systems
–These findings are only indirect evidence of content locality
  Some files may never be accessed, or only rarely
Our purpose
–Fully understand the localities in the peer community and in individual peers
–Get first-hand traces for our simulation study

Trace Collection
Four-day crawl of the Gnutella network
–Using the open-source code of LimeWire Gnutella
–Session-based collection (for the whole lifetime of peers)
Query-sending traces of different peers
–25,764 peers
–409,129 queries
Content indices of different peers
–Full indices of 18,255 peers
–37% free riders

Content Locality in the Peer Community
A small group of peers can reply to nearly all queries and provide most of the results
[Figures: queries replied by top query responders (%) and results replied by top result providers (%), plotted against top content providers (in percentage); number of queries and number of results versus percentage of peers (%)]
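One way to quantify this kind of locality from a trace is to rank peers by how many queries they answered and then measure the share of queries that the top-ranked peers alone could answer. The sketch below assumes a hypothetical trace format (a mapping from query to the set of responding peers); the function name and the tiny trace are illustrative and are not taken from the measured data.

```python
from collections import Counter

def coverage_of_top_responders(responders_by_query, total_peers, top_percent):
    """Fraction of queries answered by at least one of the top `top_percent`% peers."""
    # Count how many distinct queries each peer replied to.
    reply_counts = Counter(
        peer for peers in responders_by_query.values() for peer in peers
    )
    # Integer ceiling of top_percent% of the population, at least one peer.
    k = max(1, (top_percent * total_peers + 99) // 100)
    top_peers = {peer for peer, _ in reply_counts.most_common(k)}
    covered = sum(1 for peers in responders_by_query.values() if peers & top_peers)
    return covered / len(responders_by_query)

# Hypothetical trace: 4 queries in a population of 10 peers.
trace = {"q1": {"a"}, "q2": {"a", "b"}, "q3": {"a"}, "q4": {"c"}}
share = coverage_of_top_responders(trace, total_peers=10, top_percent=10)
```

In this toy trace the single most active peer covers three of the four queries, which is the shape of result the slide describes for the real system.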

The Localities of Search Interests of Individual Peers
A peer can get search results from a small number of its top query responders: they share the same search interests
Similar to the idea in the Locality of Interest scheme, but our conclusion is based on real P2P systems
[Figures: query contributions (%) of top query responders and result contributions (%) of top result providers, for top 1, top 10, top 5%, top 10%, and top 20%]

Reorganizing the P2P Management Structure
–Clustering the small number of content-abundant peers
–Prefetching indices from the top query responders

CAC: Constructing Content Abundant Cluster
Objectives
–Clustering the small number of content-abundant peers in the P2P overlay
–Providing high-quality and fast service
Content Abundant Cluster
–An overlay on top of the P2P network
–Self-evaluating, self-identifying, and self-organizing
–Persistent public service for all peers in the system
–Strongly content-based (not index-based)

CAC: System Structure
[Figure: clustering, leveling, and dynamic update of the CAC]

CAC: Search Operations
Queries are sent to the CAC first
–Up-flowing operation
–Flooding in the CAC
Unsatisfied queries are propagated from the CAC to the whole system
–Down-flooding operation
–Propagated from low levels to high levels
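The three phases listed above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes that each peer carries a `level` equal to its hop distance from the cluster (CAC members at level 0), that up-flowing follows any lower-level neighbor, and that down-flooding forwards only to neighbors exactly one level higher. The names `cac_search`, `flood_within`, and `holders` are hypothetical.

```python
from collections import deque

def flood_within(graph, start, allowed):
    """Breadth-first flood from `start`, visiting only peers accepted by `allowed`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen and allowed(neighbor):
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def cac_search(graph, level, holders, start, min_results=1):
    """Return (peers holding the content, phase in which the query finished)."""
    # Phase 1, up-flowing: follow lower-level neighbors until a CAC member is reached.
    node = start
    while level[node] > 0:
        nxt = min(graph[node], key=lambda n: level[n])
        if level[nxt] >= level[node]:
            raise ValueError("no lower-level neighbor; leveling is inconsistent")
        node = nxt
    # Phase 2: flood only among CAC members (level 0).
    cluster = flood_within(graph, node, lambda n: level[n] == 0)
    hits = cluster & holders
    if len(hits) >= min_results:
        return hits, "cac"
    # Phase 3, down-flooding: forward level by level, from low levels to high levels.
    frontier = cluster
    while frontier:
        frontier = {nb for n in frontier for nb in graph[n]
                    if level[nb] == level[n] + 1}
        hits |= frontier & holders
    return hits, "down-flooding"
```

Because down-flooding in this sketch only crosses links that lead one level outward, each peer outside the cluster receives the query along level-increasing paths only, and the remaining links stay unused.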

Up-flowing
[Figure: up-flowing toward the CAC]

Down-flooding
[Figure: down-flooding from the CAC, with unused links marked]

SPIRP: Selectively Prefetching Indices from Responding Peers
Basic operations
–Peer I initiates a query q
  Query hits in the local cache: displays the results
  Misses: sends q
–Peer R responds to query q
  Sends the query results and piggybacks the indices of all its shared files
–Peer I receives the response
  Displays the search results and stores the piggybacked indices
Index updating
–Active updating of indices by responding peers
–Updating of indices on demand by requesting peers
Replacement of file indices
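The basic operations above can be sketched with a small peer class. This is a simplified illustration under stated assumptions, not the authors' code: queries are plain substring matches, the "network" is just a list of peers that all receive a missed query, the cache capacity is counted in responder index sets rather than bytes, and least-recently-used replacement is assumed because the slide does not name a replacement policy. All class, method, and file names are hypothetical.

```python
from collections import OrderedDict

class Peer:
    def __init__(self, name, shared_files=(), cache_capacity=2):
        self.name = name
        self.shared = set(shared_files)
        self.index_cache = OrderedDict()      # responder name -> set of file names
        self.cache_capacity = cache_capacity  # assumed unit: number of index sets

    def respond(self, keyword):
        """Peer R: return (matching files, full index to piggyback), or None."""
        hits = {f for f in self.shared if keyword in f}
        if not hits:
            return None
        return hits, set(self.shared)

    def lookup_cache(self, keyword):
        """Search the prefetched indices before sending anything to the network."""
        found = {}
        for responder, files in self.index_cache.items():
            hits = {f for f in files if keyword in f}
            if hits:
                found[responder] = hits
        for responder in found:
            self.index_cache.move_to_end(responder)  # mark as recently useful
        return found

    def query(self, keyword, network):
        """Peer I: answer from the cache on a hit, otherwise ask the network."""
        cached = self.lookup_cache(keyword)
        if cached:
            return cached, "cache"
        results = {}
        for peer in network:
            if peer is self:
                continue
            reply = peer.respond(keyword)
            if reply is None:
                continue
            hits, full_index = reply
            results[peer.name] = hits
            self.index_cache[peer.name] = full_index  # store piggybacked indices
            self.index_cache.move_to_end(peer.name)
            while len(self.index_cache) > self.cache_capacity:
                self.index_cache.popitem(last=False)  # assumed LRU replacement
        return results, "network"
```

A later query on a related topic can then be answered from the stored index of an earlier responder without generating any network traffic, which is the effect SPIRP relies on.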

SPIRP Technique
[Figure: peer I issues Query = "Beethoven mp3" ("Where are these files?"); responders R1 and R2; labels: pop music, classic music]

SPIRP Technique
[Figure: peer I issues Query = "Beetle mp3" ("Where are these files?"); responders R1 and R2; labels: pop, classic, NULL]

SPIRP Technique
[Figure: Query = "Beetle mp3"; peer I, responders R1 and R2; labels: classic, pop]

SPIRP Technique
[Figure: Query = "Beetle mp3"; not enough space to save indices]

SPIRP Technique
[Figure: Query = "Beetle mp3"; replacement complete]

CAC-SPIRP
CAC: application-level infrastructure
–Significantly reduces bandwidth consumption
–Good response time when queries succeed in the CAC
–Long response time when queries fail in the CAC
SPIRP: client-oriented and overlay-independent
–Significantly reduces response time
–Small traffic when queries can be satisfied in the cache
–Same traffic as flooding on cache misses
CAC-SPIRP
–The two techniques are easy to combine
–Considers the trade-off between the two performance objectives
–Combines the merits of both: search quality and search cost

Simulation Environment
Content trace and query trace
–4-day Gnutella crawl in our measurement
Overlay topology
–Traces by Clip2 Distributed Search Solutions
Session duration
–Pareto distribution fitted from measurement results: P(x) = * x (fitted constants missing in source)
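Session durations following a Pareto distribution can be drawn with Python's standard library. The sketch below is only an illustration of the sampling step: the `shape` and `scale` values are placeholders chosen for the example, since the fitted constants are not legible in this transcript, and the function name is hypothetical.

```python
import random

def sample_session_durations(n, shape, scale, seed=0):
    """Draw n Pareto-distributed durations with minimum value `scale`."""
    rng = random.Random(seed)
    # random.paretovariate(shape) returns values >= 1, so multiply by the scale.
    return [scale * rng.paretovariate(shape) for _ in range(n)]

# Placeholder parameters (assumptions, not the fitted values from the measurement).
durations = sample_session_durations(1000, shape=1.5, scale=60.0)
```

A heavy-tailed distribution of this kind produces many short sessions and a few very long ones, which is why it is a common choice for modeling peer lifetimes.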

Evaluation Metrics
Query success rate
–CAC: success rate in the CAC (normalized to flooding)
–SPIRP: success rate in the local cache (normalized to flooding)
Overall network traffic
–Accumulated communication traffic for all queries, responses, and index transfers (normalized to flooding)
Average response time
–Measured as the number of routing hops (normalized to flooding)
Evaluated for different query satisfaction levels
–1, 10, and 50 results, representing different user demands
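Normalizing to flooding means dividing each measured value by the value that plain flooding produces for the same workload, so that flooding itself scores 1.0. A minimal sketch, with hypothetical metric names and made-up numbers used purely to show the arithmetic:

```python
def normalize_to_flooding(metrics, flooding):
    """Divide each metric by the corresponding flooding baseline."""
    return {name: metrics[name] / flooding[name] for name in metrics}

# Made-up example values, not results from the evaluation.
flooding_baseline = {"traffic": 1000.0, "response_hops": 4.0}
scheme = {"traffic": 250.0, "response_hops": 3.0}
normalized = normalize_to_flooding(scheme, flooding_baseline)
```

With these example inputs, a normalized traffic below 1.0 means the scheme sends less traffic than flooding, and a normalized response time below 1.0 means it answers in fewer hops.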

Performance Evaluation for CAC
The top 5% of content-abundant peers are good enough for cluster construction
[Figures: success rate in CAC (normalized), overall traffic (normalized), and average response time (normalized) versus cluster size (in percentage of P2P network size), for minimum results = 1, 10, and 50]

CAC Member Selection
[Figures: success rate in CAC (normalized), average response time (normalized), and overall traffic (normalized) versus success response rate of CAC peers, for minimum results = 1, 10, and 50]
–Overall traffic is not sensitive to CAC member quality
–Traffic can be significantly reduced even for randomly selected CAC members
–CAC down-flooding is very efficient

CAC-SPIRP Overall Performance
[Figures: success rate in local cache, overall traffic (normalized), and average response time (normalized) versus size of incoming index set buffer (in MBytes), for query satisfaction = 1, 10, and 50; curves for peers having 1 to 5, 10 to 20, 30 to 40, and at least 50 queries satisfied]
CAC-SPIRP significantly reduces both the overall traffic and the response time

Conclusion
CAC-SPIRP fundamentally addresses the P2P search problem through reorganization.
–Exploiting organizational content locality
  CAC: a content-abundant cluster provides high-quality and fast services.
–Exploiting user content locality
  SPIRP: a client prefetching technique that speeds up search by avoiding unnecessary queries