Content Distribution in Unstructured Peer-to-Peer Networks Daniel Stutzbach Committee Members: Professor Reza Rejaie Professor Ginnie Lo Professor Art.

Presentation transcript:

Content Distribution in Unstructured Peer-to-Peer Networks Daniel Stutzbach Committee Members: Professor Reza Rejaie Professor Ginnie Lo Professor Art Farley

Why Peer-to-Peer? Introduction

Why Study Peer-to-Peer? Peers on the edge band together to share resources. Peer-to-peer can self-scale. Peer-to-peer applications are becoming increasingly popular. File Sharing: Kazaa, Gnutella Bandwidth Sharing: BitTorrent Cycle Sharing: UW found that file-sharing uses 3 times as much bandwidth as the Web. Introduction

Challenges in Peer-to-Peer Peer-to-Peer is more complicated. Discovering and managing resources is more difficult because: The resources are distributed. Peers are not under the control of one authority. Peers are unstable. Peers have heterogeneous resources to provide. It’s harder to measure a deployed system. Introduction

Research on Peer-to-Peer Characterizing deployed systems: Provides insight into how P2P systems behave in the real world. Is needed to develop accurate models for simulation. Design of new techniques to leverage peer-to-peer resources: Overlay construction Search mechanisms Resource allocation Introduction

Topics Covered Peer-to-Peer Search Keyword search for matching filenames Example implementations: Gnutella, Kazaa, eDonkey 2000 Example topologies: ad-hoc mesh, ultrapeers Searching for files to transfer Peer-to-Peer Transfer Spread parts of files quickly to scale and alleviate flash-crowds Example implementations: eDonkey 2000, BitTorrent, Slurpie Example topologies: ad-hoc mesh, a forest of trees Searching for peers with the right blocks or more bandwidth Peer-to-Peer Streaming Spread parts of a stream quickly to scale and alleviate flash-crowds Example implementations: CoopNet, SplitStream, PRO Example topologies: ad-hoc mesh, a forest of trees Searching for peers with the right layers, more bandwidth, or lower delay. Introduction

Relationship of Topics Streaming Similar Functions Search Transfer Components of File-Sharing Introduction

Overview Background Measurement and Characterization Design: Peer-to-Peer Search Peer-to-Peer Transfer Peer-to-Peer Streaming Introduction

Peer-to-Peer Search To the user, all file-sharing programs look pretty much the same: like a search engine. However, they operate in different ways. Background: Peer-to-Peer Search

Gnutella Classic Background: Peer-to-Peer Search

Ultrapeers Leaves Background: Peer-to-Peer Search

Peer-to-Peer Transfer Background: Peer-to-Peer Transfer A source has been located. Now we need to download the file. Split the file into blocks. Download blocks from wherever we can.
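The two steps on this slide (split the file into blocks, then fetch blocks from whichever peer has them) can be sketched as follows. This is a minimal illustration, not any particular client's protocol: the 4-byte block size, the dictionary-based peers, and the SHA-1 check on each block are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def download(block_hashes: list[str], peers: list[dict[int, bytes]]) -> bytes:
    """Fetch each block from the first peer holding a copy that matches its hash."""
    blocks = []
    for index, expected in enumerate(block_hashes):
        for peer in peers:
            block = peer.get(index)
            if block is not None and hashlib.sha1(block).hexdigest() == expected:
                blocks.append(block)
                break
        else:
            raise LookupError(f"no peer has a valid copy of block {index}")
    return b"".join(blocks)


original = b"peer-to-peer transfer"
all_blocks = split_into_blocks(original)
hashes = [hashlib.sha1(b).hexdigest() for b in all_blocks]

# Two hypothetical peers, each holding only part of the file.
peer_a = {i: b for i, b in enumerate(all_blocks) if i % 2 == 0}
peer_b = {i: b for i, b in enumerate(all_blocks) if i % 2 == 1}

reassembled = download(hashes, [peer_a, peer_b])
```

Neither peer holds the whole file, yet the downloader can still reassemble it, which is the property that lets block-based transfer scale with the number of peers.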

Peer-to-Peer Streaming A source has been located. Now we need to view the stream. Split the file into encoded sub-streams. Listen to as many sub-streams as we can. Timing matters. But we don’t need all streams. Background: Peer-to-Peer Streaming

Characterization Measurement Techniques Characterizations Measurement

Measurement Techniques There are five basic approaches to measuring peer-to-peer systems: Interception Participation Crawling Probing Centralized Measurement: Techniques

Interception Pros: It can monitor many users. It can observe transfers. It can observe throughput. Cons: It captures a biased cross-section. It misses quiet peers. Measurement: Techniques

Participation Pros: It can capture a cross-section of overlay traffic. It can compare different open-source implementations. Cons: It assumes the measurement node is “typical”. It tells us nothing about atypical nodes. It’s harder to do with closed-source software. Measurement: Techniques

Crawling Pros: It can capture the topology. It provides a global perspective. It can capture the entire peer population. Cons: The network changes while the crawler runs. It’s hard to verify the accuracy of the crawler. Measurement: Techniques
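As a rough illustration of crawling, the sketch below walks an overlay breadth-first, asking each discovered peer for its neighbor list. The `get_neighbors` callback and the toy overlay are hypothetical stand-ins for whatever neighbor-list request a real protocol offers; the sketch also assumes a static network, which is exactly what the first "con" above says a real crawler cannot count on.

```python
from collections import deque


def crawl(seed, get_neighbors):
    """Breadth-first crawl; returns the peers found and the directed edges seen."""
    seen = {seed}
    queue = deque([seed])
    edges = set()
    while queue:
        peer = queue.popleft()
        for neighbor in get_neighbors(peer):
            edges.add((peer, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen, edges


# Toy overlay (an assumption for the example): peer -> list of neighbors.
overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
peers, edges = crawl("a", lambda p: overlay.get(p, []))
```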

Probing Pros: It’s easy to do. It can capture many peer characteristics. Cons: Sample population may be biased based on: Degree Availability Files shared Measurement: Techniques

Centralized Pros: It provides global knowledge of some aspects. It’s easy to do for systems with a central component. Cons: Most peer-to-peer systems don’t have a centralized component. Measurement: Techniques

Characterization Churn File Characteristics Peer Characteristics Query Characteristics Topology Implementation Characteristics Measurement: Characterization

Churn
Citation   Systems Observed    Session Time
[SGG02]    Gnutella, Napster   50% <= 60 min.
[CLL02]    Gnutella, Napster   31% <= 10 min.
[SW04]     Kazaa               50% <= 1 min.
[BSV03]    Overnet             50% <= 60 min.
[GDS+03]   Kazaa               50% <= 2.4 min.
Adapted from [RGK04]
Measurement: Characterization: Churn

Churn: Open Issues Existing results are not consistent. The implications of churn on the topology are not well-understood. The downtime distribution is unknown. Correlations between uptime, downtime, and future up and downtimes have not been examined. Measurement: Characterization: Churn

File Characteristics: Storage The popularity of files stored follows a Zipf distribution Measurement: Characterization: File Characteristics

Zipf From [FHKM04] Measurement: Characterization: File Characteristics

File Characteristics: Storage The popularity of files stored follows a Zipf distribution. The 10% most popular files make up 50% of all stored bytes. The most popular files are around 4 MB. However, the 3% of files which are videos make up 21% of stored bytes. Most files are shared by a small fraction of users. 25%-67% of users share no files at all. Measurement: Characterization: File Characteristics
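To give a feel for how skewed a Zipf distribution is, the sketch below computes the share of all stored copies held by the most popular files when the file of rank r has popularity proportional to 1/r^alpha. The exponent `alpha = 1.0` and the catalogue size of 1,000 files are assumptions chosen for illustration; they are not fitted to the measurements quoted above, and the sketch counts copies rather than bytes.

```python
def zipf_share(num_files: int, top_fraction: float, alpha: float = 1.0) -> float:
    """Fraction of total popularity held by the top `top_fraction` of files."""
    weights = [rank ** -alpha for rank in range(1, num_files + 1)]
    top = round(num_files * top_fraction)
    return sum(weights[:top]) / sum(weights)


share = zipf_share(num_files=1000, top_fraction=0.10)
print(f"top 10% of files hold {share:.0%} of copies")
```

Under these assumed parameters the top tenth of the catalogue holds roughly 69% of the copies, so a small head of popular files dominates what is stored.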

File Characteristics: Clustering 30% of files have a correlation of at least 60% with at least one other file. If two peers have 10 files in common, there’s an 80% chance they have at least one more file in common. Treating users as nodes and adding an edge wherever two users have more than N files in common produces a small-world graph. Measurement: Characterization: File Characteristics
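The graph construction described on this slide can be sketched as below: users are nodes, and two users are joined when they share more than N files. The user names and file sets are made up for the example, and checking whether the result is a small world (short paths, high clustering) is left out.

```python
from itertools import combinations


def shared_file_graph(libraries: dict[str, set[str]], n: int) -> set[tuple[str, str]]:
    """Return an edge (u, v) for every pair of users with more than n files in common."""
    edges = set()
    for (u, files_u), (v, files_v) in combinations(sorted(libraries.items()), 2):
        if len(files_u & files_v) > n:
            edges.add((u, v))
    return edges


# Hypothetical users and their shared files.
libraries = {
    "alice": {"a", "b", "c"},
    "bob": {"b", "c", "d"},
    "carol": {"c", "x"},
}
edges = shared_file_graph(libraries, n=1)
```

Here only alice and bob share more than one file, so theirs is the only edge; lowering N to 0 would connect all three users.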

File Characteristics: Transfers 90% of files transferred are smaller than 10 MB. Most bytes transferred are part of files larger than 700 MB. The most popular files are roughly equal in popularity, while unpopular files follow Zipf. Measurement: Characterization: File Characteristics

Not-So-Zipf Taken from [GDS+03] Measurement: Characterization: File Characteristics

File Characteristics: Transfers 90% of files transferred are smaller than 10 MB. Most bytes transferred are part of files larger than 700 MB. The most popular files are roughly equal in popularity, while unpopular files follow Zipf. The most popular 5% of transferred files account for 50% of all transfers. That’s around 45,000 songs which can be stored in 175 GB. An inverse cache can result in a savings between 67%-86%. Measurement: Characterization: File Characteristics
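As a quick sanity check on the cache-size figure, the arithmetic below divides the quoted 175 GB by the quoted 45,000 songs. The only assumption is decimal units (1 GB = 1000 MB).

```python
SONGS = 45_000
CACHE_GB = 175
MB_PER_GB = 1000  # assumption: decimal units

average_song_mb = CACHE_GB * MB_PER_GB / SONGS
print(f"{average_song_mb:.1f} MB per song")  # prints "3.9 MB per song"
```

The result, just under 4 MB per song, is consistent with the earlier observation that the most popular stored files are around 4 MB.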

File Characteristics: Open Issues The shift in popularity of files over time is not well understood, requiring observations over several months. It would be interesting to see if correlations between files can be used to predict which files a user will want. No studies have characterized the swarming download feature included in many modern file-sharing applications. Measurement: Characterization: File Characteristics

Query Characteristics The most popular queries are of relatively equal popularity, while less popular queries follow Zipf. There is little relationship between sharing many files and responding to many queries. 40% of queries are duplicates. Measurement: Characterization: Query Characteristics

Query Characteristics: Open Issues The relationships between query, transfer, and file popularities are not well-understood. Queries are composed of several search terms, which we know little about. We don’t know how long query results typically remain valid. Measurement: Characterization: Query Characteristics

Topology: Open Issues The most recent published topology data is from mid At the time, Gnutella had around 50,000 peers. Today, it has more than 1 million. The introduction of Ultrapeers has drastically altered the topology. Those crawls took an hour or more to complete, but the median peer lifetime may be just a few minutes. No topology studies have been done on peer-to-peer networks other than Gnutella. Measurement: Characterization: Topology Characteristics

Characterization Summary Churn File Characteristics Peer Characteristics Query Characteristics Topology Implementation Characteristics Measurement: Characterization

Designing Peer-to-Peer Search The Conventional Wisdom: Flooding doesn’t scale. Improvements: Use ultrapeers Walk instead of flood Index replication Interest-based short-cuts Consider distributed hash tables (DHTs) Overlay-to-Internet topology matching Design: Search

Walking Directed walks are globally efficient, but send all the query traffic to certain nodes. Random walks are efficient over an ultrapeer network, but are slow for less popular results. K-random walks are efficient and fast. Issues: How far does it scale? There are subtle issues that make it hard to implement. Design: Search: Walking
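A k-random walk can be sketched as k independent walkers that each hop to a random neighbor until one of them lands on a peer holding the file or the hop budget runs out. The function name, the `has_file` callback, the defaults `k=4` and `ttl=16`, and the toy ring overlay are all assumptions for illustration; real systems also need termination checks and duplicate suppression, which are among the subtle issues noted above.

```python
import random


def k_random_walk(graph, start, has_file, k=4, ttl=16, rng=None):
    """Run k walkers in lock-step; return (peer, hops) for the first hit or (None, ttl)."""
    rng = rng or random.Random()
    walkers = [start] * k
    for step in range(1, ttl + 1):
        next_walkers = []
        for node in walkers:
            node = rng.choice(graph[node])  # each walker hops to a random neighbor
            if has_file(node):
                return node, step
            next_walkers.append(node)
        walkers = next_walkers
    return None, ttl


# Toy overlay: a ring of 8 peers where only peer 5 has the file.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
hit, hops = k_random_walk(ring, start=0, has_file=lambda p: p == 5,
                          rng=random.Random(1))
```

With k walkers, each step costs k messages, so the query load grows linearly with the number of hops rather than exponentially as in flooding.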

Index Replication It’s the other side of the coin: bringing the indexing information closer to the queries. Ultrapeers are a special case of index replication. Most file-sharing systems use some type of proportional indexing. Napster and DHT-based systems use uniform indexing. However, the optimal index replication system is square-root indexing. [CS02] Open Issues: Highly distributed, unstructured indexing schemes may not work well under heavy churn. Additional indexing doesn’t help for queries that have no matches. Design: Search: Index Replication
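The three allocation policies named on this slide can be compared with a small sketch. It assumes a simple cost model in which the expected number of probes needed to find an item is inversely proportional to how many index copies that item has, so the average search cost is the sum of q_i / r_i over all items (q_i is the query rate, r_i the number of copies). The query rates and the budget of 100 copies are made-up numbers.

```python
import math


def allocate(query_rates, budget, strategy):
    """Split `budget` index copies across items according to the chosen policy."""
    if strategy == "uniform":
        weights = [1.0] * len(query_rates)
    elif strategy == "proportional":
        weights = list(query_rates)
    elif strategy == "square-root":
        weights = [math.sqrt(q) for q in query_rates]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [budget * w / total for w in weights]


def expected_search_cost(query_rates, copies):
    """Average cost under the assumed model: cost of item i is proportional to 1 / r_i."""
    return sum(q / r for q, r in zip(query_rates, copies))


rates = [0.5, 0.3, 0.15, 0.05]  # hypothetical query popularity, sums to 1
costs = {
    s: expected_search_cost(rates, allocate(rates, budget=100, strategy=s))
    for s in ("uniform", "proportional", "square-root")
}
```

Under this model uniform and proportional allocation give the same average cost, while square-root allocation gives a lower one, which matches the ordering stated on the slide.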

Distributed Hash Tables Using a DHT to index all the files in a file-sharing network is daunting. Let’s just index the unpopular files, using a heuristic to decide which files are unpopular. Issues: How good are the heuristics in practice? DHTs are difficult to incrementally deploy. Design: Search: DHT

Designing Peer-to-Peer Search Distributed search has improved dramatically from the early pure-flooding Gnutella. We don’t have a good “network health” meter to tell us how well a file-sharing network is performing. Design: Search

Designing Peer-to-Peer Transfer No papers on incremental improvements. Systems: BitTorrent Slurpie Rateless Codes Design: Transfer

BitTorrent Unique features: Connect, and the data will come. Reward those uploading the most. Modeling studies show… BitTorrent handles an initial flash-crowd well. However, it does not do as well when a second flash-crowd arrives. Issues: We don’t know enough about where BitTorrent’s bottlenecks are, when it works well, and when it doesn’t. Design: Transfer
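The "reward those uploading the most" idea can be sketched as a peer picking which neighbors to serve based on how fast each has recently uploaded to it. This is a deliberate simplification: the function name, the four-slot default, and the rate table are assumptions, and real clients also periodically serve a randomly chosen peer so that newcomers get a chance.

```python
def choose_unchoked(upload_rates: dict[str, float], slots: int = 4) -> list[str]:
    """Pick the `slots` peers that have uploaded to us fastest (ties broken by name)."""
    ranked = sorted(upload_rates, key=lambda peer: (-upload_rates[peer], peer))
    return ranked[:slots]


# Hypothetical recent upload rates from each neighbor, in KB/s.
rates_seen = {"peer-a": 10.0, "peer-b": 50.0, "peer-c": 30.0, "peer-d": 0.0}
unchoked = choose_unchoked(rates_seen, slots=2)
```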

Slurpie Unique features: Estimates the total peer network size Constant load on the root server Chooses a block to download first, then looks for peers Open Issues: Just a prototype Limited head-to-head experiments against BitTorrent Design: Transfer

Rateless Codes Unique features: Other systems need to worry about finding the right block. Using encoding, we can make nearly any block the right block. Modern rateless codes can generate “practically infinite” codes in O(1) time per block. Issues: It’s unknown if finding the right block is a significant problem in existing systems. Design: Transfer
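To illustrate how encoding can make nearly any block the right block, the sketch below uses a random linear fountain code over GF(2): every encoded block is the XOR of a random subset of source blocks, and the receiver decodes by Gaussian elimination once it holds enough independent blocks. This is a simpler stand-in chosen for clarity, not the codes the slide refers to, and it does not achieve O(1) time per block. Blocks are modeled as small integers, and all names are hypothetical.

```python
import random


def encode_block(source, rng):
    """Return (mask, payload): payload is the XOR of the source blocks selected by mask."""
    k = len(source)
    mask = rng.randrange(1, 1 << k)  # random non-empty subset, stored as a bitmask
    payload = 0
    for i in range(k):
        if (mask >> i) & 1:
            payload ^= source[i]
    return mask, payload


class Decoder:
    """Incremental Gaussian elimination over GF(2)."""

    def __init__(self, k):
        self.k = k
        self.rows = {}  # highest set bit of a row's mask -> (mask, payload)

    def add(self, mask, payload):
        # Cancel every known pivot bit, highest first, then store what remains.
        for pivot in sorted(self.rows, reverse=True):
            if (mask >> pivot) & 1:
                pivot_mask, pivot_payload = self.rows[pivot]
                mask ^= pivot_mask
                payload ^= pivot_payload
        if mask:  # a zero mask means the block added no new information
            self.rows[mask.bit_length() - 1] = (mask, payload)

    def solve(self):
        """Return the source blocks, or None if more encoded blocks are needed."""
        if len(self.rows) < self.k:
            return None
        source = [0] * self.k
        for pivot in range(self.k):  # back-substitute from the lowest pivot up
            mask, payload = self.rows[pivot]
            for j in range(pivot):
                if (mask >> j) & 1:
                    payload ^= source[j]
            source[pivot] = payload
        return source


rng = random.Random(2024)
source = [0x11, 0x22, 0x33, 0x44]
decoder = Decoder(len(source))
received = 0
while decoder.solve() is None:  # any encoded blocks will do; order is irrelevant
    decoder.add(*encode_block(source, rng))
    received += 1
recovered = decoder.solve()
```

The receiver never asks for a specific block. It simply collects encoded blocks from any peer until decoding succeeds, typically after slightly more than k of them.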

Peer-to-Peer File Transfers: Open Issues We don’t have a solid understanding of where the bottlenecks are in existing systems. We don’t have good metrics for determining where they are. We don’t have good models for comparing new systems against old ones. Design: Transfer

Peer-to-Peer Streaming User characteristics are well-understood based on measurements of client-server streaming. Multiple Description Coding: the breakthrough CoopNet uses multiple trees, organized by a central server. SplitStream uses multiple trees, organized using a DHT. PRO proposes a decentralized gossip scheme to build a loose mesh. Open Issues: CoopNet and SplitStream are delay-optimized, not bandwidth-optimized. PRO is still in development. No significant deployment. Design: Streaming

Conclusion While much has been done in the area of peer-to-peer content distribution, there are still many open avenues. I presently have papers under submission regarding: Developing metrics to measure the accuracy of a new, efficient topology crawler Characterizing the modern Gnutella topology Characterizing churn in peer-to-peer networks Demonstrating the effectiveness of peer-to-peer file transfers to handle flash crowds Conclusion