CS 525 Advanced Distributed Systems Spring 09 Indranil Gupta Lecture 2 Introduction to Peer to Peer Systems January 22, 2009.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
INF 123 SW ARCH, DIST SYS & INTEROP LECTURE 12 Prof. Crista Lopes.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 9-10: Peer-to-peer Systems All slides © IG.
Gnutella 2 GNUTELLA A Summary Of The Protocol and it’s Purpose By
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
Gnutella, Freenet and Peer to Peer Networks By Norman Eng Steven Hnatko George Papadopoulos.
Internet Networking Spring 2002 Tutorial 13 Web Caching Protocols ICP, CARP.
1 Seminar: Information Management in the Web Gnutella, Freenet and more: an overview of file sharing architectures Thomas Zahn.
Indranil Gupta (Indy) September 21, 2010 Lecture 9 Peer-to-peer Systems I Reading: Gnutella paper on website  2010, I. Gupta Computer Science 425 Distributed.
Security in P2P Networks A study of the gnutella protocol and it’s weaknesses By: Imran Qureshi Date: December 9, 2004.
CS 525 Advanced Distributed Systems Spring 2015 Indranil Gupta (Indy) Lecture 4, 5 Peer to Peer Systems January 29, 2015 All slides © IG 1.
Improving Data Access in P2P Systems Karl Aberer and Magdalena Punceva Swiss Federal Institute of Technology Manfred Hauswirth and Roman Schmidt Technical.
Peer-peer and Application-level Networking CS 218 Fall 2003 Multicast Overlays P2P applications Napster, Gnutella, Robust Overlay Networks Distributed.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
KaZaA: Behind the Scenes Shreeram Sahasrabudhe Lehigh University
P2P File Sharing Systems
Peer-to-Peer Computing CS587x Lecture Department of Computer Science Iowa State University.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
Introduction Widespread unstructured P2P network
Lecture 12 Peer-to-Peer systems (Search Capabilities in Distributed Systems) Sections 10.1, 10.2, plus Paper “The Gnutella Protocol Specification v0.4”
Indranil Gupta (Indy) September 26, 2013 Lecture 10 Peer-to-peer Systems I Reading: Gnutella paper on website  2013, I. Gupta Computer Science 425 Distributed.
By Shobana Padmanabhan Sep 12, 2007 CSE 473 Class #4: P2P Section 2.6 of textbook (some pictures here are from the book)

Peer-to-Peer Overlay Networks. Outline Overview of P2P overlay networks Applications of overlay networks Classification of overlay networks – Structured.
1 Telematica di Base Applicazioni P2P. 2 The Peer-to-Peer System Architecture  peer-to-peer is a network architecture where computer resources and services.
Lecture 13 Peer-to-Peer systems (Part II) CS 425/ECE 428/CSE 424 Distributed Systems (Fall 2009) CS 425/ECE 428/CSE 424 Distributed Systems (Fall 2009)
Introduction of P2P systems
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications Xiaozhou Li COS 461: Computer Networks (precept 04/06/12) Princeton University.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Sep 15-17, 2015 Lecture 7-8: Peer-to-peer Systems All slides © IG.
The Start Shawn Fanning (19-yr-old student nicknamed Napster) developed the original Napster application and service in January 1999 while a freshman.
2: Application Layer1 Chapter 2: Application layer r 2.1 Principles of network applications  app architectures  app requirements r 2.2 Web and HTTP r.
SIGCOMM 2001 Lecture slides by Dr. Yingwu Zhu Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.
1 Peer-to-Peer Technologies Seminar by: Kunal Goswami (05IT6006) School of Information Technology Guided by: Prof. C.R.Mandal, School of Information Technology.
PEER TO PEER (P2P) NETWORK By: Linda Rockson 11/28/06.
Peer to Peer A Survey and comparison of peer-to-peer overlay network schemes And so on… Chulhyun Park
1 Indranil Gupta (Indy) Lecture 4 Peer to Peer Systems January 30, 2014 All Slides © IG CS 525 Advanced Distributed Systems Spring 2014.
15-744: Computer Networking L-22: P2P. Lecture 22: Peer-to-Peer Networks Typically each member stores/provides access to content Has quickly.
Computer Networking P2P. Why P2P? Scaling: system scales with number of clients, by definition Eliminate centralization: Eliminate single point.
Indranil Gupta (Indy) September 27, 2012 Lecture 10 Peer-to-peer Systems II Reading: Chord paper on website (Sec 1-4, 6-7)  2012, I. Gupta Computer Science.
Peer-to-Peer By Rui Zhang, Chen Teng, Li Dong, Quanshuan He & Yongzheng Zhang.
NETE4631 Network Information Systems (NISs): Peer-to-Peer (P2P) Suronapee, PhD 1.
ADVANCED COMPUTER NETWORKS Peer-Peer (P2P) Networks 1.
Peer-to-peer systems (part I) Slides by Indranil Gupta (modified by N. Vaidya)
A Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks E. Damiani S. De Capitani di Vimercati S. Paraboschi P. Samarati F.
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1.
P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.
P2P Search COP P2P Search Techniques Centralized P2P systems  e.g. Napster, Decentralized & unstructured P2P systems  e.g. Gnutella.
CS Spring 2012 CS 414 – Multimedia Systems Design Lecture 37 – Introduction to P2P (Part 1) Klara Nahrstedt.
1 Indranil Gupta (Indy) Lecture 4 Peer to Peer Systems January 28, 2010 All Slides © IG CS 525 Advanced Distributed Systems Spring 2010.
1 Indranil Gupta (Indy) Lecture 4 Peer to Peer Systems January 27, 2011 All Slides © IG CS 525 Advanced Distributed Systems Spring 2011.
CS 525 Advanced Distributed Systems Spring 2016 Indranil Gupta (Indy) Lecture 4, 5 Peer to Peer Systems January 28, 2016 All slides © IG 1.
Distributed Systems Lecture 10 P2P systems 1. Previous lecture Leader election – Problem – Algorithms 2.
CS 425 / ECE 428 Distributed Systems Fall 2015 Indranil Gupta (Indy) Peer-to-peer Systems All slides © IG.
CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 24 – Introduction to Peer-to-Peer (P2P) Systems Klara Nahrstedt (presented by Long Vu)
Lecture 7: Peer-to-peer Systems I
BitTorrent Vs Gnutella.
Lecture 7: Peer-to-peer Systems I
CS 525 Advanced Distributed Systems Spring 2017
Internet Networking recitation #12
CS 525 Advanced Distributed Systems Spring 2018
Unstructured Routing : Gnutella and Freenet
Lecture 7: Peer-to-peer Systems I
Presentation transcript:

CS 525 Advanced Distributed Systems Spring 09 Indranil Gupta Lecture 2 Introduction to Peer to Peer Systems January 22, 2009

Announcement – Classroom Change! From next week onwards, lecture will be in –1131 Siebel Center (Starting Tuesday 27 January) –NOT in 1302 Siebel Center (different class here)

D.S. Theory Peer to peer systems Cloud Computing Sensor Networks

(This week and next) D.S. Theory (Next-next week) Basics (Next-next week) Peer to peer systems Cloud Computing

Why Study Peer to peer systems? To understand how they work To build your own peer to peer system To understand the techniques and principles within them To modify, adapt, reuse these techniques and principles in other related areas –Cloud computing –Sensor networks To grow the body of knowledge about distributed systems

Some Questions Why do people get together? –to share information –to share and exchange resources they have books, class notes, experiences, videos, music cd’s How can computers help people –find information –find resources –exchange and share resources

Existing technologies: The Web! –Search engines –Forums: chat rooms, blogs, ebay –Online business But, the web is heavy weight if you want specific resources: say a Beatles’ song “PennyLane” A google search will give you their bio, lyrics, chords, articles on them, and then perhaps the mp3 But you want only the song, nothing else! If you can find a peer who wouldn’t mind exchanging her Beatles mp3 songs for your UIUC Homeocoming videos, that would be great! –Napster: a solution light weight that was lighter than the Web

A Brief History [6/99] Shawn Fanning (freshman Northeastern U.) releases Napster online music service [12/99] RIAA sues Napster, asking $100K per download [3/00] 25% UWisc traffic Napster, many universities ban it [00] 60M users [2/01] US Federal Appeals Court: users violating copyright laws, Napster is abetting this [9/01] Napster decides to run paid service, pay % to songwriters and music companies [Today] Napster protocol is open, people free to develop opennap clients and servers

Napster Structure S S S P P PP P P Client machines (“Peers”) napster.com Servers Store their own files Store a directory, i.e., filenames with peer pointers Filename Info about PennyLane.mp :1006 …..

Napster Operations Client Connect to a Napster server Upload list of music files that you want to share –Server maintains list of tuples Search –Send server keywords to search with –(Server searches its list with the keywords) –Server returns a list of hosts - tuples - to client –Client pings each host in the list to find transfer rates –Client fetches file from best host All communication uses TCP

Napster Search S S S P P PP P P Peers napster.com Servers Store their own files Store peer pointers for all files 3. Response 1. Query 2. All servers search their lists (ternary tree algo.) 4. ping candidates 5. download from best host

Problems Centralized server a source of congestion Centralized server single point of failure No security: plaintext messages and passwds napster.com declared to be responsible for users’ copyright violation –“Indirect infringement”

Gnutella Eliminate the servers Client machines search and retrieve amongst themselves Clients act as servers too, called servents [3/00] release by AOL, immediately withdrawn, but 88K users by 3/03 Original design underwent several modifications

Gnutella P P P P P P Servents (“Peers”) P Connected in an overlay graph (== each link is an implicit Internet path) Store their own files Also store “peer pointers”

How do I search for my Beatles file? Gnutella routes different messages within the overlay graph Gnutella protocol has 5 main message types –Query (search) –QueryHit (response to query) –Ping (to probe network for other peers) –Pong (reply to ping, contains address of another peer) –Push (used to initiate file transfer) We’ll go into the message structure and protocol now (note: all fields except IP address are in little-endian format)

Descriptor ID Payload descriptor TTL Hops Payload length Descriptor Header Type of payload 0x00 Ping 0x01 Pong 0x40 Push 0x80 Query 0x81 Queryhit Decremented at each hop, Message dropped when ttl=0 ttl_initial usually 10 Incremented at each hop ID of this search transaction Number of bytes of message following this header Payload Gnutella Message Header Format

Minimum Speed Search criteria (keywords) Query (0x80) 0 1 ….. Payload Format in Gnutella Query Message

Gnutella Search P P P P P P P Who has PennyLane.mp3? Query’s flooded out, ttl-restricted, forwarded only once TTL=2

Num. hits port ip_address speed (fileindex,filename,fsize) servent_id n n+16 QueryHit (0x81) : successful result to a query Results Unique identifier of responder; a function of its IP address Info about responder Payload Format in Gnutella Query Reply Message

Gnutella Search P P P P P P P Who has PennyLane.mp3? Successful results QueryHit’s routed on reverse path

Avoiding excessive traffic To avoid duplicate transmissions, each peer maintains a list of recently received messages Query forwarded to all neighbors except peer from which received Each Query (identified by DescriptorID) forwarded only once QueryHit routed back only to peer from which Query received with same DescriptorID Duplicates with same DescriptorID and Payload descriptor (msg type) are dropped QueryHit with DescriptorID for which Query not seen is dropped

After receiving QueryHit messages Requestor chooses “best” QueryHit responder –Initiates HTTP request directly to responder’s ip+port GET /get/ / /HTTP/1.0\r\n Connection: Keep-Alive\r\n Range: bytes=0-\r\n User-Agent: Gnutella\r\n \r\n Responder then replies with file packets following this message: HTTP 200 OK\r\n Server: Gnutella\r\n Content-type:application/binary\r\n Content-length: 1024 \r\n \r\n HTTP is the file transfer protocol. Why? Why the “range” field in the GET request? What if responder is behind firewall that disallows incoming connections?

Dealing with Firewalls P P P P P P P Requestor sends Push to responder asking for file transfer Has PennyLane.mp3 But behind firewall (Why is the Push routed and not sent directly?)

servent_id fileindex ip_address port Push (0x40) same as in received QueryHit Address at which requestor can accept incoming connections

Responder establishes a TCP connection at ip_address, port specified. Sends GIV : / \n\n Requestor then sends GET to responder (as before) and file is transferred What if requestor is behind firewall too? –Gnutella gives up –Can you think of an alternative solution?

Ping-Pong Peers initiate Ping’s periodically Ping’s flooded out like Query’s, Pong’s routed along reverse path like QueryHit’s Pong replies used to update set of neighboring peers to keep neighbor lists fresh in spite of peers joining, leaving and failing Port ip_address Num. files shared Num. KB shared Pong (0x01) Ping (0x00) no payload

Gnutella Summary No servers Peers/servents maintain “neighbors”, this forms an overlay graph Peers store their own files Queries flooded out, ttl restricted QueryHit (replies) reverse path routed Supports file transfer through firewalls Periodic Ping-pong to continuously refresh neighbor lists –List size specified by user at peer : heterogeneity means some peers may have more neighbors –Gnutella found to follow power law distribution: P(#links = L) ~ (k is a constant)

Problems Ping/Pong constituted 50% traffic –Solution: Multiplex, cache and reduce frequency of pings/pongs Repeated searches with same keywords –Solution: Cache Query, QueryHit messages Modem-connected hosts do not have enough bandwidth for passing Gnutella traffic –Solution: use a central server to act as proxy for such peers –Another solution:  FastTrack System (in a few slides)

Problems (contd.) Large number of freeloaders –70% of users in 2000 were freeloaders –Only download files, never upload own files Flooding causes excessive traffic –Is there some way of maintaining meta-information about peers that leads to more intelligent routing?  Structured Peer-to-peer systems e.g., Chord System (next lecture)

FastTrack Hybrid between Gnutella and Napster Takes advantage of “healthier” participants in the system Underlying technology in Kazaa, KazaaLite, Grokster Proprietary protocol, but some details available Like Gnutella, but with some peers designated as supernodes

A FastTrack-like System P P P P Peers S S Supernodes P

FastTrack (contd.) A supernode stores a directory listing ( ), similar to Napster servers Supernode membership changes over time Any peer can become (and stay) a supernode, provided it has earned enough reputation –Kazaalite: participation level (=reputation) of a user between 0 and 1000, initially 10, then affected by length of periods of connectivity and total number of uploads –More sophisticated Reputation schemes invented, especially based on economics (See P2PEcon workshop) A peer searches by contacting a nearby supernode

Wrap-up Notes Applies to all p2p systems How does a peer join the system –Send an http request to well known url –Message routed (after DNS lookup) to a well known server that then initializes new peers’ neighbor table –Server only maintains a partial list of online clients Lookups can be speeded up by having each peer cache: –Queries and their results that it sees –All directory entries (filename,host) mappings that it sees –The files that pass through it

Summary Napster: protocol overview, more details available on webpage Gnutella protocol FastTrack protocol Protocols continually evolving, software for new clients and servers conforming to respective protocols: developer forums at –Napster: –Gnutella: Others –Peer to peer working groups:

DHT=Distributed Hash Table A hash table allows you to insert, lookup and delete objects with keys A distributed hash table allows you to do the same in a distributed setting (objects=files) Napster, Gnutella, FastTrack are all DHTs (sort of) So is Chord, a structured peer to peer system that we study next

As you leave Today’s Lecture… What new design techniques have you learnt during this lecture? Think about at least 3 new techniques you have learnt in this class.

For this Weekend 1/27: Read “Chord” paper from website 1/29: Read the papers on the Grid and Cloud Computing Look for prospective presentation sessions – sign up early and you’ll get your top preferences No reviews required yet

Course Projects Project –Conference-quality paper –Novel idea solving useful problem, backed up with good implementation and experimental evaluation Take a look at conference papers arising out of previous versions of this course (CS598IG/CS525) –Fall 03: 9/12 project papers in conferences and journals –Fall 04, Spring 06, Spring 07, Spring 08: Many under review in conferences and workshops, similar success rates expected Focus this semester: Data-intensive Cloud Computing –Other project areas also allowed

Projects and Testbeds Please form your project groups soon (2 members per group) (tentative) Your group can request Indy for an account on at most one of the following testbeds: –PlanetLab: –Emulab: –Cirrus cloud at UIUC: There are limited accounts on each system. You will be given an account on only one of these testbeds, so please choose carefully depending on your project requirements! –PlanetLab: wide-area networking 2-3 slices available –Emulab: cluster computing, with access to barebones hardware < 5 accounts available –Cirrus: Hadoop and Pig jobs (primarily)

Announcement – Classroom Change! From next week onwards, lecture will be in –1131 Siebel Center (Starting Tuesday 27 January) –NOT in 1302 Siebel Center (different class here)