May, A Portal-based P2P System for the Distribution and Management of Large Data Sets Rahim Lakhoo (Raz) and Prof Mark Baker ACET,

Presentation transcript:

May, A Portal-based P2P System for the Distribution and Management of Large Data Sets Rahim Lakhoo (Raz) and Prof Mark Baker ACET, University of Reading Web:

Outline Motivation. A Portal-based P2P System: –High-level View, –Overview, –Components. P2P Simulators: –Our requirements, –Simulators investigated, –Issues, –Experiences. Summary. Conclusions.

Motivation The Sloan Digital Sky Survey (SDSS) uses a telescope to take optical images of the sky. Scientific projects such as SDSS are producing and working with very large data sets. Current methods for distributing the content involve: –Physically shipping disk drives, –Splitting the data and transferring it point-to-point from one location to another. Data sets are growing for projects like SDSS: –Currently 5 Tbytes, –Set to be ~15 Tbytes by the end of the project. Storage and bandwidth are costly and limited, and the data sets will inevitably get larger. Managing and maintaining these large data sets is difficult, and will only become harder over time.

Motivation P2P is being used by ordinary people to download multimedia. A popular example is BitTorrent. Its success stems from its protocol, which makes users share their bandwidth with other people trying to download the same file. BitTorrent Concepts: –Files are split into small pieces called ‘chunks’, –Chunks are seeded (uploaded) by a user, –Users download a ‘torrent’ file which has information about a file, –A user loads the ‘torrent’ into an application which then downloads chunks from different peers, –A ‘tracker’ tracks which peers have which chunks. Peer-to-Peer (P2P) systems offer a potential way to manage and distribute data sets.
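The tracker idea in the BitTorrent concepts above can be illustrated with a minimal sketch. This is not BitTorrent's actual tracker protocol: the `Tracker` class, `plan_download` function, and peer names are all hypothetical, and everything lives in memory. It only shows the bookkeeping of "which peers have which chunks" and how a downloader might pick a source for each chunk.

```python
from collections import defaultdict
import random


class Tracker:
    """Toy in-memory tracker (hypothetical): remembers which peers hold which chunks."""

    def __init__(self):
        self._holders = defaultdict(set)  # chunk index -> set of peer ids

    def announce(self, peer_id, chunk_indexes):
        """Record that peer_id can serve the given chunks."""
        for index in chunk_indexes:
            self._holders[index].add(peer_id)

    def peers_for(self, chunk_index):
        """Return the peers known to hold one chunk (sorted for stable output)."""
        return sorted(self._holders.get(chunk_index, ()))


def plan_download(tracker, total_chunks, rng=random):
    """Pick one source peer per chunk; None marks a chunk nobody holds yet."""
    plan = {}
    for index in range(total_chunks):
        holders = tracker.peers_for(index)
        plan[index] = rng.choice(holders) if holders else None
    return plan


tracker = Tracker()
tracker.announce("seed", [0, 1, 2])    # a seeder uploads three chunks
tracker.announce("peer-a", [0, 2])     # another peer already has two of them
plan = plan_download(tracker, total_chunks=4)
```

Choosing a random holder per chunk spreads the load across peers; a real client would use a more careful selection policy.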

High-level View Data sets such as SDSS are currently kept in a storage mechanism, such as a RAID array. A bootstrapping service is set up and has access to the SDSS data. The data is split into chunks and distributed to the Portal P2P services, hosted by different portals. Users who access the portal can contribute resources to help store and distribute the data. These are the Mini Peers. The Portal P2P services propagate parts of the data set to the Mini Peers. Any other project partners who want a copy of the data can join the P2P network and download parts of the data set from Portal and Mini Peers.
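The two-stage flow described above (bootstrapping service to portals, then portals to Mini Peers) can be sketched as follows. The slides do not say how chunks are placed, so the round-robin policy here is purely an assumption, and the function name and the portal and Mini Peer names are made up for illustration.

```python
from itertools import cycle


def assign_round_robin(chunk_ids, holders):
    """Spread chunk ids over holders in turn; returns holder -> list of chunk ids.

    Round-robin placement is an assumption, not the authors' stated policy.
    """
    if not holders:
        raise ValueError("at least one holder is required")
    assignment = {holder: [] for holder in holders}
    for chunk_id, holder in zip(chunk_ids, cycle(holders)):
        assignment[holder].append(chunk_id)
    return assignment


# Stage 1: bootstrapping service -> Portal P2P services
portal_chunks = assign_round_robin(range(6), ["portal-1", "portal-2"])

# Stage 2: each portal -> the Mini Peers attached to it (hypothetical names)
mini_peers = {"portal-1": ["mini-a", "mini-b"], "portal-2": ["mini-c"]}
mini_chunks = {
    portal: assign_round_robin(chunks, mini_peers[portal])
    for portal, chunks in portal_chunks.items()
}
```

In this sketch every chunk held by a portal is also copied to one of its Mini Peers, so a later downloader has two possible sources for each chunk.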

Overview Ideas are loosely based on the concepts of BitTorrent and Freenet. The P2P System consists of: –A distributed registry, which stores information for the network peers and also provides a tracker, –A Bootstrapping Service, which splits the data set into chunks to be distributed by the peers, –A Portal P2P Service, which provides storage and management of the data; this service also propagates chunks to the Mini Peers, –Mini Peers, which donate bandwidth and disk space to the network.

Overview

Overview The registry (VR) provides the distributed tracker: –A tracker helps peers locate other peers with chunks to download. The Bootstrapper initiates the propagation of the data set to the peers. The Portal P2P service manages the Mini Peers. The portal has management and monitoring tools for the data set. All peers volunteer resources to the P2P network.

The Virtual Registry The Virtual Registry (VR) is provided by Tycho. Tycho is a wide-area asynchronous message passing system with an integrated distributed registry. The VR can store information which can be searched and retrieved by peers on the network. Tycho uses HTTP/HTTPS and Sockets/SSL for communications. The VR will provide the distributed P2P tracker service, for finding peers with chunks to download.

The Virtual Registry

The Virtual Registry Tycho has a Service Oriented Architecture that uses the concept of producers and consumers. In our system, each Tycho mediator has a consumer and producer, for communications. Mediators provide the VR with a distributed data store, which uses HSQLDB as its database. Local communications are via Sockets/SSL and wide-area communications via HTTP/HTTPS.

The Bootstrapper A bootstrapping service is needed to propagate parts of the data set to the Portal P2P service. This service splits the data set into chunks. Each chunk has an associated hash value, which is stored in the Virtual Registry. The bootstrapping service needs access to the original data set(s).
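The splitting and hashing step described above can be sketched as follows. The slides do not name a hash function or a chunk size, so SHA-256 and the tiny 4-byte chunks are assumptions chosen for illustration. The `registry` dictionary is only a stand-in for the Virtual Registry, not Tycho's API.

```python
import hashlib
import io


def split_and_hash(stream, chunk_size):
    """Yield (index, hex digest, data) for each successive chunk of a binary stream.

    SHA-256 is an assumption; the slides only say each chunk has a hash value.
    """
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, hashlib.sha256(data).hexdigest(), data
        index += 1


registry = {}  # stand-in for the Virtual Registry: chunk index -> hash value
chunks = {}    # chunk index -> chunk bytes, ready to be sent to the portals

data_set = io.BytesIO(b"abcdefghij")  # placeholder for the original data set
for index, digest, data in split_and_hash(data_set, chunk_size=4):
    registry[index] = digest
    chunks[index] = data
```

Reading from a stream rather than loading everything at once matters at the multi-terabyte sizes the slides mention; an open file object in binary mode can be passed in the same way as the `BytesIO` placeholder.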

The Bootstrapper

The Bootstrapper The bootstrapping service needs to propagate different chunks to different Portals concurrently. Hash values and metadata about the data set and chunks are stored in the VR. This service is also used if a requested chunk is not found on the P2P network, for example due to chunk corruption. In this case, the missing chunk needs to be replaced in the P2P system.
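The recovery path described above can be illustrated with a short sketch: a chunk received from a peer is checked against its registered hash, and if no intact copy is available the chunk is read again from the original data set. The function names, the SHA-256 choice, and the `read_from_original` callable (standing in for the bootstrapping service) are all assumptions for illustration.

```python
import hashlib


def is_intact(data, expected_digest):
    """True if data is present and matches the registered hash (SHA-256 assumed)."""
    return data is not None and hashlib.sha256(data).hexdigest() == expected_digest


def fetch_chunk(index, expected_digest, peer_copies, read_from_original):
    """Return (data, source) for a verified chunk.

    peer_copies: candidate byte strings obtained from peers (None = peer had no copy).
    read_from_original: callable(index) -> bytes, a stand-in for the bootstrapper.
    """
    for data in peer_copies:
        if is_intact(data, expected_digest):
            return data, "peer"
    # No usable copy on the P2P network: fall back to the original data set.
    data = read_from_original(index)
    if not is_intact(data, expected_digest):
        raise ValueError(f"chunk {index} does not match its registered hash")
    return data, "bootstrapper"


original = {0: b"abcd", 1: b"efgh"}  # placeholder for the original data set
digests = {i: hashlib.sha256(d).hexdigest() for i, d in original.items()}

good = fetch_chunk(0, digests[0], [b"abcd"], original.__getitem__)
repaired = fetch_chunk(1, digests[1], [b"efgX", None], original.__getitem__)
```

The replacement chunk returned by the fallback would then be re-seeded to the portals so that later requests can again be served from the P2P network.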

The Portal P2P Service The Portal P2P service is a plug-in component for portals. This service stores and serves chunks of the data set to other peers in the network. The portal service propagates chunks to the Mini Peers. The monitoring and management of the data set is handled by the portlet tools and the P2P service. The portal service uses Tycho to synchronise management tools across all portals in the network.

The Portal P2P Service

The Portal P2P Service Each Portal P2P service needs access to a storage mechanism, for parts of the data set. The storage resources provided by the portals provide space for a copy of the large data set. The Portal P2P service also provides parts of the data set to other peers in the P2P network. The Portal provides users with an environment for managing and monitoring the data set collaboratively between peers.

The Mini Peers Mini Peers donate bandwidth and storage space to the network. Mini Peers will interact with the P2P network via their Web browser. Mini Peers will store chunks that are useful for other peers. Mini Peers aim to help other peers download and distribute the data set.

The Mini Peers

The Mini Peers Client-side Web browser technologies, such as Ajax and JavaScript, will be used for the Mini Peer. Mini Peers will utilise the VR to publish parts of the data set, to share with other peers in the network. Mini Peers will store chunks locally on a user's machine.

P2P Simulators - Requirements We wanted to use a simulator to help test and develop our P2P system with greater assurance. Running the P2P system in a simulator would allow us to configure scenarios for studying system behaviour. Our requirements for a simulator were: –Have support for customised P2P protocols, –Provide facilities for hierarchical topologies, –Provide visualisations, –Provide reasonably accurate results in terms of ‘real-world’ performance, –Have good support and documentation, –Be capable of interfacing with Java.

P2P Simulators There are many network simulators; some are more suited to P2P than others. Simulators investigated include: –NS-2 with NAM, –PeerSim, –PlanetSim, –OMNeT++ and OverSim, –General Purpose Simulator (GPS), –AgentJ, –P2PSim.

Issues We shortlisted three simulators: –General Purpose Simulator (GPS), –AgentJ, –OverSim. GPS –Difficult to implement our own protocol as the simulator is tightly coupled to the BitTorrent protocol, –Stability issues were seen with larger simulations. AgentJ –Requires a normal Java application, –Does not support TCP in the simulation environment. OverSim –Java support is limited and restrictive. It is not possible to implement a whole simulation with the provided Java support.

Experiences No simulator completely fulfilled our requirements. We could not successfully implement our Portal-based P2P system in these simulators. Some of the simulators are complex and take extensive time to learn. Stability issues were seen with some of the simulators. Code written for a simulation is specific to a particular simulator. The code cannot be reused in the later stages of development. The time taken to implement our P2P system in a simulator is not justified by the advantages gained.

Summary We are developing a Portal-based P2P system to help the scientific community to manage, store and distribute large data sets. Our Portal-based P2P system introduces the concept of data sets being collaboratively downloaded and managed. The Portal-based P2P system has four main components: –Virtual Registry, –Bootstrapping service, –Portal P2P service, –Mini Peers. We attempted to simulate our design and idea with one of the P2P simulators. We have investigated and tested several P2P simulators for their suitability to emulate our design. We found that the simulators we studied were inflexible, unstable, and not easy to use; basically, we would have spent more time fixing them than actually implementing and testing our design on a cluster.

Conclusions Distributing and managing large data sets is difficult for projects such as SDSS. P2P simulators are not as useful as first thought. We will implement our Portal-based P2P system and test it on a suitable test bed, i.e. a cluster. Once the development of our P2P system has reached a suitable stage, we may consider systems such as PlanetLab. –PlanetLab provides time on a real network with hundreds of nodes, hosted by academic institutes. P2P systems are known to be an efficient way to distribute files and are becoming increasingly popular. Implementation should be at a suitable stage for preliminary testing in a few months.

References Tycho - tycho Further Information - vre/docs.php

Thank you for listening. Questions?