Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet
Allen Miu, Eugene Shih
6.892 Class Project
December 3, 1999
Overview
- Problem Statement
- Advantages/Disadvantages
- Operation of Paraloading
- Goals of Experiment
- Setup of Experiment
- Current Results
- Summary
- Questions
Problem Statement: Is “Paraloading” Good?
Paraloading is downloading a file from multiple mirror sites in parallel.
[Diagram: the paraloader client connected to Mirrors A, B, and C]
Advantages of Paraloading
- Performance is proportional to the realized aggregate bandwidth of the parallel connections
- Less prone to complete download failure than a single-connection download
- Facilitates dynamic load balancing among the parallel connections
- Facilitates reliable, out-of-order delivery (similar to Netscape)
Disadvantages of Paraloading
- Can be overly aggressive
- Consumes more server resources
- Overhead costs for scheduling, maintaining buffers, and sending block request messages
- Only effective when mirror servers are available
Step 1: Obtain Mirror List
- Hard-coded? DNS?
Step 2: Obtain File Length
Step 3: Send Block Requests
Step 4: Re-order
Step 5: Send Next Request
[Diagrams: each step shows the paraloader exchanging messages with Mirrors A, B, and C]
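The block requests in Step 3 map naturally onto HTTP/1.1 range requests, which the experiment setup below uses. A minimal sketch of building the Range header value for block i of a file; the class and method names here are illustrative, not taken from the project's code, but the inclusive "bytes=first-last" layout follows HTTP/1.1 byte-range semantics:

```java
// Illustrative helper: build the HTTP/1.1 Range header value for block i.
// HTTP byte ranges are inclusive on both ends ("bytes=first-last").
public class BlockRange {
    static String rangeHeader(long fileLength, long blockSize, long i) {
        long first = i * blockSize;
        long last = Math.min(first + blockSize - 1, fileLength - 1);
        return "bytes=" + first + "-" + last;
    }
}
```

For example, with a 1,048,576-byte file and 32 KB (32,768-byte) blocks, block 0 is requested as `bytes=0-32767` and the last block (block 31) as `bytes=1015808-1048575`.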
Goals of Experiment
- Main goal: compare the performance of serial and parallel downloading
- Verify the results of Rodriguez et al.
- Examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
- Gain experience with paraloading and identify the issues involved in designing efficient paraloading systems
Experiment Setup
- Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
- Files are downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
- Degrees of parallelism examined: M = 1, 3, 5, 7
- Downloaded a 1 MB and a 300 KB file (S = 1 MB, 300 KB) at 1-hour intervals for 7 days
- Block size = 32 KB
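With these parameters, a file is split into ceil(S / block size) fixed-size blocks, the last of which may be partial. A quick check of the block counts, under the assumption (not stated in the slides) that the sizes are binary units, i.e. 1 MB = 1,048,576 bytes and 300 KB = 307,200 bytes:

```java
// Illustrative helper: number of fixed-size blocks needed to cover a file
// (ceiling division; the last block may be shorter than blockSize).
public class BlockCount {
    static long numBlocks(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }
}
```

Under that assumption, the 1 MB file splits into exactly 32 blocks of 32 KB, and the 300 KB file into 10 blocks (9 full blocks plus one partial block).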
Results
- Paraloading decreases download time over the average single-connection case
- Speedup is far from the optimal case (the aggregate bandwidth of the connections)
  - Gaps between block requests result in wasted bandwidth; the gaps are proportional to RTT
  - Congestion at the client? Possible but unlikely
Summary of Contributions
- Implemented a paraloader
- Verified that paraloading indeed provides a performance gain… sometimes
  - Increasing the degree of parallelism improves overall performance
- Performance gains are not as good as those reported by Rodriguez et al.
Future Work
- Examine how block size affects performance gain
- Examine the cost of paraloading
- Implement and test various optimization techniques
- Perform measurements at different client sites
Paraloading Will Not Be Effective In All Situations
Paraloading assumes that:
- Clients have enough “slack” bandwidth capacity to open more than one connection
- The parallel connections are bottleneck-disjoint
- The target data on the mirror servers is consistent and static
- Security and authentication services are installed where appropriate
- Data transport is reliable
- Mirror locations are quickly and easily obtained
Step-by-step Process of the Block Scheduling Paraloading Scheme
1. Obtain a list of mirror sites
2. Open a connection to a mirror server and obtain the file length
3. Divide the file into blocks
4. Send a block request on each open connection
5. Wait for a response
6. Send a new block request on the first connection that finishes downloading a block
7. Loop back to step 5 until all blocks are retrieved
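The loop in steps 4-7 amounts to a greedy schedule: whichever connection goes idle first requests the next unassigned block, and responses are placed into position regardless of arrival order. A minimal sketch in Java, not the project's actual code; the `BlockFetcher` interface stands in for an HTTP range request over one persistent mirror connection, and each worker thread models one connection:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of greedy block scheduling: M parallel connections; whichever
// connection finishes a block first immediately requests the next block.
public class Paraloader {
    // Stand-in for an HTTP/1.1 range request on a persistent connection.
    interface BlockFetcher { byte[] fetch(long blockIndex); }

    static byte[][] download(int mirrors, int numBlocks, BlockFetcher fetcher) {
        byte[][] blocks = new byte[numBlocks][];   // blocks land in position (re-ordering)
        AtomicLong next = new AtomicLong(0);       // index of the next unrequested block
        ExecutorService pool = Executors.newFixedThreadPool(mirrors);
        for (int m = 0; m < mirrors; m++) {
            pool.execute(() -> {
                long i;
                // Steps 5-7: each idle connection grabs the next block
                // until every block has been requested.
                while ((i = next.getAndIncrement()) < numBlocks) {
                    blocks[(int) i] = fetcher.fetch(i);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocks;
    }
}
```

Because finished blocks are written into a shared array indexed by block number, a fast connection naturally retrieves more blocks than a slow one, which is the dynamic load balancing the scheme relies on.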
Paraloading is Not a Well-studied Concept
- Byers et al. proposed using Tornado codes to facilitate paraloading
- Rodriguez et al. proposed the block scheduling paraloading scheme that is used in our project