Looking at the Server-side of P2P Systems Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda Department of Computer Science Northwestern University.

Presentation transcript:

Looking at the Server-side of P2P Systems Yi Qiao, Dong Lu, Fabian E. Bustamante and Peter A. Dinda Department of Computer Science Northwestern University

2 What is the Server-side? No architectural distinction between “client” and “server” in a P2P system Heterogeneity of peers –Some peers act more like servers – Server Side –Some act more like clients – Client Side Server-side is important for P2P performance –Little attention has been given to it

3 Outline Background and Motivation –Why schedule the server-side? Trace Collection and Study Scheduling Methodology Evaluation Conclusions

4 Background Peers in a P2P data-sharing system –Example: Gnutella –Query, query answer – Phase 1 –Download, upload – Phase 2 –Role as a client: send queries, download objects –Role as a server: answer queries, upload objects Little research attention

5 Background (Cont.) Phase 1: Queries and query replies in the P2P file-sharing system [Diagram: peers P1 to P4 exchange Query and Query Reply messages; a query for “Shark Tale” is answered with “Peer 3 got it!”, a query for “Taxi” with “No idea!”]

6 Background (Cont.) Phase 2: Download/Upload shared files [Diagram: peers P1 to P4; the requests Give me “Taxi” and Give me “Shark Tale” wait in a Job Queue] Little attention given to the server-side so far…

7 Motivation Server-side is a key performance bottleneck of P2P data-sharing systems –80% of download requests get rejected due to saturation of server capacity [Saroiu 2002] User-limited capacity, particularly the number of server threads –50% of all object downloads take more than one day [Gummadi 2003] Our goal –Server load characterization and analysis –New scheduling policies to shorten average response time for each download

8 Challenge Introducing SRPT into web server scheduling has been very successful, but it is trickier on the P2P server side… Requests are often not for whole objects P2P servers are conservative with resource consumption Popular P2P servers often operate under overloaded conditions Fetch-at-most-once behavior means object popularity does NOT follow a Zipf distribution [Gummadi 2003] New scheduling policies based on P2P’s own characteristics are needed

9 Outline Background and Motivation –Why schedule the server-side? Trace Collection and Study Scheduling Methodology Evaluation Conclusions

10 Trace Collection and Study Trace Collection Methodology –Build “honey pots” Passive monitoring of query strings Download hot content based on query popularity –Run “honey pots” Make collected objects available to the community Record incoming download requests –Arrival time, object name, requested size, downloaded size, service time, … –Findings reported here are based on Gnutella traces

11 Traces in the Study Different connection type, server thread number, shared object number, request number
Connection Type | Number of Threads | Number of Objects | Number of Requests
100Mbps Ethernet | 200 | 1,533 | 300,
Mbps Ethernet | 100 | 1,533 | 150,
Mbps Ethernet | | | ,000
Cable Modem | 20 | 1,533 | 40,000

12 Server Workload Distribution of job interarrival time? Distribution of job size? What is the performance bottleneck? –Why scheduling?

13 Job Interarrivals Job interarrivals can be well modeled by an exponential distribution –Coefficient of determination –Almost straight line in the semi-log plot
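The semi-log check on this slide can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it fits a straight line to the logarithm of the empirical complementary CDF of the interarrival times and reports the coefficient of determination. The function name `semilog_fit` and the synthetic sample (rate 2.0, 5000 points) are assumptions made for the example; a real check would use the recorded interarrival times from the traces.

```python
import math
import random

def semilog_fit(interarrivals):
    """Least-squares fit of ln(P[X > x]) = -rate * x + c.

    Returns (rate, r_squared). For exponential data the points fall close
    to a straight line, so r_squared is close to 1.
    """
    xs = sorted(interarrivals)
    n = len(xs)
    # Empirical complementary CDF; skip the last point, where it would be 0.
    pts = [(x, math.log(1.0 - (i + 1) / n)) for i, x in enumerate(xs[:-1])]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared

# Synthetic stand-in for trace data (assumption: rate of 2 jobs per time unit).
random.seed(0)
sample = [random.expovariate(2.0) for _ in range(5000)]
rate, r2 = semilog_fit(sample)
```

A coefficient of determination near 1 on real interarrival data would support the exponential model; a clearly curved semi-log plot would argue against it.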

14 Job Arrivals are Independent Effectively nil –Job arrivals are independent of each other –Significant difference from web servers

15 Job Sizes Three different job sizes –Full object size –Requested data chunk size Unique to P2P servers A request is typically for only a small chunk –Served data chunk size Unique to P2P servers Client may abort the transfer or switch to another server Known only after the job is done

16 Job Sizes (Cont.) Three different job sizes –Differ by several orders of magnitude –Approximated by a Bounded Pareto distribution [Figure: distributions of Object Size, Served Chunk Size and Requested Chunk Size]
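To make the Bounded Pareto approximation concrete, here is a small sketch that draws job sizes by inverting the Bounded Pareto CDF, F(x) = (1 - (L/x)^α) / (1 - (L/H)^α) for L ≤ x ≤ H. The bounds and the shape parameter used below (1 KB, 1 GB, α = 1.1) are illustrative assumptions, not values fitted to the traces.

```python
import random

def bounded_pareto(low, high, alpha, rng=random):
    """Draw one sample from a Bounded Pareto(low, high, alpha) distribution
    by inverse-transform sampling."""
    u = rng.random()                      # uniform in [0, 1)
    tail = (low / high) ** alpha          # (L/H)^alpha
    return low / (1.0 - u * (1.0 - tail)) ** (1.0 / alpha)

# Hypothetical parameters: sizes between 1 KB and 1 GB, shape 1.1.
random.seed(1)
sizes = [bounded_pareto(1e3, 1e9, 1.1) for _ in range(10000)]
sizes.sort()
median = sizes[len(sizes) // 2]
mean = sum(sizes) / len(sizes)
```

With a shape parameter this small, most samples are near the lower bound while a few very large ones dominate the mean, which is the "several orders of magnitude" spread the slide describes.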

17 Server Resource Utilization Resource utilization is conservative –Servers only run in the background of normal computers –Users set upper bounds for Number of server threads Aggregate bandwidth usage for upload –For our busiest honey-pot 1.2% to 20.0% CPU utilization Up to 20 MBytes memory usage –Bottleneck resource The set of server threads for uploading

18 Our Scheduling Problem Given the total number of concurrent jobs that a server can take, how should incoming jobs be scheduled so that the mean response time is minimized?

19 Outline Background and Motivation –Why schedule the server-side? Trace Collection and Study Scheduling Methodology Evaluation Conclusions

20 Scheduling Policies Shortest Remaining Processing Time (SRPT) –Always serve the job with the shortest remaining processing time First-Come-First-Served (FCFS) –Serve incoming download requests in arrival order –Used by Gnutella for its job scheduling Processor Sharing (PS) –Each job gets an equal amount of service time in turn
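The difference between FCFS and SRPT can be seen in a toy simulation. The sketch below is a simplification made for illustration, not the simulator used in the study: it models a single unit-rate server rather than a pool of server threads, assumes every job's size is known exactly, and omits PS. The function name `simulate` and the three-job workload are hypothetical.

```python
import heapq

def simulate(jobs, policy):
    """Mean response time on one unit-rate server.

    jobs: list of (arrival_time, size), sorted by arrival time.
    policy: "FCFS" (arrival order) or "SRPT" (preemptive, shortest
    remaining processing time first).
    """
    queue = []            # heap of (key, seq, arrival, remaining)
    now = 0.0
    total_response = 0.0
    i, done, n = 0, 0, len(jobs)
    while done < n:
        if not queue:
            now = max(now, jobs[i][0])        # server idle: jump to next arrival
        while i < n and jobs[i][0] <= now:    # admit everything that has arrived
            arrival, size = jobs[i]
            key = arrival if policy == "FCFS" else size
            heapq.heappush(queue, (key, i, arrival, size))
            i += 1
        key, seq, arrival, remaining = heapq.heappop(queue)
        next_arrival = jobs[i][0] if i < n else float("inf")
        if remaining <= next_arrival - now:
            now += remaining                  # job finishes before the next arrival
            total_response += now - arrival
            done += 1
        else:
            remaining -= next_arrival - now   # run until the next arrival, then re-decide
            now = next_arrival
            key = arrival if policy == "FCFS" else remaining
            heapq.heappush(queue, (key, seq, arrival, remaining))
    return total_response / n

# Hypothetical workload: one large job followed by two small ones.
jobs = [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0)]
fcfs = simulate(jobs, "FCFS")
srpt = simulate(jobs, "SRPT")
```

On this workload FCFS makes both small jobs wait behind the large one (mean response time 10.0), while SRPT lets them through first (mean response time 14/3, about 4.67); the large job finishes at the same time under both policies.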

21 SRPT Studied since the 1960s [Schrage 1968] Used for various applications –Packet network scheduling [Bux 1983] –Scheduling for web servers [Harchol-Balter 2001] Optimal for mean response time of jobs for a general G/G/1 queuing system Problem –In most cases, service time is unknown until the job is done

22 SRPT for P2P Servers Main Challenge –How to estimate service time for a request is not that clear! File size / Requested chunk size / Served chunk size? One possible approach –Use requested chunk size as the scheduling metric SRPT-CS – Uses requested chunk size Two optimal approaches –Use served chunk size as the scheduling metric SRPT-SS – Uses served chunk size –Ideal SRPT How well can they do?
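As a rough illustration of the SRPT-CS idea, the sketch below keeps waiting download requests in a queue ordered by requested chunk size, so that a free server thread always picks the request asking for the fewest bytes. The class name `ChunkSizeQueue`, its methods and the capacity handling are hypothetical; the slides do not describe an implementation, and this sketch only orders waiting requests without preempting transfers already in progress.

```python
import heapq
import itertools

class ChunkSizeQueue:
    """Bounded waiting queue ordered by requested chunk size (SRPT-CS metric)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []
        self._counter = itertools.count()   # tie-breaker: earlier request first

    def offer(self, request_id, requested_bytes):
        """Queue a request; return False (reject) if the queue is full."""
        if len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (requested_bytes, next(self._counter), request_id))
        return True

    def next_request(self):
        """Return the id of the request with the smallest requested chunk, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

# Hypothetical requests competing for a free server thread.
q = ChunkSizeQueue(capacity=3)
accepted = [q.offer("a", 5000), q.offer("b", 100), q.offer("c", 2000), q.offer("d", 50)]
order = [q.next_request() for _ in range(3)]
```

SRPT-SS and ideal SRPT would order the same queue by served chunk size or by true service time, but both are known only after a transfer ends, which is why the slide treats them as reference points rather than implementable policies.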

23 Approximating ideal SRPT Depends on the correlations between Requested Chunk Size, Served Chunk Size and Service Time But these correlations are weak Why? –Client can exit anytime during transmission –Client can switch to other servers for a data chunk –Bandwidth bottlenecks exist somewhere else [Table: pairwise correlation statistics among Service Time, Served Chunk Size and Requested Chunk Size]
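One way to quantify such correlations is the Pearson correlation coefficient; the slide does not say which statistic the authors used, so treat this as an assumption. The sketch below uses made-up numbers to show how early client exits can weaken the relationship between requested and served chunk size.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, in bytes (not taken from the traces).
requested = [1000, 2000, 3000, 4000, 5000]
served_full = [1000, 2000, 3000, 4000, 5000]     # every transfer completes
served_aborted = [1000, 200, 3000, 100, 400]     # some clients quit early

r_full = pearson(requested, served_full)
r_aborted = pearson(requested, served_aborted)
```

When every transfer completes the coefficient is 1; once a few clients abort, the requested size says much less about what the server actually ends up sending.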

24 Outline Background and Motivation –Why schedule the server-side? Trace Collection and Study Scheduling Methodology Evaluation Conclusions

25 Evaluation Evaluation Setup –Using a general purpose queuing simulator –Various scheduling policies –Trace driven simulations Queue capacity 500 System load between 0.1 and 10 Time slice of 0.01 seconds for PS scheduling Metric –Mean response time –Rejection rate –Mean slowdown
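The three metrics can be computed from per-job records along the following lines. This is a sketch under assumptions: the record layout and the function name `summarize` are hypothetical, and slowdown is taken to be response time divided by service time, a common definition that the slides do not spell out.

```python
def summarize(records):
    """Compute mean response time, rejection rate and mean slowdown.

    Each record is a dict with 'arrival', 'completion' (None if the job
    was rejected) and 'service_time'. Rejected jobs count toward the
    rejection rate only.
    """
    served = [r for r in records if r["completion"] is not None]
    responses = [r["completion"] - r["arrival"] for r in served]
    slowdowns = [resp / r["service_time"] for resp, r in zip(responses, served)]
    return {
        "mean_response_time": sum(responses) / len(served),
        "rejection_rate": (len(records) - len(served)) / len(records),
        "mean_slowdown": sum(slowdowns) / len(served),
    }

# Hypothetical simulation output.
records = [
    {"arrival": 0.0, "completion": 4.0, "service_time": 2.0},
    {"arrival": 1.0, "completion": 2.0, "service_time": 1.0},
    {"arrival": 2.0, "completion": None, "service_time": 5.0},  # rejected: queue full
    {"arrival": 3.0, "completion": 9.0, "service_time": 3.0},
]
stats = summarize(records)
```

Restricting `records` to the largest jobs before calling `summarize` gives a per-class slowdown, which is one way to check whether large jobs are being starved.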

26 Improved Mean Response Time [Figure: mean response time under FCFS, PS, SRPT-CS, SRPT-SS and SRPT] Ideal SRPT is the best SRPT-CS does much better than FCFS and PS

27 With Lowest Rejection Rate SRPT-based scheduling policies actually reject fewer jobs than FCFS and PS [Figure: rejection rate curves for SRPT-CS & SRPT-SS, SRPT, and FCFS]

28 Without Compromising Fairness SRPT-based scheduling policies don’t starve large jobs Mean slowdown for 10% largest jobs

29 Summary Server-side of P2P is critical to overall system performance Not much can be learned from web server scheduling SRPT-based scheduling policies can help –Lowest mean response time –Lowest rejection rate –Without compromising fairness Chunk size is a reasonable estimator for service time –SRPT-CS outperforms FCFS and PS

30 Ongoing Work Large performance gaps between SRPT-CS, SRPT-SS, and SRPT –Only SRPT-CS can be directly implemented –Possible solution – predicting served chunk size and service time using time series analysis Representativeness of the traces Performance in a real implementation Cooperative downloading/uploading? Better estimator

31 For more information Please also see our related work
Dong Lu, Huanyuan Sheng, Peter Dinda. “Size-Based Scheduling Policies with Inaccurate Scheduling Information”. In Proc. of MASCOTS,
Dong Lu, Peter A. Dinda, Yi Qiao, Huanyuan Sheng and Fabián E. Bustamante. “Applications of SRPT Scheduling with Inaccurate Information”. In Proc. of MASCOTS, 2004.