Locality-Aware Request Distribution in Cluster-based Network Servers


Locality-Aware Request Distribution in Cluster-based Network Servers
1. Introduction and Motivation --- Why have this idea?
2. Strategies --- How to implement it?
3. Simulation and Results --- How well does it work?
4. Conclusions

Introduction and Motivation
Cluster-based network server system:
The front-end (dispatcher) is responsible for request distribution. The back-end nodes are responsible for request processing. The front-end makes the distributed nature of the server transparent to the clients.
Two major approaches to request distribution:
1) Load-balancing distribution (for example WRR): all back-end nodes are considered equally capable of serving a given request; the assignment is based only on the current load information of the back-end nodes, not on the content requested.
Advantage: good load balancing among the back-ends.
Disadvantage: once the working set exceeds the size of a single node's main memory, frequent cache misses occur; the approach does not scale well to larger working sets.
Q: Working set?
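The load-balancing approach above can be sketched as a weighted round-robin dispatcher. This is a minimal illustration, not the paper's implementation; the node names and weights are assumed for the example. Note that the choice of node never depends on the request's target, which is exactly why WRR gets no cache locality.

```python
# A minimal sketch of weighted round-robin (WRR) dispatch.
# Node names ("A", "B") and weights are assumed example values.
from itertools import cycle

def make_wrr(nodes):
    """nodes: list of (name, weight) pairs. Yields back-end names in
    proportion to their weights, ignoring request content entirely."""
    schedule = [name for name, weight in nodes for _ in range(weight)]
    return cycle(schedule)

dispatch = make_wrr([("A", 2), ("B", 1)])
picks = [next(dispatch) for _ in range(6)]
# Every request goes to the next node in the weighted cycle,
# regardless of which target it asks for -> no locality.
```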

Introduction and Motivation
2) Locality-aware distribution (for example LARD): the front-end considers both the service/content requested and the current load on the back-end nodes when deciding which back-end node should serve a given request.
Advantages: (1) increased performance due to improved hit rates in the back-ends' main memory caches (adding nodes increases the aggregate cache size); (2) increased secondary storage scalability due to the ability to partition the server's database over the different back-end nodes; (3) the ability to employ back-end nodes that are specialized for certain types of requests (e.g., audio and video).
Disadvantage: the load between different back-ends might become unbalanced, resulting in worse performance.
Objective: the goal in building a LARD cluster is therefore to design a practical and efficient strategy that achieves both load balancing and high cache hit rates on the back-ends. A TCP handoff protocol allows the front-end to hand off an established client connection to a back-end node, in a manner that is transparent to clients and efficient enough not to render the front-end a bottleneck.

Strategies
System:
a. The front-end keeps track of open and closed connections, and it can use this information in making load-balancing decisions. Outgoing data is sent directly from the back-ends to the clients.
b. The front-end limits the number of outstanding requests at the back-ends. This gives the front-end more flexibility in responding to changing load on the back-ends.
c. Any back-end node is capable of serving any target.
d. The front-end uses the TCP handoff protocol: (1) a client connects to the front-end; (2) the dispatcher at the front-end accepts the connection and hands it off to a back-end using the handoff protocol (the dispatcher is a software module that implements the distribution policy, e.g., LARD); (3) the back-end takes over the established connection received via the handoff protocol; (4) the server at the back-end accepts the created connection; (5) the server at the back-end sends replies directly to the client.

Strategies
Basic LARD
Always assigns a single back-end node to serve a given target, thus making the idealized assumption that a single target cannot by itself exceed the capacity of one node.
1) Algorithm:

while (true)
  fetch next request r;
  if server[r.target] = null then
    s, server[r.target] ← {least loaded node};
  else
    s ← server[r.target];
    if (s.load > T_high && exists node with load < T_low) || s.load >= 2*T_high then
      s, server[r.target] ← {least loaded node};
  send r to s;
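The pseudocode above can be sketched in Python as follows. This is an illustrative model, not the paper's kernel-level implementation: load is a simple per-node connection counter, and the T_low/T_high values are assumed example settings.

```python
# A minimal sketch of basic LARD dispatch, following the pseudocode above.
# T_LOW and T_HIGH are assumed example values, not taken from the paper.
T_LOW, T_HIGH = 25, 65

def lard_dispatch(target, server, nodes):
    """server: dict target -> assigned node; nodes: dict name -> load
    (active connections). Returns the chosen node name."""
    least = min(nodes, key=nodes.get)
    s = server.get(target)
    if s is None:
        # First request for this target: assign the least loaded node.
        server[target] = s = least
    elif (nodes[s] > T_HIGH and min(nodes.values()) < T_LOW) \
            or nodes[s] >= 2 * T_HIGH:
        # Re-assign only on significant imbalance or severe overload.
        server[target] = s = least
    nodes[s] += 1  # one more active connection on the chosen node
    return s
```

Repeated requests for the same target keep landing on the same node (preserving its cache warmth) until that node crosses the overload thresholds.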

Strategies
Basic LARD
2) Considerations:
(1) Load: the number of active connections. T_low: the load below which a back-end is likely to have idle resources. T_high: the load above which a node is likely to cause substantial delay in serving requests.
(2) Load imbalance: we do not want greatly diverging load values on different back-ends, but we also do not want to re-assign targets because of minor or temporary imbalance.
(3) The front-end limits the total number of connections handed to all back-end nodes to S = (n-1)*T_high + T_low - 1, where n is the number of back-end nodes. Setting S to this value ensures that at most n-2 nodes can have load >= T_high while no node has load < T_low. (How?)
(4) A target is re-assigned only when the load difference between the old and new node is at least T_high - T_low. The maximal load imbalance that can arise is 2*T_high - T_low. (Why?)
(5) The setting of T_low depends on the speed of the back-end nodes. Choosing T_high involves a tradeoff: T_high - T_low should be low enough to limit the delay variance among the back-ends to acceptable levels, but high enough to tolerate limited load imbalance without destroying locality. (How?)
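The threshold arithmetic in points (3) and (4) can be made concrete with a small sketch; the T_low/T_high values here are assumed examples, and the comments spell out the reasoning behind the "(How?)" and "(Why?)" questions above.

```python
# Sketch of the connection-limit arithmetic, with assumed example
# thresholds T_low = 25 and T_high = 65.
def max_outstanding(n, t_low, t_high):
    """Total connections S the front-end will hand to n back-ends.
    If n-1 nodes all sat at T_high, the remaining connections would be
    S - (n-1)*t_high = t_low - 1, leaving the last node below T_low;
    so at most n-2 nodes can reach T_high while no node is under T_low."""
    return (n - 1) * t_high + t_low - 1

def max_imbalance(t_low, t_high):
    """Worst-case load gap between two back-ends: a target is only
    re-assigned when its node exceeds T_high while some node is below
    T_low, or when its node reaches 2*T_high; hence the gap is bounded
    by 2*T_high - T_low."""
    return 2 * t_high - t_low

S = max_outstanding(8, 25, 65)        # limit for an 8-node cluster
gap = max_imbalance(25, 65)           # worst-case load imbalance
```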

Strategies
LARD with Replication
If a single target causes a back-end to go into an overload situation, we should assign several back-end nodes to serve that target, and distribute requests for that target among the serving nodes.
Algorithm:

while (true)
  fetch next request r;
  if serverSet[r.target] = null then
    s, serverSet[r.target] ← {least loaded node};
  else
    s ← {least loaded node in serverSet[r.target]};
    m ← {most loaded node in serverSet[r.target]};
    if (s.load > T_high && exists node with load < T_low) || s.load >= 2*T_high then
      p ← {least loaded node};
      add p to serverSet[r.target];
      s ← p;
    if |serverSet[r.target]| > 1 && time() - serverSet[r.target].lastMod > K then
      remove m from serverSet[r.target];  // limits the degree of replication
  send r to s;
  if serverSet[r.target] changed in this iteration then
    serverSet[r.target].lastMod ← time();
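The replication variant above can also be sketched in Python. Again this is an illustrative model with assumed T_low/T_high/K values: the server set for a target grows under overload and shrinks again once the set has been stable for more than K seconds.

```python
# A minimal sketch of LARD with replication (LARD/R), following the
# pseudocode above. T_LOW, T_HIGH, and K (seconds) are assumed values.
import time

T_LOW, T_HIGH, K = 25, 65, 20

def lardr_dispatch(target, server_set, last_mod, nodes, now=None):
    """server_set: dict target -> set of serving nodes; last_mod: dict
    target -> time the set last changed; nodes: dict name -> load."""
    now = time.time() if now is None else now
    changed = False
    if target not in server_set:
        server_set[target] = {min(nodes, key=nodes.get)}
        changed = True
    sset = server_set[target]
    s = min(sset, key=nodes.get)   # least loaded node in the set
    m = max(sset, key=nodes.get)   # most loaded node in the set
    if (nodes[s] > T_HIGH and min(nodes.values()) < T_LOW) \
            or nodes[s] >= 2 * T_HIGH:
        p = min(nodes, key=nodes.get)
        sset.add(p)                # grow the replica set under overload
        s = p
        changed = True
    if len(sset) > 1 and now - last_mod.get(target, now) > K:
        sset.discard(m)            # shrink a replica set that stayed stable
        changed = True
    nodes[s] += 1
    if changed:
        last_mod[target] = now
    return s
```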

Simulation and Results
Simulation model: each back-end node consists of a CPU and locally-attached disk(s). Each node maintains its own main memory cache of configurable size and replacement policy. Caching is performed on a whole-file basis. Processing a request requires the following steps: connection establishment --> disk reads (if needed) --> target data transmission --> connection teardown. The input to the simulator is a stream of tokenized target requests, where each token represents a unique target being served.
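The per-node, whole-file cache in the model can be sketched as follows. The paper makes the replacement policy configurable; LRU is used here purely as an illustrative choice, and capacities are assumed example values.

```python
# A minimal sketch of a per-node, whole-file cache with LRU
# replacement (the replacement policy is an assumed choice here).
from collections import OrderedDict

class WholeFileCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()   # path -> size, kept in LRU order

    def access(self, path, size):
        """Returns True on a cache hit, False on a miss (disk read)."""
        if path in self.files:
            self.files.move_to_end(path)   # mark most recently used
            return True
        # Evict least recently used whole files until the new one fits.
        while self.used + size > self.capacity and self.files:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        if size <= self.capacity:
            self.files[path] = size
            self.used += size
        return False
```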

Simulation and Results
Simulation output:
Throughput: the number of requests served per second by the entire cluster, calculated as the number of requests divided by the simulated time it took to finish serving all of them.
Cache hit ratio: the number of requests that hit in a back-end node's main memory cache divided by the total number of requests.
Node underutilization: the time during which a node's load is less than 40% of T_low.
The overall throughput is the best summary metric, since it is affected by all factors; the cache hit rate gives an indication of how well locality is being maintained; the node underutilization times indicate how well load balancing is maintained.
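The first two metrics are simple ratios and can be computed from a hypothetical per-request log; the log format and values below are assumptions for illustration, not data from the paper.

```python
# Sketch of the throughput and cache-hit-ratio metrics, computed from
# a hypothetical per-request log of (node, was_cache_hit) entries.
def summarize(requests, sim_seconds):
    """requests: list of (node, was_cache_hit); sim_seconds: simulated
    time to serve them all. Returns (throughput, cache_hit_ratio)."""
    throughput = len(requests) / sim_seconds          # requests/second
    hits = sum(1 for _, hit in requests if hit)
    hit_ratio = hits / len(requests)
    return throughput, hit_ratio

tp, hr = summarize(
    [("A", True), ("A", False), ("B", True), ("B", True)],
    sim_seconds=2.0,
)
# 4 requests in 2 simulated seconds; 3 of 4 hit in cache.
```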

Simulation and Results
Simulation results:

Strategy                       | Throughput | Cache miss ratio | Idle time
Weighted round-robin (WRR)     | lowest     | highest          | lowest
Locality-based (LB)            | lower      | higher           | highest
Basic LARD (LARD)              | higher     | lowest           | lower
LARD with replication (LARD/R) | highest    | lowest           | lower

(Why?) WRR cannot benefit from added CPUs at all, since it is disk-bound on this trace. LARD and LARD/R, on the other hand, can make use of the added CPU power, because their cache aggregation makes the system increasingly CPU-bound as nodes are added to the system. With LARD/R, additional disks do not achieve any further benefit. This is to be expected: the increased aggregate cache of LARD/R reduces its dependence on disk speed. WRR, on the other hand, greatly benefits from multiple disks, as its throughput is mainly bound by the performance of the disk subsystem. (How?)

Conclusions
Conclusions: the LARD strategy can achieve high cache hit rates and good load balancing in a cluster server. The performance of LARD is better than that of WRR. To prevent the front-end from becoming a bottleneck, the TCP handoff protocol is implemented on both the front-end and the back-ends.
Q: What are the limits of the LARD system? How can it be improved? The next paper will discuss this in detail. Let's go to the next one...