ENHANCING THE WEB’S INFRASTRUCTURE: FROM CACHING TO REPLICATION ECE 7995 Presented By: Pooja Swami and Usha Parashetti.

Presentation transcript:

1 ENHANCING THE WEB’S INFRASTRUCTURE: FROM CACHING TO REPLICATION ECE 7995 Presented By: Pooja Swami and Usha Parashetti

2 Contents: Introduction; Issues to be addressed; Caching on the Web; Issues with Web caching design; Performance study of Web caching; Deficiencies of Web caching; Caching goes Replication; Implementation of replication; Web location and information service (WLIS); Performance study of CgR; Conclusion.

3 Introduction. The enormous success of the Internet has brought an exponentially growing number of users to the WWW. This increased use has created problems within the Internet: latency over the Web has risen because of insufficient bandwidth; the network becomes congested because of increased traffic; and servers are overloaded with requests. Solutions to these problems: introduce caching on the Web, using caching proxies; introduce replication, increasing the availability of data by creating mirrored sites; and combine the benefits of both caching and replication through an active caching scheme such as CgR.

4 Issues to be addressed to enhance the Web’s infrastructure. To preserve the usability of the WWW, the following issues need to be addressed at the server level: document retrieval latency must be decreased; document availability must be increased, perhaps by distributing documents among several servers; the amount of data transferred must be reduced; and network access must be redistributed to avoid peak hours.

5 Caching on the Web. Caching on the Web is implemented using caching proxies. A proxy acts as a mediator between the user’s machine and the outside world. From the user’s point of view, the proxy acts as a Web server: each request is sent to and answered by the proxy. From the server’s point of view, the proxy acts like a client: it forwards requests to the originating server. Because the stored data is shared among several users, the probability that a document is accessed more than once increases.

6 Current issues in Web caching design. Designing a replacement policy is more complex on the Web. If a cache is full when it receives a request to store a large document, is it more sensible to replace a single large document or several smaller ones? The policy needs to consider the best strategy to maximize caching benefits. Determining the pattern of document access and the time required to reload data is also more complicated on the Web. Loading time depends on the origin of the document: data transferred over international links typically takes longer to retrieve than information from servers in the same country. Caching strategies also face the problem of document staleness. When a cached document changes on the originating server, caching proxies are not aware of the change, so further requests satisfied from the cache deliver out-of-date information. This is the problem of cache coherency.
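The replacement question on this slide can be made concrete with a small sketch. The Python below compares two candidate policies for freeing space: least recently used first, and largest document first. The function name, the policy labels, and the sample URLs and sizes are illustrative assumptions, not something the presentation specifies.

```python
from collections import OrderedDict

def pick_victims(cache, needed, policy):
    """Return the URLs to evict so that at least `needed` bytes are freed.

    cache  -- OrderedDict mapping URL -> size in bytes, oldest access first
    policy -- "lru" evicts least recently used first; "size" evicts largest first
    """
    if policy == "lru":
        order = list(cache)  # an OrderedDict iterates in insertion order
    elif policy == "size":
        order = sorted(cache, key=lambda url: cache[url], reverse=True)
    else:
        raise ValueError(f"unknown policy: {policy}")

    victims, freed = [], 0
    for url in order:
        if freed >= needed:
            break
        victims.append(url)
        freed += cache[url]
    return victims

# Hypothetical cache contents, listed from oldest to most recent access.
cache = OrderedDict([("/a.html", 10_000), ("/b.gif", 15_000),
                     ("/video.mpg", 400_000), ("/c.html", 12_000)])
lru_victims = pick_victims(cache, 20_000, "lru")    # two small documents
size_victims = pick_victims(cache, 20_000, "size")  # one large document
```

Under these assumed sizes, the size-based policy frees the space by dropping one large document, while LRU drops two small ones. Which choice yields more later hits depends on the access pattern, which is the design problem the slide raises.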

7 Current issues in Web caching design (contd.). Web caching proxies can employ a time-to-live (TTL) to estimate document staleness. TTL implementation procedure: A date of last modification is included in every reply from a Web server. A TTL timing window based on that date is associated with each document put in the cache. When a document is requested, the proxy checks the timing window. A request occurring within the TTL time frame is served directly from the cache, on the assumption that the document is still current. A request occurring after the TTL has expired causes a conditional reload to be performed. The originating server answers either with the new document or with a special reply indicating that the data is unchanged.
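The TTL procedure above can be sketched in a few lines of Python. The slide does not say how the timing window is derived from the last-modification date; the sketch assumes the common heuristic of taking a fixed fraction of the document's age at fetch time (`LM_FACTOR`, a made-up value). The `conditional_get` callback is hypothetical and stands in for the conditional reload, which in HTTP is typically a GET with `If-Modified-Since` answered by either the new document or `304 Not Modified`.

```python
from dataclasses import dataclass

LM_FACTOR = 0.1  # assumption: TTL is 10% of the document's age when cached

@dataclass
class Entry:
    body: bytes
    last_modified: float  # seconds since the epoch, from the server's reply
    fetched_at: float     # when the proxy stored or last revalidated the document

    @property
    def ttl(self) -> float:
        return LM_FACTOR * (self.fetched_at - self.last_modified)

def lookup(cache, url, now, conditional_get):
    """Serve `url` from `cache`, revalidating once the TTL window has passed.

    conditional_get(url, since) returns None for "not modified", or a
    (body, last_modified) pair. With since=None it must return the pair.
    """
    entry = cache.get(url)
    if entry is not None and now - entry.fetched_at <= entry.ttl:
        return entry.body, "hit"          # inside the window: assume current
    since = entry.last_modified if entry is not None else None
    reply = conditional_get(url, since)
    if reply is None and entry is not None:
        entry.fetched_at = now            # unchanged: restart the timing window
        return entry.body, "revalidated"
    body, last_modified = reply
    cache[url] = Entry(body, last_modified, now)
    return body, "miss"
```

Only the "hit" path avoids contacting the originating server. The "revalidated" path still costs a round trip but no document transfer.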

8 Performance study of Web caching. Locality of reference and the Web: a very small subset of pages was accessed frequently, while most documents were accessed relatively seldom. Averaged cache hit rates showed that 14% of all cached pages were responsible for 42% of the data served directly from the cache, while occupying only 7% of the overall disk space. The study results indicate that locality of reference exists on the Web.

9 Performance study of Web caching (contd.). Performance gains were observed in terms of cache hit rate, byte hit rate, and transferred data. Theoretically achievable values were calculated to be: cache hit rate 56.5%, byte hit rate 40.6%, transferred data 3,650,950,731 bytes. Practical results obtained were: cache hit rate 21.3%, byte hit rate 16.6%, transferred data 4,992,987,253 bytes. The study results indicate that only about 40% of the theoretically possible hit rate is achieved.
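As a short worked example, the two metrics on this slide and the "about 40%" figure can be recomputed as follows. The `hit_rates` helper and its input format are assumptions for illustration. The percentages and byte counts are the ones quoted on the slide.

```python
def hit_rates(requests):
    """requests: iterable of (size_in_bytes, served_from_cache) pairs.

    Returns (cache hit rate, byte hit rate) as fractions between 0 and 1.
    """
    total = hits = total_bytes = hit_bytes = 0
    for size, from_cache in requests:
        total += 1
        total_bytes += size
        if from_cache:
            hits += 1
            hit_bytes += size
    return hits / total, hit_bytes / total_bytes

# Figures quoted on the slide (percentages and bytes).
theoretical = {"cache_hit": 56.5, "byte_hit": 40.6}
practical = {"cache_hit": 21.3, "byte_hit": 16.6}

# Share of the theoretical optimum actually achieved, per metric.
achieved = {name: practical[name] / theoretical[name] for name in theoretical}

# Extra data transferred compared with the theoretical optimum.
extra_bytes = 4_992_987_253 - 3_650_950_731
```

With the slide's numbers, `achieved` comes out at roughly 0.38 for the cache hit rate and 0.41 for the byte hit rate, consistent with the rounded 40% claim, and `extra_bytes` is 1,342,036,522.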

10 Deficiencies of Web caching. Caching alone cannot provide a complete solution for improving the Web’s infrastructure. Document retrieval latency: caching provides only a partial solution; the performance study indicates that 69.9% of the pages were retrieved only once. Document availability: when the originating server is down, it is not possible to check whether the requested document is current. Reduction in data transfer: caching cannot reduce the amount of data transferred for documents retrieved only once. Redistribution of network access: caching cannot solve this problem, because document loading and staleness checks are made in the critical path at the time of the request.

11 Caching goes Replication (CgR). The concept of CgR is to combine caching and replication to achieve the goals mentioned earlier. The basic idea is an active caching scheme in which servers can decide which documents should be held where. It simply transforms the caching servers into replicate servers.

12 Describing the current Web with CgR: caches now become active replicates for certain URL namespaces of some of the servers whose data they previously only cached. A primary server has to know only the set of its direct replicates. Level 1 replicate servers can in turn have replicates (RS4 and RS5) for which they act as a primary server. Converting caches to replicate servers (RS) and normal WWW servers to primary servers (PS) is the central concept of CgR. The selection of which caching servers to convert to replicate servers can be done manually or automatically, based on appropriate heuristics.

13 How replication is implemented. Servers initiate propagation by sending a normal HTTP GET request bearing a specific notification to their replica sites. CgR-enhanced replicate servers interpret this notification as a command to request the data to be replicated. The basic issue is how the client will be able to select replicate servers. This can be done with a CgR-enhanced client-side proxy (CP). The CP permits all actions to be performed transparently, without modifying the clients or their interface to the Web. Users only have to choose this proxy as their gateway to the Internet.

14 How replication is implemented (contd.). The CP can now send HTTP requests directly to replicate servers as well as to conventional servers. The CP may switch RS at any time for load balancing. The CP offers only a basic means by which clients can address a group of servers. What is still needed is a way to propagate information about which replicate servers exist and what data they hold.
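A minimal sketch of how a client-side proxy might pick a server for each request is given below. The slides say only that the CP may switch RS at any time for load balancing; the round-robin rotation, the longest-prefix match on URL namespaces, the class name, and the host names are all assumptions made for illustration.

```python
import itertools

class ReplicaSelector:
    """Hypothetical server selection inside a client-side proxy (CP)."""

    def __init__(self, groups):
        # groups: URL namespace prefix -> list of replicate server host names
        self._cycles = {prefix: itertools.cycle(hosts)
                        for prefix, hosts in groups.items() if hosts}

    def choose_server(self, origin_host, path):
        """Return the host to contact for the document origin_host + path."""
        url = origin_host + path
        # Longest matching namespace wins.
        for prefix in sorted(self._cycles, key=len, reverse=True):
            if url.startswith(prefix):
                return next(self._cycles[prefix])  # rotate for load balancing
        return origin_host  # not replicated: use the conventional server

selector = ReplicaSelector(
    {"www.example.org/docs/": ["rs1.example.net", "rs2.example.net"]})
```

Successive requests under `www.example.org/docs/` alternate between the two replicate servers, while any other URL goes to its originating server unchanged, so clients need no modification.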

15 Web location and information service (WLIS). WLIS keeps track of which URL namespaces are replicated and which servers belong to logical groups of primary server and replicate servers. A natural place to implement the WLIS service is the client-side proxy, but it can also be included in the primary server or offered by separate WLIS servers. The question then arises of how WLIS information is created, since it cannot be entered manually into a system that is meant to be highly scalable.

16 This figure shows how the distributed WLIS database is set up. Assuming that initially no WLIS information is available, the CP forwards requests directly to the PS (step 1). The PS knows about its first-level replicate servers (RS1-RS3) and includes this information in its reply. The client receives this initial WLIS information and stores it for later use. The next request can be redirected to one of the RS indicated in the previous WLIS response. If the decision is made to query RS1, it will in turn reply with the requested document and a list of its own replicates (RS4 and RS5).
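The discovery steps on this slide can be simulated with a short Python sketch. The topology (PS, RS1 to RS5) comes from the slide. The `serve` function, the `WlisClientProxy` class, and the rule of always picking the first known replicate are hypothetical stand-ins, since the presentation does not describe the message format or the selection rule.

```python
# Topology from the slide: each server knows only its direct replicates.
REPLICATES = {"PS": ["RS1", "RS2", "RS3"], "RS1": ["RS4", "RS5"]}

def serve(server, url):
    """Stand-in for an HTTP reply: the document plus the server's WLIS hints."""
    return f"<{url} from {server}>", list(REPLICATES.get(server, []))

class WlisClientProxy:
    def __init__(self, primary):
        self.primary = primary
        self.known = []  # replicate servers learned so far, in discovery order

    def request(self, url, choose=lambda servers: servers[0]):
        # With no WLIS information yet, go straight to the primary server.
        server = choose(self.known) if self.known else self.primary
        document, hints = serve(server, url)
        for host in hints:  # store the WLIS information for later requests
            if host not in self.known:
                self.known.append(host)
        return server, document

cp = WlisClientProxy("PS")
first, _ = cp.request("/index.html")   # answered by PS, which reveals RS1-RS3
second, _ = cp.request("/index.html")  # answered by RS1, which reveals RS4, RS5
```

After two requests the proxy knows all five replicate servers without any of them having been entered manually, which is the scalability point the WLIS slide makes.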

17 Performance study of CgR. Cache hit rate for a CgR-enhanced RS with different quotas of its cache assigned to hold replicated data: for small cache sizes, reserving cache space for replicates decreased overall performance.

18 Transmission times for document requests. With small cache sizes, transmission times again reflect the lower cache hit rates. When the cache size is increased to 500 and 800, transmission times are reduced by about 1.5 percent.

19 Conclusion. Caching and replication have each proved beneficial in many areas of computing, but the advantages of combining the two approaches are manifold. Together they not only help to reduce latency but also remedy the extreme variations in network bandwidth. Additionally, they provide a more fault-tolerant and evenly balanced system.

20 Thank You