
1 The Mystery of Cooperative Web Caching

2 Web caching is a process implemented by a caching proxy to improve the efficiency of the web. It reduces the delay in retrieving a document from the Internet by decreasing the number of requests that have to be sent out to the Internet.

3 Cooperative Web Caching: a set of web caches located in different places in the Internet that cooperate with each other to improve the performance of the system.

4 The main entities in cooperative web caching are:
- Proxy
- Router
- Group of proxies and routers
Entity requirements:
- Proxy: the proxy must act as a proxy cache.
- Router: the router must implement interior and exterior gateway protocols.
- Group of proxies and routers: the main requirement is inter-cache communication.
The mystery of cooperative web caching is the inter-cache communication technique.

5 The inter-cache communication techniques
Many protocols have been proposed for inter-cache communication in cooperative web caching:
- ICP (Internet Cache Protocol), proposed by Duane Wessels and K. Claffy, 1997
- Cache Digest, proposed by Alex Rousskov and Duane Wessels, 1998
- Summary Cache, proposed by Pei Cao, 1998
- HTCP (Hyper Text Caching Protocol), proposed by P. Vixie and D. Wessels, 2000
- CARP (Cache Array Routing Protocol), proposed by Vinod Valloppillil and Keith W. Ross, 1998

6 1. Internet Cache Protocol
- ICP is a message-format protocol in which each cache collects information about the existence of a particular web object in its neighbours' caches by sending an ICP_query message.
- The message is composed of a fixed 20-octet header followed by a variable-size payload.

7 The message format
- Opcode field (8 bits): an integer that indicates the type of the message: query, hit, miss, or denied.
- Version field: indicates the ICP version used.
- Message length = header length + payload length, at most 16 Kbytes.
- Payload: contains the URL of the requested document, which determines the payload length.
Field order: Opcode | Version | Message length | Request Number | Options | Option Data | Sender Address | Payload
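The header layout above can be sketched with Python's `struct` module. This is only an illustration: the individual field widths (1, 1, 2, 4, 4, 4, and 4 octets) and the query payload layout (a 4-octet requester address followed by a null-terminated URL) are assumptions taken from RFC 2186 rather than from the slide, and the function names are hypothetical.

```python
import socket
import struct

# Assumed layout (RFC 2186): opcode (1 octet), version (1), message length (2),
# request number (4), options (4), option data (4), sender address (4).
# "!" selects network byte order with no padding, giving 20 octets in total.
ICP_HEADER = struct.Struct("!BBHIII4s")

ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3
ICP_VERSION = 2
MAX_MESSAGE_LENGTH = 16 * 1024  # the 16 Kbytes limit stated on the slide


def build_query(request_number, url, sender_ip="0.0.0.0"):
    """Build an ICP query: header + requester address + null-terminated URL."""
    sender = socket.inet_aton(sender_ip)
    payload = sender + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    if length > MAX_MESSAGE_LENGTH:
        raise ValueError("ICP message exceeds 16 Kbytes")
    header = ICP_HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, length, request_number, 0, 0, sender
    )
    return header + payload


def parse_header(message):
    """Decode the fixed 20-octet header of an ICP message into a dict."""
    opcode, version, length, request_number, options, option_data, sender = (
        ICP_HEADER.unpack_from(message)
    )
    return {
        "opcode": opcode,
        "version": version,
        "length": length,
        "request_number": request_number,
        "options": options,
        "option_data": option_data,
        "sender": socket.inet_ntoa(sender),
    }
```

Because the message length field counts the header plus the payload, a receiver can check `parse_header(msg)["length"] == len(msg)` before trusting the payload.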

8 Message Specification
- A cache sends an ICP_query (Opcode = 1) to all its neighbours to collect information about a particular document.
- A cache that receives the query extracts the URL of the document from the payload and sends an ICP response message (Opcode = 2 for a hit, 3 for a miss).
- The cache that generated the query collects all the responses and selects the best peer, to which it sends an HTTP request to retrieve the document.
- There are two kinds of hit response: ICP_OP_HIT and ICP_OP_HIT_OBJ.

9 Peer Selection: the best peer from which to retrieve the document can be chosen by selection algorithms based on the following parameters:
- RTT measurement: reflects the congestion between two nodes; it varies with time.
- Hop count: a constant measure.
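A minimal sketch of one possible selection rule follows. The slide names the two parameters but not how they are combined, so the rule used here (lowest RTT among the peers that answered with a hit, hop count as tie-breaker) is an assumption, and `PeerResponse` and `select_peer` are hypothetical names.

```python
from dataclasses import dataclass

ICP_OP_HIT = 2  # opcode of a hit response


@dataclass
class PeerResponse:
    peer: str      # neighbour that answered the query
    opcode: int    # ICP opcode of its response
    rtt_ms: float  # measured round-trip time, varies over time
    hops: int      # hop count to the neighbour, constant


def select_peer(responses):
    """Return the name of the best peer that reported a hit, or None."""
    hits = [r for r in responses if r.opcode == ICP_OP_HIT]
    if not hits:
        return None
    # Assumed policy: prefer the lowest RTT, break ties on hop count.
    return min(hits, key=lambda r: (r.rtt_ms, r.hops)).peer
```

When no neighbour reports a hit, `select_peer` returns `None` and the caller would fall back to another source for the document.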

10 Comparison between ICP format and HTTP message

11 2. Cache Digest
Cache Digest provides a mechanism for communication among web caches. The digest represents the list of URLs of the documents stored in the cache.
Digest construction:
- The URLs of the documents stored in the cache are indexed in the digest by keys (sets of bits) stored in a Bloom filter.
- The keys are derived from the URL by a number of hash functions that determine which bits must be turned on and which must be turned off.
- A bit is turned on when its state changes from 0 to 1.
- A bit is turned off when its state changes from 1 to 0.

12 Bloom filter:
- A hash-coding method proposed by Burton H. Bloom in 1970.
- It is based on the idea of reducing the hash area size by allowing a small number of tests to be falsely identified, without increasing the reject time.
- Reject time: the time needed to determine that an element does not belong to the set of elements stored in the hash area.
- The hash area is organised in N cells with N different addresses 0…N-1; the set of documents is encoded in these N bits.
- Initially all the cells are empty (all bits are set to 0). To insert an element, a set of hash addresses a1…ad is generated and the bits at all of these addresses are set to 1.
- To search for an element, a set of hash addresses is generated in the same way. If all of those bits are 1 the element is accepted; if any of them is 0 the element is rejected.
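The insert and search procedures just described can be sketched as follows. This is an illustrative implementation, not the one used by any particular cache: deriving the d hash addresses from salted SHA-256 values is an assumption, and one byte per cell is used only to keep the code readable.

```python
import hashlib


class BloomFilter:
    """Hash area of n_bits cells, each element mapped to n_hashes addresses."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # all cells start at 0 (empty)

    def _addresses(self, item):
        # Assumed hashing scheme: one salted SHA-256 per address a1..ad.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        """Insert: set the bit at every hash address to 1."""
        for address in self._addresses(item):
            self.bits[address] = 1

    def __contains__(self, item):
        """Search: accept only if every addressed bit is 1."""
        return all(self.bits[address] for address in self._addresses(item))
```

An inserted element is always accepted, so the filter never rejects a member of the set; an element that was never inserted can still be accepted when other insertions happen to have set all of its bits, which is the falsely identified case mentioned above.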

13 The calculation of the public keys
- The URL is transformed by MD5 into a public key (128 bits) composed of two parts: a numeric part (1-7 bits) and a second part that represents the transformation of the URL.
- The hash function then assigns to each key an index extracted from the URL through the following computation:
  1. Split the 128 bits into N parts.
  2. Find the index for each part by taking the value of the part modulo the digest size, where digest size = (number of bits per entry + the public keys) × cache capacity.
  3. Combine the indices of the parts to compose the index of the corresponding public key.
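Steps 1 and 2 can be illustrated with a short sketch. It hashes only the URL and takes the digest size as a parameter; the default of four parts and the name `digest_indices` are assumptions made for the example, and step 3 (how the indices are combined) is left out because the slide does not specify it.

```python
import hashlib


def digest_indices(url, digest_size_bits, parts=4):
    """Split the 128-bit MD5 key of a URL into parts and reduce each
    part modulo the digest size to obtain one index per part."""
    key = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    if len(key) % parts != 0:
        raise ValueError("parts must divide the 16-byte key evenly")
    width = len(key) // parts
    return [
        int.from_bytes(key[i * width:(i + 1) * width], "big") % digest_size_bits
        for i in range(parts)
    ]
```

The same URL always yields the same indices, which is what lets a neighbour test a URL against a digest it received from another cache.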

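14 Digest Accuracy:
- The calculation of the public keys allows some possibility of error. There are two kinds of errors:
  1. False miss: the digest reports a document as absent although the cache holds it.
  2. False hit: the digest reports a document as present although the cache does not hold it.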

15 Digest Requirement:
- The digest is a large data structure: 200MB-2MB is needed to represent all the URLs of the documents stored in the cache.
- Two copies of the digest must be kept, one on disk and one in memory for fast updates.
How does it work?
- Each cache exchanges its own digest with its neighbours.
- The cache digest message is composed of a fixed 128-byte binary header, which contains the digest specification, followed by the entire digest.
- When a miss occurs in the local cache, the cache looks the document up in its neighbours' digests.
- After a miss, the cache sends an HTTP request to retrieve the document from the appropriate location.
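The lookup flow can be sketched as below. Plain Python sets stand in for the neighbours' digests, and falling back to the origin server when no digest matches is an assumption about what the appropriate location means; `locate` and its parameters are hypothetical names.

```python
def locate(url, local_cache, neighbour_digests, origin="origin-server"):
    """Decide where to fetch a URL from, using neighbours' digests on a miss."""
    if url in local_cache:
        return "local"
    # Local miss: consult each neighbour's digest, with no query messages.
    for neighbour, digest in neighbour_digests.items():
        if url in digest:
            return neighbour  # may still be a false hit
    # Assumed fallback when no digest reports the document.
    return origin
```

Since the digests are already held locally, the decision costs no network round trip; only the final HTTP request leaves the cache.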

16 Conclusion
- Cache Digest eliminates the ICP query/response messages used to collect information about the requested document, but it requires a lot of memory to store the digest, and the quantity of information it transfers over the network is proportional to the size of the digest.

17 3. Summary Cache
- Proposed by Pei Cao and a group of students to reduce the internal traffic created by ICP_Query.
- Each proxy keeps a summary of the URLs of the documents stored in each participating proxy.
- It scales well, because a large number of proxies can be employed to reduce web traffic.
- Two main factors influence the scalability:
  1. Updating delay
  2. Memory requirement

18 Updating delay: the summary is updated periodically, or once the share of documents not reflected in the summary reaches a given threshold.
- Memory requirement: depends on how the summary is represented.
- The summary can be represented in the following ways:
  - Exact directory: requires a lot of memory. For 100 proxies with 8GB caches and 1 million documents with an average URL length of 50 bytes, the space needed to represent the summary is 2MB.
  - Server name: reduces the summary size but increases the possibility of error.
  - Bloom filter: proposed by Pei Cao to reduce the memory requirement of the summary. Documents are stored in the filter in the same way as in Cache Digest, with a difference in the calculation of the index:
    1. The 128 bits are divided into four 32-bit words, and from each word an index is extracted by taking it modulo the summary size.
    2. Each proxy maintains a counter C(l) for each location l.
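A sketch of the Bloom filter representation with per-location counters is given below. It assumes the 128 bits come from an MD5 signature of the URL, as in Cache Digest, and that the counters exist so that a document leaving the cache can be removed from the summary; `CountingSummary` and its methods are hypothetical names.

```python
import hashlib


class CountingSummary:
    """Bloom-filter summary with a counter C(l) for each location l."""

    def __init__(self, size_bits):
        self.size = size_bits
        self.counters = [0] * size_bits  # C(l); bit l is on while C(l) > 0

    def _locations(self, url):
        # Four 32-bit words of the 128-bit signature, each modulo the size.
        signature = hashlib.md5(url.encode("utf-8")).digest()
        return [
            int.from_bytes(signature[i:i + 4], "big") % self.size
            for i in range(0, 16, 4)
        ]

    def add(self, url):
        """Document enters the cache: increment its four counters."""
        for location in self._locations(url):
            self.counters[location] += 1

    def remove(self, url):
        """Document leaves the cache; only call for URLs that were added."""
        for location in self._locations(url):
            if self.counters[location] > 0:
                self.counters[location] -= 1

    def may_contain(self, url):
        """True if every location for this URL currently has its bit on."""
        return all(self.counters[location] > 0 for location in self._locations(url))
```

Only the on/off state of each location would need to be sent to other proxies; the counters stay local and let a bit be turned off again once every document that set it has left the cache.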

19 There are three kinds of errors:
- False hit
- False miss
- Remote stale hit

20 Comparison between the summary representation methods and ICP

21 Comparison between the summary representation methods and ICP
Conclusion
- The memory requirement in Summary Cache depends on the size of the individual summary and on the number of proxies.

22 4. Protocols comparison
Comparison of the three previous protocols in terms of network traffic