
1 Squirrel: A peer-to-peer web cache. Sitaram Iyer (Rice University). Joint work with Ant Rowstron (MSR Cambridge) and Peter Druschel (Rice University). PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA

2 Web Caching. Reduces: 1. Latency, 2. External traffic, 3. Load on web servers and routers. Deployed at: corporate network boundaries, ISPs, web servers, etc.

3 Centralized Web Cache (diagram: clients with browser caches on the corporate LAN share a centralized web cache at the Internet boundary, with the web server on the Internet).

4 Cooperative Web Cache (diagram: clients with browser caches on the corporate LAN share multiple cooperating web caches at the Internet boundary, with the web server on the Internet).

5 Decentralized Web Cache (diagram: Squirrel runs on the client nodes themselves; browser caches on the corporate LAN cooperate directly, with the web server reached over the Internet).

6 Distributed Hash Table. Peer-to-peer location service: Pastry, a peer-to-peer routing and location substrate. Completely decentralized and self-organizing; fault-tolerant, scalable, efficient. Operations: Insert(k,v), Lookup(k). (diagram: key-value pairs k1,v1 ... k6,v6 distributed across the nodes)
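The Insert(k,v)/Lookup(k) interface above can be sketched with a toy hash table over a ring of node IDs. This is an illustrative assumption, not Pastry's actual routing: the 160-bit keyspace, the `ToyDHT` class, and the numerically-closest-node rule are invented for the sketch.

```python
import hashlib

def key_id(key: str) -> int:
    # Hash a key into a circular 160-bit identifier space (illustrative).
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ToyDHT:
    """Toy DHT: each key lives on the node whose ID is numerically
    closest to the key's hash (modular distance on the ring)."""
    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in node_ids}  # node ID -> local store

    def _closest_node(self, kid: int) -> int:
        ring = 2 ** 160
        return min(self.nodes,
                   key=lambda nid: min((nid - kid) % ring, (kid - nid) % ring))

    def insert(self, key: str, value):
        # Store the value on the node responsible for the key.
        self.nodes[self._closest_node(key_id(key))][key] = value

    def lookup(self, key: str):
        # Any node can find the same responsible node and read the value.
        return self.nodes[self._closest_node(key_id(key))].get(key)

dht = ToyDHT(node_ids=[key_id(f"node-{i}") for i in range(8)])
dht.insert("k1", "v1")
```

A real DHT computes `_closest_node` by routing through a few peers rather than scanning all node IDs, but the key-to-node mapping is the same idea.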

7 Why peer-to-peer? 1. Cost of a dedicated web cache: no additional hardware. 2. Administrative effort: self-organizing network. 3. Scaling implies upgrading: resources grow with the clients.

8 Setting: Corporate LAN, 100 to 100,000 desktop machines, located in a single building or campus. Each node runs an instance of Squirrel and sets it as the browser's proxy.

9 Mapping Squirrel onto Pastry Two approaches: Home-store Directory

10 Home-store model (diagram: the client hashes the URL and routes the request over the LAN to the object's home node, which fetches from the Internet on a miss).

11 Home-store model (diagram: subsequent requests are served from the home node's cache) ... that's how it works!

12 Directory model. Client nodes always cache objects locally. Home-store: the home node also stores objects. Directory: the home node stores only pointers to recent clients, and forwards requests to them.
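The contrast between the two models can be sketched as follows; the node list, the hash routine, and the pointer-table layout are invented for illustration and are not Squirrel's actual data structures.

```python
import hashlib
import random

NODES = [f"node-{i}" for i in range(16)]

def home_node(url: str) -> str:
    # Both schemes hash the URL to pick its home node.
    h = int(hashlib.sha1(url.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Home-store: the home node itself caches the object.
home_store = {}  # node -> {url: object}

def homestore_get(url):
    return home_store.get(home_node(url), {}).get(url)

# Directory: the home node keeps only pointers to recent clients;
# a request is forwarded to a randomly chosen delegate.
directory = {}   # node -> {url: [client nodes holding a copy]}

def directory_get(url):
    pointers = directory.get(home_node(url), {}).get(url)
    if not pointers:
        return None  # miss: would fetch from the origin server
    return random.choice(pointers)  # delegate chosen to serve the object

url = "http://example.com/x"
home_store.setdefault(home_node(url), {})[url] = b"object-bytes"
directory.setdefault(home_node(url), {})[url] = ["node-3"]
```

The random delegate choice in `directory_get` corresponds to slide 14's "randomly choose entry from table".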

13 Directory model (diagram: client, home node, LAN, and Internet).

14 Directory model (diagram: the home node randomly chooses an entry from its pointer table to serve the request).

15 Directory: Advantages. Avoids storing unnecessary copies of objects. The rapidly changing directory for popular objects seems to improve load balancing. The home-store scheme can incur hotspots.

16 Directory: Disadvantages. Cache insertion happens only at clients, so active clients store all the popular objects while inactive clients waste most of their storage. Implications: 1. Reduced effective cache size. 2. Load imbalance.

17 Directory: Load spike example. A web page with many embedded images, or periods of heavy browsing: many home nodes end up pointing to such clients! Evaluate ...

18 Trace characteristics. Traces from Microsoft in Redmond and Cambridge:

                              Redmond        Cambridge
Total duration                1 day          31 days
Number of clients             36,782         105
Number of HTTP requests       16.41 million  0.971 million
Peak request rate             606 req/sec    186 req/sec
Number of objects             5.13 million   0.469 million
Number of cacheable objects   2.56 million   0.226 million
Mean cacheable object reuse   5.4 times      3.22 times

19 Total external traffic, Redmond [lower is better]. Graph: total external traffic (GB, 85-105) versus per-node cache size (MB, log scale 0.001-100), with curves for Directory, Home-store, No web cache, and Centralized cache.

20 Total external traffic, Cambridge [lower is better]. Graph: total external traffic (GB, 5.5-6.1) versus per-node cache size (MB, log scale 0.001-100), with curves for Directory, Home-store, No web cache, and Centralized cache.

21 LAN hops, Redmond. Graph: percentage of cacheable requests (0-100%) versus total hops within the LAN (0-6), with curves for Centralized, Home-store, and Directory.

22 LAN hops, Cambridge. Graph: percentage of cacheable requests (0-100%) versus total hops within the LAN (0-5), with curves for Centralized, Home-store, and Directory.

23 Load in requests per second, Redmond. Graph: number of times observed (log scale, 1-100,000) versus max objects served per node per second (0-50), with curves for Home-store and Directory.

24 Load in requests per second, Cambridge. Graph: number of times observed (log scale, 1-10^7) versus max objects served per node per second (0-50), with curves for Home-store and Directory.

25 Load in requests per minute, Redmond. Graph: number of times observed (log scale, 1-100) versus max objects served per node per minute (0-350), with curves for Home-store and Directory.

26 Load in requests per minute, Cambridge. Graph: number of times observed (log scale, 1-10,000) versus max objects served per node per minute (0-120), with curves for Home-store and Directory.

27 Fault tolerance. Sudden node failures result in partial loss of cached content. Home-store: loss proportional to the number of failed nodes. Directory: more vulnerable.

28 Fault tolerance. If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:

            Home-store              Directory
Redmond     mean 1%, max 1.77%      mean 1.71%, max 19.3%
Cambridge   mean 1%, max 3.52%      mean 1.65%, max 9.8%

29 Conclusions. It is possible to decentralize web caching. Performance is comparable to a centralized web cache; it is better in terms of cost, scalability, and administration effort; and, under our assumptions, the home-store scheme is superior to the directory scheme.

30 Other aspects of Squirrel Adaptive replication –Hotspot avoidance –Improved robustness Route caching –Fewer LAN hops

31 Thanks.

32 (backup) Storage utilization, Redmond:

                Home-store   Directory
Total           97641 MB     61652 MB
Mean per-node   2.6 MB       1.6 MB
Max per-node    1664 MB

33 (backup) Fault tolerance:

             Home-store                   Directory
Equations    mean H/O, max H_max/O        mean (H+S)/O, max max(H_max, S_max)/O
Redmond      mean 0.0027%, max 0.0048%    mean 0.198%, max 1.5%
Cambridge    mean 0.95%, max 3.34%        mean 1.68%, max 12.4%

34 (backup) Full home-store protocol (diagram: the client routes its request through other nodes over the LAN to the home node; (a) object or not-modified returned from the home node; (b) request forwarded to the origin server over the WAN, object or not-modified returned from the origin).

35 (backup) Full directory protocol (diagram: client, home node, directory, delegate nodes, and origin server; (a, d) no directory, go to the origin; (b) not-modified; (c, e) request forwarded to a delegate, which returns the object; (e) conditional GET (cGET) request to the origin, returning not-modified or the object).

36 (backup) Peer-to-peer Computing. Decentralize a distributed protocol: – Scalable – Self-organizing – Fault tolerant – Load balanced. Not automatic!

37 Decentralized Web Cache (diagram: browser caches cooperating on the LAN, with the web server reached over the Internet).

38 Challenge. A decentralized web caching algorithm needs to achieve those benefits in practice, keep overhead unnoticeably low, and ensure node failures do not become significant.

39 Peer-to-peer routing, e.g., Pastry. A peer-to-peer object location and routing substrate = distributed hash table. Reliably maps an object key to a live node. Routes in log_16(N) steps (e.g., 3-4 steps for 100,000 nodes).
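The route-length claim follows from digit-by-digit routing: with base-16 digits, each hop resolves roughly one more hexadecimal digit of the key, so route length grows as log_16(N). A quick check of the figure quoted in the slide:

```python
import math

def expected_hops(n_nodes: int, base: int = 16) -> float:
    # Pastry-style prefix routing resolves about one base-b digit
    # per hop, so route length grows as log_base(N).
    return math.log(n_nodes, base)

# ~4.15 for 100,000 nodes, in line with the "3-4 steps" quoted above.
print(expected_hops(100_000))
```

Note the value is an expected asymptotic bound, not an exact hop count for every route.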

40 Home-store is better! The simpler home-store scheme achieves load balancing through hash-function randomization; the directory scheme implicitly relies on access patterns for load distribution.
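The hash-randomization claim can be checked with a small simulation (the URL set and node count are invented for illustration): hashing URLs to home nodes spreads object assignments close to evenly, independent of access pattern.

```python
import hashlib
from collections import Counter

NODES = 16

def home(url: str) -> int:
    # Home node index = SHA-1 of the URL, reduced mod the node count.
    return int(hashlib.sha1(url.encode()).hexdigest(), 16) % NODES

# Assign 10,000 synthetic URLs to home nodes and count per-node load.
counts = Counter(home(f"http://example.com/page/{i}") for i in range(10_000))

# Perfect balance would be 10000/16 = 625 objects per node;
# the observed min and max stay close to that.
print(min(counts.values()), max(counts.values()))
```

This is the home-store half of the story; the directory scheme's balance instead depends on which clients happen to fetch popular objects.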

41 Directory scheme seems better… Avoids storing unnecessary copies of objects. Rapidly changing directory for popular objects results in load balancing.

42 Interesting difference. Consider: a web page with many images, or a heavily browsing node. Directory: many pointers to some node. Home-store: natural load balancing. Evaluate ...

43 Fault tolerance. When a single Squirrel node crashes, the fraction of lost cached content is:

            Home-store                   Directory
Redmond     mean 0.0027%, max 0.0048%    mean 0.2%, max 1.5%
Cambridge   mean 0.95%, max 3.34%        mean 1.7%, max 12.4%

