

Slide 1: Cache Storage for the Next Billion
- Students: Anirudh Badam, Sunghwan Ihm
- Research Scientist: KyoungSoo Park
- Presenter: Vivek Pai
- Collaborator: Larry Peterson

Slide 2: The Next Billion
- Developing regions are not all alike
- Many people have a stable food supply, clean water, and reasonable power
- Connectivity, however, is bad
- A growing middle class has a desire for education & technology
- These people are the next billion

Slide 3: Bad Networking & Options
- African traffic is often backhauled through Europe
- Satellite latency is not fun
- Ghana: 2 Mbps for $6,000/month!
- Emerging option: disk (a 1 TB disk now costs $200)
- Even disk latency beats satellite

Slide 4: Enter the Tiny Laptops
- Problem: memory in the 256 MB range

Slide 5: Making Storage Work
- Populate the disk with content:
  - Preloaded HTTP cache
  - Preloaded WAN accelerator cache
  - Preloaded Web sites: Wikipedia, etc.
- Ship the disk to schools
- Update as needed:
  - Pull cache updates on demand during peak hours
  - Push updates off-peak, overnight

Slide 6: Deployment Scenarios
- Special servers per school (2 for redundancy)
- Average school size: 100 students; at $100/laptop, $10K/school
- Problems:
  - 2 servers at $5K each doubles the per-school cost
  - Servers don't ride the laptop commodity curves
- Solution: no servers, just laptops

Slide 7: Goal: a 1 TB Cache Store on a 256 MB Laptop
- Why caching? It improves Web access and WAN access
- Problem: large disks are really slow
- Disk storage requires an index
- In-memory indices optimize disk access

Slide 8: Memory Index Sizing
- Squid (a popular HTTP cache): 72 bytes/object of index
- Web objects average 8 KB each
- 1 TB = 125M objects
- 125M objects = 9 GB of RAM just for the index
- Commercial caches have better RAM usage: 32 bytes/object
- Even then, a 1 TB disk = 4 GB of RAM
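The slide's arithmetic can be checked directly. This sketch assumes decimal units (1 TB = 10^12 bytes, 8 KB = 8000 bytes), which reproduces the round figures; the function name is illustrative, not from the paper:

```python
TB = 10**12  # decimal terabyte, as disk vendors count

def index_ram(disk_bytes, avg_object_bytes, index_bytes_per_object):
    """RAM needed for an in-memory index covering a full disk."""
    num_objects = disk_bytes // avg_object_bytes
    return num_objects, num_objects * index_bytes_per_object

objects, squid_ram = index_ram(TB, 8000, 72)    # Squid: 72 B/object
_, commercial_ram = index_ram(TB, 8000, 32)     # commercial: 32 B/object

print(f"{objects:,} objects")                          # 125,000,000 objects
print(f"Squid index: {squid_ram / 10**9:.0f} GB")      # 9 GB
print(f"Commercial index: {commercial_ram / 10**9:.0f} GB")  # 4 GB
```

Either way, the index alone dwarfs the 256 MB of RAM on the target laptops.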

Slide 9: Revisiting Cache Indexing
- Seek reduction is important: most objects are small, and access is largely random
- High insert rate: assume a 50% hit rate and a 50% cacheable rate; then the insert rate is 25% of the request rate
- High delete rate: caches are largely full, so if the insert rate is 25%, the delete rate is 25%
- Deletion uses LRU, etc.
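The insert-rate figure follows from the two assumptions above; a one-line sketch (the helper name is hypothetical):

```python
def insert_fraction(hit_rate, cacheable_rate):
    """Fraction of requests that become cache inserts:
    only misses (1 - hit_rate) that are cacheable get written."""
    return (1 - hit_rate) * cacheable_rate

ins = insert_fraction(hit_rate=0.5, cacheable_rate=0.5)
print(f"insert rate: {ins:.0%} of requests")   # 25% of requests
# With a full cache, every insert evicts an old object,
# so the delete rate matches the insert rate: also 25%.
```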

Slide 10: Restarting the Design
- Eliminate the in-memory index
- Treat disk like memory: optimize data structures for locality, use location-sensitive algorithms
- Measure performance
- Now consider what to add; for each addition, measure performance again
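A minimal sketch of the no-index idea: hash the URL straight to a fixed-size disk slot, so a single seek locates any object and no RAM is spent on an index. The names and sizes here are illustrative, not the paper's code; the actual HashCache policies also store the object's key on disk to detect hash collisions, and later family members add set associativity and optional in-memory hints.

```python
import hashlib

BLOCK_SIZE = 8 * 1024   # one slot per object, sized near the average object
NUM_SLOTS = 1 << 27     # 2^27 slots * 8 KiB = 1 TiB table (illustrative)

def slot_offset(url: str) -> int:
    """Map a URL to a byte offset on disk with no in-memory index."""
    digest = hashlib.sha1(url.encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % NUM_SLOTS
    return slot * BLOCK_SIZE
```

Reads and writes then go directly to that offset; collisions are resolved by comparing the key stored in the block against the requested URL.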

Slide 11: What This Yields
- The HashCache family: one basic storage engine with pluggable algorithms & indexing
- The HashCache proxy: a Web proxy using the HashCache engine

Slide 12: Performance Comparison [chart]

Slide 13: Index Bits Per Object [chart]

Slide 14: Index Bits Per Object (continued) [chart]

Slide 15: HashCache Memory [chart]

Slide 16: Storage Limits with a 2 GB Index [chart]

Slide 17: Beyond Diminishing Returns
- HTTP cacheability has an upper limit
- Beyond that, revalidating items helps: revalidation on demand, or in the background
- Uncached content is still cacheable via wide-area accelerators
- Those must still contact servers, though
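Revalidation rides on standard HTTP conditional requests: if the cached copy is still current, the origin answers 304 Not Modified and no body crosses the slow link. The header fields are from the HTTP specification; the helper itself is a hypothetical sketch:

```python
def revalidation_headers(etag=None, last_modified=None):
    """Build headers for a conditional GET to revalidate a cached object."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag          # validate by entity tag
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # validate by date
    return headers

print(revalidation_headers(etag='"abc123"'))
```

Background revalidation would simply issue these requests during off-peak hours rather than on a client miss.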

Slide 18: Why WAN Acceleration?
- Lots of slowly-changing data: Wikipedia, news sites, customized sites
- WAN acceleration middleboxes: a custom protocol between the boxes, standard protocols to the rest of the net
- Less desirable than caches for the Web

Slide 19: The WAN Acceleration Dilemma
- WAN accelerators use chunks: the transit stream is broken into chunks
- Small chunks = high compression, but also lots of small objects to index
- Large chunks = high performance, but worse compression
- Memory & disk are both important
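The tradeoff shows up even in a toy experiment with fixed-size chunking over two nearly identical streams. (Real accelerators typically use content-defined boundaries, e.g. Rabin fingerprinting, so an insertion does not shift every later chunk; this is only an illustration of the size/compression tension.)

```python
import hashlib
import random

def chunk_hashes(data, chunk_size):
    """Fingerprint fixed-size chunks of a byte stream."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

rng = random.Random(0)
old = bytes(rng.randrange(256) for _ in range(8192))
new = old[:4096] + b"\xff" * 256 + old[4352:]   # one 256-byte edit

for size in (512, 4096):
    seen = set(chunk_hashes(old, size))
    chunks = chunk_hashes(new, size)
    reused = sum(h in seen for h in chunks)
    print(f"{size:>5}-byte chunks: {len(chunks)} index entries, "
          f"{reused}/{len(chunks)} reused")
# 512-byte chunks reuse 15/16 (better compression) but need 16 index
# entries; 4096-byte chunks need only 2 entries but reuse just 1/2.
```

Small chunks win on compression yet multiply the index entries, which is exactly why an index that scales cheaply, like HashCache, matters here.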

Slide 20: Merging WAN Acceleration & HashCache
- Easily index a huge number of chunks
- Small chunks are OK; large chunks are better
- Store chunks redundantly: optimize for performance & compression
- Communicate the tradeoffs to the cache layer

Slide 21: Deployments
- Two cache instances deployed, both in Africa, on shared machines running multiple services
- Working with OLPC on deployment
- Working on licensing; hopefully resolved this year
- Goal: an all-in-one server for schools

Slide 22: Longer-Term Goals
- The effort started around server consolidation: virtualization is nice, except for memory; many apps are very page-fault sensitive; extracting & sharing components is desirable
- More work in developing regions, and even within the US: poor, rural, etc.
- Customization for school-like workloads
- More work on peak/off-peak behavior
