SEEDING CLOUD-BASED SERVICES: DISTRIBUTED RATE LIMITING (DRL)
Kevin Webb, Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren
Seeding the Cloud
- Technologies to deliver on the promise of cloud computing
- Previously: Process data in the cloud (Mortar)
  - Data produced/stored across providers
  - Find Ken Yocum or Dennis Logothetis for more info
- Today: Control resource usage: “cloud control” with DRL
  - Use resources at multiple sites (e.g., CDN)
  - Complicates resource accounting and control
  - Provide cost control
DRL Overview
- Example: Cost control in a Content Distribution Network
- Abstraction: Enforce a global rate limit across multiple sites
- Simple example: 10 flows, each limited as if there were a single, central limiter
[Figure: sources (Src) and destinations (Dst) connected through limiters. Labels: 10 flows at 100 KB/s; under DRL, 2 flows at 20 KB/s and 8 flows at 80 KB/s.]
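The arithmetic in the simple example can be sketched in a few lines. This is only an illustration of splitting one global limit in proportion to flow counts, so that each flow fares as it would behind a single central limiter. The function name `split_global_limit` and the site names are hypothetical and do not come from the DRL code.

```python
def split_global_limit(global_limit, flows_per_site):
    """Split one global limit across sites in proportion to flow counts.

    Hypothetical helper for illustration only; not part of DRL.
    """
    if not flows_per_site:
        return {}
    total_flows = sum(flows_per_site.values())
    if total_flows == 0:
        # No demand anywhere: fall back to an even split.
        even = global_limit / len(flows_per_site)
        return {site: even for site in flows_per_site}
    return {
        site: global_limit * count / total_flows
        for site, count in flows_per_site.items()
    }


# The slide's numbers: 100 KB/s shared by 10 flows at two sites.
shares = split_global_limit(100, {"site_a": 2, "site_b": 8})
print(shares)
```

With 2 and 8 flows, every flow ends up with the same 10 KB/s it would get from one limiter carrying all 10 flows.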
Goals & Challenges
- Up to now
  - Develop architecture and protocols for distributed rate limiting (SIGCOMM 07)
  - Particular approach (FPS) is practical in the wide area
- Current goals
  - Move DRL out of the lab and impact real services
  - Validate SIGCOMM results in real-world conditions
  - Provide Internet testbed with ability to manage bandwidth in a distributed fashion
  - Improve usability of PlanetLab
- Challenges
  - Run-time overheads: CPU, memory, communication
  - Environment: link/node failures, software quirks
PlanetLab
- World-wide test bed
  - Networking and systems research
  - Resources donated by universities, labs, etc.
- Experiments divided into VMs called “slices” (Vservers)
[Figure: a controller (PLC API, web server, PostgreSQL, on Linux 2.6) connected over the Internet to nodes; each node runs Linux 2.6 with Vservers hosting Slice 1, Slice 2, ..., Slice N.]
PlanetLab Use Cases
- PlanetLab needs DRL!
  - Donated bandwidth
  - Ease of administration
- Machine room
  - Limit local-area nodes to a single rate
- Per slice
  - Limit experiments in the wide area
- Per organization
  - Limit all slices belonging to an organization
PlanetLab Use Cases
- Machine room
  - Limit local-area nodes with a single rate
[Figure: machine-room nodes under DRL. Labels: 1 MBps, 5 MBps.]
DRL Design
- Each limiter runs a main event loop
  - Estimate: Observe and record outgoing demand
  - Allocate: Determine rate share of each node
  - Enforce: Drop packets
- Two allocation approaches
  - GRD: Global random drop (packet granularity)
  - FPS: Flow proportional share
    - Flow count as proxy for demand
[Figure: input traffic passes through Estimate, Allocate (FPS), and Enforce to become output traffic; the loop runs at a regular interval and exchanges state with other limiters.]
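The Estimate, Allocate, Enforce loop can be sketched as a toy simulation. The sketch assumes that each limiter's share is simply proportional to its flow count, which is a simplification of FPS rather than the published algorithm. The class `Limiter`, its methods, and `run_interval` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Limiter:
    """Toy limiter; names and logic are illustrative, not DRL's code."""
    name: str
    global_limit: float                      # limit shared by all limiters
    active_flows: set = field(default_factory=set)
    local_limit: float = 0.0

    def observe(self, flow_id):
        # Record a flow seen during the current interval.
        self.active_flows.add(flow_id)

    def estimate(self):
        # Estimate: use the flow count as a proxy for demand, then reset.
        count = len(self.active_flows)
        self.active_flows.clear()
        return count

    def allocate(self, my_demand, peer_demands):
        # Allocate: take a share of the global limit proportional to demand.
        total = my_demand + sum(peer_demands)
        if total == 0:
            self.local_limit = self.global_limit / (1 + len(peer_demands))
        else:
            self.local_limit = self.global_limit * my_demand / total
        return self.local_limit

    def enforce(self, offered_rate):
        # Enforce: traffic above the local limit would be dropped.
        return min(offered_rate, self.local_limit)


def run_interval(limiters):
    """One loop iteration: every limiter estimates, then all allocate."""
    demands = {lim.name: lim.estimate() for lim in limiters}
    for lim in limiters:
        peers = [d for name, d in demands.items() if name != lim.name]
        lim.allocate(demands[lim.name], peers)
    return {lim.name: lim.local_limit for lim in limiters}


a = Limiter("a", 100.0)
b = Limiter("b", 100.0)
for i in range(2):
    a.observe(("a", i))
for i in range(8):
    b.observe(("b", i))
print(run_interval([a, b]))
```

In a deployment the demands would travel between limiters over the network; here `run_interval` stands in for that exchange by reading every limiter's estimate directly.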
Implementation Architecture
- Abstractions
  - Limiter
    - Communication
    - Manages identities
  - Identity
    - Parameters (limit, interval, etc.)
    - Machines and subsets
- Built upon standard Linux tools…
  - Userspace packet logging (Ulogd)
  - Hierarchical Token Bucket
- Mesh & gossip update protocols
- Integrated with PlanetLab software
[Figure: input data passes through Estimate (Ulogd), FPS, and Enforce (HTB) to become output data; the loop runs at a regular interval.]
Estimation using ulogd
- Userspace logging daemon
  - Already used by PlanetLab for efficient abuse tracking
  - Packets tagged with slice ID by IPTables
  - Receives outgoing packet headers via netlink socket
- DRL implemented as ulogd plug-in
  - Gives us efficient flow accounting for estimation
  - Executes the Estimate, Allocate, Enforce loop
  - Communicates with other limiters
Enforcement with Hierarchical Token Bucket
- Linux Advanced Routing & Traffic Control
- Hierarchy of rate limits
- Enforces DRL’s rate limit
- Packets attributed to leaves (slices)
- Packets move up, borrowing from parents
[Figure: HTB tree with nodes Root, X, Y, Z, A, B, C, D. A 1500b packet moves up the tree borrowing tokens; token labels: 1000b, 100b, 600b, then 0b, 200b.]
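The idea of a packet moving up the tree and borrowing from parents can be sketched with a toy model. This is a deliberate simplification and not the Linux kernel's HTB algorithm, which also tracks ceilings, quanta, and per-level accounting. The class `TokenClass`, the function `try_send`, and the token values are hypothetical and are not taken from the figure.

```python
class TokenClass:
    """One node in a toy token-bucket tree (illustrative only)."""

    def __init__(self, name, tokens, parent=None):
        self.name = name
        self.tokens = tokens      # bytes this class may currently send
        self.parent = parent      # None for the root


def try_send(leaf, packet_bytes):
    """Charge the packet to the nearest class that can afford it.

    Walks from the leaf toward the root. Returns the name of the class
    that paid, or None if no class on the path has enough tokens, in
    which case the packet would be delayed or dropped.
    """
    node = leaf
    while node is not None:
        if node.tokens >= packet_bytes:
            node.tokens -= packet_bytes
            return node.name
        node = node.parent
    return None


# Hypothetical three-level path: Root -> X -> A (A is a slice leaf).
root = TokenClass("Root", 2000)
x = TokenClass("X", 600, parent=root)
a = TokenClass("A", 1000, parent=x)

print(try_send(a, 1500))   # A and X cannot pay, so Root lends
print(try_send(a, 800))    # A pays from its own tokens
```

The point of the hierarchy is that a leaf with spare tokens sends without touching its ancestors, while a busy leaf can still use capacity that the rest of the subtree has left idle.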
Enforcement with Hierarchical Token Bucket
- Uses same tree structure as PlanetLab
  - Efficient control of sub-trees
- Updated every loop
  - Root limits whole node
  - Replenish each level
[Figure: HTB tree with nodes Root, X, Y, Z, A, B, C, D.]
Citadel Site
- The Citadel (2 nodes)
  - Wanted 1 Mbps traffic limit
  - Added (horrible) traffic shaper
  - Poor responsiveness (2 – 15 seconds)
- Running right now!
  - Cycles on and off every four minutes
  - Observe DRL’s impact without ground truth
[Figure: the site's traffic shaper and DRL.]
Citadel Results – Outgoing Traffic
- Data logged from running nodes
- Takeaways:
  - Without DRL, way over limit
  - One node sending more than the other
[Figure: outgoing traffic over time, with the 1 Mbit/s limit marked and DRL on/off periods labeled.]
Citadel Results – Flow Counts
- FPS uses flow count as proxy for demand
[Figure: number of flows over time.]
Citadel Results – Limits and Weights
[Figure: rate limit and FPS weight over time.]
Lessons Learned
- Flow counting is not always the best proxy for demand
  - FPS state transitions were irregular
  - Added checks and dampening/hysteresis in problem cases
- Can estimate after enforce
  - Ulogd only shows packets after HTB
  - FPS is forgiving to software limitations
- HTB is difficult
  - HYSTERESIS variable
  - TCP Segmentation offloading
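One way such dampening might look is a dead band combined with smoothing: small changes in the proposed limit are ignored, and large ones are applied only partially. This is a generic sketch and not the checks actually added to DRL. The function `dampen` and its parameters `threshold` and `alpha` are hypothetical.

```python
def dampen(current, proposed, threshold=0.1, alpha=0.5):
    """Return the limit to apply, given the current and proposed limits.

    Hypothetical helper: changes smaller than `threshold` (as a fraction
    of the current limit) are ignored; larger changes move the limit only
    a fraction `alpha` of the way toward the proposed value.
    """
    if current <= 0:
        # Nothing to compare against yet: accept the proposal as-is.
        return proposed
    relative_change = abs(proposed - current) / current
    if relative_change < threshold:
        return current
    return current + alpha * (proposed - current)


limit = 100.0
for proposed in [105.0, 200.0, 95.0]:
    limit = dampen(limit, proposed)
    print(proposed, "->", limit)
```

The dead band keeps the limit from flapping when the flow count jitters, at the cost of reacting more slowly to a real shift in demand.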
Ongoing work
- Other use cases
- Larger-scale tests
- Complete PlanetLab administrative interface
- Standalone version
- Continue DRL rollout on PlanetLab
  - UCSD’s PlanetLab nodes soon
Questions?
- Code is available from PlanetLab svn