Presentation is loading. Please wait.

Presentation is loading. Please wait.

Towards a Smart Workload Generator on RAMP Archana Ganapathi, David Patterson, Anthony Joseph {archanag, pattrsn, cs.berkeley.edu.

Similar presentations


Presentation on theme: "Towards a Smart Workload Generator on RAMP Archana Ganapathi, David Patterson, Anthony Joseph {archanag, pattrsn, cs.berkeley.edu."— Presentation transcript:

1 Towards a Smart Workload Generator on RAMP Archana Ganapathi, David Patterson, Anthony Joseph {archanag, pattrsn, adj} @ cs.berkeley.edu

2 RADLAB goals Help Single Operator/ Developer sites become Google-scale Eliminate SW/HW obstacles for scaling Tools to identify/fix problems Use RAMP to move websites from right to left Source: Washington Post 3/31/2006 Top 50 Web Domains

3 RADLAB Goals (2) YouTube.com 2006 Daily Traffic Ranking Challenges: Scalability Configurability Single person operation Cost-effectiveness Reproducibility and Observability Source: Alexa.com

4 RAMP for Time Travel UCB goal for RAMP: Data center in a box Google/Amazon.com O(10,000) processors in data center Anticipate load of 3-6 months in future for fast moving company Smart Workload Generation: “smart gateware” design informed by SML analysis of workload Time Dilation on Emulated Machines: Try software/ config changes and observe behavior prior to deployment Targeted Component-specific Load Generation: Stress-test components in the critical path to determine performance limitations RAMP as emulation environment + workload generator

5 Building Blocks Workload generation engine Parser to extract data from server response Workload description language to specify primitives to compile onto an FPGA Machine Learning techniques to discover web interactions ML to determine interactions User scripts Real System Workload Generator

6 Naïve Workload Generator (CS252 class project by Lorenzo Orecchia and Madhur Tulsiani) Generate the data-set using analytical models server file size distribution request size distribution relative file popularity Derive URL connectivity graph, load in memory Circuit logic to perform random walk on graph. We achieve our goal of scalability: Graph Size: 1048576 Memory Usage: 21 MB Total data size: 21083 MB

7 Scalability: RAMP DRAM limits Given: 2GB DRAMs, 4 DRAM banks per FPGA, 100 MHz clock cycle ~ 10 ns. Per cycle = 21 bits per walk (21Mbits for 1M walks) Assume 10 clock cycles per access => 100 ns 10 million accesses per second per bank per walk Given four DRAM banks = 40M accesses per sec Compare: Google receives 2000 requests per second

8 Scalability: RAMP Ethernet limits Given: 20 10Gbit/sec Ethernet ports per board Assume can generate 100 million accesses/sec Naïve Response-ignorant workload generation: 4 bytes for URL + check sum + header… ~ 32 bytes => 3 GB for 100 million accesses (per second) Smart Response-driven workload generation: Google: 23KB Flickr: 45KB CNN: 100KB Assume up to 200KB response can receive 50K responses per second per port = 1M responses handled by 20 10G ports.

9 Ethernet limits cont. About 1000X Google average with Smart (response-driven) generation Mixed RAMP emulation/workload generation even higher BW inside box Have plenty of headroom to tradeoff speed for greater accuracy of workload

10 Some Open Questions Limits on types of workloads? Workload trace sources? Web services, existing traffic generators, …? Role of Response-Ignorant trace generation? UDP, error/congestion-free TCP, …? Required level of fidelity for Response- Driven trace generation? How much of TCP FSM to model?

11 Question/Feedback?

12 Backup Slides

13 State of the art Hardware vs. software based Hammer, Optixia vs SURGE, SLAMd Tunability vs Automation SPECweb, TPC-W, Harpoon vs Optixia, SLAMd Realistic vs Synthetic SURGE, SLAMd, Harpoon vs TPC-W Generic vs App-Specific SLAMd, Harpoon vs TPC-W, Hammer Open-loop vs Closed-loop Partly-open loop is most realistic for web services

14 Workload Generator Next Steps Handle server responses Include “server response” states in logic Parse server response to identify current state Include think-time distribution User think-time + server response time What happens when things go wrong? Improve temporal/spatial locality Prefetch other URLs a page is linked to Take advantage of Zipfian popularity distribution

15 Sketch of Random Walk Module MEMORYMEMORY

16 Data-set parameters Graph Size Memory Usage Total data size Average file size 409650 KB2138 MB521 KB 655361 MB3121 MB47.6 KB 104857621 MB21083 MB20.11 KB

17 Circuit properties Device : Virtex-E Maximum delay Walk Module: 3.99 ns Memory Module: 5.82 ns Estimated frequency = (1000/9.81) = 101.93 MHz Number of LUTs per walk: 593 Number of slices per walk: 307

18

19


Download ppt "Towards a Smart Workload Generator on RAMP Archana Ganapathi, David Patterson, Anthony Joseph {archanag, pattrsn, cs.berkeley.edu."

Similar presentations


Ads by Google