
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems


1 Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
Tayler H. Hetherington ɣ, Timothy G. Rogers ɣ, Lisa Hsu*, Mike O'Connor*, Tor M. Aamodt ɣ
ɣ UBC (University of British Columbia), *AMD
In Proc. ACM/IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS)
Rich Miler –

2 Motivation
Server farms require a lot of power
  – Need for efficient, cost-effective solutions
  – GPU/APUs
New types of workloads
  – Non-HPC
  – Server applications
Server applications
  – Memcached
Programmer's initial intuition into an application's behavior
Tayler Hetherington, Timothy Rogers, Lisa Hsu, Mike O'Connor, Tor M. Aamodt Memcached Key-value Store on GPU/APU
Bruno Giussani – www.wired.com

3 Background
Memcached
*Slide from HPCA-18, 2012 Facebook Keynote, Sanjeev Kumar

4 Memcached - Compatible with GPU?
Irregular control flow
Irregular memory access patterns
Large memory requirements
Highly input data dependent

5 Porting Memcached
Simple key-value lookup
[Figure: GET request flow. Labels: GET, Server 2, Hash, Memory, Key Comparison, Hit, Miss, Hash chaining, Return Hit/Miss]
READ (GET) requests on GPU
WRITE (SET) requests on CPU
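The GET path named on this slide (hash, memory access, key comparison, hash chaining on a miss) can be sketched roughly as follows. This is a minimal illustration, not the authors' port: the `Item` and `ChainedTable` names and the byte-sum hash are hypothetical stand-ins, and Memcached's real hash function and item layout differ.

```python
from typing import Optional


class Item:
    """One stored key-value pair; `next` links items that share a bucket."""

    def __init__(self, key: bytes, value: bytes, next_item: "Optional[Item]" = None):
        self.key = key
        self.value = value
        self.next = next_item


class ChainedTable:
    """Hypothetical hash table with chaining, for illustration only."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [None] * num_buckets

    def _index(self, key: bytes) -> int:
        # Hypothetical hash: a plain byte sum, chosen so collisions are easy to produce.
        return sum(key) % len(self.buckets)

    def set(self, key: bytes, value: bytes) -> None:
        # SET: update the item if the key exists, otherwise insert at the chain head.
        i = self._index(key)
        item = self.buckets[i]
        while item is not None:
            if item.key == key:
                item.value = value
                return
            item = item.next
        self.buckets[i] = Item(key, value, self.buckets[i])

    def get(self, key: bytes) -> Optional[bytes]:
        # GET: hash, read the bucket, compare keys, follow the chain on a mismatch.
        item = self.buckets[self._index(key)]
        while item is not None:
            if item.key == key:
                return item.value  # hit
            item = item.next       # key mismatch: hash chaining
        return None                # miss
```

The length of the `while` loop in `get` depends on the key and on what is stored, which is one source of the input-dependent behavior the slides discuss.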

6 Porting Memcached - Batching
[Figure: the GET request flow (Hash, Memory, Key Comparison, Hit, Miss, Hash chaining, Return Hit/Miss) repeated for several requests side by side; labels include Server 2 and Server n]
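Batching can be pictured as one kernel launch in which each work item serves one GET request. The sketch below emulates that sequentially; `get_kernel` and `launch_batch` are hypothetical names, a plain dict stands in for the hash table, and a real GPU would run the work items in parallel rather than in a loop.

```python
def get_kernel(work_item_id, keys, table, results):
    """Body run by one work item: look up one request and record the outcome."""
    key = keys[work_item_id]
    results[work_item_id] = table.get(key)  # None marks a miss


def launch_batch(keys, table):
    """Emulate one kernel launch over a batch of GET requests."""
    results = [None] * len(keys)
    for work_item_id in range(len(keys)):  # parallel on a GPU, sequential here
        get_kernel(work_item_id, keys, table, results)
    return results


table = {"a": "1", "b": "2"}
print(launch_batch(["a", "x", "b"], table))  # ['1', None, '2']
```

A larger batch exposes more parallelism, but every request in the batch waits for the whole launch, which is the throughput versus latency trade-off named in the goals.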

7 Porting Memcached
Main Goals
  – Increase request throughput
  – Keep request latency reasonable
Main Challenges
  – Irregular memory access patterns
  – Irregular control flow
  – Data transfer overheads

8 Methodology
Hardware
  – AMD Radeon HD 5870 (Discrete)
  – AMD Llano A (Fusion)
  – AMD Zacate E-350 (Fusion)
Simulators
  – GPGPU-Sim v3.x
  – In-house GPU control flow simulator
Testing and Simulation
  – Traces of Wikipedia accesses

9 Porting Memcached - Memory Access
One request per work item
Data accesses for GET requests are input data dependent
Data can be anywhere in memory
  – Poor performance on GPU?

10 Porting Memcached - Memory Divergence
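One rough way to quantify memory divergence is to count how many distinct memory segments a group of work items touches with a single load. The sketch below assumes 64 work items per wavefront and 64-byte segments; both values and the function name are illustrative assumptions, not measurements from the talk.

```python
def transactions_per_wavefront(addresses, segment_bytes=64):
    """Count the distinct aligned segments touched by one wavefront's loads."""
    return len({addr // segment_bytes for addr in addresses})


# Hypothetical access patterns for a 64-lane wavefront.
coalesced = [4 * lane for lane in range(64)]     # consecutive 4-byte words
scattered = [4096 * lane for lane in range(64)]  # each lane in a different region

print(transactions_per_wavefront(coalesced))  # 4
print(transactions_per_wavefront(scattered))  # 64
```

When every work item serves a different key, the addresses look like the scattered case, since the data can be anywhere in memory.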

11 Porting Memcached - Control Flow
Recall the control flow graph
Many branch outcomes are input data dependent
[Figure: control flow graph annotated with work item IDs: 1 – 2 – 3 – 4 – 5; 1 – 2 – 5; 3 – 4; 1 – 5; 2; 3 – 4]
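The cost of input-dependent branches can be illustrated with a simple lane-utilization estimate. The sketch below assumes work items execute in lockstep, that the group steps through every basic block any of its work items needs, and that each block costs the same and runs once; the paths are made-up examples, and this is not the in-house control flow simulator mentioned in the methodology.

```python
def simd_efficiency(paths):
    """Fraction of lane slots doing useful work for a non-empty group of work items.

    `paths` holds one set of basic-block IDs per work item.
    """
    lanes = len(paths)
    executed = set().union(*paths)  # blocks the group as a whole must step through
    active = sum(sum(1 for p in paths if block in p) for block in executed)
    return active / (lanes * len(executed))


# Hypothetical paths for four work items through basic blocks 1 to 5.
divergent = [{1, 2, 3, 4, 5}, {1, 2, 5}, {1, 2, 5}, {1, 5}]
uniform = [{1, 2, 5}] * 4

print(simd_efficiency(divergent))  # 0.65
print(simd_efficiency(uniform))    # 1.0
```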

12 Porting Memcached - Control Flow
[Chart: 15%, 40%, 62%, 29%; Overall 51%]

13 Porting Memcached - Data Management
Dynamic memory manager
Transfer memory regions to device
Virtual addresses different on host and device
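Because a transferred region sits at a different virtual address on the device, pointers stored inside it cannot be used unchanged. One common workaround, sketched here as an assumption rather than the authors' stated design, is to rebase each pointer by its offset from the start of the region. The function name and the addresses are hypothetical.

```python
def host_to_device(host_ptr, host_base, device_base, region_size):
    """Rebase a host pointer into the device's copy of the same memory region."""
    offset = host_ptr - host_base
    if not 0 <= offset < region_size:
        raise ValueError("pointer is outside the transferred region")
    return device_base + offset


# Hypothetical addresses: a 4 KiB region mapped at different bases.
print(hex(host_to_device(0x10000040, 0x10000000, 0x80000000, 0x1000)))  # 0x80000040
```

Storing offsets instead of raw pointers in the region has the same effect and avoids rewriting pointers on every transfer.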

14 Porting Memcached - Data Transfer Reduction
Fusion Systems
  – Physical shared memory region between host and device
  – Zero-copy data
Discrete Systems
  – Possible transfer reduction techniques
      Reduction in unnecessary transfers
      Acyclic data transfers (Overlap comm. with comp.)
      Automatic data transfer frameworks

15 Porting Memcached

16 Results
Radeon HD
~8000 requests yield the highest ratio of throughput to latency

17 Summary
Programmer intuition doesn't always paint the whole picture
We exploited the available parallelism on GPUs by batching requests, showing a 7.5X performance increase on the Llano system
Data transfer overheads can have a large impact on overall performance
Thank you – Questions?

