Published by Hailie Butner. Modified over 9 years ago.
0
HOMME Trace Analysis
Fabrice Mizero
Mentor: Dr. John Dennis
Collaborators: Prof. Malathi Veeraraghavan (University of Virginia), Prof. Robert D. Russell (University of New Hampshire), Qian Liu (University of New Hampshire)
Aug 1, 2014
1
Roadmap
Motivation
Background
Methodology
Results
Conclusion and Solutions
Future Work
2
Big Picture
Understanding the causes of poor CESM performance on Yellowstone: a 5-step approach
1. Experimental execution and data collection
2. HOMME trace analysis
3. IBMgtSim: routing study
4. Network simulation
5. Integrated simulation
3
[Figure: 3-level network with example 2-hop, 4-hop, and 6-hop paths. Credit: Dr. John Dennis, Zhengyang Liu]
4
Suspected Causes: Network Congestion and OS Jitter
“…OS noise, shape of the allocated partition, and interference from other jobs.” (Abhinav Bhatele et al., SC13)
Network congestion: head-of-line blocking; credit-based flow control
OS jitter: kernel interrupts
Application interference: self-interference; interference with other jobs (neighborhood effect); competition against OS daemons, timer interrupts, buffer-cache synchronization, etc.
5
Congestion: Head-of-Line (HOL) Blocking
Worst-case scenario: congestion spreading due to HOL blocking.
[Figure: hosts H1 to H7 connected through switches S1 and S2; buffers run out of space and a victim flow gets stuck behind the congested traffic.]
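The HOL effect can be illustrated with a toy model (hypothetical, not taken from this study): packets wait in FIFO lanes, each lane forwards at most its head packet per tick, and one destination port has no buffer space. With a single shared lane, the head packet bound for the congested port blocks the victim packets queued behind it; giving the victim traffic its own lane lets it proceed. The port names and the one-packet-per-tick rule are assumptions made for illustration.

```python
from collections import deque

def forward(lanes, blocked_port, ticks):
    """Toy switch: each tick, every lane forwards its head packet unless the
    packet's destination is blocked_port (no buffer space downstream).
    Returns the number of packets delivered per destination."""
    delivered = {}
    for _ in range(ticks):
        for lane in lanes:
            if lane and lane[0] != blocked_port:
                dst = lane.popleft()
                delivered[dst] = delivered.get(dst, 0) + 1
    return delivered

# Hypothetical traffic: the head packet targets the congested port,
# and three victim packets behind it target a free port.
traffic = ["congested", "free", "free", "free"]

shared = forward([deque(traffic)], blocked_port="congested", ticks=10)
separate = forward(
    [deque(d for d in traffic if d == "congested"),
     deque(d for d in traffic if d != "congested")],
    blocked_port="congested",
    ticks=10,
)
print(shared)    # {}
print(separate)  # {'free': 3}
```

Separating the victim flow into its own lane is the same intuition behind redirecting victim flows with virtual lanes, although a real switch arbitrates between lanes rather than serving each one every tick.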
6
OS Jitter
Each compute node runs its own OS (RHEL).
Interference is caused by OS routines: timer interrupts, OS daemons, and hardware interrupts.
These routines compete with the application for CPU resources. Example: the line printer daemon.
7
3 Questions
1. How does congestion impact network latency?
2. How important is OS jitter to network latency?
3. Which has a bigger impact on message latency: OS jitter or congestion?
8
Experimental Set-Up
2 platforms: Jellystone (non-production machine) and Yellowstone (production machine)
Congestion: different message sizes and hop distances
OS jitter: Linux Transparent Huge Pages (THP)
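When reproducing a THP comparison, it helps to record which THP mode each node was running. On Linux, the active mode is shown in square brackets in `/sys/kernel/mm/transparent_hugepage/enabled` (for example `always [madvise] never`). The helper below is a minimal sketch that parses that line; the function names are hypothetical, and the slides do not say how THP was toggled or verified on Jellystone.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def parse_thp_mode(text: str) -> str:
    """Return the active THP mode, which the kernel marks with square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed THP mode in {text!r}")

def current_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the THP mode of the local node (Linux only)."""
    return parse_thp_mode(path.read_text())

print(parse_thp_mode("always [madvise] never"))  # madvise
```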
9
Methodology
Extrae trace collection → clock skew correction → grouping of delays by (hop count, message size) → Wilcoxon rank-sum test
10
Extrae
Tracing tool developed at BSC (Barcelona Supercomputing Center)
Records chronological events, states, and communications
Used to obtain one-way communication delays, visualized with Paraver
[Figure: MPI_Isend timeline marking the start and end times]
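One-way delays come from matching each send with its receive. The sketch below assumes a simplified, hypothetical record layout (not Extrae's actual trace format) in which the send time is taken on the sender's clock and the receive-completion time on the receiver's clock. Sends and receives with the same (source, destination, tag) are paired in order, which assumes they are matched and completed in order (MPI's non-overtaking rule covers the matching). Each host's events are sorted only by that host's own clock, because comparing timestamps across hosts is exactly what clock skew makes unreliable.

```python
from collections import defaultdict
from typing import NamedTuple

class Event(NamedTuple):
    # Hypothetical record layout, not Extrae's trace format.
    kind: str    # "send" (MPI_Isend start) or "recv" (receive completed)
    src: int     # sending rank
    dst: int     # receiving rank
    tag: int
    size: int    # message size in bytes
    time: float  # timestamp on the local clock of the rank that logged it

def one_way_delays(events):
    """Return (src, dst, size, raw_delay) for each matched send/receive pair.

    Raw delays still contain the clock offset between the two hosts.
    """
    sends, recvs = defaultdict(list), defaultdict(list)
    for ev in events:
        (sends if ev.kind == "send" else recvs)[(ev.src, ev.dst, ev.tag)].append(ev)
    delays = []
    for key, send_list in sends.items():
        send_list.sort(key=lambda e: e.time)  # sender's clock only
        recv_list = sorted(recvs.get(key, []), key=lambda e: e.time)  # receiver's clock only
        for s, r in zip(send_list, recv_list):
            delays.append((s.src, s.dst, s.size, r.time - s.time))
    return delays

events = [
    Event("send", 0, 1, 7, 488, 10.0),
    Event("recv", 0, 1, 7, 488, 12.5),
    Event("send", 0, 1, 7, 488, 20.0),
    Event("recv", 0, 1, 7, 488, 21.0),
]
print(one_way_delays(events))  # [(0, 1, 488, 2.5), (0, 1, 488, 1.0)]
```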
11
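Clock Skew
Host A sends at local time $C_A(t_1)$; host B receives at local time $C_B(t_2)$.
Ideally, the one-way delay is $C_{AB} = C_B(t_2) - C_A(t_1)$.
In reality, the two clocks disagree: offset $= C_A(t) - C_B(t) \neq 0$ and skew $= C_A'(t) - C_B'(t) \neq 0$.
Correction, applied per host pair with the same message size and the same hop count: the minimum delay is the best approximation of the offset, so the corrected delay is
$C_{AB}(t) - \min_t C_{AB}(t) + \min_{\text{pingpong}}$
where $\min_{\text{pingpong}}$ is the minimum delay from a ping-pong measurement.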
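The correction formula can be sketched as follows, assuming raw delays are grouped by host pair, message size, and hop count, and that a minimum ping-pong delay is available for each (message size, hop count). The data layout, the keying of the ping-pong minimum, and the names are hypothetical. Note that subtracting a per-group minimum removes a constant clock offset but does not remove drift (skew) over the course of a run.

```python
from collections import defaultdict

def correct_delays(samples, min_pingpong):
    """Apply C_AB(t) - min C_AB + min_pingpong within each group.

    samples: iterable of (src, dst, size, hops, raw_delay) tuples.
    min_pingpong: dict mapping (size, hops) -> minimum ping-pong delay
        (an assumed input; the slides do not specify how it is keyed).
    Returns {(src, dst, size, hops): [corrected delays in input order]}.
    """
    groups = defaultdict(list)
    for src, dst, size, hops, delay in samples:
        groups[(src, dst, size, hops)].append(delay)
    corrected = {}
    for key, delays in groups.items():
        _, _, size, hops = key
        offset_estimate = min(delays)  # per the slides: min delay approximates the offset
        base = min_pingpong[(size, hops)]
        corrected[key] = [d - offset_estimate + base for d in delays]
    return corrected

# Receiver clock behind the sender, so the raw delays are negative.
raw = [(0, 1, 488, 4, -3.0), (0, 1, 488, 4, -2.5), (0, 1, 488, 4, -1.0)]
print(correct_delays(raw, {(488, 4): 1.25}))
# {(0, 1, 488, 4): [1.25, 1.75, 3.25]}
```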
12
Statistical Methods
Wilcoxon rank-sum test: a non-parametric significance test for two independent samples. It tests whether values from one population tend to be larger than values from the other; it compares distributions, not means.
Tests:
OS jitter? Jellystone: no THP vs. with THP
Congestion? Yellowstone: 0-hop delays vs. 4-hop delays
Jellystone: THP; Yellowstone: THP
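A self-contained sketch of the Wilcoxon rank-sum test using the large-sample normal approximation (without a tie correction in the variance), which is reasonable for sample sizes in the thousands like those reported here. It returns a two-sided p-value and both one-sided p-values. The function names are hypothetical, and the slides do not state which implementation or options were used; SciPy's `scipy.stats.ranksums` and `scipy.stats.mannwhitneyu` provide this test in practice.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p_two_sided, p_less, p_greater); 'less' means values in x
    tend to be smaller than values in y.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p_less = 0.5 * math.erfc(-z / math.sqrt(2))    # P(Z <= z)
    p_greater = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    p_two = min(1.0, 2 * min(p_less, p_greater))
    return z, p_two, p_less, p_greater

z, p_two, p_less, p_greater = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(round(z, 3))  # -2.611
```

For example, with x holding no-THP delays and y holding THP delays, a small `p_less` together with a `p_greater` near 1 indicates that the no-THP delays tend to be smaller.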
13
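Perfquery
Perfquery: InfiniBand (IB) performance counter query tool.
PortXmitWait: port congestion monitoring.
Credit-based flow control: before Host A transmits to the TOR switch, it checks whether it holds credits. If yes, it sends; if no, it waits and PortXmitWait increases.
[Figure: flowchart of this credit check between Host A and the TOR switch]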
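The credit check can be mimicked with a toy single-port model: at most one packet is sent per tick, each send consumes one credit (buffer space at the downstream switch), and credits come back as the switch drains its buffer. Ticks in which data is queued but no credit is available are counted in a variable standing in for PortXmitWait. This is a hypothetical illustration only; the real counter is maintained by the InfiniBand port hardware and read with perfquery.

```python
def simulate_port(arrivals, credit_returns, ticks, initial_credits):
    """Toy model of one sending port under credit-based flow control.

    arrivals: dict tick -> packets added to the port's queue at that tick.
    credit_returns: dict tick -> credits returned by the downstream switch.
    Returns (packets_sent, xmit_wait), where xmit_wait counts ticks in which
    data was waiting but no credit was available.
    """
    queue = sent = xmit_wait = 0
    credits = initial_credits
    for t in range(ticks):
        queue += arrivals.get(t, 0)
        credits += credit_returns.get(t, 0)
        if queue and credits:
            queue -= 1
            credits -= 1
            sent += 1
        elif queue:
            xmit_wait += 1  # data ready, but the switch has no buffer space for it
    return sent, xmit_wait

print(simulate_port({0: 4}, {3: 1, 6: 1}, ticks=8, initial_credits=2))  # (4, 3)
```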
14
Results: How important is OS jitter to network latency?
Comparison: Jellystone 0-hop without THP vs. Jellystone 0-hop with THP
Intranode communication is slower with THP enabled than without it.
For all message sizes below: p-values <0.001, <0.001, 1; interpretation: no THP is faster than with THP.

Msg size   Sample size (NoTHP::THP)
488 B      54624::45727
1952 B     9503::7950
2440 B     102120::85468
2928 B     47504::39764
15
Results: Which has a bigger impact on message latency, OS jitter or congestion?
Comparison: Yellowstone 0-hop delays vs. 4-hop delays
For all message sizes considered, intranode (0-hop) delays can exceed internode (4-hop) delays.
For all message sizes below: p-values <0.001, <0.001, 1; interpretation: 4-hop is faster than 0-hop.

Msg size   Sample size (0-hop::4-hop)
488 B      54325::23621
2440 B     101581::16529
2928 B     47243::21259
4880 B     49603::4720
16
Conclusion
OS jitter can cause performance degradation or variability.
Inter-job interference can lead to application performance variability.
Solutions
Congestion: dynamic allocation of virtual lanes to redirect victim flows around congested ports.
OS jitter: Linux tickless kernel; MPI-3 for better control over shared-memory communication.
17
Future Work
Further study of the dynamic virtual lane assignment solution
Plan and collect new HOMME traces with PortXmitWait monitored and LSF logs saved
Study intra-job interference
A more efficient algorithm for correcting clock skew
18
Thank You
Fabrice Mizero
fm9ab@virginia.edu