0 HOMME Trace Analysis
Fabrice Mizero
Mentor: Dr. John Dennis
Collaborators: Prof. Malathi Veeraraghavan (University of Virginia), Prof. Robert D. Russell (University of New Hampshire), Qian Liu (University of New Hampshire)
Aug 1, 2014
1 Roadmap: Motivation, Background, Methodology, Results, Conclusion and Solutions, Future Work
2 Big Picture
Understanding the causes of poor performance of CESM on Yellowstone, a 5-step approach: (1) experimental execution and data collection; (2) HOMME trace analysis; (3) IBMgtSim: routing study; (4) network simulation; (5) integrated simulation.
3 [Figure: 3-level network with example 2-hop, 4-hop, and 6-hop paths.]
*Credit: Dr. John Dennis, Zhengyang Liu
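The hop counts in the figure can be illustrated with a small sketch. It assumes a hypothetical, regular 3-level tree with the following properties:
- nodes are numbered consecutively;
- `nodes_per_leaf` nodes share a top-of-rack (leaf) switch;
- `leaves_per_group` leaf switches share a second-level switch.

A hop is one link traversal, and 0 hops means both endpoints are on the same node (intranode). Yellowstone's real node numbering and switch counts are not modeled.

```python
def hop_count(node_a: int, node_b: int, nodes_per_leaf: int, leaves_per_group: int) -> int:
    """Links traversed between two nodes in a hypothetical regular 3-level tree.

    0: same node (intranode)
    2: same leaf (TOR) switch:   node -> leaf -> node
    4: same second-level group:  node -> leaf -> L2 -> leaf -> node
    6: different groups:         node -> leaf -> L2 -> L3 -> L2 -> leaf -> node
    """
    if node_a == node_b:
        return 0
    leaf_a, leaf_b = node_a // nodes_per_leaf, node_b // nodes_per_leaf
    if leaf_a == leaf_b:
        return 2
    if leaf_a // leaves_per_group == leaf_b // leaves_per_group:
        return 4
    return 6


if __name__ == "__main__":
    # 2 nodes per leaf, 2 leaves per group: nodes 0-3 form group 0, nodes 4-7 group 1.
    for other in (0, 1, 2, 4):
        print(f"node 0 -> node {other}: {hop_count(0, other, 2, 2)} hops")
```

With 2 nodes per leaf and 2 leaves per group, node 0 is 0, 2, 4, and 6 hops away from nodes 0, 1, 2, and 4 respectively.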
4 Suspected Causes: Network Congestion and OS Jitter
"…OS noise, shape of the allocated partition, and interference from other jobs." (Abhinav Bhatele et al., SC13)
Network congestion: head-of-line blocking; credit-based flow control.
OS jitter: kernel interrupts.
Application interference: self-interference; interference with other jobs (neighborhood effect).
Competition against OS daemons, timer interrupts, buffer-cache synchronization, etc.
5 Congestion: Head-of-Line (HOL) Blocking
Worst-case scenario: congestion spreading due to HOL blocking.
[Figure: hosts H1 to H7 connected through switches S1 and S2; two buffers run out of buffer space, and a victim flow is stuck behind the congested traffic.]
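Head-of-line blocking can be reproduced with a toy model of a single FIFO input buffer. The sketch makes simplifying assumptions:
- one packet is forwarded per tick;
- a congested output port is described only by the first tick at which it can accept data;
- the port names `X` and `Y` are hypothetical.

It illustrates the mechanism and does not model real InfiniBand switch hardware.

```python
from collections import deque
from typing import Dict, List, Tuple


def drain_fifo(packets: List[Tuple[str, str]],
               blocked_until: Dict[str, int], ticks: int) -> Dict[str, int]:
    """Forward packets from a single FIFO input buffer, one per tick.

    packets: (packet_id, output_port) in arrival order.
    blocked_until: first tick at which a congested output port can accept data
    (ports not listed are free from tick 0).
    Returns {packet_id: tick at which it left the buffer}.
    """
    queue = deque(packets)
    forwarded: Dict[str, int] = {}
    for t in range(ticks):
        if not queue:
            break
        pid, port = queue[0]
        if t >= blocked_until.get(port, 0):
            queue.popleft()
            forwarded[pid] = t
        # else: the head waits, and every packet behind it waits too (HOL blocking)
    return forwarded


if __name__ == "__main__":
    print(drain_fifo([("p1", "X"), ("victim", "Y")], {"X": 5}, 10))
```

The victim packet is bound for port `Y`, which is free the whole time. It still cannot leave before tick 6, because it sits behind a packet bound for the congested port `X`.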
6 OS Jitter
Each compute node runs its own OS (RHEL). Interference is caused by OS routines: timer interrupts, OS daemons, and hardware interrupts, which compete with the application for CPU resources. Example: the line printer daemon.
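One way OS jitter can turn into message latency is when the receiving process is descheduled just as a message arrives, for example by a daemon or an interrupt handler. The message is not noticed until the process runs again. The toy model below rests on two illustrative assumptions:
- the receiver is a polling receiver;
- the interruption windows `[start, end)` are known in advance.

```python
from typing import Iterable, Tuple


def observed_latency(send_time: float, wire_latency: float,
                     interruptions: Iterable[Tuple[float, float]]) -> float:
    """Latency seen by a polling receiver that cannot run during interruption windows.

    A message that arrives (or would be noticed) inside a window [start, end)
    is only observed at `end`. Windows may be adjacent or overlapping.
    """
    observed = send_time + wire_latency
    for start, end in sorted(interruptions):
        if start <= observed < end:
            observed = end  # receiver only resumes polling when the window ends
    return observed - send_time


if __name__ == "__main__":
    print(observed_latency(0.0, 1.0, []))            # no jitter
    print(observed_latency(0.0, 1.0, [(0.5, 3.0)]))  # arrival hidden by a daemon run
```

A single interruption that overlaps the arrival raises the observed latency from 1.0 to 3.0 time units, even though the network delivered the message just as quickly.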
7 Three Questions
(1) How does congestion impact network latency? (2) How important is OS jitter to network latency? (3) Which has a bigger impact on message latency: OS jitter or congestion?
8 Experimental Set-Up
Congestion: 2 platforms, Jellystone (non-production machine) and Yellowstone (production machine); different message sizes and hop distances.
OS jitter: Linux Transparent Huge Pages (THP).
9 Methodology
Extrae trace collection → clock skew correction → delays grouped by hop count and message size (one group per population being compared) → Wilcoxon rank-sum test.
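10 Extrae
Tracing tool developed at BSC (Barcelona Supercomputing Center). Records events, states, and communications in chronological order. One-way communication delays are visualized with Paraver.
[Figure: an MPI_Isend communication, from its start time to its end time.]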
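Raw one-way delays can be extracted from a trace by pairing each message's send and receive timestamps. The sketch rests on several assumptions:
- It assumes the Paraver `.prv` communication-record layout (record type 3). That layout has sender cpu, application, task and thread, then logical and physical send times, then the same fields for the receiver, then size and tag. Check the field order against the Paraver documentation before relying on it.
- The delay is taken as physical receive time minus physical send time.
- The names `Comm` and `one_way_delays` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Comm(NamedTuple):
    src_task: int
    dst_task: int
    size: int   # bytes
    tag: int
    delay: int  # physical receive time - physical send time (trace time units)


def one_way_delays(lines: Iterable[str]) -> List[Comm]:
    """Extract raw one-way delays from Paraver-style communication records.

    ASSUMED record layout (type 3, colon-separated):
    3:cpu:ptask:task:thread:lsend:psend:cpu:ptask:task:thread:lrecv:precv:size:tag
    Other record types, comments, and the header are skipped.
    Delays are NOT clock-corrected and may even be negative.
    """
    out: List[Comm] = []
    for line in lines:
        f = line.strip().split(":")
        if f[0] != "3" or len(f) < 15:
            continue
        psend, precv = int(f[6]), int(f[12])
        out.append(Comm(src_task=int(f[3]), dst_task=int(f[9]),
                        size=int(f[13]), tag=int(f[14]), delay=precv - psend))
    return out
```

Because each host stamps events with its own clock, these raw values still contain the clock offset between sender and receiver.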
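11 Clock Skew
Host A sends at its local time $C_A(t_1)$; host B receives at its local time $C_B(t_2)$. Ideally, the one-way delay is $C_{AB} = C_B(t_2) - C_A(t_1)$.
In reality, the clocks have a nonzero offset, $\text{Offset} = C_A(t) - C_B(t) \neq 0$, and a nonzero skew, $\text{Skew} = C_A'(t) - C_B'(t) \neq 0$.
Correction is done at the host-pair level, for messages of the same size and the same hop count. The minimum delay is the best approximation of the offset, so each delay is corrected as $C_{AB}(t) - \min C_{AB}(t) + \min_{\text{pingpong}}$.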
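The offset correction above can be written as a short function. It is a sketch under these assumptions:
- Records are hypothetical `(src_host, dst_host, size, hops, raw_delay)` tuples.
- Host pairs are treated as directional, since the A-to-B offset has the opposite sign of the B-to-A offset.
- `min_pingpong` is a per-(size, hop count) minimum delay measured separately with a ping-pong test.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (src_host, dst_host, size_bytes, hops, raw_delay)
Record = Tuple[str, str, int, int, float]


def correct_offset(records: List[Record],
                   min_pingpong: Dict[Tuple[int, int], float]) -> List[float]:
    """Apply corrected = raw - min(raw within group) + min_pingpong[(size, hops)].

    Groups are (directional host pair, message size, hop count). The minimum raw
    delay in a group is taken as the best estimate of the clock offset.
    Only a constant offset is removed; drift (skew) within a trace is not.
    """
    group_min: Dict[Tuple[str, str, int, int], float] = defaultdict(lambda: float("inf"))
    for src, dst, size, hops, raw in records:
        key = (src, dst, size, hops)
        group_min[key] = min(group_min[key], raw)
    return [raw - group_min[(src, dst, size, hops)] + min_pingpong[(size, hops)]
            for src, dst, size, hops, raw in records]
```

For raw A-to-B delays of 105, 110, and 103 with a ping-pong minimum of 2, the corrected delays are 4, 9, and 2. Negative raw delays, which a large offset can produce, are handled the same way.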
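12 Statistical Methods
Wilcoxon rank-sum test: a non-parametric significance test for two independent populations. Rather than comparing means, it uses ranks to test whether values from one population tend to be larger than values from the other.
Tests:
OS jitter? Jellystone: no THP vs. with THP.
Congestion? Yellowstone: 0-hop delays vs. 4-hop delays; Jellystone (THP) vs. Yellowstone (THP).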
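The rank-sum test is short enough to sketch directly. It uses the large-sample normal approximation, which is reasonable for samples in the thousands. Ties get average ranks, but the variance is not tie-corrected, which is a simplification. The results slides report three p-values per comparison. One plausible reading, assumed here and not confirmed by the slides, is two-sided, one-sided "less", and one-sided "greater", which is what this function returns.

```python
import math
from typing import Sequence, Tuple


def _norm_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Wilcoxon rank-sum test with the large-sample normal approximation.

    Returns (z, p_two_sided, p_less, p_greater). "less" means values in x tend
    to be smaller than values in y (e.g. x delays are faster). Tied values get
    their average rank; the variance is not tie-corrected (a simplification).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)  # rank sum of x
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_less = _norm_cdf(z)
    p_greater = 1.0 - p_less
    p_two = min(1.0, 2.0 * min(p_less, p_greater))
    return z, p_two, p_less, p_greater
```

Under this reading, a result of "<0.001, <0.001, 1" with x = NoTHP delays and y = THP delays would correspond to "NoTHP is faster than with THP". In practice, a library routine such as SciPy's `scipy.stats.mannwhitneyu` (with its `alternative` argument) leads to the same conclusions. Its exact p-values can differ slightly because of continuity and tie corrections.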
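13 Perfquery
Perfquery: InfiniBand (IB) performance-counter query tool.
PortXmitWait: port congestion monitoring under credit-based flow control.
[Figure: Host A connected to a TOR switch, with the decision "Credits?": Yes, the data is sent; No, PortXmitWait is incremented.]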
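The credit check in the figure can be mimicked with a toy port model. It makes several simplifications:
- Credits are counted per packet (real IB credits are buffer blocks).
- Time advances in abstract ticks.
- `xmit_wait` only imitates the idea behind PortXmitWait, namely ticks with data ready but nothing sent.

The class name and fields are hypothetical. They are not perfquery output.

```python
from dataclasses import dataclass


@dataclass
class CreditedPort:
    """Toy output port under credit-based flow control (not real IB hardware)."""
    credits: int        # free receive-buffer slots advertised by the downstream port
    queued: int = 0     # packets waiting to be transmitted
    sent: int = 0
    xmit_wait: int = 0  # ticks with data queued but nothing sent (PortXmitWait-like)

    def tick(self, credits_returned: int = 0) -> None:
        """Advance one time step; the downstream side may return credits first."""
        self.credits += credits_returned
        if self.queued == 0:
            return               # idle port: nothing to send, so no wait is counted
        if self.credits > 0:
            self.credits -= 1    # consume one credit per packet
            self.queued -= 1
            self.sent += 1
        else:
            self.xmit_wait += 1  # data ready but no credits: the port waits


if __name__ == "__main__":
    port = CreditedPort(credits=2, queued=5)
    for _ in range(5):
        port.tick()          # downstream never frees a buffer (congested)
    print(port)
```

Suppose the downstream side returns no credits. After five ticks, two packets have been sent and the wait counter reads 3, which is how a growing PortXmitWait signals a congested port.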
14 Results
How important is OS jitter to network latency? Jellystone::0-Hop::NoTHP vs. Jellystone::0-Hop::THP
Intranode communication delays are longer with THP enabled than without THP.
Msg size | Sample sizes    | p-values          | Interpretation
488 B    | 54624 :: 45727  | <0.001, <0.001, 1 | NoTHP is faster than with THP
1952 B   | 9503 :: 7950    |                   |
2440 B   | 102120 :: 85468 |                   |
2928 B   | 47504 :: 39764  |                   |
15 Results
Which has a bigger impact on message latency: OS jitter or congestion? Comparing Yellowstone 0-hop delays with 4-hop delays.
For all message sizes considered, intranode (0-hop) communication delays can exceed internode (4-hop) delays.
Msg size | Sample sizes    | p-values          | Interpretation
488 B    | 54325 :: 23621  | <0.001, <0.001, 1 | 4-Hop is faster than 0-Hop
2440 B   | 101581 :: 16529 |                   |
2928 B   | 47243 :: 21259  |                   |
4880 B   | 49603 :: 4720   |                   |
16 Conclusion
OS jitter can cause performance degradation or variability.
Inter-job interference can lead to application performance variability.
Solutions
Congestion: dynamic allocation of virtual lanes to redirect victim flows around congested ports.
OS jitter: Linux tickless kernel; MPI-3 for better control over shared-memory communication.
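Why separating flows onto different virtual lanes (VLs) helps a victim flow can be shown with a toy extension of a single-FIFO buffer model. The model has these properties:
- Packets sharing one input link are queued per VL.
- One packet is forwarded per tick, taken from the lowest-numbered VL whose head packet can move. This is a simplification of real VL arbitration.
- Which flow goes on which VL is given by hand.

The dynamic allocation policy itself is not modeled, and nothing here is the authors' algorithm. The function name and port names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def drain_with_lanes(packets: List[Tuple[str, str, int]],
                     blocked_until: Dict[str, int], ticks: int) -> Dict[str, int]:
    """Forward packets that share one input link but sit in per-VL queues.

    packets: (packet_id, output_port, vl) in arrival order.
    blocked_until: first tick at which each congested output port accepts data.
    One packet is forwarded per tick, from the lowest-numbered VL whose head
    packet can move. Returns {packet_id: tick at which it was forwarded}.
    """
    lanes: Dict[int, Deque[Tuple[str, str]]] = {}
    for pid, port, vl in packets:
        lanes.setdefault(vl, deque()).append((pid, port))
    forwarded: Dict[str, int] = {}
    for t in range(ticks):
        for vl in sorted(lanes):
            queue = lanes[vl]
            if queue and t >= blocked_until.get(queue[0][1], 0):
                pid, _ = queue.popleft()
                forwarded[pid] = t
                break  # the shared link carries one packet per tick
    return forwarded


if __name__ == "__main__":
    blocked = {"X": 5}  # output port X is congested until tick 5
    print(drain_with_lanes([("p1", "X", 0), ("victim", "Y", 0)], blocked, 10))
    print(drain_with_lanes([("p1", "X", 0), ("victim", "Y", 1)], blocked, 10))
```

When both packets share VL 0, the victim waits until tick 6 behind the packet for congested port `X`. When the victim is placed on VL 1, it leaves at tick 0.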
17 Future Work
Further study of the dynamic virtual lane assignment solution.
Plan and collect new HOMME traces with PortXmitWait monitored and LSF logs saved.
Study intra-job interference.
Develop a more efficient algorithm for correcting clock skew.