Projections Overview Ronak Buch & Laxmikant (Sanjay) Kale http://charm.cs.illinois.edu Parallel Programming Laboratory Department of Computer Science University of Illinois at Urbana-Champaign

Manual http://charm.cs.illinois.edu/manuals/html/projections/manual-1p.html The full reference for Projections; contains more details than these slides.

Projections Performance analysis/visualization tool for use with Charm++. Works to a limited degree with MPI. The Charm++ runtime system logs the execution of programs. Trace-based, post-mortem analysis. Configurable levels of detail. Java-based visualization tool for performance analysis.

Instrumentation Enabling Instrumentation Basics Customizing Tracing Tracing Options

How to Instrument Code Build Charm++ with the --enable-tracing flag. Select a -tracemode when linking. That’s all! The runtime system takes care of tracking events.

Basics Traces include a variety of events: entry methods (methods that can be remotely invoked), messages sent and received, and system events such as idleness, message queue times, message pack times, etc.

Basics - Continued Traces are logged in memory and incrementally written to disk. The runtime system instruments computation and communication, generating useful data without excessive overhead (usually).

Custom Tracing - User Events Users can add custom events to traces by inserting calls into their application. Register Event: int traceRegisterUserEvent(char* EventDesc, int EventNum=-1) Track a Point-Event: void traceUserEvent(int EventNum) Track a Bracketed-Event: void traceUserBracketEvent(int EventNum, double StartTime, double EndTime)
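For illustration, a minimal sketch of how these calls might be used in a Charm++ application (the event name, the registerEvents() and compute() routines, and the use of CkWallTimer() for the bracket times are assumptions, not part of these slides):

    #include "charm++.h"   // trace API and CkWallTimer()

    static int computeEvent;   // event ID returned by registration

    void registerEvents() {
      // Register a descriptive name once, e.g. at startup; with the default
      // EventNum of -1 the runtime assigns and returns an ID.
      computeEvent = traceRegisterUserEvent("compute phase");
    }

    void compute() {
      double start = CkWallTimer();
      // ... application work ...
      double end = CkWallTimer();

      // Record the whole span as a bracketed event in the trace,
      traceUserBracketEvent(computeEvent, start, end);
      // or mark a single point in time instead.
      traceUserEvent(computeEvent);
    }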

Custom Tracing - Annotations Annotation support allows users to easily customize the set of methods that are traced. Annotating an entry method with notrace avoids tracing it and saves overhead. Adding local to non-entry methods (not traced by default) adds tracing automatically.
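For example, a minimal sketch of the notrace annotation in an interface file (the chare and method names here are hypothetical):

    // excerpt from a hypothetical .ci interface file
    chare Worker {
      entry Worker();
      // notrace suppresses tracing for this entry method
      entry [notrace] void ping(int iter);
    };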

Custom Tracing - API The API allows users to turn tracing on or off: trace only at certain times, or trace only a subset of processors. Simple API: void traceBegin(), void traceEnd(). Works at the granularity of a PE.

Custom Tracing - API Often used at synchronization points to only instrument a few iterations Reduces size of logs while still capturing important data Allows analysis to be focused on only certain parts of the application

Tracing Options Two link-time options: -tracemode projections gives full tracing (time, sending/receiving processor, method, object, …); -tracemode summary aggregates the performance of each PE into time bins of equal size. Tradeoff between detail and overhead.

Tracing Options - Runtime +traceoff disables tracing until a traceBegin() API call. +traceroot <dir> specifies the output folder for tracing data. +traceprocessors RANGE only traces PEs in RANGE.

Tracing Options - Summary +sumdetail aggregates data by entry method as well as by time interval (normal summary data is aggregated only by time interval). +numbins <k> reserves enough memory to hold information for <k> time intervals (default is 10,000 bins). +binsize <duration> aggregates data such that each time interval represents <duration> seconds of execution time (default is 1 ms).

Tracing Options - Projections +logsize <k> reserves enough buffer memory to hold <k> events (default is 1,000,000 events). +gz-trace, +gz-no-trace enable/disable compressed (gzip) log files.

Memory Usage What happens when we run out of reserved memory? -tracemode summary: doubles the time interval represented by each bin, aggregates the data into the first half, and continues (with the defaults, 10,000 bins of 1 ms each cover the first 10 s of execution before the bin size doubles). -tracemode projections: asynchronously flushes the event log to disk and continues; this can perturb performance significantly in some cases.

Projections Client Scalable tool to analyze up to 300,000 log files. A rich set of tool features: time profile, timelines, usage profile, histogram, extrema tool. Detects performance problems: load imbalance, grain size, communication bottlenecks, etc. Multi-threaded, optimized for memory efficiency.

Visualizations and Tools Tools for viewing aggregated performance: Time Profile, Histogram, Communication. Tools at processor-level granularity: Overview, Timeline. Tools for derived/processed data: Outlier Analysis (identifies outliers).

Analysis at Scale Fine-grained details can sometimes look like one big solid block on the timeline. It is hard to mouse over items that represent fine-grained events. Other times, tiny slivers of activity become too small to be drawn.

Analysis Techniques Zoom in/out to find potential problem spots. Mouse over graphs for extra details. Load sufficient data, but not too much. Set colors to highlight trends. Use the history feature in dialog boxes to track the time ranges explored.

Dialog Box

Dialog Box Select processors: 0-2,4-7:2 gives 0,1,2,4,6

Dialog Box Select time range

Dialog Box Add presets to history

Aggregate Views

Time Profile

Time spent in each EP, summed across all PEs, in each time interval

Histogram

Shows statistics in the “frequency” domain: how many methods took a certain amount of time? What percentage of total execution time was composed of methods taking a certain execution duration?

Communication vs. Time

Shows communication over all PEs in the time domain. The Y-axis is in units of bytes sent; it shows the number of bytes sent from each method in each time interval.

Communication per Processor

Shows how much each PE communicated over the whole job.

Processor Level Views

Overview

Time on X, different PEs on Y

Intensity of the plot represents the PE’s utilization at that time

Timeline

Most common view. Much more detailed than the overview.

Clicking on EPs traces messages, mouseover shows EP details.

Colors represent different EPs. White ticks on the bottom represent message sends, red ticks on top represent user events.

Processed Data Views We will only cover one; other processed data views exist (e.g., NoiseMiner) but are useful only in narrow domains.

Outlier Analysis

k-Means to find “extreme” processors

Global Average

Non-Outlier Average

Outlier Average

Cluster Representatives and Outliers

Advanced Features Live Streaming: run a server from the job to send performance traces in real time. Online Extrema Analysis: perform clustering during the job; only save representatives and outliers. Multirun Analysis: side-by-side comparison of data from multiple runs.

Future Directions PICS - expose application settings to the RTS for on-the-fly tuning. End-of-run analysis - use the time remaining after job completion to process performance logs. Simulation - increased reliance on simulation for generating performance logs. Most analysis is currently done on users’ workstations; for large jobs this becomes untenable, and end-of-run analysis would improve this.

Conclusions Projections has been used to effectively solve performance woes. The tools are constantly improving. Scalable analysis is becoming increasingly important. Most analysis is currently done on users’ workstations; for large jobs this becomes untenable, and end-of-run analysis would improve this.