Integrating Knowledge, Automation, and Persistence with PerfExplorer and PerfDMF
Kevin A. Huck
Department of Computer and Information Science
Performance Research Laboratory, University of Oregon
CScADS

Outline
- Parallel performance data mining
- PerfExplorer as application
- PerfExplorer as framework
  - Data management framework
  - Data mining framework
  - Automation
  - Knowledge capture
  - Expert system
- Recent work as framework
  - OpenUH compiler feedback
Emphasis on modularity and programmability

Motivation for Performance Data Mining
- Overwhelming complexity in parallel profile analysis
  - Events
  - Scale
- Unmanaged collections of performance data
  - Directories of profiles
  - No easy way to perform parametric studies
- Terascale, petascale performance profiles difficult to:
  - Manage
  - Visualize
  - Analyze
- Needed a way to explore the data in new ways

PerfExplorer as Application
- Java GUI application
- Data browsing
  - Custom views
- Metadata browsing
  - XML tree table
- Parametric studies, charts, graphs
- 3-D visualization of performance profiles
- Data mining
  - Clustering
  - Correlation
  - Dimension reduction

PerfExplorer as Framework
- Transition from "tool" to "toolbox"
- Automation
  - Repeatable analysis
  - Chain operations together as workflows
- Persistence
  - Option to save intermediate / derived results
  - Reuse cached results
- Provenance
  - Information on how intermediate / final results were created
- Knowledge capture / reuse
  - Parallel performance profile expert system

PerfExplorer v2: Architecture
[architecture diagram]

Performance Data Management
- Need for robust processing and storage of multiple profile performance data sets
- Avoid developing independent data management solutions
  - Waste of resources
  - Incompatibility among analysis tools
- Goals
  - Foster multi-experiment performance evaluation
  - Develop a common, reusable foundation for performance data storage, access, and sharing
  - Serve as a core module in an analysis system, and/or as a central repository of performance data

PerfDMF Approach
- Performance Data Management Framework
- Extensible toolkit to promote integration and reuse across available performance tools
- Supported profile formats: TAU, CUBE (Kojak), Dynaprof, HPCToolkit (Rice), HPM Toolkit (IBM), gprof, mpiP, psrun (PerfSuite), Open|SpeedShop, ompP, GPTL, IPM, PERI-XML, Paraver
- Supported DBMSs: PostgreSQL, MySQL, Oracle, DB2, Derby/Cloudscape
- Profile query and analysis API (see the sketch below)
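To make the query API concrete, here is a minimal Jython sketch in the style of the scripts shown later in this talk, run inside PerfExplorer (which preloads the glue classes, as in the deck's own scripts). The application/experiment/trial names are hypothetical, and the getMetrics()/getExclusive() accessor names are assumptions, not confirmed by these slides.

# a sketch: load a stored trial and walk its events and metrics
Utilities.setSession("perfdmf")        # connect to a configured PerfDMF database
trial = TrialResult(Utilities.getTrial("MyApp", "scaling study", "64_procs"))  # hypothetical names
for event in trial.getEvents():
    for metric in trial.getMetrics():
        # exclusive value of this metric for this event on thread 0 (accessor assumed)
        print event, metric, trial.getExclusive(0, event, metric)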

PerfDMF Schema
- All supported data formats map to this schema
[schema diagram, including an XML_METADATA table]

Knowledge: Metadata Role in Analysis
[diagram: source code, build environment, run environment, and execution (application, machine) — "you have to capture these... to understand this" — the performance result and performance problems]

Metadata Collection
- Integration of XML metadata for each profile
- Ways to incorporate metadata
  - Measured hardware/system information
    - CPU speed, memory in GB, MPI node IDs, …
  - Application instrumentation (application-specific)
    - Application parameters, input data, domain decomposition
- PerfDMF data management tools can incorporate an XML file of additional metadata
  - Compiler flags, submission scripts, input files, …
- Metadata can be imported from / exported to PERI-DB
  - PERI SciDAC project (UTK, NERSC, UO, PSU, TAMU)

XML Metadata File Example
- Build data: build:module.F90 = qk-pgf90 -I. -fastsse -c module.inst.F90 -I/spin/home/khuck/src/tau2/include -I/opt/xt-mpt/default/mpich2-64/P2/include -o module.o
- Environment data: buildenv:XNLSPATH = /usr/X11R6/lib/X11/nls
- Job data: job:#PBS -l walltime=0:45:00,size=512
- Input file data: input:mstep = 100

Metadata for Each Experiment
[figure]

Performance Data Mining
- Manage performance complexity
  - Discover performance relationships and properties
  - Multi-experiment performance analysis
- Large-scale performance data reduction
  - Summarize characteristics of large processor runs
- Implement extensible analysis framework
  - Abstraction / automation of data mining operations
  - Interface to existing analysis and data mining tools
    - R / Weka implementation
    - Clustering, normalization, PCA, linear regression, … (see the sketch below)
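A hedged sketch of what such a chained analysis could look like in a Jython script. The operation class names TopXEvents and KMeansOperation and their signatures are assumptions, modeled on the DeriveMetricOperation pattern in the script example later in this talk; they are not confirmed by these slides.

# a sketch: reduce a large trial to its costliest events, then cluster the threads
Utilities.setSession("perfdmf")
trial = TrialResult(Utilities.getTrial("MyApp", "scaling study", "256_procs"))  # hypothetical names
reducer = TopXEvents(trial, "CPU_CYCLES", AbstractResult.EXCLUSIVE, 10)   # keep the 10 costliest events (class name assumed)
reduced = reducer.processData().get(0)
clusterer = KMeansOperation(reduced, "CPU_CYCLES", AbstractResult.EXCLUSIVE, 4)  # look for 4 behavior groups (class name assumed)
clusters = clusterer.processData()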

PerfExplorer v2 – Requirements and Features
- Component-based analysis process
  - Analysis operations implemented as modules
  - Linked together in analysis processes and workflows
- Scripting
  - Provides process/workflow development and automation
  - Metadata input, management, and access
- Inference engine
  - Reasoning about causes of performance phenomena
  - Analysis knowledge captured in expert rules
- Persistence of intermediate results
- Provenance
  - Provides a historical record of analysis results

PerfExplorer Component Interaction
[component interaction diagram]

Example Analysis Components
- Basic statistics | Extract events | Top X events
- Copy | Extract metrics | Top X percent events
- Correlation | Extract phases | ANOVA
- Correlation with metadata | Extract rank | Linear regression
- Derive metrics | k-means | Ratios
- Difference | Merge trials | Non-linear regression*
- Extract callpath events | PCA | Backward elimination*
- Extract non-callpath events | Scalability | Correlation elimination*
(* future development)

Analysis Processes and Workflow Automation
- Create analysis processes from analysis components
  - Repeatable analysis for common analysis workflows
  - Extensible solution with easy accessibility and reuse
- Run analysis workflows automatically
- Need a programming system for process development
  - Provide full access to PerfDMF and PerfExplorer
  - Chose Jython (a Python interpreter written in Java)
    - Other languages possible
    - Java objects can execute Python scripts
    - Python scripts can execute Java code
- Extend data mining analysis knowledge

PerfExplorer Script Example

# create a rulebase for processing
ruleHarness = RuleHarness.useGlobalRules("openuh/OpenUHRules.drl")

# load a trial
Utilities.setSession("openuh")
trial = TrialMeanResult(Utilities.getTrial("Fluid Dynamic", "rib 45", "1_8"))

# calculate the derived metric (stall cycles per cycle)
stalls = "BACK_END_BUBBLE_ALL"
cycles = "CPU_CYCLES"
operator = DeriveMetricOperation(trial, stalls, cycles, DeriveMetricOperation.DIVIDE)
derived = operator.processData().get(0)

# compare values to average for application
# (mainEvent was elided on the slide; getMainEvent() is the assumed accessor)
mainEvent = derived.getMainEvent()
for event in derived.getEvents():
    MeanEventFact.compareEventToMain(derived, mainEvent, derived, event)

# process the rules
ruleHarness.processRules()

Knowledge Inferencing
- Knowledge capture / reuse
  - Parallel performance profile expert system
  - Performance expertise captured as inference rules
  - Rules used to help explain performance symptoms
- Need a rule development and execution system
  - Chose JBoss Rules (Java-based rules engine)
    - Only supports forward chaining
- Facts are asserted
- Rules are processed to ascertain all true assertions
- Rules fire
  - Arbitrary code is executed
  - New facts can be asserted, which can cause new rules to fire

PerfExplorer Rule Example

rule "High Remote Memory Accesses"
when
    // there is a low local/remote ratio
    f : MeanEventFact (
        m : metric == "((L3_MISSES-DATA_EAR_CACHE_LAT128)/L3_MISSES)",
        b : betterWorse == MeanEventFact.LOWER,
        severity : severity > 0.02,
        e : eventName, a : mainValue, v : eventValue,
        factType == "Compared to Main" )
then
    System.out.println("The event " + e + " has a lower than average local/remote memory reference ratio.");
    System.out.println("\tAverage ratio: " + a + ", Event ratio: " + v);
    System.out.println("\tPercentage of total runtime: " + f.getPercentage());
end

Analysis Examples
- Correlating performance differences with metadata
- Intelligent scaling analysis
- L1, L2 hit-ratio hotspots (sketched below)
- Memory-bound region detection
- Load balance analysis
- Examining stall sources (Itanium hardware counters)
- Memory stall source breakdown (Itanium hardware counters)
- Phase analysis (i.e., per-iteration analysis)
- Context event analysis (i.e., message sizes)
- Power modeling for optimization
- "More data needed" detection
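The hit-ratio case is just a derived-metric workflow: a minimal sketch using only the DeriveMetricOperation pattern from the script example earlier in this talk. The PAPI counter names assume the trial was measured with those counters, and getMainEvent() is the accessor assumed in that example.

# a sketch: flag events whose L2 miss ratio is worse than the application average
# (trial loaded as in the earlier script example)
op = DeriveMetricOperation(trial, "PAPI_L2_TCM", "PAPI_L2_TCA", DeriveMetricOperation.DIVIDE)
ratio = op.processData().get(0)            # derived metric: L2 misses / L2 accesses
mainEvent = ratio.getMainEvent()
for event in ratio.getEvents():
    # assert a fact comparing each event's miss ratio to the main event's
    MeanEventFact.compareEventToMain(ratio, mainEvent, ratio, event)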

Example: OpenUH Compiler Feedback
- OpenUH: a branch of the open-source Open64 compiler suite for C, C++, Fortran 95, and OpenMP 2.5
- Wanted to give feedback to the compiler for:
  - Evaluating the performance of OpenMP compile-time and runtime decision making
  - Validating cost model computation in the loop nest optimizer (LNO)
- Provide runtime-based feedback to the compiler to improve the LNO
- Example: multiple sequence alignment
  - OpenMP parallelism with default pragma behavior
  - Switching to dynamic scheduling improves performance

Multiple Sequence Alignment, 16 threads
[figure]

Script to detect conditions for load imbalance

ruleHarness = RuleHarness.useGlobalRules("openuh/OpenUHRules.drl")
Utilities.setSession("openuh")
trial = TrialResult(Utilities.getTrial("static", "size.400", "16.threads"))
# the metric of interest was elided on the slide; a time-based metric is assumed here
metric = "P_WALL_CLOCK_TIME"

# per-thread statistics over the non-callpath events
extractor = ExtractNonCallpathEventOperation(trial)
extracted = extractor.processData().get(0)
statMaker = BasicStatisticsOperation(extracted, False)
stats = statMaker.processData()
stddev = stats.get(BasicStatisticsOperation.STDDEV)
means = stats.get(BasicStatisticsOperation.MEAN)
totals = stats.get(BasicStatisticsOperation.TOTAL)

# the stddev/mean ratio per event signals imbalance; assert a fact for each
ratioMaker = RatioOperation(stddev, means)
ratios = ratioMaker.processData().get(0)
for event in ratios.getEvents():
    MeanEventFact.evaluateLoadBalance(means, ratios, event, metric)

# assert callpath name/value facts, then run the rules
extractor = ExtractCallpathEventOperation(trial)
extracted = extractor.processData().get(0)
for event in extracted.getEvents():
    fact = FactWrapper("Callpath name/value", event, None)
    RuleHarness.assertObject(fact)

RuleHarness.getInstance().processRules()

Rules to interpret conditions of load imbalance

rule "Load Imbalance"
when
    // there is a load imbalance for one event which is a significant event
    f : MeanEventFact (
        m : metric, severity : severity > 0.05,
        e : eventName, a : mainValue > 0.05,
        v : eventValue > 0.25,
        factType == "Load Imbalance" )
then
    assert(new FactWrapper("Imbalanced Event", e, null));
end

rule "New Schedule Suggested"
when
    f1 : FactWrapper ( factName == "Imbalanced Event", e1 : factType )
    f2 : FactWrapper ( factName == "Imbalanced Event", e2 : factType != e1 )
    f3 : FactWrapper ( factName == "Callpath name/value", e3 : factType )
    eval ( e3.equals( e1 + " => " + e2 ) )
then
    System.out.println(e1 + " calls " + e2 + ", and they are both showing signs of load imbalance.");
    System.out.println("If these events are in an OpenMP parallel region, consider methods to balance the workload, including dynamic instead of static work assignment.\n");
end

OpenMP loop, before and after

/* before: default (static) schedule */
#pragma omp for
for (md = first; md <= last; md++) {
    for (nd = md+1; nd <= last; nd++) {
        /* … do stuff … */
    }
}

/* after: a dynamic schedule balances the triangular iteration space */
int chunk = 1;
#pragma omp for schedule(dynamic,chunk) nowait
for (md = first; md <= last; md++) {
    for (nd = md+1; nd <= last; nd++) {
        /* … do stuff … */
    }
}


Conclusion, discussion points
- PerfDMF can (or should) support your profile format (or trace analysis output)
- The PerfExplorer framework is configurable, extensible, automated, reusable, and available
- Need more Jython scripts for repeatable analysis – how can we encode the processes you repeat often?
- Need more rules to build the expert system – how can we encode the knowledge that you have?
- What does PerfExplorer not support that you want?

Support Acknowledgements
- Department of Energy (DOE)
  - Office of Science
    - MICS, Argonne National Lab
  - ASC/NNSA
    - University of Utah ASC/NNSA Level 1
    - ASC/NNSA, Lawrence Livermore National Lab
- Department of Defense (DoD)
  - HPC Modernization Office (HPCMO)
- NSF Software and Tools for High-End Computing
- Research Centre Juelich
- Los Alamos National Laboratory
- TU Dresden
- ParaTools, Inc.

Acknowledgements
- Prof. Allen D. Malony, advisor
- Dr. Sameer Shende, Director, Performance Research Lab
- Alan Morris, senior software engineer
- Wyatt Spear, software engineer
- Scott Biersdorff, software engineer
- Dr. Matt Sottile, research faculty
- Aroon Nataraj, Ph.D. student
- Shangkar Mayanglambam, Ph.D. student
- Brad Davidson, systems administrator