TM System-level Performance Management Ken McDonell Engineering Manager, CSBU

Slides:



Advertisements
Similar presentations
2 Introduction A central issue in supporting interoperability is achieving type compatibility. Type compatibility allows (a) entities developed by various.
Advertisements

GridPP7 – June 30 – July 2, 2003 – Fabric monitoring– n° 1 Fabric monitoring for LCG-1 in the CERN Computer Center Jan van Eldik CERN-IT/FIO/SM 7 th GridPP.
Heroix Longitude - multiplatform, automated application performance monitoring and management software.
Distributed Data Processing
CACORE TOOLS FEATURES. caCORE SDK Features caCORE Workbench Plugin EA/ArgoUML Plug-in development Integrated support of semantic integration in the plugin.
Introduction to Systems Management Server 2003 Tyler S. Farmer Sr. Technology Specialist II Education Solutions Group Microsoft Corporation.
1 Planetary Network Testbed Larry Peterson Princeton University.
© 2014 Cognizant 4 th March 2015 MBaaS: Mobile Backend as a Service Pablo Gutiérrez / Senior Mobility developer.
1 Software …an issue that could have taken two days to detect can now be fixed remotely, ensuring minimal downtime and maximizing operational efficiencies.
What’s New in BMC ProactiveNet 9.5?
Tunis, Tunisia, 28 April 2014 Business Values of Virtualization Mounir Ferjani, Senior Product Manager, Huawei Technologies 2.
Network Management Overview IACT 918 July 2004 Gene Awyzio SITACS University of Wollongong.
Copyright 2009 FUJITSU TECHNOLOGY SOLUTIONS PRIMERGY Servers and Windows Server® 2008 R2 Benefit from an efficient, high performance and flexible platform.
1 SWE Introduction to Software Engineering Lecture 3 Introduction to Software Engineering.
Introduction to Network Administration. Objectives.
1 GENI: Global Environment for Network Innovations Jennifer Rexford On behalf of Allison Mankin (NSF)
A Routing Control Platform for Managing IP Networks Jennifer Rexford Princeton University
© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P Software Overview.
Adaptive Server Farms for the Data Center Contact: Ron Sheen Fujitsu Siemens Computers, Inc Sever Blade Summit, Getting the.
Appliance Firewalls A Technology Review By: Brent Huston T h e B l a c k H a t B r i e f i n g s July 7-8, 1999 Las Vegas.
TM Herding Penguins with Performance Co-Pilot Ken McDonell Performance Tools Group SGI, Melbourne.
Internet GIS. A vast network connecting computers throughout the world Computers on the Internet are physically connected Computers on the Internet use.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 18 Slide 1 Software Reuse 2.
Bologna Aprile Atempo Product Suite Atempo Time Navigator™ Secure, highly scalable protection of heterogeneous data in complex, mission-critical.
The SAM-Grid Fabric Services Gabriele Garzoglio (for the SAM-Grid team) Computing Division Fermilab.
BMC Software confidential. BMC Performance Manager Will Brown.
©Ian Sommerville 2004Software Engineering, 7th edition. Chapter 18 Slide 1 Software Reuse.
Management Suite for Dell Servers The Power of Control! Kevin Winert, Dell OpenManage Product Marketing Bryan Rhodes, Dell Alliance Product Manager, Altiris.
Performance and Exception Monitoring Project Tim Smith CERN/IT.
Framework for Automated Builds Natalia Ratnikova CHEP’03.
Technology Overview. Agenda What’s New and Better in Windows Server 2003? Why Upgrade to Windows Server 2003 ?  From Windows NT 4.0  From Windows 2000.
Enterprise PI - How do I manage all of this? Robert Raesemann J Jacksonville, FL.
Cluster Reliability Project ISIS Vanderbilt University.
Building an Agile Datacenter with Deployment Standards Jonathan Richey | Director of Development | Altiris Sam Rosenbalm | Director of Microsoft Alliance.
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
1 © 2004, Cisco Systems, Inc. All rights reserved. CCNA 4 v3.1 Module 6 Introduction to Network Administration.
DR Software: Essential Foundational Elements and Platform Components UCLA Smart Grid Energy Research Center (SMERC) Industry Partners Program (IPP) Meeting.
Installation and Development Tools National Center for Supercomputing Applications University of Illinois at Urbana-Champaign The SEASR project and its.
Object-Oriented Software Engineering Practical Software Development using UML and Java Chapter 1: Software and Software Engineering.
Tool Integration with Data and Computation Grid GWE - “Grid Wizard Enterprise”
System Level Performance Management with PCP Nathan Scott
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
This poster has been developed with support from the CATIIS project Program doctoral interregional și transnațional de excelență în domeniile “Calculatoare.
Chapter 8 Workflows of the Process Taken from Walker Royce’s textbook – Software Project Management plus a number of Personal Comments.
1 Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp Parallel Research Group, Computer and Network System Research Laboratory Department of Computer.
Stanford GSB High Tech Club Tech 101 – Session 1 Introduction to Software, Distributed Architectures, and ASPs Presented by Shawn Carolan Former Manager.
© 2014 IBM Corporation Does your Cloud have a Silver Lining ? The adoption of Cloud in Grid Operations of Electric Distribution Utilities Kieran McLoughlin.
Visual Studio Team System overview Pierre Greborio Software Architect – PEWay Microsoft MVP – Solutions Architect.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Tool Integration with Data and Computation Grid “Grid Wizard 2”
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
IPS Infrastructure Technological Overview of Work Done.
Upgrading the Platform - How to Get There!
CERN IT Department CH-1211 Genève 23 Switzerland t CERN Agile Infrastructure Monitoring Pedro Andrade CERN – IT/GT HEPiX Spring 2012.
GO-ESSP The Earth System Grid The Challenges of Building Web Client Geo-Spatial Applications Eric Nienhouse NCAR.
Next Generation of Apache Hadoop MapReduce Owen
Online Software November 10, 2009 Infrastructure Overview Luciano Orsini, Roland Moser Invited Talk at SuperB ETD-Online Status Review.
Chapter 16 Client/Server Computing Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William.
SAFARI TEST AUTOMATION: NAVIGATING THROUGH THE JUNGLE BY KARAN KUMAR AND JAMES CHUONG.
HUAWEI TECHNOLOGIES CO., LTD. Huawei Storage ISM Management Pre-sales Product Training Materials Easy and Efficient WEU IT Solution Team.
GridOS: Operating System Services for Grid Architectures
The Role of Reflection in Next Generation Middleware
IS301 – Software Engineering V:
Campus Monitoring Service
Management of Virtual Execution Environments 3 June 2008
Dev Test on Windows Azure Solution in a Box
Designed for powerful live monitoring of larger installations
CCNA 4 v3.1 Module 6 Introduction to Network Administration
Presentation transcript:

TM System-level Performance Management Ken McDonell Engineering Manager, CSBU

Overview Status quo for system-level performance monitoring and management in Linux. Factors conspiring to change this. Features of a desirable solution. Porting considerations. Support for distributed processing environments.

Influence of Linux Philosophies Anti-bloat mantra … available instrumentation is very sparse. 1-2p design center … many hard problems are off the radar screen. Developer-centric view leads to terse tools … and making them more like sar is not innovative. /proc/stat model is both good and bad. Bias towards running tools on system under investigation.

Challenges to the Status Quo Linux deployment on larger platforms. Linux deployment in production environments. Cluster and federated server configurations. More complex application architectures. Focus shift from kernel performance: –applications performance is key –quality of service matters –systems-level performance mgmt

Large Systems Influences There may be a lot of data, e.g. for a large (128p) server metrics and 30,000+ values from the platform & O/S. Data comes from the hardware, the operating system, the service layers, the libraries and the applications. Clustered and distributed architectures compound the difficulties. All of the data is needed at some time, but only a small part is needed for each specific problem.

Production Environment Influences Something is broken all of the time. Cyclic patterns of workload and demand. Transients are common. Service-level agreements are written in terms of performance as seen by an end- user. Environmental evolution changes the assumptions, rules and bottlenecks, e.g. upgrades, workload, filesystem age, re- organization.

Neanderthal Approaches Making the Problem Harder Tool and data islands: ownership, functional, temporal and geographic domains. Primitive filtering and information presentation. Protocols and UIs that are not scalable. Emphasis on tools rather than toolkits. Very little automated monitoring that is useful for the hard problems.

Features of a Desirable Export Infrastructure Low overhead and small perturbation. Unified API for all performance data. Extensible (plug-in) architecture to accommodate new sources of performance data. Sufficient metadata to allow evolution and change. Support for remote access to performance data. Platform neutral protocols & data formats.

Plug-in Collector and Client- Server Architecture

Features of a Desirable Performance Tool Environment Complement, not displace, simple tools. The same tools for both real-time and retrospective analysis. Visualization and drill-down user navigation. Remote and multi-host monitoring. Toolkits not tools. Smarter reasoning about performance data.

2-D Performance Visualization

3-D Performance Visualization

3-D Visualization of Platform Performance

3-D Visualization of Application Performance

Reasoning About Performance Data Thresholds are not enough Need quantification predicates: existential, universal, percentile, temporal, instantial. Multi-source predicates for client-server and distributed applications. Retrospection is essential. Customized alarms and notification.

Performance Co-Pilot Porting History Initial development for IRIX 1994 Linux experiments HP/UX port 1998 NT port Linux port

Performance Co-Pilot Porting Some things that did not help For efficiency and historical reasons we’d chosen to avoid xdr and SNMP. HP/UX secrets. Lack of instrumentation in the Linux kernel. Tool frameworks used for IRIX development are not universally available, e.g. Motif, ViewKit, OpenInventor, XRT.

Performance Co-Pilot Porting Some things that did help Programmer discipline. Obsessive attitude to automated QA. Orthogonal functionality, especially for APIs. Monitoring tools that are predominantly shell scripts in front of a small number of generic applications (the “toolkit” approach).

A Linux Performance Monitoring Architecture Linux kernel linuxpmda procfs and /proc/stat pmcd

A Beowulf Perf Monitoring Architecture - Node View Linux kernel linuxpmda procfs and /proc/stat pmcd beowulfpmda cluster infrastructure

A Beowulf Perf Monitoring Architecture - Application View Linux kernel linuxpmda procfs and /proc/stat pmcd beowulfpmda cluster infrastructure my application mypmda

A Beowulf Perf Monitoring Architecture - Cluster View pmcd Node C monitor pmcd Node A pmcd Node B

Some Concluding Comments System-level performance management for large systems is a hard problem. Simple solutions do not exist. Need an extensible collection architecture Monitoring tools should provide centralized control for distributed processing. Retrospection is not optional. Linux offers real opportunities for “better” solutions in this area.