Instrumentation Strategies for Response Time Management of Distributed Systems Greg Rogers MACP Consulting.

Personal Introduction
Architect for three response time measurement & analysis system projects (project manager for one)
MACP Consulting, 2007 (Measurement, Analysis, Capacity & Performance)
~20 years in the commercial field: operating system internals; measurement & performance analysis; capacity planning/analytical & statistical modeling; databases
Digital Equipment Corp (Digital, DEC); Compaq Computer Corp; Hewlett-Packard; early career @ Grumman Aerospace
B.S. Statistics, Minor Computer Science, California Polytechnic State University
9/24/2008 MACP Consulting

Time
We have all had an intuitive feel for what time is since we were children
 – "Are we there yet?"
Next, our parents and teachers made sure our intuitive notions graduated to the quantitative
 – Early measurement: learning to read the clock and tell time; calculating elapsed time
Time is a sequence of events, one after the other (Einstein?)

Response Time (Rt)
Definition; Terminology; Viewpoint
Time measured from the initiation of some action, event or request until completion of the action or event, or initial receipt of the response
 – Terminals
 – GUIs
 – Service time (St) is a subset of Response time (Rt)
 – Queue analysis
Specifics depend on viewpoint: Where and What
 – Not philosophical
 – Viewpoint: the system or component(s) of interest
 – Where, what part of the system is to be examined, to be measured? In fact, what is the system?
 – Rt vs. Residence time in the literature
 – System vs. component of system
 – Lazowska (1984); Gunther (2000, 2005); Menasce (1993)
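
The statement that St is a subset of Rt can be made concrete with a small sketch. It assumes the simplest single-queue model (M/M/1), which the slide itself does not specify; the function name and the numbers are illustrative only.

```python
def mm1_response_time(service_time: float, utilization: float) -> float:
    """Rt for a single M/M/1 queue: Rt = St / (1 - U).

    Assumption: Poisson arrivals, exponential service, one server.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


service_time = 0.100  # seconds of service per request (illustrative value)
for utilization in (0.0, 0.5, 0.9):
    rt = mm1_response_time(service_time, utilization)
    waiting = rt - service_time  # the part of Rt that is not service
    print(f"U={utilization:.1f}  Rt={rt:.3f}s  St={service_time:.3f}s  wait={waiting:.3f}s")
```

With no contention Rt equals St; as utilization rises, the waiting component grows and dominates Rt while St stays fixed.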

Why Measure Response Time?
THE Quality Measurement
Primary business perception of IT
SLAs
Management bragging rights?

Original Flavor, a.k.a. The Good Old Days: Measurement on Monolithic Systems (do they even exist anymore?)
Users connected through ye olde character cell terminals
 – a.k.a. green screens
A user's transaction normally executed within the context of a single system
 – 2008: Single system = operating system instance = "image"

Monolithic
[Figure labels: User serial terminals; Host; Instrumented terminal driver & interactive process]

Distributed Architecture Response Time
Client/Server 2-tier, to 3-, 4-tier, multi-tier distributed systems
End-to-end Rt
 – Normally viewed from the (business) user perspective
 – Rt of a user's web form entry; a click corresponding to some business transaction; etc.
 – Total time from click or carriage return (initial request) to first character, packet, or data item of the response
 – In other words, the sum of time for all visits across architectural tiers by the user's transaction
 – Can be measured at the client, or just before or at the first tier of the infrastructure
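
The "sum of time for all visits" definition can be sketched as follows. The tier names and visit times are made-up illustrative values, not measurements from the presentation.

```python
from collections import defaultdict

# Hypothetical visits made by one user transaction: (tier, seconds spent on that visit).
visits = [
    ("web", 0.020),
    ("app", 0.035),
    ("db", 0.110),
    ("app", 0.015),  # a tier can be visited more than once
    ("web", 0.010),
]


def end_to_end_rt(visits):
    """End-to-end Rt as the sum of time over all tier visits."""
    return sum(seconds for _tier, seconds in visits)


def time_per_tier(visits):
    """Total time per tier, summed over every visit to that tier."""
    totals = defaultdict(float)
    for tier, seconds in visits:
        totals[tier] += seconds
    return dict(totals)


print(f"end-to-end Rt: {end_to_end_rt(visits):.3f}s")
for tier, seconds in time_per_tier(visits).items():
    print(f"  {tier}: {seconds:.3f}s")
```

The per-tier totals are exactly the breakdown that is hard to obtain in practice, which is the subject of the slides that follow.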

Two-tier Client/Server: Hint of the Explosion (and Troubleshooting Difficulty) To Come…
[Figure labels: Client; Request; Response; Server]

Four-Tier Web Architecture (Three-Tier Measured Rt): The Explosion Is Here!
[Figure labels: Client; Web; App; DB; End-to-End Rt]

Multi-tier Web
[Figure labels: Client; Web; App; DB; External System(s); End-to-End Rt]

Issues With Multi-Tier Distributed Systems
End-to-end Rt is the sum of time for all visits across architectural tiers by the transaction (as defined earlier)
If a transaction is slow, where is the slowdown occurring?
Distributed systems are not instrumented in an integrated fashion; i.e., there are no easily implemented standards (development impact) to provide this data to operations/performance/capacity planning
Mythology pervades troubleshooting of distributed environments due to the lack of essential cross-tier Rt data
Typical Approach: Guilt by Correlation
 – Look at each server (or all servers within a functional tier if one is lucky)
 – Visually correlate in time high activity on server(s) in one tier with high activity on server(s) in the next tier

Clocks and Time Measurement for Multiple-tier Distributed Architectures
Time synchronization across systems is critical
Standard: Network Time Protocol (NTP)
 – One-second to sub-second accuracy across systems on a LAN
 – Storage subsystems often do not support NTP
Specialized, high-accuracy, non-distributed server-attached clocks
 – Cellular telephone tower clock signals
 – Global Positioning System (GPS) clock signals
 – Accuracy to ~tens of microseconds
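
To show why synchronization matters for cross-tier timing, here is a minimal sketch of the standard four-timestamp offset and delay estimate that NTP is built on. It assumes symmetric network paths; the function name and the sample timestamps are illustrative.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one request/reply exchange.

    t1: client send time     (client clock)
    t2: server receive time  (server clock)
    t3: server send time     (server clock)
    t4: client receive time  (client clock)
    Assumption: the outbound and return network paths take equal time.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0  # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)           # round trip, excluding server hold time
    return offset, delay


# Illustrative exchange: the client clock runs 0.5 s behind the server,
# each network leg takes 0.1 s, and the server holds the request for 0.1 s.
offset, delay = ntp_offset_and_delay(t1=0.0, t2=0.6, t3=0.7, t4=0.3)
print(f"offset={offset:.3f}s  delay={delay:.3f}s")
```

An uncorrected offset of this size would swamp a per-tier Rt measured in tens of milliseconds, which is why timestamps taken on different systems cannot be subtracted without synchronized clocks.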

Categories of Rt Instrumentation
Active
 – Host-based, on systems executing business applications
Passive
 – No software on host systems
Hybrid
 – Uses both techniques
These definitions tend to be from a server-centric viewpoint
 – A network-centric viewpoint of active vs. passive might be whether or not traffic is injected onto the network
 – Krishnamurthy (2001)
 – Server-centric, since our goal is to break a large end-to-end response time down into each individual tier's response time
 – The fact that network and server response time components are measured is incidental to this goal

Active Rt Monitoring Techniques
Host-based: most common & familiar
 – Synchronous sampling of event-driven Rt accumulators
 – Asynchronous (event-driven; i.e., when a specific event occurs)
Web server logging
 – Rich data source, often mined and written to multi-dimensional data warehouses for customer behavior pattern analysis
 – Can be a great source of distributed Rt data but requires custom development to process into usable data
Middleware logging
 – Transaction processing, transaction reformat/redirect systems
 – Also a very rich source; also needs custom development to process
Application-level logging
 – Custom routines or standard Application Programming Interfaces (APIs)
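
As an illustration of the custom development that web server logs require, the sketch below turns access-log lines into per-URL mean Rt. It assumes a log format whose last field is the time taken to serve the request in microseconds (as Apache's %D format directive records); the sample lines and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical access-log lines; the final field is time taken in microseconds.
LOG_LINES = [
    '10.0.0.1 - - [24/Sep/2008:10:00:01 -0700] "GET /login HTTP/1.1" 200 512 120000',
    '10.0.0.2 - - [24/Sep/2008:10:00:02 -0700] "POST /order HTTP/1.1" 200 2048 850000',
    '10.0.0.1 - - [24/Sep/2008:10:00:05 -0700] "GET /login HTTP/1.1" 200 512 80000',
]


def parse_line(line):
    """Return (url_path, response_time_seconds) for one log line."""
    request = line.split('"')[1]            # e.g. 'GET /login HTTP/1.1'
    path = request.split()[1]
    micros = int(line.rsplit(None, 1)[1])   # last whitespace-separated field
    return path, micros / 1_000_000


def mean_rt_by_path(lines):
    """Mean Rt in seconds for each URL path found in the log lines."""
    samples = defaultdict(list)
    for line in lines:
        path, seconds = parse_line(line)
        samples[path].append(seconds)
    return {path: sum(values) / len(values) for path, values in samples.items()}


for path, rt in mean_rt_by_path(LOG_LINES).items():
    print(f"{path}: {rt:.3f}s")
```

Note that this Rt is the web server's own view of the request; it says nothing yet about how that time divides among the tiers behind it.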

Active Rt Monitoring Techniques, cont'd
Application Response Measurement (ARM) API
 – Standardization effort initiated by HP & Tivoli, adopted by The Open Group in 1999
 – CMG has a dated Q&A that is still useful as introductory info (ignore links): http://regions.cmg.org/regions/cmgarmw/armfaq.html
 – Callable routines in C & Java for developers to instrument their code for collecting Rt data
 – Current version: ARM 4.0 v2, http://www.opengroup.org/management/arm/
 – Brief history: http://findarticles.com/p/articles/mi_m0EIN/is_1999_Jan_26/ai_53640469
ARM is a moderately successful standard
 – SAS implements ARM in their products and exposes it through macros: http://support.sas.com/rnd/scalability/tools/arm/armapi.html
 – Siebel CRM implements ARM in its SARM logging levels, some of which can impact the server and hence are oriented toward debugging, not routine data collection
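
The general pattern behind application-level instrumentation of this kind is to bracket a unit of work with a start call and a stop call. The sketch below shows only that pattern; measured_transaction and place_order are hypothetical names and are not the ARM C or Java bindings.

```python
import time
from contextlib import contextmanager

completed = []  # (transaction_name, seconds); a real agent would ship these elsewhere


@contextmanager
def measured_transaction(name):
    """Bracket a unit of work with start/stop timestamps (hypothetical helper, not the ARM API)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the work raises, so failed transactions are timed too.
        completed.append((name, time.perf_counter() - start))


def place_order():
    time.sleep(0.01)  # stand-in for real business logic


with measured_transaction("place_order"):
    place_order()

name, seconds = completed[0]
print(f"{name}: {seconds * 1000:.1f} ms")
```

The developer cost noted in these slides is visible even here: every transaction of interest has to be wrapped by hand, and the wrapper itself has to be tested.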

Active Rt Monitoring Techniques, cont'd
Middleware-dependent APIs
 – Java Management Extensions (JMX): http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/
Java Virtual Machine (JVM) Bytecode Instrumentation
 – Commercial products used to profile or analyze Java app performance
 – The best way to peer into that little JVM black box, but I digress…
 – Dynamically loaded at run-time
Useful source of distributed Rt data
 – Java method calls (methods) to remote systems
 – In this case, the method Rt is a distributed Rt!
Methods making remote calls can be discovered by sorting method names by Rt
 – Not a bad way to go in situations where little is known about deeper levels of the application
Make a map of remote call methods
 – Might be a way to filter further and characterize different transaction classes by a given remote method if the data regularly show high Rt variability
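
Sorting methods by Rt to surface likely remote calls might look like the sketch below. The profiler output, method names, and the use of the Rt range as a variability indicator are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical profiler output: (fully qualified method name, Rt in milliseconds).
samples = [
    ("com.example.Cart.addItem", 2.1),
    ("com.example.Inventory.lookupRemote", 180.4),
    ("com.example.Cart.addItem", 1.9),
    ("com.example.Payment.authorize", 420.0),
    ("com.example.Inventory.lookupRemote", 95.6),
]


def rank_methods_by_rt(samples):
    """Return (method, mean Rt, Rt range) rows sorted by mean Rt, slowest first."""
    by_method = {}
    for method, rt_ms in samples:
        by_method.setdefault(method, []).append(rt_ms)
    ranked = [
        (method, mean(values), max(values) - min(values))
        for method, values in by_method.items()
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


for method, avg_ms, spread_ms in rank_methods_by_rt(samples):
    print(f"{avg_ms:8.1f} ms  (range {spread_ms:6.1f} ms)  {method}")
```

Methods at the top of the list with a wide range are candidates for the map of remote call methods; a local method rarely shows Rt in the hundreds of milliseconds.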

Active Rt Monitoring Techniques, cont'd
Synthetic Sampling
 – Injecting synthetic requests onto the network from PC-based robots and measuring the response time of representative user transactions
Similar idea to how Keynote Systems measures the top Internet web sites
Has been a popular technique, implemented by a number of large vendors in products that sample end-to-end Rt on corporate LANs/WANs
 – More and more customers are demanding that all their business transactions be measured, not just a small sample of synthetic requests
 – Applications are typically intolerant of destructive (write) synthetic transactions (e.g., accounting). Workarounds usually compromise the quality of measurement (e.g., hitting the same dummy accounts)
 – Used in isolation, synthetic sampling can and will miss causes of long response times
 – If already in place, well-implemented synthetic sampling is a known load for passive monitoring until the latter can be used to more fully characterize Rt of the real business transaction load
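
A synthetic robot reduces to a loop that runs a scripted transaction and times it. The sketch below uses a sleep as a stand-in for the transaction; probe and fake_login are hypothetical names, and a real robot would issue actual requests against the application.

```python
import time


def probe(transaction, samples=5, interval_s=0.0):
    """Run a synthetic transaction repeatedly and return the measured Rt of each run."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        transaction()
        results.append(time.perf_counter() - start)
        time.sleep(interval_s)  # pacing between synthetic requests
    return results


def fake_login():
    """Stand-in for a scripted, read-only user transaction (hypothetical)."""
    time.sleep(0.005)


rts = probe(fake_login, samples=3)
print(f"min={min(rts) * 1000:.1f} ms  max={max(rts) * 1000:.1f} ms")
```

Whatever happens to real transactions between two probe runs is invisible to this loop, which is the sampling limitation the slide describes.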

Active Rt Monitoring Techniques, cont'd
Insertion of tags, markers, IDs into protocol headers
 – Relatively recent on the commercial landscape
 – Custom implementations exist in end-business development organizations
 – Strong, forward-looking architects and management team seeing business benefit
Commercial:
 – Agent on host inserts tag into outbound protocol/message header
 – Agent at next tier reads tag
 – Logs either locally or centrally for post-processing
 – Tracks transactions across tiers, calculates time spent on each tier ("the bouncing ball")
 – May measure all traffic but only some of the time, not always on
Custom:
 – Application instrumented to insert tags in its own requests and responses
 – Local logging of raw data, post-processing on business system or central system, insertion into central DB with custom-built visualization and reporting software
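
The post-processing step, following a tag across tiers, can be sketched as a join of per-tier log records on the tag. The record layout, tag values, and timestamps below are hypothetical, and the timestamps are assumed to come from synchronized clocks.

```python
# Hypothetical per-tier log records: (tag, tier, arrive_ts, depart_ts), in seconds.
records = [
    ("txn-001", "web", 10.000, 10.400),
    ("txn-002", "web", 11.000, 11.090),
    ("txn-001", "db", 10.100, 10.300),
    ("txn-001", "app", 10.050, 10.350),
    ("txn-002", "app", 11.020, 11.070),
]


def follow_transactions(records):
    """Group records by tag, ordered by arrival, with elapsed time at each tier.

    The elapsed time at a tier includes time spent waiting on tiers behind it.
    """
    paths = {}
    for tag, tier, arrive, depart in sorted(records, key=lambda r: (r[0], r[2])):
        paths.setdefault(tag, []).append((tier, depart - arrive))
    return paths


for tag, path in follow_transactions(records).items():
    hops = "  ->  ".join(f"{tier} {elapsed:.3f}s" for tier, elapsed in path)
    print(f"{tag}: {hops}")
```

Records from different hosts arrive in no particular order, so the sort on tag and arrival time is what reconstructs the path of each transaction.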

Difficulties With Some Active Techniques
Logging and log processing require development resources
 – Developers may view instrumentation as another potential source of bugs
 – Can be viewed as a delay factor in time-to-market
 – Custom implementation requires strong architects and management to make the case and see it through in each development release
 – Perceived to impact another limited resource: testing cycles
Logging levels (degree of detail, types of data logged) can be implemented to limit impact, but sometimes the logging level needed for useful data imposes significant resource utilization overhead on over-taxed servers or, worse, increases application service time (execution time), affecting throughput scalability

Passive Rt Monitoring Techniques
Recent technology-driven innovation & economics make passive instrumentation possible
 – Processors; NICs; PCI-express I/O; Serial Attached SCSI (SAS) disks; open source software
Deeper innovation makes passive instrumentation a reality
 – Multi-threaded, efficient software design in particular
Passive monitoring is commonly referred to as network sniffing, but this is misleading: the technology is far more capable and sophisticated than "network sniffer" implies
 – This is real-time processing of the complete traffic stream
Widely deployed in network security monitoring
 – Though most solutions do not process the entire packet

Passive Rt Monitoring Techniques, cont'd
Passive Rt measurement techniques read all network packets at strategic points in the network
Either part or all of each packet is processed
 – Answers the question, "The response time of what?" [component]
 – Is it a technical item, or a business transaction whose response time a manager would care about?
The more of each packet processed, the more business value can be delivered
 – The business context of the transaction is typically at the deepest layer (see the packet-view slides)
 – Measured Rt of business-critical transactions (not only technical IT items); time series counts (throughput); per-transaction or per-transaction-class resource profiles (network)
The more of each packet processed, the heavier the load on the probe
 – Load is f(λ), where λ is the packet arrival rate
 – Some points in a network of multi-tier distributed systems can be real fire hoses!
 – Major challenge for passive monitoring vendors who try to add value beyond IP, TCP or HTTP headers: do they report dropped packets?
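
At its simplest, a passive probe derives Rt by pairing each request seen at a measurement point with the first packet of its response. The sketch below works on already-decoded packet records; the record layout and connection IDs are hypothetical, and a real probe would decode these from the wire.

```python
# Hypothetical decoded packets seen at one tap: (timestamp_s, connection_id, direction).
packets = [
    (0.000, "c1", "request"),
    (0.010, "c2", "request"),
    (0.120, "c1", "response"),
    (0.125, "c1", "response"),  # later packets of the same response are ignored
    (0.410, "c2", "response"),
]


def response_times(packets):
    """Rt per connection: request timestamp to the first packet of the response."""
    pending = {}   # connection_id -> timestamp of the outstanding request
    measured = []
    for ts, conn, direction in packets:
        if direction == "request":
            pending[conn] = ts
        elif conn in pending:
            measured.append((conn, ts - pending.pop(conn)))
    return measured


for conn, rt in response_times(packets):
    print(f"{conn}: {rt * 1000:.0f} ms")
```

This version reads nothing beyond the connection and direction, so it can only answer "the response time of a connection". Attaching a business meaning requires reading further into each packet, at the cost of more probe load.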

Passive Rt Monitoring Techniques, cont'd
Beware of vendor-speak
 – Know your terms and exactly what the "client" and "server" are at any point in a logical infrastructure when capabilities are being discussed
 – Use diagrams and take your time to first understand what is being measured in your infrastructure by the vendor's solution
 – Reports, graphs, etc. come afterward
 – Reporting adds tremendous value, but understanding how the fundamental measurements relate to your infrastructure is crucial for determining whether the solution can solve your business & IT challenges
Assumptions are often unspoken. Ask more than enough questions and get the answers necessary for everyone to be clearly on the same page
 – Is all of the Rt data acquired passively? Are there points in the infrastructure where the response time measurement solution is active, not passive?
 – If active, at what logging level; i.e., at what level of impact to the [business-critical] server? Trust but verify…
 – Does the solution measure all traffic, all the time, or only part of the time? Anything less than all traffic is sampling, and can miss causes of long response times
 – At what part of the packet does the passive solution stop reading? Does it stop at the TCP header and call the rest of it "the application"?
Some vendors are skilled at leaving people with the impression or belief that their solution can do things that it in fact cannot do

Passive Rt Monitoring Across Tiers (Logical Measurement Points In Between Tiers)
[Figure labels: Client Users; Web; App; Database; End-to-End Rt; Web tier Rt; Web-App tier Rt; App-DB tier Rt]
Complete end-to-end Rt measurement for remote clients may require a single passive measurement at each client location or active client measurement software
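
One way to turn measurements taken between tiers into a per-tier breakdown is to subtract each downstream measurement from the one in front of it. The sketch below assumes that each measurement point observes the total time spent in every tier behind it, which is an interpretation of the figure rather than something the slide states; the values are illustrative.

```python
# Hypothetical Rt of one transaction observed at each logical measurement point (seconds),
# listed from the client side inward. Each point sees the time of every tier behind it.
measured = [
    ("web", 0.400),  # in front of the Web tier: Web + App + DB
    ("app", 0.300),  # between Web and App: App + DB
    ("db", 0.200),   # between App and DB: DB only
]


def tier_breakdown(measured):
    """Time attributable to each tier: its measured Rt minus the Rt measured behind it."""
    breakdown = {}
    for i, (tier, rt) in enumerate(measured):
        downstream = measured[i + 1][1] if i + 1 < len(measured) else 0.0
        breakdown[tier] = rt - downstream
    return breakdown


for tier, seconds in tier_breakdown(measured).items():
    print(f"{tier}: {seconds:.3f}s")
```

The three results sum back to the Rt seen at the first measurement point, so the breakdown accounts for the whole measured response time.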

Network-Centric View of Packet
[Figure: Ethernet Frame | IP Packet Header | TCP Message Header | Application (Telnet; FTP; SMTP; DNS; NNTP; HTTP…) | Ethernet Frame CRC]

Business/Performance/CP/Application-Centric View of Packet (Business context is often deep inside message body of last protocol)
[Figure, example 1: Ethernet Frame | IP Packet Header | TCP Message Header | HTTP Header | XML Message Header | XML Message Body | Ethernet Frame CRC]
[Figure, example 2: Ethernet Frame | IP Packet Header | TCP Message Header | Proprietary Middleware Message Header | Proprietary Middleware Message Body | Ethernet Frame CRC]
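
To illustrate how deep the business context sits, the sketch below takes a TCP payload that has already been reassembled, skips the HTTP header, and reads the transaction type from the XML message body. The payload, the element names, and the convention that the first child of Body names the transaction are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical reassembled TCP payload: HTTP header, blank line, then an XML message.
payload = (
    b"POST /services HTTP/1.1\r\n"
    b"Host: app.example\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"
    b"<Envelope><Header><MsgId>42</MsgId></Header>"
    b"<Body><TransferFunds><Amount>10</Amount></TransferFunds></Body></Envelope>"
)


def business_transaction(payload):
    """Return (http_request_line, transaction_name) from an HTTP+XML payload."""
    http_header, _sep, body = payload.partition(b"\r\n\r\n")
    request_line = http_header.split(b"\r\n")[0].decode("ascii")
    root = ET.fromstring(body)
    body_element = root.find("Body")
    # Assumption: the first child of <Body> names the business transaction.
    return request_line, body_element[0].tag


request_line, transaction = business_transaction(payload)
print(f"HTTP view:     {request_line}")
print(f"Business view: {transaction}")
```

A probe that stops at the HTTP header sees only a POST to a generic URL; the name of the business transaction appears only after the message body is parsed.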

Passive Benefits
Development of passive measurement solutions easily proceeds without impact to application development & test cycles
 – One-time scheduled downtime to connect taps or configure span ports
 – Everything afterward is, well,… Passive!
 – Taps electrically prevent probes from inadvertently writing into the production network path
Quality measurements of any and all business transactions
 – Much additional business data can be filtered, persisted and reported
Aid to testing & development: visibility
Aid to production: visibility (knowledge), myth-busting quantitative performance data, improved time to problem resolution
 – And delicious capacity planning data…
Detailed traces and very precise timing
