Steve Lewis J.D. Edwards & Company

Presentation on theme: "Steve Lewis J.D. Edwards & Company"— Presentation transcript:

1 Steve Lewis J.D. Edwards & Company
Implementing a Model for Service Level Management: A Practical Approach to Integrating Performance Tools. Topics that will be addressed: why you might want to implement infrastructure management; what tools must be in place to do infrastructure management; how to manage several diverse types of systems, networks, and applications; key design decisions you must make for infrastructure management; real-life experiences and examples of infrastructure management.

2 Topics
Why manage/monitor your infrastructure? What tools must be in place? Managing diverse systems, networks, and applications. Key design decisions. Implementation experiences, examples, and lessons.

3 Why do we need tools?
Every IT organization wants to be known for its proactive monitoring and automated Service Level Management. If you know what system resources you have used in the past, you can better plan for the future. Reactive mode vs. proactive mode: operating off of a pager alert system vs. identifying potential problems before they happen. Quick notification gives a head start to the technical team that repairs the service. Knowledge base = better history on failures; a training tool for new team members. What is the cost to manage? Hardware costs; software costs; maintenance costs; other resources; facilities (building, cooling, electricity, access control); people. Cost avoidance: no addition to the bottom line. Do these costs offset the cost of lost productivity and the “waste” of not managing (waste being not utilizing systems)?

4 What is the Cost to Manage?
Hardware, Software, & Maintenance fees. Facilities – building, cooling, electricity, access control, disaster recovery sites. People – design, operations, support. But what about . . . Cost avoidance – no addition to bottom line. Do these costs offset the cost of not managing? (Under- or Over-utilization, lost productivity, “waste”) “As we observe energy costs steadily rising, technology costs falling between percent per year, and storage demand increasing at over 60 percent annually, the key strategic question now becomes, ‘When will IT energy costs exceed the costs of IT hardware?’” (Fred Moore, CMG 2001 keynote address)

5 What can we gain?
If you know what resources you have used in the past, you can better plan for the future. Reactive mode vs. proactive mode: operating from a pager vs. identifying potential problems before they happen. Quick notification gives a head start to the technical team that repairs the service. Knowledge base = better history on failures; a training tool for new team members.

6 How to Move in the Right Direction
Break down the task into sequential steps. Build Service Level Management step-by-step from the bottom up.

7 The Layers of Service Level Mgmt
Automated functionality built in layers according to their dependencies. Everyone in the organization should understand the model that is being constructed: They should be kept apprised of progress toward the goal. They can cooperate during the process of building the system. “Service Level Management is not a unique, isolated function. It is the culmination of all of the functions involved in providing the service.” (Rick Sturm)

8 #1 – Technical Infrastructure
In order for a specific service to be available, all of the technical components must exist: network devices & communication links; server hardware & operating systems; application software & processes. Each device must gather statistics on itself (using SNMP, WMI, syslog, flat files, etc.). This is where most $$$ and people are allocated! This is where most of our efforts and $$ are invested: equipment, circuits, operations staff & technicians, engineers & provisioning specialists, redundant links, disaster recovery sites. This tends to get most of the focus, and one tends to be in reactive (“firefighting”) mode all the time.
[Diagram layer: Network, System, and Application Infrastructure]

9 #2 – Fault Management Tools
A defined SERVICE may not be available if a network, system, or application component experiences a failure or poor performance. “Root Cause Correlation” identifies the exact point of failure in the event chain. Provides the tools that help operations staff be aware that the infrastructure is having a problem. Types of Fault Management tools: (1) Commercial off-the-shelf software products: network polling engines that know the network/device topology and can test to see if devices/links are alive. (2) Home-grown software products: collecting data using native tools or custom programs tailored to the specific environment. Centralized vs. decentralized monitoring/management. Tools ???
[Diagram layers: Fault Management Tools; Network, System, and Application Infrastructure]
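The idea behind topology-based root cause correlation can be illustrated with a minimal Python sketch. It is not how any particular commercial product implements correlation; it assumes a hypothetical `parent_of` map from each device to the upstream device it is reached through, and all device names are made up. A device is reported as a root cause only if it is unreachable while its upstream device is still reachable.

```python
def find_root_causes(parent_of, unreachable):
    """Return the unreachable devices whose upstream device is still reachable.

    parent_of   -- hypothetical topology map: device name -> upstream device (or None)
    unreachable -- set of device names that failed the last poll
    """
    roots = set()
    for device in unreachable:
        parent = parent_of.get(device)
        # A device with no parent, or with a reachable parent, is where the failure starts.
        if parent is None or parent not in unreachable:
            roots.add(device)
    return roots


# Illustrative topology (assumed, not from the presentation).
topology = {
    "core-router": None,
    "switch-a": "core-router",
    "switch-b": "core-router",
    "server-1": "switch-a",
    "server-2": "switch-a",
}

down = {"switch-a", "server-1", "server-2"}
print(find_root_causes(topology, down))
```

With this input, only `switch-a` is reported; the two servers behind it are treated as symptoms rather than separate faults, which is the kind of event suppression that keeps operations staff from chasing secondary alarms.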

10 #3 – Information Management Tools
This should include tightly integrated tools: Problem Management, Change Management, and Asset Management. The Fault Management system should report infrastructure events to a Problem Management system. Change Management tool. Asset Tracking tool.
[Diagram layers: Information Management Tools; Fault Management Tools; Network, System, and Application Infrastructure]

11 Problem Management Tools
If an infrastructure event is detected by the Fault Management tools, it should be reported to the Problem Management System: documenting (trouble ticket & knowledge base); tracking (status update & workflow); escalating (service response); notifying (pager, , phone, PA system); generating reports (mean time between failure).
[Diagram: Problem Mgmt, Change Mgmt, Asset Mgmt]
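As an illustration of the reporting function, a mean-time-between-failure figure could be derived from closed trouble tickets roughly as follows. This is a sketch under stated assumptions: the function name, the ticket representation as `(start, end)` pairs, and the convention MTBF = operating time / number of failures are choices made for the example, not details from the presentation.

```python
from datetime import datetime, timedelta


def mean_time_between_failures(period_start, period_end, outages):
    """Return MTBF as a timedelta, or None if there were no failures.

    outages -- list of (start, end) datetime pairs, assumed non-overlapping
               and fully inside the reporting period.
    """
    if not outages:
        return None
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    return uptime / len(outages)


# Hypothetical 30-day reporting period with two outages (2 hours and 4 hours).
tickets = [
    (datetime(2002, 1, 5, 10, 0), datetime(2002, 1, 5, 12, 0)),
    (datetime(2002, 1, 20, 1, 0), datetime(2002, 1, 20, 5, 0)),
]
mtbf = mean_time_between_failures(datetime(2002, 1, 1), datetime(2002, 1, 31), tickets)
print(mtbf)
```

Here 720 hours minus 6 hours of downtime leaves 714 operating hours, so the two failures give an MTBF of 357 hours.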

12 Change Management Tools
Change Management System: Schedule & approve changes to the infrastructure. Track routine maintenance tasks. The Problem Management tool can check with the Change Management tool to distinguish between “Planned Outages” & unexpected faults. Notification & reporting are handled differently for planned outages.
[Diagram: Problem Mgmt, Change Mgmt, Asset Mgmt]
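The check between the Problem Management and Change Management tools might look something like the following sketch. The record layout (a list of dictionaries with `component`, `start`, and `end` keys) and the function name are hypothetical; a real integration would query whatever interface the change tool exposes.

```python
from datetime import datetime


def classify_event(component, event_time, change_windows):
    """Return "planned" if the event falls inside an approved change window
    for the same component, otherwise "unplanned"."""
    for window in change_windows:
        if (window["component"] == component
                and window["start"] <= event_time <= window["end"]):
            return "planned"
    return "unplanned"


# Hypothetical approved change: router maintenance on a Sunday morning.
approved_changes = [
    {
        "component": "core-router",
        "start": datetime(2002, 3, 3, 2, 0),
        "end": datetime(2002, 3, 3, 4, 0),
    },
]

print(classify_event("core-router", datetime(2002, 3, 3, 2, 30), approved_changes))
print(classify_event("core-router", datetime(2002, 3, 4, 14, 0), approved_changes))
```

The first event lands inside the window and is classified as planned; the second does not. The classification can then drive the different notification and reporting paths the slide describes.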

13 Asset Management Tools
Vital information on each technical component (Asset Management System): vendor & maintenance plan; serial number & location; lease expiration & asset owner; responsible support team by shift, so the appropriate group is notified of an event. Some call this Configuration Management, which can be confused with Software Configuration Control. We call it Asset Management, but extend the term to include specific information on connectivity and day-to-day responsibility for the device.
[Diagram: Problem Mgmt, Change Mgmt, Asset Mgmt]
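One way to picture an asset record that carries the responsible support team by shift is the small sketch below. The field names, shift boundaries, and team names are all assumptions made for illustration; the presentation does not specify a record format.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    vendor: str
    serial_number: str
    location: str
    owner: str
    # Hypothetical layout: (start_hour, end_hour) -> support team, hours in 0..24.
    support_by_shift: dict = field(default_factory=dict)


def team_on_duty(asset, hour):
    """Return the team responsible for the asset at the given hour (0-23),
    or None if no shift covers that hour."""
    for (start, end), team in asset.support_by_shift.items():
        if start <= hour < end:
            return team
    return None


router = Asset(
    name="core-router",
    vendor="ExampleVendor",
    serial_number="SN-0001",
    location="Data center, rack 12",
    owner="Network Services",
    support_by_shift={(0, 8): "night ops", (8, 16): "network team", (16, 24): "evening ops"},
)

print(team_on_duty(router, 9))
```

When the fault layer reports an event on `core-router` at 09:00, a lookup of this kind tells the notification step which group to page.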

14 #4 – Performance Management Tools
Performance/Capacity Planning statistics. Resource utilization thresholds for proactive notification when thresholds are exceeded. Capacity planning information helps determine where resources are being under-utilized or over-utilized. Performance thresholds can alert when resources are ABOUT to be exhausted, rather than waiting until something fails and the customer is impacted.
[Diagram layers: Performance Management Tools; Information Management Tools; Fault Management Tools; Network, System, and Application Infrastructure]
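A threshold that fires before a resource is exhausted can be sketched as a two-level check. The warning and critical percentages below are illustrative assumptions, not values from the presentation, and the function name is hypothetical.

```python
def check_utilization(value_percent, warning=80.0, critical=95.0):
    """Classify a utilization sample against two thresholds.

    The warning level is the proactive one: it fires while there is still
    headroom, before the resource is actually exhausted.
    """
    if value_percent >= critical:
        return "critical"
    if value_percent >= warning:
        return "warning"
    return "ok"


# Hypothetical disk-utilization samples collected by a polling engine.
for sample in (42.0, 83.5, 97.0):
    print(sample, check_utilization(sample))
```

A "warning" result would generate an exception event for notification while the service is still healthy; only "critical" corresponds to the reactive case the slide is trying to avoid.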

15 #5 – Service Level Policies
Technical components grouped into services. “Customer view” transaction monitoring. Group specific technology components into “services” and monitor as such. Correlate business requirements and user expectations. Measure the actual business impact of outages. Component failure vs. degraded performance (acceptable or not). Generating and monitoring customer transactions that represent actual business tasks. Report business cost statistics from measured response times entered into the service policy definition. Identify which infrastructure components compose a “service” and be able to determine if a customer is impacted by the failure of one or more of these components. Set thresholds for acceptable performance: availability and response time. Policy takes into account the planned outages vs. unplanned failures and calculates service levels accordingly.
[Diagram layers: Service Level Policies; Performance Management Tools; Information Management Tools; Fault Management Tools; Network, System, and Application Infrastructure]
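How a policy might account for planned outages when calculating a service level can be shown with a short sketch. It assumes one common convention, in which planned outage time is removed from the measurement period before availability is computed; the presentation does not state which formula was used, and the function name and figures are hypothetical.

```python
def service_availability(period_hours, planned_outage_hours, unplanned_outage_hours):
    """Return availability as a percentage of scheduled service time.

    Assumed convention: scheduled time = period - planned outages, so only
    unplanned failures count against the service level.
    """
    scheduled = period_hours - planned_outage_hours
    if scheduled <= 0:
        raise ValueError("no scheduled service time in this period")
    return 100.0 * (scheduled - unplanned_outage_hours) / scheduled


# Hypothetical month: 720 hours, 20 hours of approved maintenance, 7 hours of faults.
availability = service_availability(720, 20, 7)
target = 99.5  # illustrative availability threshold from a service policy
print(round(availability, 2), availability >= target)
```

In this example the service is scheduled for 700 hours and is down unexpectedly for 7 of them, giving 99.0 percent, which misses the illustrative 99.5 percent target even though the 20 maintenance hours are not held against it.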

16 #5 – Service Level Policies (continued)
Two ways to measure a service: (1) Monitor each component in the “service chain”, BUT how do you synchronize the data from different monitoring tools? (2) Generate synthetic transactions from an “end user” viewpoint, BUT how do you isolate troublesome components?
[Diagram layer: Service Level Policies]
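The second approach, synthetic transactions, amounts to timing a scripted business task and comparing the result with a response-time threshold. The sketch below shows the shape of such a probe; the `transaction` callable stands in for whatever scripted task a real tool would replay, and the names and threshold are assumptions.

```python
import time


def run_synthetic_transaction(transaction, max_seconds):
    """Run a callable that represents one business task and time it.

    Returns a dict with whether it succeeded, how long it took, and whether
    it finished within the response-time threshold.
    """
    start = time.perf_counter()
    try:
        transaction()
        succeeded = True
    except Exception:
        # Any error in the scripted task counts as a failed transaction.
        succeeded = False
    elapsed = time.perf_counter() - start
    return {
        "succeeded": succeeded,
        "elapsed_seconds": elapsed,
        "within_threshold": succeeded and elapsed <= max_seconds,
    }


def place_order():
    """Placeholder for a scripted end-user task (hypothetical)."""
    time.sleep(0.01)


result = run_synthetic_transaction(place_order, max_seconds=2.0)
print(result["succeeded"], result["within_threshold"])
```

A probe like this answers "is the customer impacted?" directly, but, as the slide notes, it does not by itself say which component in the chain is responsible.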

17 #6 – Service Level Management
Automated reporting of SLA compliance. A crucial function is synchronizing or correlating all of the data from various tools in order to present a unified picture of service compliance. Beyond Service Level Compliance is the goal of Predictive Service Assurance, which involves sophisticated trend analysis.
[Diagram layers: Service Level Management; Service Level Policies; Performance Management Tools; Information Management Tools; Fault Management Tools; Network, System, and Application Infrastructure]
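As a very simple stand-in for the trend analysis mentioned above, a straight-line fit over historical utilization can estimate when a threshold will be crossed. Real predictive tools are considerably more sophisticated; the function name, sample data, and the linear-growth assumption are all illustrative.

```python
def days_until_threshold(samples, threshold):
    """Estimate days from the last sample until utilization reaches threshold.

    samples -- list of (day_number, utilization_percent) pairs
    Uses an ordinary least-squares line; returns None if utilization is
    flat or falling, or if the samples do not span more than one day.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    last_day = max(x for x, _ in samples)
    # If the fitted line is already past the threshold, report zero days.
    return max(0.0, crossing_day - last_day)


# Hypothetical daily disk utilization growing two points per day.
history = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
print(days_until_threshold(history, 90.0))
```

For this history the fitted line reaches 90 percent on day 20, which is 17 days after the last sample, so capacity can be added before any threshold alarm fires.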

18 #6 – Service Level Management (continued)
Service Level Management is not a unique, isolated function. It is the culmination of ALL the functions involved in providing the service. Rick Sturm

19 Difficulty of Service Level Management
Collecting the appropriate metrics. Automating the correlation of those metrics.
[Diagram: Customer View; Technology View]

20 Design Decision #1
Reality: The technical infrastructure is relatively dynamic, constantly changing, with little centralized control. Decision: Choose “Self-Configuring” tools that detect and adjust to change automatically.

21 Design Decision #2
Reality: Cannot afford the intensive administrative overhead required to maintain most tools. Decision: Choose “Zero-Admin” tools that automate or minimize administrative tasks: configuration; routine maintenance; monitoring the tools themselves; generating customized reports.

22 Design Decision #3
Reality: Extensive software distribution, version control, and cost issues with agent-based tools. Decision: Choose “Agent-Less” tools for common metrics (collect with SNMP, WMI, syslog).

23 Design Decision #4
Reality: Need a consolidated “single-pane-of-glass” view of performance and service level statistics. Decision: Choose “Web-Based” tools that offer security & customization per user.

24 Design Decision #5
Decision: Centralize to provide a single control point for security, event monitoring, administration, and report generation.

25 Constructing The System (part 1)
Fault Management Layer: HP OpenView NNM. Adjusts to network configuration changes. Provides up/down status on connected devices. Does “root cause” correlation for events. Ability to define metrics for SNMP collection and database storage. Serves as SNMP trap destination for processing application-level events. Other vendors: IBM (Tivoli), CA (UniCenter), CompuWare (EcoTools), Aprisma (Spectrum), Concord (Network Health), Cisco (CiscoWorks), Lucent (Vital Suite), Candle (Command Center).

26 Constructing The System (part 2)
Fault Management Layer: Magnum Technologies COORDINATOR. Provides “root cause” correlation for events. Updates its correlation engine when the OpenView topology changes. Contains an External Command Processor for parsing event messages, automatically opening trouble tickets, and sending notifications. Other vendors: Opticom (iView), Brix Networks (Brix System), Taave Software (Event Watch), Micromuse (NetCool), InfoVista (www.infovista.com/products), Quallaby (www.quallaby.com/proviso.html).

27 Constructing The System (part 3)
Performance Management Layer: Magnum Technologies CAPTREND. Contains internal SNMP & WMI polling engines to collect basic performance metrics. Stores data for ad hoc reporting; generates several canned graphical reports. Ability to create performance thresholds that generate exception events for notification. Other vendors: CompuWare (EcoScope), Empirix (Holistix product suite), Heroix (Robomon), HP ManageX (SNMP/WMI collection & reporting).

28 Constructing The System (part 4)
Performance Management Layer: BMC Software Patrol. Monitors application metrics at a detailed level. Ability to generate SNMP traps for application events, which are sent to OpenView and COORDINATOR for processing. There are situations where a performance monitoring agent is appropriate: metrics beyond the basics; collection mechanisms other than SNMP or WMI. Other vendors: CompuWare (Application Vantage), Quest Software (FogLight), Proxima Technologies (slaManage).

29 Constructing The System (part 5)
Performance Management Layer: Empirix eMonitor & OneSight. Generates web-based customer-oriented transactions (including https authentication). Ability to generate SNMP traps for response time threshold violations, which are sent to OpenView and COORDINATOR for processing. There are situations where a performance monitoring agent is appropriate: metrics beyond the basics; collection mechanisms other than SNMP or WMI. Other vendors: CompuWare (IntervalPro with QArun), Mercury Interactive (Topaz), Altaworks (Panorama), Covasoft (CovaOne).

30 Still-to-be-Accomplished
Integration of tools at the Information Management layer. Automated reporting from existing agent-based tools at the Performance Management layer. Tools to correlate technology components and define policies at the Service Level Policy layer.

31 Lessons Learned
It always costs more MONEY and takes more TIME than expected. It is always more difficult than expected to INTEGRATE diverse tools. Key success factors: management commitment; business process improvement; customer care strategy; organizational flexibility.

