Tool Mentor: SA - Resolve Incident and Recover Service
TM107 - How to Use IBM Tivoli System Automation to Resolve Incident and Recover a Service
Tool: IBM Tivoli System Automation
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks that are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.


Details

The IBM® Tivoli® System Automation family of products provides the corrective actions needed to restore normal operations for applications, middleware and hardware resources after an incident, such as a temporary loss of power, that would otherwise cause unplanned downtime.

The IBM Tivoli System Automation family of products includes:

  • IBM Tivoli System Automation for z/OS provides high availability and disaster recovery capabilities for z/OS systems, including Sysplex clusters, through policy-based automation
  • IBM Tivoli AF/OPERATOR provides high availability for z/OS systems through REXX-based automation
  • IBM Tivoli AF/REMOTE provides secure outboard automation and console consolidation for disparate operating systems
  • IBM Tivoli System Automation for Multiplatforms Base Component provides availability and disaster recovery automation (stop, start, move/failover, restart in place) for Linux®, Linux for zSeries and AIX systems through policy-based automation
  • IBM Tivoli System Automation for Multiplatforms End-to-End Component provides coordinated, cross-cluster/resource automation and high availability for multi-tiered applications
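
One thing coordinated automation for a multi-tiered application must work out is an order in which to start and stop related resources. The following minimal sketch, in Python, computes a start order where every resource comes up after the resources it depends on, and the reverse order for stopping. The resource names and the dependency table are hypothetical, and the code relies on Python's standard `graphlib` module (Python 3.9 or later). It does not show how System Automation for Multiplatforms actually represents or evaluates its policies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical three-tier application. Each resource maps to the set of
# resources it depends on (its prerequisites). These names are illustrative
# only; they are not System Automation resource or policy definitions.
DEPENDS_ON: Dict[str, Set[str]] = {
    "web_server": {"app_server"},
    "app_server": {"database"},
    "database": {"shared_disk", "service_ip"},
    "shared_disk": set(),
    "service_ip": set(),
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource starts after its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Stop in reverse start order, so no resource loses a prerequisite while running."""
    return list(reversed(start_order(depends_on)))


if __name__ == "__main__":
    print("start:", start_order(DEPENDS_ON))
    print("stop: ", stop_order(DEPENDS_ON))
```

A topological sort is used here because it is the standard way to order work under "must come after" constraints; if the table contained a cycle, `graphlib` would raise `CycleError` rather than return an unsafe order.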

Fully automating your Incident Management process requires several key activities: detecting and recording incidents by monitoring system resources; classifying incidents and providing initial support; investigating and diagnosing; resolving the incident to recover service; and finally closing the incident.
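
The activities listed above form an ordered sequence, which can be modeled as a simple progression of stages. This is an illustrative sketch only: the stage names are shorthand for the activities in the text, not an ITUP or Tivoli data model, and the linear model assumes that each activity completes before the next one begins.

```python
from enum import Enum
from typing import List, Optional, Tuple


class IncidentStage(Enum):
    """Incident Management activities, in the order the text lists them (illustrative names)."""
    DETECT_AND_RECORD = 1
    CLASSIFY_AND_SUPPORT = 2
    INVESTIGATE_AND_DIAGNOSE = 3
    RESOLVE_AND_RECOVER = 4
    CLOSE = 5


def next_stage(stage: IncidentStage) -> Optional[IncidentStage]:
    """Return the stage that follows `stage`, or None once the incident is closed."""
    if stage is IncidentStage.CLOSE:
        return None
    return IncidentStage(stage.value + 1)


def run_lifecycle(incident_id: str) -> List[Tuple[str, str]]:
    """Walk one hypothetical incident through every stage and record the path taken."""
    history: List[Tuple[str, str]] = []
    stage: Optional[IncidentStage] = IncidentStage.DETECT_AND_RECORD
    while stage is not None:
        history.append((incident_id, stage.name))
        stage = next_stage(stage)
    return history


if __name__ == "__main__":
    for entry in run_lifecycle("INC-0001"):
        print(entry)
```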

The System Automation family of products discovers system, application and resource incidents in a single system or cluster. Using detailed knowledge of application components, their relationships and the availability goals, it then determines corrective actions within the right context. For example, if a temporary loss of power causes a primary server to shut down, System Automation determines whether the application can be restarted in place; if it cannot, System Automation restarts the application on a backup server to prevent unplanned downtime.
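
The "restart in place, otherwise fail over" decision in this example can be sketched as follows. This is a hypothetical model that assumes a policy with a maximum number of local restarts and an ordered list of backup servers. The class and field names (`Server`, `Application`, `max_local_restarts` and so on) are invented for illustration and are not System Automation policy attributes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    name: str
    powered_on: bool = True  # False models a server shut down by the power loss


@dataclass
class Application:
    name: str
    primary: Server
    backups: List[Server] = field(default_factory=list)  # tried in list order
    max_local_restarts: int = 3  # hypothetical policy limit for restart in place
    local_restarts: int = 0
    running_on: Optional[Server] = None


def recover(app: Application) -> Optional[Server]:
    """Return the server the application was restarted on, or None if none is available."""
    # Prefer restart in place: the primary is usable and the policy limit is not reached.
    if app.primary.powered_on and app.local_restarts < app.max_local_restarts:
        app.local_restarts += 1
        app.running_on = app.primary
        return app.primary
    # Otherwise fail over to the first backup server that is available.
    for backup in app.backups:
        if backup.powered_on:
            app.running_on = backup
            return backup
    # No server can host the application; a real system would escalate to an operator.
    app.running_on = None
    return None


if __name__ == "__main__":
    primary = Server("prod1")
    backup = Server("prod2")
    app = Application("payroll", primary, [backup])
    primary.powered_on = False  # simulate the power loss on the primary server
    print(recover(app).name)  # prod2
```

Capping local restarts is a common design choice: it stops an application that fails repeatedly in place from looping forever instead of moving to healthy hardware.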

Even more advanced, proactive, performance-based event resolution can be achieved by using deep-dive monitoring and diagnostic products, including IBM Tivoli OMEGAMON® XE, IBM Tivoli Monitoring and IBM Tivoli Composite Application Manager, to trigger automated responses from the System Automation family of products.

For example, consider IBM DB2 running in a z/OS Parallel Sysplex environment. An operator can easily customize System Automation for z/OS with policies that define performance thresholds, such as time limits on running DB2 threads, together with a list of jobs that may be cancelled when a threshold is exceeded. System Automation for z/OS queries OMEGAMON XE for DB2 for performance metrics, compares operating conditions against its policies and takes action when a threshold is breached. If a long-running DB2 thread generates an exception, System Automation for z/OS identifies the address space and job name responsible, checks its policy definitions to determine whether the job may be cancelled, and cancels the job if it is permitted to do so.
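
The threshold check in the DB2 example can be sketched as a loop over monitored samples. This is a hypothetical model: the `ThreadMetric` and `ThreadPolicy` types, their fields and the `cancel` callback are assumptions that stand in for OMEGAMON XE for DB2 metrics, System Automation for z/OS policy definitions and the actual cancel command, none of which are reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Tuple


@dataclass
class ThreadMetric:
    """One hypothetical DB2 thread sample, as a monitor might report it."""
    address_space: str
    job_name: str
    elapsed_seconds: float


@dataclass
class ThreadPolicy:
    """Hypothetical policy: a time limit plus the jobs that may be cancelled."""
    max_elapsed_seconds: float
    cancellable_jobs: FrozenSet[str]


def evaluate(
    metrics: Iterable[ThreadMetric],
    policy: ThreadPolicy,
    cancel: Callable[[str, str], None],
) -> Tuple[List[str], List[str]]:
    """Compare samples with the policy and cancel offending jobs the policy allows.

    Returns (cancelled jobs, jobs over the threshold that the policy protects).
    """
    cancelled: List[str] = []
    left_alone: List[str] = []
    for m in metrics:
        if m.elapsed_seconds <= policy.max_elapsed_seconds:
            continue  # within the threshold: no exception raised
        if m.job_name in policy.cancellable_jobs:
            cancel(m.address_space, m.job_name)  # placeholder for the real cancel action
            cancelled.append(m.job_name)
        else:
            left_alone.append(m.job_name)  # exception stands, but policy forbids cancelling
    return cancelled, left_alone


if __name__ == "__main__":
    policy = ThreadPolicy(max_elapsed_seconds=600, cancellable_jobs=frozenset({"BATCH01"}))
    samples = [
        ThreadMetric("ASID01", "BATCH01", 900),   # over the limit and cancellable
        ThreadMetric("ASID02", "PAYROLL", 1200),  # over the limit but protected
        ThreadMetric("ASID03", "BATCH01", 30),    # within the limit
    ]
    result = evaluate(samples, policy, lambda asid, job: print("cancel", job, "in", asid))
    print(result)
```

Passing the cancel action in as a callback keeps the policy decision separate from the side effect, so the decision logic can be exercised without touching a live system.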

The NMC console for IBM Tivoli System Automation for z/OS

The NetView Management Console (NMC) for IBM Tivoli System Automation for z/OS provides an intuitive display of monitored resources and their health status, including their relationships and dependencies, and lets operators manage and control those resources.
