Tool Mentor: SA - Resolve Event
TM106 - How to Use IBM Tivoli System Automation to Resolve Events
Tool: IBM Tivoli System Automation
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks, which are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.

Details

The IBM® Tivoli® System Automation family of products provides the corrective action capability to restore normal operations for applications, middleware and hardware resources after an event, such as a temporary loss of power, that would otherwise cause unplanned downtime.

The IBM Tivoli System Automation family of products includes:

  • IBM Tivoli System Automation for z/OS provides high availability and disaster recovery capabilities for z/OS systems, including Sysplex clusters, through policy-based automation
  • IBM Tivoli AF/OPERATOR provides high availability for z/OS systems through REXX-based automation
  • IBM Tivoli AF/REMOTE provides secure outboard automation and console consolidation for disparate operating systems
  • IBM Tivoli System Automation for Multiplatforms Base Component provides availability and disaster recovery automation (stop, start, move/failover, restart in place) for Linux®, Linux for zSeries and AIX systems through policy-based automation
  • IBM Tivoli System Automation for Multiplatforms End-to-End Component provides coordinated, cross-cluster/resource automation and high availability for multi-tiered applications

Several key activities are required to fully automate your Event Management process: detecting and logging events by monitoring various system resources; filtering the events to identify only those abnormal events that need a response; correlating, escalating and processing events; and finally taking corrective action to resolve and close events.
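The activities listed above can be pictured as a small pipeline. The following is a minimal sketch, not System Automation code: the `Event` fields, the severity levels, and every function name are hypothetical and exist only to illustrate the order of the steps (detect and log, filter, correlate, resolve or escalate).

```python
from dataclasses import dataclass


@dataclass
class Event:
    resource: str
    severity: str  # hypothetical levels: "info", "warning", "critical"
    message: str
    resolved: bool = False


def detect(samples):
    """Detect and log: turn raw (resource, severity, message) samples into events."""
    return [Event(*s) for s in samples]


def filter_abnormal(events):
    """Filter: keep only the abnormal events that need a response."""
    return [e for e in events if e.severity != "info"]


def correlate(events):
    """Correlate: group events by resource so each resource is handled once."""
    grouped = {}
    for e in events:
        grouped.setdefault(e.resource, []).append(e)
    return grouped


def resolve(grouped, action):
    """Resolve and close: run one corrective action per resource.

    Resources whose action succeeds have their events closed; the rest
    are returned for escalation to an operator.
    """
    closed, escalated = [], []
    for resource, events in grouped.items():
        if action(resource):
            for e in events:
                e.resolved = True
            closed.append(resource)
        else:
            escalated.append(resource)
    return closed, escalated


def run_pipeline(samples, action):
    """Chain the steps in the order described in the text."""
    return resolve(correlate(filter_abnormal(detect(samples))), action)
```

Here `action` stands in for whatever corrective step applies to a resource; it is passed in as a callable so the sketch stays independent of any real product interface.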

The System Automation family of products can identify certain events affecting application, middleware and hardware resources, log them, and determine the appropriate automated corrective action to restore normal operations and prevent unplanned downtime. For example, if a temporary loss of power causes a primary server to shut down, System Automation determines whether the application can be restarted in place; if not, it restarts the application on a backup server.
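The power-loss example follows a simple decision order: try a restart in place first, and fail over to the backup server only if that is not possible. A minimal sketch of that order is shown here; `recover`, `can_restart_in_place` and `start` are hypothetical names used for illustration and are not part of any System Automation interface.

```python
def recover(app, primary, backup, can_restart_in_place, start):
    """Return the server the application ends up on, or None if recovery failed.

    can_restart_in_place(app, server) -> bool and start(app, server) -> bool
    are caller-supplied callables (assumptions made for this sketch).
    """
    # First choice: restart the application where it was already running.
    if can_restart_in_place(app, primary) and start(app, primary):
        return primary
    # Otherwise move the application to the backup server.
    if start(app, backup):
        return backup
    # Neither attempt worked; an operator has to step in.
    return None
```

For instance, `recover("payroll", "srvA", "srvB", lambda a, s: False, lambda a, s: True)` returns `"srvB"`, because the restart in place is ruled out and the start on the backup server succeeds.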

The System Automation family of products also sends event information to IBM Tivoli Enterprise Console®, which correlates events across systems, networks, databases and applications to help operators better understand event patterns and determine the root cause of a failure.

Even more advanced and proactive performance-based automated event resolution can be achieved by using deep-dive monitoring and diagnostic products, including IBM Tivoli OMEGAMON® XE, IBM Tivoli Monitoring and IBM Tivoli Composite Application Manager, to invoke automated responses by the System Automation family of products.

For example, consider a system in which IBM WebSphere® Application Server runs in an AIX cluster and develops a problem that slows it down and eventually causes it to crash. Tivoli Composite Application Manager for WebSphere provides comprehensive memory heap analysis to determine where the memory leak occurs. When a specified system performance threshold is breached, it invokes Tivoli System Automation for Multiplatforms to restart the application server in place before performance degrades further. Using both Tivoli solutions together keeps the system at an efficient operating level and protects it against unplanned downtime.
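The interaction in this example amounts to a threshold check that triggers a restart. The sketch below illustrates the idea only; the heap figures, the threshold value and the `restart` callback are assumptions, and neither product exposes these functions.

```python
def check_and_restart(heap_used_mb, threshold_mb, restart):
    """Invoke restart() once heap usage exceeds the threshold.

    Returns True if the restart was invoked, False otherwise.
    """
    if heap_used_mb > threshold_mb:
        restart()
        return True
    return False


def monitor(samples, threshold_mb, restart):
    """Walk heap samples in order; return the index where a restart fired, or None."""
    for i, used in enumerate(samples):
        if check_and_restart(used, threshold_mb, restart):
            return i
    return None
```

With heap samples of 400, 650, 900 and 1200 MB and a threshold of 800 MB, the restart fires on the third sample, before the leak can grow to the fourth.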

Operations console for TSA

The Web-based operations console for the Tivoli System Automation for Multiplatforms Base Component provides intuitive management and control of cluster-wide application resources and components.

For More Information

For more information about this tool, click on the link for this tool at the top of this page.