Context
Tool mentors explain how a tool can perform tasks, which are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.
To see the details of how this tool mentor supports processes and activities, click the links next to the icons.
Details
The IBM® Tivoli® System Automation family of products provides the corrective action necessary to restore normal
operations for applications, middleware, and hardware resources after an incident, such as a temporary loss of power,
that would otherwise cause unplanned downtime.
The IBM Tivoli System Automation family of products includes:
- IBM Tivoli System Automation for z/OS provides high availability and disaster recovery capabilities for z/OS
  systems, including Sysplex clusters, through policy-based automation.
- IBM Tivoli AF/OPERATOR provides high availability for z/OS systems through REXX-based automation.
- IBM Tivoli AF/REMOTE provides secure outboard automation and console consolidation for disparate operating systems.
- IBM Tivoli System Automation for Multiplatforms Base Component provides availability and disaster recovery
  automation (stop, start, move/failover, restart in place) for Linux®, Linux for zSeries, and AIX systems through
  policy-based automation.
- IBM Tivoli System Automation for Multiplatforms End-to-End Component provides coordinated, cross-cluster/resource
  automation and high availability for multi-tiered applications.
Fully automating your Incident Management process requires several key activities: detecting and recording incidents
by monitoring various system resources; classifying and providing initial support; investigating and diagnosing;
resolving the incident to recover service; and, finally, closing the incident.
The System Automation family of products discovers system, application, and resource incidents in a single system or
cluster and, using sophisticated knowledge about application components and their relationships along with
availability goals, determines corrective actions within the right context. For example, if a temporary loss of power
causes a primary server to shut down, System Automation determines whether the application can be restarted in place;
if not, it restarts the application on a backup server to prevent unplanned downtime.
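The restart-or-failover decision described above can be illustrated with a minimal Python sketch. It is not the product's implementation or API: the Server class, the choose_recovery_action function, and the "escalate to operator" fallback are hypothetical names and behavior assumed here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical view of a server: its name and whether it is currently up."""
    name: str
    is_up: bool


def choose_recovery_action(primary: Server, backup: Optional[Server]) -> str:
    """Pick a recovery action for an application that stopped on `primary`."""
    # Prefer restarting in place when the primary server is available again.
    if primary.is_up:
        return f"restart in place on {primary.name}"
    # Otherwise move the application to the backup server, if it is healthy.
    if backup is not None and backup.is_up:
        return f"fail over to {backup.name}"
    # Assumed fallback (not from the source): no healthy server is available.
    return "escalate to operator"


if __name__ == "__main__":
    action = choose_recovery_action(Server("primary", False), Server("backup", True))
    print(action)
```

The ordering of the checks mirrors the text: restart in place is tried first, and failover is considered only when that is not possible.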
Even more advanced and proactive performance-based automated event resolution can be achieved by using deep-dive
monitoring and diagnostic products, including IBM Tivoli OMEGAMON® XE, IBM Tivoli Monitoring and IBM Tivoli Composite
Application Manager, to invoke automated responses by the System Automation family of products. For example, consider a
system where IBM DB2 is running in a z/OS Parallel Sysplex environment. The operator can customize System
Automation for z/OS to create policies that define performance thresholds, such as time limits on running DB2 threads,
along with a list of jobs that should be cancelled when the condition occurs. In this example, System
Automation for z/OS queries OMEGAMON XE for DB2 for performance metrics, and compares operating conditions against its
policies, taking action when thresholds are breached. If an exception is generated due to a long-running DB2 thread,
System Automation for z/OS determines the address space and the job name causing the exception, interrogates its policy
definitions to determine if the job can be cancelled, and cancels the job if allowed to do so.
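The policy check in this example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual policy format or interface of System Automation for z/OS or OMEGAMON XE for DB2: ThreadMetric, Policy, and jobs_to_cancel are hypothetical names, and the metrics are supplied as plain values rather than queried from a monitoring product.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ThreadMetric:
    """Hypothetical performance sample for one running DB2 thread."""
    job_name: str
    address_space: str
    elapsed_seconds: float


@dataclass
class Policy:
    """Hypothetical policy: a time limit plus the jobs that may be cancelled."""
    max_elapsed_seconds: float
    cancellable_jobs: Set[str] = field(default_factory=set)


def jobs_to_cancel(metrics: List[ThreadMetric], policy: Policy) -> List[str]:
    """Return the names of jobs that breach the threshold and may be cancelled."""
    selected = []
    for metric in metrics:
        # Threads within the time limit raise no exception.
        if metric.elapsed_seconds <= policy.max_elapsed_seconds:
            continue
        # The policy decides whether the offending job may be cancelled.
        if metric.job_name in policy.cancellable_jobs:
            selected.append(metric.job_name)
    return selected


if __name__ == "__main__":
    policy = Policy(max_elapsed_seconds=600.0, cancellable_jobs={"BATCH01"})
    samples = [
        ThreadMetric("BATCH01", "AS1", 900.0),
        ThreadMetric("ONLINE1", "AS2", 1200.0),
    ]
    print(jobs_to_cancel(samples, policy))
```

Keeping the threshold and the cancellation list in one policy object reflects the two-step check in the text: first detect the breach, then consult the policy before acting.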
The NMC console for IBM Tivoli System Automation for z/OS provides intuitive display, management and control of
monitored resources and their health status, including their relationships and dependencies.
For More Information
For more information about this tool, click on the link for this tool at the top of this page.