Tool Mentor: TEC - Resolve Event
TM085 - How to Use IBM Tivoli Enterprise Console to Resolve Events
Tool: IBM Tivoli Enterprise Console
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks that are part of ITUP processes and activities. These tasks are listed as Related Elements in the Relationships section.

Details

The IBM® Tivoli® Enterprise Console® can take actions to recover automatically from a failure or service compromise. It manages these actions through a component known as the task engine, which runs the programs, tasks, scripts, and commands that rules initiate. The task engine process monitors these running items and can return their exit statuses to the dispatch engine, which writes the statuses to the event database. The task engine starts each item as soon as it receives the request; it does not wait for a running item to complete before starting another one. The tool mentor How to Use TEC to Automatically Recover a Service discusses this process in more depth.
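The launch-without-waiting behavior described above can be illustrated with a small Python sketch. This is not the task engine's implementation; the function names `start_tasks` and `collect_exit_statuses` are hypothetical, and the "tasks" are stand-in one-line Python programs chosen only so that the example runs anywhere.

```python
import subprocess
import sys


def start_tasks(commands):
    """Launch every command immediately, without waiting for earlier ones."""
    return [(cmd, subprocess.Popen(cmd)) for cmd in commands]


def collect_exit_statuses(running):
    """Wait for each launched process and pair it with its exit status."""
    return [(cmd, proc.wait()) for cmd, proc in running]


if __name__ == "__main__":
    # Hypothetical stand-in tasks: each simply exits with a fixed status.
    commands = [
        [sys.executable, "-c", "import sys; sys.exit(0)"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    running = start_tasks(commands)  # both processes are started here
    for cmd, status in collect_exit_statuses(running):
        # A real system would record the status with the originating event.
        print(status)
```

Starting all processes before waiting on any of them mirrors the point that a long-running recovery task does not hold up the ones requested after it.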

The example in this section is taken from the Rule Builder's Guide and shows how a rule can invoke a task to attempt a recovery action. All current manuals are located on the IBM Web site in the Tivoli Information Center at http://publib.boulder.ibm.com/infocenter/tivihelp/v3r1/index.jsp?toc=/com.ibm.itec.doc_3.9/toc.xml.

The following example shows how to run a task based on an event sent from IBM Tivoli Monitoring, which is monitoring an application instance of an MS SQL database. The rule filters for events of class MSSQLDatabase_LogSpacePercentUsedDB with a severity of CRITICAL. The value of the collection attribute contains the resource type being monitored. The resource type and host name are bound to variables for use in the exec_task call.

% Run the ADSMIncBackup task when a CRITICAL log-space event arrives.
rule: plain_rule1_42: (
  description: 'ADSM incremental backup task',
  event: _ev1 of_class within ['MSSQLDatabase_LogSpacePercentUsedDB']
    where [
      severity: _ev1_severity equals 'CRITICAL',
      collection: _ev1_collection,
      hostname: _ev1_hostname
    ],
  reception_action: action0: (
    (exec_task(_ev1, 'ADSMIncBackup', '-l MSSQLManagerTasks -h \'@%s:%s\'',
      [_ev1_collection, _ev1_hostname], 'YES'
    ) )
  )
).

The exec_task call resolves to the following command when an event is received for an MSSQLDatabase collection on host master@holon@holon:

wruntask -t ADSMIncBackup -l MSSQLManagerTasks -h @MSSQLDatabase:master@holon@holon -E
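To make the substitution step concrete, the following Python sketch reproduces how the two `%s` placeholders in the format string are filled from the argument list. It only models the string handling; `build_wruntask_command` is a hypothetical helper, and the trailing `-E` is copied from the command shown above rather than derived from any exec_task logic.

```python
import shlex


def build_wruntask_command(task_name, format_string, arguments):
    """Fill the %s placeholders, then assemble a wruntask-style command line."""
    options = format_string % tuple(arguments)
    # shlex.split removes the single quotes the same way a shell would.
    return ["wruntask", "-t", task_name] + shlex.split(options) + ["-E"]


command = build_wruntask_command(
    "ADSMIncBackup",
    "-l MSSQLManagerTasks -h '@%s:%s'",
    ["MSSQLDatabase", "master@holon@holon"],  # collection, hostname
)
print(" ".join(command))
```

The first placeholder receives the collection value and the second receives the host name, which is why the `-h` argument takes the form `@collection:hostname`.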

The event server can use rules to delay responses to an event. If responses are delayed for an event, event consoles are not updated and the event server does not issue an automatic response until the specified amount of time has elapsed. A delayed response might be preferable, for example, if you have a self-correcting problem that occasionally occurs on the network. This feature can prevent an operator from needlessly responding to a problem.
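The idea of a delayed response can be sketched in Python as follows. This is an illustration only, not how the event server implements delays: the `DelayedResponder` class is hypothetical, time is passed in explicitly so that the behavior is easy to follow, and the `clear` method assumes that a self-correcting problem announces its own recovery.

```python
class DelayedResponder:
    """Hold each event for a fixed delay before any response is issued."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self.pending = {}  # event id -> time the event was received

    def receive(self, event_id, now):
        """Record the event; nothing is shown or acted on yet."""
        self.pending[event_id] = now

    def clear(self, event_id):
        """The problem corrected itself, so drop the pending response."""
        self.pending.pop(event_id, None)

    def due(self, now):
        """Return and remove the events whose delay has fully elapsed."""
        ready = [event_id for event_id, received in self.pending.items()
                 if now - received >= self.delay_seconds]
        for event_id in ready:
            del self.pending[event_id]
        return ready


responder = DelayedResponder(delay_seconds=300)
responder.receive("link-down-1", now=0)
responder.receive("link-down-2", now=0)
responder.clear("link-down-1")        # recovered on its own within the delay
print(responder.due(now=300))         # only link-down-2 still needs a response
```

An event that clears within the delay never reaches an operator, which is the benefit the paragraph above describes.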

Whether an event is resolved by automation, passed to a trouble-ticketing solution, or handled by operators working in the Tivoli Enterprise Console, there must be an escalation process that defines where, how, and under what circumstances events are escalated.

A rule can specify an action to be taken automatically in response to an incoming event. For example, if an event indicates that a router is down, the first response might be to attempt to restart the router and give an operator a low-severity notice. If the restart attempts fail within a designated time period, a rule can specify that further retries be cancelled and that a higher-severity notice be sent to an operator. An operator can monitor actions that are automatically performed for an event.
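The router scenario can be sketched in Python as a retry loop bounded by a time window. Everything here is hypothetical and for illustration only: `recover_with_deadline`, the `restart` callable, the injected `clock`, and the notice tuples are assumptions, not part of any Tivoli interface.

```python
def recover_with_deadline(restart, clock, deadline_seconds):
    """Retry restart() until it succeeds or the time window closes."""
    # The first response always includes a low-severity notice.
    notices = [("LOW", "restart attempted")]
    started = clock()
    while clock() - started < deadline_seconds:
        if restart():
            return notices  # recovered; no escalation needed
    # The window closed without success: stop retrying and escalate.
    notices.append(("HIGH", "restart attempts cancelled"))
    return notices


if __name__ == "__main__":
    import time

    # Hypothetical restart that never succeeds, with a very short window.
    print(recover_with_deadline(lambda: False, time.monotonic, 0.01))
```

Passing the clock in as a parameter keeps the sketch easy to exercise with a simulated clock instead of real elapsed time.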

If an operator does not respond to an event after a specified period of time, the event server can take additional actions beyond displaying the event on an event console. For example, the event server can send an e-mail notice of the unacknowledged event to an operator. If the operator still does not acknowledge the event, the server can then perform actions, such as paging the operator or sending an e-mail notice to an alternate contact.
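One way to picture this escalation is as an ordered list of thresholds, sketched below in Python. The thresholds, action labels, and the `actions_due` function are hypothetical values chosen for illustration; real escalation timing and actions would be defined in the rule base.

```python
# Hypothetical escalation ladder: (seconds unacknowledged, action to take).
ESCALATION_STEPS = [
    (600, "e-mail operator"),
    (1200, "page operator"),
    (1800, "e-mail alternate contact"),
]


def actions_due(seconds_unacknowledged, acknowledged, steps=ESCALATION_STEPS):
    """Return every escalation action whose threshold has been reached."""
    if acknowledged:
        return []  # an acknowledged event needs no further escalation
    return [action for threshold, action in steps
            if seconds_unacknowledged >= threshold]


print(actions_due(1250, acknowledged=False))
```

Keeping the ladder as data rather than as nested conditions makes it simple to add a step or change a threshold without touching the logic.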

For More Information

For more information about this tool, click on the link for this tool at the top of this page.