Tool Mentor: TNO - Correlate Events and Select Response
TM139 - How to Use IBM Tivoli Netcool OMNIbus to Correlate Events and Select Response
Tool: IBM Tivoli Netcool OMNIbus
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks, which are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.


Details

IBM® Tivoli® Netcool leverages a distributed, multi-layer approach to event correlation rather than a single monolithic correlation engine. This approach allows Netcool to provide an out-of-the-box solution that is capable of reducing the vast flood of events that must be managed in today's increasingly complex Service infrastructures. Correlation is performed by multiple components of the Netcool suite to reduce the amount of 'noise' generated in complex infrastructures, eliminating redundant or unnecessary data and allowing operators to focus solely on the information required to resolve problems quickly.

Distributing the intensive correlation process from the event layer to the service layer increases scalability. Different event types can be processed asynchronously, only relevant events are routed to each process, and events that can quickly be identified as Critical are processed or escalated rapidly.

The applied correlation methods of the IBM Tivoli Netcool suite are based upon the discovered topology as well as collected events. Correlation methods include event reduction and device-level correlation, topology-based correlation (a.k.a. root cause analysis), policy-based correlation, service correlation and several other methods.

Netcool/OMNIbus supports the creation of causal-based correlation rules at several different levels. At the collection layer, described in tool TNO - Detect and Log Event, functions within the probe and monitor rules file language allow the use of global variables, which can serve as the basis for de-duplication and basic correlation rules. In many cases the flexibility of the probe rules language is used to classify events at their entry point into the management system.

For more complex correlations, the automation sub-system within the Netcool/OMNIbus event engine (ObjectServer) is used. The automation sub-system enables the correlation of multiple events, the updating of existing events (e.g., modifying the severity of existing events based on new events), and the insertion of new events (e.g., suppressing 10 existing symptomatic events and inserting a new summary, root cause, or service affected event). In addition, external actions, such as email or paging, can be initiated automatically by the automation sub-system as part of the escalation process. For example, if an operator does not respond to an Alarm within a specified period of time, ObjectServer Triggers may be used to escalate the issue by increasing the severity of the event, causing the event to flash within the user display, sending an e-mail, or paging an alternate contact.
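The escalation example above can be sketched in a few lines of Python. This is only an illustration of the logic, not ObjectServer trigger syntax: the Event class, the escalate_unacknowledged function, and the 600-second default timeout are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass

CRITICAL = 5  # top of the 0..5 severity scale described in this document


@dataclass
class Event:
    """Hypothetical, simplified event record (not the ObjectServer schema)."""
    summary: str
    severity: int
    first_occurrence: float  # seconds since epoch
    acknowledged: bool = False


def escalate_unacknowledged(events, now, timeout_s=600):
    """Raise severity by one step for events left unacknowledged too long.

    Returns the escalated events so a caller could also start an
    external action (e-mail, page) for each of them.
    """
    escalated = []
    for ev in events:
        overdue = now - ev.first_occurrence >= timeout_s
        # Clear events (0) and events already at Critical are left alone.
        if overdue and not ev.acknowledged and 0 < ev.severity < CRITICAL:
            ev.severity += 1
            escalated.append(ev)
    return escalated
```

Returning the escalated events, rather than sending notifications inside the loop, keeps the severity change and the external action separate, which mirrors the distinction the text draws between updating events and initiating external actions.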

De-duplication of collected events and the state-based correlation of related problem and resolution events are handled automatically by the ObjectServer.

The following diagram shows four ping failure events for the same managed object being consolidated by de-duplication into a single event that gives the First and Last Occurrence times and a Count of the number of occurrences in that time frame. Note that the Event Severities are set and displayed by default according to the ISO standards of six severities, ranging from 0=Clear=green to 5=Critical=red.

Diagram shows consolidation of four ping failure events for the same managed object by de-duplication into a single event

The resulting event shown here in Operator Desktop view is then available for further automation and/or Operator intervention.
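The de-duplication behaviour described above can be modelled with a short Python sketch. It assumes each incoming event carries an identifier that is equal for repeated occurrences of the same problem; the deduplicate function and the dictionary keys are hypothetical and do not reproduce the ObjectServer's own implementation.

```python
def deduplicate(raw_events):
    """Collapse events that share an identifier into one record each.

    raw_events: iterable of dicts with 'identifier', 'timestamp', 'severity'.
    Returns a dict keyed by identifier; each value holds the first and
    last occurrence times, the occurrence count, and the latest severity.
    """
    table = {}
    for ev in raw_events:
        key = ev["identifier"]
        ts = ev["timestamp"]
        row = table.get(key)
        if row is None:
            # First time this identifier is seen: create the single record.
            table[key] = {
                "severity": ev["severity"],
                "first_occurrence": ts,
                "last_occurrence": ts,
                "count": 1,
            }
        else:
            # Repeat occurrence: update the existing record in place.
            row["count"] += 1
            row["first_occurrence"] = min(row["first_occurrence"], ts)
            row["last_occurrence"] = max(row["last_occurrence"], ts)
            row["severity"] = ev["severity"]
    return table
```

Feeding four ping failures with the same identifier into this function yields one record with a count of four, which is the consolidation the diagram illustrates.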

Netcool provides automated, out-of-the-box, state-based correlation at the object level. For example, if a 'link down' event is received for a router interface that then corrects itself and generates a subsequent 'link up' event, the system correlates the two and clears the original 'link down' event. During the collection process, Netcool probes and monitors analyze the incoming events and classify them as problem or resolution events. Once the events are inserted into the ObjectServer, a series of automations provides the correlation needed for problem and resolution events to be properly associated and cleared as appropriate, removing the need for manual correlation and resolution by an Operator.

This diagram shows the visual representation of the pairing and clearing of related problem and resolution events.

A visual representation of pairing and clearing related problems and resolution events
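The pairing of problem and resolution events can be sketched as follows. This is a simplified model, not the automation shipped with the product: it assumes events are matched on a node and an alert key, and the names clear_resolved, PROBLEM, and RESOLUTION are hypothetical.

```python
PROBLEM, RESOLUTION = 1, 2  # assumed event type codes for this sketch
CLEAR = 0                   # severity 0 = Clear, as described above


def clear_resolved(events):
    """Clear problem events that are matched by a later resolution event.

    events: list of dicts with 'node', 'alert_key', 'type', 'severity',
    'timestamp'. Problem events are updated in place; the number of
    events cleared is returned.
    """
    # Pass 1: remember the most recent resolution time per (node, alert_key).
    latest_resolution = {}
    for ev in events:
        if ev["type"] == RESOLUTION:
            key = (ev["node"], ev["alert_key"])
            previous = latest_resolution.get(key, ev["timestamp"])
            latest_resolution[key] = max(previous, ev["timestamp"])

    # Pass 2: clear problems that occurred at or before that resolution.
    cleared = 0
    for ev in events:
        key = (ev["node"], ev["alert_key"])
        if (
            ev["type"] == PROBLEM
            and ev["severity"] != CLEAR
            and key in latest_resolution
            and ev["timestamp"] <= latest_resolution[key]
        ):
            ev["severity"] = CLEAR
            cleared += 1
    return cleared
```

The timestamp comparison matters: a 'link down' that arrives after the last 'link up' for the same interface represents a new problem and is deliberately left active.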

Temporal automations may be applied to manage related events where, for example, the existence or resolution of a problem depends on an event sequence, including the absence or occurrence of a related event.

This diagram shows an example of key data fields in probe rules and in automations that identify the Link Down root cause and the symptomatic interface or circuit alarms that might be suppressed.

Key data fields set in probe rules and automations to identify the Link Down root cause and the symptomatic interface or circuit alarms being suppressed

The events held within the ObjectServer are available at all times to the user interface. In both the OMNIbus native Desktop client and Netcool Webtop, tools that are linked to configurable filters and views enable users to react to and work with Events in real time.

The state-based correlation capabilities and native User Interface are further described in tool TNO - Filter Event. The advanced integration of Netcool/Precision with Netcool/OMNIbus provides automated topology-based correlation, which enables rapid identification of root cause events and suppression of downstream 'symptom' events, that is, the sympathetic events that occur when elements downstream from a known problem are unreachable.
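The idea behind topology-based suppression can be illustrated with a small Python sketch. It is not the Netcool/Precision algorithm; it assumes a simple tree topology in which every node has at most one upstream parent, and the function name classify_by_topology is hypothetical.

```python
def classify_by_topology(upstream, failed):
    """Split failed nodes into root causes and downstream symptoms.

    upstream: dict mapping each node to its parent on the path toward
              the management station (None for the top-level node).
    failed:   set of node names with an active 'unreachable' event.
    A failed node is treated as a symptom if any ancestor has also failed.
    """
    root_causes, symptoms = set(), set()
    for node in failed:
        parent = upstream.get(node)
        seen = set()  # guards against accidental cycles in the input
        # Walk upstream until a failed ancestor or the top is reached.
        while parent is not None and parent not in failed and parent not in seen:
            seen.add(parent)
            parent = upstream.get(parent)
        if parent is not None and parent in failed:
            symptoms.add(node)
        else:
            root_causes.add(node)
    return root_causes, symptoms
```

With a distribution router and the two access devices behind it all reported as unreachable, only the distribution router is returned as a root cause, and the two access devices are candidates for suppression from operator views.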

Netcool/Impact Policy-based Correlation and Enrichment is tightly integrated with OMNIbus, providing advanced correlation and event enrichment policies that leverage information stored in external systems. For example, it is easy to create an Impact policy that receives critical events from the ObjectServer and then, based on the device hostname, checks an Oracle® database to see whether the device is currently within a maintenance window. This policy might suppress the event if it occurs within a maintenance window, or escalate it if the device is related to an especially critical service.
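The decision logic of such a maintenance window policy can be sketched in Python. This is not Impact policy language: an in-memory dictionary stands in for the external database lookup, and the function name apply_maintenance_policy and the three return values are hypothetical.

```python
from datetime import datetime


def apply_maintenance_policy(event, windows, critical_hosts, now):
    """Decide what to do with a critical event.

    event:          dict with at least a 'hostname' key.
    windows:        dict mapping hostname -> list of (start, end) datetimes;
                    a stand-in for the external maintenance database.
    critical_hosts: set of hostnames tied to especially critical services.
    Returns 'suppress', 'escalate', or 'pass'.
    """
    host = event["hostname"]
    # Suppress events for devices currently inside a maintenance window.
    for start, end in windows.get(host, []):
        if start <= now <= end:
            return "suppress"
    # Otherwise escalate events for devices backing critical services.
    if host in critical_hosts:
        return "escalate"
    return "pass"
```

Checking the maintenance window before the critical-service list is a design choice made for this sketch; a real policy might order these checks differently.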

Events correlated by Netcool/Precision and Netcool/Impact are returned to Netcool/OMNIbus for update to the central record. Root cause and Service affecting events are highlighted in the Desktop, while symptom events may be suppressed from operator views.

The distributed power of probe rules and ObjectServer Triggers, coupled with external advanced correlation tools, provides both a high level of automated event processing and tools that support Operators in prioritizing and managing the Alarms that are Business affecting.

OMNIbus can be integrated with a range of Incident Management solutions to provide for escalation or management of events. Refer to tool mentor TNO - Identify and Log Incident for additional details. In addition, the Operator Desktop Event List can be used to manage some of these activities. Refer to tool mentor TNO - Investigate and Diagnose Incident for further details on working with the Desktop.

For More Information

For more information about this tool, click on the link for this tool at the top of this page.