Context
Tool mentors explain how a tool can perform tasks, which are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.
To see the details of how this tool mentor supports processes and activities, click the links next to the icons:
Details
IBM® Tivoli® Netcool leverages a distributed, multi-layer approach to event correlation rather than a single monolithic
correlation engine. This approach allows Netcool to provide an out-of-the-box solution that is capable of reducing the
vast flood of events that must be managed in today's increasingly complex Service infrastructures. Correlation is
performed by multiple components of the Netcool suite to reduce the amount of 'noise' generated in complex
infrastructures, eliminating redundant or unnecessary data and allowing operators to focus solely on the information
required to resolve problems quickly.
Distributing the intensive correlation process from the event layer through to the service layer increases
scalability: different event types are processed asynchronously, only relevant events are routed to the various
processes, and events that can quickly be identified as Critical are processed or escalated rapidly.
The applied correlation methods of the IBM Tivoli Netcool suite are based upon the discovered topology as well as
collected events. Correlation methods include event reduction and device-level correlation, topology-based correlation
(a.k.a. root cause analysis), policy-based correlation, service correlation and several other methods.
Netcool/OMNIbus supports the creation of causal-based correlation rules at several different levels. At the collection
layer, described in tool mentor TNO - Detect and Log Event, functions within the probe and monitor rules file
language allow the use of global variables, which can serve as the basis for de-duplication and basic correlation
rules. The flexibility of the probe rules language is used in many cases to classify events at the entry point into the
management system.
For more complex correlations, the automation sub-system within the Netcool/OMNIbus event engine (ObjectServer) is used.
The automation sub-system enables correlation of multiple events, the updating of existing events (e.g., modifying the
severity of existing events based on new events), and the insertion of new events (e.g., suppressing 10 existing
symptomatic events and inserting a new summary, root cause, or service-affected event). In addition, external actions,
such as email or paging, can be automatically initiated by the automation sub-system as part of the escalation process.
For example, if an operator does not respond to an Alarm within a specified period of time, ObjectServer Triggers
may be used to escalate the issue by increasing the severity of the event, causing the event to flash within
the user display, sending an e-mail, or paging an alternate contact.
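The escalation example above can be sketched in a few lines. This is an illustrative Python model only, not ObjectServer Trigger syntax: the `Alarm` fields, the `escalate_unacknowledged` function, and the `notify` callback are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass

CLEAR, CRITICAL = 0, 5  # severity range described in this tool mentor


@dataclass
class Alarm:
    summary: str
    severity: int
    first_occurrence: int       # seconds; hypothetical field name
    acknowledged: bool = False
    flash: bool = False         # also used here as the "already escalated" marker


def escalate_unacknowledged(alarms, now, max_age_seconds, notify):
    """Escalate alarms that nobody has acknowledged within max_age_seconds."""
    escalated = []
    for alarm in alarms:
        overdue = now - alarm.first_occurrence > max_age_seconds
        if alarm.severity == CLEAR or alarm.acknowledged or alarm.flash or not overdue:
            continue
        alarm.severity = min(alarm.severity + 1, CRITICAL)  # raise severity, capped at Critical
        alarm.flash = True                                  # make the event flash in the display
        notify(alarm)                                       # e.g. e-mail or page an alternate contact
        escalated.append(alarm)
    return escalated


sent = []
alarms = [
    Alarm("Link down", 3, first_occurrence=0),
    Alarm("Fan failure", 4, first_occurrence=0, acknowledged=True),
    Alarm("High CPU", 2, first_occurrence=500),
]
escalated = escalate_unacknowledged(alarms, now=600, max_age_seconds=300, notify=sent.append)
```

Only the first alarm is escalated here: the second has been acknowledged and the third is not yet old enough. Running the function again does not escalate the same alarm twice, because the sketch treats the flash flag as its escalation marker.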
De-duplication of collected events and the state-based correlation of related problem and resolution events is handled
automatically by the ObjectServer.
The following diagram shows the consolidation, by de-duplication, of four ping failure events for the same managed
object into a single event that gives the First and Last Occurrence times and a Count of the number of occurrences in
that time frame. Note that the Event Severities are set and displayed by default according to the ISO standards of six
severities ranging from 0=Clear=green to 5=Critical=red.
The resulting event shown here in Operator Desktop view is then available for further automation and/or Operator
intervention.
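As a rough illustration of the de-duplication behavior described above, the following Python sketch folds repeated events into one row. It is a simplified model, not the ObjectServer implementation: the `Event` class, its field names, and the `identifier` key format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    identifier: str        # hypothetical key: equal values mean "same problem, same object"
    summary: str
    severity: int          # 0 = Clear ... 5 = Critical
    first_occurrence: int  # seconds
    last_occurrence: int
    count: int = 1


def insert_event(store, identifier, summary, severity, timestamp):
    """Insert a new event, or fold a repeat into the existing row."""
    existing = store.get(identifier)
    if existing is None:
        event = Event(identifier, summary, severity, timestamp, timestamp)
        store[identifier] = event
        return event
    # Repeat of a known event: keep one row, refresh its details, bump the count.
    existing.last_occurrence = max(existing.last_occurrence, timestamp)
    existing.count += 1
    existing.summary = summary
    existing.severity = severity
    return existing


store = {}
for ts in (1000, 1060, 1120, 1180):  # four ping failures for the same managed object
    insert_event(store, "router1:ping", "Ping failure", 5, ts)

event = store["router1:ping"]
print(len(store), event.first_occurrence, event.last_occurrence, event.count)
```

The four inserts leave a single row whose first occurrence is 1000, last occurrence is 1180, and count is 4, mirroring the consolidation the diagram describes.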
Netcool provides automated, out-of-the-box, state-based correlation at the object level (e.g., if a 'link down' event
is received for a router interface which then corrects itself and generates a subsequent 'link up' event, the system
correlates the two and clears the original 'link down' event). During the collection process, Netcool probes and
monitors analyze the incoming events and classify them as problem or resolution events. Once the events are inserted
into the ObjectServer, a series of automations provides the correlation needed for problem and resolution events to be
properly associated and cleared as appropriate, removing the need for manual correlation and resolution by an Operator.
This diagram shows the visual representation of the pairing and clearing of related problem and resolution events.
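The pairing of problem and resolution events can be sketched as follows. This Python model is a simplification under stated assumptions: the type codes, the `Event` fields, and the matching key (node, alert key, alert group) are hypothetical choices for illustration and do not reproduce the shipped automations.

```python
from dataclasses import dataclass

PROBLEM, RESOLUTION = 1, 2  # hypothetical type codes assigned at collection time
CLEAR = 0


@dataclass
class Event:
    node: str
    alert_key: str    # e.g. the interface name
    alert_group: str  # e.g. the kind of condition being reported
    event_type: int
    severity: int
    timestamp: int


def clear_resolved(events):
    """Clear every problem event that has a matching resolution at or after it."""
    latest_resolution = {}
    for e in events:
        if e.event_type == RESOLUTION:
            key = (e.node, e.alert_key, e.alert_group)
            latest_resolution[key] = max(latest_resolution.get(key, e.timestamp), e.timestamp)

    cleared = 0
    for e in events:
        key = (e.node, e.alert_key, e.alert_group)
        if (e.event_type == PROBLEM and e.severity != CLEAR
                and key in latest_resolution
                and e.timestamp <= latest_resolution[key]):
            e.severity = CLEAR
            cleared += 1
    return cleared


events = [
    Event("router1", "Gi0/1", "link", PROBLEM, 5, 100),     # link down
    Event("router1", "Gi0/2", "link", PROBLEM, 5, 110),     # link down, never recovers
    Event("router1", "Gi0/1", "link", RESOLUTION, 1, 160),  # link up
]
cleared = clear_resolved(events)
```

Here only the 'link down' event for `Gi0/1` is cleared, because it is the only problem event with a later 'link up' for the same object.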
Temporal automations may be applied to manage related events where, for example, the existence or resolution of a
problem depends on an event sequence, including the absence or occurrence of a related event.
This diagram shows an example of key data fields in probe rules and in automations that identify the Link Down root
cause and the symptomatic interface or circuit alarms that might be suppressed.
The events held within the ObjectServer are available at all times to the user interface. In both the OMNIbus
native Desktop client and Netcool Webtop, tools linked to configurable filters and views enable users to react to
and work with Events in real time.
The state-based correlation capabilities and native User Interface are further described in tool mentor TNO - Filter
Event.
The advanced integration of Netcool/Precision with Netcool/OMNIbus provides automated topology-based
correlation, which enables rapid identification of root cause events and suppression of downstream 'symptom' events,
that is, the sympathetic events that occur when elements downstream from a known problem are unreachable.
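The idea behind topology-based correlation can be shown with a small Python sketch. It assumes a deliberately simple topology in which each node has at most one upstream node; the `upstream` mapping and the `classify_events` function are hypothetical and are not part of Netcool/Precision.

```python
def classify_events(failed_nodes, upstream):
    """Label each failed node as a 'root cause' or a downstream 'symptom'.

    upstream maps a node to the node it depends on to be reachable
    (None for the top of the tree).
    """
    failed = set(failed_nodes)
    labels = {}
    for node in failed:
        label = "root cause"
        seen = {node}
        parent = upstream.get(node)
        # Walk towards the top; any failed ancestor explains this failure.
        while parent is not None and parent not in seen:
            if parent in failed:
                label = "symptom"
                break
            seen.add(parent)
            parent = upstream.get(parent)
        labels[node] = label
    return labels


upstream = {
    "core": None,
    "dist1": "core",
    "dist2": "core",
    "access1": "dist1",
    "access2": "dist1",
}
labels = classify_events(["dist1", "access1", "access2"], upstream)
```

In this example `dist1` is labeled the root cause, while `access1` and `access2` are labeled symptoms and could be suppressed from operator views.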
Netcool/Impact Policy-based Correlation and Enrichment is tightly integrated with OMNIbus, providing advanced
correlation and event enrichment policies that leverage information stored in external systems. For example, it is
easy to create an Impact policy that receives critical events from the ObjectServer and then, based on the device
hostname, checks an Oracle® database to see if the device is currently within a maintenance window. This policy might
suppress the event if it occurs within a maintenance window or escalate it if the device is related to an especially
critical service.
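The decision logic of such a policy can be sketched in Python. This is not Impact policy syntax, and an in-memory dictionary stands in for the external database lookup; the `decide` function, the event keys, and the host names are all hypothetical.

```python
from datetime import datetime


def decide(event, maintenance_windows, critical_hosts):
    """Return 'suppress', 'escalate', or 'pass' for a critical event."""
    host = event["hostname"]
    when = event["time"]
    # Stand-in for the external lookup: is the device inside a maintenance window?
    for start, end in maintenance_windows.get(host, []):
        if start <= when <= end:
            return "suppress"
    # Devices tied to an especially critical service are escalated.
    if host in critical_hosts:
        return "escalate"
    return "pass"


windows = {"db01": [(datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0))]}
critical = {"pay01"}
at = datetime(2024, 1, 6, 23, 30)

results = [
    decide({"hostname": "db01", "time": at}, windows, critical),
    decide({"hostname": "pay01", "time": at}, windows, critical),
    decide({"hostname": "web07", "time": at}, windows, critical),
]
```

The three calls yield `suppress`, `escalate`, and `pass` respectively. Checking the maintenance window first is a design choice of this sketch; a real policy might order or combine the two checks differently.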
Events correlated by Netcool/Precision and Netcool/Impact are returned to Netcool/OMNIbus for update to the central
record. Root cause and Service affecting events are highlighted in the Desktop, while symptom events may be suppressed
from operator views.
The distributed power of probe rules and ObjectServer Triggers, coupled with external advanced correlation tools,
provides both a high level of automated event processing and tools that support Operator activity to prioritize and
manage the Alarms that are Business affecting.
OMNIbus can be integrated with a range of Incident Management solutions to provide for escalation or management of
events. Refer to tool mentor TNO - Identify and Log Incident for additional details. In addition, it is possible to
use the Operator Desktop Event List to manage some of these activities. Refer to tool mentor TNO - Investigate and
Diagnose Incident for further details on working with the Desktop.
For More Information
For more information about this tool, click on the link for this tool at the top of this page.