Tool Mentor: ITM - Deliver Service
TM046 - How to Use IBM Tivoli Monitoring to Monitor Service Operation
Tool: IBM Tivoli Monitoring
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks that are part of ITUP (IBM Tivoli Unified Process) processes and activities. The tasks are listed as Related Elements in the Relationships section.

Details

IBM® Tivoli® Monitoring applies preconfigured best practices to the automated monitoring of essential system resources. It detects bottlenecks and other potential problems and can recover automatically from critical situations, so system administrators do not need to scan extensive performance data manually. It also integrates with other Tivoli availability solutions, including Tivoli Business Systems Manager and Tivoli Enterprise Console.

Most features of Tivoli Monitoring can be used as supplied or can be modified manually using the graphical user interfaces (GUIs) or the command line interface (CLI) provided.

The main features of Tivoli Monitoring are:

  • An off-the-shelf solution for monitoring Windows®, UNIX®, Linux®, and OS/400 systems. Data collection and problem analysis are performed locally on the monitored system.
  • Predefined workspaces in the Tivoli Enterprise Portal that provide detailed monitoring information on resources such as processes, disks, and memory.
  • The ability to add situations in the Tivoli Enterprise Portal and distribute them to multiple systems simultaneously.
  • The ability to modify situations, for example by changing threshold levels to match specific requirements.
  • The ability to view both real-time and historical data for any system from a centralized monitoring application.
  • The ability to send the results from the data collection and analysis to the Tivoli Enterprise Console or to the Tivoli Business Systems Manager.
  • The ability to specify automatic corrective or preventive actions to resolve situations that could develop into real problems.
  • A scheduling feature that enables monitoring to take place at user-specified times.
  • A heartbeat function, running at gateways, that regularly checks the availability and status of attached endpoints and makes the information available to the Tivoli Enterprise Console server, Tivoli Business Systems Manager, or Tivoli Monitoring Notice Group.
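The heartbeat idea can be pictured as a gateway that records when each attached endpoint last responded and flags any endpoint that has stayed silent for longer than the heartbeat interval. The following Python sketch models only that idea; the function `overdue_endpoints`, the endpoint names, and the two-minute interval are hypothetical and are not part of the Tivoli Monitoring API.

```python
from datetime import datetime, timedelta
from typing import Dict, List

def overdue_endpoints(last_seen: Dict[str, datetime],
                      now: datetime,
                      interval: timedelta) -> List[str]:
    """Return endpoints that have not answered a heartbeat within `interval`."""
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > interval)

# Hypothetical endpoints and times, for illustration only.
now = datetime(2024, 1, 1, 12, 0, 0)
last_seen = {
    "ep-web01": now - timedelta(seconds=30),  # answered recently
    "ep-db01": now - timedelta(minutes=5),    # silent for five minutes
}
print(overdue_endpoints(last_seen, now, timedelta(minutes=2)))  # ['ep-db01']
```

In the product, the resulting status is forwarded to the Tivoli Enterprise Console server, Tivoli Business Systems Manager, or a Notice Group; the sketch stops at deciding which endpoints are overdue.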

Understanding Tivoli Monitoring

IBM Tivoli Monitoring overview
IBM Tivoli Monitoring is the base software for the Monitoring Agent for Windows OS. IBM Tivoli Monitoring provides a way to monitor the availability and performance of all the systems in your enterprise from one or several designated workstations. It also provides useful historical data that you can use to track trends and to troubleshoot system problems. You can use IBM Tivoli Monitoring to do the following:
  • Monitor for alerts on the systems that you are managing by using predefined situations or custom situations.
  • Establish your own performance thresholds.
  • Trace the causes leading to an alert.
  • Gather comprehensive data about system conditions.
  • Use policies to perform actions, schedule work, and automate manual tasks.

The Tivoli Enterprise Portal is the interface for IBM Tivoli Monitoring products. By providing a consolidated view of your environment, the Tivoli Enterprise Portal permits you to monitor and resolve performance issues throughout the enterprise.

This section explains the main concepts behind the product.

Resources

Tivoli Monitoring monitors resources on distributed systems. In this context, a resource is anything that affects the operation of a computer system. Resources include physical and logical disks, CPUs, memory, and printers, as well as running processes and services such as LanMan, the Windows event log, the UNIX system logging daemon (syslogd), and TCP/IP.

Situations

Tivoli Monitoring uses predefined, out-of-the-box situations to specify which resource data is accessed on the system at runtime and how that data is processed. For example, the Process situation obtains data about the processes running on the system. The situation collects performance data automatically and processes it with an appropriate algorithm to determine whether the system is performing as expected. In most cases the default values of a situation yield useful data; if necessary, however, you can customize situations to suit your requirements.
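To make the two halves of a situation concrete (which data to collect, and how to judge it), here is a conceptual Python model. It is not the Tivoli Monitoring API or situation formula syntax; the `Situation` class, the `fake_process_data` collector, and the metric names are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

Sample = Dict[str, float]

@dataclass
class Situation:
    """Hypothetical model of a situation: what to collect and how to judge it."""
    name: str
    collect: Callable[[], Sample]   # which resource data is read at runtime
    limits: Dict[str, float]        # default thresholds shipped with the situation

    def breaches(self) -> Dict[str, float]:
        """Collect one sample and return the metrics that exceed their limit."""
        sample = self.collect()
        return {
            metric: value
            for metric, value in sample.items()
            if metric in self.limits and value > self.limits[metric]
        }

def fake_process_data() -> Sample:
    # Stand-in for real collection; a real agent would query the operating system.
    return {"cpu_percent": 72.0, "handle_count": 900.0}

default = Situation("Process", fake_process_data, {"cpu_percent": 80.0})
print(default.breaches())   # {}: the default limit is not exceeded

# Customizing: the same situation with a stricter limit for this environment.
custom = replace(default, limits={"cpu_percent": 60.0})
print(custom.breaches())    # {'cpu_percent': 72.0}
```

Keeping the collector separate from the limits mirrors the point above: the default values are usable as they are, and customization changes only the values, not the collection logic.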

Cycles

When a situation runs at an endpoint, it gathers data at regular intervals called cycles; the duration of a cycle is the cycle time. A situation with a cycle time of 60 seconds gathers information every 60 seconds. Each collection is a snapshot of the status of the resources specified in the situation. Every supplied situation has a default cycle time, which you can modify as required.
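A cycle-driven collector can be sketched as a loop that takes one snapshot at the start of each cycle. This is a simplified illustration of the cycle concept, not how the monitoring agent is implemented; `run_cycles` and the `free_mb` metric are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

Snapshot = Tuple[float, Dict[str, float]]

def run_cycles(collect: Callable[[], Dict[str, float]],
               cycle_time: float, cycles: int) -> List[Snapshot]:
    """Take one snapshot per cycle, starting a new cycle every cycle_time seconds."""
    start = time.monotonic()
    snapshots: List[Snapshot] = []
    for n in range(cycles):
        # Sleep until this cycle's scheduled start so that the time spent
        # collecting does not make later cycles drift.
        delay = start + n * cycle_time - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        snapshots.append((time.monotonic(), collect()))
    return snapshots

# The text uses a 60-second cycle time; 0.05 s keeps this demo short.
snaps = run_cycles(lambda: {"free_mb": 512.0}, cycle_time=0.05, cycles=3)
print(len(snaps))  # 3
```

Scheduling each cycle from a fixed start time, rather than sleeping a fixed amount after each collection, keeps "every 60 seconds" accurate even when a collection is slow.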

Thresholds

Each situation defines one or more thresholds. A threshold is a named property of the resource with a default value that you can modify during customization. Typically, a threshold value marks a significant reference level for a performance-related quantity: if the measured value exceeds it (or fails to reach it), a system administrator probably wants to know. Some thresholds, however, serve only as reference values that limit the scope of the situation.
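The three roles a threshold can play (alert when exceeded, alert when not reached, or only limit scope) can be modeled as follows. This is a conceptual sketch; apart from "High CPU Usage", which the text mentions, the threshold names and values are hypothetical and do not come from the product.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

@dataclass(frozen=True)
class Threshold:
    """A named property with a default value that customization may override."""
    name: str
    default: float
    alert_if_above: bool = True  # False means "alert if the value is not reached"

    def limit(self, overrides: Mapping[str, float]) -> float:
        return overrides.get(self.name, self.default)

    def violated(self, value: float, overrides: Mapping[str, float]) -> bool:
        limit = self.limit(overrides)
        return value > limit if self.alert_if_above else value < limit

HIGH_CPU = Threshold("High CPU Usage", 80.0)
# Hypothetical thresholds for illustration only.
LOW_FREE_MB = Threshold("Low Free Memory MB", 256.0, alert_if_above=False)
MIN_CPU_OF_INTEREST = Threshold("Minimum CPU Of Interest", 5.0)  # scope only

def in_scope(cpu_by_pid: Dict[int, float],
             overrides: Mapping[str, float]) -> Dict[int, float]:
    """Keep only processes busy enough to be worth evaluating at all."""
    floor = MIN_CPU_OF_INTEREST.limit(overrides)
    return {pid: cpu for pid, cpu in cpu_by_pid.items() if cpu >= floor}

customized = {"High CPU Usage": 90.0}         # changed during customization
print(HIGH_CPU.violated(85.0, {}))            # True: default limit 80 exceeded
print(HIGH_CPU.violated(85.0, customized))    # False: customized limit is 90
print(LOW_FREE_MB.violated(100.0, {}))        # True: 256 MB not reached
print(in_scope({4: 0.5, 1200: 37.0}, {}))     # {1200: 37.0}
```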

Parameters

Some situations have one or more parameters. Each parameter can take the form of a list of strings, a list of numeric values, a Boolean list of predetermined values from which you can make any combination of selections, or a choice list of mutually exclusive alternatives.
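The four parameter forms differ mainly in what a valid selection looks like. The Python sketch below models each form as a small class; the class names and the sample parameters ("Processes To Watch", "Ports", "Event Types", "Sampling Mode") are hypothetical and do not correspond to actual Tivoli Monitoring parameters.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Sequence

# Hypothetical model of the four parameter forms; not a Tivoli Monitoring API.

@dataclass
class StringListParam:
    name: str
    values: List[str] = field(default_factory=list)

@dataclass
class NumericListParam:
    name: str
    values: List[float] = field(default_factory=list)

@dataclass
class BooleanListParam:
    """Predetermined options; any combination of them may be selected."""
    name: str
    options: Sequence[str]
    selected: FrozenSet[str] = frozenset()

    def select(self, *choices: str) -> None:
        unknown = set(choices) - set(self.options)
        if unknown:
            raise ValueError(f"not an option: {sorted(unknown)}")
        self.selected = frozenset(choices)

@dataclass
class ChoiceListParam:
    """Mutually exclusive alternatives; exactly one is in effect."""
    name: str
    alternatives: Sequence[str]
    chosen: str

    def __post_init__(self) -> None:
        self.choose(self.chosen)  # validate the initial choice

    def choose(self, alternative: str) -> None:
        if alternative not in self.alternatives:
            raise ValueError(f"not an alternative: {alternative}")
        self.chosen = alternative

watched = StringListParam("Processes To Watch", ["httpd", "sshd"])
ports = NumericListParam("Ports", [80.0, 443.0])
event_types = BooleanListParam("Event Types", ["Error", "Warning", "Information"])
event_types.select("Error", "Warning")             # any combination is allowed
mode = ChoiceListParam("Sampling Mode", ["Average", "Peak"], chosen="Average")
mode.choose("Peak")                                 # replaces the previous choice
print(sorted(event_types.selected), mode.chosen)   # ['Error', 'Warning'] Peak
```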

Indications

Each situation generates an indication if certain conditions implied by the situation's thresholds are not satisfied in a given cycle. Each situation has its own algorithm to determine which combinations of thresholds should generate an indication. Indications might be generated in any one of the following circumstances:

  • A single threshold is exceeded. For example, in the Windows Process situation, the Process High CPU indication is generated when the High CPU Usage threshold is exceeded (for any process that has a nonzero process ID).
  • Two or more thresholds are exceeded together. For example, in the Windows Logical Disk situation, a High Read Bytes per Second indication is generated when both of the following thresholds are exceeded:
      - The number of bytes transferred per second (read or written) exceeds the High Bytes per Second threshold.
      - The percentage of time that the selected disk drive spends servicing read or write requests exceeds the High Percent Usage threshold.
  • Other factors change, with no threshold involved. For example, in the Windows Process situation, the Process Handle Leak indication is generated when a process is leaking handles. There is no threshold for this indication: the situation compares the handle counts of the five processes with the most handles across consecutive cycles and generates the indication if the number of handles has increased.
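The three circumstances above can be expressed as three small decision functions. This Python sketch is an illustration under stated assumptions, not the product's algorithms: the function names and sample values are hypothetical, and for the handle-leak case it assumes that "increased" means a process among the current top five handle holders has more handles than it had in the previous cycle.

```python
from typing import Dict, List

def high_cpu(cpu_by_pid: Dict[int, float], high_cpu_usage: float) -> List[int]:
    """Single threshold: flag every process with a nonzero PID above the limit."""
    return sorted(pid for pid, cpu in cpu_by_pid.items()
                  if pid != 0 and cpu > high_cpu_usage)

def high_read_bytes(bytes_per_sec: float, percent_usage: float,
                    high_bytes_per_sec: float, high_percent_usage: float) -> bool:
    """Combined thresholds: both must be exceeded in the same cycle."""
    return bytes_per_sec > high_bytes_per_sec and percent_usage > high_percent_usage

def handle_leaks(previous: Dict[int, int], current: Dict[int, int],
                 top_n: int = 5) -> List[int]:
    """No threshold: flag top-N handle holders whose count grew since last cycle."""
    top = sorted(current, key=current.get, reverse=True)[:top_n]
    return sorted(pid for pid in top
                  if pid in previous and current[pid] > previous[pid])

print(high_cpu({0: 99.0, 812: 91.5, 1300: 12.0}, high_cpu_usage=80.0))  # [812]
print(high_read_bytes(6e6, 45.0, high_bytes_per_sec=5e6,
                      high_percent_usage=60.0))                          # False
print(handle_leaks({10: 500, 11: 2000, 12: 150},
                   {10: 520, 11: 2000, 12: 140}))                        # [10]
```

In the second call only one of the two thresholds is exceeded, so no indication results, which is the point of the combined rule.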
