Tool Mentor: CAM RTT - Classify Incident and Provide Initial Support

Context

Tool mentors explain how a tool can perform tasks that are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.

This tool mentor supports the following process and activity:

Incident Management
- Classify Incident and Provide Initial Support

Details

When an incident occurs, it is important to classify it quickly so that it can be resolved and normal business operation restored. Unless the incident is a service request, such as a simple password reset or a request for documentation, it must be classified according to its type, its status, its effect on Service Level Agreements (SLAs), and the priority of the affected service to the business.

IBM® Tivoli® Composite Application Manager for Response Time Tracking allows you to measure the performance and availability of your user transactions. Unlike resource-based measurement, transaction-based measurement aligns IT with the business processes it supports: it measures performance and availability from the end user's perspective, identifies where an issue is located, and helps you fix it.

IBM Tivoli Composite Application Manager for Response Time Tracking helps you classify and diagnose a problem quickly by providing a decomposition view of the transactions in your IT system. Visualizing transactions in a topology context helps you understand the flow of each transaction and pinpoint the problem. The node with the worst failure status is automatically expanded and the path to it is highlighted, for both aggregate and instance data. For more information on navigating and understanding the topology report, refer to the Operations Guide.

The IBM Tivoli Composite Application Manager for Response Time Tracking dashboard provides you with a high-level summary of the transactions being monitored.

The Fast Path to Failure view on the dashboard helps you identify the root cause of the problem by taking you directly to the worst-performing node in the transaction topology of the selected transaction. You can reach Fast Path to Failure from every level of the dashboard. For more information on the Fast Path to Failure feature, refer to the Operations Guide.

Figure 1: Topology and root cause: the J2C connection to IMS

For example, if the J2C connection to a mainframe IMS subsystem is causing the problem, that J2C implementation is highlighted in the topology, as in Figure 1, so that you can diagnose the problem more quickly.
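To make the automatic expansion more concrete, the following Python sketch models a transaction topology as a tree and walks from the root toward the node that carries the worst status, collecting the path that would be highlighted. It is a hypothetical model, not the product's implementation: the `Node` class, the three status levels and their severity ranking, and the node names are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical severity ranking: a higher number means a worse status.
SEVERITY = {"good": 0, "warning": 1, "critical": 2}


@dataclass
class Node:
    """One hop in a transaction topology (for example, a servlet or a J2C call)."""
    name: str
    status: str  # one of the keys of SEVERITY
    children: list[Node] = field(default_factory=list)


def worst_status(node: Node) -> int:
    """Return the worst severity found anywhere in the subtree rooted at node."""
    return max([SEVERITY[node.status]] + [worst_status(c) for c in node.children])


def fast_path_to_failure(root: Node) -> list[str]:
    """Follow children whose subtrees hold the worst status until reaching the node that carries it."""
    target = worst_status(root)
    path = [root.name]
    node = root
    while True:
        # Children whose subtrees still contain the worst status in the topology.
        candidates = [c for c in node.children if worst_status(c) == target]
        if not candidates:
            return path  # no child is as bad, so this node itself carries the worst status
        node = candidates[0]
        path.append(node.name)


# Hypothetical topology for the J2C/IMS example.
topology = Node("Web client", "critical", [
    Node("Checkout servlet", "critical", [
        Node("JDBC call to DB2", "good"),
        Node("J2C connection", "critical", [Node("IMS transaction", "good")]),
    ]),
])
print(" -> ".join(fast_path_to_failure(topology)))
# prints: Web client -> Checkout servlet -> J2C connection
```

The walk stops at the J2C connection because its IMS child reports a good status, which matches the idea that the J2C implementation, rather than the upstream nodes it slows down, is the node to highlight.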

IBM Tivoli Composite Application Manager for Response Time Tracking reports problem conditions, such as performance threshold violations and availability threshold violations, through its event subsystem. The event views annotate each event with the specifics of the problem encountered. Events can also be associated with actions, such as e-mail notification and script execution, that are executed when a violation occurs. IBM Tivoli Composite Application Manager for Response Time Tracking also integrates with IBM Tivoli Enterprise Console® (TEC) to centralize event notification and action processing. If you prefer, root cause events can also take you straight to the cause of the problem.
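The association between violations and actions can be pictured as a dispatch table. In this hypothetical Python sketch, the `Event` class, the `ACTIONS` table, and the two action functions are assumptions for illustration only and do not reflect how the product is configured; the stand-in actions simply record what they would do instead of sending e-mail or running a script.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    transaction: str
    violation: str   # hypothetical kinds: "performance" or "availability"
    annotation: str  # the problem specifics shown in the event views


# Stand-ins that record what a real action would do.
outbox: list[str] = []
script_log: list[str] = []


def email_action(event: Event) -> None:
    outbox.append(f"{event.violation} violation on {event.transaction}: {event.annotation}")


def script_action(event: Event) -> None:
    script_log.append(f"diagnostic script run for {event.transaction}")


# Each violation kind maps to the actions executed when it occurs.
ACTIONS: dict[str, list[Callable[[Event], None]]] = {
    "performance": [email_action, script_action],
    "availability": [email_action],
}


def raise_event(event: Event) -> None:
    """Run every action associated with the event's violation kind."""
    for action in ACTIONS.get(event.violation, []):
        action(event)


raise_event(Event("checkout", "performance", "response time above threshold"))
print(outbox)
# prints: ['performance violation on checkout: response time above threshold']
```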

Discovery policies automatically discover the transactions running in your application. This helps you select monitoring candidates judiciously, so that you measure the transactions that make up the end-user experience. Listening policies sample a percentage of the transactions in your live production environment. When an incident is reported, you can increase both the sample rate and the level of monitoring to capture more contextual information relevant to the incident. The listening components include J2EE, Web Services, Generic ARM, and Quality of Service.
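A percentage sample rate can be illustrated with systematic sampling, in which transactions are kept at evenly spaced intervals. The sampling method the product actually uses is not described here, so this Python sketch is an assumption: `PercentageSampler` is a hypothetical class, and raising its rate to 100 mimics increasing the sampling after an incident is reported.

```python
class PercentageSampler:
    """Systematic sampler: keeps rate_percent of the transactions it sees."""

    def __init__(self, rate_percent: int) -> None:
        self.set_rate(rate_percent)
        self.seen = 0

    def set_rate(self, rate_percent: int) -> None:
        if not 0 <= rate_percent <= 100:
            raise ValueError("rate_percent must be between 0 and 100")
        self.rate = rate_percent

    def should_sample(self) -> bool:
        # Sample whenever the running quota floor(seen * rate / 100) increases.
        before = self.seen * self.rate // 100
        self.seen += 1
        return self.seen * self.rate // 100 > before


sampler = PercentageSampler(10)  # normal production sampling
normal = sum(sampler.should_sample() for _ in range(1000))
sampler.set_rate(100)            # incident reported: sample every transaction
incident = sum(sampler.should_sample() for _ in range(1000))
print(normal, incident)  # prints: 100 1000
```

Systematic sampling keeps the sampled fraction exact, which makes the example easy to check; random sampling at the same rate would be an equally plausible assumption.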

For example, consider the shopping cart application of an online store, where an incident has been reported about the slow response time of the checkout process. The IT infrastructure consists of a browser-based Web client, a WebSphere® Application Server, and a DB2 database on the back end. Before you configure monitoring, install a Management Agent and then deploy the J2EE monitoring component on the machine that hosts the Web application server.

First, select which transactions to monitor. To do so, create a discovery policy using the following steps:

- Work with Discovery Policies -> Create a new J2EE Servlet discovery policy.
- Configure the J2EE listener:
  - .* for the URI Filter and User Name (to discover all transactions)
  - 100 Percent Sample Rate (to sample all transactions during discovery)
- Create and Choose a Schedule with the following options:
  - Assign Name/Description
  - Start as soon as possible
  - Run continuously
  - Run forever
- Create Agent Group.
- Assign Name to discovery policy -> Click Finish
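The effect of the .* filters and the 100 percent sample rate can be shown with a small Python sketch that collects every distinct URI seen during discovery. It assumes the filters are regular expressions that must match the whole URI or user name (modeled with Python's `re.fullmatch`); the `discover` function and the observed URIs and user names are hypothetical.

```python
import re


def discover(observed, uri_filter=".*", user_filter=".*"):
    """Return the distinct URIs whose URI and user name match the filters.

    Every observed transaction is inspected, as with a 100 percent sample rate.
    """
    uri_re = re.compile(uri_filter)
    user_re = re.compile(user_filter)
    return sorted({
        uri for uri, user in observed
        if uri_re.fullmatch(uri) and user_re.fullmatch(user)
    })


# Hypothetical (URI, user name) pairs seen on the application server.
observed = [
    ("/shop/catalog", "alice"),
    ("/shop/cart/add", "bob"),
    ("/shop/checkout/submit", "alice"),
    ("/shop/checkout/submit", "carol"),
]
print(discover(observed))
# prints: ['/shop/cart/add', '/shop/catalog', '/shop/checkout/submit']
```

Because .* matches any string, the default filters discover every transaction; narrowing either filter would shrink the candidate list.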

Next, create a listening policy to monitor the transactions of interest. In this example, monitor the transaction on which the incident was reported, as follows:

- Work with Discovery Policies -> View Discovered Transactions
- Perform a data roll up -> Select a transaction of interest
- Create Listening Policy from -> Configure J2EE Listener:
  - Use regular expressions to sample a set of related transactions
  - 100 Percent Sample Rate (to sample all transactions responsible for the slowdown)
  - Record Aggregate and Instance Data (to get detailed context data)
- Configure J2EE Thresholds:
  - Transaction status: captures the ARM return code
  - Performance: captures the response time of the transaction
- Set the trace level detail to high
- Choose a Schedule and Agent Group
- Create a Policy Group that maps to your logical IT structure
- Assign Name/Description -> Click Finish
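The listening settings and thresholds in these steps can be sketched in Python: a regular expression selects the related checkout transactions, and each recorded instance is checked first for a non-good ARM status and then for a slow response time. The `Instance` class, the `violations` function, the /shop/checkout/ URIs, and the 5000 ms threshold are hypothetical; the sketch assumes only the ARM convention that a status of 0 means the transaction completed successfully.

```python
import re
from dataclasses import dataclass

# Mirrors ARM's STATUS_GOOD: a status of 0 means the transaction succeeded.
ARM_STATUS_GOOD = 0


@dataclass
class Instance:
    uri: str
    arm_status: int     # ARM status code reported for this transaction instance
    response_ms: float  # measured response time in milliseconds


def violations(instances, pattern, max_response_ms):
    """Check each instance of the related transactions against both thresholds."""
    related = re.compile(pattern)
    found = []
    for inst in instances:
        if not related.fullmatch(inst.uri):
            continue  # not part of the monitored set of related transactions
        if inst.arm_status != ARM_STATUS_GOOD:
            found.append((inst.uri, "transaction status"))
        elif inst.response_ms > max_response_ms:
            found.append((inst.uri, "performance"))
    return found


instances = [
    Instance("/shop/checkout/submit", 0, 850.0),
    Instance("/shop/checkout/submit", 0, 7400.0),
    Instance("/shop/checkout/confirm", 2, 300.0),
    Instance("/shop/catalog", 0, 9000.0),
]
print(violations(instances, r"/shop/checkout/.*", max_response_ms=5000))
# prints: [('/shop/checkout/submit', 'performance'), ('/shop/checkout/confirm', 'transaction status')]
```

The status check runs first so that a failed instance is reported once, as a status violation, rather than also as a slow one; the slow /shop/catalog instance is ignored because it falls outside the regular expression.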

After the policy has been distributed to the agent and starts running, use the Dashboard, Fast Path to Failure, and Component Events features to isolate and diagnose the incident.

For more details on how to configure monitoring using IBM Tivoli Composite Application Manager for Response Time Tracking, refer to the Administrator's Guide.

In summary, when a problem occurs, you can use the topology view to investigate and diagnose it quickly, or let the root cause event from the Tivoli Enterprise Console (TEC) do this for you. This enables you to resolve the incident as a known error or known problem.
