Tool Mentor: TSM - Evaluate IT Service Continuity Management Performance
TM056 - How to Use IBM Tivoli Storage Manager to Evaluate IT Service Continuity Management Performance
Tool: IBM Tivoli Storage Manager
Relationships
Main Description

Context

Tool mentors explain how a tool can perform tasks, which are part of ITUP processes and activities. The tasks are listed as Related Elements in the Relationships section.

Details

IT Service Continuity Management is concerned with managing an organization's ability to continue providing a predetermined and agreed level of IT services that supports the minimum business requirements following an interruption to the business. Other ITUP tool mentors cover creating and planning IT service continuity, and preparing for and executing the process. This mentor covers measuring the performance of the process.

IT service continuity performance management builds a feedback loop into your process, which allows you to make both coarse and fine adjustments as needed. The IT service continuity plan already documents SLAs and quality of service (QoS) requirements, and preparation, testing, and execution have demonstrated that recovery is possible. The ongoing task is to:

  • ensure that operations continue to run smoothly
  • flag any issues that arise
  • provide recommendations on how to resolve those issues
  • determine and specify who is responsible for corrective action
  • ensure that the responsible party is notified to take action
  • do all of this automatically, so that the process is consistent and repeatable

By identifying exceptions and trends, you can fold insights and lessons learned back into the process in a continuous cycle of measurement and improvement.

The IT service continuity plan identifies which computers are to be backed up, along with their respective SLAs. It is important to determine who is responsible for ensuring that each storage resource is backed up. In some organizations, the IBM® Tivoli® Storage Manager administrator is responsible for ensuring that a particular database is backed up; in that case, if the backup fails, the Tivoli Storage Manager administrator must take corrective action. In other organizations, the database owner is responsible.

Tivoli Storage Manager supports queries that an administrator can use to determine the status of the system. At any time, an administrator can issue ad hoc queries to review system status and metrics, and then disseminate the information as needed. This approach, however, is largely manual.
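As a minimal sketch of this manual approach, the following Python code runs a single ad hoc query through the Tivoli Storage Manager administrative command-line client, dsmadmc, and parses its output. It assumes that dsmadmc is on the PATH and that the -dataonly=yes and -commadelimited options are available at your client level; the helper names are hypothetical. The password is passed on the command line only for brevity, since it can be visible to other users of the system.

```python
import csv
import subprocess
from typing import List, Tuple


def build_dsmadmc_command(admin_id: str, password: str, command: str) -> List[str]:
    """Build the argument list for one dsmadmc invocation (hypothetical helper)."""
    return [
        "dsmadmc",
        f"-id={admin_id}",
        f"-password={password}",
        "-dataonly=yes",    # suppress banners and column headers
        "-commadelimited",  # emit one comma-separated row per line
        command,
    ]


def parse_rows(stdout: str) -> List[List[str]]:
    """Split comma-delimited output into rows, honoring quoted fields."""
    return [row for row in csv.reader(stdout.splitlines()) if row]


def run_query(admin_id: str, password: str, command: str) -> Tuple[int, List[List[str]]]:
    """Run one administrative command and return (return code, rows).

    The return code is returned rather than raised because dsmadmc may
    exit with a nonzero code for conditions such as an empty result.
    """
    args = build_dsmadmc_command(admin_id, password, command)
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, parse_rows(result.stdout)
```

For example, run_query("admin", "secret", "query event * * begindate=today-1 exceptionsonly=yes") would be one way to list yesterday's scheduled operations that did not complete successfully. Every such query still has to be run, read, and passed on by hand, which is the gap that operational reporting fills.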

Tivoli Storage Manager provides a feature called Tivoli Storage Manager operational reporting, which is designed specifically to automate this process. Operational reporting supports both reporting on and monitoring of Tivoli Storage Manager. Reports and monitors can be scheduled, and their results can be viewed interactively, on a Web site, or in e-mail. In e-mail, the subject line indicates whether the Tivoli Storage Manager server is running smoothly or has issues that need attention.

Operational reporting is highly customizable and extensible: existing sections and rules can be adjusted or removed, and new sections and rules can be added. Multiple Tivoli Storage Manager servers are supported, multiple reports and monitors can be run for a single server, and each report or monitor can be sent to multiple recipients. Reports and monitors differ mainly in how often they run and when they notify recipients. When a report runs, it queries the Tivoli Storage Manager server, compares the results to the rules, flags any issues, and provides customizable recommendations on how to resolve them. Its sections cover a wide variety of metrics that can be used to track SLA conformance. A scheduled report always runs and sends its results to the list of recipients, whether or not there are any issues.
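The report cycle described above (query, compare to rules, flag issues, recommend, always send) can be sketched as follows. This is not the operational reporting implementation or its rule format; it is a hypothetical Python model that assumes the queried values have already been collected into a dictionary of metrics. The rule names, metric names, and subject-line wording are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str                           # label shown in the report
    metric: str                         # key into the collected metrics
    triggered: Callable[[float], bool]  # True when the value is a problem
    recommendation: str                 # customizable remediation text


@dataclass
class Report:
    subject: str
    issues: List[str] = field(default_factory=list)


def run_report(server: str, metrics: Dict[str, float], rules: List[Rule]) -> Report:
    """Compare metrics to rules. A report is always produced and sent,
    whether or not any rule is triggered."""
    issues = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and rule.triggered(value):
            issues.append(f"{rule.name} ({rule.metric}={value}): {rule.recommendation}")
    # The subject line summarizes the server's status at a glance.
    status = "needs attention" if issues else "running smoothly"
    return Report(subject=f"{server}: {status}", issues=issues)


if __name__ == "__main__":
    rules = [
        Rule("Failed schedules", "failed_schedules", lambda v: v > 0,
             "Review the activity log for the affected nodes."),
        Rule("Low scratch tapes", "scratch_tapes", lambda v: v < 10,
             "Check in additional scratch volumes."),
    ]
    report = run_report("TSMSRV1", {"failed_schedules": 2, "scratch_tapes": 25}, rules)
    print(report.subject)  # TSMSRV1: needs attention
    for issue in report.issues:
        print(issue)
```

Keeping each rule as data (a metric, a predicate, and a recommendation) mirrors the idea that rules can be adjusted, removed, or added without changing the code that runs them.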

A monitor, on the other hand, is intended to run much more frequently. Monitors use the same rule-based mechanism as reports, but they notify recipients only if a rule is triggered, that is, only if there are issues. Monitors can also optionally and conditionally run commands that heal problems automatically. For example, a rule can check whether a tape drive has gone offline. If it has, the monitor can list all drives and their status, issue a command telling Tivoli Storage Manager to bring the drive back online, and then list the drives and their status again. In this way, administrators can manage Tivoli Storage Manager by exception: one or more administrators are automatically notified that a drive went offline, they see that the monitor attempted to bring the drive back online, and they can compare the list and status of the drives before and after that action.
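The self-healing drive example can be modeled with a short Python sketch. The list_drives and issue_command callables are hypothetical stand-ins for whatever actually queries and commands the server (for instance, the dsmadmc client); the only Tivoli Storage Manager syntax assumed is the UPDATE DRIVE command with ONLINE=YES. Unlike a report, the function returns a notification only when its rule fires.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Drive:
    library: str
    name: str
    online: bool


def format_drives(drives: List[Drive]) -> str:
    return "\n".join(
        f"  {d.library}/{d.name}: {'online' if d.online else 'OFFLINE'}" for d in drives
    )


def run_monitor(list_drives: Callable[[], List[Drive]],
                issue_command: Callable[[str], None]) -> Optional[str]:
    """Return a notification if the offline-drive rule fires, else None."""
    before = list_drives()
    offline = [d for d in before if not d.online]
    if not offline:
        return None  # no rule triggered, so the monitor stays silent
    for d in offline:
        # Self-healing action: ask the server to bring the drive back online.
        issue_command(f"update drive {d.library} {d.name} online=yes")
    after = list_drives()  # re-list so recipients can compare before and after
    return ("Offline drive(s) detected; attempted to bring them back online.\n"
            f"Before:\n{format_drives(before)}\nAfter:\n{format_drives(after)}")


if __name__ == "__main__":
    # In-memory stand-in for a server with one offline drive.
    state = {"DRIVE1": True, "DRIVE2": False}

    def fake_list() -> List[Drive]:
        return [Drive("LIB1", name, up) for name, up in state.items()]

    def fake_issue(command: str) -> None:
        print("issuing:", command)
        state[command.split()[3]] = True  # pretend the command succeeded

    print(run_monitor(fake_list, fake_issue))
```

Passing the query and command functions in as parameters keeps the rule logic testable without a live server, and the before and after listings give recipients the evidence they need to decide whether further action is required.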

Tivoli Storage Manager operational reporting can also automatically notify node owners of failed or missed backups, with customizable instructions that tell the node owner how to correct the problem and whom to call if further help is needed. In support of managing responsibility, operational reporting can send specific node owners the details of their own backup operations. Where the Tivoli Storage Manager administrator is responsible for ensuring that a database application is backed up, the administrator can be notified automatically if there is a problem, and the database owner can be notified as well.
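Routing notices to the responsible parties can be sketched in Python as below. The event records, the node-to-recipient mapping, and the message text are all hypothetical; in practice the events might come from query output and the owners from each node's contact information or from the IT service continuity plan. A node can map to several recipients, which covers the case where both the Tivoli Storage Manager administrator and the database owner should hear about a problem, and nodes with no recorded owner fall back to the administrator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    node: str
    schedule: str
    status: str  # e.g. "Completed", "Failed", "Missed"


PROBLEM_STATUSES = {"Failed", "Missed"}


def owner_notices(events: List[Event], owners: Dict[str, List[str]], admin: str,
                  instructions: str, help_contact: str) -> Dict[str, str]:
    """Build one notice per recipient listing only that recipient's problem backups."""
    problems: Dict[str, List[Event]] = defaultdict(list)
    for e in events:
        if e.status in PROBLEM_STATUSES:
            # Nodes with no recorded owner fall back to the administrator.
            for recipient in owners.get(e.node, [admin]):
                problems[recipient].append(e)
    notices = {}
    for recipient, evs in problems.items():
        lines = [f"  {e.node}: schedule {e.schedule} {e.status.lower()}" for e in evs]
        notices[recipient] = (
            "The following backups need attention:\n" + "\n".join(lines)
            + f"\n\n{instructions}\nFor further help, contact {help_contact}."
        )
    return notices


if __name__ == "__main__":
    events = [
        Event("DBSRV1", "NIGHTLY", "Failed"),
        Event("FILESRV", "NIGHTLY", "Completed"),
        Event("WEB01", "NIGHTLY", "Missed"),
    ]
    owners = {"DBSRV1": ["tsmadmin@example.com", "dbowner@example.com"]}
    notices = owner_notices(events, owners, "tsmadmin@example.com",
                            "Rerun the backup once the client is available.",
                            "the help desk")
    for to, body in notices.items():
        print(f"To: {to}\n{body}\n")
```

Each recipient receives a single notice covering only the nodes they are responsible for, so the database owner in this example never sees the unrelated WEB01 problem.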

Operational reporting provides an efficient, automated, customizable, and repeatable method of measuring the state of Tivoli Storage Manager operations, with rules that can be configured to check whether SLAs are being met. The resulting information shows which areas need improvement, and it can be sent to the people whose roles, responsibilities, and skills best equip them to address any issues, as specified in the IT service continuity plan.
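As an illustration of a rule that validates an SLA, the following Python sketch assumes, hypothetically, that the SLA states a maximum age for each node's most recent successful backup, and that the completion time of that backup has already been retrieved from the server. It reports the nodes that violate the SLA and the fraction of nodes that conform. Conformance here is simply 1 - violations / nodes; your plan may define it differently.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def sla_violations(last_success: Dict[str, Optional[datetime]],
                   max_age: Dict[str, timedelta],
                   now: datetime) -> List[str]:
    """Return the nodes whose latest successful backup is older than their SLA
    allows, including nodes with no successful backup at all."""
    violations = []
    for node, limit in sorted(max_age.items()):
        last = last_success.get(node)
        if last is None or now - last > limit:
            violations.append(node)
    return violations


def conformance(last_success: Dict[str, Optional[datetime]],
                max_age: Dict[str, timedelta],
                now: datetime) -> float:
    """Fraction of SLA-covered nodes that currently meet their SLA."""
    if not max_age:
        return 1.0
    return 1 - len(sla_violations(last_success, max_age, now)) / len(max_age)


if __name__ == "__main__":
    # Illustrative values only.
    now = datetime(2024, 1, 2, 8, 0)
    last = {"DBSRV1": datetime(2024, 1, 1, 23, 0),
            "WEB01": datetime(2023, 12, 30, 23, 0),
            "FILESRV": None}
    limits = {"DBSRV1": timedelta(hours=12),
              "WEB01": timedelta(hours=24),
              "FILESRV": timedelta(hours=24)}
    print(sla_violations(last, limits, now))         # ['FILESRV', 'WEB01']
    print(round(conformance(last, limits, now), 2))  # 0.33
```

Tracking the conformance figure from run to run is one simple way to spot the trends that feed back into the IT service continuity plan.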

For more information, refer to: http://publib.boulder.ibm.com/infocenter/tivihelp/index.jsp?toc=/com.ibm.itstorage.doc/toc.xml

Search for:

  • "Operational Reporting"
  • "Scheduling"
