Automatic response to outage
Scenario: PS005 - Operations automatically responds to a server outage
Main Description

Context

A server running a critical, customer-facing, Web-based application runs out of system resources, making the application unavailable. The problem has occurred intermittently over the past few weeks, and its underlying cause has not yet been determined. As an interim solution, it has been decided that the quickest way to restore service is to reboot the server whenever this happens. Operations has been asked to automate detection and recovery and to minimize downtime.
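
The scenario does not say how the monitoring software decides that the server has stopped responding. One common approach is a watchdog that probes the server periodically and triggers recovery only after several consecutive failed probes, so that a single slow response does not cause an unnecessary reboot. The Python sketch below shows that approach under stated assumptions. It does not describe how any IBM product behaves. The `Watchdog` class, its `probe` and `recover` callables, and the default threshold of 3 are hypothetical. In a real deployment, `recover` would call whatever reboot mechanism the environment provides, and `check_once` would run on a fixed schedule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Watchdog:
    """Hypothetical watchdog: probes a server and triggers a recovery action
    after a configurable number of consecutive failed probes."""
    probe: Callable[[], bool]      # assumed: returns True if the server responds
    recover: Callable[[], None]    # assumed: e.g. reboots the server
    failure_threshold: int = 3     # consecutive failures required before acting
    consecutive_failures: int = 0
    log: List[str] = field(default_factory=list)

    def check_once(self) -> bool:
        """Run one probe; return True if a recovery action was triggered."""
        if self.probe():
            # A healthy response resets the failure count.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        self.log.append(
            f"probe failed ({self.consecutive_failures}/{self.failure_threshold})"
        )
        if self.consecutive_failures >= self.failure_threshold:
            self.log.append("threshold reached: triggering recovery")
            self.recover()
            # Start counting afresh so the rebooted server gets new probes.
            self.consecutive_failures = 0
            return True
        return False


# Usage with simulated probe results: one success, three failures, then a success.
responses = iter([True, False, False, False, True])
reboots: List[str] = []
dog = Watchdog(probe=lambda: next(responses), recover=lambda: reboots.append("reboot"))
results = [dog.check_once() for _ in range(5)]
print(results)  # [False, False, False, True, False]
print(reboots)  # ['reboot']
```

After a recovery, the counter resets. The server must then fail the full number of probes again before a second reboot can be triggered, which keeps the watchdog from rebooting repeatedly while the server is still starting up.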

Description

Note: Only a few roles are shown in this scenario because the event is handled autonomically. If people handled the event, more roles would be involved.

Steps (process, activity, roles, and work products for each step):

1. A server is providing a service.
   Process: Service Execution
   Activity: Deliver Service
   Roles: Operations Analyst / IT Operations Analyst
   Work products: Operational Monitoring Data

2. The server begins to run out of system resources. Monitoring software determines that the server is no longer responding and generates an event.
   Process: Event Management
   Activity: Monitor, Detect, and Log Event
   Roles: none
   Work products: Event

3. Other events are considered to be of low priority at this time.
   Process: Event Management
   Activity: Examine and Filter Event / Filter Event
   Roles: none
   Work products: Event

4. An automated response is identified for the server's condition.
   Process: Event Management
   Activity: Correlate, Escalate, and Process Event / Correlate Events and Select Response
   Roles: none
   Work products: Event

5. An incident record is opened.
   Process: Incident Management
   Activity: Detect and Record Incident / Identify and Log Incident
   Roles: none
   Work products: Incident

6. The automated response is carried out and the server is rebooted. The event management system determines that the server is working again.
   Process: Event Management
   Activity: Resolve Event
   Roles: none
   Work products: Event

7. The event is closed.
   Process: Event Management
   Activity: Close Event
   Roles: none
   Work products: Event

8. The incident record is closed.
   Process: Incident Management
   Activity: Close Incident
   Roles: none
   Work products: Incident

9. The service continues unimpeded. (Same as step 1.)
   Process: Service Execution
   Activity: Deliver Service
   Roles: Operations Analyst / IT Operations Analyst
   Work products: Operational Monitoring Data
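
The event and incident steps above can be modeled as a small pipeline. The following Python sketch is hypothetical. The `Event` and `Incident` classes, the two-level priority scale, the `responses` lookup table, and the `is_healthy` check are illustrative assumptions and are not the data model of any IBM Service Management product. The pipeline filters out low-priority events and selects an automated response for the rest. It then opens an incident, runs the response, and, once the server is healthy again, closes the event and then the incident.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    source: str           # server that raised the event
    condition: str        # hypothetical condition label, e.g. "not_responding"
    priority: str         # hypothetical two-level scale: "high" or "low"
    status: str = "open"


@dataclass
class Incident:
    event: Event
    status: str = "open"


def handle_events(
    events: List[Event],
    responses: Dict[str, Callable[[str], None]],
    is_healthy: Callable[[str], bool],
    trail: List[str],
) -> List[Incident]:
    """Run events through the scenario's steps, recording each action in `trail`."""
    # Examine and filter: set aside low-priority events.
    actionable: List[Event] = []
    for event in events:
        if event.priority == "high":
            actionable.append(event)
        else:
            trail.append(f"filtered: {event.source} {event.condition}")

    incidents: List[Incident] = []
    for event in actionable:
        # Correlate and select response: look up an automated action for the condition.
        response = responses.get(event.condition)
        if response is None:
            trail.append(f"escalated: no automated response for {event.condition}")
            continue
        # Record an incident before acting on it.
        incident = Incident(event)
        incidents.append(incident)
        trail.append(f"incident opened: {event.source}")
        # Resolve: carry out the response, then confirm the server works again.
        response(event.source)
        if not is_healthy(event.source):
            trail.append(f"escalated: {event.source} still unhealthy")
            continue
        # Close the event first, then the incident record.
        event.status = "closed"
        trail.append(f"event closed: {event.source}")
        incident.status = "closed"
        trail.append(f"incident closed: {event.source}")
    return incidents


# Usage with simulated servers: "rebooting" just records the host name.
rebooted: List[str] = []
trail: List[str] = []
events = [Event("web01", "not_responding", "high"), Event("db02", "disk_warning", "low")]
incidents = handle_events(
    events,
    responses={"not_responding": rebooted.append},
    is_healthy=lambda host: host in rebooted,
    trail=trail,
)
for line in trail:
    print(line)
# filtered: db02 disk_warning
# incident opened: web01
# event closed: web01
# incident closed: web01
```

If no automated response exists for an event, or if the server is still unhealthy after the response, the sketch leaves the event and incident open and records an escalation. In the full process, that is where people and additional roles would take over.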
