Asset management is the core of the refresh services provided by the resolver group, and it becomes critical when configuration management concepts are extended to workgroups. Send the technicians out with a magnifying glass and ask them to document the serial number of every asset involved in the trouble ticket. The resolver group should already be getting paid for asset and configuration management, so you might as well grab the data while someone is touching the hardware rather than define and execute independent sampling processes.
Helpdesk service levels are normally conceptualized as applying to a single trouble ticket. The supporting processes are usually well defined, and the associated roles are assigned and owned by the outsourcer, the resolver group, or another supplier.
In reality, delivery capacity exerts a major influence on the ability of the resolver group to execute a service request. Service level targets should be closely linked to ticket volumes and types.
Threshold implementation is a standard practice used to regulate the number of events that need to be managed. It is unusual for the gravity of these values to be re-evaluated by either party. They just get set when the monitoring tools are installed, and the service provider tweaks them behind the scenes. What possible availability thresholds exist? “Service must be down for more than X minutes”? Does it make sense to wait for that threshold before triggering a ticket? Why not launch the ticket immediately and recall it if necessary?
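As a rough sketch of the “launch immediately, recall if necessary” idea, the Python below opens a ticket on the first failed sample and only decides later whether the outage crossed the threshold. The `OutageTicket` class, the `process_samples` function, and the status names are all made up for illustration; they do not describe how any particular monitoring tool behaves.

```python
from dataclasses import dataclass


@dataclass
class OutageTicket:
    # Hypothetical ticket record, for illustration only.
    opened_at: float       # timestamp (seconds) of the first failed sample
    status: str = "open"   # "open", "recalled", or "confirmed"


def process_samples(samples, threshold_seconds):
    """Open a ticket on the first failure; recall it if the outage was short.

    samples: (timestamp_seconds, is_up) pairs in time order.
    """
    tickets = []
    current = None
    for timestamp, is_up in samples:
        if not is_up:
            if current is None:
                # Launch the ticket right away instead of waiting X minutes.
                current = OutageTicket(opened_at=timestamp)
                tickets.append(current)
            elif (current.status == "open"
                  and timestamp - current.opened_at >= threshold_seconds):
                # Still down past the threshold: the ticket stands.
                current.status = "confirmed"
        elif current is not None:
            if current.status == "open":
                # Service recovered: keep or recall based on outage length.
                lasted = timestamp - current.opened_at
                current.status = (
                    "confirmed" if lasted >= threshold_seconds else "recalled"
                )
            current = None
    return tickets
```

The threshold still exists in this version, but it decides whether a ticket is kept rather than whether anyone hears about the outage in the first place.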
In theory, the data collection process and result repository should be reviewed periodically. I’ve never seen service level data on the service level management components. People don’t think about it and it adds a layer of overhead that people can’t afford. But it is appropriate in some situations, especially where multiple service providers are contributing to the same deal and one of them has the responsibility to develop overall service level reports.
Everyone should have confidence in the data. The monthly SLA report preparation cycle includes a period in which the various staff (client and providers) review the draft report. Most of the focus for these reviews is on understanding the data points that exceed the target or can trigger a penalty. The atmosphere of these discussions is defensive and essentially non-constructive as everyone scrambles to justify their performance.
These hassles could be moderated with a safety valve that takes issues that are not statistically relevant off the metric review table and pipes them to a Problem Management process for further evaluation and improvement. Clients need to take the aberrations out of the metrics rationalization process and handle them in an escalation context. The end result will be more effective and the service level reports will get out on time.
Availability objectives must be clearly defined and communicated to the client. OK! Sounds good.
This requirement is normally satisfied by repeatedly pinging a key server box (or boxes). There’s not a lot of information there. At least we can conclude that the application was possibly up when the ping response was received.
A lightweight, programmatic test transaction would be more informative. These should be triggered from the remote segments used by the key application end users, so that we pick up all the network dependencies along the way. The only missing piece in the view of availability is the end user workstation.
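A minimal sketch of such a test transaction in Python follows. The `run_probe` name and its parameters are assumptions for illustration. The `transaction` argument stands in for whatever cheap, read-only operation makes sense for the application (a login, a lookup), and it is expected to return something truthy on success. Note that the sketch does not abort a slow transaction; it simply counts a response slower than the timeout as a failure.

```python
import time


def run_probe(transaction, timeout_seconds=5.0, clock=time.monotonic):
    """Run one test transaction and report success and elapsed time.

    transaction: hypothetical callable that performs the test and returns
    a truthy value on success. Any exception counts as a failure.
    """
    started = clock()
    try:
        ok = bool(transaction())
    except Exception:
        ok = False
    elapsed = clock() - started
    # A response that arrives after the timeout is treated as a failure.
    return {"ok": ok and elapsed <= timeout_seconds, "elapsed_seconds": elapsed}
```

Scheduled from a machine on each remote segment, the results say more than a ping does: the application answered, and it answered within a time the end user would tolerate.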
How can the outsourcer require the service provider to actively monitor availability? A real-time view of services would only yield a Boolean result. A real-time average seems to miss the point. How about just relying on automated alerts (traditional monitoring) to identify and resolve issues?
The data collection methods should be available to the people who consume the service level metrics. I’ve seen an appendix that serves as a handy reference. If service level reports are distributed via a portal, a “metrics definition page” also helps remind people.
These definitions should also document the data collection intervals. Bonus points if the service level objectives recognize the business cycle, and tighten at critical times of the day, week, month, or quarter.
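To make the “tighten at critical times” idea concrete, here is a small Python sketch of a target lookup. Every number and boundary in it is an invented placeholder (95% as the default, 99% during weekday business hours, 99.5% in the last three days of the month); real values would come from the contract and the client’s business calendar.

```python
import calendar
from datetime import datetime

# Hypothetical targets, for illustration only.
DEFAULT_TARGET = 0.95
BUSINESS_HOURS_TARGET = 0.99
MONTH_END_TARGET = 0.995


def availability_target(moment):
    """Return the availability target that applies at a given datetime."""
    days_in_month = calendar.monthrange(moment.year, moment.month)[1]
    # Assumed critical period: the last three calendar days of the month.
    if moment.day > days_in_month - 3:
        return MONTH_END_TARGET
    # Assumed business hours: Monday to Friday, 08:00 to 17:59.
    if moment.weekday() < 5 and 8 <= moment.hour < 18:
        return BUSINESS_HOURS_TARGET
    return DEFAULT_TARGET
```

A table like this belongs on the metrics definition page next to the collection intervals, so the people reading the report can see which target applied when.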
Availability requirements usually change depending on the weekly, monthly, or annual cycle. We notice this because almost everyone does maintenance on Saturday night: stuff needs to be greased and the filters need to be cleaned. It’s also a good time to do a few upgrades. Don’t include this time in the service level calculation!
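The arithmetic is simple, but it helps to write it down. The Python below is a sketch with a made-up function name and parameter names; it assumes the unplanned downtime was measured outside the maintenance window.

```python
def availability_excluding_maintenance(period_minutes,
                                       maintenance_minutes,
                                       unplanned_downtime_minutes):
    """Availability over the scheduled service time only.

    The maintenance window is removed from the denominator, so planned
    work does not count against the service level.
    """
    scheduled_minutes = period_minutes - maintenance_minutes
    if scheduled_minutes <= 0:
        raise ValueError("maintenance window covers the whole period")
    return (scheduled_minutes - unplanned_downtime_minutes) / scheduled_minutes
```

For a 10,080-minute week with a 240-minute Saturday night window and 60 minutes of unplanned downtime, this works out to roughly 99.4%. Counting the window as downtime would report roughly 97.0% for the same week.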
It might be possible to offer a piece of the standard service during the maintenance window, perhaps supporting the client’s original vision of overall availability, while encouraging the client to remember that maintenance is a good thing.
Training for the service desk is an example of maintenance that has to happen, and it favors the client. We can reduce production capacity temporarily for the benefit of the client, the service provider, and the agent. The idea is to keep the risk of missed service levels low.
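The math discourages people from calculating availability all the way out to the end user. When components sit in series, their availabilities multiply, so each one has to run well above the end-to-end target. It seems impractical to insist that all the components in the chain support an end result of 95+%.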
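Here is the multiplication spelled out in Python. This is a sketch under two simplifying assumptions that are mine, not the contract’s: the components sit strictly in series (the service is up only when every component is up) and they fail independently. The function names are invented for illustration.

```python
import math


def chain_availability(component_availabilities):
    """End-to-end availability of independent components in series."""
    return math.prod(component_availabilities)


def required_per_component(target, component_count):
    """Availability each component needs if all are equal and in series."""
    return target ** (1.0 / component_count)
```

Five components at 99% each come out at roughly 95.1% end to end, and a chain of ten components needs about 99.5% from every one of them just to reach 95% overall.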
Being careful about change scheduling and understanding the cycle of business-critical transactions are ways to raise the effective availability of the service.
A new release, as a collection of enhancements or bug fixes, is expected to improve overall functionality and service effectiveness. Change managers and release planners should be able to anticipate any negative effects, but communication about those issues to end users or the helpdesk usually does not happen.
At minimum, problem managers need a heads-up so they have a chance to write some sort of release note that gives the helpdesk a hint as to the expected impact.