- it’s always a good idea to start with the SERVICE LEVEL NAME.
- then attach the DEFINITION and identify the TYPE (KPI or SLO).
- define the method of measurement and the calculation to determine the VALUES.
- define the TARGET and the MINIMUM
- define any LIMITATIONS (this is where the statistics guru needs to sign off)
- SERVICE WINDOWS (periods or conditions for which the SLO is enforced; think about CAPACITY and QUIET PERIODS)
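the checklist above can be captured as a simple record. this is only a sketch: the class and field names are hypothetical, the example values are made up for illustration, and the validation assumes a "higher is better" metric where the minimum sits at or below the target.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class LevelType(Enum):
    KPI = "KPI"
    SLO = "SLO"


@dataclass
class ServiceLevel:
    """One service level, following the checklist order (hypothetical schema)."""
    name: str                 # SERVICE LEVEL NAME
    definition: str           # DEFINITION
    level_type: LevelType     # TYPE (KPI or SLO)
    measurement: str          # method of measurement and calculation of VALUES
    target: float             # TARGET
    minimum: float            # MINIMUM
    limitations: List[str] = field(default_factory=list)      # LIMITATIONS
    service_windows: List[str] = field(default_factory=list)  # SERVICE WINDOWS

    def __post_init__(self) -> None:
        # Assumption: higher values are better, so minimum <= target.
        if self.minimum > self.target:
            raise ValueError("minimum must not exceed target")


# Illustrative values only; none of these numbers come from a real contract.
example = ServiceLevel(
    name="service desk availability",
    definition="share of in-scope time the service desk can accept contacts",
    level_type=LevelType.SLO,
    measurement="in-scope minutes available / in-scope minutes, per month",
    target=99.5,
    minimum=98.0,
    limitations=["small sample sizes in quiet periods"],
    service_windows=["excludes the agreed maintenance window"],
)
```

writing it down this way forces every field to be filled in before anyone starts arguing about the numbers.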
“the effectiveness of the agreed services defined in attachment x.x is determined by describing the scale of two measurable values.” (Sounding good so far.) “the two values for the Service Level Objective adopted herewith are:” (The Legal guys were probably feeling very uncomfortable at this point.)
#1: the Target SLO represents the lowest value above which the service provider is considered to have fully met the service delivery requirements associated with that metric.
#2: the Minimum SLO is the highest value below which the service provider is considered to have fully missed the service delivery requirements. The outsourcer considers this performance level unacceptable and requires a remediation plan.
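the two-value scheme can be sketched as a small classifier. the function name and the labels are hypothetical, and two choices are assumptions rather than anything the contract language settles: a value exactly at the target counts as met, and the band between minimum and target gets a neutral "between" label because the text does not name it.

```python
def classify_slo(value: float, target: float, minimum: float) -> str:
    """Place a measured value against the Target and Minimum SLO.

    Assumes a "higher is better" metric.
    """
    if minimum > target:
        raise ValueError("minimum must not exceed target")
    if value >= target:
        return "met"       # requirements fully met
    if value < minimum:
        return "missed"    # requirements fully missed; remediation plan needed
    return "between"       # neither fully met nor fully missed


# Illustrative numbers only.
print(classify_slo(99.7, target=99.5, minimum=98.0))  # met
print(classify_slo(98.6, target=99.5, minimum=98.0))  # between
print(classify_slo(97.0, target=99.5, minimum=98.0))  # missed
```

how the middle band is treated (warning, partial credit, nothing at all) is exactly the kind of detail that needs to be agreed up front.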
helpdesk service levels are normally conceptualized as applying to a single trouble ticket. the supporting processes are usually well defined and the associated roles are assigned and owned by the outsourcer, the resolver group, or another supplier.
in reality, delivery capacity exerts a major influence on the ability of the resolver group to execute a service request. service level targets should be closely linked to ticket volumes and types.
delays relating to resource problems will be at the top of weekly issue lists. check carefully for issues related to scope creep or convenient scope reduction.
credible weekly reports are invaluable sources of insight into systemic weaknesses. delivering the right kind of information (business driver linked) is the obvious objective, but the sources and collection methods should be constantly challenged, as everything should be assumed to have changed since last week.
availability requirements usually change depending on the weekly, monthly or annual cycle. we notice this because most everyone does maintenance on saturday night: stuff needs to be greased and the filters need to be cleaned. it’s also a good time to do a few upgrades. don’t include this time in the service level calculation!
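a minimal sketch of leaving the maintenance window out of the calculation. the function and parameter names are hypothetical, everything is in minutes, and it assumes the downtime figure only counts outages that happened outside the maintenance window.

```python
def availability_pct(period_minutes: float,
                     maintenance_minutes: float,
                     downtime_minutes: float) -> float:
    """Availability over the in-scope time only.

    downtime_minutes must count only outages outside the maintenance window.
    """
    in_scope = period_minutes - maintenance_minutes
    if in_scope <= 0:
        raise ValueError("maintenance window covers the whole period")
    if not 0 <= downtime_minutes <= in_scope:
        raise ValueError("downtime must be between 0 and the in-scope time")
    return 100.0 * (in_scope - downtime_minutes) / in_scope


# Illustrative week: 10080 minutes, a 4-hour saturday night window,
# and 492 minutes of unplanned downtime outside that window.
print(availability_pct(10080, 240, 492))  # 95.0
```

the denominator shrinks along with the numerator, so planned maintenance neither helps nor hurts the reported number.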
it might be possible to offer a piece of the standard service during the maintenance window, perhaps supporting the client’s original vision of overall availability, while encouraging the client to remember that maintenance is a good thing.
training for the service desk is an example of maintenance that has to happen and it favors the client. we can reduce production capacity temporarily for the benefit of the client, service provider, and the agent. the idea is to make the risk of missed service levels low.
a pissed off workforce adds momentum to outsourcing discussions. awareness that employee attitude can influence information continuity issues moves executive team “discussions” into “negotiations” and some of the exposure is ultimately reduced.
two simple process areas bubble to the top of information continuity risk management: access control maintenance and separation of duties.
access control is not complete unless there is ongoing maintenance. the composition of outsourced workgroups is constantly changing; access to specific applications needs to be revoked if the agent’s role changes. KPIs? how about list review completion (%) by individual supervisors?
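the list review KPI could be computed along these lines. this is a sketch under assumptions: the function name, the input shape (lists reviewed and lists assigned per supervisor) and the supervisor labels are all hypothetical.

```python
from typing import Dict, Tuple


def review_completion_pct(reviews: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Percent of assigned access lists each supervisor has reviewed.

    reviews maps supervisor -> (lists_reviewed, lists_assigned).
    """
    result = {}
    for supervisor, (reviewed, assigned) in reviews.items():
        if assigned <= 0:
            raise ValueError(f"{supervisor}: no access lists assigned")
        if not 0 <= reviewed <= assigned:
            raise ValueError(f"{supervisor}: reviewed count out of range")
        result[supervisor] = 100.0 * reviewed / assigned
    return result


# Illustrative data only.
print(review_completion_pct({"supervisor_a": (8, 10), "supervisor_b": (5, 5)}))
# {'supervisor_a': 80.0, 'supervisor_b': 100.0}
```

reporting per supervisor rather than as one blended number keeps ownership of the review visible.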
separation of duties is important in the application services space - developers should never have access to production systems.
a third, rarely seen, approach is to limit the access windows to certain assets. we think we are in an always-on world, but there are still many functions that only need to be executed at certain times.
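an access window check can be as simple as a time-of-day comparison. the sketch below is hypothetical (function name, window times) and assumes a daily window with an inclusive start and exclusive end, which may cross midnight.

```python
from datetime import time


def in_access_window(now: time, start: time, end: time) -> bool:
    """True if now falls inside the daily window [start, end)."""
    if start <= end:
        # Window within a single day, e.g. 09:00 to 17:00.
        return start <= now < end
    # Window crossing midnight, e.g. 22:00 to 02:00.
    return now >= start or now < end


# Illustrative windows only.
print(in_access_window(time(10, 30), time(9, 0), time(17, 0)))  # True
print(in_access_window(time(23, 15), time(22, 0), time(2, 0)))  # True
print(in_access_window(time(3, 0), time(22, 0), time(2, 0)))    # False
```

a real deployment would also need to settle time zones and calendar exceptions, which this sketch ignores.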
“continuous improvement” clauses continue to get people crazy. there is never agreement on what kind of progress is expected or how to measure it.
I’m amazed that clients don’t require that some level of R&D spend be directly assigned to their delivery team. R&D can happen in the local environment to do some workgroup and/or process optimization or it can happen back at HQ as long as the delivery manager has some insight into the pipeline and a plan to integrate specific deliverables as they become available.
continuous improvement won’t happen without some focused effort; given appropriate visibility, clients can easily extrapolate specific efforts into client-perspective cost or performance improvements. all they have to do is insist on being part of that conversation.
we had gauge labs that housed all of the instruments used to verify inbound parts and to check tolerances during manufacturing. it was somewhat uncomfortable going to ask for an instrument because it meant interrupting a tech fine-tuning an instrument or replacing a component based on a calibration or maintenance schedule. calibration stickers recorded status.
The simple objective was to ensure valid results from measuring equipment. To do this, calibration was performed and tracked. The instruments were protected against improper adjustment (the lab was restricted access) and protected against damage (transit cases and carts were used to move them to the mfg floor).
today, service levels drive much of the conversation with the client. the client is focused on *interpretation* of the numbers and, for whatever reason, we lose awareness of the tool configuration details that essentially represent changes in “calibration”. over time, the significance of the collected data can easily change if the service provider changes how the data is collected.
most frequently, data is skewed by changes in workgroups as new people come in with perhaps different language skills; common terms can be interpreted in many ways. the source of data can also change over time. the most pernicious examples occur when service mgmt tool throttles are adjusted to “drop” transactions that do not meet certain criteria. These actions should transit a transparent Change Mgmt process, but normally, they don’t.
time to own: the image is one of an end user drumming his fingers, waiting for a call-back. so the agent sucked the ticket into his queue. how does that help? just give the end users direct connect, they will be happier.
time to resolve: resolution from whose perspective? resolution of what?
percent resolved on the first contact: look at this as a “bad thing”. service outages that are resolved on the first contact indicate that basic monitoring functions are not in place or that the self-help facilities are not effective.
percent resolved by first level support: incent the organization to get the guy fixed, not to hold a poorly supported ticket in the wrong resource pool.
excessive metrics are an excuse for poor supervision and for underinvestment in training and knowledge management tools.