
Understanding Software-as-a-Service and Cloud Services SLAs

2013

Contents
Introduction
What is an SLA?
What is Availability?
Understanding S-a-a-S and Cloud Services SLAs
Appendix A: Contractual Walkthrough
Appendix B: SLA Chart
    Uptime SLA Detailed Chart
    Uptime SLA Summary Chart
About Intreis

Introduction
This document covers common definitions related to Service Level Agreements, the top
ten questions to ask yourself when reviewing a S-a-a-S or Cloud SLA, and an actual
contract and SLA walk-through using the Amazon Web Services contract as an example.
Also included in this document are two SLA reference charts for your convenience,
located in Appendix B.

What is an SLA?
Abbreviated SLA, a Service Level Agreement is a contract between a provider and the end
user which stipulates and commits the provider to a required level of service.
An SLA should contain:
- A specified level of service
- End-user support options
- Enforcement or penalty provisions for services not provided
- A guaranteed level of system performance as it relates to downtime or uptime
- A specified level of customer support
- What software or hardware will be provided and for what fee

Metrics that typical SLAs may specify include:
- What percentage of the time services will be available
- The number of users that can be served simultaneously
- Specific performance benchmarks to which actual performance will be periodically
  compared
- The schedule for notification in advance of network changes that may affect users
- Help desk response time for various classes of problems
- Usage statistics that will be provided

What is Availability?
Availability is the time during which a device, such as a computer, or a service, such as
a web server, is functioning or available for use. Availability depends on many factors
including software stability, hardware load, and infrastructure reliability. The table in
Appendix B illustrates some common availability metrics. For example, a "five-9s"
service is available 99.999% of the time, which is to say that the system or service is
down for just about 5.26 minutes in a year. As a comparison, a service with a 99.000%
rating may be down for a total of 87.6 hours in a year, which is an average of about
1 hour and 40 minutes per week.
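
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


for pct in (99.999, 99.0):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}%: {minutes:.2f} min/year, "
          f"{minutes / 60:.1f} h/year, {minutes / 52:.0f} min/week")
```

Changing the percentages in the loop is a quick way to see how much each extra "9" is really worth.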

Understanding S-a-a-S and Cloud Services SLAs


Service Level Agreements (SLAs) are one of the things that often confound consumers of
3rd party services like SaaS and Cloud. Many users will hear "99%" and think, "Oh,
that's good. That's almost 100%. And 100% means it's never down, so 99% is almost
never down... I can live with that."
Some consumers are getting wise to the lingo and will eagerly latch onto marketing
language like "We guarantee four 9s (99.99%)" or, even better, the elusive "five 9s
(99.999%)" availability SLA. So, we've learned that decimal points matter and the more
9s the better, but that still doesn't mean you have a good SLA.
In the end it doesn't matter what the marketing slick or the website tells you. What
matters is what the contract says. While there is no boilerplate way to structure an SLA,
there are definitely things you should consider every time you enter into a contract with
a 3rd party service provider. So get ready to use your head and put pen to paper.
Here are the top 10 questions to ask yourself as you review an SLA document, as
well as areas that deserve your special attention:
1. Does the document clearly state the type, quality, and quantity of services
I will receive?
Your SLA document should clearly state the services the supplier will provide as
well as levels of performance or quality in the scope of work section (e.g., calls to
the help desk will be answered on the second ring and either resolved or escalated
within a 12-minute time-frame).
2. Are the desired outcomes clearly defined and documented?
Clearly documenting what the outcome needs to be assists the service provider in
determining what needs to be done to provide higher-level services. Additionally,
it eliminates the problem of having the SLA "re-interpreted" if there are changes in
management. Make sure you get a clear definition of what suppliers mean when
they use phrases like "99% uptime" or "99.99% closure of tickets" or any other
similar terms.
3. Are you properly allocating levels of service based on a clear set of
priorities relative to what is important to you?
For example, if you have fifty servers out of which five must be up 24x7, you
would not want to pay for all fifty to be up 24x7 when only five require it. What
you want is 24x7 for five servers and some lower level of uptime for the balance of
your servers. The "service request" or "services requested" section of your agreement
should contain assigned severity and priority levels for each specific type of service
request.

4. Is the cost of achieving the higher performance level supported by the
improved outcome?
The "nice to haves" are often far more expensive than the benefits they generate.
Be wary of asking for levels of perfection that will result in inordinately more
expensive performance if perfection is not needed for success.

5. What is the formula for calculating the SLA?
Make sure you look at the formula used to calculate the SLA. More importantly,
actually plug numbers into the formula and see what you get. The SLA should be
based on 7 x 24 x 365 (with no exclusions for weekends, holidays, off-hours,
geographies, etc.). Remember, SLAs are designed to work in the favor of the
provider; and when I say "favor" I mean financial favor. Trust me, after an eight
hour outage, you do not want to look down and see a $5.00 SLA credit in your
hands. Do the SLA math before you sign the contract, not during your first outage.
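
As a sketch of what "plugging numbers in" can look like, the Python below runs an eight-hour outage through a made-up SLA. Everything specific here is an assumption for illustration only: the credit tiers, the $50 monthly fee, and the 30-day month are hypothetical and do not come from any real contract. Substitute the formula and tiers from your own agreement.

```python
# Hypothetical credit schedule: (minimum monthly uptime %, credit as % of fee).
# These tiers are invented for illustration; use the ones in your contract.
CREDIT_TIERS = [(99.9, 0.0), (95.0, 10.0), (0.0, 25.0)]


def monthly_uptime_pct(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime measured 24x7 over the whole month, with no exclusions."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_credit(monthly_fee: float, uptime_pct: float) -> float:
    """Return the credit owed under the hypothetical tiers above."""
    for floor_pct, credit_pct in CREDIT_TIERS:
        if uptime_pct >= floor_pct:
            return monthly_fee * credit_pct / 100.0
    return 0.0


uptime = monthly_uptime_pct(downtime_minutes=8 * 60)  # an eight-hour outage
print(f"uptime {uptime:.2f}%, credit ${sla_credit(50.0, uptime):.2f}")
```

Under these assumed terms an eight-hour outage leaves the month at roughly 98.89% and earns a credit of a few dollars, which is the kind of result you want to discover before signing.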

6. Are maintenance windows excluded?
Take maintenance windows seriously. Every 3rd party provider should have them,
but they should not be hiding behind them. For example, it's easy for a provider
to give you 99.999% uptime, then follow up with "Excluding regularly scheduled
maintenance windows and other non-scheduled emergency maintenance windows
as required." The provider has nothing to lose in this scenario, because every
outage will become a "non-scheduled emergency maintenance window."
Also, beware of excessively large maintenance windows; I have seen providers
who have over 12 hours of maintenance window time per week. The windows
are usually on the weekends, but if you work in a global 24x7 business, those
weekend maintenance windows will be impacting business users. Add up the
total number of maintenance hours and ask yourself: can your business really live
with that?
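
To put a number on that question, here is a rough Python sketch. It makes a worst-case assumption that the provider actually takes the service down for every hour of the excluded window, and the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def effective_availability_pct(headline_pct: float,
                               maintenance_hours_per_week: float) -> float:
    """Availability seen by a 24x7 user when maintenance windows are excluded
    from the SLA and (worst case) fully used as downtime."""
    covered_hours = HOURS_PER_WEEK - maintenance_hours_per_week
    up_hours = covered_hours * headline_pct / 100.0
    return 100.0 * up_hours / HOURS_PER_WEEK


print(f"maintenance hours per year: {12 * 52}")
print(f"effective availability: {effective_availability_pct(99.999, 12):.2f}%")
```

With a 12-hour weekly window, a headline 99.999% can shrink to under 93% from the point of view of a business that never sleeps.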

7. Is the SLA on application availability, core functionality, hardware, or
other infrastructure?
At the end of the day, if you can't use the application to do your job, it might as
well be down. At a minimum, make sure your SLA is for application availability. If
the application is core to your business, I would consider pushing SLAs down to
the core functionality level. For example, if you have an accounting application and
the General Ledger (GL) part of the application is not working, I would consider
that loss of functionality an outage. Many SaaS providers will take the position that
the application is "up" in this case, and ask you to open a Service Ticket. However, if
you're trying to close the books at the end of a quarter, loss of GL is going to have
an impact similar to a full application outage.
As for hardware and infrastructure SLAs, they're nice to have, but let's make it
simple: as a business user, the application is either up or down; the "why" behind it
is not important when you're trying to get your job done. Make sure your SLAs are
not limited to hardware and infrastructure, but also include application availability.

8. Force Majeure
Force majeure: a common clause in contracts that essentially frees both parties
from liability or obligation when an extraordinary event or circumstance beyond
the control of the parties, such as a war, strike, riot, crime, or an event described
by the legal term "act of God" (such as flooding, earthquake, or volcanic eruption),
prevents one or both parties from fulfilling their obligations under the
contract. However, force majeure is not intended to excuse negligence or
other malfeasance of a party, as where non-performance is caused by the usual
and natural consequences of external forces (for example, predicted rain stops an
outdoor event), or where the intervening circumstances are specifically
contemplated. For example, a widespread power outage would not be a force
majeure excuse if the contract requires the provision of backup power or other
contingency plans for continuity. (Source: http://en.wikipedia.org/wiki/Force_majeure)
Read every word in the Force Majeure clause. Then think about what it means to
you as a consumer of the SaaS/Cloud product. Force Majeure is where providers
will make a last ditch effort to abdicate all responsibility. I have actually
seen human error and hardware failure listed under force majeure.

9. Does the SLA fairly compensate the business?
Now that you have your SLA formula figured out, and maintenance windows
identified, do the math. If a perfect storm happened (and it will) and your
Software-as-a-Service went down and/or there were scheduled maintenance
windows during the busiest time of your year (end of quarter for instance), ask
yourself:
- Can you afford to be down that long?
- Will the SLA credit you receive be enough to compensate your business?
In most cases the answer will be no. Many providers will tell you they don't
negotiate SLAs, and in the case of click-through agreements there seems to be no
opportunity to negotiate at all. But negotiation is always an option! The contract is
your first and best chance to put the relationship with your provider on the right
footing.

10. So, what's a good SLA?
Only you as a business consumer of SaaS/Cloud can accurately determine that,
but to do so you have to combine the context of your business requirements with
the details and mechanics of how the SLA is determined.

Appendix A: Contractual Walkthrough


This contractual walkthrough uses the standard online contract from Amazon Web
Services (AWS). This author fully supports SaaS and Cloud technologies and is in no
way discouraging the use of Amazon Web Services. The goal is to educate business
users on the risks involved with ANY Cloud offering and to encourage the informed
adoption of emerging technologies. The comments on Amazon's contract and SLAs are
solely this author's opinion, and as always you should consult with legal counsel or
your legal department on any contractual matters.
I will be referencing the Amazon SLA and calculating an SLA credit for a
multi-day outage. I will also be pointing out key wording (underlined sections) in the
contract that should send red flags up for readers concerned with SLAs. Author
comments and opinions will be in orange text.

For the purposes of doing the calculation I will be using Amazon's Large Standard
On-Demand Instance, used for 750 hours on average per month. Total monthly charge
= $360/mo.

Amazon EC2 Service Level Agreement


This Amazon EC2 Service Level Agreement ("SLA") is a policy governing the use of the
Amazon Elastic Compute Cloud ("Amazon EC2") under the terms of the Amazon Web
Services Customer Agreement (the "AWS Agreement") between Amazon Web Services,
LLC ("AWS", "us" or "we") and users of AWS services ("you"). This SLA applies
separately to each account using Amazon EC2. Unless otherwise provided herein, this
SLA is subject to the terms of the AWS Agreement and capitalized terms will have the
meaning specified in the AWS Agreement. We reserve the right to change the terms of
this SLA in accordance with the AWS Agreement.

Service Commitment
AWS will use commercially reasonable efforts to make Amazon EC2 available with
an Annual Uptime Percentage (defined below) of at least 99.95% during the Service
Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage
commitment, you will be eligible to receive a Service Credit as described below.
1. Personally, I hate the phrase "commercially reasonable efforts"; it's like the
ultimate "get out of jail free card." If the vendor can show "they did the best they
could under the circumstances," you may not be getting a credit.
2. "Annual": this is the word that will really haunt the business users. Most SLAs are
calculated monthly, but here it is 99.95% over a year, which allows an outage
measured in hours as opposed to minutes.
Definitions:
- "Service Year" is the preceding 365 days from the date of an SLA claim.
- "Annual Uptime Percentage" is calculated by subtracting from 100% the
  percentage of 5 minute periods during the Service Year in which Amazon EC2 was
  in the state of "Region Unavailable." If you have been using Amazon EC2 for less
  than 365 days, your Service Year is still the preceding 365 days but any days prior
  to your use of the service will be deemed to have had 100% Region
  Availability. Any downtime occurring prior to a successful Service Credit claim
  cannot be used for future claims. Annual Uptime Percentage measurements
  exclude downtime resulting directly or indirectly from any Amazon EC2 SLA
  Exclusion (defined below).
- "Region Unavailable" and "Region Unavailability" means that more than one
  Availability Zone in which you are running an instance, within the same Region, is
  "Unavailable" to you.
- "Unavailable" means that all of your running instances have no external
  connectivity during a five minute period and you are unable to launch replacement
  instances.
- The "Eligible Credit Period" is a single month, and refers to the monthly billing
  cycle in which the most recent Region Unavailable event included in the SLA claim
  occurred.
- A "Service Credit" is a dollar credit, calculated as set forth below, that we may
  credit back to an eligible Amazon EC2 account.

1. "Percentage of 5 minute periods." Interesting wording. So, how do you calculate
the percentage of five-minute periods? Let's do some math:
1 year = 525,948.766 minutes
There are 105,189.75 five-minute periods in a year
If you're down for 15 minutes (three five-minute periods), that would be
about 0.00285% of the five-minute periods in a year.
100% - 0.00285% = 99.99715%, which is above 99.95% = NO SLA Credit
So when would you get a credit? When the percentage of unavailable five-minute
periods exceeds 0.05%, which is about 53 five-minute periods, or roughly four and
a half hours (rounded up to five hours in the comments that follow). Note that
the contract says "less than 99.95%", so the actual SLA credits start to kick in at
99.949-ish%.
2. "Any downtime occurring prior to a successful Service Credit claim cannot be used
for future claims." This means, if you have already submitted a claim for an
outage in January and you have another outage in March, the outage time in
January doesn't actually count in your annual uptime calculation. Essentially, with
this language, your trailing 365 days now resets to February 1st and you have to
experience another five hours of outage before you can successfully receive
another credit. With this language it's arguable whether there is really an annual
uptime guarantee of 99.95%, since the trailing 365 days resets after every "successful
claim."
NOTE: When analyzing SLAs it helps to create a simple SLA calculator in Excel so
you can quickly run scenarios. Here is a snapshot of the one I created for the
Amazon agreement.

3. "Region Unavailable" and "Region Unavailability" means that more than one
Availability Zone in which you are running an instance, within the same Region, is
"Unavailable" to you. Simply put, two or more zones must be down at the same
time within a region. And, to get a credit you must be running instances in the
affected zones (plural). If you have a single instance in a single zone, there would
be no credit for the outage.
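
If you would rather script the calculator than build it in a spreadsheet, the following Python sketch follows the definitions quoted above. It is my reading of the contract language, not Amazon's tooling: it assumes the 365-day Service Year from the definitions (105,120 five-minute periods) and that every unavailable period counts toward the claim. The function names are made up for this example.

```python
PERIOD_MINUTES = 5
# The SLA defines the Service Year as 365 days, i.e. 105,120 periods.
PERIODS_PER_SERVICE_YEAR = 365 * 24 * 60 // PERIOD_MINUTES
COMMITMENT_PCT = 99.95


def annual_uptime_pct(unavailable_periods: int) -> float:
    """100% minus the percentage of five-minute periods that were unavailable."""
    return 100.0 - 100.0 * unavailable_periods / PERIODS_PER_SERVICE_YEAR


def periods_needed_for_claim() -> int:
    """Smallest number of unavailable periods that drops uptime below 99.95%."""
    periods = 0
    while annual_uptime_pct(periods) >= COMMITMENT_PCT:
        periods += 1
    return periods


needed = periods_needed_for_claim()
print(f"15-minute outage: {annual_uptime_pct(3):.5f}% uptime, no credit")
print(f"claim threshold: {needed} periods "
      f"({needed * PERIOD_MINUTES / 60:.1f} hours of Region Unavailability)")
```

Running scenarios this way makes it easy to see how far a short outage is from ever triggering a credit.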

Service Commitments and Service Credits


If the Annual Uptime Percentage for a customer drops below 99.95% for the Service
Year, that customer is eligible to receive a Service Credit equal to 10% of their bill
(excluding one-time payments made for Reserved Instances) for the Eligible Credit
Period. To file a claim, a customer does not have to wait 365 days from the day
they started using the service or 365 days from their last successful claim. A customer
can file a claim any time their Annual Uptime Percentage over the trailing 365 days
drops below 99.95%.
We will apply any Service Credits only against future Amazon EC2 payments otherwise
due from you; provided that, we may issue the Service Credit to the credit card that you
used to pay for Amazon EC2 for the billing cycle in which the error occurred. Service
Credits shall not entitle you to any refund or other payment from AWS. A Service Credit
will be applicable and issued only if the credit amount for the applicable monthly billing
cycle is greater than one dollar ($1 USD). Service Credits may not be transferred or
applied to any other account. Unless otherwise provided in the AWS Agreement, your
sole and exclusive remedy for any unavailability or non-performance of Amazon EC2 or
other failure by us to provide Amazon EC2 is the receipt of a Service Credit (if eligible) in
accordance with the terms of this SLA or termination of your use of Amazon EC2.
Whether you're down for 5 hours or 5 days, your maximum credit (using my base
assumptions) will be $36.00, assuming you used your instance for 100% of the hours in a
given month. For most users this credit will be much lower.
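
The credit itself is simple enough to sketch in Python. This is an illustration of the terms quoted above (10% of the bill for the Eligible Credit Period, issued only if greater than $1), using my $360/month base assumption; the function name and the 365-day period count are my own choices, not part of the contract.

```python
CREDIT_RATE = 0.10        # "10% of their bill" per the SLA text
MINIMUM_CREDIT = 1.00     # credits are issued only if greater than $1
COMMITMENT_PCT = 99.95
PERIODS_PER_YEAR = 365 * 24 * 12  # five-minute periods in a 365-day year


def service_credit(monthly_bill: float, annual_uptime_pct: float) -> float:
    """Return the Service Credit in dollars, or 0.0 if no credit applies."""
    if annual_uptime_pct >= COMMITMENT_PCT:
        return 0.0
    credit = monthly_bill * CREDIT_RATE
    return credit if credit > MINIMUM_CREDIT else 0.0


for label, hours_down in (("5 hours", 5), ("5 days", 5 * 24)):
    uptime = 100.0 - 100.0 * (hours_down * 12) / PERIODS_PER_YEAR
    print(f"{label} down: uptime {uptime:.3f}%, "
          f"credit ${service_credit(360.0, uptime):.2f}")
```

The credit is flat: the length of the outage changes the uptime percentage but not the dollar amount.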
Questions to ask yourself:
- What if the outages were not in one big chunk but rather the instance was down
  for five minutes every hour for 60 hours? Is that more or less disruptive? How
  would that affect your business?
- What if your instance was down every hour for 4 minutes and 59 seconds at
  random? You would never be able to make an SLA claim under the current
  contract. How would that affect your business? Is that an acceptable risk for you?
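
These scenarios can be explored with a small Python sketch. It rests on an assumption the contract does not spell out: that the "five minute periods" are fixed, clock-aligned windows and that a window only counts if you were unavailable for all of it. The function name and the sample outage patterns are hypothetical.

```python
PERIOD_SECONDS = 300  # one five-minute period


def fully_down_periods(outages: list[tuple[int, int]]) -> int:
    """Count aligned five-minute windows that lie entirely inside an outage.

    Each outage is (start_second, duration_seconds). Assumes windows start
    at multiples of 300 seconds; the SLA text does not confirm this.
    """
    count = 0
    for start, duration in outages:
        end = start + duration
        first_boundary = -(-start // PERIOD_SECONDS)  # ceiling division
        last_boundary = end // PERIOD_SECONDS
        count += max(0, last_boundary - first_boundary)
    return count


aligned = [(hour * 3600, 300) for hour in range(60)]
just_short = [(hour * 3600 + 17, 299) for hour in range(365 * 24)]
print(fully_down_periods(aligned))     # five minutes every hour for 60 hours
print(fully_down_periods(just_short))  # 4 min 59 s every hour for a year
```

Under this assumption, even a full five-minute outage counts only if it happens to line up with a measurement window, which is worth clarifying with the provider before relying on the SLA.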

Credit Request and Payment Procedures


To receive a Service Credit, you must submit a request by sending an e-mail message to
aws-sla-request@amazon.com. To be eligible, the credit request must (i) include your
account number in the subject of the e-mail message (the account number can be found
at the top of the AWS Account Activity page); (ii) include, in the body of the e-mail, the
dates and times of each incident of Region Unavailable that you claim to have
experienced including instance ids of the instances that were running and affected
during the time of each incident; (iii) include your server request logs that document the
errors and corroborate your claimed outage (any confidential or sensitive information in
these logs should be removed or replaced with asterisks); and (iv) be received by us
within thirty (30) business days of the last reported incident in the SLA claim. If the
Annual Uptime Percentage of such request is confirmed by us and is less than 99.95%
for the Service Year, then we will issue the Service Credit to you within one billing cycle
following the month in which the request occurred. Your failure to provide the request
and other information as required above will disqualify you from receiving a Service
Credit.
1. You are required to monitor your instances and provide that data back to Amazon
in the event of an outage. Does your IT department know about this requirement?
Do you test your ability to monitor your instance? Without this data you cannot
make your SLA claim.
2. Interestingly, even though Amazon requires you to provide detailed evidence of an
outage, it is still left to Amazon's discretion as to whether the credit is given.
Amazon EC2 SLA Exclusions
The Service Commitment does not apply to any unavailability, suspension or termination
of Amazon EC2, or any other Amazon EC2 performance issues: (i) that result from a
suspension described in Section 6.1 of the AWS Agreement; (ii) caused by
factors outside of our reasonable control, including any force majeure event or Internet
access or related problems beyond the demarcation point of Amazon EC2; (iii) that
result from any actions or inactions of you or any third party; (iv) that result from your
equipment, software or other technology and/or third party equipment, software or
other technology (other than third party equipment within our direct control); (v) that
result from failures of individual instances not attributable to Region Unavailability; or
(vi) arising from our suspension and termination of your right to use Amazon EC2 in
accordance with the AWS Agreement (collectively, the "Amazon EC2 SLA Exclusions"). If
availability is impacted by factors other than those explicitly listed in this agreement, we
may issue a Service Credit considering such factors in our sole discretion.
1. There are several other documents referenced above. You always need to check all
referenced documents and read the language. I have included the referenced
language in the section below.
2. There is a lot of vague language in this section, such as "(v) that result from failures
of individual instances not attributable to Region Unavailability." I have no idea
what "instances not attributable to Region Unavailability" means and I'm sure
most business users don't either. If you don't know what it is, ask, and have it
included as a definition in the contract.
3. Another phrase I hate: "in our sole discretion." This is another "get out of jail free
card." Translation: "Other things may cause outages that we haven't anticipated,
and we may or may not give you an SLA credit when they occur."

AWS Customer Agreement (referenced in the SLA agreement above)


http://aws.amazon.com/agreement/
6.1 Generally. We may suspend your or any End User's right to access or use any
portion or all of the Service Offerings immediately upon notice to you if we determine:
(a) your or an End User's use of the Service Offerings (i) poses a security risk to the
Service Offerings or any other AWS customer, (ii) may adversely impact the Service
Offerings or the systems or Content of any other AWS customer, or (iii) may subject us,
our affiliates, or any third party to liability;
(b) you are, or any End User is, in breach of this Agreement, including if you are
delinquent on your payment obligations for more than 15 days; or
(c) you have ceased to operate in the ordinary course, made an assignment for the
benefit of creditors or similar disposition of your assets, or become the subject of any
bankruptcy, reorganization, liquidation, dissolution or similar proceeding.
13.2 Force Majeure. We and our affiliates will not be liable for any delay or failure to
perform any obligation under this Agreement where the delay or failure results from any
cause beyond our reasonable control, including acts of God, labor disputes or other
industrial disturbances, systemic electrical, telecommunications, or other utility failures,
earthquake, storms or other elements of nature, blockages, embargoes, riots, acts or
orders of government, acts of terrorism, or war.

1. Make sure you're up on your payments or you will not get your SLA credit.
2. Ahh, good old Force Majeure. Electrical, telecommunications, and other utilities
are, in my opinion, key components of providing Cloud service, and systemic
failures of them should therefore not be included in Force Majeure.

Appendix B: SLA Chart


You can use the following uptime SLA charts to determine your potential
daily/monthly/yearly downtime with a given SLA. The SLAs have been divided up into
categories based on quality of SLA in the context of business use.

Uptime SLA Detailed Chart


High Comfort/Low Risk: At or above industry average*

SLA%      Daily outage    Monthly outage  Monthly outage  Annual outage   Annual outage
          (minutes)       (minutes)       (hours)         (hours)         (days)
99.999%   0.0144          0.432           0.0072          0.0864          0.0036
99.99%    0.144           4.32            0.072           0.864           0.036
99.95%    0.72            21.6            0.36            4.32            0.18
99.90%    1.44            43.2            0.72            8.64            0.36
99.50%    7.2             216             3.6             43.2            1.8

*Ideal for mission critical business applications.

Medium Comfort/Medium Risk: At or below industry standard*

SLA%      Daily outage    Monthly outage  Monthly outage  Annual outage   Annual outage
          (minutes)       (minutes)       (hours)         (hours)         (days)
99%       14.4            432             7.2             86.4            3.6
98.99%    14.544          436.32          7.272           87.264          3.636
98.95%    15.12           453.6           7.56            90.72           3.78
98.90%    15.84           475.2           7.92            95.04           3.96
98.50%    21.6            648             10.8            129.6           5.4
98%       28.8            864             14.4            172.8           7.2

*May be an acceptable risk based on business need.

Low Comfort/High Risk: Below industry standard*

SLA%      Daily outage    Monthly outage  Monthly outage  Annual outage   Annual outage
          (minutes)       (minutes)       (hours)         (hours)         (days)
97.99%    28.944          868.32          14.472          173.664         7.236
97.95%    29.52           885.6           14.76           177.12          7.38
97.90%    30.24           907.2           15.12           181.44          7.56
97.50%    36              1080            18              216             9
97%       43.2            1296            21.6            259.2           10.8
96.99%    43.344          1300.32         21.672          260.064         10.836
96.95%    43.92           1317.6          21.96           263.52          10.98
96.90%    44.64           1339.2          22.32           267.84          11.16
96.50%    50.4            1512            25.2            302.4           12.6

*Providers with SLAs that fall into this section should not be considered if Mission
Critical applications are in question.

Low Comfort/High Risk: Below industry standard*

SLA%      Daily outage    Monthly outage  Monthly outage  Annual outage   Annual outage
          (minutes)       (minutes)       (hours)         (hours)         (days)
96%       57.6            1728            28.8            345.6           14.4
95.99%    57.744          1732.32         28.872          346.464         14.436
95.95%    58.32           1749.6          29.16           349.92          14.58
95.90%    59.04           1771.2          29.52           354.24          14.76
95.50%    64.8            1944            32.4            388.8           16.2
95%       72              2160            36              432             18
94.99%    72.144          2164.32         36.072          432.864         18.036
94.95%    72.72           2181.6          36.36           436.32          18.18
94.90%    73.44           2203.2          36.72           440.64          18.36
94.50%    79.2            2376            39.6            475.2           19.8

*If you are looking at a provider that falls into this category, look elsewhere. If you have an existing
provider whose uptime is falling into this section, immediate and definitive action should be taken.
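
For SLA percentages not listed in the charts, the same figures can be reproduced with a short Python sketch. The chart values appear to be based on a 30-day month and a 360-day (twelve 30-day months) year, so this sketch assumes the same; that is an inference from the numbers, and the function name is made up for this example. A 365-day year gives slightly larger annual figures.

```python
def outage_row(sla_pct: float) -> dict[str, float]:
    """Allowed outage for an uptime SLA, on a 30-day month / 360-day year basis."""
    down_fraction = (100.0 - sla_pct) / 100.0
    return {
        "daily_minutes": round(down_fraction * 24 * 60, 4),
        "monthly_minutes": round(down_fraction * 30 * 24 * 60, 4),
        "monthly_hours": round(down_fraction * 30 * 24, 4),
        "annual_hours": round(down_fraction * 360 * 24, 4),
        "annual_days": round(down_fraction * 360, 4),
    }


for pct in (99.95, 99.0, 97.5):
    print(pct, outage_row(pct))
```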

Uptime SLA Summary Chart


About Intreis
Intreis is a Chicago-based consulting firm specializing in IT Governance, Risk &
Compliance and IT Service Management integrations. Intreis also offers a wide range of
services which support ITGRC and ITSM integrations, including: Assessments, Controls
Definition, Process Design, Remediation Work, Risk and Compliance Strategy, Training
and Education. For more information about Intreis services, please visit us at
www.intreis.com.

Created By:
Morgan Hunter, VP Professional Services
Email: Morgan.Hunter@Intreis.com
Twitter: @Intreis
Web: www.Intreis.com

Copyright Intreis, Inc. 2013. All rights reserved. No part of this publication may be
reproduced or distributed without the prior written permission of Intreis.