reliability

99.9% vs 99.99% SLA: Downtime Math Explained

Uptimera team · May 21, 2026 · 7 min read · Updated June 30, 2026

"We're targeting four nines" sounds great in a board deck. It sounds different when you do the math: 99.99% uptime is 4 minutes 19 seconds of downtime per 30-day month, total. That includes every deploy, every database failover, every misconfigured DNS change, every cert renewal that didn't go cleanly.

This post walks through what each "nine" actually buys you, why each one is exponentially harder than the last, and how to pick an SLA you can actually keep.

The math: nines to minutes

Per 30-day month, here's what each tier allows in total downtime:

99% (two nines): 7 hours, 12 minutes of downtime allowed per month.
99.5%: 3 hours, 36 minutes.
99.9% (three nines): 43 minutes, 12 seconds.
99.95%: 21 minutes, 36 seconds.
99.99% (four nines): 4 minutes, 19 seconds.
99.999% (five nines): 25.9 seconds.
99.9999% (six nines): 2.6 seconds. Effectively zero on any meaningful timescale.

Figure: Downtime allowed per month at each SLA tier. Each added nine cuts the budget roughly 10x, which shows up as a constant step on a logarithmic scale.
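The arithmetic behind these tiers is short enough to check yourself. The following is a minimal sketch, assuming a 30-day month as in the list above; the function name `downtime_budget` is ours, chosen for illustration.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Return the total downtime allowed over `period` at the given SLA percentage."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    # The allowed downtime is simply the unavailable fraction of the period.
    return period * (1 - sla_percent / 100)


if __name__ == "__main__":
    for tier in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999):
        print(f"{tier}% -> {downtime_budget(tier)}")
```

Passing a different `period` (for example `timedelta(days=365)`) gives the yearly budget instead. Note that the numbers shift slightly if a contract defines the month as a calendar month rather than 30 days.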

What counts as downtime, anyway?

This is the question that turns SLAs from marketing copy into contractual landmines. Read any production SLA carefully and you'll find downtime is defined by what gets excluded:

Planned maintenance windows. Excluded if announced in advance — usually 48–72 hours' notice.
Customer-caused issues. Misconfigured DNS pointing at your service, hitting your own rate limits, etc.
Force majeure. An AWS us-east-1 outage is usually treated as an upstream event outside your control, not a breach on your side.
Below a duration threshold. Many SLAs only count outages lasting more than 5 minutes.

Your monitoring tool, by contrast, sees raw availability — every timeout, every retried request, every flap. The number you publish to customers is almost always more generous than the number your monitors report. That's OK as long as the definition of "down" is documented and consistent. For the internal target you actually chase, see the difference between SLA, SLO, and SLI and how an error budget translates the SLO into real engineering policy.
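To make the gap between raw availability and SLA-counted downtime concrete, here is a rough sketch. It assumes each outage has already been labelled with an exclusion reason (or none) and that the contract only counts outages longer than 5 minutes. The `Outage` class, the reason strings, and the function names are hypothetical, not taken from any real SLA or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Outage:
    start: datetime
    end: datetime
    # Hypothetical labels, e.g. "planned_maintenance" or "customer_caused".
    excluded_reason: Optional[str] = None

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def raw_downtime(outages: Iterable[Outage]) -> timedelta:
    """What a monitor sees: every outage counts, whatever the cause or length."""
    return sum((o.duration for o in outages), timedelta())


def sla_downtime(outages: Iterable[Outage],
                 min_duration: timedelta = timedelta(minutes=5)) -> timedelta:
    """What the contract counts: no excluded outages, none at or below the threshold."""
    return sum(
        (o.duration for o in outages
         if o.excluded_reason is None and o.duration > min_duration),
        timedelta(),
    )


day = datetime(2026, 5, 1)
outages = [
    Outage(day, day + timedelta(minutes=3)),                          # short flap
    Outage(day, day + timedelta(minutes=20), "planned_maintenance"),  # announced window
    Outage(day, day + timedelta(minutes=12)),                         # real incident
]
print("raw:", raw_downtime(outages))
print("sla:", sla_downtime(outages))
```

In this made-up month the monitor records 35 minutes of downtime while the SLA counts only 12, which is why the definition of "down" has to be written down before anyone argues about the number.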

Why each nine is exponentially harder

Roughly speaking, the operational cost of each tier:

99% — "basically up." One server, decent monitoring, alert on death. Most side projects hit this without trying.
99.9% — "competent single-cloud." Load balancer in front of multiple app instances, managed database with automated backups, monitoring with on-call rotation. This is the realistic ceiling for most SaaS on a single cloud region.
99.99% — "active multi-region or multi-AZ." Failover paths actually tested. Database replicas. Deploys that don't cause noticeable downtime. Chaos engineering. An on-call team large enough to cover 24/7 without burnout. This is a different category of operational investment.
99.999% — "the telco tier." Active/active across regions, dual vendors for critical dependencies, change-management processes that approach regulated industries. Almost no consumer SaaS actually delivers this in practice, regardless of what they claim.

What major providers actually commit to

For reference, even the largest providers mostly publish commitments between three and four nines, and the higher figures usually come with conditions:

AWS EC2: 99.99% at the region level (when deployed across multiple AZs); 99.5% for a single instance.
AWS S3: 99.9% SLA for the Standard storage class; designed for 99.99% availability.
Stripe API: 99.99% target (very high in practice).
GitHub Enterprise Cloud: 99.9%.
Cloudflare: 100% for paid Enterprise plans (with service credits when availability falls short); SLAs vary by service.

If you build on top of a 99.9% dependency, you mathematically cannot commit to higher than 99.9% to your customers without either engineering around the dependency or accepting the credit risk.
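The reasoning here is that availabilities of components in series multiply, so the result can never exceed the weakest link. The sketch below shows that, plus the "engineering around it" case with redundant copies. It assumes failures are independent, which is a strong assumption that real systems often violate; the function names are ours.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up (fractions, not percent)."""
    return prod(components)


def redundant_availability(component: float, copies: int) -> float:
    """Availability when any one of `copies` independent replicas is enough."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The system is down only when every replica is down at the same time.
    return 1 - (1 - component) ** copies


# Your own stack at 99.95% sitting on a single 99.9% dependency:
print(f"{serial_availability(0.9995, 0.999):.4%}")
# The same dependency duplicated across two independent providers:
print(f"{serial_availability(0.9995, redundant_availability(0.999, 2)):.4%}")
```

With a single 99.9% dependency the combined figure lands below 99.9% even though your own stack is better than that. Duplicating the dependency lifts the ceiling back toward your own stack's number, but only to the extent the two copies really do fail independently.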

How to pick an SLA you can keep

Three rules:

Measure first; promise second. Run multi-region uptime monitoring with quorum for at least a quarter before you put a number in a contract. If your current measured uptime is 99.7%, don't promise 99.99%.
Promise less than you deliver. Customers remember outages, not SLA credits. Publishing 99.9% while actually running at 99.97% is a competitive advantage that compounds.
Tie SLA to the things you control. Define the scope (which endpoints, which regions), define exclusions (maintenance, customer-caused), and define how it's measured. Vague SLAs become disputes.
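As a rough illustration of the first two rules, the sketch below computes measured uptime from multi-region check rounds using a quorum, then compares it with a candidate promise. The region names, the quorum of 2, and the sample data are all made up for the example, and this is not a description of how any particular monitoring product works.

```python
from typing import Iterable, Mapping


def is_down(region_ok: Mapping[str, bool], quorum: int = 2) -> bool:
    """Count a check round as down only when at least `quorum` regions saw a failure."""
    failures = sum(1 for ok in region_ok.values() if not ok)
    return failures >= quorum


def measured_uptime(rounds: Iterable[Mapping[str, bool]], quorum: int = 2) -> float:
    """Percentage of check rounds that were not down under the quorum rule."""
    rounds = list(rounds)
    if not rounds:
        raise ValueError("need at least one check round")
    down = sum(is_down(r, quorum) for r in rounds)
    return 100 * (1 - down / len(rounds))


def headroom(measured_percent: float, promised_percent: float) -> float:
    """Positive when you deliver more than you promise."""
    return measured_percent - promised_percent


# Hypothetical sample: mostly healthy, a few single-region blips, three real outages.
all_ok = {"us-east": True, "eu-west": True, "ap-south": True}
one_region_blip = {"us-east": False, "eu-west": True, "ap-south": True}
real_outage = {"us-east": False, "eu-west": False, "ap-south": True}
rounds = [all_ok] * 9990 + [one_region_blip] * 7 + [real_outage] * 3

uptime = measured_uptime(rounds)
print(f"measured: {uptime:.2f}%")
for promised in (99.9, 99.99):
    verdict = "has headroom" if headroom(uptime, promised) > 0 else "too aggressive"
    print(f"promise {promised}%: {verdict}")
```

The quorum keeps a single flaky probe location from counting as an outage, which matters once the number ends up in a contract. The comparison at the end is the whole of rule two: only publish a figure that the measured data clears with room to spare.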

Why this means you need a monitor

You can't hit an SLA you don't measure. The whole point of uptime monitoring is to give you the raw data — separate from your customers' complaints, separate from your incident channel — to know whether the number you're publishing is actually true.

That's the entire reason Uptimera exists: independent, multi-region uptime measurement that you can point at when the customer asks "are you really at 99.9%?"

Frequently asked questions

How much downtime per month does 99.9% vs 99.99% allow?

Measured against a 30-day month, 99.9% (three nines) allows 43 minutes and 12 seconds of total downtime, while 99.99% (four nines) allows just 4 minutes and 19 seconds. That single extra nine cuts your monthly downtime budget by roughly 10x, from about three-quarters of an hour down to under five minutes.

Why is each additional nine exponentially harder to achieve?

Every extra nine leaves you about one-tenth of the downtime budget and demands roughly 10x more operational maturity, redundancy, and money to back it up. 99% is basically one server with monitoring. 99.9% needs a load balancer, multiple app instances, a managed database, and an on-call rotation. 99.99% requires active multi-region or multi-AZ setups with tested failover, database replicas, and deploys that don't cause noticeable downtime. 99.999% is the telco tier with active/active regions and dual vendors, which almost no consumer SaaS actually delivers.

What actually counts as downtime under an SLA?

SLAs usually define downtime by what they exclude. Common exclusions are planned maintenance windows announced in advance (typically 48 to 72 hours' notice), customer-caused issues like misconfigured DNS, force majeure such as an AWS us-east-1 outage, and outages below a duration threshold, since many SLAs only count outages lasting more than 5 minutes. Your monitoring tool sees raw availability instead, so the number you publish is usually more generous than what your monitors report.

What uptime do most SaaS products actually achieve?

Most SaaS products operate around 99.9% in practice, which is the realistic ceiling for a competent setup in a single cloud region. Major providers reflect this: AWS S3 Standard carries a 99.9% SLA, GitHub Enterprise Cloud commits to 99.9%, and a single AWS EC2 instance is only 99.5%. Saying 99.9% plainly is more credible than promising 99.99% and quietly bleeding SLA credits.

How do I pick an SLA I can actually keep?

Follow three rules. First, measure before you promise: run multi-region monitoring for at least a quarter, and if your measured uptime is 99.7%, don't promise 99.99%. Second, promise less than you deliver, since publishing 99.9% while running at 99.97% is a compounding advantage. Third, tie the SLA to things you control by defining scope, exclusions, and how uptime is measured. Also remember that if you build on a 99.9% dependency, you mathematically cannot commit above 99.9% without engineering around it.

Uptimera team

We build Uptimera — multi-region uptime monitoring, SSL and DNS checks, and branded status pages. These guides come from running the same monitoring and on-call practices we write about.