SLI, SLO and SLA: quick definitions for developers

Posted: 24.1.2024 19.40.07 (EET/GMT+2)

If you work with modern production systems, you'll often run into the terms SLI, SLO, and SLA. Here's a quick way to understand what they mean.

These terms come from DevOps and SRE (Site Reliability Engineering) practices, but they are directly relevant to application development as well.

Here are the definitions:

SLI (Service Level Indicator): the metric you measure.
SLO (Service Level Objective): the target or goal for that metric.
SLA (Service Level Agreement): the formal commitment, often contractual.

A simple example for a web API:

SLI: percentage of successful HTTP requests.
SLO: 99.9% of requests succeed over a period of, say, 30 days.
SLA: customers are compensated if availability drops below 99.9%.
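
To make the availability example concrete, here is a minimal Python sketch that computes the SLI and checks it against the SLO. The function names and the request counts are made up for illustration; in a real system the counts would come from your metrics or logs.

```python
def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Return the SLI: percentage of successful requests in the window."""
    if total_requests == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 100.0
    return 100.0 * (total_requests - failed_requests) / total_requests


def meets_slo(sli_percent: float, slo_percent: float = 99.9) -> bool:
    """Return True if the measured SLI reaches the SLO target."""
    return sli_percent >= slo_percent


# Hypothetical numbers for one 30-day window.
sli = availability_sli(total_requests=1_000_000, failed_requests=800)
print(f"SLI: {sli:.2f}%, SLO met: {meets_slo(sli)}")
```

With these sample numbers, 800 failures out of a million requests gives an SLI of 99.92%, which is just above the 99.9% objective.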

Another example for latency:

SLI: request duration in milliseconds.
SLO: 95% of requests complete in under 200 ms.
SLA: agreed performance level in a contract.
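
The latency example can be sketched the same way: count how many requests finished under the threshold and compare that share to the target. The function name and the sample durations below are hypothetical; real values would typically come from request telemetry.

```python
from typing import Sequence


def latency_sli(durations_ms: Sequence[float], threshold_ms: float = 200.0) -> float:
    """Return the percentage of requests that completed under the threshold."""
    if not durations_ms:
        # Assumption: no requests means nothing was slow.
        return 100.0
    fast = sum(1 for d in durations_ms if d < threshold_ms)
    return 100.0 * fast / len(durations_ms)


# Hypothetical request durations in milliseconds.
samples = [120, 95, 180, 210, 150, 90, 199, 450, 130, 110]
sli = latency_sli(samples)
print(f"{sli:.1f}% of requests under 200 ms, SLO (95%) met: {sli >= 95.0}")
```

In this sample, 8 of 10 requests are under 200 ms, so the SLI is 80% and the 95% objective is missed.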

From a developer perspective, these show up as concrete requirements:

what to measure (logging, metrics, telemetry)
what to optimize (performance, error handling)
what to test (load, failure scenarios)

In many teams, SLOs drive engineering decisions. For example, if you are close to missing an objective, you might prioritize reliability work over new features.
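
One common way in SRE practice to quantify "close to missing an objective" is an error budget: the number of failures the SLO still allows in the current window. The sketch below is a simplified illustration under that assumption. The function name, the numbers, and the 25% cutoff are made up, not a standard.

```python
def error_budget_remaining(total_requests: int, failed_requests: int,
                           slo_percent: float = 99.9) -> float:
    """Return the fraction of the error budget still unused (negative if overspent)."""
    allowed_failures = total_requests * (100.0 - slo_percent) / 100.0
    if allowed_failures == 0:
        # No traffic or a 100% target: there is no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 800 of them failed.
remaining = error_budget_remaining(1_000_000, 800)
print(f"Error budget remaining: {remaining:.0%}")

# Hypothetical team policy: below 25% left, favor reliability work.
if remaining < 0.25:
    print("Prioritize reliability work over new features")
```

A 99.9% objective over a million requests allows roughly 1,000 failures, so 800 failures leaves about a fifth of the budget.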

In short:

SLI = what you measure.
SLO = what you aim for.
SLA = what you promise.

Microsoft's Azure Well-Architected Framework (WAF) has good documentation on these topics in its Reliability section. Recommended reading!