I’ve run production systems where we could tell you the p99 latency of any endpoint down to the microsecond, but couldn’t explain why our AWS bill jumped $40,000 in a single weekend. That disconnect — between operational visibility and financial reality — is where most observability strategies quietly fail.
The orthodox telemetry trinity (metrics, logs, traces) gives you performance. Error rates. Request volumes. Latency distributions that let you argue about whether 250 ms is acceptable for a search API. What it won’t tell you is that the microservice you just optimized for speed now costs $0.03 per invocation instead of $0.002. That is a 15x increase that looks like a rounding error on any single request and, at scale, becomes someone’s quarterly budget.
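To make the scale effect concrete, here is a minimal sketch of the arithmetic using the two per-invocation prices quoted above. The traffic volumes and the helper name `monthly_cost_delta` are assumptions chosen purely for illustration; they are not measurements from any real system.

```python
from decimal import Decimal

# Per-invocation costs quoted in the text.
COST_BEFORE = Decimal("0.002")
COST_AFTER = Decimal("0.03")


def monthly_cost_delta(invocations_per_month: int) -> Decimal:
    """Extra monthly spend caused by the per-invocation cost increase."""
    return (COST_AFTER - COST_BEFORE) * invocations_per_month


# Hypothetical traffic levels, for illustration only.
for volume in (100_000, 10_000_000, 1_000_000_000):
    delta = monthly_cost_delta(volume)
    print(f"{volume:>13,} invocations/month -> +${delta:,.2f}")
```

The $0.028 difference per call is invisible on a single request. At an assumed 10 million invocations a month, it adds up to $280,000 of extra spend, and none of the latency dashboards would show it.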