There’s a particular kind of incident that doesn’t show up in your error dashboards. No alerts fire. Latency looks fine, actually — or fine-ish, in that flickering, indeterminate way that makes you suspicious but not certain. What shows up, days later, is a billing anomaly. A line item that’s 4x what you budgeted. And when you dig, you find it: retries. Hundreds of thousands of them. Loyal, tireless, utterly pointless retries, hammering a dependency that was never going to recover within the retry window, each one spinning up a Lambda invocation, writing to CloudWatch, touching the database, accruing egress. The system was “retrying” its way into insolvency.
This is what I mean when I call uncontrolled retries a self-inflicted Denial-of-Wallet attack. Not metaphorically. Mechanically.