Modern authorization endpoints often do more than approve a request. They evaluate complex policies, compute rolling aggregates, call third‑party risk services, and enforce company and card limits, all under a hard latency budget. If the endpoint misses that budget, the transaction fails, and the customer sees the failure.
This post walks through a practical approach to taking a Python authorization API from ~3.2s P99 down to ~650ms P99, using a sequence of changes that compound: query and index correctness, deterministic query planning, connection pooling and warmup, and parallelized third‑party I/O.