Analytics agents are moving from answering questions to doing things — running SQL, resolving metrics, fetching lineage, creating exports, and triggering workflows. This shift breaks a common assumption in GenAI projects: that production will be fine if the agent’s prompt is good. In reality, once an agent can call tools, you are operating a distributed system whose behavior can drift with every model upgrade, prompt change, routing adjustment, or schema change.
Most teams respond by adding a few guardrails, tuning prompts, or rate-limiting tool access. That helps, but it doesn’t address the failure mode that matters most in data platforms: the same question leading to different tool behavior over time. A small change can turn a safe metric lookup into raw SQL, increasing retries and introducing silent correctness drift without any explicit error. Traditional data platforms solved this class of problem with data contracts: SLOs, explicit interfaces, controlled rollouts, and clear ownership.
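To make the drift concrete: the same contract idea can be applied to an agent's tool routing by pinning question-to-tool expectations and checking them on every model, prompt, or routing change. The sketch below is illustrative, not a real framework; the tool names, the toy `route` function, and the `CONTRACT` mapping are all assumptions standing in for whatever router your agent actually uses.

```python
from dataclasses import dataclass

# Hypothetical tool identifiers; a real agent would expose richer interfaces.
METRIC_LOOKUP = "metric_lookup"
RAW_SQL = "raw_sql"

@dataclass(frozen=True)
class ToolCall:
    tool: str
    argument: str

def route(question: str) -> ToolCall:
    """Toy stand-in for the agent's router: questions matching a governed
    metric go to the metric tool; everything else falls back to raw SQL."""
    known_metrics = {"weekly_active_users", "revenue"}
    token = question.strip().lower().replace(" ", "_")
    if token in known_metrics:
        return ToolCall(METRIC_LOOKUP, token)
    return ToolCall(RAW_SQL, question)

# The behavioral contract: pinned question -> expected tool pairs.
# Any entry that stops holding is a silent-drift signal, even though
# the agent still "answers" without raising an error.
CONTRACT = {
    "weekly active users": METRIC_LOOKUP,
    "revenue": METRIC_LOOKUP,
}

def check_contract() -> list[str]:
    """Return the questions whose routing drifted from the contract."""
    return [q for q, tool in CONTRACT.items() if route(q).tool != tool]
```

Run `check_contract()` in CI against each candidate release: an empty list means the pinned behaviors still hold, while any returned question flags exactly the "metric lookup quietly became raw SQL" regression the paragraph describes, before it reaches production.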