Here is a scenario that plays out constantly in enterprise software teams. A product manager asks the company’s AI assistant: “Who are our top customers this quarter?” The system returns a clean, ranked list. It looks right. Everyone moves on.
Except the product group defines “top” by engagement. Finance defines it by net revenue. Sales defines it by deal size. The AI picked one interpretation, presented it with complete confidence, and nobody noticed until a strategy decision got made based on numbers that meant something different to every person in the room.
This is not hallucination in the way people usually talk about it. The system didn’t make anything up. It just made a choice about meaning that was never its choice to make.
The Real Problem Isn’t the Model
There’s a widespread assumption in enterprise AI adoption that if you pick the right model, tune it carefully, and feed it good data, you’ll get reliable outputs. That assumption misses the actual failure mode.
LLMs are extraordinarily good at language. They are not good at organizational meaning. Ask your AI what your churn rate is, and watch what happens. The model doesn’t know whether you measure churn at the subscription level or the customer level. It doesn’t know whether you count downgrades or ignore them. It doesn’t know if enterprise accounts with multiple seats are handled differently. These are not answers buried in a document somewhere. They are organizational decisions that live in tribal knowledge, team agreements, and data model comments written two years ago by someone who has since left the company.
The model will infer. And inference, presented with confidence, is a liability.
Embeddings Don’t Fix This
The standard response to this problem is better retrieval. Embed your documentation, pull the most relevant chunks, give the model more context. It’s a reasonable intuition and a partial improvement. But it does not solve the underlying issue.
Embeddings measure how close two pieces of text are in vector space; they say nothing about whether a given interpretation is actually correct for your organization. “Revenue” and “profit” are neighbors in embedding space because they appear together constantly in financial writing. In your financial reporting system, conflating them is a serious error. No amount of retrieval resolves that because the correct answer isn’t in any document. It’s in a decision your finance team made about how to define things, probably years ago, probably never written down in a form a machine can use.
The same structural problem shows up everywhere. “Active user” means something different to your engineering team (an API call) than to your product team (a completed transaction). “Conversion” means a successful HTTP request to one team and a signup-to-paid progression to another. “Engagement” is event frequency in one dashboard and session depth in another. Retrieval doesn’t resolve definitional ambiguity. It just retrieves more text that contains the ambiguity.

Figure 1: Without a semantic layer, LLM outputs are plausible but inconsistent. With one, they are grounded and correct.
What Actually Needs to Happen
The answer is a semantic layer: a structured, machine-readable representation of what your organization’s terms actually mean. Not a glossary. Not better documentation. A formal encoding of entities, relationships, metrics, and disambiguation rules that sits between your data and your AI system, so that when someone asks about churn or active accounts or top customers, the system isn’t guessing.
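To make that concrete, here is a minimal sketch of what one entry in such a layer might contain, using the churn example from earlier. The field names and values are illustrative assumptions, not any particular tool’s schema; the point is that the definition, its grain, and its edge cases are encoded as data rather than left to inference.

```python
# A single semantic-layer entry, expressed as plain data.
# All names, values, and the owning team are illustrative assumptions.
CHURN_RATE = {
    "name": "churn_rate",
    "synonyms": ["churn", "attrition rate"],
    "definition": "Share of subscriptions active at period start that were "
                  "cancelled by period end.",
    "grain": "subscription",  # measured per subscription, not per customer
    "excludes": ["downgrades", "plan switches"],
    "disambiguation": {
        "customer-level churn": "Use account_churn_rate instead.",
        "downgrades": "Tracked separately as contraction, never counted here.",
    },
    "owner": "finance-analytics",
}


def to_context(entry: dict) -> str:
    """Render an entry as text an LLM can be grounded on at inference time."""
    lines = [
        f"Metric: {entry['name']} (also called: {', '.join(entry['synonyms'])})",
        f"Definition: {entry['definition']}",
        f"Grain: {entry['grain']}",
        f"Excludes: {', '.join(entry['excludes'])}",
    ]
    lines += [f"If asked about {k}: {v}" for k, v in entry["disambiguation"].items()]
    return "\n".join(lines)
```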
This isn’t a new idea in the data world. Tools like dbt and Looker have applied it to business intelligence for years. What’s new is the pressure to extend it into AI pipelines, and the tooling is catching up: the dbt Semantic Layer now supports direct AI pipeline integration, and platforms like Cube are building native LLM connections for exactly this purpose.
The practical starting point for most teams is a schema-based approach: YAML or JSON configuration files, version-controlled in git, injected at inference time. Less rigorous than formal ontologies, but dramatically more maintainable, and usually sufficient. If you already have a BI semantic layer, your definitional work is largely done. The challenge is making it queryable when the AI needs it.
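As a sketch of that injection step: assume the definitions live in a git-tracked file at semantics/metrics.yml (a hypothetical path) with a top-level metrics list, and treat call_llm as a stand-in for whatever model client your stack already uses.

```python
import yaml  # PyYAML, for parsing the version-controlled definitions file


def load_semantic_layer(path: str = "semantics/metrics.yml") -> list[dict]:
    """Load metric definitions from git-tracked YAML.
    The path and the top-level 'metrics' key are assumptions for this sketch."""
    with open(path) as f:
        return yaml.safe_load(f)["metrics"]


def relevant_definitions(question: str, metrics: list[dict]) -> list[dict]:
    """Naive keyword match on names and synonyms; a production system would
    reuse the BI layer's own resolution logic or a retrieval step scoped to
    definitions only, never the whole document corpus."""
    q = question.lower()
    return [m for m in metrics
            if m["name"].lower() in q
            or any(s.lower() in q for s in m.get("synonyms", []))]


def grounded_prompt(question: str) -> str:
    """Inject the matching definitions ahead of the user's question."""
    matches = relevant_definitions(question, load_semantic_layer())
    context = yaml.safe_dump(matches, sort_keys=False)
    return ("Answer using only the definitions below. If a term is not "
            "defined here, say so instead of guessing.\n\n"
            f"{context}\nQuestion: {question}")


# answer = call_llm(grounded_prompt("What was churn last quarter?"))
# call_llm is a placeholder for your existing model client.
```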
The Harder Problem Is Organizational
Here’s what most architecture posts leave out: the technical implementation is the easy part. Getting three departments to agree on what “active” means is not. Building and maintaining a semantic layer forces conversations that organizations routinely avoid, and it surfaces disagreements that have been quietly producing inconsistent results for years. That’s uncomfortable. It’s also the point.
There is a simple test I use: if a new hire would need to read internal documentation to understand what a key business term means, that term belongs in a semantic layer, not in a prompt.
The next phase of enterprise AI isn’t about which model you use. It’s about how well your organization has systematized its own knowledge for machine consumption. From prompt engineering to context engineering. From data pipelines to meaning pipelines. The teams that get this right will produce AI outputs that aren’t just fluent; they’ll be correct. In enterprise systems, being fluent is not enough. If your AI is not definitionally correct, it is operationally unreliable.
Instead of asking “Who are our top customers?”, define it:
TopCustomer = revenue_last_90_days > $50K AND active_subscription = true
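Once that definition is encoded in the semantic layer, the original question stops being a matter of interpretation: a thin resolver can compile it into the same query every time, for every team. A sketch, with an assumed warehouse table and illustrative column names:

```python
# Illustrative only: the table and column names are assumptions about your warehouse.
TOP_CUSTOMER = {
    "name": "top_customer",
    "filters": [
        ("revenue_last_90_days", ">", 50_000),
        ("active_subscription", "=", True),
    ],
    "source_table": "analytics.customer_facts",
}


def to_sql(entry: dict) -> str:
    """Compile a semantic-layer definition into SQL, so every team's
    'top customers' resolves to the same rows."""
    clauses = " AND ".join(
        f"{col} {op} {str(val).upper() if isinstance(val, bool) else val}"
        for col, op, val in entry["filters"]
    )
    return f"SELECT * FROM {entry['source_table']} WHERE {clauses}"


print(to_sql(TOP_CUSTOMER))
# SELECT * FROM analytics.customer_facts
#   WHERE revenue_last_90_days > 50000 AND active_subscription = TRUE
```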
