

Like many large enterprises, we must navigate the beauty and chaos of legacy code. In our case, decades of SQL procedures and business logic that underpin a platform capable of handling over 3 million concurrent users and hundreds of micro code deployments per week. It’s a complex machine. Touch one part, and you risk breaking 10 others. That’s why modernizing the codebase is both a technical challenge and a human one. It requires empathy, trust, and the ability to make informed guesses.
Inside the Innovation Engine
At bet365, the platform innovation function was established to provoke possibility. We’re a small, specialized team charged with exploring emerging and future technologies. Our aim is to identify where they can have the greatest impact, and help the wider organization understand how to use them meaningfully.
We’re enablers and ambassadors for change. Our work spans everything from product development and cybersecurity to the future of the workforce. Our guiding model is McKinsey’s Three Horizons of Growth reimagined for innovation. Horizon 1 focuses on what we can implement today. Horizon 2 explores what’s coming next. Horizon 3 dares us to consider the future no one is talking about yet.
This framework helps us balance ambition with pragmatism. It creates space to experiment without losing sight of operational value, and it ensures our developers, architects, and stakeholders are all part of the same conversation.
When GenAI Met Developers
When GPT-4 dropped in 2023, everything changed. Like most in the tech world, we were fascinated. Generative AI offered a tantalizing vision of the future filled with faster insights, instant summaries, and automated refactoring. But the excitement quickly gave way to doubt. We handed very capable developers a powerful LLM and said, “Go for it.” The results were mixed at best.
They inserted code into the prompt windows, stripped out context to save space, and hoped the AI would understand. It didn’t. Developers were confused, frustrated, and, understandably, skeptical. They saw the AI as a shortcut, not a partner, and when the output didn’t match expectations, frustration followed. Many asked the same question: “Why am I asking a machine to write code I could just write myself?”
What we learned was profound. The problem wasn’t the AI. It was the relationship between the AI and the person using it. We had assumed that skill in software engineering would automatically translate to skill in prompt engineering. It didn’t. Did we miss something? The point we couldn’t overlook was that, during the exercise, our developers were consistently completing tasks in around 80% of the estimated time. There was definitely something here. We just weren’t sure what it was. So, we went back to basics.
Vibe Coding and the Limits of Trust
There’s a new term in developer culture: “vibe coding.” It’s where you throw a chunk of code at an LLM, get a response, tweak it, throw it back. Iterate fast. Ship faster. It’s trendy. It’s seductive. But it isn’t risk free.
Without a clear understanding of intention or context, vibe coding can quickly become a game of trial and error. And when your system is as complex as ours – many databases processing 500,000 transactions a second – “trial and error” isn’t good enough. We needed more than vibes. We needed vision.
Context Over Content
The turning point came when we realized the real job wasn’t teaching AI how to write better code. It was teaching humans how to communicate with AI. We learned a new mantra: intention + context + detail. That’s what the AI needs. Not just content. Not just “fix this function.” But: “Here’s what this code does, here’s why it matters, and here’s what I need it to become.” This insight is key.
Our developers, especially those tackling the most complex, interdependent problems, adapted quickly. They were used to thinking deeply, providing rationale, and navigating ambiguity. They got it. They fed the AI what it needed. They flourished. The difference was mindset. We came to call this phenomenon “the unreliable narrator.” Not just the AI, but the developer. Because often, the problem wasn’t that the machine got it wrong. It was that we weren’t clear about what we were asking.
RAG, GraphRAG, and the Power of Grounded Context
To build reliable, human-aligned AI support, we needed a way to ground what the AI was seeing in fact. That’s where we saw the power of Retrieval-Augmented Generation (RAG). RAG allows an AI model to retrieve relevant context from an external source – like documentation, system metadata, or a knowledge base – before generating a response. It’s faster to implement and more flexible than fine-tuning, making it ideal for dynamic, domain-intensive environments like ours. Developers can update the knowledge base without retraining the model, keeping outputs current and grounded.
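In practice, the pattern is straightforward: retrieve the most relevant facts, then prepend them to the prompt. Below is a minimal Python sketch of that retrieve-then-generate loop, using plain text similarity for the retrieval step. The knowledge-base entries and object names are invented for illustration, and the final LLM call is left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge-base entries; the real source would be documentation,
# system metadata, or the knowledge graph described later in this piece.
knowledge_base = [
    "usp_SettleBet updates the Bets and Ledger tables inside one transaction.",
    "vw_OpenPositions joins Bets to Markets and is read by the risk dashboard.",
    "fn_GetOddsHistory is read-heavy and called constantly at peak load.",
]

vectorizer = TfidfVectorizer().fit(knowledge_base)
kb_vectors = vectorizer.transform(knowledge_base)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k knowledge-base entries most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), kb_vectors).flatten()
    return [knowledge_base[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    """Prepend retrieved context so the model answers from facts, not guesses."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context above."

print(build_prompt("Which tables does usp_SettleBet touch?"))
# The assembled prompt is then sent to whichever LLM sits behind the chat interface.
```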
But RAG has its limits. When a question spans multiple systems or requires reasoning across disconnected pieces of information, traditional RAG, which is based on text similarity, starts to falter. That’s why we turned to GraphRAG, a more advanced approach that uses a knowledge graph to enhance LLM outputs.
A knowledge graph doesn’t just hold facts, it encodes relationships. It captures how components interact, where dependencies lie, and what could break if you change something. GraphRAG uses this structure to augment prompts at query time, giving the AI the relational context it needs to answer with precision. That precision matters most in environments where accuracy is critical and hallucinations are unacceptable.
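To make the distinction concrete, here is a toy sketch of the GraphRAG idea: rather than matching text, we walk a small dependency graph and hand the relationships to the model as part of the prompt. The object names and edge types are illustrative, not our actual schema.

```python
import networkx as nx

# A toy dependency graph: procedures and views connected to the tables they touch.
graph = nx.DiGraph()
graph.add_edge("usp_SettleBet", "Bets", relation="WRITES")
graph.add_edge("usp_SettleBet", "Ledger", relation="WRITES")
graph.add_edge("vw_OpenPositions", "Bets", relation="READS")

def relational_context(obj: str) -> str:
    """Collect the facts the graph holds about one object and its neighbours."""
    facts = []
    for src, dst, data in graph.out_edges(obj, data=True):
        facts.append(f"{src} {data['relation']} {dst}")
    for src, dst, data in graph.in_edges(obj, data=True):
        facts.append(f"{src} {data['relation']} {dst}")
    return "\n".join(facts)

question = "What could break if I change the Bets table?"
prompt = f"Known relationships:\n{relational_context('Bets')}\n\n{question}"
print(prompt)
```

The retrieval step now returns structure rather than loosely similar text, which is what lets the model reason about blast radius instead of guessing at it.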
As a real-world exercise, we looked at our SQL Server estate. We wanted to build a system that would give us genuine insight into how that estate actually works.
To build it, we started by parsing all our database objects – tables, views, procedures, functions, and so on – into abstract syntax trees (ASTs). Using Microsoft’s ScriptDOM, we extracted key facts and used them to construct the initial knowledge graph. We overlaid this with natural language descriptions to further explain what each element did, and added runtime statistics like execution frequency, CPU time, and read volumes.
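Conceptually, each parsed object becomes a typed node and each reference becomes a typed edge, with descriptions and runtime statistics attached as properties. ScriptDOM itself is a .NET library, so as a rough stand-in the sketch below uses the open-source sqlglot parser to show the shape of the extraction; the object name, description, and statistics are made up for the example.

```python
import sqlglot
from sqlglot import exp

# Hypothetical object: the name, description, and runtime stats are invented.
object_name = "usp_SettleBet"
body = "INSERT INTO Ledger SELECT BetId, Stake FROM Bets WHERE Settled = 1"

nodes = {object_name: {"type": "procedure",
                       "description": "Settles a bet and posts the ledger entry",
                       "executions_per_hour": 120_000}}
edges = []

ast = sqlglot.parse_one(body, read="tsql")

# Tables that appear as INSERT targets are written to; everything else is read.
insert_targets = {t.name for ins in ast.find_all(exp.Insert)
                  for t in ins.this.find_all(exp.Table)}

for table in ast.find_all(exp.Table):
    nodes.setdefault(table.name, {"type": "table"})
    relation = "WRITES" if table.name in insert_targets else "READS"
    edges.append((object_name, relation, table.name))

print(edges)  # e.g. [('usp_SettleBet', 'WRITES', 'Ledger'), ('usp_SettleBet', 'READS', 'Bets')]
```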
The result was a rich, relational representation of our SQL estate, complete with contextual insights about how objects are consumed and how they interact. Then we surfaced this intelligence to developers through three core tools:
- A chatbot that lets users query the system in plain language
- A visualiser that renders a 3D map of dependencies and relationships
- A Cypher executor for advanced graph querying and analysis (sketched below)
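As a rough illustration of that third path, the sketch below runs a Cypher query through the standard Neo4j Python driver to answer a classic impact question: what reads from the tables a given procedure writes to? The connection details, labels, and relationship types are assumptions for the example, not our production schema.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Everything that reads from a table the given procedure writes to - in other
# words, the blast radius of changing that procedure.
query = """
MATCH (p:Procedure {name: $name})-[:WRITES]->(t:Table)<-[:READS]-(consumer)
RETURN t.name AS table, consumer.name AS consumer
"""

with driver.session() as session:
    for record in session.run(query, name="usp_SettleBet"):
        print(record["table"], "<-", record["consumer"])

driver.close()
```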
What’s important to note is that most of the system’s value lies in the graph, not the model. The AI doesn’t need to know everything. It just needs to know where to look, and how to ask the right questions. That’s the power of grounding.
For us, GraphRAG wasn’t just a nice-to-have, it became essential. It helped us move from generic code assistance to something far more valuable: a system that understands what our code means, how it behaves, and what it impacts.
We’re not just writing code anymore. We’re curating it. We’re shaping the intentions behind it. Our developers now have tooling that gives them the insight to act as code reviewers, system designers, and transformation agents at an expert level across huge, department-spanning architectures, all from a simple interface that accepts natural language queries. That’s the real shift. The future isn’t about AI doing our jobs. It’s about reimagining what the job is.
The success of our code modernization program has little to do with algorithms and everything to do with attitude. We had to unlearn old habits, rethink our relationship with code, and embrace a culture of curiosity. We had to stop asking AI for answers and start giving it the right questions. The technology was the easy part. The people part, now that was the real breakthrough.