
Most people who use banking apps never think about what happens behind the scenes when a transaction goes through. They tap a button, money moves, and that's that. But for the engineers responsible for making sure those transactions work reliably, the reality is considerably more complicated, particularly when bugs only reveal themselves under very specific conditions that no test environment ever anticipated.
Tanvi Mittal, a software quality engineering practitioner with 15 years of experience in enterprise financial systems, knows this problem intimately. She has spent much of her career building and leading test automation frameworks for large-scale banking applications, and over that time she noticed a pattern that kept repeating itself. Bugs that passed through every layer of testing (development, QA, and staging) would surface in production, often in ways that were difficult to trace and expensive to fix.
One incident in particular shaped her thinking. A transaction bug went undetected through the entire testing cycle and was eventually caught not by an automated alert or a monitoring tool, but by a bank teller during an actual customer interaction. The first two transactions in a sequence had worked fine. The third failed. It took days to diagnose. The bug only triggered under that specific sequence of events, at that volume, and no lower environment had ever come close to replicating it.
“The data kept showing the same pattern,” Mittal says. “Bugs were getting shipped into production that we simply couldn’t find in lower environments. Not because the team wasn’t doing their job, but because lower environments don’t behave like production.”
That experience, and others like it, led her to start thinking differently about where test coverage comes from. Requirements documents and manually written test plans reflect what engineers expect users to do. Production logs reflect what users actually do: every edge case, every unusual sequence, every failure mode that nobody thought to test for. The question Mittal kept coming back to was why those logs weren’t being used to drive test generation.
That question eventually became LogMiner-QA.
Building Something That Didn’t Exist
LogMiner-QA ingests raw application logs and uses AI and machine learning to automatically generate Gherkin test scenarios (the structured, human-readable format used by testing frameworks like Cucumber and Pytest-BDD) that can be fed directly into CI/CD pipelines. The idea is to take the behavioral intelligence already embedded in production logs and make it actionable for QA teams before the next release ships, rather than after something breaks.
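The article doesn't publish LogMiner-QA's internals, but the log-to-Gherkin idea can be illustrated with a minimal sketch. The function name, the event fields (`state`, `action`, `status`), and the scenario template here are all hypothetical, not the tool's actual schema:

```python
import json

def log_to_gherkin(event: dict) -> str:
    """Render one logged user action as a Gherkin scenario skeleton."""
    return "\n".join([
        f"Scenario: Replay observed {event['action']} flow",
        f"  Given a user session in state \"{event['state']}\"",
        f"  When the user performs \"{event['action']}\"",
        f"  Then the system should return status \"{event['status']}\"",
    ])

# A single (fictional) production log event
event = json.loads(
    '{"state": "authenticated", "action": "transfer_funds", "status": "declined"}'
)
print(log_to_gherkin(event))
```

The output is plain Gherkin text, which is what lets frameworks like Cucumber or Pytest-BDD pick the scenario up without any custom tooling on the consuming side.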
Getting there took longer than Mittal expected, and the challenges were less glamorous than the concept. The core difficulty was that production logs are not standardized. Every organization structures them differently. Field names vary; one system calls it “message,” another calls it “msg.” Timestamp formats differ. Some teams log at the transaction level, others at the session level. Building a tool that could reliably interpret logs across that kind of variability meant testing against a wide range of real log samples and iterating constantly.
“Every time I tested against a new log structure, something broke,” she says. “That was the unglamorous part of building this, not the AI, but the messy, inconsistent reality of how logs actually look in the wild.”
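The "message" vs. "msg" problem she describes is typically handled with an alias table that maps each organization's field names onto one canonical schema. A minimal sketch, with entirely hypothetical alias lists and timestamp handling (the tool's real mapping configuration is not described in detail):

```python
from datetime import datetime, timezone

# Hypothetical field-mapping config: canonical name -> aliases seen in the wild
FIELD_ALIASES = {
    "message": ["message", "msg", "log_message"],
    "timestamp": ["timestamp", "ts", "@timestamp", "time"],
}

def normalize(record: dict) -> dict:
    """Map an arbitrary log record onto the canonical schema."""
    out = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in record:
                out[canonical] = record[alias]
                break
    # Unify epoch-seconds timestamps into ISO-8601 UTC strings
    ts = out.get("timestamp")
    if isinstance(ts, (int, float)):
        out["timestamp"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return out

print(normalize({"msg": "tx declined", "ts": 1700000000}))
```

Everything downstream (enrichment, clustering, scenario generation) then only ever sees canonical field names, which is what makes supporting a new log source a configuration change rather than a code change.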
The tool handles this through flexible field mapping and configurable ingestion, supporting local JSON and CSV files as well as connectors to Elasticsearch and Datadog. Under the hood, it uses NLP enrichment with transformer embeddings, clustering, and an Isolation Forest anomaly scoring engine to identify unusual behavioral patterns. An LSTM-based journey analysis component reconstructs actual customer flows across sessions, surfacing sequences, like that three-transaction failure, that manual test design consistently misses.
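Isolation Forest is a standard scikit-learn estimator, so the anomaly-scoring step can be sketched in a few lines. The per-session feature set below (actions per minute, failure count, distinct endpoints) is invented for illustration; the article doesn't specify what features LogMiner-QA actually extracts:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-session features: [actions_per_minute, failures, distinct_endpoints]
rng = np.random.default_rng(0)
normal = rng.normal(loc=[10, 1, 5], scale=[2, 0.5, 1], size=(200, 3))
outlier = np.array([[90, 12, 40]])  # a burst of high-velocity, failing activity
sessions = np.vstack([normal, outlier])

model = IsolationForest(random_state=0).fit(sessions)
scores = model.decision_function(sessions)  # lower = more anomalous
flagged = model.predict(sessions)           # -1 marks anomalies

print(flagged[-1])  # the injected outlier session
```

Sessions flagged this way would then be candidates for journey reconstruction and test generation, rather than being treated as findings in themselves.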
The Privacy Problem Nobody Wanted to Talk About
When Mittal started talking to people about the tool, she ran into a reaction she had anticipated but still had to work through carefully. The moment she mentioned production logs, people got cautious. In a banking context, production logs contain real customer data: account numbers, transaction IDs, IBANs, and behavioral patterns that can be tied back to individuals. The idea of running those logs through any external tool raised immediate compliance concerns.
“Convincing people that putting production logs into the tool is safe was a cultural challenge as much as a technical one,” she says.
Her response was to make privacy the architectural foundation rather than a feature added on top. LogMiner-QA sanitizes logs before any analysis takes place, using pattern matching and spaCy-based named entity recognition to detect PII, redact sensitive fields, and replace them with stable tokens that preserve referential integrity without exposing underlying data. A differential privacy layer adds calibrated noise to aggregate metrics, making it computationally infeasible to reconstruct individual customer behavior from anonymized outputs. The tool runs on-premises, in containerized air-gapped environments, meaning logs never leave the organization’s own infrastructure.
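The "stable tokens that preserve referential integrity" idea can be sketched with a salted hash: the same account number always maps to the same token, so sequences remain joinable across log lines, but the raw value never survives. This sketch uses only a regex pattern (the tool also uses spaCy NER, which is omitted here), and the pattern, salt, and token format are all illustrative:

```python
import hashlib
import re

# Hypothetical pattern for IBAN-like account identifiers
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")

def stable_token(value: str, salt: bytes = b"per-deployment-secret") -> str:
    """Same input -> same token, so cross-log references stay joinable."""
    digest = hashlib.sha256(salt + value.encode()).hexdigest()[:10]
    return f"<ACCT_{digest}>"

def sanitize(line: str) -> str:
    """Redact account identifiers before any analysis sees the line."""
    return IBAN_RE.sub(lambda m: stable_token(m.group()), line)

a = sanitize("transfer from DE89370400440532013000 ok")
b = sanitize("retry for DE89370400440532013000 failed")
print(a)
print(b)  # both lines carry the same token for the same IBAN
```

The salt matters: without it, an attacker who knows a candidate account number could hash it themselves and confirm its presence in the sanitized logs.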
For compliance teams in regulated industries, that last point tends to end the conversation quickly, in a good way.
Closing the Coverage Blind Spot
Mittal initially scoped LogMiner-QA for banking, the domain she knew best and where the stakes around production failures are highest. But as the tool developed, she started to see the same underlying problem across other regulated industries: healthcare, insurance, and financial services more broadly. The gap between what test suites cover and what production does is not unique to banking. It is structural, and it exists wherever test design is driven primarily by requirements documents rather than observed user behavior.
The tool reflects that broader scope. Its compliance module generates PCI- and GDPR-aligned test scenarios. Its fraud detection module specifically targets velocity anomalies, high-value transaction flows, and failed login sequences, behaviors that are nearly impossible to replicate in lower environments without real production data as a reference point. A CI mode emits compact JSON summaries for pipeline gates, allowing teams to fail builds automatically when high-severity findings or anomaly thresholds are exceeded.
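A pipeline gate on such a summary is conceptually simple: parse the JSON, compare against thresholds, and exit nonzero to fail the build. The summary keys and threshold names below are assumptions for the sake of the sketch, not LogMiner-QA's documented output format:

```python
import json

# Hypothetical shape of the compact CI summary
summary = json.loads('{"high_severity_findings": 2, "anomaly_rate": 0.07}')

# Example gate thresholds a team might enforce
MAX_HIGH_SEVERITY = 0
MAX_ANOMALY_RATE = 0.05

def gate(s: dict) -> int:
    """Return a process exit code: nonzero fails the pipeline stage."""
    if s["high_severity_findings"] > MAX_HIGH_SEVERITY:
        return 1
    if s["anomaly_rate"] > MAX_ANOMALY_RATE:
        return 2
    return 0

print(gate(summary))
```

In a real pipeline the return value would be passed to `sys.exit()`, which is what lets CI systems like Jenkins or GitHub Actions treat a threshold breach as a failed build.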
LogMiner-QA is open source under the MIT license and available at github.com/77QAlab/LogMiner-QA. Mittal is looking for early adopters from banking and enterprise QA teams willing to test it against real log diversity, the same variability that made building it genuinely difficult. Planned additions include Splunk and CloudWatch connectors, a risk visualization dashboard, and more sophisticated fraud detection models.
For Mittal, the motivation behind all of it remains the same as it was when a bank teller caught a bug that an entire test cycle had missed. Production already knows what your test suite doesn’t. The question is whether you’re paying attention.
