AI-Based Test Automation Without AI

Published: April 16, 2020

Updated: September 13, 2025

Watch the Recording

Prefer video? You can watch this session where we walk through the key ideas covered in this article.
Jeremias Rößler

Rethinking How We Approach Quality

Artificial intelligence has been one of the most discussed themes in software testing over the last five years. Vendors highlight tools that claim to generate tests automatically, heal broken scripts, and interpret results with little human oversight. For many QA teams, the promise is appealing: reduce effort, accelerate releases, and keep up with the pace of modern development.

But there is a problem. Much of what is presented as “AI-based testing” either does not work in practice or creates a new burden in maintenance and oversight. In reality, testing organizations still struggle with fragile automation, high costs of upkeep, and the persistent difficulty of deciding what “correct” behavior actually looks like in a system.

At XBOSoft’s quarterly webinar, guest speaker Dr. Jeremias Rößler, creator of Recheck-web, joined CEO Philip Lew to ask a provocative question: What if the value promised by AI could be achieved without AI at all? The session challenged assumptions about test automation and offered a refreshingly grounded perspective for QA leaders who want results without chasing hype.

The Allure and the Limits of AI in Testing

Dr. Rößler began with a point that resonated with many attendees: generating hundreds of additional AI-created tests does not matter if a team cannot maintain the scripts it already has. Automation often breaks when interfaces change, when locators are updated, or when new versions of software shift workflows. Most QA groups struggle with keeping tests stable, not with creating new scripts.

This leads directly to the oracle problem, one of the most fundamental challenges in software testing. In simple terms, the oracle problem asks: how do you know if the result produced by the system under test is the correct one? Human testers apply context and judgment to decide whether a screen looks right or a workflow behaves properly. AI, despite its name, cannot yet make these determinations reliably.

As a result, AI is often best suited for narrow tasks such as:

  • Suggesting potential test inputs based on past data.
  • Highlighting areas of the UI that change frequently.
  • Assisting with regression runs by clustering similar outcomes.

But when it comes to validating business logic, interpreting nuanced requirements, or confirming user expectations, AI falls short.

A Different Approach: Difference Testing and Golden Masters

Instead of leaning on machine learning models, Dr. Rößler introduced an approach rooted in difference testing and golden master testing. These techniques sidestep some of the core limitations of AI and address the real pain points in UI test automation.

Difference Testing

Difference testing works by capturing a complete state of the user interface at a given point in time and comparing it to a trusted baseline. Rather than writing assertions for each individual element, testers evaluate the differences between the two states. Changes that are expected or inconsequential can be filtered out, while meaningful deviations are flagged.

This reduces the effort required to script detailed expectations and dramatically lowers maintenance costs. Instead of rewriting assertions when a button shifts position or a font changes slightly, teams focus only on differences that matter for functionality and user experience.
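
The idea can be illustrated with a minimal Python sketch. It is not how Recheck-web is implemented; it assumes a hypothetical representation in which a captured UI state is a dictionary mapping element identifiers to their attributes. The function name `diff_states` and the sample data are invented for illustration.

```python
def diff_states(baseline, current):
    """Return every attribute-level difference between two captured UI states.

    Each state is assumed (hypothetically) to be a dict of the form
    {element_id: {attribute_name: value}}.
    """
    differences = []
    for element_id in sorted(baseline.keys() | current.keys()):
        old = baseline.get(element_id)
        new = current.get(element_id)
        if old is None or new is None:
            # The whole element was added or removed.
            differences.append((
                element_id,
                "<element>",
                "missing" if old is None else "present",
                "missing" if new is None else "present",
            ))
            continue
        for attribute in sorted(old.keys() | new.keys()):
            if old.get(attribute) != new.get(attribute):
                differences.append(
                    (element_id, attribute, old.get(attribute), new.get(attribute))
                )
    return differences


# Illustrative data only: a trusted baseline and a newly captured state.
baseline = {
    "login-button": {"text": "Log in", "enabled": True, "x": 120},
    "welcome-banner": {"text": "Welcome back"},
}
current = {
    "login-button": {"text": "Sign in", "enabled": True, "x": 124},
    "welcome-banner": {"text": "Welcome back"},
}

for difference in diff_states(baseline, current):
    print(difference)
```

No assertion names a specific element here. The test compares everything that was captured and reports a list of differences, which a tester then reviews.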

Golden Master Testing

Golden master testing complements this by recording a reference output, the “golden master,” that represents known-good behavior. Future runs of the test compare the application against this golden master and highlight any changes.

The benefit is twofold. First, testers are freed from predicting every possible expected output in advance. Second, teams gain a mechanism to validate whether new changes align with known good behavior, without reinventing the test suite each time.
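
The record-then-compare cycle might look like the following Python sketch. It assumes the output under test can be serialized as JSON and stored in a file; the function name `check_against_golden_master` and the `.golden.json` file suffix are hypothetical choices for this example, not part of any tool mentioned in the session.

```python
import json
import tempfile
from pathlib import Path


def check_against_golden_master(name, actual, directory):
    """Record `actual` on the first run; compare against the record afterwards."""
    master_file = Path(directory) / f"{name}.golden.json"
    if not master_file.exists():
        # First run: nobody predicted the output, it is simply recorded.
        master_file.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return "recorded"
    expected = json.loads(master_file.read_text())
    return "match" if expected == actual else "changed"


# Illustrative run in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as workdir:
    first = check_against_golden_master(
        "invoice", {"total": "42.00", "currency": "EUR"}, workdir)
    second = check_against_golden_master(
        "invoice", {"total": "42.00", "currency": "EUR"}, workdir)
    third = check_against_golden_master(
        "invoice", {"total": "43.00", "currency": "EUR"}, workdir)

print(first, second, third)
```

In practice the recorded file would be reviewed once by a person and kept under version control, so that the "known good" label reflects a human decision rather than whatever the first run happened to produce.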

Noise Filtering

To make this approach practical, tools like Recheck-web incorporate noise filters. These allow teams to exclude transient or irrelevant changes, such as session IDs, timestamps, or minor visual adjustments. By removing the noise, testers can concentrate on issues that genuinely impact quality.

The result is automation that is more resilient, easier to maintain, and closer to the way humans naturally evaluate systems.
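
A noise filter of the kind described above can be sketched in a few lines of Python. The sketch assumes differences are reported as `(element_id, attribute, expected, actual)` tuples. The ignored attribute names and the two-pixel tolerance are illustrative assumptions, not Recheck-web's actual filter syntax or defaults.

```python
# Hypothetical filter settings for this example.
IGNORED_ATTRIBUTES = {"session-id", "timestamp"}
POSITION_ATTRIBUTES = {"x", "y"}
PIXEL_TOLERANCE = 2


def is_noise(difference):
    """Decide whether one reported difference is transient or cosmetic."""
    _element_id, attribute, expected, actual = difference
    if attribute in IGNORED_ATTRIBUTES:
        return True
    if (attribute in POSITION_ATTRIBUTES
            and isinstance(expected, (int, float))
            and isinstance(actual, (int, float))):
        # Treat tiny layout shifts as a minor visual adjustment.
        return abs(actual - expected) <= PIXEL_TOLERANCE
    return False


def filter_noise(differences):
    """Keep only the differences a tester should actually look at."""
    return [d for d in differences if not is_noise(d)]


# Illustrative differences from a single test run.
reported = [
    ("header", "timestamp", "10:01", "10:07"),
    ("login-button", "x", 120, 121),
    ("login-button", "text", "Log in", "Sign in"),
    ("cart", "session-id", "a1", "b2"),
]
relevant = filter_noise(reported)
print(relevant)
```

Keeping the filter rules as plain data makes them reviewable: anyone on the team can see exactly which changes are being suppressed and why.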

Why This Matters for QA Leaders

The discussion in the webinar underscored a larger truth for decision makers: effective automation is less about chasing new technology and more about rethinking the foundation of test design.

Consider the typical promises of AI in QA. Vendors highlight self-healing locators, automatic test generation, and predictive analytics. Yet many of these features create dependencies on opaque models and can make test suites harder to trust or debug. When something breaks, teams are left wondering whether the AI made a mistake or the system truly failed.

By contrast, difference testing and golden master approaches are transparent. Testers can see exactly what has changed and decide whether the change is acceptable. Maintenance becomes a matter of updating baselines rather than rewriting large sets of brittle assertions.
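
To make "updating baselines" concrete, here is a small Python sketch of accepting reviewed changes. It assumes a baseline stored as `{element_id: {attribute: value}}` and differences given as `(element_id, attribute, expected, actual)` tuples. Both shapes and the function name `accept_differences` are hypothetical.

```python
import copy


def accept_differences(baseline, accepted):
    """Return a new baseline with the reviewed differences applied.

    The original baseline is left untouched so the change can be inspected
    before it replaces the stored version.
    """
    updated = copy.deepcopy(baseline)
    for element_id, attribute, _expected, actual in accepted:
        updated.setdefault(element_id, {})[attribute] = actual
    return updated


# Illustrative data: a tester approves the new button label.
old_baseline = {"login-button": {"text": "Log in", "enabled": True}}
approved = [("login-button", "text", "Log in", "Sign in")]
new_baseline = accept_differences(old_baseline, approved)
print(new_baseline)
```

Every test that compares against this baseline picks up the approved change at once, instead of each script needing its own assertion rewritten.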

For QA leaders under pressure to deliver stability without inflating costs, this clarity is valuable. It provides a path to scale automation while keeping it manageable.

Where AI Still Fits

Dr. Rößler was clear: AI is not irrelevant. It has a role in augmenting specific parts of the testing process. For example:

  • Generating input data where realistic datasets are scarce.
  • Exploring workflows by simulating user interactions at scale.
  • Clustering results to help teams identify patterns across large regression runs.

These are tasks where AI can accelerate human effort rather than attempt to replace it. But the core responsibility of deciding correctness, validating against requirements, and interpreting user impact remains with human testers.

This distinction is critical for organizations evaluating where to invest. AI can enhance QA, but it cannot carry the discipline alone.

Practical Applications of Difference Testing

The webinar included demonstrations of how Recheck-web applies these ideas in practice, particularly within Selenium and Java-based UI tests.

One example involved a typical web application with frequent UI updates. Traditional automation scripts failed whenever developers made even minor front-end adjustments. By applying difference testing, the team could capture full-page states, highlight only meaningful changes, and ignore cosmetic adjustments. This reduced the maintenance burden and allowed testers to focus on validating business-critical workflows.

Another example showed how noise filters could be configured to exclude random IDs generated at runtime. Without filters, these would appear as constant test failures. With filters in place, the tests ran cleanly, surfacing only substantive differences.
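
One way to picture such a filter is to normalize generated identifiers before comparing them. The Python sketch below assumes, purely for illustration, that runtime IDs consist of a stable prefix, a hyphen, and at least eight lowercase hexadecimal characters. Real applications will need a pattern that matches their own ID scheme, and this is not Recheck-web's configuration format.

```python
import re

# Assumed ID shape for this example, e.g. "row-3f9a1c2e".
RANDOM_SUFFIX = re.compile(r"-[0-9a-f]{8,}$")


def normalize_id(value):
    """Replace a generated suffix with a fixed placeholder."""
    return RANDOM_SUFFIX.sub("-<generated>", value)


def ids_differ(expected_ids, actual_ids):
    """Compare two ID lists while ignoring the generated suffixes."""
    return ([normalize_id(v) for v in expected_ids]
            != [normalize_id(v) for v in actual_ids])


# Two runs of the same page: only the random suffix changed.
run_one = ["row-3f9a1c2e", "submit-button"]
run_two = ["row-9b7e44d0", "submit-button"]
print(ids_differ(run_one, run_two))
```

Normalizing rather than dropping the ID means a genuine change, such as a row turning into a different kind of element, still shows up as a difference.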

For organizations with large regression suites, these capabilities directly translate to saved time and higher confidence in results.

Lessons for QA Strategy

Several strategic lessons emerged from the session that extend beyond the tool itself:

  1. Maintenance is the true cost driver in automation. Creating tests is easy compared to keeping them stable over time. Approaches that reduce maintenance overhead deliver the most sustainable value.
  2. Transparency builds trust. Teams need to understand why a test passes or fails. Techniques that provide clear, inspectable differences are more reliable than opaque AI-generated results.
  3. Context matters. Automation must be tailored to the workflows, environments, and user expectations of each organization. There is no universal model that captures every scenario.
  4. AI is a complement, not a substitute. Organizations should use AI where it accelerates human judgment but avoid relying on it to make correctness decisions.

Implications for QA Investment

For executives and product leaders, the broader implication is this: even as AI advances, the need for structured QA does not diminish. If anything, the complexity of modern systems makes independent validation more critical.

Automation without a clear QA strategy risks producing an illusion of coverage. Hundreds of AI-generated tests mean little if they cannot be maintained or trusted. The foundation of software quality remains human judgment, supported by tools that extend capability rather than obscure it.

This is why organizations asking, “Do we really need QA?” should consider sessions like this carefully. AI may change the tools, but it does not remove the responsibility to ensure reliability, compliance, and user trust.

Building Confidence Without Chasing Hype

The session closed with a reminder that teams do not need to wait for future AI breakthroughs to improve their automation today. Methods such as difference testing are available now, compatible with existing frameworks, and deliver tangible reductions in effort.

For QA leaders, the choice is not between AI and no AI. The real choice is between sustainable, transparent automation and fragile, hard-to-maintain suites that collapse under change. XBOSoft’s role is to help organizations make the former choice, applying proven methods while keeping a clear eye on new technologies as they mature.

The XBOSoft Perspective

Over the past two decades, we have watched AI rise from theory to everyday tool in software development and testing. Clients often ask whether it will replace QA, and our answer is always grounded in practice. Technology can automate parts of the testing process, but it cannot replace the steady assurance that comes from experienced judgment and structured methods.

Our approach is to focus on outcomes rather than features. When clients explore automation, we guide them toward approaches that reduce fragility, improve maintainability, and provide clarity when things go wrong. Sometimes that involves AI-driven tooling, and sometimes it involves simpler, more transparent methods like difference testing. What matters is not the label on the tool, but whether it helps deliver software that users can trust.

This mindset has allowed our teams to partner with organizations in regulated and high-stakes industries where mistakes are costly. By embedding QA as part of the process rather than an afterthought, we help clients avoid hype and build sustainable strategies for the future.

Next Steps

Revisit QA’s role in an AI-driven landscape
See how automation and AI fit into a broader quality strategy.
Explore Do You Really Need QA?

Shape an automation strategy that lasts
Work with XBOSoft to identify approaches that balance speed, cost, and trust.
Contact XBOSoft

Access guidance on testing partner evaluation
See strategies drawn from real experience to choose partners who match your needs.
“Questions To Evaluate Software Testing Partners” White Paper
