Published: April 2, 2025
Updated: September 13, 2025
“The most successful businessman is the man who holds onto the old just as long as it is good, and grabs the new just as soon as it is better.” – Lee Iacocca
Before, during, and after our recent webinar with Jason Arbon, I’ve been thinking about AI’s role in software development. Every day, there are claims about how AI will transform our lives, while others argue it’s just hype. It made me wonder—does the “Who Moved My Cheese” paradigm apply here? Should those who think AI is not practical for software testing stay in their cave and keep eating what’s left of the old cheese? Or are we looking for the holy grail, making AI in software testing futile?
As I mentioned in our webinar, the iceberg metaphor perfectly illustrates where we stand with AI. We now see only 20% of what AI can do for us. The remaining 80% is beneath the surface, waiting to be uncovered. And if anyone has a glimpse into what’s ahead, it’s Jason Arbon!
Software engineers know that the same quality challenges have existed for over a decade.
The challenge with AI in software testing is that we’re often focused on doing what we already do—just faster. We create test cases, so we want AI to generate them faster. We execute regression testing, so we want AI to accelerate the process. However, approaching testing with an AI perspective needs a new paradigm.
What would the world be like if no test cases were required? Do we need them to find defects? Maybe not! However, defects still need to be hunted down and reported efficiently so they can be fixed, whether by a machine or a human.
We received and answered some great questions during our AI-Informed Testing webinar and wanted to follow up here with more detail. Several questions touched on similar themes, so we've grouped them for a clear breakdown.
With the newest features and functionality from testers.ai, the answer is YES. Not only does it run checks from AI-generated test cases, but it also lets you write specific workflows to move through your application. There are many prompt-customization options to guide the AI. You could provide a list of things you want done, tell it to ignore accessibility issues, or tell it you want to ship tomorrow so it should only report ship blockers. It will then run and take those prompts into account.
You can also add individual test cases, one-off prompts, or load a file to target certain features or functions. The input can be complex; the AI will parse it into multiple tests if there is more than one, then execute them on the spot.
You can tell it to test the shopping cart with specific instructions, such as updating the shipping address every time, never updating it, or other conditionals. This is a prompt-engineering skill: if you ask it to do something impossible or nonsensical, it won't work. Generally speaking, the AI will try to accomplish most of the arbitrary tests you define.
Testers.ai will run on any website, can connect to any browser (even with profile state), supports Windows, Mac, and Linux, and can run locally and behind a firewall. For complex scenarios or state requirements, you can connect to a live browser. With a single command, the AI will connect and run over 731 tests in whatever context you have set up. It only tests web-based apps and uses the browser's responsive mode to check for mobile compatibility.
These days it's fun because Jason Arbon can say yes to many things! The first run generates the test plan and test cases; subsequent runs include the cached version of the test suite to offer build-over-build comparisons of the same features and functionality, including previous test cases by default. For every agent, you get the bugs it found and its user-persona feedback, siloed into separate JSON files. So, if you only care about security or one type of testing, you can pull those results out and load them into the test management tool of your choice.
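Because each agent's findings land in plain JSON, pulling out one category is a simple filter. A minimal sketch below, assuming a hypothetical schema with `category`, `title`, and `severity` fields (these names are illustrative, not testers.ai's actual output format):

```python
import json

# Hypothetical example of one agent's output file. The field names
# ("category", "title", "severity") are assumptions for illustration,
# not testers.ai's actual schema.
agent_output = json.loads("""
[
  {"category": "security", "title": "Login form sent over HTTP", "severity": "high"},
  {"category": "accessibility", "title": "Missing alt text on logo", "severity": "low"},
  {"category": "security", "title": "Session cookie lacks HttpOnly flag", "severity": "medium"}
]
""")

def filter_bugs(bugs, category):
    """Keep only the findings for one testing category."""
    return [b for b in bugs if b["category"] == category]

security_bugs = filter_bugs(agent_output, "security")
for bug in security_bugs:
    print(f"[{bug['severity']}] {bug['title']}")
```

The filtered list could then be posted to whatever test management or defect tracking tool your team already uses.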
Another remarkable feature is visual diffing, which compares screenshots from previous builds with the current build. It reports any significant changes, such as additions, removals, or modifications, as potential issues or bugs in a JSON file. However, to get the full value, you have to test multiple builds, one after another, as the diffing bot compares each build against the previous one.
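The core idea behind build-over-build visual diffing can be sketched in a few lines. This toy version assumes screenshots are already decoded into grayscale pixel grids and uses an arbitrary 5% change threshold; a real pipeline would decode actual PNGs and use far more sophisticated comparison:

```python
# Minimal sketch of build-over-build visual diffing. Screenshots are
# represented as grayscale pixel grids; the 5% threshold is an assumption.
def diff_ratio(prev, curr):
    """Fraction of pixels that differ between two same-sized screenshots."""
    total = len(prev) * len(prev[0])
    changed = sum(
        1
        for row_a, row_b in zip(prev, curr)
        for a, b in zip(row_a, row_b)
        if a != b
    )
    return changed / total

# Two 4x4 "screenshots": the newer build changes a 2x2 region.
build_n = [[0, 0, 0, 0] for _ in range(4)]
build_n_plus_1 = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]

ratio = diff_ratio(build_n, build_n_plus_1)
# Flag as a potential issue only if the change is significant.
if ratio > 0.05:
    print(f"Potential visual regression: {ratio:.0%} of pixels changed")
```

This also makes clear why multiple builds are needed: with only one build there is no previous screenshot to diff against.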
All results are saved out unfiltered and un-triaged into JSON files for analysis and reporting or to load into any test management/defect tracking tool. The AI Test Manager also validates and triages results to generate an HTML summary report that you can print/download as a PDF.
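To illustrate the pipeline from raw JSON to a printable summary, here is a hedged sketch: it assumes a hypothetical results schema with `title` and `verdict` fields and a deliberately minimal HTML layout, neither of which reflects the actual AI Test Manager report:

```python
import json

# Hypothetical triaged results. The "verdict" field models the Test
# Manager's validation step; schema and layout are illustrative only.
results = json.loads("""
[
  {"title": "Checkout button unresponsive", "verdict": "confirmed"},
  {"title": "Footer link color contrast", "verdict": "rejected"},
  {"title": "Cart total miscalculated", "verdict": "confirmed"}
]
""")

# Only confirmed issues make it into the summary report.
confirmed = [r for r in results if r["verdict"] == "confirmed"]
rows = "".join(f"<li>{r['title']}</li>" for r in confirmed)
html_report = (
    "<html><body>"
    f"<h1>Test Summary</h1><p>{len(confirmed)} confirmed issue(s)</p>"
    f"<ul>{rows}</ul>"
    "</body></html>"
)
print(html_report)
```

The resulting HTML can then be printed or saved as a PDF, mirroring the report-export step described above.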
“Great questions, it is something to worry about. First of all, what’s interesting is that in the last 18 months, hallucination rates have dropped dramatically in LLMs. So that helps, that’s a nice curve. What we do in the Enterprise version is, if one LLM finds a bug, for example, we use another LLM to cross-check and validate it. Sounds familiar, like humans, right? But we use different LLM brains to cross-check and validate them, kind of eliminating any LLM-specific bias.” – Jason Arbon (testers.ai)
The Test Manager reviews all bugs found, second-guessing them with a skeptical mind, and judges the probability that each is a legitimate issue worthy of interrupting the team’s flow. Here at XBOSoft, we review the output, decide whether it is a valid defect, then triage it before giving anything to our clients.
“Testers.ai is continuously learning, so it will constantly reduce hallucinations in the form of test cases and results that aren’t valid over time and at an accelerating rate. Additionally, this is part of XBOSoft’s services, validating all defects. This ensures you can get the benefits from AI with an added human eye (I) as well. The next level would be triaging them and digging deeper, figuring out their priority and what causes the issue. Instead of just providing a bunch of defects, we prioritize and vet them, confirming they’re real, to get rid of the hallucinations.” – Philip Lew (XBOSoft)
Manual testing and driving won’t disappear entirely, as we may never fully trust self-driving cars. But the most repetitive, mundane tasks will vanish. Clicking buttons to check functionality will go the way of cashiers, replaced by AI, just as kiosks now handle checkouts.
Some software testing jobs, such as manual testing and rudimentary script-driven automation, will evolve or fade. AI already handles many of these tasks, with increasing capabilities, but humans must still guide it. Today, that’s called prompt engineering.
The role of the software tester is changing. What comes next? I think only Jason Arbon knows.
Looking for more insights on Agile, DevOps, and quality practices? Explore our latest articles for practical tips, proven strategies, and real-world lessons from QA teams around the world.