Small Teams and Automated QA: How AI Agents Work Through Your Checklists

Friday, 12:30 p.m. Bernd (fictitious name) works on a six-person software team. Today is release day. The tickets are done, the features merged. What is missing: a systematic test. But who is going to do that? There is no QA department. The unspoken rule is: whoever wrote the code clicks through it once. Happy path. Looks good. Ship it.
Monday morning, a customer reaches out. The order form accepts negative quantities. If someone orders “minus five,” the order goes through anyway. Embarrassing. Preventable. And expensive: a hotfix on Monday morning, customer communication, damaged trust. A bug that a single input validation check would have caught.
Why QA Often Falls Through the Cracks in Small Teams
In small teams, there is no dedicated QA staff. No budget, no role, no process. Testing happens on the side, and “on the side” means in practice: everyone tests the happy path of their own change.
What falls through the cracks is predictable: edge cases, error handling, responsive checks, boundary validation. Not because the knowledge is missing, but because the capacity is missing. Six developers who build features, fix bugs, and support customers simply do not have the time to systematically test every screen against 30 failure scenarios.
The result is familiar to anyone who has worked in a team like this: bugs in production that were preventable. Not the subtle, architectural problems. The obvious ones: a form field that accepts negative numbers, a button unreachable on mobile, an error message that says “Error 500” instead of telling the user what happened.
AI Agents Can Test
In my article “From Prompts to Agents”, I describe how AI language models have evolved from chatbots into autonomous agents. A key point: agents have tools. They can do more than generate text; they can interact with the real world.
For QA, this means something concrete: AI agents have browser access. They can open pages, fill out forms, click buttons, take screenshots, and evaluate the results. And the crucial part: you can give them structured checklists. Not “test the app,” but “check every form against these 50 points and document every finding with severity and screenshot.”
The result is not a vague report, but a structured list of findings that a developer can act on immediately.
How Is This Different from Playwright and Cypress?
Automated UI testing has been around for years. Playwright, Cypress, Selenium: proven frameworks for end-to-end tests. If you are using them, you have already taken a major step. But these tests have a systemic limitation: they are deterministic. They execute exactly the steps that were written. click('#submit'), expect(page).toHaveURL('/success'). Anything not in the test script goes unchecked.
This means: when you build a new screen, you need to write new tests. An edge case you did not think of will never be tested for. And the number of tests a small team can maintain is limited.
AI agents work differently. An agent understands what a form is. It recognizes input fields, understands their purpose (name, email, quantity), and independently determines which inputs would be valid, borderline, or malicious. It does not need a test script per screen, but a blueprint: “Check every form against these 50 attack vectors.”
The difference in practice: Playwright tells you “Button X leads to page Y.” An AI agent tells you “The quantity field accepts -5, and the form is submitted anyway. This is a business logic error with severity High.” That is a completely different level of insight.
Classical end-to-end tests remain valuable. For regression testing and the CI pipeline, they are indispensable. But AI agents add a dimension that was previously reserved for manual testers: context-aware, exploratory testing.
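The input taxonomy described above can be sketched as plain logic. The field kinds and probe values here are illustrative assumptions, not a real agent API; a real agent derives them from the page itself.

```typescript
// Sketch: derive valid, borderline, and malicious probe values for a field.
// Field kinds and probe lists are illustrative assumptions, not an actual agent API.
type FieldKind = "quantity" | "email" | "name";

interface ProbeSet {
  valid: string[];
  borderline: string[];
  malicious: string[];
}

function probesFor(kind: FieldKind): ProbeSet {
  switch (kind) {
    case "quantity":
      return {
        valid: ["1", "10"],
        borderline: ["0", "-5", "999999999"],          // boundary and negative values
        malicious: ["NaN", "1e308", "1; DROP TABLE"],  // overflow and injection attempts
      };
    case "email":
      return {
        valid: ["user@example.com"],
        borderline: ["a@b", "user+tag@example.com"],
        malicious: ["<script>alert(1)</script>@x.com"],
      };
    case "name":
      return {
        valid: ["Bernd"],
        borderline: ["", "X".repeat(10_000)],          // empty and extreme length
        malicious: ["Robert'); DROP TABLE users;--"],
      };
  }
}
```

The point is not the specific values but that the taxonomy generalizes: once an agent knows a field is a quantity, the whole probe set applies without a per-screen script.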
QA Is Not a Luxury. It Is a Checklist.
When you look at what QA actually means day to day, something stands out: a large portion of the work is systematically working through known checkpoints. Not creative problem-solving, but structured list execution.
- Forms: Leave required fields empty, overly long inputs, special characters, injection attempts
- Responsive: Mobile, tablet, desktop, various screen sizes
- Error handling: Network errors, missing data, timeouts
- Business logic: Negative quantities, discounts over 100%, boundary values, null values
All of this can be expressed as a checklist. And that foundational work is exactly what can be automated.
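As a minimal sketch, such a checklist is nothing more than structured data an agent can iterate over. The categories and wording below are my own illustrative choices, not a fixed schema.

```typescript
// Sketch: a QA checklist as structured data (categories and wording are illustrative).
interface Checkpoint {
  id: number;
  category: "forms" | "responsive" | "errors" | "business-logic";
  instruction: string;
}

const checklist: Checkpoint[] = [
  { id: 1, category: "forms", instruction: "Submit with all required fields empty." },
  { id: 2, category: "forms", instruction: "Enter special characters and injection payloads in every text field." },
  { id: 3, category: "responsive", instruction: "Verify the layout at 375px, 768px, and 1440px widths." },
  { id: 4, category: "errors", instruction: "Trigger a network failure mid-submit and check the error message." },
  { id: 5, category: "business-logic", instruction: "Enter negative quantities and discounts over 100%." },
];
```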
This does not mean QA work is simple. It means the systematic baseline work can be automated, so that developers (who wear the QA hat in small teams) can focus on the truly tricky cases: unexpected usage scenarios, domain-specific edge cases, UX intuition.
Here is what makes it powerful: precisely because AI agents stubbornly and completely work through every checklist on every screen, their sheer combinatorial coverage catches things manual testing misses. A blueprint of 50 checkpoints, consistently applied across 30 forms, uncovers bugs that nobody finds because nobody has the patience to test checkpoint 47 on form 28.
The result is not an either-or. Automation and human expertise together are stronger than either side alone. It is support, not replacement.
Two Agents, Two Perspectives
In practice, a pattern has proven effective: instead of a single “test everything” agent, I work with two complementary roles. Each role takes a different perspective on the application, and together they cover a broad spectrum.
The Adversarial Tester: Breaking Things
The first role is the attacker. This agent gets a clear mandate: try to break the application. Find weaknesses that a malicious user or a careless operator could exploit.
Its checklist typically covers:
- Input abuse: XSS attempts, SQL injection, extreme character lengths, special characters in every field
- State manipulation: Parallel tabs with the same session, back/forward navigation mid-process, session expiry during input
- Authorization bypass: URL manipulation, direct API calls without login, accessing other users’ resources by guessing IDs
- Visual breakage: Extreme data (very long names, empty fields, thousands of entries), display at different font sizes
- Business logic abuse: Negative values, boundary conditions (0, maximum + 1), discounts over 100%, double submission
The result is a structured report: every finding with severity (Critical, High, Medium, Low), reproduction steps, and, when possible, a screenshot.
The Exploratory Tester: Like a First-Time User
The second role is the opposite: an agent that uses the application for the first time. No prior knowledge, no assumptions. Its mandate: go through the app like someone who just opened it, and document everything that is unclear, broken, or frustrating.
Its checklist covers:
- Navigation: Are all links correct? Are there dead ends? Can you find your way back?
- First-time use: Is it clear what to do? Are forms labeled? Is there guidance?
- Error behavior: What happens when errors occur? Are the messages helpful or cryptic?
- Responsiveness: Does everything work on mobile? Are buttons reachable? Does the layout break?
The result is a usability report: less technical, more focused on the user experience. Exactly the kind of feedback that usually falls through the cracks in daily work, because developers know their own app too well to see it with fresh eyes.
What a Report Looks Like
A typical finding from the adversarial test might look like this:
Finding: Negative Quantity in Order Form
Severity: High
Page: /order
Description: The “Quantity” field accepts negative values. When entering “-5,” the form submits without an error message. The order appears in the system with a negative quantity.
Reproduction: 1. Open order form. 2. Select product. 3. Enter “-5” in the Quantity field. 4. Click “Order.” 5. Form submits successfully.
Recommendation: Client-side and server-side validation: quantity must be a positive integer (minimum 1).
Exactly the bug from Bernd’s Friday release. A systematic checklist that tests boundary values on input fields would have found it in seconds. Before it reached the customer.
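A finding like this is easy to keep machine-readable, so reports can be filtered and tracked instead of read once and forgotten. The field names in this sketch are my assumption, not a standard format.

```typescript
// Sketch: the finding above as a typed record (field names are an assumption).
type Severity = "Critical" | "High" | "Medium" | "Low";

interface Finding {
  title: string;
  severity: Severity;
  page: string;
  description: string;
  reproduction: string[];
  recommendation: string;
}

const negativeQuantity: Finding = {
  title: "Negative Quantity in Order Form",
  severity: "High",
  page: "/order",
  description: "The Quantity field accepts negative values; entering -5 submits without an error.",
  reproduction: [
    "Open order form.",
    "Select product.",
    "Enter -5 in the Quantity field.",
    "Click Order.",
    "Form submits successfully.",
  ],
  recommendation: "Validate client- and server-side: quantity must be a positive integer (minimum 1).",
};
```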
From Checklist to Report: The Workflow
The practical process can be summarized in four steps:
- Define quality criteria as a checklist. What should be tested? Input validation, responsiveness, error handling, business logic. The checklist does not need to be perfect; it just needs to exist.
- Give the checklist to the agent as a structured instruction. The agent receives the checklist along with the application URL and works through it point by point.
- Agent works through it and documents findings. Every finding is documented with severity, description, and reproduction steps. The report is immediately actionable.
- Human reviews, prioritizes, and decides. The agent finds, the human evaluates. Not every finding is a bug. Not every bug needs an immediate fix. The decision stays with the team.
The key point: the human stays in the decision loop. The agent is a tool that systematically checks and documents. The assessment, prioritization, and decision about what gets fixed when remains with the team.
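Step two of the workflow, handing the checklist to the agent, can be sketched as plain string assembly. The brief format and the staging URL are hypothetical; the surrounding agent and browser tooling is assumed, not shown.

```typescript
// Sketch: turning a checklist into a structured agent instruction.
// Pure string building; the agent/browser tooling around it is assumed, not shown.
interface Check {
  id: number;
  instruction: string;
}

function buildAgentBrief(appUrl: string, checks: Check[]): string {
  const lines = [
    `Target application: ${appUrl}`,
    "Work through every checkpoint below on every screen you can reach.",
    "For each finding, report: severity (Critical/High/Medium/Low), description,",
    "reproduction steps, and a screenshot where possible.",
    "",
    ...checks.map((c) => `${c.id}. ${c.instruction}`),
  ];
  return lines.join("\n");
}

// Hypothetical usage with a staging URL:
const brief = buildAgentBrief("https://staging.example.com", [
  { id: 1, instruction: "Leave required fields empty and submit." },
  { id: 2, instruction: "Enter negative values in all numeric fields." },
]);
```

The value of a fixed brief format is that every run produces comparable reports, which keeps the human review step in the last point fast.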
In my article “AI-Powered Software Development”, I describe the broader context of how AI tools are changing everyday development work. QA is a particularly rewarding use case because the results are immediately measurable: fewer bugs in production, fewer Monday morning hotfixes.
What Changes When QA Is No Longer a Bottleneck
When systematic QA no longer depends on someone manually clicking through every screen, something fundamental changes in the team: confidence in the release.
Teams that know before every release that the baseline checks have run automatically ship faster. Not because they become careless, but because the fear of the undetected bug shrinks. QA shifts from “nobody does it” to “it runs in the background.”
A concrete first step: take the three most common sources of bugs from the past few months. Missing input validation, broken layout on mobile, cryptic error messages. Whatever has slipped through most often in your project. And ask yourself: would a simple checklist have caught that?
In my experience, the answer is almost always: yes. And that checklist can be automated today.
If you want to explore how to use AI agents for QA and other development workflows in your team, an AI coaching session can be the right starting point. And if you want to build a broader foundation for AI-assisted productivity, my article “Using ChatGPT Effectively” offers a practical introduction.
All names of individuals and companies used in this article are fictitious. Any resemblance to real persons or businesses is purely coincidental and unintentional. The examples are provided solely for illustrative purposes.