TL;DR: When an AI agent makes a change — a bug fix, a new feature, a data migration — you need more than a green test suite to be confident. Playwright lets Claude Code open a real browser, navigate your app, and capture exactly what happened. Pair it with screencli to get a shareable video of the verification, so anyone on your team can see the proof.
The problem with trusting AI agent output
AI coding agents like Claude Code can now make meaningful, multi-step changes: edit files, run commands, update config, and open PRs. The speed is genuinely useful. The problem is verification.
When a human makes a change, they manually open a browser, click through the relevant flows, and confirm it looks right. When an AI agent makes a change, you get a terminal log and a diff. The diff tells you what changed. It doesn’t tell you whether the app still works.
This creates a trust gap. Unit tests and type checks catch structural problems, but they don’t tell you whether the checkout flow still loads, whether the error state renders correctly, or whether the feature behaves as described in the ticket.
How Playwright fills the gap
Playwright is a browser automation library that lets you script real browser interactions: navigate to a URL, click buttons, fill forms, assert on what’s visible. It’s the same browser a real user would use — no mocked DOM, no synthetic events.
Claude Code can drive Playwright to verify its own work:
- Make a change (fix a bug, add a feature)
- Open a Playwright-controlled browser
- Navigate the relevant flows
- Confirm the expected behavior is present
- Capture the session as evidence
The last step is where screencli comes in. Instead of a text assertion that says “test passed,” you get a polished video that shows exactly what happened in the browser — shareable, reviewable, and attached to the PR.
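The loop above can be sketched in a few lines of TypeScript. This is a minimal illustration, not how Claude Code or screencli work internally: `verifyVisible`, `PageLike`, `LocatorLike`, and the five-second timeout are hypothetical, and a static screenshot stands in for the screencli recording. The interfaces cover only the slice of Playwright's `Page` and `Locator` that the sketch calls, so a real Playwright page should be accepted as-is.

```typescript
// Hypothetical minimal slice of Playwright's Locator and Page used by this sketch.
interface LocatorLike {
  waitFor(options: { state: "visible"; timeout: number }): Promise<void>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): LocatorLike;
  screenshot(options: { path: string }): Promise<unknown>;
}

interface VerificationResult {
  url: string;
  selector: string;
  passed: boolean;
  evidencePath: string;
}

// Navigate, confirm the expected element shows up, and capture evidence.
async function verifyVisible(
  page: PageLike,
  url: string,
  selector: string,
  evidencePath: string,
): Promise<VerificationResult> {
  await page.goto(url);

  let passed = true;
  try {
    // Assumed timeout: give the element up to five seconds to appear.
    await page.locator(selector).waitFor({ state: "visible", timeout: 5000 });
  } catch {
    // The element never became visible within the timeout.
    passed = false;
  }

  // Capture evidence whether or not the check passed.
  await page.screenshot({ path: evidencePath });
  return { url, selector, passed, evidencePath };
}
```

With Playwright installed, the page would come from `chromium.launch()` followed by `browser.newPage()`, and the browser should be closed with `browser.close()` once the result is returned.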
Setting up the workflow
Step 1: Install Playwright
If you don’t have Playwright in your project yet:
npm install -D @playwright/test
npx playwright install chromium
That is all you need. The @playwright/test package bundles the test runner, but you can skip the config file and the rest of the runner setup if you are using Playwright for verification only.
Step 2: Install the screencli skill
npx skills add https://github.com/usefulagents/screencli --skill screencli
This gives Claude Code the ability to drive a browser, record the session, and return a shareable link — all from a single instruction.
Step 3: Ask Claude Code to verify
After Claude Code makes a change, follow up with a verification prompt:
“Verify that the fix works. Open a browser, go through the checkout flow, confirm the error no longer appears, and record the session so I can share it with the team.”
Claude Code will:
- Launch a Playwright-controlled Chromium browser
- Navigate to the relevant URL
- Perform the verification steps
- Record the session with screencli
- Return a shareable link
No Playwright scripts to write. No selectors to maintain. No separate test file.
What the output looks like
A typical verification session produces two artifacts:
Terminal output:
✓ Navigated to https://staging.myapp.com/checkout
✓ Added item to cart
✓ Proceeded to checkout
✓ Confirmed error message no longer present
✓ Order placed successfully
✓ Video composed — recordings/b7f3a912/composed.mp4
✓ https://screencli.sh/v/b7f3a912
1 credit used · 14 remaining
Video: A composed MP4 with auto-zoom to each action, click highlights, cursor trails, and a gradient background — ready to drop into the PR description, Slack, or a Notion page.
The reviewer doesn’t need to pull the branch and run the app locally. They watch the 45-second video and see the fix working.
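If you want to pick up those artifacts in a script, for example to post the link somewhere automatically, the terminal output can be parsed. The sketch below assumes the output looks like the sample log in this article; the real format may differ between screencli versions, and `parseRecordingOutput` and `RecordingSummary` are hypothetical names.

```typescript
// Values we try to extract from the terminal output of one recording run.
interface RecordingSummary {
  videoPath: string | null;
  shareUrl: string | null;
  creditsRemaining: number | null;
}

// Assumption: the output follows the sample log shown in this article.
function parseRecordingOutput(output: string): RecordingSummary {
  // First token that ends in .mp4, e.g. recordings/<id>/composed.mp4
  const videoPath = output.match(/\S+\.mp4/)?.[0] ?? null;
  // Share link of the form https://screencli.sh/v/<id>
  const shareUrl = output.match(/https:\/\/screencli\.sh\/v\/[A-Za-z0-9]+/)?.[0] ?? null;
  // Number that directly precedes the word "remaining"
  const credits = output.match(/(\d+) remaining/)?.[1];
  return {
    videoPath,
    shareUrl,
    creditsRemaining: credits === undefined ? null : Number(credits),
  };
}
```

Every field falls back to `null` when its pattern is missing, so a caller can tell "no link was produced" apart from a parse crash.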
Verification patterns by task type
Different types of agent changes need different verification strategies.
Bug fixes
Focus on reproducing the bug first, then confirming it’s gone:
“Reproduce the bug from ticket #847: go to the settings page, click Delete Account, and confirm the confirmation modal no longer has a broken layout. Record the session.”
New features
Walk the happy path and at least one edge case:
“Verify the new export feature. Log in, navigate to the reports page, export a CSV, and confirm the download starts correctly. Then try exporting an empty report and confirm the error state is handled gracefully. Record both.”
Dependency upgrades
Check the critical paths haven’t regressed:
“After the React 19 upgrade, verify that the main product pages still load correctly: homepage, product detail, cart, and checkout. Record the full walkthrough.”
Data migrations
Confirm data reads back correctly after a schema change:
“After the migration, open the user profile page for the test account and verify all fields display correctly, especially the new display_name field that was backfilled. Record the session.”
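If you send these prompts often, the four patterns above can be captured as a small lookup so every prompt starts from the right strategy. This is only a sketch: `TaskType`, `strategies`, and `buildVerificationPrompt` are hypothetical names, and the strategy sentences paraphrase the guidance in this section.

```typescript
// The four kinds of agent change covered in this section.
type TaskType = "bug-fix" | "new-feature" | "dependency-upgrade" | "data-migration";

// One-sentence verification strategy per task type.
const strategies: Record<TaskType, string> = {
  "bug-fix": "Reproduce the original bug, then confirm it no longer occurs.",
  "new-feature": "Walk the happy path and at least one edge case.",
  "dependency-upgrade": "Check that the critical paths have not regressed.",
  "data-migration": "Confirm data reads back correctly after the schema change.",
};

// Combine the strategy with task-specific steps and a recording request.
function buildVerificationPrompt(task: TaskType, details: string): string {
  return `${strategies[task]} ${details} Record the session.`;
}
```

For example, `buildVerificationPrompt("bug-fix", "Go to the settings page and click Delete Account.")` yields a prompt that opens with the reproduce-then-confirm strategy and closes with the recording request.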
Using Playwright directly for assertions
For changes where visual confirmation isn’t enough, Claude Code can write and run a focused Playwright script with explicit assertions:
import { test, expect } from '@playwright/test';
test('checkout flow works after fix', async ({ page }) => {
  await page.goto('https://staging.myapp.com/checkout');
  // Walk the flow that used to trigger the error.
  await page.click('[data-testid="add-to-cart"]');
  await page.click('[data-testid="checkout-button"]');
  // The error banner must be gone and the order summary must render.
  await expect(page.locator('[data-testid="error-banner"]')).not.toBeVisible();
  await expect(page.locator('[data-testid="order-summary"]')).toBeVisible();
});
Run it with:
npx playwright test verify-checkout.spec.ts
Combine this with a screencli recording and you have both the assertion result (pass/fail) and the visual evidence (the video) in the same PR comment.
How this compares to manual verification
| Approach | Time per verification | Shareable evidence | Reproducible | Works in CI |
|---|---|---|---|---|
| Manual browser check | 5–15 min | No | No | No |
| Unit/integration tests | 1–2 min (to write) | No | Yes | Yes |
| Playwright script + screencli | 30–90 sec | Yes | Yes | Yes |
| Claude Code + screencli (no script) | 30–90 sec | Yes | Prompt-based | Partial |
The Playwright + screencli path gets you reproducible verification with visual evidence at roughly the same time cost as running your existing test suite.
Integrating into your PR workflow
The most effective place to add this is immediately after Claude Code finishes a task:
1. Claude Code makes the change
2. Claude Code runs unit tests (if applicable)
3. Claude Code verifies in a real browser with screencli
4. Claude Code opens the PR with the verification video in the description
This turns every AI-generated PR from “trust me, the diff looks right” into “here’s a video of it working.” Reviewers can approve with confidence. QA has a starting point. No one needs to pull the branch to check.
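Step 4 is mostly string assembly. As a rough sketch, the PR description could be built from the verification results like this; `VerificationEvidence` and `buildPrDescription` are hypothetical names, and the Markdown layout is just one reasonable choice.

```typescript
// What the agent knows once verification has finished.
interface VerificationEvidence {
  summary: string;   // one-line description of the change
  checks: string[];  // steps confirmed in the browser
  videoUrl: string;  // shareable recording link
}

// Render a Markdown PR description with a checklist and the video link.
function buildPrDescription(evidence: VerificationEvidence): string {
  const checklist = evidence.checks.map((check) => `- [x] ${check}`).join("\n");
  return [
    "## Summary",
    evidence.summary,
    "",
    "## Browser verification",
    checklist,
    "",
    `Recording: ${evidence.videoUrl}`,
  ].join("\n");
}
```

The resulting string can be passed to whatever opens the PR, for instance the `--body` flag of `gh pr create` if you use the GitHub CLI.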
FAQ
Does Claude Code need access to my staging environment?
Yes — screencli opens a real browser, so the URL needs to be reachable from wherever Claude Code is running. For local environments, use localhost. For staging, make sure the URL is accessible (or set up a tunnel with ngrok or cloudflared).
Can I record verification sessions behind a login wall?
Yes. Use --login --auth <name> on the first run to authenticate manually:
npx screencli record https://staging.myapp.com -p "verify the dashboard loads" --login --auth myapp
After that, the session is saved and Claude Code can reuse it without prompting you again.
What if Claude Code’s verification misses something?
You can follow up with a more specific prompt. The cost is low: re-running a screencli session takes under 90 seconds. Think of each verification run as a focused probe, not a comprehensive test suite.
Does this replace writing tests?
No, it complements them. Tests verify structural correctness at the unit level. Playwright + screencli verifies behavioral correctness at the integration level, with visual evidence. Both have their place.
How much does it cost?
screencli is open-source and free to run locally. The cloud plan (hosted video storage and shareable links) starts at $12/month. Each recording uses 1 credit. The free tier includes 15 credits/month, enough for ~15 PR verifications.
Try it on your next Claude Code task → npx screencli record https://your-staging-url.com -p "verify the fix works"