How to Use Playwright and Claude Code to Verify AI Agent Work

TL;DR: When an AI agent makes a change — a bug fix, a new feature, a data migration — you need more than a green test suite to be confident. Playwright lets Claude Code open a real browser, navigate your app, and capture exactly what happened. Pair it with screencli to get a shareable video of the verification, so anyone on your team can see the proof.

The problem with trusting AI agent output

AI coding agents like Claude Code can now make meaningful, multi-step changes: edit files, run commands, update config, and open PRs. The speed is genuinely useful. The problem is verification.

When a human makes a change, they manually open a browser, click through the relevant flows, and confirm it looks right. When an AI agent makes a change, you get a terminal log and a diff. The diff tells you what changed. It doesn't tell you whether the app still works.

This creates a trust gap. Unit tests and type checks catch structural problems, but they don't tell you whether the checkout flow still loads, whether the error state renders correctly, or whether the feature behaves as described in the ticket.

How Playwright fills the gap

Playwright is a browser automation library that lets you script real browser interactions: navigate to a URL, click buttons, fill forms, and assert on what's visible. It drives real browser engines (Chromium, Firefox, and WebKit) rather than a mocked DOM, so what it sees is close to what a user would see.

Claude Code can drive Playwright to verify its own work:

1. Make a change (fix a bug, add a feature)
2. Open a Playwright-controlled browser
3. Navigate the relevant flows
4. Confirm the expected behavior is present
5. Capture the session as evidence

The last step is where screencli comes in. Instead of a text assertion that says "test passed," you get a polished video that shows exactly what happened in the browser — shareable, reviewable, and attached to the PR.
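
To make the steps above concrete, here is a minimal sketch of the kind of check an agent might run. It is written against two small structural interfaces (PageLike and LocatorLike, both hypothetical) so it type-checks without Playwright installed; Playwright's real Page object has methods with compatible shapes. The function name, step labels, and data-testid selectors are illustrative assumptions, not part of Playwright or screencli.

```typescript
// Hypothetical minimal shapes; Playwright's Page and Locator are compatible with them.
interface LocatorLike {
  click(): Promise<void>;
  isVisible(): Promise<boolean>;
}

interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): LocatorLike;
}

interface CheckResult {
  label: string;
  ok: boolean;
}

// Walks the checkout flow and reports each step instead of stopping at the first failed check.
async function verifyCheckout(page: PageLike, baseUrl: string): Promise<CheckResult[]> {
  const results: CheckResult[] = [];

  await page.goto(`${baseUrl}/checkout`);
  results.push({ label: 'Navigated to checkout', ok: true });

  await page.locator('[data-testid="add-to-cart"]').click();
  results.push({ label: 'Added item to cart', ok: true });

  // isVisible() reports the current state without waiting, so a real check
  // may need to wait for the page to settle first.
  const errorShown = await page.locator('[data-testid="error-banner"]').isVisible();
  results.push({ label: 'Error banner absent', ok: !errorShown });

  const summaryShown = await page.locator('[data-testid="order-summary"]').isVisible();
  results.push({ label: 'Order summary visible', ok: summaryShown });

  return results;
}

// With Playwright installed, a real page could be passed in (assumed usage):
//   import { chromium } from '@playwright/test';
//   const browser = await chromium.launch();
//   const page = await browser.newPage();
//   console.log(await verifyCheckout(page, 'https://staging.myapp.com'));
//   await browser.close();
```

Returning a list of labelled results, rather than throwing on the first failure, keeps the outcome easy to summarize next to a recording.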

Setting up the workflow

Step 1: Install Playwright

If you don't have Playwright in your project yet:

npm install -D @playwright/test
npx playwright install chromium

That's enough. You don't need a playwright.config file or any other test runner setup if you're using Playwright for verification only.

Step 2: Install the screencli skill

npx skills add https://github.com/usefulagents/screencli --skill screencli

This gives Claude Code the ability to drive a browser, record the session, and return a shareable link — all from a single instruction.

Step 3: Ask Claude Code to verify

After Claude Code makes a change, follow up with a verification prompt:

"Verify that the fix works. Open a browser, go through the checkout flow, confirm the error no longer appears, and record the session so I can share it with the team."

Claude Code will:

1. Launch a Playwright-controlled Chromium browser
2. Navigate to the relevant URL
3. Perform the verification steps
4. Record the session with screencli
5. Return a shareable link

No Playwright scripts to write. No selectors to maintain. No separate test file.

What the output looks like

A typical verification session produces two artifacts:

Terminal output:

✓ Navigated to https://staging.myapp.com/checkout
✓ Added item to cart
✓ Proceeded to checkout
✓ Confirmed error message no longer present
✓ Order placed successfully

✓ Video composed — recordings/b7f3a912/composed.mp4
✓ https://screencli.sh/v/b7f3a912
  1 credit used · 14 remaining

Video: A composed MP4 with auto-zoom to each action, click highlights, cursor trails, and a gradient background — ready to drop into the PR description, Slack, or a Notion page.

The reviewer doesn't need to pull the branch and run the app locally. They watch the 45-second video and see the fix working.
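
If you want to post those artifacts automatically, the terminal output can be parsed. The sketch below assumes the log looks exactly like the sample above (check-marked step lines, a "Video composed" line ending in an .mp4 path, and a bare share URL); the real screencli output format may differ, and parseVerificationLog is a hypothetical helper name.

```typescript
interface VerificationArtifacts {
  passedSteps: string[];
  videoPath: string | null;
  shareUrl: string | null;
}

// Assumes the log format shown in the sample session; real output may differ.
function parseVerificationLog(log: string): VerificationArtifacts {
  const artifacts: VerificationArtifacts = { passedSteps: [], videoPath: null, shareUrl: null };

  for (const rawLine of log.split('\n')) {
    const line = rawLine.trim();
    if (!line.startsWith('✓')) continue; // skips blank lines and the credits line

    const text = line.slice(1).trim();

    // "Video composed <separator> <path>.mp4": capture the path.
    const video = text.match(/^Video composed\s+\S+\s+(\S+\.mp4)$/);
    if (video) {
      artifacts.videoPath = video[1] ?? null;
      continue;
    }

    // A check-marked line that is only a URL is treated as the share link.
    if (/^https?:\/\/\S+$/.test(text)) {
      artifacts.shareUrl = text;
      continue;
    }

    artifacts.passedSteps.push(text);
  }

  return artifacts;
}

const sample = [
  '✓ Order placed successfully',
  '✓ Video composed \u2014 recordings/b7f3a912/composed.mp4',
  '✓ https://screencli.sh/v/b7f3a912',
].join('\n');

console.log(parseVerificationLog(sample));
```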

Verification patterns by task type

Different types of agent changes need different verification strategies.
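
One way to keep these prompts consistent across a team is to template them. The sketch below maps the four task types covered in this section to a one-line strategy and assembles a prompt from it; the TaskKind names and the buildVerificationPrompt helper are hypothetical, and the wording is only one reasonable phrasing.

```typescript
type TaskKind = 'bug-fix' | 'feature' | 'dependency-upgrade' | 'data-migration';

// One-line strategy per task type, paraphrasing the guidance in this section.
const STRATEGY: Record<TaskKind, string> = {
  'bug-fix': 'Reproduce the original bug first, then confirm it no longer occurs.',
  'feature': 'Walk the happy path and at least one edge case.',
  'dependency-upgrade': 'Check that the critical paths have not regressed.',
  'data-migration': 'Confirm the data reads back correctly after the schema change.',
};

// Assembles a verification prompt: context, strategy, numbered steps, recording request.
function buildVerificationPrompt(kind: TaskKind, context: string, steps: string[]): string {
  const numbered = steps.map((step, i) => `${i + 1}. ${step}`).join('\n');
  return [context, STRATEGY[kind], 'Steps:', numbered, 'Record the session.'].join('\n');
}

const migrationPrompt = buildVerificationPrompt(
  'data-migration',
  'The display_name field was backfilled by the migration.',
  ['Open the user profile page for the test account', 'Verify all fields display correctly'],
);

console.log(migrationPrompt);
```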

Bug fixes

Focus on reproducing the bug first, then confirming it's gone:

"Reproduce the bug from ticket #847: go to the settings page, click Delete Account, and confirm the confirmation modal no longer has a broken layout. Record the session."

New features

Walk the happy path and at least one edge case:

"Verify the new export feature. Log in, navigate to the reports page, export a CSV, and confirm the download starts correctly. Then try exporting an empty report and confirm the error state is handled gracefully. Record both."

Dependency upgrades

Check the critical paths haven't regressed:

"After the React 19 upgrade, verify that the main product pages still load correctly: homepage, product detail, cart, and checkout. Record the full walkthrough."

Data migrations

Confirm data reads back correctly after a schema change:

"After the migration, open the user profile page for the test account and verify all fields display correctly — especially the new display_name field that was backfilled. Record the session."

Using Playwright directly for assertions

For changes where visual confirmation isn't enough, Claude Code can write and run a focused Playwright script with explicit assertions:

import { test, expect } from '@playwright/test';

test('checkout flow works after fix', async ({ page }) => {
  await page.goto('https://staging.myapp.com/checkout');

  // Locators wait for the element to be actionable before clicking.
  await page.locator('[data-testid="add-to-cart"]').click();
  await page.locator('[data-testid="checkout-button"]').click();

  // Assert the success state first so the page has settled,
  // then confirm the error banner from the bug is not shown.
  await expect(page.locator('[data-testid="order-summary"]')).toBeVisible();
  await expect(page.locator('[data-testid="error-banner"]')).toBeHidden();
});

Save it as verify-checkout.spec.ts and run it with:

npx playwright test verify-checkout.spec.ts

Combine this with a screencli recording and you have both the assertion result (pass/fail) and the visual evidence (the video) in the same PR comment.
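
If the script is kept around and run through the Playwright test runner, the runner can also keep its own artifacts. The fragment below is a sketch of a playwright.config.ts; the verification/ directory and the base URL are assumptions. The video and trace settings are Playwright's built-in capture options, a separate mechanism from the screencli recording described in this article.

```typescript
// playwright.config.ts (sketch; directory name and base URL are assumptions)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: './verification',      // hypothetical folder holding verify-*.spec.ts
  retries: 0,                     // a verification run should fail loudly, not retry silently
  reporter: [['list']],
  use: {
    baseURL: 'https://staging.myapp.com',
    video: 'on',                  // Playwright's own raw recording of each test
    trace: 'retain-on-failure',   // keep a trace only when a test fails
    screenshot: 'only-on-failure',
  },
});
```

With baseURL set, the spec could call page.goto('/checkout') instead of repeating the full staging URL.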

How this compares to manual verification

Approach	Time per verification	Shareable evidence	Reproducible	Works in CI
Manual browser check	5–15 min	No	No	No
Unit/integration tests	1–2 min (to write)	No	Yes	Yes
Playwright script + screencli	30–90 sec	Yes	Yes	Yes
Claude Code + screencli (no script)	30–90 sec	Yes	Prompt-based	Partial

The Playwright script + screencli path gets you reproducible verification with visual evidence in 30–90 seconds per run, roughly the time cost of running your existing test suite.

Integrating into your PR workflow

The most effective place to add this is immediately after Claude Code finishes a task:

1. Claude Code makes the change
2. Claude Code runs unit tests (if applicable)
3. Claude Code verifies in a real browser with screencli
4. Claude Code opens the PR with the verification video in the description

This turns every AI-generated PR from "trust me, the diff looks right" into "here's a video of it working." Reviewers can approve with confidence. QA has a starting point. No one needs to pull the branch to check.
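
As a rough illustration of step 4, the PR description can be generated from the verification results. The sketch below is one possible layout, not something screencli produces; VerificationSummary and buildPrDescription are hypothetical names, and the checkbox syntax assumes a host that renders Markdown task lists, such as GitHub.

```typescript
interface VerificationSummary {
  title: string;
  checks: { label: string; ok: boolean }[];
  videoUrl: string;
}

// Builds a Markdown PR description: overall status, one checkbox per check, and the video link.
function buildPrDescription(summary: VerificationSummary): string {
  const allPassed = summary.checks.every((c) => c.ok);
  const checkLines = summary.checks.map((c) => `- [${c.ok ? 'x' : ' '}] ${c.label}`);

  return [
    `## ${summary.title}`,
    '',
    `Browser verification: ${allPassed ? 'passed' : 'FAILED'}`,
    ...checkLines,
    '',
    `Recording: ${summary.videoUrl}`,
  ].join('\n');
}

const description = buildPrDescription({
  title: 'Fix checkout error banner',
  checks: [
    { label: 'Error banner absent', ok: true },
    { label: 'Order placed successfully', ok: true },
  ],
  videoUrl: 'https://screencli.sh/v/b7f3a912',
});

console.log(description);
```

The resulting string could then be passed to whatever opens the PR, for example as the --body argument of gh pr create.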

FAQ

Does Claude Code need access to my staging environment? Yes — screencli opens a real browser, so the URL needs to be reachable from wherever Claude Code is running. For local environments, use localhost. For staging, make sure the URL is accessible (or set up a tunnel with ngrok or cloudflared).

Can I record verification sessions behind a login wall? Yes. Use --login --auth <name> on the first run to authenticate manually:

npx screencli record https://staging.myapp.com -p "verify the dashboard loads" --login --auth myapp

After that, the session is saved and Claude Code can reuse it without prompting you again.
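
If you script these runs, the two cases can be captured in one small helper. The sketch below uses only the flags that appear in this article (record, -p, --login, --auth) and assumes, since the article does not show it explicitly, that later runs reuse the saved session by passing --auth with the same name and omitting --login. RecordOptions and buildRecordArgs are hypothetical names.

```typescript
interface RecordOptions {
  url: string;
  prompt: string;
  login?: boolean; // first run only: authenticate manually
  auth?: string;   // name of the saved session
}

// Builds the argument list for `npx`, using only flags shown in this article.
function buildRecordArgs(opts: RecordOptions): string[] {
  const args = ['screencli', 'record', opts.url, '-p', opts.prompt];
  if (opts.login) args.push('--login');
  if (opts.auth) args.push('--auth', opts.auth);
  return args;
}

// First run: log in manually and save the session as "myapp".
const firstRun = buildRecordArgs({
  url: 'https://staging.myapp.com',
  prompt: 'verify the dashboard loads',
  login: true,
  auth: 'myapp',
});

// Later runs (assumed): reuse the saved session without --login.
const laterRun = buildRecordArgs({
  url: 'https://staging.myapp.com',
  prompt: 'verify the dashboard loads',
  auth: 'myapp',
});

console.log(JSON.stringify(firstRun));
console.log(JSON.stringify(laterRun));
```

Keeping the arguments as an array, rather than one joined string, avoids quoting problems when the prompt contains spaces.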

What if Claude Code's verification misses something? You can follow up with a more specific prompt. The cost is low — re-running a screencli session takes under 90 seconds. Think of each verification run as a focused probe, not a comprehensive test suite.

Does this replace writing tests? No — it complements them. Tests verify structural correctness at the unit level. Playwright + screencli verifies behavioral correctness at the integration level, with visual evidence. Both have their place.

How much does it cost? screencli is open-source and free to run locally. The cloud plan (hosted video storage and shareable links) starts at $12/month. Each recording uses 1 credit. The free tier includes 15 credits/month — enough for ~15 PR verifications.
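
The estimate of about 15 PRs assumes one recording per PR. If a PR needs follow-up runs, the budget shrinks accordingly; the sketch below works through that arithmetic, with prsCovered as a hypothetical helper and 1 credit per recording taken from the pricing above.

```typescript
// Each recording costs 1 credit; reruns on the same PR count as extra recordings.
function prsCovered(monthlyCredits: number, recordingsPerPr: number): number {
  if (recordingsPerPr <= 0) {
    throw new RangeError('recordingsPerPr must be positive');
  }
  return Math.floor(monthlyCredits / recordingsPerPr);
}

console.log(prsCovered(15, 1)); // 15
console.log(prsCovered(15, 2)); // 7
```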

Try it on your next Claude Code task → npx screencli record https://your-staging-url.com -p "verify the fix works"