TL;DR: browser-use leads on benchmarks (89.1% WebVoyager), Stagehand wins on cost with action caching, Playwright CLI uses about 4x fewer tokens than Playwright MCP, and Skyvern handles sites with no usable DOM. To record and share what your agent does, screencli turns a session from any of these into a polished video with one command.
The landscape in 30 seconds
AI agents that control browsers need a framework to do it. There are now five serious options, each with different tradeoffs on token cost, speed, and reliability. Here’s how they compare in March 2026. The table also lists screencli, which records agent sessions rather than driving the browser.
| Framework | Language | GitHub stars | WebVoyager score | Cost | Best for |
|---|---|---|---|---|---|
| browser-use | Python | 78,000+ | 89.1% | ~$0.07 / 10 steps | Autonomous agents |
| Stagehand | TypeScript | 21,000+ | — | $0.002–0.02 | Hybrid automation |
| Playwright MCP | Any (MCP) | — | — | Free (high token cost) | Chat-style agents |
| Playwright CLI | Any (shell) | — | — | Free (low token cost) | Coding agents |
| Skyvern | Python | 20,000+ | 85.85% | From $29/mo | Legacy/gov sites |
| screencli | Any (CLI) | — | — | Free / Pro $12/mo | Recording agent sessions |
browser-use: highest benchmark score
browser-use holds the current state-of-the-art for autonomous web interaction at 89.1% on WebVoyager (586 diverse web tasks). It’s model-agnostic — works with Claude, GPT-4o, Gemini, or local models via LiteLLM.
The tradeoff is token consumption. Each step requires an LLM call, so a 10-step task costs roughly $0.07 (about $0.007 per step), which adds up on long sessions. But for complex, multi-step workflows where the agent needs to reason about what it sees, it has the highest published WebVoyager score of the tools compared here.
pip install browser-use
Use it when: your agent needs to handle unpredictable pages autonomously and you’re working in Python.
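To make the per-step cost concrete, here is a minimal sketch that scales the ~$0.07 per 10 steps figure to longer sessions. The function name and the assumption that cost grows linearly with step count are illustrative only; real spend depends on the model, the prompt size, and the pages visited.

```python
# Rough cost model: assumes cost scales linearly with step count,
# using the ~$0.07 per 10 steps figure quoted in this article.
COST_PER_10_STEPS_USD = 0.07


def estimate_session_cost(steps: int, runs: int = 1) -> float:
    """Return the estimated LLM spend in USD for `runs` sessions of `steps` steps."""
    if steps < 0 or runs < 0:
        raise ValueError("steps and runs must be non-negative")
    per_step = COST_PER_10_STEPS_USD / 10
    return round(per_step * steps * runs, 4)


if __name__ == "__main__":
    print(estimate_session_cost(10))            # one 10-step task
    print(estimate_session_cost(50, runs=200))  # 200 runs of a 50-step workflow
```

Under these assumptions, 200 runs of a 50-step workflow (10,000 steps) come to about $70.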
Stagehand: cheapest repeat runs
Stagehand takes a different approach. Instead of full autonomy, it gives you three primitives — act, extract, observe — that combine Playwright’s precision with AI reasoning.
The killer feature in v3 (February 2026) is action caching. Actions that succeed once are stored and replayed without an LLM call on subsequent runs. Browserbase reports 44% faster execution on average and up to 80% speedup on repeated workflows, with ~30% cost reduction.
At $0.002–0.02 per action with caching, it’s the cheapest option for repetitive workflows.
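import { Stagehand } from "@browserbasehq/stagehand";
// Run against a local browser; use "BROWSERBASE" for the hosted service.
const stagehand = new Stagehand({ env: "LOCAL" });
await stagehand.init();
// In v3, pages are reached through the browser context.
const page = stagehand.context.pages()[0];
await page.goto("https://your-app.com");
await stagehand.act("click the Sign In button");
await stagehand.close();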
Use it when: you’re in TypeScript, your workflows are repeatable, and you want to minimize LLM spend.
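The idea behind action caching can be sketched in a few lines. This is a conceptual illustration in Python, not Stagehand's implementation: `ActionCache`, `fake_llm`, and the choice of (URL, instruction) as the cache key are all hypothetical, picked only to show why repeat runs can skip the LLM call.

```python
from typing import Callable, Dict, Tuple


class ActionCache:
    """Hypothetical cache: maps (url, instruction) to a resolved selector."""

    def __init__(self, resolve_with_llm: Callable[[str, str], str]) -> None:
        self._resolve = resolve_with_llm  # the expensive step (stands in for an LLM call)
        self._store: Dict[Tuple[str, str], str] = {}
        self.llm_calls = 0

    def selector_for(self, url: str, instruction: str) -> str:
        key = (url, instruction)
        if key not in self._store:
            # Cache miss: pay for one LLM call and remember the result.
            self.llm_calls += 1
            self._store[key] = self._resolve(url, instruction)
        return self._store[key]

    def invalidate(self, url: str, instruction: str) -> None:
        """Drop an entry, e.g. after a replayed selector no longer matches the page."""
        self._store.pop((url, instruction), None)


def fake_llm(url: str, instruction: str) -> str:
    # Stand-in for a model call; returns a made-up selector.
    return "button#sign-in"


if __name__ == "__main__":
    cache = ActionCache(fake_llm)
    for _ in range(5):
        cache.selector_for("https://your-app.com", "click the Sign In button")
    print(cache.llm_calls)  # five runs, one LLM call
```

The `invalidate` method hints at the main design question for any cache of this kind: when a page changes and the stored action stops working, the entry has to be dropped so the next run falls back to the model.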
Playwright MCP: zero setup, high token cost
Microsoft’s official Playwright MCP server gives any AI agent browser control through the Model Context Protocol. It uses the accessibility tree for interactions instead of screenshots, which means fast, text-based actions with no vision model overhead.
The catch: the MCP schema for Playwright’s 26 tools costs ~3,600 tokens just to load. A content-rich page can return thousands more tokens of accessibility data per action. One benchmark measured 114,000 tokens for a typical automation task via MCP.
# Claude Code
claude mcp add playwright -- npx @playwright/mcp@latest
Use it when: you need plug-and-play browser control in a chat-style agent without filesystem access, and you can accept the token overhead.
Playwright CLI: 4x fewer tokens
Playwright CLI launched in February 2026 as Microsoft’s answer to the MCP token problem. Same Playwright engine, but it saves state to disk instead of streaming it back into the context window.
The numbers: 27,000 tokens for the same task that costs 114,000 via MCP, roughly a 4x reduction (114,000 / 27,000 ≈ 4.2). The skill definition is ~68 tokens total versus ~3,600 for the MCP schema. On longer sessions, early adopters report up to 10x fewer tokens.
# Install
npm i @playwright/cli@latest
# Use from any coding agent
npx playwright-cli navigate https://your-app.com
npx playwright-cli click "Sign In"
npx playwright-cli screenshot
Use it when: your agent has filesystem access (Claude Code, Cursor, Copilot) and you care about token efficiency.
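The token figures quoted for MCP and CLI can be checked with a few lines of arithmetic. The constants below are the numbers from this article; the helper name `reduction_factor` is just for illustration.

```python
# Figures quoted in this article.
MCP_TASK_TOKENS = 114_000
CLI_TASK_TOKENS = 27_000
MCP_SCHEMA_TOKENS = 3_600
CLI_SKILL_TOKENS = 68


def reduction_factor(before: int, after: int) -> float:
    """How many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


task_factor = reduction_factor(MCP_TASK_TOKENS, CLI_TASK_TOKENS)
schema_factor = reduction_factor(MCP_SCHEMA_TOKENS, CLI_SKILL_TOKENS)
tokens_saved = MCP_TASK_TOKENS - CLI_TASK_TOKENS

if __name__ == "__main__":
    print(f"task: {task_factor:.1f}x, schema: {schema_factor:.0f}x, saved: {tokens_saved:,}")
```

The task-level ratio works out to about 4.2x and the definition overhead to about 53x, so the headline "4x" is a slightly rounded-down figure, and the per-task saving is 87,000 tokens.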
Skyvern: no selectors needed
Skyvern uses computer vision + LLM reasoning to interact with pages without relying on DOM selectors or accessibility trees. It looks at what’s on screen and decides what to click.
Of the tools compared here, this makes it the only viable option for government portals, legacy enterprise apps, and sites where the DOM is inaccessible or meaningless. It scored 85.85% on WebVoyager with its 2.0 release, and it’s the best-performing agent specifically on form-filling tasks.
Starting at $29/month with 30,000 credits, it’s priced for production use rather than experimentation.
Use it when: you’re automating sites with inaccessible DOMs, heavy iframes, or anti-bot measures that break selector-based tools.
Record what your agent does with screencli
These frameworks automate the browser. But none of them produce a shareable video of what happened. That’s the missing piece.
screencli is an open-source screen recording CLI built for AI agents. It wraps Playwright under the hood — your agent navigates the page, and screencli records the session with auto-trim, auto-zoom, click highlights, and gradient backgrounds. One command, one shareable link.
npx screencli record https://your-app.com -p "Demo the checkout flow"
# → https://screencli.sh/v/a3f2c8e1
It pairs with any of the frameworks above. Your agent automates the browser. screencli turns that session into a polished video you can drop into a PR, a changelog, or a tweet.
How to pick
Budget-constrained, repeatable workflows: Stagehand. Action caching pays for itself fast.
Highest reliability on unknown pages: browser-use. The benchmark scores speak for themselves.
Token-efficient coding agent integration: Playwright CLI. 4x savings over MCP, disk-based state.
Legacy or government sites: Skyvern. Vision-based approach bypasses DOM entirely.
Quick prototype, no filesystem: Playwright MCP. Zero config, as long as you can accept the token cost.
Record and share the result: screencli. Turns any agent session into a shareable video.
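The rules of thumb above can be written down as a small decision function. This is a sketch under stated assumptions: the `Needs` fields and the priority order (inaccessible DOM first, then repeatability, then unknown pages, then filesystem access) are one reading of the list, not an official ranking.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical description of a project's constraints."""
    inaccessible_dom: bool = False     # legacy or government sites, unusable DOM
    repeatable_workflow: bool = False  # same steps run again and again
    unknown_pages: bool = False        # agent must handle unpredictable pages
    has_filesystem: bool = False       # e.g. a coding agent with shell access


def pick_framework(needs: Needs) -> str:
    """Apply the rules of thumb from the list above in a fixed, assumed priority order."""
    if needs.inaccessible_dom:
        return "Skyvern"
    if needs.repeatable_workflow:
        return "Stagehand"
    if needs.unknown_pages:
        return "browser-use"
    if needs.has_filesystem:
        return "Playwright CLI"
    return "Playwright MCP"


if __name__ == "__main__":
    print(pick_framework(Needs(has_filesystem=True)))
```

Recording is orthogonal to this choice, which is why screencli does not appear as a return value: it can sit on top of whichever framework the function picks.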
FAQ
Which browser automation framework is best for Claude Code? Playwright CLI. It was designed for coding agents with filesystem access and uses 4x fewer tokens than Playwright MCP. browser-use is a strong alternative for complex multi-step tasks.
How much does browser automation cost with AI agents? Ranges from free tooling (Playwright CLI/MCP, though you still pay for the LLM tokens they consume) to ~$0.07 per 10-step task (browser-use) to $29/month (Skyvern). Stagehand’s action caching can reduce repeat workflow costs by ~30%.
Can I record what my AI agent does in the browser? Yes. Tools like screencli record AI-driven browser sessions into shareable videos with auto-zoom, click highlights, and gradient backgrounds — one command, no manual recording.
What’s the difference between Playwright MCP and Playwright CLI? Both use the same Playwright engine. MCP streams browser state into the LLM context window (high token cost, no filesystem needed). CLI saves state to disk and lets the agent read what it needs (low token cost, requires filesystem access).
Which framework has the highest success rate? browser-use leads with 89.1% on WebVoyager. Skyvern follows at 85.85%, with particular strength on form-filling tasks.
Try screencli free → screencli.sh