Playwright Browser Agent

AI-guided browser automation for web research, data extraction, form filling, and multi-step web tasks.

Best for: Developers, data teams, operations, researchers

What You Get

  • Headless browser control with Playwright
  • AI-guided navigation and decision making
  • Data extraction pipeline
  • Screenshot and PDF capture
  • Multi-page workflow automation

Step by Step

1. Set up Playwright and browser management

Install Playwright and its browser binaries (npm install playwright, then npx playwright install). Create a browser pool that manages multiple contexts. Configure stealth options: randomized viewports, human-like mouse movements, and realistic user agents.
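
As a rough illustration of the stealth options, the sketch below picks a slightly randomized viewport and a user agent for each new context. The helper name pickContextOptions, the viewport sizes, and the user-agent strings are assumptions made for this example, not values from the recipe.

```typescript
// Hypothetical helper: chooses per-context settings for a new browser context.
// The viewport sizes and user-agent strings are illustrative assumptions.
interface ContextOptions {
  viewport: { width: number; height: number };
  userAgent: string;
}

const VIEWPORTS = [
  { width: 1366, height: 768 },
  { width: 1440, height: 900 },
  { width: 1920, height: 1080 },
];

const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
];

// Picks one item; the index is clamped in case rng() returns exactly 1.
function pick<T>(items: readonly T[], rng: () => number): T {
  return items[Math.min(items.length - 1, Math.floor(rng() * items.length))];
}

// rng is injectable so the choice can be made deterministic in tests.
function pickContextOptions(rng: () => number = Math.random): ContextOptions {
  const base = pick(VIEWPORTS, rng);
  // Shrink the size by 0 to 15 px so contexts do not share identical dimensions.
  const jitter = Math.floor(rng() * 16);
  return {
    viewport: { width: base.width - jitter, height: base.height - jitter },
    userAgent: pick(USER_AGENTS, rng),
  };
}
```

The returned object uses the viewport and userAgent option names that Playwright's browser.newContext() accepts, so it can be passed to that call directly. Human-like mouse movement is not covered by this sketch.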

2. Build the task planner

Create an LLM-based planner that takes a natural language task and breaks it into browser steps. Each step has: action type (navigate, click, type, extract, screenshot), selector or URL, and expected outcome.
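
One possible shape for the plan, sketched under the assumption that the LLM is asked to reply with a JSON array of steps. The field names (action, target, value, expected) and the parsePlan helper are hypothetical; the point is to validate the model's output before anything touches the browser.

```typescript
type ActionType = "navigate" | "click" | "type" | "extract" | "screenshot";

// Assumed step shape: field names are illustrative, not defined by the recipe.
interface PlanStep {
  action: ActionType;
  target: string;   // URL for "navigate", otherwise a selector
  value?: string;   // text to enter, required for "type"
  expected: string; // expected outcome in plain language
}

const ACTION_TYPES: readonly string[] = ["navigate", "click", "type", "extract", "screenshot"];

// Parses and validates the raw LLM reply; throws on the first malformed step.
function parsePlan(raw: string): PlanStep[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("Plan must be a JSON array of steps");
  return data.map((item: unknown, index: number): PlanStep => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`Step ${index}: expected an object`);
    }
    const fields = item as Record<string, unknown>;
    const action = fields["action"];
    const target = fields["target"];
    const value = fields["value"];
    const expected = fields["expected"];
    if (typeof action !== "string" || ACTION_TYPES.indexOf(action) === -1) {
      throw new Error(`Step ${index}: unknown action`);
    }
    if (typeof target !== "string" || target.length === 0) {
      throw new Error(`Step ${index}: missing selector or URL`);
    }
    if (typeof expected !== "string") {
      throw new Error(`Step ${index}: missing expected outcome`);
    }
    if (action === "type" && typeof value !== "string") {
      throw new Error(`Step ${index}: "type" needs a value`);
    }
    const step: PlanStep = { action: action as ActionType, target, expected };
    if (typeof value === "string") step.value = value;
    return step;
  });
}
```

Rejecting a malformed plan up front is a design choice in this sketch. A planner could instead send the error message back to the LLM and ask for a corrected plan.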

3. Implement the execution engine

Build a step executor that runs each plan step in Playwright. Include: smart wait strategies (networkidle, selector visibility), error recovery (retry with alternative selector), and timeout handling (30s per step default).
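
A minimal sketch of such an executor follows, assuming each step carries a primary target plus optional fallback selectors. PageLike is a hypothetical interface declaring only the handful of Playwright Page methods used here (goto, waitForSelector, click, fill), so the example stays self-contained; a real Page object is expected to fit it. The ExecStep shape and the choice to split the 30-second budget evenly across fallbacks are assumptions.

```typescript
// Structural subset of Playwright's Page; only the calls this sketch needs.
interface PageLike {
  goto(url: string, options?: { waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit"; timeout?: number }): Promise<unknown>;
  waitForSelector(selector: string, options?: { state?: "attached" | "detached" | "visible" | "hidden"; timeout?: number }): Promise<unknown>;
  click(selector: string, options?: { timeout?: number }): Promise<void>;
  fill(selector: string, value: string, options?: { timeout?: number }): Promise<void>;
}

// Hypothetical step shape for this example.
interface ExecStep {
  action: "navigate" | "click" | "type";
  targets: string[]; // URL for "navigate"; otherwise primary selector, then fallbacks
  value?: string;    // text for "type"
}

const STEP_TIMEOUT_MS = 30000; // 30s per step default

// Tries each target in order and returns the one that worked.
async function executeStep(page: PageLike, step: ExecStep, timeoutMs: number = STEP_TIMEOUT_MS): Promise<string> {
  if (step.targets.length === 0) throw new Error("Step has no targets");
  // Split the step budget evenly so fallbacks still get time to run.
  const budget = Math.floor(timeoutMs / step.targets.length);
  let lastError: unknown;
  for (const target of step.targets) {
    const deadline = Date.now() + budget;
    // Playwright treats a timeout of 0 as "no timeout", so never pass less than 1 ms.
    const remaining = (): number => Math.max(1, deadline - Date.now());
    try {
      if (step.action === "navigate") {
        await page.goto(target, { waitUntil: "networkidle", timeout: remaining() });
      } else {
        await page.waitForSelector(target, { state: "visible", timeout: remaining() });
        if (step.action === "click") {
          await page.click(target, { timeout: remaining() });
        } else {
          await page.fill(target, step.value ?? "", { timeout: remaining() });
        }
      }
      return target;
    } catch (err) {
      lastError = err; // remember the failure and try the next target
    }
  }
  throw lastError;
}
```

Returning the target that succeeded lets the caller log which selector was actually used, which helps when a fallback starts winning regularly.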

4. Add data extraction

Implement structured data extraction. Support: table extraction, list scraping, text content by selector, attribute extraction, and full page screenshots. Return results as JSON.
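
For the table case, one way to get JSON-ready output is to collect the cell text row by row and then key each row by the header. The sketch below covers only that second half; tableToRecords is a hypothetical helper, and it assumes the first row is the header.

```typescript
type TableRecord = Record<string, string>;

// Converts raw cell text (first row treated as the header) into records.
// Blank header cells get a generated name; short rows are padded with "".
// Note: duplicate header names overwrite each other in this simple version.
function tableToRecords(cells: string[][]): TableRecord[] {
  if (cells.length === 0) return [];
  const header = cells[0];
  const keys = header.map((name, i) => name.trim() || `column_${i + 1}`);
  return cells.slice(1).map((row) => {
    const record: TableRecord = {};
    keys.forEach((key, i) => {
      record[key] = (row[i] ?? "").trim();
    });
    return record;
  });
}
```

In Playwright, the cell text could be gathered with page.locator("table tr").evaluateAll(...), mapping each row's th and td elements to their textContent, and the result passed to this function before JSON.stringify.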

5. Handle anti-bot measures

Add CAPTCHA detection by looking for known CAPTCHA iframes or text, and either skip the step or alert the user when one is detected. Handle login popups, cookie consent banners, and infinite scroll pages.
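
A simple heuristic for the detection step might look like the following. The function name detectCaptcha and both hint lists are assumptions for illustration; the lists are short and not exhaustive, and a real deployment would need to maintain them.

```typescript
// Illustrative, non-exhaustive hints; extend these for the sites you target.
const CAPTCHA_FRAME_HINTS = [
  "google.com/recaptcha",
  "recaptcha.net/recaptcha",
  "hcaptcha.com",
  "challenges.cloudflare.com",
];
const CAPTCHA_TEXT_HINTS = ["verify you are human", "i'm not a robot", "unusual traffic"];

interface CaptchaCheck {
  detected: boolean;
  reason?: string; // which hint matched, for logs or alerts
}

// Checks iframe URLs first, then visible page text (case-insensitive).
function detectCaptcha(frameUrls: string[], bodyText: string): CaptchaCheck {
  for (const url of frameUrls) {
    for (const hint of CAPTCHA_FRAME_HINTS) {
      if (url.indexOf(hint) !== -1) return { detected: true, reason: "frame: " + hint };
    }
  }
  const text = bodyText.toLowerCase();
  for (const hint of CAPTCHA_TEXT_HINTS) {
    if (text.indexOf(hint) !== -1) return { detected: true, reason: "text: " + hint };
  }
  return { detected: false };
}
```

With Playwright, the inputs could come from page.frames().map((f) => f.url()) and page.innerText("body"). On a positive result the agent would skip the step or raise an alert rather than try to solve the challenge.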

6. Build the web UI

Create a simple interface: URL input, task description textarea, run button, and results display. Show live logs of each step. Allow downloading results as JSON or CSV.
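
For the CSV download, the extracted records need flattening into rows with proper quoting. The sketch below is one way to do that; toCsv is a hypothetical helper that quotes fields in the style of RFC 4180 and assumes the values are flat (strings, numbers, booleans).

```typescript
// Serializes flat records to CSV. Columns are the union of all keys, in
// first-seen order; missing values become empty fields.
function toCsv(records: Array<Record<string, unknown>>): string {
  if (records.length === 0) return "";
  const columns: string[] = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  // Quote a field only when it contains a comma, quote, or line break.
  const escapeField = (value: unknown): string => {
    const text = value === null || value === undefined ? "" : String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  const lines = [columns.map(escapeField).join(",")];
  for (const record of records) {
    lines.push(columns.map((column) => escapeField(record[column])).join(","));
  }
  return lines.join("\r\n");
}
```

The JSON download can simply be JSON.stringify(records, null, 2). Nested objects would need flattening before being passed to this function.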

7. Add session management

Persist browser sessions to avoid logging in again on every run. Store cookies and local storage between runs, and expire a session after 15 minutes of inactivity.
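
The inactivity rule can be kept separate from the browser code. Below is a minimal in-memory sketch; SessionRegistry and its method names are hypothetical. It assumes each session's cookies and local storage have been saved to a file, and it only tracks which saved files are still fresh.

```typescript
const SESSION_IDLE_MS = 15 * 60 * 1000; // 15 minutes of inactivity

interface SessionRecord {
  storageStatePath: string; // file holding the saved cookies and local storage
  lastUsedAt: number;       // epoch milliseconds
}

// Hypothetical registry that expires sessions after a period of inactivity.
class SessionRegistry {
  private sessions = new Map<string, SessionRecord>();
  private idleMs: number;

  constructor(idleMs: number = SESSION_IDLE_MS) {
    this.idleMs = idleMs;
  }

  // Records (or refreshes) a session after its state has been saved.
  touch(id: string, storageStatePath: string, now: number = Date.now()): void {
    this.sessions.set(id, { storageStatePath, lastUsedAt: now });
  }

  // Returns the saved state path if the session is still fresh and counts
  // this call as activity; otherwise forgets the session.
  resume(id: string, now: number = Date.now()): string | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (now - session.lastUsedAt > this.idleMs) {
      this.sessions.delete(id);
      return undefined;
    }
    session.lastUsedAt = now;
    return session.storageStatePath;
  }
}
```

In Playwright, context.storageState({ path }) writes the state file, and browser.newContext({ storageState: path }) restores it. When resume() returns undefined, the agent would log in again and call touch() with the new file.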

Stack

  • Playwright
  • OpenAI / Claude
  • Node.js/TypeScript
  • Browser context management

Build This

Copy this prompt and paste it into Claude Code, OpenCode, Codex, or Cursor to build this recipe.

Build me a browser automation agent using Playwright. It should: 1) Accept a natural language task description (e.g. 'Find the top 5 AI startups in Y Combinator and extract their names, founders, and descriptions'). 2) Use an LLM to break the task into browser steps (navigate, search, click, extract). 3) Execute each step via Playwright with proper waits and error handling. 4) Extract structured data from pages and compile results. 5) Handle authentication popups and CAPTCHAs gracefully (skip or alert). 6) Return results as JSON and optionally as a CSV download. Include a simple web UI where users can paste a URL and describe what to extract.

Common Failure Modes

  • Sites with aggressive bot detection
  • Dynamic content that requires JavaScript
  • Session timeouts on long workflows
  • CAPTCHA blocks

Implementation Notes

Use a stealth Playwright configuration. Run in headless mode, but save a screenshot at each step for debugging. Set a timeout for every step (30 seconds is the default suggested in step 3).

Related skill: playwright lead research
