What You Get
- Multi-source lead scraping
- AI-powered data enrichment
- Duplicate detection
- CRM integration (HubSpot/Salesforce)
- Scheduled scraping jobs
Step by Step
1. Build the scraping engine
Use Playwright to scrape Google Maps search results. For each listing, extract: name, phone, website, address, rating, review count, and category. Handle pagination up to 50 results. Export raw data to a staging table.
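A minimal sketch of the normalization and the 50-result cap, assuming the Playwright layer yields each results page as a list of dicts of raw text. The names `Listing`, `parse_listing`, and `collect_listings` are hypothetical, and the Google Maps selectors are deliberately left out because they change often.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

MAX_RESULTS = 50  # pagination cap from the step above


@dataclass
class Listing:
    name: str
    phone: Optional[str]
    website: Optional[str]
    address: Optional[str]
    rating: Optional[float]
    review_count: Optional[int]
    category: Optional[str]


def _to_float(text: Optional[str]) -> Optional[float]:
    # Accept "4.5" as well as the comma-decimal form "4,5".
    try:
        return float(text.replace(",", "."))
    except (AttributeError, ValueError):
        return None


def parse_listing(raw: Dict[str, str]) -> Listing:
    """Normalize one raw scraped record into a typed Listing."""
    def clean(key: str) -> Optional[str]:
        value = (raw.get(key) or "").strip()
        return value or None

    # Review counts often arrive decorated, e.g. "(1,234)".
    digits = re.sub(r"\D", "", clean("review_count") or "")
    return Listing(
        name=clean("name") or "",
        phone=clean("phone"),
        website=clean("website"),
        address=clean("address"),
        rating=_to_float(clean("rating")),
        review_count=int(digits) if digits else None,
        category=clean("category"),
    )


def collect_listings(pages: Iterable[List[Dict[str, str]]],
                     limit: int = MAX_RESULTS) -> List[Listing]:
    """Walk result pages in order and stop once `limit` listings are collected."""
    results: List[Listing] = []
    for page in pages:
        for raw in page:
            if len(results) >= limit:
                return results
            results.append(parse_listing(raw))
    return results
```

The returned `Listing` objects can be converted with `dataclasses.asdict` before being written to the staging table.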
2. Implement AI enrichment
For each lead with a website, use Playwright to visit the site and capture the homepage content. Send that content to OpenAI to extract estimated company size, tech stack, target market, and potential pain points. Store the result as enrichment JSON.
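The enrichment step could be sketched as below. The `complete` argument is a hypothetical wrapper around whichever OpenAI client call you use (it takes a prompt string and returns the model's reply as a string), and the key names and the 8000-character cap are assumptions, not part of the recipe.

```python
import json
from typing import Any, Callable, Dict

# Assumed key names for the four fields listed in the step above.
ENRICHMENT_KEYS = ("company_size", "tech_stack", "target_market", "pain_points")
MAX_CHARS = 8000  # assumed cap on homepage text to keep the prompt small


def build_prompt(homepage_text: str) -> str:
    """Build the extraction prompt from (truncated) homepage text."""
    return (
        "From the homepage text below, return a JSON object with exactly "
        "these keys: " + ", ".join(ENRICHMENT_KEYS) + ". "
        "Use null when the page does not support an answer.\n\n"
        + homepage_text[:MAX_CHARS]
    )


def enrich_lead(homepage_text: str,
                complete: Callable[[str], str]) -> Dict[str, Any]:
    """Return a dict with every expected key, None where the model gave nothing usable."""
    reply = complete(build_prompt(homepage_text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the expected keys so stored JSON has a stable shape.
    return {key: data.get(key) for key in ENRICHMENT_KEYS}
```

Passing the model call in as a function keeps the parsing testable without network access, and malformed replies degrade to empty enrichment rather than failing the run.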
3. Build duplicate detection
Query existing CRM contacts by email or domain. For leads without an email, use fuzzy matching on the company name. Flag potential duplicates with a confidence score for manual review.
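One way to score duplicates, as a sketch: an exact domain match gives full confidence, otherwise fall back to a name similarity from the standard library's `difflib.SequenceMatcher`. The 0.85 threshold, the suffix list, and the free-mail list are assumptions to tune against your own data.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

SUFFIXES = {"inc", "llc", "ltd", "co", "corp", "gmbh"}  # assumed list
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # assumed list


def domain_of(value: Optional[str]) -> Optional[str]:
    """Extract a bare domain from an email address or URL."""
    if not value:
        return None
    value = value.strip().lower()
    if "@" in value:
        host = value.rsplit("@", 1)[1]
    else:
        host = urlparse(value if "//" in value else "//" + value).netloc
    host = host.split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    # A shared free-mail domain says nothing about the company.
    return None if not host or host in FREE_MAIL else host


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def duplicate_confidence(lead: Dict, contact: Dict) -> float:
    """Score in [0, 1]: 1.0 on a domain match, else name similarity."""
    lead_domain = domain_of(lead.get("email") or lead.get("website"))
    contact_domain = domain_of(contact.get("email") or contact.get("website"))
    if lead_domain and lead_domain == contact_domain:
        return 1.0
    a = normalize_name(lead.get("name") or "")
    b = normalize_name(contact.get("name") or "")
    if not a or not b:
        return 0.0  # two empty names would otherwise compare as identical
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(lead: Dict, contacts: List[Dict],
                    threshold: float = 0.85) -> List[Tuple[float, Dict]]:
    """Return (confidence, contact) pairs at or above the threshold, best first."""
    scored = [(duplicate_confidence(lead, c), c) for c in contacts]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Anything returned by `find_duplicates` is a candidate for the manual review queue, not an automatic merge.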
4. Create CRM integration
Implement a HubSpot (or Salesforce) API client. Map scraped fields to CRM contact properties. Create or update contacts in batches (max 100 per request). Handle rate limits with exponential backoff.
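Batching and backoff can be sketched independently of the CRM client. Here `send_batch` and the `RateLimited` exception are hypothetical stand-ins for your HubSpot or Salesforce call and its rate-limit error; the retry count, base delay, and jitter are assumed values.

```python
import random
import time
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 100  # max contacts per request, from the step above


class RateLimited(Exception):
    """Hypothetical error raised by send_batch when the CRM throttles a request."""


def chunked(items: Sequence, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def push_with_backoff(contacts: Sequence,
                      send_batch: Callable[[List], None],
                      max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Send contacts in batches, retrying throttled batches with doubling delays.

    Returns the number of contacts pushed. Re-raises RateLimited once a
    batch has been retried `max_retries` times without success.
    """
    pushed = 0
    for batch in chunked(contacts):
        for attempt in range(max_retries + 1):
            try:
                send_batch(batch)
                pushed += len(batch)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
                # Up to 10% jitter so parallel workers do not retry in lockstep.
                sleep(delay + random.uniform(0, delay * 0.1))
    return pushed
```

The `sleep` parameter exists only so the retry schedule can be exercised in tests without waiting.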
5. Build the review dashboard
Create a Next.js dashboard showing: leads found, enriched, duplicates, and pending review. Allow users to approve/reject leads before CRM push. Show scraping stats and history.
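The dashboard's numbers and the approval gate could be backed by logic like the following sketch. It is written in Python to show the data flow only, not the Next.js UI, and the field names (`status`, `enrichment`, `duplicate_confidence`) and the 0.85 cutoff are assumptions.

```python
from typing import Dict, Iterable, List

DUPLICATE_THRESHOLD = 0.85  # assumed cutoff for counting a lead as a likely duplicate


def dashboard_counts(leads: Iterable[Dict]) -> Dict[str, int]:
    """Compute the four headline numbers shown on the dashboard."""
    leads = list(leads)
    return {
        "found": len(leads),
        "enriched": sum(1 for lead in leads if lead.get("enrichment")),
        "duplicates": sum(
            1 for lead in leads
            if lead.get("duplicate_confidence", 0.0) >= DUPLICATE_THRESHOLD
        ),
        "pending_review": sum(
            1 for lead in leads if lead.get("status", "pending") == "pending"
        ),
    }


def review(lead: Dict, decision: str) -> Dict:
    """Return a copy of the lead with the reviewer's decision recorded."""
    if decision not in ("approved", "rejected"):
        raise ValueError("decision must be 'approved' or 'rejected'")
    return {**lead, "status": decision}


def ready_for_push(leads: Iterable[Dict]) -> List[Dict]:
    """Only explicitly approved leads may be sent to the CRM."""
    return [lead for lead in leads if lead.get("status") == "approved"]
```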
6. Add scheduling
Implement cron-based scheduled scraping jobs, configurable as daily, weekly, or manual. Each run logs its stats and sends a Slack notification with a summary.
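A sketch of the schedule mapping and the Slack summary, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field). The 06:00 run time, the Monday weekly slot, and the stat key names are assumptions.

```python
import json
import urllib.request
from typing import Dict, Optional

# Assumed run times: 06:00 every day, or 06:00 every Monday.
CRON_EXPRESSIONS = {
    "daily": "0 6 * * *",
    "weekly": "0 6 * * 1",
}


def cron_for(schedule: str) -> Optional[str]:
    """Return the cron expression for a schedule, or None for manual runs."""
    if schedule == "manual":
        return None
    if schedule not in CRON_EXPRESSIONS:
        raise ValueError("unknown schedule: %r" % schedule)
    return CRON_EXPRESSIONS[schedule]


def format_summary(stats: Dict[str, int]) -> str:
    """One-line run summary; missing stats are reported as 0."""
    return (
        "Lead scrape finished: {found} found, {enriched} enriched, "
        "{duplicates} duplicates, {pending} pending review"
    ).format(
        found=stats.get("found", 0),
        enriched=stats.get("enriched", 0),
        duplicates=stats.get("duplicates", 0),
        pending=stats.get("pending_review", 0),
    )


def notify_slack(webhook_url: str, stats: Dict[str, int]) -> None:
    """POST the summary to a Slack incoming webhook URL."""
    body = json.dumps({"text": format_summary(stats)}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

The cron expression would be handed to whatever scheduler hosts the job; a manual schedule simply registers nothing and waits for a user-triggered run.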
Common Failure Modes
- Google Maps blocks scraping after a request threshold
- Email discovery is unreliable
- CRM APIs enforce rate limits
- Duplicate matching is imprecise
Implementation Notes
Use rotating proxies for Google Maps. Always allow manual review before CRM push for the first batch. Set reasonable daily scrape limits.
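A daily scrape limit can be enforced with a small counter that resets when the date changes. This is a sketch under assumptions: `DailyQuota` is a hypothetical name, the counter lives in memory (a real deployment would persist it), and what counts as a "reasonable" limit is left to you.

```python
from datetime import date
from typing import Callable


class DailyQuota:
    """In-memory counter that allows at most `limit` scrapes per calendar day."""

    def __init__(self, limit: int, today: Callable[[], date] = date.today):
        self.limit = limit
        self._today = today  # injectable clock, mainly for tests
        self._day = today()
        self._used = 0

    def try_consume(self, n: int = 1) -> bool:
        """Reserve `n` scrapes; return False (and reserve nothing) if over the limit."""
        current = self._today()
        if current != self._day:
            # New day: reset the counter.
            self._day = current
            self._used = 0
        if self._used + n > self.limit:
            return False
        self._used += n
        return True

    @property
    def remaining(self) -> int:
        """Scrapes still available for the day last seen by try_consume."""
        return self.limit - self._used
```

The scraper would call `try_consume()` before each request and stop the run as soon as it returns `False`.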
Want a lead-scraper-to-CRM workflow running in your business?
4M Labs can deploy the lead scraper to CRM recipe as a production workflow:
- Connected to your tools and data sources
- Secured for your team with proper access controls
- Deployed with monitoring and error handling
- Documented for handoff and future maintenance