May 13, 2026
Multi-domain crawler hardening
The URL crawler now respects robots.txt, enforces plan-tier page caps, and detects JS-rendered pages so the content you index actually matches what visitors see.
improvement
The URL crawler got a significant hardening pass. Three things changed:
- **robots.txt compliance.** The crawler now fetches and parses robots.txt before indexing any page. If a path is disallowed for our user-agent, we skip it and log the exclusion so you can see what was left out.
- **Plan-tier page caps.** Free plans are capped at 3 pages per crawl run, Starter at 25, Pro at 100, and Business at 500. Previously there was no server-side cap, so a misconfigured bot could hit a large domain and ingest thousands of irrelevant pages.
- **JS-rendered page detection.** If a page returns near-empty HTML because it's client-side rendered, we detect that and fall back to a headless fetch path so the content you index matches what a real visitor sees.
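
To make the three checks concrete, here is a minimal Python sketch of how they could work. It is an illustration under stated assumptions, not the crawler's actual code: the user-agent name `ExampleCrawler`, the 200-character "near-empty" threshold, and every function name are hypothetical. Only the page caps come from this entry. The headless fallback itself is not shown.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Page caps per plan, as listed in this entry.
PLAN_PAGE_CAPS = {"free": 3, "starter": 25, "pro": 100, "business": 500}

USER_AGENT = "ExampleCrawler"  # hypothetical user-agent name
MIN_VISIBLE_CHARS = 200        # hypothetical threshold for "near-empty" HTML


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt body permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def apply_page_cap(urls: list[str], plan: str) -> list[str]:
    """Truncate a crawl queue to the page cap for the given plan."""
    return urls[: PLAN_PAGE_CAPS[plan]]


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def looks_js_rendered(html: str, min_chars: int = MIN_VISIBLE_CHARS) -> bool:
    """Heuristic: treat a page as client-side rendered if it has little visible text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse whitespace so indentation does not count as content.
    visible = " ".join("".join(extractor.chunks).split())
    return len(visible) < min_chars
```

A caller would presumably filter the queue with `is_allowed`, trim it with `apply_page_cap`, and re-fetch any page for which `looks_js_rendered` returns true through the headless path. A text-length heuristic is only one way to spot an empty app shell; the real detection may use different signals.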
Commit: 09181ee