Skyvern
Open source browser agent that uses computer vision and language models instead of brittle selectors to complete multi step workflows on any website, with route memorization for deterministic replays.
Skyvern is an open source browser automation agent, backed by Y Combinator, that uses large language models and computer vision to automate web workflows on any website. Brittle CSS selectors and XPath scripts break whenever a site changes its layout, so Skyvern avoids them. It takes a screenshot, uses a vision language model to identify the target elements, and interacts with them the way a person would. It reasons about what it sees rather than depending on a fixed page structure. As a result it can operate on sites it has never seen without custom configuration, and it tolerates layout changes that would break a selector based script. The project has more than twenty thousand stars on GitHub and more than thirty thousand users.
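The observe, decide, act cycle described above can be sketched as a toy loop. This is not Skyvern's code. The "website" is an in memory dictionary of screens, and `toy_model` is a hypothetical rule based stand in for the vision language model, which in a real agent would receive a screenshot. The sketch shows that the agent picks elements by what they say rather than where they sit, so reordering a screen does not change the outcome.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    label: str  # the visible text a vision model would read from a screenshot
    kind: str   # e.g. "button", "link", "input"


@dataclass(frozen=True)
class Action:
    verb: str                     # "click" or "done" in this toy
    target: Optional[str] = None  # label of the element to act on


# Hypothetical stand-in for a vision language model call.
Model = Callable[[str, list], Action]


def run_agent(goal, screens, start, transitions, model: Model, max_steps=10):
    """Observe the current screen, ask the model for an action, act, repeat."""
    page, history = start, []
    for _ in range(max_steps):
        visible = screens[page]        # observe: what is on screen right now
        action = model(goal, visible)  # decide: pick an element by its meaning
        history.append(action)
        if action.verb == "done":      # verify: the model judged the goal met
            return history
        if action.target not in {e.label for e in visible}:
            raise RuntimeError(f"model picked an element not on screen: {action.target}")
        page = transitions.get((page, action.target), page)  # act, then re-observe
    raise RuntimeError("step budget exhausted before the goal was reached")


def toy_model(goal, visible):
    """Rule based placeholder: finish once 'Invoices' is visible, else advance."""
    labels = {e.label for e in visible}
    if "Invoices" in labels:
        return Action("done")
    for wanted in ("Sign in", "Continue"):
        if wanted in labels:
            return Action("click", wanted)
    raise RuntimeError("toy model cannot make progress on this screen")


screens = {
    "home": [Element("Pricing", "link"), Element("Sign in", "button")],
    "login": [Element("Email", "input"), Element("Continue", "button")],
    "dashboard": [Element("Invoices", "link")],
}
transitions = {("home", "Sign in"): "login", ("login", "Continue"): "dashboard"}
history = run_agent("open invoices", screens, "home", transitions, toy_model)
print([a.target for a in history])  # ['Sign in', 'Continue', None]
```

Reversing the element order on every screen, a crude "redesign", yields the same path, because the toy model matches on labels rather than positions.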
A defining feature is its explore and replay pattern, called route memorization. The first time Skyvern completes a task through reasoning, it compiles the successful path into a deterministic Playwright script. Future identical tasks run with that faster, cheaper, repeatable script, which takes the language model out of the loop. When a site changes and the script breaks, Skyvern automatically falls back to reasoning to recover the path. It ships primitives for acting, extracting structured data against a schema, validating page state, and prompting, plus a no code workflow builder.
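The explore and replay pattern can be illustrated with a small cache. In Skyvern the memorized artifact is a generated Playwright script. Here a list of labels to click stands in for it, and `toy_reason` is a hypothetical placeholder for the expensive model reasoning. `RouteCache` and its methods are illustrative names, not Skyvern's API.

```python
from typing import Callable

Page = frozenset  # toy page: the set of clickable labels currently present


class RouteCache:
    """Hypothetical explore -> replay cache with fallback; not Skyvern's API."""

    def __init__(self, reason: Callable[[str, Page], list]):
        self.reason = reason  # slow, costly path: stands in for model reasoning
        self.routes = {}      # task -> memorized list of steps (labels to click)
        self.model_calls = 0

    def run(self, task: str, page: Page) -> list:
        route = self.routes.get(task)
        # Replay when a memorized route exists and every step still resolves.
        # (A real replay would detect breakage mid-run; checking up front keeps
        # the toy simple.)
        if route is not None and all(step in page for step in route):
            return route
        # Explore on the first run, or recover after the site changed.
        self.model_calls += 1
        route = self.reason(task, page)
        if not route:
            raise RuntimeError(f"reasoning found no path for {task!r}")
        self.routes[task] = route  # memorize the path that worked
        return route


def toy_reason(task, page):
    """Placeholder 'reasoning': click whatever mentions an invoice."""
    return sorted(label for label in page if "invoice" in label.lower())
```

The model is consulted only on the first run and again after a layout change breaks the memorized route. Every other run is a plain replay.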
Skyvern is model agnostic. Teams bring their own language model from OpenAI, Anthropic, Google Gemini, Azure, AWS Bedrock, or a local model through Ollama, and can configure multi model setups. It is open source under the AGPL-3.0 copyleft license. It can be fully self hosted with Docker or a pip install backed by your own Postgres or SQLite database, or run through Skyvern Cloud with bundled anti bot, proxy, and CAPTCHA handling. Extensibility is broad: a Playwright compatible software development kit, a REST API, a command line interface, a Model Context Protocol server, and connectors for Zapier, Make, and n8n.
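One way to picture a multi model setup is a routing table that maps each kind of step to an ordered list of provider and model pairs. The provider names come from the paragraph above. The model names are placeholders, and the `ROUTES` structure and `pick_model` helper are hypothetical; they do not reproduce Skyvern's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    provider: str  # e.g. "openai", "anthropic", "gemini", "azure", "bedrock", "ollama"
    model: str     # placeholder name, not a real model identifier


# Hypothetical multi model setup: each kind of step has an ordered preference list.
ROUTES = {
    "vision": [ModelRef("openai", "vision-large"), ModelRef("gemini", "vision-fast")],
    "planning": [ModelRef("anthropic", "reasoning-large"), ModelRef("azure", "reasoning-alt")],
    "extraction": [ModelRef("ollama", "local-small"), ModelRef("bedrock", "hosted-small")],
}


def pick_model(step_kind: str, available: set) -> ModelRef:
    """Return the first configured model whose provider is reachable right now."""
    for ref in ROUTES.get(step_kind, []):
        if ref.provider in available:
            return ref
    raise LookupError(f"no reachable provider configured for {step_kind!r}")
```

Ordering each list by preference gives a simple fallback: if the preferred provider is down, the next entry is used. In this sketch, putting a local Ollama model first for extraction keeps that step on your own hardware whenever it is available.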
For the enterprise, Skyvern is SOC 2 Type II and HIPAA compliant per its trust center. It integrates credential vaults including Azure Key Vault, 1Password, and Bitwarden, and its enterprise plan offers human in the loop for compliance sensitive steps. The company built and publishes the Web Bench benchmark of 5,750 tasks drawn from the top 1,000 websites, and it reports an 85.8% score on WebVoyager. That published performance data is something many commercial tools do not provide. The main trade off is that vision reasoning adds latency and per run cost compared with a stable, well maintained script, although route memorization reduces that overhead for repeated tasks. It also offers no prebuilt agent marketplace.
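A human in the loop gate for compliance sensitive steps can be sketched as a runner that pauses for a decision before any flagged step and records every decision. The `Step` type, its `sensitive` flag, and the `approve` callback are hypothetical. This sketch does not describe how Skyvern's enterprise feature is actually wired.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    name: str
    sensitive: bool = False  # hypothetical flag for a compliance sensitive step


def execute(steps, approve: Callable[[Step], bool]) -> list:
    """Run steps in order, asking a human (`approve`) before any sensitive step.

    Every decision is appended to the returned log so the run can be audited.
    """
    log = []
    for step in steps:
        if step.sensitive:
            if not approve(step):
                log.append(f"halted before {step.name}: approval denied")
                return log
            log.append(f"approved {step.name}")
        log.append(f"ran {step.name}")  # a real agent would act on the page here
    return log
```

In practice `approve` would block until a reviewer responds, for example while watching the live stream of the run.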
Vendor details
Canonical URL
https://skyvern.com
Category
Browser / computer-use agent
Company status
independent
Use cases & customers
In practice
An operations team logs into hundreds of vendor and utility portals each month. Skyvern navigates to the right billing period and pulls the invoices. When a portal layout changes, Skyvern's intent metadata lets the agent recover automatically instead of failing.
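Recovering a step from intent rather than from a stale selector can be sketched as follows. The first attempt uses the recorded selector. If that selector is gone, the saved intent is matched against the labels now visible. `RecordedStep` and `locate` are hypothetical names, and fuzzy string matching from the standard library's `difflib` is a deliberately crude stand in for the vision model's judgment.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedStep:
    selector: str  # what the memorized script clicked last time
    intent: str    # hypothetical intent note saved alongside it, e.g. "billing history"


def locate(step: RecordedStep, page: dict) -> str:
    """Return a selector for `step` on `page` (a map of selector -> visible label)."""
    if step.selector in page:
        return step.selector  # fast path: the recorded selector still exists
    # Layout changed: match the saved intent against the visible labels instead.
    # Fuzzy string matching is only a stand-in for a vision model's judgment.
    by_label = {label.lower(): sel for sel, label in page.items()}
    best = difflib.get_close_matches(step.intent.lower(), list(by_label), n=1, cutoff=0.5)
    if not best:
        raise LookupError(f"could not recover step with intent {step.intent!r}")
    return by_label[best[0]]
```

If a portal renames "Billing history" to "Billing History & Statements" and moves it, the old selector misses, but the intent still resolves to the new link.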
A team automates repeat software license renewals through a vendor dashboard. Skyvern learns the checkout path on the first run and replays it deterministically on later renewals. It falls back to reasoning when product options vary and flags any change in price or SKU.
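The price and SKU flagging step amounts to comparing what a replay reads against a baseline recorded on the first run. The field names `sku` and `price` are hypothetical. Prices are kept as strings and compared as `Decimal` values so that "499.0" and "499.00" count as equal without floating point surprises.

```python
from decimal import Decimal


def check_renewal(baseline: dict, observed: dict) -> list:
    """Compare values read on this replay with those recorded on the first run.

    `baseline` and `observed` are hypothetical extraction results with "sku" and
    "price" keys; prices are strings so Decimal can compare them exactly.
    """
    flags = []
    if observed["sku"] != baseline["sku"]:
        flags.append(f"SKU changed: {baseline['sku']} -> {observed['sku']}")
    if Decimal(observed["price"]) != Decimal(baseline["price"]):
        flags.append(f"price changed: {baseline['price']} -> {observed['price']}")
    return flags
```

A non-empty result would be the point to stop the replay and ask a person before paying the new amount. That policy is an assumption, not documented Skyvern behavior.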
A developer needs an insurance quote from a carrier site with no API. With a few lines of code against the Playwright compatible software development kit, Skyvern reasons through the dynamic form, handles the security question, and returns a structured quote.
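According to the research notes, Skyvern's extract primitive takes a JSON schema. The sketch below covers the other side: checking what comes back before the application trusts it. The `InsuranceQuote` fields and the `parse_quote` helper are hypothetical. They show the general practice of validating model output against a typed schema, not Skyvern's SDK.

```python
from dataclasses import dataclass, fields
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class InsuranceQuote:  # hypothetical schema; field names are illustrative
    carrier: str
    monthly_premium: Decimal
    deductible: Decimal


def _money(value) -> Decimal:
    """Turn strings like "$1,204.50" into Decimal("1204.50")."""
    try:
        return Decimal(str(value).replace("$", "").replace(",", "").strip())
    except InvalidOperation as exc:
        raise ValueError(f"not a money amount: {value!r}") from exc


def parse_quote(raw: dict) -> InsuranceQuote:
    """Check a model's extraction output against the schema before trusting it."""
    missing = [f.name for f in fields(InsuranceQuote) if f.name not in raw]
    if missing:
        raise ValueError(f"extraction is missing fields: {missing}")
    return InsuranceQuote(
        carrier=str(raw["carrier"]).strip(),
        monthly_premium=_money(raw["monthly_premium"]),
        deductible=_money(raw["deductible"]),
    )
```

Failing loudly on a missing or malformed field is usually better than passing a half filled quote downstream.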
Sources & related URLs
Research notes
- Open-source AI browser automation (LLMs + computer vision). Skyvern-AI. YC-backed, $2.7M raised, 21.9k GitHub stars, 30,000+ users. Founder ex-Faire/Gopuff ML-platforms. AGPL-3.0.
- VISION-BASED: screenshot → Vision LLM identifies elements → interact (click/type/select/scroll) → verify → continue multi-step. Operates on UNSEEN websites (no custom config), resistant to layout-changes (no XPath/selectors). Playwright-compatible SDK (AI on top of Playwright, uses Playwright under the hood). No-code workflow builder. 'Swarm of agents' (comprehend/plan/execute, inspired by BabyAGI/AutoGPT).
- STANDOUT = ROUTE MEMORIZATION / 'explore→replay': first success via AI-reasoning → compiles path into DETERMINISTIC Playwright script → future runs use fast script (no LLM, faster/cheaper/deterministic), auto-recovers when site changes. Primitives: act/extract(JSON-schema)/validate/prompt.
- 5 FULLS: Sec=F (SOC 2 Type II + HIPAA CONFIRMED on trust.skyvern.com [Enterprise tier] + credential-vaults Azure-Key-Vault/1Password/Bitwarden + team-workspaces + pentest-reports), Dep=F (OPEN-SOURCE AGPL self-host via Docker/pip/Postgres/SQLite + Skyvern Cloud managed), Model=F (MODEL-AGNOSTIC BYO-LLM: OpenAI/Anthropic/Gemini/Ollama-local/Azure-GPT4o/AWS-Bedrock + multi-model-setups), Ext=F (Playwright-compatible SDK + REST-API + CLI + open-source + MCP-support + Zapier/Make/n8n), Comp=F (vision-based browser nav, unseen-sites, CAPTCHA/2FA/dynamic-UIs).
- 8 P's: Int=P (Zapier/Make/n8n + MCP + web; not deep-native-connectors), Orch=P (autonomous multi-step + verify + branch-handling + workflow-builder; single-goal-focused), Know=P (extract-JSON-schema + contextual-info-matching + LLM-inference), HITL=P (human-in-the-loop OPTIONAL Enterprise-plan for compliance-steps + interactable-livestream; not default), Obs=P (livestream watch-runs + dashboard; debug-mode-with-approval is roadmap), Mem=P (route-memorization = learned-workflow-persistence + prompt-caching; not general-agent-memory), Trig=P (API/SDK/dashboard + Zapier/Make/n8n), Eval=P (built+publishes Web-Bench benchmark [5,750 tasks/top-1000-sites] + WebVoyager 85.8% + validate-primitive).
- 1 N: Pack=N (workflow-builder + community-examples, no prebuilt-agent marketplace).
- ⭐ NEW BROWSER-LANE #1 (net-new): 9.0, above Notte 8.5.
- Use cases: healthcare (EMR/prior-auth/claims), insurance, fintech (KYC/invoices), HR, gov-forms, invoices/utility-bills, procurement, job-applications.
- Pricing: FREE (5,000 credits/mo, no card), Hobby $29/mo (30k credits, 10 concurrent), Pro $149/mo (150k credits, 25 concurrent, team, residential-proxy), Enterprise custom (unlimited, HIPAA, SOC2 Type II, HITL); open-source free self-host → self_serve.
- Domain skyvern.com. Score 9.0 (5F/8P/1N).
Capability coverage
9.0 / 14 capabilities · 64%
| Capability | Assessment | Coverage |
|---|---|---|
| Integrations & Tool Calling | Skyvern connects workflows to other apps through Zapier, Make, and n8n, supports Model Context Protocol, and acts across any website, real integration short of a documented library of deep native business connectors, so partial. | Partial |
| Workflow Orchestration | Skyvern autonomously plans and executes multi step browser workflows, verifying results and adapting to branching paths, and chains steps in a no code workflow builder, real orchestration short of documented autonomous multi agent coordination across systems, so partial. | Partial |
| Knowledge Grounding & RAG | Skyvern extracts structured data with schemas and reasons over contextual information to infer answers while navigating, real grounding in page content and provided context short of a documented citation grounded retrieval system, so partial. | Partial |
| Human Oversight & Guardrails | Skyvern offers human in the loop as an optional enterprise feature for compliance sensitive steps and an interactable livestream to intervene, real oversight short of a runtime approval and guardrail enforcement engine enabled by default, so partial. | Partial |
| Security, Identity & Governance | Skyvern is SOC 2 Type II and HIPAA compliant per its trust center, integrates enterprise credential vaults like Azure Key Vault, 1Password, and Bitwarden, and offers team workspaces, so full. | Full |
| Observability & Auditability | Skyvern provides a live stream to watch runs in real time and run visibility through its dashboard, real observability short of a documented full per action audit and traceability system, so partial. | Partial |
| Memory & State Persistence | Skyvern's route memorization compiles a successful task path into a deterministic script and replays it on future runs, self healing when sites change, real learned workflow persistence short of a documented general cross session agent memory, so partial. | Partial |
| Deployment & Data Residency | Skyvern is open source under a copyleft license and fully self hostable via Docker or pip with your own database, and also offers a managed cloud, so full. | Full |
| Prebuilt Agents, Templates & Packs | Skyvern provides a no code workflow builder and community contributed examples, but a browsable library of prebuilt or cloneable agents could not be verified. | Unable to verify |
| Triggers & Channel Coverage | Skyvern tasks are triggered via API, software development kit, dashboard, and through Zapier, Make, and n8n integrations, real triggering short of broad customer facing channel coverage, so partial. | Partial |
| Model Flexibility & Routing | Skyvern is model agnostic, letting users bring OpenAI, Anthropic, Gemini, Azure, AWS Bedrock, or local Ollama models and configure multi model setups, so full. | Full |
| APIs, SDKs & MCP Extensibility | Skyvern offers a Playwright compatible software development kit, an API, a command line interface, an open source codebase, and Model Context Protocol support, so full. | Full |
| Testing, Debugging & Optimization | Skyvern built and publishes the Web Bench benchmark with thousands of tasks and provides a validate primitive to check page state, real testing and evaluation tooling short of a documented in product evaluation harness for user workflows, so partial. | Partial |
| Browser & Computer Use | Skyvern navigates any website like a human using computer vision and vision language models to read the screen and interact, operating on unseen sites and resisting layout changes without selectors, so full. | Full |
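The headline figure of 9.0 / 14 capabilities (64%) is consistent with a simple weighting of Full = 1, Partial = 0.5, and anything else = 0. That weighting is inferred from the reported numbers (5 Full, 8 Partial, 1 not credited), not taken from a documented formula. The sketch reproduces the arithmetic from the ratings in the table above.

```python
# Inferred weighting: Full = 1, Partial = 0.5, anything else = 0 (not documented).
WEIGHTS = {"Full": 1.0, "Partial": 0.5}


def coverage_score(ratings):
    """Return (score, percent of the maximum) for a list of per-capability ratings."""
    score = sum(WEIGHTS.get(r, 0.0) for r in ratings)
    return score, round(100 * score / len(ratings))


# Ratings as listed in the table: 5 Full, 8 Partial, 1 Unable to verify.
skyvern = ["Full"] * 5 + ["Partial"] * 8 + ["Unable to verify"]
print(coverage_score(skyvern))  # (9.0, 64)
```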
Pricing
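Free (5,000 credits/mo, no card required); Hobby $29/mo (30,000 credits, concurrency 10); Pro $149/mo (150,000 credits, concurrency 25, team features, residential proxy); Enterprise custom (unlimited, HIPAA, SOC 2 Type II, human in the loop); self hosting the open source release is free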
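To compare plans against an expected monthly workload, the listed allowances can be turned into a small lookup. The sketch assumes the credit allowances are hard monthly caps with no overage billing, and it ignores concurrency limits. Both are assumptions rather than documented terms, and how many credits a given run consumes is not stated here.

```python
# Monthly plans as listed above: (name, USD per month, included credits).
# Enterprise is custom priced, so it is the fallback rather than a row here.
PLANS = [("Free", 0, 5_000), ("Hobby", 29, 30_000), ("Pro", 149, 150_000)]


def cheapest_plan(credits_needed: int) -> str:
    """Pick the lowest priced listed plan whose allowance covers the need.

    Assumes allowances are hard monthly caps with no overage billing and
    ignores concurrency limits; both are assumptions, not documented terms.
    """
    for name, _price, credits in sorted(PLANS, key=lambda plan: plan[1]):
        if credits >= credits_needed:
            return name
    return "Enterprise (custom)"
```

Per 1,000 included credits, Hobby works out to about $0.97 and Pro to about $0.99. The step from Hobby to Pro therefore buys headroom and concurrency rather than a lower unit price.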
Related vendors
- Autotab — General AI agent that uses a mouse and keyboard like a person to do…
- Axiom.ai — No code browser automation tool and Chrome extension for building…
- Browser Use — Open-source library and cloud platform enabling AI agents to browse…
- Browserbase — Browser infrastructure platform for AI agents, providing hosted…
- CopyCat AI — AI powered robotic process automation that builds no code browser…
- Emergence AI — Enterprise agentic infrastructure whose autonomous multi agent…