Patronus AI
Also known as: Patronus, Patronus API
AI evaluation, guardrails, and agent debugging platform with research-grade models for hallucination detection, embedded in applications through an API to catch failures before users see them.
Patronus AI is an automated platform for evaluating, monitoring, and guardrailing LLM applications and AI agents, built so teams can catch failures such as hallucinations, prompt injection, and unsafe output before they reach users. Founded in 2023 in San Francisco by former Meta FAIR researchers Anand Kannappan and Rebecca Qian, it began as a self-serve evaluation and security API and has since expanded into agent simulation. Its central idea is that good evaluation is not just a safety net but a way to systematically improve AI products, and it backs that with in-house evaluator models trained by its own research team.
At the core are research-grade evaluators, most notably Lynx, a hallucination detection model that the company reports outperforms GPT-4o at catching inaccuracies in retrieval-augmented generation (RAG), and GLIDER, a small general-purpose judge model. These come in small variants for low-latency, real-time guardrails and larger variants for deeper offline analysis. Developers embed the Patronus API directly in their code through a language-agnostic interface and a Python SDK, configure custom LLM judges by describing criteria in plain English, and get back not just a verdict but the exact span of text where a problem occurs. Specialized tools include CopyrightCatcher, which detects reproduced copyrighted content, and the FinanceBench benchmark, and the platform adheres to OWASP and NIST standards.
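The embed-and-guard flow described above can be illustrated in plain Python. This is a minimal sketch of the general pattern, not the Patronus SDK or API: `EvalResult`, `toy_evaluator`, `guarded_reply`, and the fallback message are hypothetical names, and the word-overlap check merely stands in for a model-based evaluator such as Lynx. The guard runs an evaluator on the candidate answer and its source context, releases the answer only if it passes, and otherwise returns a fallback together with the flagged spans.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


# Hypothetical result shape: a pass/fail verdict plus the (start, end)
# character spans in the answer where the evaluator located a problem.
@dataclass
class EvalResult:
    passed: bool
    spans: List[Tuple[int, int]] = field(default_factory=list)


# An evaluator is modeled as a function of (context, answer) -> EvalResult.
Evaluator = Callable[[str, str], EvalResult]

FALLBACK = "I can't confirm that from the available documents."


def _words(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def toy_evaluator(context: str, answer: str) -> EvalResult:
    """Hypothetical stand-in for a hallucination evaluator such as Lynx.

    Flags each sentence of the answer that uses words absent from the
    context. A real evaluator uses a model, not word overlap.
    """
    ctx_words = _words(context)
    spans = []
    for m in re.finditer(r"[^.!?]+", answer):
        words = _words(m.group())
        if words and not words <= ctx_words:
            spans.append((m.start(), m.end()))
    return EvalResult(passed=not spans, spans=spans)


def guarded_reply(context: str, answer: str, evaluator: Evaluator) -> Tuple[str, List[str]]:
    """Release the answer if it passes; otherwise return a fallback and the flagged text."""
    result = evaluator(context, answer)
    if result.passed:
        return answer, []
    return FALLBACK, [answer[s:e].strip() for s, e in result.spans]


ctx = "The refund window is 30 days. Refunds go to the original card."
print(guarded_reply(ctx, "The refund window is 30 days. It is 90 days for VIPs.", toy_evaluator))
```

With the sample context, the second sentence of the answer ("It is 90 days for VIPs") is flagged and the fallback is returned. Keeping the evaluator behind a plain function signature means a real API client could be swapped in without changing the guard logic.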
Beyond guardrails, Patronus provides a full platform around evaluation. Patronus Experiments runs side-by-side A/B tests of prompts, models, and RAG configurations, while production logs and traces feed a dashboard for tracking results and comparing performance over time. Percival is an agent debugger that automatically detects more than twenty failure modes in agentic execution traces and suggests prompt and workflow fixes. In late 2025 the company moved into simulation with what it calls Digital World Models, reinforcement learning environments, and generative simulators that stress-test agents against adversarial and chaotic scenarios, a direction funded by a $50M round in 2026.
Patronus is offered as a usage-based, pay-as-you-go service. New users sign up, create an API key, and start with $5 in free credits, then pay per API call; published rates are about $10 per 1,000 calls for small evaluators and $20 per 1,000 for large ones. Because cost scales linearly with API traffic, high-volume production guardrailing needs careful cost modeling. Enterprise plans add higher rate limits, custom evaluation models, webhooks, and professional services. Customers include AngelList, Pearson, HP, and Fortune 500 firms in regulated industries.
Vendor details
Canonical URL
https://www.patronus.ai
Category
Agent infrastructure
Subcategory
Evaluation and guardrails
Funding status
Founded in 2023 in San Francisco by former Meta AI (FAIR) researchers Anand Kannappan (CEO) and Rebecca Qian (CTO). Raised a $17M Series A, followed by a $50M round in 2026 to expand its agent simulation platform. Customers include AngelList, Pearson, HP, and Fortune 500 companies in finance, healthcare, and law, with partners including NVIDIA, MongoDB, and IBM. Independent.
Company status
independent
Use cases & customers
Integrations
Embedded into application code through a language-agnostic API and Python SDK, with custom LLM judges, webhooks, and a web dashboard for logs and experiments. Evaluates RAG and agent pipelines, and ships open research models including the Lynx hallucination detector and the GLIDER judge.
In practice
Your RAG chatbot sometimes states facts that are not in the source documents. You wire Patronus Lynx in as a real-time guardrail, and it flags the hallucinated span before the answer reaches the user.
Your agent fails intermittently across multi-step traces and you cannot tell why. Percival inspects the execution traces, identifies which of more than twenty failure modes occurred, and suggests prompt and workflow fixes.
You are choosing between two prompts and three models for a regulated use case. Patronus Experiments runs them side by side against your criteria so you can pick the configuration that scores best before shipping.
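The comparison in the last scenario amounts to scoring every candidate configuration on the same test cases and comparing aggregates. A minimal sketch under stated assumptions, not the Patronus Experiments API: `run_experiment`, `pick_best`, the model names, and `toy_score` (whose pass thresholds are invented) are hypothetical, and a single pass/fail criterion stands in for your evaluation criteria.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Config = Tuple[str, str]  # (prompt name, model name)
Scorer = Callable[[str, str, int], float]


def run_experiment(prompts: Sequence[str], models: Sequence[str],
                   cases: Sequence[int], score: Scorer) -> Dict[Config, float]:
    """Score every (prompt, model) pair on the same test cases; return the mean per pair."""
    return {
        (p, m): mean(score(p, m, case) for case in cases)
        for p, m in product(prompts, models)
    }


def pick_best(results: Dict[Config, float]) -> Config:
    """Configuration with the highest mean score."""
    return max(results, key=results.get)


def toy_score(prompt: str, model: str, case: int) -> float:
    """Hypothetical scorer: 1.0 if the case passes, else 0.0.

    Stands in for an evaluator verdict; the pass thresholds are made up.
    """
    strength = {"model-a": 6, "model-b": 8, "model-c": 5}[model]
    if prompt == "strict":
        strength += 1
    return 1.0 if case < strength else 0.0


# Two prompts x three models, each scored on the same ten cases.
results = run_experiment(["baseline", "strict"], ["model-a", "model-b", "model-c"],
                         range(10), toy_score)
print(pick_best(results))  # ('strict', 'model-b')
```

Because every configuration sees the same cases, differences in the mean reflect the prompt and model rather than the test set.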
Capability coverage
6.0 / 14 capabilities · 43%
| Capability | Assessment | Coverage |
|---|---|---|
| Integrations & Tool Calling | Embeds into application code through a language-agnostic API, Python SDK, and webhooks, and evaluates RAG and agent pipelines, but it is an evaluation and guardrails layer rather than a tool-calling hub. | Partial |
| Workflow Orchestration | Evaluates and guardrails agent output but does not orchestrate production agent execution, sequencing, or branching. | Unable to verify |
| Knowledge Grounding & RAG | Evaluates RAG systems for hallucination and groundedness but does not itself provide retrieval or knowledge grounding. | Unable to verify |
| Human Oversight & Guardrails | Core product. Real-time guardrails screen responses for hallucinations, prompt injection, safety, and copyright violations using research-grade evaluators and custom LLM judges, flagging the exact problem span before output reaches users. | Full |
| Security, Identity & Governance | Positioned as an AI security platform with prompt injection and safety detection and adherence to OWASP and NIST standards, though it focuses on AI output security rather than identity, access governance, or infrastructure compliance certifications. | Partial |
| Observability & Auditability | Production logs and traces with a dashboard to track, filter, and compare evaluation performance over time, plus agentic trace analysis through Percival, though it is evaluation-centric observability rather than full distributed tracing. | Partial |
| Memory & State Persistence | Logs evaluation results but does not provide an agent memory or state persistence layer. | Unable to verify |
| Deployment & Data Residency | The platform API is cloud SaaS, but flagship evaluator models like Lynx are open source and can be self-hosted, giving a partial on-premises path for evaluation. | Partial |
| Prebuilt Agents, Templates & Packs | A library of prebuilt evaluator models (Lynx, GLIDER), specialized detectors like CopyrightCatcher, benchmarks like FinanceBench, and reinforcement learning environments and generative simulators: reusable assets beyond simple metric configuration, though not prebuilt production agents. | Partial |
| Triggers & Channel Coverage | Operates inline on responses in real time and supports webhooks for events, though it has no conversational channel coverage or agent invocation runtime. | Partial |
| Model Flexibility & Routing | Model-agnostic, evaluating and benchmarking output from any LLM and letting teams compare models side by side, but it is not a model routing gateway. | Partial |
| APIs, SDKs & MCP Extensibility | A language-agnostic API, Python SDK, webhooks, custom evaluator uploads, and open-source evaluator models give a solid extensibility surface, though no MCP server is documented. | Partial |
| Testing, Debugging & Optimization | Core product. Research-grade evaluators (Lynx, GLIDER) and custom LLM judges, Patronus Experiments for A/B testing prompts, models, and RAG configurations, the Percival agent debugger that detects 20-plus failure modes and suggests fixes, and adversarial simulation to stress-test agents. | Full |
| Browser & Computer Use | Not applicable. Patronus is an evaluation and guardrails platform and does not provide browser automation or computer use. | Unable to verify |
Pricing
Pay-as-you-go · $5 free credits · ~$10–$20 per 1,000 API calls
Pay-as-you-go per API call (evaluation request), priced by evaluator size; cost scales linearly with API traffic
Included quota
No subscription. $5 in free credits on signup. Published pay-as-you-go rates: about $10 per 1,000 API calls for small evaluators and $20 per 1,000 for large evaluators. Enterprise plans add higher rate limits, custom evaluation models, webhooks, and professional services.
What is public
The pay-as-you-go model, the $5 in free credits, and the launch-era per-call rates are public; current exact rates and enterprise pricing are not fully itemized.
Billing mechanics
Consumption-based billing per evaluation API call, priced by evaluator size (small versus large), with no monthly subscription floor. Enterprise contracts add higher limits, custom models, and services.
Cost watchouts
Every screened response is a billable API call, so always-on production guardrails make cost grow in step with traffic; larger evaluators cost more per call and add response latency.
Variable cost rationale
Pricing is pure consumption, so cost scales linearly with the number of evaluation API calls, at a per-call rate set by evaluator size. High-volume production guardrailing, where every live response is screened, and use of the larger Lynx evaluator can drive substantial bills that are hard to predict without careful cost modeling.
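The linear scaling can be checked with simple arithmetic. A minimal sketch, assuming the launch-era published rates of $10 per 1,000 calls for small evaluators and $20 per 1,000 for large ones (current rates may differ), one billable call per evaluator per screened response, and a hypothetical 50,000 screened responses per day; `monthly_cost` and `calls_covered_by_free_credits` are illustrative helpers, not part of any Patronus tool.

```python
# Launch-era published rates in USD per 1,000 evaluation calls (assumption:
# current rates may differ; check the live rate card).
RATE_PER_1K = {"small": 10.0, "large": 20.0}
FREE_CREDITS_USD = 5.0


def monthly_cost(responses_per_day: int, evaluator_size: str,
                 calls_per_response: int = 1, days: int = 30) -> float:
    """Cost of screening every response: linear in call volume, rate set by evaluator size."""
    calls = responses_per_day * calls_per_response * days
    return calls / 1000 * RATE_PER_1K[evaluator_size]


def calls_covered_by_free_credits(evaluator_size: str) -> int:
    """How many evaluation calls the $5 signup credit pays for."""
    return int(FREE_CREDITS_USD / RATE_PER_1K[evaluator_size] * 1000)


# Hypothetical traffic: 50,000 screened responses per day.
for size in ("small", "large"):
    print(size, monthly_cost(50_000, size), calls_covered_by_free_credits(size))
```

At that hypothetical volume the sketch gives $15,000 a month on small evaluators and $30,000 on large ones, while the $5 signup credit covers 500 small or 250 large calls. Running two evaluators on each response (`calls_per_response=2`) doubles the bill.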
Additional watchouts
Real-time guardrailing screens every response, so consumption and cost scale directly with production traffic, and the larger Lynx evaluator costs more and adds latency. Model costs carefully for high-volume deployments.
Overage / add-ons
Pure consumption pricing: you pay per evaluation API call with no monthly commitment, and cost scales linearly with call volume at a per-call rate that depends on evaluator size. The larger Lynx 70B evaluator costs more per call than the smaller evaluators.
Sales call required
No — self-serve available
Free / trial
Self-serve signup with $5 in free credits and no subscription; pay per API call after that. The Lynx hallucination model is open source.
Lowest paid plan
Pay-as-you-go: approximately $10 per 1,000 calls (small evaluators) and $20 per 1,000 (large evaluators)
Commercial notes
Self-serve and developer-first, with free credits and pay-as-you-go billing that remove the need to manage open-source evaluator models and infrastructure, scaling to enterprise contracts for regulated industries. Differentiates on research-grade in-house evaluators such as Lynx and GLIDER.
Key ambiguities
Current exact per-call rates (versus the 2024 launch figures) and enterprise pricing are not published on a clean rate card.
Cancellation / refund
Pay-as-you-go carries no commitment or cancellation obligation. Enterprise terms are contractual.
Support SLA / resale
Self-serve and community support for pay-as-you-go users; enterprise contracts add higher rate limits, professional services, and enterprise support.
Missing data
Current exact per-call rates, enterprise pricing, and volume discount tiers are not fully public.
Related vendors
- AgentOps — Agent observability and reliability platform with broad model and…
- Agno — High-performance agent runtime and framework (formerly Phidata) with…
- Apify — Cloud platform for web scraping and automation with 45,000+ prebuilt…
- Arcade — Authenticated tool calling platform and MCP runtime that handles…
- Arize AI — AI observability and evaluation platform that traces, evaluates, and…
- Braintrust — AI evaluation and observability platform with self-serve pricing,…