
Patronus AI

Q: Who is Patronus AI for?

AI engineering teams, enterprise

Also known as: Patronus, Patronus API


Agent infrastructure · independent · Verified 2026-06-30

An AI evaluation, guardrails, and agent debugging platform with research-grade hallucination detection models, embedded through an API to catch failures before users see them.

Patronus AI is an automated platform for evaluating, monitoring, and guardrailing LLM applications and AI agents, built so teams can catch failures such as hallucinations, prompt injection, and unsafe output before they reach users. Founded in 2023 in San Francisco by former Meta FAIR researchers Anand Kannappan and Rebecca Qian, it began as a self-serve evaluation and security API and has since expanded into agent simulation. Its central idea is that good evaluation is not just a safety net but a way to systematically improve AI products, and it backs that idea with evaluator models trained by its own research team.

At the core are research-grade evaluators, most notably Lynx, a hallucination detection model that the company reports outperforms GPT-4o at catching inaccuracies in retrieval-augmented generation (RAG), and GLIDER, a small, general-purpose judge model. The evaluators come in small variants for low-latency, real-time guardrails and larger variants for deeper offline analysis. Developers embed the Patronus API directly in their code through a language-agnostic interface or the Python SDK, define custom LLM judges by describing criteria in plain English, and get back not just a verdict but the exact span of text where a problem occurs. Specialized tools include CopyrightCatcher, which detects reproduced copyrighted content, and the FinanceBench benchmark, and the platform adheres to OWASP and NIST standards.
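The guardrail pattern described above (send the question, context, and answer to an evaluator, get back a verdict plus the offending span, then block or pass the response) can be sketched as follows. This is a minimal sketch, not the Patronus SDK: `EvalResult`, `guard_response`, `FALLBACK`, and the toy `naive_grounding_check` evaluator are hypothetical names invented for illustration. A real deployment would replace the toy word-overlap check with a call to Lynx or another Patronus evaluator through the official API or SDK.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class EvalResult:
    """Hypothetical verdict shape: pass/fail plus the offending character span."""
    passed: bool
    span: Optional[Tuple[int, int]] = None  # (start, end) offsets into the answer
    explanation: str = ""


# Hypothetical evaluator signature: (question, context, answer) -> EvalResult.
# In production this would wrap a call to a real evaluator such as Lynx.
Evaluator = Callable[[str, str, str], EvalResult]

FALLBACK = "I could not verify that answer against the source documents."


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def naive_grounding_check(question: str, context: str, answer: str) -> EvalResult:
    """Toy stand-in for a hallucination evaluator: flags the first sentence of
    the answer that contains a word not present in the context."""
    context_words = _words(context)
    start = 0  # offset of the current sentence within the answer
    for sentence in answer.split("."):
        stripped = sentence.strip()
        if stripped and not _words(stripped) <= context_words:
            begin = answer.find(stripped, start)
            return EvalResult(False, (begin, begin + len(stripped)),
                              "sentence not supported by the context")
        start += len(sentence) + 1  # +1 skips the "." separator
    return EvalResult(True)


def guard_response(question: str, context: str, answer: str,
                   evaluator: Evaluator = naive_grounding_check):
    """Return (text_to_show_user, failing_result_or_None)."""
    result = evaluator(question, context, answer)
    if result.passed:
        return answer, None
    return FALLBACK, result
```

Keeping the evaluator behind a plain callable means the gating logic stays the same when the toy check is swapped for a real evaluator call, and the returned span lets the application log or highlight exactly which claim failed.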

Beyond guardrails, Patronus provides a full platform around evaluation. Patronus Experiments runs side-by-side A/B tests of prompts, models, and RAG configurations, while production logs and traces feed a dashboard for tracking results and comparing performance over time. Percival is an agent debugger that automatically detects more than twenty failure modes in agentic execution traces and suggests prompt and workflow fixes. In late 2025 the company moved into simulation with what it calls Digital World Models, along with reinforcement learning environments and generative simulators that stress-test agents against adversarial and chaotic scenarios. It raised a $50M round in 2026 to expand this direction.

Patronus is offered as a usage-based, pay-as-you-go service. New users sign up, create an API key, and start with $5 in free credits, then pay per API call; the rates published at launch were about $10 per 1,000 calls for smaller evaluators and $20 per 1,000 for larger ones. Because cost scales linearly with API traffic, high-volume production guardrailing needs careful cost modeling. Enterprise plans add higher rate limits, custom evaluation models, webhooks, and professional services. Customers include AngelList, Pearson, HP, and Fortune 500 firms in regulated industries.

Vendor details

Canonical URL

https://www.patronus.ai

Subcategory

Evaluation and guardrails

Funding status

Founded in 2023 in San Francisco by former Meta AI (FAIR) researchers Anand Kannappan (CEO) and Rebecca Qian (CTO). Raised a $17M Series A, followed by a $50M round in 2026 to expand its agent simulation platform. Customers include AngelList, Pearson, HP, and Fortune 500 companies in finance, healthcare, and legal, with partners including NVIDIA, MongoDB, and IBM. Independent.

Company status

independent

Use cases & customers

Primary use cases

hallucination detection · AI guardrails · agent evaluation · agent debugging · agent simulation

Target customers

AI engineering teams · enterprise

Deployment options

SaaS

Integrations

Embedded into application code through a language-agnostic API and Python SDK, with custom LLM judges, webhooks, and a web dashboard for logs and experiments. Evaluates RAG and agent pipelines, and ships open research models, including the Lynx hallucination detector and the GLIDER judge.

In practice

Your RAG chatbot sometimes states facts that are not in the source documents. You wire Patronus Lynx in as a real-time guardrail, and it flags the hallucinated span before the answer reaches the user.

Your agent fails intermittently in multi-step traces and you cannot tell why. Percival inspects the execution traces, identifies which of its more than twenty failure modes occurred, and suggests prompt and workflow fixes.

You are choosing between two prompts and three models for a regulated use case. Patronus Experiments runs them side by side against your criteria so you can pick the configuration that scores best before shipping.

Sources & related URLs

Related / legacy domains

https://app.patronus.ai
https://www.patronus.ai/announcements/patronus-ai-launches-industry-first-self-serve-api-for-ai-evaluation-and-guardrails

Research sources

https://www.patronus.ai
https://www.patronus.ai/announcements/patronus-ai-launches-industry-first-self-serve-api-for-ai-evaluation-and-guardrails
https://venturebeat.com/ai/patronus-ai-launches-worlds-first-self-serve-api-to-stop-ai-hallucinations

Capability coverage

6.0 / 14 capabilities · 43%

Integrations & Tool Calling	Embeds into application code through a language-agnostic API, Python SDK, and webhooks, and evaluates RAG and agent pipelines, but it is an evaluation and guardrails layer rather than a tool-calling hub.	Partial
Workflow Orchestration	Evaluates and guardrails agent output but does not orchestrate production agent execution, sequencing, or branching.	Unable to verify
Knowledge Grounding & RAG	Evaluates RAG systems for hallucination and groundedness but does not itself provide retrieval or knowledge grounding.	Unable to verify
Human Oversight & Guardrails	Core product. Real time guardrails screen responses for hallucinations, prompt injection, safety, and copyright violations using research grade evaluators and custom LLM judges, flagging the exact problem span before output reaches users.	Full
Security, Identity & Governance	Positioned as an AI security platform with prompt injection and safety detection and adherence to OWASP and NIST standards, though it focuses on AI output security rather than identity, access governance, or infrastructure compliance certifications.	Partial
Observability & Auditability	Production logs and traces with a dashboard to track, filter, and compare evaluation performance over time, plus agentic trace analysis through Percival, though it is evaluation centric observability rather than full distributed tracing.	Partial
Memory & State Persistence	Logs evaluation results but does not provide an agent memory or state persistence layer.	Unable to verify
Deployment & Data Residency	The platform API is cloud SaaS, but flagship evaluator models like Lynx are open source and can be self hosted, giving a partial on premises path for evaluation.	Partial
Prebuilt Agents, Templates & Packs	A library of prebuilt evaluator models (Lynx, GLIDER), specialized detectors such as CopyrightCatcher, benchmarks such as FinanceBench, and reinforcement learning environments and generative simulators. These are reusable assets that go beyond simple metric configuration, though they are not prebuilt production agents.	Partial
Triggers & Channel Coverage	Operates inline on responses in real time and supports webhooks for events, though it has no conversational channel coverage or agent invocation runtime.	Partial
Model Flexibility & Routing	Model agnostic, evaluating and benchmarking output from any LLM and letting teams compare models side by side, but it is not a model routing gateway.	Partial
APIs, SDKs & MCP Extensibility	A language agnostic API, Python SDK, webhooks, custom evaluator uploads, and open source evaluator models give a solid extensibility surface, though no MCP server is documented.	Partial
Testing, Debugging & Optimization	Core product. Research-grade evaluators (Lynx, GLIDER) and custom LLM judges; Patronus Experiments for A/B testing prompts, models, and RAG configurations; the Percival agent debugger, which detects more than 20 failure modes and suggests fixes; and adversarial simulation to stress-test agents.	Full
Browser & Computer Use	Not applicable. Patronus is an evaluation and guardrails platform and does not provide browser automation or computer use.	Unable to verify

Recent platform changes

No recent material changes tracked yet.


Pricing

Pay as you go · $5 free credits · ~$10–$20 per 1,000 API calls

Pay as you go per API call (evaluation request), priced by evaluator size; cost scales linearly with API traffic

Public (partial) · High variable cost · Trial available

Included quota

No subscription. $5 in free credits on signup. Pay-as-you-go rates published at launch: about $10 per 1,000 API calls for small evaluators and $20 per 1,000 for large evaluators. Enterprise adds higher rate limits, custom evaluation models, webhooks, and professional services.

What is public

The pay-as-you-go model, $5 free credits, and launch-era per-call rates are public; current exact rates and enterprise pricing are not fully itemized.

Billing mechanics

Consumption based per evaluation API call, priced by evaluator size (small versus large), with no monthly subscription floor. Enterprise contracts add higher limits, custom models, and services.

Cost watchouts

Every screened response is a billable API call, so always-on production guardrails scale cost directly with traffic; larger evaluators cost more per call and add response latency.

Variable cost rationale

Pricing is pure consumption, so cost scales linearly with the number of evaluation API calls and the evaluator size. High volume production guardrailing, where every live response is screened, and use of the larger Lynx evaluator can drive substantial and unpredictable bills without careful cost modeling.
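Because billing is linear, a rough budget can be computed before rollout. The sketch below uses the launch-era rates quoted on this page (about $10 per 1,000 calls for small evaluators and $20 per 1,000 for large ones), which may no longer be current. The function name `monthly_guardrail_cost`, the 30-day month, and the `calls_per_response` parameter (for pipelines that run the same evaluator more than once per response) are illustrative assumptions, not part of any Patronus tool.

```python
# Launch-era rates quoted on this page, in USD per 1,000 evaluation calls.
# Assumption: current rates may differ; confirm against official pricing.
RATE_PER_1K_CALLS = {"small": 10.0, "large": 20.0}


def monthly_guardrail_cost(responses_per_day: int, evaluator: str = "small",
                           calls_per_response: int = 1, days: int = 30) -> float:
    """Estimate monthly spend when every live response is screened.

    Cost is linear: total calls times the per-call rate for the evaluator size.
    """
    calls = responses_per_day * calls_per_response * days
    return calls / 1000 * RATE_PER_1K_CALLS[evaluator]


if __name__ == "__main__":
    for size in ("small", "large"):
        print(size, monthly_guardrail_cost(10_000, size))
```

At 10,000 screened responses a day, this gives about $3,000 a month with a small evaluator and $6,000 with a large one, before any volume discounts (which are not published) and ignoring the one-time $5 credit.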

Additional watchouts

Real time guardrailing screens every response, so consumption and cost scale directly with production traffic, and the larger Lynx evaluator costs more and adds latency. Model cost carefully for high volume deployments.

Overage / add-ons

Pure consumption pricing; you pay per evaluation API call with no monthly commitment, and cost scales linearly with call volume and evaluator size. The larger Lynx 70B evaluator costs more per call than the smaller evaluators.

Sales call required

No (self-serve available)

Free / trial

Self-serve signup with $5 in free credits and no subscription; pay per API call after that. The Lynx hallucination model is open source.

Lowest paid plan

Pay as you go: approximately $10 per 1,000 calls (small evaluators), $20 per 1,000 (large evaluators)

Commercial notes

Self-serve and developer-first, with free credits and pay-as-you-go billing that spare teams from hosting open source evaluator models and infrastructure themselves, scaling up to enterprise contracts for regulated industries. Differentiates on in-house, research-grade evaluators such as Lynx and GLIDER.

Key ambiguities

Current exact per call rates (versus the 2024 launch figures) and enterprise pricing are not published on a clean rate card.

Cancellation / refund

Pay-as-you-go usage carries no commitment, so there is nothing to cancel. Enterprise terms are contractual.

Support SLA / resale

Self serve and community for pay as you go; higher rate limits, professional services, and enterprise support on enterprise contracts.

Missing data

Current exact per call rates, enterprise pricing, and volume discount tiers are not fully public.

Verified 2026-06-30


Data confidence: high