PROLOGUE: THE PROBLEM THAT MADE OPENROUTER INEVITABLE

There is a peculiar irony at the heart of the 2026 AI landscape. The world has never had more powerful language models available to developers. You can reach for Anthropic's Claude Sonnet 4.6, Google's Gemini 3.1 Pro Preview, OpenAI's GPT-5.5, Meta's Llama 4 Scout with its ten-million-token context window, DeepSeek's V4 Pro at a fraction of the cost of any Western equivalent, or any of more than five hundred other models spanning text, image, audio, and video generation. The embarrassment of riches is real and growing.

And yet this very abundance creates a new kind of poverty: the poverty of integration. Every provider has its own API. Every provider has its own authentication scheme, its own rate-limiting philosophy, its own billing portal, its own quirky error codes, and its own way of structuring a chat completion request. If you want to experiment with three different models to find the best one for your use case, you need three different accounts, three different API keys, and three different integration layers. If one provider goes down, your application goes down with it. If a new, better, cheaper model appears on the scene — and in 2026 that happens roughly every two weeks — you need to refactor your integration from scratch.

This is the problem that OpenRouter was built to solve. It is, in the most literal sense, a router for the AI world: a single, unified gateway that sits in front of the entire ecosystem of large language models and presents them all through one coherent, standardized interface. Think of it as the USB-C port of the AI world. No matter what device you plug in, the connector is the same.

By May 2026, OpenRouter has grown from a clever idea into critical infrastructure. It processes over twenty trillion tokens per week — a fourfold increase from the five trillion it handled in April 2025. It has raised $168 million in total funding. It is reportedly in discussions for a $120 million round at a $1.3 billion valuation, with Google's CapitalG as lead investor. And it has become the de facto standard for developers who need to work with multiple AI models without losing their minds.

What follows is a thorough, honest, and occasionally irreverent exploration of OpenRouter as it stands in May 2026: its origins, its architecture, its features, its limitations, its pricing, and the practical craft of using it in real code. By the end, you will have a complete picture of one of the most interesting infrastructure plays in the AI industry today.
CHAPTER ONE: THE ORIGIN STORY — FROM OPENSEA TO THE AI SWITCHBOARD

To understand OpenRouter, you need to understand the moment in history that made it inevitable. The year was 2022, and the AI world was still largely organized around a single gravitational center: OpenAI. GPT-3 had demonstrated that large language models could do genuinely useful things, and GPT-4 was on the horizon. Most developers who wanted to build AI-powered applications simply signed up for an OpenAI account and called it a day.

But the landscape was shifting. Anthropic, founded by former OpenAI researchers, was building Claude. Google was working on what would become the Gemini family. Meta was preparing to open-source the Llama series, which would unleash a Cambrian explosion of fine-tuned variants, specialized models, and community experiments. Mistral AI was being founded in Paris by researchers who believed that smaller, more efficient models could punch far above their weight class. And in China, a research lab called DeepSeek was quietly building models that would eventually shake the entire industry's assumptions about cost and capability.

Alex Atallah was watching all of this unfold with a developer's eye. Atallah was the co-founder and CTO of OpenSea, the NFT marketplace that had become one of the defining companies of the Web3 era. He stepped down from OpenSea in July 2022, looking for his next act. What he found was a problem hiding inside an opportunity.

Before building OpenRouter, Atallah launched a project called Window AI, an open-source Chrome extension that allowed users to plug their preferred language model into any web application that supported it. Window AI was a fascinating experiment in user-controlled AI, but it was also a proof of concept for a deeper insight: the future of AI applications was not going to be monolithic. Developers and users were going to want to choose their models, switch between them, and compare them. The infrastructure to support that choice simply did not exist yet.

In early 2023, Atallah, together with his co-founder Louis Vichy, launched OpenRouter. The initial vision was relatively modest: a place to collect and help people understand different models. The OpenAI API had already established a de facto standard for how chat completion requests should be structured, so OpenRouter made the smart decision to be fully compatible with that standard. Any code that already talked to OpenAI could talk to OpenRouter with a change of two lines.

The timing was perfect. Within months of launch, the AI model landscape exploded in exactly the way Atallah had anticipated. New models appeared almost weekly. Providers multiplied. The need for a unified gateway became not just convenient but genuinely urgent for serious developers.

The growth numbers tell the story clearly. In May 2025, OpenRouter was processing roughly five trillion tokens per week and generating approximately five million dollars in annualized revenue. By October 2025, that had doubled to ten million dollars. By early 2026, Sacra estimated annualized revenue at fifty million dollars. By April 2026, weekly token throughput had crossed twenty trillion — a fourfold year-over-year increase. The platform that started as a model aggregator had become a piece of critical infrastructure for the AI industry.

Funding followed the growth curve.
A combined seed and Series A round of forty million dollars was announced in June 2025, at a five-hundred-million-dollar valuation, with participation from Andreessen Horowitz, Menlo Ventures, Sequoia Capital, Figma, and Fred Ehrsam. By March 2026, total funding raised had reached $168 million. As of May 2026, OpenRouter is reportedly in discussions for a $120 million round at a $1.3 billion valuation, with Google's CapitalG as a lead investor — a remarkable trajectory for a company that is, at its core, a very smart proxy server.
CHAPTER TWO: WHAT OPENROUTER ACTUALLY IS

At its technical core, OpenRouter is an API proxy and routing layer. When your application sends a request to OpenRouter, OpenRouter translates that request into the format expected by the underlying provider, forwards it, receives the response, translates it back into a standardized format, and returns it to your application. Your code never needs to know whether it is talking to Anthropic's servers or Google's infrastructure. It just sends a request and receives a response.

The API itself is designed to be a drop-in replacement for the OpenAI API. The base URL changes from "https://api.openai.com/v1" to "https://openrouter.ai/api/v1", and the API key changes from your OpenAI key to your OpenRouter key. Everything else — the structure of the request body, the shape of the response, the way streaming works, the format of tool calls — can remain identical. For developers who have already built applications on top of the OpenAI SDK, the migration path to OpenRouter is measured in minutes, not days.

OpenRouter maintains a model registry that is updated continuously. Each model is identified by a string in the format "provider/model-name", such as "anthropic/claude-sonnet-4-6", "openai/gpt-5.5", "google/gemini-3-1-pro-preview", or "deepseek/deepseek-v4-pro". This naming convention is clean, predictable, and self-documenting.

The platform also provides a web interface at openrouter.ai where you can browse models, compare their capabilities and pricing, inspect their context window sizes, check which features they support, and read about their data retention policies. This transparency is one of OpenRouter's genuine strengths.

As of May 2026, OpenRouter provides access to over five hundred models from more than sixty providers. The platform supports over ten modalities including text, vision, audio, and — as of April 2026 — video generation. It processes more than one trillion tokens daily and has demonstrated sustained ten to one hundred percent month-over-month growth over two years.
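To make the two-line migration concrete, here is a minimal sketch using the official OpenAI Python SDK pointed at OpenRouter. Only the base_url and api_key differ from a direct OpenAI integration; the model slug follows the "provider/model-name" registry format described above.

```python
# Minimal sketch: an OpenAI-SDK application redirected to OpenRouter.
# Only the base_url and api_key lines change from a direct OpenAI setup.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # was: https://api.openai.com/v1
    api_key=os.getenv("OPENROUTER_API_KEY"),   # was: your OpenAI key
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",  # "provider/model-name" slug from the registry
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```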
CHAPTER THREE: THE FEATURE LANDSCAPE — EVERYTHING THE PLATFORM CAN DO

OpenRouter's feature set has grown considerably since its early days as a simple model aggregator. Understanding these features in depth is essential for using the platform effectively.

UNIFIED API AND OPENAI COMPATIBILITY

The foundational feature is the unified API. Every model on OpenRouter is accessible through the same endpoint, the same authentication mechanism, and the same request structure. The chat completions endpoint at "https://openrouter.ai/api/v1/chat/completions" accepts a JSON body that is structurally identical to what the OpenAI API expects. This means that the entire ecosystem of tools, libraries, and frameworks built around the OpenAI API works with OpenRouter out of the box. LangChain, LlamaIndex, AutoGen, CrewAI, and countless other orchestration frameworks can be pointed at OpenRouter with minimal configuration changes.

INTELLIGENT ROUTING AND PROVIDER SELECTION

When you request a specific model, OpenRouter does not necessarily send your request to a single, fixed endpoint. Many popular models are served by multiple providers simultaneously. By default, OpenRouter uses intelligent routing to select the best provider for each request, optimizing for a combination of cost, speed, and availability. This routing behavior is highly configurable. You can pass a "provider" object in your request body to express preferences. The "order" field lets you specify a ranked list of preferred providers. The "only" field restricts routing to a specific set of providers. The "avoid" field excludes providers you do not want to use. The "require_parameters" field, when set to true, ensures that OpenRouter only routes to providers that fully support all the parameters you are passing, preventing silent parameter drops. A combined request sketch appears at the end of this chapter.

AUTO EXACTO: ADAPTIVE QUALITY ROUTING (MARCH 2026)

One of the most significant routing advances of 2026 is Auto Exacto, launched in March 2026 and now on by default for all tool-calling requests. OpenRouter observed that cheaper hosts for open-source models sometimes have higher latency or silently drop tool-calling schemas — a subtle but devastating problem for agentic applications. Auto Exacto addresses this by dynamically reordering providers based on real-world performance signals collected from billions of requests. These signals include real-time tokens-per-second throughput, tool-calling success rates, and internal benchmark data. The system re-evaluates providers approximately every five minutes, ensuring routing decisions reflect current conditions rather than stale assumptions. Auto Exacto is an evolution of the earlier "Exacto" endpoints, which were hand-curated and showed a ten to twenty percent improvement in benchmark scores but required manual updates and explicit selection. Auto Exacto automates this process entirely. Users who prefer price-weighted routing can opt out by using the "provider.sort" parameter, appending ":floor" to the model slug, or setting a default sort preference in their account settings.

AUTOMATIC FALLBACKS AND RESILIENCE

OpenRouter supports automatic failover between models using a fallback mechanism. You can provide an array of model IDs in priority order. If the primary model fails due to downtime, rate limiting, context length violations, or moderation flags, OpenRouter automatically tries the next model in the list.
This transforms what would otherwise be a hard failure into a graceful degradation, keeping your application running even when individual providers experience problems.

THE AUTO ROUTER

OpenRouter offers a special model identifier, "openrouter/auto", which activates its own automatic model selection logic. When you use this identifier, OpenRouter analyzes your request and selects what it considers the best and most cost-effective model for the task at hand, based on its own evaluation data and performance benchmarks.

STREAMING RESPONSES

OpenRouter fully supports server-sent event streaming, the mechanism by which language models can send their responses token by token as they are generated. Streaming is essential for chat interfaces and any application where perceived responsiveness matters. Streaming responses are billed per token, identically to non-streaming requests.

TOOL CALLING AND FUNCTION CALLING

OpenRouter standardizes the tool calling interface across all models that support it. With Auto Exacto active, tool-calling requests are automatically routed to the providers most likely to handle them correctly, addressing the reliability problems that have historically plagued tool use with open-source models served by third-party inference providers.

STRUCTURED OUTPUTS

OpenRouter supports structured output enforcement through the "response_format" parameter. You can specify a JSON schema, and OpenRouter will route your request to providers that support schema-constrained generation.

MULTIMODAL INPUTS AND OUTPUTS

OpenRouter supports text, images, audio, PDF documents, and — as of April 2026 — video generation. The interface for passing these inputs is standardized across providers. The new Audio APIs, launched May 1, 2026, provide access to text-to-speech and audio transcription through a single endpoint covering multiple providers. The Video Generation feature, launched April 15, 2026, supports text-to-video, image-to-video, and reference-image-guided generation, with models including Google's Veo 3.1 Lite, Kling Video O1 from Kuaishou, and Alibaba's Wan 2.6.

WEB SEARCH AND FETCH

OpenRouter provides consistent web search and page fetch capabilities across every tool-calling model on the platform. This means any model that supports tool calling can now search the web and retrieve page content through OpenRouter's standardized interface, with multiple search and fetch engine options available.

ZERO DATA RETENTION

For applications handling sensitive information, OpenRouter offers a Zero Data Retention mode, which can be enabled globally for your account or on a per-request basis. When ZDR is active, OpenRouter restricts routing to only those provider endpoints that have committed to not storing your data.

RESPONSE CACHING

A new Response Caching header allows identical API requests to be cached, resulting in faster responses at no additional cost. This is particularly valuable for applications that repeatedly send similar prompts, such as classification tasks or template-based generation.

WORKSPACES

OpenRouter introduced Workspaces, allowing teams to organize their API keys, usage tracking, and configuration settings into separate environments. Guardrail definitions can be copied between workspaces, making it easier to maintain consistent safety configurations across projects.

THE AGENT SDK

The OpenRouter Agent SDK, released April 24, 2026, is a TypeScript toolkit specifically designed for building multi-turn agentic workflows.
It provides a "callModel" function that transforms a chat completion into a multi-step agent with tool calls, stop conditions, and cost tracking across all 300+ models on the platform. This SDK is covered in depth in Chapter Thirteen.

HUMAN-IN-THE-LOOP TOOLS

Documentation for a new human-in-the-loop tool type was released May 6, 2026. This enables agents to pause execution and await human input before continuing, which is essential for workflows where certain decisions require human judgment rather than autonomous model action.

CLI ACCOUNT CREATION

Users can now create OpenRouter accounts and API keys via the command line interface using Stripe Projects. This is particularly useful for automated deployment pipelines and developer tooling that needs to provision OpenRouter credentials programmatically.

BRING YOUR OWN KEYS

If you already have direct API relationships with providers like OpenAI or Anthropic, you can configure OpenRouter to use your own provider API keys rather than OpenRouter's pooled keys. In this mode, OpenRouter charges a five percent usage fee on the underlying provider cost. The BYOK free tier allows up to one million free requests per month before the fee applies.

USAGE TRACKING AND ANALYTICS

OpenRouter provides a dashboard where you can track your spending per model, per API key, and over time. You can create multiple API keys and assign them to different projects or environments, making it easy to separate production costs from development and testing costs.
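As promised above, here is a minimal sketch combining the fallback and provider-routing controls from this chapter in a single request. It uses the OpenAI Python SDK's extra_body passthrough; the field names ("models", "route", "provider.order", "require_parameters") are the ones described in this chapter, and the specific model and provider names are illustrative rather than recommendations.

```python
# Sketch: fallback chain plus provider preferences in one request body.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Classify this ticket: 'App crashes on login.'"}],
    extra_body={
        "models": ["deepseek/deepseek-v4-pro"],  # tried in order if the primary fails
        "route": "fallback",
        "provider": {
            "order": ["Anthropic"],          # ranked provider preference for the primary
            "require_parameters": True,      # skip providers that would drop parameters
        },
    },
)
print(response.choices[0].message.content)
```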
CHAPTER FOUR: THE MODEL ECOSYSTEM — 500+ MODELS AND COUNTING

The breadth of OpenRouter's model catalog is genuinely staggering. Over five hundred models from more than sixty providers, spanning text, vision, audio, and video. Understanding the major families is essential for making good choices.

ANTHROPIC CLAUDE

The Claude family represents some of the most capable models on the platform. Claude Sonnet 4.6 has emerged as the practical frontier workhorse for balanced production use, offering strong performance at a price point that makes it viable for high-volume workloads. Claude Opus 4.7 sits at the top of the hierarchy for complex reasoning and coding tasks. Claude Haiku 4.5 provides a fast, economical option for simpler tasks. The Claude models are known for their strong instruction following, nuanced handling of context, and generally reliable behavior.

OPENAI GPT

GPT-4o remains a workhorse model, valued for its speed, cost-effectiveness, and strong multilingual capabilities. GPT-5.5 represents the current frontier of OpenAI's capabilities, with a one-million-plus token context window and optimization for deep reasoning and accuracy in complex tasks. GPT-4.1 and GPT-4.1-mini offer more economical options for teams that need strong performance without frontier-model pricing. OpenAI has also released open-weight models: gpt-oss-120b, a 117-billion-parameter Mixture-of-Experts model for high-reasoning use cases, and gpt-oss-20b, a 21-billion-parameter model released under the Apache 2.0 license.

GOOGLE GEMINI AND GEMMA

Gemini 3.1 Pro Preview stands out for multimodal and tool-rich workflows and is one of the featured models on OpenRouter. Gemini 3.1 Flash Lite is Google's high-efficiency multimodal model, optimized for low-latency, high-volume workloads at half the cost of Gemini 3 Flash. Gemma 4 27B and Gemma 4 31B are available for free on OpenRouter, making them excellent options for experimentation and development. Nano Banana (Gemini 2.5 Flash Image) is Google's state-of-the-art image generation model with contextual understanding.

DEEPSEEK

DeepSeek has emerged as one of the most consequential stories in the 2026 AI landscape. The Chinese research lab has produced models that deliver performance comparable to much more expensive proprietary models at a fraction of the cost. DeepSeek V3 remains a strong option for general coding tasks. DeepSeek V4 Pro, released April 24, 2026, targets advanced reasoning, coding, and long-horizon agent workflows with a one-million-token context window, and saw 13.6 billion tokens consumed on the day after its launch — nearly four times the previous day's volume. DeepSeek R1 is available for free and is known for strong reasoning and chain-of-thought capabilities. DeepSeek's pricing is aggressive: V4 Pro costs 97% less than OpenAI's GPT-5.5.

META LLAMA

Llama 4 Scout offers an industry-leading ten-million-token context window, making it well-suited for analyzing entire codebases or very long documents. Llama 4 Maverick is a large-scale multimodal Mixture-of-Experts model. Llama 3.3 70B is available for free and serves as a solid general-purpose option. The Llama family, being open-weight, is available through numerous inference providers on OpenRouter.

MISTRAL AI

Mistral Small is available for free on OpenRouter and is valued for its efficiency and strong instruction following. It is particularly well-suited for use cases where you need to chain multiple model calls together, as its lower cost makes such patterns economically viable.
QWEN (ALIBABA)

Qwen3 235B is Alibaba's largest model and is available for free on OpenRouter, making it one of the most capable free options on the platform. The Qwen family is recognized for strong multilingual performance.

NVIDIA

NVIDIA's Nemotron 3 Super is a 120-billion-parameter open hybrid Mixture-of-Experts model with a one-million-token context window, designed for complex multi-agent applications.

OTHER NOTABLE MODELS

Grok 4.3 from xAI is available on the platform. CoBuddy from Baidu is a code generation model optimized for coding tasks and AI agent workflows. OpenRouter also maintains its own proprietary models, including Optimus Alpha and Quasar Alpha, the latter noted for particularly fast token generation.

THE FREE TIER

OpenRouter offers over thirty models with no per-token cost, including DeepSeek R1, Llama 3.3 70B, Qwen3 235B, Gemma 4 27B, Gemma 4 31B, and Mistral Small. Free models are subject to rate limits — fifty requests per day for free-tier accounts, one thousand requests per day for pay-as-you-go accounts with at least ten dollars in credits — but for development and prototyping these limits are rarely a binding constraint.
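Because the catalog changes weekly, it is often easier to enumerate it programmatically than to browse. Here is a minimal sketch using OpenRouter's public model-listing endpoint; the response shape assumed here (a "data" array of objects with an "id" field) follows OpenRouter's published API, and the ":free" suffix matches the free-tier slugs used throughout this article.

```python
# Sketch: list the model catalog and filter for free-tier slugs.
# Assumes the documented response shape: {"data": [{"id": ...}, ...]}.
import requests

resp = requests.get("https://openrouter.ai/api/v1/models", timeout=30)
resp.raise_for_status()

models = resp.json()["data"]
free_models = [m["id"] for m in models if m["id"].endswith(":free")]

print(f"{len(models)} models in the catalog, {len(free_models)} free, e.g.:")
for model_id in sorted(free_models)[:5]:
    print(" -", model_id)
```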
CHAPTER FIVE: PRICING, CREDITS, AND THE ECONOMICS OF AGGREGATION

OpenRouter's pricing model in 2026 is based on a prepaid credit system, with platform fees applied to credit purchases rather than per-request markups. The company states that it passes through provider pricing without markup, meaning the token prices shown in the model catalog reflect direct provider costs.

CREDIT PURCHASES AND PLATFORM FEES

You preload credits to your account, which can then be used across all available models and providers. Credits expire after one year. The platform fee structure as of 2026 is:

- Credit purchases via card: 5.5% fee, minimum charge of $0.80
- Crypto purchases: 5% fee, no minimum
- Bring Your Own Key (BYOK): 5% usage fee on underlying provider cost, with a free tier of up to one million requests per month

OpenRouter states that this fee structure means most users will see lower total costs compared to previous arrangements.

FREE TIER

Free-tier users have access to over thirty free models and four providers, with a limit of fifty requests per day and no platform fees. This tier is sufficient for experimentation and light development work.

PAY-AS-YOU-GO

For users with at least ten dollars in credits, there are generally no limits on paid models and a higher limit of one thousand requests per day for free models. There are no minimum spends or lock-ins.

ENTERPRISE

Enterprise pricing is based on volume, prepayment credits, and annual commitments. Enterprise plans include volume discounts, SLAs, customized usage limits, invoicing, purchase orders, and EU in-region routing for GDPR compliance. OpenRouter accepts credit and debit cards, crypto, and bank transfers for pay-as-you-go; enterprise plans support invoicing and purchase orders.

MODEL PRICING (PER MILLION TOKENS, MAY 2026)

Claude models:
- Claude Haiku 4.5: $1.00 input / $5.00 output
- Claude Sonnet 4.6: $3.00 input / $15.00 output
- Claude Opus 4.7: $5.00 input / $25.00 output
- Claude Opus 4.6 (Fast): $30.00 input / $150.00 output

GPT models:
- GPT-4.1-mini: $0.40 input / $1.60 output
- GPT-4.1: $2.00 input / $8.00 output
- GPT-4o: $2.50 input / $10.00 output
- GPT-5.5: $5.00 input / $30.00 output

DeepSeek models:
- DeepSeek V3: $0.14 input / $0.28 output
- DeepSeek V4 Flash: $0.14 input / $0.28 output
- DeepSeek V4 Pro: $0.435 input / $0.87 output

Gemini models:
- Gemini Flash Latest: $0.50 input / $3.00 output
- Gemini Pro Latest: $2.00 input / $12.00 output
- Gemini 3.1 Pro Preview: $2.00 input / $12.00 output

The cost differential between a frontier proprietary model and a competitive open-weight alternative is now enormous. DeepSeek V4 Pro at $0.435 per million input tokens versus Claude Opus 4.7 at $5.00 per million input tokens represents roughly an eleven-fold difference. For high-volume workloads, this gap is the difference between a viable business and an unviable one.

An important billing note: you are only billed for successful model runs, even when routing or fallback is enabled. If OpenRouter tries three providers before finding one that works, you pay only for the successful response.
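To see what these per-million-token prices mean per request, here is a small back-of-envelope estimator using two entries from the table above. The workload numbers (10,000 input and 2,000 output tokens) are purely illustrative.

```python
# Sketch: per-request cost from the per-million-token prices in the table above.
PRICES_USD_PER_MTOK = {
    # model: (input price, output price), May 2026 figures from this chapter
    "deepseek/deepseek-v4-pro": (0.435, 0.87),
    "anthropic/claude-opus-4-7": (5.00, 25.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request at list prices."""
    input_price, output_price = PRICES_USD_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# An illustrative workload: 10,000 input tokens, 2,000 output tokens per request.
for model in PRICES_USD_PER_MTOK:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f} per request")
# deepseek/deepseek-v4-pro:  $0.0061 per request
# anthropic/claude-opus-4-7: $0.1000 per request
```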
CHAPTER SIX: THE 2026 FEATURE WAVE — WHAT IS NEW THIS YEAR

2026 has been OpenRouter's most ambitious year for feature development. The platform has moved well beyond its original identity as a text-model aggregator and is now positioning itself as a comprehensive AI infrastructure layer. Here is a chronological summary of the major releases:

March 12, 2026 — Auto Exacto: Adaptive Quality Routing. Automatically routes tool-calling requests to the highest-quality physical server based on real-world performance signals. Now on by default.

April 15, 2026 — Video Generation. Text-to-video, image-to-video, and reference-image-guided generation. Models include Google's Veo 3.1 Lite, Kling Video O1, and Alibaba's Wan 2.6.

April 22, 2026 — Workspaces. Team-based organization of API keys, usage tracking, and configuration. Guardrail definitions can be copied between workspaces.

April 24, 2026 — Agent SDK. TypeScript SDK for building multi-turn agentic workflows with tool calls, stop conditions, streaming, and cost tracking across 300+ models.

April 29, 2026 — CLI Account Creation. Create OpenRouter accounts and API keys via the command line using Stripe Projects.

April 30, 2026 — Response Caching. Cache identical API requests for faster responses at no additional cost.

May 1, 2026 — Audio APIs. Text-to-speech and audio transcription endpoints covering multiple providers under a single API.

May 6, 2026 — Human-in-the-Loop SDK Tool. New tool type that pauses agent execution to await human input.

May 6, 2026 — Responses API MCP Tool Routing. The "namespace" field on function_call output items is now preserved through the Responses API pipeline, improving MCP tool routing.

May 7, 2026 — Consistent Web Search and Fetch. Any tool-calling model can now search the web and fetch page content through OpenRouter's standardized interface.

The pace of these releases reflects both the competitive pressure OpenRouter faces and the genuine ambition of its roadmap. The platform is no longer just routing text; it is becoming the universal adapter for every modality of AI output.
CHAPTER SEVEN: PRIVACY, DATA HANDLING, AND THE COMPLIANCE CONVERSATION

OpenRouter's privacy posture in 2026 has improved in some respects and remained complicated in others. This section gives an honest accounting of where things stand.

WHAT OPENROUTER DOES BY DEFAULT

By default, OpenRouter does not log your prompts or completions. It stores only request metadata — timestamps, model used, token counts, latency — for billing and operational purposes. This is a reasonable default that protects most users in most situations.

THE PROMPT LOGGING OPT-IN — READ THIS CAREFULLY

Users can opt into prompt logging in exchange for a one percent discount on usage costs. This sounds innocuous, but the terms are not. Enabling prompt logging grants OpenRouter an irrevocable right to commercial use of those inputs and outputs. This language is broader than what most direct providers use, and it has drawn legitimate criticism. The implication is that your logged data could be used for purposes beyond displaying your history in the dashboard, including potentially selling anonymized fragments to third parties. The practical advice is simple: do not enable prompt logging unless you have carefully reviewed the terms and are certain that the data you are sending contains nothing sensitive. For most professional use cases, the one percent discount is not worth the trade-off.

ZERO DATA RETENTION

OpenRouter's Zero Data Retention mode, when enabled, restricts routing to only those provider endpoints that have committed to not storing your data. This can be set globally for your account or on a per-request basis. Using ZDR reduces the pool of available providers and may affect pricing and latency, but for sensitive workloads it is the appropriate choice.

PROVIDER-SPECIFIC DATA RETENTION

When you send a request through OpenRouter, it is routed to an underlying provider, and that provider processes your data under its own data retention policy. OpenRouter does not alter those policies. OpenRouter does display provider data retention policies in its interface, which is helpful, but the responsibility for understanding and accepting those policies rests with the application developer.

SESSION RECORDING

OpenRouter uses PostHog for session recording, which captures user interactions including mouse movements, scrolling, and input field content on the web interface. This raw data passes through an external service before being anonymized. For organizations subject to GDPR or ISO 27001, this is a genuine concern. The practical mitigation is to filter traffic to ".posthog.com" at the network level if your organization's policies require it.

GDPR AND EU IN-REGION ROUTING

For enterprise customers, OpenRouter supports EU in-region routing, ensuring prompts and completions are processed within the European Union. This feature is not enabled by default and requires an enterprise account configuration. Critics note that OpenRouter's standard tier, which runs on Cloudflare's global edge network, does not provide data residency guarantees, and the lack of a built-in per-request audit log showing the processing region makes proving GDPR compliance difficult for standard-tier users.

HIPAA

OpenRouter does not publish a HIPAA Business Associate Agreement, making it unsuitable for use cases involving Protected Health Information without such an agreement in place. Teams working with PHI should use direct provider APIs with appropriate BAAs in place.
DISCLAIMER OF RESPONSIBILITY FOR UNDERLYING PROVIDERS

OpenRouter explicitly disclaims contractual liability if an underlying model provider retains and trains on user data. This shifts the burden of proof and potential claims to the user. This is a standard posture for aggregators but worth understanding clearly before deploying OpenRouter in regulated contexts.

PRACTICAL RECOMMENDATIONS

For most professional use cases:

- Disable prompt and chat logging in your account settings
- Enable Zero Data Retention for any workload involving sensitive information (see the sketch after this list)
- Use explicit provider routing to restrict requests to providers whose data policies you have reviewed and accepted
- For GDPR-sensitive workloads, use an enterprise account with EU in-region routing enabled
- For HIPAA-covered workloads, do not use OpenRouter without a BAA in place
- Consider filtering traffic to ".posthog.com" if session recording is a concern for your organization
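Here is a minimal sketch of per-request Zero Data Retention, using the provider.data_collection preference that also appears in the production client example in Chapter Eleven; treat the exact field value as something to verify against current OpenRouter documentation.

```python
# Sketch: restrict a single request to providers that do not store data.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarize this confidential memo: ..."}],
    # Route only to ZDR-committed endpoints for this one request.
    extra_body={"provider": {"data_collection": "deny"}},
)
print(response.choices[0].message.content)
```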
CHAPTER EIGHT: THE COMPETITIVE LANDSCAPE — OPENROUTER VS. THE FIELD

OpenRouter is not the only player in the AI gateway and LLM routing space. Understanding the competitive landscape helps you make an informed choice about which tool is right for your situation.

OPENROUTER

Best for: Rapid prototyping, broad model access, and simplified billing for developers and startups. It acts as a managed aggregator with minimal setup.
Key strengths: Over 500 models from 60+ providers, unified API, automatic routing and fallbacks, Auto Exacto quality routing, new Agent SDK, video and audio support, free tier with 30+ models.
Key limitations: No self-hosting option. Credit purchase fees. Adds roughly 25-40ms of latency over direct provider calls. Less customization for infrastructure and control plane compared to self-hosted solutions. Privacy and compliance complexity for regulated industries.

LITELLM

Best for: Teams that require full control, data sovereignty, custom routing logic, and open-source compliance. LiteLLM is an open-source Python library that provides a unified interface to over 100 LLM providers. It can be used as an SDK or run as a proxy server. It supports OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and local models through Ollama. It includes cost tracking, automatic retries, and fallbacks.
Key limitations: Operational costs for infrastructure and DevOps. Production latency overhead of 20-40ms. Enterprise features may require an annual fee.

PORTKEY

Best for: Enterprises and production-grade AI applications that demand comprehensive governance, deep observability, and robust reliability features. Portkey is an enterprise-focused AI gateway that connects to over 1,600 LLMs across 200+ providers. It offers hierarchical budgets, virtual keys, RBAC, SSO, audit logs, rate limits, semantic caching, guardrails with PII redaction, and flexible deployment including fully airgapped options. It is SOC2, ISO 27001, HIPAA, and GDPR compliant.
Key limitations: Steeper learning curve. Can be overkill for simple use cases. Pricing at higher tiers can be significant.

OTHER NOTABLE ALTERNATIVES IN 2026

Bifrost (by Maxim AI): A high-performance, open-source AI gateway built in Go, offering extremely low latency — 11 microseconds of overhead at 5,000 RPS — and strong governance features. Considered a top OpenRouter alternative for latency-sensitive production workloads.

Helicone: Focuses on observability first, providing excellent request logging, monitoring, and cost tracking, with added gateway features.

Cloudflare AI Gateway: Leverages Cloudflare's edge network for proxying and caching LLM requests, offering low-latency responses, rate limiting, and cost controls.

Vercel AI Gateway: Integrates deeply with the Vercel ecosystem, offering budgeting, usage monitoring, load balancing, and automatic fallbacks.

Kong AI Gateway: Builds on Kong's API management framework, extending it to LLM traffic for enterprises already using Kong.

TrueFoundry AI Gateway: Stands out for enterprises needing multi-provider routing, centralized governance, cost tracking, and MCP integration.

AWS Bedrock: A cloud-native solution for AWS users, providing access to foundation models within the AWS infrastructure.

THE DECISION FRAMEWORK

When choosing an AI gateway, consider:

- Deployment model: Do you need managed SaaS, self-hosted, or hybrid?
- Performance: How sensitive is your application to the proxy latency?
- Provider coverage: Do you need access to a specific set of models?
- Governance: Do you need RBAC, audit logs, and compliance certifications?
- Observability: How much visibility do you need into request-level details?
- Agentic support: Do you need multi-turn agent workflows and MCP integration?
- Cost: What is the total cost of ownership including operational overhead?

For most individual developers and small teams, OpenRouter is the right starting point. For enterprises with strict compliance requirements, Portkey or a self-hosted LiteLLM deployment may be more appropriate. For teams that need the absolute minimum latency overhead, Bifrost is worth evaluating.
CHAPTER NINE: THE CHINESE MODEL SURGE — A 2026 PHENOMENON

One of the most striking stories in the 2026 AI landscape is the dramatic rise of Chinese AI models on OpenRouter. The numbers are remarkable. In October 2024, Chinese-developed models accounted for approximately 1.2% of total token consumption on OpenRouter. By February 2026, that figure had risen to 61%. By April 2026, Chinese models collectively processed over 45% of all tokens on the platform — and during the week of March 30 to April 5, 2026, all six of the top models by token consumption were from China.

DeepSeek is the primary driver of this trend. DeepSeek V3.2 ranked fourth in weekly token volume in April 2026, consuming 1.22 trillion tokens. DeepSeek V4 Pro, released April 24, 2026, saw 13.6 billion tokens consumed on the day after its launch — nearly four times the previous day's volume.

Several factors explain this surge. First, pricing: DeepSeek V4 Pro costs 97% less than OpenAI's GPT-5.5, a price differential that is simply impossible to ignore for cost-conscious developers. Second, performance: DeepSeek V4 Pro has shown comparable performance to GPT-5.4-high and Gemini 3.1 Pro in agent-based web development tasks. Third, focus: Chinese models have concentrated their development on programming and agent-driven workflows, which happen to be the fastest-growing categories of token usage on OpenRouter — programming expanded from 11% to over 50% of total token usage throughout 2025.

The geopolitical and strategic implications of this shift are significant and beyond the scope of this article, but the practical implications for developers are clear: Chinese models, particularly DeepSeek, represent a genuinely compelling option for cost-sensitive workloads, and ignoring them on the basis of national origin alone means leaving substantial value on the table. Teams with data sovereignty concerns should note that DeepSeek's servers are located in China, and routing sensitive data through them may not be appropriate depending on your organization's policies and the regulatory environment in which you operate.
CHAPTER TEN: GETTING STARTED — INSTALLATION AND CONFIGURATION

Getting started with OpenRouter is refreshingly simple.

Step one: Create an account at openrouter.ai. Registration requires only an email address. Once registered, navigate to the API Keys section of your dashboard and generate a new key. This key is a long string beginning with "sk-or-v1-". Treat it with the same care as any other secret credential: never commit it to version control, never hardcode it in your source files, and store it in environment variables or a secrets management system.

Step two: Add credits to your account if you want to use paid models. If you are just experimenting, start with the free models. Over thirty free models are available with no credit required.

Step three: Install your preferred SDK or library. For Python (using the OpenAI-compatible library):

```bash
pip install openai python-dotenv
```
For TypeScript/JavaScript (using the OpenAI-compatible library):
```bash
npm install openai dotenv
```
For TypeScript/JavaScript (using OpenRouter's native SDK):
```bash
npm install @openrouter/sdk dotenv
```
For the new Agent SDK:
```bash
npm install @openrouter/agent zod dotenv
```
Step four: Create a ".env" file in your project directory:
```bash
OPENROUTER_API_KEY=sk-or-v1-your-actual-key-here
```
You are now ready to make your first API call.
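Before writing any application code, it is worth verifying that the key works. Here is a minimal sketch using OpenRouter's key-inspection endpoint (GET /api/v1/key), which returns metadata about the key such as usage and limits; the exact field names may vary, so the sketch simply prints the raw payload.

```python
# Sketch: sanity-check your OpenRouter key before writing application code.
import os
import requests

resp = requests.get(
    "https://openrouter.ai/api/v1/key",
    headers={"Authorization": f"Bearer {os.getenv('OPENROUTER_API_KEY')}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # key metadata: label, usage, limits, etc.
```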
CHAPTER ELEVEN: PYTHON EXAMPLES — FROM BASIC TO PRODUCTION-GRADE
Python remains the dominant language for AI development. The following examples progress from the simplest possible usage to patterns appropriate for production applications.
EXAMPLE 1: BASIC CHAT COMPLETION
# basic_completion.py
# The simplest possible OpenRouter interaction.
# The only OpenRouter-specific elements are the base_url and api_key.
# Everything else is standard OpenAI API usage.
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
# Initialize the OpenAI client, redirected to OpenRouter's endpoint.
# The base_url is the only structural difference from using OpenAI directly.
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
def ask_model(prompt: str, model: str = "deepseek/deepseek-v3") -> str:
"""
Send a single prompt to the specified model and return the response text.
Args:
prompt: The user's message to send to the model.
model: The OpenRouter model identifier. Defaults to DeepSeek V3,
which offers excellent capability at very low cost in 2026.
Returns:
The model's response as a plain string.
"""
response = client.chat.completions.create(
model=model,
messages=[
{
"role": "user",
"content": prompt,
}
],
)
# The response structure mirrors the OpenAI API exactly.
return response.choices[0].message.content
if __name__ == "__main__":
# Try a free model first — no credits required.
answer = ask_model(
prompt="Explain the difference between a transformer and an RNN "
"in three sentences.",
model="meta-llama/llama-3.3-70b-instruct:free",
)
print(answer)
EXAMPLE 2: STREAMING RESPONSES
# streaming_completion.py
# Demonstrates streaming responses from OpenRouter.
# Streaming is critical for chat interfaces and any UX where
# perceived responsiveness matters more than raw throughput.
import os
import sys
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
def stream_response(
prompt: str,
model: str = "anthropic/claude-sonnet-4-6",
) -> None:
"""
Stream a model's response to stdout, printing each token as it arrives.
This creates the familiar typing effect seen in chat interfaces.
Streaming responses are billed per token, identically to non-streaming
requests — there is no streaming surcharge on OpenRouter.
Args:
prompt: The user's message.
model: The model to use. Claude Sonnet 4.6 is the practical
frontier workhorse for balanced production use in 2026.
"""
# Setting stream=True transforms the response from a single object
# into an iterator that yields chunks as the model generates them.
stream = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
stream=True,
)
# Each chunk contains a delta with the new tokens generated since
# the last chunk. We print without a newline and flush immediately
# so the output appears progressively rather than all at once.
for chunk in stream:
delta_content = chunk.choices[0].delta.content
if delta_content is not None:
sys.stdout.write(delta_content)
sys.stdout.flush()
# Print a final newline after the stream is complete.
print()
if __name__ == "__main__":
stream_response(
"Write a short poem about the joy of debugging code at 2am."
)
EXAMPLE 3: FALLBACK ROUTING FOR RESILIENCE
# resilient_completion.py
# Demonstrates OpenRouter's fallback routing mechanism.
# By specifying a list of fallback models, we ensure our application
# continues to function even when individual providers experience issues.
# OpenRouter only bills for the successful response, not for failed attempts.
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
def resilient_completion(prompt: str) -> str:
"""
Send a prompt with a fallback chain of models.
OpenRouter will try each model in order if the previous one fails.
The fallback chain here goes from a premium model to progressively
more economical alternatives, ensuring we always get a response
while preferring the highest-quality option when available.
Args:
prompt: The user's message.
Returns:
The model's response as a string.
"""
response = client.chat.completions.create(
# The primary model is specified in the standard 'model' field.
model="anthropic/claude-opus-4-7",
messages=[{"role": "user", "content": prompt}],
# The 'extra_body' parameter passes OpenRouter-specific extensions
# that are not part of the standard OpenAI API schema.
extra_body={
# The 'models' array defines the fallback chain.
# OpenRouter will try these in order if the primary model fails.
"models": [
"openai/gpt-5.5",
"google/gemini-3-1-pro-preview",
"deepseek/deepseek-v4-pro",
# A free model as the final fallback — always available.
"meta-llama/llama-3.3-70b-instruct:free",
],
# 'fallback' tells OpenRouter to use the models array as a
# sequential fallback chain rather than load-balancing.
"route": "fallback",
},
)
return response.choices[0].message.content
if __name__ == "__main__":
result = resilient_completion(
"What are the key differences between supervised and "
"unsupervised learning?"
)
print(result)
EXAMPLE 4: PROVIDER ROUTING WITH AUTO EXACTO AWARENESS
# provider_routing.py
# Demonstrates fine-grained provider selection in OpenRouter.
# As of March 2026, Auto Exacto is on by default for tool-calling requests,
# automatically routing to the highest-quality physical server.
# This example shows how to control routing for non-tool-calling requests
# and how to opt out of Auto Exacto when price-optimized routing is preferred.
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
def quality_optimized_completion(prompt: str) -> str:
"""
Send a prompt with quality-optimized provider routing.
For tool-calling requests, Auto Exacto handles this automatically.
For non-tool-calling requests, we can specify provider preferences manually.
Args:
prompt: The user's message.
Returns:
The model's response as a string.
"""
response = client.chat.completions.create(
model="meta-llama/llama-4-scout",
messages=[{"role": "user", "content": prompt}],
extra_body={
"provider": {
# Specify preferred providers in order of preference.
# OpenRouter will use the first available provider in this list.
"order": ["Together", "Fireworks", "Lepton"],
# Ensure only providers that fully support all request
# parameters are considered. Prevents silent parameter drops.
"require_parameters": True,
# Exclude specific providers if needed for compliance.
"avoid": [],
},
},
)
return response.choices[0].message.content
def price_optimized_completion(prompt: str) -> str:
"""
Send a prompt with price-optimized routing, opting out of Auto Exacto.
Use this when cost is the primary concern and tool-calling reliability
is not required.
Args:
prompt: The user's message.
Returns:
The model's response as a string.
"""
response = client.chat.completions.create(
# Appending ':floor' to the model slug opts out of Auto Exacto
# and uses price-weighted routing instead.
model="meta-llama/llama-4-scout:floor",
messages=[{"role": "user", "content": prompt}],
)
return response.choices[0].message.content
if __name__ == "__main__":
result = quality_optimized_completion(
"Summarize the main advantages of the Mixture-of-Experts "
"architecture for large language models."
)
print("Quality-optimized result:")
print(result)
EXAMPLE 5: TOOL CALLING WITH AUTO EXACTO
# tool_calling.py
# Demonstrates a complete tool calling workflow with OpenRouter.
# As of March 2026, Auto Exacto automatically routes tool-calling requests
# to the providers with the highest tool-calling success rates, addressing
# the reliability problems that historically plagued tool use with open-
# source models served by third-party inference providers.
import os
import json
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
# ---------------------------------------------------------------------------
# Tool implementation: the actual Python function that does real work.
# ---------------------------------------------------------------------------
def get_current_weather(location: str, unit: str = "celsius") -> dict:
"""
Simulate fetching current weather data for a given location.
In a real application, this would call a weather API.
Args:
location: The city and country, e.g., "Munich, Germany".
unit: Temperature unit, either "celsius" or "fahrenheit".
Returns:
A dictionary containing simulated weather data.
"""
# Simulated response for demonstration purposes.
weather_data = {
"location": location,
"temperature": 18 if unit == "celsius" else 64,
"unit": unit,
"condition": "partly cloudy",
"humidity": 65,
"wind_speed_kmh": 12,
}
return weather_data
def search_web(query: str) -> dict:
"""
Simulate a web search. In a real application, this would use
OpenRouter's new web search feature (launched May 7, 2026) or
a dedicated search API like Brave Search or Tavily.
Args:
query: The search query string.
Returns:
A dictionary containing simulated search results.
"""
return {
"query": query,
"results": [
f"Result 1 for '{query}': Relevant information found.",
f"Result 2 for '{query}': Additional context available.",
],
}
# ---------------------------------------------------------------------------
# Tool schemas: JSON descriptions that tell the model what tools are available.
# ---------------------------------------------------------------------------
TOOLS = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": (
"Retrieves the current weather conditions for a specified "
"location. Use this when the user asks about weather."
),
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and country, e.g., 'Munich, Germany'.",
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use.",
},
},
"required": ["location"],
},
},
},
{
"type": "function",
"function": {
"name": "search_web",
"description": (
"Searches the web for information on a given query. "
"Use this to find current information or facts."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query.",
},
},
"required": ["query"],
},
},
},
]
# Map from tool name strings to actual Python callables.
AVAILABLE_TOOLS = {
"get_current_weather": get_current_weather,
"search_web": search_web,
}
def run_tool_calling_conversation(user_message: str) -> str:
"""
Run a complete tool calling conversation turn.
Auto Exacto (active since March 2026) automatically routes this request
to the provider with the highest tool-calling success rate for the chosen
model, so we do not need to manually specify provider preferences for
reliability.
Args:
user_message: The user's input message.
Returns:
The model's final response after any tool calls have been resolved.
"""
messages = [{"role": "user", "content": user_message}]
# First API call: give the model the user's message and tool definitions.
# Auto Exacto will route this to the best provider for tool calling.
response = client.chat.completions.create(
model="openai/gpt-4o",
messages=messages,
tools=TOOLS,
tool_choice="auto",
)
assistant_message = response.choices[0].message
# Check whether the model has requested any tool calls.
if not assistant_message.tool_calls:
return assistant_message.content
# Add the assistant's message (including tool call requests) to history.
messages.append(assistant_message)
# Process each tool call the model has requested.
for tool_call in assistant_message.tool_calls:
function_name = tool_call.function.name
function_args = json.loads(tool_call.function.arguments)
if function_name not in AVAILABLE_TOOLS:
raise ValueError(f"Unknown tool requested by model: {function_name}")
tool_function = AVAILABLE_TOOLS[function_name]
tool_result = tool_function(**function_args)
# Add the tool result to the conversation history.
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"name": function_name,
"content": json.dumps(tool_result),
})
# Second API call: send the tool results back to the model.
final_response = client.chat.completions.create(
model="openai/gpt-4o",
messages=messages,
)
return final_response.choices[0].message.content
if __name__ == "__main__":
user_query = "What is the weather like in Munich right now? I prefer Celsius."
print(f"User: {user_query}")
print()
response = run_tool_calling_conversation(user_query)
print(f"Assistant: {response}")
EXAMPLE 6: STRUCTURED OUTPUT WITH JSON SCHEMA
# structured_output.py
# Demonstrates enforcing a JSON schema on model responses via OpenRouter.
# Structured output is essential for applications that need to parse
# model responses programmatically rather than display them as text.
import os
import json
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=os.getenv("OPENROUTER_API_KEY"),
)
# Define the JSON schema that the model's response must conform to.
TECH_ANALYSIS_SCHEMA = {
"type": "object",
"properties": {
"technology_name": {
"type": "string",
"description": "The name of the technology being analyzed.",
},
"maturity_level": {
"type": "string",
"enum": ["emerging", "growing", "mature", "declining"],
"description": "The current maturity level of the technology.",
},
"adoption_score": {
"type": "number",
"minimum": 1,
"maximum": 10,
"description": "Adoption score from 1 (niche) to 10 (ubiquitous).",
},
"key_use_cases": {
"type": "array",
"items": {"type": "string"},
"description": "The primary use cases for this technology.",
},
"main_providers": {
"type": "array",
"items": {"type": "string"},
"description": "The main companies or projects providing this technology.",
},
"summary": {
"type": "string",
"description": "A two-to-three sentence summary of the technology.",
},
"recommendation": {
"type": "string",
"description": "A practical recommendation for enterprise adoption.",
},
},
"required": [
"technology_name", "maturity_level", "adoption_score",
"key_use_cases", "main_providers", "summary", "recommendation"
],
}
def analyze_technology(technology: str) -> dict:
"""
Request a structured technology analysis from the model,
enforcing a JSON schema on the response.
Args:
technology: The name of the technology to analyze.
Returns:
A dictionary containing the structured analysis.
"""
response = client.chat.completions.create(
# Gemini 3.1 Pro Preview is well-suited for structured analytical tasks.
model="google/gemini-3-1-pro-preview",
messages=[
{
"role": "system",
"content": (
"You are a senior technology analyst specializing in AI "
"infrastructure. Provide accurate, balanced analyses. "
"Always respond in the exact JSON format specified."
),
},
{
"role": "user",
"content": f"Analyze this technology for enterprise adoption: {technology}",
},
],
# The response_format parameter enforces structured output.
response_format={
"type": "json_schema",
"json_schema": {
"name": "technology_analysis",
"schema": TECH_ANALYSIS_SCHEMA,
"strict": True,
},
},
)
raw_json = response.choices[0].message.content
return json.loads(raw_json)
if __name__ == "__main__":
analysis = analyze_technology("OpenRouter")
print(f"Technology: {analysis['technology_name']}")
print(f"Maturity: {analysis['maturity_level']}")
print(f"Adoption: {analysis['adoption_score']}/10")
print(f"Summary: {analysis['summary']}")
print()
print("Key Use Cases:")
for use_case in analysis["key_use_cases"]:
print(f" - {use_case}")
print()
print("Main Providers:")
for provider in analysis["main_providers"]:
print(f" - {provider}")
print()
print(f"Recommendation: {analysis['recommendation']}")
EXAMPLE 7: PRODUCTION-GRADE CLIENT WITH RETRY LOGIC
# production_client.py
# A production-grade OpenRouter client with error handling, retry logic,
# and structured logging. This demonstrates the patterns needed to use
# OpenRouter reliably in a real application.
import os
import time
import logging
from typing import Optional
from openai import OpenAI, RateLimitError, APIStatusError, APIConnectionError
from dotenv import load_dotenv
load_dotenv()
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
)
logger = logging.getLogger("openrouter_client")
class OpenRouterClient:
"""
A production-grade wrapper around the OpenRouter API.
This class encapsulates the OpenAI client configured for OpenRouter
and adds retry logic, error handling, and logging appropriate for
production use. It also demonstrates how to use Zero Data Retention
for sensitive workloads.
"""
DEFAULT_MAX_RETRIES = 3
DEFAULT_INITIAL_BACKOFF_SECONDS = 1.0
DEFAULT_BACKOFF_MULTIPLIER = 2.0
def __init__(
self,
api_key: Optional[str] = None,
max_retries: int = DEFAULT_MAX_RETRIES,
initial_backoff: float = DEFAULT_INITIAL_BACKOFF_SECONDS,
zero_data_retention: bool = False,
) -> None:
"""
Initialize the OpenRouter client.
Args:
api_key: The OpenRouter API key. If not provided,
reads from OPENROUTER_API_KEY env variable.
max_retries: Maximum retry attempts for transient errors.
initial_backoff: Initial wait time in seconds before first retry.
zero_data_retention: If True, restricts routing to ZDR-compliant
providers only. Use for sensitive workloads.
"""
resolved_key = api_key or os.getenv("OPENROUTER_API_KEY")
if not resolved_key:
raise ValueError(
"OpenRouter API key must be provided either as a parameter "
"or via the OPENROUTER_API_KEY environment variable."
)
self._client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=resolved_key,
)
self._max_retries = max_retries
self._initial_backoff = initial_backoff
self._zero_data_retention = zero_data_retention
if zero_data_retention:
logger.info(
"Zero Data Retention mode enabled. Routing restricted to "
"ZDR-compliant providers only."
)
def complete(
self,
prompt: str,
model: str,
system_prompt: Optional[str] = None,
temperature: float = 0.7,
max_tokens: Optional[int] = None,
) -> str:
"""
Send a completion request with automatic retry on transient failures.
This method implements exponential backoff for rate limit errors
and provider unavailability, while failing fast on permanent errors
such as invalid requests or authentication failures.
Remember: OpenRouter only bills for successful model runs, so
failed attempts in the retry chain do not incur charges.
Args:
prompt: The user's message.
model: The OpenRouter model identifier.
system_prompt: Optional system prompt to set the model's behavior.
temperature: Sampling temperature (0.0 to 2.0).
max_tokens: Maximum tokens to generate. None means model default.
Returns:
The model's response text.
Raises:
APIStatusError: For permanent API errors (4xx except 429).
APIConnectionError: If the connection to OpenRouter fails entirely.
RuntimeError: If all retry attempts are exhausted.
"""
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
request_params: dict = {
"model": model,
"messages": messages,
"temperature": temperature,
}
if max_tokens is not None:
request_params["max_tokens"] = max_tokens
# Add Zero Data Retention routing if configured.
# This restricts routing to providers that do not store your data.
if self._zero_data_retention:
request_params["extra_body"] = {
"provider": {
"data_collection": "deny",
}
}
backoff = self._initial_backoff
last_exception: Optional[Exception] = None
for attempt in range(self._max_retries + 1):
try:
if attempt > 0:
logger.info(
"Retry attempt %d/%d for model %s (backoff: %.1fs)",
attempt, self._max_retries, model, backoff,
)
time.sleep(backoff)
backoff *= self.DEFAULT_BACKOFF_MULTIPLIER
response = self._client.chat.completions.create(
**request_params
)
logger.info(
"Successful completion | model=%s | tokens=%d",
model,
response.usage.total_tokens if response.usage else 0,
)
content = response.choices[0].message.content
if content is None:
raise ValueError("Model returned a null content field.")
return content
except RateLimitError as exc:
# 429 errors are transient; retry with backoff.
logger.warning(
"Rate limit hit on attempt %d | model=%s",
attempt + 1, model,
)
last_exception = exc
except APIStatusError as exc:
if exc.status_code >= 500:
# 5xx errors indicate provider issues; retry.
logger.warning(
"Provider error on attempt %d | model=%s | status=%d",
attempt + 1, model, exc.status_code,
)
last_exception = exc
else:
# 4xx errors (except 429) are permanent; do not retry.
logger.error(
"Permanent API error | model=%s | status=%d",
model, exc.status_code,
)
raise
except APIConnectionError as exc:
# Connection errors may be transient; retry.
logger.warning(
"Connection error on attempt %d | model=%s",
attempt + 1, model,
)
last_exception = exc
logger.error(
"All %d retry attempts failed for model %s",
self._max_retries, model,
)
raise RuntimeError(
f"OpenRouter request failed after {self._max_retries} retries."
) from last_exception
if __name__ == "__main__":
# Standard client for general use.
client = OpenRouterClient()
result = client.complete(
prompt="Explain the CAP theorem in distributed systems.",
model="deepseek/deepseek-v3",
system_prompt=(
"You are a senior distributed systems engineer. "
"Explain concepts clearly and concisely."
),
temperature=0.3,
)
print("Standard client result:")
print(result)
print()
# ZDR client for sensitive workloads.
# Only routes to providers that do not store your data.
zdr_client = OpenRouterClient(zero_data_retention=True)
sensitive_result = zdr_client.complete(
prompt="Summarize the key principles of data minimization under GDPR.",
model="anthropic/claude-sonnet-4-6",
temperature=0.2,
)
print("ZDR client result:")
print(sensitive_result)
CHAPTER TWELVE: TYPESCRIPT EXAMPLES — SDK AND RAW FETCH
EXAMPLE 1: BASIC COMPLETION WITH THE OPENAI-COMPATIBLE LIBRARY
// basic-completion.ts
// Demonstrates a simple chat completion using OpenRouter via the
// OpenAI-compatible TypeScript client. The only OpenRouter-specific
// elements are the baseURL and the API key source.
import OpenAI from "openai";
import * as dotenv from "dotenv";
dotenv.config();
// Initialize the client with OpenRouter's endpoint.
const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
// OpenRouter recommends including these headers to identify your
// application in their analytics and for rate limit management.
defaultHeaders: {
"HTTP-Referer": "https://your-application.example.com",
"X-Title": "Your Application Name",
},
});
/**
* Sends a prompt to the specified model and returns the response text.
*
* @param prompt - The user's message to send to the model.
* @param model - The OpenRouter model identifier string.
* @returns A promise that resolves to the model's response text.
*/
async function askModel(
prompt: string,
model: string = "deepseek/deepseek-v3"
): Promise<string> {
const response = await client.chat.completions.create({
model,
messages: [
{
role: "user",
content: prompt,
},
],
});
const content = response.choices[0]?.message?.content;
if (!content) {
throw new Error("Model returned an empty response.");
}
return content;
}
async function main(): Promise<void> {
try {
// Use a free model for zero-cost experimentation.
const answer = await askModel(
"What makes TypeScript superior to plain JavaScript for "
+ "large-scale application development?",
"qwen/qwen3-235b-a22b:free"
);
console.log(answer);
} catch (error) {
console.error("Error calling OpenRouter:", error);
process.exit(1);
}
}
main();
EXAMPLE 2: STREAMING WITH THE NATIVE OPENROUTER SDK
// openrouter-sdk-streaming.ts
// Demonstrates using OpenRouter's native TypeScript SDK for streaming.
// The native SDK, released alongside the Agent SDK in 2026, provides
// stronger typing for OpenRouter-specific features.
import { OpenRouter } from "@openrouter/sdk";
import * as dotenv from "dotenv";
dotenv.config();
const openRouter = new OpenRouter({
apiKey: process.env.OPENROUTER_API_KEY,
});
/**
* Demonstrates a streaming chat completion using the OpenRouter SDK.
* The 'for await' loop integrates naturally with TypeScript's async
* iteration protocol.
*
* @param prompt - The user's message.
* @param model - The model identifier to use.
*/
async function streamWithNativeSDK(
prompt: string,
model: string = "anthropic/claude-sonnet-4-6"
): Promise<void> {
console.log(`Streaming response from ${model}:\n`);
const stream = await openRouter.chat.send({
model,
messages: [
{
role: "user",
content: prompt,
},
],
stream: true,
});
for await (const chunk of stream) {
const deltaContent = chunk.choices[0]?.delta?.content;
if (deltaContent) {
process.stdout.write(deltaContent);
}
}
console.log();
}
async function main(): Promise<void> {
try {
await streamWithNativeSDK(
"Explain the concept of zero-knowledge proofs in a way "
+ "that a software developer with no cryptography background "
+ "can understand."
);
} catch (error) {
console.error("Streaming error:", error);
process.exit(1);
}
}
main();
EXAMPLE 3: RAW FETCH WITH FULL TYPE SAFETY
// fetch-example.ts
// Demonstrates calling OpenRouter directly via the fetch API.
// This approach requires no npm dependencies beyond the standard
// runtime environment, making it ideal for edge functions and browsers.
// Compatible with Cloudflare Workers, Vercel Edge Functions, and
// any environment with the standard fetch API (Node.js 18+).
// Type definitions mirroring the OpenAI chat completion response format.
interface ChatMessage {
role: "user" | "assistant" | "system" | "tool";
content: string;
}
interface ChatCompletionChoice {
index: number;
message: ChatMessage;
finish_reason: string;
}
interface UsageStats {
prompt_tokens: number;
completion_tokens: number;
total_tokens: number;
}
interface ChatCompletionResponse {
id: string;
model: string;
choices: ChatCompletionChoice[];
usage: UsageStats;
}
interface OpenRouterRequestBody {
model: string;
messages: Array<{ role: string; content: string }>;
stream?: boolean;
temperature?: number;
max_tokens?: number;
}
/**
* Calls the OpenRouter API using the native fetch API.
* No npm dependencies required.
*
* @param prompt - The user's message.
* @param model - The OpenRouter model identifier.
* @param apiKey - The OpenRouter API key.
* @param temperature - Sampling temperature (default: 0.7).
* @returns A promise resolving to the model's response text.
*/
async function callOpenRouterWithFetch(
prompt: string,
model: string,
apiKey: string,
temperature: number = 0.7
): Promise<string> {
const requestBody: OpenRouterRequestBody = {
model,
messages: [
{
role: "user",
content: prompt,
},
],
temperature,
stream: false,
};
const response = await fetch(
"https://openrouter.ai/api/v1/chat/completions",
{
method: "POST",
headers: {
"Authorization": `Bearer ${apiKey}`,
"Content-Type": "application/json",
"HTTP-Referer": "https://your-application.example.com",
"X-Title": "Your Application Name",
},
body: JSON.stringify(requestBody),
}
);
// Always check the HTTP status before attempting to parse the response.
if (!response.ok) {
const errorText = await response.text();
throw new Error(
`OpenRouter API error ${response.status}: ${errorText}`
);
}
const data = (await response.json()) as ChatCompletionResponse;
const content = data.choices[0]?.message?.content;
if (!content) {
throw new Error("OpenRouter returned an empty response.");
}
return content;
}
async function main(): Promise<void> {
const apiKey = process.env.OPENROUTER_API_KEY ?? "";
if (!apiKey) {
console.error(
"Error: OPENROUTER_API_KEY environment variable is not set."
);
process.exit(1);
}
try {
const result = await callOpenRouterWithFetch(
"What are the three most important principles of clean code?",
"google/gemini-3-1-pro-preview",
apiKey
);
console.log(result);
} catch (error) {
console.error("Error:", error);
process.exit(1);
}
}
main();
CHAPTER THIRTEEN: THE OPENROUTER AGENT SDK — MULTI-TURN AGENTIC WORKFLOWS
The OpenRouter Agent SDK, released April 24, 2026, is one of the most significant additions to the platform's feature set. It provides a TypeScript toolkit specifically designed for building multi-turn agentic workflows, with the "callModel" function at its core.
The "callModel" function transforms a chat completion into a multi-step agent that can call tools, handle multi-turn loops, enforce stop conditions, track costs, and stream progress — all across any of the 300+ models on OpenRouter. This is the foundation for building sophisticated AI agents without having to implement the orchestration loop yourself.
Key concepts in the Agent SDK:
callModel: The core function. Handles the iterative process of calling a model, inspecting its output for tool requests, executing those tools, and feeding results back to the model until the task is complete or a stop condition is met.
tool(): Defines a tool with a name, description, Zod schema for input validation, and an execute function. The SDK handles argument parsing and validation automatically.
Stop conditions: Composable conditions like stepCountIs(), maxCost(), and hasToolCall() prevent infinite loops and manage costs. Custom stop functions can also be defined; a short sketch follows this list.
Streaming: getTextStream(), getToolCallsStream(), and getReasoningStream() provide real-time progress updates during multi-step agent runs.
Cost tracking: Every response from callModel includes token counts and cost data, allowing you to monitor expenses per agent run.
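Before the full example, here is a minimal sketch of the last two ideas together: a custom stop condition composed with a built-in one, plus streaming via getTextStream(). The shape of the state object passed to the custom stop function is an assumption made for illustration; the SDK's published types are authoritative.
// custom-stop-condition.ts
// A minimal sketch of a custom stop condition plus streaming. It assumes
// (hypothetically) that a custom stop function receives the accumulated
// run state as an object with a `text` field; the exact callback
// signature may differ in the released @openrouter/agent SDK.
import { callModel, stepCountIs } from "@openrouter/agent";
import * as dotenv from "dotenv";
dotenv.config();

// Hypothetical custom stop condition: halt once the agent has produced
// more than `limit` characters of text output.
const outputLengthExceeds = (limit: number) =>
  (state: { text: string }) => state.text.length > limit;

async function main(): Promise<void> {
  const result = await callModel({
    model: "anthropic/claude-haiku-4-5",
    messages: [
      { role: "user", content: "Write a brief history of packet switching." },
    ],
    // Built-in and custom stop conditions compose in the same array.
    stopWhen: [stepCountIs(5), outputLengthExceeds(2000)],
  });
  // Stream text deltas to stdout as they arrive.
  for await (const delta of result.getTextStream()) {
    process.stdout.write(delta);
  }
  console.log();
}

main().catch(console.error);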
EXAMPLE: MULTI-AGENT RESEARCH AND SUMMARIZATION WORKFLOW
// multi-agent-workflow.ts
// Demonstrates a multi-agent workflow using the OpenRouter Agent SDK.
// This example shows a Research Agent that gathers information using
// web search, and a Summarizer Agent that condenses the findings.
//
// Installation:
// npm install @openrouter/agent zod dotenv
//
// The Agent SDK was released April 24, 2026, and requires
// @openrouter/agent version 1.0.0 or later.
import { callModel, tool, stepCountIs, maxCost } from "@openrouter/agent";
import { z } from "zod";
import * as dotenv from "dotenv";
dotenv.config();
// ---------------------------------------------------------------------------
// Tool Definitions
// ---------------------------------------------------------------------------
/**
* Web search tool. In a real application, this would call a search API
* such as Brave Search, Tavily, or OpenRouter's built-in web search
* (launched May 7, 2026).
*/
const webSearchTool = tool({
name: "webSearch",
description: "Searches the web for information on a given query. "
+ "Use this to find current facts, news, or technical information.",
schema: z.object({
query: z.string().describe("The search query to execute."),
max_results: z.number().optional().describe(
"Maximum number of results to return. Defaults to 5."
),
}),
execute: async ({ query, max_results = 5 }) => {
console.log(` [Tool] Searching web for: "${query}"`);
// In a real application, replace this with an actual search API call.
// OpenRouter's web search feature (May 2026) can be used here.
const mockResults = Array.from({ length: max_results }, (_, i) => ({
title: `Result ${i + 1} for "${query}"`,
snippet: `Relevant information about ${query} from source ${i + 1}.`,
url: `https://example.com/result-${i + 1}`,
}));
return {
query,
results: mockResults,
total_found: mockResults.length,
};
},
});
/**
* Document fetch tool. Retrieves the content of a specific URL.
* In a real application, this would make an HTTP request.
*/
const fetchDocumentTool = tool({
name: "fetchDocument",
description: "Fetches and returns the content of a web page or document.",
schema: z.object({
url: z.string().url().describe("The URL to fetch content from."),
}),
execute: async ({ url }) => {
console.log(` [Tool] Fetching document: ${url}`);
// Simulated document content for demonstration.
return {
url,
content: `Simulated content from ${url}. In a real application, `
+ "this would contain the actual page content retrieved via HTTP.",
word_count: 150,
};
},
});
// ---------------------------------------------------------------------------
// Agent Functions
// ---------------------------------------------------------------------------
/**
* Research Agent: Gathers information on a topic using web search tools.
* Uses Claude Haiku 4.5 for cost efficiency — research tasks often involve
* many tool calls, so keeping per-call costs low is important.
*
* @param topic - The research topic.
* @returns The gathered research as a string.
*/
async function researchAgent(topic: string): Promise<string> {
console.log(`\n[Research Agent] Starting research on: "${topic}"`);
const result = await callModel({
// Claude Haiku 4.5 at $1.00/$5.00 per million tokens is well-suited
// for tool-heavy research tasks where many calls are made.
model: "anthropic/claude-haiku-4-5",
messages: [
{
role: "system",
content: "You are a thorough research assistant. Use the "
+ "webSearch and fetchDocument tools to gather comprehensive "
+ "information on the given topic. Synthesize your findings "
+ "into a detailed research summary.",
},
{
role: "user",
content: `Research the following topic thoroughly: ${topic}`,
},
],
tools: [webSearchTool, fetchDocumentTool],
// Stop after 8 steps or if cost exceeds $0.10 — whichever comes first.
// This prevents runaway agent loops in production.
stopWhen: [stepCountIs(8), maxCost(0.10)],
});
const researchOutput = await result.getText();
// Log cost information for monitoring.
console.log(
`[Research Agent] Complete. Cost: $${result.cost?.toFixed(6) ?? "unknown"}`
);
return researchOutput;
}
/**
* Summarizer Agent: Condenses research findings into a structured summary.
* Uses Gemini 3.1 Pro Preview for its strong analytical and synthesis
* capabilities.
*
* @param rawResearch - The research output from the Research Agent.
* @param topic - The original research topic (for context).
* @returns A structured summary as a string.
*/
async function summarizerAgent(
rawResearch: string,
topic: string
): Promise<string> {
console.log("\n[Summarizer Agent] Starting summarization...");
const result = await callModel({
// Gemini 3.1 Pro Preview at $2.00/$12.00 per million tokens offers
// strong analytical capabilities for synthesis tasks.
model: "google/gemini-3-1-pro-preview",
messages: [
{
role: "system",
content: "You are an expert analyst and technical writer. "
+ "Condense research findings into clear, structured summaries "
+ "with key insights, practical implications, and recommendations.",
},
{
role: "user",
content: `Topic: ${topic}\n\nResearch findings to summarize:\n\n${rawResearch}`,
},
],
// Summarization is a single-turn task; limit to 3 steps.
stopWhen: [stepCountIs(3), maxCost(0.05)],
});
const summary = await result.getText();
console.log(
`[Summarizer Agent] Complete. Cost: $${result.cost?.toFixed(6) ?? "unknown"}`
);
return summary;
}
/**
* Fact-Checker Agent: Validates key claims in the summary.
* Uses DeepSeek V3 for cost efficiency — fact-checking involves
* straightforward verification tasks that do not require frontier models.
*
* @param summary - The summary to fact-check.
* @returns The fact-checked and annotated summary.
*/
async function factCheckerAgent(summary: string): Promise<string> {
console.log("\n[Fact-Checker Agent] Starting fact-checking...");
const result = await callModel({
// DeepSeek V3 at $0.14/$0.28 per million tokens is extremely
// cost-effective for straightforward verification tasks.
model: "deepseek/deepseek-v3",
messages: [
{
role: "system",
content: "You are a meticulous fact-checker. Review the provided "
+ "summary, identify any claims that should be verified, and "
+ "annotate the summary with confidence levels and any caveats.",
},
{
role: "user",
content: `Please fact-check this summary:\n\n${summary}`,
},
],
stopWhen: [stepCountIs(3), maxCost(0.02)],
});
const checkedSummary = await result.getText();
console.log(
`[Fact-Checker Agent] Complete. Cost: $${result.cost?.toFixed(6) ?? "unknown"}`
);
return checkedSummary;
}
// ---------------------------------------------------------------------------
// Workflow Orchestration
// ---------------------------------------------------------------------------
/**
* Runs a complete multi-agent research workflow.
* The three agents collaborate sequentially:
* 1. Research Agent gathers raw information.
* 2. Summarizer Agent condenses the findings.
* 3. Fact-Checker Agent validates the summary.
*
* @param topic - The topic to research and summarize.
*/
async function runResearchWorkflow(topic: string): Promise<void> {
console.log("=".repeat(60));
console.log(`Multi-Agent Research Workflow`);
console.log(`Topic: ${topic}`);
console.log("=".repeat(60));
const startTime = Date.now();
// Step 1: Research Agent gathers information.
const rawResearch = await researchAgent(topic);
// Step 2: Summarizer Agent condenses the findings.
const summary = await summarizerAgent(rawResearch, topic);
// Step 3: Fact-Checker Agent validates the summary.
const finalOutput = await factCheckerAgent(summary);
const elapsedSeconds = ((Date.now() - startTime) / 1000).toFixed(1);
console.log("\n" + "=".repeat(60));
console.log("FINAL WORKFLOW OUTPUT");
console.log("=".repeat(60));
console.log(`Topic: ${topic}`);
console.log(`Duration: ${elapsedSeconds}s`);
console.log();
console.log(finalOutput);
}
// ---------------------------------------------------------------------------
// Entry Point
// ---------------------------------------------------------------------------
runResearchWorkflow(
"The impact of DeepSeek's pricing strategy on the enterprise AI market in 2026"
).catch((error) => {
console.error("Workflow error:", error);
process.exit(1);
});
EXAMPLE: HUMAN-IN-THE-LOOP AGENT (MAY 2026 FEATURE)
// human-in-the-loop-agent.ts
// Demonstrates the new human-in-the-loop tool type, released May 6, 2026.
// This enables agents to pause execution and await human input before
// continuing, which is essential for workflows where certain decisions
// require human judgment rather than autonomous model action.
//
// This is particularly relevant for enterprise workflows where AI agents
// need approval before taking consequential actions.
import { callModel, tool, stepCountIs } from "@openrouter/agent";
import { z } from "zod";
import * as readline from "readline";
import * as dotenv from "dotenv";
dotenv.config();
// ---------------------------------------------------------------------------
// Human-in-the-Loop Tool
// ---------------------------------------------------------------------------
/**
* Creates a readline interface for reading human input from the terminal.
* In a production application, this would be replaced with a web UI,
* a Slack integration, or another human-facing interface.
*/
function createHumanInputReader(): readline.Interface {
return readline.createInterface({
input: process.stdin,
output: process.stdout,
});
}
/**
* Human approval tool: pauses agent execution and waits for human input.
* The agent will call this tool when it needs human judgment before
* proceeding with a consequential action.
*/
const humanApprovalTool = tool({
name: "requestHumanApproval",
description: "Pauses execution and requests human approval before "
+ "proceeding with a consequential action. Use this when the action "
+ "has significant real-world consequences that require human judgment.",
schema: z.object({
action_description: z.string().describe(
"A clear description of the action requiring approval."
),
reason: z.string().describe(
"Why this action requires human approval."
),
risk_level: z.enum(["low", "medium", "high"]).describe(
"The risk level of the proposed action."
),
}),
execute: async ({ action_description, reason, risk_level }) => {
const rl = createHumanInputReader();
console.log("\n" + "!".repeat(60));
console.log("HUMAN APPROVAL REQUIRED");
console.log("!".repeat(60));
console.log(`Risk Level: ${risk_level.toUpperCase()}`);
console.log(`Action: ${action_description}`);
console.log(`Reason: ${reason}`);
console.log();
return new Promise<{ approved: boolean; feedback: string }>(
(resolve) => {
rl.question(
"Approve this action? (yes/no) and optional feedback: ",
(answer) => {
rl.close();
const parts = answer.trim().split(/\s+/);
const approved = parts[0].toLowerCase() === "yes";
const feedback = parts.slice(1).join(" ") || "";
console.log(
`\nHuman decision: ${approved ? "APPROVED" : "REJECTED"}`
);
if (feedback) {
console.log(`Feedback: ${feedback}`);
}
resolve({ approved, feedback });
}
);
}
);
},
});
/**
* Simulated action execution tool.
* In a real application, this would perform actual system operations.
*/
const executeActionTool = tool({
name: "executeAction",
description: "Executes an approved action in the system.",
schema: z.object({
action: z.string().describe("The action to execute."),
parameters: z.record(z.string()).optional().describe(
"Optional parameters for the action."
),
}),
execute: async ({ action, parameters }) => {
console.log(`\n[System] Executing action: ${action}`);
if (parameters) {
console.log(`[System] Parameters: ${JSON.stringify(parameters)}`);
}
// Simulate action execution.
return {
success: true,
message: `Action "${action}" executed successfully.`,
timestamp: new Date().toISOString(),
};
},
});
// ---------------------------------------------------------------------------
// Human-in-the-Loop Agent
// ---------------------------------------------------------------------------
/**
* Runs an agent that requires human approval for consequential actions.
*
* @param task - The task for the agent to complete.
*/
async function runHumanInTheLoopAgent(task: string): Promise<void> {
console.log("=".repeat(60));
console.log("Human-in-the-Loop Agent");
console.log(`Task: ${task}`);
console.log("=".repeat(60));
const result = await callModel({
model: "anthropic/claude-sonnet-4-6",
messages: [
{
role: "system",
content: "You are a careful AI assistant that always requests "
+ "human approval before taking any action with medium or "
+ "high risk. Use the requestHumanApproval tool to pause "
+ "and get approval, then use executeAction to proceed if "
+ "approved. If rejected, explain why you cannot complete "
+ "the task and suggest alternatives.",
},
{
role: "user",
content: task,
},
],
tools: [humanApprovalTool, executeActionTool],
stopWhen: [stepCountIs(10)],
});
const finalResponse = await result.getText();
console.log("\n" + "=".repeat(60));
console.log("Agent Final Response:");
console.log("=".repeat(60));
console.log(finalResponse);
}
// Entry point.
runHumanInTheLoopAgent(
"Please delete all log files older than 30 days from the production server "
+ "and send a summary report to the operations team."
).catch((error) => {
console.error("Agent error:", error);
process.exit(1);
});
CHAPTER FOURTEEN: HONEST LIABILITIES AND CRITICISMS
OpenRouter is a genuinely useful platform, but intellectual honesty requires acknowledging its real limitations and risks.
DEPENDENCY ON A THIRD-PARTY AGGREGATOR
When you build your application on top of OpenRouter, you are trusting a single company with your routing, your billing, and potentially your API keys. If OpenRouter experiences an outage, your application experiences an outage. If OpenRouter changes its pricing or terms, you are affected. The $40 million Series A and the reported $120 million follow-on round suggest the company is well-capitalized, but the dependency risk is real and should be factored into architectural decisions for mission-critical applications.
PROXY LATENCY
Every request that passes through OpenRouter adds approximately 25-40ms of latency compared to calling a provider directly. For most applications this overhead is negligible. For latency-critical applications where every millisecond matters, this additional hop may be significant. Auto Exacto helps by routing to the fastest high-quality provider, but it cannot eliminate the proxy overhead entirely.
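The overhead is easy to measure for your own region and workload rather than taken on faith. The rough probe below times a minimal completion through OpenRouter; running the same prompt against the provider's native endpoint and subtracting isolates the proxy's contribution. The model choice and sample count here are arbitrary.
// measure-latency.ts
// A rough latency probe: times several small completions through
// OpenRouter. Note that each sample includes model generation time,
// so compare against the same prompt sent to the provider's native
// endpoint to estimate the proxy overhead itself.
const API_KEY = process.env.OPENROUTER_API_KEY ?? "";

async function timeOneRequest(): Promise<number> {
  const start = performance.now();
  const response = await fetch(
    "https://openrouter.ai/api/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "deepseek/deepseek-v3",
        messages: [{ role: "user", content: "Reply with the word: ok" }],
        max_tokens: 4,
      }),
    }
  );
  await response.json();
  return performance.now() - start;
}

async function main(): Promise<void> {
  const samples: number[] = [];
  for (let i = 0; i < 5; i++) {
    samples.push(await timeOneRequest());
  }
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  console.log(`Samples (ms): ${samples.map((s) => s.toFixed(0)).join(", ")}`);
  console.log(`Mean round-trip: ${mean.toFixed(0)}ms`);
}

main().catch(console.error);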
THE PROMPT LOGGING OPT-IN CLAUSE
OpenRouter's terms grant the company an irrevocable right to commercial use of your inputs and outputs when prompt logging is enabled, and this remains one of the most criticized aspects of its terms of service. The one percent discount offered in exchange for this right is, for most professional users, not worth the trade-off. The practical advice is simple: do not enable prompt logging.
GDPR COMPLEXITY
OpenRouter's standard tier does not provide data residency guarantees. EU in-region routing is available but requires an enterprise account and explicit configuration. For organizations subject to GDPR, the lack of built-in per-request audit logs showing the processing region makes compliance documentation difficult. Some GDPR specialists argue that OpenRouter's standard approach is insufficient because it sends data to various backends, making it difficult to define the data processor's location for regulatory purposes.
NO HIPAA BAA
OpenRouter does not publish a HIPAA Business Associate Agreement. Teams working with Protected Health Information should not use OpenRouter without such an agreement in place.
DISCLAIMER OF RESPONSIBILITY FOR UNDERLYING PROVIDERS
OpenRouter explicitly disclaims contractual liability if an underlying model provider retains and trains on user data. This shifts the burden of proof and potential claims to the user. This is a standard posture for aggregators but worth understanding clearly before deploying OpenRouter in regulated contexts.
QUANTIZED MODELS
Some models available through OpenRouter may be quantized versions of the original models, meaning they have been compressed in ways that can subtly affect response quality. The difference is often imperceptible, but for applications where response quality is paramount, this is worth being aware of. Auto Exacto helps by routing to providers with better benchmark scores, but it cannot guarantee that a given provider is serving the full-precision model.
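Where full precision matters, routing can be constrained at request time. OpenRouter's provider preferences include a quantization filter; the sketch below assumes a "quantizations" field on the provider preferences object, and the accepted precision labels shown are illustrative rather than exhaustive.
// full-precision-routing.ts
// A sketch restricting routing to providers serving higher-precision
// weights. The "quantizations" provider preference and its accepted
// values are assumptions here; consult the provider routing docs.
async function main(): Promise<void> {
  const response = await fetch(
    "https://openrouter.ai/api/v1/chat/completions",
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "deepseek/deepseek-v3",
        messages: [{ role: "user", content: "Summarize the CAP theorem." }],
        provider: {
          // Only route to providers serving fp16/bf16/fp32 weights.
          quantizations: ["fp16", "bf16", "fp32"],
        },
      }),
    }
  );
  if (!response.ok) {
    throw new Error(`OpenRouter API error ${response.status}`);
  }
  const data = await response.json();
  console.log(data.choices?.[0]?.message?.content);
}

main().catch(console.error);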
CUSTOMER SUPPORT
Some user reviews mention difficulties with customer support, including locked accounts and challenges obtaining refunds. This is a common growing pain for fast-scaling startups, but it is worth noting for teams evaluating OpenRouter for mission-critical applications.
CHAPTER FIFTEEN: THE BIGGER PICTURE — WHAT COMES NEXT
OpenRouter occupies a fascinating position in the AI ecosystem. It is not a model lab: it does not train models. It is not an inference provider: it does not run GPU clusters. It is infrastructure for infrastructure — a meta-layer that makes the entire ecosystem more accessible and more resilient.
The platform's trajectory is remarkable. From $5M in annualized revenue in May 2025 to $50M in early 2026. From 5 trillion tokens per week to 20 trillion. From a text-only aggregator to a platform supporting text, vision, audio, and video. From a simple proxy to a full agentic development platform with its own SDK, human-in-the-loop tools, web search, and MCP integration.
The Chinese model surge is perhaps the most significant trend shaping OpenRouter's near-term future. When Chinese models account for 61% of token consumption and the top six models by weekly token volume are all from China, the platform is no longer primarily a gateway to Western AI providers. It has become a genuinely global marketplace for AI inference, with all the geopolitical and strategic complexity that implies.
The MCP (Model Context Protocol) integration is another trend worth watching. As AI agents become more sophisticated, the ability to connect them to external tools and data sources through standardized protocols becomes increasingly important. OpenRouter's support for MCP, combined with its new Agent SDK and human-in-the-loop tools, positions it well for the agentic AI era.
The enterprise market represents OpenRouter's next major growth frontier. The features introduced in 2026 — workspaces, EU in-region routing, response caching, the Agent SDK, human-in-the-loop tools — are all signals of a platform maturing toward enterprise requirements. The reported $120 million round at a $1.3 billion valuation, if completed, would provide the capital to accelerate this transition.
The competitive landscape will intensify. Portkey, LiteLLM, Bifrost, and a growing field of specialized alternatives are all competing for the same developer mindshare. OpenRouter's advantages — the breadth of its model catalog, the simplicity of its unified API, the quality of its routing intelligence, and the network effects of its usage data — are real but not insurmountable.
What is clear is that the problem OpenRouter solves is not going away. The AI model ecosystem is not converging toward a single dominant model or provider. It is diversifying, specializing, and becoming more complex by the week. In that environment, a well-designed routing and aggregation layer is not a luxury: it is infrastructure as essential as a load balancer or a message queue.
EPILOGUE: SHOULD YOU USE IT?
For individual developers and small teams building AI-powered applications, OpenRouter is an easy recommendation. The free tier with thirty-plus models makes experimentation genuinely free. The unified API eliminates the overhead of managing multiple provider relationships. The fallback routing and Auto Exacto make applications more resilient with minimal additional code. The model catalog gives you access to the entire frontier of AI capability — including the remarkable cost-performance of DeepSeek V4 Pro at $0.435 per million input tokens — through a single interface.
For larger organizations and regulated industries, the calculus is more nuanced. The privacy and compliance considerations are real and require careful evaluation. The Zero Data Retention feature and explicit provider routing controls go a long way toward addressing these concerns, but they require active configuration rather than passive reliance on defaults. Legal and compliance teams should review the terms of service carefully, particularly the prompt logging opt-in clause, before deploying OpenRouter in contexts involving sensitive data. For GDPR-sensitive workloads, an enterprise account with EU in-region routing is necessary. For HIPAA-covered workloads, a BAA must be in place.
For production applications, OpenRouter is a powerful tool not only for prototyping and experimentation but for multi-model workflows at scale. The 25-40ms proxy latency is acceptable for most use cases. The Agent SDK opens up sophisticated agentic workflows that would previously have required significant custom engineering. The response caching feature reduces costs for repetitive workloads.
The USB-C port of the AI world has grown up. It now supports not just text, but audio, video, web search, agentic workflows, human-in-the-loop decision points, and a global marketplace of over five hundred models. It processes twenty trillion tokens per week and is growing fast. It has real limitations that deserve honest acknowledgment, and real strengths that deserve genuine appreciation.
Alex Atallah saw the fragmentation coming before most people recognized it as a problem, and he built something that addresses it with growing sophistication. In a landscape that is becoming more complex by the week, that kind of focused, well-executed infrastructure is genuinely valuable.
Plug in.