OpenAI-compatible inference for developers

One API pass.
Unlimited requests.
Five dollars.

Route requests across healthy AI providers, use a built-in web chat, and stop calculating token bills before every experiment.


OpenAI-compatible · Live routing · Prompt-free API logs

Live routing receipt

1 Request

POST /v1/chat/completions
Model: auto
Stream: true

2 Route evaluation


Auto: best healthy route

Fast: lowest-latency route

Reasoning: highest-quality route

3 Selected provider


4 Streamed response


Live API example · cURL

curl https://api.inferencepass.com/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role":"user","content":"Explain routing simply."}],
    "stream": true
  }'

Response · streamed

data: {"id":"chatcmpl_...","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"A routed API "}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"selects the healthiest provider."}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
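The streamed response uses the OpenAI chat-completions streaming shape. Each `data:` line carries a JSON chunk whose `choices[0].delta` may hold a `content` fragment, and the stream ends with `data: [DONE]`. The sketch below shows one way to reassemble the text on the client. It assumes the lines have already been split out of the HTTP response body, and the helper name `collect_stream_text` is hypothetical.

```python
from __future__ import annotations

import json
from collections.abc import Iterable


def collect_stream_text(lines: Iterable[str]) -> tuple[str, str | None]:
    """Join delta.content fragments from OpenAI-style SSE lines.

    Returns the assembled text and the last finish_reason seen (or None).
    """
    parts: list[str] = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blanks and other fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The [DONE] sentinel marks the end of the stream.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# The sample chunks shown above, reused as input.
stream = [
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"A routed API "}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"selects the healthiest provider."}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text, reason = collect_stream_text(stream)
print(text)    # A routed API selects the healthiest provider.
print(reason)  # stop
```

The OpenAI SDKs do this parsing for you when `stream=True` is set. A hand-rolled parser like this one is only useful when calling the endpoint over raw HTTP.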

Stable public selectors

Three modes. One simple choice.

Auto

The best healthy route for reliability and simplicity.

Balanced latency
Provider failover
Free-tier default

Fast

The lowest-latency compatible provider for realtime work.

Latency-prioritized
Streaming friendly
Paid plan

Reasoning

Higher-quality routes for analysis, coding, and complex tasks.

Quality-prioritized
Tool-aware routing
Paid plan
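One way to read the three selectors is as different ranking rules over the same pool of healthy providers. The sketch below illustrates that idea and is not InferencePass's routing logic. It assumes a hypothetical `Provider` record with a health flag, a measured latency, and a quality score. The page does not publish how routes are scored, so the balanced score used for Auto is arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical fields: the gateway's real health and quality signals are not published.
    name: str
    healthy: bool
    latency_ms: float
    quality: float  # illustrative score, higher is better


def select_provider(providers: list[Provider], mode: str) -> Provider:
    """Rank healthy providers for one selector: "auto", "fast", or "reasoning"."""
    candidates = [p for p in providers if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy providers")
    if mode == "fast":
        # Latency-prioritized: the lowest measured latency wins.
        return min(candidates, key=lambda p: p.latency_ms)
    if mode == "reasoning":
        # Quality-prioritized: the highest quality score wins.
        return max(candidates, key=lambda p: p.quality)
    if mode == "auto":
        # Balanced: an arbitrary illustrative trade-off, not InferencePass's formula.
        return max(candidates, key=lambda p: p.quality - p.latency_ms / 1000)
    raise ValueError(f"unknown mode: {mode}")


pool = [
    Provider("provider-a", healthy=True, latency_ms=120, quality=0.70),
    Provider("provider-b", healthy=True, latency_ms=600, quality=0.95),
    Provider("provider-c", healthy=False, latency_ms=50, quality=0.99),
]
for mode in ("auto", "fast", "reasoning"):
    print(mode, select_provider(pool, mode).name)
```

Unhealthy providers are filtered out before ranking. Marking `provider-a` unhealthy therefore makes Auto fall through to `provider-b`, which is the simplest form of the provider failover the Auto card describes.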

One account

API and chat. All in one workspace.

Create API keys, inspect request metadata, manage your subscription, and explore the routing engine in the included web chat.

Mode: Auto · Live routing

API access and chat, together.

Test the same routing modes your application uses. Start a conversation, inspect the selected provider, then move the prompt into your code.

Explain an unfamiliar codebase
Design a JSON schema
Plan an API migration

Transparent pricing

No calculator. No surprise invoice.

Free

$0 / month

25 verified requests per day
Auto routing mode
API access and web chat
No credit card required


Unlimited

$5 / month

No request quota
No token quota
Auto, Fast, and Reasoning modes
JSON mode, tools, streaming, API, and chat
Cancel any time

Coming soon
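The Free plan lists only Auto, while Unlimited adds Fast and Reasoning. The sketch below chooses the `model` value on the client according to the plan. It assumes the Fast and Reasoning selectors are sent as `"fast"` and `"reasoning"`, although only `"auto"` appears in this page's examples. The plan keys and the downgrade-to-Auto policy are illustrative choices, not documented gateway behavior.

```python
# Hypothetical plan keys mapped to the modes each plan lists above.
PLAN_MODES = {
    "free": {"auto"},
    "unlimited": {"auto", "fast", "reasoning"},
}


def model_for_plan(plan: str, requested: str) -> str:
    """Return the selector to send, falling back to "auto" if the plan lacks it."""
    allowed = PLAN_MODES.get(plan)
    if allowed is None:
        raise ValueError(f"unknown plan: {plan}")
    return requested if requested in allowed else "auto"


def build_request(plan: str, requested: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions body for the given plan."""
    return {
        "model": model_for_plan(plan, requested),
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }


print(build_request("free", "reasoning", "Explain routing simply."))
```

Failing loudly instead of downgrading (for example, by raising in `model_for_plan`) is an equally reasonable policy. Which one fits depends on whether callers should notice that they asked for a mode their plan does not include.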

Verified June 15, 2026

InferencePass vs metered routing.

Feature	InferencePass	LLM7	OpenRouter
Paid price	$5/month fixed	$12/month Pro	Metered by model
Request quota	Unlimited	Published rate limits	Varies
Token quota	Unlimited	5B/day on Pro	Paid token usage
API and chat	One account	Separate products	Included
Routing modes	Auto, Fast, Reasoning	Default, Fast, Pro	Provider/model routing

Competitor details are sourced from LLM7 pricing and OpenRouter pricing. Limits and prices can change; check their current pages before purchasing.

Drop-in setup

Integrate in minutes.

cURL

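curl https://api.inferencepass.com/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}'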

Python

import os

from openai import OpenAI

# Point the official OpenAI SDK at the InferencePass gateway.
client = OpenAI(
    api_key=os.environ["INFERENCEPASS_API_KEY"],
    base_url="https://api.inferencepass.com/v1",
)

JavaScript

import OpenAI from "openai";

// Point the official OpenAI SDK at the InferencePass gateway.
const client = new OpenAI({
  apiKey: process.env.INFERENCEPASS_API_KEY,
  baseURL: "https://api.inferencepass.com/v1",
});

Common questions

The important details, plainly.

Is InferencePass OpenAI-compatible?

Yes. The v1 launch supports chat completions, streaming, JSON response modes, system messages, and OpenAI-style tools and function calls.
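Because the gateway accepts OpenAI-style requests, JSON mode and tools are expressed through the standard `response_format` and `tools` fields. The sketch below builds such request bodies using only the standard library. It assumes the gateway accepts these fields exactly as the OpenAI Chat Completions API defines them. The `get_weather` function is a hypothetical example tool.

```python
import json


def json_mode_body(prompt: str) -> dict:
    """Request body asking for a JSON object reply (OpenAI-style JSON mode)."""
    return {
        "model": "auto",
        "response_format": {"type": "json_object"},
        "messages": [
            # OpenAI's JSON mode expects the word "JSON" in the messages; kept for compatibility.
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": prompt},
        ],
    }


def tools_body(prompt: str) -> dict:
    """Request body offering one hypothetical function the model may call."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up the current weather for a city.",
                    # Arguments are described with JSON Schema.
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


print(json.dumps(json_mode_body("List three HTTP methods."), indent=2))
print(json.dumps(tools_body("What is the weather in Paris?"), indent=2))
```

With the OpenAI SDK, the same dictionaries map directly onto keyword arguments of `client.chat.completions.create(...)`.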

What does Unlimited include?

Unlimited includes API requests, routed model modes, and browser chat without a monthly request or token allowance. Service-level concurrency, burst, context, and acceptable-use protections still apply.
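Unlimited removes request and token quotas, but concurrency and burst protections still apply. A client that fans out many requests should therefore cap its own parallelism. The sketch below does this with `asyncio.Semaphore`. It assumes a limit of 4 in-flight requests, since the page does not state the actual limit. The hypothetical `fake_request` coroutine stands in for a real HTTP call.

```python
from __future__ import annotations

import asyncio


async def run_bounded(prompts: list[str], limit: int = 4) -> tuple[list[str], int]:
    """Process prompts with at most `limit` requests in flight.

    Returns the results in input order and the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request(prompt: str) -> str:
        # Hypothetical stand-in for a call to the gateway.
        await asyncio.sleep(0.01)
        return prompt.upper()

    async def worker(prompt: str) -> str:
        nonlocal in_flight, peak
        # Only `limit` workers can be inside this block at once.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_request(prompt)
            finally:
                in_flight -= 1

    # gather preserves input order in its results.
    results = await asyncio.gather(*(worker(p) for p in prompts))
    return list(results), peak


results, peak = asyncio.run(run_bounded([f"q{i}" for i in range(10)]))
print(results[:3], peak)
```

In a real client, `fake_request` would be replaced by a call through the OpenAI SDK's `AsyncOpenAI` client, configured with the gateway `base_url`.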

Is web chat included?

Yes. Free and Unlimited accounts can use the browser chat. It calls the same gateway and routing modes available through the API.

Does the API store prompts?

No. The API gateway records operational metadata such as provider, latency, and token counts, but it does not persist API prompts or generated responses.

ONE API PASS · $5 MONTHLY

Closed beta. Opening soon.

Drop your email and we will let you know when access opens.


Free-tier requests are powered in part by pollinations.ai.

