Auto
The best healthy route for reliability and simplicity.
- Balanced latency
- Provider failover
- Free-tier default
OpenAI-compatible inference for developers
Route requests across healthy AI providers, use a built-in web chat, and stop calculating token bills before every experiment.
curl https://api.inferencepass.com/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role":"user","content":"Explain routing simply."}],
    "stream": true
  }'

data: {"id":"chatcmpl_...","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"A routed API "}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"selects the healthiest provider."}}]}
data: {"id":"chatcmpl_...","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

Stable public selectors
Auto: The best healthy route for reliability and simplicity.
Fast: The lowest-latency compatible provider for realtime work.
Reasoning: Higher-quality routes for analysis, coding, and complex tasks.
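
As a rough illustration of how a selector fits into a request, the sketch below builds a chat-completions body for a chosen routing mode. Only `"auto"` appears in the curl example on this page; the `"fast"` and `"reasoning"` identifiers are assumptions inferred from the mode names in the comparison table, and `build_chat_request` is a hypothetical helper, not part of any SDK.

```python
import json

# "auto" is taken from the curl example on this page. "fast" and "reasoning"
# are ASSUMED identifiers inferred from the mode names; verify before use.
SELECTORS = {"auto", "fast", "reasoning"}


def build_chat_request(prompt: str, selector: str = "auto", stream: bool = False) -> dict:
    """Return a chat-completions request body for the given routing selector."""
    if selector not in SELECTORS:
        raise ValueError(f"unknown selector: {selector!r}")
    return {
        "model": selector,  # the selector goes where a model name normally would
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


if __name__ == "__main__":
    # Print the body that would be sent as the JSON payload of the POST request.
    print(json.dumps(build_chat_request("Explain routing simply."), indent=2))
```

Keeping the selector in one validated place means a typo fails locally instead of producing a rejected API call.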
One account
Create keys, inspect request metadata, manage your subscription, and explore the routing engine in the included web chat.
Test the same routing modes your application uses. Start a conversation, inspect the selected provider, then move the prompt into your code.
Transparent pricing
Verified June 15, 2026
| Feature | InferencePass | LLM7 | OpenRouter |
|---|---|---|---|
| Paid price | $5/month fixed | $12/month Pro | Metered by model |
| Request quota | Unlimited | Published rate limits | Varies |
| Token quota | Unlimited | 5B/day on Pro | Paid token usage |
| API and chat | One account | Separate products | Included |
| Routing modes | Auto, Fast, Reasoning | Default, Fast, Pro | Provider/model routing |
Competitor details are sourced from the LLM7 and OpenRouter pricing pages. Limits and prices can change; check their current pages before purchasing.
Drop-in setup
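curl https://api.inferencepass.com/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Hello"}]}'

import os
from openai import OpenAI

# Python: point the OpenAI client at the InferencePass gateway.
client = OpenAI(
    api_key=os.environ["INFERENCEPASS_API_KEY"],
    base_url="https://api.inferencepass.com/v1",
)

import OpenAI from "openai";

// Node.js: only the API key and base URL change.
const client = new OpenAI({
  apiKey: process.env.INFERENCEPASS_API_KEY,
  baseURL: "https://api.inferencepass.com/v1"
});

Common questions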
The API is OpenAI-compatible. The v1 launch supports chat completions, streaming, JSON response modes, system messages, and OpenAI-style tools and function calls.
Unlimited includes API requests, routed model modes, and browser chat without a monthly request or token allowance. Service-level concurrency, burst, context, and acceptable-use protections still apply.
Both Free and Unlimited accounts can use the browser chat. It calls the same gateway and routing modes available through the API.
The API gateway does not persist API prompts or generated responses. It records only operational metadata such as provider, latency, and token counts.
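
For the streaming support mentioned in the answers above, responses arrive as `data:` lines like those in the curl example near the top of this page. The following is a minimal sketch of assembling those chunks into the full reply; `collect_stream_text` is a hypothetical helper, and the sample lines are copied from that example rather than from a live response.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from `data:` lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Sample lines copied from the streaming example on this page.
sample = [
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"A routed API "}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{"content":"selects the healthiest provider."}}]}',
    'data: {"id":"chatcmpl_...","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

print(collect_stream_text(sample))  # prints: A routed API selects the healthiest provider.
```

In practice the OpenAI client libraries handle this parsing when `stream=True` is set, so a hand-rolled parser like this is mainly useful with raw HTTP clients.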
Get a free API key in seconds. No credit card required.