Developer documentation

A small API surface on purpose.

InferencePass v1 focuses on OpenAI-compatible chat completions, model discovery, streaming, JSON output, and tools.

Base URL and authentication

Send requests to https://api.inferencepass.com/v1 and authenticate with a bearer token in the Authorization header. API keys begin with ip_live_ and are displayed only once, at creation, so store the key somewhere safe.
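As a small sketch of the conventions above, the helper below checks that a value has the ip_live_ prefix before it is used. The function name check_key is hypothetical and not part of InferencePass; the environment variable INFERENCEPASS_API_KEY is the one used in the request example in this document.

```shell
# Base URL from this document.
BASE_URL="https://api.inferencepass.com/v1"

# Hypothetical helper: succeed only if the argument starts with ip_live_
# and has at least one character after the prefix.
check_key() {
  case "$1" in
    ip_live_?*) return 0 ;;
    *) echo "key does not start with ip_live_" >&2; return 1 ;;
  esac
}
```

A typical use would be `check_key "$INFERENCEPASS_API_KEY" || exit 1` at the top of a script, before any request is sent.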

First request

curl https://api.inferencepass.com/v1/chat/completions \
  -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Supported in v1

The v1 API supports system messages, temperature, max_tokens, streaming, JSON response formats, and OpenAI-style tool definitions and tool calls.
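The sketch below shows how those parameters might look in request bodies. It assumes InferencePass accepts the OpenAI field shapes as-is, in particular response_format with type json_object and a tools array of function definitions; this page does not confirm the exact accepted values. The get_weather tool and the helper names json_payload, tools_payload, and post_chat are hypothetical.

```shell
BASE_URL="https://api.inferencepass.com/v1"

# JSON-output request: system message, temperature, max_tokens, response_format.
# Assumption: the OpenAI-style {"type": "json_object"} value is accepted.
json_payload() {
  cat <<'JSON'
{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "Reply with a JSON object only."},
    {"role": "user", "content": "List three primary colors under the key \"colors\"."}
  ],
  "temperature": 0.2,
  "max_tokens": 200,
  "response_format": {"type": "json_object"}
}
JSON
}

# Tool-calling request. get_weather is a hypothetical tool for illustration.
tools_payload() {
  cat <<'JSON'
{
  "model": "auto",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}
JSON
}

# POST a JSON body read from stdin to the chat completions endpoint.
post_chat() {
  curl -sS "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-
}
```

Running `json_payload | post_chat` or `tools_payload | post_chat` sends the request. In the OpenAI convention, a tool call comes back in the assistant message's tool_calls array, with the arguments encoded as a JSON string; whether InferencePass matches that in every detail should be checked against a live response.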

  • GET /v1/models
  • POST /v1/chat/completions
  • Server-sent event streaming
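The three items above can be sketched with curl as follows. The streaming part assumes the OpenAI convention of "stream": true in the request body, one JSON chunk per "data: " line, and a final "data: [DONE]" line; this page does not spell out the wire format, so treat that as an assumption. The helper names are hypothetical.

```shell
BASE_URL="https://api.inferencepass.com/v1"

# GET /v1/models: list the models available to this key.
list_models() {
  curl -sS "$BASE_URL/models" \
    -H "Authorization: Bearer $INFERENCEPASS_API_KEY"
}

# Request body with streaming enabled (OpenAI-style "stream" flag, assumed).
stream_payload() {
  cat <<'JSON'
{
  "model": "auto",
  "stream": true,
  "messages": [{"role": "user", "content": "Hello"}]
}
JSON
}

# POST /v1/chat/completions with -N so curl does not buffer the event stream.
stream_chat() {
  stream_payload | curl -sS -N "$BASE_URL/chat/completions" \
    -H "Authorization: Bearer $INFERENCEPASS_API_KEY" \
    -H "Content-Type: application/json" \
    -d @-
}

# Read server-sent events on stdin and print the payload of each "data: " line.
# Stops at the assumed "data: [DONE]" terminator; other lines are ignored.
cr=$(printf '\r')
sse_data() {
  while IFS= read -r line; do
    line=${line%"$cr"}   # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}
```

`stream_chat | sse_data` would then print one JSON chunk per line as events arrive, ready to pipe into a JSON tool of your choice.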

Deliberately out of scope

Images, audio, embeddings, fine-tuning, and the Responses API are not part of v1. Unsupported paths return a clear OpenAI-shaped error rather than silently changing behavior.
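A client can detect such errors by looking for the OpenAI-style envelope, a top-level "error" object that typically carries message, type, and code fields. The sketch below is a rough text check only, and it assumes InferencePass uses that same envelope; the exact field values and HTTP status codes are not stated on this page. The helper names are hypothetical.

```shell
BASE_URL="https://api.inferencepass.com/v1"

# Succeed if stdin looks like an OpenAI-style error envelope: {"error": {...}}.
# This is a crude text match; a real client should parse the JSON properly.
is_openai_error() {
  grep -q '"error"[[:space:]]*:[[:space:]]*{'
}

# GET a path under the base URL. Print the body on success; on an error
# envelope, print it to stderr and return non-zero.
get_or_fail() {
  body=$(curl -sS "$BASE_URL/$1" \
    -H "Authorization: Bearer $INFERENCEPASS_API_KEY")
  if printf '%s' "$body" | is_openai_error; then
    printf 'API error: %s\n' "$body" >&2
    return 1
  fi
  printf '%s\n' "$body"
}
```

For example, `get_or_fail models` should print the model list, while a path outside v1 would be expected to fail with the error body on stderr.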