API reference

POST /v1/chat/completions

Create a routed chat completion using OpenAI-compatible messages, streaming, structured output, and tools.

Request body

model and messages are required. model accepts Auto, Fast, Reasoning, or a currently published concrete model identifier. stream, temperature, max_tokens, response_format, tools, and tool_choice are optional.

  • Maximum input context: 128K tokens
  • Maximum requested output: 64K tokens
  • Unknown fields are rejected where they would create ambiguous behavior

Streaming

Streaming responses use text/event-stream and end with data: [DONE]. The router may try one alternate provider before any response data is emitted. It never retries after the first chunk reaches the client.

Errors

Authentication, entitlement, validation, rate, and upstream failures use OpenAI-style error objects with a human-readable message, type, optional parameter, and stable code.