Request body

model and messages are required. model accepts Auto, Fast, Reasoning, or a currently published concrete model identifier. stream, temperature, max_tokens, response_format, tools, and tool_choice are optional.

Maximum input context: 128K tokens
Maximum requested output: 64K tokens
Unknown fields are rejected where they would create ambiguous behavior

Streaming

Streaming responses use text/event-stream and end with data: [DONE]. The router may try one alternate provider before any response data is emitted. It never retries after the first chunk reaches the client.

Errors

Authentication, entitlement, validation, rate, and upstream failures use OpenAI-style error objects with a human-readable message, type, optional parameter, and stable code.