Query Parameters
Optional custom identifier for the stream. If not provided, a unique ID will be generated automatically.
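As a minimal sketch, the identifier could be supplied as a query parameter on the initiation URL. The endpoint path and the parameter name 'stream_id' below are illustrative assumptions, not confirmed names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint path and parameter name ("stream_id") for illustration only.
base = "https://api.example.com/v1/chat/stream"
custom_id = "my-custom-stream-001"

# When no identifier is supplied, the server generates a unique one automatically.
url = f"{base}?{urlencode({'stream_id': custom_id})}" if custom_id else base
print(url)  # https://api.example.com/v1/chat/stream?stream_id=my-custom-stream-001
```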
Body
Request body for initiating an asynchronous chat completion with multi-turn conversation support.
Contains the message history, tool definitions, system prompts, and response configuration. The request starts a background process that streams events, which can be observed via the stream endpoint.
The request body defines the complete conversation context and the model behavior parameters used to generate asynchronous responses.
Chat request body model for handling chat interactions.
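For illustration, a minimal request body built from the fields described below might look like the following sketch. Field names such as 'messages', 'model', and 'stream' follow the descriptions in this section; exact casing and any additional required fields should be checked against the schema.

```python
import json

# Minimal request body sketch; field names follow the descriptions below and
# are assumptions, not a definitive schema.
request_body = {
    "messages": [
        {"role": "user", "content": "Summarize the quarterly report."},
        {"role": "assistant", "content": "Sure. Which sections matter most?"},
        {"role": "user", "content": "Focus on the financial summary."},
    ],
    "model": "my-default-model",  # optional; the default model is used when omitted
    "stream": True,               # stream the response back to the client
}

print(json.dumps(request_body, indent=2))
```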
Array of messages composing the chat conversation. Each message should have a 'role' ('user' or 'assistant') and 'content'.
Model to use for the chat completion. If not provided, the default model will be used.
Whether to stream the response back to the client.
List of tools to use for the response.
Controls how the model chooses tools for the response.
Context to provide to the tools, such as documents, database connection strings, or other data relevant to tool usage.
List of MCP servers to use for tool retrieval. Each server can have its own configuration.
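As a hedged sketch, the tool-related fields could be supplied along these lines. The field names ('tools', 'tool_choice', 'context', 'mcp_servers') and the shapes of their values are assumptions based on the descriptions above, not a confirmed schema.

```python
# Hedged sketch of tool-related fields; names and value shapes are assumptions
# based on the descriptions above, not a confirmed schema.
tool_fields = {
    "tools": [
        {
            "name": "search_documents",
            "description": "Search the indexed document store.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ],
    # How the model should choose tools, e.g. automatic selection.
    "tool_choice": "auto",
    # Context made available to the tools (documents, connection strings, etc.).
    "context": {"documents": ["report-2024-q1.pdf"]},
    # MCP servers used for tool retrieval, each with its own configuration.
    "mcp_servers": [
        {"name": "docs-server", "url": "https://mcp.example.com", "timeout_s": 30}
    ],
}
```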
Format of the response. Can be text, json_schema, or a tool call.
System message configuration, including default prompt and citations.
Thinking configuration, enabling reasoning capabilities for the model.
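The response-shaping fields might be configured roughly as follows; again, the field names ('response_format', 'system', 'thinking') and their value shapes are assumptions for illustration.

```python
# Hedged sketch of response-shaping fields; names and shapes are assumptions.
response_shaping = {
    # Format of the response: plain text, a JSON schema, or a tool call.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "summary",
            "schema": {
                "type": "object",
                "properties": {"headline": {"type": "string"}},
                "required": ["headline"],
            },
        },
    },
    # System message configuration, including the default prompt and citations.
    "system": {"prompt": "You are a concise financial analyst.", "citations": True},
    # Thinking configuration, enabling reasoning capabilities for the model.
    "thinking": {"enabled": True},
}
```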
Priority of the request, used to determine the order in which responses are processed.
Random seed for reproducibility.
Minimum probability threshold for token selection. Tokens with probability below this value are filtered out.
Nucleus sampling parameter. Only tokens with cumulative probability up to this value are considered.
Controls randomness in generation. Higher values make output more random, lower values more deterministic.
Limits token selection to the top K most likely tokens at each step.
Penalty applied to tokens that have already appeared in the sequence to reduce repetition.
Penalty applied based on whether a token has appeared in the text, encouraging topic diversity.
Penalty applied based on how frequently a token appears in the text, reducing repetitive content.
Maximum number of tokens to generate in the response.
Correlation ID for tracking the request across systems.
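Putting the generation and bookkeeping parameters together, a complete initiation call could look like the sketch below. The parameter names mirror the descriptions above and the endpoint path is hypothetical; both are assumptions to be checked against the actual API.

```python
import requests

# Hedged sketch of generation and bookkeeping parameters; names mirror the
# descriptions above and are assumptions, not a confirmed schema.
body = {
    "messages": [{"role": "user", "content": "Summarize the quarterly report."}],
    "priority": 5,              # request priority
    "seed": 42,                 # random seed for reproducibility
    "min_p": 0.05,              # drop tokens below this probability threshold
    "top_p": 0.9,               # nucleus sampling threshold
    "temperature": 0.7,         # randomness of generation
    "top_k": 40,                # consider only the top K tokens at each step
    "repetition_penalty": 1.1,  # penalize tokens already in the sequence
    "presence_penalty": 0.2,    # penalize tokens that have appeared at all
    "frequency_penalty": 0.2,   # penalize tokens by how often they appear
    "max_tokens": 1024,         # cap on generated tokens
    "correlation_id": "req-7f3a9c",  # for tracking the request across systems
}

# Initiate the stream; the endpoint path is hypothetical.
resp = requests.post("https://api.example.com/v1/chat/stream", json=body, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. stream id, initial status, confirmation message
```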
Response
Chat stream initiated successfully
Response model for initiated asynchronous chat completion streams
Unique identifier for the initiated stream
Initial status of the stream (typically 'pending'). Possible values: pending, processing, completed, failed, cancelled, error
Confirmation message for successful stream initiation
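A successful initiation response might then be handled as in the sketch below; the property names ('stream_id', 'status', 'message') are assumptions based on the field descriptions above.

```python
# Hedged sketch of handling the initiation response; property names
# ("stream_id", "status", "message") are assumptions based on the
# descriptions above.
payload = {
    "stream_id": "8c3f2a1e-5b7d-4e9f-a0c1-2d3e4f5a6b7c",
    "status": "pending",  # one of: pending, processing, completed, failed, cancelled, error
    "message": "Chat stream initiated successfully",
}

if payload["status"] in {"pending", "processing"}:
    # Poll or subscribe to the stream endpoint using the returned identifier.
    print(f"Observe events for stream {payload['stream_id']} via the stream endpoint.")
```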