x-api-key) or OAuth2 JWT.
Where Invoke Agent runs a single one-shot task, the gateway models a conversation: a long-lived thread you can send follow-ups to, queue messages against, stop mid-run, edit, and resume from interactive cards.
Why the gateway
Two properties make the gateway the right surface for a chat-style experience:
- Fast time-to-first-token (TTFT). The gateway is a lightweight router. It streams a connected event the instant the request lands, then its router LLM acknowledges and starts answering immediately, before any heavy downstream work begins. You get a first token in roughly one network round-trip instead of waiting for a full agent to spin up and run.
- Effectively unlimited context window. The router never stuffs the whole conversation into one LLM context. The actual work is delegated to sub-executions - each a fresh, bounded agent run - and their results are summarized back into the conversation. Long threads are compacted as they grow, so a conversation can run far longer than any single model’s context limit without degrading.
Conversation model
A conversation is identified by a conversation_id (the execution id of its first turn). Within a conversation:
- Run a turn with Run Gateway Turn or its streaming variant, passing your conversation_id as id to start or continue the same conversation. The stream path is the fast-first-token entrypoint.
- Send a follow-up with Send Conversation Message. While a turn is running, the follow-up is queued; when idle, the response tells you to start a fresh turn.
- Run-state (run-state) is a cheap snapshot you poll to decide what to render: is a turn live, how deep is the queue.
- Drain (drain/stream) runs the queued follow-ups for an idle conversation, streaming each turn.
- Stop (stop) cancels the active run; the queue survives unless you clear it.
- Edit (edit) rewrites a prior user message, trims the tail, and re-runs the turn.
- Answer (answers) resumes a turn that paused on an ask_user_questions card.
How it works: the gateway is a router
The agent you talk to through the gateway is a router. When you send a message, the gateway’s own LLM decides what to do with it: answer inline, ask you a clarifying question, or dispatch the actual work to the downstream agent as a separate execution (a sub-execution). So one conversation (the parent execution, keyed by conversation_id = “root”) can spawn many sub-executions, one per task the router dispatches. Each sub-execution is a full agent run with its own id, status, result, and token usage.
Sub-executions: sync vs async
The router picks a mode per task:
- Sync (foreground). The router drives the downstream agent inline and streams its events onto the root conversation as they happen, so you see the child’s tool_call_*, chunk, and sub_task_finished events live on the same SSE. The router waits for the child before continuing the turn.
- Async (background). The router dispatches the task and keeps going without waiting. You get the sub-execution id back immediately, so you can query it on its own with Get Task. When it finishes, its result is pushed to the root conversation (a sub_task_finished event on a live stream, and it appears under the conversation’s sub_executions on the next Get Conversation).
Sub-executions are listed as sub_executions (in-flight) and finished_sub_executions (done). Either way the root conversation is the single place that accumulates every result.
Sending messages while a turn runs (the queue)
A conversation runs one turn at a time. If you send a follow-up while a turn is still running, it does not interrupt or run in parallel - it goes into the conversation’s queue and runs as the next turn once the current one finishes. Send Conversation Message takes a mode:
- mode=auto (default): queue the message if a turn is running; if the conversation is idle, the response is started, meaning you should run the turn yourself with Run Gateway Turn (id=conversation_id).
- mode=queue: always enqueue, even if idle.
The response is { action, message_id, queue_depth }, where action is queued (with the new depth) or started (when idle).
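A client could branch on that response roughly as follows. This is a minimal sketch, not the gateway's own client code: the action, message_id, and queue_depth fields come from the text above, while the helper name next_step, its return strings, and the use of a plain dict for the decoded JSON body are assumptions made for illustration.

```python
from typing import Any, Dict


def next_step(response: Dict[str, Any]) -> str:
    """Decide what to do after Send Conversation Message.

    `response` is assumed to be the decoded JSON body carrying the
    { action, message_id, queue_depth } fields (hypothetical dict shape).
    """
    action = response.get("action")
    if action == "started":
        # The conversation was idle, so nothing runs the message for us:
        # the caller starts the turn itself (Run Gateway Turn, id=conversation_id).
        return "run_turn"
    if action == "queued":
        # The message is waiting in the queue; report the new depth.
        return f"wait (queue depth {response.get('queue_depth')})"
    raise ValueError(f"unexpected action: {action!r}")
```

Raising on an unknown action keeps the sketch from silently ignoring a response shape it does not understand.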
How queued messages actually run:
- If a live stream is attached (the turn that was running has an open SSE), it drains the queue itself: when the turn finishes it pops the next message and runs it as the following turn on the same stream. Each drain is announced with a gateway_queue_updated event (last_action: "drain").
- If nothing is holding a stream (you queued over plain REST then disconnected, or queued while idle), call Drain Queue to run the pending messages and stream them. A background reaper is the backstop: it resumes a conversation whose queue is non-empty but has no live run.
You can also inspect the queue (queue_depth + previews), drop one entry with Cancel Queued Message, and drop all of them with Clear Queue. Stop cancels the running turn and, by default, keeps the queue.
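For a client that does not hold a stream, the decision to call Drain Queue can be derived from a run-state snapshot. The sketch below assumes the snapshot is a dict exposing is_running and queue_depth (both named elsewhere on this page); the function name should_drain is hypothetical.

```python
from typing import Any, Dict


def should_drain(run_state: Dict[str, Any]) -> bool:
    """Return True when messages are queued but no turn is running."""
    is_running = bool(run_state.get("is_running", False))
    queue_depth = int(run_state.get("queue_depth", 0))
    # While a turn is running the queue simply waits behind it; only an idle
    # conversation with pending messages needs an explicit Drain Queue call.
    return not is_running and queue_depth > 0
```

Skipping the call is not fatal, since the background reaper resumes orphaned queues, but draining explicitly avoids waiting on that backstop.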
Getting results
There are three ways to read what an agent produced, depending on how you called it:
- From the stream. On any streaming endpoint, the terminal task_finished event carries the finalized execution, including its result. Foreground sub-task results also arrive on the stream as sub_task_finished events.
- Poll the conversation. Call Get Conversation at any time. It returns the parent execution (with status and result), the full activity thread, aggregate token usage, and every sub-execution’s own (execution, activity, usage) triple - so a single call gives you the downstream agents’ results too.
- Poll the task directly. Every execution id (the conversation_id and each sub-execution id) is a task. Use Get Task to fetch one execution’s status + result, or Get Task Thread for its activity log.
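To illustrate the "poll the conversation" route, here is a small sketch that flattens a Get Conversation payload into a map of execution id to result. The exact JSON layout is not specified on this page, so the keys used here (execution, sub_executions, id, result) are assumptions, and collect_results is a hypothetical helper.

```python
from typing import Any, Dict


def collect_results(conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Map every execution id in the conversation to its result.

    Assumed payload shape (hypothetical):
      {"execution": {"id", "status", "result"},
       "sub_executions": [{"execution": {...}, "activity": [...], "usage": {...}}]}
    """
    parent = conversation["execution"]
    results = {parent["id"]: parent.get("result")}
    for sub in conversation.get("sub_executions", []):
        child = sub["execution"]
        # A sub-execution that is still in flight has no result yet (None).
        results[child["id"]] = child.get("result")
    return results
```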
Without holding a stream
For a request/response integration that doesn’t keep an SSE open:
- Run the turn with Run Gateway Turn (sync). It blocks until the turn completes and returns the execution with its result.
- Add follow-ups with Send Conversation Message using mode=queue, then run them with Drain Queue (a background reaper also picks up an orphaned queue).
- Check progress any time with Run State (is_running, queue_depth) or Get Task on the conversation_id (status completed/failed/stopped), and read results with Get Conversation (parent result plus each sub-execution’s result).
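The "check progress" step can be sketched as a bounded polling loop. The terminal statuses completed, failed, and stopped are taken from the list above; everything else is an assumption for illustration: get_task stands in for whatever function you use to call Get Task, and wait_until_done, interval_s, and max_polls are hypothetical names.

```python
import time
from typing import Any, Callable, Dict

# Terminal statuses as listed above.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_until_done(
    get_task: Callable[[str], Dict[str, Any]],
    conversation_id: str,
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task until it reaches a terminal status, then return it."""
    for _ in range(max_polls):
        task = get_task(conversation_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        # Not finished yet: back off before asking again.
        sleep(interval_s)
    raise TimeoutError(f"{conversation_id} not finished after {max_polls} polls")
```

Passing the fetcher and the sleep function in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with a fake.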
Streaming
The streaming endpoints (run a turn, drain, answers, edit-stream) return Server-Sent Events. The stream opens with a connected event (fast first byte), then each data: line is a JSON TaskUpdateEvent (task_created, chunk, tool_call_request, sub_task_created, sub_task_finished, task_finished, and so on) - the same event shape as Invoke Agent (Stream).
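A minimal sketch of reading such a stream, assuming each event arrives as a single data: line holding one JSON object. The helper name iter_events is hypothetical, and so is the "type" key used in the usage note to tell events apart, since the TaskUpdateEvent field names are not given here. Lines that are not data: lines, or that do not decode to a JSON object, are skipped.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per `data:` line of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            # Blank separators, comments, and other SSE fields carry no event body.
            continue
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            # Non-JSON payloads are ignored rather than treated as fatal.
            continue
        if isinstance(event, dict):
            yield event
```

A caller might then filter the yielded events on whatever field names the event kind, for example `[e for e in iter_events(lines) if e.get("type") == "task_finished"]` under the "type" assumption. The generator deliberately does not stop at the first task_finished, because a stream that drains a queue can carry several turns.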
Auth and identity
Every route is agent-scoped and authorized by your API key’s access to that agent, exactly like the rest of the v1 API. There is no end-user session, so message and answer bodies accept an optional user object if you want to attribute the turn to a specific person; if you omit it, the turn runs without an attributed end-user (continuations keep whatever user the conversation already carries).
