The agent gateway is the conversational execution surface of the platform. It is the same engine the xpander chat app drives, now exposed on the REST API behind your normal API key (x-api-key) or OAuth2 JWT. Where Invoke Agent runs a single one-shot task, the gateway models a conversation: a long-lived thread you can send follow-ups to, queue messages against, stop mid-run, edit, and resume from interactive cards.

Why the gateway

Two properties make the gateway the right surface for a chat-style experience:
  • Fast time-to-first-token (TTFT). The gateway is a lightweight router. It streams a connected event the instant the request lands, then its router LLM acknowledges and starts answering immediately, before any heavy downstream work begins. You get a first token in roughly one network round-trip instead of waiting for a full agent to spin up and run.
  • Effectively unlimited context window. The router never stuffs the whole conversation into one LLM context. The actual work is delegated to sub-executions - each a fresh, bounded agent run - and their results are summarized back into the conversation. Long threads are compacted as they grow, so a conversation can run far longer than any single model’s context limit without degrading.
The trade-off versus one-shot invoke: the gateway optimizes for an interactive, growing conversation; invoke optimizes for a single self-contained task you call and forget.

Conversation model

A conversation is identified by a conversation_id (the execution id of its first turn). Within a conversation:
  • Run a turn with Run Gateway Turn or its streaming variant, passing id = your conversation_id to start or continue the same conversation. The stream path is the fast-first-token entrypoint.
  • Send a follow-up with Send Conversation Message. While a turn is running, the follow-up is queued; when idle, the response tells you to start a fresh turn.
  • Run-state (run-state) is a cheap snapshot you poll to decide what to render: whether a turn is live and how deep the queue is.
  • Drain (drain/stream) runs the queued follow-ups for an idle conversation, streaming each turn.
  • Stop (stop) cancels the active run; the queue survives unless you clear it.
  • Edit (edit) rewrites a prior user message, trims the tail, and re-runs the turn.
  • Answer (answers) resumes a turn that paused on an ask_user_questions card.

How it works: the gateway is a router

The agent you talk to through the gateway is a router. When you send a message, the gateway’s own LLM decides what to do with it: answer inline, ask you a clarifying question, or dispatch the actual work to the downstream agent as a separate execution (a sub-execution). So one conversation (the parent execution, keyed by its conversation_id and referred to below as the root conversation) can spawn many sub-executions, one per task the router dispatches. Each sub-execution is a full agent run with its own id, status, result, and token usage.

Sub-executions: sync vs async

The router picks a mode per task:
  • Sync (foreground). The router drives the downstream agent inline and streams its events onto the root conversation as they happen, so you see the child’s tool_call_*, chunk, and sub_task_finished events live on the same SSE. The router waits for the child before continuing the turn.
  • Async (background). The router dispatches the task and keeps going without waiting. You get the sub-execution id back immediately, so you can query it on its own with Get Task. When it finishes, its result is pushed to the root conversation (a sub_task_finished event on a live stream, and it appears under the conversation’s sub_executions on the next Get Conversation).
The parent execution tracks its children in sub_executions (in-flight) and finished_sub_executions (done). Either way the root conversation is the single place that accumulates every result.
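The bookkeeping above can be mirrored on the client. The following is a minimal sketch, not the platform's implementation: it folds streamed events into an in-flight set and a finished map. Only the event names sub_task_created and sub_task_finished come from the text; the field names type, task_id, and result are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubExecutionTracker:
    """Client-side mirror of a conversation's child executions."""

    in_flight: set[str] = field(default_factory=set)
    finished: dict[str, Any] = field(default_factory=dict)

    def apply(self, event: dict[str, Any]) -> None:
        # "type", "task_id", and "result" are hypothetical field names.
        kind = event.get("type")
        task_id = event.get("task_id")
        if task_id is None:
            return
        if kind == "sub_task_created":
            self.in_flight.add(task_id)
        elif kind == "sub_task_finished":
            # Move the child from in-flight to finished and keep its result.
            self.in_flight.discard(task_id)
            self.finished[task_id] = event.get("result")


tracker = SubExecutionTracker()
for ev in [
    {"type": "sub_task_created", "task_id": "child-1"},
    {"type": "sub_task_created", "task_id": "child-2"},
    {"type": "sub_task_finished", "task_id": "child-1", "result": "done"},
]:
    tracker.apply(ev)
```

After the three sample events, child-2 is still in flight and child-1 is recorded as finished with its result.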

Sending messages while a turn runs (the queue)

A conversation runs one turn at a time. If you send a follow-up while a turn is still running, it does not interrupt or run in parallel - it goes into the conversation’s queue and runs as the next turn once the current one finishes. Send Conversation Message takes a mode:
  • mode=auto (default): queue the message if a turn is running; if the conversation is idle, the response's action is started, meaning nothing was queued and you should run the turn yourself with Run Gateway Turn (id = conversation_id).
  • mode=queue: always enqueue, even if idle.
The response is { action, message_id, queue_depth }, where action is queued (with the new queue_depth) or started (when the conversation was idle). How queued messages actually run depends on whether a stream is attached:
  • If a live stream is attached (the turn that was running has an open SSE), it drains the queue itself: when the turn finishes it pops the next message and runs it as the following turn on the same stream. Each drain is announced with a gateway_queue_updated event (last_action: "drain").
  • If nothing is holding a stream (you queued over plain REST then disconnected, or queued while idle), call Drain Queue to run the pending messages and stream them. A background reaper is the backstop: it resumes a conversation whose queue is non-empty but has no live run.
Manage the queue with Run State (current queue_depth + previews), Cancel Queued Message (drop one), and Clear Queue (drop all). Stop cancels the running turn and, by default, keeps the queue.
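The decision a client makes after Send Conversation Message can be sketched as a small pure function. This is an illustrative sketch under assumptions: the action values queued and started come from the text, while the function name, the stream_attached flag, and the returned labels are hypothetical.

```python
from typing import Any


def next_step(response: dict[str, Any], stream_attached: bool) -> str:
    """Map a Send Conversation Message response to the caller's next move."""
    action = response.get("action")
    if action == "started":
        # Conversation was idle: the caller runs the turn itself
        # with Run Gateway Turn (id = conversation_id).
        return "run_turn"
    if action == "queued":
        # A live stream drains the queue on its own; otherwise the caller
        # calls Drain Queue (the background reaper is only a backstop).
        return "wait_for_stream" if stream_attached else "drain_queue"
    raise ValueError(f"unexpected action: {action!r}")
```

Keeping this logic separate from the HTTP call makes it easy to unit-test the three cases without a live conversation.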

Getting results

There are three ways to read what an agent produced, depending on how you called it:
  1. From the stream. On any streaming endpoint, the terminal task_finished event carries the finalized execution, including its result. Foreground sub-task results also arrive on the stream as sub_task_finished events.
  2. Poll the conversation. Call Get Conversation at any time. It returns the parent execution (with status and result), the full activity thread, aggregate token usage, and every sub-execution’s own (execution, activity, usage) triple - so a single call gives you the downstream agents’ results too.
  3. Poll the task directly. Every execution id (the conversation_id and each sub-execution id) is a task. Use Get Task to fetch one execution’s status + result, or Get Task Thread for its activity log.
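As a sketch of option 2, the function below flattens a Get Conversation response into a map of execution id to result. The response layout is an assumption: the text only says the call returns the parent execution and a triple per sub-execution, so the keys execution, sub_executions, id, and result are hypothetical and should be checked against the actual response.

```python
from typing import Any


def collect_results(conversation: dict[str, Any]) -> dict[str, Any]:
    """Return {execution_id: result} for the parent and every sub-execution."""
    results: dict[str, Any] = {}

    # Parent execution (hypothetical key "execution").
    parent = conversation.get("execution") or {}
    if "id" in parent:
        results[parent["id"]] = parent.get("result")

    # Each sub-execution is assumed to be an object holding its own execution.
    for sub in conversation.get("sub_executions") or []:
        execution = sub.get("execution") or {}
        if "id" in execution:
            results[execution["id"]] = execution.get("result")
    return results


sample = {
    "execution": {"id": "conv-1", "status": "completed", "result": "summary"},
    "sub_executions": [
        {"execution": {"id": "child-1", "status": "completed", "result": "report"}},
    ],
}
all_results = collect_results(sample)
```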

Without holding a stream

For a request/response integration that doesn’t keep an SSE open:
  1. Run the turn with Run Gateway Turn (sync). It blocks until the turn completes and returns the execution with its result.
  2. Add follow-ups with Send Conversation Message using mode=queue, then run them with Drain Queue (a background reaper also picks up an orphaned queue).
  3. Check progress any time with Run State (is_running, queue_depth) or Get Task on the conversation_id (status completed / failed / stopped), and read results with Get Conversation (parent result plus each sub-execution’s result).
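Step 3 amounts to a polling loop. The sketch below waits until the conversation is idle with an empty queue. It assumes a caller-supplied get_run_state callable that wraps the Run State call and returns a dict with the is_running and queue_depth fields named above; the callable, the timing parameters, and the dict shape are hypothetical.

```python
import time
from typing import Any, Callable


def wait_until_idle(
    get_run_state: Callable[[], dict[str, Any]],
    *,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll run-state until no turn is running and the queue is empty."""
    deadline = clock() + timeout
    while True:
        state = get_run_state()
        if not state.get("is_running") and state.get("queue_depth", 0) == 0:
            return state
        if clock() >= deadline:
            raise TimeoutError("conversation did not become idle in time")
        sleep(interval)
```

If the loop sees is_running false with a non-zero queue_depth, nothing is holding a stream; calling Drain Queue at that point avoids waiting for the background reaper.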

Streaming

The streaming endpoints (run a turn, drain, answers, edit-stream) return Server-Sent Events. The stream opens with a connected event (fast first byte), then each data: line is a JSON TaskUpdateEvent (task_created, chunk, tool_call_request, sub_task_created, sub_task_finished, task_finished, and so on) - the same event shape as Invoke Agent (Stream).
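A client can consume such a stream by decoding each data: line as JSON. The sketch below works on any iterable of text lines, so it is independent of the HTTP library used. The exact payload fields are not specified here, so the type, text, and result keys in the sample are assumptions; only the event names come from the text.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded JSON object per SSE `data:` line."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON payloads instead of failing the stream
        if isinstance(event, dict):
            yield event


def final_result(events: Iterable[dict[str, Any]]) -> Optional[Any]:
    """Return the result carried by the terminal task_finished event, if any."""
    for event in events:
        # "type" and "result" are hypothetical field names.
        if event.get("type") == "task_finished":
            return event.get("result")
    return None


sample_lines = [
    ": keep-alive\n",
    'data: {"type": "task_created"}\n',
    "\n",
    'data: {"type": "chunk", "text": "Hel"}\n',
    'data: {"type": "task_finished", "result": "Hello"}\n',
]
events = list(iter_events(sample_lines))
result = final_result(events)
```

With an HTTP client that exposes the response body line by line, the same iter_events function can be fed directly from the open connection.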

Auth and identity

Every route is agent-scoped and authorized by your API key’s access to that agent, exactly like the rest of the v1 API. There is no end-user session, so message and answer bodies accept an optional user object that attributes the turn to a specific person. If you omit it, the turn runs without an attributed end user; continuations keep whatever user the conversation already carries.
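A request helper might attach the optional user object as follows. This is a sketch under assumptions: the x-api-key header comes from the text above, but the body key user, the other body fields, and the fields inside the user object are hypothetical and should be checked against the endpoint reference.

```python
from typing import Any, Optional


def build_request(
    api_key: str,
    body: dict[str, Any],
    user: Optional[dict[str, Any]] = None,
) -> tuple[dict[str, str], dict[str, Any]]:
    """Return (headers, payload) for a gateway call, optionally attributed."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    payload = dict(body)  # copy so the caller's dict is not mutated
    if user is not None:
        # Assumed key name; omit it to run without an attributed end user.
        payload["user"] = user
    return headers, payload


headers, payload = build_request(
    "YOUR_API_KEY",
    {"message": "hello"},            # hypothetical body field
    user={"email": "a@example.com"},  # hypothetical user fields
)
```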