# Agent Gateway

> The multi-turn conversational surface for talking with an agent over a long-lived conversation.

The **agent gateway** is the conversational execution surface of the platform. It is the same engine the xpander chat app drives, now exposed on the REST API behind your normal API key (`x-api-key`) or OAuth2 JWT.

Where [Invoke Agent](/api-reference/v1/agents/invoke-sync) runs a single one-shot task, the gateway models a **conversation**: a long-lived thread you can send follow-ups to, queue messages against, stop mid-run, edit, and resume from interactive cards.

## Why the gateway

Two properties make the gateway the right surface for a chat-style experience:

* **Fast time-to-first-token (TTFT).** The gateway is a lightweight router. It streams a `connected` event the instant the request lands, then its router LLM acknowledges and starts answering immediately, before any heavy downstream work begins. You get a first token in roughly one network round-trip instead of waiting for a full agent to spin up and run.
* **Effectively unlimited context window.** The router never stuffs the whole conversation into one LLM context. The actual work is delegated to **sub-executions** - each a fresh, bounded agent run - and their results are summarized back into the conversation. Long threads are compacted as they grow, so a conversation can run far longer than any single model's context limit without degrading.

The trade-off versus one-shot [invoke](/api-reference/v1/agents/invoke-sync): the gateway optimizes for an interactive, growing conversation; invoke optimizes for a single self-contained task you call and forget.

## Conversation model

A conversation is identified by a `conversation_id` (the execution id of its first turn). Within a conversation:

* **Run a turn** with [Run Gateway Turn](/api-reference/v1/agents/gateway/run-turn) or its [streaming variant](/api-reference/v1/agents/gateway/run-turn-stream), passing `id` = your `conversation_id` to start or continue the same conversation. The stream path is the fast-first-token entrypoint.
* **Send a follow-up** with [Send Conversation Message](/api-reference/v1/agents/gateway/send-message). While a turn is running, the follow-up is **queued**; when idle, the response tells you to start a fresh turn.
* **Run-state** ([run-state](/api-reference/v1/agents/gateway/run-state)) is a cheap snapshot you poll to decide what to render: whether a turn is live and how deep the queue is.
* **Drain** ([drain/stream](/api-reference/v1/agents/gateway/drain-stream)) runs the queued follow-ups for an idle conversation, streaming each turn.
* **Stop** ([stop](/api-reference/v1/agents/gateway/stop)) cancels the active run; the queue survives unless you clear it.
* **Edit** ([edit](/api-reference/v1/agents/gateway/edit-message)) rewrites a prior user message, trims the tail, and re-runs the turn.
* **Answer** ([answers](/api-reference/v1/agents/gateway/submit-answers)) resumes a turn that paused on an `ask_user_questions` card.

## How it works: the gateway is a router

The agent you talk to through the gateway is a **router**. When you send a message, the gateway's own LLM decides what to do with it: answer inline, ask you a clarifying question, or dispatch the actual work to the **downstream agent** as a separate execution (a **sub-execution**).

```mermaid theme={"dark"}
flowchart LR
  U([You]) -- message --> C[Conversation<br/>parent execution<br/>conversation_id]
  C -- router LLM decides --> C
  C -- answer inline --> U
  C -- sync sub-execution --> S[Downstream agent]
  C -- async sub-execution --> A[Downstream agent]
  S -- events stream to root --> C
  A -. result pushed to root when done .-> C
```

So one conversation (the **parent execution**, also called the **root**, keyed by `conversation_id`) can spawn many **sub-executions**, one per task the router dispatches. Each sub-execution is a full agent run with its own id, status, result, and token usage.

### Sub-executions: sync vs async

The router picks a mode per task:

* **Sync (foreground).** The router drives the downstream agent inline and **streams its events onto the root conversation** as they happen, so you see the child's `tool_call_*`, `chunk`, and `sub_task_finished` events live on the same SSE. The router waits for the child before continuing the turn.
* **Async (background).** The router dispatches the task and keeps going without waiting. You get the **sub-execution id** back immediately, so you can query it on its own with [Get Task](/api-reference/v1/tasks/get-task). When it finishes, its **result is pushed to the root** conversation (a `sub_task_finished` event on a live stream, and it appears under the conversation's `sub_executions` on the next [Get Conversation](/api-reference/v1/agents/gateway/get-conversation)).

The parent execution tracks its children in `sub_executions` (in-flight) and `finished_sub_executions` (done). Either way the root conversation is the single place that accumulates every result.
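As a rough illustration of the bookkeeping described above, the sketch below splits a parent execution payload into in-flight and finished sub-execution ids. The `sub_executions` and `finished_sub_executions` field names come from this page; the shape of each entry (a bare id string or an object with an `id` key) and the helper name `split_sub_executions` are assumptions made for the example, so check the actual response schema before relying on it.

```python
from typing import Any, Dict, List


def split_sub_executions(parent: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return in-flight and finished sub-execution ids from a parent execution."""

    def ids(key: str) -> List[str]:
        out: List[str] = []
        for child in parent.get(key) or []:
            # Assumption: a child is either a bare id string or an object with "id".
            out.append(child if isinstance(child, str) else child["id"])
        return out

    return {
        "in_flight": ids("sub_executions"),
        "finished": ids("finished_sub_executions"),
    }
```

A UI could use the `in_flight` list to show spinners for background work and the `finished` list to decide which results to fetch.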

## Sending messages while a turn runs (the queue)

A conversation runs **one turn at a time**. If you send a follow-up while a turn is still running, it does not interrupt or run in parallel - it goes into the conversation's **queue** and runs as the next turn once the current one finishes.

[Send Conversation Message](/api-reference/v1/agents/gateway/send-message) takes a `mode`:

* `mode=auto` (default): queue the message if a turn is running; if the conversation is idle, the response is `started`, meaning you should run the turn yourself with [Run Gateway Turn](/api-reference/v1/agents/gateway/run-turn-stream) (`id` = `conversation_id`).
* `mode=queue`: always enqueue, even if idle.

The response is `{ action, message_id, queue_depth }` - `queued` with the new depth, or `started` when idle.
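A client typically branches on the `action` field of that response. The following is a minimal sketch of that branching, assuming the response has already been decoded into a dict; the function name `next_step` and the shape of its return value are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def next_step(response: Dict[str, Any], conversation_id: str) -> Dict[str, Any]:
    """Decide what the client should do after Send Conversation Message."""
    action = response.get("action")
    if action == "queued":
        # A turn is running (or mode=queue was used): the message waits its turn.
        return {"do": "wait", "queue_depth": response.get("queue_depth", 0)}
    if action == "started":
        # The conversation was idle: the client runs the turn itself,
        # passing id = conversation_id to continue the same conversation.
        return {"do": "run_turn", "id": conversation_id}
    raise ValueError(f"unexpected action: {action!r}")
```

Raising on an unknown `action` keeps the client from silently dropping a message if the response shape differs from what this sketch assumes.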

```mermaid theme={"dark"}
sequenceDiagram
  participant C as Client
  participant Conv as Conversation (root)
  C->>Conv: invoke (id = conversation_id), SSE open, turn running
  C->>Conv: POST /messages (follow-up while running)
  Conv-->>C: 200 { action: "queued", queue_depth: 1 }
  Note over Conv: current turn finishes
  Conv-->>C: gateway_queue_updated (drain) then the next turn's events
  Note over Conv: drains the queue in order until empty
```

How queued messages actually run:

* If a **live stream** is attached (the turn that was running has an open SSE), it drains the queue itself: when the turn finishes it pops the next message and runs it as the following turn on the **same** stream. Each drain is announced with a `gateway_queue_updated` event (`last_action: "drain"`).
* If **nothing** is holding a stream (you queued over plain REST then disconnected, or queued while idle), call [Drain Queue](/api-reference/v1/agents/gateway/drain-stream) to run the pending messages and stream them. A background reaper is the backstop: it resumes a conversation whose queue is non-empty but has no live run.

Manage the queue with [Run State](/api-reference/v1/agents/gateway/run-state) (current `queue_depth` + previews), [Cancel Queued Message](/api-reference/v1/agents/gateway/cancel-queued) (drop one), and [Clear Queue](/api-reference/v1/agents/gateway/clear-queue) (drop all). [Stop](/api-reference/v1/agents/gateway/stop) cancels the running turn and, by default, keeps the queue.
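For a client that is not holding a stream, the rules above reduce to a small decision on each Run State poll. The sketch below assumes the snapshot is a dict exposing `is_running` and `queue_depth` (both named on this page); the function name and the returned labels are hypothetical.

```python
from typing import Any, Dict


def queue_action(run_state: Dict[str, Any]) -> str:
    """Map a Run State snapshot to the next client action."""
    running = bool(run_state.get("is_running"))
    depth = int(run_state.get("queue_depth") or 0)
    if running:
        # One turn at a time: nothing to drain until the current turn ends.
        return "wait"
    if depth > 0:
        # Idle with pending messages and no live stream: call Drain Queue.
        return "drain"
    return "idle"
```

Calling Drain Queue explicitly in the `drain` case avoids waiting for the background reaper, which this page describes only as a backstop.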

## Getting results

There are three ways to read what an agent produced, depending on how you called it:

1. **From the stream.** On any streaming endpoint, the terminal `task_finished` event carries the finalized execution, including its `result`. Foreground sub-task results also arrive on the stream as `sub_task_finished` events.
2. **Poll the conversation.** Call [Get Conversation](/api-reference/v1/agents/gateway/get-conversation) at any time. It returns the parent execution (with `status` and `result`), the full activity thread, aggregate token usage, and **every sub-execution's own `(execution, activity, usage)` triple** - so a single call gives you the downstream agents' results too.
3. **Poll the task directly.** Every execution id (the `conversation_id` and each sub-execution id) is a task. Use [Get Task](/api-reference/v1/tasks/get-task) to fetch one execution's status + result, or [Get Task Thread](/api-reference/v1/tasks/get-thread) for its activity log.
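To make the second option concrete, here is a sketch that flattens a Get Conversation response into a map of execution id to result. The key names are assumptions inferred from the description above (a parent under `execution`, children under `sub_executions`, each child carrying its own `execution`); the real payload may be structured differently.

```python
from typing import Any, Dict


def collect_results(conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Map each execution id (parent and children) to its result."""
    results: Dict[str, Any] = {}

    # Assumed key: the parent execution lives under "execution".
    parent = conversation.get("execution") or {}
    if "id" in parent:
        results[parent["id"]] = parent.get("result")

    # Assumed shape: each sub-execution entry is an
    # (execution, activity, usage) triple expressed as an object.
    for sub in conversation.get("sub_executions") or []:
        execution = sub.get("execution") or {}
        if "id" in execution:
            results[execution["id"]] = execution.get("result")

    return results
```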

### Without holding a stream

For a request/response integration that doesn't keep an SSE open:

1. Run the turn with [Run Gateway Turn](/api-reference/v1/agents/gateway/run-turn) (sync). It blocks until the turn completes and returns the execution with its `result`.
2. Add follow-ups with [Send Conversation Message](/api-reference/v1/agents/gateway/send-message) using `mode=queue`, then run them with [Drain Queue](/api-reference/v1/agents/gateway/drain-stream) (a background reaper also picks up an orphaned queue).
3. Check progress any time with [Run State](/api-reference/v1/agents/gateway/run-state) (`is_running`, `queue_depth`) or [Get Task](/api-reference/v1/tasks/get-task) on the `conversation_id` (status `completed` / `failed` / `stopped`), and read results with [Get Conversation](/api-reference/v1/agents/gateway/get-conversation) (parent `result` plus each sub-execution's `result`).
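Step 3 can be sketched as a small polling loop. The terminal statuses `completed`, `failed`, and `stopped` come from this page; everything else is an assumption for illustration. In particular, `get_status` is a hypothetical callable you supply (for example, a function that calls Get Task and returns the `status` string), so no endpoint path or client library is implied here.

```python
import time
from typing import Callable

# Terminal statuses as listed on this page.
TERMINAL = {"completed", "failed", "stopped"}


def wait_until_done(
    get_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays. Once the loop returns, read the results with Get Conversation.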

## Streaming

The streaming endpoints ([run a turn](/api-reference/v1/agents/gateway/run-turn-stream), drain, answers, edit-stream) return **Server-Sent Events**. The stream opens with a `connected` event (fast first byte), then each `data:` line is a JSON `TaskUpdateEvent` (`task_created`, `chunk`, `tool_call_request`, `sub_task_created`, `sub_task_finished`, `task_finished`, and so on) - the same event shape as [Invoke Agent (Stream)](/api-reference/v1/agents/invoke-stream).
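A minimal sketch of consuming such a stream follows. It takes an iterable of already-decoded text lines (for instance, from an HTTP client's line iterator) and yields one dict per `data:` line, stopping after `task_finished`. It assumes the event name is carried in a `type` field of the JSON payload, which this page does not confirm; adjust the key to match the actual `TaskUpdateEvent` schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield JSON events from SSE lines until the terminal task_finished event."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore non-JSON payloads such as keep-alives
        if not isinstance(event, dict):
            continue
        yield event
        # Assumption: the event name is stored under the "type" key.
        if event.get("type") == "task_finished":
            return
```

Because the generator returns on `task_finished`, a caller can simply loop over it and treat the last yielded event as the finalized execution.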

## Auth and identity

Every route is agent-scoped and authorized by your API key's access to that agent, exactly like the rest of the v1 API. There is no end-user session, so message and answer bodies accept an optional `user` object if you want to attribute the turn to a specific person. If you omit it, the turn runs without an attributed end-user (continuations keep whatever user the conversation already carries).
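The sketch below shows one way a client might assemble credentials and the optional `user` attribution. Only the `x-api-key` header name and the `user` body field come from this page. Sending the JWT as an `Authorization: Bearer` header is the common OAuth2 convention and is assumed here, and the fields inside the `user` object are not specified on this page, so treat any you pass as placeholders.

```python
from typing import Any, Dict, Optional


def build_headers(api_key: Optional[str] = None, jwt: Optional[str] = None) -> Dict[str, str]:
    """Build auth headers from an API key or an OAuth2 JWT."""
    if api_key:
        return {"x-api-key": api_key}
    if jwt:
        # Assumption: the JWT is sent as a standard Bearer token.
        return {"Authorization": f"Bearer {jwt}"}
    raise ValueError("provide api_key or jwt")


def attribute_turn(body: Dict[str, Any], user: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of body with the optional user object attached."""
    out = dict(body)
    if user is not None:
        out["user"] = user
    return out
```

Leaving `user` out of the body, rather than sending an empty object, matches the behavior described above where a continuation keeps the user the conversation already carries.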