# Self-Hosted Kubernetes

> Deploy Xpander on your own Kubernetes cluster using the official Helm chart. Outbound-only networking, full data control.

<Note>
  [Contact our team](https://cal.com/team/xpander-ai/activate) to unlock self-hosted locations on your account before starting.
</Note>

## Prerequisites

| Requirement            | Minimum version or details                     |
| ---------------------- | ---------------------------------------------- |
| **Kubernetes**         | 1.20                                           |
| **Helm**               | 3.12                                           |
| **Ingress controller** | NGINX Ingress Controller or equivalent         |
| **Storage class**      | For persistent volumes (Redis, PostgreSQL)     |
| **TLS**                | cert-manager, or manually-managed certificates |

You also need an environment set up in the Xpander Console to get your `organizationId`, `environmentId`, and `deploymentManagerApiKey`. Create one at [app.xpander.ai/environments](https://app.xpander.ai/environments).

## Architecture

The Helm chart deploys eight application services plus two data stores:

| Component            | Role                                                          |
| -------------------- | ------------------------------------------------------------- |
| **Agent Controller** | Main API endpoint, orchestrates agent execution               |
| **AI Gateway**       | Routes LLM provider requests (OpenAI, Anthropic, etc.)        |
| **Agent Worker**     | Task execution runtime that invokes tools and processes steps |
| **MCP**              | Model Context Protocol server for tool exposure               |
| **Chat**             | Web chat UI backed by Chainlit                                |
| **Code Runner**      | Sandboxed environment for code execution tools                |
| **AWS Operator**     | Manages AWS-specific integrations                             |
| **API**              | Public REST API surface                                       |
| **Redis**            | Cache and session state                                       |
| **PostgreSQL**       | Persistent data store for agents, threads, tasks, and memory  |

In addition, the chart enables an in-cluster Docker registry and a metrics-server by default. Disable them with `dockerRegistry.enabled=false` and `metricsServer.enabled=false` if your cluster already provides equivalents.

When you set a `domain` and enable ingress, the chart creates a hostname for each application service:

* `agent-controller.{domain}` (SDK and REST calls)
* `ai-gateway.{domain}` (LLM provider routing)
* `mcp.{domain}`, `chat.{domain}`, `agent-worker.{domain}`, `code-runner.{domain}`, `aws-operator.{domain}`, `api.{domain}` (internal traffic; you typically call only `agent-controller` and `ai-gateway` directly)
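
The hostname pattern above is simple enough to script, which can help when preparing DNS records or certificate requests. The following is a minimal sketch, not part of the chart: the `ingress_hostnames` helper and the `APP_SERVICES` list are illustrative names, with the service prefixes copied from the list above.

```python
# Service prefixes taken from the hostname list above.
APP_SERVICES = [
    "agent-controller",
    "ai-gateway",
    "mcp",
    "chat",
    "agent-worker",
    "code-runner",
    "aws-operator",
    "api",
]


def ingress_hostnames(domain: str) -> list[str]:
    """Return one hostname per application service for the given base domain."""
    domain = domain.strip().strip(".")
    if not domain:
        raise ValueError("domain must not be empty")
    return [f"{service}.{domain}" for service in APP_SERVICES]


if __name__ == "__main__":
    # Print the hostnames that need DNS records for an example domain.
    for host in ingress_hostnames("xpander.my-company.com"):
        print(host)
```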

## Networking: outbound only

Your cluster makes outbound connections to Xpander. Xpander never initiates inbound connections to your cluster.

**Outbound destinations** (HTTPS, port 443):

```
166.117.85.46
15.197.85.80
```

Configure your firewall to allow egress from the cluster to these IPs on port 443. All data stays inside your infrastructure.
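
To confirm that the firewall rule works, you can attempt a plain TCP connection to each destination from a machine or pod inside the cluster network. This is a hedged sketch: `check_egress` is a hypothetical helper, and a successful TCP connection only shows that port 443 is open, not that TLS or authentication succeed.

```python
import socket

# Outbound destinations listed above.
XPANDER_IPS = ["166.117.85.46", "15.197.85.80"]


def check_egress(ips=XPANDER_IPS, port=443, timeout=3.0,
                 connect=socket.create_connection):
    """Return {ip: bool}, True when a TCP connection to ip:port opens."""
    results = {}
    for ip in ips:
        try:
            conn = connect((ip, port), timeout=timeout)
            conn.close()
            results[ip] = True
        except OSError:
            # Covers timeouts, refused connections, and unreachable networks.
            results[ip] = False
    return results


if __name__ == "__main__":
    for ip, ok in check_egress().items():
        print(f"{ip}:443 {'reachable' if ok else 'blocked'}")
```

The `connect` parameter exists only so the function can be exercised without network access; in normal use, leave it at its default.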

## Install with Helm

Add the Helm repository:

```bash theme={"dark"}
helm repo add xpander https://charts.xpander.ai
helm repo update
```

Install with the IDs from your Xpander Console environment:

```bash theme={"dark"}
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --set global.organizationId=<your-org-id> \
  --set global.environmentId=<your-env-id> \
  --set secrets.static.deploymentManagerApiKey=<your-deployment-api-key>
```

To expose the deployment via ingress, add `ingress.enabled=true` and a `domain`:

```bash theme={"dark"}
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --set ingress.enabled=true \
  --set domain=xpander.my-company.com \
  --set global.organizationId=<your-org-id> \
  --set global.environmentId=<your-env-id> \
  --set secrets.static.deploymentManagerApiKey=<your-deployment-api-key>
```

### Production install with LLM keys

For production, add your LLM provider keys and use a values file for cert-manager, storage, and resource limits:

```yaml theme={"dark"}
# xpander-values.yaml
domain: "xpander.production.com"

global:
  organizationId: "<your-org-id>"
  environmentId: "<your-env-id>"
  env:
    LOG_LEVEL: "info"
    ENVIRONMENT: "production"

secrets:
  static:
    deploymentManagerApiKey: "<your-deployment-api-key>"

agent-worker:
  env:
    AGENTS_OPENAI_API_KEY: "sk-..."
    ANTHROPIC_API_KEY: "sk-ant-..."

ingress:
  enabled: true
  tls:
    enabled: true
    source: "cert-manager"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

resources:
  agentController:
    limits:
      cpu: "1000m"
      memory: "1Gi"

redis:
  storage:
    size: "32Gi"
    storageClass: "fast-ssd"
```

```bash theme={"dark"}
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --values xpander-values.yaml
```

<Tip>**Store LLM keys as Kubernetes secrets in production.** Create a secret with `kubectl create secret generic ai-service-keys --from-literal=openai-api-key=... --from-literal=anthropic-api-key=...`, then reference it in the values file using `envFromSecretKeys`.</Tip>
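
If you prefer to generate the secret as a manifest rather than with `kubectl create secret`, a standard Kubernetes `Secret` object can be built in a few lines. This is an illustrative sketch: `build_secret_manifest` is a hypothetical helper, and the `xpander` namespace is an assumption based on the install commands above. It does not show the `envFromSecretKeys` values structure, which you should take from the chart's own documentation.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, literals: dict) -> dict:
    """Build a Kubernetes Secret manifest; values under `data` are base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


if __name__ == "__main__":
    manifest = build_secret_manifest(
        "ai-service-keys",
        "xpander",  # assumed namespace, matching the install commands
        {"openai-api-key": "sk-...", "anthropic-api-key": "sk-ant-..."},
    )
    # kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```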

## Configuration reference

### Required parameters

| Parameter                                | Description                  |
| ---------------------------------------- | ---------------------------- |
| `global.organizationId`                  | Your Xpander organization ID |
| `global.environmentId`                   | Your Xpander environment ID  |
| `secrets.static.deploymentManagerApiKey` | Deployment manager API key   |
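
A quick pre-flight check can catch a missing or placeholder value before you run `helm upgrade`. The sketch below assumes the values have already been loaded into a Python dictionary (for example from `xpander-values.yaml` with a YAML parser of your choice); `missing_required` is a hypothetical helper, not something the chart provides.

```python
# Dotted paths of the required parameters from the table above.
REQUIRED_PATHS = [
    "global.organizationId",
    "global.environmentId",
    "secrets.static.deploymentManagerApiKey",
]


def missing_required(values: dict) -> list[str]:
    """Return required paths that are absent, empty, or still `<...>` placeholders."""
    missing = []
    for path in REQUIRED_PATHS:
        node = values
        for key in path.split("."):
            # Walk one level down; stop resolving once a level is not a mapping.
            node = node.get(key) if isinstance(node, dict) else None
        if not node or (isinstance(node, str) and node.startswith("<")):
            missing.append(path)
    return missing


if __name__ == "__main__":
    example = {
        "global": {"organizationId": "org-123", "environmentId": "<your-env-id>"},
        "secrets": {"static": {}},
    }
    print(missing_required(example))
```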

### Common optional parameters

| Parameter                                | Default                 | Description                                  |
| ---------------------------------------- | ----------------------- | -------------------------------------------- |
| `domain`                                 | `""` (ingress disabled) | Base domain for ingress hostnames            |
| `ingress.enabled`                        | `false`                 | Expose services via ingress                  |
| `ingress.tls.enabled`                    | `false`                 | Enable TLS on ingress                        |
| `ingress.tls.source`                     | `self-signed`           | `self-signed`, `cert-manager`, or `external` |
| `agent-worker.env.AGENTS_OPENAI_API_KEY` | `""`                    | OpenAI API key                               |
| `agent-worker.env.ANTHROPIC_API_KEY`     | `""`                    | Anthropic API key                            |
| `redis.storage.size`                     | `8Gi`                   | Redis PVC size                               |
| `redis.storage.storageClass`             | cluster default         | Storage class for Redis PVC                  |

## Verify the deployment

Check pod status:

```bash theme={"dark"}
kubectl -n xpander get pods
```

You should see one pod for each application service (agent-controller, ai-gateway, agent-worker, mcp, chat, code-runner, aws-operator, api) plus `redis` and `postgres` StatefulSet pods. With default settings the in-cluster `docker-registry` and `metrics-server` pods also run.
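
The same check can be automated against the JSON output of `kubectl -n xpander get pods -o json`. This is a rough sketch under one notable assumption: pod names start with `xpander-<component>-`, inferred from the service and deployment names used elsewhere on this page. Adjust the prefix if your release name or the chart's naming differs. `components_not_running` is a hypothetical helper.

```python
import json

# Components expected with default settings (registry and metrics-server omitted).
EXPECTED = [
    "agent-controller", "ai-gateway", "agent-worker", "mcp", "chat",
    "code-runner", "aws-operator", "api", "redis", "postgres",
]


def components_not_running(pod_list_json: str, prefix: str = "xpander-",
                           expected=EXPECTED) -> list[str]:
    """Return expected components that have no pod in the Running phase."""
    pods = json.loads(pod_list_json)["items"]
    running = [
        pod["metadata"]["name"]
        for pod in pods
        if pod.get("status", {}).get("phase") == "Running"
    ]
    return [
        component
        for component in expected
        # Assumed naming: "<prefix><component>-<suffix>", e.g. xpander-api-abc12.
        if not any(name.startswith(f"{prefix}{component}-") for name in running)
    ]


if __name__ == "__main__":
    sample = json.dumps({"items": [
        {"metadata": {"name": "xpander-agent-controller-abc12"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "xpander-api-xyz34"},
         "status": {"phase": "Pending"}},
    ]})
    print(components_not_running(sample))
```

A pod in the `Running` phase is not necessarily ready, so treat an empty result as a first signal and still run the health checks.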

Test the public health endpoints through port-forwarding:

```bash theme={"dark"}
kubectl -n xpander port-forward service/xpander-agent-controller 9016:9016 &
kubectl -n xpander port-forward service/xpander-ai-gateway 9018:9018 &

curl http://localhost:9016/health
curl http://localhost:9018/health
```

If ingress is enabled, test through the hostnames:

```bash theme={"dark"}
curl https://agent-controller.xpander.my-company.com/health
curl https://ai-gateway.xpander.my-company.com/health
```
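
For repeated checks, for example in a CI job after each upgrade, the two `curl` calls can be wrapped in a small script. This sketch assumes a healthy service answers `GET /health` with HTTP 200, which this page does not state explicitly; `check_health` and `default_fetch` are hypothetical helpers.

```python
import urllib.error
import urllib.request


def default_fetch(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, or TLS error.
        return 0


def check_health(domain: str, fetch=default_fetch) -> dict[str, bool]:
    """Map each public health URL to True when it returns HTTP 200 (assumed healthy)."""
    urls = [
        f"https://{service}.{domain}/health"
        for service in ("agent-controller", "ai-gateway")
    ]
    return {url: fetch(url) == 200 for url in urls}
```

Call it as `check_health("xpander.my-company.com")` and treat any `False` value as a failed check.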

Then return to [app.xpander.ai/environments](https://app.xpander.ai/environments), click **Complete setup** on your environment, and confirm the status changes to **Connected**.

## Connect the SDK to your deployment

Point the SDK at your Agent Controller hostname and use the Agent Controller API key (not your Xpander cloud key):

```python theme={"dark"}
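import asyncio

from xpander_sdk import Backend, Configuration
from agno.agent import Agent

# Point the SDK at the self-hosted Agent Controller instead of Xpander cloud.
config = Configuration(
    api_key="<agent-controller-api-key>",
    organization_id="<your-org-id>",
    base_url="https://agent-controller.xpander.my-company.com",
)

backend = Backend(configuration=config)
agent = Agent(**backend.get_args(agent_id="<agent-id>"))


async def main() -> None:
    # `await` is only valid inside a coroutine, so wrap the call in one.
    result = await agent.arun(input="What can you help me with?")
    print(result)


asyncio.run(main())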
```

<Tip>**Use the Agent Controller hostname, not the root domain.** The SDK needs `base_url=https://agent-controller.{domain}`. The root domain will not resolve to the API.</Tip>

## Upgrade

Pull the latest chart and upgrade in place, reusing your existing values:

```bash theme={"dark"}
helm repo update
helm upgrade xpander xpander/xpander \
  --namespace xpander \
  --reuse-values
```

To change configuration during an upgrade, use the values file:

```bash theme={"dark"}
helm upgrade xpander xpander/xpander \
  --namespace xpander \
  --values xpander-values.yaml
```

## Troubleshoot

<AccordionGroup>
  <Accordion title="Pods stuck in Pending">
    Usually a storage class or PVC issue. Check PVC status:

    ```bash theme={"dark"}
    kubectl -n xpander get pvc
    kubectl -n xpander describe pvc
    ```

    Confirm your cluster has a default storage class, or set `redis.storage.storageClass` and the PostgreSQL equivalent explicitly.
  </Accordion>

  <Accordion title="Ingress not accessible">
    Check that the ingress resources exist and route to the expected services:

    ```bash theme={"dark"}
    kubectl -n xpander describe ingress
    ```

    Verify your DNS records point to the ingress controller's external IP and that the controller itself is running (it usually lives in its own namespace, e.g. `ingress-nginx`).
  </Accordion>

  <Accordion title="Health checks failing">
    Check component logs:

    ```bash theme={"dark"}
    kubectl -n xpander logs deployment/xpander-agent-controller
    kubectl -n xpander logs deployment/xpander-ai-gateway
    kubectl -n xpander logs deployment/xpander-agent-worker
    ```

    The Agent Controller needs to reach the Xpander IPs listed under [Networking: outbound only](#networking-outbound-only) on port 443. If egress is blocked, health checks fail.
  </Accordion>

  <Accordion title="LLM calls fail with missing API key">
    Verify the keys are reaching the agent-worker pod:

    ```bash theme={"dark"}
    kubectl -n xpander exec deployment/xpander-agent-worker -- env | grep API_KEY
    ```

    If the keys are missing, re-run the upgrade with `--set agent-worker.env.AGENTS_OPENAI_API_KEY=...`, or check that your secret is correctly referenced through `envFromSecretKeys`.
  </Accordion>
</AccordionGroup>

## Uninstall

```bash theme={"dark"}
helm uninstall xpander --namespace xpander
kubectl delete namespace xpander
```

Deleting the namespace also removes PVCs and permanently deletes stored data. Back up PostgreSQL first if needed.

## What's next

<CardGroup cols={2}>
  <Card title="Monitor Runs" icon="chart-line" href="/guides/observability/threads">
    Trace execution, debug failures, and review AI performance.
  </Card>

  <Card title="SDK Configuration" icon="code" href="/api-reference/configuration/self-hosted">
    Full SDK configuration reference for self-hosted deployments.
  </Card>
</CardGroup>
